1 Introduction

This paper adds to the growing line of work on circuit-analysis algorithms, where we are given as input a Boolean circuit C from a fixed class \({\mathcal {C}}\) computing a function \(f:\{-1,1\}^n \rightarrow \{-1,1\}\),Footnote 1 and we are required to compute some parameter of the function f. A typical example of this is the question of satisfiability, i.e. whether f is the constant function 1 or not. In this paper, we are interested in computing #SAT(f), which is the number of satisfying assignments of f (i.e. \(\left| \{a \in \{-1,1\}^n \mid f(a) = -1\}\right| \)).

Problems of this form can always be solved by “brute-force” in time \(\mathop {\mathrm {poly}}(|C|)\cdot 2^n\) by trying all assignments to C. The question is whether this brute-force algorithm can be significantly improved, say to time \(2^n/n^{\omega (1)}\), when C is small, say \(|C| \le n^{O(1)}\).

Intuitively, such algorithms are able to distinguish a small circuit \(C \in {\mathcal {C}}\) from a “black-box” and hence find some structure in C. This structure, in turn, is useful in answering other questions about \({\mathcal {C}}\), such as proving lower bounds against the class \({\mathcal {C}}\).Footnote 2 There has been a large body of work in this area, a small sample of which can be found in [26, 27, 33, 37]. A striking result of this type was proved by Williams [33] who showed that for many circuit classes \({\mathcal {C}}\), even co-non-deterministic satisfiability algorithms running in better than brute-force time yield lower bounds against \({\mathcal {C}}\).

Recently, researchers have also uncovered tight connections between many combinatorial problems and circuit-analysis algorithms, showing that even modest improvements over brute-force search can be used to improve long-standing bounds for these combinatorial problems (see, e.g., [1,2,3, 38]). This yields further impetus in improving known circuit-analysis algorithms.

This paper is concerned with #SAT algorithms for constant depth threshold circuits, denoted as \(\mathrm {TC}^0\), which are Boolean circuits where each gate computes a linear threshold function (LTF); an LTF computes a Boolean function which accepts or rejects based on the sign of a (real-valued) linear polynomial evaluated on its input. Such circuits are surprisingly powerful: for example, they can perform all integer arithmetic efficiently [4, 10], compute conjectured families of pseudorandom functions [23] (and hence are not amenable to Natural lower bound proof techniques in the sense defined by Razborov and Rudich [28]) and are at the frontier of our current lower bound techniques [8, 21].

It is natural, therefore, to try to come up with circuit-analysis algorithms for threshold circuits. Indeed, there has been a large body of work in the area (reviewed in the Previous Work paragraph later in the Introduction), but some extremely simple questions remain open.

An example of such a question is the existence of a better-than-brute-force algorithm for satisfiability of degree-k PTFs where k is a constant greater than 2. Informally, the question is the following: we are given a degree-k polynomial \(Q(x_1, \ldots , x_n)\) in n Boolean variables and we ask if there is any Boolean assignment \(a \in \{-1,1\}^n\) to \(x_1,\ldots , x_n\) such that \(Q(a) <0\). Surprisingly, no algorithm is known for this problem that is significantly better than \(2^n\) time.Footnote 3

Note that for a linear polynomial (i.e. \(k=1\)), this problem is trivial. For \(k=2\) the problem is already non-trivial. While not noted explicitly in the literature, a better-than-brute-force algorithm for satisfiability of 2-PTFs is implied by the results from [32, 35]. However, the stronger counting variant of this problem for 2-PTFs is open as far as we know.

In this paper, we solve the counting variant of this problem for any constant-degree PTFs. We start with some definitions and then describe this result.

Definition 1

(Polynomial Threshold Functions) A Polynomial Threshold Function (PTF) on n variables of degree-k is a Boolean function \(f:\{-1,1\}^n\rightarrow \{-1,1\}\) such that there is a degree-k multilinear polynomial \(P(x_1,\ldots ,x_n)\in \mathbb {R}[x_1,\ldots ,x_n]\) that, for all \(a\in \{-1,1\}^n,\) satisfies \(f(a) = \mathrm {sgn}(P(a)).\) (We assume that \(P(a)\ne 0\) for any \(a\in \{-1,1\}^n\).)

In such a scenario, we call f a k-PTF. In the special case that \(k = 1\), we call f a Linear Threshold function (LTF). We also say that the polynomial P sign-represents f.

When \(P\in \mathbb {Z}[x_1,\ldots ,x_n]\), we define the weight of P, denoted w(P), to be the bit-complexity of the sum of the absolute values of all the coefficients of P. In particular, the coefficients of P are integers in the range \([-2^{w(P)},2^{w(P)}].\)

We now formally define the #SAT problem for k-PTFs. Throughout, we assume that k is a constant and not a part of the input.

Definition 2

(#SAT problem for k-PTFs) The problem is defined as follows.

Input: A k-PTF f, specified by a degree-k polynomial \(P(x_1,\ldots ,x_n)\) with integer coefficients.Footnote 4

Output: The number of satisfying assignments to f. That is, the number of \(a\in \{-1,1\}^n\) such that \(P(a) < 0.\)

We use #SAT(f) to denote this output. We say that the input instance has parameters \((n, M)\) if n is the number of input variables and \(w(P)\le M\).
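As a point of reference, Definitions 1 and 2 can be made concrete with the following brute-force sketch. The representation (a dictionary from tuples of 0-based variable indices to integer coefficients) and the function names are illustrative assumptions, and the weight is taken to be the bit-length of the coefficient sum; this is the \(2^n\)-time baseline, not the algorithm of this paper.

```python
from itertools import product
from math import prod

def evaluate(poly, a):
    """Evaluate a multilinear polynomial at a point a in {-1,1}^n."""
    return sum(c * prod(a[i] for i in mono) for mono, c in poly.items())

def weight(poly):
    """w(P), taken here as the bit-length of the sum of |coefficients|."""
    return sum(abs(c) for c in poly.values()).bit_length()

def count_sat_bruteforce(poly, n):
    """Count the points a in {-1,1}^n with P(a) < 0 by trying all of them."""
    count = 0
    for a in product((-1, 1), repeat=n):
        value = evaluate(poly, a)
        assert value != 0, "P must be nonzero on the whole cube (Definition 1)"
        if value < 0:
            count += 1
    return count

# Hypothetical example: P = 2*(x0*x1 + x1*x2 + x0*x2) + 1, a 2-PTF on 3 variables.
P = {(0, 1): 2, (1, 2): 2, (0, 2): 2, (): 1}
print(count_sat_bruteforce(P, 3), weight(P))
```

Here P is positive exactly when all three inputs agree, so it has 6 satisfying assignments, and its coefficient sum 7 has bit-length 3.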

Remark 3

An interesting setting of M is \(\mathop {\mathrm {poly}}(n)\) since any k-PTF can be represented by an integer polynomial with coefficients of bit-complexity at most \({\tilde{O}}(n^k)\) [22]. However, note that our algorithms work even when M is \(\exp (n^{o(1)}),\) i.e. when the weights are slightly short of doubly exponential in n.

We give a better-than-brute-force algorithm for #SAT\((k\text {-PTF})\). Formally we prove the following theorem.

Theorem 4

Fix any constant k. There is a deterministic algorithm that solves the #SAT problem for k-PTFs in time \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n-S}\) where \(S = {\tilde{\Omega }}(n^{1/(k+1)})\) and \((n, M)\) are the parameters of the input k-PTF f. (The \({\tilde{\Omega }}(\cdot )\) hides factors that are inverse polylogarithmic in n.)

Remark 5

An anonymous ITCS 2019 referee pointed out to us that from two results of Williams [32, 35], it follows that satisfiability for 2-PTFs can be solved in \(2^{n-\Omega (\sqrt{n})}\) time. Note that this is better than the runtime of our algorithm. However, the method does not seem to extend to \(k\ge 3\).

Using a different approach, we give another algorithm for the same problem. This result is incomparable to Theorem 4. While the running time is better (and comparable to Williams’ algorithm mentioned above when \(k=2\)) as long as M is subexponential in n, the algorithm is zero-error randomized.Footnote 5

Theorem 6

Fix any constant k. There is a zero-error randomized algorithm that solves the #SAT problem for k-PTFs in time \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n-S}\) where \(S = {\Omega }(n^{1/k})\) and \((n, M)\) are the parameters of the input k-PTF f.

We then extend this result to a powerful model of circuits called k-PTF circuits, where each gate computes a k-PTF. This model was first studied by Kane, Kabanets and Lu [17] who proved strong average case lower bounds for slightly superlinear-size constant-depth k-PTF circuits. Using these ideas, Kabanets and Lu [18] were able to give a #SAT algorithm for a restricted class of k-PTF circuits, where each gate computes a PTF with subquadratically many, say \(n^{1.99}\), monomials (while the size remains the same, i.e. slightly superlinear).Footnote 6 A reason for this restriction on the PTFs was that they did not have an algorithm to handle even a single degree-2 PTF (which can have \(\Omega (n^2)\) many monomials).

Building on our #SAT algorithm for k-PTFs and the ideas of [18], we are able to handle general k-PTF circuits of slightly superlinear size. We state these results formally below.

We first define k-PTF circuits formally.

Definition 7

(k-PTF circuits) A k-PTF circuit on n variables is a Boolean circuit on n variables where each gate g of fan-in m computes a fixed k-PTF of its m inputs. The size of the circuit is the number of wires in the circuit, and the depth of the circuit is the longest path from an input to the output gate.Footnote 7

The problem we consider is the #SAT problem for k-PTF circuits, defined as follows.

Definition 8

(#SAT problem for k-PTF circuits) The problem is defined as follows.

Input: A k-PTF circuit C, where each gate g is labelled by an integer polynomial that sign-represents the function that is computed by g.

Output: The number of satisfying assignments to C.

We use #SAT(C) to denote this output. We say that the input instance has parameters \((n, s, d, M)\) where n is the number of input variables, s is the size of C, d is the depth of C and M is the maximum over the weights of the degree-k polynomials specifying the k-PTFs in C. We will say that M is the weight of C, denoted by w(C).

We now state our result on #SAT for k-PTF circuits. The following result also implies a zero-error randomized version of Theorem 4.

Theorem 9

Fix any constants k, d. Then the following holds for some constant \(\varepsilon _{k,d} > 0\) depending on k, d. There is a zero-error randomized algorithm that solves the #SAT problem for k-PTF circuits of size at most \(s = n^{1+\varepsilon _{k,d}}\) with probability at least 1/4 and outputs ? otherwise. The algorithm runs in time \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n-S}\), where \(S = n^{\varepsilon _{k,d}}\) and \((n, s, d, M)\) are the parameters of the input k-PTF circuit.

We note that in the Williams [33] framework of proving lower bounds via satisfiability algorithms, zero-error randomized algorithms are as good as deterministic algorithms (as noted already above, even co-non-deterministic algorithms are good enough). However, the above theorem does not imply any new lower bounds, as lower bounds for slightly superlinear-size k-PTF circuits already follow from the work of Kane, Kabanets and Lu [17].

Previous work Satisfiability algorithms for \(\mathrm {TC}^0\) have been widely investigated. Impagliazzo, Lovett, Paturi and Schneider [14, 16] give algorithms for checking satisfiability of depth-2 threshold circuits with O(n) gates. The former result was improved by Chen and Santhanam [6]. An incomparable result was proved by Williams [36] who obtained algorithms for subexponential-sized circuits from the class \(\mathrm {ACC}^0 \circ \text {LTF}\), which is a subclass of subexponential \(\mathrm {TC}^0\).Footnote 8 For the special case of k-PTFs (and generalizations to sparse PTFs over the \(\{0,1\}\) basis) with small weights, a #SAT algorithm was devised by Sakai et al. [31].Footnote 9 The high-level idea of our algorithm is the same as theirs.

For general constant-depth threshold circuits, the first satisfiability algorithm was given by Chen, Santhanam and Srinivasan [7]. In their paper, Chen et al. gave the first average case lower bound for \(\mathrm {TC}^0\) circuits of slightly superlinear size \(n^{1+\varepsilon _d}\), where \(\varepsilon _d\) depends on the depth of the circuit. (These are roughly the strongest size lower bounds we know for general \(\mathrm {TC}^0\) circuits even in the worst case [15].) Using their ideas, they gave the first (zero-error randomized) improvement to brute-force-search for satisfiability algorithms (and indeed even #SAT algorithms) for constant depth \(\mathrm {TC}^0\) circuits of size at most \(n^{1+\varepsilon _d}\).

The lower bound results of [7] were extended to the much more powerful class of k-PTF circuits (of roughly the same size as [7]) by Kane, Kabanets and Lu [17]. In a follow-up paper, Kabanets and Lu [18] considered the satisfiability question for k-PTF circuits, and could resolve this question in the special case that each PTF is subquadratically sparse, i.e. has \(n^{2-\Omega (1)}\) monomials. One of the reasons for this sparsity restriction is that their strategy does not seem to yield a SAT algorithm for a single degree-2 PTF (which is a depth-1 2-PTF circuit of linear size).

1.1 Proof outline

For simplicity we discuss SAT algorithms instead of #SAT algorithms.

1.1.1 Satisfiability algorithm for k-PTFs

At a high level, we follow the same strategy as Sakai et al. [31]. Their algorithm uses memoization, which is a standard and very useful strategy for satisfiability algorithms (see, e.g. [29]). Let \({\mathcal {C}}\) be a circuit class and \({\mathcal {C}}_n\) be the subclass of circuits from \({\mathcal {C}}\) that have n variables. Memoization algorithms for \({\mathcal {C}}\)-SAT fit into the following two-step template.

  • Step 1 Solve by brute-force all instances of \({\mathcal {C}}\)-SAT where the input circuit \(C' \in {\mathcal {C}}_m\) for some suitable \(m \ll n\). (Typically, \(m = n^\varepsilon \) for some constant \(\varepsilon \).) Usually this takes \(\exp (m^{O(1)}) \ll 2^n\) time.

  • Step 2 On the input \(C \in {\mathcal {C}}_n\), set all input variables \(x_{m+1}, \ldots , x_n\) to Boolean values and for each such setting, obtain \(C'' \in {\mathcal {C}}_m\) on m variables. Typically \(C''\) is a circuit for which we have solved satisfiability in Step 1 and hence by a simple table lookup, we should be able to check if \(C''\) is satisfiable in \(\mathop {\mathrm {poly}}(|C|)\) time. Overall, this takes time \(O^*(2^{n-m}) \ll 2^n\).
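The two-step template can be sketched for PTFs as follows, under loud assumptions: polynomials are dictionaries from tuples of 0-based variable indices to integer coefficients, all names are hypothetical, and the table of Step 1 is filled lazily through a cache keyed by the restricted coefficient vector rather than precomputed. The sketch counts satisfying assignments rather than only deciding satisfiability, and the lookup only helps when few distinct restricted polynomials arise, which is essentially the small-coefficient situation.

```python
from functools import lru_cache
from itertools import product
from math import prod

def restrict(poly, m, sigma):
    """Substitute sigma for variables m, m+1, ... and collect like terms."""
    out = {}
    for mono, c in poly.items():
        kept = tuple(i for i in mono if i < m)
        sign = prod(sigma[i - m] for i in mono if i >= m)
        out[kept] = out.get(kept, 0) + c * sign
    return out

@lru_cache(maxsize=None)
def count_small(key, m):
    """Brute-force count for an m-variable polynomial (stands in for Step 1)."""
    poly = dict(key)
    return sum(
        1
        for a in product((-1, 1), repeat=m)
        if sum(c * prod(a[i] for i in mono) for mono, c in poly.items()) < 0
    )

def count_sat_memo(poly, n, m):
    """Step 2: enumerate settings of the last n - m variables and look up."""
    total = 0
    for sigma in product((-1, 1), repeat=n - m):
        small = restrict(poly, m, sigma)
        key = tuple(sorted(small.items()))  # hashable coefficient vector
        total += count_small(key, m)
    return total

# Hypothetical example: P = 2*(x0*x1 + x1*x2 + x0*x2) + 1 on 3 variables.
P = {(0, 1): 2, (1, 2): 2, (0, 2): 2, (): 1}
print(count_sat_memo(P, 3, 1))
```

When the restricted coefficients are large, distinct restrictions rarely share a cache key, which is exactly the lookup difficulty discussed next.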

At first sight, this seems perfect for k-PTFs, since it is a standard result that the number of k-PTFs on m variables is at most \(2^{O(m^{k+1})}\) [5]. Thus, Step 1 can be done in \(2^{O(m^{k+1})} \ll 2^n\) time.

For implementing Step 2, we need to ensure that the lookup (for satisfiability for k-PTFs on m variables) can be done quickly. Unfortunately, it is unclear how to do this. The following two ways suggest themselves.

  • Store all polynomials \(P' \in \mathbb {Z}[x_1, \ldots , x_m]\) with small coefficients. Since every k-PTF f can be sign-represented by an integer polynomial with coefficients of size \(2^{\mathop {\mathrm {poly}}(m)}\) [22], this can be done with a table of size \(2^{\mathop {\mathrm {poly}}(m)}\) and in time \(2^{\mathop {\mathrm {poly}}(m)}\). When the coefficients are small (say of bit-complexity \(\le n^{o(1)}\)), then this strategy already yields a #SAT algorithm, as observed by Sakai et al. [31]. Unfortunately, in general, given a restriction \(P'' \in \mathbb {Z}[x_1, \ldots , x_m]\) of a polynomial \(P \in \mathbb {Z}[x_1, \ldots , x_n],\) its coefficients can be much larger (say \(2^{\mathop {\mathrm {poly}}(n)}\)) and it is not clear how to efficiently find a polynomial with small coefficients that sign-represents the same function.

  • It is also known that every k-PTF on m variables can be uniquely identified by \(\mathop {\mathrm {poly}}(m)\) numbers of bit-complexity O(m) each [5]: these are called the “Chow parameters” of f. Again for this representation, it is unclear how to compute efficiently the Chow parameters of the function represented by the restricted polynomial \(P''\). (Even for an LTF, computing the Chow parameters is as hard as Subset-sum [25].)

We show two different ways of circumventing these problems, using two different ideas from the literature.

Using learning theory We use a beautiful recent result of Kane, Lovett and Moran [19], who show that there is a simple decision tree that, when given as input the coefficients of any degree-k polynomial \(P' \in \mathbb {Z}[x_1, \ldots , x_m]\), can determine the sign of the polynomial \(P'\) at all points in \(\{-1,1\}^m\) using only \(\mathop {\mathrm {poly}}(m)\) queries to the coefficients of \(P'\). Here, each query is a linear inequality on the coefficients of \(P'\); such a decision tree is called a linear decision tree.Footnote 10

Our strategy is to replace Step 1 with the construction of this linear decision tree (which can be done in \(\exp (m^{O(1)})\) time). At each leaf of the linear decision tree, we replace the truth table of the input polynomial \(P'\) by a single bit that indicates whether \(f' = \text {sgn}(P')\) is satisfiable or not.

In Step 2, we simply run this decision tree on our restricted polynomial \(P''\) and obtain the answer to the corresponding satisfiability query in \(\mathop {\mathrm {poly}}(m,w(P''))\) time. Note, crucially, that the height of the linear decision tree implied by the construction of [19] is \(\mathop {\mathrm {poly}}(m)\) and independent of the bit-complexity of the coefficients of the polynomial \(P''\) (which may be as big as \(\mathop {\mathrm {poly}}(n)\) in our algorithm). This concludes the description of the algorithm for k-PTF.

Using circuit complexity A famous result of Goldmann, Håstad and Razborov [9] shows that any linear threshold function (with possibly very large weights) can be simulated by a depth-2 threshold circuit with small weights. A simple proof of this was provided by Hofmeister [11]. The basic idea is to use the Chinese Remainder Theorem to reduce checking integer equalities involving very large integers to checking integer equalities with much smaller numbers (by going modulo small primes).

In our setting, this idea allows us to reduce (via a randomized procedure) the problem to be solved in Step 2 to solving satisfiability for k-PTFsFootnote 11 on m variables with small coefficients, as long as M is not too large. Since there are not many such PTFs, we can compute and store the answers to all such queries beforehand. This yields the algorithm.

1.1.2 Satisfiability algorithm for k-PTF circuits

For k-PTF circuits, we follow a template set up by the result of Kabanets and Lu [18] on sparse-PTF circuits. We start by describing this template and then describe what is new in our algorithm.

The Kabanets–Lu algorithm is an induction on the depth d of the circuit (which is a fixed constant). Given as input a depth d k-PTF circuit C on n variables, Kabanets and Lu do the following:

Depth-reduction: In [18], it is shown that on a random restriction that sets all but \(n^{1-2\beta }\) variables (here, think of \(\beta \) as a small constant, say 0.01) to random Boolean values, the bottom layer of C simplifies in the following sense.

All but \(t \le n^\beta \) gates at the bottom layer become exponentially biased, i.e. on all but \(\delta = \exp (-n^{\Omega (1)})\) fraction of inputs they are equal to a fixed \(b \in \{-1,1\}\). Now, for each such biased gate g,  there is a minority value \(b_g \in \{-1,1\}\) that it takes on very few inputs. [18] show how to enumerate this small number of inputs in \(\delta \cdot 2^n\) time and check if there is a satisfying assignment among these inputs. Having ascertained that there is no such assignment, we replace these gates by their majority value and there are only t gates at the bottom layer. At this point, we “guess” the output of these t “unbiased” gates and for each such guess \(\sigma \in \{-1,1\}^t\), we check if there is an assignment that simultaneously satisfies:

  (a) The depth \(d-1\) circuit \(C'\), obtained by setting the unbiased gates to the guess \(\sigma \), is satisfied.

  (b) Each unbiased gate \(g_i\) evaluates to the corresponding value \(\sigma _i\).

Base case: Continuing this way, we eventually get to a base case which is an AND of sparse PTFs for which there is a satisfiability algorithm using the polynomial method.

In the above algorithm, there are two steps where subquadratic sparsity is crucially used. The first is the minority assignment enumeration algorithm for PTFs, which uses ideas of Chen and Santhanam [6] to reduce the problem to enumerating biased LTFs, which is easy [7]. The second is the base case, which uses a non-trivial polynomial approximation for LTFs [30]. Neither of these results holds for even degree-2 PTFs in general. To overcome this, we do the following.

Enumerating minority assignments Given a k-PTF on m variables that is \(\delta =\exp (-n^{\Omega (1)})\)-close to \(b \in \{-1,1\}\), we enumerate its minority assignments as follows. First, we set up a linear decision tree as in the k-PTF satisfiability algorithm. Then we set all but \(q \approx \log \frac{1}{\delta }\) variables of the PTF. On most such settings, the resulting PTF becomes the constant function and we can check this using the linear decision tree we created earlier. In this setting, there is nothing to do. Otherwise, we brute-force over the remaining variables to find the minority assignments. Setting parameters suitably, this yields an \(O(\sqrt{\delta } \cdot 2^m)\) time algorithm to find the minority assignments of a k-PTF on m variables which is \(\delta \)-close to an explicit constant.

Base case Here, we make the additional observation (which [18] do not need) that the AND of PTFs that is obtained further is small in that it only has slightly superlinear size. Hence, we can apply another random restriction in the style of [18] and using the minority assignment enumeration ideas, reduce it to an AND of a small (say \(n^{0.1}\)) number of PTFs on \(n^{0.01}\) (say) variables. At this point, we can again run the linear decision tree (in a slightly more generalized form) to check satisfiability.

2 #SAT for k-PTFs: the first algorithm

2.1 A result of Kane, Lovett, and Moran [19]

In this subsection, we formally present the result from [19] which we use in the memoization step of our #SAT algorithm in the following subsection. We begin with the following couple of definitions.

Definition 10

(Coefficient vectors) Fix any \(k,m\ge 1.\) There are exactly \(r = \sum _{i=0}^k \left( {\begin{array}{c}m\\ i\end{array}}\right) \) many multilinear monomials of degree at most k. Any multilinear polynomial \(P(x_1,\ldots ,x_m)\) of degree k can be identified with a list of the coefficients of its monomials in lexicographic order (say) and hence with some vector \(w\in \mathbb {R}^r\). We call w the coefficient vector of P and use \(\mathrm {coeff}_{m,k}(P)\) to denote this vector. When m, k are clear from context, we will simply use \(\mathrm {coeff}(P)\) instead of \(\mathrm {coeff}_{m,k}(P).\)

Definition 11

(Linear Decision Trees) A Linear Decision Tree for a function \(f:\mathbb {R}^r \rightarrow S\) (for some set S) is a decision tree where each internal node is labelled by a linear inequality, or query, of the form \(\sum _{i=1}^r w_i z_i \ge \theta \) (here \(z_1,\ldots ,z_r\) denote the input variables). Depending on the answer to this linear query, computation proceeds to the left or right child of this node, and this process continues until a leaf is reached, which is labelled with an element of S that is the output of f on the given input.
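A linear decision tree in the sense of Definition 11 can be modelled by the following minimal sketch. The class and function names are hypothetical, and the toy tree is built by hand (it is not produced by any construction from [19]): for \(r=2\) it outputs the signs of \(\langle z,(1,-1)\rangle \) and \(\langle z,(1,1)\rangle \), i.e. the truth table on two points of the LTF with weight vector z.

```python
from dataclasses import dataclass
from typing import Sequence, Union

@dataclass
class Leaf:
    label: object  # an element of the output set S

@dataclass
class Node:
    weights: Sequence[int]  # the w_i of the query sum_i w_i * z_i >= theta
    theta: float
    if_true: "Tree"
    if_false: "Tree"

Tree = Union[Leaf, Node]

def run(tree, z):
    """Follow the queries from the root; return (leaf label, number of queries)."""
    queries = 0
    while isinstance(tree, Node):
        answer = sum(w * zi for w, zi in zip(tree.weights, z)) >= tree.theta
        tree = tree.if_true if answer else tree.if_false
        queries += 1
    return tree.label, queries

def second_query(first_sign):
    # Query z0 + z1 >= 0 and record its sign next to the first answer.
    return Node((1, 1), 0, Leaf((first_sign, 1)), Leaf((first_sign, -1)))

# Root query: z0 - z1 >= 0.
tree = Node((1, -1), 0, second_query(1), second_query(-1))
print(run(tree, (3, -5)))
```

The number of queries made depends only on the depth of the tree, not on the magnitude of the entries of z, which is the property exploited in Step 2.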

The following construction of linear decision trees from [19] will be crucial for us.

Theorem 12

There is a deterministic algorithm, which on input a positive integer r and a subset \(H\subseteq \{-1,1\}^r\), produces a linear decision tree of depth \(\Delta = O(r\log ^2 r\cdot \log |H|)\) that computes a function \(F:\mathbb {R}^r\rightarrow \{-1,1\}^{|H|}\) and has the following properties.

  1. Each linear query has coefficients in \(\{-2,-1,0,1,2\}.\)

  2. Given as input any \(w\in \mathbb {R}^r\) such that \(\langle w, a \rangle \ne 0\) for all \(a\in \{-1,1\}^r\), F(w) is the truth table of the LTF defined by w (with constant term 0) on inputs from \(H\subseteq \{-1,1\}^r\).

Moreover, the algorithm runs in time \(2^{O(\Delta )}.\)

Theorem 1.8 from [19] shows the existence of such a deterministic linear decision tree. However, as noted by an anonymous ITCS 2019 reviewer,Footnote 12 their proof can in fact be slightly modified to yield an algorithm to construct it in the claimed running time. For completeness, we give a proof in “Appendix A”.

We will need a version of Theorem 12 for evaluating (tuples of) k-PTFs. It follows easily from Theorem 12.

Corollary 13

Fix positive constants k and c. Let \(r = \sum _{i=0}^k\left( {\begin{array}{c}m\\ i\end{array}}\right) = \Theta (m^k)\) denote the number of coefficients in a degree-k multilinear polynomial in m variables. There is a deterministic algorithm which on input positive integers m and \(\ell \le m^c\) computes a function \(F:\mathbb {R}^{r\cdot \ell }\rightarrow \mathbb {N}\) as follows: given as input any \(\ell \)-tuple of coefficient vectors \({\overline{w}} = (\mathrm {coeff}_{m,k}(P_1),\ldots ,\mathrm {coeff}_{m,k}(P_\ell ))\in \mathbb {R}^{r\cdot \ell }\) such that \(P_i(a) \ne 0\) for all \(a\in \{-1,1\}^m\), \(F({\overline{w}})\) is the number of common satisfying assignments to all the k-PTFs on \(\{-1,1\}^m\) sign-represented by \(P_1,\ldots ,P_\ell \). Further, the algorithm runs in time \(2^{O(\Delta )}\), where \(\Delta = O(\ell \cdot m^{k+1}\log ^2 m)\).

Proof

For each \(b\in \{-1,1\}^m,\) define \(\mathrm {eval}_b\in \{-1,1\}^r\) to be the vector of all evaluations of multilinear monomials of degree at most k, taken in lexicographic order, on the input b. Define \(H\subseteq \{-1,1\}^r\) to be the set \(\{\mathrm {eval}_b\ |\ b\in \{-1,1\}^m\}.\) Clearly, \(|H| \le 2^m\). Further, note that given any polynomial \(P(x_1,\ldots ,x_m)\) of degree at most k, the truth table of the k-PTF sign-represented by P is given by the evaluation of the LTF represented by \(\mathrm {coeff}(P)\) at the points in H. Our aim, therefore, is to evaluate the LTFs corresponding to \(\mathrm {coeff}(P_1),\ldots ,\mathrm {coeff}(P_\ell )\) at all the points in H.

For each i, we use the deterministic algorithm from Theorem 12 to produce a decision tree \(\mathcal {T}_i\) that evaluates the Boolean function \(f_i:\{-1,1\}^m\rightarrow \{-1,1\}\) sign-represented by \(P_i\) (or equivalently, evaluates the LTF corresponding to \(\mathrm {coeff}(P_i)\) at all points in H). Note that \(\mathcal {T}_i\) has depth \(O(m^k\log ^2 m\cdot \log (2^m)) = O(m^{k+1}\log ^2 m)\). The final tree \(\mathcal {T}\) is obtained by simply running \(\mathcal {T}_1,\ldots ,\mathcal {T}_\ell \) in order, which is of depth \(O(\ell \cdot m^{k+1}\log ^2 m).\) Observe that the tree \(\mathcal {T}\) outputs the number of common satisfying assignments to all the \(f_i\).

The claim about the running time follows from the analogous claim in Theorem 12 and the fact that the number of common satisfying assignments to all the \(f_i\) can be computed from the truth tables in \(2^{O(m)}\) time. This completes the proof. \(\square \)
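The linearization used in this proof, namely that a k-PTF on m variables is an LTF in the coordinates \(\mathrm {eval}_b\), can be checked on a small instance with the sketch below. The dictionary representation and all function names are illustrative assumptions, and monomials are ordered by degree and then lexicographically, which is one admissible fixed ordering.

```python
from itertools import combinations, product
from math import prod

def monomials(m, k):
    """All multilinear monomials of degree at most k, in one fixed order."""
    return [mono for d in range(k + 1) for mono in combinations(range(m), d)]

def coeff(poly, m, k):
    """coeff_{m,k}(P): coefficients listed in the order given by monomials()."""
    return [poly.get(mono, 0) for mono in monomials(m, k)]

def eval_vector(b, k):
    """eval_b: the values of all monomials of degree at most k at the point b."""
    return [prod(b[i] for i in mono) for mono in monomials(len(b), k)]

def truth_table(poly, m, k):
    """Signs of <coeff(P), eval_b> over all b, i.e. the LTF restricted to H."""
    w = coeff(poly, m, k)
    table = []
    for b in product((-1, 1), repeat=m):
        inner = sum(wi * ei for wi, ei in zip(w, eval_vector(b, k)))
        table.append(-1 if inner < 0 else 1)
    return table

# Hypothetical example: P = 2*(x0*x1 + x1*x2 + x0*x2) + 1 with m = 3, k = 2.
P = {(0, 1): 2, (1, 2): 2, (0, 2): 2, (): 1}
print(len(monomials(3, 2)), truth_table(P, 3, 2).count(-1))
```

For \(m=3\) and \(k=2\) there are \(r = 7\) monomials, and the inner products recover the 6 satisfying assignments of the PTF.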

2.2 The #SAT algorithm

We are now ready to prove Theorem 4. We first state the algorithm, which follows a standard memoization idea (see, e.g. [29]). We assume that the input is a polynomial \(P\in \mathbb {Z}[x_1,\ldots ,x_n]\) of degree at most k that sign-represents a Boolean function f on n variables. The parameters of the instance are assumed to be \((n, M)\) (recall from Definition 2 that \(M = w(P)\) is the bit-complexity of the sum of the absolute values of all the coefficients of P). Set \(m = (n/\log ^3 n)^{1/(k+1)}\).

Algorithm \(\mathbf {\mathcal {A}}\)

  1. Use the algorithm from Corollary 13 with \(\ell =1\) to construct a deterministic linear decision tree T such that on any input polynomial \(Q(x_1,\ldots ,x_m)\) (or more precisely \(\mathrm {coeff}_{m,k}(Q)\)) of degree at most k that sign-represents a k-PTF g on m variables, T computes the number of satisfying assignments to g.

  2. Set \(N=0\) (N will ultimately be the number of satisfying assignments to f).

  3. For each setting \(\sigma \in \{-1,1\}^{n-m}\) to the variables \(x_{m+1},\ldots ,x_n\), do the following:

    (a) Compute the polynomial \(P_\sigma \) obtained by substituting the variables \(x_{m+1},\ldots ,x_n\) accordingly in P.

    (b) Run T on \(\mathrm {coeff}(P_\sigma )\) and compute its output \(N_\sigma \), the number of satisfying assignments to \(P_\sigma .\) Add this to the current value of N.

  4. Output N.

Correctness It is clear from Corollary 13 (invoked for \(\ell =1\)) and step 3b that algorithm \(\mathcal {A}\) outputs the correct number of satisfying assignments to f.

Running time We show that the running time of algorithm \(\mathcal {A}\) is \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n-m}\). First note that by Corollary 13, the construction of a linear decision tree T takes \(2^{O(\Gamma )}\) time, where \(\Gamma = m^{k+1}\log ^2 m\), and hence, step 1 takes \(2^{O(\Gamma )}\) time. Next, for a setting \(\sigma \in \{-1,1\}^{n-m}\) to the variables \(x_{m+1},\ldots ,x_n\), computing \(P_\sigma \) and constructing the vector \(\mathrm {coeff}(P_\sigma )\) takes only \(\mathop {\mathrm {poly}}(n,M)\) time. Recall that the depth of T is \(O(\Gamma )\) and thus, on input vector \(\mathrm {coeff}(P_\sigma )\), each of whose entries has bit complexity at most M, it takes time \(O(\Gamma )\cdot \mathop {\mathrm {poly}}(M,n)\) to run T and obtain the output \(N_\sigma \). Therefore, step 3 takes \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n-m}\) time. Finally, the claim about the total running time of algorithm \(\mathcal {A}\) follows at once when we observe that for the setting \(m = (n/\log ^3 n)^{1/(k+1)}\), \(\Gamma = O(n/\log n) = o(n)\).
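The final parameter claim can be sanity-checked numerically. The sketch below assumes base-2 logarithms and the setting \(m = (n/\log ^3 n)^{1/(k+1)}\) from the analysis above; the function name is hypothetical. Since \(m^{k+1} = n/\log ^3 n\) and \(\log m \le \log n/(k+1)\), the ratio \(\Gamma /n\) is at most \(1/((k+1)^2\log n)\).

```python
from math import log2

def parameters(n, k):
    """Return (m, Gamma) for m = (n / log^3 n)^(1/(k+1)), Gamma = m^(k+1) * log^2 m."""
    m = (n / log2(n) ** 3) ** (1 / (k + 1))
    gamma = m ** (k + 1) * log2(m) ** 2
    return m, gamma

n = 2 ** 20
for k in (1, 2, 3):
    m, gamma = parameters(n, k)
    # Gamma / n should be bounded by 1 / ((k+1)^2 * log n), hence o(1).
    print(k, round(m, 2), gamma / n, 1 / ((k + 1) ** 2 * log2(n)))
```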

3 #SAT for k-PTFs: the second algorithm

Here, we present an alternate approach to #SAT for k-PTFs. This approach uses memoization as before, but the idea now will be to first reduce the size of the coefficients, by going modulo small primes. A major hurdle with this is that PTFs use inequalities, which do not gel well with this operation. Hence, we will first transform our PTFs into a similar model which uses equalities, namely Exact Polynomial Threshold Functions.

Definition 14

(Exact Polynomial Threshold Functions [12, 13]) A Boolean function \(E: \{-1, 1\}^n \rightarrow \{-1, 1\}\) is called an Exact Polynomial Threshold Function of degree \(k\), or a \(k\)-EPTF, if there exists a multilinear polynomial \(P \in \mathbb {R}[x_1,\ldots ,x_n]\) of degree \(k\) such that for all \(a \in \{-1, 1\}^n\), \(E(a) = -1\) if and only if \(P(a) = 0\). We refer to such a \(P\) as a representation of \(E\). When \(k=1\), we call \(E\) an Exact Threshold Function, or ETF for short.

The main idea in this algorithm is to first convert the given PTF to a disjoint OR of EPTFs. To do this, we follow Hofmeister [11], who showed how to do this for degree one. His proof is constructive and can easily be adapted to higher degrees.

Lemma 15

(Implicit in [11]) Let \(f: \{-1, 1\}^n \rightarrow \{-1, 1\}\) be a \(k\)-PTF sign-represented by a polynomial \(P\in \mathbb {Z}[x_1,\ldots ,x_n]\) with parameters \((n, M)\). Then it can be written as,

$$\begin{aligned} f = \bigvee _{i=1}^{h} E_i \end{aligned}$$

where \(h=O(Mn^{2k})\) and each \(E_i\) is a \(k\)-EPTF that can be represented by a degree k polynomial with weight \(O(M + k\log n)\).

Moreover, the OR is a disjoint OR i.e. at most one of the \(E_i\)s can evaluate to TRUE for a given input.

Finally, this transformation is constructive in the following sense. There is a deterministic algorithm running in \(\mathop {\mathrm {poly}}(n,M)\) time that, on input an integer polynomial \(P\in \mathbb {Z}[x_1,\ldots ,x_n]\) with parameters \((n, M)\) representing f, produces polynomials \(P_1,\ldots ,P_h\) of weight \(O(M+k \log n)\) where \(P_i\) represents \(E_i\) for each \(i \in [h]\).

We include a proof of this lemma in “Appendix B” for completeness.
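To illustrate what a disjoint OR of EPTFs looks like, the sketch below uses a deliberately naive decomposition, which is not the construction of Lemma 15: since \(|P(a)|\) is at most the sum W of the absolute values of the coefficients, \(P(a)<0\) holds exactly when \(P(a)+j=0\) for some \(j\in \{1,\ldots ,W\}\). This uses up to \(2^{M}\) terms instead of \(O(Mn^{2k})\); the representation and names are illustrative assumptions.

```python
from itertools import product
from math import prod

def evaluate(poly, a):
    """Evaluate a multilinear polynomial at a point a in {-1,1}^n."""
    return sum(c * prod(a[i] for i in mono) for mono, c in poly.items())

def naive_eptf_decomposition(poly):
    """Return polynomials P + j for j = 1..W; each one represents an EPTF."""
    bound = sum(abs(c) for c in poly.values())  # |P(a)| <= bound on the cube
    parts = []
    for j in range(1, bound + 1):
        shifted = dict(poly)
        shifted[()] = shifted.get((), 0) + j
        parts.append(shifted)
    return parts

# Hypothetical example: P = 2*(x0*x1 + x1*x2 + x0*x2) + 1 on 3 variables.
P = {(0, 1): 2, (1, 2): 2, (0, 2): 2, (): 1}
parts = naive_eptf_decomposition(P)
for a in product((-1, 1), repeat=3):
    hits = sum(1 for Q in parts if evaluate(Q, a) == 0)
    # Disjointness: exactly one EPTF fires when P(a) < 0, none otherwise.
    assert hits == (1 if evaluate(P, a) < 0 else 0)
```

Disjointness is what allows the counts of the individual EPTFs to be added up to obtain #SAT(f).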

One of the bottlenecks to a brute force approach to satisfiability directly using this lemma is the size of the coefficients. For this reason, instead of evaluating an EPTF as is, we evaluate it modulo many small primes. We first define this modular version of EPTFs.

Definition 16

(modular EPTFs) Let \(P\in \mathbb {Z}[x_1,\ldots ,x_n]\) be any integer polynomial of degree at most k. For a prime p, we define the Boolean function \(E_{P}^p:\{-1,1\}^n\rightarrow \{-1,1\}\) such that for all a, \(E^p_P(a) = -1\) if and only if \(P(a) \equiv 0\mod p\).

We call such a Boolean function \(E^p_P\) a p-modular k-EPTF.
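The point of Definition 16 is that \(E^p_P\) depends only on the coefficients of P modulo p. The following sketch checks this on a small example; the dictionary representation, the function names and the chosen polynomial and prime are illustrative assumptions.

```python
from itertools import product
from math import prod

def evaluate(poly, a):
    """Evaluate a multilinear polynomial at a point a in {-1,1}^n."""
    return sum(c * prod(a[i] for i in mono) for mono, c in poly.items())

def reduce_mod(poly, p):
    """Replace every coefficient by its residue in {0, ..., p-1}."""
    return {mono: c % p for mono, c in poly.items()}

def modular_eptf(poly, p, a):
    """E^p_P(a): -1 if P(a) is divisible by p, and 1 otherwise."""
    return -1 if evaluate(poly, a) % p == 0 else 1

# Hypothetical polynomial with large coefficients and a small prime.
P = {(0, 1): 1000003, (2,): -999999, (): 4}
p = 7
small = reduce_mod(P, p)
for a in product((-1, 1), repeat=3):
    # Reducing the coefficients does not change the modular EPTF.
    assert modular_eptf(P, p, a) == modular_eptf(small, p, a)
print(small)
```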

Evaluating a polynomial modulo small enough primes will reduce the size of the coefficients but it will also introduce errors. However, if we evaluate modulo a random prime among sufficiently many primes, the error probability can be shown to be small. The underlying principle is the well-known Chinese Remainder Theorem, which we state below.

Theorem 17

(Chinese Remainder Theorem) Let \(\{p_1, \ldots , p_\ell \}\) be a set of distinct primes, and \(a \in \mathbb {Z}\). Then the following are equivalent:

  • for all \(1\le i \le \ell \), \(a \equiv 0 \mod p_i\).

  • \(a \equiv 0 \mod \prod _{i=1}^{\ell } p_i\).

In particular if \(a \ne 0\) and \(a \equiv 0 \pmod {p_i}\) for each \(i\in [\ell ]\), then \(\ell \le \log _2 |a|\).
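The last consequence of Theorem 17 is easy to check numerically. The following sketch (the function name is an assumption, and trial division is used only because the numbers are tiny) lists the distinct prime divisors of a non-zero integer and compares their number with \(\log _2 |a|\).

```python
import math

def distinct_prime_divisors(a):
    """Distinct prime divisors of a non-zero integer a, found by trial division."""
    a = abs(a)
    primes = []
    d = 2
    while d * d <= a:
        if a % d == 0:
            primes.append(d)
            while a % d == 0:
                a //= d
        d += 1
    if a > 1:
        primes.append(a)
    return primes

divisors = distinct_prime_divisors(30030)  # 30030 = 2 * 3 * 5 * 7 * 11 * 13
bound = math.log2(30030)
```

Since every prime is at least 2, a product of \(\ell \) distinct primes dividing a is at least \(2^\ell \), which is exactly the bound being checked.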

Now we are ready to describe the algorithm in detail. The input, as earlier, will be a polynomial \(P \in \mathbb {Z}[x_1, \ldots , x_n]\) of degree at most \(k\) that sign-represents a Boolean function \(f\) on \(n\) variables. The parameters of the instance are taken to be \((n, M)\). We assume that all monomials of the same degree are ordered among themselves using a predetermined ordering. One such ordering is the lexicographic ordering.

Algorithm \(\mathcal {MOD}p:\)

Set \(m = \delta n^{1/k}\) and \(A = \beta 2^m\) for a small enough constant \(\delta \) and a large enough constant \(\beta \).

  1. 1.

    Using Lemma 15, decompose f as a disjoint OR of k-EPTFs

    $$\begin{aligned} f(X) = \bigvee _{i=1}^{h} E_i(X). \end{aligned}$$

    Here \(h = O(Mn^{2k}) = \mathop {\mathrm {poly}}(n,M)\) and each \(E_i\) is a \(k\)-EPTF represented by a degree-k polynomial \(P_i\) with weight at most \(M' = O(M+k\log n)\).

  2. 2.

    We now describe the memoization step. For each prime \(p\in [1,M'A\log (M'A)],\) we do the following. For each \(i \in [h]\), consider all degree-k integer polynomials in \(x_1,\ldots ,x_m\) such that the following holds.

  • The coefficients of the monomials with degree exactly \(k\) are chosen by reducing the corresponding coefficients of the polynomial \(P_i\) modulo p (to obtain a non-negative integer less than p).

  • The coefficients of the monomials of degree less than k are allowed to be any non-negative integers less than p. Let \(\mathcal {P}_{i,p}\) be the set of all such polynomials.

Each polynomial \(Q\in \mathcal {P}_{i,p}\) defines a p-modular k-EPTF \(E_Q^p\) of m variables. For each such Q, use brute force over all m input variables and count the number of satisfying assignments of \(E_Q^p\). Store the results in a table.

  1. 3.

    Set \(N = 0\). (\(N\) will ultimately be the number of satisfying assignments to \(f\)).

  2. 4.

    For each \(\sigma : \{x_{m+1}, \ldots , x_n\} \rightarrow \{-1, 1\}\), for each \(i\in [h]\), do the following. For each \(j\in [2n]\), do the following.

    1. (a)

      Choose a uniformly random prime \(p_j \in [1, M'A\log (M'A)]\).

    2. (b)

      Compute \(P_{i, \sigma }\), the restriction of \(P_i\) given by the partial assignment \(\sigma \). (Note that \(P_{i,\sigma }\) is a polynomial in \(x_1,\ldots ,x_m\). Further, since \(P_i\) has degree at most k, the coefficient of any monomial of degree exactly k in \(P_{i,\sigma }\) is the same as it is in \(P_i\).)

    3. (c)

      Let Q be the polynomial in \(\mathcal {P}_{i,p_j}\) obtained by reducing the coefficients of \(P_{i,\sigma }\) modulo \(p_j\). Look up the number of satisfying assignments of \(E_{Q}^{p_j}\) in the table constructed in Step 2. Let this be \(N_{i,\sigma ,j}\).

    Arrange \(N_{i,\sigma ,1},\ldots ,N_{i,\sigma ,2n}\) in increasing order and let \(N_{i,\sigma }\) be the smallest value. Add \(N_{i,\sigma }\) to N.

  3. 5.

    Output \(N\).
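The steps above can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the algorithm as analysed: Step 1 is skipped (the EPTF polynomials are passed in directly), the table of Step 2 is filled lazily on first lookup instead of being precomputed for every possible low-degree coefficient vector, and the prime bound and number of trials are free parameters rather than \(M'A\log (M'A)\) and \(2n\). All function and parameter names are hypothetical.

```python
import random
from itertools import product

def evaluate(poly, a):
    """Evaluate {monomial (tuple of variable indices): coefficient} at the point a."""
    total = 0
    for mono, coeff in poly.items():
        term = coeff
        for v in mono:
            term *= a[v]
        total += term
    return total

def restrict(poly, sigma):
    """Substitute the partial assignment sigma (dict: variable -> +1/-1) into poly."""
    out = {}
    for mono, coeff in poly.items():
        rest = tuple(v for v in mono if v not in sigma)
        for v in mono:
            if v in sigma:
                coeff *= sigma[v]
        out[rest] = out.get(rest, 0) + coeff
    return out

def primes_up_to(bound):
    """Sieve of Eratosthenes."""
    sieve = [True] * (bound + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(bound ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, bound + 1, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def modp_count(eptfs, n, m, prime_bound, trials, rng):
    """Never under-counts sum_i #{a : P_i(a) = 0}; exact unless all trials hit bad primes."""
    primes = primes_up_to(prime_bound)
    table = {}  # (p, reduced polynomial) -> modular count over {-1,1}^m
    total = 0
    for sigma_vals in product((-1, 1), repeat=n - m):
        sigma = {m + t: s for t, s in enumerate(sigma_vals)}
        for P in eptfs:
            P_sigma = restrict(P, sigma)
            best = None
            for _ in range(trials):
                p = rng.choice(primes)
                reduced = frozenset((mono, c % p) for mono, c in P_sigma.items() if c % p)
                key = (p, reduced)
                if key not in table:
                    Q = dict(reduced)
                    table[key] = sum(1 for a in product((-1, 1), repeat=m)
                                     if evaluate(Q, a) % p == 0)
                best = table[key] if best is None else min(best, table[key])
            total += best
    return total

# Two illustrative EPTF polynomials on n = 4 variables, with m = 2 memoized variables.
P1 = {(0,): 1, (1,): 1, (2,): 1, (3,): 1}
P2 = {(0, 1): 1, (2,): 2, (3,): -1, (): -2}
exact = sum(1 for P in (P1, P2) for a in product((-1, 1), repeat=4) if evaluate(P, a) == 0)
estimate = modp_count([P1, P2], n=4, m=2, prime_bound=100, trials=40, rng=random.Random(0))
```

Taking the minimum over the trials is what makes the estimate one-sided: every modular count is an upper bound on the true count, so a single good prime suffices for that pair \((i,\sigma )\).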

We now prove Theorem 6 from the introduction. Note that Theorem 6 is trivial when \(M= 2^{\Omega (n^{1/k})}\) since #SAT for k-PTFs can be solved in time \(\mathop {\mathrm {poly}}(n,M)2^n\) by a trivial brute-force algorithm. Hence, from now on, we assume that \(M\le 2^{\varepsilon n^{1/k}}\) for a suitably small constant \(\varepsilon \). The following statement now almost implies Theorem 6, except for the zero-error criterion.

Theorem 18

The following holds for large enough constant \(\beta \) and small enough constants \(\varepsilon ,\delta \). \(\mathcal {MOD}p\) is a randomized algorithm, which on input a polynomial \(P\) of degree \(k\) with parameters \((n, M)\) with \(M \le 2^{\varepsilon n^{1/k}}\), outputs the number of satisfying assignments for \(f = \mathrm {sgn}(P)\) with probability \(1-o(1)\). The algorithm runs in time at most \(2^{n - {\Omega }(n^{1/k})}\).

Proof

Correctness Recall that f is decomposed as a disjoint OR of k-EPTFs \(E_1,\ldots ,E_h\) represented by polynomials \(P_1,\ldots ,P_h\) in Step 1. For any assignment \(\sigma \) considered in Step 4, the corresponding relation still holds between the restrictions \(f_\sigma \) and \(E_{1,\sigma },\ldots ,E_{h,\sigma }\), i.e. \(f_\sigma \) is the disjoint OR of the \(E_{i,\sigma }\). It suffices to show that in Step 4, for each i and \(\sigma \), \(N_{i,\sigma }\) equals the number of satisfying assignments of \(E_{i,\sigma }\) with high probability.

To this end, we claim that for any restriction \(\sigma \) on the last \(n-m\) variables and any \(i\in [h]\) the following holds.

$$\begin{aligned} \mathop {\mathrm {Pr}}_{}[N_{i,\sigma } = \text { number of satisfying assignments of }E_{i,\sigma } ] \ge 1-\frac{1}{4^n} \end{aligned}$$
(1)

To show this we proceed as follows. Note that for each \(j\in [2n]\), \(N_{i,\sigma ,j}\) is equal to the number of \(a\in \{-1,1\}^m\) such that \(P_{i,\sigma }(a) \equiv 0 \pmod {p_j}\). We call \(p_j\) a bad prime for a if \(P_{i,\sigma }(a)\) is non-zero but \(P_{i,\sigma }(a)\equiv 0 \pmod {p_j}\), i.e. a is not a satisfying assignment of \(E_{i,\sigma }\), but under the modulo operation, it gets counted as a satisfying assignment. Further, we say that \(p_j\) is a bad prime if it is bad for some \(a\in \{-1,1\}^m\), i.e. modulo \(p_j\) some non-satisfying assignment gets counted as a satisfying assignment. Note that \(N_{i,\sigma ,j}\) is always at least the number of satisfying assignments of \(E_{i,\sigma }\), with equality occurring if \(p_j\) is not a bad prime.

We now bound the probability that \(p_j\) is a bad prime. Fix any \(a\in \{-1,1\}^m\). As \(P_i\) has weight at most \(M'\), we have \(|P_{i,\sigma }(a) |\le 2^{M'}\). Using Theorem 17, the number of primes that are bad for a is bounded by \(M'\). Hence the total number of bad primes is at most \(M'2^m\). On the other hand, the total number of primes in the range \([1, M'A\log (M'A)]\) is at least \(\Omega (M'A)\) by the Prime Number Theorem. For a large enough constant \(\beta \), this is at least \(4M'2^m\). Thus, the probability that \(p_j\) is a bad prime is at most 1/4.

Since \(N_{i,\sigma }\) is the smallest of all the \(N_{i,\sigma ,j}\) for \(j \in [2n]\), we see that \(N_{i,\sigma }\) is equal to the number of satisfying assignments of \(E_{i,\sigma }\) unless every \(p_j\) (\(j\in [2n]\)) is a bad prime. Since the \(2n\) primes are chosen independently, the probability of this is at most \((1/4)^{2n} \le (1/4)^n\). This proves (1).

By a union bound, the probability that there exist \(\sigma \) and i such that \(N_{i,\sigma }\) is not equal to the number of satisfying assignments of \(E_{i,\sigma }\) is at most \(h2^{n-m}/4^n = O(Mn^{2k}2^{n-m}/4^n) = o(1)\), where we have used the upper bound on M from the statement of the theorem. Hence with probability \(1-o(1)\), the algorithm correctly computes the number of satisfying assignments of \(E_{i,\sigma }\) for each \(\sigma , i\). In this case the algorithm correctly returns the number of satisfying assignments of f.
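For reference, the arithmetic behind the two bounds above can be spelled out as follows. This is a sketch that assumes the \(2n\) primes for a fixed pair \((i,\sigma )\) are drawn independently, and uses \(h = O(Mn^{2k})\) from Lemma 15 together with \(M\le 2^{\varepsilon n^{1/k}}\).

$$\begin{aligned} \mathop {\mathrm {Pr}}[\text {every } p_j \text { is bad}] \le \left( \frac{1}{4}\right) ^{2n} = 16^{-n} \le 4^{-n}, \qquad \frac{h\cdot 2^{n-m}}{4^n} = h\cdot 2^{-(n+m)} \le O(Mn^{2k})\cdot 2^{-n} \le 2^{\varepsilon n^{1/k} + O(k\log n) - n} = o(1). \end{aligned}$$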

Running time The running time of the algorithm is dominated by the running times of Step 2 and Step 4.

In Step 2, the number of primes in the specified range is \(O(M'2^m)\). For a given prime p and \(i\in [h]\), the set \(\mathcal {P}_{i,p}\) has size at most \((c_1M'2^m)^{m^{k-1}}\) for some positive constant \(c_1\). Hence the number of modular \(k\)-EPTFs that are considered is at most \(O(hM'2^m) \cdot (c_1 M'2^m)^{m^{k-1}} = 2^{O(m^k+m^{k-1}\log M')}\). For each such EPTF, it takes time \(2^{m} \cdot \mathop {\mathrm {poly}}(n,M') = 2^m \cdot \mathop {\mathrm {poly}}(n,M)\) to brute force over all possible assignments. Hence the total time needed to execute this step is \(2^{O(m^k+m^{k-1}\log M')}\cdot \mathop {\mathrm {poly}}(n,M) \le 2^{n/2}\) for our choice of parameters and suitably small \(\varepsilon ,\delta \).

For Step 4, the total running time is \(2^{n - m} \mathop {\mathrm {poly}}(n,M)\). Hence, for the given choice of parameters, and using \(M \le 2^{\varepsilon n^{1/k}}\) for suitably small \(\varepsilon \), the final running time is \(2^{n - {\Omega }(n^{1/k})}\). \(\square \)

Making the algorithm zero-error The above almost implies Theorem 6 with the only exception being that the algorithm is not zero-error. However, there is a simple and elegant fix for this, as was pointed out to us by an anonymous reviewer.

Note that the above algorithm always returns an estimate that is at least the number of satisfying assignments of f, as the only source of error is when a non-satisfying assignment is incorrectly counted as a satisfying assignment at the time of computing \(N_{i,\sigma }\) for some \(i\in [h] ,\sigma \in \{-1,1\}^{n-m}\).

So, to get the zero-error algorithm, we proceed as follows. We run the above algorithm on polynomials P and \(-P\), which represent Boolean functions f and the negation of f respectively. The algorithm returns estimates \(N_1\) and \(N_2\), where \(N_1\ge N_f\) and \(N_2 \ge N_{\lnot f}\) and \(N_g\) denotes the number of satisfying assignments of g. Note that both inequalities are equalities precisely when \(N_1 +N_2 = 2^n\), and this happens with probability \(1-o(1)\). Hence, the algorithm simply checks that \(N_1 + N_2 = 2^n\) and returns \(N_1\) in this case. Otherwise, the algorithm returns ?.
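The check described above can be sketched as a small wrapper around any one-sided estimator. In the sketch below, `estimate_sat` is a hypothetical stand-in for the randomized algorithm (any procedure that never under-counts), `None` plays the role of the output ?, and the polynomial is assumed to be non-zero on every input so that \(-P\) sign-represents the negation of f.

```python
from itertools import product

def evaluate(poly, a):
    """Evaluate {monomial (tuple of variable indices): coefficient} at a in {-1,1}^n."""
    total = 0
    for mono, coeff in poly.items():
        term = coeff
        for v in mono:
            term *= a[v]
        total += term
    return total

def exact_count(poly, n):
    """Brute-force #SAT of sgn(poly): the number of a with poly(a) < 0."""
    return sum(1 for a in product((-1, 1), repeat=n) if evaluate(poly, a) < 0)

def zero_error_count(estimate_sat, poly, n):
    """Run a one-sided estimator on P and -P; accept only if the totals add up to 2^n."""
    negated = {mono: -coeff for mono, coeff in poly.items()}
    n1 = estimate_sat(poly, n)
    n2 = estimate_sat(negated, n)
    return n1 if n1 + n2 == 2 ** n else None

# Illustrative polynomial P = 3*x0 + 2*x1 - x2 + 1, which is never zero on the cube.
P = {(0,): 3, (1,): 2, (2,): -1, (): 1}
good = zero_error_count(exact_count, P, 3)
# A deliberately faulty estimator that over-counts by one on every call.
bad = zero_error_count(lambda Q, n: exact_count(Q, n) + 1, P, 3)
```

Because both estimates can only err upwards, their sum equals \(2^n\) only when neither has erred, so an accepted answer is always correct.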

4 Constant-depth circuits with PTF gates

In this section we give an algorithm for counting the number of satisfying assignments for a k-PTF circuit of constant depth and slightly superlinear size. We begin with some definitions.

Definition 19

Let \(\delta \le 1\) be any parameter. Two Boolean functions \(f, g\) are said to be \(\delta \)-close if \(\mathop {\mathrm {Pr}}_{x}[f(x) \ne g(x)] \le \delta \).

A k-PTF f specified by a polynomial P is said to be \(\delta \)-close to an explicit constant if it is \(\delta \)-close to a constant and such a constant can be computed efficiently, i.e. in time \(\mathop {\mathrm {poly}}(n,M)\), where n is the number of variables in P and w(P) is at most M.

Definition 20

For a Boolean function \(f:\{-1,1\}^n \rightarrow \{-1,1\}\), the majority value of f is the bit value \(b \in \{-1,1\}\) which maximizes \(\mathop {\mathrm {Pr}}_{x}[f(x)=b]\) and the bit value \(-b\) is said to be its minority value.

For a Boolean function f with majority value b, an assignment \(x \in \{-1,1\}^n\) is said to be a majority assignment if \(f(x)=b\) and a minority assignment otherwise.
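Definitions 19 and 20 can be illustrated by brute force on a tiny example. The sketch below is only meant to fix the conventions; the function names and the example PTF are assumptions, and ties in the majority value are broken towards 1 arbitrarily.

```python
from fractions import Fraction
from itertools import product

def distance(f, g, n):
    """Pr over uniform x in {-1,1}^n that f(x) differs from g(x)."""
    differing = sum(1 for x in product((-1, 1), repeat=n) if f(x) != g(x))
    return Fraction(differing, 2 ** n)

def majority_value(f, n):
    """The more frequent output of f (ties go to 1, an arbitrary choice)."""
    minus_ones = sum(1 for x in product((-1, 1), repeat=n) if f(x) == -1)
    return -1 if 2 * minus_ones > 2 ** n else 1

def f(x):
    # sgn(2*(x0 + x1 + x2 + x3) + 7): equal to -1 only when every input bit is -1.
    return -1 if 2 * sum(x) + 7 < 0 else 1

def const_one(x):
    return 1

n = 4
delta = distance(f, const_one, n)
b = majority_value(f, n)
minority_assignments = [x for x in product((-1, 1), repeat=n) if f(x) != b]
```

In this example f is \(\delta \)-close to the constant 1 for \(\delta = 1/16\), and its single minority assignment is the all \(-1\) input.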

Definition 21

Given a k-PTF f on n variables specified by a polynomial P, a parameter \(m \le n\) and a partial assignment \(\sigma \in \{-1,1\}^{n-m}\) on \(n-m\) variables, let \(P_\sigma \) be the polynomial obtained by substituting the variables in P according to \(\sigma \). If P has parameters \((n,M)\) then \(P_\sigma \) has parameters \((m,M)\). For a k-PTF circuit C, \(C_\sigma \) is defined similarly. If C has parameters \((n,s,d,M)\) then \(C_\sigma \) has parameters \((m,s,d,M)\).
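A restriction as in Definition 21 is straightforward to compute on a sparse representation of the polynomial. The following sketch assumes a dictionary from monomials (tuples of variable indices) to integer coefficients, and uses the sum of absolute values of the coefficients as a simple proxy for the weight (an assumption made here for illustration; the weight w(P) itself is defined earlier in the paper).

```python
def restrict(poly, sigma):
    """Substitute the partial assignment sigma (dict: variable -> +1/-1) into poly."""
    out = {}
    for mono, coeff in poly.items():
        rest = tuple(v for v in mono if v not in sigma)
        for v in mono:
            if v in sigma:
                coeff *= sigma[v]
        # Monomials that collapse to the same residual monomial are merged.
        out[rest] = out.get(rest, 0) + coeff
    return out

def abs_coefficient_sum(poly):
    return sum(abs(c) for c in poly.values())

# Illustrative P = 2*x0*x1 + 3*x1*x2 - x0*x2 + x2 - 1, restricted by x2 = -1.
P = {(0, 1): 2, (1, 2): 3, (0, 2): -1, (2,): 1, (): -1}
P_sigma = restrict(P, {2: -1})
```

Merging monomials can only cancel coefficients, so the absolute coefficient sum never grows under restriction, which matches the statement that the parameter M is preserved. The monomial \(x_0x_1\) of full degree keeps its coefficient, as noted in Step 4(b) of the earlier algorithm.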

Outline of the #SAT procedure For designing a #SAT algorithm for k-PTF circuits, we use the generic framework developed by Kabanets and Lu [18] with some crucial modifications.

Given a k-PTF circuit C on n variables of depth d we want to count the number of satisfying assignments \(a \in \{-1,1\}^n\) such that \(C(a)=-1\). We in fact solve a slightly more general problem. Given \((C,{\mathcal {P}})\), where C is a small k-PTF circuit of depth d and \({\mathcal {P}}\) is a set of k-PTF functions, such that \(\sum _{f \in {\mathcal {P}}} \text {fan-in}(f)\) is small, we count the number of assignments that simultaneously satisfy C and all the functions in \({\mathcal {P}}\).

At the core of the algorithm that solves this problem, Algorithm \({\mathcal {B}}\), is a recursive procedure \({\mathcal {A}}_5\), which works as follows: on inputs \((C,{\mathcal {P}})\) it first applies a simplification step that outputs \(\ll 2^n\) instances of the form \((C', {\mathcal {P}}')\) such that

  • Both \(C'\) and functions in \({\mathcal {P}}'\) are on \(m \ll n\) variables.

  • The sets of satisfying assignments of these instances “almost” partition the set of satisfying assignments of \((C,{\mathcal {P}})\).

  • With all but very small probability the bottom layer of \(C'\) has the following nice structure.

    • At most n gates are \(\delta \)-close to an explicit constant. We denote this set of gates by B (as we will simplify them by setting them to the constant they are close to).

    • At most \(n^{\beta _d}\) gates are not \(\delta \)-close to an explicit constant. We denote these gates by G (as we will simplify them by “guessing” their values).

  • There is a small set of satisfying assignments that are not covered by the satisfying assignments of \((C',\mathcal {P}')\) but we can count these assignments with a brute-force algorithm that does not take too much time.

For each \(C'\) with this nice structure, we then try to use this structure to create \(C''\) which has depth \(d-1\). Suppose we reduce the depth as follows:

  • Set all the gates in B to the values that they are biased towards.

  • Try all the settings of the values that the gates in G can take, thereby from \(C'\) creating possibly \(2^{n^{\beta _d}}\) instances \((C'',{\mathcal {P}}')\).

\((C'',{\mathcal {P}}')\) now is an instance where \(C''\) has depth \(d-1\). Unfortunately, by simply setting biased gates to the values they are biased towards, we may miss out on the minority assignments to these gates which could eventually satisfy \(C'\). We design a subroutine \({\mathcal {A}}_3\) to precisely handle this issue, i.e. to keep track of the number of minority assignments, say \(N_{C'}\). This part of our algorithm is completely different from that of [18], which only works for subquadratically sparse PTFs.

Once \({\mathcal {A}}_3\) has computed \(N_{C'}\), i.e. the number of satisfying assignments among the minority assignments, we now need to only count the number of satisfying assignments among the rest of the assignments.

To achieve this we use an idea similar to that in [7, 18], which involves appending \({\mathcal {P}}'\) with a few more k-PTFs (this forces the biased gates to their majority values). This gives, say, a set \(\tilde{{\mathcal {P}}}'\). Similarly, while setting gates in G to their guessed values, we again use the same idea to ensure that we are counting satisfying assignments consistent with the guessed values, once again updating \(\tilde{{\mathcal {P}}}'\) to a new set \({\mathcal {P}}''\). This creates instances of the form \((C'',{\mathcal {P}}'')\), where the depth of \(C''\) is \(d-1\).

This way, we iteratively decrease the depth of the circuit by 1. Finally, we have instances \((C'',{\mathcal {P}}'')\) such that the depth of \(C''\) is 1, i.e. it is a single k-PTF, say h. At this stage we solve #SAT\(({\tilde{C}})\), where \({\tilde{C}} = h \wedge \bigwedge _{f \in {\mathcal {P}}''} f\). This is handled in a subroutine \({\mathcal {A}}_4\). Here too our algorithm differs significantly from [18].

In what follows we will prove Theorem 9. In order to do so, we will set up various subroutines \(\mathcal {A}_1,\mathcal {A}_2,\mathcal {A}_3,\mathcal {A}_4,\mathcal {A}_5\) designed to accomplish certain tasks and combine them together at the end to finally design algorithm \(\mathcal {B}\) for the #SAT problem for k-PTF circuits.

\(\mathcal {A}_1\) will be an oracle, used in other routines, which will compute the number of common satisfying assignments for small AND of PTFs on few variables (using the same idea as in the algorithm for #SAT for k-PTFs). \(\mathcal {A}_2\) will be a simplification step, which will allow us to argue about some structure in the circuit (this algorithm is from [18]). It will make many gates at the bottom of the circuit \(\delta \)-close to a constant, thus simplifying it. \(\mathcal {A}_3\) will be used to count minority satisfying assignments for a bunch of PTFs that are \(\delta \)-close to an explicit constant, i.e. assignments which cause at least one of the PTFs to evaluate to its minority value. \(\mathcal {A}_4\) will be a general base case of our algorithm, which will count satisfying assignments for AND of superlinearly many PTFs, by first using \(\mathcal {A}_2\) to simplify the circuit, then reducing it to the case of small AND of PTFs and then using \(\mathcal {A}_1\). \(\mathcal {A}_5\) will be a recursive procedure, which will use \(\mathcal {A}_2\) to first simplify the circuit, and then convert it into a circuit of lower depth, finally making a recursive call on the simplified circuit.

Parameter setting Let d be a constant. Let \(A, B\) be two fixed, large absolute constants. Let \(\zeta = \min (1, A/2Bk^2)\). For each \(2 \le i \le d\), let \(\beta _{i} = A\cdot \varepsilon _{i}\) and \(\varepsilon _i = (\frac{\zeta }{10A(k+1)})^i\). Choose \(\beta _{1} = 1/10\) and \(\varepsilon _{1}=1/10A\).
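To get a feel for these parameters, the sketch below evaluates them exactly with rational arithmetic. It reads \(A/2Bk^2\) as \(A/(2Bk^2)\) and \(1/10A\) as \(1/(10A)\), which is the reading consistent with the later running time analysis; the concrete values of A, B, k and d are arbitrary placeholders and not values taken from the paper.

```python
from fractions import Fraction

def parameters(A, B, k, d):
    """Return (zeta, eps, beta), with eps[i] and beta[i] defined for 1 <= i <= d."""
    zeta = min(Fraction(1), Fraction(A, 2 * B * k * k))
    eps = {1: Fraction(1, 10 * A)}
    beta = {1: Fraction(1, 10)}
    for i in range(2, d + 1):
        eps[i] = (zeta / (10 * A * (k + 1))) ** i
        beta[i] = A * eps[i]
    return zeta, eps, beta

# Placeholder constants, chosen only to make the numbers easy to read.
A, B, k, d = 100, 100, 2, 3
zeta, eps, beta = parameters(A, B, k, d)
```

Note that \(\beta _1 = A\cdot \varepsilon _1\) also holds for the base case, and that both sequences shrink geometrically with the depth index.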

Oracle access to a subroutine Let \({\mathcal {A}}_1(n',s,f_1,\ldots ,f_s)\) denote a subroutine with the following specification. Here, n is the number of variables in the original input circuit.

Input: AND of k-PTFs, say \(f_1, \ldots , f_s\) specified by polynomials \(P_1, \ldots , P_s\) respectively, such that \(s \le n^{0.1}\) and for each \(i \in [s]\), \(f_i\) is defined over \(n'\le n^{1/(2(k+1))}\) variables and \(w(P_i)\le M\).

Output: #\(\{a \in \{-1,1\}^{n'} \mid \forall i \in [s], f_i(a) = -1\}\).

In what follows, we will assume that we have access to the above subroutine \({\mathcal {A}}_1\). We will set up such an oracle and show that it answers any call to it in time \(\mathop {\mathrm {poly}}(n,M)\) in Sect. 4.5.

4.1 Simplification of a k-PTF circuit

For any \(1 > \varepsilon \gg (\log n)^{-1}\), let \(\beta =A \varepsilon \) and \(\delta = \exp (-n^{\beta /B\cdot k^2})\), where A and B are constants. Note that these are the constants \(A, B\) used in the parameter setting paragraph above. Let \({\mathcal {A}}_2(C,d,n,M)\) be the following subroutine.

Input: k-PTF circuit C of depth d on n variables with size \(n^{1+\varepsilon }\) and weight M.

Output: A decision tree \(\mathrm {T_{\text {DT}}}\) which is a complete binary tree of depth \(n-n^{1-2\beta }\) such that for a uniformly random leaf \(\sigma \in \{-1,1\}^{n-n^{1-2\beta }}\), the corresponding circuit \(C_\sigma \) is a good circuit with probability \(1-\exp (-n^{\varepsilon })\), where \(C_\sigma \) is called good if its bottom layer has the following structure:

  • there are at most n gates which are \(\delta \)-close to an explicit constant. Let \(B_\sigma \) denote this set of gates.

  • there are at most \(n^{\beta }\) gates that are not \(\delta \)-close to an explicit constant. Let us denote this set of gates by \(G_\sigma \).

In [18], such a subroutine \({\mathcal {A}}_2(C,d,n,M)\) was designed. Specifically, they proved the following theorem.

Theorem 22

(Kabanets and Lu [18]) There is a zero-error randomized algorithm \({\mathcal {A}}_2(C,d,n,M)\) that runs in time \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n-n^{1-2\beta }})\) and outputs a decision tree as described above with probability at least \(1-1/2^{10n}\) (and outputs ? otherwise). Moreover, given a good \(C_\sigma \), there is a deterministic algorithm that runs in time \(\mathop {\mathrm {poly}}(n,M)\) which computes \(B_\sigma \) and \(G_\sigma \).

Remark 23

In [18], it is easy to see that the probability of outputting ? is at most 1/2. To bring down this probability to \(1/2^{10n}\), we run their procedure in parallel 10n times, and output the first tree that is output by the algorithm. The probability that no such tree is output is at most \(1/2^{10n}.\)

Remark 24

In designing the above subroutine in [18], they consider a more general class of polynomially sparse-PTF circuits (i.e. each gate computes a PTF with polynomially many monomials) as opposed to the k-PTF circuits we consider here. Under this weaker assumption, they get that \(\delta = \exp (-n^{\Omega (\beta ^3)})\). However, by redoing their analysis for degree k-PTFs, it is easy to see that \(\delta \) could be set to \(\exp (-n^{\beta /B\cdot k^2})\) for some constant B. Under this setting of \(\delta ,\) we get exactly the same guarantees.

Further, while the statement of the result in [18] does not guarantee that the decision tree \(\mathrm {T_{\text {DT}}}\) obtained is a complete binary tree, it is easy to see that this follows from their analysis (and the analysis from [7] that is used as a black box).

In this sense, the above theorem statement is a slight restatement of [18, Lemma 11].

4.2 Enumerating the minority assignments

We now design an algorithm \({\mathcal {A}}_3(m,\ell ,\delta , g_1, \ldots , g_\ell )\), which has the following behaviour.

Input: parameters \(m\le n, \ell , \delta \) such that \(\delta \in \left[ \exp (-m^{1/10(k+1)}) ,1\right] \), \(\ell \le m^2\), k-PTFs \(g_1, g_2, \ldots , g_\ell \) specified by polynomials \(P_1, \ldots , P_\ell \) on m variables (\(x_1,\ldots ,x_m\)) each of weight at most M and which are \(\delta \)-close to \(-1\).

Oracle access to: \({\mathcal {A}}_1\).

Output: The set of all \(a \in \{-1,1\}^m\) such that \(\exists i \in [\ell ]\) for which \(P_i(a)>0\).

Lemma 25

There is a deterministic algorithm \({\mathcal {A}}_3(m,\ell ,\delta , g_1, \ldots , g_\ell )\) as specified above that runs in time \(\mathop {\mathrm {poly}}(m,M)\cdot \sqrt{\delta }\cdot 2^{m}\).

Proof

We start with the description of the algorithm.

\(\mathbf {\mathcal {A}_3(m,\ell ,\delta , g_1, \ldots , g_\ell )}\)

  1. 1.

    Set \(q = \frac{1}{2}\log \frac{1}{\delta } \le \frac{m}{2}\) and let \(\mathcal {N} = \emptyset \). (\(\mathcal {N}\) will eventually be the collection of minority assignments i.e. all \(a\in \{-1,1\}^m\) such that \(\exists i\in [\ell ]\) for which \(P_i(a)>0\).)

  2. 2.

    For each setting \(\rho \in \{-1,1\}^{m-q}\) to the variables \(x_{q+1},\ldots ,x_m\), do the following:

    1. (a)

      Construct the restricted polynomials \(P_{1,\rho },\ldots ,P_{\ell ,\rho }\). Let \(g_{i,\rho } = \mathrm {sgn}(P_{i,\rho })\) for \(i\in [\ell ]\).

    2. (b)

      Using oracle \(\mathcal {A}_1(q,1,-g_{i,\rho })\), check for each \(i\in [\ell ]\) if \(g_{i,\rho }\) is the constant function \(-1\) by checking if the output of the oracle on the input \(-g_{i,\rho }\) is zero.

    3. (c)

      If there is an \(i\in [\ell ]\) such that \(g_{i,\rho }\) is not the constant function \(-1\), try all possible assignments \(\chi \) to the remaining q variables \(x_1,\ldots ,x_q\). This way, enumerate all assignments \(b = (\chi ,\rho )\) to \(x_1,\ldots ,x_m\) for which there is an \(i\in [\ell ]\) such that \(P_i(b)>0\). Add such an assignment to the collection \(\mathcal {N}\).

  3. 3.

    Output \(\mathcal {N}\).
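The control flow of \(\mathcal {A}_3\) can be sketched in Python as follows. The oracle \(\mathcal {A}_1\) is replaced here by a brute-force stand-in (`is_constant_minus_one`, a hypothetical helper), so the sketch illustrates correctness only and does not achieve the running time claimed in Lemma 25; the polynomial representation and the example inputs are likewise assumptions.

```python
from itertools import product

def evaluate(poly, a):
    """Evaluate {monomial (tuple of variable indices): coefficient} at the point a."""
    total = 0
    for mono, coeff in poly.items():
        term = coeff
        for v in mono:
            term *= a[v]
        total += term
    return total

def restrict(poly, sigma):
    """Substitute the partial assignment sigma (dict: variable -> +1/-1) into poly."""
    out = {}
    for mono, coeff in poly.items():
        rest = tuple(v for v in mono if v not in sigma)
        for v in mono:
            if v in sigma:
                coeff *= sigma[v]
        out[rest] = out.get(rest, 0) + coeff
    return out

def is_constant_minus_one(poly, q):
    """Brute-force stand-in for the oracle check in step 2(b)."""
    return all(evaluate(poly, chi) < 0 for chi in product((-1, 1), repeat=q))

def minority_assignments(polys, m, q):
    """All a in {-1,1}^m with P_i(a) > 0 for some i, found restriction by restriction."""
    found = []
    for rho_vals in product((-1, 1), repeat=m - q):
        rho = {q + t: s for t, s in enumerate(rho_vals)}
        restricted = [restrict(P, rho) for P in polys]
        if all(is_constant_minus_one(R, q) for R in restricted):
            continue  # no minority assignment extends this rho
        for chi in product((-1, 1), repeat=q):
            if any(evaluate(R, chi) > 0 for R in restricted):
                found.append(chi + rho_vals)
    return found

# Two illustrative PTFs on m = 4 variables, each positive on exactly one input.
P1 = {(0,): 2, (1,): 2, (2,): 2, (3,): 2, (): -7}
P2 = {(0,): -2, (1,): -2, (2,): 2, (3,): 2, (): -7}
found = minority_assignments([P1, P2], m=4, q=2)
expected = [a for a in product((-1, 1), repeat=4)
            if evaluate(P1, a) > 0 or evaluate(P2, a) > 0]
```

Only the restrictions \(\rho \) that fail the constant check trigger the inner enumeration, which is the source of the savings quantified in Lemma 26.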

Correctness If \(a\in \{-1,1\}^m\) is a minority assignment (i.e. \(\exists i_0\in [\ell ]\) so that \(P_{i_0}(a) > 0\)) and if \(a = (\chi ,\rho )\) where \(\rho \) is an assignment to the last \(m-q\) variables, and \(\chi \) to the first q, a will get added to \(\mathcal {N}\) in the loop of step 2 corresponding to \(\rho \) and that of \(\chi \) in step 2c, because of \(i_0\) being a witness. Conversely, observe that we only add to the collection \(\mathcal {N}\) when we encounter a minority assignment.

Running time For each setting \(\rho \in \{-1,1\}^{m-q}\) to the variables \(x_{q+1},\ldots ,x_m\), step 2a takes \(\mathop {\mathrm {poly}}(m,M)\) time and step 2b takes \(O(\ell )\cdot \mathop {\mathrm {poly}}(m,M) = \mathop {\mathrm {poly}}(m,M)\) time and so combined, they take only \(\mathop {\mathrm {poly}}(m,M)\) time. Let \(\mathcal {T}\) be the set consisting of all assignments \(\rho \) to the last \(m-q\) variables such that the algorithm enters the loop described in step 2c i.e.

$$\begin{aligned} \mathcal {T} = \{\rho \in \{-1,1\}^{m-q}|\exists i \in [\ell ]:g_{i,\rho } \text { is not the constant function}-1\} \end{aligned}$$

and let \(\mathcal {T}^c\) denote its complement. Also note that for a \(\rho \in \mathcal {T}\), enumeration of minority assignments in step 2c takes \(2^q \cdot \ell \cdot \mathop {\mathrm {poly}}(m,M)\) time. Therefore, we can bound the total running time by

$$\begin{aligned} \mathop {\mathrm {poly}}(m,M)(2^q\cdot |\mathcal {T}| + |\mathcal {T}^c|). \end{aligned}$$

Next, we claim that the size of \(\mathcal {T}\) is small:

Lemma 26

\(|\mathcal {T}|\le \ell \cdot \sqrt{\delta }\cdot 2^{m-q}\).

Proof

We define for \(i\in [\ell ]\), \(\mathcal {T}_i =\{\rho \in \{-1,1\}^{m-q}|g_{i,\rho } \text { is not the constant function}-1\}\). By the union bound, it is sufficient to show that \(|\mathcal {T}_i| \le \sqrt{\delta }\cdot 2^{m-q}\) for a fixed \(i\in [\ell ]\). Let \(D_m\) denote the uniform distribution on \(\{-1,1\}^m\) i.e. on all possible assignments to the variables \(x_1,\ldots ,x_m\). Then from the definition of \(\delta \)-closeness, we know

$$\begin{aligned} \Pr _{a\sim D_m}[g_i(a) = 1] \le \delta . \end{aligned}$$

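Writing the LHS as an average over restrictions \(\rho \) of the last \(m-q\) variables, we have

$$\begin{aligned} \mathop {\mathbb {E}}_{\rho \sim D_{m-q}}\left[ \Pr _{\chi \sim D_q}[g_{i,\rho } (\chi ) = 1]\right] \le \delta , \end{aligned}$$

where \(D_{m-q}\) and \(D_q\) denote uniform distributions on assignments to the last \(m-q\) variables and the first q variables respectively. By Markov’s inequality,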

$$\begin{aligned} \Pr _{\rho \sim D_{m-q}}[\Pr _{\chi \sim D_q}[g_{i,\rho } (\chi ) = 1] \ge \sqrt{\delta }] \le \sqrt{\delta }. \end{aligned}$$

Consider a \(\rho \) for which this event does not occur i.e. for which \(\Pr _{\chi \sim D_q}[g_{i,\rho } (\chi ) = 1] < \sqrt{\delta }\). For such a \(\rho \), \(g_{i,\rho }\) has only \(2^q = 1/\sqrt{\delta }\) many inputs, so a single input on which it evaluates to 1 would already contribute probability \(2^{-q} = \sqrt{\delta }\). Therefore, \(g_{i,\rho }\) must be the constant function \(-1\). Thus, we conclude that

$$\begin{aligned} \Pr _{\rho \sim D_{m-q}}[g_{i,\rho } \text { is not the constant function} -1] \le \sqrt{\delta } \end{aligned}$$

or in other words, \(|\mathcal {T}_i| \le \sqrt{\delta }\cdot 2^{m-q}\). \(\square \)

Finally, by using the trivial bound \(|\mathcal {T}^c| \le 2^{m-q}\) and the above claim, we obtain a total running time of \(\mathop {\mathrm {poly}}(m,M)\cdot \sqrt{\delta }\cdot 2^m\) and this concludes the proof of the lemma. \(\square \)
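For completeness, the final substitution can be written out as follows. This is a sketch that uses \(2^{-q} = \sqrt{\delta }\) (from the choice of q in step 1) and absorbs the factor \(\ell \le m^2\) into the polynomial term.

$$\begin{aligned} \mathop {\mathrm {poly}}(m,M)\cdot (2^q\cdot |\mathcal {T}| + |\mathcal {T}^c|) \le \mathop {\mathrm {poly}}(m,M)\cdot (2^q\cdot \ell \sqrt{\delta }\cdot 2^{m-q} + 2^{m-q}) = \mathop {\mathrm {poly}}(m,M)\cdot (\ell \sqrt{\delta }\cdot 2^{m} + \sqrt{\delta }\cdot 2^{m}) = \mathop {\mathrm {poly}}(m,M)\cdot \sqrt{\delta }\cdot 2^{m}. \end{aligned}$$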

4.3 #SAT for AND of k-PTFs

We design an algorithm \({\mathcal {A}}_4(n,M,g_1,\ldots ,g_\tau )\) with the following functionality.

Input: A set of k-PTFs \(g_1, \ldots , g_\tau \) specified by polynomials \(P_1, \ldots , P_\tau \) on n variables such that \(w(P_i) \le M\) for each \(i \in [\tau ]\) and \(\sum _{i \in [\tau ]} \text {fan-in}(g_i) \le n^{1+\varepsilon _1}.\)

Oracle access to: \({\mathcal {A}}_1, {\mathcal {A}}_2\).

Output: #\(\{a \in \{-1,1\}^n \mid \forall i \in [\tau ], P_i(a)<0\}\).

4.3.1 The details of the algorithm

\(\mathbf {\mathcal {A}_4(n,M,g_1,\ldots ,g_\tau )}\)

  1. 1.

    Let \(m = n^\alpha \) for \(\alpha = \frac{\zeta \varepsilon _1}{2(k+1)}\). Let C denote the AND of \(g_1, \ldots , g_\tau \).

  2. 2.

    Run \({\mathcal {A}}_2(C,2,n,M)\) to obtain the decision tree \(\mathrm {T_{\text {DT}}}\). Initialize N to 0.

  3. 3.

    For each leaf \(\sigma \) of \(\mathrm {T_{\text {DT}}}\), do the following:

    1. (A)

      If \(C_\sigma \) is not good, count the number of satisfying assignments for \(C_\sigma \) by brute-force and add to \(N\).

    2. (B)

      If \(C_\sigma \) is good, do the following:

      1. (i)

        \(C_\sigma \) is now an AND of PTFs in \(B_\sigma \) and \(G_\sigma \), over \(n' = n^{1 - 2\beta _1}\) variables, where all PTFs in \(B_\sigma \) are \(\delta \)-close to an explicit constant, where \(\delta = \exp (-n^{\beta _1/B\cdot k^2})\). Moreover, \(\left| B_\sigma \right| \le n, \left| G_\sigma \right| \le n^{\beta _1}\). Let \(B_\sigma =\{h_1, \ldots , h_\ell \}\) be specified by \(Q_1, \ldots , Q_\ell \). Suppose for \(i \in [\ell ]\), \(h_i\) is close to \(a_i \in \{-1,1\}\). Then let \(Q'_i = -a_i \cdot Q_i\) and \(h_i' = \mathrm {sgn}(Q'_i)\). Let \(B'_\sigma = \{Q'_1, \ldots , Q'_\ell \}\).

      2. (ii)

        For each restriction \(\rho : \{x_{m+1}, \ldots , x_{n'}\} \rightarrow \{-1, 1\}\), do the following:

        1. (a)

          Check if there exists \(h' \in B'_\sigma \) such that \(h'_\rho \) is not the constant function \(-1\) using \(\mathcal {A}_1(m,1,h'_\rho )\).

        2. (b)

          If such an \(h' \in B'_{\sigma }\) exists, then count the number of satisfying assignments for \(C_{\sigma \rho }\) by brute-force and add to \(N\).

        3. (c)

          If the above does not hold, we have established that for each \(h_i \in B_\sigma \), \(h_{i,\rho }\) is the constant function \(a_i\). If \(\exists i \in [\ell ]\) such that \(a_i =1\), it means \(C_{\sigma \rho }\) is also the constant 1. Then simply halt. Else set each \(h_i\) to \(a_i\). Thus, \(C_{\sigma \rho }\) has been reduced to an AND of \(n^{\beta _1}\) many PTFs over \(m\) variables. Call this set \(G'_{\sigma \rho }\), use \(\mathcal {A}_1(m,n^{\beta _1},G'_{\sigma \rho })\) to calculate the number of satisfying assignments and add the output to \(N\).

  4. 4.

    Finally, output N.

4.3.2 The correctness argument and running time analysis

Lemma 27

\(\mathcal {A}_4\) is a zero-error randomized algorithm that counts the number of satisfying assignments correctly. Further, \(\mathcal {A}_4\) runs in time \(\mathop {\mathrm {poly}}(n, M)\cdot O(2^{n - n^\alpha })\) and outputs the right answer with probability at least 1/2 (and outputs ? otherwise).

Proof

Correctness For a leaf \(\sigma \) of \(\mathrm {T_{\text {DT}}}\), when \(C_\sigma \) is not good, we simply use brute-force, which is guaranteed to be correct. Otherwise,

  • If \(h'_\rho \) is not the constant function \(-1\) for some \(h' \in B'_\sigma \), then we again use brute-force, which is guaranteed to work correctly.

  • Otherwise, for each \(h' \in B'_\sigma \), \(h'_\rho \) is the constant function \(-1\). Here we only need to consider the satisfying assignments for the gates in \(G_{\sigma \rho }\). For this we use \(\mathcal {A}_1\), that works correctly by assumption.

Further, we need to ensure that the parameters that we call \(\mathcal {A}_1\) on, are valid. To see this, observe that \(m = n^\alpha \le n^{1/(2(k+1))}\) because of the setting of \(\alpha \) and further, we have \(n^{\beta _1} \le n^{0.1}\).

Finally, the claim about the error probability follows from the error probability of \(\mathcal {A}_2\) (Theorem 22).

Running Time The time taken for constructing \(\mathrm {T_{\text {DT}}}\) is \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n - n^{1 - 2\beta _1}})\), by Theorem 22. For a leaf \(\sigma \) of \(\mathrm {T_{\text {DT}}}\), we know that step (A) is executed with probability at most \(2^{-n^{\varepsilon _1}}\). The total time for running step (A) is thus \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n - n^{\varepsilon _1}})\). We know that the oracle \(\mathcal {A}_1\) answers calls in \(\mathop {\mathrm {poly}}(n,M)\) time. Hence, the total time for running step (a) is \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n - n^\alpha })\). Next, note that if step (b) is executed, then all PTFs in \(B_\sigma \) are \(\delta \)-close to \(-1\). So, the number of times it runs is at most \(\delta \cdot 2^{n'}\). Therefore, the total time for running step (b) is \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n + n^\alpha - n^{\beta _1/Bk^2}})\). Recall that \(\zeta = \min (1, A/2Bk^2)\), implying \(\alpha =\frac{\zeta \varepsilon _1}{2(k+1)}= \frac{\zeta \beta _1}{2A(k+1)}\le \frac{\beta _1}{4Bk^2(k+1)} < \frac{\beta _1}{Bk^2}\). Similar to the analysis of step (a), the total time for running step (c) is also \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n - n^\alpha })\). We conclude that the total running time is \(\mathop {\mathrm {poly}}(n,M)\cdot O(2^{n - n^\alpha })\). This completes the proof.
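The comparison of exponents used for step (b) can be made explicit. The short derivation below is a sketch that reads \(n^{\beta _1/Bk^2}\) as \(n^{\beta _1/(Bk^2)}\), uses \(\varepsilon _1 = \beta _1/A\) and \(\zeta \le A/(2Bk^2)\), and assumes n is large enough for the last step.

$$\begin{aligned} \alpha = \frac{\zeta \varepsilon _1}{2(k+1)} \le \frac{A}{2Bk^2}\cdot \frac{\beta _1}{A}\cdot \frac{1}{2(k+1)} = \frac{\beta _1}{4Bk^2(k+1)}< \frac{\beta _1}{Bk^2} \quad \Longrightarrow \quad 2n^{\alpha } \le n^{\beta _1/(Bk^2)} \quad \Longrightarrow \quad 2^{n + n^\alpha - n^{\beta _1/(Bk^2)}} \le 2^{n - n^\alpha }. \end{aligned}$$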

4.4 #SAT for larger depth k-PTF circuits

Let C be a k-PTF circuit of depth \(d\ge 1\) on n variables and let \({\mathcal {P}}\) be a set of k-PTFs \(g_1, \ldots , g_\tau \), which are specified by n-variate polynomials \(P_1, \ldots , P_\tau \). Let #SAT\((C,{\mathcal {P}})\) denote #\(\{a \in \{-1,1\}^n \mid C(a)<0 \text { and } \forall i \in [\tau ], P_i(a)<0 \}\). We now specify our depth-reduction algorithm \({\mathcal {A}}_5(n,d,M,n^{1+\varepsilon _d}, C, {\mathcal {P}})\).

Input: \((C, {\mathcal {P}})\) as follows:

  • k-PTF circuit C with parameters \((n,n^{1+\varepsilon _d},d,M)\).

  • a set \({\mathcal {P}}\) of k-PTFs \(g_1, \ldots , g_\tau \) on n variables, which are specified by polynomials \(P_1, \ldots , P_\tau \) such that \(\sum _{i=1}^\tau \text {fan-in}(g_i) \le n^{1+\varepsilon _{d}}\) and for each \(i \in [\tau ]\), \(w(P_i)\le M\).

Oracle access to: \({\mathcal {A}}_1, {\mathcal {A}}_4\).

Output: #SAT\((C,{\mathcal {P}})\).
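To make the quantity #SAT\((C,{\mathcal {P}})\) concrete, the following is a minimal brute-force reference in Python. It is only a sketch of the definition, not of \({\mathcal {A}}_5\): the dictionary encoding of polynomials, the function names, and the toy instance are assumptions made for illustration, and the loop takes \(2^n\) steps.

```python
from itertools import product

def eval_poly(poly, a):
    """Evaluate a multilinear polynomial given as {tuple_of_variable_indices: coefficient}."""
    total = 0
    for monomial, coeff in poly.items():
        term = coeff
        for i in monomial:
            term *= a[i]
        total += term
    return total

def count_sat_bruteforce(n, circuit, polys):
    """Count a in {-1,1}^n with circuit(a) < 0 and P(a) < 0 for every P in polys."""
    count = 0
    for a in product((-1, 1), repeat=n):
        if circuit(a) < 0 and all(eval_poly(p, a) < 0 for p in polys):
            count += 1
    return count

# Hypothetical toy instance: C is the single threshold gate sgn(x0 + x1 + x2)
# (never zero on three +-1 inputs), and P holds one constraint x0 * x1 < 0.
def toy_circuit(a):
    return -1 if a[0] + a[1] + a[2] < 0 else 1

toy_polys = [{(0, 1): 1}]
toy_count = count_sat_bruteforce(3, toy_circuit, toy_polys)
```

With \({\mathcal {P}} = \emptyset \) the same routine counts the satisfying assignments of C alone, which is the case used in the final algorithm.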

We start by describing the algorithm.

4.4.1 The details of the algorithm

Let count be a global counter initialized to 0 before the execution of the algorithm.

\(\mathbf {{\mathcal {A}}_5(n,d,M,n^{1+\varepsilon _d},C,{\mathcal {P}})}\)

  1. If \(d=1\), output \(\mathcal {A}_4(n,M,\{C\}\cup \mathcal {P})\) and halt.

  2. Run \({\mathcal {A}}_2(C,d,n,M)\), which gives us a \(\mathrm {T_{\text {DT}}}\). (If \({\mathcal {A}}_2\) does not produce one, output ?.)

  3. For each leaf \(\sigma \in \{-1,1\}^{n-n^{1-2\beta _{d}}}\) of \(\mathrm {T_{\text {DT}}}\):

    (a) For each \(i \in [\tau ]\) compute \(P_{i,\sigma }\), the polynomial obtained from \(P_i\) by substituting \(\sigma \) in its variables. Let \({\mathcal {P}}_\sigma = \{P_{1,\sigma }, \ldots , P_{\tau , \sigma }\}\).

    (b) Obtain \(C_\sigma \). If \(C_\sigma \) is not a good circuit, then brute-force to find the number of satisfying assignments of \((C_\sigma ,\mathcal {P}_\sigma )\), say \(N_\sigma \), and set \(\texttt {count} = \texttt {count} + N_\sigma \).

    (c) If \(C_\sigma \) is good then obtain \(B_\sigma \) and \(G_\sigma \).

    (d) Let \(B_\sigma =\{h_1, \ldots , h_\ell \}\) be specified by \(Q_1, \ldots , Q_\ell \). We know that each \(h\in B_\sigma \) is \(\delta \)-close to an explicit constant, for \(\delta = 2^{-n^{\beta _d/Bk^2}}.\) Suppose for \(i \in [\ell ]\), \(h_i\) is close to \(a_i \in \{-1,1\}\). Then let \(Q'_i = -a_i \cdot Q_i\) and \(h_i' = \mathrm {sgn}(Q'_i)\). Let \(B'_\sigma = \{Q'_1, \ldots , Q'_\ell \}\).

    (e) Run \({\mathcal {A}}_3(n^{1-2\beta _{d}},\ell , \delta ,h_1', \ldots , h_\ell ')\) to obtain the set \(\mathcal {N}_\sigma \) of all the minority assignments of \(B_\sigma \). (Note that this uses oracle access to \({\mathcal {A}}_1\).) For each \(a\in \mathcal {N}_\sigma \), if (\((C_\sigma (a)<0)\) AND \((\forall i \in [\tau ]\), \(P_{i,\sigma }(a)<0)\)), then \(\texttt {count} = \texttt {count}+1\).

    (f) Let \(G_\sigma = \{f_1, \ldots , f_t\}\) be specified by polynomials \(R_1, \ldots , R_t\). We know that \(t \le n^{\beta _{d}}\). For each \(b \in \{-1,1\}^t\):

      i. Let \(R_i' = -b_i \cdot R_i\) for \(i \in [t]\). Let \(G'_{\sigma ,b} = \{R'_1, \ldots , R'_t\}\).

      ii. Let \(C_{\sigma ,b}\) be the circuit obtained from \(C_\sigma \) by replacing each \(h_i\) by \(a_i\) for \(1 \le i \le \ell \) and each \(f_j\) by \(b_j\) for \(1\le j \le t\).

      iii. Let \({\mathcal {P}}_{\sigma ,b} = {\mathcal {P}}_\sigma \cup B'_\sigma \cup G'_{\sigma ,b}\).

      iv. If \(d > 2\) then run \({\mathcal {A}}_5(n^{1-2\beta _d}, d-1, M, n^{1 + \varepsilon _d}, C_{\sigma , b}, {\mathcal {P}}_{\sigma ,b} )\) \(n_1=10n\) times and let \(N_\sigma \) be the output of the first run that does not output ?. Set \(\texttt {count} = \texttt {count} + N_\sigma \). (If all runs of \(\mathcal {A}_5\) output ?, then output ?.)

      v. If \(d=2\) then run \({\mathcal {A}}_4(n^{1-2\beta _d}, M, \{C_{\sigma , b}\} \cup {\mathcal {P}}_{\sigma ,b})\) \(n_1=10n\) times and let \(N_\sigma \) be the output of the first run that does not output ?. Set \(\texttt {count} = \texttt {count} + N_\sigma .\) (If all runs of \(\mathcal {A}_4\) output ?, then output ?.)

  4. Output count.
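The repetition in the last two sub-steps of Step 3(f) is the usual way of amplifying a zero-error randomized routine: run it several times and keep the first answer that is not ?. The Python sketch below illustrates only this pattern. The names `FAIL`, `first_success` and `make_flaky_counter` are hypothetical, and the stand-in routine (which returns a fixed number or fails with probability 1/2) is an assumption replacing the actual recursive calls.

```python
import random

FAIL = "?"  # stands in for the failure symbol of a zero-error randomized routine

def first_success(run, attempts):
    """Call run() up to `attempts` times; return the first answer that is not FAIL."""
    for _ in range(attempts):
        answer = run()
        if answer != FAIL:
            return answer
    return FAIL  # every attempt failed, so the caller must also report failure

def make_flaky_counter(true_count, fail_prob, rng):
    """Hypothetical zero-error routine: returns true_count or FAIL, never a wrong number."""
    def run():
        return FAIL if rng.random() < fail_prob else true_count
    return run

rng = random.Random(0)  # fixed seed so the example is reproducible
n = 8
routine = make_flaky_counter(true_count=42, fail_prob=0.5, rng=rng)
result = first_success(routine, attempts=10 * n)
```

Because the routine never returns a wrong number, the wrapper is again zero-error; only the chance of reporting ? changes, dropping to at most \(2^{-10n}\) when each attempt fails independently with probability at most 1/2.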

4.4.2 The correctness argument and running time analysis

Lemma 28

The algorithm \(\mathcal {A}_5\) described above is a zero-error randomized algorithm which on input \((C, \mathcal {P})\) as described above, correctly solves #SAT\((C,\mathcal {P})\). Moreover, the algorithm outputs the correct answer (and not ?) with probability at least 1/2. Finally, \(\mathcal {A}_5(n,d,M,n^{1+\varepsilon _d},C,\emptyset )\) runs in time \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n - n^{\zeta \varepsilon _d/2(k+1)}}\), where parameters \(\varepsilon _d, \zeta \) are as defined at the beginning of Sect. 4.

Proof

We argue correctness by induction on the depth d of the circuit C.

Clearly, if \(d=1,\) correctness follows from the correctness of algorithm \(\mathcal {A}_4.\) This takes care of the base case.

If \(d\ge 2\), we argue first that if the algorithm does not output ?, then it does output #SAT\((C,\mathcal {P})\) correctly. Assume that the algorithm \(\mathcal {A}_2\) outputs a decision tree \(\mathrm {T_{\text {DT}}}\) as required (otherwise, the algorithm outputs ? and we are done). Now, it is sufficient to argue that for each \(\sigma ,\) the number of satisfying assignments to \((C_\sigma ,\mathcal {P}_\sigma )\) is computed correctly (if the algorithm does not output ?).

Fix any \(\sigma .\) If \(C_\sigma \) is not a good circuit, then the algorithm uses brute-force to compute #SAT\((C_\sigma ,\mathcal {P}_\sigma )\) which yields the right answer. So we may assume that \(C_\sigma \) is indeed good.

Now, the satisfying assignments to \((C_\sigma ,\mathcal {P}_\sigma )\) break into two kinds: those that are minority assignments to the set \(B_\sigma \) and those that are majority assignments to \(B_\sigma .\) The former set is enumerated in Step 3e (correctly by our analysis of \(\mathcal {A}_3\)) and hence we count all these assignments in this step.

Finally, we claim that the satisfying assignments to \((C_\sigma ,\mathcal {P}_\sigma )\) that are majority assignments of all gates in \(B_\sigma \) are counted in Step 3f. To see this, note that each such assignment \(a\in \{-1,1\}^{n^{1-2\beta _d}}\) forces the gates in \(G_\sigma \) to some values \(b_1,\ldots ,b_t\in \{-1,1\}\). Note that for each such \(b\in \{-1,1\}^t\), these assignments are exactly the satisfying assignments of the pair \((C_{\sigma ,b},\mathcal {P}_{\sigma ,b})\) as defined in the algorithm. In particular, the number of satisfying assignments to \((C_\sigma , {\mathcal {P}}_\sigma )\) that are majority assignments of all gates in \(B_\sigma \) can be written as

$$\begin{aligned} \sum _{b\in \{-1,1\}^t}\#\text {SAT}(C_{\sigma ,b},\mathcal {P}_{\sigma ,b}). \end{aligned}$$
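The split into minority assignments plus a sum over guesses b can be checked by brute force on a small instance. The Python sketch below does this for a hypothetical depth-2 toy circuit on three variables with one biased gate and two other bottom gates. All concrete polynomials, the top gate, and the helper names are assumptions chosen for illustration. In this toy the top gate reads only the bottom gates, so \(C_{\sigma ,b}\) degenerates to a constant, whereas in the algorithm it is a circuit of depth \(d-1\).

```python
from itertools import product

def eval_poly(poly, x):
    """Evaluate a multilinear polynomial {tuple_of_indices: coefficient} at x."""
    total = 0
    for monomial, coeff in poly.items():
        term = coeff
        for i in monomial:
            term *= x[i]
        total += term
    return total

def sgn(v):
    return -1 if v < 0 else 1

def flip(poly, target):
    """Return -target * poly, which is negative at x exactly when sgn(poly(x)) == target."""
    return {mono: -target * c for mono, c in poly.items()}

def count_neg(n, polys):
    """Number of x in {-1,1}^n on which every polynomial in polys is negative."""
    return sum(all(eval_poly(p, x) < 0 for p in polys)
               for x in product((-1, 1), repeat=n))

n = 3
P = [{(0,): 1}]                                          # side constraint x0 < 0
B = [{(0,): 2, (1,): 2, (2,): 2, (): 5}]                 # biased gate: +1 on 7 of 8 inputs
a = [1]                                                  # its majority value
G = [{(0,): 2, (1,): -2, (): 1}, {(1, 2): 2, (0,): 1}]   # remaining bottom gates

def top(y):
    return sgn(2 * y[0] + y[1] + y[2] - 1)               # odd-valued, so never zero

def circuit(x):
    return top([sgn(eval_poly(q, x)) for q in B + G])

def satisfied(x):
    return circuit(x) < 0 and all(eval_poly(p, x) < 0 for p in P)

cube = list(product((-1, 1), repeat=n))
direct = sum(satisfied(x) for x in cube)

# Satisfying assignments on which some biased gate takes its minority value.
minority = sum(satisfied(x) for x in cube
               if any(sgn(eval_poly(q, x)) != ai for q, ai in zip(B, a)))

# Majority part: guess the outputs b of the gates in G and add sign-flipped constraints.
B_flipped = [flip(q, ai) for q, ai in zip(B, a)]
majority = 0
for b in product((-1, 1), repeat=len(G)):
    if top(list(a) + list(b)) < 0:                       # C_{sigma,b} is constant here
        G_flipped = [flip(r, bj) for r, bj in zip(G, b)]
        majority += count_neg(n, P + B_flipped + G_flipped)
```

The sign flip \(Q' = -a\cdot Q\) is what lets the condition "gate outputs a" be expressed as one more polynomial required to be negative, so each guess b yields an instance of the same form as the original problem.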

We now want to apply the induction hypothesis to argue that all the terms in the sum are computed correctly. To do this, we need to argue that the size of \(C_{\sigma ,b}\) and the total fan-in of the gates in \(\mathcal {P}_{\sigma ,b}\) are bounded as required (note that the total size of C remains the same, while the total fan-in of \(\mathcal {P}\) increases by the total fan-in of the gates in \(B_\sigma '\cup G_{\sigma ,b}'\) which is at most \(n^{1+\varepsilon _d}\)). It can be checked that this boils down to the following two inequalities

$$\begin{aligned} n^{(1-2\beta _d)(1+\varepsilon _{d-1})}\ge n^{1+\varepsilon _{d}}\text { and } n^{(1-2\beta _d)(1+\varepsilon _{d-1})}\ge 2n^{1+\varepsilon _{d}} \end{aligned}$$

both of which are easily verified for our choice of parameters (for large enough n). Thus, by the induction hypothesis, all the terms in the sum are computed correctly (unless we get ?). Hence, the output of the algorithm is correct by induction.

Now, we analyze the probability of error. If \(d=1\), the probability of error is at most 1/2 by the analysis of \(\mathcal {A}_4.\) If \(d \ge 2\), we get an error if either \(\mathcal {A}_2\) outputs ? or there is some \(\sigma \) such that the corresponding runs of \(\mathcal {A}_5\) or \(\mathcal {A}_4\) output ?. The probability of each is at most \(1/2^{10n}\). Taking a union bound over at most \(2^n\) many \(\sigma ,\) we see that the probability of error is at most \(1/2^{\Omega (n)}\le 1/2.\)
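The union bound can be spelled out slightly more explicitly, as a sketch under two assumptions: each individual run of \(\mathcal {A}_5\) or \(\mathcal {A}_4\) outputs ? with probability at most 1/2 (the inductive guarantee), and the \(10n\) runs are independent. Here \(p_2\) is our shorthand for the probability that \(\mathcal {A}_2\) outputs ?. For a fixed pair \((\sigma ,b)\), all runs fail with probability at most \(2^{-10n}\), and the number of pairs is at most \(2^{n}\cdot 2^{t}\le 2^{2n}\) since \(t\le n^{\beta _d}\le n\). Hence

$$\begin{aligned} \Pr [\text {output is ?}] \le p_2 + 2^{2n}\cdot 2^{-10n} = p_2 + 2^{-8n}. \end{aligned}$$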

Finally, we analyze the running time. Define \(\mathcal {T}(n,d,M)\) to be the running time of the algorithm on a pair \((C,\mathcal {P})\) as specified in the input description above. We need the following claim.

Lemma 29

\(\mathcal {T}(n,d,M)\le \mathop {\mathrm {poly}}(n,M)\cdot 2^{n-n^{\zeta \varepsilon _d/2(k+1)}}.\)

To see the above, we argue by induction. The case \(d=1\) follows from the running time of \(\mathcal {A}_4.\) Further from the description of the algorithm, we get the following inequality for \(d\ge 2.\)

$$\begin{aligned} \mathcal {T}(n,d,M)\le \mathop {\mathrm {poly}}(n,M)\cdot (2^{n-n^{1-2\beta _d}} + 2^{n-n^{\varepsilon _d}} + 2^{n-\frac{1}{2}\cdot n^{\beta _d/(Bk^2)}} + 2^{n-n^{(1-2\beta _d)\zeta \varepsilon _{d-1}/2(k+1)}}) \end{aligned}$$
(2)

The first term above accounts for the running time of \(\mathcal {A}_2\) and all steps other than Steps 3b, 3e and 3f. The second term accounts for the brute force search in Step 3b since there are only a \(2^{-n^{\varepsilon _d}}\) fraction of \(\sigma \) where it is performed. The third term accounts for the minority enumeration algorithm in Step 3e (running time follows from the running time of that algorithm). The last term is the running time of Step 3f and follows from the induction hypothesis.

It suffices to argue that each term in the RHS of (2) can be bounded by \(2^{n-n^{\zeta \varepsilon _d/2(k+1)}}.\) This is an easy verification from our choice of parameters and left to the reader. This concludes the proof. \(\square \)
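As a guide to this verification, comparing the exponents of n in each term of (2) with the target saving \(n^{\zeta \varepsilon _d/2(k+1)}\) reduces it to four conditions. This is only a sketch: whether the conditions hold depends on the definitions of \(\beta _d, \varepsilon _d\) and \(\zeta \) at the beginning of Sect. 4, which are not restated here.

$$\begin{aligned} 1-2\beta _d \ge \frac{\zeta \varepsilon _d}{2(k+1)},\qquad \varepsilon _d \ge \frac{\zeta \varepsilon _d}{2(k+1)},\qquad \frac{\beta _d}{Bk^2} > \frac{\zeta \varepsilon _d}{2(k+1)},\qquad (1-2\beta _d)\,\varepsilon _{d-1}\ge \varepsilon _d. \end{aligned}$$

The second condition is immediate from \(\zeta \le 1\). The strict inequality in the third absorbs the constant factor \(\frac{1}{2}\) for large enough n, and the fourth is obtained by cancelling the common factor \(\zeta /2(k+1)\) from the exponents.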

4.5 Putting it together

In this subsection, we complete the proof of Theorem 9 using the aforementioned subroutines. We also need to describe the subroutine \(\mathcal {A}_1\), which is critical for all the other subroutines. We shall do so inside our final algorithm for the #SAT problem for k-PTF circuits, algorithm \(\mathcal {B}\). Recall that \(\mathcal {A}_1\) has the following specifications:

Input: AND of k-PTFs, say \(f_1, \ldots , f_s\) specified by polynomials \(P_1, \ldots , P_s\) respectively, such that \(s \le n^{0.1}\) and for each \(i \in [s]\), \(f_i\) is defined over \(n'\le n^{1/(2(k+1))}\) variables and \(w(P_i)\le M\).

Output: #\(\{a \in \{-1,1\}^{n'} \mid \forall i \in [s], f_i(a) = -1\}\).

We are now ready to complete the proof of Theorem 9. Suppose C is the input k-PTF circuit with parameters \((n,n^{1+\varepsilon _d},d,M)\). On these input parameters \((C,n,n^{1+\varepsilon _d},d,k,M)\), we finally have the following algorithm for the #SAT problem for k-PTF circuits:

\(\mathbf {\mathcal {B}(C,n,n^{1+\varepsilon _d},d,k,M)}\)

  1. 1.

    (Oracle Construction Step) Construct the oracle \(\mathcal {A}_1\) as follows. Use the algorithm from Corollary 13, with \(\ell \) chosen to be \( n^{0.1}\) and m to be \(n^{1/2(k+1)}\), to construct a deterministic linear decision tree T such that on any input \({\overline{w}} = (\mathrm {coeff}_{m,k}(Q_1),\ldots ,\mathrm {coeff}_{m,k}(Q_\ell ))\in \mathbb {R}^{r\cdot \ell }\) (where \(Q_i\)s are polynomials of degree at most k that sign-represent k-PTFs \(g_i\), each on m variables), T computes the number of common satisfying assignments to \(g_1,\ldots ,g_\ell \).

  2. 2.

    Run \(\mathcal {A}_5(n,d,M,n^{1+\varepsilon _d},C,\emptyset )\). For an internal call to \(\mathcal {A}_1\), say on parameters \((n',s,f_1,\ldots ,f_s)\) where \(n'\le m\) and \(s\le \ell \), run T on the input \({\overline{w}} = (\mathrm {coeff}_{n',k}(P_1),\ldots ,\mathrm {coeff}_{n',k}(P_s))\in \mathbb {R}^{r\cdot s}\). (We expand out the coefficient vectors with dummy variables so that they depend on exactly m variables. Similarly, using some dummy polynomials, we can assume that there are exactly \(\ell \) polynomials.)
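The padding described in Step 2 can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the monomial ordering, the dictionary encoding of polynomials, the choice of the constant \(-1\) as the dummy polynomial, and a brute-force counter standing in for the decision tree T. We also take r to be the number of multilinear monomials of degree at most k on m variables.

```python
from itertools import combinations, product
from math import prod

def monomials(m, k):
    """All multilinear monomials of degree <= k on m variables, in a fixed order."""
    return [s for d in range(k + 1) for s in combinations(range(m), d)]

def coeff_vector(poly, m, k):
    """Coefficients of poly in the order of monomials(m, k).
    Monomials poly does not mention (e.g. those touching dummy variables) get 0."""
    return [poly.get(s, 0) for s in monomials(m, k)]

def pad_instance(polys, m, ell, k):
    """Pad to exactly ell polynomials on m variables and concatenate the vectors.
    The dummy polynomial is the constant -1, which every assignment satisfies."""
    dummy = {(): -1}
    padded = list(polys) + [dummy] * (ell - len(polys))
    return [c for p in padded for c in coeff_vector(p, m, k)]

def count_common_sat(vec, m, ell, k):
    """Brute-force stand-in for T: count x in {-1,1}^m making all ell polynomials negative."""
    mons = monomials(m, k)
    r = len(mons)
    count = 0
    for x in product((-1, 1), repeat=m):
        values = [prod(x[j] for j in s) for s in mons]
        if all(sum(c * v for c, v in zip(vec[i * r:(i + 1) * r], values)) < 0
               for i in range(ell)):
            count += 1
    return count

# Hypothetical call: one polynomial x0 + x1 - 1 on n' = 2 variables,
# padded to ell = 3 polynomials on m = 4 variables with k = 2.
n_small, m, ell, k = 2, 4, 3, 2
polys = [{(0,): 1, (1,): 1, (): -1}]
vec = pad_instance(polys, m, ell, k)
padded_count = count_common_sat(vec, m, ell, k)
```

In this sketch each dummy variable doubles the count, so the answer on m variables is \(2^{m-n'}\) times the count on the original \(n'\) variables and a caller would divide by that factor. The text does not spell this normalization out, so it should be read as an observation about the sketch rather than a statement about the construction in Corollary 13.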

Lemma 30

The construction of the oracle \(\mathcal {A}_1\) in the above algorithm takes \(2^{O(n^{0.6}\log ^2 n)}\) time. Once constructed, the oracle \(\mathcal {A}_1\) answers any call (with valid parameters) in \(\mathop {\mathrm {poly}}(n,M)\) time.

Proof

Substituting the parameters \(\ell = n^{0.1}\) and \(m = n^{1/(2(k+1))}\) in Corollary 13, we see that the construction of \(\mathcal {A}_1\) (step 1) takes \(2^{O(n^{0.6}\log ^2 n)}\) time. Also, the claimed running time of answering a call follows from the bound on the depth of T given by the proof of Corollary 13. \(\square \)
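One way to see where the exponent 0.6 can come from is the following arithmetic. It assumes, without restating Corollary 13, that its bound is governed by the product \(\ell \cdot m^{k+1}\) up to polylogarithmic factors.

$$\begin{aligned} m^{k+1} = n^{(k+1)/(2(k+1))} = n^{1/2}, \qquad \ell \cdot m^{k+1} = n^{0.1}\cdot n^{0.5} = n^{0.6}. \end{aligned}$$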

With the correctness of \(\mathcal {A}_1\) now firmly established, we finally argue the correctness and running time of algorithm \(\mathcal {B}\).

Correctness The correctness of \(\mathcal {B}\) follows from that of \(\mathcal {A}_1,\mathcal {A}_2,\mathcal {A}_3,\mathcal {A}_4,\) and \(\mathcal {A}_5\) (see Lemma 30, Theorem 22, and Lemmas 25, 27, and 28, respectively). From the analysis of \(\mathcal {A}_5,\) we see that the probability of error in \(\mathcal {B}\) is at most 1/2.

Running Time By Lemmas 28 and 30, the running time of \(\mathcal {B}\) is \(2^{O(n^{0.6}\log ^2 n)}+\mathop {\mathrm {poly}}(n,M)\cdot 2^{n - n^{\zeta \varepsilon _d/2(k+1)}}\). Thus, the final running time is \(\mathop {\mathrm {poly}}(n,M)\cdot 2^{n-S}\) where \(S = n^{\zeta \varepsilon _{d}/2(k+1)}\) and where \(\varepsilon _{d}>0\) is a constant depending only on k and d. Setting \(\varepsilon _{k,d} = \zeta \varepsilon _d/2(k+1)\) gives the statement of Theorem 9.