
1 Introduction

A \(k\)-bit gate is a function \(f : \{0,1\}^k \rightarrow \{0,1\}\). A formula \(\varphi \) over a set of gates \(\mathcal{{S}}\) is a rooted tree in which each node with \(k\) children is associated to a \(k\)-bit gate from \(\mathcal{{S}}\), for \(k = 1, 2, \ldots \). Any such tree with \(n\) leaves naturally defines a function \(\varphi : \{0,1\}^n \rightarrow \{0,1\}\), by placing the input bits on the leaves in a fixed order and evaluating the gates recursively toward the root. Such functions are often called read-once formulas, as each input bit is associated to one leaf only.
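As a concrete illustration (the representation below is our own, not from the paper), a read-once formula can be stored as such a tree and evaluated recursively toward the root:

```python
# A minimal sketch: internal nodes carry gates, leaves are indices into
# the input string x.

def AND(*bits):
    return int(all(bits))

def OR(*bits):
    return int(any(bits))

def evaluate(node, x):
    """Evaluate the formula recursively toward the root."""
    if isinstance(node, int):               # leaf: read input bit x[node]
        return x[node]
    gate, children = node                   # internal node: (gate, children)
    return gate(*(evaluate(c, x) for c in children))

# phi(x) = (x0 AND x1) OR x2
phi = (OR, [(AND, [0, 1]), 2])
assert evaluate(phi, [1, 1, 0]) == 1
assert evaluate(phi, [0, 1, 0]) == 0
```

Each input bit appears at exactly one leaf, matching the read-once property.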

The formula-evaluation problem is to evaluate a formula \(\varphi \) over \(\mathcal{{S}}\) on an input \(x \in \{0,1\}^n\). The formula is given, but the input string \(x\) must be queried one bit at a time. How many queries to \(x\) are needed to compute \(\varphi (x)\)? We would like to understand this complexity as a function of \(\mathcal{{S}}\) and asymptotic properties of \(\varphi \). Roughly, larger gate sets allow \(\varphi \) to have less structure, which increases the complexity of evaluating \(\varphi \). Another important factor is often the balancedness of the tree \(\varphi \). Unbalanced formulas often seem to be more difficult to evaluate.

For applications, the most important gate set consists of all AND and OR gates. Formulas over this set are known as AND-OR formulas. Evaluating such a formula solves the decision version of a MIN-MAX tree, also known as a two-player game tree. Unfortunately, the complexity of evaluating formulas, even over this limited gate set, is unknown, although important special cases have been solved. The problem over much larger gate sets appears to be combinatorially intractable. For some formulas, it is known that “non-directional” algorithms that do not work recursively on the structure of the formula perform better than any recursive procedure.

In this article, we show that the formula-evaluation problem becomes dramatically simpler when we allow the algorithm to be a bounded-error quantum algorithm, and allow it coherent query access to the input string \(x\). Fix \(\mathcal{{S}}\) to be any finite set of gates. We give an optimal quantum algorithm for evaluating “almost-balanced” formulas over \(\mathcal{{S}}\). The balance condition states that the complexities of the input subformulas to any gate differ by at most a constant factor, where complexity is measured by the general adversary bound \({\mathrm {Adv}^{\pm }}\). In general, \({\mathrm {Adv}^{\pm }}\) is the value of an exponentially large semi-definite program (SDP). For a formula \(\varphi \) with constant-size gates, though, \({\mathrm {Adv}^{\pm }}(\varphi )\) can be computed efficiently by solving constant-size SDPs for each gate.

To place this work in context, some classical and quantum results for evaluating formulas are summarized in Table 1. The stated upper bounds are on query complexity and not time complexity. However, for the \(\mathrm{OR }_n\) and balanced \(\mathrm{AND }_2\)-\(\mathrm{OR }_2\) formulas, the quantum algorithms’ running times are only slower by a poly-logarithmic factor. For the other formulas, the quantum algorithms’ running times are slower by a poly-logarithmic factor provided that:

  1. A polynomial-time classical preprocessing step, outputting a string \(s(\varphi )\), is not charged for.

  2. The algorithms are allowed unit-cost coherent access to \(s(\varphi )\).

Our algorithm is based on the framework relating span programs and quantum algorithms from [Rei09]. Previous work has used span programs to develop quantum algorithms for evaluating formulas [RŠ08]. Using this and the observation that the optimal span program witness size for a boolean function \(f\) equals the general adversary bound \({\mathrm {Adv}^{\pm }}(f)\), Ref. [Rei09] gives an optimal quantum algorithm for evaluating “adversary-balanced” formulas over an arbitrary finite gate set. The balance condition is that each gate’s input subformulas have equal general adversary bounds.

In order to relax this strict balance requirement, we must maintain better control in the recursive analysis. To help do so, we define a new span program complexity measure, the “full witness size.” This complexity measure has implications for developing time- and query-efficient quantum algorithms based on span programs. Essentially, using a second result from [Rei09], that properties of eigenvalue-zero eigenvectors of certain bipartite graphs imply “effective” spectral gaps around zero, it allows quantum algorithms to be based on span programs with free inputs. This can simplify the implementation of a quantum walk on the corresponding graph.

Besides allowing a relaxed balance requirement, our approach has the additional advantage of making the constants hidden in the big-\(O\) notation more explicit. The formula-evaluation quantum algorithms in [RŠ08,Rei09] evaluate certain formulas \(\varphi \) using \(O\big ({\mathrm {Adv}^{\pm }}(\varphi )\big )\) queries, where the hidden constant depends on the gates in \(\mathcal{{S}}\) in a complicated manner. It is not known how to upper-bound the hidden constant in terms of, say, the maximum fan-in \(k\) of a gate in \(\mathcal{{S}}\). In contrast, the approach we follow here allows bounding this constant by an exponential in \(k\).

Table 1. Comparison of some classical and quantum query complexity results for formula evaluation. Here \(\mathcal{{S}}\) is any fixed, finite gate set, and the exponent \(\alpha \) is given by \(\alpha = \log _2 (\frac{1 + \sqrt{33}}{4}) \approx 0.753\). Under certain assumptions, the algorithms’ running times are only poly-logarithmically slower.

It is known that the general adversary bound is a nearly tight lower bound on quantum query complexity for any boolean function [Rei09], including in particular boolean formulas. However, this comes with no guarantees on time complexity. The main contribution of this paper is to give a nearly time-optimal algorithm for formula evaluation. The algorithm is also tight for query complexity, removing the extra logarithmic factor from the bound in [Rei09].

Additionally, we apply the same technique to study AND-OR formulas. For this special case, special properties of span programs for AND and for OR gates allow the almost-balance condition to be significantly weakened. Ambainis et al. [ACR+10] have studied this case previously. By applying the span program framework, we identify a slight weakness in their analysis. Tightening the analysis extends the algorithm’s applicability to a broader class of AND-OR formulas.

A companion paper [Rei11] applies the span program framework to the problem of evaluating arbitrary AND-OR formulas. By studying the full witness size for span programs constructed using a novel composition method, it gives an \(O(\sqrt{n} \log n)\)-query quantum algorithm to evaluate a formula of size \(n\), for which the time complexity is poly-logarithmically worse after preprocessing. This nearly matches the \(\varOmega (\sqrt{n})\) lower bound, and improves a \(\sqrt{n} 2^{O(\sqrt{\log n})}\)-query quantum algorithm from [ACR+10]. Reference [Rei11] shares the broader motivation of this paper, to study span program properties and design techniques that lead to time-efficient quantum algorithms.

Sections 1.1 and 1.2 below give further background on the formula-evaluation problem, for classical and quantum algorithms. Section 1.3 precisely states our main theorem, the proof of which is given in Sect. 3 after some background on span programs. The theorem for approximately balanced AND-OR formulas is stated in Sect. 1.4, and proved in Sect. 4.

1.1 History of the Formula-Evaluation Problem for Classical Algorithms

For a function \(f : \{0,1\}^n \rightarrow \{0,1\}\), let \(D(f)\) be the least number of input bit queries sufficient to evaluate \(f\) on any input with zero error. \(D(f)\) is known as the deterministic decision-tree complexity of \(f\), or the deterministic query complexity of \(f\). Let the randomized decision-tree complexity of \(f\), \(R(f) \le D(f)\), be the least expected number of queries required to evaluate \(f\) with zero error (i.e., by a Las Vegas randomized algorithm). Let the Monte Carlo decision-tree complexity, \(R_2(f) = O\big (R(f)\big )\), be the least number of queries required to evaluate \(f\) with error probability at most \(1/3\) (i.e., by a Monte Carlo randomized algorithm).
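To make \(D(f)\) concrete: for tiny functions it can be computed by brute force, using the recursion \(D(f) = 0\) if \(f\) is constant and \(D(f) = \min _i \big (1 + \max _b D(f|_{x_i = b})\big )\) otherwise. The sketch below is our own illustration, exponential-time by design:

```python
def D(f, n):
    """Deterministic decision-tree complexity of f: {0,1}^n -> {0,1},
    by brute force over all restrictions. f takes a tuple of n bits."""
    def value_set(fixed):
        # All values f takes when the bits in `fixed` are pinned down.
        free = [i for i in range(n) if i not in fixed]
        vals = set()
        for m in range(2 ** len(free)):
            x = [0] * n
            for i, b in fixed.items():
                x[i] = b
            for pos, i in enumerate(free):
                x[i] = (m >> pos) & 1
            vals.add(f(tuple(x)))
        return vals

    def rec(fixed):
        if len(value_set(fixed)) == 1:      # f is constant: no queries needed
            return 0
        free = [i for i in range(n) if i not in fixed]
        return min(1 + max(rec({**fixed, i: 0}), rec({**fixed, i: 1}))
                   for i in free)

    return rec({})

assert D(lambda x: int(any(x)), 3) == 3     # D(OR_n) = n
```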

Classically, formulas over the gate set \(\mathcal{{S}}= \{ \mathrm{NAND }_k : k \in \mathbf{N}\}\) have been studied most extensively, where \(\mathrm{NAND }_k(x_1, \ldots , x_k) = 1 - \prod _{j=1}^k x_j\). By De Morgan’s rules, any formula over \(\mathrm{NAND }\) gates can also be written as a formula in which the gates at an even distance from the formula’s root are \(\mathrm{AND }\) gates and those an odd distance away are \(\mathrm{OR }\) gates, with some inputs or the output possibly complemented. Thus formulas over \(\mathcal{{S}}\) are also known as AND-OR formulas.
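The De Morgan rewriting can be checked exhaustively on a small example; the following sketch (our own illustration) verifies that a depth-two NAND formula equals the corresponding AND-OR formula on all inputs:

```python
from itertools import product

def nand(*bits):
    """NAND_k(x) = 1 - prod_j x_j."""
    p = 1
    for b in bits:
        p *= b
    return 1 - p

# De Morgan: NAND(NAND(x1,x2), NAND(x3,x4)) = (x1 AND x2) OR (x3 AND x4)
for x in product((0, 1), repeat=4):
    lhs = nand(nand(x[0], x[1]), nand(x[2], x[3]))
    rhs = int((x[0] and x[1]) or (x[2] and x[3]))
    assert lhs == rhs
```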

For any AND-OR formula \(\varphi \) of size \(n\), i.e., on \(n\) inputs, \(D(\varphi ) = n\). However, randomization gives a strict advantage; \(R(\varphi )\) and \(R_2(\varphi )\) can be strictly smaller. Indeed, let \(\varphi _d\) be the complete, binary AND-OR formula of depth \(d\), corresponding to the tree in which each internal vertex has two children and every leaf is at distance \(d\) from the root, with alternating levels of AND and OR gates. Its size is \(n = 2^d\). Snir [Sni85] has given a randomized algorithm for evaluating \(\varphi _d\) using in expectation \(O(n^\alpha )\) queries, where \(\alpha = \log _2 (\frac{1+\sqrt{33}}{4}) \approx 0.753\) [SW86]. This algorithm, known as randomized alpha-beta pruning, evaluates a random subformula recursively, and only evaluates the second subformula if necessary. Saks and Wigderson [SW86] have given a matching lower bound on \(R(\varphi _d)\), which Santha has extended to hold for Monte Carlo algorithms, \(R_2(\varphi _d) = \varOmega (n^\alpha )\) [San95].
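Randomized alpha-beta pruning is easy to state in code. The sketch below (binary gates only, in a representation of our own choosing) recurses into a uniformly random child first and skips the sibling whenever the first child's value already determines the gate:

```python
import random

def alpha_beta(node, x):
    """Snir-style randomized pruning for binary AND-OR formulas.
    A leaf is an index into x; an internal node is (gate, left, right)."""
    if isinstance(node, int):
        return x[node]
    gate, a, b = node
    first, second = (a, b) if random.random() < 0.5 else (b, a)
    v = alpha_beta(first, x)
    decisive = 0 if gate == 'AND' else 1   # value that settles the gate alone
    if v == decisive:
        return v                           # prune: the sibling is never read
    return alpha_beta(second, x)

phi = ('OR', ('AND', 0, 1), ('AND', 2, 3))
assert alpha_beta(phi, [1, 1, 0, 0]) == 1
assert alpha_beta(phi, [0, 1, 1, 0]) == 0
```

The output is always correct; only the number of leaves read is random.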

Thus the query complexities have been characterized for the complete, binary AND-OR formulas. In fact, the tight characterization works for a larger class of formulas, called “well balanced” formulas by [San95]. This class includes, for example, alternating \(\mathrm{AND }_2\)-\(\mathrm{OR }_2\) formulas where for some \(d\) every leaf is at depth \(d\) or \(d-1\), Fibonacci trees and binomial trees [SW86]. It also includes skew trees, whose depth is the maximum possible, \(n-1\).

For arbitrary AND-OR formulas, on the other hand, little is known. It has been conjectured that complete, binary AND-OR formulas are the easiest to evaluate, and that in particular \(R(\varphi ) = \varOmega (n^\alpha )\) for any size-\(n\) AND-OR formula \(\varphi \) [SW86]. However, the best general lower bound is \(R(\varphi ) = \varOmega (n^{0.51})\), due to Heiman and Wigderson [HW91]. Reference [HW91] also extends the result of [SW86] to allow for AND and OR gates with fan-in more than two.

It is perhaps not surprising that formulas over most other gate sets \(\mathcal{{S}}\) are even less well understood. For example, Boppana has asked for the complexity of evaluating the complete ternary majority (\({\mathrm{MAJ }_3}\)) formula of depth \(d\) [SW86], and the best published bounds on its query complexity are \(\varOmega \big ((7/3)^d\big )\) and \(O\big ((2.6537\ldots )^d\big )\) [JKS03]. In particular, the naïve, “directional,” generalization of the randomized alpha-beta pruning algorithm is to evaluate recursively two random immediate subformulas and, if they disagree, then also the third. This algorithm uses \(O\big ((8/3)^d\big )\) expected queries, and is suboptimal. This suggests that the complete \({\mathrm{MAJ }_3}\) formulas are significantly different from the complete AND-OR formulas.
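For concreteness, the directional algorithm just described can be sketched as follows (our own representation: leaves are indices into \(x\), internal nodes are triples of children):

```python
import random

def directional_maj3(node, x):
    """Evaluate a complete MAJ3 formula: recurse into two random children,
    and query the third child only if the first two disagree."""
    if isinstance(node, int):          # leaf: read input bit x[node]
        return x[node]
    i, j = random.sample(range(3), 2)
    vi = directional_maj3(node[i], x)
    vj = directional_maj3(node[j], x)
    if vi == vj:                       # two agreeing votes already decide MAJ3
        return vi
    return directional_maj3(node[3 - i - j], x)

phi = ((0, 1, 2), (3, 4, 5), (6, 7, 8))   # complete depth-2 MAJ3 formula
assert directional_maj3(phi, [1, 1, 0, 0, 0, 1, 1, 0, 1]) == 1
```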

Heiman, Newman and Wigderson have considered read-once threshold formulas in an attempt to separate the complexity classes \(\mathsf {TC^0}\) from \(\mathsf {NC^1}\) [HNW93]. That is, they allow the gate set to be the set of Hamming-weight threshold gates \(\{ {T_{{m}}^{{k}}} : m, k \in \mathbf{N}\}\) defined by \({T_{{m}}^{{k}}}: \{0, 1\}^k \rightarrow \{0,1\}\), \({T_{{m}}^{{k}}}(x) = 1\) if and only if the Hamming weight of \(x\) is at least \(m\). AND, OR and majority gates are all special cases of threshold gates. Heiman et al. prove that \(R(\varphi ) \ge n/2^d\) for \(\varphi \) a threshold formula of depth \(d\), and in fact their proof extends to gate sets in which every gate “contains a flip” [HNW93]. This implies that a large depth is necessary for the randomized complexity to be much lower than the deterministic complexity.

Of course there are some trivial gate sets for which the query complexity is fully understood, for example, the set of parity gates. Overall, though, there are many more open problems than results. Despite its structure, formula evaluation appears to be combinatorially complicated. However, there is another approach, to try to leverage the power of quantum computers. Surprisingly, the formula-evaluation problem simplifies considerably in this different model of computation.

1.2 History of the Formula-Evaluation Problem for Quantum Algorithms

In the quantum query model, the input bits can be queried coherently. That is, the quantum algorithm is allowed unit-cost access to the unitary operator \(O_x\), called the input oracle, defined by

$$\begin{aligned} O_x: \, {|\varphi \rangle }\otimes {|j\rangle } \otimes {|b\rangle } \mapsto {|\varphi \rangle }\otimes {|j\rangle } \otimes {|b \oplus x_j\rangle } . \end{aligned}$$
(1.1)

Here \({|\varphi \rangle }\) is an arbitrary pure state, \(\{{|j\rangle } : j = 1,2,\ldots ,n\}\) is an orthonormal basis for \(\mathbf{C}^{n}\), \(\{{|b\rangle } : b = 0, 1\}\) is an orthonormal basis for \(\mathbf{C}^2\), and \(\oplus \) denotes addition mod two. \(O_x\) can be implemented efficiently on a quantum computer given a classical circuit that computes the function \(j \mapsto x_j\) [NC00]. For a function \(f : \{0,1\}^n \rightarrow \{0,1\}\), let \(Q(f)\) be the number of input queries required to evaluate \(f\) with error probability at most \(1/3\). It is immediate that \(Q(f) \le R_2(f)\).
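Restricted to the query registers \({|j\rangle }\otimes {|b\rangle }\) (and with \(j\) zero-indexed for convenience, unlike Eq. (1.1)), \(O_x\) is just a permutation matrix, as the following sketch illustrates:

```python
def oracle(x):
    """O_x on the query registers: |j>|b> -> |j>|b XOR x_j>.
    Basis state (j, b) is row/column index 2*j + b; the result is a
    permutation matrix, hence unitary."""
    n = len(x)
    U = [[0] * (2 * n) for _ in range(2 * n)]
    for j in range(n):
        for b in (0, 1):
            U[2 * j + (b ^ x[j])][2 * j + b] = 1
    return U

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

U = oracle([1, 0, 1])
I = [[int(i == j) for j in range(6)] for i in range(6)]
assert matmul(U, U) == I   # b XOR x_j XOR x_j = b, so O_x is self-inverse
```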

Research on the formula-evaluation problem in the quantum model began with the \(n\)-bit OR function, \(\mathrm{OR }_n\). Grover gave a quantum algorithm for evaluating \(\mathrm{OR }_n\) with bounded one-sided error using \(O(\sqrt{n})\) oracle queries and \(O(\sqrt{n} \log \log n)\) time [Gro96,Gro02]. In the classical case, on the other hand, it is obvious that \(R_2(\mathrm{OR }_n)\), \(R(\mathrm{OR }_n)\) and \(D(\mathrm{OR }_n)\) are all \(\varTheta (n)\).

Grover’s algorithm can be applied recursively to speed up the evaluation of more general AND-OR formulas. Call a formula layered if the gates at the same depth are the same. Buhrman, Cleve and Wigderson show that a layered, depth-\(d\), size-\(n\) AND-OR formula can be evaluated using \(O(\sqrt{n} \log ^{d-1} n)\) queries [BCW98]. The logarithmic factors come from using repetition at each level to reduce the error probability from a constant to be polynomially small.

Høyer, Mosca and de Wolf [HMW03] consider the case of a unitary input oracle \(\tilde{O}_x\) that maps

$$\begin{aligned} \tilde{O}_x: \, {|\varphi \rangle }\otimes {|j\rangle } \otimes {|b\rangle } \otimes {|0\rangle } \mapsto {|\varphi \rangle }\otimes {|j\rangle } \otimes \big ( {|b \oplus x_j\rangle } \otimes {|\psi _{x,j,x_j}\rangle } + {|b \oplus \overline{x}_j\rangle } \otimes {|\psi _{x,j,\overline{x}_j}\rangle } \big ) , \end{aligned}$$
(1.2)

where \({|\psi _{x,j,x_j}\rangle }\), \({|\psi _{x,j,\overline{x}_j}\rangle }\) are pure states with \({\Vert {|\psi _{x,j,x_j}\rangle } \Vert }^2 \ge 2/3\). Such an oracle can be implemented when the function \(j \mapsto x_j\) is computed by a bounded-error, randomized subroutine. Høyer et al. allow access to \(\tilde{O}_x\) and \(\tilde{O}_x^{-1}\), both at unit cost, and show that \(\mathrm{OR }_n\) can still be evaluated using \(O(\sqrt{n})\) queries. This robustness result implies that the \(\log n\) steps of repetition used by [BCW98] are not necessary, and a depth-\(d\) layered AND-OR formula can be computed in \(O(\sqrt{n} \, c^{d-1})\) queries, for some constant \(c > 1000\). If the depth is constant, this gives an \(O(\sqrt{n})\)-query quantum algorithm, but the result is not useful for the complete, binary AND-OR formula, for which \(d = \log _2 n\).

In 2007, Farhi, Goldstone and Gutmann presented a quantum algorithm for evaluating complete, binary AND-OR formulas [FGG08]. Their breakthrough algorithm is not based on iterating Grover’s algorithm in any way, but instead runs a quantum walk—analogous to a classical random walk—on a graph based on the formula. The algorithm runs in time \(O(\sqrt{n})\) in a certain continuous-time query model.

Ambainis et al. discretized the [FGG08] algorithm by reinterpreting a correspondence between (discrete-time) random and quantum walks due to Szegedy [Sze04] as a correspondence between continuous-time and discrete-time quantum walks [ACR+10]. Applying this correspondence to quantum walks on certain weighted graphs, they gave an \(O(\sqrt{n})\)-query quantum algorithm for evaluating “approximately balanced” AND-OR formulas. For example, \({\mathrm{MAJ }_3}(x_1, x_2, x_3) = (x_1 \wedge x_2) \vee \big ((x_1 \vee x_2) \wedge x_3\big )\), so there is a size-\(5^d\) AND-OR formula that computes \({\mathrm{MAJ }_3}^d\), the complete ternary majority formula of depth \(d\). Since the formula is approximately balanced, \(Q({\mathrm{MAJ }_3}^d) = O(\sqrt{5}^d)\), better than the \(\varOmega \big ((7/3)^d\big )\) classical lower bound.

The [ACR+10] algorithm also applies to arbitrary AND-OR formulas. If \(\varphi \) has size \(n\) and depth \(d\), then the algorithm, applied directly, evaluates \(\varphi \) using \(O(\sqrt{n} \, d)\) queries. This can be as bad as \(O(n^{3/2})\) if the depth is \(d = n\). However, Bshouty, Cleve and Eberly have given a formula-rebalancing procedure that takes an AND-OR formula \(\varphi \) as input and outputs an equivalent AND-OR formula \(\varphi '\) with depth \(d' = 2^{O(\sqrt{\log n})}\) and size \(n' = n \, 2^{O(\sqrt{\log n})}\) [BCE91,BB94]. The formula \(\varphi '\) can then be evaluated using \(O(\sqrt{n'} \, d' ) = \sqrt{n} \, 2^{O(\sqrt{\log n})}\) queries.

Our understanding of lower bounds for the formula-evaluation problem progressed in parallel to this progress on quantum algorithms. There are essentially two techniques, the polynomial and adversary methods, for lower-bounding quantum query complexity.

  • The polynomial method, introduced in the quantum setting by Beals et al. [BBC+01], is based on the observation that after making \(q\) oracle \(O_x\) queries, the probability of any measurement result is a polynomial of degree at most \(2q\) in the variables \(x_j\).

  • Ambainis generalized the classical hybrid argument to consider the system’s entanglement when run on a superposition of inputs [Amb02]. A number of variants of Ambainis’s bound were soon discovered, including weighted versions [HNS02,BS04,Amb06,Zha05], a spectral version [BSS03], and a version based on Kolmogorov complexity [LM04]. These variants can be asymptotically stronger than Ambainis’s original unweighted bound, but are equivalent to each other [ŠS06]. We therefore term it simply “the adversary bound,” denoted by \({\mathrm {Adv}}\).

The adversary bound is well-suited for lower-bounding the quantum query complexity for evaluating formulas. For example, Barnum and Saks proved that for any size-\(n\) AND-OR formula \(\varphi \), \({\mathrm {Adv}}(\varphi ) = \sqrt{n}\), implying the lower bound \(Q(\varphi ) = \varOmega (\sqrt{n})\) [BS04]. Thus the [ACR+10] algorithm is optimal for approximately balanced AND-OR formulas, and is nearly optimal for arbitrary AND-OR formulas. This is a considerably more complete solution than is known classically.

It is then natural to consider formulas over larger gate sets. The adversary bound continues to work well, because it transforms nicely under function composition:

Theorem 1

(Adversary bound composition [Amb06,LLS06,HLŠ05]). Let \(f : \{0,1\}^k \rightarrow \{0,1\}\) and let \(f_j : \{0,1\}^{m_j} \rightarrow \{0,1\}\) for \(j = 1, 2, \ldots , k\). Define \(g : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_k} \rightarrow \{0,1\}\) by \(g(x) = f\big (f_1(x_1), \ldots , f_k(x_k)\big )\). Let \(s = ({\mathrm {Adv}}(f_1), \ldots , {\mathrm {Adv}}(f_k))\). Then

$$\begin{aligned} {\mathrm {Adv}}(g)&= {\mathrm {Adv}}_s(f) . \end{aligned}$$
(1.3)

See Definition 3 for the definition of the adversary bound with “costs,” \({\mathrm {Adv}}_s\). The \({\mathrm {Adv}}\) bound equals \({\mathrm {Adv}}_s\) with uniform, unit costs \(s = \overrightarrow{1}\). For a function \(f\), \({\mathrm {Adv}}(f)\) can be computed using a semi-definite program in time polynomial in the size of \(f\)’s truth table. Therefore, Theorem 1 gives a polynomial-time procedure for computing the adversary bound for a formula \(\varphi \) over an arbitrary finite gate set: compute the bounds for subformulas, moving from the leaves toward the root. At an internal node \(f\), having computed the adversary bounds for the input subformulas \(f_1, \ldots , f_k\), Eq. (1.3) says that the adversary bound for \(g\), the subformula rooted at \(f\), equals the adversary bound for the gate \(f\) with certain costs. Computing this requires \(2^{O(k)}\) time, which is a constant if \(k = O(1)\). For example, if \(f\) is an \(\mathrm{OR }_k\) or \(\mathrm{AND }_k\) gate, then \({\mathrm {Adv}}_{(s_1, \ldots , s_k)}(f) = \sqrt{\sum _j s_j^2}\), from which follows immediately the [BS04] result \({\mathrm {Adv}}(\varphi ) = \sqrt{n}\) for a size-\(n\) AND-OR formula \(\varphi \).
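This bottom-up computation is especially simple for AND-OR formulas, where the rule \({\mathrm {Adv}}_{(s_1, \ldots , s_k)}(f) = \sqrt{\sum _j s_j^2}\) collapses into a one-line recursion. A sketch in our own notation:

```python
import math

def adv(node):
    """Adv(phi) for an AND-OR formula: a leaf has Adv = 1, and an AND_k
    or OR_k gate with input costs s has Adv_s = sqrt(sum_j s_j^2)."""
    if isinstance(node, int):
        return 1.0
    gate, children = node
    return math.sqrt(sum(adv(c) ** 2 for c in children))

# For any size-n AND-OR formula the recursion yields Adv = sqrt(n) [BS04];
# here n = 5:
phi = ('OR', [('AND', [0, 1]), ('AND', [2, ('OR', [3, 4])])])
assert abs(adv(phi) - math.sqrt(5)) < 1e-12
```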

A special case of Theorem 1 is when the functions \(f_j\) all have equal adversary bounds, so \({\mathrm {Adv}}(g) = {\mathrm {Adv}}(f) {\mathrm {Adv}}(f_1)\). In particular, for a function \(f : \{0,1\}^k \rightarrow \{0,1\}\) and a natural number \(d \in \mathbf{N}\), let \(f^d : \{0,1\}^{k^d} \rightarrow \{0,1\}\) denote the complete, depth-\(d\) formula over \(f\). That is, \(f^1 = f\) and \(f^d(x) = f \big (f^{d-1}(x_1,\ldots , x_{k^{d-1}}), \ldots , f^{d-1}(x_{k^d-k^{d-1}+1}, \ldots , x_{k^d}) \big )\) for \(d\!>\!1\). Then we obtain:

Corollary 1

For any function \(f : \{0,1\}^k \rightarrow \{0,1\}\),

$$\begin{aligned} {\mathrm {Adv}}(f^d) = {\mathrm {Adv}}(f)^d . \end{aligned}$$
(1.4)

In particular, Ambainis defined a boolean function \(f : \{0,1\}^4 \rightarrow \{0,1\}\) that can be represented exactly by a polynomial of degree two, but for which \({\mathrm {Adv}}(f) = 5/2\) [Amb06]. Thus \(f^d\) can be represented exactly by a polynomial of degree \(2^d\), but by Corollary 1, \({\mathrm {Adv}}(f^d) = (5/2)^d\). For this function, the adversary bound is strictly stronger than any bound obtainable using the polynomial method. Many similar examples are given in [HLŠ06]. However, for other functions, the adversary bound is asymptotically worse than the polynomial method [ŠS06,AS04,Amb05].

In 2007, though, Høyer et al. discovered a strict generalization of \({\mathrm {Adv}}\) that also lower-bounds quantum query complexity [HLŠ06]. We call this new bound the general adversary bound, or \({\mathrm {Adv}^{\pm }}\). For example, for Ambainis’s four-bit function \(f\), \({\mathrm {Adv}^{\pm }}(f) \ge 2.51\) [HLŠ06]. Like the adversary bound, \({\mathrm {Adv}^{\pm }_{s}}(f)\) can be computed in time polynomial in the size of \(f\)’s truth table, and also composes nicely:

Theorem 2

([HLŠ06,Rei09]). Under the conditions of Theorem 1,

$$\begin{aligned} {\mathrm {Adv}^{\pm }}(g)&= {\mathrm {Adv}^{\pm }_{s}}(f) . \end{aligned}$$
(1.5)

In particular, if \({\mathrm {Adv}^{\pm }}(f_1) = \cdots = {\mathrm {Adv}^{\pm }}(f_k)\), then \({\mathrm {Adv}^{\pm }}(g) = {\mathrm {Adv}^{\pm }}(f) \, {\mathrm {Adv}^{\pm }}(f_1)\).

Define a formula \(\varphi \) to be adversary balanced if at each internal node, the general adversary bounds of the input subformulas are equal. In particular, by Theorem 2 this implies that \({\mathrm {Adv}^{\pm }}(\varphi )\) is equal to the product of the general adversary bounds of the gates along any path from the root to a leaf. Complete, layered formulas are an example of adversary-balanced formulas.

Returning to upper bounds, Reichardt and Špalek [RŠ08] generalized the algorithmic approach started by [FGG08]. They gave an optimal quantum algorithm for evaluating adversary-balanced formulas over a considerably extended gate set, including in particular all functions \(\{0,1\}^k \rightarrow \{0,1\}\) for \(k \le 3\), \(69\) inequivalent four-bit functions, and the gates \(\mathrm{AND }_k\), \(\mathrm{OR }_k\), \(\mathrm{PARITY }_k\) and \({\mathrm{EQUAL }}_k\), for \(k = O(1)\). For example, \(Q({\mathrm{MAJ }_3}^d) = \varTheta (2^d)\).

The [RŠ08] result follows from a framework for developing formula-evaluation quantum algorithms based on span programs. A span program, introduced by Karchmer and Wigderson [KW93], is a certain linear-algebraic way of defining a function, which corresponds closely to eigenvalue-zero eigenvectors of certain bipartite graphs. [RŠ08] derived a quantum algorithm for evaluating certain concatenated span programs, with a query complexity upper-bounded by the span program witness size, denoted wsize. In particular, a special case of [RŠ08, Theorem 4.7] is:

Theorem 3

([RŠ08]). Fix a function \(f : \{0,1\}^k \rightarrow \{0,1\}\). If span program \(P\) computes \(f\), then

$$\begin{aligned} Q(f^d) = O\big ({{\mathrm{wsize }}({P})}^d\big ) . \end{aligned}$$
(1.6)

From Theorem 2, this result is optimal if \({{\mathrm{wsize }}({P})} = {\mathrm {Adv}^{\pm }}(f)\). The question therefore becomes how to find optimal span programs. Using an ad hoc search, [RŠ08] found optimal span programs for a variety of functions with \({\mathrm {Adv}^{\pm }}= {\mathrm {Adv}}\). Further work automated the search, by giving a semi-definite program (SDP) for the optimal span program witness size for any given function [Rei09]. Remarkably, the SDP’s value always equals the general adversary bound:

Theorem 4

([Rei09]). For any function \(f : \{0,1\}^n \rightarrow \{0,1\}\),

$$\begin{aligned} \inf _{P} {{\mathrm{wsize }}({P})} = {\mathrm {Adv}^{\pm }}(f) , \end{aligned}$$
(1.7)

where the infimum is over span programs \(P\) computing \(f\). Moreover, this infimum is achieved.

This result greatly extends the gate set over which the formula-evaluation algorithm of [RŠ08] works optimally. For example, combined with Theorem 3, it implies that \(\lim _{d \rightarrow \infty } Q(f^d)^{1/d} = {\mathrm {Adv}^{\pm }}(f)\) for every boolean function \(f\). More generally, Theorem 4 allows the [RŠ08] algorithm to be run on formulas over any finite gate set \(\mathcal{{S}}\). A factor is lost that depends on the gates in \(\mathcal{{S}}\), but it will be a constant for \(\mathcal{{S}}\) finite. Combining Theorem 4 with [RŠ08, Theorem 4.7] gives:

Theorem 5

([Rei09]). Let \(\mathcal{{S}}\) be a finite set of gates. Then there exists a quantum algorithm that evaluates an adversary-balanced formula \(\varphi \) over \(\mathcal{{S}}\) using \(O\big ({\mathrm {Adv}^{\pm }}(\varphi )\big )\) input queries. After efficient classical preprocessing independent of the input \(x\), and assuming unit-time coherent access to the preprocessed classical string, the running time of the algorithm is \({\mathrm {Adv}^{\pm }}(\varphi ) \big (\log {\mathrm {Adv}^{\pm }}(\varphi )\big )^{O(1)}\).

In the discussion so far, we have for simplicity focused on query complexity. The query complexity is an information-theoretic quantity that does not charge for operations independent of the input string, even though these operations may require many elementary gates to implement. For practical algorithms, it is important to be able to bound the algorithm’s running time, which counts the cost of implementing the input-independent operations. Theorem 5 puts an optimal bound on the query complexity, and also puts a nearly optimal bound on the algorithm’s time complexity. In fact, all of the query-optimal algorithms so far discussed are also nearly time optimal.

In general, though, an upper bound on the query complexity does not imply an upper bound on the time complexity. Reference [Rei09] also generalized the span program framework of [RŠ08] to apply to quantum algorithms not based on formulas. The main result of [Rei09] is:

Theorem 6

([Rei09]). For any function \(f : {\mathcal {D}}\rightarrow \{1, 2, \ldots , m\}\), with \({\mathcal {D}}\subseteq \{0,1\}^n\), \(Q(f)\) satisfies

$$\begin{aligned} Q(f) = \varOmega ({\mathrm {Adv}^{\pm }}(f)) \end{aligned}$$
(1.8)
$$\begin{aligned} \text {and} \quad Q(f) = O\bigg ({\mathrm {Adv}^{\pm }}(f) \, \frac{\log {\mathrm {Adv}^{\pm }}(f)}{\log \log {\mathrm {Adv}^{\pm }}(f)} \log (m) \log \log m \bigg ). \end{aligned}$$
(1.9)

Theorem 6 in particular allows us to compute the query complexity of formulas, up to the logarithmic factor. It does not give any guarantees on running time. However, the analysis required to prove Theorem 6 also leads to significantly simpler proofs of Theorem 5 and the AND-OR formula results of [ACR+10,FGG08]. Moreover, we will see that it allows the formula-evaluation algorithms to be extended to formulas that are not adversary balanced.

1.3 Quantum Algorithm for Evaluating Almost-Balanced Formulas

We give a formula-evaluation algorithm that is both query-optimal, without a logarithmic overhead, and, after an efficient preprocessing step, nearly time optimal. Define almost balance as follows:

Definition 1

Consider a formula \(\varphi \) over a gate set \(\mathcal{{S}}\). For a vertex \(v\) in the corresponding tree, let \(\varphi _v\) denote the subformula of \(\varphi \) rooted at \(v\), and, if \(v\) is an internal vertex, let \(g_v\) be the corresponding gate. The formula \(\varphi \) is \(\beta \) -balanced if for every vertex \(v\), with children \(c_1, c_2, \ldots , c_k\),

$$\begin{aligned} \frac{\max _{j} {\mathrm {Adv}^{\pm }}(\varphi _{c_j})}{\min _{j} {\mathrm {Adv}^{\pm }}(\varphi _{c_j})} \le \beta . \end{aligned}$$
(1.10)

(If \(c_j\) is a leaf, \({\mathrm {Adv}^{\pm }}(\varphi _{c_j}) = 1\).) Formula \(\varphi \) is almost balanced if it is \(\beta \)-balanced for some \(\beta = O(1)\).
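In the special case of AND-OR formulas, where the general adversary bound of a subformula is given by the \(\sqrt{\sum _j s_j^2}\) composition rule, the balance parameter \(\beta \) of Definition 1 can be computed bottom-up. A sketch in a representation of our own choosing:

```python
import math

def balance(node):
    """Return (Adv(phi_v), beta) for an AND-OR formula rooted at node,
    where beta is the worst child-Adv ratio over all internal vertices."""
    if isinstance(node, int):                 # leaf: Adv = 1
        return 1.0, 1.0
    gate, children = node
    pairs = [balance(c) for c in children]
    advs = [a for a, _ in pairs]
    beta = max(max(advs) / min(advs),         # imbalance at this vertex...
               max(b for _, b in pairs))      # ...or deeper in the tree
    return math.sqrt(sum(a * a for a in advs)), beta

balanced = ('AND', [('OR', [0, 1]), ('OR', [2, 3])])
skew = ('AND', [('AND', [('AND', [0, 1]), 2]), 3])
assert balance(balanced)[1] == 1.0            # complete tree: 1-balanced
assert abs(balance(skew)[1] - math.sqrt(3)) < 1e-12
```

As the skew example shows, \(\beta \) grows with the depth of a maximally unbalanced tree, so such formulas are not almost balanced.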

In particular, an adversary-balanced formula is \(1\)-balanced. We will show:

Theorem 7

Let \(\mathcal{{S}}\) be a fixed, finite set of gates. Then there exists a quantum algorithm that evaluates an almost-balanced formula \(\varphi \) over \(\mathcal{{S}}\) using \(O\big ({\mathrm {Adv}^{\pm }}(\varphi )\big )\) input queries. After polynomial-time classical preprocessing independent of the input, and assuming unit-time coherent access to the preprocessed string, the running time of the algorithm is \({\mathrm {Adv}^{\pm }}(\varphi ) \big (\log {\mathrm {Adv}^{\pm }}(\varphi )\big )^{O(1)}\).

Theorem 7 is significantly stronger than Theorem 5, which requires exact balance. There are important classes of exactly balanced formulas, such as complete, layered formulas. In fact, it is sufficient that the multiset of gates along the simple path from the root to a leaf not depend on the leaf. Moreover, sometimes different gates have the same \({\mathrm {Adv}^{\pm }}\) bound; see [HLŠ06] for examples. Even so, exact adversary balance is a very strict condition.

The proof of Theorem 7 is based on the span program framework developed in Ref. [Rei09]. In particular, [Rei09, Theorem 9.1] gives two quantum algorithms for evaluating span programs. The first algorithm is based on a discrete-time simulation of a continuous-time quantum walk. It applies to arbitrary span programs, and is used, in combination with Theorem 4, to prove Theorem 6. However, the simulation incurs a logarithmic query overhead and potentially worse time complexity overhead, so this algorithm is not suitable for proving Theorem 7.

The second algorithm in [Rei09] is based directly on a discrete-time quantum walk, similar to previous optimal formula-evaluation algorithms [ACR+10,RŠ08]. However, this algorithm does not apply to an arbitrary span program. A bound is needed on the operator norm of the entry-wise absolute value of the weighted adjacency matrix for a corresponding graph. Further graph sparsity conditions are needed for the algorithm to be time efficient (see Theorem 9).

Unfortunately, the span program from Theorem 4 will not generally satisfy these conditions. Theorem 4 gives a canonical span program ([Rei09, Definition 5.1]). Even for a simple formula, the optimal canonical span program will typically correspond to a dense graph with large norm.

An example should clarify the problem. Consider the AND-OR formula \(\psi (x) = \big ( [ (x_1 \wedge x_2) \vee x_3 ] \wedge x_4 \big ) \vee \big ( x_5 \wedge [x_6 \vee x_7] \big )\), and consider the two graphs in Fig. 1. For an input \(x \in \{0,1\}^7\), modify the graphs by attaching dangling edges to every vertex \(j\) for which \(x_j = 0\). Observe then that each graph has an eigenvalue-zero eigenvector supported on vertex \(0\)—called a witness—if and only if \(\psi (x) = 1\). The graphs correspond to different span programs computing \(\psi \), and the quantum algorithm works essentially by running a quantum walk starting at vertex \(0\) in order to detect the witness. The graph on the left is a significantly simplified version of a canonical span program for \(\psi \), and its density still makes it difficult to implement the quantum walk.
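The eigenvalue-zero test just described is pure linear algebra: since an adjacency matrix is symmetric, its kernel is the orthogonal complement of its row space, so a zero-eigenvalue eigenvector with nonzero amplitude on vertex \(0\) exists if and only if \(e_0\) lies outside the row space, i.e., if and only if appending \(e_0\) as a row increases the rank. The Python sketch below implements this with exact rational arithmetic; the two toy path graphs are our own (the graphs of Fig. 1 are not reproduced here), chosen only to exercise the test.

```python
from fractions import Fraction

def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def has_zero_witness(adj):
    """True iff the symmetric matrix adj has a null vector with
    nonzero amplitude on vertex 0, i.e. e_0 is outside its row space."""
    e0 = [1] + [0] * (len(adj) - 1)
    return rank(adj + [e0]) > rank(adj)

path2 = [[0, 1], [1, 0]]                   # single edge 0-1
path3 = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # path 0-1-2
print(has_zero_witness(path2))  # False: no zero eigenvalue at all
print(has_zero_witness(path3))  # True: (1, 0, -1) is a witness
```

Attaching a dangling edge changes the adjacency matrix, and hence whether such a witness exists, which is exactly how the input \(x\) enters the test above.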

Fig. 1. Graphs corresponding to two span programs both computing the same function.

We will be guided by the second, simpler graph. Instead of applying Theorem 4 to \(\varphi \) as a whole, we apply it separately to every gate in the formula. We then compose these span programs, one per gate, according to the formula, using direct-sum composition (Definition 6). In terms of graphs, direct-sum composition attaches the output vertex of one span program’s graph to an input vertex of the next [RŠ08]. This leads to a graph whose structure somewhat follows the structure of the formula \(\varphi \), as the graph in Fig. 1(b) follows the structure of \(\psi \). (However, the general case will be more complicated than shown, as we are plugging together constant-size graph gadgets, and there may be duplication of some subgraphs.)

Direct-sum composition keeps the maximum degree and norm of the graph under control—each is at most twice its value for the worst single gate. Therefore the second [Rei09] algorithm applies. However, direct-sum composition also leads to additional overhead. In particular, a witness in the first graph will be supported only on numbered vertices (note that the graph is bipartite), whereas a witness in the second graph will be supported on some of the internal vertices as well. This means roughly that the second witness will be harder to detect, because after normalization its overlap on vertex \(0\) will be smaller. Scale both witnesses so that the amplitude on vertex \(0\) is one. The witness size (\({\mathrm{wsize }}\)) measures the squared length of the witness only on numbered vertices, whereas the full witness size (\({\mathrm{fwsize }}\)) measures the squared length on all vertices. For [Rei09], it was sufficient to consider only span program witness size, because for canonical span programs like in Fig. 1(a) the two measures are equal. (For technical reasons, we will actually define \({\mathrm{fwsize }}\) to be \(1 + {\mathrm{wsize }}\) even in this case.) For our analysis, we will need to bound the full witness size in terms of the witness size. We maintain this bound in a recursion from the formula’s leaves toward its root.

A span program is called strict if every vertex on one half of the bipartite graph is either an input vertex (vertices \(1\)–\(7\) in the graphs of Fig. 1) or the output vertex (vertex \(0\)). Thus the first graph in the example above corresponds to a strict span program, and the second does not. The original definition of span programs, in [KW93], allowed for only strict span programs. This was sensible because any other vertices on the input/output part of the graph’s bipartition can always be projected away, yielding a strict span program that computes the same function. For developing time-efficient quantum algorithms, though, it seems important to consider span programs that are not strict. Unfortunately, going backwards, e.g., from Fig. 1(a) to Fig. 1(b), is probably difficult in general.

Theorem 7 does not follow from the formula-evaluation techniques of [RŠ08], together with Theorem 3 from [Rei09]. This tempting approach runs into serious technical difficulties. In particular, the same span program can be used at two vertices \(v\) and \(w\) in \(\varphi \) only if \(g_v = g_w\) and the general adversary bounds of \(v\)’s input subformulas are the same as those for \(w\)’s inputs up to simultaneous scaling. In general, then, an almost-balanced formula will require an unbounded number of different span programs. Moreover, the analysis in [RŠ08] loses a factor that depends badly on the individual span programs. Since the dependence is not continuous, even showing that the span programs in use all lie within a compact set would not suffice for an \(O(1)\) upper bound. In contrast, the approach we follow here bounds the lost factor by an exponential in \(k\), uniformly over different gate imbalances.

1.4 Quantum Algorithm to Evaluate Approximately Balanced AND-OR Formulas

Ambainis et al. [ACR+10] use a weaker balance criterion for AND-OR formulas than Definition 1. They define an AND-OR formula to be approximately balanced if \({\sigma _-(\varphi )} = O(1)\) and \({\sigma _+({\varphi })} = O(n)\). Here \(n\) is the size of the formula, i.e., the number of leaves, and \({\sigma _-(\varphi )}\) and \({\sigma _+({\varphi })}\) are defined by:

Definition 2

For each vertex \(v\) in a formula \(\varphi \), let

$$\begin{aligned} {\sigma _-(v)}&= \mathop {\mathrm{{max}}}\limits _\xi \, \sum _{w \in \xi } \frac{1}{{\mathrm {Adv}^{\pm }}(\varphi _w)} \nonumber \\ {\sigma _+({v})}&= \mathop {\mathrm{{max}}}\limits _\xi \, \sum _{w \in \xi } {\mathrm {Adv}^{\pm }}(\varphi _w)^2 , \end{aligned}$$
(1.11)

with each maximum taken over all simple paths \(\xi \) from \(v\) to a leaf. Let \(\sigma _\pm (\varphi ) = \sigma _\pm (r)\), where \(r\) is the root of \(\varphi \).
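Definition 2 lends itself to a direct recursion: \(\sigma _\pm (v)\) is the contribution of \(v\) itself plus the maximum over its children. The Python sketch below computes both quantities for AND-OR formulas, where \({\mathrm {Adv}^{\pm }}(\varphi _w)\) equals the square root of the number of leaves below \(w\); the nested-tuple formula encoding is an illustrative convention of ours.

```python
import math

# σ−(v) and σ+(v) of Definition 2 for AND-OR formulas, where
# Adv±(φ_w) = sqrt(#leaves below w).  Formulas are nested tuples
# ("AND", ...) / ("OR", ...) with "leaf" at the leaves.

def leaves(phi):
    return 1 if phi == "leaf" else sum(leaves(c) for c in phi[1:])

def sigmas(phi):
    """Return (σ−(φ), σ+(φ)): maxima over root-to-leaf paths ξ of
    sum_{w in ξ} 1/Adv±(φ_w) and sum_{w in ξ} Adv±(φ_w)^2."""
    a = math.sqrt(leaves(phi))
    if phi == "leaf":
        return 1.0, 1.0
    lo, hi = zip(*(sigmas(c) for c in phi[1:]))
    return 1 / a + max(lo), a * a + max(hi)

# A complete, balanced AND-OR tree on 4 leaves:
phi = ("AND", ("OR", "leaf", "leaf"), ("OR", "leaf", "leaf"))
s_minus, s_plus = sigmas(phi)
print(s_minus)  # 1/2 + 1/sqrt(2) + 1 ≈ 2.207
print(s_plus)   # 4 + 2 + 1 = 7.0
```

For complete binary AND-OR trees of any depth, \(\sigma _-\) is dominated by the geometric series \(1 + 1/\sqrt{2} + 1/2 + \cdots < 2 + \sqrt 2\), consistent with the \({\sigma _-(\varphi )} = O(1)\) requirement.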

Recall that \({\mathrm {Adv}^{\pm }}(\varphi ) = {\mathrm {Adv}}(\varphi ) = \sqrt{n}\) for an AND-OR formula. Definition 1 is a stricter balance criterion because \(\beta \)-balance of a formula \(\varphi \) implies (by Lemma 3) that \({\sigma _-(\varphi )}\) and \({\sigma _+({\varphi })}\) are both dominated by geometric series. However, the same steps as in the proof of Theorem 7 suffice to prove the [ACR+10] result and, in fact, to strengthen it. We show:

Theorem 8

Let \(\varphi \) be an AND-OR formula of size \(n\). Then after polynomial-time classical preprocessing that does not depend on the input \(x\), \(\varphi (x)\) can be evaluated by a quantum algorithm with error at most \(1/3\) using \(O\big (\sqrt{n} \, {\sigma _-(\varphi )}\big )\) input queries. The algorithm’s running time is \(\sqrt{n} \, {\sigma _-(\varphi )} (\log n)^{O(1)}\), assuming unit-cost coherent access to the preprocessed string.

For the special case of AND-OR formulas with \({\sigma _-(\varphi )} = O(1)\), Theorem 8 strengthens Theorem 7. The requirement that \({\sigma _-(\varphi )} = O(1)\) allows for some gates in the formula to be very unbalanced. Theorem 8 also strengthens [ACR+10, Theorem 1] because it does not require that \({\sigma _+({\varphi })} = O(n)\). For example, a formula that is biased near the root, but balanced at greater depths can have \({\sigma _-(\varphi )} = O(1)\) and \({\sigma _+({\varphi })} = \omega (n)\). By substituting the bound \({\sigma _-(\varphi )} = O(\sqrt{d})\) for a depth-\(d\) formula [ACR+10, Definition 3], a corollary of Theorem 8 is that a depth-\(d\), size-\(n\) AND-OR formula can be evaluated using \(O(\sqrt{n d})\) queries. This improves the depth-dependence from [ACR+10], and matches the dependence from an earlier version of that article [Amb07].

The essential reason the balance condition of Definition 1 can be weakened is that, for the specific gates AND and OR, writing out the optimal span programs explicitly lets us prove stronger properties than necessarily hold for other functions.

2 Span Programs

2.1 Definitions

We briefly recall some definitions from [Rei09, Section 2]. Additionally, we define a span program complexity measure, the full witness size, that charges even for the “free” inputs. This quantity is important for developing quantum algorithms that are time efficient as well as query efficient.

For a natural number \(n\), let \([n] = \{1, 2, \ldots , n\}\). For a finite set \(X\), let \(\mathbf{C}^X\) be the inner product space \(\mathbf{C}^{{|X|}}\) with orthonormal basis \(\{ {|x\rangle } : x \in X \}\). For vector spaces \(V\) and \(W\) over \(\mathbf{C}\), let \({\mathcal {L}}(V, W)\) be the set of linear transformations from \(V\) into \(W\), and let \({\mathcal {L}}(V) = {\mathcal {L}}(V, V)\). For \(A \in {\mathcal {L}}(V, W)\), \({\Vert A \Vert }\) is the operator norm of \(A\). For a string \(x \in \{0,1\}^n\), let \(\bar{x}\) denote its bitwise complement.

Definition 3

([HLŠ05,HLŠ07]). For finite sets \(C\), \(E\) and \({\mathcal {D}}\subseteq C^n\), let \(f: {\mathcal {D}}\rightarrow E\). An adversary matrix for \(f\) is a real, symmetric matrix \(\varGamma \in {\mathcal {L}}(\mathbf{C}^{\mathcal {D}})\) that satisfies \({\langle x|} \varGamma {|y\rangle } = 0\) whenever \(f(x) = f(y)\).

The general adversary bound for \(f\), with costs \(s \in [0, \infty )^n\), is

$$\begin{aligned} {\mathrm {Adv}^{\pm }}_{s}(f) = \mathop {\mathrm{{max}}}\limits _{\Large \begin{array}{c} \text {adversary matrices } \varGamma : \\ \forall j \in [n], \, {\Vert \varGamma \circ \varDelta _j \Vert } \le s_j \end{array}} {\Vert \varGamma \Vert } . \end{aligned}$$
(2.1)

Here \(\varGamma \circ \varDelta _j\) denotes the entry-wise matrix product between \(\varGamma \) and \(\varDelta _j = \sum _{x, y : x_j \ne y_j} {{|x\rangle }\!{\langle y|}}\). The (nonnegative-weight) adversary bound for \(f\), with costs \(s\), is defined by the same maximization, except with \(\varGamma \) restricted to have nonnegative entries. In particular, \({\mathrm {Adv}^{\pm }}_{s}(f) \ge {\mathrm {Adv}}_s(f)\).

Letting \(\overrightarrow{1} = (1, 1, \ldots , 1)\), the adversary bound for \(f\) is \({\mathrm {Adv}}(f) = {\mathrm {Adv}}_{\overrightarrow{1}}(f)\) and the general adversary bound for \(f\) is \({\mathrm {Adv}^{\pm }}(f) = {\mathrm {Adv}^{\pm }}_{\overrightarrow{1}}(f)\). By [HLŠ06], \(Q(f) = \varOmega ({\mathrm {Adv}^{\pm }}(f))\).

Definition 4

(Span program [KW93]). A span program \(P\) consists of a natural number \(n\), a finite-dimensional inner product space \(V\) over \(\mathbf{C}\), a “target” vector \({|t\rangle } \in V\), disjoint sets \(I_\mathrm {free}\) and \(I_{j,b}\) for \(j \in [n]\), \(b \in \{0,1\}\), and “input vectors” \({|v_i\rangle } \in V\) for \(i \in I_\mathrm {free}\cup \bigcup _{j \in [n], b \in \{0,1\}} I_{j,b}\).

To \(P\) corresponds a function \(f_P : \{0,1\}^n \rightarrow \{0,1\}\), defined on \(x \in \{0,1\}^n\) by

$$\begin{aligned} f_P(x) = {\left\{ \begin{array}{ll} 1 &{} \text {if} \ {|t\rangle } \in \mathrm{Span }(\{ {| v_i \rangle } : i \in I_\mathrm {free}\cup \bigcup _{j \in [n]} I_{j, x_j} \}) \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(2.2)

Some additional notation is convenient. Fix a span program \(P\). Let \(I = I_\mathrm {free}\cup \bigcup _{j \in [n], b \in \{0,1\}} I_{j,b}\). Let \(A \in {\mathcal {L}}(\mathbf{C}^I, V)\) be given by \(A = \sum _{i \in I} {{|v_i\rangle }\!{\langle i|}}\). For \(x \in \{0,1\}^n\), let \(I(x) = I_\mathrm {free}\cup \bigcup _{j \in [n]} I_{j, x_j}\) and \(\varPi (x) = \sum _{i \in I(x)} {{|i\rangle }\!{\langle i|}} \in {\mathcal {L}}(\mathbf{C}^I)\). Then \(f_P(x) = 1\) if \({|t\rangle } \in \mathrm{Range }(A \varPi (x))\). A vector \({|w\rangle } \in \mathbf{C}^I\) is said to be a witness for \(f_P(x) = 1\) if \(\varPi (x) {|w\rangle } = {|w\rangle }\) and \(A {|w\rangle } = {|t\rangle }\). A vector \({|w'\rangle } \in V\) is said to be a witness for \(f_P(x) = 0\) if \({\langle t|w'\rangle } = 1\) and \(\varPi (x) A^\dagger {|w'\rangle } = 0\).
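Equation (2.2) is a rank condition: \(f_P(x) = 1\) exactly when appending \({|t\rangle }\) to the available input vectors does not increase the rank of their span. A minimal Python sketch with exact rational arithmetic follows; the two-dimensional monotone span program for AND used to exercise it is a standard example, and the dictionary encoding of the index sets \(I_{j,b}\) is our own convention.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def f_P(target, I_free, I, x):
    """Eq. (2.2): 1 iff target lies in the span of the available
    input vectors.  I maps (j, b) to a list of input vectors."""
    avail = list(I_free)
    for j, xj in enumerate(x):
        avail += I[(j, xj)]
    if not avail:
        return 0
    return int(rank(avail + [target]) == rank(avail))

# The standard 2-dimensional monotone span program for AND(x1, x2):
# target (1,1), with (1,0) available when x1 = 1 and (0,1) when x2 = 1.
t = [1, 1]
I = {(0, 1): [[1, 0]], (1, 1): [[0, 1]], (0, 0): [], (1, 0): []}
for x in product([0, 1], repeat=2):
    print(x, f_P(t, [], I, list(x)))  # 1 only on (1, 1)
```

On \((1,1)\) both vectors are available and span the target; on every other input appending the target raises the rank, so \(f_P\) computes AND.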

Definition 5

(Witness size). Consider a span program \(P\), and a vector \(s \in [0, \infty )^n\) of nonnegative “costs.” Let \(S = \sum _{j \in [n], b \in \{0,1\}, i \in I_{j,b}} \sqrt{s_j} {{|i\rangle }\!{\langle i|}} \in {\mathcal {L}}(\mathbf{C}^I)\). For each input \(x \in \{0,1\}^n\), define the witness size of \(P\) on \(x\) with costs \(s\), \({{\mathrm{wsize }}_{s}({P},{x})}\), as follows:

$$\begin{aligned} {{\mathrm{wsize }}_{s}({P},{x})} = {\left\{ \begin{array}{ll} \mathop {\min }\nolimits _{{|w\rangle } : \, A \varPi (x) {|w\rangle } = {|t\rangle }} {\Vert S {|w\rangle } \Vert }^2 &{} \text {if} \ f_P(x) = 1 \\ \mathop {\min }\nolimits _{\begin{array}{c} {|w'\rangle } : \, {\langle t|w'\rangle } = 1 \\ \varPi (x) A^\dagger {|w'\rangle } = 0 \end{array}} {\Vert S A^\dagger {|w'\rangle } \Vert }{}^2 &{} \text {if} \ f_P(x) = 0 \end{array}\right. } \end{aligned}$$
(2.3)

The witness size of \(P\) with costs \(s\) is

$$\begin{aligned} {{\mathrm{wsize }}_{s}({P})} = \mathop {\mathrm{{max}}}\limits _{x \in \{0,1\}^n} {{\mathrm{wsize }}_{s}({P},{x})} . \end{aligned}$$
(2.4)

Define the full witness size \({{\mathrm{fwsize }}_{s}({P})}\) by letting \(S^f= S + \sum _{i \in I_\mathrm {free}} {{|i\rangle }\!{\langle i|}}\) and

$$\begin{aligned} {{\mathrm{fwsize }}_{s}({P},{x})}&= {\left\{ \begin{array}{ll} \mathop {\min }\nolimits _{{|w\rangle } : \, A \varPi (x) {|w\rangle } = {|t\rangle }} (1 + {\Vert S^f{|w\rangle } \Vert }{}^2) &{} \text {if} \ f_P(x) = 1 \\ \mathop {\min }\nolimits _{\begin{array}{c} {|w'\rangle } : \, {\langle t|w'\rangle } = 1 \\ \varPi (x) A^\dagger {|w'\rangle } = 0 \end{array}} ({\Vert {|w'\rangle } \Vert }{}^2 + {\Vert S A^\dagger {|w'\rangle } \Vert }{}^2)&\text {if} \ f_P(x) = 0 \end{array}\right. } \end{aligned}$$
(2.5)
$$\begin{aligned} {{\mathrm{fwsize }}_{s}({P})}&= \mathop {\mathrm{{max}}}\limits _{x \in \{0,1\}^n} {{\mathrm{fwsize }}_{s}({P},{x})} . \end{aligned}$$
(2.6)

When the subscript \(s\) is omitted, the costs are taken to be uniform, \(s = \overrightarrow{1} = (1, 1, \ldots , 1)\), e.g., \({{\mathrm{fwsize }}({P})} = {{\mathrm{fwsize }}_{\overrightarrow{1}}({P})}\). The witness size is defined in [RŠ08]. The full witness size is defined in [Rei09, Section 8], but is not named there. A strict span program has \(I_\mathrm {free}= \emptyset \), so \(S^f= S\), and a monotone span program has \(I_{j, 0} = \emptyset \) for all \(j\) [Rei09, Definition 4.9].
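On a true input, \({{\mathrm{wsize }}_{}({P},{x})}\) with uniform costs is the squared length of the minimum-norm witness, which numerical least squares recovers directly (numpy's lstsq returns the minimum-norm solution of an underdetermined consistent system). The sketch below evaluates it for the standard one-dimensional monotone span program for OR — target \((1)\), input vector \((1)\) for each \(i \in I_{j,1}\) — by restricting \(A\) to the columns available on \(x\); this setup is our illustration, not a construction from the text.

```python
import numpy as np

# wsize(P, x) on a true input (Eq. 2.3, uniform costs): the minimum
# of ||w||^2 over witnesses with A Π(x)|w> = |t>.  Restricting A to
# its available columns enforces Π(x)|w> = |w>.

def wsize_true(A_avail, t):
    w, *_ = np.linalg.lstsq(A_avail, t, rcond=None)  # min-norm solution
    assert np.allclose(A_avail @ w, t), "not a true input"
    return float(w @ w)

# OR_n span program: target |t> = (1), input vector (1) per true bit.
t = np.array([1.0])
for ones in (1, 2, 4):            # number of bits set in x
    A = np.ones((1, ones))        # available columns of A
    print(ones, wsize_true(A, t))  # 1/ones: 1.0, 0.5, 0.25
```

With \(k\) bits set, the optimal witness spreads weight \(1/k\) over the available vectors, giving witness size \(1/k\).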

2.2 Quantum Algorithm to Evaluate a Span Program Based on Its Full Witness Size

[Rei09, Theorem 9.3] gives a quantum query algorithm for evaluating span programs based on the full witness size. The algorithm is based on a quantum walk on a certain graph. Provided that the degree of the graph is not too large, it can actually be implemented efficiently.

Theorem 9

([Rei09, Theorem 9.3]). Let \(P\) be a span program. Then \(f_P\) can be evaluated using

$$\begin{aligned} T = O\big ( {{\mathrm{fwsize }}({P})} \, {\Vert \mathrm{abs }(A_{G_P}) \Vert } \big ) \end{aligned}$$
(2.7)

quantum queries, with error probability at most \(1/3\). Moreover, if the maximum degree of a vertex in \(G_P\) is \(d\), then the time complexity of the algorithm for evaluating \(f_P\) is at most a factor of \((\log d) \big (\log (T \log d) \big )^{O(1)}\) worse, after classical preprocessing and assuming constant-time coherent access to the preprocessed string.

Proof

(sketch) The query complexity claim is actually slightly weaker than [Rei09, Theorem 9.3], which allows the target vector to be scaled downward by a factor of \(\sqrt{{{\mathrm{fwsize }}({P})}}\).

The time-complexity claim will follow from the proof of [Rei09, Theorem 9.3], in [Rei09, Prop. 9.4, Theorem 9.5]. The algorithm for evaluating \(f_P(x)\) uses a discrete-time quantum walk on the graph \(G_P(x)\). If the maximum degree of a vertex in \(G_P\) is \(d\), then each coin reflection can be implemented using \(O(\log d)\) single-qubit unitaries and queries to the preprocessed string [GR02,CNW10]. Finally, the \(\big (\log (T \log d) \big )^{O(1)}\) factor comes from applying the Solovay-Kitaev Theorem [KSV02] to compile the single-qubit unitaries into products of elementary gates, to precision \(1/O(T \log d)\). \(\Box \)

We remark that together with [Rei09, Theorem 3.1], Theorem 9 gives a way of transforming a one-sided-error quantum algorithm into a span program, and back into a quantum algorithm, such that the time complexity is nearly preserved, after preprocessing. This is only a weak equivalence because, aside from requiring preprocessing, the algorithm from Theorem 9 also has two-sided error. To some degree, though, it complements the equivalence results for best span program witness size and bounded-error quantum query complexity [Rei09, Theorem 7.1, Theorem 9.2].

2.3 Direct-Sum Span Program Composition

Let us study the full witness size of the direct-sum composition of span programs. We begin by recalling the definition of direct-sum composition.

Let \(f : \{0,1\}^n \rightarrow \{0,1\}\) and \(S \subseteq [n]\). For \(j \in [n]\), let \(m_j\) be a natural number, with \(m_j = 1\) for \(j \notin S\). For \(j \in S\), let \(f_j : \{0,1\}^{m_j} \rightarrow \{0,1\}\). Define \(y : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_n} \rightarrow \{0,1\}^n\) by

$$\begin{aligned} y(x)_j = {\left\{ \begin{array}{ll} f_j(x_j) &{} \text {if} \ j \in S \\ x_j &{} \text {if} \ j \notin S \end{array}\right. } \end{aligned}$$
(2.8)

Define \(g : \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_n} \rightarrow \{0,1\}\) by \(g(x) = f(y(x))\). For example, if \(S = [n] \backslash \{1\}\), then

$$\begin{aligned} g(x) = f\big (x_1, f_2(x_2), \ldots , f_n(x_n)\big ) . \end{aligned}$$
(2.9)
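Equations (2.8)–(2.9) simply substitute the inner functions \(f_j\) on their disjoint bit blocks, passing the bits \(x_j\) through unchanged for \(j \notin S\). A minimal Python sketch, with illustrative choices of \(f\) and \(f_j\) of our own:

```python
# Eqs. (2.8)-(2.9): substitute inner functions f_j on disjoint bit
# blocks (j in S) into f, passing bits through unchanged off S.

def compose(f, inner, S):
    """Return g(x) = f(y(x)) for x a tuple of per-position blocks:
    block j is a bit-tuple if j in S, else a single bit."""
    def g(x):
        y = tuple(inner[j](x[j]) if j in S else x[j]
                  for j in range(len(x)))
        return f(y)
    return g

f = lambda y: y[0] & (y[1] | y[2])            # outer 3-bit function
inner = {1: lambda b: b[0] ^ b[1],            # f_2 = XOR on 2 bits
         2: lambda b: b[0] & b[1] & b[2]}     # f_3 = AND on 3 bits
g = compose(f, inner, S={1, 2})

# g(x_1, f_2(x_2), f_3(x_3)) at x_1 = 1, x_2 = (1,0), x_3 = (1,1,0):
print(g((1, (1, 0), (1, 1, 0))))  # 1 & (1 | 0) = 1
```

Here \(S = \{1, 2\}\) (0-indexed) matches the example of Eq. (2.9), where the first input bit is passed through directly.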

Given span programs for the individual functions \(f\) and \(f_j\) for \(j \in S\), we will construct a span program for \(g\). We remark that although we are here requiring that the inner functions \(f_j\) act on disjoint sets of bits, this assumption is not necessary for the definition. It simplifies the notation, though, for the cases \(S \ne [n]\), and will suffice for our applications.

Let \(P\) be a span program computing \(f_P = f\). Let \(P\) have inner product space \(V\), target vector \({|t\rangle }\) and input vectors \({|v_i\rangle }\) indexed by \(I_\mathrm {free}\) and \(I_{jc}\) for \(j \in [n]\) and \(c \in \{0,1\}\).

For \(j \in [n]\), let \(s_j \in [0, \infty )^{m_j}\) be a vector of costs, and let \(s \in [0, \infty )^{\sum m_j}\) be the concatenation of the vectors \(s_j\). For \(j \in S\), let \(P^{j0}\) and \(P^{j1}\) be span programs computing \(f_{P^{j1}} = f_j : \{0,1\}^{m_j} \rightarrow \{0,1\}\) and \(f_{P^{j0}} = \lnot f_j\), with \(r_j = {{\mathrm{wsize }}_{s_j}({P^{j0}})} = {{\mathrm{wsize }}_{s_j}({P^{j1}})}\). For \(c \in \{0,1\}\), let \(P^{jc}\) have inner product space \(V^{jc}\) with target vector \({|t^{jc}\rangle }\) and input vectors indexed by \(I_\mathrm {free}^{jc}\) and \(I^{jc}_{kb}\) for \(k \in [m_j]\), \(b \in \{0,1\}\). For \(j \notin S\), let \(r_j = s_j\).

Let \(I_S = \bigcup _{j \in S, c \in \{0,1\}} I_{jc}\). Define \(\varsigma : I_S \rightarrow [n] \times \{0,1\}\) by \(\varsigma (i) = (j,c)\) if \(i \in I_{jc}\). The idea is that \(\varsigma \) maps \(i\) to the input span program that must evaluate to \(1\) in order for \({|v_i\rangle }\) to be available in \(P\).

There are several ways of composing the span programs \(P\) and \(P^{jc}\) to obtain a span program \(Q\) computing the composed function \(f_Q = g\) with \({{\mathrm{wsize }}_{s}({Q})} \le {{\mathrm{wsize }}_{r}({P})}\) [Rei09, Defs. 4.4, 4.5, 4.6]. We focus on direct-sum composition.

Definition 6

([Rei09, Definition 4.5]). The direct-sum-composed span program \(Q^\oplus \) is defined by:

  • The inner product space is \(V^\oplus = V \oplus \bigoplus _{j \in S, c \in \{0,1\}} (\mathbf{C}^{I_{jc}} \otimes V^{jc})\). Any vector in \(V^\oplus \) can be uniquely expressed as \({|u\rangle }_V + \sum _{i \in I_S} {|i\rangle } \otimes {|u_i\rangle }\), where \({|u\rangle } \in V\) and \({|u_i\rangle } \in V^{\varsigma (i)}\).

  • The target vector is \({|t^\oplus \rangle } = {|t\rangle }_V\).

  • The free input vectors are indexed by \(I_\mathrm {free}^\oplus = I_\mathrm {free}\cup I_S \cup \bigcup _{j \in S, c \in \{0,1\}} (I_{jc} \times I_\mathrm {free}^{jc})\) with, for \(i \in I_\mathrm {free}^\oplus \),

    $$\begin{aligned} {|v^\oplus _i\rangle } = {\left\{ \begin{array}{ll} {|v_i\rangle }_V &{} \text {if} \ i \in I_\mathrm {free}\\ {|v_i\rangle }_V - {|i\rangle } \otimes {|t^{jc}\rangle } &{} \text {if} \ i \in I_{jc} \ \text {and} \ j \in S \\ {|i'\rangle } \otimes {|v_{i''}\rangle } &{} \text {if} \ i = (i', i'') \in I_{jc} \times I_\mathrm {free}^{jc} \end{array}\right. } \end{aligned}$$
    (2.10)
  • The other input vectors are indexed by \(I^\oplus _{(jk)b}\) for \(j \in [n]\), \(k \in [m_j]\), \(b \in \{0,1\}\). For \(j \notin S\), \(I^\oplus _{(j1)b} = I_{jb}\), with \({|v^\oplus _i\rangle } = {|v_i\rangle }_V\) for \(i \in I^\oplus _{(j1)b}\). For \(j \in S\), let \(I^\oplus _{(jk)b} = \bigcup _{c \in \{0,1\}} (I_{jc} \times I^{jc}_{kb})\). For \(i \in I_{jc}\) and \(i' \in I^{jc}_{kb}\), let

    $$\begin{aligned} {|v^\oplus _{ii'}\rangle } = {|i\rangle } \otimes {|v_{i'}\rangle } . \end{aligned}$$
    (2.11)

By [Rei09, Theorem 4.3], \(f_{Q^\oplus } = g\) and \({{\mathrm{wsize }}_{s}({Q^\oplus })} \le {{\mathrm{wsize }}_{r}({P})}\). (While that theorem is stated only for the case \(S = [n]\), it is trivially extended to other \(S \subset [n]\).) We give a bound on how quickly the full witness size can grow relative to the witness size:

Lemma 1

Under the above conditions, for each input \(x \in \{0,1\}^{m_1} \times \cdots \times \{0,1\}^{m_n}\), with \(y = y(x)\),

  • If \(g(x) = 1\), let \({|w\rangle }\) be a witness to \(f_P(y) = 1\) such that

    $$\sum _{j \in [n], i \in I_{j y_j}} r_j {|w_i|}^2 = {{\mathrm{wsize }}_{r}({P},{y})}.$$

    Then

    $$\begin{aligned} \frac{{{\mathrm{fwsize }}_{s}({Q^\oplus },{x})} }{ {{\mathrm{wsize }}_{r}({P},{y})} }&\le \sigma \big (y, {|w\rangle }\big ) + \frac{1 + \sum _{i \in I_\mathrm {free}} {|w_i|}^2}{{{\mathrm{wsize }}_{r}({P},{y})}} \nonumber \\&\text {where} \ \sigma (y, {|w\rangle }) = \mathop {\mathrm{{max}}}\limits _{\begin{array}{c} j \in S : \\ \exists i \in I_{j y_j} \ \text {with} \ {\langle i|w\rangle } \ne 0 \end{array}} \frac{{{\mathrm{fwsize }}_{s_j}({P^{j y_j}})}}{{{\mathrm{wsize }}_{s_j}({P^{j y_j}})}} . \end{aligned}$$
    (2.12)
  • If \(g(x) = 0\), let \({|w'\rangle }\) be a witness to \(f_P(y) = 0\) such that

    $$\sum _{j \in [n], i \in I_{j \bar{y}_j}} r_j {|{\langle w'|v_i\rangle }|}^2 = {{\mathrm{wsize }}_{r}({P},{y})}.$$

    Then

    $$\begin{aligned} \frac{ {{\mathrm{fwsize }}_{s}({Q^\oplus },{x})} }{{{\mathrm{wsize }}_{r}({P},{y})} }&\le \sigma (\bar{y}, {|w'\rangle }) + \frac{{\Vert {|w'\rangle } \Vert }^2}{{{\mathrm{wsize }}_{r}({P},{y})}} \nonumber \\&\text {where} \ \sigma (\bar{y}, {|w'\rangle }) = \mathop {\mathrm{{max}}}\limits _{\begin{array}{c} j \in S : \\ \exists i \in I_{j \bar{y}_j} \ \text {with} \ {\langle v_i|w'\rangle } \ne 0 \end{array}} \frac{{{\mathrm{fwsize }}_{s_j}({P^{j \bar{y}_j}})}}{{{\mathrm{wsize }}_{s_j}({P^{j \bar{y}_j}})}} . \end{aligned}$$
    (2.13)

If \(S = \emptyset \), then \(\sigma (y, {|w\rangle })\) and \(\sigma (\bar{y}, {|w'\rangle })\) should each be taken to be \(1\) in the above equations.

Proof

We follow the proof of [Rei09, Theorem 4.3], except keeping track of the full witness size. Note that if \(S = \emptyset \), then Eqs. (2.12) and (2.13) are immediate by definition of \({{\mathrm{fwsize }}_{s}({Q^\oplus },{x})}\).

Let \(I(y)' = I(y) \backslash I_\mathrm {free}= \bigcup _{j \in [n]} I_{j y_j}\).

In the first case, \(g(x) = 1\), for \(j \in S\) let \({|w^{j y_j}\rangle } \in \mathbf{C}^{I^{j y_j}}\) be a witness to \(f_{P^{j y_j}}(x_j) = 1\) such that \({{\mathrm{fwsize }}_{s}({P^{j y_j}},{x_j})} = 1 + \sum _{i \in I_\mathrm {free}^{jy_j}} {|w^{j y_j}_i|}^2 + \sum _{k \in [m_j], i \in I^{j y_j}_{k (x_j)_k}} (s_j)_k {|w^{j y_j}_i|}^2\). As in [Rei09, Theorem 4.3], let \({|w^\oplus \rangle } \in \mathbf{C}^{I^\oplus (x)}\) be given by

$$\begin{aligned} w^\oplus _i = {\left\{ \begin{array}{ll} w_i &{} \text {if} \ i \in I(y) \\ w_{i'} w^{\varsigma (i')}_{i''} &{} \text {if} \ i = (i', i'') \ \text {with} \ i' \in I(y)' \cap I_S, i'' \in I^{\varsigma (i')}(x) \\ 0 &{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(2.14)

Then \({|w^\oplus \rangle }\) is a witness for \(f_{Q^\oplus }(x) = 1\), and we compute

$$\begin{aligned} {{\mathrm{fwsize }}_{s}({Q^\oplus },{x})}&\le 1 + \sum _{i \in I_\mathrm {free}^\oplus } {|w^\oplus _i|}^2 + \mathop {\sum }\limits _{\begin{array}{c} j \in [n], k \in [m_j], \\ i \in I^\oplus _{(jk)(x_j)_k} \end{array}} (s_j)_k {|w^\oplus _i|}^2 \nonumber \\&= 1 + \sum _{i \in I_\mathrm {free}} {|w_i|}^2 + \sum _{j \in [n] \backslash S, i \in I_{j x_j}} s_j {|w_i|}^2 \\&\qquad + \sum _{j \in S, i \in I_{j y_j}} {|w_i|}^2 \Bigg ( 1 + \sum _{i' \in I_\mathrm {free}^{j y_j}} {|w^{j y_j}_{i'}|}{}^2 \nonumber \\&\qquad + \sum _{k \in [m_j], i' \in I^{j y_j}_{k (x_j)_k}} (s_j)_k {|w^{j y_j}_{i'}|}{}^2 \Bigg ) \nonumber \\&= 1 + \sum _{i \in I_\mathrm {free}} {|w_i|}^2 + \sum _{j \in [n] \backslash S, i \in I_{j x_j}} s_j {|w_i|}^2 \nonumber \\&\qquad + \sum _{j \in S, i \in I_{j y_j}} {|w_i|}^2 \, {{\mathrm{fwsize }}_{s_j}({P^{j y_j}},{x_j})} \nonumber . \end{aligned}$$
(2.15)

Equation (2.12) follows using the bound \({{\mathrm{fwsize }}_{s_j}({P^{jy_j}},{x_j})} \le \sigma (y, {|w\rangle }) r_j\) for \(j \in S\), and \(s_j = r_j\) for \(j \notin S\).

Next consider the case \(g(x) = 0\). For \(j \in S\), let \({|u^{j \bar{y}_j}\rangle } \in V^{j \bar{y}_j}\) be a witness for \(f_{P^{j \bar{y}_j}}(x_j) = 0\) with \({{\mathrm{fwsize }}_{s}({P^{j \bar{y}_j}},{x_j})} = {\Vert {|u^{j \bar{y}_j}\rangle } \Vert }{}^2 + \sum _{k \in [m_j], i \in I^{j \bar{y}_j}_{k \overline{(x_j)_k}}} (s_j)_k {|{\langle v_i|u^{j \bar{y}_j}\rangle }|}{}^2\). As in [Rei09, Theorem 4.3], let

$$\begin{aligned} {|u^\oplus \rangle } = {|w'\rangle }_V + \sum _{i \in I_S \backslash I(y)} {\langle v_i|w'\rangle } {|i\rangle } \otimes {|u^{\varsigma (i)}\rangle } . \end{aligned}$$
(2.16)

Then \({|u^\oplus \rangle }\) is a witness for \(f_{Q^\oplus }(x) = 0\), and, moreover,

$$\begin{aligned} {{\mathrm{fwsize }}_{s}({Q^\oplus },{x})}&\le {\Vert {|u^\oplus \rangle } \Vert }{}^2 + \sum _{j \in [n], k \in [m_j], i \in I^\oplus _{(jk)\overline{(x_j)_k}}} (s_j)_k {|{\langle v^\oplus _i|u^\oplus \rangle }|}{}^2 \nonumber \\&= {\Vert {|u^\oplus \rangle } \Vert }{}^2 + \mathop {\sum }\limits _{\begin{array}{c} j \in [n] \backslash S \\ i \in I_{j \bar{x}_j} \end{array}} s_j {|{\langle v^\oplus _i|u^\oplus \rangle }|}{}^2 \nonumber \\&\qquad + \mathop {\sum }\limits _{\begin{array}{c} j \in S, k \in [m_j], \\ i \in I_{j \bar{y}_j}, i' \in I^{j \bar{y}_j}_{k \overline{(x_j)_k}} \end{array}} (s_j)_k {|{\langle v^\oplus _{ii'}|u^\oplus \rangle }|}{}^2 \nonumber \\&= {\Vert {|w'\rangle } \Vert }^2 + \mathop {\sum }\limits _{\begin{array}{c} j \in [n] \backslash S \\ i \in I_{j \bar{x}_j} \end{array}} s_j {|{\langle v_i|w'\rangle }|}{}^2 \\&\qquad + \sum _{j \in S, i \in I_{j \bar{y}_j}} {|{\langle v_i|w'\rangle }|}^2 \Bigg ( {\Vert {|u^{j \bar{y}_j}\rangle } \Vert }{}^2 \nonumber \\&\qquad + \sum _{k \in [m_j], i' \in I^{j \bar{y}_j}_{k \overline{(x_j)_k}}} (s_j)_k {|{\langle v_{i'}|u^{j \bar{y}_j}\rangle }|}{}^2 \Bigg ) \nonumber \\&= {\Vert {|w'\rangle } \Vert }^2 + \mathop {\sum }\limits _{\begin{array}{c} j \in [n] \backslash S \\ i \in I_{j \bar{x}_j} \end{array}} r_j {|{\langle v_i|w'\rangle }|}{}^2 \nonumber \\&\qquad + \sum _{j \in S, i \in I_{j \bar{y}_j}} {|{\langle v_i|w'\rangle }|}^2 \, {{\mathrm{fwsize }}_{s_j}({P^{j \bar{y}_j}},{x_j})} \nonumber . \end{aligned}$$
(2.17)

Equation (2.13) follows using the bound \({{\mathrm{fwsize }}_{s_j}({P^{j \bar{y}_j}},{x_j})} \le \sigma (\bar{y}, {|w'\rangle }) r_j\) for \(j \in S\). \(\Box \)

Lemma 1 is a key step in the formula-evaluation results in this article and [Rei11]. It is used to track the full witness size for span programs recursively composed in a direct-sum manner along a formula. The proof of Theorem 7 will require the lemma with the weaker bounds \(\sigma (y, {|w\rangle }), \sigma (\bar{y}, {|w'\rangle }) \le \mathrm{{max}}_{j \in S, c \in \{0,1\}} {{\mathrm{fwsize }}_{s_j}({P^{jc}})} / {{\mathrm{wsize }}_{s_j}({P^{jc}})}\). Theorem 8 will use only the slightly stronger bounds \(\sigma (y, {|w\rangle }) \le \mathrm{{max}}_{j \in S} {{\mathrm{fwsize }}_{s_j}({P^{j y_j}})} / {{\mathrm{wsize }}_{s_j}({P^{j y_j}})}\), \(\sigma (\bar{y}, {|w'\rangle }) \le \mathrm{{max}}_{j \in S} {{\mathrm{fwsize }}_{s_j}({P^{j \bar{y}_j}})} / {{\mathrm{wsize }}_{s_j}({P^{j \bar{y}_j}})}\). However, the proof of [Rei11, Theorem 1.1] will require the bounds of Eqs. (2.12) and (2.13).

3 Evaluation of Almost-Balanced Formulas

In this section, we will apply the span program framework from [Rei09] to prove Theorem 7. Our algorithm will be given by applying Theorem 9 to a certain span program. Before beginning the proof, though, we will give two necessary lemmas.

Consider a span program \(P\) with corresponding weighted graph \(G_P\), from [Rei09, Definition 8.2]. We will need a bound on the operator norm of \(\mathrm{abs }(A_{G_P})\), the entry-wise absolute value of the weighted adjacency matrix \(A_{G_P}\). If \(P\) is canonical [Rei09, Definition 5.1], then we can indeed obtain such a bound in terms of the witness size of \(P\):

Lemma 2

Let \(s \in (0, \infty )^k\), and let \(P\) be a canonical span program computing a function \(f : \{0,1\}^k \rightarrow \{0,1\}\) with input vectors indexed by the set \(I\). Assume that for each \(x \in \{0,1\}^k\) with \(f(x) = 0\), an optimal witness to \(f_P(x) = 0\) is \({|x\rangle }\) itself. Then

$$\begin{aligned} {\Vert \mathrm{abs }(A_{G_P}) \Vert } \le 2^k \Big ( 1 + \frac{{{\mathrm{wsize }}_{s}({P})}}{\min _{j \in [k]} s_j} \Big ) + {|I|} . \end{aligned}$$
(3.1)

Proof

Recall from [Rei09, Definition 5.1], that \(P\) being in canonical form implies that its target vector is \({|t\rangle } = \sum _{x : f(x) = 0} {|x\rangle }\), and that the matrix \(A\) whose columns are the input vectors of \(P\) can be expressed as

$$\begin{aligned} A = \sum _{i \in I} {{|v_i\rangle }\!{\langle i|}} = \sum _{j \in [k], \, x : f(x) = 0} {{|x\rangle }\!{\langle j, \bar{x}_j|}} \otimes {\langle v_{xj}|} . \end{aligned}$$
(3.2)

By assumption, for each \(x \in f^{-1}(0)\),

$$\begin{aligned} \sum _{j \in [k]} s_j {\Vert {|v_{xj}\rangle } \Vert }^2 = {{\mathrm{wsize }}_{s}({P},{x})} \le {{\mathrm{wsize }}_{s}({P})} . \end{aligned}$$
(3.3)

In particular, letting \(\sigma = \min _{j \in [k]} s_j > 0\), we can bound

$$\begin{aligned} \sum _{j \in [k]} {\Vert {|v_{xj}\rangle } \Vert }^2&\le \frac{1}{\sigma }\sum _{j \in [k]} s_j {\Vert {|v_{xj}\rangle } \Vert }^2 \nonumber \\&\le \frac{{{\mathrm{wsize }}_{s}({P})}}{\sigma } . \end{aligned}$$
(3.4)

The rest of the argument follows from the definition of the weighted adjacency matrix \(A_{G_P}\). From [Rei09, Definition 8.1, Prop. 8.8], \({\Vert \mathrm{abs }(A_{G_P}) \Vert } \le {\Vert \mathrm{abs }(B_{G_P}) \Vert }^2\), where \(B_{G_P}\) is the biadjacency matrix corresponding to \(P\),

$$\begin{aligned} B_{G_P} = \left( \begin{matrix} {|t\rangle } &{} A \\ 0 &{} \varvec{1}\end{matrix} \right) , \end{aligned}$$
(3.5)

and \(\varvec{1}\) is an \({|I|} \times {|I|}\) identity matrix. Now bound \({\Vert \mathrm{abs }(B_{G_P}) \Vert }\) by its Frobenius norm:

$$\begin{aligned} {\Vert \mathrm{abs }(A_{G_P}) \Vert }&\le {\Vert \mathrm{abs }(B_{G_P}) \Vert }^2 \nonumber \\&\le {\Vert \mathrm{abs }(B_{G_P}) \Vert }_F^2 \nonumber \\&= {\Vert {|t\rangle } \Vert }^2 + \mathop {\sum }\limits _{\begin{array}{c} x : f(x) = 0, \\ j \in [k] \end{array}} {\Vert {|v_{xj}\rangle } \Vert }^2 + {|I|} \nonumber \\&\le 2^k + 2^k \mathop {\mathrm{{max}}}\limits _{x : f(x) = 0} \mathop {\sum }\limits _{j \in [k]} {\Vert {|v_{xj}\rangle } \Vert }^2 + {|I|} . \end{aligned}$$
(3.6)

Equation (3.1) follows by substituting in Eq. (3.4). \(\Box \)
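The two norm facts used in Eq. (3.6) can be checked numerically: the operator norm is at most the Frobenius norm, and the squared Frobenius norm of the block matrix in Eq. (3.5) decomposes exactly as \({\Vert {|t\rangle } \Vert }^2 + \sum _i {\Vert {|v_i\rangle } \Vert }^2 + {|I|}\). The following Python sketch uses arbitrary random entries and hypothetical dimensions; it is an illustration, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
k_rows, n_inputs = 4, 6                       # hypothetical dimensions
t = rng.standard_normal((k_rows, 1))          # stands in for the target vector |t>
A = rng.standard_normal((k_rows, n_inputs))   # columns stand in for input vectors

# Biadjacency matrix B = [[t, A], [0, I]] as in Eq. (3.5).
B = np.block([[t, A],
              [np.zeros((n_inputs, 1)), np.eye(n_inputs)]])
absB = np.abs(B)

op_norm = np.linalg.norm(absB, 2)        # largest singular value
fro_norm = np.linalg.norm(absB, 'fro')

# Operator norm is at most the Frobenius norm ...
assert op_norm <= fro_norm + 1e-12
# ... and ||B||_F^2 = ||t||^2 + sum_i ||v_i||^2 + |I| for this block structure.
assert abs(fro_norm ** 2 - (np.sum(t ** 2) + np.sum(A ** 2) + n_inputs)) < 1e-9
```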

An important quantity in the proof of Theorem 7 will be \({\sigma _-(\varphi )}\), from Definition 2. For an almost-balanced formula \(\varphi \), \({\sigma _-(\varphi )} = O(1)\).

Lemma 3

Consider a \(\beta \)-balanced formula \(\varphi \) over a gate set \(\mathcal{{S}}\) in which every gate depends on at least two input bits. Then for every vertex \(v\), with children \(c_1, c_2, \ldots , c_k\),

$$\begin{aligned} \frac{{\mathrm {Adv}^{\pm }}(\varphi _v)}{\mathop {\mathrm{{max}}}\nolimits _j {\mathrm {Adv}^{\pm }}(\varphi _{c_j})} \ge \sqrt{1+\frac{1}{\beta ^2}} . \end{aligned}$$
(3.7)

In particular,

$$\begin{aligned} {\sigma _-(\varphi )} \le (2+\sqrt{2})\beta ^2 . \end{aligned}$$
(3.8)

Proof

Consider a vertex \(v\) with corresponding gate \(g = g_v : \{0,1\}^k \rightarrow \{0,1\}\). By Theorem 2, \({\mathrm {Adv}^{\pm }}(\varphi _v) = {\mathrm {Adv}^{\pm }}_s(g)\), where \(s_j = {\mathrm {Adv}^{\pm }}(\varphi _{c_j})\). It is immediate from the definitions that \({\mathrm {Adv}^{\pm }}_s(g) \ge {\mathrm {Adv}}_s(g)\). We will show that \({\mathrm {Adv}}_s(g) \ge \sqrt{1+1/\beta ^2} (\mathrm{{max}}_j s_j)\), using that \(\mathrm{{max}}_j s_j / \min _j s_j \le \beta \).

Use the weighted minimax formulation of the adversary bound from [HLŠ07, Theorem 18]:

$$\begin{aligned} {\mathrm {Adv}}_s(g) = \min _{p} \mathop {\mathrm{{max}}}\limits _{\begin{array}{c} x, y \in \{0,1\}^k\\ g(x) \ne g(y) \end{array}} \frac{1}{\sum _{j : x_j \ne y_j} \sqrt{p_x(j) p_y(j)} / s_j} , \end{aligned}$$
(3.9)

where the minimization is over all choices of probability distributions \(p_x\) over \([k]\) for \(x \in \{0,1\}^k\).

Since the adversary bound is monotone increasing in each weight, the worst case is when all but one of the weights are equal to \(\mathrm{{max}}_j s_j / \beta \). Since for a scalar \(c\), \({\mathrm {Adv}}_{c s}(g) = c {\mathrm {Adv}}_s(g)\), we may scale so that one weight is \(\beta \) and all other weights are \(1\). Assume that the first weight is \(s_1 = \beta \); the other \(k-1\) cases, \(s_2 = \beta \) and so on, are symmetrical. Assume also that \(g\) depends on the first bit; otherwise \(\mathrm {Adv}^{\pm }_s(g)\) will not depend on \(s_1\) so one of the other cases will be worse. Therefore, there exist inputs \(x, y \in \{0,1\}^k\) that differ only on the first bit, but for which \(g(x) \ne g(y)\).

Since the function \(g\) depends on at least two input bits, there also exists a third input \(z \in \{0,1\}^k\) with \(x_1 = z_1\) but \(g(z) = g(y) \ne g(x)\). Indeed, if \(g(z) = g(x)\) for every \(z\) with \(z_1 = x_1\), and if \(g(z) = g(y)\) for every \(z\) with \(z_1 = y_1\), then \(g\) depends only on the first bit.

By Eq. (3.9),

$$\begin{aligned} {\mathrm {Adv}}_s(g) \ge \min _{p_x, p_y, p_z} \mathrm{{max}}\Bigg \{ \frac{1}{\sqrt{p_x(1) p_y(1)} / s_1}, \frac{1}{\sum _{j \ge 2 \, : \, x_j \ne z_j} \sqrt{p_x(j) p_z(j)} / s_j} \Bigg \} , \end{aligned}$$
(3.10)

where the minimization is over only the three probability distributions \(p_x\), \(p_y\) and \(p_z\). In the above expression, we may clearly take \(p_y(1) = 1\) and \(p_y(j) = 0\) for \(j \ge 2\). We may also use the Cauchy-Schwarz inequality to bound the second term above, and finally substitute \(s_1 = \beta \), \(s_j = 1\) for \(j \ge 2\) to obtain,

$$\begin{aligned} {\mathrm {Adv}}_s(g) \ge \min _{p_x} \mathrm{{max}}\Big \{ \frac{\beta }{\sqrt{p_x(1)}}, \frac{1}{\sqrt{\sum _{j \ge 2} p_x(j)}} \Big \} . \end{aligned}$$
(3.11)

The optimum is achieved for \(p_x(1) = \beta ^2 / (1 + \beta ^2)\), so \({\mathrm {Adv}^{\pm }}_s(g) \ge \sqrt{1 + \beta ^2}\), as claimed.
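The optimization in Eq. (3.11) can be verified numerically. The sketch below, using a hypothetical grid search, checks that the claimed minimizer \(p_x(1) = \beta ^2/(1+\beta ^2)\) equalizes the two branches at \(\sqrt{1+\beta ^2}\); it is an illustration, not part of the proof.

```python
import numpy as np

def minimax_value(beta, grid_size=200001):
    """Grid-search min over p in (0,1) of max(beta/sqrt(p), 1/sqrt(1-p))."""
    p = np.linspace(1e-4, 1 - 1e-4, grid_size)
    return np.maximum(beta / np.sqrt(p), 1.0 / np.sqrt(1.0 - p)).min()

for beta in [1.0, 1.5, 3.0]:
    claimed = np.sqrt(1.0 + beta ** 2)
    # the grid minimum matches sqrt(1 + beta^2) up to grid resolution
    assert abs(minimax_value(beta) - claimed) < 1e-3
    # the claimed optimal p_x(1) makes the two branches equal
    p_star = beta ** 2 / (1.0 + beta ** 2)
    assert abs(beta / np.sqrt(p_star) - claimed) < 1e-12
    assert abs(1.0 / np.sqrt(1.0 - p_star) - claimed) < 1e-12
```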

To derive Eq. (3.8), note that \(\beta \ge 1\) necessarily. Then the sum \({\sigma _-(\varphi )}\) is dominated by the geometric series

$$\begin{aligned} \sum _{k=0}^{\infty } \Big ( 1 + \frac{1}{\beta ^2} \Big )^{-k/2} , \end{aligned}$$
(3.12)

which is at most \((2+\sqrt{2})\beta ^2\), with equality at \(\beta = 1\). \(\Box \)
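As a quick numerical sanity check (not part of the proof), the closed form of the geometric series in Eq. (3.12) can be compared against the bound \((2+\sqrt{2})\beta ^2\):

```python
import math

def series_sum(beta):
    # closed form of sum_{k>=0} (1 + 1/beta^2)^{-k/2}, a geometric series
    r = (1.0 + 1.0 / beta ** 2) ** -0.5
    return 1.0 / (1.0 - r)

def bound(beta):
    return (2.0 + math.sqrt(2.0)) * beta ** 2

# equality at beta = 1, strict inequality for larger beta
assert abs(series_sum(1.0) - bound(1.0)) < 1e-12
for b in [1.0, 1.2, 2.0, 5.0, 10.0]:
    assert series_sum(b) <= bound(b) + 1e-12
```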

Note that the \(1\)-balanced formulas over \(\mathcal{{S}}= \{ \mathrm{OR }_2 \}\) satisfy the inequality (3.7) with equality and come arbitrarily close to saturating the inequality (3.8).

With Lemmas 2 and 3 in hand, we are ready to prove Theorem 7.

Proof

(of Theorem 7) First of all, we may assume without loss of generality that every gate in \(\mathcal{{S}}\) depends on at least two input bits. Indeed, if a gate \(g : \{0,1\}^k \rightarrow \{0,1\}\) depends on no input bits, i.e., is the constant \(0\) or constant \(1\) function, then \(g\) can be eliminated from any formula over \(\mathcal{{S}}\) without changing the adversary balance condition, since \({\mathrm {Adv}^{\pm }}_s(g) = 0\) for all cost vectors \(s \in [0,\infty )^k\). If a gate \(g : \{0,1\}^k \rightarrow \{0,1\}\) depends only on one input bit, say the first bit, then \({\mathrm {Adv}^{\pm }}_s(g) = s_1\) for all cost vectors \(s\), and therefore similarly \(g\) can be eliminated without affecting the adversary balance condition.

Let \(\varphi \) be an \(n\)-variable, \(\beta \)-balanced, read-once formula over the finite gate set \(\mathcal{{S}}\). Let \(r\) be the root of \(\varphi \). We begin by recursively constructing a span program \(P_\varphi \) that computes \(\varphi \) and has witness size \({{\mathrm{wsize }}({P_\varphi })} = {\mathrm {Adv}^{\pm }}(\varphi )\). \(P_\varphi \) is constructed using direct-sum composition of span programs for each node in \(\varphi \). (Direct-sum composition is also the composition method used in [RŠ08].)

The construction works recursively, starting at the leaves of \(\varphi \) and moving toward the root. Consider an internal vertex \(v\), with children \(c_1, \ldots , c_k\). Let \(\alpha _j = {\mathrm {Adv}^{\pm }}(\varphi _{c_j})\), where \(\varphi _{c_j}\) is the subformula of \(\varphi \) rooted at \(c_j\) (Definition 1). In particular, if \(c_j\) is a leaf, then \(\alpha _j = 1\). Assume that for \(j \in [k]\) we have inductively constructed span programs \(P_{\varphi _{c_j}}\) and \(P_{\varphi _{c_j}}^\dagger \) computing \(\varphi _{c_j}\) and \(\lnot \varphi _{c_j}\), respectively, with \({{\mathrm{wsize }}({P_{\varphi _{c_j}}})} = {{\mathrm{wsize }}({P_{\varphi _{c_j}}^\dagger })} = \alpha _j\). Apply [Rei09, Theorem 6.1], a generalization of Theorem 4, twice to obtain span programs \(P_v\) and \(P_v^\dagger \) computing \(f_{P_v} = g_v\) and \(f_{P_v^\dagger } = \lnot g_v\), with \({{\mathrm{wsize }}_{\alpha }({P_v})}= {{\mathrm{wsize }}_{\alpha }({P_v^\dagger })}= {\mathrm {Adv}^{\pm }}_{\alpha }(g_v) = {\mathrm {Adv}^{\pm }}(\varphi _v)\).

Then let \(P_{\varphi _v}\) and \(P_{\varphi _v}^\dagger \) be the direct-sum-composed span programs of \(P_v\) and \(P_v^\dagger \), respectively, with the span programs \(P_{\varphi _{c_j}}\), \(P_{\varphi _{c_j}}^\dagger \) according to the formula \(\varphi \). By definition of direct-sum composition, the graph \(G_{P_{\varphi _v}}\) is built by replacing the input edges of \(G_{P_v}\) with the graphs \(G_{P_{\varphi _{c_j}}}\) or \(G_{{P_{\varphi _{c_j}}^\dagger }}\); and similarly for \(G_{{P_{\varphi _v}^\dagger }}\). Some examples are given in [Rei09, Appendix B] and in [RŠ08]. By [Rei09, Theorem 4.3], \(P_{\varphi _v}\) (resp. \(P_{\varphi _v}^\dagger \)) computes \(\varphi _v\) (\(\lnot \varphi _v\)) with \({{\mathrm{wsize }}({P_{\varphi _v}})} = {{\mathrm{wsize }}({P_{\varphi _v}^\dagger })} = {\mathrm {Adv}^{\pm }}(\varphi _v)\).

Let \(P_\varphi = P_{\varphi _r}\). We wish to apply Theorem 9 to \(P_\varphi \) to obtain a quantum algorithm, but to do so will need some more properties of the span programs \(P_v\) and \(P_v^\dagger \). Recall from [Rei09, Theorem 5.2] that each \(P_v\) may be assumed to be in canonical form, satisfying in particular that for any input \(y \in \{0,1\}^k\) with \(g_v(y) = 0\) an optimal witness is \({|y\rangle } \in \mathbf{C}^{g_v^{-1}(0)}\) itself. Therefore, Lemma 2 applies, and we obtain

$$\begin{aligned} {\Vert \mathrm{abs }(A_{G_{P_v}}) \Vert } \le 2^k \bigg ( 1 + \frac{{{\mathrm{wsize }}_{\alpha }({P_v})}}{\min _j \alpha _j} \bigg ) + {|I|} , \end{aligned}$$
(3.13)

where \({|I|}\) is the number of input vectors in \(P_v\). Now use

$$\begin{aligned} \frac{{{\mathrm{wsize }}_{\alpha }({P_v})}}{\min _j \alpha _j}&= \frac{\mathrm{{max}}_j \alpha _j}{\min _j \alpha _j} \frac{{\mathrm {Adv}^{\pm }}_\alpha (g_v)}{\mathrm{{max}}_j \alpha _j} \nonumber \\&\le \beta k , \end{aligned}$$
(3.14)

where we have applied Eq. (1.10) and also \({\mathrm {Adv}^{\pm }}_{\alpha }(g_v) / \mathrm{{max}}_j \alpha _j \le {\mathrm {Adv}^{\pm }}(g_v) \le k\). Additionally, by [Rei09, Lemma 6.6], we may assume that \({|I|} \le 2 k^2 2^k\). Thus

$$\begin{aligned} {\Vert \mathrm{abs }(A_{G_{P_v}}) \Vert } = \beta \, 2^{O(k)} . \end{aligned}$$
(3.15)

By repeating this argument for the negated function \(\lnot g_v\) computed by a dual span program \(P_v^\dagger \) ([Rei09, Lemma 4.1]), we also have \({\Vert \mathrm{abs }(A_{G_{P_v^\dagger }}) \Vert } = \beta \, 2^{O(k)}\).

A consequence is that

$$\begin{aligned} {\Vert \mathrm{abs }(A_{G_{P_\varphi }}) \Vert } = \beta \, 2^{O(k_{\text {max}})} \end{aligned}$$
(3.16)

where \(k_{\text {max}}\) is the maximum fan-in of any gate used in \(\varphi \). Indeed, \(G_{P_\varphi }\) is built by “plugging together” the graphs \(G_{P_v}\) and \(G_{P_v^\dagger }\) for the different vertices \(v\). Split the graph \(G_{P_\varphi }\) into two pieces, \(G_0\) and \(G_1\), comprising those subgraphs \(G_{P_v}\) and \(G_{P_v^\dagger }\) for which the distance of \(v\) from \(r\) is even or odd, respectively. Then \({\Vert \mathrm{abs }(A_{G_{P_\varphi }}) \Vert } \le {\Vert \mathrm{abs }(A_{G_0}) \Vert } + {\Vert \mathrm{abs }(A_{G_1}) \Vert }\). Since each \(G_b\) is the disconnected union of graphs \(G_{P_v}\) and \(G_{P_v^\dagger }\), \({\Vert \mathrm{abs }(A_{G_b}) \Vert } \le \mathop {\mathrm{{max}}}\nolimits _v \mathrm{{max}}\{ {\Vert \mathrm{abs }(A_{G_{P_v}}) \Vert }, {\Vert \mathrm{abs }(A_{{G_{{P_v^\dagger }}}}) \Vert } \}\).
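The two facts used in this argument, subadditivity of the operator norm under the splitting \(A = A_0 + A_1\) and the max rule for disconnected unions, can be illustrated with random matrices. The numpy sketch below is not part of the proof; all dimensions are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def sym_nonneg(n):
    """A random symmetric matrix with nonnegative entries (an abs-adjacency stand-in)."""
    m = np.abs(rng.standard_normal((n, n)))
    return m + m.T

# Splitting: if A = A0 + A1, then ||A|| <= ||A0|| + ||A1|| (triangle inequality).
A0, A1 = sym_nonneg(8), sym_nonneg(8)
assert np.linalg.norm(A0 + A1, 2) <= np.linalg.norm(A0, 2) + np.linalg.norm(A1, 2) + 1e-9

# Disconnected union: the norm of a block-diagonal matrix is the max block norm.
blocks = [sym_nonneg(3) for _ in range(4)]
D = np.zeros((12, 12))
for i, b in enumerate(blocks):
    D[3 * i:3 * i + 3, 3 * i:3 * i + 3] = b
assert abs(np.linalg.norm(D, 2) - max(np.linalg.norm(b, 2) for b in blocks)) < 1e-9
```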

Let us bound the full witness size of \(P_\varphi \).

Lemma 4

Let \(v\) be a vertex of \(\varphi \). Then

$$\begin{aligned} \mathrm{{max}}\big \{ {{\mathrm{fwsize }}({P_{\varphi _v}})}, {{\mathrm{fwsize }}({P_{\varphi _v}^\dagger })} \big \} \le {\sigma _-(v)} {\mathrm {Adv}^{\pm }}(\varphi _v) . \end{aligned}$$
(3.17)

Proof

The proof is by induction on the maximum distance from \(v\) to a leaf. The base case, in which all of \(v\)’s inputs are themselves leaves, holds by definition of \(P_v\) and \(P_v^\dagger \), since then \({\sigma _-(v)} = 1 + 1/{\mathrm {Adv}^{\pm }}(g_v)\).

Let \(v\) have children \(c_1, \ldots , c_k\). By Lemma 1 with \(s = \overrightarrow{1}\) and \(S = \{ j \in [k] : c_j \text { is not a leaf}\}\),

$$\begin{aligned} \frac{{{\mathrm{fwsize }}({P_{\varphi _v}})}}{{\mathrm {Adv}^{\pm }}(\varphi _v)} \le \frac{1}{{\mathrm {Adv}^{\pm }}(\varphi _v)} + \mathop {\mathrm{{max}}}\limits _{j \in S} \mathrm{{max}}\Bigg \{ \frac{{{\mathrm{fwsize }}({P_{\varphi _{c_j}}})}}{{\mathrm {Adv}^{\pm }}(\varphi _{c_j})}, \frac{{{\mathrm{fwsize }}({P_{\varphi _{c_j}}^\dagger })}}{{\mathrm {Adv}^{\pm }}(\varphi _{c_j})} \Bigg \} . \end{aligned}$$
(3.18)

In the case \(\varphi _v(x) = 1\), this follows since \(P_v\) is strict, so in Eq. (2.12) the sum over \(I_\mathrm {free}\) is zero. In the case \(\varphi _v(x) = 0\), this follows since \(P_v\) is in canonical form, so in Eq. (2.13), \({\Vert {|w'\rangle } \Vert }^2 = 1\).

Now by induction, the right-hand side is at most \({\mathrm {Adv}^{\pm }}(\varphi _v)^{-1} + \mathrm{{max}}_{j \in S} \sigma _-({\varphi _{c_j}}) = {\sigma _-(v)}\). \(\Box \)

In particular, applying Lemma 4 for the case \(v = r\), we find

$$\begin{aligned} {{\mathrm{fwsize }}({P_\varphi })} \le {\sigma _-(\varphi )} {\mathrm {Adv}^{\pm }}(\varphi ) = O\big (\beta ^2 {\mathrm {Adv}^{\pm }}(\varphi ) \big ) \end{aligned}$$
(3.19)

since \({\sigma _-(\varphi )} = O(\beta ^2)\) by Lemma 3. Combining Eqs. (3.16) and (3.19) gives

$$\begin{aligned} {{\mathrm{fwsize }}({P_\varphi })} \, {\Vert \mathrm{abs }(A_{G_{P_\varphi }}) \Vert } = \beta ^3 \, 2^{O(k_{\text {max}})} {\mathrm {Adv}^{\pm }}(\varphi ) . \end{aligned}$$
(3.20)

Since the gate set \(\mathcal{{S}}\) is fixed and finite, \(k_{\text {max}} = O(1)\), and therefore this is \(O({\mathrm {Adv}^{\pm }}(\varphi ))\). Theorem 7 now follows from Theorem 9. \(\Box \)

Note that the lost constant in the theorem grows cubically in the balance parameter \(\beta \) and exponentially in the maximum fan-in \(k_{\text {max}}\) of a gate in \(\mathcal{{S}}\). It is conceivable that this exponential dependence can be improved.

For future reference, we state separately the bound used above to derive Eq. (3.16).

Lemma 5

If \(P_\varphi \) is the direct-sum composition along a formula \(\varphi \) of span programs \(P_v\) and \(P_v^\dagger \), then

$$\begin{aligned} {\Vert \mathrm{abs }(A_{G_{P_\varphi }}) \Vert } \le 2 \mathop {\mathrm{{max}}}\limits _{v \in \varphi } \mathrm{{max}}\{ {\Vert \mathrm{abs }(A_{G_{P_v}}) \Vert }, {\Vert \mathrm{abs }(A_{G_{{P_v^\dagger }}}) \Vert } \} . \end{aligned}$$
(3.21)

If the span programs \(P_v\) are monotone, then \({\Vert \mathrm{abs }(A_{G_{P_\varphi }}) \Vert } \le 2 \mathop {\mathrm{{max}}}\limits _v {\Vert \mathrm{abs }(A_{G_{P_v}}) \Vert }\).

The claim for monotone span programs follows because then the dual span programs \(P_v^\dagger \) are not used in \(P_\varphi \).

4 Evaluation of Approximately Balanced AND-OR Formulas

The proof of Theorem 8 will again be a consequence of Lemma 1 and Theorem 9.

We will use the following strict, monotone span programs for fan-in-two AND and OR gates:

Definition 7

For \(s_1, s_2 > 0\), define span programs \(P_{\mathrm{AND }}(s_1, s_2)\) and \(P_{\mathrm{OR }}(s_1, s_2)\) computing \(\mathrm{AND }_2\) and \(\mathrm{OR }_2\), \(\{0,1\}^2 \rightarrow \{0,1\}\), respectively, by

$$\begin{aligned} P_{\mathrm{AND }}(s_1, s_2):&{|t\rangle } = \left( \begin{matrix} \alpha _1 \\ \alpha _2 \end{matrix} \right) ,\;&{|v_1\rangle }&= \left( \begin{matrix} \beta _1 \\ 0 \end{matrix} \right) ,\;&{|v_2\rangle }&= \left( \begin{matrix} 0 \\ \beta _2 \end{matrix} \right) \end{aligned}$$
(4.1)
$$\begin{aligned} P_{\mathrm{OR }}(s_1, s_2):&{|t\rangle } = \delta ,\;&{|v_1\rangle }&= \epsilon _1 ,\;&{|v_2\rangle }&= \epsilon _2 \end{aligned}$$
(4.2)

Both span programs have \(I_{1,1} = \{1\}\), \(I_{2,1} = \{2\}\) and \(I_\mathrm {free}= I_{1,0} = I_{2,0} = \emptyset \). Here the parameters \(\alpha _j, \beta _j, \delta , \epsilon _j\), for \(j \in [2]\), are given by

$$\begin{aligned} \alpha _j&= (s_j / s_p)^{1/4}&\beta _j&= 1 \end{aligned}$$
(4.3)
$$\begin{aligned} \delta&= 1&\epsilon _j&= (s_j / s_p)^{1/4} , \end{aligned}$$
(4.4)

where \(s_p = s_1 + s_2\). Let \(\alpha = \sqrt{\alpha _1^2 + \alpha _2^2}\) and \(\epsilon = \sqrt{\epsilon _1^2 + \epsilon _2^2}\).

Note that \(\alpha , \epsilon \in (1, 2^{1/4}]\). They are largest when \(s_1 = s_2\).
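This range can be confirmed by direct computation. The sketch below (illustrative only) evaluates \(\alpha = \sqrt{\alpha _1^2 + \alpha _2^2}\) with \(\alpha _j = (s_j/s_p)^{1/4}\) for several arbitrary subformula-size pairs:

```python
import math

def alpha(s1, s2):
    """sqrt(alpha_1^2 + alpha_2^2) for alpha_j = (s_j/s_p)^{1/4}, s_p = s_1 + s_2."""
    sp = s1 + s2
    return math.sqrt((s1 / sp) ** 0.5 + (s2 / sp) ** 0.5)

# maximum 2^{1/4}, attained at balanced subformula sizes s_1 = s_2
assert abs(alpha(7.0, 7.0) - 2 ** 0.25) < 1e-12
# strictly above 1 for any positive sizes, approaching 1 as one side dominates
for s1, s2 in [(1.0, 3.0), (1.0, 100.0), (1.0, 1e8)]:
    a = alpha(s1, s2)
    assert 1.0 < a <= 2 ** 0.25
```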

Claim

The span programs \(P_{\mathrm{AND }}(s_1, s_2)\) and \(P_{\mathrm{OR }}(s_1, s_2)\) satisfy:

$$\begin{aligned} {{\mathrm{wsize }}_{(\sqrt{s_1}, \sqrt{s_2})}({P_{\mathrm{AND }}},{x})}&= {\left\{ \begin{array}{ll} \sqrt{s_p} &{} \text {if}\ x \in \{11, 10, 01\} \\ \frac{\sqrt{s_p}}{2} &{} \text {if}\ x = 00 \end{array}\right. } \nonumber \\ {{\mathrm{wsize }}_{(\sqrt{s_1}, \sqrt{s_2})}({P_{\mathrm{OR }}},{x})}&= {\left\{ \begin{array}{ll} \sqrt{s_p} &{} \text {if}\ x \in \{00, 10, 01\} \\ \frac{\sqrt{s_p}}{2} &{} \text {if}\ x = 11 \end{array}\right. } \end{aligned}$$
(4.5)

Proof

These are calculations using Definition 5 for the witness size. Letting \(\sigma = (\sqrt{s_1}, \sqrt{s_2})\), \(Q = P_{\mathrm{AND }}(s_1, s_2)\) and \(R = P_{\mathrm{OR }}(s_1, s_2)\), we have

$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({Q},{11})}&= \Big (\frac{\alpha _1}{\beta _1}\Big )^2 \sqrt{s_1} + \Big (\frac{\alpha _2}{\beta _2}\Big )^2 \sqrt{s_2} = \sqrt{s_p}\end{aligned}$$
(4.6)
$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({Q},{10})}&= \Big ( \frac{\beta _2}{\alpha _2} \Big )^2 \sqrt{s_2} = \sqrt{s_p} \end{aligned}$$
(4.7)
$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({Q},{00})}&= \left( \Big (\frac{\alpha _1}{\beta _1}\Big )^2 \frac{1}{\sqrt{s_1}} + \Big (\frac{\alpha _2}{\beta _2}\Big )^2 \frac{1}{\sqrt{s_2}} \right) ^{-1} \!\! = \frac{\sqrt{s_p}}{2} \end{aligned}$$
(4.8)
$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({Q},{01})}&= \Big ( \frac{\beta _1}{\alpha _1} \Big )^2 \sqrt{s_1} = \sqrt{s_p} \end{aligned}$$
(4.9)

and

$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({R},{11})}&= \delta ^2 \Big ( \frac{\epsilon _1^2}{\sqrt{s_1}} + \frac{\epsilon _2^2}{\sqrt{s_2}} \Big )^{-1} = \frac{\sqrt{s_p}}{2} \end{aligned}$$
(4.10)
$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({R},{10})}&= \Big ( \frac{\delta }{\epsilon _1} \Big )^2 \sqrt{s_1} = \sqrt{s_p} \end{aligned}$$
(4.11)
$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({R},{00})}&= \Big (\frac{\epsilon _1}{\delta }\Big )^2 \sqrt{s_1} + \Big (\frac{\epsilon _2}{\delta }\Big )^2 \sqrt{s_2} = \sqrt{s_p} \end{aligned}$$
(4.12)
$$\begin{aligned} {{\mathrm{wsize }}_{\sigma }({R},{01})}&= \Big ( \frac{\delta }{\epsilon _2} \Big )^2 \sqrt{s_2} = \sqrt{s_p} . \end{aligned}$$
(4.13)

It is not a coincidence that \({{\mathrm{wsize }}_{\sigma }({Q},{x})} = {{\mathrm{wsize }}_{\sigma }({R},{\bar{x}})}\) for all \(x \in \{0,1\}^2\). This can be seen as a consequence of De Morgan’s laws and span program duality—see [Rei09, Lemma 4.1]. \(\Box \)
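The table in Eq. (4.5) and the duality \({{\mathrm{wsize }}_{\sigma }({Q},{x})} = {{\mathrm{wsize }}_{\sigma }({R},{\bar{x}})}\) can be confirmed by substituting the parameters of Definition 7 into the closed forms of Eqs. (4.6)-(4.13). The Python sketch below does so for a few arbitrary size pairs; it is a check, not a derivation.

```python
import math

def and_or_wsizes(s1, s2):
    """Witness sizes of P_AND and P_OR from Eqs. (4.6)-(4.13), keyed by input x."""
    sp = s1 + s2
    a1, a2 = (s1 / sp) ** 0.25, (s2 / sp) ** 0.25   # alpha_j; beta_j = 1
    e1, e2 = a1, a2                                  # epsilon_j; delta = 1
    Q = {  # P_AND(s1, s2)
        '11': a1 ** 2 * math.sqrt(s1) + a2 ** 2 * math.sqrt(s2),
        '10': math.sqrt(s2) / a2 ** 2,
        '01': math.sqrt(s1) / a1 ** 2,
        '00': 1.0 / (a1 ** 2 / math.sqrt(s1) + a2 ** 2 / math.sqrt(s2)),
    }
    R = {  # P_OR(s1, s2)
        '11': 1.0 / (e1 ** 2 / math.sqrt(s1) + e2 ** 2 / math.sqrt(s2)),
        '10': math.sqrt(s1) / e1 ** 2,
        '01': math.sqrt(s2) / e2 ** 2,
        '00': e1 ** 2 * math.sqrt(s1) + e2 ** 2 * math.sqrt(s2),
    }
    return Q, R, math.sqrt(sp)

for s1, s2 in [(1.0, 1.0), (2.0, 5.0), (10.0, 3.0)]:
    Q, R, rsp = and_or_wsizes(s1, s2)
    for x in ['11', '10', '01']:
        assert abs(Q[x] - rsp) < 1e-12          # sqrt(s_p) rows of the table
    assert abs(Q['00'] - rsp / 2) < 1e-12       # sqrt(s_p)/2 row
    for x in ['00', '10', '01']:
        assert abs(R[x] - rsp) < 1e-12
    assert abs(R['11'] - rsp / 2) < 1e-12
    # duality: wsize(Q, x) = wsize(R, complement of x)
    flip = str.maketrans('01', '10')
    assert all(abs(Q[x] - R[x.translate(flip)]) < 1e-12 for x in Q)
```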

Proof

(of Theorem 8) Let \(\varphi \) be an AND-OR formula of size \(n\), i.e., on \(n\) input bits.

First expand out the formula so that every AND gate and every OR gate has fan-in two. This expansion can be carried out without increasing \({\sigma _-(\varphi )}\) by more than a factor of \(10\):

Lemma 6

([ACR+10, Lemma 8]) For any AND-OR formula \(\varphi \), one can efficiently construct an equivalent AND-OR formula \(\varphi '\) of the same size, such that all gates in \(\varphi '\) have fan-in at most two, and \({\sigma _-(\varphi ')} = O({\sigma _-(\varphi )})\).

Therefore we may assume that \(\varphi \) is a formula over fan-in-two \(\mathrm{AND }\) and \(\mathrm{OR }\) gates.

Now use direct-sum composition to compose the \(\mathrm{AND }\) and \(\mathrm{OR }\) gates according to the formula \(\varphi \), as in the proof of Theorem 7. Since the span programs for \(\mathrm{AND }\) and \(\mathrm{OR }\) are monotone, direct-sum composition does not make use of dual span programs computing NAND or NOR. Therefore there is no need to specify these span programs. At a vertex \(v\), set the weights \(s_1\) and \(s_2\) to equal the sizes of \(v\)’s two input subformulas. Let \(P_v\) be the span program used at vertex \(v\), \(P_{\varphi _v}\) be the span program thus constructed for the subformula \(\varphi _v\), and \(P_\varphi \) be the span program constructed computing \(\varphi \). With this choice of weights, it follows from Claim 4 and [Rei09, Theorem 4.3] that \({{\mathrm{wsize }}({P_{\varphi _v}})} = {\mathrm {Adv}^{\pm }}(\varphi _v) = {\mathrm {Adv}}(\varphi _v)\).

Notice that for all \(s_1, s_2 \in [0, \infty )\), \({\Vert \mathrm{abs }(A_{G_{P_{\mathrm{AND }}(s_1, s_2)}}) \Vert } = O(1)\) and \({\Vert \mathrm{abs }(A_{G_{P_{\mathrm{OR }}(s_1, s_2)}}) \Vert } = O(1)\). Therefore, by Lemma 5, we obtain that \({\Vert \mathrm{abs }(A_{G_{P_\varphi }}) \Vert } = O(1)\).

Thus to apply Theorem 9 we need only bound \({{\mathrm{fwsize }}({P_\varphi })}\). Lemma 4 does not apply, because for \(P_{\mathrm{AND }}(s_1, s_2)\), an optimal witness \({|w'\rangle }\) to \(f_{P_{\mathrm{AND }}}(x) = 0\) might have \({\Vert {|w'\rangle } \Vert }^2 > 1\), as each \(\alpha _j < 1\). (Lemma 4 would apply had we set the parameters to be \(\alpha _1 = \alpha _2 = 1\), \(\beta _j = (s_p / s_j)^{1/4}\), but then \({\Vert A_{G_{P_{\mathrm{AND }}}} \Vert }\) would not necessarily be \(O(1)\).) However, analogous to Lemma 4, we will show:

Lemma 7

Let \(v\) be a vertex of \(\varphi \). Then

$$\begin{aligned} {{\mathrm{fwsize }}({P_{\varphi _v}},{x})} \le {\left\{ \begin{array}{ll} {\sigma _-(v)} {\mathrm {Adv}}(\varphi _v) &{} \text {if}\ \varphi _v(x) = 1 \\ 2 {\sigma _-(v)} {\mathrm {Adv}}(\varphi _v) - 1 &{} \text {if}\ \varphi _v(x) = 0 \end{array}\right. } \end{aligned}$$
(4.14)

Proof

The proof is by induction on the maximum distance from \(v\) to a leaf. The base case, in which \(v\)’s two inputs are themselves leaves, holds by definition of \(P_v\), since then \({\sigma _-(v)} = 1 + 1/\sqrt{2}\).

Let \(v\) have children \(c_1\) and \(c_2\). We will use Lemma 1 with \(s = \overrightarrow{1}\), \(S = \{ j \in [2] : c_j \text { is not a leaf}\}\).

If \(\varphi _v(x) = 1\), then since \(P_v\) is a strict span program, i.e., \(I_\mathrm {free}= \emptyset \), Eq. (2.12) gives

$$\begin{aligned} \frac{ {{\mathrm{fwsize }}({P_{\varphi _v}},{x})} }{ {\mathrm {Adv}}(\varphi _v) } \le \frac{1}{{\mathrm {Adv}}(\varphi _v)} + \mathop {\mathrm{{max}}}\limits _{j \in S} \frac{{{\mathrm{fwsize }}({P_{\varphi _{c_j}}})}}{{\mathrm {Adv}}(\varphi _{c_j})} . \end{aligned}$$
(4.15)

By induction, the right-hand side is at most \(1/{\mathrm {Adv}}(\varphi _v) + \mathop {\mathrm{{max}}}\limits _j {\sigma _-(c_j)} = {\sigma _-(v)}\).

If \(\varphi _v(x) = 0\) and \(g_v\) is an OR gate, then the unique witness \({|w'\rangle }\) for \(P_v\) has \({\Vert {|w'\rangle } \Vert } = 1\), from Definition 7. From Eq. (2.13) and the induction hypothesis,

$$\begin{aligned} \frac{ {{\mathrm{fwsize }}({P_{\varphi _v}},{x})} }{ {\mathrm {Adv}^{\pm }}(\varphi _v) }&\le \frac{ 1 }{{\mathrm {Adv}}(\varphi _v)} + \mathop {\mathrm{{max}}}\limits _{j \in S} \Big ( 2 {\sigma _-(c_j)} - \frac{1}{{\mathrm {Adv}}(\varphi _{c_j})} \Big ) \nonumber \\&< 2 {\sigma _-(v)} - \frac{1}{{\mathrm {Adv}}(\varphi _v)} , \end{aligned}$$
(4.16)

as claimed.

Therefore assume that \(\varphi _v(x) = 0\) and \(g_v\) is an \(\mathrm{AND }\) gate. Let \(s_1\) and \(s_2\) be the sizes of the two input subformulas to \(v\), \(s_p = s_1 + s_2 = {\mathrm {Adv}}(\varphi _v)^2\), and assume without loss of generality that \(\varphi _{c_1}(x) = 0\). If \(\varphi _{c_2}(x) = 0\) as well, then assume without loss of generality that \(2 {\sigma _-(c_1)} - \frac{1}{\sqrt{s_1}} \ge 2 {\sigma _-(c_2)} - \frac{1}{\sqrt{s_2}}\), so \(\sigma (\bar{y}) \le 2 {\sigma _-(c_1)} - \frac{1}{\sqrt{s_1}}\). Then the witness \({|w'\rangle }\) may be taken to be \({|w'\rangle } = (1/\alpha _1, 0) = \big ( (s_p/s_1)^{1/4}, 0 \big )\). From Eq. (2.13),

$$\begin{aligned} \frac{{{\mathrm{fwsize }}({P_{\varphi _v}},{x})} }{{\mathrm {Adv}^{\pm }}(\varphi _v) }&\le \frac{ \sqrt{s_p/s_1} }{{\mathrm {Adv}^{\pm }}(\varphi _v)} + \sigma (\bar{y}) \nonumber \\&\le \frac{1}{\sqrt{s_1}} + \Big ( 2 {\sigma _-(c_1)} - \frac{1}{\sqrt{s_1}} \Big ) \nonumber \\&< 2 {\sigma _-(v)} - \frac{1}{\sqrt{s_p}} , \end{aligned}$$
(4.17)

as claimed. \(\Box \)

In particular, applying Lemma 7 for the case \(v = r\), we find

$$\begin{aligned} {{\mathrm{fwsize }}({P_\varphi })} \le 2 {\sigma _-(\varphi )} {\mathrm {Adv}}(\varphi ) = 2 {\sigma _-(\varphi )} \sqrt{n} . \end{aligned}$$
(4.18)

Theorem 8 now follows from Theorem 9. \(\Box \)

5 Open Problems

In order to begin to relax the balance condition for general formulas, it seems that we need a better understanding of the canonical span programs. For example, can the norm bound Lemma 2 be improved?

Although the two-sided bounded-error quantum query complexity of evaluating formulas is beginning to be understood, the zero-error quantum query complexity [BCWZ99] appears to be more complicated. For example, the exact and zero-error quantum query complexities for \(\mathrm{OR }_n\) are both \(n\) [BBC+01]. On the other hand, Ambainis et al. [ACGT10] use the [ACR+10] algorithm as a subroutine in the construction of a self-certifying, zero-error quantum algorithm that makes \(O(\sqrt{n} \log ^2 n)\) queries to evaluate the balanced binary AND-OR formula. It is not known how to relax the balance requirement or extend the gate set.

Can we develop further methods for constructing span programs with small full witness size, norm and maximum degree? A companion paper [Rei11] studies reduced tensor-product span program composition in order to complement the direct-sum composition that we have used here.

The case of formulas over non-boolean gates may be more complicated [Rei09], but is still intriguing.