1 Introduction

Span programs are a model of computation first used to study logspace complexity [15], and more recently, introduced to the study of quantum algorithms in [21]. They are of immense theoretical importance, having been used to show that the general adversary bound gives a tight lower bound on the quantum query complexity of any decision problem [19, 20]. As a means of designing quantum algorithms, it is known that for any decision problem, there exists a span-program-based algorithm with asymptotically optimal quantum query complexity, but this fact alone gives no indication of how to find such an algorithm. Despite the relative difficulty in designing quantum algorithms this way, there are many applications, including formula evaluation [20, 21], a number of algorithms based on the learning graph framework [5], st-connectivity [7] and k-distinctness [4]. Although generally quantum algorithms designed via span programs can only be analyzed in terms of their query complexity, in some cases their time complexity can also be analyzed, as is the case with the quantum algorithm for st-connectivity. In the case of the quantum algorithm for k-distinctness, the ideas used in designing the span program could be turned into a quantum algorithm for 3-distinctness with time complexity matching its query complexity up to logarithmic factors [3].

In this work, we consider new ways of designing quantum algorithms via span programs. Consider Grover’s quantum search algorithm, which, on input \(x\in \{0,1\}^n\), decides if there is some \(i\in [n]\) such that \(x_i=1\) using only \(O(\sqrt{n})\) quantum operations [11]. The ideas behind this algorithm have been used in innumerable contexts, but in particular, a careful analysis of the ideas behind Grover’s algorithm led to algorithms for similar problems, including a class of threshold functions: given \(x\in \{0,1\}^n\), decide if \(|x|\ge t\) or \(|x|<\varepsilon t\), where |x| denotes the Hamming weight; and approximate counting: given \(x\in \{0,1\}^n\), output an estimate of |x| to some desired accuracy. The results in this paper offer the possibility of obtaining analogous results for any span program. That is, given a span program for some problem f, our results show that one can obtain, not only an algorithm for f, but algorithms for a related class of threshold functions, as well as an algorithm for estimating a quantity called the span program witness size, which is analogous to |x| in the above example (and is in fact exactly \(1/|x|\) in the span program for the OR function — see Sect. 3.3).

New Algorithms from Span Programs We give several new means of constructing quantum algorithms from span programs. Roughly speaking, a span program can be turned into a quantum algorithm that decides between two types of inputs: those that “hit” a certain “target vector”, and those that don’t. We show how to turn a span program into an algorithm that decides between inputs that get “close to” the target vector, and those that don’t. Whereas traditionally a span program has been associated with some decision problem, we can now associate, with one span program, a whole class of threshold problems.

In addition, for any span program P, we can construct a quantum algorithm that estimates the positive witness size, \(w_+(x)\), to relative accuracy \(\varepsilon \) in \(\widetilde{O}\left( \frac{1}{\varepsilon ^{3/2}}\sqrt{w_+(x)\widetilde{W}_-}\right) \) queries, where \(\widetilde{W}_-\) is the approximate negative witness complexity of P. This construction is useful whenever we can construct a span program for which \(w_+(x)\) corresponds to some function we care to estimate, as is the case with the span program for OR, in which \(w_+(x)=\frac{1}{|x|}\), or the span program for st-connectivity, in which \(w_+(G)=\frac{1}{2}R_{s,t}(G)\), where G is a graph, and \(R_{s,t}(G)\) is the effective resistance between s and t in G. We show similar results for estimating the negative witness size.

Structural Results Our analysis of the structure of span programs increases the theoretical understanding of this important model. One implication is an alternative family of algorithms for estimating the witness size, available whenever the phase gap (or spectral gap) of a certain unitary associated with the span program can be lower bounded. This is in contrast to previous span program algorithms, including those mentioned in the previous paragraph, which have all relied on effective spectral gap analysis. We show how the phase gap can be lower bounded by \(\frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}\), where A and A(x) are linear operators associated with the span program and some input x, and \(\sigma _{\min }\) and \(\sigma _{\max }\) are the smallest and largest nonzero singular values.

In addition, our exposition highlights the relationship between span programs and estimating the size of the smallest solution to a linear system, which is a problem solved by Harrow et al. [12]. It is not yet clear if this relationship can lead to new algorithms, but it is an interesting direction for future work, which we discuss in Sect. 6.

Application to Effective Resistance An immediate application of our results is a quantum algorithm for estimating the effective resistance between two vertices in a graph, \(R_{s,t}(G)\). This example is immediate, because in [7], a span program for st-connectivity was presented, in which the positive witness size corresponds to \(R_{s,t}(G)\). The results of Belovs and Reichardt [7], combined with our new span program algorithms, immediately yield an upper bound of \(\widetilde{O}(\frac{1}{\varepsilon ^{3/2}}n\sqrt{R_{s,t}(G)})\) for estimating the effective resistance to relative accuracy \(\varepsilon \). This upper bound also holds for time complexity, due to the time complexity analysis of Belovs and Reichardt [7]. Using our new spectral analysis techniques, we are also able to get an often better upper bound of \(\widetilde{O}\left( \frac{1}{\varepsilon }n\sqrt{{R_{s,t}(G)}/{\mu }}\right) \), on the time complexity of estimating effective resistance, where \(\mu \) is a lower bound on \(\lambda _2(G)\), the second smallest eigenvalue of the Laplacian of G. These upper bounds are incomparable. The second is preferable if it is promised that \(\lambda _2(G)\ge \mu \) for some \(\mu \) that is larger than the desired error bound \(\varepsilon \), and otherwise the first upper bound is better. Both algorithms use \(O(\log n)\) space. We also show that a linear dependence on n is necessary, so our results cannot be significantly improved.

These are the first quantum algorithms for this problem in the adjacency query model. Previous results have studied the problem in the edge-list model [23]. At the end of Sect. 5, we compare our results to those of Wang [23]. Classically, this quantity can be computed exactly by inverting the Laplacian; using nearly linear time Laplacian solvers, it can be approximated in time \(\widetilde{O}(m)=\widetilde{O}(n^2)\), where m is the number of edges in the input graph.

Outline In Sect. 2, we describe the algorithmic subroutines and standard linear algebra that will form the basis of our algorithms. In Sect. 3.1, we review the use of span programs in the context of quantum query algorithms, followed in Sect. 3.2 by our new paradigm of approximate span programs. At this point we will be able to formally state our results about how to use span programs to construct quantum algorithms. In Sect. 3.4, we describe the structure of span programs, giving several results that will help us develop algorithms. The new algorithms from span programs are developed in Sect. 4, and finally, in Sect. 5, we present our applications to estimating effective resistance. In Sect. 6, we discuss open problems.

2 Preliminaries

To begin, we fix notation. For vector spaces V and W, we let \(\mathcal {L}(V,W)\) denote the set of linear operators from V to W. For any operator \(A\in \mathcal {L}(V,W)\), we denote by \(\mathrm {col}A\) the columnspace, \(\mathrm {row}A\) the rowspace, and \(\ker A\) the kernel of A. \(\sigma _{\min }(A)\) and \(\sigma _{\max }(A)\) denote the smallest and largest non-zero singular values, respectively. \(A^+\) denotes the Moore-Penrose pseudo-inverse.

The algorithms in this paper solve either decision problems, or estimation problems. For \(f:X\subseteq [q]^n\rightarrow \{0,1\}\), we say that an algorithm decides f with bounded error if for any \(x\in X\), with probability at least \(2/3\), the algorithm outputs f(x) on input x. For \(f:X\subseteq [q]^n\rightarrow \mathbb {R}_{\ge 0}\), we say that an algorithm estimates f to relative accuracy \(\varepsilon \) with bounded error if for any \(x\in X\), with probability at least \(2/3\), on input x the algorithm outputs \(\tilde{f}\) such that \(|{f(x)-\tilde{f}}|\le \varepsilon f(x).\) In both cases, using \(2/3\) is without loss of generality: any algorithm with success probability bounded above \(1/2\) by a constant can be amplified to success probability arbitrarily close to 1 by taking the median of the outputs of a constant number of repetitions of the algorithm. We generally omit the description “with bounded error”, as all of our algorithms have bounded error.
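For concreteness, here is a minimal numerical sketch of this amplification step (in Python, which we use for all code sketches below); for a decision algorithm, the median of repeated runs reduces to a majority vote, and the function name `majority_success` is our own, not a standard routine:

```python
# Success probability of a majority vote over k independent runs of a
# bounded-error algorithm, each of which is correct with probability p.
from math import comb

def majority_success(p: float, k: int) -> float:
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

for k in (1, 5, 15, 45):          # odd k avoids ties
    print(k, round(majority_success(2/3, k), 4))
# The failure probability decays exponentially in k (a Chernoff bound).
```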

All algorithms presented in this paper are based on the following structure. We have some initial state \({|}\phi _0\rangle \), and some unitary operator U, and we want to estimate \(\left\| \varPi _0{|}\phi _0\rangle \right\| \), where \(\varPi _0\) is the orthogonal projector onto the 1-eigenspace of U. The first step in this process is a quantum algorithm that estimates, in a new register, the phase of U applied to the input state.

Theorem 1

(Phase Estimation [8, 14]) Let U be a unitary with spectral decomposition \(U=\sum _{j=1}^m e^{i\theta _j}{|}\psi _j\rangle {\langle }\psi _j|\), for \(\theta _1,\dots ,\theta _m \in (-\pi ,\pi ]\). For any \(\varTheta \in (0,\pi )\) and \(\varepsilon \in (0,1)\), there exists a quantum algorithm that makes \(O\left( \frac{1}{\varTheta }\log \frac{1}{\varepsilon }\right) \) controlled calls to U and, on input \({|}\psi _j\rangle \), outputs a state \({|}\psi _j\rangle {|}\omega \rangle \) such that if \(\theta _j=0\), then \({|}\omega \rangle ={|}0\rangle \), and if \(|\theta _j|\ge \varTheta \), \(|{{\langle }0\vert }\omega \rangle |^2\le \varepsilon \). If U acts on s qubits, the algorithm uses \(O(s+\log \frac{1}{\varTheta })\) space.

The precision needed to isolate \(\varPi _0{|}\phi _0\rangle \) depends on the smallest nonzero phase of U, the phase gap.

Definition 1

(Phase Gap) Let \(\{e^{i\theta _j}\}_{j\in S}\) be the eigenvalues of a unitary operator U, with \(\{\theta _j\}_{j\in S}\subset (-\pi ,\pi ]\). Then the phase gap of U is \(\varDelta (U):=\min \{|\theta _j|:\theta _j\ne 0\}\).
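This definition is easy to compute numerically; a small sketch (the cutoff `tol` is an arbitrary numerical tolerance, not part of Definition 1):

```python
# Phase gap (Definition 1): smallest nonzero |theta_j| over the
# eigenphases of U, with phases taken in (-pi, pi].
import numpy as np

def phase_gap(U: np.ndarray, tol: float = 1e-12) -> float:
    phases = np.angle(np.linalg.eigvals(U))    # np.angle returns (-pi, pi]
    nonzero = np.abs(phases)[np.abs(phases) > tol]
    return float(nonzero.min()) if nonzero.size else 0.0

# A diagonal unitary with phases {0, 0.3, -0.5} has phase gap 0.3.
print(phase_gap(np.diag(np.exp(1j * np.array([0.0, 0.3, -0.5])))))
```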

In order to estimate \(\left\| \varPi _0{|}\phi _0\rangle \right\| ^2\), given a state \({|}0\rangle \varPi _0{|}\phi _0\rangle +{|}1\rangle (I-\varPi _0){|}\phi _0\rangle \), we use the following.

Theorem 2

(Amplitude Estimation [6]) Let \(\mathcal {A}\) be a quantum algorithm that outputs \(\sqrt{p(x)}{|}0\rangle {|}\varPsi _x(0)\rangle +\sqrt{1-p(x)}{|}1\rangle {|}\varPsi _x(1)\rangle \) on input x. Then there exists a quantum algorithm that estimates p(x) to relative accuracy \(\varepsilon \) using \(O\left( \frac{1}{\varepsilon }\frac{1}{\sqrt{p(x)}}\right) \) calls to \(\mathcal {A}\).

If we know that p(x) is either \(\le p_1\) or \(\ge p_0\) for some \(p_1<p_0\), then we can use amplitude estimation to distinguish between these two cases.

Corollary 1

(Amplitude Gap) Let \(\mathcal {A}\) be a quantum algorithm that outputs \(\sqrt{p(x)}{|}0\rangle {|}\varPsi _x(0)\rangle +\sqrt{1-p(x)}{|}1\rangle {|}\varPsi _x(1)\rangle \) on input x. For any \(0\le p_1<p_0\le 1\), we can distinguish between the cases \(p(x)\ge p_0\) and \(p(x)\le p_1\) with bounded error using \(O\left( \frac{\sqrt{p_0}}{p_0-p_1}\right) \) calls to \(\mathcal {A}\).

Proof

By [6, Thm. 12], using M calls to \(\mathcal {A}\), we can obtain an estimate \(\tilde{p}\) of p(x) such that

$$\begin{aligned} \left| \tilde{p}-p(x) \right| \le \frac{2\pi \sqrt{p(x)(1-p(x))}}{M}+\frac{\pi ^2}{M^2} \end{aligned}$$

with probability \(3/4\). Let \(M=4\pi \frac{\sqrt{p_0+p_1}}{p_0-p_1}\). Then note that for any \(x_1\) and \(x_0\) such that \(p(x_1)\le p_1\) and \(p(x_0)\ge p_0\), using \(\sqrt{p_0+p_1}\ge (\sqrt{p_0}+\sqrt{p_1})/\sqrt{2}\),

$$\begin{aligned} M\ge & {} 2\sqrt{2}\pi \frac{\sqrt{p_0}+\sqrt{p_1}}{p_0-p_1} =2\sqrt{2}\pi \frac{1}{\sqrt{p_0}-\sqrt{p_1}} \ge 2\sqrt{2}\pi \frac{1}{\sqrt{p(x_0)}-\sqrt{p(x_1)}}\\= & {} 2\sqrt{2}\pi \frac{\sqrt{p(x_0)}+\sqrt{p(x_1)}}{p(x_0)-p(x_1)}. \end{aligned}$$

If \(\tilde{p}_1\) is the estimate obtained on input \(x_1\), then we have, with probability 3/4:

$$\begin{aligned} \tilde{p}_1\le & {} p(x_1)+\frac{2\pi \sqrt{p(x_1)(1-p(x_1))}}{M}+\frac{\pi ^2}{M^2}\\\le & {} p(x_1)+\frac{\sqrt{p(x_1)}(p(x_0)-p(x_1))}{\sqrt{2}(\sqrt{p(x_0)}+\sqrt{p(x_1)})}+\frac{(p_0-p_1)^2}{16(p_0+p_1)}. \end{aligned}$$

On the other hand, if \(\tilde{p}_0\) is an estimate of \(p(x_0)\), then with probability 3/4:

$$\begin{aligned} \tilde{p}_0\ge & {} p(x_0)-\frac{{2}\pi \sqrt{p(x_0)(1-p(x_0))}}{M}-\frac{\pi ^2}{M^2}\\\ge & {} p(x_0)-\frac{\sqrt{p(x_0)}(p(x_0)-p(x_1))}{\sqrt{2}(\sqrt{p(x_0)}+\sqrt{p(x_1)})}-\frac{(p_0-p_1)^2}{16(p_0+p_1)}. \end{aligned}$$

We complete the proof by showing that \(\tilde{p}_1<\tilde{p}_0\), so we can distinguish these two events. We have:

$$\begin{aligned} \tilde{p}_0-\tilde{p}_1\ge & {} p(x_0)-p(x_1)-\frac{(p(x_0)-p(x_1))(\sqrt{p(x_0)}+\sqrt{p(x_1)})}{\sqrt{2}(\sqrt{p(x_0)}+\sqrt{p(x_1)})}-\frac{(p_0-p_1)^2}{8(p_0+p_1)}\\\ge & {} \left( 1-\frac{1}{\sqrt{2}}\right) (p_0-p_1)-\frac{1}{8}(p_0-p_1) \ge \frac{1}{6}(p_0-p_1) > 0. \end{aligned}$$

Thus, using \(4\pi \frac{\sqrt{p_0+p_1}}{p_0-p_1}=O\left( \frac{\sqrt{p_0}}{p_0-p_1}\right) \) calls to \(\mathcal {A}\), we can distinguish between \(p(x)\le p_1\) and \(p(x)\ge p_0\) with success probability 3/4. \(\square \)
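As a sanity check, one can plug the boundary case \(p(x_1)=p_1\), \(p(x_0)=p_0\) into the error bound of [6, Thm. 12] and verify the separation numerically (a sketch; the proof above covers all valid \(p(x)\)):

```python
import numpy as np

def ae_error(p: float, M: float) -> float:
    # error bound |p~ - p| from [6, Thm. 12], using M calls
    return 2 * np.pi * np.sqrt(p * (1 - p)) / M + np.pi**2 / M**2

p0, p1 = 0.4, 0.1
M = 4 * np.pi * np.sqrt(p0 + p1) / (p0 - p1)
ptilde1_max = p1 + ae_error(p1, M)     # largest estimate on a p1-instance
ptilde0_min = p0 - ae_error(p0, M)     # smallest estimate on a p0-instance
print(ptilde0_min - ptilde1_max >= (p0 - p1) / 6)   # True
```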

In order to make use of phase estimation, we will need to analyze the spectrum of a particular unitary, which, in our case, consists of a pair of reflections. The following was first used in the context of quantum algorithms in [22]:

Theorem 3

Let \(U=(2\varPi _A-I)(2\varPi _B-I)\) be a product of two reflections on a space H containing \(A=\mathrm {span}\{{|}\psi _1\rangle ,\dots ,{|}\psi _a\rangle \}\) and \(B=\mathrm {span}\{{|}\phi _1\rangle ,\dots ,{|}\phi _b\rangle \}\), with \(\varPi _A=\sum _{i=1}^a{|}\psi _i\rangle {\langle }\psi _i|\) and \(\varPi _B=\sum _{i=1}^b{|}\phi _i\rangle {\langle }\phi _i|\). Let \(D=\varPi _A\varPi _B\) be the discriminant of U with singular value decomposition \(\sum _{j=1}^r\cos \theta _j{|}\alpha _j\rangle {\langle }\beta _j|\), with \(\theta _j\in [0,\frac{\pi }{2}]\). Then the spectrum of U is \(\{e^{\pm 2i\theta _j}\}_j\). The 1-eigenspace of U is \((A\cap B)\oplus (A^\bot \cap B^\bot )\) and the \((-1)\)-eigenspace is \((A\cap B^\bot )\oplus (A^\bot \cap B)\).

Let \(\varLambda _A=\sum _{j=1}^a{|}\psi _j\rangle {\langle }j|\) and \(\varLambda _B=\sum _{j=1}^b{|}\phi _j\rangle {\langle }j|\). We note that in the original statement of Theorem 3, the discriminant is defined \(D'=\varLambda _A^\dagger \varLambda _B\). However it is easy to see that \(D'\) and D have the same singular values: if \(D'=\sum _i\sigma _i{|}v_i\rangle {\langle }u_i|\) is a singular value decomposition of \(D'\), then \(D=\sum _i\sigma _i\varLambda _A{|}v_i\rangle {\langle }u_i|\varLambda _B^\dagger \) is a singular value decomposition of D.

The following corollary to Theorem 3 will be useful in the analysis of several algorithms.

Corollary 2

(Phase gap and discriminant) Let D be the discriminant of a unitary \(U=(2\varPi _A-I)(2\varPi _B-I)\). Then \(\varDelta (-U)\ge 2\sigma _{\min }(D)\).

Proof

By Theorem 3, if \(\{\sigma _0=\cos \theta _0< \sigma _1=\cos \theta _1< \cdots < \sigma _m=\cos \theta _m\}\) are the non-zero singular values of D, for \(\theta _j\in [0,\frac{\pi }{2})\), then the phases of U in \((-\pi ,\pi )\) are \(\{\pm 2\theta _j\}_{j=0}^m\) (U might also have \((-1)=e^{i\pi }\) as an eigenvalue), and so \(-U\) has non-zero phases \(\{\pm 2\theta _j\mp \pi \}_{j=0}^m=\{\pm (2\theta _j-\pi )\}_{j=0}^m\). Thus

$$\begin{aligned} \varDelta (-U)= & {} \min \{|\pi -2\theta _j|:\theta _j\ne {\pi }/{2}\}=|\pi -2\cos ^{-1}\min \{\sigma _j:\sigma _j\ne 0\}|\\= & {} |\pi -2\cos ^{-1}\sigma _{\min }(D)|. \end{aligned}$$

We have \(\theta \ge \sin \theta = \cos ({\pi }/{2}-\theta )\), so \(\sigma _{\min }(D)\ge \cos (\pi /2-\sigma _{\min }(D))\). Then since \(\cos \) is decreasing on the interval \([0,\pi /2]\), we have \(\cos ^{-1}(\sigma _{\min }(D))\le {\pi }/{2}-\sigma _{\min }(D)\), and thus

$$\begin{aligned} \varDelta (-U)\ge \left| \pi -2\left( {\pi }/{2}-\sigma _{\min }(D)\right) \right| =2\sigma _{\min }(D). \end{aligned}$$

\(\square \)
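Both Theorem 3 and Corollary 2 can be checked numerically on random subspaces; a sketch (the dimensions are chosen so that, generically, \(A\cap B^\bot \) and \(A^\bot \cap B\) are trivial, making the comparison clean):

```python
import numpy as np

rng = np.random.default_rng(0)
d, a, b = 8, 3, 3

def proj(M):
    Q, _ = np.linalg.qr(M)        # orthonormal basis for the column space
    return Q @ Q.T

PA = proj(rng.standard_normal((d, a)))
PB = proj(rng.standard_normal((d, b)))
U = (2 * PA - np.eye(d)) @ (2 * PB - np.eye(d))
D = PA @ PB                       # the discriminant of U

phases = np.angle(np.linalg.eigvals(U))
sig = np.linalg.svd(D, compute_uv=False)
sig = sig[sig > 1e-10]            # nonzero singular values cos(theta_j)
theta = np.sort(np.arccos(np.clip(sig, 0, 1)))

print(np.sort(np.abs(phases[np.abs(phases) > 1e-8])))   # 2*theta_j, twice each
print(np.repeat(2 * theta, 2))                          # matches the line above

mphases = np.angle(np.linalg.eigvals(-U))
gap = np.abs(mphases[np.abs(mphases) > 1e-8]).min()     # Delta(-U)
print(gap >= 2 * sig.min() - 1e-9)                      # True (Corollary 2)
```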

Finally, we will make use of the following lemma, which first appeared in this form in [16]:

Lemma 1

(Effective Spectral Gap Lemma) Let \(U=(2\varPi _A-I)(2\varPi _B-I)\) be the product of two reflections, and let \(\varPi _\varTheta \) be the orthogonal projector onto \(\mathrm {span}\{{|}u\rangle :U{|}u\rangle =e^{i\theta }{|}u\rangle ,|\theta |\le \varTheta \}\). Then if \(\varPi _A{|}u\rangle =0\), \(\left\| \varPi _\varTheta \varPi _B{|}u\rangle \right\| \le \frac{\varTheta }{2}\left\| {|}u\rangle \right\| \).
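Lemma 1 admits the same kind of numerical check, with a random \({|}u\rangle \) satisfying \(\varPi _A{|}u\rangle =0\) (a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
d, a, b = 10, 4, 4

def proj(M):
    Q, _ = np.linalg.qr(M)
    return Q @ Q.T

PA = proj(rng.standard_normal((d, a)))
PB = proj(rng.standard_normal((d, b)))
U = (2 * PA - np.eye(d)) @ (2 * PB - np.eye(d))

vals, vecs = np.linalg.eig(U)
u = (np.eye(d) - PA) @ rng.standard_normal(d)      # so that Pi_A |u> = 0
for Theta in (0.1, 0.5, 1.0, 2.0):
    cols = vecs[:, np.abs(np.angle(vals)) <= Theta]
    PTheta = cols @ np.linalg.pinv(cols)           # projector onto col span
    lhs = np.linalg.norm(PTheta @ PB @ u)
    print(lhs <= Theta / 2 * np.linalg.norm(u) + 1e-9)   # True
```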

3 Approximate Span Programs

3.1 Span Programs and Decision Problems

In this section, we review the concept of span programs, and their use in quantum algorithms.

Definition 2

(Span Program) A span program \(P=(H,V,\tau ,A)\) on \([q]^n\) consists of

  1. finite-dimensional inner product spaces \(H=H_1\oplus \dots \oplus H_n\oplus H_\mathrm {true}\oplus H_\mathrm {false}\), and \(\{H_{j,a}\subseteq H_j\}_{j\in [n],a\in [q]}\) such that \(H_{j,1}+\dots +H_{j,q}=H_j\),

  2. a vector space V,

  3. a target vector \(\tau \in V\), and

  4. a linear operator \(A\in \mathcal {L}(H,V)\).

To each \(x\in [q]^n\), we associate a subspace \(H(x):=H_{1,x_1}\oplus \dots \oplus H_{n,x_n}\oplus H_\mathrm {true}\).

Although our notation in Definition 2 deviates from previous span program definitions, the only difference in the substance of the definition is that the spaces \(H_{j,a}\) and \(H_{j,b}\) for \(a\ne b\) need not be orthogonal in our definition. This has the effect of removing \(\log q\) factors in the equivalence between span programs and the dual adversary bound (for details see [13, Sec. 7.1]). The spaces \(H_{\mathrm {true}}\) and \(H_{\mathrm {false}}\) can be useful for designing a span program, but are never required, since we can always add an \((n+1)^{\mathrm {th}}\) variable, set \(x_{n+1}=1\), and let \(H_{n+1,0}=H_{\mathrm {false}}\) and \(H_{n+1,1}=H_{\mathrm {true}}\).

A span program on \([q]^n\) partitions \([q]^n\) into two sets: positive inputs, which we call \(P_1\), and negative inputs, which we call \(P_0\). The importance of this partition stems from the fact that a span program may be converted into a quantum algorithm for deciding this partition in the quantum query model [19, 20]. Thus, if one can construct a span program whose partition of \([q]^n\) corresponds to a problem one wants to solve, an algorithm follows. In order to describe how a span program partitions \([q]^n\) and the query complexity of the resulting algorithm, we need the concept of positive and negative witnesses and witness size.

Definition 3

(Positive and Negative Witness) Fix a span program P on \([q]^n\), and a string \(x\in [q]^n\). We say that \({|}w\rangle \) is a positive witness for x in P if \({|}w\rangle \in H(x)\), and \(A{|}w\rangle =\tau \). We define the positive witness size of x as:

$$\begin{aligned} w_+(x,P)=w_+(x)=\min \{\left\| {|}w\rangle \right\| ^2: {|}w\rangle \in H(x),\ A{|}w\rangle =\tau \}, \end{aligned}$$

if there exists a positive witness for x, and \(w_+(x)=\infty \) otherwise. We say that \(\omega \in \mathcal {L}(V,\mathbb {R})\) is a negative witness for x in P if \(\omega A\varPi _{H(x)} = 0\) and \(\omega \tau = 1\). We define the negative witness size of x as:

$$\begin{aligned} w_-(x,P)=w_-(x)=\min \{\left\| \omega A \right\| ^2:\omega \in \mathcal {L}(V,\mathbb {R}),\ \omega A\varPi _{H(x)}=0,\ \omega \tau =1\}, \end{aligned}$$

if there exists a negative witness, and \(w_-(x)=\infty \) otherwise. If \(w_+(x)\) is finite, we say that x is positive (wrt. P), and if \(w_-(x)\) is finite, we say that x is negative. We let \(P_1\) denote the set of positive inputs, and \(P_0\) the set of negative inputs for P. Note that for every \(x\in [q]^n\), exactly one of \(w_-(x)\) and \(w_+(x)\) is finite; that is, \((P_0,P_1)\) partitions \([q]^n\).
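For small examples, these quantities can be computed directly. A sketch of \(w_+(x)\), using the fact, noted in the proof of Theorem 9 below, that the minimal-norm solution of \(A\varPi _{H(x)}{|}w\rangle =\tau \) is \((A\varPi _{H(x)})^+\tau \); here we assume, as in the OR example of Sect. 3.3, that each subspace of H(x) is spanned by standard basis vectors:

```python
import numpy as np

def w_plus(A: np.ndarray, tau: np.ndarray, mask: np.ndarray) -> float:
    # mask[i] = 1 iff the standard basis vector |i> lies in H(x)
    Ax = A @ np.diag(mask.astype(float))     # A restricted to H(x)
    w = np.linalg.pinv(Ax) @ tau             # minimal-norm candidate witness
    if not np.allclose(Ax @ w, tau):         # tau unreachable from H(x):
        return np.inf                        # x is negative, w_+(x) infinite
    return float(np.linalg.norm(w) ** 2)

# OR span program on {0,1}^3 (see Sect. 3.3): A = (1 1 1), tau = (1).
A, tau = np.ones((1, 3)), np.array([1.0])
print(w_plus(A, tau, np.array([1, 1, 0])))   # 0.5 = 1/|x|
print(w_plus(A, tau, np.array([0, 0, 0])))   # inf: the all-zeros input
```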

For a decision problem \(f:X\subseteq [q]^n\rightarrow \{0,1\}\), we say that P decides f if \(f^{-1}(0)\subseteq P_0\) and \(f^{-1}(1)\subseteq P_1\). In that case, we can use P to construct a quantum algorithm that decides f.

Theorem 4

[19] Fix \(f:X\subseteq [q]^n\rightarrow \{0,1\}\), and let P be a span program on \([q]^n\) that decides f. Let \(W_+(f,P)=\max _{x\in f^{-1}(1)}w_+(x,P)\) and \(W_-(f,P)=\max _{x\in f^{-1}(0)}w_-(x,P)\). Then there exists a quantum algorithm that decides f using \(O(\sqrt{W_+(f,P)W_-(f,P)})\) queries.

We call \(\sqrt{W_+(f,P)W_-(f,P)}\) the complexity of P. It is known that for any decision problem, there exists a span program whose complexity is equal, up to constants, to its query complexity [19, 20] ([13, Sec. 7.1] removes log factors in this statement). However, it is generally a difficult task to find such an optimal span program.

3.2 Span Programs and Approximate Decision Problems

Consider a span program P and \(x\in P_0\). Suppose there exists \({|}w\rangle \in H(x)\) such that \(A{|}w\rangle \) is very close to \(\tau \). We might say that x is very close to being in \(P_1\). If all vectors in H(y) for \(y\in P_0\setminus \{x\}\) are mapped far from \(\tau \), it might be more natural to consider the partition \((P_0\setminus \{x\},P_1\cup \{x\})\) rather than \((P_0,P_1)\).

As further motivation, we mention a construction of Reichardt [19, Sec. 3 of full version] that takes any quantum query algorithm with one-sided error, and converts it into a span program whose complexity matches the query complexity of the algorithm. The target of the span program is the vector \({|}1,\bar{0}\rangle \), which corresponds to a quantum state with a 1 in the answer register and 0s elsewhere. If an algorithm has no error on 1-inputs, it can be modified so that it always ends in exactly this state, by uncomputing all but the answer register. An algorithm with two-sided error cannot be turned into a span program using this construction, because there is error in the final state. This is in intuitive opposition to the evidence that span programs characterize bounded (two-sided) error quantum query complexity.

This motivates us to consider the positive error of an input, or how close it comes to being positive. Since there is no meaningful notion of distance in V, we consider closeness in H.

Definition 4

(Positive Error) For any span program P on \([q]^n\), and \(x\in [q]^n\), we define the positive error of x in P as:

$$\begin{aligned} e_+(x)=e_+(x,P):=\min \left\{ \left\| \varPi _{H(x)^\bot }{|}w\rangle \right\| ^2:A{|}w\rangle =\tau \right\} . \end{aligned}$$

Note that \(e_+(x,P)=0\) if and only if \(x\in P_1\). Any \({|}w\rangle \) such that \(\left\| \varPi _{H(x)^\bot }{|}w\rangle \right\| ^2=e_+(x)\) is called a min-error positive witness for x in P. We define

$$\begin{aligned} \tilde{w}_+(x)=\tilde{w}_+(x,P):=\min \left\{ \left\| {|}w\rangle \right\| ^2:A{|}w\rangle =\tau , \left\| \varPi _{H(x)^\bot }{|}w\rangle \right\| ^2=e_+(x)\right\} . \end{aligned}$$

A min-error positive witness that also minimizes \(\left\| {|}w\rangle \right\| ^2\) is called an optimal min-error positive witness for x.

Note that if \(x\in P_1\), then \(e_+(x)=0\). In that case, a min-error positive witness for x is just a positive witness, and \(\tilde{w}_+(x)=w_+(x)\).

We define a similar notion for positive inputs, to measure their closeness to being negative.

Definition 5

(Negative Error) For any span program P on \([q]^n\) and \(x\in [q]^n\), we define the negative error of x in P as:

$$\begin{aligned} e_-(x)=e_-(x,P):=\min \left\{ \left\| \omega A\varPi _{H(x)} \right\| ^2: \omega (\tau )=1\right\} . \end{aligned}$$

Again, \(e_-(x,P)=0\) if and only if \(x\in P_0\). Any \(\omega \) such that \(\left\| \omega A\varPi _{H(x)} \right\| ^2=e_-(x,P)\) is called a min-error negative witness for x in P. We define

$$\begin{aligned} \tilde{w}_-(x)=\tilde{w}_-(x,P):=\min \left\{ \left\| \omega A \right\| ^2:\omega (\tau )=1,\left\| \omega A\varPi _{H(x)} \right\| ^2=e_-(x,P)\right\} . \end{aligned}$$

A min-error negative witness that also minimizes \(\left\| \omega A \right\| ^2\) is called an optimal min-error negative witness for x.

It turns out that the notion of span program error has a very nice characterization as exactly the reciprocal of the witness size:

$$\begin{aligned} \forall x\in P_0,\;w_-(x)=\frac{1}{e_+(x)},\qquad \text{ and }\qquad \forall x\in P_1,\; w_+(x)=\frac{1}{e_-(x)}, \end{aligned}$$

which we prove shortly in Theorems 8 and 9. This is a very nice state of affairs, for a number of reasons. It allows us two ways of thinking about approximate span programs: in terms of how small the error is, or how large the witness size is. That is, we can say that an input \(x\in P_0\) is almost positive either because its positive error is small, or equivalently, because its negative witness size is large. In general, we can think of P as not only partitioning \([q]^n\) into \((P_0,P_1)\), but inducing an ordering on \([q]^n\) from most negative—smallest negative witness, or equivalently, largest positive error—to most positive—smallest positive witness, or equivalently, largest negative error. For example, on the domain \(\{x^{(1)},\dots ,x^{(6)}\}\subset [q]^n\), P might induce the following ordering:

(Figure: the inputs \(x^{(1)},\dots ,x^{(6)}\) arranged on a line from most negative to most positive, with the boundary between \(P_0\) and \(P_1\) falling between \(x^{(3)}\) and \(x^{(4)}\).)

The inputs \(\{x^{(1)},x^{(2)},x^{(3)}\}\) are in \(P_0\), and \(w_-(x^{(1)})< w_-(x^{(2)})<w_-(x^{(3)})\) (although it is generally possible for two inputs to have the same witness size). The inputs \(\{x^{(4)},x^{(5)},x^{(6)}\}\) are in \(P_1\), and \(w_+(x^{(4)})>w_+(x^{(5)})>w_+(x^{(6)})\). The span program exactly decides the partition \((\{x^{(1)},x^{(2)},x^{(3)}\},\{x^{(4)},x^{(5)},x^{(6)}\})\), but we say it approximates any partition that respects the ordering. If we obtain a partition by drawing a line somewhere on the left side, for example \((\{x^{(1)},x^{(2)}\},\{x^{(3)},x^{(4)},x^{(5)},x^{(6)}\})\), we say P negatively approximates the function corresponding to that partition, whereas if we obtain a partition by drawing a line on the right side, for example \((\{x^{(1)},x^{(2)},x^{(3)},x^{(4)},x^{(5)}\},\{x^{(6)}\})\), we say P positively approximates the function.

Definition 6

(Functions Approximately Associated with P) Let P be a span program on \([q]^n\), and \(f:X\subseteq [q]^n\rightarrow \{0,1\}\) a decision problem. For any \(\lambda \in (0,1)\), we say that P positively \(\lambda \)-approximates f if \(f^{-1}(1)\subseteq P_1\), and for all \(x\in f^{-1}(0)\), either \(x\in P_0\), or \(w_+(x,P)\ge \frac{1}{\lambda }W_+(f,P)\) (note that since \(f^{-1}(1)\subseteq P_1\), \(W_+(f,P)=\max _{x\in f^{-1}(1)}w_+(x,P)\) is finite). We say that P negatively \(\lambda \)-approximates f if \(f^{-1}(0)\subseteq P_0\), and for all \(x\in f^{-1}(1)\), either \(x\in P_1\), or \(w_-(x,P)\ge \frac{1}{\lambda }W_-(f,P)\) (note that since \(f^{-1}(0)\subseteq P_0\), \(W_-(f,P)\) is finite). If P decides f exactly, then both conditions hold for any value of \(\lambda \), and so we can say that P 0-approximates f.

This allows us to consider a much broader class of functions associated with a particular span program. This association is useful, because as with the standard notion of association between a function f and a span program, if a function is approximated by a span program, we can convert the span program into a quantum algorithm that decides f using a number of queries related to the witness sizes. Specifically, we get the following, proven in Sect. 4.

Theorem 5

(Approximate Span Program Decision Algorithms) Fix \(f:X\subseteq [q]^n\rightarrow \{0,1\}\), and let P be a span program that positively \(\lambda \)-approximates f. Define

$$\begin{aligned}&W_+=W_+(f,P):=\max _{x\in f^{-1}(1)}w_+(x,P)\\&\text{ and } \qquad \widetilde{W}_-=\widetilde{W}_-(f,P):=\max _{x\in f^{-1}(0)}\tilde{w}_-(x,P). \end{aligned}$$

There is a quantum algorithm that decides f with bounded error in query complexity \(O\left( \frac{\sqrt{W_+\widetilde{W}_-}}{(1-\lambda )^{3/2}}\log \frac{1}{1-\lambda }\right) \). Similarly, let P be a span program that negatively \(\lambda \)-approximates f. Define

$$\begin{aligned}&W_-=W_-(f,P):=\max _{x\in f^{-1}(0)}w_-(x,P)\\&\text{ and } \qquad \widetilde{W}_+=\widetilde{W}_+(f,P):=\max _{x\in f^{-1}(1)}\tilde{w}_+(x,P). \end{aligned}$$

There is a quantum algorithm that decides f with bounded error in query complexity \(O\left( \frac{\sqrt{W_-\widetilde{W}_+}}{(1-\lambda )^{3/2}}\log \frac{1}{1-\lambda }\right) \).

With the ability to distinguish between different witness sizes, we can obtain algorithms for estimating the witness size.

Theorem 6

(Witness Size Estimation Algorithm) Fix \(f:X\subseteq [q]^n\rightarrow \mathbb {R}_{\ge 0}\). Let P be a span program such that for all \(x\in X\), \(f(x)=w_+(x,P)\) and define \(\widetilde{W}_-=\widetilde{W}_-(f,P)=\max _{x\in X}\tilde{w}_-(x,P)\). There exists a quantum algorithm that estimates f to relative accuracy \(\varepsilon \) in \(\widetilde{O}\left( \frac{1}{\varepsilon ^{3/2}}\sqrt{w_+(x)\widetilde{W}_-}\right) \) queries. Similarly, let P be a span program such that for all \(x\in X\), \(f(x)=w_-(x,P)\) and define \(\widetilde{W}_+=\widetilde{W}_+(f,P)=\max _{x\in X}\tilde{w}_+(x,P)\). Then there exists a quantum algorithm that estimates f to relative accuracy \(\varepsilon \) in \(\widetilde{O}\left( \frac{1}{\varepsilon ^{3/2}}\sqrt{w_-(x)\widetilde{W}_+}\right) \) queries.

The algorithms of Theorems 5 and 6 involve phase estimation of a particular unitary U, as with previous span program algorithms, in order to distinguish the 1-eigenspace of U from its other eigenspaces. In general, it may not be feasible to calculate the phase gap of U, so for the algorithms of Theorems 5 and 6, as with previous algorithms, we use the effective spectral gap lemma to bound the overlap of a particular initial state with eigenspaces of U corresponding to small phases. However, by relating the phase gap of U to the spectrum of A and \(A(x):=A\varPi _{H(x)}\), we show how to lower bound the phase gap in some cases, which may give better results. In particular, in our application to effective resistance, it is not difficult to bound the phase gap in this way, which leads to an improved upper bound. In general we have the following theorem.

Theorem 7

(Witness Size Estimation Algorithm Using Real Phase Gap) Fix \(f:X\subseteq [q]^n\rightarrow \mathbb {R}_{\ge 0}\) and let \(P=(H,V,\tau ,A)\) be a normalized span program (see Definition 7) on \([q]^n\) such that \(\forall x\in X\), \(f(x)=w_+(x,P)\) (resp. \(f(x)=w_-(x)\)). If \(\kappa \ge \frac{\sigma _\mathrm {max}(A)}{\sigma _\mathrm {min}(A\varPi _{H(x)})}\), \(\forall x\in X\), then the quantum query complexity of estimating f(x) to relative accuracy \(\varepsilon \) is at most \(\widetilde{O}\left( \frac{\sqrt{f(x)}\kappa }{\varepsilon }\right) \).

Theorem 5 is proven in Sect. 4.2, Theorem 6 is proven in Sect. 4.3, and Theorem 7 is proven in Sect. 4.4.

3.3 Example

To illustrate how these ideas might be useful, we give a brief example of how a span program that leads to an algorithm for the OR function can be combined with our results to additionally give algorithms for threshold functions and approximate counting. We define a span program P on \(\{0,1\}^n\) as follows:

$$\begin{aligned} V=\mathbb {R},\quad \tau =1,\quad H_i=H_{i,1}=\mathrm {span}\{{|}i\rangle \},\quad H_{i,0}=\{0\},\quad A=\sum _{i=1}^n{\langle }i|. \end{aligned}$$

So \(H=\mathrm {span}\{{|}i\rangle :i\in [n]\}\) and \(H(x)=\mathrm {span}\{{|}i\rangle :x_i=1\}\). It is not difficult to see that P decides OR. In particular, we can see that the optimal positive witness for any x such that \(|x|>0\) is \({|}w_x\rangle =\sum _{i:x_i=1}\frac{1}{|x|}{|}i\rangle \). The only linear map \(\omega :\mathbb {R}\rightarrow \mathbb {R}\) such that \(\omega \tau =1\) is the identity, and indeed, this is a negative witness for the all-zeros string \(\bar{0}\), since \(H(\bar{0})=\{0\}\), and so \(\omega A\varPi _{H(\bar{0})}=0\).

Let \(\lambda \in (0,1)\), \(t\in [n]\), and let f be a threshold function defined by \(f(x)=1\) if \(|x|\ge t\) and \(f(x)=0\) if \(|x|\le \lambda t\), with the promise that one of these conditions holds. If \(f(x)=1\), then \(w_+(x)=\left\| {|}w_x\rangle \right\| ^2=\frac{1}{|x|}\le \frac{1}{t}\), so \(W_+(f,P)=\frac{1}{t}\). On the other hand, if \(f(x)=0\), then \(w_+(x)=\frac{1}{|x|}\ge \frac{1}{\lambda t}=\frac{1}{\lambda }W_+(f,P)\), so P positively \(\lambda \)-approximates f. The only approximate negative witness is the identity \(\omega \), so we have \(\widetilde{W}_-=\left\| \omega A \right\| ^2=\left\| A \right\| ^2=n\). By Theorem 5, there is a quantum algorithm for f with query complexity \(O\left( \frac{1}{(1-\lambda )^{3/2}}\sqrt{W_+\widetilde{W}_-}\right) =O\left( \frac{1}{(1-\lambda )^{3/2}}\sqrt{n/t}\right) \), suppressing the logarithmic factor from Theorem 5.

Furthermore, since \(w_+(x)=\frac{1}{|x|}\), by Theorem 6, we can estimate \(\frac{1}{|x|}\) to relative accuracy \(\varepsilon \), and therefore we can estimate |x| to relative accuracy \(2\varepsilon \), in quantum query complexity \(\widetilde{O}\left( \frac{1}{\varepsilon ^{3/2}}\sqrt{n/|x|}\right) \).
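The witness sizes used above are easy to verify numerically; a small sketch for \(n=6\), assuming nothing beyond the span program just defined:

```python
import numpy as np

n = 6
A, tau = np.ones((1, n)), np.array([1.0])
for x in ([1, 0, 0, 1, 1, 0], [0, 1, 0, 0, 0, 0], [1] * n):
    Ax = A @ np.diag(np.array(x, dtype=float))
    w = np.linalg.pinv(Ax) @ tau                # optimal positive witness
    print(np.linalg.norm(w) ** 2, 1 / sum(x))   # w_+(x) = 1/|x|
print(np.linalg.norm(A) ** 2)                   # ||omega A||^2 = n = W~_-
```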

These upper bounds do not have optimal scaling in \(\varepsilon \), as the actual quantum query complexities of these problems are \(\frac{1}{1-\lambda }\sqrt{n/t}\) and \(\frac{1}{\varepsilon }\sqrt{n/|x|}\) [1, 2, 6]; however, using Theorem 7, the optimal query complexities can be recovered.

3.4 Span Program Structure and Scaling

In this section, we present some observations about the structure of span programs that will be useful in the design and analysis of our algorithms, and for general intuition. We begin by formally stating and proving Theorems 8 and 9, relating error to witness size.

Theorem 8

Let P be a span program on \([q]^n\) and \(x\in P_0\). If \({|}\tilde{w}\rangle \) is a min-error positive witness for x, and \(\omega \) is an optimal negative witness for x,

$$\begin{aligned} (\omega A)^\dagger = \frac{\varPi _{H(x)^\bot }{|}\tilde{w}\rangle }{\left\| \varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle \right\| ^2},\quad \text{ and } \text{ so }\quad w_-(x)=\frac{1}{e_+(x)}. \end{aligned}$$

Proof

Let \({|}{\tilde{w}}\rangle \) be a min-error positive witness for x, and \(\omega \) an optimal negative witness for x. We have \((\omega A){|}{\tilde{w}}\rangle =\omega \tau =1\) and, since \(\omega A\varPi _{H(x)}=0\), we have \((\omega A)\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle =1\). Thus, write \((\omega A)^\dagger = \frac{\varPi _{H(x)^\bot }{|}\tilde{w}\rangle }{\left\| \varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle \right\| ^2}+{|}u\rangle \) such that \({\langle }u|\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle =0\). Define \({|}w_{\text {err}}\rangle =\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle \). We have \(A({|}{\tilde{w}}\rangle -\varPi _{\ker A}{|}w_{\text {err}}\rangle )=A{|}\tilde{w}\rangle =\tau ,\) so by assumption that \({|}{\tilde{w}}\rangle \) has minimal error,

$$\begin{aligned} \left\| \varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle \right\|\le & {} \left\| \varPi _{H(x)^\bot }({|}{\tilde{w}}\rangle -\varPi _{\ker A}{|}w_{\text {err}}\rangle ) \right\| \le \left\| \varPi _{H(x)^\bot }{|}\tilde{w}\rangle -\varPi _{\ker A}{|}w_{\text {err}}\rangle \right\| \\= & {} \left\| \varPi _{(\ker A)^\bot }{|}w_{\text {err}}\rangle \right\| , \end{aligned}$$

so \(\left\| {|}w_{\mathrm {err}}\rangle \right\| \le \left\| \varPi _{(\ker A)^\bot }{|}w_{\mathrm {err}}\rangle \right\| \), and so we must have \({|}w_{\text {err}}\rangle \in (\ker A)^\bot \). Thus, \(\ker A\subseteq \ker {\langle }w_{\text {err}}|\), so by the fundamental homomorphism theorem, there exists a linear function \(\bar{\omega }:\mathrm {col}A\rightarrow \mathbb {R}\) such that \(\bar{\omega }A={\langle }w_\text {err}|\). Furthermore, we have \(\bar{\omega }\tau =\bar{\omega }A{|}{\tilde{w}}\rangle ={\langle }\tilde{w}|\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle =\left\| \varPi _{H(x)^\bot }{|}\tilde{w}\rangle \right\| ^2=e_+(x),\) so \(\omega '=\frac{{\bar{\omega }}}{e_+(x)}\) has \(\omega '\tau =1\). By the optimality of \(\omega \), we must have \(\left\| \omega A \right\| ^2\le \left\| \omega ' A \right\| ^2\), so

$$\begin{aligned} \left\| \frac{\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle }{e_+(x)}+{|}u\rangle \right\| ^2\le & {} \left\| \frac{\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle }{e_+(x)} \right\| ^2 \end{aligned}$$

and so \({|}u\rangle =0\). Thus \((\omega A)^\dagger = \frac{\varPi _{H(x)^\bot }{|}{\tilde{w}}\rangle }{e_+(x)}\) and

$$\begin{aligned} w_-(x)=\left\| \omega A \right\| ^2 = \frac{\left\| \varPi _{H(x)^\bot }{|}\tilde{w}\rangle \right\| ^2}{e_+(x)^2}=\frac{1}{e_+(x)}. \end{aligned}$$

\(\square \)

Theorem 9

Let P be a span program on \([q]^n\) and \(x\in P_1\). If \({|}w\rangle \) is an optimal positive witness for x, and \({{\tilde{\omega }}}\) is a min-error negative witness for x,

$$\begin{aligned} {|}w\rangle =\frac{\varPi _{H(x)}({{\tilde{\omega }}} A)^\dagger }{\left\| {{\tilde{\omega }}} A\varPi _{H(x)} \right\| ^2} \quad \text{ and } \text{ so }\quad w_+(x)=\frac{1}{e_-(x)}. \end{aligned}$$

Proof

Let \({\tilde{\omega }}\) be a min-error negative witness for x, and define \({|}w'\rangle =\frac{\varPi _{H(x)}(\tilde{\omega }A)^\dagger }{\left\| \tilde{\omega }A\varPi _{H(x)} \right\| ^2}\). First note that \({|}w'\rangle \in H(x)\). We will show that \({|}w'\rangle \) is a positive witness for x by showing \(A{|}w'\rangle =\tau \). Suppose \(\tau \) and \(A{|}w'\rangle \) are linearly independent, and let \(\alpha \in \mathcal {L}(V,\mathbb {R})\) be such that \(\alpha (A{|}w'\rangle )=0\) and \(\alpha (\tau )=1\). Then for any \(\varepsilon \in [0,1]\), we have \((\varepsilon {\tilde{\omega }}+(1-\varepsilon )\alpha )\tau =1\), so by optimality of \({\tilde{\omega }}\),

$$\begin{aligned} \left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2\le & {} \left\| (\varepsilon {\tilde{\omega }}+(1-\varepsilon )\alpha ) A\varPi _{H(x)} \right\| ^2\\= & {} \varepsilon ^2 \left\| {{\tilde{\omega }} A}\varPi _{H(x)} \right\| ^2+(1-\varepsilon )^2\left\| {\alpha A}\varPi _{H(x)} \right\| ^2 \\ (1-\varepsilon ^2)\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2\le & {} (1-\varepsilon )^2\left\| {\alpha A}\varPi _{H(x)} \right\| ^2, \end{aligned}$$

where the equality follows from the fact that \(\alpha (A\varPi _{H(x)}({\tilde{\omega }} A)^\dagger )=0\). This implies \(\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| \le 0\), a contradiction, since \(\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| >0\). Thus, we must have \(A{|}w'\rangle =r{\tau }\) for some scalar r, so \({\tilde{\omega }}(A{|}w'\rangle )=r{\tilde{\omega }}({\tau })\). We then have \({\tilde{\omega }}(A{|}w'\rangle ) = {{\tilde{\omega }} A}\frac{\varPi _{H(x)}({{\tilde{\omega }} A})^\dagger }{\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2}=1,\) and so we have \(r=1\), and thus \(A{|}w'\rangle ={\tau }\). So \({|}w'\rangle \) is a positive witness for x. Let \({|}w\rangle \in H(x)\) be an optimal positive witness for x, so \(\left\| {|}w\rangle \right\| ^2=w_+(x)\). We have

$$\begin{aligned} {{\langle }w'\vert }w\rangle =\frac{{\tilde{\omega }} A\varPi _{H(x)}{|}w\rangle }{\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2}=\frac{{\tilde{\omega }}\tau }{\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2}=\frac{1}{\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2}=\left\| {|}w'\rangle \right\| ^2. \end{aligned}$$

Thus \(\left\| {|}w'\rangle \right\| ^2\le \left\| {|}w'\rangle \right\| \left\| {|}w\rangle \right\| \) by the Cauchy–Schwarz inequality, so since \({|}w\rangle \) is optimal, we must have \(\left\| {|}w\rangle \right\| =\left\| {|}w'\rangle \right\| \). Since the smallest \({|}w\rangle \) such that \(A\varPi _{H(x)}{|}w\rangle =\tau \) is uniquely defined as \((A\varPi _{H(x)})^+\tau \), we have \({|}w\rangle ={|}w'\rangle \). Thus \(w_+(x)=\left\| {|}w\rangle \right\| ^2=\left\| {|}w'\rangle \right\| ^2=\frac{1}{\left\| {\tilde{\omega }} A\varPi _{H(x)} \right\| ^2}=\frac{1}{e_-(x)}\). \(\square \)
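Theorem 8 (and, symmetrically, Theorem 9) can also be verified numerically by computing both sides independently: \(e_+(x)\) by least squares over the affine set \(\{{|}w\rangle : A{|}w\rangle =\tau \}\), and \(w_-(x)\) by a constrained quadratic minimization. A sketch on a random span program (the helper `kernel` is our own):

```python
import numpy as np

def kernel(M, tol=1e-10):
    # columns form an orthonormal basis of ker M
    _, s, Vt = np.linalg.svd(M)
    return Vt[int((s > tol).sum()):].T

rng = np.random.default_rng(3)
A, tau = rng.standard_normal((3, 7)), rng.standard_normal(3)
mask = np.zeros(7); mask[:2] = 1          # H(x) = first two coordinates
Pix, Piperp = np.diag(mask), np.diag(1 - mask)

# e_+(x): minimize ||Piperp w||^2 over w in A^+ tau + ker A
w0, K = np.linalg.pinv(A) @ tau, kernel(A)
c = np.linalg.lstsq(Piperp @ K, -Piperp @ w0, rcond=None)[0]
e_plus = np.linalg.norm(Piperp @ (w0 + K @ c)) ** 2

# w_-(x): minimize ||omega A||^2 s.t. omega A Pi_{H(x)} = 0, omega tau = 1
N = kernel((A @ Pix).T).T                 # rows span {omega : omega A Pix = 0}
G = (N @ A) @ (N @ A).T
y = np.linalg.solve(G, N @ tau)
y = y / ((N @ tau) @ y)                   # rescale so that omega tau = 1
w_minus = y @ G @ y
print(w_minus * e_plus)                   # ~1.0, i.e., w_-(x) = 1/e_+(x)
```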

Positive Witnesses Fix a span program \(P=(H,V,\tau ,A)\) on \([q]^n\). In general, a positive witness is any \({|}w\rangle \in H\) such that \(A{|}w\rangle ={\tau }\). Assume the set of all such vectors is non-empty, and let \({|}w\rangle \) be any vector in H such that \(A{|}w\rangle ={\tau }\). Then the set of positive witnesses is exactly

$$\begin{aligned} W:= {|}w\rangle +\ker A=\{{|}w\rangle +{|}h\rangle :{|}h\rangle \in \ker A\}. \end{aligned}$$

It is well known, and a simple exercise to prove, that the unique shortest vector in W is \(A^+\tau \), and it is the unique vector in \(W\cap (\ker A)^\bot \). We can therefore talk about the unique smallest positive witness, whenever W is non-empty.

Definition 7

Fix a span program P, and suppose \(W=\{{|}h\rangle \in H:A{|}h\rangle =\tau \}\) is non-empty. We define the minimal positive witness of P to be \({|}w_0\rangle \in W\) with smallest norm—that is, \({|}w_0\rangle =A^+\tau \). We define \(N_+(P):= \left\| {|}w_0\rangle \right\| ^2\).

Since \({|}w_0\rangle \in (\ker A)^\bot \), we can write any positive witness \({|}w\rangle \) as \({|}w_0\rangle +{|}w_{0}^\bot \rangle \) for some \({|}w_{0}^{\bot }\rangle \in \ker A\). If we let \(T=A^{-1}(\mathrm {span}\{\tau \})\), we can write \(T=\mathrm {span}\{{|}w_0\rangle \}\oplus \ker A\).

Negative Witnesses Just as we can talk about a minimal positive witness, we can also talk about a minimal negative witness of P: any \(\omega _0\in \mathcal {L}(V,\mathbb {R})\) such that \(\omega _0\tau =1\), that minimizes \(\left\| \omega _0 A \right\| \). Define \(N_-(P)=\min _{\omega _0:\omega _0(\tau )=1}\left\| \omega _0A \right\| ^2\). Note that unlike \({|}w_0\rangle \), \(\omega _0\) might not be unique. There may be distinct \(\omega _0,\omega _0'\in \mathcal {L}(V,\mathbb {R})\) that map \(\tau \) to 1 and have minimal complexity, however, one can easily show that in that case, \(\omega _0A=\omega _0'A\), and that the unique minimal negative witness in \(\mathrm {col}A\) is \(\frac{{\langle }\tau |}{\left\| \tau \right\| ^2}\).

For any minimal negative witness, \(\omega _0\), \(\omega _0A\) is related to the minimal positive witness \({|}w_0\rangle \) by \((\omega _0 A)^\dagger = \frac{{|}w_0\rangle }{N_+(P)}\), and \(N_+(P)=\frac{1}{N_-(P)}\). This can be proven, for example, by replacing H(x) with \(\{0\}\) in the proof of Theorem 8.
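These relations, too, can be checked numerically (a sketch; A is chosen with full row rank, so that \(\tau \in \mathrm {col}A\)):

```python
import numpy as np

rng = np.random.default_rng(4)
A, tau = rng.standard_normal((3, 7)), rng.standard_normal(3)
w0 = np.linalg.pinv(A) @ tau               # minimal positive witness A^+ tau
N_plus = np.linalg.norm(w0) ** 2

G = A @ A.T
y = np.linalg.solve(G, tau)
y = y / (tau @ y)                          # minimal omega_0 with omega_0 tau = 1
N_minus = y @ G @ y
print(N_plus * N_minus)                    # ~1.0: N_+(P) = 1/N_-(P)
print(np.allclose(A.T @ y, w0 / N_plus))   # (omega_0 A)^dagger = |w_0>/N_+(P)
```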

Span Program Scaling and Normalization By scaling \(\tau \) to get a new target \(\tau '=B\tau \), we can scale a span program by an arbitrary positive real number B, so that all positive witnesses are scaled by B, and all negative witnesses are scaled by \(\frac{1}{B}\). Note that this leaves \(W_+W_-\) unchanged, so we can in some sense consider the span program invariant under this scaling.

Definition 8

A span program P is normalized if \(N_+(P)=N_-(P)=1\).

Any span program can be converted to a normalized span program by replacing the target with \(\tau '=\frac{\tau }{\sqrt{N_+}}\). However, it will turn out to be desirable to normalize a span program, and also scale it, independently. We can accomplish this to some degree, as shown by the following theorem.

Theorem 10

(Span program scaling) Let \(P=(H,V,\tau ,A)\) be any span program on \([q]^n\), and let \(N=\left\| {|}w_0\rangle \right\| ^2\) for \({|}w_0\rangle \) the minimal positive witness of P. For \(\beta \in \mathbb {R}_{>0}\), define \(P^\beta =(H^{\beta },V^{\beta },\tau ^{\beta },A^{\beta })\) as follows, for \({|}\hat{0}\rangle \) and \({|}\hat{1}\rangle \) two vectors orthogonal to H and V:

$$\begin{aligned}&\forall j\in [n], a\in [q], H_{j,a}^{\beta }:=H_{j,a},\\&H^{\beta }_{\mathrm {true}}=H_{\mathrm {true}}\oplus \mathrm {span}\{{|}{\hat{1}}\rangle \},\quad H^{\beta }_{\mathrm {false}}=H_{\mathrm {false}}\oplus \mathrm {span}\{{|}{\hat{0}}\rangle \}\\&V^{\beta }=V\oplus \mathrm {span}\{{|}{\hat{1}}\rangle \},\quad A^{\beta }=\beta A+{|}\tau \rangle {\langle }{\hat{0}}|+\frac{\sqrt{\beta ^2+N}}{\beta }{|}{\hat{1}}\rangle {\langle }{\hat{1}}|,\quad \tau ^{\beta }={|}\tau \rangle +{|}{\hat{1}}\rangle \end{aligned}$$

Then we have the following:

  • \(\forall x\in P_1\), \(w_+(x,P^{\beta })=\frac{w_+(x,P)}{\beta ^2}+\frac{\beta ^2}{N+\beta ^2}\) and \(\tilde{w}_-(x,P^{\beta })\le \beta ^2\tilde{w}_-(x,P)+2\);

  • \(\forall x\in P_0\), \(w_-(x,P^{\beta })=\beta ^2 w_-(x,P)+1\) and \(\tilde{w}_+(x,P^{\beta })\le \frac{1}{\beta ^2}\tilde{w}_+(x,P)+2\);

  • the minimal witness of \(P^{\beta }\) is \({|}w_0^\beta \rangle =\frac{\beta }{\beta ^2+N}{|}w_0\rangle +\frac{N}{\beta ^2+N}{|}{\hat{0}}\rangle +\frac{\beta }{\sqrt{\beta ^2+N}}{|}{\hat{1}}\rangle \), and \(\left\| {|}w_0^\beta \rangle \right\| ^2=1\).

Proof of Theorem 10 appears in “Appendix A”.
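A numerical check of the first and third claims of Theorem 10 on the OR span program of Sect. 3.3 (a sketch; ordering the coordinates of \(H^{\beta }\) as H, then \({|}{\hat{0}}\rangle \), then \({|}{\hat{1}}\rangle \) is our own bookkeeping choice):

```python
import numpy as np

n, beta = 3, 1.7
A, tau = np.ones((1, n)), np.array([1.0])
N = np.linalg.norm(np.linalg.pinv(A) @ tau) ** 2       # N = 1/3 here

# P^beta: coordinates of H^beta are [ H | hat0 | hat1 ]; V^beta = [ V | hat1 ]
Abeta = np.zeros((2, n + 2))
Abeta[0, :n] = beta * A                                # beta * A
Abeta[0, n] = tau[0]                                   # the |tau><hat0| term
Abeta[1, n + 1] = np.sqrt(beta**2 + N) / beta          # the |hat1><hat1| term
taubeta = np.array([tau[0], 1.0])

x = np.array([1, 1, 0])                                # w_+(x, P) = 1/2
mask = np.concatenate([x, [0, 1]]).astype(float)       # hat1 lies in H_true
w = np.linalg.pinv(Abeta @ np.diag(mask)) @ taubeta
print(np.linalg.norm(w) ** 2,
      0.5 / beta**2 + beta**2 / (N + beta**2))         # equal (first claim)

w0beta = np.linalg.pinv(Abeta) @ taubeta               # minimal witness of P^beta
print(np.linalg.norm(w0beta) ** 2)                     # 1.0 (third claim)
```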

4 Span Program Algorithms

In this section we describe several ways in which a span program can be turned into a quantum algorithm. As in the case of algorithms previously constructed from span programs, our algorithms will consist of many applications of a unitary on H, applied to some initial state. Unlike previous applications, we will use \({|}w_0\rangle \), the minimal positive witness of P, as the initial state, assuming P is normalized so that \(\left\| {|}w_0\rangle \right\| =1\). This state is independent of the input, and so can be generated with 0 queries. For negative span program algorithms, where we want to decide a function negatively approximated by P, we will use a unitary U(P, x), defined as follows:

$$\begin{aligned} U(P,x):=(2\varPi _{\ker A}-I)(2\varPi _{H(x)}-I)=(2\varPi _{(\ker A)^\bot }-I)(2\varPi _{H(x)^\bot }-I). \end{aligned}$$

This is similar to the unitary used in previous span program algorithms. Note that \((2\varPi _{\ker A}-I)\) is input-independent, and so can be implemented in 0 queries. However, in order to analyze the time complexity of a span program algorithm, this reflection must be implemented (as we are able to do for our applications, following [7]). The reflection \((2\varPi _{H(x)}-I)\) depends on the input, but it is not difficult to see that it requires two queries to implement. Since our definition of span programs varies slightly from previous definitions, we provide a proof of this fact.

Lemma 2

The reflection \(2\varPi _{H(x)}-I\) can be implemented using 2 queries to x.

Proof

For every \(i\in [n]\) and \(a\in [q]\), let \(R_{i,a}=(I-2\varPi _{H_{i,a}^\bot \cap H_i})\), the operator that reflects every vector in \(H_i\) that is orthogonal to \(H_{i,a}\). This operation is input independent, and so, can be implemented in 0 queries. For every \(i\in [n]\), let \(\{{|}\psi _{i,1}\rangle ,\dots ,{|}\psi _{i,m_i}\rangle \}\) be an orthonormal basis for \(H_i\). Recall that the spaces \(H_i\) are orthogonal, so we can map \({|}\psi _{i,j}\rangle \mapsto {|}i\rangle {|}\psi _{i,j}\rangle \). Then using one query, we can map \({|}i\rangle {|}\psi _{i,j}\rangle \mapsto {|}i\rangle {|}x_i\rangle {|}\psi _{i,j}\rangle \). We then perform \(R_{i,x_i}\) on the last register, conditioned on the first two registers, and then uncompute the first two registers, using one additional query. \(\square \)

For positive span program algorithms, where we want to decide a function positively approximated by P, or estimate the positive witness size, we will use a slightly different unitary:

$$\begin{aligned} U'(P,x)=(2\varPi _{H(x)}-I)(2\varPi _{T}-I), \end{aligned}$$

where \(T=\ker A\oplus \mathrm {span}\{{|}w_0\rangle \}\), the span of positive witnesses. We have \(U'=U^\dagger (I-2{|}w_0\rangle {\langle }w_0|)\).

We begin by analyzing the overlap of the initial state, \({|}w_0\rangle \), with the phase spaces of the unitaries U and \(U'\) in Sect. 4.1. In particular, we show that the projections of \({|}w_0\rangle \) onto the 0-phase spaces of U and \(U'\) are exactly related to the witness size. Using the effective spectral gap lemma (Lemma 1), we show that the overlap of \({|}w_0\rangle \) with small nonzero phase spaces is not too large. Using this analysis, in Sect. 4.2, we describe how to convert a span program into an algorithm for any decision problem that is approximated by the span program, proving Theorem 5, and in Sect. 4.3, we describe how to convert a span program into an algorithm that estimates the span program witness size, proving Theorem 6.

Finally, in Sect. 4.4, we give a lower bound on the phase gap of U in terms of the spectra of A and \(A(x)=A\varPi _{H(x)}\), giving an alternative analysis to the effective spectral gap analysis of Sect. 4.1 that may be better in some cases, and proving Theorem 7.

4.1 Analysis

Negative Span Programs In this section we analyze the overlap of \({|}w_0\rangle \) with the eigenspaces of U(P, x). For any angle \(\varTheta \in [0,\pi )\), we define \(\varPi _\varTheta ^x\) as the orthogonal projector onto the \(e^{i\theta }\)-eigenspaces of U(P, x) for which \(|\theta |\le \varTheta \).

Lemma 3

Let P be a normalized span program on \([q]^n\). For any \(x\in [q]^n\),

$$\begin{aligned} \left\| \varPi _\varTheta ^x{|}w_0\rangle \right\| ^2\le \frac{\varTheta ^2}{4}\tilde{w}_+(x)+\frac{1}{w_-(x)}. \end{aligned}$$

In particular, for any \(x\in P_1\), \(\left\| \varPi _\varTheta ^x{|}w_0\rangle \right\| ^2\le \frac{\varTheta ^2}{4}w_+(x)\).

Proof

Suppose \(x\in P_1\), and let \({|}w_x\rangle \) be an optimal positive witness for x, so \(\varPi _{(\ker A)^\bot }{|}w_x\rangle ={|}w_0\rangle \). Then since \(\varPi _{H(x)^\bot }{|}w_x\rangle =0\), by the effective spectral gap lemma (Lemma 1):

$$\begin{aligned} \left\| \varPi _\varTheta ^x{|}w_0\rangle \right\| ^2=\left\| \varPi _\varTheta ^x\varPi _{(\ker A)^\bot }{|}w_x\rangle \right\| ^2\le \frac{\varTheta ^2}{4}\left\| {|}w_x\rangle \right\| ^2=\frac{\varTheta ^2}{4}w_+(x). \end{aligned}$$

Suppose \(x\in P_0\) and let \(\omega _x\) be an optimal zero-error negative witness for x and \({|}{\tilde{w}}_x\rangle \) an optimal min-error positive witness for x. First note that \(\varPi _{(\ker A)^\bot }{|}{\tilde{w}}_x\rangle ={|}w_0\rangle \), so \(\varPi _{(\ker A)^\bot }\varPi _{H(x)}{|}{\tilde{w}}_x\rangle +\varPi _{(\ker A)^\bot }\varPi _{H(x)^\bot }{|}{\tilde{w}}_x\rangle ={|}w_0\rangle \). Since \(\varPi _{H(x)^\bot }\left( \varPi _{H(x)}{|}{\tilde{w}}_x\rangle \right) =0\), we have, by Lemma 1,

$$\begin{aligned} \left\| \varPi _\varTheta \varPi _{(\ker A)^\bot }\varPi _{H(x)}{|}{\tilde{w}}_x\rangle \right\| ^2\le & {} \frac{\varTheta ^2}{4}\left\| \varPi _{H(x)}{|}{\tilde{w}}_x\rangle \right\| ^2\\ \left\| \varPi _\varTheta \left( {|}w_0\rangle -\varPi _{(\ker A)^\bot }\varPi _{H(x)^\bot }{|}{\tilde{w}}_x\rangle \right) \right\| ^2\le & {} \frac{\varTheta ^2}{4}\left\| {|}{\tilde{w}}_x\rangle \right\| ^2\\ \left\| \varPi _\varTheta \left( {|}w_0\rangle -\varPi _{(\ker A)^\bot }\frac{(\omega _x A)^\dagger }{w_-(x)}\right) \right\| ^2\le & {} \frac{\varTheta ^2}{4}\left\| {|}\tilde{w}_x\rangle \right\| ^2. \end{aligned}$$

In the last step, we used the fact that \(\frac{(\omega _x A)^\dagger }{w_-(x)}=\varPi _{H(x)^\bot }{|}{\tilde{w}}_x\rangle \), by Theorem 8. Next note that \(\varPi _{(\ker A)^\bot } (\omega _x A)^\dagger = (\omega _x A)^\dagger \) and \(\varPi _{H(x)^\bot }(\omega _x A)^\dagger = (\omega _x A)^\dagger \), so \(U(\omega _x A)^\dagger = (\omega _x A)^\dagger \), and therefore, \(\varPi _\varTheta (\omega _x A)^\dagger = (\omega _x A)^\dagger \). Thus:

$$\begin{aligned} \left\| \varPi _\varTheta {|}w_0\rangle -\frac{(\omega _x A)^\dagger }{w_-(x)} \right\| ^2\le & {} \frac{\varTheta ^2}{4}\left\| {|}{\tilde{w}}_x\rangle \right\| ^2\\ \left\| \varPi _\varTheta {|}w_0\rangle \right\| ^2+\frac{1}{w_-(x)}-2\frac{1}{w_-(x)}{\langle }w_0|\varPi _\varTheta (\omega _x A)^\dagger\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_+(x)\\ \left\| \varPi _\varTheta {|}w_0\rangle \right\| ^2+\frac{1}{w_-(x)}-2\frac{1}{w_-(x)} (\omega _x A{|}w_0\rangle )^\dagger\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_+(x)\\ \left\| \varPi _\varTheta {|}w_0\rangle \right\| ^2+\frac{1}{w_-(x)}-2\frac{1}{w_-(x)} (\omega _x \tau )^\dagger\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_+(x)\\ \left\| \varPi _\varTheta {|}w_0\rangle \right\| ^2\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_+(x) +\frac{1}{w_-(x)}, \end{aligned}$$

where in the last line we used the fact that \(\omega _x\tau =1\). \(\square \)

Lemma 4

Let P be a normalized span program on \([q]^n\). For any \(x\in [q]^n\),

$$\begin{aligned} \left\| \varPi _0^x{|}w_0\rangle \right\| ^2= \frac{1}{w_-(x)}. \end{aligned}$$

In particular, for any \(x\in P_1\), \(\left\| \varPi _0^x{|}w_0\rangle \right\| =0\).

Proof

By Lemma 3, we have \(\left\| \varPi ^x_0{|}w_0\rangle \right\| ^2\le \frac{1}{w_-(x)}\). To see the other direction, let \(\omega _x\) be an optimal zero-error negative witness for x (if none exists, then \(w_-(x)=\infty \) and the statement is vacuously true). Define \({|}u\rangle =(\omega _x A)^\dagger \). By the proof of Lemma 3, \(U{|}u\rangle ={|}u\rangle \). We have \({{\langle }u\vert }w_0\rangle =\omega _x A{|}w_0\rangle =\omega _x\tau =1\) and \(\left\| {|}u\rangle \right\| ^2=\left\| \omega _x A \right\| ^2=w_-(x),\) so we have: \(\left\| \varPi ^x_0{|}w_0\rangle \right\| ^2\ge \left\| \frac{{|}u\rangle {\langle }u|}{\left\| {|}u\rangle \right\| ^2}{|}w_0\rangle \right\| ^2=\frac{1}{w_-(x)}\). \(\square \)
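Lemma 4 is also easy to confirm numerically, by comparing the overlap of \({|}w_0\rangle \) with the 1-eigenspace of U(P, x) against \(1/w_-(x)\), the latter computed by constrained minimization as in the sketch following the proof of Theorem 9 (again, `kernel` is our own helper):

```python
import numpy as np

def kernel(M, tol=1e-10):
    _, s, Vt = np.linalg.svd(M)
    return Vt[int((s > tol).sum()):].T

rng = np.random.default_rng(5)
A, tau = rng.standard_normal((3, 7)), rng.standard_normal(3)
tau = tau / np.linalg.norm(np.linalg.pinv(A) @ tau)   # normalize: N_+(P) = 1
w0 = np.linalg.pinv(A) @ tau                          # |w_0>, now unit norm

mask = np.zeros(7); mask[:2] = 1                      # H(x) small, so x in P_0
Pix = np.diag(mask)
K = kernel(A)
U = (2 * K @ K.T - np.eye(7)) @ (2 * Pix - np.eye(7)) # U(P, x)

vals, vecs = np.linalg.eig(U)
cols = vecs[:, np.abs(np.angle(vals)) < 1e-8]         # 1-eigenspace of U
P0 = cols @ np.linalg.pinv(cols)
overlap = np.linalg.norm(P0 @ w0) ** 2

N = kernel((A @ Pix).T).T                             # omega A Pi_{H(x)} = 0
G = (N @ A) @ (N @ A).T
y = np.linalg.solve(G, N @ tau)
y = y / ((N @ tau) @ y)                               # omega tau = 1
print(overlap, 1 / (y @ G @ y))                       # equal: 1/w_-(x)
```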

Positive Span Programs We now prove results analogous to Lemmas 3 and 4 for the unitary \(U'(P,x)\). For any angle \(\varTheta \in [0,\pi )\), we define \(\overline{\varPi }^x_{\varTheta }\) as the projector onto the \(\theta \)-phase spaces of \(U'(P,x)\) for which \(|\theta |\le \varTheta \).

Lemma 5

Let P be a normalized span program on \([q]^n\). For any \(x\in [q]^n\),

$$\begin{aligned} \left\| \overline{\varPi }^x_\varTheta {|}w_0\rangle \right\| ^2\le \frac{\varTheta ^2}{4}\tilde{w}_-(x)+\frac{1}{w_+(x)}. \end{aligned}$$

In particular, if \(x\in P_0\), then \(\left\| \overline{\varPi }^x_\varTheta {|}w_0\rangle \right\| ^2\le \frac{\varTheta ^2}{4}w_-(x)\).

Proof

If \(x\in P_0\), then let \(\omega _x\) be an optimal exact negative witness for x, so \(\omega _x A\varPi _{H(x)}=0\), and thus, by the effective spectral gap lemma (Lemma 1),

$$\begin{aligned} \left\| \overline{\varPi }_\varTheta ^x\varPi _T(\omega _x A)^\dagger \right\| ^2\le \frac{\varTheta ^2}{4}\left\| \omega _x A \right\| ^2=\frac{\varTheta ^2}{4}w_-(x). \end{aligned}$$

We have \(\omega _x A\varPi _T=\omega _x A(\varPi _{\ker A}+{|}w_0\rangle {\langle }w_0|)=\omega _x A{|}w_0\rangle {\langle }w_0|=\omega _x\tau {\langle }w_0|={\langle }w_0|\), so \(\left\| \overline{\varPi }_\varTheta ^x{|}w_0\rangle \right\| ^2\le \frac{\varTheta ^2}{4}w_-(x)\).

Suppose \(x\in P_1\), and let \({|}w_x\rangle \) be an optimal zero-error positive witness for x, and \({\tilde{\omega }}_x\) an optimal min-error negative witness for x. By Theorem 9, we have \(\frac{{|}w_x\rangle }{w_+(x)}=\varPi _{H(x)}({\tilde{\omega }}_x A)^\dagger \). Since \(\varPi _{H(x)}({\tilde{\omega }}_x A\varPi _{H(x)^\bot })^\dagger =0\), we have, by Lemma 1,

$$\begin{aligned} \left\| \overline{\varPi }^x_\varTheta \varPi _T({\tilde{\omega }}_x A\varPi _{H(x)^\bot })^\dagger \right\| ^2\le & {} \frac{\varTheta ^2}{4}\left\| {\tilde{\omega }}_x A\varPi _{H(x)^\bot } \right\| ^2\\ \left\| \overline{\varPi }^x_\varTheta \varPi _T\left( ({\tilde{\omega }}_x A)^\dagger - \frac{{|}w_x\rangle }{w_+(x)}\right) \right\| ^2\le & {} \frac{\varTheta ^2}{4}\left\| {\tilde{\omega }}_x A \right\| ^2\\ \left\| \overline{\varPi }^x_\varTheta \varPi _T({\tilde{\omega }}_x A)^\dagger - \frac{{|}w_x\rangle }{w_+(x)} \right\| ^2\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_-(x).\\ \end{aligned}$$

In the last line we used the fact that \(\varPi _T{|}w_x\rangle =\varPi _{H(x)}{|}w_x\rangle ={|}w_x\rangle \), so \(U'{|}w_x\rangle ={|}w_x\rangle \), and thus \(\overline{\varPi }^x_\varTheta {|}w_x\rangle ={|}w_x\rangle \).

Note that \({\tilde{\omega }}_xA\varPi _T={\tilde{\omega }}_xA(\varPi _{\ker A}+{|}w_0\rangle {\langle }w_0|)={\tilde{\omega }}_xA{|}w_0\rangle {\langle }w_0|={\tilde{\omega }}_x\tau {\langle }w_0|={\langle }w_0|\). Thus, we can continue from above as:

$$\begin{aligned} \left\| \overline{\varPi }^x_\varTheta {|}w_0\rangle - \frac{{|}w_x\rangle }{w_+(x)} \right\| ^2\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_-(x)\\ \left\| \overline{\varPi }^x_\varTheta {|}w_0\rangle \right\| ^2+\left\| \frac{{|}w_x\rangle }{w_+(x)} \right\| ^2 -\frac{2}{w_+(x)}{\langle }w_0|\overline{\varPi }^x_\varTheta {|}w_x\rangle\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_-(x)\\ \left\| \overline{\varPi }^x_\varTheta {|}w_0\rangle \right\| ^2+\frac{1}{w_+(x)} -\frac{2}{w_+(x)}{{\langle }w_0\vert }w_x\rangle\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_-(x)\\ \left\| \overline{\varPi }^x_\varTheta {|}w_0\rangle \right\| ^2\le & {} \frac{\varTheta ^2}{4}{\tilde{w}}_-(x)+\frac{1}{w_+(x)},\\ \end{aligned}$$

where in the last line we used the fact that \({{\langle }w_0\vert }w_x\rangle =1\). \(\square \)

Lemma 6

Let P be a normalized span program on \([q]^n\). For any \(x\in [q]^n\),

$$\begin{aligned} \left\| \overline{\varPi }^x_0{|}w_0\rangle \right\| ^2=\frac{1}{w_+(x)}. \end{aligned}$$

In particular, if \(x\in P_0\), then \(\left\| \overline{\varPi }_0^x{|}w_0\rangle \right\| =0\).

Proof

By Lemma 5, \(\left\| \overline{\varPi }^x_0{|}w_0\rangle \right\| ^2\le \frac{1}{w_+(x)}\). For the other direction, let \({|}w_x\rangle ={|}w_0\rangle +{|}w_0^\bot \rangle \) be an optimal zero-error positive witness for x, where \({|}w_0^\bot \rangle \) is orthogonal to \({|}w_0\rangle \) (if none exists, then \(w_+(x)=\infty \) and the upper bound already gives the statement). Since \({|}w_x\rangle \in H(x)\cap T\), \(U'{|}w_x\rangle ={|}w_x\rangle \), so \(\left\| \overline{\varPi }^x_0{|}w_0\rangle \right\| ^2 \ge \left\| \frac{{|}w_x\rangle {\langle }w_x|}{\left\| {|}w_x\rangle \right\| ^2}{|}w_0\rangle \right\| ^2 = \frac{|{{\langle }w_x\vert }w_0\rangle |^2}{\left\| {|}w_x\rangle \right\| ^2} = \frac{1}{w_+(x)}\), since \({{\langle }w_x\vert }w_0\rangle =\left\| {|}w_0\rangle \right\| ^2=1\). \(\square \)

4.2 Algorithms for Approximate Span Programs

Using the spectral analysis from Sect. 4.1, we can design an algorithm that decides a function that is approximated by a span program. We will give details for the negative case, using Lemmas 3 and 4. A nearly identical argument proves the analogous statement for the positive case, using Lemmas 5 and 6 instead.

Throughout this section, fix a decision problem f on \([q]^n\), and let P be a normalized span program that negatively \(\lambda \)-approximates f. By Lemmas 3 and 4, it is possible to distinguish between the cases \(f(x)=0\), in which \(\frac{1}{w_-(x)}\ge \frac{1}{W_-}\), and \(f(x)=1\), in which \(\frac{1}{w_-(x)}\le \frac{\lambda }{W_-}\), using phase estimation to sufficient precision, followed by amplitude estimation of the amplitude on 0 in the phase register. We give details in the following lemma.

Lemma 7

Let P be a normalized span program that negatively \(\lambda \)-approximates f. Then the quantum query complexity of f is \(O\left( \frac{1}{(1-\lambda )^{3/2}}W_-\sqrt{\widetilde{W}_+}\log \frac{W_-}{1-\lambda }\right) \).

Proof

Let \(U(P,x)=\sum _{j=1}^me^{i\theta _j}{|}\psi _j\rangle {\langle }\psi _j|\), and let \({|}w_0\rangle =\sum _{j=1}^m\alpha _j{|}\psi _j\rangle \). Then applying phase estimation (Theorem 1) to precision \(\varTheta =\sqrt{\frac{4(1-\lambda )}{3W_-\widetilde{W}_+}}\) and error \(\varepsilon =\frac{1}{6}\frac{1-\lambda }{W_-}\) produces a state \({|}w_0'\rangle =\sum _{j=1}^m\alpha _j{|}\psi _j\rangle {|}\omega _j\rangle \) such that if \(\theta _j=0\), then \({|}\omega _j\rangle ={|}0\rangle \), and if \(|\theta _j|>\varTheta \) then \(|{{\langle }\omega _j\vert }0\rangle |^2\le \varepsilon \). Let \(\varLambda _0\) be the projector onto states with 0 in the phase register. We have: \(\left\| \varLambda _0{|}w_0'\rangle \right\| ^2=\sum _{j=1}^m|\alpha _j|^2|{{\langle }0\vert }\omega _j\rangle |^2.\) By Lemma 4, we have \(\left\| \varPi _0^x{|}w_0\rangle \right\| ^2=\frac{1}{w_-(x)}\), so if \(x\in f^{-1}(0)\), we have:

$$\begin{aligned} \left\| \varLambda _0{|}w_0'\rangle \right\| ^2\ge \sum _{j:\theta _j=0}|\alpha _j|^2|{{\langle }0\vert }0\rangle |^2=\left\| \varPi _0^x{|}w_0\rangle \right\| ^2=\frac{1}{w_-(x)}\ge \frac{1}{W_-}=:p_0. \end{aligned}$$

On the other hand, suppose \(x\in f^{-1}(1)\). Since P negatively \(\lambda \)-approximates f and \(x\in f^{-1}(1)\), \(w_-(x,P)\ge \frac{1}{\lambda }W_-(f,P)\). By Lemma 3, we have

$$\begin{aligned} \left\| \varPi _\varTheta ^x{|}w_0\rangle \right\| ^2 \le \frac{1}{w_-(x,P)}+\frac{\varTheta ^2}{4}\tilde{w}_+(x,P) \le \frac{\lambda }{W_-}+\frac{1-\lambda }{3W_-\widetilde{W}_+}\widetilde{W}_+=\frac{1}{3}\frac{1+2\lambda }{W_-} \end{aligned}$$

and thus

$$\begin{aligned} \left\| \varLambda _0{|}w_0'\rangle \right\| ^2\le & {} \sum _{j:|\theta _j|\le \varTheta }|\alpha _j|^2+\sum _{j:|\theta _j|>\varTheta }|\alpha _j|^2|{{\langle }\omega _j\vert }0\rangle |^2\\\le & {} \left\| \varPi _\varTheta ^x{|}w_0\rangle \right\| ^2+\varepsilon \sum _{j:|\theta _j|>\varTheta }|\alpha _j|^2 \le \frac{1+2\lambda }{3W_-}+\frac{1-\lambda }{6W_-}=:p_1. \end{aligned}$$

By Corollary 1, we can distinguish between these cases using \(O\left( \frac{\sqrt{p_0}}{p_0-p_1}\right) \) calls to phase estimation, each of which costs \(O\left( \frac{1}{\varTheta }\log \frac{1}{\varepsilon }\right) \) calls to U. In this case, we have

$$\begin{aligned} p_0-p_1=\frac{1-\frac{1}{3}-\frac{2}{3}\lambda -\frac{1}{6}+\frac{1}{6}\lambda }{W_-}=\frac{1}{2}\frac{1-\lambda }{W_-}. \end{aligned}$$
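Since the constants matter here, this arithmetic is worth double-checking; a quick symbolic sanity check using sympy:

```python
from sympy import symbols, simplify

lam, W = symbols('lam W', positive=True)
p0 = 1 / W
p1 = (1 + 2 * lam) / (3 * W) + (1 - lam) / (6 * W)
assert simplify(p0 - p1 - (1 - lam) / (2 * W)) == 0  # i.e., p0 - p1 = (1 - lam)/(2 W_-)
```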

The total number of calls to U is, up to constant factors:

$$\begin{aligned} \frac{\sqrt{p_0}}{p_0-p_1}\frac{1}{\varTheta }\log \frac{1}{\varepsilon }=\frac{{W_-}}{\sqrt{W_-}(1-\lambda )}\sqrt{\frac{W_-\widetilde{W}_+}{1-\lambda }}\log \frac{W_-}{1-\lambda }=\frac{W_-\sqrt{\widetilde{W}_+}}{(1-\lambda )^{3/2}}\log \frac{W_-}{1-\lambda }. \end{aligned}$$

\(\square \)
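Schematically, the procedure just analyzed looks as follows. This is only a sketch: `estimate_amp` is an assumed black box wrapping the phase estimation of Theorem 1 together with amplitude estimation, and thresholding at the midpoint of \(p_0\) and \(p_1\) stands in for the distinguishing procedure of Corollary 1.

```python
def decide_negative(estimate_amp, W_minus, W_plus_tilde, lam):
    """Schematic of the algorithm of Lemma 7 (negative case); a sketch only.

    `estimate_amp(Theta, eps)` is an assumed black box: phase estimation of
    U(P, x) on |w_0> to precision Theta and error eps, followed by an
    estimate of ||Lambda_0 |w_0'>||^2, the squared amplitude on 0 in the
    phase register.
    """
    Theta = (4 * (1 - lam) / (3 * W_minus * W_plus_tilde)) ** 0.5
    eps = (1 - lam) / (6 * W_minus)
    p0 = 1 / W_minus                            # lower bound when f(x) = 0
    p1 = (1 + 2 * lam) / (3 * W_minus) + eps    # upper bound when f(x) = 1
    # Midpoint thresholding stands in for the distinguisher of Corollary 1.
    return 0 if estimate_amp(Theta, eps) > (p0 + p1) / 2 else 1
```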

We would like to extend this to non-normalized span programs; we also note that this expression is not symmetric in \(W_-\) and \(\widetilde{W}_+\): it scales as \(W_-\sqrt{\widetilde{W}_+}\) rather than \(\sqrt{W_-\widetilde{W}_+}\). Using Theorem 10, we can normalize any span program while also scaling the positive and negative witnesses, which addresses both issues. This gives us the following.

Corollary 3

Let P be any span program that negatively \(\lambda \)-approximates f. Then the quantum query complexity of f is at most

$$\begin{aligned} O\left( \frac{1}{(1-\lambda )^{3/2}}\sqrt{W_-(f,P)\widetilde{W}_+(f,P)}\log \frac{1}{1-\lambda }\right) . \end{aligned}$$

Proof

We will use the scaled span program described in Theorem 10. Let \(\beta =\frac{1}{\sqrt{W_-(f,P)}}\). Then \(P^\beta \) is a normalized span program with

$$\begin{aligned} W_-(f,P^\beta )=\max _{x\in f^{-1}(0)}w_-(x,P^\beta )= {\beta ^2} \max _{x\in f^{-1}(0)}w_-(x,P)+1= \frac{1}{W_-}W_-+1=2, \end{aligned}$$

and

$$\begin{aligned} \widetilde{W}_+(f,P^\beta )= & {} \max _{x\in f^{-1}(1)}\tilde{w}_+(x,P^\beta )\le \frac{1}{\beta ^2} \max _{x\in f^{-1}(1)}\tilde{w}_+(x,P)+2\\= & {} W_-(f,P)\widetilde{W}_+(f,P)+2. \end{aligned}$$

If we define \(\lambda ^{(\beta )}:=\frac{\max _{x\in f^{-1}(0)}w_-(x,P^\beta )}{\min _{x\in f^{-1}(1)}w_-(x,P^\beta )}=\frac{\beta ^2W_-(f,P)+1}{\beta ^2\frac{1}{\lambda }W_-(f,P)+1}=\frac{2}{\frac{1}{\lambda }+1}\), then clearly \(P^{\beta }\) negatively \(\lambda ^{(\beta )}\)-approximates f, so we can apply Lemma 7. We have \(\frac{1}{1-\lambda ^{(\beta )}}=\frac{1}{1-\frac{2\lambda }{1+\lambda }}=\frac{1+\lambda }{1-\lambda }\) so we can decide f in query complexity (neglecting constants):

$$\begin{aligned}&\left( \frac{1+\lambda }{1-\lambda }\right) ^{\frac{3}{2}}\!\!\sqrt{2\left( W_-(f,P)\widetilde{W}_+(f,P)+2\right) }\log 2\frac{1+\lambda }{1-\lambda }\\&\quad = \frac{1}{(1-\lambda )^{\frac{3}{2}}}\sqrt{W_-(f,P)\widetilde{W}_+(f,P)}\log \frac{1}{1-\lambda }. \end{aligned}$$

\(\square \)

By computations analogous to Lemma 7 and Corollary 3 (using \(\beta =\sqrt{W_+}\)), we can show that if P positively \(\lambda \)-approximates f, then f has quantum query complexity \(O\left( \frac{1}{(1-\lambda )^{3/2}}\sqrt{W_+\widetilde{W}_-}\log \frac{1}{1-\lambda }\right) \). This and Corollary 3 imply Theorem 5.

4.3 Estimating the Witness Size

Using the algorithms for deciding approximate span programs (Theorem 5) as a black box, we can construct a quantum algorithm that estimates the positive or negative witness size of an input using standard algorithmic techniques. We give the full proof for the case of positive witness size; the negative case is virtually identical. This proves Theorem 6.

Theorem 11

(Estimating the Witness Size) Fix \(f:X\subseteq [q]^n\rightarrow \mathbb {R}_{>0}\). Let P be a span program on \([q]^n\) such that for all \(x\in X\), \(f(x)=w_+(x,P)\). The quantum query complexity of estimating f to relative accuracy \(\varepsilon \) is \(\widetilde{O}\left( \frac{\sqrt{w_+(x)\widetilde{W}_-(P)}}{\varepsilon ^{3/2}}\right) \).

Proof

We will estimate \(e(x)=\frac{1}{w_+(x)}\). The basic idea is to use the algorithm from Theorem 5 to narrow down the interval in which the value of e(x) may lie. Assuming that the span program is normalized (which is without loss of generality, since normalizing by scaling \(\tau \) does not impact relative accuracy), we can begin with the interval [0, 1]. We stop when we reach an interval \([e_{\min },e_{\max }]\) such that the midpoint \(\tilde{e}=\frac{e_{\max }+e_{\min }}{2}\) satisfies \((1-\varepsilon )e_{\max }\le \tilde{e}\le (1+\varepsilon )e_{\min }\).

Let \(\mathtt {Decide}(P,w,\lambda )\) be the quantum algorithm from Theorem 5 that decides the (partial) function \(g:P_1\rightarrow \{0,1\}\) defined by \(g(x)=1\) if \(w_+(x)\le w\) and \(g(x)=0\) if \(w_+(x)\ge \frac{w}{\lambda }\). We will amplify the success probability so that with high probability, \(\mathtt {Decide}\) returns g(x) correctly every time it is called by the algorithm, and we will assume that this is the case. The full witness estimation algorithm consists of repeated calls to \(\mathtt {Decide}\) as follows:

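The following minimal Python sketch reconstructs the loop analyzed below; `decide(w, lam)` is an assumed black box standing for the amplified \(\mathtt {Decide}(P,w,\lambda )\):

```python
def estimate_positive_witness(decide, eps):
    """Sketch of the interval search analyzed below. `decide(w, lam)` stands
    for the amplified Decide(P, w, lam): it returns 1 if w_+(x) <= w and
    0 if w_+(x) >= w / lam (either answer is allowed in between)."""
    e_min, e_max = 0.0, 1.0                 # invariant: e_min <= 1/w_+(x) <= e_max
    while e_max - e_min > eps * e_min:      # terminate once Delta_T <= eps * e_min
        e0 = e_max / 3 + 2 * e_min / 3      # trisection points of [e_min, e_max]
        e1 = 2 * e_max / 3 + e_min / 3
        if decide(1 / e1, e0 / e1) == 1:    # w_i = 1/e1^(i), lam_i = e0^(i)/e1^(i)
            e_min = e0                      # now 1/w_+(x) in [e0, e_max]
        else:
            e_max = e1                      # now 1/w_+(x) in [e_min, e1]
    return 2 / (e_max + e_min)              # 1/e~, an estimate of w_+(x)
```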

We can see by induction that for every i, \(e_{\min }^{(i)}\le \frac{1}{w_+(x)}\le e_{\max }^{(i)}\). This is certainly true for \(i=1\), since \(w_+(x)\ge \left\| {|}w_0\rangle \right\| ^2=1\). Suppose it is true at step i. At step i we run \(\mathtt {Decide}(P,w_i,\lambda _i)\) with \(w_i={1}/{e_1^{(i)}}\) and \(\frac{w_i}{\lambda _i}={1}/{e_0^{(i)}}\). If \(\frac{1}{w_+(x)}\ge e_1^{(i)}\), then \(\mathtt {Decide}\) returns 1, so we have \(\frac{1}{w_+(x)}\in [e_{0}^{(i)},e_{\max }^{(i)}]=[e_{\min }^{(i+1)},e_{\max }^{(i+1)}]\). If \(\frac{1}{w_+(x)}\le e_0^{(i)}\), then \(\mathtt {Decide}\) returns 0, so we have \(\frac{1}{w_+(x)}\in [e_{\min }^{(i)},e_{1}^{(i)}]=[e_{\min }^{(i+1)},e_{\max }^{(i+1)}]\). Otherwise, \(\frac{1}{w_+(x)}\in [e_0^{(i)},e_1^{(i)}]\), which is a subset of both \([e_{0}^{(i)},e_{\max }^{(i)}]\) and \([e_{\min }^{(i)},e_{1}^{(i)}]\), so in any case, \(\frac{1}{w_+(x)}\in [e_{\min }^{(i+1)},e_{\max }^{(i+1)}]\).

To see that the algorithm terminates, let \(\varDelta _i=e_{\max }^{(i)}-e_{\min }^{(i)}\) denote the length of the remaining interval at round i. We either have \(\varDelta _{i+1}=e_{\max }^{(i)}-e_0^{(i)}=e_{\max }^{(i)}-\frac{1}{3}e_{\max }^{(i)}-\frac{2}{3}e_{\min }^{(i)}=\frac{2}{3}\varDelta _i\), or \(\varDelta _{i+1}=e_1^{(i)}-e_{\min }^{(i)}=\frac{2}{3}e_{\max }^{(i)}+\frac{1}{3}e_{\min }^{(i)}-e_{\min }^{(i)}=\frac{2}{3}\varDelta _i\), so \(\varDelta _i=(2/3)^{i-1}\). We terminate at the smallest T such that \(\varDelta _T=e_{\max }^{(T)}-e_{\min }^{(T)}\le \varepsilon e_{\min }^{(T)}\). Since \(e_{\max }^{(T)}\ge \frac{1}{w_+(x)}\), we have \(e_{\min }^{(T)}\ge \frac{1}{w_+(x)}-\varDelta _T\), so this condition holds as soon as \((2/3)^{T-1}=\varDelta _T\le \frac{\varepsilon }{(1+\varepsilon )w_+(x)}\). Thus we terminate before \(T=\left\lceil \log _{3/2}\frac{(1+\varepsilon )w_+(x)}{\varepsilon }\right\rceil +1=O\left( \log \frac{w_+(x)}{\varepsilon }\right) \).

Next, we show that, assuming \(\mathtt {Decide}\) does not err, the estimate is correct to within \(\varepsilon \). Let \(\tilde{e}=\frac{1}{2}(e^{(T)}_{\max }+e^{(T)}_{\min })\) be the returned estimate. Recall that we only terminate when \(e_{\max }^{(T)}\le (1+\varepsilon )e_{\min }^{(T)}\). We have

$$\begin{aligned} \frac{1}{\tilde{e}}=\frac{2}{e_{\max }^{(T)}+e_{\min }^{(T)}}\le \frac{2}{e_{\max }^{(T)}\left( 1+\frac{1}{1+\varepsilon }\right) }\le \frac{2}{\frac{1}{w_+(x)}\left( \frac{2+\varepsilon }{1+\varepsilon }\right) }\le \left( 1+\varepsilon \right) w_+(x), \end{aligned}$$

and

$$\begin{aligned} \frac{1}{\tilde{e}}\ge & {} \frac{2}{e_{\min }^{(T)}(1+1+\varepsilon )} \ge \frac{1}{\frac{1}{w_+(x)}(1+\varepsilon /2)}\\= & {} \left( 1-\frac{\varepsilon /2}{1+\varepsilon /2}\right) w_+(x)\ge \left( 1-\frac{\varepsilon }{2}\right) w_+(x). \end{aligned}$$

Thus, \(|1/\tilde{e}-w_+(x)|\le \varepsilon w_+(x)\).

By Theorem 5, \(\mathtt {Decide}(P,w,\lambda )\) runs in cost \(O\left( \frac{\sqrt{w \widetilde{W}_-}}{(1-\lambda )^{3/2}}\log \frac{1}{1-\lambda }\right) \). Let \(w_i=1/e_1^{(i)}\) and \(\lambda _i=e_0^{(i)}/e_1^{(i)}\) be the values used at the i\(^\mathrm {th}\) iteration. Since \(e_1^{(i)}\le e_{\max }^{(i)}\le \frac{1}{w_+(x)}+\varDelta _i\), we have

$$\begin{aligned} \frac{1}{1-\lambda _i}= & {} \frac{e_1^{(i)}}{e_1^{(i)}-e_0^{(i)}}\le \frac{\frac{1}{w_+(x)}+\varDelta _i}{\frac{2}{3}e_{\max }^{(i)}+\frac{1}{3}e_{\min }^{(i)}-\frac{1}{3}e_{\max }^{(i)}-\frac{2}{3}e_{\min }^{(i)}}\\= & {} \frac{3}{w_+(x)\varDelta _i}+3=O\left( \frac{1}{\varepsilon }\right) , \end{aligned}$$

since \(\varDelta _i=(2/3)^{i-1}\ge (2/3)^{T-1}=\varOmega \left( \frac{\varepsilon }{w_+(x)}\right) \). Next, observe that \(\frac{\sqrt{w_i}}{(1-\lambda _i)^{3/2}}=\frac{e_1^{(i)}}{(e_1^{(i)}-e_0^{(i)})^{3/2}}\le \left( \frac{1}{w_+(x)}+\varDelta _i\right) \frac{3^{3/2}}{\varDelta _i^{3/2}},\) so, ignoring the \(\log \frac{1}{1-\lambda _i}=O(\log \frac{1}{\varepsilon })\) factor, the cost of the i\(^\mathrm {th}\) iteration can be computed as:

$$\begin{aligned} C_i= & {} \frac{\sqrt{w_i\widetilde{W}_-}}{(1-\lambda _i)^{3/2}} \le \sqrt{\widetilde{W}_-}\left( \frac{1}{w_+(x)}+\varDelta _i\right) \frac{3^{3/2}}{\varDelta _i^{3/2}}\\= & {} 3^{3/2}\frac{\sqrt{\widetilde{W}_-}}{w_+(x)}\left( \frac{3}{2}\right) ^{\frac{3}{2}(i-1)}+3^{3/2}\sqrt{\widetilde{W}_-}\left( \frac{3}{2}\right) ^{\frac{1}{2}(i-1)}. \end{aligned}$$

We can thus compute the total cost (neglecting logarithmic factors):

$$\begin{aligned} \sum _{i=1}^TC_i\!\le & {} \!\frac{\sqrt{\widetilde{W}_-}}{w_+(x)}\sum _{i=1}^T\left( \frac{3}{2}\right) ^{\frac{3}{2}(i-1)}+\sqrt{\widetilde{W}_-}\sum _{i=1}^T\left( {\frac{3}{2}}\right) ^{\frac{1}{2}(i-1)}\\\le & {} \frac{\sqrt{\widetilde{W}_-}}{w_+(x)}\frac{\left( \frac{3}{2}\right) ^{\frac{3}{2}T}-1}{\left( \frac{3}{2}\right) ^{3/2}-1}+\sqrt{\widetilde{W}_-}\frac{\left( \frac{3}{2}\right) ^{\frac{1}{2}T}-1}{\left( \frac{3}{2}\right) ^{1/2}-1}\\\le & {} \!O\left( \frac{\sqrt{\widetilde{W}_-}}{w_+(x)}\left( \frac{w_+(x)}{\varepsilon }\right) ^{3/2}+\sqrt{\widetilde{W}_-}\left( \frac{w_+(x)}{\varepsilon }\right) ^{1/2}\right) \\= & {} O\left( \frac{\sqrt{\widetilde{W}_-w_+(x)}}{\varepsilon ^{3/2}}\right) , \end{aligned}$$

using the fact that \((2/3)^{T}=\varTheta \left( \frac{\varepsilon }{w_+(x)}\right) \).

Finally, we have been assuming that \(\mathtt {Decide}\) returns the correct bit on every call. We now justify this assumption. At round i, we will amplify the success probability of \(\mathtt {Decide}\) to \(1-\frac{1}{9}(2/3)^{i-1}\), incurring a factor of \(\log (9(3/2)^{i-1})=O(\log \frac{w_+(x)}{\varepsilon })\) in the complexity. Then the total error is at most:

$$\begin{aligned} \sum _{i=1}^T \frac{1}{9}(2/3)^{i-1}=\frac{1}{9}\frac{1-(2/3)^{T}}{1-\frac{2}{3}}=\frac{1}{3}\left( 1-(2/3)^{T}\right) \le \frac{1}{3}. \end{aligned}$$

Thus, with probability \(\ge 2/3\), \(\mathtt {Decide}\) never errs, and the algorithm is correct. \(\square \)

4.4 Span Program Phase Gap

The scaling in the error from Theorem 11, \({1}/{\varepsilon ^{3/2}}\), is not ideal. For instance, we showed in Sect. 3.3 how to construct a quantum algorithm for approximate counting based on a simple span program for the OR function with complexity that scales like \({1}/{\varepsilon ^{3/2}}\) in the error, whereas the best quantum algorithm for this task has complexity scaling as \({1}/{\varepsilon }\) in the error. However, the following theorem, which is a corollary to Lemmas 4 and 6, gives an alternative analysis of the complexity of the algorithm in Theorem 11 that may be better in some cases, and in particular, has the more natural error dependence \({1}/{\varepsilon }\).

Theorem 12

Fix \(f:X\subseteq [q]^n\rightarrow \mathbb {R}_{>0}\). Let P be a normalized span program on \([q]^n\) such that \(X\subseteq P_0\), and \(\forall x\in X\), \(w_-(x,P)=f(x)\). Define \(\varDelta (f)=\min _{x\in X}\varDelta (U(P,x))\). Then there is a quantum algorithm that estimates f to relative accuracy \(\varepsilon \) using \(\widetilde{O}\left( \frac{1}{\varepsilon }\frac{\sqrt{w_-(x,P)}}{\varDelta (f)}\right) \) queries. Similarly, let P be a normalized span program such that \(X\subseteq P_1\), and \(\forall x\in X\), \(w_+(x,P)=f(x)\). Define \(\varDelta '(f)=\min _{x\in X}\varDelta (U'(P,x))\). Then there is a quantum algorithm that estimates f with relative accuracy \(\varepsilon \) using \(\widetilde{O}\left( \frac{1}{\varepsilon }\frac{\sqrt{w_+(x,P)}}{\varDelta '(f)}\right) \) queries.

Proof

To estimate \(w_-(x)\), we could use phase estimation of U(P, x) applied to \({|}w_0\rangle \), with precision \(\varDelta =\varDelta (f)\) and accuracy \(\eta =\frac{\varepsilon }{8}\frac{1}{W_-(f,P)}\); however, this results in \(\log W_-\) factors, and \(W_-\) may be significantly larger than \(w_-(x)\). Instead, we will start with \(\eta =\frac{1}{2}\), and repeatedly halve it until \(\eta \approx \frac{\varepsilon }{w_-(x,P)}\).

Let \({|}w_0'\rangle \) be the result of applying phase estimation to precision \(\varDelta =\varDelta (f)\) and accuracy \(\eta \), and let \(\varLambda _0\) be the projector onto states with 0 in the phase register. We will then estimate \(\left\| \varLambda _0{|}w_0'\rangle \right\| ^2\) to relative accuracy \(\varepsilon /4\) using amplitude estimation. Since \(\varDelta \le \varDelta (U(P,x))\), we have \(\left\| \varPi _0^x{|}w_0\rangle \right\| ^2\le \left\| \varLambda _0{|}w_0'\rangle \right\| ^2\le \left\| \varPi _\varDelta ^x{|}w_0\rangle \right\| ^2+\eta =\left\| \varPi _0^x{|}w_0\rangle \right\| ^2+\eta \). By Lemma 4, we have \(\left\| \varPi _0^x{|}w_0\rangle \right\| ^2=\frac{1}{w_-(x)}\), so we will obtain an estimate \(\tilde{p}\) of \(\frac{1}{w_-(x)}\) such that

$$\begin{aligned} \left( 1-\frac{\varepsilon }{4}\right) \frac{1}{w_-(x)}\le \tilde{p}\le \left( 1+\frac{\varepsilon }{4}\right) \left( \frac{1}{w_-(x)}+\eta \right) . \end{aligned}$$

If \(\tilde{p}>2(1+\frac{\varepsilon }{4})\eta \), then we know that \(\frac{1}{w_-(x)}\ge \eta \), so we perform one more estimate with accuracy \(\eta '=\frac{\varepsilon }{8}\eta \le \frac{\varepsilon }{8}\frac{1}{w_-(x)}\) and return the resulting estimate. Otherwise, we let \(\eta '=\eta /2\) and repeat.
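In sketch form, with `estimate_amp` as an assumed black box for the phase-estimation-plus-amplitude-estimation step just described:

```python
def estimate_negative_witness(estimate_amp, eps):
    """Sketch of the adaptive loop of Theorem 12 (negative case).
    `estimate_amp(eta)` is an assumed black box combining phase estimation
    (precision Delta(f), accuracy eta) with amplitude estimation to relative
    accuracy eps/4; it returns p~ with
    (1 - eps/4)/w_-(x) <= p~ <= (1 + eps/4)(1/w_-(x) + eta)."""
    eta = 0.5
    while True:
        p = estimate_amp(eta)
        if p > 2 * (1 + eps / 4) * eta:      # certifies that 1/w_-(x) >= eta
            p = estimate_amp(eps * eta / 8)  # one final, higher-accuracy estimate
            return 1 / p                     # eps-relative estimate of w_-(x)
        eta /= 2                             # otherwise halve eta and repeat
```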

To see that we will eventually terminate, suppose \(\eta \le \frac{1}{4w_-(x)}\). Then

$$\begin{aligned} {\tilde{p}}\ge (1-\varepsilon /4)\frac{1}{w_-(x)}\ge (3/4)4\eta \ge (3/4)({4}/{5})(1+\varepsilon /4) 4\eta \ge 2(1+\varepsilon /4)\eta , \end{aligned}$$

so the algorithm terminates. Upon termination, we have

$$\begin{aligned} {\tilde{p}} \le \left( 1+\frac{\varepsilon }{4}\right) \left( \frac{1}{w_-(x)}+\eta \right) \le \left( 1+\frac{\varepsilon }{4}\right) \left( \frac{1}{w_-(x)}+\frac{\varepsilon }{8}\frac{1}{w_-(x)}\right) \le \left( 1+\frac{\varepsilon }{2}\right) \frac{1}{w_-(x)}, \end{aligned}$$

so \(|1/{\tilde{p}}-w_-(x)|\le \varepsilon w_-(x).\) By Theorems 1 and 2, the number of calls to U is:

$$\begin{aligned}&\sum _{i=0}^{\log 4w_-(x)}\frac{1}{\varDelta }\frac{\sqrt{w_-(x)}}{\varepsilon }\log 2^i+\frac{\sqrt{w_-(x)}}{\varDelta \varepsilon }\log \frac{w_-(x)}{\varepsilon }\\&\quad =\frac{1}{\varDelta }\frac{\sqrt{w_-(x)}}{\varepsilon }\left( \sum _{i=0}^{\log 4w_-(x)}i+\log \frac{w_-(x)}{\varepsilon }\right) , \end{aligned}$$

which is at most \(\frac{\sqrt{w_-(x)}}{\varDelta \varepsilon }\log ^2\frac{w_-(x)}{\varepsilon }=\widetilde{O}\left( \frac{\sqrt{w_-(x)}}{\varDelta \varepsilon }\right) \). Similarly, we can estimate \(w_+(x)\) to relative accuracy \(\varepsilon \) using \(\widetilde{O}\left( \frac{\sqrt{w_+(x)}}{\varDelta '\varepsilon }\right) \) calls to \(U'\). \(\square \)

Theorem 12 is only useful if a lower bound on the phase gap of U(P, x) or \(U'(P,x)\) can be computed. This may not always be feasible, but the following two theorems show it is sufficient to compute the spectral norm of A, and the spectral gap, or more precisely the smallest nonzero singular value, of the matrix \(A(x)=A\varPi _{H(x)}\). This may still not be an easy task, but in Sect. 5 we show that this analysis, which is very simple in the case of effective resistance, yields a better algorithm for estimating that quantity.

Theorem 13

Let P be any span program on \([q]^n\). For any \(x\in [q]^n\), we have \(\varDelta (U(P,x))\ge 2\frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}.\)

Proof

Let \(U=U(P,x)\). Consider \(-U=(2\varPi _{(\ker A)^\bot }-I)(2\varPi _{H(x)}-I)\). By Corollary 2, if D is the discriminant of \(-U\), then \(\varDelta (U)\ge 2\sigma _{\min }(D)\), so we will lower bound \(\sigma _{\min }(D)\). Since the orthogonal projector onto \((\ker A)^\bot =\mathrm {row}A\) is \(A^+A\), we have \(D=A^+A\varPi _{H(x)}=A^+A(x)\).

We have \(\sigma _{\min }(D)=\min _{{|}u\rangle \in \mathrm {row}D}\frac{\left\| D{|}u\rangle \right\| }{\left\| {|}u\rangle \right\| }\), so let \({|}u\rangle \in \mathrm {row}D\) be a unit vector that minimizes \(\left\| D{|}u\rangle \right\| \). Since \({|}u\rangle \in \mathrm {row}D\subseteq \mathrm {row}A(x)\), we have \(\left\| A(x){|}u\rangle \right\| \ge \sigma _{\min }(A(x))\). Since \(A(x){|}u\rangle \in \mathrm {col}A(x)\subseteq \mathrm {col}A=\mathrm {row}A^+\), we have

$$\begin{aligned} \sigma _{\min }(D)= & {} \left\| A^+A(x){|}u\rangle \right\| \ge \sigma _{\min }(A^+)\left\| A(x){|}u\rangle \right\| \\\ge & {} \sigma _{\min }(A^+)\sigma _{\min }(A(x))=\frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}, \end{aligned}$$

since \(\sigma _{\min }(A^+)=\frac{1}{\sigma _{\max }(A)}\). Thus \(\varDelta (U)\ge 2\frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}\). \(\square \)

Theorem 14

Let P be any span program. \(\forall x\in P_1\), \(\varDelta (U'(P,x))\ge 2\frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}\).

Proof

We have

$$\begin{aligned} -U'(P,x)^\dagger= & {} (2(I-\varPi _{\ker A\oplus \mathrm {span}\{{|}w_0\rangle \}})-I)(2\varPi _{H(x)}-I)\\= & {} (2(I-\varPi _{\ker A}-\varPi _{{|}w_0\rangle })-I)(2\varPi _{H(x)}-I), \end{aligned}$$

since \({|}w_0\rangle \in (\ker A)^\bot \), so \(-U'(P,x)^\dagger \) has discriminant:

$$\begin{aligned} D'= & {} (\varPi _{(\ker A)^\bot }-\varPi _{{|}w_0\rangle })\varPi _{H(x)}=\varPi _{(\ker A)^\bot }\varPi _{H(x)}-\varPi _{{|}w_0\rangle }\varPi _{(\ker A)^\bot }\varPi _{H(x)}\\= & {} \varPi _{{|}w_0\rangle ^\bot }D. \end{aligned}$$

Since \(x\in P_1\), let \({|}w_x\rangle =A(x)^+{|}\tau \rangle \). Then \(D{|}w_x\rangle =A^+A(x){|}w_x\rangle =A^+{|}\tau \rangle ={|}w_0\rangle \), so \({|}w_0\rangle \in \mathrm {col}D\). Let \(\{{|}\phi _0\rangle ={|}w_0\rangle ,{|}\phi _1\rangle ,\dots ,{|}\phi _{r-1}\rangle \}\) be an orthonormal basis for \(\mathrm {col}D\). Then we can write \(D=\sum _{i=0}^{r-1}{|}\phi _i\rangle {\langle }v_i|\) for \({|}v_i\rangle =D^\dagger {|}\phi _i\rangle \ne 0\) (not necessarily orthogonal). Then \(D'=\sum _{i=0}^{r-1}\varPi _{{|}w_0\rangle ^\bot }{|}\phi _i\rangle {\langle }v_i|=\sum _{i=1}^{r-1}{|}\phi _i\rangle {\langle }v_i|\), so \(\mathrm {col}D'=\mathrm {span}\{{|}\phi _1\rangle ,\dots ,{|}\phi _{r-1}\rangle \}=\{{|}\phi \rangle \in \mathrm {col}D:{{\langle }\phi \vert }w_0\rangle =0\}\). Thus:

$$\begin{aligned} \sigma _{\min }(D')= & {} \min _{{|}u\rangle \in \mathrm {col}D'}\frac{\left\| {\langle }u|D' \right\| }{\left\| {|}u\rangle \right\| } = \min _{{|}u\rangle \in \mathrm {col}D:{{\langle }w_0\vert }u\rangle =0}\frac{\left\| {\langle }u|\varPi _{{|}w_0\rangle ^\bot }D \right\| }{\left\| {|}u\rangle \right\| }\\= & {} \min _{{|}u\rangle \in \mathrm {col}D:{{\langle }w_0\vert }u\rangle =0}\frac{\left\| {\langle }u|D \right\| }{\left\| {|}u\rangle \right\| } \ge \min _{{|}u\rangle \in \mathrm {col}D}\frac{\left\| {\langle }u|D \right\| }{\left\| {|}u\rangle \right\| } = \sigma _{\min }(D). \end{aligned}$$

By the proof of Theorem 13, we have \(\sigma _{\min }(D)\ge \frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}\) and by Corollary 2, we have \(\varDelta (U'(P,x)^\dagger )=\varDelta (U'(P,x))\ge 2\sigma _{\min }(D')\ge 2\sigma _{\min }(D)\ge 2\frac{\sigma _{\min }(A(x))}{\sigma _{\max }(A)}\). \(\square \)

Combining the last three theorems, we get the following, which has Theorem 7 as a special case:

Theorem 15

Fix \(f:X\subseteq [q]^n\rightarrow \mathbb {R}_{>0}\), and define \(\kappa (f)=\max _{x\in X}\dfrac{\sigma _{\max }(A)}{\sigma _{\min }(A(x))}\). Let P be any span program on \([q]^n\) such that \(X\subseteq P_0\) (resp. \(X\subseteq P_1\)), and for all \(x\in X\), \(f(x)=w_-(x,P)\) (resp. \(f(x)=w_+(x,P)\)). Let \(N=\left\| {|}w_0\rangle \right\| ^2\). Then there is a quantum algorithm that estimates f to relative accuracy \(\varepsilon \) using \(\widetilde{O}\left( \frac{\kappa (f)}{\varepsilon }\sqrt{Nf(x)}\right) \) (resp. \(\widetilde{O}\left( \frac{\kappa (f)}{\varepsilon }\sqrt{\frac{f(x)}{N}}\right) \)) queries.

Proof

Let \(P'\) be the span program that is the same as P, but with target \(\tau '=\frac{\tau }{\sqrt{N}}\). Then it is clear that \(\frac{{|}w_0\rangle }{\sqrt{N}}\) is the minimal positive witness of \(P'\), and furthermore, it has norm 1, so \(P'\) is normalized. We can similarly see that for any \(x\in P_1\), if \({|}w_x\rangle \) is an optimal positive witness for x in P, then \(\frac{1}{\sqrt{N}}{|}w_x\rangle \) is an optimal positive witness for x in \(P'\), so \(w_+(x,P')=\frac{w_+(x,P)}{N}\). Similarly, for any \(x\in P_0\), if \(\omega _x\) is an optimal negative witness for x in P, then \(\sqrt{N}\omega _x\) is an optimal negative witness for x in \(P'\), so \(w_-(x,P')=Nw_-(x,P)\). By Theorems 13 and 14, for all \(x\in X\), \(\frac{1}{\varDelta (U(P',x))}\le \kappa (f)\) (resp. \(\frac{1}{\varDelta (U'(P',x))}\le \kappa (f)\)). The result then follows from Theorem 12. \(\square \)

5 Applications

In this section, we will demonstrate how to apply the ideas from Sect. 4 to get new quantum algorithms. Specifically, we will give upper bounds of \(\widetilde{O}(n\sqrt{R_{s,t}}/\varepsilon ^{3/2})\) and \(\widetilde{O}(n\sqrt{R_{s,t}/\lambda _2}/\varepsilon )\) on the time complexity of estimating the effective resistance, \(R_{s,t}\), between two vertices, s and t, in a graph. Unlike previous upper bounds, we study this problem in the adjacency model; however, there are similarities between the ideas of this upper bound and those of a previous quantum upper bound in the edge-list model due to Wang [23], which we discuss further at the end of this section.

A unit flow from s to t in G is a real-valued function \(\theta \) on the directed edges \(\overset{\rightarrow }{E}(G)=\{(u,v):\{u,v\}\in E(G)\}\) such that:

  1. \(\forall (u,v)\in \overset{\rightarrow }{E}\), \(\theta (u,v)=-\theta (v,u)\);

  2. \(\forall u\in [n]\setminus \{s,t\}\), \(\sum _{v\in \varGamma (u)}\theta (u,v)=0\), where \(\varGamma (u)=\{v\in [n]:\{u,v\}\in E\}\);

  3. \(\sum _{u\in \varGamma (s)}\theta (s,u)=\sum _{u\in \varGamma (t)}\theta (u,t)=1\).

Let \({\mathcal {F}}\) be the set of unit flows from s to t in G. The effective resistance from s to t in G is defined:

$$\begin{aligned} R_{s,t}(G)=\min _{\theta \in {{\mathcal {F}}}}\sum _{\{u,v\}\in E(G)}\theta (u,v)^2. \end{aligned}$$

This quantity gives the resistance of a network of unit resistors described by G, but is also an interesting quantity for graph theoretic reasons. For instance, the commute time between s and t, which is the expected number of steps in a random walk starting from s to reach t and then return to s, is exactly \(2mR_{s,t}(G)\), where m is the number of edges of G [9].
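For reference, and as a sanity check on the values computed later in this section (e.g., \(R_{s,t}(K_n)=2/n\) in Lemma 9), \(R_{s,t}\) can be computed classically via the standard identity \(R_{s,t}=(e_s-e_t)^TL_G^+(e_s-e_t)\), where \(L_G\) is the graph Laplacian and \(L_G^+\) its Moore-Penrose pseudoinverse; a minimal numpy sketch:

```python
import numpy as np

def effective_resistance(adj, s, t):
    """R_{s,t} via the identity R_{s,t} = (e_s - e_t)^T L^+ (e_s - e_t),
    where L is the graph Laplacian and L^+ its Moore-Penrose pseudoinverse."""
    L = np.diag(adj.sum(axis=1)) - adj
    e = np.zeros(len(adj))
    e[s], e[t] = 1.0, -1.0
    return float(e @ np.linalg.pinv(L) @ e)

n = 8
K_n = np.ones((n, n)) - np.eye(n)         # complete graph on n vertices
print(effective_resistance(K_n, 0, 1))    # 2/n = 0.25, as in Lemma 9 below
```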

In the adjacency model, we are given, as input, a string \(x\in \{0,1\}^{n\times n}\), representing a graph \(G_x=([n],\{\{i,j\}:x_{i,j}=1\})\) (we assume that \(x_{i,i}=0\) for all i, and \(x_{i,j}=x_{j,i}\) for all ij). The problem of st-connectivity is the following. Given as input \(x\in \{0,1\}^{n\times n}\) and \(s,t\in [n]\), decide if there exists a path from s to t in \(G_x\); that is, whether or not s and t are in the same component of \(G_x\). A span-program-based algorithm for this problem was given in [7], with time complexity \(\widetilde{O}(n\sqrt{p})\), under the promise that, if s and t are connected in \(G_x\), they are connected by a path of length \(\le p\). They use the following span program, defined on \(\{0,1\}^{n\times n}\):

$$\begin{aligned} H_{(u,v),0}= & {} \{0\},\; H_{(u,v),1}=\mathrm {span}\{{|}u,v\rangle \},\; V=\mathbb {R}^{n},\\ A= & {} \sum _{u,v\in [n]}({|}u\rangle -{|}v\rangle ){\langle }u,v|,\; {|}\tau \rangle ={|}s\rangle -{|}t\rangle . \end{aligned}$$

We have \(H=\mathrm {span}\{{|}u,v\rangle :u,v\in [n]\}\), and \(H(x)=\mathrm {span}\{{|}u,v\rangle :\{u,v\}\in E(G_x)\}\). Throughout this section, P will denote the above span program. We will use this span program to define algorithms for estimating the effective resistance. The authors of [7] even show how to efficiently implement a unitary similar to U(P, x), giving a time-efficient algorithm. In “Appendix B”, we adapt their proof to our setting, showing how to efficiently implement \(U'(P^\beta ,x)\) for any \(n^{-O(1)}\le \beta \le n^{O(1)}\) and efficiently construct the initial state \({|}w_0\rangle \), making our algorithms time efficient as well.

The effective resistance between s and t is related to st-connectivity by the fact that if s and t are not connected, then \(R_{s,t}\) is infinite (there is no flow from s to t) and if s and t are connected then \(R_{s,t}\) is related to the number and length of paths from s to t. In particular, if s and t are connected by a path of length p, then \(R_{s,t}(G)\le p\) (take the unit flow that simply travels along this path). In general, if s and t are connected in G, then \(\frac{2}{n}\le R_{s,t}(G)\le n-1\). The span program for st-connectivity is amenable to the task of estimating the effective resistance due to the following.

Lemma 8

[7] For any graph \(G_x\) on [n], \(x\in P_1\) if and only if s and t are connected, and in that case, \(w_+(x,P)=\frac{1}{2}R_{s,t}(G_x)\).

The following is a near immediate consequence of Lemma 8 and Theorem 6.

Theorem 16

There exists a quantum algorithm for estimating \(R_{s,t}(G_x)\) to accuracy \(\varepsilon \) with time complexity \(\widetilde{O}\left( \frac{n\sqrt{R_{s,t}(G_x)}}{\varepsilon ^{3/2}}\right) \) and space complexity \(O(\log n)\).

Proof

We merely observe that if G is a connected graph, an approximate negative witness is a function \(\omega :[n]\rightarrow \mathbb {R}\) that minimizes \(\left\| \omega A\varPi _{H(x)} \right\| ^2=\sum _{\{u,v\}\in E}(\omega (u)-\omega (v))^2\) and satisfies \(\omega (s)-\omega (t)=1\). That is, \(\omega \) is the voltage induced by a unit potential difference between s and t (see [10] for details). This is not unique, but if we fix \(\omega (s)=1\) and \(\omega (t)=0\), then the \(\omega \) that minimizes \(\left\| \omega A\varPi _{H(x)} \right\| ^2\) is unique, and this is without loss of generality. In that case, for all \(u\in [n]\), \(0\le \omega (u)\le 1\), so \(\textstyle \tilde{w}_-(x)=\left\| \omega A \right\| ^2=\sum _{u,v\in [n]}(\omega (u)-\omega (v))^2\le 2n^2\quad \text{ and } \text{ thus } \quad \widetilde{W}_-\le 2n^2.\)

By Theorem 6, we can estimate \(R_{s,t}\) to precision \(\varepsilon \) using \(\widetilde{O}\left( \frac{\sqrt{\widetilde{W}_-w_+(x)}}{\varepsilon ^{3/2}}\right) =\widetilde{O}\left( \frac{n\sqrt{R_{s,t}(G_x)}}{\varepsilon ^{3/2}}\right) \) calls to \(U'(P^\beta ,x)\) for some \(\beta \), which, by Theorem 19, costs \(O(\log n)\) time and space. \(\square \)

By analyzing the spectra of A and A(x), and applying Theorem 7, we can get an often better algorithm (Theorem 17). The spectral gap of a graph G, denoted \(\lambda _2(G)\), is the second smallest eigenvalue (including multiplicity) of the Laplacian of G, which is defined \(L_G=\sum _{u\in [n]}d_u{|}u\rangle {\langle }u|-\sum _{u\in [n]}\sum _{v\in \varGamma (u)}{|}u\rangle {\langle }v|\), where \(d_u\) is the degree of u, and \(\varGamma (u)\) is the set of neighbours of u. The smallest eigenvalue of \(L_G\) is 0 for any graph G. A graph G is connected if and only if \(\lambda _2(G)>0\). A connected graph G has \(\frac{2}{n^2}\le \lambda _2(G)\le n\).
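A minimal numpy sketch of this definition, in the style of the earlier example (for illustration only):

```python
import numpy as np

def lambda_2(adj):
    """Second-smallest eigenvalue (with multiplicity) of the Laplacian of the
    graph given by adjacency matrix `adj`; positive iff the graph is connected."""
    L = np.diag(adj.sum(axis=1)) - adj
    return float(np.sort(np.linalg.eigvalsh(L))[1])

n = 8
assert np.isclose(lambda_2(np.ones((n, n)) - np.eye(n)), n)  # lambda_2(K_n) = n
```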

The following theorem is an improvement over Theorem 16 when \(\lambda _2(G)> \varepsilon \). In particular, it is an improvement for all \(\varepsilon \) when we know that \(\lambda _2(G)>1\).

Theorem 17

Let \(\mathcal {G}\) be a family of graphs such that for all \(x\in \mathcal {G}\), \(\lambda _2(G_x)\ge \mu \). Let \(f:\mathcal {G}\times [n]\times [n]\rightarrow \mathbb {R}_{>0}\) be defined by \(f(x,s,t)=R_{s,t}(G_x)\). There exists a quantum algorithm for estimating f to relative accuracy \(\varepsilon \) that has time complexity \(\widetilde{O}\left( \frac{1}{\varepsilon }n\sqrt{R_{s,t}(G_x)/\mu }\right) \) and space complexity \(O(\log n)\).

Proof

Before applying Theorem 15, we compute \(\left\| {|}w_0\rangle \right\| ^2\), to normalize P.

Lemma 9

\(N=\left\| {|}w_0\rangle \right\| ^2=\frac{1}{n}\).

Proof

Since \(H(x)=H\) when \(G_x\) is the complete graph, by Lemma 8, we need only compute \(R_{s,t}\) in the complete graph. It is simple to verify that the optimal unit st-flow in the complete graph has \(\frac{1}{n}\) units of flow on every path of the form (sut) for \(u\in [n]\setminus \{s,t\}\), and \(\frac{2}{n}\) units of flow on the edge (st). Thus, \(R_{s,t}(K_n)=\sum _{u\in [n]\setminus \{s,t\}}2(1/n)^2+(2/n)^2=2/n\). So \(\left\| {|}w_0\rangle \right\| ^2=\frac{1}{2}R_{s,t}(K_n)=\frac{1}{n}\). \(\square \)

Next, we compute the following:

Lemma 10

For any \(x\in \mathcal {G}\), \(\frac{\sigma _{\max }(A)}{\sigma _{\min }(A(x))}=\sqrt{\frac{n}{\lambda _2(G_x)}}\le \sqrt{\frac{n}{\mu }}\), so \(\kappa (f)\le \sqrt{\frac{n}{\mu }}\).

Proof

Let \(L_x\) denote the Laplacian of \(G_x\). We have:

$$\begin{aligned} A(x)A(x)^T= & {} \sum _{u\in [n]}\sum _{v\in \varGamma (u)}({|}u\rangle -{|}v\rangle )({\langle }u|-{\langle }v|)\\= & {} 2\sum _{u\in [n]}d_u{|}u\rangle {\langle }u|-2\sum _{u\in [n]}\sum _{v\in \varGamma (u)}{|}u\rangle {\langle }v|=2L_x. \end{aligned}$$

Thus, if L denotes the Laplacian of the complete graph, we also have \(AA^T=2L\). Letting J denote the all ones matrix, we have \(L=(n-1)I-(J-I)=nI-J\), and since \(J=n{|}u\rangle {\langle }u|\) where \({|}u\rangle =\frac{1}{\sqrt{n}}\sum _{i=1}^n{|}i\rangle \), if \({|}u_1\rangle ,\dots ,{|}u_{n-1}\rangle ,{|}u\rangle \) is any orthonormal basis of \(\mathbb {R}^n\), then \(L=n\sum _{i=1}^{n-1}{|}u_i\rangle {\langle }u_i|+n{|}u\rangle {\langle }u|-n{|}u\rangle {\langle }u|=\sum _{i=1}^{n-1}n{|}u_i\rangle {\langle }u_i|\), so the spectrum of L is 0, with multiplicity 1, and n with multiplicity \(n-1\). Thus, the only nonzero singular value of A is \(\sqrt{2n}=\sigma _{\max }(A)\). Furthermore, since \(\lambda _2(G_x)\) is the smallest nonzero eigenvalue of \(L_x\), and \(A(x)A(x)^T=2L_x\), \(\sigma _{\min }(A(x))=\sqrt{2\lambda _2(G_x)}\). The result follows. \(\square \)
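These identities are easy to verify numerically; a small sketch (the choice n = 6 is arbitrary):

```python
import numpy as np

n = 6                                      # arbitrary small example
I = np.eye(n)
# Columns (|u> - |v>) over ordered pairs u != v, as in the span program above.
A = np.column_stack([I[u] - I[v] for u in range(n) for v in range(n) if u != v])
L = n * I - np.ones((n, n))                # Laplacian of the complete graph K_n
assert np.allclose(A @ A.T, 2 * L)         # A A^T = 2L
svals = np.linalg.svd(A, compute_uv=False)
assert np.isclose(svals.max(), np.sqrt(2 * n))  # sigma_max(A) = sqrt(2n)
```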

Finally, by Lemma 8, we have \(w_+(x,P)=\frac{1}{2}R_{s,t}(G_x)\), so, applying Theorem 15, we get an algorithm that makes \(\widetilde{O}\left( \frac{\kappa (f)}{\varepsilon }\sqrt{\frac{w_+(x,P)}{N}}\right) =\widetilde{O}\left( \frac{1}{\varepsilon }\sqrt{n/\mu }\sqrt{R_{s,t}n}\right) \) calls to \(U'(P,x)\). By Theorem 19, the time complexity of this algorithm is \(\widetilde{O}\left( \frac{1}{\varepsilon }n\sqrt{R_{s,t}/\mu }\right) \) and the space complexity is \(O(\log n)\). \(\square \)

Both of our upper bounds have linear dependence on n, and the following theorem shows that this is optimal.

Theorem 18

(Lower Bound) There exists a family of graphs \(\mathcal {G}\) such that estimating the effective resistance on \(\mathcal {G}\), even to constant accuracy, requires \(\varOmega (n)\) queries.

Fig. 1 The graphs in \(\mathcal {G}_0\) contain only the solid edges. The graphs in \(\mathcal {G}_1\) contain the solid edges and one of the dashed edges. We can embed an instance of OR in the dashed edges. If a dashed edge is included, the number of st-paths increases, decreasing the effective resistance.

Proof

Let \(\mathcal {G}_0\) be the set of graphs consisting of two stars \(K_{1,n/2-1}\), centered at s and t, with an edge connecting s and t (see Fig. 1). Let \(\mathcal {G}_1\) be the set of graphs consisting of graphs from \(\mathcal {G}_0\) with a single edge added between two degree one vertices from different stars. Let \(\mathcal {G}=\mathcal {G}_0\cup \mathcal {G}_1\). We first note that we can distinguish between \(\mathcal {G}_0\) and \(\mathcal {G}_1\) by estimating effective resistance on \(\mathcal {G}\) to accuracy \(\frac{1}{10}\): If \(G\in \mathcal {G}_0\), then there is a single st-path, consisting of one edge, so the effective resistance is 1. If \(G\in \mathcal {G}_1\), then there are two st-paths, one of length 1 and one of length 3. We put a flow of \(\frac{1}{4}\) on the length-3 path and \(\frac{3}{4}\) on the length-1 path to get effective resistance at most \((3/4)^2+3(1/4)^2=\frac{3}{4}\).

We now describe how to embed an instance \(y\in \{0,1\}^{(n/2-1)^2}\) of OR\(_{(n/2-1)^2}\) in a graph. We let \(s=1\) be connected to every vertex in \(\{2,\dots ,n/2\}\), and \(t=n\) be connected to every vertex in \(\{n/2+1,\dots ,n-1\}\). Let the values of \(\{G_{i,j}:i\in \{2,\dots ,n/2\},j\in \{n/2+1,\dots ,n-1\}\}\) be determined by y. Let all other values \(G_{i,j}\) be 0. Then clearly \(R_{s,t}(G)\ge 1\) if and only if \(y=0\dots 0\) (in that case \(G\in \mathcal {G}_0\)) and otherwise, \(R_{s,t}(G)\le 3/4\), since there is at least one extra path from s to t (in that case \(G\in \mathcal {G}_1\)). The result follows from the lower bound of \(\varOmega (\sqrt{(n/2-1)^2})=\varOmega (n)\) on OR\(_{(n/2-1)^2}\). \(\square \)
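As a numerical check of the two resistance values used in this proof, one can reuse the `effective_resistance` sketch from earlier in this section; the `star_pair` helper below is introduced only for this check:

```python
import numpy as np

def star_pair(n, extra_edge=False):
    """Adjacency matrix of the lower-bound graphs: two stars K_{1,n/2-1}
    centered at s = 0 and t = n-1, joined by the edge {s, t}; with
    `extra_edge`, one edge between leaves of different stars (the G_1 case)."""
    adj = np.zeros((n, n))
    s, t = 0, n - 1
    for v in range(1, n // 2):
        adj[s, v] = adj[v, s] = 1
    for v in range(n // 2, n - 1):
        adj[t, v] = adj[v, t] = 1
    adj[s, t] = adj[t, s] = 1
    if extra_edge:
        adj[1, n // 2] = adj[n // 2, 1] = 1
    return adj

n = 10
print(effective_resistance(star_pair(n), 0, n - 1))         # 1.0  (G_0 case)
print(effective_resistance(star_pair(n, True), 0, n - 1))   # 0.75 (G_1 case)
```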

Discussion The algorithms from Theorems 16 and 17 are the first quantum algorithms for estimating the effective resistance in the adjacency model; the problem has, however, been studied previously in the edge-list model [23], where Wang obtains a quantum algorithm with complexity \(\widetilde{O}\left( \frac{d^{3/2}\log n}{\varPhi (G)^2\varepsilon }\right) \), where \(\varPhi (G)\le 1\) is the conductance (or edge-expansion) of G. In the edge-list model, the input \(x\in [n]^{[n]\times [d]}\) models a d-regular (or d-bounded-degree) graph \(G_x\) by \(x_{u,i}=v\) for some \(i\in [d]\) whenever \(\{u,v\}\in E(G_x)\). Wang requires edge-list queries to simulate walking on the graph, which requires constructing a superposition over all neighbours of a given vertex. This type of edge-list query can be simulated by \(\sqrt{n/d}\) adjacency queries to a d-regular graph, using quantum search, so Wang’s algorithm can be converted to an algorithm in the adjacency query model with cost \(\widetilde{O}\left( \frac{d^{3/2}}{\varPhi (G)^2\varepsilon }\sqrt{\frac{n}{d}}\right) \). We can compare our results to this by noticing that \(R_{s,t}\le \frac{1}{\lambda _2(G)}\) [9], implying that our algorithm always runs in time at most \(\widetilde{O}\left( \frac{1}{\varepsilon }\frac{n}{\mu }\right) \). If G is a connected d-regular graph, then \(\lambda _2(G)=d\delta (G)\), where \(\delta (G)\) is the spectral gap of a random walk on G. By Cheeger inequalities, we have \(\frac{\varPhi ^2}{2}\le \delta \) [17], so the complexity of the algorithm from Theorem 17 is at most \(\widetilde{O}\left( \frac{1}{\varepsilon }\frac{n}{d\delta }\right) =\widetilde{O}\left( \frac{1}{\varepsilon }\frac{n}{d\varPhi ^2}\right) \), which is an improvement over the bound of \(\widetilde{O}\left( \frac{1}{\varepsilon }\frac{d^{3/2}}{\varPhi ^2}\sqrt{\frac{n}{d}}\right) =\widetilde{O}\left( \frac{1}{\varepsilon }\frac{d}{\varPhi ^2}\sqrt{n}\right) \) given by naively adapting Wang’s algorithm to the adjacency model whenever \(d>\root 4 \of {n}\). In general, our upper bound may be much better than \(\frac{1}{\varepsilon }\frac{n}{d\varPhi ^2}\), since the Cheeger inequality is not tight, and \(R_{s,t}\) can be much smaller than \(\frac{1}{\lambda _2}\).

It is worth further discussing Wang’s algorithms for estimating effective resistance, due to their relationship with the ideas presented here. In order to get a time-efficient algorithm for st-connectivity, Belovs and Reichardt show how to efficiently reflect about the kernel of A (see also “Appendix B”); here A is related to the Laplacian L of the complete graph by \(AA^T=2L\). This implementation consists, in part, of a quantum walk on the complete graph. Wang’s algorithm directly implements a reflection about the kernel of A(x) by instead using a quantum walk on the graph G, which can be done efficiently in the edge-list model. For general span programs, when a reflection about the kernel of A(x) can be implemented efficiently in such a direct way, this can lead to an efficient quantum algorithm for estimating the witness size.

We also remark on another quantum algorithm for estimating effective resistance, also due to Wang [23], with worse complexity \(\widetilde{O}\left( \frac{d^8\mathrm {polylog}n}{\varPhi (G)^{10}\varepsilon ^2}\right) \), obtained by using the HHL algorithm [12] to estimate \(\left\| A(x)^+{|}\tau \rangle \right\| ^2\), which is the positive witness size of x, or in this case, the effective resistance. Note that, for any span program, \(w_+(x)=\left\| {|}w_x\rangle \right\| ^2=\left\| A(x)^+{|}\tau \rangle \right\| ^2\), so HHL may be another means of estimating the positive witness size. There are several caveats: A(x) must be efficiently row-computable, and the complexity additionally depends on \(\frac{\sigma _{\max }(A(x))}{\sigma _{\min }(A(x))}\), the condition number of A(x) (this is upper bounded by \(\frac{\sigma _{\max }(A)}{\sigma _{\min }(A(x))}\), upon which the complexity of some of our algorithms depends as well). However, if this approach yields an efficient algorithm, it is efficient in time complexity, not only query complexity. We leave further exploration of this idea for future research.

6 Conclusion and Open Problems

Summary We have presented several new techniques for turning span programs into quantum algorithms, which we hope will have future applications. Specifically, given a span program P, in addition to algorithms for deciding any function f such that \(f^{-1}(0)\subseteq P_0\) and \(f^{-1}(1)\subseteq P_1\), we also show how to get several different algorithms for deciding a number of related threshold problems, as well as estimating the witness size. In addition to algorithms based on the standard effective spectral gap lemma, we also show how to get algorithms by analyzing the real phase gap.

We hope that the importance of this work lies not only in its potential for applications, but in the improved understanding of the structure and power of span programs. A number of very important quantum algorithms rely on a similar structure, using phase estimation of a unitary that depends on the input to distinguish between different types of inputs. Span-program-based algorithms represent a very general class of such algorithms, making them not only important to the study of the quantum query model, but to quantum algorithms in general.

Further Applications The main avenue for future work is in applications of our techniques to obtain new quantum algorithms. We stress that any span program for a decision problem can now be turned into an algorithm for estimating the positive or negative witness size, if these correspond to some meaningful function, or deciding threshold functions related to the witness size. A natural source of potential future applications is in the rich area of property testing problems (for a survey, see [18]).

Span Programs and HHL One final open problem, briefly discussed at the end of the previous section, is the relationship between estimating the witness size and the HHL algorithm [12]. The HHL algorithm can be used to estimate \(\left\| M^+{|}u\rangle \right\| ^2\), given the state \({|}u\rangle \) and access to a row-computable linear operator M. When \(M=A(x)\), this quantity is exactly \(w_+(x)\), so if A(x) is row-computable, meaning there is an efficient procedure for computing the \(i^\mathrm {th}\) nonzero entry of the \(j^\mathrm {th}\) row of A(x), then HHL gives us yet another means of estimating the witness size, whose time complexity is known, rather than only its query complexity. It may be interesting to explore this connection further.