7.1 Introduction

Algebraic complexity theory is the study of computation via algebraic models and, hence, algebraic techniques. In this article we work with only one model—the arithmetic circuit (in short, circuit). A circuit \(C(x_1,\ldots ,x_n)\), over a ring \(R\), computes a polynomial \(f\) in \(R[x_1,\ldots ,x_n]\). Its description is in the form of a rooted directed acyclic graph; the leaves take the variables or constants as input, the internal nodes compute addition or multiplication, and the root outputs \(f\). The edges in \(C\), called wires, carry the intermediate polynomials and can also be used to multiply by a constant (from \(R\)). By the size, respectively the depth, of \(C\) we mean the natural notions (sometimes, to avoid “trivialities”, we might want to take into account the bit-size needed to represent an element in \(R\)).

A moment’s thought suggests that a circuit is a rather compact way of representing polynomials. For example, a circuit of size \(s\) can produce a polynomial of degree \(2^s\) (hint: repeated squaring). In fact, a single product gate could multiply \(s\) linear polynomials and produce \(n^{\Omega (s)}\) many monomials. Thus, a circuit is an ‘exponentially’ compact representation of some polynomial families (as opposed to simply writing them as sums of monomials). Conversely, are there ‘explicit’ polynomial families (say \(n\)-variate \(n\)-degree) that require exponential (i.e. \(2^{\Omega (n)}\)) sized circuits? We “expect” almost every polynomial to be this hard, but the question of finding an explicit family is open and is the main goal motivating the development of algebraic complexity.
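The degree claim can be checked with a minimal sympy sketch (our own illustration; the variable \(x\) and the size parameter \(s\) are chosen only for this example):

```python
# A circuit with s product gates computes x^(2^s) by repeated squaring,
# so size-s circuits can reach degree 2^s (gate reuse is essential here).
import sympy as sp

x = sp.symbols('x')
s = 5
f = x
for _ in range(s):       # each iteration is one product gate squaring the previous gate
    f = sp.expand(f * f)
print(sp.degree(f, x))   # prints 32 = 2**5
```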

One can try to directly give a good lower bound against circuits by designing an explicit polynomial family \(\{f_n\}\) and proving that it requires a ‘large’-sized circuit family \(\{C_n\}\). The other, indirect, way is to design an efficient hitting-set \(\mathcal {H}\) for the circuit family, i.e. if \(C_n\ne 0\) then \(\exists a\in \mathcal {H}\), \(C_n(a)\ne 0\). This ‘flip’ from lower bounds to algorithms was first observed in [HS80] and now it has several improved versions [KI04, Agr05, Agr06]. This is a remarkable phenomenon and is one of the primary motivations to study the question of PIT: Given a circuit \(C\), test it for zeroness in time polynomial in size\((C)\). The hitting-set version of PIT is also called blackbox PIT (contrasted with whitebox PIT).
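For contrast, the randomized version of blackbox PIT is easy; the whole difficulty is in derandomizing it, i.e. in replacing random points by a small explicit hitting-set. A minimal sketch (our own, with a toy blackbox) of the standard random-evaluation test:

```python
# Schwartz-Zippel style test: a nonzero n-variate polynomial of total degree d
# is nonzero at a random point of S^n with probability >= 1 - d/|S|.
import random

def probably_zero(blackbox, n, d, trials=20):
    S = range(2 * d + 1)                 # any set of size > d works
    for _ in range(trials):
        a = [random.choice(S) for _ in range(n)]
        if blackbox(a) != 0:
            return False                 # the point a is a certificate of nonzeroness
    return True                          # zero with high probability

# toy blackbox: the identically zero polynomial (x+y)^2 - x^2 - 2xy - y^2
print(probably_zero(lambda a: (a[0] + a[1])**2 - a[0]**2 - 2*a[0]*a[1] - a[1]**2, n=2, d=2))
```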

The last 10 years have seen a decent growth of algebraic tools and techniques to understand the properties of polynomials that a circuit computes. The feeling is that these polynomials are special, different from general polynomials, but a strong enough algebraic ‘invariant’ or a combinatorial ‘concept’ is still lacking. There have been several articles surveying the known techniques and the history of PIT [Sax09, AS09, SY10, CKW11, Sap13]. In this survey we will attempt not to repeat what those surveys have already covered. So, we will focus only on the new ideas and assume that the reader has given at least a cursory glance at the older ones. We directly move on to the Leitfaden.

7.1.1 Survey Overview

This article deals mainly with three broad topics—the ‘universality’ of depth-\(3\) circuits, the design of hitting-sets via ‘faithful’ morphisms and that via rank ‘concentration’. A major emerging area that we skip in this article is that of PIT vis-à-vis the GCT (geometric complexity theory) program [Mul11, Mul12a, Mul12b]; the algebraic-geometry interpretations there are interesting, though any concrete PIT algorithm, or application, is yet to emerge.

Shallow circuits A depth-\(2\) circuit (top \(+\) gate) of size \(s\), over a field, essentially computes a sum of \(s\) monomials. Such polynomials are called sparse polynomials; blackbox PIT for them was solved a few decades ago. So, our next stop is depth-\(3\): Polynomials of the form

$$C = \sum _{i=1}^k \prod _{j=1}^d L_{i,j}, $$

where \(L_{i,j}\) are linear polynomials in \(\mathbb {F}[x_1,\ldots ,x_n]\). Significant research has been done with this model, but both subexponential PIT and exponential lower bounds are open here. Recently, a remarkable universality result was shown for depth-\(3\) [GKKS13]: If an \(n\)-variate \({{\mathrm{poly}}}(n)\)-degree polynomial can be nontrivially computed by a circuit, then it can be nontrivially computed in depth-\(3\). This ‘squashing’ of depth means that it suffices to focus on depth-\(3\) for PIT purposes.

If we consider a depth-\(2\) circuit (top \(\times \) gate), over a ring \(R\), then again we get some remarkable connections. Fix \(R\) to be the \(2\times 2\) matrix algebra \(M_2(\mathbb {F})\), and consider the circuit

$$D=\prod _{i=1}^d L_i,$$

where \(L_i\) are linear polynomials in \(R[x_1,\ldots ,x_n]\). Traditionally, \(D\) is called a width-2 algebraic branching program (ABP). It was shown by [SSS09] that depth-\(3\) PIT efficiently reduces to width-\(2\) ABP PIT.

Faithful morphisms It was observed in the last few years that in all the known hitting-sets, the key idea in the proof is to work with a homomorphism \(\varphi \) and an algebraic property that \(\varphi \) should preserve in the image. [SS12] used a (Vandermonde-based) map \(\varphi :\mathbb {F}[x_1,\ldots ,x_n]\rightarrow \mathbb {F}[y_1,\ldots ,y_k]\) that preserves the ‘linear’ rank of any \(k\) linear polynomials. This gave the first blackbox PIT for bounded top fanin depth-\(3\), over any field.

Beecken et al. [BMS13] and Agrawal et al. [ASSS12] used a (Vandermonde and Kronecker-based) map \(\varphi :\mathbb {F}[x_1,\ldots , x_n]\rightarrow \mathbb {F}[y_1,\ldots ,y_k]\) that preserves the ‘algebraic’ rank (formally, transcendence degree) of certain \(k\) polynomials. This gave the first blackbox PIT (and lower bounds) for several well-studied classes of constant-depth circuits. One drawback of the technique is that it requires fields of zero or large characteristic.

Rank concentration Inspired by tensors, a restricted circuit model called multilinear read-once ABP (ROABP) has been intensively studied. Let \(R\) be the \(w\times w\) matrix algebra \(M_w(\mathbb {F})\) and let \(\{S_i\}\) be a partition of \([n]\). Consider the circuit \(D=\prod _{i=1}^d L_i,\) where the \(L_i\) are linear polynomials in \(R[x_{S_i}]\) (i.e. the linear factors have disjoint variables). For \(D\), [FSS13] gave a hitting-set in time \({{\mathrm{poly}}}(wn)^{\log w\cdot \log n}\), i.e. quasi-poly-time. The proof is based on the idea, following [ASS13], that after applying a small (Kronecker-based) ‘shift’, \(D\) gets the following property: The rank of its coefficients (viewed as \(\mathbb {F}\)-vectors) is concentrated in the ‘low’-support monomials. Thus, checking the zeroness of these low-support coefficients is enough!

We conjecture that rank concentration, after a ‘small’ shift, should be attainable in any ABP \(D\). But currently the proof techniques are not that strong. Recently, [AGKS13] have achieved rank concentration in multilinear depth-\(3\) circuits where the partitions (corresponding to each product gate) are ‘close’ to each other in the sense of ‘refinement’.

7.2 Shallow Circuits, Deep Interconnections

In this section, we exhibit the key ideas behind the universality of two shallow circuits.

7.2.1 The Depth-\(3\) Chasm

In the study of circuits one feels that low depth should already hold the key. This feeling was confirmed in a series of works [VSBR83, AV08, Koi12, Tav13]: Any \({{\mathrm{poly}}}(n)\)-degree \(n\)-variate polynomial computed by a \({{\mathrm{poly}}}(n)\)-sized circuit \(C\) can also be computed by an \(n^{O(\sqrt{n})}\)-sized depth-\(4\) circuit!

The idea for this is, in retrospect, simple—since the degree is only \({{\mathrm{poly}}}(n)\), first squash the depth of \(C\) to \(O(\log n)\) with only a polynomial blowup in the size. This is done in a way that makes the product gates quite balanced, i.e. their two inputs are roughly of the same degree. Next, identify a subcircuit \(C_2\) by picking those gates whose output polynomial has degree at least \(\sqrt{n}\), and call the remaining subcircuit \(C_1\). We view \(C_2\) as our circuit of interest that takes the gates of \(C_1\) as input. It can be shown that \(C_2\) computes a polynomial of degree \(\approx \sqrt{n}\) in its input variables (which are \({{\mathrm{poly}}}(n)\) many). Obviously, each gate of \(C_1\) also computes a polynomial of degree \(\approx \sqrt{n}\) in its input variables (which are \(x_1,\ldots ,x_n\)). Thus, \(C_2\) finally computes a sum of \(\approx \binom{{{\mathrm{poly}}}(n)+\sqrt{n}}{\sqrt{n}}\) products, each product has \(\sqrt{n}\) factors, and each factor is itself a sum of \(\approx \binom{n+\sqrt{n}}{\sqrt{n}}\) degree-\(\sqrt{n}\) monomials. To put it simply (and ignoring the constant factors), \(C\) can be expressed as a \(\sum \prod ^{\sqrt{n}}\sum \prod ^{\sqrt{n}}\) circuit of size \(n^{O(\sqrt{n})}\). The details of this proof can be seen in [Tav13].

The strength of depth-\(4\) is surprising. Recently, an even more surprising reduction has been shown [GKKS13]—that to depth-\(3\) (again, \(n^{O(\sqrt{n})}\) sized). We will now sketch the proof. It ties together the known results in an unexpected way.

Essentially, the idea is to modify a \(\sum \prod ^a\sum \prod ^a\) circuit \(C\) of size \(s:=n^a\) (where \(a:=\sqrt{n}\)) by using two polynomial identities that are in a way “inverse” to each other, and are to do with powers-of-linear-forms. First, replace the product gates using Fischer’s identity:

Lemma 2.1

 [Fis94] Any degree \(a\) monomial can be expressed as a linear combination of \(2^{a-1}\) \(a\)th powers of linear polynomials, as:

$$y_1\cdots y_a\ =\ (2^{a-1}\cdot a!)^{-1}\cdot \mathop {\sum }_{r_2,\ldots ,r_a\in \{\pm 1\}} \left( y_1 + \mathop {\sum }_{i=2}^{a} r_i y_i \right) ^a\cdot (-1)^{\#\{i|r_i=-1\}}. $$

We denote this type of a circuit by the notation \(\sum \bigwedge ^a\sum \), where the wedge signifies the powering by \(a\). The above identity transforms the \(\sum \prod ^a\sum \prod ^a\) circuit \(C\) to a \(\sum \bigwedge ^a\sum \bigwedge ^a\sum \) circuit, of size \({{\mathrm{poly}}}(s)\). We reuse \(s\) for this size estimate.
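Fischer’s identity is easy to verify symbolically; here is a small sympy check for \(a=3\) (our own verification, with symbols chosen only for this purpose):

```python
# Verify Fischer's identity for a = 3: y1*y2*y3 as a combination of cubes of linear forms.
import sympy as sp
from itertools import product

a = 3
y = sp.symbols('y1:%d' % (a + 1))            # y1, y2, y3
total = 0
for signs in product([1, -1], repeat=a - 1): # the choices of r_2, ..., r_a
    form = y[0] + sum(r * yi for r, yi in zip(signs, y[1:]))
    total += (-1)**signs.count(-1) * form**a
lhs = sp.Mul(*y)                             # y1*y2*y3
rhs = sp.Rational(1, 2**(a - 1) * sp.factorial(a)) * total
print(sp.expand(lhs - rhs) == 0)             # True
```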

Next, the two power gates are ‘opened’ up using an identity introduced by the author:

Lemma 2.2

  [Sax08] For any \(a,m\), there exist degree-\(a\) univariate polynomials \(f_{i,j}\) such that

$$(y_1+\cdots +y_m)^a\ =\ \mathop {\sum }_{i=1}^{ma+1} \mathop {\prod }_{j=1}^{m} f_{i,j}(y_j).$$
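One standard way to realize such \(f_{i,j}\) (our own reconstruction via truncated exponentials and Lagrange interpolation, not necessarily the exact construction of [Sax08]) uses the fact that the coefficient of \(t^a\) in \(\prod _j E_a(y_jt)\), with \(E_a\) the degree-\(a\) truncation of \(\exp \), equals \((y_1+\cdots +y_m)^a/a!\). A sympy check for \(m=a=2\):

```python
# Extract the t^a coefficient of prod_j E_a(y_j*t) by interpolating at m*a+1 points;
# each summand is then a product of degree-a univariates in the y_j, as in the lemma.
import sympy as sp

m, a = 2, 2
t = sp.symbols('t')
y = sp.symbols('y1:%d' % (m + 1))                                # y1, y2
E = lambda z: sum(z**k / sp.factorial(k) for k in range(a + 1))  # degree-a truncation of exp
pts = list(range(1, m*a + 2))                                    # m*a + 1 distinct interpolation points
rhs = 0
for i, ti in enumerate(pts):
    Li = sp.Mul(*[sp.Rational(1, ti - tj) * (t - tj) for j, tj in enumerate(pts) if j != i])
    ci = sp.expand(Li).coeff(t, a)                   # [t^a] of the i-th Lagrange basis polynomial
    rhs += ci * sp.Mul(*[E(yj * ti) for yj in y])    # a product of degree-a univariates in the y_j
rhs = sp.factorial(a) * rhs
print(sp.expand(rhs - sum(y)**a) == 0)               # True
```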

Let us carefully see the jugglery on \(C\). The \(\sum \bigwedge ^a\sum \bigwedge ^a\sum \) circuit \(C\) has the expression \(C=\sum _i T_i\), where each \(T_i\) has the form \((\sum _{j=1}^s \ell _{i,j}^{e_{i,j}})^{a}\) with linear \(\ell _{i,j}\)’s. We want to open up the top power gate of \(C\). By Lemma 2.2 we get

$$T_i\ =\ \mathop {\sum }_{u=1}^{sa+1} \mathop {\prod }_{j=1}^{s} f_{u,j}(\ell _{i,j}^{e_{i,j}}).$$

Since \(f_{u,j}\) is a univariate, \(f_{u,j}(\ell _{i,j}^{e_{i,j}})\) is a univariate polynomial in the linear form \(\ell _{i,j}\); hence, when the base field \(\mathbb {F}\) is algebraically closed, it splits into a product of linear polynomials. We deduce that \(T_i\), and hence \(C\), is a \(\sum \prod \sum \) circuit of size \({{\mathrm{poly}}}(s)\).

Finally, note that for the above arguments to work, we require \(\mathbb {F}\) to be algebraically closed and char\((\mathbb {F})>a\). Lemma 2.2 has been generalized to all characteristics by [FS13b], so it is likely that this depth-\(3\) reduction can be extended to all algebraically closed fields.

The optimality of \(n^{\sqrt{n}}\)-size, in this reduction, is open. However, [KSS13] showed that any decent reduction in this size bound would imply \({\textit{VNP}}\ne {\textit{VP}}\).

7.2.2 The Width-\(2\) Chasm

Here we look at \(\prod \sum \) circuits over a matrix algebra. Though the model \(D=\prod _i L_i\), with linear \(L_i\in R[x_1,\ldots ,x_n]\), seems innocuous at first sight, a closer look proves the opposite! It can be shown fairly easily that a polynomial computed by a constant-depth circuit (over a field) can as well be computed by a \(D\) over a \(3\times 3\) matrix algebra [BC88]. On the other extreme, by taking \(R=M_n(\mathbb {F})\) we can compute the determinant of a matrix in \(\mathbb {F}^{n\times n}\) [MV97], hence, arithmetic formulas (not general circuits!) can be simulated in this model [Val79].

Perhaps surprisingly, [SSS09] showed that: A polynomial \(C\) computed by a depth-\(3\) circuit (over a field) can be “almost” computed by a \(D\) over a \(2\times 2\) matrix algebra. This, together with the previous subsection, makes the \(\prod \sum \) circuits over \(M_2(\mathbb {F})\) quite strong.

Say, we want to express the depth-\(3\) circuit \(C=\sum _{i=1}^k T_i\) in a \(2\times 2\) matrix product. First, we express a product \(T_i=\prod _{j=1}^d \ell _{i,j}\) as:

$$ \left[ \begin{array}{c@{\quad }c} \ell _{i,1} &{} 0\\ 0 &{} 1 \end{array}\right] \cdots \left[ \begin{array}{c@{\quad }c} \ell _{i,d-1} &{} 0\\ 0 &{} 1 \end{array}\right] \cdot \left[ \begin{array}{c@{\quad }c} 1 &{} \ell _{i,d}\\ 0 &{} 1 \end{array}\right] = \left[ \begin{array}{c@{\quad }c} T_i' &{} T_i\\ 0 &{} 1 \end{array}\right] , \text { where } T_i':=T_i/\ell _{i,d}. $$

Once we have such \(k\) \(2\times 2\) matrices, each containing \(T_i\) in the \((1,2)\)th place, we would like to sum the \(T_i\)’s in a ‘doubling’ fashion (instead of one-by-one).

We describe one step of the iteration. Let \(\left[ \begin{array}{c@{\quad }c} L_1 &{} L_2f\\ 0 &{} L_3 \end{array}\right] \) and \(\left[ \begin{array}{c@{\quad }c} M_1 &{} M_2g\\ 0 &{} M_3 \end{array}\right] \) encapsulate two intermediate summands \(f\) and \(g\). With the goal of getting (a multiple of) \(f+g\) we consider the following, carefully designed, product:

$$\begin{aligned}&\left[ \begin{array}{c@{\quad }c} L_1 &{} L_2f\\ 0 &{} L_3 \end{array}\right] \cdot \left[ \begin{array}{c@{\quad }c} L_2M_3 &{} 0\\ 0 &{} L_1M_2 \end{array}\right] \cdot \left[ \begin{array}{c@{\quad }c} M_1 &{} M_2g\\ 0 &{} M_3 \end{array}\right] \\&\quad = \left[ \begin{array}{c@{\quad }c} L_1M_1L_2M_3 &{} L_2M_3L_1M_2(f+g)\\ 0 &{} L_3M_3L_1M_2 \end{array}\right] \end{aligned}$$

After \(\log k\) such iterations, we get a multiple of \(C\) in the \((1,2)\)-th entry of the final \(2\times 2\) matrix product. Note that the middle matrix, introduced in the LHS above, potentially doubles (in the degree of the entry polynomials) in each iteration. Thus, finally, \(D\) is a product of \({{\mathrm{poly}}}(d2^{\log k})\) linear polynomials over \(M_2(\mathbb {F})\). Thus, the size blowup is only polynomial in going from depth-\(3\) to width-\(2\).
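Both matrix identities above are easy to verify; a small sympy check (our own, with four linear factors in the first identity and fresh symbols in the second):

```python
import sympy as sp

l1, l2, l3, ld = sp.symbols('l1 l2 l3 ld')
M = lambda a, b, c, d: sp.Matrix([[a, b], [c, d]])

# (1) encoding one product term: here T_i = l1*l2*l3*ld sits in the (1,2) entry
enc = M(l1, 0, 0, 1) * M(l2, 0, 0, 1) * M(l3, 0, 0, 1) * M(1, ld, 0, 1)
print(sp.expand(enc[0, 1] - l1*l2*l3*ld) == 0)                    # True

# (2) one doubling step: the (1,2) entry of the product is a multiple of f + g
L1, L2, L3, M1, M2, M3, f, g = sp.symbols('L1 L2 L3 M1 M2 M3 f g')
step = M(L1, L2*f, 0, L3) * M(L2*M3, 0, 0, L1*M2) * M(M1, M2*g, 0, M3)
print(sp.expand(step[0, 1] - L1*L2*M2*M3*(f + g)) == 0)           # True
```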

7.3 Faithful Morphisms, Hitting-Sets

In algebraic complexity the study of certain maps has been fruitful—homomorphisms \(\varphi :\mathcal {R}:=\mathbb {F}[x_1,\ldots ,x_n]\rightarrow \mathbb {F}[y_1,\ldots ,y_k]=:\mathcal {R}'\) such that the algebraic ‘relationship’ of certain polynomials \(\{f_1,\ldots ,f_k\}\) does not change in the image of \(\varphi \). When the \(f_i\)’s are linear this boils down to a linear algebra question and we can easily design \(\varphi \) in time \({{\mathrm{poly}}}(n)\) (hint: employ a Vandermonde matrix). This business becomes complicated when the \(f_i\)’s are nonlinear. Then we have to ask how the \(f_i\)’s are represented. If they are given via monomials then we invoke the Jacobian criterion to design \(\varphi \), but the time complexity becomes exponential in \(k\). Several variants of such faithful maps are discussed in the Ph.D. thesis [Mit13]. We sketch the ideas behind two basic maps here.

7.3.1 Bounded Fanin Depth-3 Blackbox PIT

Let \(C=\sum _{i\in [k]} T_i\) be a depth-\(3\) circuit. When \(k\) is constant, \(C\) is naturally called bounded fanin depth-\(3\). This case of PIT has, by now, a rich history [DS07, KS07, KS11, SS11, KS09, SS13, SS12]. Several new techniques have sprung up from this model—a locally decodable code structure, a rank-preserving map via extractors, Sylvester-Gallai configurations (higher dimensions and all fields) and rank bounds. We will sketch here the main idea behind the poly-time blackbox PIT of bounded fanin depth-\(3\). The details are quite technical and could be seen in [SS13, SS12].

Vandermonde map We define a homomorphism \(\Psi _\beta \), for a \(\beta \in \mathbb {F}\), as:

$$ \forall i\in [n],\ \ \Psi _\beta : x_i \mapsto \sum _{j=1}^{k}\beta ^{ij}y_j, $$

and \(\Psi _\beta (\alpha )=\alpha \) for all \(\alpha \in \mathbb {F}\). This naturally defines the action of \(\Psi _\beta \) on all the elements of \(\mathcal {R}\), preserving the ring operations. We have the following nice property, as a consequence of [GR08, Lemma 6.1]:

Lemma 3.1

[\(\Psi _\beta \) preserves \(k\)-rank] Let \(S\) be a subset of linear forms in \(\mathcal {R}\) with \({{\mathrm{rk}}}(S)\le k\), and \(|\mathbb {F}|>nk^2\). Then \(\exists \beta \in \mathbb {F},\, {{\mathrm{rk}}}(\Psi _\beta (S)) = {{\mathrm{rk}}}(S)\).

Intuitively, \(\Psi _\beta \) is faithful to any algebraic object involving the elements in span\((S)\). The proof of this lemma is by studying the coefficient-matrix of the linear polynomials in \(S\), and its change under \(\Psi _\beta \). This map has a role to play in bounded fanin depth-\(3\) owing to a certain structural theorem from [SS13]—certificate for a non-identity.
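Before moving on, here is a tiny sympy experiment with the map (our own toy instance: \(n=4\), \(k=2\), and \(\beta =3\), a value that happens to work here; the lemma only guarantees that some good \(\beta \) exists once \(|\mathbb {F}|>nk^2\)):

```python
# Apply the Vandermonde map x_i -> sum_j beta^(i*j) y_j and check that the
# rank of a set of <= k linear forms is preserved.
import sympy as sp

n, k, beta = 4, 2, 3
x = sp.symbols('x1:%d' % (n + 1))
y = sp.symbols('y1:%d' % (k + 1))
psi = {x[i]: sum(beta**((i + 1) * j) * y[j - 1] for j in range(1, k + 1)) for i in range(n)}

S = [x[0] + 2*x[1], x[2] - x[3]]                      # a rank-2 set of linear forms
rank = lambda forms, vs: sp.Matrix([[f.coeff(v) for v in vs] for f in forms]).rank()
print(rank(S, x), rank([sp.expand(f.subs(psi)) for f in S], y))   # 2 2
```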

To discuss this certificate we need a definition, that of ‘paths’ of ‘nodes’ in \(C\) (assumed to be nonzero). A path \(\varvec{p}\) with respect to an ideal \(I\) is a sequence of terms \(\{p_1, p_2, \ldots , p_b\}\) (these are products of linear forms) with the following property. Each \(p_i\) divides \(T_i\), and each \(p_i\) is a ‘node’ of \(T_i\) with respect to the ideal \(\langle {I, p_1, p_2, \ldots , p_{i-1}} \rangle \). So \(p_1\) is a node of \(T_1\) wrt \(I\), \(p_2\) is a node of \(T_2\) wrt \(\langle {I,p_1} \rangle \), etc.

Let us see an example of a path \((\langle {0} \rangle , p_1, p_2, p_3)\) in Fig. 7.1. The oval bubbles represent the list of forms in a product gate, and the rectangles enclose forms in a node. The arrows show a path. Starting with the zero ideal, nodes \(p_1 := x^2_1\), \(p_2 := x_2(x_2\,+\,2x_1)\), and \(p_3 := (x_4\,+\, x_2)(x_4\,+\,4x_2 - x_1)(x_4 \,+\, x_2\,+\, x_1)(x_4\,+\, x_2 - 2x_1)\) form a path. Initially, the path is just the zero ideal, so \(x^2_1\) is a node. Note how \(p_2\) is a power of \(x_2\) modulo \(\mathrm{radsp}\langle {p_1} \rangle \), and \(p_3\) is a power of \(x_4\) modulo \(\mathrm{radsp}\langle {p_1,p_2} \rangle \).

Fig. 7.1 Nodes and paths in \(C = T_1 + T_2 + T_3 + \cdots \)

The non-identity certificate theorem ([SS13], Theorem 25) states that for any non-identity \(C\), there exists a path \(\varvec{p}\) such that modulo \(\langle {\varvec{p}} \rangle \), \(C\) reduces to a single nonzero multiplication term.

Theorem 3.2 (Certificate for a non-identity)

Let \(I\) be an ideal generated by some multiplication terms. Let \(C=\sum _{i\in [k]}T_i\) be a depth-\(3\) circuit that is nonzero modulo \(I\). Then \(\exists i\in \{0,\ldots ,k-1\}\) such that \(C_{[i]}:=\sum _{j\in [i]}T_j\) mod \(I\) has a path \(\varvec{p}\) satisfying: \(C\equiv \alpha \cdot T_{i+1}\not \equiv 0\) \((\mod I+\langle {\varvec{p}} \rangle )\) for some \(\alpha \in \mathbb {F}^*\).

The proof of this theorem involves an extension of Chinese remaindering to ideals that are generated by multiplication terms. Once we have this structural result about depth-\(3\), observe that we would be done if we could somehow ensure \(T_{i+1}\notin \langle {\varvec{p}} \rangle \) (in our application \(I\) is zero). How do we preserve this ideal non-membership under a cheap map?

Notice that the rank of the set \(S_0\) of linear polynomials that divide the nodes in the path \(\varvec{p}\) is \(<\) \(k\) (since path length is below \(k\)). Moreover, \(T_{i+1}\) factors into at most \(d\) linear polynomials, denote the set by \(S_1\). So if we apply a map that preserves the rank of each of the \(d\) sets \(S_0\cup \{\ell \}, \ell \in S_1\), then, intuitively, the ideal non-membership should be preserved. As \({{\mathrm{rk}}}(S_0\cup \{\ell \})\le k\) we can employ the previously discussed map \(\Psi _\beta \) (over a field satisfying \(|\mathbb {F}|>dnk^2\)). This idea could be easily turned into a proof; details are in [SS12].

Finally, what we have achieved is the construction of a map \(\Psi _\beta \), in time \({{\mathrm{poly}}}(dnk)\), that reduces the variables of \(C\) from \(n\) to \(k\) and preserves nonzeroness. Once this is done, the \({{\mathrm{poly}}}(nd^k)\) blackbox PIT follows from the brute-force hitting-set.
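The brute-force step at the end is the standard grid argument: a nonzero \(k\)-variate polynomial of individual degree at most \(d\) cannot vanish on all of \(\{0,\ldots ,d\}^k\). A minimal sketch (our own illustration, with a toy blackbox):

```python
from itertools import product

def is_nonzero_on_grid(blackbox, k, d):
    # {0,...,d}^k hits every nonzero k-variate polynomial of individual degree <= d
    return any(blackbox(p) != 0 for p in product(range(d + 1), repeat=k))

# toy blackbox: a nonzero bivariate polynomial of degree 3
print(is_nonzero_on_grid(lambda p: p[0]*p[1]**2 - p[1], k=2, d=3))   # True
```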

7.3.2 Depth \(\ge 3\) Results

Looking at the success of bounded fanin depth-\(3\) one wonders about the analogous depth-\(4\) model:

$$\begin{aligned} C\ =\ \mathop {\sum }_{i\in [k]}\mathop {\prod }_{j\in [d]}f_{i,j},\ \text { where } f_{i,j} \text { are sparse polynomials.} \end{aligned}$$
(7.1)

Here we are thinking of a bounded \(k\). But now even \(k=2\) seems non-trivial! In fact, a simpler PIT case than this is an old open question in a related area [vzG83].

This bounded top fanin depth-\(4\) PIT is an important open question currently. What is doable are other restricted models of depth-\(4\). Inspired from the last subsection we ask: Is there a notion of ‘rank’ for general polynomials, are there easy ‘faithful’ maps, and finally is all this useful in PIT?

There are several notions of rank in commutative algebra. The one we [BMS13] found useful is—transcendence degree (trdeg). We say that a set \(S\) of polynomials \(\{f_1,\ldots ,f_m\}\subset \mathbb {F}[x_1,\ldots ,x_n]\) is algebraically dependent if there exists a nonzero annihilating polynomial \(A(y_1,\ldots ,y_m)\), over \(\mathbb {F}\), such that \(A(f_1,\ldots ,f_m)=0\). The largest number of algebraically independent polynomials in \(S\) is called \({{\mathrm{trdeg}}}(S)\). With this notion we call a homomorphism \(\varphi \) faithful if \({{\mathrm{trdeg}}}(S) = {{\mathrm{trdeg}}}(\varphi (S))\). The usefulness of \(\varphi \) (assuming that one can come up with it efficiently) was first proved in [BMS13]:

Lemma 3.3 (Faithful is useful)

Let \(\varphi \) be a homomorphism faithful to \(\mathbf {f} = \{f_1,\ldots , f_m\} \subset \mathbb {F}[\mathbf {x}]\). Then for any \(C\in \mathbb {F}[\mathbf {y}]\), \(C(\mathbf {f}) = 0 \Leftrightarrow C(\varphi (\mathbf {f})) = 0\).

This implies that we can use a faithful map to ‘reduce’ the number of variables \(n\) without changing the nonzeroness of \(C\). The strategy can be used in cases where \({{\mathrm{trdeg}}}(\mathbf {f})\) is small, say, smaller than a constant \(r\).

The only missing piece is the efficiency of \(\varphi \). For this we need three fundamental ingredients—an efficient criterion for algebraic independence (Jacobian), its behaviour under \(\varphi \) (chain rule), and standard maps (Vandermonde and Kronecker-based).

Lemma 3.4 (Jacobian criterion)

Let \(\mathbf {f} \subset \mathbb {F}[\mathbf {x}]\) be a finite set of polynomials of degree at most \(d\), and \({{\mathrm{trdeg}}}(\mathbf {f}) \le r\). If \({{\mathrm{char}}}(\mathbb {F}) = 0\) or \({{\mathrm{char}}}(\mathbb {F}) > d^r\), then \({{\mathrm{trdeg}}}(\mathbf {f}) = {{\mathrm{rk}}}_{\mathbb {F}(\mathbf {x})} \mathcal {J}_{\mathbf {x}}(\mathbf {f})\), where \(\mathcal {J}_{\mathbf {x}}(\mathbf {f}) := \left( \partial f_i/ \partial x_j\right) _{m \times n}\) is the Jacobian matrix.
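Computationally, the criterion is a single rank computation over \(\mathbb {F}(\mathbf {x})\); a sympy illustration (our own example, where the dependence \(f_3=f_1f_2\) is planted on purpose):

```python
# The three polynomials are algebraically dependent (f3 = f1*f2), so the
# Jacobian matrix has rank 2 = trdeg, not 3.
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = [x1 + x2, x2*x3, (x1 + x2)*x2*x3]
J = sp.Matrix([[sp.diff(fi, v) for v in (x1, x2, x3)] for fi in f])
print(J.rank())   # 2
```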

There are several proofs of this, see [Jac41, For91, BMS13, MSS12]. This gives us an efficient way to capture trdeg, when the characteristic is zero/large. Let us now see how the Jacobian matrix changes under \(\varphi \).

Lemma 3.5 (Chain rule)

\(\mathcal {J}_{\mathbf {y}}(\varphi (\mathbf {f})) = \varphi \left( \mathcal {J}_{\mathbf {x}}(\mathbf {f})\right) \cdot \mathcal {J}_{\mathbf {y}}(\varphi (\mathbf {x}))\), where \(\varphi \) applied to a matrix/set refers to the matrix/set obtained by applying \(\varphi \) to every entry.

This is a simple consequence of the chain rule of ‘derivatives’. It suggests that for \(\varphi \) to preserve the trdeg of the polynomials, we need to control—(1) the image of the original Jacobian under \(\varphi \), and (2) the Jacobian of the image of \(\mathbf {x}\). In our applications, the former is achieved by a Kronecker-based map (i.e. sparse PIT tricks, e.g. [BHLV09]) and the latter by Vandermonde map (as seen in the previous subsection).
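The chain rule itself is easy to check symbolically; a quick sympy sketch (toy polynomials, with a substitution map \(\varphi \) of our own choosing):

```python
# Check J_y(phi(f)) = phi(J_x(f)) * J_y(phi(x)) for one arbitrary phi.
import sympy as sp

x1, x2, x3, y1, y2 = sp.symbols('x1 x2 x3 y1 y2')
f = sp.Matrix([x1 + x2, x2*x3])
phi = {x1: y1**2, x2: y1*y2, x3: y2 + 1}          # an arbitrary substitution map
lhs = f.subs(phi).jacobian([y1, y2])
rhs = f.jacobian([x1, x2, x3]).subs(phi) * sp.Matrix([x1, x2, x3]).subs(phi).jacobian([y1, y2])
print((lhs - rhs).applyfunc(sp.expand) == sp.zeros(2, 2))   # True
```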

This general ‘recipe’ has been successfully implemented to various circuit models. The case of the circuit \(C'(\mathbf {x}) := C(\mathbf {f})\), where \({{\mathrm{trdeg}}}(\mathbf {f}) \le r\) and \(f_i\)’s are polynomials of sparsity at most \(s\), was worked out in [BMS13]. The proof follows exactly the above strategy. The time complexity is polynomial in \({{\mathrm{size}}}(C')\) and \((s\cdot {{\mathrm{deg}}}(C'))^r\), where the exponential dependence comes from the sparsity estimate of \(\mathcal {J}_{\mathbf {x}}(\mathbf {f})\) (and of course the final brute-force hitting-set for the \(r\)-variate \(\varphi (C')\)).

Agrawal et al. [ASSS12] extended the recipe to depth-\(4\) circuits (7.1) where the number of \(f_{i,j}\)’s in which any variable appears is bounded by \(r\). This model is called occur-\(r\) depth-\(4\); it generalizes the well-studied multilinear read-\(r\) depth-\(4\). Interestingly, slightly modified techniques also provided exponential lower bounds against these special models. This required proving some combinatorial properties of the derivatives of immanants (e.g. the permanent and the determinant).

The faithful maps recipe has been able to unify all the assorted poly-time hitting-sets known. However, one drawback is that it needs the characteristic to be zero/large. Baby steps in resolving that issue have been taken by [MSS12].

7.4 Rank Concentration, Shift, Hitting-Sets

The hitting-sets that we saw till now were for models where some parameter was kept bounded. But we could also study models with a ‘structural’ restriction, e.g. multilinearity. This route has also been successful and enlightening. We call a depth-\(3\) circuit \(C=\sum _i T_i\) multilinear if the linear factors in \(T_i\) involve disjoint variables. Hence, each product gate \(T_i\) induces a partition \(\mathcal {P}_i\) on the variables (or indices) \([n]\). Moreover, we call \(C\) set-multilinear if these partitions are the same across all \(T_i\)’s.

There is a large body of work on the set-multilinear model [RS05, AMS10, FS12, FS13b, ASS13, FS13a, FSS13, AGKS13]. The motivation for this model is, on the one hand, the algebraic concept of tensors, and, on the other hand, the interest in read-once boolean branching programs [Nis92, IMZ12, Vad12]. Interestingly, [FSS13] has shown (extending the ideas of [ASS13]) that the current situation in the arithmetic world is exponentially better than that in the boolean one!

Here we will exhibit the key ideas of [ASS13] and [AGKS13] on two toy cases that are already quite instructive; this saves us from the gory technical machinery that drives the more general cases.

7.4.1 Multilinear ROABP

Agrawal et al. [ASS13] gave the first quasi-poly-time hitting-set for set-multilinear depth-\(3\) (and extensions to constant-depth, non-multilinear versions). This was generalized by [FSS13] to any depth; in fact, they dealt directly with the multilinear ROABP \(D=\prod _i L_i\) over \(M_w(\mathbb {F})\), where \(L_i\)’s are linear polynomials in disjoint variables. Both the papers proved ‘low-support rank concentration’ in their models.

For the following discussion we fix the base commutative ring to be \(R=H_k(\mathbb {F})\), the Hadamard algebra (instead of the \(w\times w\) matrix algebra). This is basically \((\mathbb {F}^k,+,\star )\), where \(+\) is the vector addition and \(\star \) is the coordinate-wise vector product (called the Hadamard product).

\(\ell \)-concentration. We say that a polynomial \(f\in R[x_1,\ldots ,x_n]\) is \(\ell \)-concentrated if

$${{\mathrm{rk}}}_\mathbb {F}\{\mathrm{coef }_f (x_S) \mid S \subseteq [n], |S | < \ell \} = {{\mathrm{rk}}}_\mathbb {F}\{\mathrm{coef }_f (x_S) \mid S \subseteq [n]\},$$

where \(\mathrm{coef }_f\) extracts a coefficient in \(f\).

That is, the coefficient-vectors of ‘lower’ monomials already span every possible coefficient-vector in \(f\). We are interested in studying whether circuits compute an \(\ell \)-concentrated polynomial for small \(\ell \) (say, \(\log n\) instead of \(n\)). By itself this is not true, e.g. the trivial circuit \(D=x_1\cdots x_n\) is not even \(n\)-concentrated. But, maybe we can transform \(f\) a bit and then attain \((\log n)\)-concentration? In this case, \(D':=D(x_1+1,\ldots ,x_n+1)\) is suddenly \(1\)-concentrated!
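A tiny sympy illustration of this shift (our own, for \(n=3\)):

```python
# D = x1*x2*x3 has zero coefficient on every monomial of support < 3, but after
# the shift x_i -> x_i + 1 even the constant term is nonzero, i.e. D' is 1-concentrated.
import sympy as sp

x = sp.symbols('x1:4')
D = sp.Mul(*x)
Dshift = sp.expand(D.subs({xi: xi + 1 for xi in x}))
const = lambda p: p.subs({xi: 0 for xi in x})    # the support-0 (constant) coefficient
print(const(D), const(Dshift))                   # 0 1
```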

It was shown by [ASS13] that any such \(D\), over \(R\), becomes \((\log k)\)-concentrated after applying a ‘small’ shift; the price of which is \(n^{\log k}\) time. Once we have this, it directly applies to the set-multilinear depth-\(3\) model. Indeed, a set-multilinear depth-\(3\) \(C=\sum _{i\in [k]} T_i\) can be rewritten as \(C=\left[ 1,\ldots ,1\right] \cdot D\), where \(D=\left[ \begin{array}{c}T_1\\ \vdots \\ T_k \end{array} \right] \) is of the promised sort over \(R=H_k(\mathbb {F})\) (since \(D\) completely factorizes into disjoint-variate linear polynomials). So, \(\ell \)-concentration in \(D\) implies an easy way to check \(C\) for zeroness—test the coefficients of the monomials of support below \(\ell \) in \(C\).

Glimpse of a proof We now show how to achieve \(\ell \)-concentration, \(\ell =O(\log k)\), in the following toy model:

$$\begin{aligned} D\ =\ \mathop {\prod }_{i\in [n]} (1+z_ix_i),\, \text { where }z_i\in H_k(\mathbb {F}). \end{aligned}$$
(7.2)

Because of the disjointness of the factors it can be seen, as a simple exercise, that: \(D\) is \(\ell \)-concentrated iff \(D_S:=\prod _{i\in S} (1+z_ix_i)\) is \(\ell \)-concentrated, for all \(S\in \binom{[n]}{\ell }\). Thus, from now on we assume, wlog, \(n=\ell \).

Shift \(D\) by formal variables \(\mathbf {t}\), and normalize, to get a new circuit:

$$D'\ = \ \mathop {\prod }_{i\in [\ell ]} (1+z'_ix_i),\, \text { where }z'_i\in H_k(\mathbb {F}(\mathbf {t})).$$

We can express the new coefficients as:

$$z'_i = z_i/(1+z_it_i),\, \forall i\in [\ell ].$$

Conversely, we write:

$$\begin{aligned} z_i = z'_i/(1-z'_it_i),\, \forall i\in [\ell ]. \end{aligned}$$
(7.3)

We write \(z_S\) for \(\prod _{i\in S} z_i\). Now the goal is to ‘lift’ an \(\mathbb {F}\)-dependence of the \(z_S\)’s to the \(z'_S\)’s; this ultimately shows which condition on the shift yields concentration.

Consider the \(2^\ell \) vectors \(\{z_S\; \mid \; S\subseteq [\ell ]\}\). If \(\ell >\log k\) then there is a nontrivial linear dependence amongst these vectors, say,

$$\mathop {\sum }_{S\subseteq [\ell ]} \alpha _S z_S = 0,\, \text { where } \alpha _S\in \mathbb {F}.$$

Rewriting this in terms of \(z'_S\) we get:

$$\begin{aligned} \mathop {\sum }_{S\subseteq [\ell ]} \alpha _S\cdot \mathop {\prod }_{i\in S} z'_i/(1-z'_it_i)&= 0. \nonumber \\ \text {Or, } \mathop {\sum }_{S\subseteq [\ell ]} \alpha _S\cdot z'_S\cdot \mathop {\prod }_{i\in [\ell ]\setminus S} (1-z'_it_i)&= 0. \end{aligned}$$
(7.4)

Let us collect the ‘coefficient’ of \(z'_{[\ell ]}\) in the above expression. It comes out to,

$$\begin{aligned} \mathop {\sum }_{S\subseteq [\ell ]} \alpha _S\cdot (-1)^{|[\ell ]\setminus S |}\cdot t_{[\ell ]\setminus S}. \end{aligned}$$
(7.5)

If we can ensure this expression to be nonzero then Eq. (7.4) tells us that \(z'_{[\ell ]}\) is in the \(\mathbb {F}(\mathbf {t})\)-span of the ‘lower’ \(z'_S\). But, ensuring the nonzeroness of Eq. (7.5) is easy—use \(t_i\)’s such that all the (\(\le \) \(\ell \))-support monomials \(t_S\) are distinct. We can use standard sparse PIT tricks [BHLV09] for this, in time \({{\mathrm{poly}}}(n^\ell )\).
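One concrete Kronecker-style choice, for the toy case \(n=\ell \), is \(t_i=y^{2^{i-1}}\) for a single fresh variable \(y\): then every product \(t_S\) is a distinct power of \(y\), so the expression (7.5) cannot vanish unless all \(\alpha _S\) are zero. A two-line check (our own illustration; the general \(n\gg \ell \) case uses \(n^{O(\ell )}\)-size substitutions via the sparse PIT tricks):

```python
# With t_i = y**(2**(i-1)) the subset products t_S correspond to subset sums of
# distinct powers of 2, hence are pairwise distinct monomials in y.
from itertools import combinations

ell = 4
t_exp = [2**i for i in range(ell)]
subset_sums = [sum(t_exp[i] for i in S)
               for r in range(ell + 1) for S in combinations(range(ell), r)]
print(len(subset_sums) == len(set(subset_sums)))   # True: all 2^ell products distinct
```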

What we have shown is that, after applying a Kronecker-based shift, the circuit \(D\) becomes \(\ell \)-concentrated; all this in time \(n^{O(\log k)}\). This ‘recipe’ of studying the generic shift, via some combinatorial properties of the ‘transfer’ equations (7.3), is generalized in [ASS13] to other \(D\); and further improved in [FSS13] to multilinear ROABP. The latter use a ‘primal’ interpretation of the ‘transfer’ matrix and show that the linear transformation (a Kronecker shift together with the truncation of the high-support monomials) behaves like a rank-extractor.

It is not known how to design such hitting-sets, even for the toy case, in poly-time.

7.4.2 Towards Multilinear Depth-\(3\)

It is tantalizing to achieve \(\ell \)-concentration in multilinear depth-\(3\) (before embarking on the general depth-\(3\)!). A partial result in that direction was obtained in [AGKS13]. We will sketch their ideas in a toy model.

Consider a multilinear depth-\(3\) circuit \(C\) with only two partitions being induced by the product gates—\(\mathcal {P}_1=\left\{ \{1\}, \cdots , \{n\} \right\} \) and an arbitrary partition \(\mathcal {P}_2\). Say, the number of the corresponding product gates is \(k_1\) respectively \(k_2\) (summing to \(k\)). We can say, naturally, that \(\mathcal {P}_1\) is a refinement of \(\mathcal {P}_2\) (denoted \(\mathcal {P}_1\le \mathcal {P}_2\)) because: For every color (or part) \(S\in \mathcal {P}_2\) there exist colors in \(\mathcal {P}_1\) whose union is exactly \(S\). In this refinement situation [AGKS13] showed that, again, a suitable shift in the \(\prod \sum \) circuit \(D\) (corresponding to \(C\)) achieves \(\ell \)-concentration in time \({{\mathrm{poly}}}(n^{\log k})\).

Glimpse of a proof We can assume \(\mathcal {P}_2\) different from \(\mathcal {P}_1\), otherwise this case is no different from the last subsection. We assume that the first \(k_1\) product gates in \(C=\sum _{i\in [k]} T_i\) respect \(\mathcal {P}_1\) and the remaining \(k_2\) respect \(\mathcal {P}_2\). The corresponding circuit \(D\) where we desire to achieve concentration is \(D=\left[ \begin{array}{c}T_1\\ \vdots \\ T_k \end{array} \right] \) over \(R=H_k(\mathbb {F})\). But now the linear factors of \(D\) are not necessarily in disjoint variables. For example, \(\left[ \begin{array}{c}x_1x_2\\ x_1+x_2 \end{array} \right] = \left( x_1 + \left[ \begin{array}{c} 0\\ 1 \end{array} \right] \cdot x_2 \right) \cdot \left( \left[ \begin{array}{c} 0\\ 1 \end{array} \right] + \left[ \begin{array}{c} 1\\ 0 \end{array} \right] \cdot x_2 \right) \) over \(H_2(\mathbb {F})\).
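A quick sympy check of this example factorization (our own; the product is taken coordinate-wise and the identity element of \(H_2(\mathbb {F})\) is \((1,1)\)):

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
e1, e2, one = sp.Matrix([1, 0]), sp.Matrix([0, 1]), sp.Matrix([1, 1])
f1 = x1*one + x2*e2                                  # first linear factor: (x1, x1 + x2)
f2 = e2 + x2*e1                                      # second linear factor: (x2, 1)
had = sp.Matrix([f1[i]*f2[i] for i in range(2)])     # Hadamard (coordinate-wise) product
target = sp.Matrix([x1*x2, x1 + x2])
print((had - target).applyfunc(sp.expand) == sp.zeros(2, 1))   # True
```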

To get some kind of a reduction to the set-multilinear case, we prove rank concentration in parts. First, we consider those monomials (called \(\mathcal {P}_1\)-type) that could only be produced by the ‘upper’ part of \(D\) (i.e. the first \(k_1\) product gates of \(C\)). Such a monomial, say indexed by \(S\subseteq [n]\), is characterized by the presence of \(i,j\in S\) that are in the same color of \(\mathcal {P}_2\). For a fixed such \(i, j\) we can “access” all such monomials by the derivative \(\partial ^2 D/\partial x_i\partial x_j =: \partial _{i,j}D\). Notice that this differentiation kills the ‘lower’ part of \(D\) and only the \(\mathcal {P}_1\)-part remains. So, we can prove \((2+\log k_1)\)-concentration in the monomials containing \(i,j\) as in Sect. 7.4.1. This proves \(O(\log k_1)\)-concentration in the monomials of \(\mathcal {P}_1\)-type.

Next, we want to understand the remaining monomials (called \(\mathcal {P}_2\)-type); those that could be produced by the ‘lower’ part of \(D\) (i.e. the last \(k_2\) product gates of \(C\)). These, obviously, could also be produced by the upper part of \(D\). Let us fix such a monomial, say \(x_1\cdots x_\ell \). Assume that \(S_1,\ldots ,S_\ell \in \mathcal {P}_2\) are the colors that contain one of the indices \(1,\ldots ,\ell \). Consider the subcircuit \(D_{\ell }\) that in its \(i\)-th coordinate, \(\forall i\in [k]\), simply drops those factors of \(T_i\) that are free of the variables \(S_1\cup \cdots \cup S_\ell \). The problem here is that \(D_{\ell }\) may be a ‘high’ degree circuit (\(\approx n\) instead of \(\ell \)) and so we cannot use a proof like in Sect. 7.4.1.

But, notice that all the degree-\((\ge \ell )\) monomials in \(D_{\ell }\) are \(\mathcal {P}_1\)-type; where we know how to achieve \(\ell \)-concentration. So, we only have to care about degree-(\(\le \) \(\ell \)) \(\mathcal {P}_2\)-type monomials in \(D_{\ell }\). There, again, \((\log k)\)-concentration can be shown using Sect. 7.4.1 and the well-behaved transfer equations.

This sketch, handling two refined partitions, can be made to work for significantly generalized models [AGKS13]. But, multilinear depth-\(3\) PIT is still open (nothing better than exponential time known).

Remark 4.1

Using a different technique [AGKS13] also proves constant-concentration, hence designs poly-time hitting-sets, for certain constant-width ROABPs. These models are arithmetic analogs of the boolean ones—width-\(2\) read-once branching programs [AGHP92, NN93] and constant-width read-once permutation branching programs [KNP11].

7.5 Open Ends

The search for a strong enough technique to study arithmetic circuits continues. We collect here some easy-to-state questions that interest us.

Top fanin-2 depth-4 Find a faithful map \(\varphi \) that preserves the algebraic independence of two products-of-sparse polynomials \(\prod _i f_i\) and \(\prod _j g_j\). If we look at the relevant \(2\times 2\) Jacobian determinant, say wrt variables \(X:=\{x_1, x_2\}\), then the question boils down to finding a hitting-set for the special rational function \(\sum _{i,j}\frac{\det \mathcal {J}_X(f_i,g_j)}{f_ig_j}\). Can this version of rational sparse PIT be done in subexponential time?

Independence over \(\mathbb {F}_p\) Currently, there is no subexponential time algorithm/heuristic known to test two given circuits for algebraic independence over a ‘small’ finite field \(\mathbb {F}_p\). The reason is that something as efficient as the Jacobian criterion is not readily available, see [MSS12].

Model in Eqn (7.2) Find a poly-time hitting-set for this simple model. Note that a poly-time whitebox PIT is already known [RS05].

Multilinear depth-3 Achieve \(o(n)\)-concentration in multilinear depth-\(3\) circuits, in \(n^{o(n)}\) time. Here, the presence of an exponential lower bound against the model [RY09] is quite encouraging.