Keywords

1 The State of the Art and Our Progress

Univariate polynomial root-finding has been the central problem of mathematics and computational mathematics for four millennia and remains the subject of intensive research motivated by applications in Computer Algebra and various other areas of computing (see pointers to the huge bibliography in  [18]).

Subdivision iterations traced back to  [5, 14, 20, 22], recently became a leading root-finder due to the progress reported in  [2, 9,10,11]. Our current paper and its companion  [19] represent part of a large work [18] on significant acceleration of the previous algorithm of  [2]. Already the initial implementation of the new algorithm in  [9] shows 3-fold acceleration, but further implementation work should demonstrate substantially stronger progress, including dramatic acceleration in the important case of a black box polynomial, represented by a subroutine for its fast or numerically stable evaluation rather than its coefficients. E.g., black box evaluation is very fast for sparse, Mandelbrot’s polynomials defined by recurrence expressions,

$$\begin{aligned} p_0(x)=1,~p_1(x)=x,~ p_{i+1}(x)=xp_i(x)^2+1~\mathrm{for}~i=0,1,\dots , \end{aligned}$$
(1)

and various other polynomials, while evaluation is numerically stable for polynomials represented in Bernstein or Chebyshev bases. Furthermore, dealing with black box polynomials, we avoid the costly auxiliary stages of Dandelin–Lobachevsky–Gräffe’s recursive root-squaring  [7] and Taylor’s shift of the variable, which greatly slow down the computations of [2].

We refer the reader to  [18] for full exposition of this work, which includes its comparison with other leading root-finders, their acceleration, and Boolean cost estimates, while we occasionally estimate arithmetic complexity where we can control the precision of computing.  [18] specifies a number of directions for further progress in subdivision iterations and fully develops some of them. Their implementation should make the algorithm of  [18] competitive with MPSolve (Multiprecision Polynomial Solver) of  [3, 4], which is the package of root-finding subroutines of user’s choice since 2000. Actually already the previous implementation of subdivision iterations in [10], based on the algorithm of  [2], has slightly outperformed MPSolve for root-finding in a region of the complex plane containing only a small number of roots.

Organization of the Paper. We devote the next short section to background, where, in particular, we briefly cover subdivision iterations. We extensively study their amendments based on the computation of Cauchy sums in a disc on the complex plane in Sect. 3. We devote Sects. 4 and 5 to deterministic and probabilistic support of the application of our Cauchy sum approach to root-counting and exclusion tests for a disc without estimation of the isolation of its boundary circle from the roots.

2 Background

Quite typically in the literature, a polynomial \(p=p(x)\) is represented in monomial basis – with its coefficients,

$$\begin{aligned} p(x)=\sum _{i=0}^dp_ix^i= p_d\prod _{j=1}^d(x-x_j),~p_d\ne 0, \end{aligned}$$
(2)

where we may have \(x_k=x_l\) for \(k\ne l\), but we allow its representation just by a black box for its evaluation.

We study roots numerically: we count a root of multiplicity m as m simple roots and do not distinguish it from their cluster whose diameter is within a fixed tolerance bound.

We deal with the discs \(D(c,\rho )\), annuli \(A(c,\rho ,\rho ')\), and circles \(C(c,\rho )\) having complex centers c and positive radii \(\rho \) and \(\rho '>\rho \).

denotes a primitive qth root of unity.

Definition 1

A domain on the complex plain with a center c and its boundary have an isolation ratio \(\theta \) or equivalently are \(\theta \)-isolated for a polynomial p, real \(\theta \ge 1\), and complex c if the root set of p in the domain is invariant in its \(\theta \)- and \(\frac{1}{\theta }\)-dilation with the center c. In particular (see Fig. 1), a disc \(D(c,\rho )\) and its boundary circle \(C(c,\rho )\) are \(\theta \)-isolated for p, \(\theta \ge 1\), and complex c if no roots of p lie in the open annulus \(A(c,\rho /\theta ,\rho \theta )\). A domain and its boundary are well-isolated if they are \(\theta \)-isolated for \(\theta -1\) exceeding a positive constant.

Fig. 1.
figure 1

The internal disc D(Xr) is R/r-isolated

Subdivision iterations extend the bisection iterations from root-finding on a line to polynomial root-finding in the complex plane and under the name of Quad-tree Construction have been extensively used in Computational Geometry. Their version of Becker et al. Algorithm of  [2] has nearly optimal Boolean complexity, up to polylogarithmic factor in the input size, provided that an input polynomial is represented by its coefficients.Footnote 1

Suppose that we seek all roots of p in a fixed square on the complex plane, which is well isolated from the external roots of p, e.g., contains all d roots; call this square suspect. At a low cost, one can readily compute such a square centered at the origin and containing all roots of p (cf., e.g., [18, Sec. 4.8]).

A subdivision iteration divides every suspect square into four congruent sub-squares and to each of them applies an exclusion test: a sub-square is discarded if the test proves that it contains no roots of p; otherwise the sub-square is called suspect and is processed at the next iteration (see Fig. 2).

Fig. 2.
figure 2

Four roots of p are marked by asterisks; sub-squares that contain them are suspect; the other sub-squares are discarded

A root of p can make at most four squares suspect if an exclusion test enabled us to discard every square that contains no roots of p, and then we would have \(k\le 4\). Realistically, the subdivision processes have been made less expensive overall by means of incorporation of soft exclusion tests, which keep a tested square \(S(c,\rho )\) suspect if a disc \(D(c,u\rho )\) contains a root of p for some u exceeding \(\sqrt{2}\). Then the constant k grows above 4, but the cost of performing exclusion test and the overall cost of subdivision root-finding can decrease.

Exclusion tests are the main computational block; their cost strongly dominates the overall complexity of subdivision root-finding.

At every subdivision iteration, all roots are approximated by the centers of suspect squares, within at most their half-diameter. This bound decreases by twice at every subdivision, and [2, 14, 20] accelerate such a linear convergence to the roots to superlinear – by using Newton’s or QIR iterations, which combine the secant and Newton’s iterations. The transition to faster iterations involves root-counting in a disc on a complex plane – the second main computational block of the algorithms of [2, 14, 20].

Work  [2] proposed a novel exclusion test and root-counter by means of Pellet’s classical theorem, based on pairwise comparison of the absolute values of the coefficients of p. The authors justly referred to its as the main algorithmic novelty versus  [20] and  [14], but  [18] achieves new significant progress based on novel efficient root-counting and exclusion test for a black box polynomial, not handled by  [2].

The new approach relies on the approximation of the power sums of the roots of a polynomial lying in a disc \(D(c,\rho )\) on the complex plane

$$\begin{aligned} s_h=s_h(D(c,\rho )):=\sum _{x_j\in D(c,\rho )}x_j^h=\int _{C(c,\rho )}\frac{p'(x)}{p(x)}~x^h ~dx,~h=0,1,\dots ; \end{aligned}$$
(3)

the latter representation is valid by virtue of Cauchy integral theorem.

3 Power Sums, Cauchy Sums, Root-Counting, and Exclusion Test

3.1 Cauchy Sum Computation

Let \(f:=\prod _{j=1}^{d_\mathrm{in}}(x-x_j)\) where the roots \(x_1,\dots ,x_{d_\mathrm{in}}\) of p lie in a disc \(D(c,\rho )\), while all other roots lie outside it; \(f=p\) for \(d=d_\mathrm{in}\). We approximate the power sums \(s_h\), \(h=0,1,\dots \), of the roots of p in a disc \(D(c,\rho )\) with the Cauchy sums in that disc, which discretize the contour integral above:

$$\begin{aligned} s_h^*: =\frac{1}{q}\sum _{g=0}^{q-1}\zeta ^{(h+1)g}~\frac{p'(c+\rho \zeta ^g)}{p(c+\rho \zeta ^g)}~ \mathrm{for}~h=0,1,\dots ,q-1. \end{aligned}$$
(4)

Algorithm 1. Computation of Cauchy sum in a disc \(D(c,\rho )\).

Fix a positive integer q and assume that \({p(c+\rho \zeta ^g)}\ne 0\) for \(g=0,1,\dots ,q-1\). Successively compute the values

  1. (i)

    \(p(c+\rho \zeta ^g)\), \(p'(c+\rho \zeta ^g)\), \(\sigma _g=\frac{p'(c+\rho \zeta ^g)}{p(c+\rho \zeta ^g)}\) for \(g=0,1,\dots ,q-1\),

  2. (ii)

    \(s^*_h\) of (4) for \(h=0,1,\dots ,q-1\).

\(qs_h^*= \sum _{g=0}^{q-1}\sigma _g\zeta ^{(h+1)g}\) for \(h=q-1,1,2,\dots ,q-2\) are the values of the polynomial \(\sigma (x)=\sum _{g=0}^{q-1}\sigma _g x\) at the qth roots of unity, and so the computation of \(qs_h^*\) for \(h=q-1,1,2,\dots ,q-2\) at stage (ii) is precisely the discrete Fourier transform (DFT) on q points, which one can perform fast by using FFT (see, e.g.,  [15, Section 2.2] or [19, Appendix]). Consequently, at a dominated cost of performing less than \(3q\log _2 q\) arithmetic operations one can extend the computation of the Cauchy sum \(s_0^*\) to the computation of all Cauchy sums \(s_h^*\), \(h=0,1,\dots ,q-1\).

Remark 1

Instead of assuming that \({p(c+\rho \zeta ^g)}\ne 0\) for all g we can ensure these inequalities with probability 1 by applying Algorithm 1 to the polynomial \(t(x)=p(a\frac{x-c}{\rho })\) for \(a =\exp (\phi \mathbf{i)}\) or \(a^q =\exp (\phi \mathbf{i)}\) and a random scalar \(\phi \in [0,2\pi )\).

The constructive proof of the following theorem supports application of Algorithm 1 to a black box polynomial (see  [12] or  [1]).

Theorem 1

Given an algorithm that evaluates at a point x a black box polynomial p(x) over a field \(\mathcal {K}\) of constants by using A additions and subtractions, S scalar multiplications (that is, multiplications by elements from the field \(\mathcal {K}\)), and M other multiplications and divisions, one can extend this algorithm to the evaluation at x of both p(x) and \(p'(x)\) by using \(2A + M\) additions and subtractions, \(2\,S\) scalar multiplications, and \(3\,M\) other multiplications and divisions.

3.2 Cauchy Sums: Their Link to the Roots and Approximation of Power Sums

The following basic result is  [18, Corollary 4.1].

Theorem 2

For the roots \(x_j\) of p(x) and all h, the Cauchy sums \(s_h^*\) of (4) for \(c=0\), \(\rho -1\) satisfy \(s_h^*=\sum _{j=1}^d\frac{x_j^h}{1-x_j^q}~ \mathrm{unless}~ x_j^q=1~ \mathrm{for~some}~j.\)

By virtue of this theorem, the Cauchy sum \(s_h^*\) is the power sums \(s_h=\sum _{j=1}^dx_j^h\) with the weights \(\frac{1}{1-x^q_j}\) assigned to the terms \(x_j^h\) for \(j=1,\dots ,d\). This defines an upper bound on \(|s_h-s_h^*|\) that converges to 0 exponentially fast in \(q-h\).

Corollary 1

[21].Footnote 2 Let the disc D(0, 1) be \(\theta \)-isolated and contain precisely \(d_\mathrm{in}\) roots of p. Write \(\eta :=1/\theta \). Then

$$\begin{aligned} |s_h^*-s_h|\le \frac{d_\mathrm{in}\eta ^{q+h}+(d-d_\mathrm{in})\eta ^{q-h}}{1-\eta ^q}~\mathrm{for}~h=0,1,\dots ,q-1, \end{aligned}$$
(5)

and, in particular,

$$\begin{aligned} s_h=0~\mathrm{and}~|s_h^*|\le \frac{d\eta ^{q+h}}{1-\eta ^q}~\mathrm{for}~h=0,1,\dots ,q-1~\mathrm{if}~d_\mathrm{in}=0. \end{aligned}$$
(6)
$$\begin{aligned} \mu :=|s_0^*-s_0|\le \frac{d}{\theta ^q-1}, ~\mathrm{and~so}~ \mu < 1/2~\mathrm{if}~q>\frac{\log (2d+1)}{\log (\theta )}, \end{aligned}$$
(7)
$$\begin{aligned} \theta \le \Big (\frac{\mu +d}{\mu }\Big )^{1/q},~\mathrm{and~so}~\theta \le (d+1)^{1/q}~\mathrm{if}~\mu =|s_0^*-s_0|\ge 1. \end{aligned}$$
(8)

Corollary 2

Suppose that Algorithm 1, applied to the unit disc D(0, 1) for \(q\ge b\log _2(2d+1)\) and \(b>0\), outputs \(s_0^*>1/2\). Then the disc \(D(0,\theta )\) contains a root of p for \(\theta =2^{1/b}\).

3.3 Cauchy Root-Counting, Cauchy Exclusion Test, and Isolation of a Disc

Clearly the 0th Cauchy sum \(s_0^*\) can serve as a root-counter in a disc if \(\mu =|s_0-s_0^*|<0.5\). By virtue of (7), this holds if \(q\ge \log _{\theta }(2d+1)\), e.g., if \(\theta =2\) and \(d\le 1,000,000\), and then we can choose any \(q\ge 21\).

Remark 2

We can narrow Cauchy root-counting to Cauchy exclusion test if we only check whether \(s_0^*\approx 0\), but we can strengthen this test at a low additional cost by verifying whether \(s_h^*\approx 0\) for \(h=0,\dots ,q-1\). Indeed, (i) we can compute the Cauchy sums \(s_h^*\) for \(h=0,\dots ,q-1\) at the cost of the computation of \(s_0^*\) and performing DFT at q points and (ii) \(s_0=0\) if and only if \(s_h=0\) for \(h=0,1,\dots \).

Hereafter we refer to Algorithm 1 restricted to the computation of \(s_0^*\) as Algorithm 1a. Seeking correct output of a Cauchy root-counter or exclusion test but avoiding unnecessary increase of the parameter q, one can first apply Algorithm 1a for a small integer q and then recursively double it, reusing the results of the previous computations, until the computed values of the Cauchy sum \(s_0^*\) stabilize near an integer or just until they approximate an integer closely enough. Later we prove that such an integer is \(s_0\) with a high probability (hereafter whp) under random root models.

This result supports root-finding computations in  [18, Section 6.4], but not in the subdivision processes of  [2, 14, 20], where root-counting is applied only to well-isolated discs, in which case Algorithm  1a yields non-costly solution \(s_0\) by virtue of Corollary 1. For correctness of our exclusion test, we seek stronger support because in the subdivision iterations of  [2, 14, 20], such a test is applied to the discs for whose isolation ratios no estimates are known. By virtue of Corollary 2, Algorithm 1a applied to such a disc certifies that its controlled dilation contains a root of p unless the algorithm outputs \( s_0^*\) close to 0. The following algorithm completes an exclusion test in the latter case.

Algorithm 2. Completion of a Cauchy soft exclusion test.

Input: A black box polynomial p(x) of degree d such that Algorithm 1a, applied to the disc D(0, 2) andFootnote 3 the polynomial p(x) has output \(s_0^*\) close to 0.

Output: Certification that (i) the disc D(0, 2) contains a root of p definitely if \(q>d\) or whp otherwise or (ii) the unit disc D(0, 1) definitely contains no roots of p, where cases (i) and (ii) are compatible.

Initialization: Choose an integer q such that

$$\begin{aligned} q_0< q\le 2q_0 ~\mathrm{for}~ q_0\ge \max \Big \{1,~\log _2\Big (\frac{d}{q_0\alpha _d\sqrt{3}}\Big )\Big \}~ \mathrm{and}~\alpha _d=\sqrt{d+\sqrt{d}}. \end{aligned}$$
(9)

Computations: Apply Algorithm 1 to the unit disc D(0, 1) for the selected q. Hereafter

$$\begin{aligned} \mathbf{s_*}:=(s_{q-1}^*,s_0^*,s_1^*, \dots ,s_{q-2}^*)^T \end{aligned}$$
(10)

denotes the vector of the values \(s_h^*\) of the Cauchy sums output by the algorithm and \(||\mathbf{s_*}||\) denotes the Euclidean norm \((\sum _{h=0}^{q-1}|s_h^*|^2)^{1/2}\). If \(||\mathbf{s_*}||~q_0~\alpha _d \ge 1\), conclude that the disc \(D(0,\theta )\) definitely contains a root of p. Otherwise conclude that the disc D(0, 1) contains no roots of p definitely if \(q> d\) or whp otherwise.

Work [18] as well as  [19] readily prove that the disc D(0, 2) contains a root of p if \(||\mathbf{s_*}||~q_0~\alpha _d \ge 1\), and this reduces correctness proof of the algorithm to certification that the disc D(0, 1) contains no roots of p if the vector of the q Cauchy sums \(\mathbf{s}_*\) has Euclidean norm satisfying \(||\mathbf{s_*}||~q_0~\alpha _d <1\). This certification, deterministic for \(q>d\) and probabilistic for \(2\le q\le d\), is the subject of the next two sections.

Remark 3

For the computation of Cauchy sums for q of order of d, we should evaluate p and \(p'\) at order of d points; by applying our reduction of multipoint polynomial evaluation (MPE) to fast multipole method (FMM) (see  [18, Appendix E]), we can do this by using order of \(d\log ^2(d)\) arithmetic operations, performed numerically with the precision of order \(\log (d)\) bits. It outputs the vector of the first q Cauchy sums \(s_0,\dots ,s_{q-1}\) within a relative error of order \(\log (d)\). This should be sufficient in order to verify the bounds of Algorithm 2, [18, Theorem 19 and Corollaries 4.2 and 4.3] because FMM is celebrated for being very stable numerically, although further formal and experimental study is in order.

The algorithm runs faster as we decrease integer q, and already under the choice of \(q_0\) of order of \(\log (d)\) the disc D(0, 2) contains a root of p if \(||\mathbf{s_*}||~q_0~\alpha _d \ge 1\). For \(q\le d\), we have only probabilistic support of correctness of Algorithm 2 in the case where \(||\mathbf{s_*}||~q_0~\alpha _d <1\), but we can try to strengthen reliability of our exclusion tests by verifying additional necessary conditions for correctness of our exclusion test and root-counting:

  1. (a)

    the Cauchy sums \(s_h^*\) for \(h=0,1,\dots ,q-1\) still nearly vanish for the polynomials t(x) obtained from p(x) by means of various mappings of the variable x that keep an input disc and the power sum \(s_0\) invariant (cf. Remark 1);

  2. (b)

    an exclusion test should succeed for any disc lying in the disc \(D(c,\rho )\). In particular, if the disc covers a suspect square, then exclusion tests should succeed for the four discs that cover the four congruent sub-squares obtained from sub-dividing the input square;

  3. (c)

    all suspect squares of a subdivision iteration together contain precisely d roots of p.

If these additional necessary conditions hold, it is still plausible that the disc \(D(c,\rho )\) contains a root of p. We can, however, detect whether we have lost any root at the end of the subdivision process, when \(d-w\) roots are tamed, that is, closely approximated, and when w roots remain at large; we call the latter roots wild. If \(0<w\ll d\), then at a low cost, we can deflate the wild factor of p, whose root set is made up of the w wild roots; then we can approximate the roots of this factor at a low cost (see  [18, Section 7]).

It is natural to call a point c a tame root of p if \(r_d(c,p)\le \) TOL for a fixed tolerance TOL. The algorithm of [18, Section 6.2] closely approximates \(r_d(c,p)\) at a relatively low cost, but it is even less expensive to verify whether \(d \left| {p'(c)}/{p(c)} \right| \le \)TOL and then to recall that \(r_d(c,p)\le d \left| {p'(c)}/{p(c)} \right| \) (see [5, Theorem 6.4g]).

Empirical support from the initial implementation and testing of our algorithms in  [9] has substantially superseded their formal support here and in  [18]. In these tests, subdivision iterations with Cauchy exclusion tests by means of Algorithm 1a have consistently approximated the integer \(s_0=0\) within 1/4 for \(q=\lceil \log (4d+1)/\log (4\theta )\rceil \). For discs containing no roots and for \(q=\lceil \log (4d+1)/\log (4\theta )\rceil +1\), Algorithm 1 has consistently approximated both \(s_0\) and \(s_1\) within 1/4 (cf.  [9, equation(22) in Corollary 12 for \(e=1/4\)]).

4 Correctness Certification of an Exclusion Test

In this section, we complete the correctness certification of the exclusion test by means of Algorithm 2.

4.1 Deterministic Certification

We begin with a lemma that implies q linear equations on the coefficients of the polynomial p provided that \(s_h^*=0\) for \(h=0,1,\dots ,q-1\).

Lemma 1

Suppose that \(s_h^*=0\) for a polynomial p(x), \(s_h^*\) of (4), \(c=0\), \(\rho =1\), \(h=0,1,\dots ,q-1\), and a positive q. Then the polynomial \(p'(x)\) is divided by \(x^q-1\).

Proof

Observe that

(11)

denoting the matrix of DFT at q points, and \(\mathbf{s_*}:=(s_{q-1}^*,s_0^*,\dots ,s_{q-2}^*)^T\) of ().

Under the assumptions of the lemma \(\mathbf{s_*}= F\mathbf{v}=\mathbf{0}\) for the vector \(\mathbf{0}\) of length q filled with 0s. Pre-multiply this vector equation by the matrix \(\frac{1}{q}F^*\) of the inverse DFT at q points and obtain that \(\frac{p'(\zeta ^g)}{qp(\zeta ^g)}=0\) for all g. Hence \(p'(\zeta ^g)=0\), for \(g=0,1,\dots ,q-1\), and, therefore, \(x^q-1\) divides \(p'(x)\).

Corollary 3

Under the assumptions of Lemma 1 let \(q\ge d\). Then the polynomial \(p'(x)\) is identically 0, and so the polynomial p(x) is a constant and has no roots unless it is identically 0.

Next we assume that \(q>d\) and extend the corollary under much weaker assumption that \(||\mathbf{s_*}||<\frac{1}{q\alpha _d}\) rather than \(\mathbf{s_*}=\mathbf{0}\). Recall that \(||\mathbf{v}||=(\sum _{i=1}^k|v_i|^2)^{1/2}\) denotes the Euclidean norm of a vector \(\mathbf{v}=(v_i)_{i=1}^kv_i\).

Theorem 3

Given a polynomial p(x) of (2) and a positive integer q, write

$$\begin{aligned} \widehat{p}(x):=p(x)\mod (x^q-1),~ \widehat{p}'(x):=p'(x)\mod (x^q-1), \end{aligned}$$
(12)

and \(\widehat{p}_0:=\widehat{p}(0)\), let , \(\widehat{\mathbf{p}}\), and \(\widehat{\mathbf{p}}_1\) denote the coefficient vectors of the polynomials , \(\widehat{p}(x)\), and \(\widehat{p}(x)-\widehat{p}_0\), respectively, and suppose that

$$\begin{aligned} ||\mathbf{s_*}||=||F\mathbf{v}||\le \tau , \end{aligned}$$
(13)

for \(\mathbf{s_*}=F\mathbf{v}\), F and \(\mathbf{v}\) of (11), and a positive tolerance \(\tau \). Then

$$\begin{aligned} |\widehat{p}_0|^2\ge |(\tau q)^{-2}-1|~||\widehat{\mathbf{p}}_1||^2. \end{aligned}$$
(14)

Proof

Multiply equation (13) by the matrix \(\frac{1}{q}F^*\) of the inverse DFT and obtain

$$\frac{1}{q}F^*\mathbf{s_*}=\mathbf{v}.$$

Hence

$$||\mathbf{v}||\le \frac{1}{q}||F^*||~||\mathbf{s_*}||\le \frac{\tau }{\sqrt{q}}.$$

Therefore,

$$\Big |\Big |\Big (\frac{p'(\zeta ^g)}{qp(\zeta ^g)}\Big )_{g=0}^{q-1}\Big |\Big |\le \frac{\tau }{\sqrt{q}},$$

and consequently

$$||(p'(\zeta ^g))_{g=0}^{q-1}||\le \tau \sqrt{q}~ \max _{g=0}^{q-1}|p(\zeta ^g)|\le \tau q ~ ||(p(\zeta ^g))_{g=0}^{q-1}||.$$

Substitute the equations \(p(\zeta ^g)=\widehat{p}(\zeta ^g)\) and \(p'(\zeta ^g)=\widehat{p}'(\zeta ^g)\) in the above and obtain

$$\begin{aligned} ||(\widehat{p}'(\zeta ^g))_{g=0}^{q-1}||\le \tau q ~ ||(\widehat{p}(\zeta ^g))_{g=0}^{q-1}|| \end{aligned}$$
(15)

for the polynomials \(\widehat{p}'(x)\) and \(\widehat{p}(x)\) of (12) with the coefficient vectors and \(\widehat{\mathbf{p}}\), respectively. Observe that

for the DFT matrix \(F=(\zeta ^{ij})_{i,j=0}^{q-1}\) of (11).

Substitute these expressions into bound (15) and obtain \(||F \widehat{\mathbf{p}}' ||\le \tau q~ ||F \widehat{\mathbf{p}}||\). Hence

$$\begin{aligned} ||\widehat{\mathbf{p}}' ||\le \tau q~ ||\widehat{\mathbf{p}}|| \end{aligned}$$
(16)

because F is a unitary matrix up to scaling by \(\sqrt{q}\).

Furthermore observe that

$$||\widehat{\mathbf{p}}||^2=|\widehat{p}_0|^2 +||\widehat{\mathbf{p}}_1||^2~\mathrm{and}~ ||\widehat{\mathbf{p}}'||^2\ge ||\widehat{\mathbf{p_1}}||^2$$

for \(\widehat{p}_0=\widehat{p}(0)\) and the vector \(\widehat{\mathbf{p}}_1\) of the coefficients of \(\widehat{p}(x)-\widehat{p}_0\).

Combine these observations with bound (16) and obtain the theorem.

Corollary 4

Under the assumptions of Theorem 3 let

$$q>d~\mathrm{and}~(1+\sqrt{d})\tau ^2 q^2\sqrt{d}<1.$$

Then the polynomial p(x) has no roots in the unit disc \(D(0,1)=\{x:~|x|\le 1\}\).

Proof

The bound \((1+\sqrt{d})\tau ^2 q^2<1\) implies that \((\tau q)^{-2}-1>\sqrt{d}\), while the bound \(q>d\) implies that \(\widehat{p}(x)=p(x)\) and \(\widehat{p}_0=p(0)=p_0\). Hence Theorem 3 implies that \(|p_0|^2>\sqrt{d}\sum _{i=1}^d |p_i|^2\) and so \(|p_0|>|p|=\sum _{i=1}^d |p_i|\). The latter bound is impossible if \(p(x)=0\) for \(|x|\le 1\).

Remark 4

We proved this corollary assuming that Cauchy exclusion test by means of Algorithm 2 has been applied to the roots of a polynomial p(x) in the unit disc D(0, 1), but our proof supports such a test for the roots of a polynomial \(t(x)=p((x-c)/\rho )\) in that disc for any complex c and positive \(\rho \). Hence the corollary also holds if the test is applied to the roots of a polynomial p(x) in a disc \(D(c,\rho )\).

4.2 Probabilistic Certification Under Random Coefficient Model

Our next extensions of the result to the case \(2\le q\le d\) are probabilistic under a random coefficient model. We only state asymptotic probability estimates in the case where \(\tau \rightarrow 0\), but specific estimates for fixed bounds on \(\tau \) are implicit in the proofs.

Corollary 5

Define a Random Coefficient Model such that the coefficients \(p_0\), \(p_1\), \(\dots ,p_d\) of p are independent Gaussian random variables having expected values \(a_i\) and positive variance \(\sigma _i^2\), for \(i=0,1,\dots ,d\). Suppose that the Cauchy exclusion test by means of Algorithm 2 has been applied to the unit disc D(0, 1) and a polynomial p(x) under this model and let \(2\le q\le d\). Then the bound of Theorem 3 holds with a probability that fast converges to 0 as \(\tau \rightarrow 0\).

Proof

Let \(d=(k-1)q+l\) for \(k\ge 0\) and \(0\le l\le q-1\) and write

$$\widehat{p}_i:=\sum _{j=0}^{k_i}p_{jq+i},~i=0,1,\dots ,q-1,~k_i=k~\mathrm{for}~i<l,~k_i=k-1~\mathrm{for}~i\ge l.$$

Then

$$\begin{aligned} \widehat{\mathbf{p}}=(\widehat{p}_i)_{i=0}^{q-1}, ~\widehat{\mathbf{p}}_1=(\widehat{p}_i)_{i=1}^{q-1} \end{aligned}$$
(17)

where \(\widehat{p}_i\) for all i are independent Gaussian variables with expected values \(\widehat{a}_i\) and positive variance values \(\widehat{\sigma }_i^2\) given by

$$\begin{aligned} \widehat{a}_i=\sum _{j=0}^{k_i}a_{jq+i}~\mathrm{and}~\widehat{\sigma }_i^2= \sum _{j=0}^{k_i}\sigma _{jq+i}^2,~i=0,1,\dots ,q-1. \end{aligned}$$
(18)

Such variables are strongly concentrated about their expected values. Bound (14) of Theorem  3 implies that

$$|\widehat{a}_0|=|\sum _{j=0}^{k_0}a_{jq}| \ge |(\tau q)^{-2}-1|\max _{i=1}^{q-1} |\widehat{a}_i|$$

for \(\widehat{a}_i\) of (18).Footnote 4 Since \(\widehat{p}_0,\dots ,\widehat{p}_{q-1}\) are independent Gaussian random variables, which are strongly concentrated about their expected values \(\widehat{a}_i\), this inequality strongly restricts the class of polynomials p(x) satisfying (14) for small \(\tau \) and \(q\ge 2\); furthermore, the probability that this inequality and bound (14) hold converges to 0 exponentially fast as \(\tau \rightarrow 0\).

Now suppose that the Cauchy exclusion test has been applied to the disc \(D(c,\rho )\) for a complex c and a positive \(\rho \) and restate the above argument for the polynomial \(t(x):=\sum _{j=0}^dt_jx^j:=p(\frac{x-c}{\rho })\) where (cf. [15, Problem 2.4.3])

$$\begin{aligned} t_j=\sum _{i=j}^dp_i\cdot (-c)^{i-j}\rho ^{-i}\begin{pmatrix} i\\ j \end{pmatrix},~j=0,1,\dots ,d, \end{aligned}$$
(19)

and so the partial sums \(\widehat{t}_j\), \(j=0,1,\dots ,q-1\), are still independent Gaussian variables strongly concentrated about their expected values \(\mathbb {E}(\widehat{t}_j)\). Equations (19) imply that

$$\begin{aligned} \mathbb {E}(\widehat{t}_j)=\sum _{i=j}^da_i\cdot (-c)^{i-j}\rho ^{-i}\begin{pmatrix} i\\ j \end{pmatrix},~j=0,1,\dots ,d. \end{aligned}$$
(20)

Expressions \(\widehat{\mathbf{t}}=(\widehat{t}_i)_{i=0}^{q-1}, ~\widehat{\mathbf{t}}_1=(\widehat{t}_i)_{i=1}^{q-1}\) replace (17), and bound (14) restricts the classes of polynomials t(x) and consequently p(x).

Now observe that \(\lim _{\rho \rightarrow \infty }\widehat{t}_0=t_0\), where \(t_0\) does not depend on \(\rho \). Furthermore, \(t_j\rightarrow 0\) as \(\rho \rightarrow \infty \) for \(j\ne 0\), and so \(\lim _{\rho \rightarrow \infty }\widehat{\mathbf{t}}_1= \mathbf{0}\). Therefore, the value \(\mathbb {E}|\widehat{t}_0|^2\) strongly dominates the value \(\mathbb {E}||\widehat{\mathbf{t}}_1||^2\), and this strongly restricts the the classes of polynomials t(x) and consequently p(x).

Next assume that the ratio \(|p_0|=|t_0|/\max _{i=1}^d|t_i|\) is not large and then argue that bound (14) strongly restricts the classes of polynomials t(x) and p(x). Indeed rewrite equation (20) as follows:

$$\mathbb {E}(t_j)=\sum _{i=j}^du_i(-c)^{-j}\begin{pmatrix} i\\ j \end{pmatrix},~\mathrm{for}~u_i= a_i(-c)^i\rho ^{-i},~i,j=0,1,\dots ,d.$$

Now observe that \(L_0=\mathbb {E}(\widehat{t}_0-t_0)\) and \(L_1=\mathbb {E}(\widehat{t}_1)\) are linear combinations in the same variables \(u_i\), for \(i=0,1,\dots ,d\), whose coefficients are polynomials in \(-1/c\) with positive integer coefficients. Furthermore, such polynomials in \(L_0\) and \(L_1\) consist of the terms \((-c)^j{i\atopwithdelims ()j}\), which make up pairs of terms, such that one term of every pair is in \(L_0\), another is in \(L_1\), and dc exceeds the ratio of these terms in every pair. It follows that bound (14) for \(|(\tau q)^{-2}-1|~|\gg dc\) strongly restricts the classes of polynomials p(x) and t(x). Then again this follows because of the strong concentration of a Gaussian variable about its expected value.

5 Cauchy Root-Counting Under Two Random Root Models

5.1 Error Estimates

Random root models are less popular in the study of root-finding than random coefficient models but still enable some insight into this subject: indeed, if a property holds whp under a random root model, then it must hold for a large input class if not for most of inputs.

Definition 2

Under Random Root Models 1 and 2, the roots of p are iid random variables sampled under the uniform probability distribution from some fixed regions \(\mathbb {D}\) on the complex plain.

  1. 1.

    \(\mathbb {D}\) is the disc D(0, R) for a fixed positive R in Random Root Model 1.

  2. 2.

    \(\mathbb {D}\) is the union of two distinct domains in Random Root Models 2: for a fixed \(R>1\) and a fixed nonnegative integer \(k\le d\), the roots \(x_{k+1},\dots ,x_d\) of p are sampled from a disc D(0, R), but the roots \(\{x_1,\dots ,x_{k}\}\) are sampled from a fixed narrow annulus \(A(0,1/\theta ,\theta )\) about the boundary circle C(0, 1), for a reasonably small positive \(\theta -1\).

The following readily verified theorem implies that the roots of p lie on or near any fixed circle on the complex plain with a low probability (wlp) under Random Root Model 1.

Theorem 4

For a polynomial \(p=p(x)\), \(\theta > 1\), a complex c, and positive \(\rho \) and R such that \(R>\rho +|c|\) and \(\rho \sqrt{d}=O(R)\), assume Random Root Model 1.

(i) Then for any fixed integer j in the range [0, d], the root \(x_j\) lies in the annulus \(A(c,\rho /\theta ,\rho \theta )\) with the probability \(P_{R,\rho ,\theta }=\frac{(\theta ^4-1)\rho ^2}{R^2\theta ^2}\).

(ii) The probability that at least one root of p lies in the annulus \(A(c,\rho /\theta ,\rho \theta )\) is at most \(P_{R,\rho ,\theta }d=\frac{(\theta ^4-1)\rho ^2 d}{R^2\theta ^2}\).

Proof

Recall that the probability \(P_{R,\rho ,\theta }\) is the ratio of the areas \({(\theta ^4-1)\rho ^2}/{\theta ^2}\) and \(\pi R^2\) of the annulus \(A(c,\rho /\theta ,\rho \theta )\) and the disc D(0, R), respectively, and obtain claim (i). Immediately extend it to claim (ii).

Notice that the bound \(P_{R,\rho ,\theta }d\) converges to 0 as \(\frac{\rho }{R}\sqrt{(\theta -1)d}\rightarrow 0\), where the ratio \({\rho }/{R}\) never exceeds 1 and decreases by twice at every subdivision iteration and hence by a factor of d in \(\lceil \log _2(d)\rceil \) iterations.

Combine claim (ii) of the theorem with bound (7) and conclude that the Cauchy sum \(s^*_0\) approximates the power sum \(s_0\) within less than 1/2 unless some roots of p(x) lie on or very close to the boundary circle \(C(c,\rho )\), and the latter property of the roots holds wlp under Random Root Model 1.

5.2 Probabilistic Correctness Verification

The latter claim is no longer valid under Random Root Model 2 because of the impact of the roots lying on or near boundary circle the Cauchy sum \(s_0^*\) can be misleading, that is close to a wrong integer, distinct from \(s_0\). We are going to prove, however, that this can only occur wlp. We begin with an alternative proof that the value \(|s_0^*-s_0|\) is small whp under Random Root Model 1 and then readily extend it to proving similar results under Random Root Model 2.

We will only deduce that

$$\begin{aligned} |s_0^*-u|\le 0.1 v \end{aligned}$$
(21)

wlp P for a fixed complex number u and a fixed positive v under Random Root Models 1 and 2. This will immediately imply that \(|s_0^*-i|\le 0.1 v~\mathrm{for}~i\in \{0,1,\dots ,d\}\) with a probability at most \((d+1)P\). Therefore, if (21) holds, then \(i=s_0\) with a probability at least \(1-(d+1)P\) under Random Root Models 1 and 2.

Lemma 2

Write

$$\begin{aligned} y:=\frac{1}{1-x^q},~ \tilde{y}:=\frac{1}{1-\tilde{x}^q}, ~\mathrm{and}~\delta =|\tilde{y}-y| \end{aligned}$$
(22)

where \(|y|\ge v>10\delta >0\). Then

$$\begin{aligned} |\tilde{x}-x|\le \nabla ~\mathrm{for}~\nabla :=\frac{\delta }{q}\cdot \Big (1+\frac{1}{0.81v^2}\Big )^{(1-q)/q}. \end{aligned}$$
(23)

Proof

Equation (22) implies that

$$x=\Big (1-\frac{1}{y}\Big )^{1/q},~ \tilde{x}=\Big (1-\frac{1}{\tilde{y}}\Big )^{1/q},~\mathrm{and}~\tilde{x}-x= \Big (1-\frac{1}{\tilde{y}}\Big )^{1/q}- \Big (1-\frac{1}{y}\Big )^{1/q}.$$

Apply Taylor–Lagrange formula to the function \(x(y)=\Big (1-\frac{1}{y}\Big )^{1/q}\) and obtain

$$\tilde{x}-x=(\tilde{y}-y)\frac{d}{dy}\Big (\Big (1-\frac{1}{y}\Big )^{1/q }\Big )= \frac{\tilde{y}-y}{q} \Big (1+\frac{1}{(y+\xi )^2}\Big )^{(1-q)/q}$$

for \(\xi \in [0,\tilde{y}-y]\). Substitute \(\delta =|\tilde{y}-y|\) and \(|y+\xi |\ge 0.9|y|\ge 0.9v\).

Theorem 5

For \(R>1\) and any fixed complex number u, write \(s:=\sum _{j=2}^d\frac{1}{1-x_j^q} =\) \(s_0^*-\frac{1}{1-x_1^q}\), \(v:=\frac{1}{R^q-1}\), and \(\delta :=|s_0^*-u|\) for the Cauchy sum \(s_0^*\) in the unit disc D(0, 1) and assume Random Root Model 1. Then \(\delta \le 0.1v\) with a probability at most

$$\begin{aligned} P= \frac{4\nabla }{R}\cdot \Big (1-\frac{\nabla }{R}\Big )~ \mathrm{for}~\nabla ~\mathrm{of~(23)}. \end{aligned}$$
(24)

In particular, bound (24) is close to

$$\begin{aligned} \frac{4\nabla }{R}\approx \frac{4\delta }{Rq} \end{aligned}$$
(25)

if v is a small positive number, in which case \(\nabla \approx \delta /q\).

Proof

Keep the root \(x_1\) random, but fix the other roots \(x_2,\dots ,x_d\). Apply Lemma 2 for \(x=x_1\), \(y=\frac{1}{1-x_1^q}\), \(\tilde{y}=u-s\), and \(\tilde{x}\) such that \(\tilde{y}=\frac{1}{1-\tilde{x}^q}\). Notice that in this case, \(|y|\ge v\) because \(|x_1|\le R\), and so the assumptions of the lemma are fulfilled.

The lemma implies that \(|x_1-\tilde{x}|\le \nabla \) for \(\nabla \) of (23). Hence \(x_1\) lies in the annulus

$$\begin{aligned} A(0,|\tilde{x}|-\nabla ,|\tilde{x}|+\nabla )=\{z:~ |\tilde{x}|-\nabla \le z\le |\tilde{x}|+\nabla \}, \end{aligned}$$
(26)

whose area is maximized for \(\tilde{x}=R-\nabla \) and then reaches

$$\pi \cdot (R^2-(R-2\nabla )^2)= 4\pi \cdot (R-\nabla )\nabla .$$

Divide this by the area \(\pi R^2\) of the disc D(0, R) and obtain bound (24).

We have proved Theorem 5 assuming that the Cauchy sum \(s_0^*\) is computed for the unit disc D(0, 1). We can extend this study to any disc \(D(c,\rho )\) such that \(R+|c|> \rho \) by replacing an input polynomial p(x) with the polynomial \(t(x)=p(\frac{x-c}{\rho })\). We can apply the same proof if we replace the disc D(0, R) with \(D(c,\frac{R-|c|}{\rho })\). This would imply that in the statement of the theorem, R changes into \(R/\rho \) and \(v:=\frac{1}{R^q-1}\) changes into

$$\begin{aligned} v:=\frac{1}{R(c,\rho )^q-1}~\mathrm{for}~R(c,\rho )=\frac{R+|c|}{\rho }>1. \end{aligned}$$
(27)

Here is the resulting extended version of this theorem.

Theorem 6

Let the assumptions of Theorem 5 hold except that now the Cauchy sum \(s_0^*\) has been computed in a sub-disc \(D(c,\rho )\) of the disc D(0, R) for a complex c and a positive \(\rho \). Then \(\delta :=|s_0^*-u|\le 0.1v\) with a probability at most

$$\begin{aligned} P= \frac{4\nabla \rho }{R}\cdot \Big (1-\frac{\nabla }{R}\Big )~ \mathrm{for}~\nabla ~\mathrm{of~(23)}. \end{aligned}$$
(28)

We can readily extend the estimates of Theorems 5 and 6 to the case where the roots are sampled under Random Root Model 2 because under that model, we can prove that the contribution of the roots \(x_1,\dots ,x_k\) to the Cauchy sum \(s_0^*\) moves it close to an integer distinct from \(s_0\) wlp if \(\nabla =o(\theta -1).\) Indeed repeat the proof of Theorem 5 but replace R by \(\theta \) while estimating the area of annulus (26). Then divide this area by the area \(\pi \cdot (\theta ^2-\frac{1}{\theta ^2})\) of the annulus \(A(0,\frac{1}{\theta },\theta )\) and thus extend Theorem 5. Similarly extend Theorem 6.

Theorem 7

Under Random Root Model 2, write \(v:=\frac{1}{\theta ^q-1}\) and fix \(\nabla \) of (23) and a complex number u. Then \(|s_0^*-u|\le 0.1 v\) with a probability at most

$$\begin{aligned} P= \frac{4\nabla \cdot (\rho \theta -\nabla )}{\rho ^2\cdot (\theta ^2-1/\theta ^2)}, ~\mathrm{and~in~particular}~\frac{4\nabla \cdot ( \theta -\nabla )}{\theta ^2-1/\theta ^2}~\mathrm{for}~ \rho =1. \end{aligned}$$
(29)

In the rest of this subsection, we soften the assumption that y can be required to be as small as \(v:=\frac{1}{R^q-1}\) under Random Root Model 1 (cf. (27)). Under Random Root Model 2, (27) turns into quite a reasonable bound \(v:=\frac{1}{\theta ^q-1}\), and we do not need to soften it.

Theorem 8

Let the assumptions of Theorem 5 hold except that we can choose any positive value v. Then \(\delta :=|s_0^*-u|\le 0.1v\) for the Cauchy sum \(s_0^*\) with a probability at most \(P_0+P_1\) for \(\nabla \) of (23),

$$\begin{aligned} P_0=\Big (\frac{r}{R}\Big )^{2d}\le \Big (\frac{1}{R}\Big )^{2d} , ~P_1= \frac{4r\nabla }{R^2-r^2}\le \frac{4\nabla }{R^2-1},~\mathrm{and}~r=\Big |\frac{v-1}{v}\Big |^{1/q}. \end{aligned}$$
(30)

Proof

\(P_0\) bounds the probability that all d independent random variables \(x_1,\dots ,x_d\) lie in the disc D(0, r), whose area is \(\pi r^2\), while the area of the disc D(0, R) is \(\pi R^2\).

Hence \(x_j\) lies outside the disc D(0, r) for at least one j, say, \(j=1\) with a probability at least \(1-P_0\). In this case, \(|y_1|\ge v\), and then Theorem 5 narrows the range for \(x_1\) from the annulus \(A(0,r,R)=D(0,R)-D(0,r)\) to the annulus \(A(0,r-\nabla ,r+\nabla )= D(0,r+\nabla )-D(0,r-\nabla )\) for \(\nabla \) of (23). Obtain the bound \(P_1\) as the ratio of the areas of these two annuli.

We extend Theorem 6 similarly.

Theorem 9

Let the assumptions of Theorem 6 hold except that we can choose any positive value v. Then \(\delta :=|s_0^*-u|\le 0.1v\) for the Cauchy sum \(s_0^*\) with a probability at most \(P_0+P_1\) for

$$\begin{aligned} P_0=\Big (\frac{r\rho }{R}\Big )^{2d},~P_1= \frac{4r\rho \nabla }{R^2-(r\rho )^2}, ~r=\Big |\frac{v-1}{v}\Big |^{1/q},~\mathrm{and}~\nabla ~\mathrm{of~(23)}. \end{aligned}$$
(31)

Remark 5

[Optimization of the probability bound.] For a fixed pair of a disc \(D(c,r/\rho )\) and an integer q, Theorem 9 bounds the probability \(P_0+P_1\) of having \(|s^*-u|\le 0.1 v\) as a function of a single parameter v in the range \([0,\frac{1}{R(c,\rho )^q-1}]\) for \(R(c,\rho )=\frac{R+|c|}{\rho }\). We leave to the reader the challenge of the choice of this parameter that would minimize our bound on \(P_0+P_1\) or a similar bound under the inequalities \(|s_0^*-u|\le \beta u\) for any reasonable choice of a small positive \(\beta \).