Clustering Complex Zeros of Triangular Systems of Polynomials

Imbach, Rémi; Pouget, Marc; Yap, Chee

doi:10.1007/s11786-020-00482-0

Clustering Complex Zeros of Triangular Systems of Polynomials

Published: 17 June 2020

Volume 15, pages 271–292, (2021)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Mathematics in Computer Science Aims and scope Submit manuscript

Clustering Complex Zeros of Triangular Systems of Polynomials

Download PDF

Rémi Imbach¹,
Marc Pouget² &
Chee Yap¹

70 Accesses
1 Citation
Explore all metrics

Abstract

This paper gives the first algorithm for finding a set of natural $\epsilon $-clusters of complex zeros of a regular triangular system of polynomials within a given polybox in ${{\mathbb {C}}}^n$, for any given $\epsilon >0$. Our algorithm is based on a recent near-optimal algorithm of Becker et al. (Proceedings of the ACM on international symposium on symbolic and algebraic computation, 2016) for clustering the complex roots of a univariate polynomial where the coefficients are represented by number oracles. Our algorithm is based on recursive subdivision. It is local, numeric, certified and handles solutions with multiplicity. Our implementation is compared to with well-known homotopy solvers on various triangular systems. Our solver always gives correct answers, is often faster than the homotopy solvers that often give correct answers, and sometimes faster than the ones that give sometimes correct results.

Accelerated Subdivision for Clustering Roots of Polynomials Given by Evaluation Oracles

New Practical Advances in Polynomial Root Clustering

Computing the roots of sparse high–degree polynomials that arise from the study of random simplicial complexes

Article 04 June 2019

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

This paper considers the fundamental problem of finding the complex solutions of a system $\mathbf{f}(\mathbf{z})=\mathbf{0}$ of $n$ polynomial equations in $n$ complex variables $\mathbf{z}=(z_1,\ldots ,z_n)$. The system $\mathbf{f}=(f_1,\ldots ,f_n):{\mathbb {C}}^n\rightarrow {\mathbb {C}}^n$ is triangular in the sense that $f_i\in {\mathbb {C}}[z_1,\ldots ,z_i]$ for $1\le i \le n$, where $d_{z_i}(f_i)\ge 1$. As in [7], we assume that the system is regular: this means that for each i, if $(\alpha _1,\ldots ,\alpha _{i-1})$ is a zero of $f_{i-1}$ and $c_i(z_1,\ldots ,z_{i-1})$ is the leading coefficient of $z_i$ in $f_{i}$, then $c_i(\alpha _1,\ldots ,\alpha _{i-1})\ne 0$. Thus $\mathbf{f}$ is a 0-dimensional system. But unlike [7], we do not assume that the system is square-free: indeed the goal of this paper is to demonstrate new techniques that can properly determine the multiplicity of the root clusters of $\mathbf{f}$, up to any $\epsilon >0$ resolution.

Throughout this paper, we use boldface symbols to denote vectors and tuples; for instance $\mathbf{0}$ stands for $(0,\ldots ,0)$.

We are interested in finding clusters of solutions of triangular systems and in counting the total multiplicity of solutions in clusters. Solving triangular systems is a fundamental task in polynomial equations solving, since there are many algebraic techniques to decompose the original system into triangular systems.

The problem of isolating the complex solutions of a polynomial system in an initial region-of-interest (ROI) is defined as follows: let $\texttt {Zero}(\mathbf{B},\mathbf{f})$ denote the set of solutions of $\mathbf{f}$ in $\mathbf{B}$, regarded^{Footnote 1} as a multiset.

This is “local” because we restrict attention to roots in a ROI $\mathbf{B}$. There are two issues with (LIP) as formulated above: deciding if $\texttt {Zero}(\mathbf{\Delta }^j,\mathbf{f})$ is a singleton, and deciding if such a singleton lies in $\mathbf{B}$, are two “zero problems” that require exact computation. Generally, this can only be decided if $\mathbf{f}$ is algebraic. Even in the algebraic case, this may be very expensive. In [4, 27] these two issues are side-stepped by defining the local clustering problem which is described next.

Before proceeding, we fix some general notations for this paper. A polydisc $\varvec{\Delta }$ is a vector $(\Delta _1,\ldots ,\Delta _n)$ of complex discs. The center of $\varvec{\Delta }$ is the vector of the centers of its components and the radius $r (\varvec{\Delta })$ of $\varvec{\Delta }$ is the vector of the radii of its components. If $\delta $ is any positive real number, we denote by $\delta \varvec{\Delta }$ the polydisc $(\delta \Delta _1,\ldots ,\delta \Delta _n)$ that has the same center as $\varvec{\Delta }$ and radius $\delta r(\varvec{\Delta })$. We also say $r(\varvec{\Delta })\le \delta $ if each component of $r(\varvec{\Delta })$ is $\le \delta $. A (square complex) box B is a complex interval $[\ell _1,u_1]+\varvec{i}([\ell _2,u_2])$ where $u_2-\ell _2=u_1-\ell _1$ and $\varvec{i}{\mathrel {\,{:}{=}\,}}\sqrt{-1}$; the width $w(B)$ of B is $u_1-\ell _1$ and the center of B is $u_1+\frac{w(B)}{2} + \varvec{i}(u_2+\frac{w(B)}{2})$. A polybox $\mathbf{B}\subseteq {\mathbb {C}}^n$ is the set $\prod _{i=1}^n B_i$ which is represented by the vector $(B_1,\ldots ,B_n)$ of boxes. The center of $\mathbf{B}$ is the vector of the centers of its components; the width $w(\mathbf{B})$ of $\mathbf{B}$ is the max of the widths of its components. If $\delta $ is any positive real number, we denote by $\delta \mathbf{B}$ the polybox $(\delta B_1,\ldots ,\delta B_n)$ that has the same center than $\mathbf{B}$ and width $\delta w(\mathbf{B})$. It is also convenient to identify $\varvec{\Delta }$ as the subset $\prod _{i=1}^n\varvec{\Delta }_i$ of ${{\mathbb {C}}}^n$; a similar remark applies to $\mathbf{B}$.

We introduce three notions to define the local solution clustering problem. Let $\mathbf{a}\in {\mathbb {C}}^n$ be a solution of $\mathbf{f}(\mathbf{z})=\mathbf{0}$. The multiplicity of $\mathbf{a}$ in $\mathbf{f}$, also called the intersection multiplicity of $\mathbf{a}$ in $\mathbf{f}$ is classically defined by localization of rings as in [28, Def. 1, p. 61], we denote it by $\#(\mathbf{a},\mathbf{f})$. An equivalent definition uses dual spaces, see [12, Def. 1, p. 117]. For any set $S\subseteq {\mathbb {C}}^n$, we denote by $\texttt {Zero}(S,\mathbf{f})$ the multiset of zeros (i.e., solutions) of $\mathbf{f}$ in S, and $\#(S,\mathbf{f})$ the total multiplicity of $\texttt {Zero}(S,\mathbf{f})$. If S is a polydisc, we call $\texttt {Zero}(S,\mathbf{f})$ a cluster if it is non-empty, and S is an isolator of the cluster. If in addition, we have that $\texttt {Zero}(S,\mathbf{f})=\texttt {Zero}(3\cdot S,\mathbf{f})$, we call $\texttt {Zero}(S,\mathbf{f})$ a natural cluster and call S a natural isolator. In the context of numerical algorithm, the notion of cluster of solutions is more meaningful than that of solution with multiplicity since the perturbation of a multiple solution generates a cluster. We thus “soften” the problem of isolating the solutions of a triangular system of polynomial equations while counting their multiplicities by translating it into the local solution clustering problem defined as follows:

In this (LCP) reformulation of (LIP), we have removed the two “zero problems” noted above: we output clusters to avoid the first problem, and we allow the output to contain zeroes outside the ROI $\mathbf{B}$ to avoid the second one. We choose $2\mathbf{B}$ for simplicity; it is easy to replace the factor of 2 by $1+\delta $ for any desired $\delta >0$.

Overview In the remaining of this section we explain our contribution, summarize previous work and the local univariate clustering method of [4]. In Sect. 2, we define the notion of tower of clusters together with a recursive method to compute the sum of multiplicities of the solutions it contains. Section 3 analyzes the loss of precision induced by approximate specialization. Our algorithm for solving the local clustering problem for triangular systems is introduced in Sect. 4. The implementation and experimental results are presented in Sect. 5.

1.1 Our Contributions

We propose an algorithm for solving the complex clustering problem for a triangular system $\mathbf{f}(\mathbf{z})=\mathbf{0}$ with a zero-dimensional solution set. To this end, we propose a formula to count the sum of multiplicities of solutions in a cluster. Our formula is derived from a result of [28] that links the intersection multiplicity of a solution of a triangular system to multiplicities in fibers.

We define towers of clusters to encode clusters of solutions of a triangular system in stacks (or towers) of clusters of roots of univariate polynomials.

Our algorithm exploits the triangular form of $\mathbf{f}=(f_1,\ldots ,f_n)$: the standard idea is to recursively find roots of the form $(\alpha _1,\ldots ,\alpha _{n-1})$ of $f_1=\cdots =f_{n-1}=0$, then substituting them into $f_n$ to obtain a univariate polynomial $g_n(z_n)=f_n(\alpha _1,\ldots ,\alpha _{n-1}, z_n)$. If $\alpha _n$ is a root of $g_n(z_n)$, then we have have extended the solution to $(\alpha _1,\ldots ,\alpha _n)$ of the original $\mathbf{f}$. The challenge is to extend this idea to compute clusters of zeros of $\mathbf{f}$ from clusters of zeros of $f_1=\cdots =f_{n-1}=0$. Moreover, we want to allow the coefficients of each $f_i$ to be oracle numbers. The use of oracle numbers allows us to treat polynomial systems whose coefficients are algebraic numbers and beyond.

To compute clusters of roots of a univariate polynomial given as an oracle, we rely on the recent algorithm described in [4], based on a predicate introduced in [5] that combines Pellet’s theorem and Graeffe iterations to determine the number of roots counted with multiplicities in a complex disc; this predicate is called soft because it only requires the polynomial to be known as approximations. It is used in a subdivision framework combined with Newton iterations to achieve a near optimal complexity.

We implemented our algorithm and made it available as the Julia^{Footnote 2} package Ccluster.jl.^{Footnote 3} Our experiments show that it advantageously compares to major homotopy solvers for solving random dense triangular systems in terms of solving times and reliability (i.e. getting the correct number of solutions and the correct multiplicity structures). Homotopy solving is more general because it deals with any polynomial system. We also propose experiments with triangular systems obtained with elimination procedures.

1.2 Related Work

There is a vast literature on solving polynomial systems and we can only refer to book surveys and references therein, see for instance [13, 25]. On the algebraic side, symbolic tools like Groebner basis, resultant, rational univariate parametrization or triangularization, find an equivalent triangular system or set thus reducing the problem to the univariate case. Being symbolic, these methods handle all input, in particular with solutions with multiplicities, and are certified but at the price of a high complexity that limits their use in practice. Implementations of hybrid symbolic-numeric solvers are available for instance in Singular^{Footnote 4} via solve.lib or in Maple via RootFinding[Isolate].

On the numerical side, one can find subdivision and homotopy methods. The main advantage of subdivision methods is their locality: the practical complexity depends on the size of the solving domain and the number of solutions in this domain. Their main drawback is that they are only practical for low dimensional systems. On the other hand, homotopy methods are efficient for high dimensional systems, they are not local but solutions are computed independently from one another. Numerical methods only work for restricted classes of systems and the certification of the output remains a challenge. Multiprecision arithmetic, interval analysis, deflation and $\alpha $-theory are now classical tools to address this certification issue [6, 15, 22, 26].

In the univariate case, practical certified algorithms are now available for real and complex solving that match the best known complexity bounds together with efficient implementations [17, 19]. For the bivariate case, the problem of solving a triangular system can be seen as a univariate isolation in an extension field. The most recent contributions in this direction presenting algorithms together with complexity analysis are [23, 24].

Only a few work address the specific problem of solving triangular polynomial systems. The solving can then be performed coordinate by coordinate by specialization and univariate solving in fibers. When the systems only have regular solutions, extensions of classical univariate isolation algorithms to polynomial with interval coefficients have been proposed [7, 9, 14]. In the presence of multiple solutions, one approach is to use a symbolic preprocessing to further decompose the system in regular sub-systems. Another approach is the sleeve method with separation bounds [8]. The authors of [28] propose a formula to compute the multiplicity of a solution of a triangular system: the latter multiplicity is the product of the multiplicities of the components of a solution in the fibers. Then, by using square free factorization of univariate polynomials specialized in fibers, they describe an algorithm to retrieve the real solutions of a triangular system with their multiplicities. In [20], the method of Local Generic Position is adapted to the special case of triangular systems with the advantage of only using resultant computations (instead of Goebner basis), multiplicities are also computed.

1.3 Definitions and Notations

Convention for Vectors. We introduce some general conventions for vectors that will simplify the following development. Vectors are indicated by bold fonts. If ${\varvec{v}}=(v_1,\ldots ,v_n)$ is an n-vector, and $i=1,\ldots ,n$, then the i-th component $v_i$ is^{Footnote 5} denoted ${\varvec{v}}_i$ and the i-vector $(v_1,\ldots ,v_i)$ is denoted ${\varvec{v}}_{(i)}$. Thus ${\varvec{v}}=({\varvec{v}}_{(n-1)},{\varvec{v}}_n)$, and “${\varvec{v}}={\varvec{v}}_{(n)}$” is an idiomatic way of saying that ${\varvec{v}}$ is an n-vector. Because of the subscript convention, we will superscript such as ${\varvec{v}}^1, {\varvec{v}}^2$, etc, to distinguish among a set of related n-vectors.

Normed Vector Spaces. In order to do error analysis, we need to treat ${{\mathbb {C}}}[{\varvec{z}}]$ and ${{\mathbb {C}}}^n$ as normed vector spaces: for $f\in {{\mathbb {C}}}[{\varvec{z}}]$ and ${\varvec{b}}\in {{\mathbb {C}}}^n$, let $\Vert f\Vert $ and $\Vert {\varvec{b}}\Vert $ denote the infinity norm on polynomials and vectors, respectively. We use the following perturbation convention: let $\delta \ge 0$. Then we will write $f{\pm } \delta $ to denote some polynomial ${\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]$ that satisfies $\Vert f-{\widetilde{f}}\Vert \le \delta $. Similarly, ${\varvec{b}}{\pm } \delta $ denotes some vector $\widetilde{{\varvec{b}}}\in {{\mathbb {C}}}^n$ that satisfies $\Vert {\varvec{b}}-\widetilde{{\varvec{b}}}\Vert \le \delta $. If $\delta \le 2^{-L}$ then $\widetilde{{\varvec{b}}}$ and $\widetilde{{\varvec{f}}}$ are called L -bit approximations of ${\varvec{b}}$ and ${\varvec{f}}$, respectively.

We define the degree sequence. of $f\in {{\mathbb {C}}}[{\varvec{z}}]$ to be ${\varvec{d}}={\varvec{d}}(f)$ where ${\varvec{d}}_i$ is the degree of $z_i$ in f. If ${\varvec{b}}\in {{\mathbb {C}}}^k$ ($k=1,\ldots ,n$), let $f({\varvec{b}})$ denote the polynomial that results from the substitution ${\varvec{z}}_i \rightarrow {\varvec{b}}_i$ (for $i=1,\ldots ,k$). The result is a polynomial $f({\varvec{b}})\in {{\mathbb {C}}}[z_{k+1},\ldots ,z_n]$ called the specialization of f by ${\varvec{b}}$. Note that $f({\varvec{b}})$ is a polynomial in at most $n-k$ variables. In particular, when $n=k$, then $f({\varvec{b}})$ is a constant (called the evaluation of f at ${\varvec{b}}$). For instance, suppose ${\varvec{b}}\in {{\mathbb {C}}}^n$, then $f({\varvec{b}}_{(n-1)})$ is a polynomial in $z_n$ and $f({\varvec{b}}_{(n-1)})({\varvec{b}}_n)=f({\varvec{b}})$.

If $B\subseteq {\mathbb {C}}$ is a box with center c and width w, we denote by $\Delta (B)$ the disc with center c and radius $\frac{3}{4}w$. Note that $\Delta (B)$ contains B. If $\mathbf{B}\subset {\mathbb {C}}^n$ is a polybox, let $\Delta (\mathbf{B})$ be the polydisc where $\Delta (\mathbf{B})_i = \Delta (\mathbf{B}_i)$.

Oracle Computational Model. We use two kinds of numbers in our algorithms: an explicit kind which is standard in computing, and an implicit kind which we call “oracles”. Our explicit numbers are dyadic numbers (i.e., bigFloats), ${{\mathbb {D}}}{\mathrel {\,{:}{=}\,}}\left\{ n2^m: n,m\in {{\mathbb {Z}}} \right\} $. A pair (n, m) of integers represents the nominal value of $n2^m\in {{\mathbb {D}}}$. However, we also want this pair to represent the interval $[(n-\textstyle {\frac{1}{2}})2^m, (n+\textstyle {\frac{1}{2}})2^m]$. To distinguish between them, we write $(n,m)_0$ for the nominal value, and $(n,m)_1$ for the interval of width $2^m$. Call $(n,m)_1$ an L -bit dyadic interval if $m\le -L$ (so the interval has width at most $2^{-L}$). Note that $(2n,m)_1$ and $(n,m+1)_1$ are different despite having the same nominal value. As another example, note that $(0,m)_1$ is the interval $[-2^{m-1},2^{m-1}]$. When we say a box, disc, polybox, etc, is dyadic, it means that all its parameters are given by dyadic numbers. The set of closed intervals with dyadic endpoints is denoted . Also, let denote n-dimensional dyadic boxes.

The implicit numbers in our algorithms are functions: for any real number $x\in {{\mathbb {R}}}$, an oracle for x is a function such that ${\mathcal {O}}_x(L)$ is an L-bit dyadic interval containing x. There is normally no confusion in identifying the real number x with any oracle function ${\mathcal {O}}_x$ for x. Moreover, we write $(x)_L$ instead of ${\mathcal {O}}_x(L)$. E.g., if x is a real algebraic number with defining polynomial $p\in {{\mathbb {Z}}}[X]$ and isolating interval I, we may define an oracle ${\mathcal {O}}_x={\mathcal {O}}(p,I)$ for x in a fairly standard way. Next, an oracle ${\mathcal {O}}_z$ for a complex number $z=x+\varvec{i}y$ is a function such that ${\mathcal {O}}_z(L)={\mathcal {O}}_x(L)+\varvec{i}{\mathcal {O}}_y(L)$ where ${\mathcal {O}}_x,{\mathcal {O}}_y$ are oracles for x and y. Again, we may identify z with any oracle ${\mathcal {O}}_z$, and write $(z)_L$ instead of ${\mathcal {O}}_z(L)$. For polynomials $f\in {{\mathbb {C}}}[{\varvec{z}}_{(n)}]$ in $n\ge 2$ variables, we assume a sparse representation, $f=\sum _{{\varvec{\alpha }}\in \mathop {Supp}(f)} f_{{\varvec{\alpha }}} {\varvec{z}}^{\varvec{\alpha }}$ with fixed support $\mathop {Supp}(f)\subseteq {{\mathbb {N}}}^n$, with coefficients $f_{\varvec{\alpha }}\in {{\mathbb {C}}}{\setminus }\left\{ 0 \right\} $, and ${\varvec{z}}^{\varvec{\alpha }}{\mathrel {\,{:}{=}\,}}\prod _{i=1}^n {\varvec{z}}_i^{{\varvec{\alpha }}_i}$ are power products. An oracle ${\mathcal {O}}_f$ for f amounts to having oracles for each coefficient $f_{\varvec{\alpha }}$ of f. Moreover ${\mathcal {O}}_f(L)$ may be written $(f)_L$ is the interval polynomial whose coefficients are $(f_\alpha )_L$. Call $(f)_L$ a dyadic interval polynomial.

1.4 Oracles for Root Cluster of Univariate Polynomials

The starting point for this paper is the fundamental result that the Local Clustering Problem (LCP) has been solved in the univariate setting:

Proposition 1

(See [4, 5]) There is an algorithm $Cluster(f, B,\epsilon )$ that solves the Local Clustering Problem when $f:{{\mathbb {C}}}\rightarrow {{\mathbb {C}}}$ is a univariate oracle polynomial.

In other words, the output of $Cluster(f, B,\epsilon )$ is a set $\left\{ (\Delta _i,m_i): i=1,\ldots ,k \right\} $ such that each $\Delta _i$ is a natural $\epsilon $-isolator, and

$$\begin{aligned} \texttt {Zero}(B,f) \subseteq \bigcup _{i=1}^{k} \texttt {Zero}(\Delta _i,f) \subseteq \texttt {Zero}(2B,f). \end{aligned}$$

To make this result the basis of our multivariate clustering algorithm, we need to generalize this result. In particular, we need to be able to further refine each output $(\Delta _i,m_i)$ of this algorithm. If $(\Delta _i,m_i)$ represents the cluster $C_i$ of roots, we want to get better approximation of $C_i$, i.e., we want to treat $C_i$ like number oracles. Fortunately, the algorithm in [4, 5] already contains the tools to do this. What is lacking is a conceptual framework to capture this.

Our goal is to extend the concept of number oracles to “cluster oracles”. To support the several modifications which are needed, we revise our previous view of “oracles as functions”. We now think of an oracle ${\mathcal {O}}$ as a computational object with state information, and which can transform itself in order to update its state information. For any $L\in {{\mathbb {Z}}}$, recall that ${\mathcal {O}}(L)$ is a dyadic object that is at least L-bit accurate. E.g., if ${\mathcal {O}}$ is the oracle for $x\in {{\mathbb {R}}}$, ${\mathcal {O}}(L)$ is an interval containing x of width $\le 2^{-L}$. But now, we say that oracle is transformed to a new oracle which we shall denote by “$({\mathcal {O}})_L$” whose state information is ${\mathcal {O}}(L)$. In general, let $\sigma ({\mathcal {O}})$ denote the state information in ${\mathcal {O}}$. Next, for a cluster $C\subseteq {{\mathbb {C}}}$ of roots of a univariable polynomial $p(z)\in {{\mathbb {C}}}[z]$, its oracle ${\mathcal {O}}_C$ has state $\sigma ({\mathcal {O}}_C)$ that is a pair $(\Delta , m)$ where $\Delta \subseteq {{\mathbb {C}}}$ is a dyadic disc satisfying $C = \texttt {Zero}(\Delta ,p)=\texttt {Zero}(3\Delta ,p)$ and m is the total multiplicity of C. Thus C is automatically a natural cluster. We say ${\mathcal {O}}_C$ is L -bit accurate if the radius of $\Delta $ is at most $2^{-L}$. Intuitively, we expect $({\mathcal {O}}_C)_L$ to be an oracle for C that is L-bit accurate. Unfortunately, this may be impossible unless C is a singleton cluster. In general, we may have to split C into two or more clusters. We therefore need one more extension: the map $L\mapsto ({\mathcal {O}}_C)_L$ returns a set

$$\begin{aligned} \left\{ {\mathcal {O}}_{C_1},\ldots ,{\mathcal {O}}_{C_k} \right\} , \quad (\text {for some } k\ge 1) \end{aligned}$$

of cluster oracles with the property that $C=\cup _{i=1}^k C_i$ (union of multisets), and each ${\mathcal {O}}_{C_i}$ is L-bit accurate. We generalize Proposition 1 so that it outputs a collection of cluster oracles:

Proposition 2

(See [4, 5, 17])

Let ${\mathcal {O}}_f$ be an oracle for a univariate polynomial $f:{{\mathbb {C}}}\rightarrow {{\mathbb {C}}}$. There is an algorithm $ClusterOracle({\mathcal {O}}_f, B, L)$ that returns a set $\left\{ {\mathcal {O}}_{C_i}: i=1,\ldots ,k \right\} $ of cluster oracles such that

$$\begin{aligned} \texttt {Zero}(B,f) \subseteq \bigcup _{i=1}^k C_i \subseteq \texttt {Zero}(2B,f). \end{aligned}$$

and each ${\mathcal {O}}_{C_i}$ is L-bit accurate.

2 Sum of Multiplicities in Clusters of Solutions

We extend in Sect. 2.2 a result of [28] to an inductive formula giving the sum of multiplicities of solutions of a triangular system in a cluster. In Sect. 2.3, we introduce a representation of clusters of solutions of $\mathbf{f}$ called tower representation, reflecting the triangular form of $\mathbf{f}$. Section 2.1 presents two illustrative examples.

2.1 Two Examples

Let $\delta >0$ be an integer. We define the systems $\mathbf{g}(\mathbf{z})=(g_1(z_1),g_2(z_1,z_2))=\mathbf{0}$ and $\mathbf{h}(\mathbf{z})=(h_1(z_1),h_2(z_1,z_2))=\mathbf{0}$ as follows:

$$\begin{aligned} (\mathbf{g}(\mathbf{z})= & {} \mathbf{0}): \left\{ \begin{array}{rcl} (z_1-2^{-\delta })(z_1+2^{-\delta }) &{} = &{} 0\\ (z_2-2^{2\delta }z_1^2)z_2 &{} = &{} 0\\ \end{array}\right. \end{aligned}$$

(1)

$$\begin{aligned} (\mathbf{h}(\mathbf{z})= & {} \mathbf{0}): \left\{ \begin{array}{rcl} (z_1-2^{-\delta })^2(z_1+2^{-\delta }) &{} = &{} 0\\ (z_2+2^{\delta }z_1^2)^2(z_2-1)z_2 &{} = &{} 0\\ \end{array}\right. \end{aligned}$$

(2)

$\mathbf{g}(\mathbf{z})=0$ has 4 solutions: $\mathbf{a}^1=(2^{-\delta },0)$, $\mathbf{a}^2=(2^{-\delta },1)$, $\mathbf{a}^3=(-2^{-\delta },1)$ and $\mathbf{a}^4=(-2^{-\delta },0)$. $\mathbf{h}(\mathbf{z})=0$ has 6 solutions: $\mathbf{a}^1$, $\mathbf{a}^2$, $\mathbf{a}^3$, $\mathbf{a}^4$, $\mathbf{a}^5=(-2^{-\delta },-2^{-\delta })$ and $\mathbf{a}^6=(2^{-\delta },-2^{-\delta })$. For $1\le i\le 6$, let $\mathbf{a}^i=(a_1^i,a_2^i)$. The solutions of both $\mathbf{g}=0$ and $\mathbf{h}=0$ are depicted in Fig. 1.

2.2 Sum of Multiplicities in a Cluster

We recall a theorem of Zhang [28] for counting multiplicities of solutions of triangular systems, based multiplicities in fibers. We may rephrase it inductively:

Proposition 3

[28] Let $n\ge 2$ and $\mathbf{a}\in {\mathbb {C}}^{n}$ be a solution of the triangular system $\mathbf{f}(\mathbf{z})=0$. The multiplicity of $\mathbf{a}$ in $\mathbf{f}$ is

$$\begin{aligned} \#(\mathbf{a},\mathbf{f}) = \#({\mathbf{a}}_n,{\mathbf{f}}_n(\mathbf{\mathbf{a}}_{(n-1)})) \times \#(\mathbf{\mathbf{a}}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)}). \end{aligned}$$

We extend Proposition 3 to a formula giving the total multiplicity of a cluster $\texttt {Zero}(\varvec{\Delta },\mathbf{f})$.

Theorem 1

Let $\texttt {Zero}(\varvec{\Delta },\mathbf{f})$ be a cluster of solutions of the triangular system $\mathbf{f}(\mathbf{z})=0$. If there is an integer $m\ge 1$ so that for any solution $\mathbf{a}\in \texttt {Zero}(\varvec{\Delta },\mathbf{f})$, one has $m=\#({\varvec{\Delta }}_n,{\mathbf{f}}_n(\mathbf{\mathbf{a}}_{(n-1)}))$, then

$$\begin{aligned} \#(\varvec{\Delta },\mathbf{f})=m\times \#(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)}). \end{aligned}$$

where $\#(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)})=1$ when $n=1$.

Let us apply Proposition 3 to compute the multiplicities of solutions of $\mathbf{g}(\mathbf{z})=0$ and $\mathbf{h}(\mathbf{z})=0$ (see Eq. 1 and Eq. 2). $\mathbf{a}^1$ has multiplicity 1 in $\mathbf{g}$: $\#(\mathbf{a}^1,\mathbf{g})=\#(a_1^1,g_1)\times \#(a_2^1,g_2({a_1^1)})=1\times 1$. $\mathbf{a}^1$ has multiplicity 2 in $\mathbf{h}$: $\#(\mathbf{a}^1,\mathbf{h})=\#(a_1^1,h_1)\times \#(a_2^1,h_2({a_1^1}))=2\times 1$. The multiplicities of other solutions are given in Fig. 1.

Let $\mathbf{B}^1=(B^1_1,B^1_2)$ be the polybox centered in (0, 0) having width $2\times 2^{-\delta }$. $\texttt {Zero}(\varvec{\Delta }(\mathbf{B}^1),\mathbf{g}) =\{\mathbf{a}^1,\mathbf{a}^4\}$ and $\#(\varvec{\Delta }(\mathbf{B}^1),\mathbf{g})=2$. Since $\#(\Delta (B^1_2),{\mathbf{g}}_n(\mathbf{\mathbf{a}^1}_{(n-1)})) = \#(\Delta (B^1_2),{\mathbf{g}}_n(\mathbf{\mathbf{a}^4}_{(n-1)}))=1$, applying Theorem 1 yields $\#(\varvec{\Delta }(\mathbf{B}^1),\mathbf{g})=2\times 1$.

$\texttt {Zero}(\varvec{\Delta }(\mathbf{B}^1),\mathbf{h}) =\{\mathbf{a}^1,\mathbf{a}^4, \mathbf{a}^5, \mathbf{a}^6\}$ and $\#(\varvec{\Delta }(\mathbf{B}^1),\mathbf{h})=9$. Again, one has $\#(\Delta (B^1_2),{\mathbf{h}}_n(\mathbf{\mathbf{a}^1}_{(n-1)})) = \#(\Delta (B^1_2),{\mathbf{h}}_n(\mathbf{\mathbf{a}^4}_{(n-1)}))=3$. Thus applying Theorem 1 yields $\#(\varvec{\Delta }(\mathbf{B}^1),\mathbf{h})=3\times 3$.

Let $\mathbf{B}^2$ be the polybox centered in (0, 1) having width $2\times 2^{-\delta }$. One can apply Theorem 1 to obtain $\#(\varvec{\Delta }(\mathbf{B}^2),\mathbf{g})=2\times 1$ and $\#(\varvec{\Delta }(\mathbf{B}^2),\mathbf{h})=3\times 1$. The real parts of $\varvec{\Delta }(\mathbf{B}^1)$ and $\varvec{\Delta }(\mathbf{B}^2)$ are depicted in Fig. 1.

Proof

(of Theorem 1.) Remark that $\texttt {Zero}(\varvec{\Delta },\mathbf{f})=\{ \mathbf{a}\in \varvec{\Delta } | \mathbf{f}(\mathbf{a})=\mathbf{0}\}$ can be defined in an inductive way as $\texttt {Zero}(\varvec{\Delta },\mathbf{f})= \{ (\mathbf{b},c)\in \varvec{\Delta } | \mathbf{b}\in \texttt {Zero}(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)}) \text { and } c\in \texttt {Zero}({\varvec{\Delta }}_n,{\mathbf{f}}_n(\mathbf{b}) ) \}$. Using Proposition 3, we may write

$$\begin{aligned} \#(\varvec{\Delta },\mathbf{f})= & {} \sum \limits _{(\mathbf{b},c)\in \texttt {Zero}(\varvec{\Delta },\mathbf{f})} \#(\mathbf{b},\mathbf{\mathbf{f}}_{(n-1)}) \times \#(c,{\mathbf{f}}_n(\mathbf{b}))\\= & {} \sum \limits _{\mathbf{b}\in \texttt {Zero}(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)})} \left( \#(\mathbf{b},\mathbf{\mathbf{f}}_{(n-1)}) \times \sum \limits _{c\in \texttt {Zero}({\varvec{\Delta }}_n,{\mathbf{f}}_n(\mathbf{b}))} \#(c,{\mathbf{f}}_n(\mathbf{b})) \right) \\= & {} \sum \limits _{\mathbf{b}\in \texttt {Zero}(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)})} \#(\mathbf{b},\mathbf{\mathbf{f}}_{(n-1)}) \times m\\= & {} m\times \#(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)}). \end{aligned}$$

$\square $

2.3 Tower Representation

Definition 1

(Tower Representations) Let $\varvec{\Delta }\subseteq {\mathbb {C}}^n$ be a polydisc and $\mathbf{m}$ a n-vector of positive integers. The pair $(\varvec{\Delta },\mathbf{m})$ is a tower (relative to $\mathbf{f}$) if it satisfies: if $n=1$, then $(\varvec{\Delta },\mathbf{m})=(\varvec{\Delta }_1,\mathbf{m}_1)$ is a cluster representation relative to $\mathbf{f}$. Inductively, if $n>1$ then:

(i):: $(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{m}}_{(n-1)})$ is a tower relative to $\mathbf{\mathbf{f}}_{(n-1)}$
(ii):: $\forall \mathbf{b}\in \mathbf{\varvec{\Delta }}_{(n-1)}$, $({\varvec{\Delta }}_n,{\mathbf{m}}_n)$ is a cluster representation relative to ${\mathbf{f}}_n(\mathbf{b})$.

The height of the tower $(\varvec{\Delta },\mathbf{m})$ is $n$.

If we replace ‘cluster’ by ‘natural cluster’ in the above definition, then $(\varvec{\Delta },\mathbf{m})$ is a natural tower. If $\mathbf{B}$ is a polybox, and $(\Delta (\mathbf{B}),\mathbf{m})$ is a tower relative to $\mathbf{f}$, then we can also call $(\mathbf{B},\mathbf{m})$ a (polybox) tower relative to $\mathbf{f}$. Below, we will only consider natural towers and will omit the word natural.

Let $\mathbf{g},\mathbf{h}$ be defined as in Eqs. 1 and 2 and $\mathbf{B}^1=(B^1_1,B^1_2)$, $\mathbf{B}^2=(B^2_1,B^2_2)$ be as defined in Sect. 2.2. The pair $(\Delta (B^1_1),3)$ is a tower relative to $h_1$. Moreover, if $\delta \ge 3$, $(\Delta (\mathbf{B}^1), (3,3))$ and $(\Delta (\mathbf{B}^2), (3,1))$ are towers relative to $\mathbf{h}$. Consider the polynomial $h_2(z_1,z_2)=(z_2+2^{\delta }z_1^2)^2(z_2-1)z_2$. If $z_2\in 3\Delta (B^1_2)$ then $|z_2|<\frac{3\times 3}{16}<1$ and for any $z_1\in \Delta (B^1_1)$, $h_2$ has 3 roots counted with multiplicity in $\Delta (B^1_2)$ and in $3\Delta (B^1_2)$. Hence for any $b\in \Delta (B^1_1)$, $(\Delta (B^1_2),3)$ is a tower relative to $h_2(b)$ then $(\Delta (\mathbf{B}^1), (3,3))$ is a tower relative to $\mathbf{h}$. Similarly, $(\Delta (\mathbf{B}^2), (3,1))$ is a tower relative to $\mathbf{h}$. $(\mathbf{B}^1, (3,3))$ and $(\mathbf{B}^2, (3,1))$ are (polybox) towers relative to $\mathbf{h}$.

In contrast, although $(\Delta (B^1_1),2)$ is a tower relative to $g_1$, there exist no tower relative to $\mathbf{g}$ having $\mathbf{B}^1$ or $\mathbf{B}^2$ as box: $-2^{-\delta }$, 0 and $2^{-\delta }$ are three points of $B^1_1=B^2_1$; consider the three polynomials $g_2(-2^{-\delta })$, $g_2(0)$ and $g_2(2^{-\delta })$. $g_2(-2^{-\delta })$ and $g_2(2^{-\delta })$ have each 1 root of multiplicity 1 in $B^1_2$ while $g_2(0)$ has 1 root of multiplicity 2 in $B^1_2$: there is no $\mathbf{m}$ that satisfy condition (ii) of Def. 1. In the case of $\mathbf{B}^2$, $g_2(-2^{-\delta })$ and $g_2(2^{-\delta })$ have both 1 root of multiplicity 1 in $B^2_2$ while $g_2(0)$ has no root in $B^2_2$.

An immediate consequence of the previous theorem is

Corollary 4

Let $(\varvec{\Delta },\mathbf{m})$ be a tower relative to $\mathbf{f}$ of height $n>1$. Then

$$\begin{aligned} \#(\varvec{\Delta },\mathbf{f}) = {\mathbf{m}}_n\times \#(\mathbf{\varvec{\Delta }}_{(n-1)},\mathbf{\mathbf{f}}_{(n-1)}). \end{aligned}$$

Inductively, we have $\#(\mathbf {\Delta },\mathbf {f}) = \prod _{i=1}^n \mathbf {m}_i.$

Remark finally that if $(\mathbf{B},\mathbf{m})$ is a tower relative to $\mathbf{\mathbf{f}}_{(n-1)}$ and f is an oracle for ${\mathbf{f}}_n(\mathbf{b})$ for any $\mathbf{b}\in \Delta (\mathbf{B})$, one can use Cluster, as specified in Prop. 1, to compute clusters of ${\mathbf{f}}_n(\mathbf{b})$ for any $\mathbf{b}\in \Delta (\mathbf{B})$ in a box B. If this returns a list $\{ (B^j,m^j) | 1\le j \le l \}$, then for all $1\le j \le l$, $((\Delta (\mathbf{B}),\Delta (B^j)),(\mathbf{m},m^j))$ is a tower relative to $\mathbf{f}$, and from corollary 4, $\#((\Delta (\mathbf{B}),\Delta (B^j)),\mathbf{f})=m^j\times \prod _{k=1}^{n-1}\mathbf{m}_k$. Moreover, $\texttt {Zero}((\mathbf{B},B),\mathbf{f})\subseteq \bigcup _{j=1}^{l} \texttt {Zero}((\Delta (\mathbf{B}),\Delta (B^j)),\mathbf{f}) \subseteq \texttt {Zero}((2\mathbf{B},2B),\mathbf{f})$. In other words, $\{ ((\Delta (\mathbf{B}),\Delta (B^j)),m^j\times \prod _{k=1}^{n-1}\mathbf{m}_k) | 1\le j \le l \}$ is a solution for the clustering problem in $(\mathbf{B},B)$.

We show in Sect. 4 how to setup an oracle for ${\mathbf{f}}_n(\mathbf{b})$ for any $\mathbf{b}\in \Delta (\mathbf{B})$. This oracle may refine $(\mathbf{B},\mathbf{m})$ and split it into several clusters.

3 Error Analysis of Approximate Specializations

The proofs for this section are found in the “Appendix”.

Given $f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]={{\mathbb {C}}}[{\varvec{z}}_{(n)}]$ and ${\varvec{b}}, \widetilde{{\varvec{b}}}\in {{\mathbb {C}}}^n$, our basic goal is to bound the evaluation error

$$\begin{aligned} \Vert f({\varvec{b}})-{\widetilde{f}}(\widetilde{{\varvec{b}}})\Vert \end{aligned}$$

in terms of $\delta _f {\mathrel {\,{:}{=}\,}}\Vert f-{\widetilde{f}}\Vert $ and $\delta _{\varvec{b}}{\mathrel {\,{:}{=}\,}}\Vert {\varvec{b}}-\widetilde{{\varvec{b}}}\Vert $. This will be done by induction on n. Our analysis aims not just to produce some error bound, but to express this error in terms that are easily understood, and which reveals the underlying inductive structure. Towards this end, we introduce the following $\beta $ -bound function: if d is a positive integer and $b\in {{\mathbb {C}}}$,

$$\begin{aligned} \beta (d,b){\mathrel {\,{:}{=}\,}}\sum _{i=0}^d |b|^i. \end{aligned}$$

(3)

Let ${\varvec{d}}= {\varvec{d}}(f)$, i.e., ${\varvec{d}}_i = \mathop {deg}_{{\varvec{z}}_i}(f)$ for each i. The support of f is $\mathop {Supp}(f)\subseteq {{\mathbb {N}}}^n$ where $f = \sum _{{\varvec{\alpha }}\in \mathop {Supp}(f)} c_{\varvec{\alpha }}{\varvec{z}}^{\varvec{\alpha }}$ where $c_{\varvec{\alpha }}\in {{\mathbb {C}}}{\setminus } \left\{ 0 \right\} $. Here, ${\varvec{z}}^{\varvec{\alpha }}{\mathrel {\,{:}{=}\,}}\prod _{i=1}^n {\varvec{z}}_i^{{\varvec{\alpha }}_i}$. We assume that $\mathop {Supp}({\widetilde{f}})\subseteq \mathop {Supp}(f)$.

Our induction variable is $k=1,\ldots ,n$. For ${\varvec{\alpha }}\in {{\mathbb {N}}}^n$, let $\pi _k({\varvec{\alpha }}){\mathrel {\,{:}{=}\,}}(0,\ldots ,0,{\varvec{\alpha }}_{k+1},\ldots ,{\varvec{\alpha }}_n)$. E.g., if $k=n$ then $\pi _k({\varvec{\alpha }})=\varvec{0}$. Thus ${\varvec{\alpha }}-\pi _k({\varvec{\alpha }}) = ({\varvec{\alpha }}_1,\ldots ,{\varvec{\alpha }}_k,0,\ldots ,0)$. Next define $\mathop {Supp}_k(f){\mathrel {\,{:}{=}\,}}\left\{ \pi _k({\varvec{\alpha }}): {\varvec{\alpha }}\in \mathop {Supp}(f) \right\} $. With this notation, we can write

$$\begin{aligned} f = \sum _{{\varvec{\alpha }}\in \mathop {Supp}_k(f)} f_{\varvec{\alpha }}{\varvec{z}}^{\varvec{\alpha }}\end{aligned}$$

(4)

where each $f_{\varvec{\alpha }}\in {{\mathbb {C}}}[{\varvec{z}}_{(k)}]$. E.g., if $k=n$ then $\mathop {Supp}_k(f)=\left\{ \varvec{0} \right\} $ and so $f_{\varvec{0}}=f$.

Assume that we are given $f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]={{\mathbb {C}}}[{\varvec{z}}_{(n)}]$ and ${\varvec{b}},\widetilde{{\varvec{b}}}\in {{\mathbb {C}}}$. Also the degree sequences satisfy

${\varvec{d}}({\widetilde{f}})\le {\varvec{d}}(f)$, that is the inequality holds componentwise. Then we may define these quantities for $k=1,\ldots ,n$:

$$\begin{aligned} \begin{array}{llll} \delta _k {\varvec{b}}&{}{\mathrel {\,{:}{=}\,}}&{} |{\varvec{b}}_k -\widetilde{{\varvec{b}}}_k|, \\ \delta _k f &{}{\mathrel {\,{:}{=}\,}}&{} \Vert f({\varvec{b}}_{(k)})-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(k)})\Vert &{} \qquad (\text {with } \delta _0 f=\Vert f-{\widetilde{f}}\Vert ),\\ {\varvec{\beta }}_k &{}{\mathrel {\,{:}{=}\,}}&{} \beta ({\varvec{d}}_k, {\varvec{b}}_k) \\ \widetilde{\varvec{\beta }}_k &{}{\mathrel {\,{:}{=}\,}}&{} \beta ({\varvec{d}}_k, |{\varvec{b}}_k| + \delta _k{\varvec{b}}). \end{array} \end{aligned}$$

Note that $\delta _k$ is a operator that must attach to some function f or vector ${\varvec{b}}$ to denote the “kth perturbation” of f or ${\varvec{b}}$.

Lemma 5

Let $n\ge 1$ and $k=1,\ldots ,n$:

(i)
$\Vert f({\varvec{b}}_{(k)})-f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k) \Vert \le \delta _k {\varvec{b}}\cdot \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert \cdot \widetilde{\varvec{\beta }}_k.$
(ii)
$\Vert f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k)-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(k)})\Vert \le \delta _{k-1} f \cdot \widetilde{\varvec{\beta }}_k.$
(iii)
$\delta _k f \le \Big [ \delta _k {\varvec{b}}\cdot \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert + \delta _{k-1} f \Big ]\cdot \widetilde{\varvec{\beta }}_k.$

We now have a recursive bound $\Vert \delta _n f\Vert $. But we need to convert the bound to only depend on the data $\Vert {\varvec{b}}\Vert , \Vert f\Vert , \delta _k{\varvec{b}}$. In particular, we remove any occurrences of $\partial _{k}f_{\varvec{\alpha }}$ with the help of the next lemma:

Lemma 6

For $k=1,\ldots ,n$:

(i)
$\Vert f({\varvec{b}}_{(k)})\Vert \le \Vert f\Vert \cdot \prod _{i=1}^k {\varvec{\beta }}_i$
(ii)
For ${\varvec{\alpha }}\in \mathop {Supp}_k(f)$,
$$\begin{aligned} \Big \Vert \partial _{k}f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Big \Vert \le {\varvec{d}}_k\cdot \Vert f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert . \end{aligned}$$
(iii)
$\Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert \le {\varvec{d}}_k \cdot \Vert f\Vert \cdot \prod _{i=1}^{k-1} {\varvec{\beta }}_i$

Putting it all together:

Theorem 2

For $k=1,\ldots ,n$,

$$\begin{aligned} \delta _k f \le \Big [ \delta _0 f + \Vert f\Vert \cdot \sum _{i=1}^k {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big ] \cdot \Big (\prod _{i=1}^k \widetilde{\varvec{\beta }}_i\Big ). \end{aligned}$$

The next lemma answers the question: given $\delta _L>0$, how can we ensure that

$$\begin{aligned} \delta _{n-1}f {\mathrel {\,{:}{=}\,}}\Vert f({\varvec{b}}_{(n-1)})-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(n-1)})\Vert \end{aligned}$$

is upper bounded by $\delta _L$?

Lemma 7

Given $\delta _L>0, f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]$ and ${\varvec{b}},\widetilde{{\varvec{b}}}\in {\mathbb {C}}^{n-1}$ where $n>1.$

Let $d=\max _i (\mathop {deg}_{{{z}}_i}(f))$ and $M=\Vert {\varvec{b}}\Vert +1.$

If

$$\begin{aligned}&\delta _f \le \frac{\delta _L}{2((d+1)M^d)^{n-1}} \quad (*) \end{aligned}$$

and

$$\begin{aligned}&\delta _{\varvec{b}}\le \min (1, \frac{\delta _L}{2d\Vert f\Vert (n-1)((d+1)M^d)^{n-1}}), \quad {(**)} \end{aligned}$$

then $\delta _{n-1} f \le \delta _L.$

4 Clustering for Triangular Systems

We now present our algorithm for solving the LCP for a given triple $(\mathbf{f},\mathbf{B}^0,\epsilon )$, where $\epsilon >0$. Instead of $\epsilon $, we use $L{\mathrel {\,{:}{=}\,}}\lceil \log _2(1/\epsilon ) \rceil $. $\mathbf{B}^0=\mathbf{B}^0_{(n)}$ is the ROI (a polybox). $\mathbf{f}=\mathbf{f}_{(n)}$ is a triangular map with 0-dimensional set of zeros. We give our algorithm in the case where each $\mathbf{f}_i$ is known exactly; it can be generalized for oracle polynomials. It is based on the one-dimensional clustering algorithm (see Prop. Proposition 2), that proceeds by subdividing the initial ROI. The key stone in such a subdivision algorithm is a test that counts the number of roots with multiplicity in a disk. In the one-dimensional case, it is done with the so-called Pellet’s test. Here we use this test with interval polynomials to compute towers. The main objects manipulated in our algorithms are cluster oracles, and their generalization when $n>1$.

Cluster oracles in dimension $n\ge 1$.

A polybox $\mathbf{B}\subseteq {{\mathbb {C}}}^n$ is called an $\ell $ -tower if there exists an $\ell $-vector $\mathbf{m}_{(\ell )}$ such that $(\mathbf{B}_{(\ell )},\mathbf{m}_{(\ell )})$ is a tower. A cluster oracle ${\mathcal {O}}$ in dimension $n>1$ is defined to be a triple

$$\begin{aligned}{\mathcal {O}}=\langle \ell ,\mathbf{B},\mathbf{L}\rangle =\langle level({\mathcal {O}}), domain({\mathcal {O}}), precision({\mathcal {O}})\rangle \end{aligned}$$

where $\ell \in \{0,\ldots ,n\}$ is called the level, $\mathbf{B}$ is a polybox called the domain and $\mathbf{L}$ is a vector of integers called the precision. We will guarantee that if $level({\mathcal {O}})\ge 1$, $domain({\mathcal {O}})$ is an $\ell $-tower and $r(\Delta (domain({\mathcal {O}})_i))\le 2^{-\mathbf {L}_i}, \forall i=1,\ldots ,\ell $. The multiplicity information is implicitly carried out by a cluster oracle.

Cluster oracles at level 1.

We generalize the ClusterOracle algorithm in Proposition 2 to $ClusterOracle1(\mathbf{f},{\mathcal {O}})$, which returns a set $\left\{ {\mathcal {O}}^1,\ldots ,{\mathcal {O}}^k \right\} $ of cluster oracles at level 1. If $L=precision({\mathcal {O}})_1$ and $\mathbf{B}^i$ is the domain of ${\mathcal {O}}^i$, then $(\Delta (\mathbf{B}^i))_1$ has radius at most $2^{-L}$, and $\mathbf{B}^i_j=(domain({\mathcal {O}}))_j$ for $j=2,\ldots ,n$. Moreover, the domains of these ${\mathcal {O}}^i$’s form a cover for the solution set of $\mathbf{f}$ in the domain of ${\mathcal {O}}$. All our oracles are subsequently descended from these ${\mathcal {O}}^i$’s.

Pellet test.

Our goal is to “lift” a cluster oracle ${\mathcal {O}}=\langle \ell ,\mathbf{B},\mathbf{L}\rangle $ to one or more at level $\ell +1$ (provided $\ell <n$) arising from subdividing $\mathbf{B}$. The fundamental tool for this purpose is the “Pellet test” and its variants (Graeffe-accelerated, soft-version, etc. – see [4, 5, 17]). Without distinguishing among these variants, we may describe a generic Pellet test denoted

$$\begin{aligned}T_*(\mathbf{f}_{\ell +1},\mathbf{B}_{(\ell )},\mathbf{B}_{\ell +1})\end{aligned}$$

which returns an integer $m\ge -2$. $m\ge 0$ holds only if $m=\#(\Delta (\mathbf{B}_{\ell +1}),\mathbf{f}_{\ell +1}(\mathbf{B}_{(\ell )}))$, where $\mathbf{f}_{\ell +1}(\mathbf{B}_{(\ell )})$ is the univariate interval polynomial obtained by evaluating $\mathbf{f}_{\ell +1}$ on $\mathbf{B}_{(\ell )}$. If $m\ge 1$, then this implies that $\mathbf{B}$ is an $(\ell +1)$-tower. If $m=0$, $\mathbf{B}$ does not contain zeros of $\mathbf{f}$. If $m=-1$ or $m=-2$, we say that the $T_*$ test failed. These two modes of failure are important to understand for efficiency. Informally^{Footnote 6}$m=-1$ means the disc $\Delta (\mathbf{B}_{\ell +1})$ is not well-isolated (there are zeros near its boundary). In this case, the response is to subdivide $\mathbf{B}_{l+1}$. On the other hand, $m=-2$ means we need more accuracy in the evaluation $\mathbf{f}_{\ell +1}(\mathbf{B}_{(\ell )})$, which requires subdividing components $\mathbf{B}_i$ of $\mathbf{B}$ with $i<\ell $.

Lift of a natural cluster. The lifting process is performed by a function

$$\begin{aligned}ClusterOracleN(\mathbf{f},{\mathcal {O}})\end{aligned}$$

that takes in input a triangular map and a cluster oracle and outputs a pair (flag, S) where $flag\in \{\mathbf{success}, \mathbf{failure}\}$ and S is a set of cluster oracles. It is essentially the clustering algorithm depicted in [4]; it uses the $T_*$-test described above with $\ell =level({\mathcal {O}})$ to count the number of roots in a disc. It returns the pair $(\mathbf{failure}, {\mathcal {O}})$ when one $T_*$-test returns $-2$. When $ClusterOracleN(\mathbf{f},{\mathcal {O}})$ returns $(\mathbf{success}, S)$, then S is a list of pairwise disjoint cluster oracles at level $\ell +1$ so that any solution in ${\mathcal {O}}$ is in a cluster oracle in S.

Solving the LCP problem. We are ready to present our main algorithm, called $ClusterTri(\mathbf{f},\mathbf{B},L)$, described in Algorithm 1. It uses a queue Q to hold the active cluster oracles and lift these clusters level by level to level $n$. Let ${\mathcal {O}}$ be a cluster oracle in Q. If $level({\mathcal {O}})=0$, ${\mathcal {O}}$ is lifted with ClusterOracle1 which returns a set S of cluster oracles at level 1 containing all the solutions in ${\mathcal {O}}$. If $level({\mathcal {O}})>0$, ${\mathcal {O}}$ is lifted with ClusterOracleN, which may fail; in that case, the asked precision for levels less that $\ell $ of ${\mathcal {O}}$ is doubled and it’s level is set to 0, this will force its refining in later executions of the while loop. When the lift of ${\mathcal {O}}$ succeeds, one obtains a set S of cluster oracles at level $\ell +1$.

The correctness of ClusterTri is a direct consequence of the correctness of ClusterOracle1 (see Prop. 2) and ClusterOracleN, and Corollary 4.

The halting of ClusterTri is a consequence of Lemma 7 (equation (**)) which shows that as long as the radius of $\Delta (\mathbf{B}_{(\ell )})$ approaches zero, Pellet test will eventually succeed; thus so does ClusterOracleN.

5 Implementation and Benchmarks

We implemented in Julia^{Footnote 7} our complex solution clustering algorithm and made it available through the package Ccluster.jl.^{Footnote 8} It is named hereafter tcluster. It uses, as routine for clustering roots of univariate polynomials given by approximations, the univariate solver ccluster described in [17] and available in Ccluster.jl. The procedure for approximating a multivariate polynomial specialized in a cluster of fibers relies on the ball arithmetic library arb (see [18]), interfaced in Julia through the package Nemo.^{Footnote 9}

Section 5.1 reports how tcluster performs on systems having clusters of solutions. Section 5.2 proposes benchmarks for solving random dense triangular systems with only regular solutions, and with solutions with multiplicities; tcluster is compared with three homotopy solvers. Section 5.3 is about using tcluster to cluster solutions of system triangularized with regular chains. Unless specified, tcluster is used with $\epsilon =2^{-53}$. tcluster global (resp. local) holds for tcluster with initial box $\mathbf{B}$ centered in $\mathbf{0}$ with width $10^6$ (resp. 2).

All the timings given below are sequential times in seconds on a Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz machine with linux.

5.1 Clustering Ability

Consider the triangular systems $\mathbf{g}=(f,g_2)=\mathbf{0}$ and $\mathbf{h}=(f,h_2)=\mathbf{0}$ where

$$\begin{aligned} \begin{array}{lcl} \qquad \,f(z_1)= z_1^{d_1} - (2^{\delta }z_1-1)^c\\ g_2(z_1,z_2) = z_2^{d_2}z_1^{d_2} -1 \\ h_2(z_1,z_2) = z_2^{d_2} - z_1^{d_2} \end{array} \end{aligned}$$

(5)

with $d_1=30$, $c=10$, $\delta =128$ and $d_2=10$. All the roots of f have multiplicity 1. A cluster $S_1$ of 10 roots is in a disk centered in $2^{-\delta }$ with radius $2^{-b}=2^{-\frac{d_1\delta +\delta -1}{c}}\simeq 2^{-397}$ (see [21]). Since $d_1>c>1$, roots in $S_1$ have modulus $\le 2^{-\delta } + 2^{-b}\le 2^{-\delta +1}=2^{-127}={\hat{\gamma }}$. $S_2$ denotes the set of $d_1-c$ others roots of f, that have a modulus of the order of $\gamma =2^{\frac{c\delta }{d_1-c}}=2^{64}$. The $d_2$ roots of $g_2$ are on a circle centered in 0 with radius $\ge {\hat{\gamma }}^{-1}$ when $z_1\in S_1$, and of order $\gamma ^{-1}$ when $z_1\in S_2$. The $d_2$ roots of $h_2$ are on a circle centered in 0 with radius $\le {\hat{\gamma }}$ when $z_1\in S_1$, and of order $\gamma $ when $z_1\in S_2$. All the solutions of $\mathbf{g}=\mathbf{0}$ and $\mathbf{h}=\mathbf{0}$ are included in the box $\mathbf{B}$ centered in $\mathbf{0}$ with width $10^{40}$.

We computed clusters of solutions for the two systems with tcluster in $\mathbf{B}$ for four values of $\epsilon $ and reported the cluster structure as a sum $c_1 + c_2\times 10 + c_3\times 100$ where $c_1$ (respectively $c_2$, $c_3$) stands for the number of clusters with sum of multiplicities 1 (resp. 10, 100). Table 1 gives this structure in columns #Sols, the solving time in columns t and the min and max precision required on clusters of $f_1$ in columns M and m (i.e. the $log_2$ of the radius of the disk isolating the clusters).

$(\mathbf{g}=\mathbf{0})$ has 20 clusters of 10 solutions above each root in $S_2$, where solutions have pairwise distance $\simeq 2^{-64}$. It has 10 clusters of 10 solutions above the cluster $S_1$ where solutions have pairwise distance $\le 2^{-b}\simeq 2^{-397}$. This structure is found by tcluster with $\epsilon =2^{-53}$. When $\epsilon =2^{-106}$, the 20 clusters above roots in $S_2$ are split, not the ones above roots in $S_1$. When $\epsilon =2^{-212}$, the clusters above roots in $S_1$ are split even if the pairwise distances between solutions in these clusters are far smaller than $2^{-212}$; this is because isolating roots of $g_2$ with error less than $\epsilon =2^{-212}$ requires more precision on roots of f, as shown is column (m,M). When $\epsilon =2^{-424}$, all the clusters are split.

$(\mathbf{h}=\mathbf{0})$ has 200 solutions above roots in $S_2$ and a cluster of 100 simple solutions above roots in $S_1$. The first (resp. second) components of the solutions in this cluster are in a disc of radius $\le 2^{-b}\simeq 2^{-397}$ (resp. ${\hat{\gamma }}=2^{-127}$). This cluster structure is found by tcluster with $\epsilon =2^{-53}$ and $\epsilon =2^{-106}$. When $\epsilon =2^{-212}$, the cluster of 100 solutions is split in 10 clusters of 10 solutions. When $\epsilon =2^{-424}$, the cluster is split in 100 solutions.

Table 1 Clustering the solutions of systems defined in Sect. 5.1 with $d_1=30$, $c=10$, $\delta =128$, $d_2=10$ for four values of $\epsilon $ in box $\mathbf{B}$ centered in $\mathbf{0}$ with width $10^{40}$

Full size table

5.2 Benchmarks with Random Dense Systems

We present benchmarks for randomly generated triangular systems without and with multiple solutions. We compare the efficiency and the robustness of tcluster and two homotopy solvers.

Homotopy solvers. Homotopy solving is a two-step process. First, an upper bound D (either the Bézout’s bound, or a bound obtained with polyhedral homotopy, see [16]) on the number of solutions of the system is computed. Then D paths are followed to find the solutions. Among available homotopy solvers,^{Footnote 10} we used in our benchmarks HOM4PS-2.0,^{Footnote 11}Bertini^{Footnote 12} (see [2]) and HomotopyContinuation.jl^{Footnote 13} (hereafter, we denote it HomCont.jl). HOM4PS-2.0 and HomCont.jl implement polyhedral homotopy, thus follow possibly less paths. Bertini and HomCont.jl compute the multiplicity structure of solutions. Bertini can use an Adaptive Multi-Precision (AMP) arithmetic; below Bertini AMP refers to Bertini with AMP.

Systems. We follow the approach of [8] to generate triangular systems with and without multiple solutions. The type of a triangular system $\mathbf{f}(\mathbf{z})=\mathbf{0}$ with $n$ equations is the list $(d_1,\ldots ,d_n)$ where $d_i=deg_{z_i}(\mathbf{f}_i)$.

A random dense polynomial $\mathbf{f}_i\in {\mathbb {C}}[z_1,\ldots ,z_{i}]$ of degree $d_i$ in $z_i$ is generated as follows. If $i>1$, $\mathbf{f}_i= \sum _{j=0}^{d_i} g_j z_i^j$ where $g_j\in {\mathbb {C}}[z_1,\ldots ,z_{i-1}]$ is a random dense polynomial of degree $d_i-j$ in $z_{i-1}$. $\mathbf{f}_1$ is a random dense polynomial in ${\mathbb {C}}[z_1]$ of degree $d_1$. A system $\mathbf{f}(\mathbf{z})=\mathbf{0}$ of type $(d_1,\ldots ,d_n)$ is obtained by generating successively random dense polynomials $\mathbf{f}_i$ of degrees $d_i$ in $z_i$. Triangular systems with multiple solutions are obtained by taking $\mathbf{f}_1$ as above, and for $i=2,\ldots ,n$, $\mathbf{f}_i= a_i^2(b_iz_i+c_i)^{\lfloor \frac{d_i+1}{2}\rfloor -\lfloor \frac{d_i}{2}\rfloor }$ where $a_i\in {\mathbb {C}}[z_1,\ldots ,z_{i}]$ has degree $\lfloor \frac{d_i}{2}\rfloor $ in $z_i$ and $b_i,c_i$ are in ${\mathbb {C}}[z_1,\ldots ,z_{i-1}]$ and have degrees $d_i$ in $z_{i-1}$.

Benchmarks. In Table 2, we compare the three homotopy solvers and tcluster global and local on triangular systems with integer coefficients without and with multiple solutions. Coefficients of systems without multiple solutions are in $[-2^{9},2^{9}]$, while coefficients of systems with multiple solutions are in $[-2^{34},2^{34}]$. In both cases, we generated 5 systems of each type. Here tcluster global found all the solutions but in general this is not guaranteed.

The columns #Sols give the average number of solutions counted with multiplicities found by each solver and the columns t the average time. The columns #Clus give the average number of clusters found by tcluster. The systems we generated have $d_1\times \cdots \times d_n$ solutions which is the Bézout’s bound, and the homotopy solvers have to follow this number of paths.

Systems with only simple solutions. For type (9,9,9,9,9), Bertini AMP has been stopped after 1 hour and HOM4PS-2.0 terminates with a segmentation fault. Homotopy solvers should find all the solutions. Bertini AMP failed in this task for one system of type (9, 9, 9, 9) and two systems of type (2, 2, 2, 2, 2, 2, 2, 2, 2, 2) but acknowledged that solutions could be missing. HOM4PS-2.0 returns incorrect results without warnings. In contrast, tcluster global always finds the correct number of solutions. tcluster global is in general faster than Bertini AMP and is faster than HOM4PS-2.0 for systems of types (6, 6, 6, 6, 6) and (9, 9, 9, 9). For systems of highest degree polynomials, tcluster global and HomCont.jl present similar solving times. The timings for systems of type (2, 2, 2, 2, 2, 2, 2, 2, 2, 2) emphasize that the efficiency of our solver is not penalized by high dimensional systems since it performs inductively subdivisions in boxes in ${{\mathbb {C}}}$. tcluster local is significantly faster than the other approaches.

Table 2 Solving random dense triangular systems with tcluster, HOM4PS-2.0, Bertini AMP and HomCont.jl

Full size table

Systems with multiple solutions. A well isolated multiple solution is reported by tcluster in a cluster with its multiplicity. In all cases, the number of clusters found by tcluster global is the number of distinct solutions of each systems. HOM4PS-2.0 fails in finding all the solutions. Bertini AMP computes correctly the multiplicity of solutions. HomCont.jl fails in computing correctly the multiplicity structure of solutions. For type (9,9,9), Bertini AMP has been stopped after 1 hour. tcluster global is faster than Bertini AMP and HomCont.jl, and faster than HOM4PS-2.0 for systems of type (9, 9, 9) and (6, 6, 6, 6).

5.3 Systems Obtained by Triangularization

In this subsection, we report on using tcluster for clustering the solutions of triangular systems $\mathbf{f}(\mathbf{z})=\mathbf{0}$ obtained from a non-triangular system $\mathbf{g}(\mathbf{z})=\mathbf{0}$ with Regular Chains (RC, see [1, 10]). Algorithms for triangularizing systems with RC produce a set of triangular systems $\{ \mathbf{f}_1(\mathbf{z})=\mathbf{0}, \ldots , \mathbf{f}_l(\mathbf{z})=\mathbf{0}\}$ having distinct solutions whose union is the set of distinct solutions of $\mathbf {g}(\mathbf {z})=\mathbf {0}$. The multiplicities of solutions are not preserved by this process.

Systems. We consider non-triangular systems $\mathbf{g}(\mathbf{z})=\mathbf{0}$ both classical (coming from [7]), and sparse random where $\mathbf{g}=(g_1,\ldots ,g_n)$ and each $g_i$ has the form $g_i(\mathbf{z}) = z_i^{d_i} - g_i'(\mathbf{z})$ where $g_i'$ is a polynomial in ${\mathbb {Z}}[\mathbf{z}]$ having total degree $d_i-1$, integers coefficients in $[-2^{8},2^{8}]$ and 5 monomials. The type of such a system is the tuple $(d_1,\ldots , d_n)$. The set of all the examples can be found at https://members.loria.fr/MPouget/files/IPY19/IPY19.txt.

The benchmark. For several types, we generated a system as described above and computed a triangular system with the Maple function RegularChains[Triangularize] with option ’probability’=0.9. For the classical systems, we used no option. In Table 3, column RC gives the time to compute the RCs. We solved the triangular systems of the obtained regular chains with tcluster; columns tcluster global report the number of solutions and solving time for tcluster. We also used the function RootFinding[Isolate] of Maple with options digits=15, output=interval, method=’RC’ (i.e. using regular chains) to solve our systems; columns Isolate RC report the number of real solutions and the solving time for Isolate. We also used Bertini AMP to solve the original systems; columns Bertini AMP report the number of paths followed (column #Paths), the solving time and the number of solutions with the multiplicity structure found by Bertini: $c_1 + c_2\times m_2 + c_3\times m_3$ means $c_1$ (respectively $c_2$, $c_3$) solutions with multiplicity 1 (resp. $m_2$, $m_3$). We also tested HOM4PS-2.0 and HomCont.jl for these systems. The running time of HOM4PS-2.0 is always less than 0.05s, but the number of solutions reported is wrong. HomCont.jl always finds the correct number of solutions but is slower than Bertini AMP except for two systems for which polyhedral homotopy allows to reduce the number of paths to be followed. HomCont.jl solves Czapor-Geddes-Wang in 3.67 s and 5-body-homog in 3.44 s.

Table 3 Solving non-triangular systems with regular chains and tcluster, and Bertini AMP

Full size table

Random systems in Table 3. Here the number of solutions is the Bézout’s bound and Bertini AMP follows one path per solution. Homotopy solving in these cases is much more efficient than triangularizing the system with RC. The RC algorithm produces a triangular system of type $(d,1,\ldots ,1)$ where d is the Bézout’s bound with a huge bitsize: For the type (3, 3, 4, 4), the triangular system has type (144, 1, 1, 1) and each equation has bitsize about 738. tcluster has to isolate some solutions of the first equation at precision $2^{-424}$. Solving the first equation with ccluster and $\epsilon =2^{-424}$ takes 8.27s: tcluster spends most of the time in isolating roots of the first polynomial. Any improvement of ccluster will directly benefit to tcluster. For three of these systems, RootFinding[Isolate] has been stopped after 1000s.

Classical systems with only simple solutions in Table 3. These systems have few finite solutions compared to their Bézout’s bounds, and Bertini AMP wastes time in following paths going to infinity. In contrast, tcluster is sensitive to the number of solutions in the initial solving domain. This explains why computing triangular systems and solving it with tcluster is faster than Bertini AMP for systems Arnborg-Lazard, Czapor-Geddes-Wang.

Classical systems with multiple solutions in Table 3. For these systems, Bertini AMP reports the multiplicity structure of the solutions. The triangularization step removes the multiplicity, and the RCs obtained are easier to solve; tcluster finds only clusters with one solution counted with multiplicity.

6 Future Work

We presented an algorithm for computing clusters of complex solutions, together with multiplicity information, of triangular systems of polynomial equations. It is numerical and certified, it handles solutions with multiplicity and works locally. It can deal with systems whose equations are given by oracle polynomials. An implementation is publicly available and the experiments we carried out show the efficiency and robustness of our approach.

Our error analysis for the partial specialization of polynomials on algebraic numbers represented by oracle numbers is a first step towards a complexity analysis of our algorithm, which constitutes our future work. We would like to present such an analysis in terms of geometric parameters (e.g. separation of solutions) instead of only syntactic parameters (bit-size and degree).

Notes

A multiset S is a pair $(\underline{S},\mu )$ where $\underline{S}$ is an ordinary set called the underlying set and $\mu :\underline{S}\rightarrow {\mathbb {N}}$ assigns a positive integer $\mu (x)$ to each $x\in \underline{S}$. Call $\mu (x)$ the multiplicity of x in S, and $\mu (S){\mathrel {\,{:}{=}\,}}\sum _{x\in \underline{S}}\mu (x)$ the total multiplicity of S. Also, let |S| denote the cardinality of $\underline{S}$. If $|S|=1$, then S is called a singleton. We can form the union $S\cup S'$ of two multisets with underlying set $\underline{S}\cup \underline{S'}$, and the multiplicities add up as expected.
https://julialang.org/.
https://github.com/rimbach/Ccluster.jl.
https://www.singular.uni-kl.de/.
In general, ${\varvec{v}}_i\ne v_i$ since ${\varvec{v}}$ and $v_i$ are independent variables. So our bold font variables ${\varvec{v}}$ do not entail the existence of non-bold font counterparts such as $v_i$.
The two failure modes may be traced to our soft comparison of real numbers x : y (see [17, 27]). It is reduced to the interval comparison $(x)_L:(y)_L$ for increasing L. If we can conclude $x>y$ or $x<y$, it is a success, else it is a failure. There are two failure modes: if we can conclude $\textstyle {\frac{1}{2}}x<y< 2x$, this is a $(-1)$-failure (it is a potential “zero problem”). Otherwise it is a $(-2)$-failure (we repeat the interval test with larger L).
https://julialang.org/.
https://github.com/rimbach/Ccluster.jl.
http://nemocas.org/links.html.
Other major homotopy solvers are NAG4M2 (for Macaulay2), PHCpack and HOM4PS-3. Bertini2 is still in development.
http://www.math.nsysu.edu.tw/~leetsung/works/HOM4PS_soft.htm.
https://bertini.nd.edu.
https://www.juliahomotopycontinuation.org/.

References

Aubry, P., Lazard, D., Maza, M.M.: On the theories of triangular sets. J. Symb. Comput. 28(1), 105–124 (1999)
Article MathSciNet Google Scholar
Bates, D.J., Hauenstein, J.D., Sommese, A.J., Wampler, C.W.: Bertini: Software for numerical algebraic geometry. Available at bertini.nd.edu with permanent https://doi.org/10.7274/R0H41PB5
Batra, P.: Globally convergent, iterative path-following for algebraic equations. Math. Comput. Sci. 4(4), 507–537 (2010)
Article MathSciNet Google Scholar
Becker, R., Sagraloff, M., Sharma, V., Xu, J., Yap, C.: Complexity analysis of root clustering for a complex polynomial. In: Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation. pp. 71–78. ISSAC ’16, ACM, New York, NY, USA (2016)
Becker, R., Sagraloff, M., Sharma, V., Yap, C.: A near-optimal subdivision algorithm for complex root isolation based on Pellet test and Newton iteration. J. Symb. Comput. 86, 51–96 (2018)
Article MathSciNet Google Scholar
Beltrán, C., Leykin, A.: Certified numerical homotopy tracking. Exp. Math. 21(1), 69–83 (2012)
Article MathSciNet Google Scholar
Boulier, F., Chen, C., Lemaire, F., Moreno Maza, M.: Real root isolation of regular chains. In: Feng, R., Lee, W.S., Sato, Y. (eds.) Computer Mathematics, pp. 33–48. Springer, Berlin (2014)
Google Scholar
Cheng, J.S., Gao, X.S., Yap, C.K.: Complete numerical isolation of real roots in zero-dimensional triangular systems. J. Symb. Comput. 44(7), 768–785 (2009)
Article MathSciNet Google Scholar
Collins, G.E., Johnson, J.R., Krandick, W.: Interval arithmetic in cylindrical algebraic decomposition. J. Symb. Comput. 34(2), 145–157 (2002)
Article MathSciNet Google Scholar
Dahan, X., Maza, M.M., Schost, E., Wu, W., Xie, Y.: Lifting techniques for triangular decompositions. In: Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation, pp. 108–115. ISSAC ’05, ACM, New York, NY, USA (2005)
Darboux, G.: Sur les développements en série des fonctions d’une seule variable. Journal de mathématiques pures et appliquées (Liouville Journal) 3(II), 291–312 (1876)
Dayton, B.H., Zeng, Z.: Computing the multiplicity structure in solving polynomial systems. In: Proceedings of the 2005 International Symposium on Symbolic and Algebraic Computation, pp. 116–123. ACM (2005)
Dickenstein, A., Emiris, I. (eds.): Solving Polynomial Equations: Foundations, Algorithms, and Applications. Springer, Berlin (2005)
MATH Google Scholar
Eigenwillig, A., Kettner, L., Krandick, W., Mehlhorn, K., Schmitt, S., Wolpert, N.: A descartes algorithm for polynomials with bit-stream coefficients. In: Ganzha, V., Mayr, E., Vorozhtsov, E. (eds.) CASC, LNCS, vol. 3718, pp. 138–149. Springer, Berlin (2005)
Giusti, M., Lecerf, G., Salvy, B., Yakoubsohn, J.C.: On location and approximation of clusters of zeros: case of embedding dimension one. Found. Comput. Math. 7(1), 1–58 (2007)
Article MathSciNet Google Scholar
Huber, B., Sturmfels, B.: A polyhedral method for solving sparse polynomial systems. Math. Comput. 64(212), 1541–1555 (1995)
Article MathSciNet Google Scholar
Imbach, R., Pan, V.Y., Yap, C.: Implementation of a near-optimal complex root clustering algorithm. In: Davenport, J.H., Kauers, M., Labahn, G., Urban, J. (eds.) Mathematical Software—ICMS 2018, pp. 235–244. Springer, Cham (2018)
Johansson, F.: Arb: efficient arbitrary-precision midpoint-radius interval arithmetic. IEEE Trans. Comput. 66, 1281–1292 (2017)
Article MathSciNet Google Scholar
Kobel, A., Rouillier, F., Sagraloff, M.: Computing real roots of real polynomials ... and now for real! In: Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, pp. 303–310. ISSAC ’16, ACM, New York, NY, USA (2016)
Li, J., Cheng, J., Tsigaridas, E.: Local generic position for root isolation of zero-dimensional triangular polynomial systems. In: Koepf, W., Vorozhtsov, E. (eds.) CASC 2012—14th International Workshop on Computer Algebra in Scientific Computing. Lecture Notes in Computer Science, vol. 7442, pp. 186–197. Springer, Maribor (2012)
Mignotte, M.: On the distance between the roots of a polynomial. Appl. Algebra Eng. Commun. Comput. 6(6), 327–332 (1995)
Article MathSciNet Google Scholar
Moore, R.E., Kearfott, R.B., Cloud, M.J.: Introduction to Interval Analysis. SIAM, Philadelphia (2009)
Book Google Scholar
Niang Diatta, D., Diatta, S., Rouillier, F., Roy, M.F., Sagraloff, M.: Bounds for polynomials on algebraic numbers and application to curve topology. arXiv e-prints arXiv:1807.10622 (Jul 2018)
Strzebonski, A., Tsigaridas, E.: Univariate real root isolation in an extension field and applications. J. Symb. Comput. 92, 31–51 (2019)
Article MathSciNet Google Scholar
Wampler, I.C.W., et al.: The Numerical Solution of Systems of Polynomials Arising in Engineering and Science. World Scientific, Singapore (2005)
MATH Google Scholar
Xu, J., Burr, M., Yap, C.: An approach for certifying homotopy continuation paths: Univariate case. In: Proceedings of the 2018 ACM International Symposium on Symbolic and Algebraic Computation, pp. 399–406. ISSAC ’18, ACM, New York, NY, USA (2018)
Yap, C., Sagraloff, M., Sharma, V.: Analytic root clustering: a complete algorithm using soft zero tests. In: The Nature of Computation. Logic, Algorithms, Applications. LNCS, vol. 7921, pp. 434–444. Springer, Berlin (2013)
Zhang, Z., Fang, T., Xia, B.: Real solution isolation with multiplicity of zero-dimensional triangular systems. Sci. China Inf. Sci. 54(1), 60–69 (2011)
Article MathSciNet Google Scholar

Download references

Author information

Authors and Affiliations

Courant Institute of Mathematical Sciences, New York University, New York, USA
Rémi Imbach & Chee Yap
Université de Lorraine, CNRS, Inria, LORIA, 54000, Nancy, France
Marc Pouget

Authors

Rémi Imbach
View author publications
You can also search for this author in PubMed Google Scholar
Marc Pouget
View author publications
You can also search for this author in PubMed Google Scholar
Chee Yap
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rémi Imbach.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rémi’s work is supported by the European Union’s Horizon 2020 research and innovation programme No. 676541, NSF Grants # CCF-1563942, # CCF-1564132 and # CCF-1708884. Chee’s work is supported by NSF Grants # CCF-1423228 and # CCF-1564132.

Appendix: Error Analysis

This Appendix contains all the proofs for our error analysis. Section 3 is an excerpt.

Given $f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]$ and ${\varvec{b}}, \widetilde{{\varvec{b}}}\in {{\mathbb {C}}}^n$, our basic goal is to bound the evaluation error

$$\begin{aligned} \Vert f({\varvec{b}})-{\widetilde{f}}(\widetilde{{\varvec{b}}})\Vert \end{aligned}$$

in terms of $\delta _f {\mathrel {\,{:}{=}\,}}\Vert f-{\widetilde{f}}\Vert $ and $\delta _{\varvec{b}}{\mathrel {\,{:}{=}\,}}\Vert {\varvec{b}}-\widetilde{{\varvec{b}}}\Vert $. This will be done by induction on n. Our analysis aims not just to produce some error bound, but to express this error in terms that are easily understood, and which reveals the underlying inductive structure. Towards this end, we introduce the following $\beta $ -bound function: if d is a positive integer and $b\in {{\mathbb {C}}}$,

$$\begin{aligned} \beta (d,b){\mathrel {\,{:}{=}\,}}\sum _{i=0}^d |b|^i. \end{aligned}$$

(6)

A simple application of this $\beta $-bound is:

Lemma 8

Let $b\in {{\mathbb {C}}}$ and $f\in {{\mathbb {C}}}[z]$. If d is the degree of f, then

$$\begin{aligned} |f(b)|\le \Vert f\Vert \cdot \beta (d, b),\qquad |f'(b)|\le d\Vert f\Vert \cdot \beta (d-1, b). \end{aligned}$$

Note that $\beta (d,b)\le \max \left\{ d+1, \frac{|b|^{d+1}-1}{|b|-1} \right\} $.

We first treat the case $n=1$. It will serve as the base for the inductive proof. Its proof requires a complex version of the Mean Value Theorem. Since this result is not well-known, we provide a statement and proof.

Theorem 3

(Complex mean value theorem) If $f:{{\mathbb {C}}}\rightarrow {{\mathbb {C}}}$ is holomorphic, then for any $a, b\in {{\mathbb {C}}}$,

$$\begin{aligned} f(b)-f(a) = \omega \cdot (b-a)\cdot f'(\xi ) \end{aligned}$$

for some $\xi $ in the line segment [a, b] and some $\omega \in {{\mathbb {C}}}$ with $|\omega |\le 1$.

Proof

This is a simple application of a similarly little known theorem of Darboux [11] which gives a finite Taylor expansion of f; see Bünger’s formulation and proof in [3, Appendix]. For any $k\ge 1$, the theorem says

$$\begin{aligned} f(b)=\sum _{i=0}^{k-1} \frac{(b-a)^i}{i!}f^{(i)}(a) + \omega \frac{(b-a)^k}{k!} f^{(k)}(\xi ) \end{aligned}$$

for some $\xi $ in the line segment [a, b] and $\omega \in {{\mathbb {C}}}$ with $|\omega |\le 1$. Choosing $k=1$, $f(b)=f(a)+\omega (b-a)f'(\xi )$ or $f(b)-f(a)=\omega (b-a)f'(\xi )$. $\square $

Corollary 9

(Complex mean value inequality) For all $a,b\in {{\mathbb {C}}}$, there is some $\xi \in [a,b]$ such that

$$\begin{aligned} |f(b)-f(a)| \le |b-a|\cdot |f'(\xi )|. \end{aligned}$$

Lemma 10

(Case $n=1$) Let $f,{\widetilde{f}}\in {{\mathbb {C}}}[z]$, $b,\widetilde{b}\in {{\mathbb {C}}}$, and ${\varvec{d}}({\widetilde{f}})\le {\varvec{d}}(f) \le d$. If ${\widetilde{f}}= f{\pm }\delta _f$ and $\widetilde{b}=b{\pm } \delta _b$, then:

(i)
$|f(b)-f(\widetilde{b})|$ $\le $ $\delta _b\cdot \Vert f'\Vert \cdot \beta (d,|b|+\delta _b)$ where $f'$ is the differentiation of f.
(ii)
$|f(\widetilde{b})-{\widetilde{f}}(\widetilde{b})|$ $\le $ $\delta _f \cdot \beta (d,|b|+\delta _b)$
(iii)
$|f(b)-{\widetilde{f}}(\widetilde{b})|$ $\le $ $\Big [ \delta _f + \delta _b \cdot \Vert f'\Vert \Big ]\cdot \beta (d,|b|+\delta _b).$

Proof

(i)
By the complex mean value inequality (Corollary 9):
$$\begin{aligned} | f(b) -f(\widetilde{b}) |\le & {} |b -\widetilde{b}| \cdot |f'(b{\pm }\delta _b)| \\\le & {} \delta _b \cdot \sum _{i=1}^{d} \left| i f_i (|b|+\delta _b)^{i-1} \right| \\\le & {} \delta _b \cdot \Vert f'\Vert \sum _{i=0}^{d-1} \left| (|b|+\delta _b)^i\right| . \end{aligned}$$
(ii)
Also
$$\begin{aligned} | f(\widetilde{b}) -{\widetilde{f}}(\widetilde{b}) |= & {} \Big |\sum _{i=0}^d (f_i-{\widetilde{f}}_i)\widetilde{b}^i\Big | \\\le & {} \delta _f\cdot \sum _{i=0}^d \Big |\widetilde{b}^i\Big | \\\le & {} \delta _f\cdot \beta (d,|b|+\delta _b). \end{aligned}$$
(iii)
This follows from the triangular inequality
$$\begin{aligned} | f(b) - {\widetilde{f}}(\widetilde{b}) | \le | f(b) -f(\widetilde{b}) | + |f(\widetilde{b}) - {\widetilde{f}}(\widetilde{b})|. \end{aligned}$$
and the bounds in parts (i) and (ii).

$\square $

The appearance of $\Vert f'\Vert $ in the above bound may be replaced by $d\Vert f\Vert $. Below, we develop similar bounds on partial derivatives in the multivariate case. For a general $n>1$, we need to generalize the notations:

$$\begin{aligned} f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}],\qquad {\varvec{b}},\widetilde{{\varvec{b}}}\in {{\mathbb {C}}}^n \end{aligned}$$

(7)

satisfying $\widetilde{{\varvec{b}}}={\varvec{b}}{\pm } \varvec{\delta }{\varvec{b}}$ (i.e., $\widetilde{{\varvec{b}}}_i={\varvec{b}}_i {\pm } \varvec{\delta }{\varvec{b}}_i$ for each i). Let ${\varvec{d}}= {\varvec{d}}(f)$ (i.e., ${\varvec{d}}_i = \mathop {deg}_{{\varvec{z}}_i}(f)$ for each i). The support of f is $\mathop {Supp}(f)\subseteq {{\mathbb {N}}}^n$ where $f = \sum _{{\varvec{\alpha }}\in \mathop {Supp}(f)} c_{\varvec{\alpha }}{\varvec{z}}^{\varvec{\alpha }}$ where $c_{\varvec{\alpha }}\in {{\mathbb {C}}}\setminus \left\{ 0 \right\} $. Here, ${\varvec{z}}^{\varvec{\alpha }}{\mathrel {\,{:}{=}\,}}\prod _{i=1}^n {\varvec{z}}_i^{{\varvec{\alpha }}_i}$. We assume that $\mathop {Supp}({\widetilde{f}})\subseteq \mathop {Supp}(f)$.

Our induction variable is $k=1,\ldots ,n$. For ${\varvec{\alpha }}\in {{\mathbb {N}}}^n$, let $\pi _k({\varvec{\alpha }}){\mathrel {\,{:}{=}\,}}(0,\ldots ,0,{\varvec{\alpha }}_{k+1},\ldots ,{\varvec{\alpha }}_n)$. E.g., if $k=n$ then $\pi _k({\varvec{\alpha }})=\varvec{0}$. Thus ${\varvec{\alpha }}-\pi _k({\varvec{\alpha }}) = ({\varvec{\alpha }}_1,\ldots ,{\varvec{\alpha }}_k,0,\ldots ,0)$. Next define $\mathop {Supp}_k(f){\mathrel {\,{:}{=}\,}}\left\{ \pi _k({\varvec{\alpha }}): {\varvec{\alpha }}\in \mathop {Supp}(f) \right\} $. With this notation, we can write

$$\begin{aligned} f = \sum _{{\varvec{\alpha }}\in \mathop {Supp}_k(f)} f_{\varvec{\alpha }}{\varvec{z}}^{\varvec{\alpha }}\end{aligned}$$

(8)

where each $f_{\varvec{\alpha }}\in {{\mathbb {C}}}[{\varvec{z}}_{(k)}]$. E.g., if $k=n$ then $\mathop {Supp}_k(f)=\left\{ \varvec{0} \right\} $ and so $f_{\varvec{0}}=f$.

Running Example Consider

$$\begin{aligned} f=xy+(x^3-1)y^2z + (x^2-y^2)z^3 \end{aligned}$$

(9)

where ${\varvec{z}}=(x,y,z)$. Then $\mathop {Supp}(f)=\left\{ 110, 321, 021, 203, 023 \right\} $. We can represent f using the support $\mathop {Supp}_1(f)=\left\{ 010, 021, 003, 023 \right\} $ as follows: $ f = f_{010}\cdot y+f_{021}\cdot y^2z + f_{003}\cdot z^3 f_{023}\cdot y^2z^3 $ where $f_{010}=x$, $f_{021}=x^3-1$, $f_{003}=x^2$, $f_{023}=-1$. Alternatively, using the support $\mathop {Supp}_2(f)=\left\{ 000, 001, 003 \right\} $, we can write $ f = f_{000}+f_{001}\cdot z + f_{003}\cdot z^3 $ where $f_{000}=xy$, $f_{001}=(x^3-1)y^2$, $f_{003}=(x^2-y^2)$.

Using (8), the partial specialization $f({\varvec{b}}_{(k)})\in {{\mathbb {C}}}[z_{k+1},\ldots ,z_n]$ may be written

$$\begin{aligned} f({\varvec{b}}_{(k)}) = \sum _{{\varvec{\alpha }}\in \mathop {Supp}_k(f)} f_{\varvec{\alpha }}({\varvec{b}}_{(k)})\cdot {\varvec{z}}^{\varvec{\alpha }}\end{aligned}$$

It follows that

$$\begin{aligned} \Vert f({\varvec{b}}_{(k)})\Vert =\max _{{\varvec{\alpha }}\in \mathop {Supp}_k(f)} \Big |f_{\varvec{\alpha }}({\varvec{b}}_{(k)})\Big |. \end{aligned}$$

(10)

The k-th partial derivative is $\partial _{k} f {\mathrel {\,{:}{=}\,}}\frac{\partial f}{\partial z_k} = \sum _{{\varvec{\alpha }}\in \mathop {Supp}_k(f)} (\partial _{k}f_{\varvec{\alpha }}) {\varvec{z}}^{\varvec{\alpha }}.$ Upon evaluation at ${\varvec{b}}_{(k)}$, its norm is given by

$$\begin{aligned} \Vert \partial _{k} f({\varvec{b}}_{(k)})\Vert =\max _{{\varvec{\alpha }}\in \mathop {Supp}_k(f)} \Big |\partial _{k} f_{\varvec{\alpha }}({\varvec{b}}_{(k)})\Big |. \end{aligned}$$

(11)

Using our running example (8), let $k=2$. Then $f=f_{000}+f_{001}\cdot z + f_{003}\cdot z^3$ with $f_{001}=xy$, $f_{001}=(x^3-1)y^2$, $f_{003}=(x^2-y^2)$. Thus $\partial _{2} f = x + (x^3-1)2y\cdot z -2y\cdot z^3$. If ${\varvec{b}}_{(2)}=(-1,3)$, then $\Vert f({\varvec{b}}_{(2)})\Vert =\max \left\{ 3, 18, 8 \right\} =18$ and $\Vert \partial _{2} f({\varvec{b}}_{(2)})\Vert =\max \left\{ 1, 12, 6 \right\} =12$.

We are ready for the generalization of Lemma 10.

Assume that we are given $f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]={{\mathbb {C}}}[{\varvec{z}}_{(n)}]$ and ${\varvec{b}},\widetilde{{\varvec{b}}}\in {{\mathbb {C}}}$. Also the degree sequences satisfies ${\varvec{d}}({\widetilde{f}})\le {\varvec{d}}(f)$, that is the inequality holds componentwise. Then we may define these quantities for $k=1,\ldots ,n$:

$$\begin{aligned} \begin{array}{llll} \delta _k {\varvec{b}}&{}{\mathrel {\,{:}{=}\,}}&{} |{\varvec{b}}_k -\widetilde{{\varvec{b}}}_k|, \\ \delta _k f &{}{\mathrel {\,{:}{=}\,}}&{} \Vert f({\varvec{b}}_{(k)})-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(k)})\Vert &{} \qquad (\text {with } \delta _0 f=\Vert f-{\widetilde{f}}\Vert ),\\ {\varvec{\beta }}_k &{}{\mathrel {\,{:}{=}\,}}&{} \beta ({\varvec{d}}_k, {\varvec{b}}_k) \\ \widetilde{\varvec{\beta }}_k &{}{\mathrel {\,{:}{=}\,}}&{} \beta ({\varvec{d}}_k, |{\varvec{b}}_k| + \delta _k{\varvec{b}}). \end{array} \end{aligned}$$

Note that $\delta _k$ is a operator that must attach to some function f or vector ${\varvec{b}}$ to denote the “kth perturbation” of f or ${\varvec{b}}$. We may restate Lemma 10(iii) using the new notations:

Corollary 11

For a univariate f,

$$\begin{aligned} \delta _1 f \le \Big [ \delta _0 f + \delta _1 {\varvec{b}}\cdot {\varvec{d}}_1 \cdot \Vert f\Vert \Big ]\widetilde{\varvec{\beta }}_1. \end{aligned}$$

(12)

We now address the case of multivariate f:

Lemma 12

(=Lemma 5 in Text) For $n\ge 1$ and each $k=1,\ldots ,n$:

(i)
$\Vert f({\varvec{b}}_{(k)})-f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k) \Vert $ $\le $ $\delta _k {\varvec{b}}\cdot \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert \cdot \widetilde{\varvec{\beta }}_k$.
(ii)
$\Vert f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k)-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(k)})\Vert $ $\le $ $\delta _{k-1} f \cdot \widetilde{\varvec{\beta }}_k$.
(iii)
$\delta _k f$ $\le $ $\Big [ \delta _k {\varvec{b}}\cdot \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert + \delta _{k-1} f \Big ]\cdot \widetilde{\varvec{\beta }}_k$.

Proof

We note that (iii) amounts to adding the inequalities of (i) and (ii): specifically, $\delta _k f \le \Vert f({\varvec{b}}_{(k)})-f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k) \Vert + \Vert f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k)-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(k)})\Vert $. Thus we only have to verify (i) and (ii). This will be shown by induction on k.

Suppose $k=1$. This will be an application of Lemma 10(i) and (ii). We use the fact that $f = \sum _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} f_{\varvec{\alpha }}{\varvec{z}}^{\varvec{\alpha }}$, and $f({\varvec{b}}_{(k-1)})=f({\varvec{b}}_{(0)})=f$. Then (i) becomes

$$\begin{aligned} \begin{array}{llll} \Vert f({\varvec{b}}_1)-f(\widetilde{{\varvec{b}}}_1) \Vert &{}=&{} \Vert \sum _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} (f_{\varvec{\alpha }}({\varvec{b}}_1)- f_{\varvec{\alpha }}(\widetilde{{\varvec{b}}}_1)) {\varvec{z}}^{\varvec{\alpha }}\Vert \\ &{}=&{} \max _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} |f_{\varvec{\alpha }}({\varvec{b}}_1)- f_{\varvec{\alpha }}(\widetilde{{\varvec{b}}}_1)| \\ &{}\le &{} \max _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} \delta _1{\varvec{b}}\cdot \Vert f_{\varvec{\alpha }}^{'}\Vert \cdot \widetilde{\varvec{\beta }}_1 &{} (\text {by } Lemma~10(i)) \\ &{}=&{} \max _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} \delta _1{\varvec{b}}\cdot \Vert \partial _{1} f_{\varvec{\alpha }}\Vert \cdot \widetilde{\varvec{\beta }}_1\\ &{}=&{} \delta _1{\varvec{b}}\cdot \Vert \partial _{1} f\Vert \cdot \widetilde{\varvec{\beta }}_1. \end{array} \end{aligned}$$

Similarly, (ii) follows from

$$\begin{aligned} \begin{array}{llll} \Vert f(\widetilde{{\varvec{b}}}_1)-{\widetilde{f}}(\widetilde{{\varvec{b}}}_1) \Vert &{}=&{} \Vert \sum _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} (f_{\varvec{\alpha }}(\widetilde{{\varvec{b}}}_1)- {\widetilde{f}}_{\varvec{\alpha }}(\widetilde{{\varvec{b}}}_1)) {\varvec{z}}^{\varvec{\alpha }}\Vert \\ &{}=&{} \max _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} \Big |f_{\varvec{\alpha }}(\widetilde{{\varvec{b}}}_1)-{\widetilde{f}}_{\varvec{\alpha }}(\widetilde{{\varvec{b}}}_1)\Big |\\ &{}\le &{} \max _{{\varvec{\alpha }}\in \mathop {Supp}_1(f)} \delta _{f_{\varvec{\alpha }}}\cdot \widetilde{\varvec{\beta }}_1 &{} (\text {by } Lemma~10(ii)) \\ &{}=&{} \Vert f -{\widetilde{f}}\Vert \cdot \widetilde{\varvec{\beta }}_1\\ &{}=&{} \delta _0 f \cdot \widetilde{\varvec{\beta }}_1. \end{array} \end{aligned}$$

Suppose $k>1$. We now prove (i). The left hand side (LHS) $\Vert f({\varvec{b}}_{(k)})-f({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k) \Vert $ is the maximum of

$$\begin{aligned} |f_{\varvec{\alpha }}({\varvec{b}}_{(k)}) -f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k)| \end{aligned}$$

(A)

where ${\varvec{\alpha }}$ ranges over $\mathop {Supp}_k(f)$. We can rewrite (A) in the form $|f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})({\varvec{b}}_k) -f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})(\widetilde{{\varvec{b}}}_k)|$. Applying Lemma 10(i), we can upper bound (A) by “$\delta _b\cdot \Vert f'\Vert \cdot \beta (d,|b|+\delta _b)$” where “$\delta _b$” here is $|{\varvec{b}}_k-\widetilde{{\varvec{b}}}_k|=\delta _k$, “$\Vert f'\Vert $” is $\Vert \partial _{k}f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert $ and “$\beta (d,|b|+\delta _b)$” is ${\varvec{\beta }}_k$. This establishes (i). Finally (ii) is proved by a similar invocation of Lemma 10(ii). $\square $

We now have a recursive bound $\Vert \delta _n f\Vert $. But we need to convert the bound to only depend on the data $\Vert {\varvec{b}}\Vert , \Vert f\Vert , \delta _k{\varvec{b}}$. In particular, we remove any occurrences of $\partial _{k}f_{\varvec{\alpha }}$ with the help of the next lemma:

Lemma 13

(= Lemma 6 in Text) For $k=1,\ldots ,n$:

(i)
$\Vert f({\varvec{b}}_{(k)})\Vert \le \Vert f\Vert \cdot \prod _{i=1}^k {\varvec{\beta }}_i$
(ii)
For ${\varvec{\alpha }}\in \mathop {Supp}_k(f)$,
$$\begin{aligned} \Big \Vert \partial _{k}f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Big \Vert \le {\varvec{d}}_k\cdot \Vert f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert . \end{aligned}$$
(iii)
$\Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert \le {\varvec{d}}_k \cdot \Vert f\Vert \cdot \prod _{i=1}^{k-1} {\varvec{\beta }}_i$

Proof

(i)
The LHS of the inequality is equal to the maximum of $|f_{\varvec{\alpha }}({\varvec{b}}_{(k)})|$ where ${\varvec{\alpha }}\in \mathop {Supp}_k(f)$. First consider $k=1$. In this case, $f_{\varvec{\alpha }}$ is a univariate polynomial in ${\varvec{z}}_1$ of degree at most ${\varvec{d}}_1$, say $f_{\varvec{\alpha }}({\varvec{z}}_1)=\sum _{i=0}^{{\varvec{d}}_1}c_i {\varvec{z}}_1^i$ where c is a coefficient of f. By Lemma 8,
$$\begin{aligned} |f_{\varvec{\alpha }}({\varvec{b}}_1)| \le \Vert f_{\varvec{\alpha }}\Vert {\varvec{\beta }}_1 \le \Vert f\Vert {\varvec{\beta }}_1, \end{aligned}$$
proving the result for $k=1$. For $k>1$, each $f_{\varvec{\alpha }}$ is a polynomial in ${\varvec{z}}_{(k)}$, and we can write $f_{\varvec{\alpha }}({\varvec{b}}_{(k)})$ as $f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})({\varvec{b}}_k)$. By induction, the polynomial $f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})({\varvec{z}}_k)$ has norm at most $\Vert f\Vert \cdot \prod _{i=1}^{k-1}{\varvec{\beta }}_i$. Moreover, its degree is at most ${\varvec{d}}_k$. So evaluating it at ${\varvec{b}}_k$ gives a value of size at most $\Vert f\Vert \cdot \prod _{i=1}^{k}{\varvec{\beta }}_i$.
(ii)
Write $f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)}) = \sum _{i=0}^{{\varvec{d}}_k} c_i {\varvec{z}}_k^i$ where $c_i\in {{\mathbb {C}}}$ satisfies $|c_i|\le \Vert f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert $. Thus $\partial _{k}f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})$ is a polynomial with norm
$$\begin{aligned} \Big \Vert \partial _{k}f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Big \Vert \le {\varvec{d}}_k \Vert f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert . \end{aligned}$$
(iii)
Letting ${\varvec{\alpha }}$ range over $\mathop {Supp}_k(f)$,
$$\begin{aligned} \begin{array}{llll} \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert &{}=&{} \max _{{\varvec{\alpha }}} \Vert \partial _{k}f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert &{}\\ &{}\le &{} \max _{{\varvec{\alpha }}} {\varvec{d}}_k \cdot \Vert f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert &{} \qquad (\text {from part (ii) second formula})\\ &{}\le &{} {\varvec{d}}_k \cdot \max _{{\varvec{\alpha }}} \Vert f_{\varvec{\alpha }}({\varvec{b}}_{(k-1)})\Vert \\ &{}\le &{} {\varvec{d}}_k \cdot \Vert f({\varvec{b}}_{(k-1)})\Vert \\ &{}\le &{} {\varvec{d}}_k \cdot \Vert f\Vert \cdot \prod _{i=1}^{k-1}{\varvec{\beta }}_i &{} \qquad (\text {from part (i)})\\ \end{array} \end{aligned}$$

$\square $

Putting it all together:

Theorem 4

(=Theorem 2 in Text) For $k=1,\ldots ,n$,

$$\begin{aligned} \delta _k f \le \Big [ \delta _0 f + \Vert f\Vert \cdot \sum _{i=1}^k {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big ] \cdot \Big (\prod _{i=1}^k \widetilde{\varvec{\beta }}_i\Big ). \end{aligned}$$

Proof

When $k=1$, our formula is

$$\begin{aligned} \delta _1 f \le \Big [ \delta _0 f + \Vert f\Vert \cdot {\varvec{d}}_1\cdot \delta _1 {\varvec{b}}\Big ] \widetilde{\varvec{\beta }}_1. \end{aligned}$$

follows from the case $k=1$ of Lemma 12(iii). For $k>1$, we use induction:

$$\begin{aligned} \delta _k f\le & {} \Big [ \delta _k {\varvec{b}}\cdot \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert + \delta _{k-1} f \Big ]\cdot \widetilde{\varvec{\beta }}_k\,\, \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \qquad \qquad \text {(By Lemma~12(iii)}\\\le & {} \Big [ \delta _k {\varvec{b}}\cdot \Vert \partial _{k}f({\varvec{b}}_{(k-1)})\Vert + \Big \{ \delta _0 f + \Vert f\Vert \cdot \sum _{i=1}^{k-1} {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big \} \cdot \Big (\prod _{i=1}^{k-1} \widetilde{\varvec{\beta }}_i\Big ) \Big ] \cdot \widetilde{\varvec{\beta }}_k\,\, \qquad \quad \text {(By induction)}\\\le & {} \Big [ \delta _k {\varvec{b}}\cdot {\varvec{d}}_k \cdot \Vert f\Vert \cdot \Big (\prod _{i=1}^{k-1} {\varvec{\beta }}_i\Big ) + \Big \{ \delta _0 f + \Vert f\Vert \cdot \sum _{i=1}^{k-1} {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big \} \cdot \Big (\prod _{i=1}^{k-1} \widetilde{\varvec{\beta }}_i\Big ) \Big ] \cdot \widetilde{\varvec{\beta }}_k\,\,\,\, \text {(By Lemma~13(iii))}\\\le & {} \Big [ \delta _k {\varvec{b}}\cdot {\varvec{d}}_k \cdot \Vert f\Vert + \Big \{ \delta _0 f + \Vert f\Vert \cdot \sum _{i=1}^{k-1} {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big \} \Big ] \cdot \Big (\prod _{i=1}^{k} \widetilde{\varvec{\beta }}_i\Big )\,\,\,\,\, \qquad \qquad \qquad \qquad \text {(since }{\varvec{\beta }}_i\le \widetilde{\varvec{\beta }}_i)\\= & {} \Big [ \delta _0 f + \Vert f\Vert \cdot \sum _{i=1}^{k} {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big ] \cdot \Big (\prod _{i=1}^{k} \widetilde{\varvec{\beta }}_i\Big ). \end{aligned}$$

$\square $

The next lemma answers the question: given $\delta _L>0$, how can we ensure that

$$\begin{aligned} \delta _{n-1}f {\mathrel {\,{:}{=}\,}}\Vert f({\varvec{b}}_{(n-1)})-{\widetilde{f}}(\widetilde{{\varvec{b}}}_{(n-1)})\Vert \end{aligned}$$

is upper bounded by $\delta _L$?

Lemma 14

(=Lemma 7 in Text)

Given $\delta _L>0$, $f,{\widetilde{f}}\in {{\mathbb {C}}}[{\varvec{z}}]$ and ${\varvec{b}},\widetilde{{\varvec{b}}}\in {\mathbb {C}}^{n-1}$ where $n>1$.

Let $d=\max _i (\mathop {deg}_{{{z}}_i}(f))$ and $M=\Vert {\varvec{b}}\Vert +1$.

If

$$\begin{aligned} \delta _f \le \frac{\delta _L}{2((d+1)M^d)^{n-1}} \quad (*) \end{aligned}$$

and

$$\begin{aligned} \delta _{\varvec{b}}\le \min (1, \frac{\delta _L}{2d\Vert f\Vert (n-1)((d+1)M^d)^{n-1}}), \quad {(**)} \end{aligned}$$

then

$$\begin{aligned} \delta _{n-1} f \le \delta _L. \end{aligned}$$

Proof

Note that $\delta _f {\mathrel {\,{:}{=}\,}}\Vert f-{\widetilde{f}}\Vert $ in (*) and $\delta _{\varvec{b}}{\mathrel {\,{:}{=}\,}}\Vert {\varvec{b}}-\widetilde{{\varvec{b}}}\Vert $ in (**). Since $\delta _{\varvec{b}}\le 1$, we conclude that $\Vert \widetilde{{\varvec{b}}}\Vert \le M$. Using the bounds ${\varvec{\beta }}_k\le \widetilde{\varvec{\beta }}_k\le (d+1)M^d$, the bound of Theorem 4 for the case $k=n-1$ becomes

$$\begin{aligned} \delta _{n-1} f\le & {} \Big (\prod _{i=1}^{n-1} \widetilde{\varvec{\beta }}_i\Big ) \Big [ \delta _f + \Vert f\Vert \cdot \sum _{i=1}^{n-1} \Big ( {\varvec{d}}_i\cdot \delta _i {\varvec{b}}\Big ) \Big ]\\\le & {} \Big ((d+1)M^d \Big )^{n-1} \Big [ \delta _f + d\Vert f\Vert \delta _{\varvec{b}}(n-1) \Big ]. \end{aligned}$$

The inequalities (*) on $\delta _f$, and (**) on $\delta _{\varvec{b}}$, are designed to ensure that $\delta _{n-1} f \le \delta _L$. $\square $

Rights and permissions

Reprints and permissions

About this article

Cite this article

Imbach, R., Pouget, M. & Yap, C. Clustering Complex Zeros of Triangular Systems of Polynomials. Math.Comput.Sci. 15, 271–292 (2021). https://doi.org/10.1007/s11786-020-00482-0

Download citation

Received: 09 April 2019
Revised: 15 September 2019
Accepted: 23 September 2019
Published: 17 June 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11786-020-00482-0

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Clustering Complex Zeros of Triangular Systems of Polynomials

Abstract

Similar content being viewed by others

Accelerated Subdivision for Clustering Roots of Polynomials Given by Evaluation Oracles

New Practical Advances in Polynomial Root Clustering

Computing the roots of sparse high–degree polynomials that arise from the study of random simplicial complexes

1 Introduction

1.1 Our Contributions

1.2 Related Work

1.3 Definitions and Notations

1.4 Oracles for Root Cluster of Univariate Polynomials

Proposition 1

Proposition 2

2 Sum of Multiplicities in Clusters of Solutions

2.1 Two Examples

2.2 Sum of Multiplicities in a Cluster

Proposition 3

Theorem 1

Proof

2.3 Tower Representation

Definition 1

Corollary 4

3 Error Analysis of Approximate Specializations

Lemma 5

Lemma 6

Theorem 2

Lemma 7

4 Clustering for Triangular Systems

5 Implementation and Benchmarks

5.1 Clustering Ability

5.2 Benchmarks with Random Dense Systems

5.3 Systems Obtained by Triangularization

6 Future Work

Notes

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Appendix: Error Analysis

Appendix: Error Analysis

Lemma 8

Theorem 3

Proof

Corollary 9

Lemma 10

Proof

Corollary 11

Lemma 12

Proof

Lemma 13

Proof

Theorem 4

Proof

Lemma 14

Proof

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation