1 Introduction

The accurate and efficient computation of the real roots of polynomials is a fundamental problem in numerical analysis, of great importance in numerous applications. Although this problem has been intensively investigated since the advent of digital computers, a “universal” root–finding algorithm based on floating–point arithmetic, that infallibly computes to a specified precision the roots of polynomials with arbitrary coefficients, remains an elusive ideal. A primary factor in this state of affairs is the high variability observed in the sensitivity of polynomial roots to perturbations in the polynomial coefficients, and the sometimes counter–intuitive trends of this sensitivity.

For polynomials with exactly specified integer coefficients, the vagaries of floating–point arithmetic can be circumvented by invoking arbitrary–precision rational arithmetic to determine isolating intervals for each root, smaller than any prescribed width. However, this approach can become very demanding in computing time and memory requirements, and for many practical problems, the initial polynomial coefficients will not be exactly known.

Floating–point arithmetic, based on rounding or truncation of significant digits to ensure a fixed memory size for numbers, has become widely adopted in scientific computing since the 1960s. In the late 1950s, it was implemented in software rather than in the CPU, and the two–part paper [13] by the British numerical analyst J. H. Wilkinson describes one of its earliest applications in computing polynomial roots. Subsequently [15], he characterized his choice of an “easy” test polynomial (with the 20 equally spaced real roots 1,2,…,20) as “the most traumatic experience in my career as a numerical analyst,” owing to the extreme difficulty of computing any accurate digits for most of the roots. The polynomial with the 20 roots \(2^{-1},2^{-2},\ldots,2^{-20}\) in geometric progression is also considered in [13], and despite the close proximity of successive roots, it was found that they could be computed to high accuracy.

The choice of basis plays an important role in determining the sensitivity of the roots of a polynomial to uncertainties in its coefficients, and it is known that the usual monomial (power) form \(a_0+a_1t+\cdots+a_mt^m\) is not, in general, a favorable choice. For computing the real roots on a finite interval t ∈ [a,b], the Bernstein form is preferred, and in fact it is “optimally stable” among all bases that are non–negative over that interval [4].

In the present paper, we consider the roots of a sequence of sparse high–degree polynomials that arise from the study of random simplicial complexes (see Section 2), wherein each instance pn(t) includes only the n + 1 monomial terms \(t^{2^{k}}\) for k = 0,1,…,n, with binomial coefficients of alternating sign. Because of the probabilistic interpretation of the independent variable t, the main focus is on the real roots t ∈ [0,1], although we also briefly discuss positive roots t > 1, negative roots, and complex conjugate root pairs.

The remainder of this paper is organized as follows. Section 2 introduces the topological problem that motivates the study of the positive roots of the family of sparse high–degree polynomials considered herein. In Section 3, we elucidate some of the distinctive properties of the roots of these polynomials. Section 4 then considers numerical computation of the roots on (0,1) using double–precision floating–point arithmetic for n ≤ 10, based upon the power and Bernstein representations, and Section 5 discusses the accuracy of the computed roots in relation to the root condition numbers. Finally, Section 6 summarizes the principal results of the present study.

2 Euler characteristic of random simplicial complexes

We begin by explaining how the sequence of polynomials investigated herein arises quite naturally in the context of a fundamental question concerning the homotopy type of randomly constructed simplicial complexes.

2.1 Random simplicial complexes

Erdős and Rényi introduced random graphs [1] in 1959. In the years since their introduction, random graphs have proven to be an extremely powerful tool in graph theory, and their study has opened up a huge collection of new research directions. The most useful and deeply studied probability model starts with n vertices and a fixed probability t. Then, for each pair of vertices u and v, we flip a “t–coin” (that comes up heads with probability t) to decide whether or not they should be connected by an edge.

Even though, to a topologist, a graph is just a 1–dimensional simplicial complex, the study of random simplicial complexes has attracted substantial attention only in the past 15 years or so. The generalization from the random models of Erdős and Rényi to probability models for random simplicial complexes is straightforward. Recall that an n–dimensional simplex Δn is the convex hull of a set of n + 1 points {x1,…,xn+ 1} in general position in \(\mathbb {R}^{m}\) (with m ≥ n). The points xi are called the vertices of the simplex. Any subset with k + 1 of these vertices defines a subsimplex of dimension k. Thus, there are \(\binom {n+1}{k+1}\) k–dimensional simplices in Δn, and a total of \(2^{n+1}\) simplices, including the empty simplex. Combinatorially, a simplicial complex K of Δn is simply the union of some (perhaps all) of the simplices of Δn, with the proviso that if a simplex is in K, then so are all its subsimplices.

Several probability models for constructing random simplicial complexes have been studied, which are all specializations of the following extremely general procedure. List all the non–empty simplices of Δn with an ordering \(\sigma _{1},\ldots ,\sigma _{2^{n+1}-1}\) such that their dimensions are weakly increasing, and for each simplex σ, assign a probability pσ ∈ [0,1]. Then, beginning with \(K =\varnothing \) and σ = σ1, check if all the subsimplices of σ are already in K. If they are, then σ is “available for inclusion into K” and we flip a coin that comes up heads with probability pσ to decide whether or not to include it. Once that decision is made, we consider the next simplex in our list; the construction ends once all of the simplices of Δn have been considered.
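As a concrete illustration, this construction can be sketched in a few lines of Python; the function name, the identification of simplices with frozensets of vertices, and the facet-only availability test are our own choices (checking the facets suffices, since every simplex already in K has, inductively, all of its subsimplices in K):

```python
import random
from itertools import combinations

def random_complex(n, t, rng=None):
    """Random subcomplex of the n-simplex: visit the non-empty simplices in
    order of weakly increasing dimension; a simplex becomes available once all
    of its facets are present, and is then kept with probability t (one flip
    of the t-coin per simplex)."""
    rng = rng or random.Random()
    K = set()
    for dim in range(n + 1):
        for sigma in combinations(range(n + 1), dim + 1):
            sigma = frozenset(sigma)
            available = dim == 0 or all(sigma - {v} in K for v in sigma)
            if available and rng.random() < t:
                K.add(sigma)
    return K
```

For example, random_complex(4, 0.5) returns one random subcomplex of Δ4, represented as a set of vertex subsets.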

We focus henceforth on the specialization of this general model in which we choose a fixed value t ∈ [0,1] and set pσ = t for all simplices σ of Δn.

2.2 Euler characteristic

A topologist is naturally interested in the homotopy–theoretical properties of random simplicial complexes. The most fundamental question is whether or not these spaces are contractible (i.e., homotopically trivial). One way to distinguish the homotopy types of spaces is through the Euler characteristic, defined as

$$ \chi(K) := \sum\limits_{k=0}^{\dim(K)} (-1)^{k} \#\{ \text{\textit{k}-simplices in \textit{K}} \} . $$

If K and L are homotopy equivalent, then χ(K) = χ(L), and since χ(K) = 1 if K is a single point, we see that K cannot be contractible when χ(K)≠ 1. Let the random variable χ(n) be the Euler characteristic of the complex K constructed by the above procedure, with expected value E(χ(n)).
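As a small self-contained check of this definition (the helper and the two example complexes below are our own, with each simplex identified by the frozenset of its vertices):

```python
from itertools import combinations

def euler_characteristic(K):
    """chi(K): a simplex on j vertices has dimension k = j - 1 and
    contributes (-1)**k to the alternating sum."""
    return sum((-1) ** (len(sigma) - 1) for sigma in K)

# All non-empty faces of a triangle (a contractible disc): chi = 3 - 3 + 1 = 1.
disc = [frozenset(s) for r in (1, 2, 3) for s in combinations(range(3), r)]
assert euler_characteristic(disc) == 1

# Its boundary (a circle, which is not contractible): chi = 3 - 3 = 0.
circle = [s for s in disc if len(s) <= 2]
assert euler_characteristic(circle) == 0
```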

Proposition 1

\( \displaystyle {E(\chi (n)) = - \frac {1}{t} \sum \limits _{j=1}^{n+1} (-1)^{j} \binom {n+1}{j} t^{2^{j}}} . \)

This follows directly from the discussion above, together with the linearity of expected value: a simplex on j vertices (which has dimension j − 1 and contributes (−1)j− 1 to χ) belongs to K precisely when all of its \(2^{j}-1\) non–empty subsimplices are accepted by their coin flips, an event of probability \(t^{2^{j}-1}\), and there are \(\binom {n+1}{j}\) such simplices. Straightforward manipulation of the equation E(χ(n)) = 1 yields the following result.

Corollary 1

The “expected random subcomplex” of \({\Delta}^{n-1}\) is not contractible if t is not a root of the polynomial

$$ p_n(t) := \sum\limits_{k=0}^n (-1)^k \binom{n}{k} t^{2^k} . $$
(1)
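For completeness, the manipulation behind this statement can be written out (this rewriting is ours): replacing n by n − 1 in Proposition 1 and comparing with (1) gives

$$ t \left[ 1 - E(\chi(n-1)) \right] = t + \sum\limits_{j=1}^{n} (-1)^{j} \binom{n}{j} t^{2^{j}} = \sum\limits_{k=0}^{n} (-1)^{k} \binom{n}{k} t^{2^{k}} = p_{n}(t) , $$

so that, for t ∈ (0,1], the equation E(χ(n − 1)) = 1 is equivalent to pn(t) = 0.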

Motivation for studying the roots of pn(t)

Corollary 1 indicates that the random process generally produces homotopically non–trivial complexes—with possible exceptions when the probability t is one of the finite number of real roots of the polynomial pn(t) in the interval [0,1]. Furthermore, our study of these polynomials suggests that they always change sign at the roots, which means that the algebraic topology of the random complexes may have a phase transition or threshold of some kind at these particular t values. The study of thresholds separating different kinds of behavior is a major strand in the well–established theory of random graphs.

3 Anatomy of the polynomials

We are interested in the roots of the sparse polynomials pn(t) of degree \(2^{n}\) defined for n ≥ 1 by (1). The first few instances of these polynomials are

$$ \begin{array}{@{}rcl@{}} p_1(t) &=& t-t^2 , \\ p_2(t) &=& t-2 t^2+t^4 , \\ p_3(t) &=& t-3 t^2+3 t^4-t^8 , \\ p_4(t) &=& t-4 t^2+6 t^4-4 t^8+t^{16} , \\ p_5(t) &=& t-5 t^2+ 10 t^4-10 t^8+5 t^{16}-t^{32} , \\ p_6(t) &=& t-6 t^2+ 15 t^4-20 t^8+ 15 t^{16}-6 t^{32}+t^{64} , \end{array} $$

and one can easily verify that they satisfy the simple recursion relation

$$ p_{n+1}(t) = p_n(t)-p_n(t^2) . $$
(2)
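The recursion is easy to check symbolically; a minimal sketch in Python with sympy (our own illustrative choice of tool, not the Maple computations reported later) is:

```python
from math import comb
import sympy as sp

t = sp.symbols('t')

def p(n):
    """p_n(t) = sum_{k=0}^{n} (-1)^k binom(n,k) t^(2^k), as in (1)."""
    return sum((-1) ** k * comb(n, k) * t ** (2 ** k) for k in range(n + 1))

# Check the recursion p_{n+1}(t) = p_n(t) - p_n(t^2) for small n.
for n in range(1, 7):
    assert sp.expand(p(n + 1) - p(n) + p(n).subs(t, t ** 2)) == 0
```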

For all n ≥ 1, t = 0 is obviously a root of pn(t), and t = 1 is also a root, since

$$ p_{n}(1) = \sum\limits_{k=0}^{n} (-1)^{k} \binom{n}{k} = 0 . $$

These are simple roots of pn(t), since \(p_{n}^{\prime }(0)=1\) and \(p_{n}^{\prime }(1)=(-1)^{n}\). Thus, the factor (1 − t)t can be extracted from pn(t) to obtain a polynomial \(\tilde {p}_{n}(t)\) of the modestly lower degree \(2^{n}-2\), although \(\tilde {p}_{n} (t)\) is no longer sparse. From an abstract algebraic point of view, we have verified using Maple that \(\tilde {p}_{n}(t)\) is irreducible over the rational numbers for 2 ≤ n ≤ 10.

Clearly, t = 0 and t = 1 are the only roots of p1(t), while p2(t) has the additional real roots \(t=\textstyle \frac {1}{2}\displaystyle (-1\pm \sqrt {5})\), one negative and one positive. Using Maple, we have verified that the Galois group of the splitting field E of \(\tilde {p}_{3}(t)\) is isomorphic to the full symmetric group Sym(6), and since this group is not solvable, the roots of p3(t) (apart from 0 and 1) cannot be expressed in terms of radicals. It seems likely that this is also true for all n ≥ 3, although the degree of p4(t) is already too high for Maple to cope with.

Since the polynomial (1) has n sign changes in its coefficients, Descartes’ law of signs [12] indicates that the number of its positive real roots is either n or less than n by an even amount. We show below that pn(t) has exactly n positive roots (n − 1 roots on (0,1) together with the root t = 1) and identify an “interlacing” property of the roots for successive n values, i.e., the roots of pn+ 1(t) on (0,1) lie within intervals delineated by the roots of pn(t).

Lemma 1

For n ≥ 1 the polynomials (1) have at most n + 1 roots on [0,1] and their derivatives have at most n roots on (0,1).

Proof

Since pn(t) exhibits n sign changes in its coefficients, Descartes’ law of signs indicates that it has at most n positive roots, including t = 1. Since t = 0 is also a root of pn(t), it has at most n + 1 roots on [0,1]. The derivative of pn(t), namely

$$ p_{n}^{\prime}(t) = \sum\limits_{k=0}^{n} (-1)^{k}\binom{n}{k} 2^{k} t^{2^{k}-1} , $$

also exhibits n coefficient sign changes, so \(p_{n}^{\prime }(t)\) has at most n roots on (0,1). Note that 0 and 1 are not roots, since \(p_{n}^{\prime }(0)=1\) and \(p_{n}^{\prime }(1)=(-1)^{n}\). □

Proposition 2

Consider a polynomial f(t) satisfying:

  • f(t) has only simple roots on [0,1] including 0 and 1.

  • r < s² for consecutive root pairs r,s of f(t) on [0,1].

  • f′(t) has exactly one root between r and s.

Then, the polynomial g(t) = f(t) − f(t²) inherits the following properties:

  • g(t) has one simple root w between the roots r,s of f(t).

  • The root w of g(t) actually lies between \(\sqrt {r}\) and s.

  • Any two consecutive roots v,w of g(t) satisfy v < w².

Proof

Let q,r,s be consecutive roots of f(t) on t ∈ [0,1]. Then, \(r<\sqrt {r}<s\) and q < r² ≤ t² for r ≤ t ≤ s. Now the values of f(t) on [r,s] are replicated by f(t²) on \([ \sqrt {r},\sqrt {s} ]\). In particular, f(t) and f(t²) must have opposite signs on \([ r,\sqrt {r} ]\) and \([ s,\sqrt {s} ]\), so g(t) cannot vanish on these intervals (see Fig. 1). For simplicity, we assume that f(t) < 0 for r < t < s (an analogous argument holds when f(t) > 0). Then, f(t) < 0 = f(t²) at \(t=\sqrt {r}\), and 0 = f(t) > f(t²) at t = s. Thus, since g(t) < 0 at \(t= \sqrt {r}\) and g(t) > 0 at t = s, the interval \((\sqrt {r},s)\) must contain a root w of g(t). Moreover, since

$$ g^{\prime}(t) = f^{\prime}(t)-2 t f^{\prime}(t^{2}) , $$

and f′(t) > 0 and f′(t²) < 0 at t = w when f(t) < 0 for r < t < s, we have g′(w) > 0, so w is a simple root of g(t).

Fig. 1
figure 1

The graphs (red) of p4(t) and (blue) of p4(t²), showing consecutive roots r,s and \(\sqrt {r},\sqrt {s}\). Observe that p4(t), p4(t²) have opposite signs on the intervals \([ r,\sqrt {r} ]\) and \([ s,\sqrt {s} ]\) but the same sign on \([ \sqrt {r},s ]\), which contains a root of p5(t), indicated by the dashed line, where the graphs of p4(t), p4(t²) cross. Note also that p4(t), p4(t²) have derivatives of opposite sign on \([ \sqrt {r},s ]\)

Finally, consecutive roots v,w of g(t) lying between successive root pairs q,r and r,s of f(t) must satisfy

$$ q < \sqrt{q} < v < r < \sqrt{r} < w < s , $$

and consequently it is clear that v < w². □

Theorem 1

For n ≥ 1, the root structure of the polynomials (1) on [0,1] may be characterized as follows:

  • pn(t) has exactly n + 1 simple roots on [0,1] including t = 0 and t = 1.

  • The roots exhibit a “super–quadratic distribution,” i.e., r < s² for any pair of consecutive roots r,s of pn(t).

Proof

We argue by induction. First, the stated properties are easily verified by direct calculation for small n values. Now suppose that they hold for a given n ≥ 1. Under this supposition, Rolle’s theorem and Lemma 1 together indicate that pn(t) has exactly n extrema on (0,1), which isolate its n + 1 roots on [0,1]. Consequently, pn(t) satisfies the conditions stipulated for the polynomial f(t) in Proposition 2, and pn+ 1(t), as defined by the recursion relation (2), has the properties of the polynomial g(t) in Proposition 2. This implies that pn+ 1(t) has n + 2 simple roots on [0,1] (including t = 0 and t = 1) with a super–quadratic distribution, i.e., v < w² for any pair of consecutive roots v,w of pn+ 1(t). This completes the induction proof. □

As a direct consequence of Theorem 1, we make the following observations.

Remark 1

By Descartes’ law of signs, the polynomials pn(t) have no real roots for t > 1.

Corollary 2

The n roots \(t_{n+1,1},\ldots ,t_{n+1,n}\) of pn+ 1(t) on (0,1) interlace the n − 1 roots \(t_{n,1},\ldots ,t_{n,n-1}\) of pn(t) on (0,1), i.e., with \(t_{n,0}=0\) and \(t_{n,n}=1\) they satisfy

$$ t_{n,k-1} < t_{n+1,k} < t_{n,k} \quad \text{for} \ k=1,\ldots,n . $$
(3)

The ordering (3) is just a weaker form of the relation \(\sqrt {t_{n,k-1}} < t_{n+1,k} < t_{n,k}\) for k = 1,…,n established in Theorem 1.

The remarkable behavior of the graph of pn(t), illustrated in Fig. 2 for t ∈ [0,1] in the case n = 10, makes it an interesting test case for polynomial root solvers. Using a linear scale in t, only the first 6 of its 9 roots on (0,1) are clearly discernible. As n increases, the graph of pn(t) on [0,1] oscillates with increasing frequency but decreasing amplitude between negative and positive values as t → 1, and the roots become tightly clustered. For t > 1, pn(t) rapidly diverges to +∞ or −∞ according to whether n is even or odd.

Fig. 2
figure 2

Graph of the polynomial p10(t) over the interval t ∈ (0,1)

4 Computing the roots on (0,1)

At the cost of sacrificing certainty, numerical computations permit greater scope and efficiency than symbolic computations in computing the real roots of (1) on t ∈ (0,1). In particular, the Bernstein form

$$ f(t) = \sum\limits_{i=0}^{m} c_{i} \binom{m}{i}(1-t)^{m-i}t^{i} $$

of a degree–m polynomial f(t) on t ∈ [0,1] offers significant computational advantages, including:

  • Interpolation of end–point values: f(0) = c0 and f(1) = cm.

  • Convex hull property: \(\min _{i} c_{i}\le f(t)\le \max _{i} c_{i}\) for t ∈ [0,1].

  • Variation–diminishing property: # of roots on (0,1) = # sign variations of (c0,…,cm) − 2k for some integer k ≥ 0.

  • Subdivision algorithm: for any point τ ∈ (0,1), the Bernstein coefficients on the subintervals [0,τ] and [τ,1] can be computed through recursive convex combinations of the coefficients c0,…,cm.

  • Numerical stability: f(t) on t ∈ [0,1] is always less sensitive to uniform perturbations in its Bernstein coefficients than in its power coefficients.

A scheme to isolate the real roots of a polynomial on t ∈ (0,1) may be based on the variation–diminishing property and subdivision algorithm. See [3] for a comprehensive review of the properties of the Bernstein form—further details on the numerical stability properties may be found in [4, 5].
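A minimal sketch of such a scheme is given below (our own simplified version; it bisects at the interval midpoint and does not handle the degenerate case of a root falling exactly on a subdivision point):

```python
def de_casteljau_split(c, tau=0.5):
    """Split Bernstein coefficients on [0,1] into those of the subintervals
    [0,tau] and [tau,1] by repeated convex combinations."""
    left, right, level = [], [], list(c)
    while level:
        left.append(level[0])
        right.append(level[-1])
        level = [(1 - tau) * a + tau * b for a, b in zip(level, level[1:])]
    return left, right[::-1]

def sign_variations(c):
    nonzero = [x for x in c if x != 0]
    return sum(1 for a, b in zip(nonzero, nonzero[1:]) if a * b < 0)

def isolate_roots(c, a=0.0, b=1.0, depth=40):
    """Return subintervals of [a,b] whose Bernstein coefficients show exactly
    one sign variation, hence exactly one root (variation-diminishing
    property); intervals with more variations are bisected recursively."""
    v = sign_variations(c)
    if v == 0:
        return []
    if v == 1 or depth == 0:
        return [(a, b)]
    cl, cr = de_casteljau_split(c)
    mid = 0.5 * (a + b)
    return isolate_roots(cl, a, mid, depth - 1) + isolate_roots(cr, mid, b, depth - 1)
```

For example, isolate_roots([0, 0.25, 1/6, -0.25, 0]) returns [(0.0, 1.0)], since these degree–4 coefficients exhibit a single sign variation and hence bracket exactly one root.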

4.1 Bernstein form of pn(t)

The change of variables t ∈ (0,∞) → u ∈ (0,1) specified by

$$ t = \frac{u}{1-u} $$
(4)

defines a bijective relation between t and u, since dt/du = \((1-u)^{-2}\) > 0 for u ∈ (0,1). Note also that t ∈ (0,1) and t ∈ (1,∞) correspond to \(u\in (0,\textstyle \frac {1}{2}\displaystyle )\) and \(u\in (\textstyle \frac {1}{2}\displaystyle ,1)\). Substituting (4) into (1) yields

$$ p_{n}(u) = \frac{1}{(1-u)^{2^{n}}} \sum\limits_{k=0}^{n} (-1)^{k} \binom{n}{k} (1-u)^{2^{n}-2^{k}}u^{2^{k}} . $$

In computing the positive roots of (1), we omit the factor \(1/(1-u)^{2^{n}}\) since it is non–zero and finite on (0,1). The remaining term \(f_{n} (u)=(1-u)^{2^{n}}p_{n}(u)\) can be expressed in the Bernstein basis of degree 2n on u ∈ [0,1] as

$$ f_n(u) = \sum\limits_{i=0}^{2^n} c_i \binom{2^n}{i} (1-u)^{2^n-i}u^i , $$
(5)

where \(c_{i}=0\) if \(i\neq 2^{k}\) for k = 0,…,n and

$$ c_i = (-1)^k \frac{\displaystyle\binom{n}{k}} {\displaystyle\binom{2^n}{2^k}} \quad \text{if} \ i=2^k \ \text{for} \ k=0,\ldots,n . $$
(6)

The form (5) retains the sparsity of (1), with only n + 1 non–zero coefficients.
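For reference, the coefficients (6) are trivial to generate in exact rational arithmetic; a minimal Python sketch (using the standard fractions module, our own choice) is:

```python
from fractions import Fraction
from math import comb

def sparse_bernstein_coeffs(n):
    """Coefficients c_0,...,c_{2^n} of f_n(u) in (5): c_i is given by (6)
    when i = 2^k for some k, and vanishes otherwise."""
    c = [Fraction(0)] * (2 ** n + 1)
    for k in range(n + 1):
        c[2 ** k] = Fraction((-1) ** k * comb(n, k), comb(2 ** n, 2 ** k))
    return c
```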

Although the Bernstein coefficients of the restriction of (5) to \(u\in \left (0,\textstyle \frac {1}{2}\displaystyle \right )\) can be computed by the de Casteljau subdivision algorithm [2], it is preferable to derive them from first principles. Setting

$$ p_n(t) = \sum\limits_{l=0}^{2^n} c_l \binom{2^n}{l}(1-t)^{2^n-l} t^l , $$
(7)

the Bernstein coefficients \(c_{0},\ldots ,c_{2^{n}}\) of pn(t) on t ∈ [0,1] may be determined as follows. For each k, the kth term of (1) is multiplied by \(1=[ (1-t)+t ]^{2^{n}-2^{k}}\) to give

$$ p_{n}(t) = \sum\limits_{k=0}^{n} (-1)^{k}\binom{n}{k} t^{2^{k}} [ (1-t)+t ]^{2^{n}-2^{k}} , $$

and through binomial expansions of the factors \([ (1-t)+t ]^{2^{n}-2^{k}}\), we obtain

$$ \begin{array}{@{}rcl@{}} p_n(t) \!\! &=& \!\! \sum\limits_{k=0}^n (-1)^k\binom{n}{k} t^{2^k} \sum\limits_{j=0}^{2^n-2^k} \binom{2^n-2^k}{j}(1-t)^{2^n-2^k-j} t^j \\ \!\! &=& \!\! \sum\limits_{k=0}^n \sum\limits_{j=0}^{2^n-2^k} (-1)^k \binom{n}{k}\binom{2^n-2^k}{j}(1-t)^{2^n-2^k-j} t^{2^k+j} . \end{array} $$

By setting \(l=2^{k}+j\), re–arranging the order of summation, and performing some manipulations on the binomial coefficients, the Bernstein form of pn(t) on t ∈ [0,1] is determined to have coefficients defined by c0 = 0 and

$$ c_l = \frac{1}{{\displaystyle\binom{2^n}{l}}} \sum\limits_{k=0}^{k_{\max}} (-1)^{k} \binom{n}{k}\binom{2^n-2^k}{l-2^k} , \quad l=1,\ldots,2^n , $$
(8)

\(k_{\max }=\lfloor \log _{2} l\rfloor \) being the largest integer k with \(2^{k}\le l\). Note that \(c_{2^{n}}=0\), since t = 1 is a root of pn(t). Because of the restricted range of summation, and the appearance of the summation index as an exponent in the binomial coefficients, there is no obvious further reduction of the sum (8).
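A direct transcription of (8) into exact rational arithmetic (a sketch relying on Python's unbounded integers, rather than the 64–bit integer strategy described next) reads:

```python
from fractions import Fraction
from math import comb

def bernstein_coeffs(n):
    """Coefficients c_0,...,c_{2^n} of p_n(t) in the degree-2^n Bernstein
    basis on [0,1], evaluated exactly from equation (8)."""
    m = 2 ** n
    c = [Fraction(0)] * (m + 1)
    for l in range(1, m + 1):
        s = sum((-1) ** k * comb(n, k) * comb(m - 2 ** k, l - 2 ** k)
                for k in range(n + 1) if 2 ** k <= l)
        c[l] = Fraction(s, comb(m, l))
    return c

# Spot check against the n = 2 values listed below: 0, 1/4, 1/6, -1/4, 0.
assert bernstein_coeffs(2) == [0, Fraction(1, 4), Fraction(1, 6), Fraction(-1, 4), 0]
```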

The binomial coefficients appearing in (8) can be very large, because of the exponential dependence on n. Goetgheluck [8, 9] describes an algorithm to represent any binomial coefficient through its factorization

$$ \binom{m}{l} = \prod\limits_{i=1}^{r} p_i^{ e_i} , $$
(9)

where {p1,p2,p3,…} = {2,3,5,…} is the ordered set of prime numbers, and e1,e2,e3,… are their corresponding exponents. Since primes larger than m cannot appear in (9), the index r is such that \(p_{r}\le m\) and \(p_{r+1}>m\). The form (9) allows exact representation of high–order binomial coefficients, and also their products and ratios, since multiplication and division are performed by adding or subtracting their prime exponents. Having determined the sum in (8) exactly as an integer, the coefficient cl is obtained by a single floating–point division. Nevertheless, using extended–precision integer arithmetic (the long long data type in the C programming language), this approach is only feasible for n ≤ 6, since larger values incur overflow in evaluating the sums in (8). Thus, for n ≤ 6, the prime factorization (9) is used in evaluating the binomial coefficients in (8), and for n > 6, they are evaluated using double–precision floating–point arithmetic.
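One simple way to obtain such a factorization, sketched below, uses Legendre's formula for the exponent of a prime in a factorial; this is our own rendering, not necessarily the algorithm of [8, 9]:

```python
from math import comb, prod

def primes_up_to(m):
    """Sieve of Eratosthenes."""
    sieve = [False, False] + [True] * (m - 1)
    for i in range(2, int(m ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def legendre(j, p):
    """Exponent of the prime p in j! (Legendre's formula)."""
    e, q = 0, p
    while q <= j:
        e += j // q
        q *= p
    return e

def binomial_prime_exponents(m, l):
    """Exponents e_i (possibly zero) with binom(m,l) = prod p_i ** e_i."""
    return {p: legendre(m, p) - legendre(l, p) - legendre(m - l, p)
            for p in primes_up_to(m)}

exps = binomial_prime_exponents(20, 8)
assert prod(p ** e for p, e in exps.items()) == comb(20, 8)
```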

The Bernstein representation (7) is “dense” in the sense that, apart from \(c_{0}=c_{2^{n}}=0\) since pn(0) = pn(1) = 0, the coefficients cl for \(1\le l\le 2^{n}-1\) are, in general, non–zero. For n ≤ 3, the non–zero coefficients of pn(t) are

$$ n=1: c_{1} = \frac{1}{2} , \qquad n=2: c_{1},c_{2},c_{3} = \frac{1}{4},\frac{1}{6},-\frac{1}{4} , $$
$$ n=3: c_{1},c_{2},c_{3},c_{4},c_{5},c_{6},c_{7} = \frac{1}{8},\frac{2}{14}, \frac{3}{56},-\frac{1}{10},-\frac{13}{56},-\frac{3}{14},\frac{1}{8} . $$

For n > 3, it is impractical to list the coefficients, and they are best displayed graphically. The Bernstein form (7) of pn(t) has control points specified by

$$ (l/2^{n},c_{l}) \quad \text{for} \ l=0,\ldots,2^{n} $$

and the control polygon is the piecewise–linear graph obtained by connecting them in order. The control polygon mimics the graph of pn(t), as illustrated in Fig. 3 for n = 5 and 6, and it can be refined by degree elevation or subdivision [2] so as to converge monotonically to pn(t).

Fig. 3
figure 3

Graphs of the polynomials p5(t) and p6(t) over the interval [0,1], on the left and right, together with their Bernstein–form control polygons

The roots of pn(t) on (0,1) can be isolated through a hierarchical binary subdivision, such that each subinterval exhibits exactly 0 or 1 sign change in its associated Bernstein coefficients. Moreover, a simple check for guaranteed convergence of Newton–Raphson iterations [11] to the unique real root on each isolating interval can be expressed in terms of these Bernstein coefficients and their first and second forward differences.

Using initial estimates obtained from the graphs of pn(t) for n ≤ 6, the roots on (0,1) were computed by Newton–Raphson iterations using both the power form (1) and Bernstein form (7). To evaluate pn(t) and its derivative, Horner’s method and the de Casteljau algorithm were used in the former and latter cases. To ensure the greatest accuracy, the Bernstein coefficients (8) were computed using the binomial coefficient factorizations (9).
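The two evaluation schemes and the Newton update can be sketched as follows (our own minimal versions; the tolerance and iteration cap are illustrative, and the convergence check mentioned above is not included):

```python
def horner(a, t):
    """Value and derivative of sum_i a[i] * t**i by Horner's method."""
    p, dp = 0.0, 0.0
    for coeff in reversed(a):
        dp = dp * t + p
        p = p * t + coeff
    return p, dp

def de_casteljau(c, t):
    """Value and derivative of a Bernstein-form polynomial on [0,1] with
    coefficients c, by repeated convex combinations."""
    m = len(c) - 1
    b = list(c)
    while len(b) > 2:
        b = [(1 - t) * x + t * y for x, y in zip(b, b[1:])]
    return (1 - t) * b[0] + t * b[-1], m * (b[-1] - b[0])

def newton(value_and_derivative, t0, tol=1e-15, itmax=100):
    """Newton-Raphson iteration t <- t - f(t)/f'(t)."""
    t = t0
    for _ in range(itmax):
        f, df = value_and_derivative(t)
        if df == 0.0:
            break
        step = f / df
        t -= step
        if abs(step) < tol:
            break
    return t
```

For instance, with the power coefficients [0, 1, -2, 0, 1] of p2(t), calling newton(lambda t: horner([0, 1, -2, 0, 1], t), 0.6) converges to the root \(\textstyle \frac {1}{2}\displaystyle (-1+\sqrt {5})\approx 0.618034\) identified in Section 3.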

Table 1 Computed roots of pn(t) on (0,1) for 2 ≤ n ≤ 6

The roots on t ∈ (0,1) for n ≤ 6 computed using the power and Bernstein forms were found to differ by less than \(10^{-15}\). They are enumerated in Table 1 to 10 decimal places. For n > 6, double–precision floating–point arithmetic was used to compute the binomial coefficients in (8), to avoid integer overflow. However, the roots computed using the power and Bernstein forms still differ by less than \(10^{-15}\) in most cases, and by \(10^{-13}\) for the roots clustered near t = 1 when n = 10. These roots are listed in Table 2 to 10 decimal places.

Table 2 Computed roots of pn(t) on (0,1) for 7 ≤ n ≤ 10
Fig. 4
figure 4

Distribution of real roots of the polynomials pn(t) over the interval t ∈ (0,1) for the cases 2 ≤ n ≤ 10 (the roots at t = 0 and t = 1 are omitted)

Fig. 5
figure 5

Distribution of real roots of the polynomials pn(t) over the interval t ∈ (0,1) for the cases 2 ≤ n ≤ 10, employing logarithmic scales for t and n

4.2 Distribution of the positive real roots

The computed roots of pn(t) on (0,1) for 2 ≤ n ≤ 10, as listed in Tables 1 and 2, are shown in Fig. 4 with linear scales for t and n, and in Fig. 5 with logarithmic scales. For n > 10, the computations are very cumbersome, and the clustering of roots near t = 1 makes them difficult to resolve. Figure 5 suggests a power–law dependence of the roots on n, but this is not exact. A least–squares fit to the smallest root of pn(t) for 2 ≤ n ≤ 10 gives

$$ t \approx a n^b , \quad a \approx 1.146625 , b \approx -1.036283 . $$
(10)
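One natural reading of this least–squares fit is an ordinary linear fit of log t against log n; a sketch (our own interpretation; the smallest roots must be supplied from Tables 1 and 2) is:

```python
import numpy as np

def power_law_fit(n_values, smallest_roots):
    """Fit t ~ a * n**b by linear least squares in log-log coordinates."""
    b, log_a = np.polyfit(np.log(n_values), np.log(smallest_roots), 1)
    return np.exp(log_a), b

# a, b = power_law_fit(np.arange(2, 11), smallest_roots)  # data from Tables 1-2
```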

As seen in Table 3, there is a noticeable discrepancy at the smaller n values, although the accuracy improves significantly if the fit is restricted to n ≥ 4.

Table 3 Comparison of smallest roots of the polynomials pn(t) for 2 ≤ n ≤ 10 with the values obtained from a power law least–squares fit of the form (10)

Since the tightly clustered roots near t = 1 are not easily distinguishable for the larger n values in Figs. 4 and 5, an alternative plot is presented in Fig. 6, showing their distances from t = 1 on a logarithmic scale in t.

Figure 7 illustrates the super–quadratic distribution of the roots of pn(t) identified in Theorem 1 for the case n = 10. The interlacing property of the roots of pn(t) and pn+ 1(t) is evident in Tables 1 and 2, and through a close inspection of Fig. 4.

4.3 The complete set of roots

Although we are mainly interested in the positive real roots of pn(t) on (0,1), we can easily identify some basic facts about its entire set of roots. First, as a consequence of Theorem 1 and Descartes’ law of signs, we observe that pn(t) has no real roots t > 1. Concerning the negative roots, we note that, whereas pn(t) has n coefficient sign variations, pn(−t) has only n − 1. Thus, Descartes’ law of signs implies that pn(t) has at least one negative real root for even n. Moreover, pn(−t) = pn(t) − 2t, and we have pn(− 1) = − 2 for any n, since pn(1) = 0. Hence, noting that pn(t) → +∞ as t → −∞ for even n, there must be a real root on (−∞,− 1) if n is even. The negative real roots of pn(t) can be identified as the negatives of the values t > 0 where the graphs of pn(t) and 2t intersect. For t ∈ (0,1), this first occurs when n = 10, as can be seen by noting in Fig. 2 that the graph of 2t crosses p10(t) twice.

Fig. 6
figure 6

Distance of the real roots of the polynomials pn(t) on (0,1) from t = 1 for the cases 2 ≤ n ≤ 10, plotted using a logarithmic scale for 1 − t

Fig. 7
figure 7

The “super–quadratic” distribution property \(t_{n,k}^{2}>t_{n,k-1}\) for k = 2,…,n − 1 of the roots tn,1,…,tn,n− 1 of the polynomial pn(t) for n = 10

Fig. 8
figure 8

The entire set of (real and complex) roots of pn(t) for n = 4,6,8,10

Finally, we briefly comment on the complete set of 2n (real and complex) roots of pn(t) which (as computed in Maple) are illustrated in Fig. 8 for the cases n = 4,6,8,10. It is seen that (most of) the complex roots cluster near a circle centered at the origin, whose radius approaches 1 as n increases.

5 Numerical stability of roots

It is known that, for a polynomial specified in the power and Bernstein forms

$$ f(t) = \sum\limits_{i=0}^{m} a_{i}t^{i} = \sum\limits_{i=0}^{m} c_{i} \binom{m}{i}(1-t)^{m-i}t^{i} , $$

on t ∈ [0,1], the values (and roots) of f(t) are systematically less sensitive to uncertainties of a uniform maximum relative magnitude 𝜖 in the Bernstein coefficients than in the power coefficients [5]. In fact, it has been shown [4] that among all non–negative polynomial bases on [0,1], the Bernstein basis is “optimally stable” with respect to such coefficient perturbations.

For the power form (1), the n + 1 non–zero coefficients have magnitudes ≥ 1 that (except for the lowest– and highest–order terms) increase with n. On the other hand, the Bernstein form (7) has \(2^{n}-1\) non–zero coefficients, of magnitude ≤ 1, that decrease with n. For both forms, the non–zero coefficients have alternating signs. A somewhat surprising result is that both representations admit computation of the roots of pn(t) with similar accuracy: it is only for the tightly clustered roots near t = 1 in the case n = 10 that the power form begins to exhibit slower and more erratic convergence of the Newton–Raphson iterations than the Bernstein form (see Table 4).

Table 4 Newton–Raphson iterations for the largest real root of p10(t) from the initial value t = 0.999966, using the Bernstein and power representations. Note that the computed values differ only in the 14th and 15th decimal places

This phenomenon may be understood as follows. For the power form (1), the n + 1 non–zero coefficients are known exactly a priori as integers. For the Bernstein form (7), on the other hand, the \(2^{n}-1\) non–zero coefficients must be computed from the expression (8). This can be done almost exactly for n ≤ 6 using “long” (64–bit) integers, but floating–point arithmetic must be employed for n > 6, incurring “initial” coefficient errors.

However, initial coefficient errors are not the only factors influencing the accuracy of the computed roots. In backward error analysis [14, 16], the result of each floating–point arithmetic operation is interpreted as an exact outcome for perturbed operands. By “backward” propagation of these perturbations through the algorithm, from the final result to the input values, the outcome of the computation can be regarded as exact for certain perturbed inputs. Knowing these input perturbations and the condition number for the problem (i.e., the sensitivity of the output values to changes in the input values), the accuracy of the final result can (in principle) be assessed.

The condition number of a simple real root t = τ of a polynomial pm(t) with coefficients a0,…,am in the basis {ϕ0(t),…,ϕm(t)} is conventionally defined [6, 7] as

$$ C(\tau) := \frac{\displaystyle{\sum\limits_{i=0}^{m} | a_{i}\phi_{i}(\tau) |}}{|p_{m}^{\prime}(\tau)|} . $$

In the limit 𝜖 → 0, the perturbation δτ in the root τ incurred by errors in the coefficients of uniform maximum relative magnitude 𝜖 satisfies

$$ | \delta\tau | \le C(\tau) \epsilon . $$
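Evaluating C(τ) is straightforward once the coefficients and the computed root are available; a sketch for both bases follows (our own; the Bernstein numerator and derivative are evaluated by de Casteljau convex combinations so that the large binomial coefficients never appear explicitly):

```python
def condition_number_power(a, tau):
    """C(tau) = sum_i |a_i| tau**i / |p'(tau)| for the power basis, 0 < tau < 1."""
    numerator = sum(abs(ai) * tau ** i for i, ai in enumerate(a))
    derivative = sum(i * ai * tau ** (i - 1) for i, ai in enumerate(a) if i > 0)
    return numerator / abs(derivative)

def bernstein_value(c, tau):
    """Evaluate a Bernstein-form polynomial on [0,1] by de Casteljau recursion."""
    b = list(c)
    while len(b) > 1:
        b = [(1 - tau) * x + tau * y for x, y in zip(b, b[1:])]
    return b[0]

def condition_number_bernstein(c, tau):
    """Same quantity for the Bernstein basis: the numerator is the polynomial
    with coefficients |c_i| evaluated at tau, and the derivative is
    m * sum_i (c_{i+1} - c_i) B_i^{m-1}(tau)."""
    m = len(c) - 1
    numerator = bernstein_value([abs(ci) for ci in c], tau)
    derivative = m * bernstein_value([c[i + 1] - c[i] for i in range(m)], tau)
    return numerator / abs(derivative)
```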

Table 5 lists the condition numbers CB and CP in the Bernstein and power bases for the 9 roots of the instance n = 10 of the polynomial (1). It is seen that the inequality CB < CP holds for all the roots, and the growth of CP as the roots approach 1 is noteworthy (whereas CB actually decreases). This explains the slower convergence to the largest root seen in Table 4.

Table 5 Root condition numbers for the Bernstein and power forms of p10(t)

The condition number values listed in Table 5 are actually very modest. This fact, coupled with the remarkable degree of agreement between the roots computed using the power and Bernstein forms, imparts confidence in the roots quoted herein. In proceeding beyond n = 10, it is likely that the Bernstein form will furnish better results for the roots tightly clustered near t = 1, although at a significantly higher computational cost.

6 Closure

In this study, several interesting theoretical properties of the polynomials pn(t) defined by (1) have been elucidated—including (i) the existence of a simple recursion relating pn+ 1(t) to pn(t); (ii) the fact that these polynomials have, for each n > 1, precisely n − 1 roots on t ∈ (0,1) and none for t > 1; (iii) the super–quadratic distribution of these roots, i.e., each root is smaller than the square of the next–largest root; and (iv) the interlacing of the roots of pn(t) and pn+ 1(t), wherein the roots of the latter on (0,1) lie within the intervals delineated by the roots of the former.

The unusual nature of the polynomials pn(t) poses formidable challenges for root computation algorithms on account of their high degrees and strong scale variations of their graphs. Nevertheless, it was observed that for n ≤ 10, their roots are remarkably well–conditioned, and admit accurate computation in ordinary double–precision arithmetic using both the power and Bernstein forms. Although the Bernstein form is more stable than the power form, it lacks the sparsity of the power form, and its advantages are only apparent with the tightly clustered roots of p10(t) near t = 1. Due to the exponential growth in the degree, root computations for pn(t) with n > 10 have not been attempted: these tax the capabilities of both symbolic and numeric methods, and special procedures may be needed to reliably address them.