Abstract
We compare two variants of a multivariate Horner scheme for polynomials of a certain total degree with respect to computational effort and backward error.
1 Introduction
The evaluation of polynomials at a given point is a classical issue, and the univariate method by means of nested multiplications or synthetic division, known as the Horner scheme, is well investigated and covered, for example, in [2, p. 94ff]. A multivariate variant of the Horner scheme, tailored especially to evaluation in Bernstein–Bézier form and its comparison with the de Casteljau algorithm, was given in [6], however without a detailed analysis of the roundoff error. The numerical behavior of the Horner scheme was first considered in detail in [3, 4], where it was shown, based on an observation from [5], that the recursive structure of the multivariate Horner scheme leads to a backward error proportional to the total degree of the polynomial. In high dimensions this is orders of magnitude smaller than the number of operations performed in the evaluation. Slightly more recently, de Boor gave another approach to the multivariate Horner scheme in [1] as a byproduct of a method to properly and efficiently index the coefficients of a multivariate polynomial in the monomial basis.
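For reference, the univariate scheme evaluates \(p(x) = a_n x^n + \cdots + a_0\) with \(n\) multiplications and \(n\) additions by nesting. A minimal Python sketch (the highest-first coefficient order is a convention of this snippet, not prescribed by [2]):

```python
def horner(coeffs, x):
    """Evaluate a polynomial by nested multiplication (Horner scheme).

    coeffs lists the coefficients from the highest power down to the
    constant term, i.e. coeffs = [a_n, ..., a_1, a_0].
    """
    result = 0.0
    for a in coeffs:
        result = result * x + a  # one multiplication, one addition per step
    return result

# p(x) = 2x^3 - 6x^2 + 2x - 1 at x = 3: 2*27 - 6*9 + 2*3 - 1 = 5
print(horner([2, -6, 2, -1], 3.0))  # -> 5.0
```

The multivariate variants discussed below generalize exactly this nesting, differing only in how the linear factors are split off and in which order the coefficients are accessed.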
In this paper, we compare the two variants in terms of computational effort and numerical stability, showing that de Boor’s method outperforms the one from [3] in both respects. Nevertheless, like the differences in the algorithms themselves, the advantages provided by de Boor’s method are subtle and not a matter of orders of magnitude. On the other hand, it is quite surprising and therefore interesting that such subtle differences can lead to “measurable” effects.
2 Notation and basic facts
A polynomial in \(d\) variables is a finite linear combination of monomials,
\[ f(x) = \sum _{\alpha \in {\mathbb {N}}_0^d} f_\alpha \, x^\alpha , \qquad x^\alpha := x_1^{\alpha _1} \cdots x_d^{\alpha _d}, \qquad f_\alpha \in {\mathbb {R}}, \quad (2.1) \]
where only finitely many of the coefficients \(f_\alpha \) are nonzero.
The length or absolute value of a multiindex \(\alpha \in {\mathbb {N}}_0^d\) is as usual defined as \(|\alpha | = \alpha _1 + \cdots + \alpha _d\) and the (total) degree of a polynomial \(f \ne 0\) as
\[ \deg (f) = \max \{ |\alpha | : f_\alpha \ne 0 \}. \]
Indexing the coefficients of polynomials by multiindices is elegant and useful for theoretical purposes, but is quite unpractical in any implementation where one has to rely on linearly ordered arrays. Therefore, we have to map the multiindices to \({\mathbb {N}}\). This can be done in various ways—in this context we will use the lexicographical (lex) and graded lexicographical (glex) ordering as well as the reverse lexicographical (rlex) ordering.
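To make the orderings concrete, the following sketch enumerates all multiindices of total degree at most \(n\) in graded lexicographical order, i.e. sorted first by \(|\alpha |\) and then lexicographically within each degree; the function name and the tie-breaking convention are illustrative, not taken from [1] or [3]:

```python
from itertools import product

def glex_indices(n, d):
    """All alpha in N_0^d with |alpha| <= n, in graded lexicographical
    order: sorted by total degree |alpha| first, then lexicographically."""
    alphas = [a for a in product(range(n + 1), repeat=d) if sum(a) <= n]
    return sorted(alphas, key=lambda a: (sum(a), a))

# d = 2, n = 2: degree 0, then degree 1, then degree 2, lex within each block
for alpha in glex_indices(2, 2):
    print(alpha)
```

The position of a multiindex in this list is precisely the linear index used to address the coefficient array; the reverse lexicographical ordering used later differs only in the comparison rule.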
We denote by \(\Pi _n\) the vector space of all polynomials of total degree at most \(n\). The definition of polynomials in (2.1) can be generalized by replacing the monomial basis by any basis of the form
as long as the vectors \(v_{j,k} \in {\mathbb {R}}^d\), \(j=1,\dots ,d\), are linearly independent for any \(k \in {\mathbb {N}}_0\), cf. [3]. This corresponds to a linear change of variables in each step of the recursive evaluation and can, as shown in [3], be used to obtain better bounds on the forward error of the recursive evaluation process, depending, for example, on a certain a priori knowledge of a region \(\Omega \subseteq {\mathbb {R}}^d\) where the polynomial is to be evaluated.
The Horner scheme performs the evaluation of the polynomial by splitting off linear factors and thus decomposing the problem recursively into simpler subproblems. Nevertheless, this can be done in several ways, resulting in algorithms of significantly different performance.
The Horner scheme proposed in [3] makes use of the straightforward nested recursive factorization
where the subpolynomials
are evaluated recursively and rely on the coefficient subvectors \(f_{(\beta ,k)}\) ordered with respect to a reverse lexicographical ordering. Hence, a recursive evaluation that first completely evaluates \(q_0\), then \(q_1\) and so on accesses the coefficients of the polynomial in reverse lexicographical order and thus can avoid any costly order conversions between multiindex and linear ordering, cf. [1]. Unfortunately, for many applications a graded ordering of coefficients is more useful and beneficial, especially when polynomials of different total degree have to be added. Then, the advantage of the above evaluation method is lost.
The most natural and most widely used situation is, of course, to return to the monomial basis by setting
so that \(\ell _\alpha (x) = x^\alpha \), \(\alpha \in {\mathbb {N}}_0^d\). In this special case, the Horner scheme from [3] can be improved in terms of efficiency and numerical stability by using the slightly different nested evaluation method from [1].
3 Another evaluation method and performance analysis
In this section we analyze the computational effort of the two evaluation methods by counting the additions and multiplications that have to be performed. Starting with de Boor's method, we first give some elementary facts on monomial matrices.
Definition 3.1
Let \(k \in {\mathbb {N}}_0\), \(d \in {\mathbb {N}}\). The index matrix or monomial matrix
is the matrix obtained by arranging all multiindices \(\alpha \in {\mathbb {N}}_0^d\), \(|\alpha |=k\), in lexicographical order.
We write \(\# A_{k,d} \in {\mathbb {N}}\) for the number of rows of \(A_{k,d}\). Moreover, we define the matrices
as the submatrices of \(A_{k,d}\) whose first \(i\) columns vanish.
We identify these matrices with the (ordered) set of their rows, so the notation \(\alpha \in A_{k,d}\) means that \(\alpha \) is a row of \(A_{k,d}\). Since \(\alpha \in A_{k,d}^{(i)}\) requires \(\alpha \in {\mathbb {N}}_0^d\) and \(|\alpha |=k\), we also have \(\alpha \in A_{k,d}\).
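As an illustration, the rows of \(A_{k,d}\) and of the submatrices \(A_{k,d}^{(i)}\) can be generated directly from the definitions above; a sketch with illustrative names, where we read "first \(i\) columns vanish" as in the proof of Lemma 3.1, i.e. \(\alpha _1 = \cdots = \alpha _i = 0 < \alpha _{i+1}\):

```python
from itertools import product
from math import comb

def monomial_matrix(k, d):
    """Rows: all alpha in N_0^d with |alpha| = k, in lexicographical order."""
    return sorted(a for a in product(range(k + 1), repeat=d) if sum(a) == k)

def submatrix(k, d, i):
    """Rows of A_{k,d} whose first i entries are zero and whose (i+1)th
    entry is positive (the reading used in the proof of Lemma 3.1)."""
    return [a for a in monomial_matrix(k, d)
            if all(a[j] == 0 for j in range(i)) and a[i] > 0]

A = monomial_matrix(2, 3)
# the row count is the well-known binomial: #A_{k,d} = C(k+d-1, d-1)
assert len(A) == comb(2 + 3 - 1, 3 - 1)  # 6 rows for k = 2, d = 3
```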
Example 3.1
There exist many useful properties of monomial matrices, but in our context we only need a small selection:
The formulas in (3.1) are well known; a connection between \(A_{k,d}\) and \(A_{k,d}^{(i)}\) is shown in the following lemma.
Lemma 3.1
For \(k \in {\mathbb {N}}_0\), \(d \in {\mathbb {N}}\) and \(0 \le i \le d-1\),
Proof
Let \(\alpha \in {\mathbb {N}}_0^d\) be a row of the right hand side matrix, then \(\alpha _1 = \cdots = \alpha _i = 0 < \alpha _{i+1}\) and \(|\alpha | = \sum _{k=1}^d \alpha _k = 0 + 1 + (j-1) = j\), hence \(\alpha \in A_{j,d}^{(i)}\).
For \(\alpha \in A_{j,d}^{(i)}\) we have \(\alpha = (0,\widetilde{\alpha })\) with \(\widetilde{\alpha }= (\alpha _{i+1}, \dots ,\alpha _d) \in A_{j,d-i}\). Since \(\alpha _{i+1} > 0\) by definition of \(A_{j,d}^{(i)}\), this can be written as \(\widetilde{\alpha }= \varepsilon _1 + (\widetilde{\alpha }- \varepsilon _1)\), where \(\varepsilon _1 \in {\mathbb {N}}_0^{d-i}\) denotes the first unit vector and \((\widetilde{\alpha }- \varepsilon _1) \in A_{j-1,d-i}\). Consequently, \(\alpha \) must be a row of the matrix on the right hand side and therefore both matrices contain the same rows and are in reverse lexicographical order, so they must be identical. \(\square \)
The following algorithm describes de Boor’s evaluation method [1, p. 297ff]:
This algorithm terminates since all loops are finite, and that it returns \(f_{[0,\ldots ,0]} = f(\xi )\) follows readily from the observation that the coefficients of the polynomial can be rearranged as
whenever \(\alpha _i > 0\).
As mentioned before, multiindices can be avoided by a linear mapping. Following de Boor in [1], we use a graded lexicographical ordering. This means that every \(\alpha \in {\mathbb {N}}_0^d\) can be identified with the index of the corresponding row in the stacked monomial matrix
Applying this mapping to Algorithm 1 leads to the following lemma.
Lemma 3.2
([1], p. 298) Let \(f \in \Pi _k\) and \(f_j\) denote the \(j\)th coefficient of \(f\) in the linear ordering described above. Then Algorithm 1 can be reformulated to
Algorithm 2 now requires the computation of \(\# A_{i,j}\) for many \(i,j \in {\mathbb {N}}\). This task can be performed efficiently by providing a lookup table:
Definition 3.2
Let \(k \in {\mathbb {N}}_0\), \(d \in {\mathbb {N}}\). The matrix
is called Pascal matrix.
In this representation we have \(\# A_{i,j} = P_{k,d}(i-1,j)\). The Pascal matrix \(P_{k,d}\) can be computed efficiently since it is symmetric and obeys a well-known recursion formula; it is even directly available in Matlab. Moreover, the matrix depends only on the total degree and the number of variables and can be reused and extended to higher degrees or dimensions by adding appropriate rows and columns, respectively. Since Pascal matrices can easily be precomputed, we neglect this matrix in the complexity analysis.
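A minimal sketch of such a lookup table, using the additive recursion of Pascal's triangle; the function name and the zero-based indexing are ours, not necessarily those of [1]:

```python
from math import comb

def pascal_matrix(n):
    """Symmetric Pascal matrix P with P[i][j] = C(i+j, i), built by the
    recursion P[i][j] = P[i-1][j] + P[i][j-1] (first row and column are 1)."""
    P = [[1] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            P[i][j] = P[i - 1][j] + P[i][j - 1]
    return P

# #A_{k,d} = C(k+d-1, d-1) can then be read off as a table lookup
P = pascal_matrix(8)
k, d = 3, 3
assert P[k][d - 1] == comb(k + d - 1, d - 1)  # == 10
```

Extending the table to a higher degree or dimension only appends rows and columns; the existing entries stay valid, which is exactly the reusability noted above.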
In [1] it was stated that Algorithm 1 attains a lower bound for the number of additions and multiplications among the symmetric nested multiplication algorithms also presented there. Since there is only one addition and one multiplication in Algorithm 2, this number can be determined by simply counting the loops.
Theorem 3.1
The evaluation of \(f \in \Pi _k\), \(\deg (f)=k\), by Algorithm 2 requires \({d+k \atopwithdelims ()k}-1\) multiplications and the same number of additions, hence \(2{d+k \atopwithdelims ()k}-2\) operations.
This count of operations quantifies the computational cost of Algorithm 2. To compare the methods, we also analyze the algorithm given in [3] with the following result.
Theorem 3.2
The evaluation of \(f \in \Pi _k\), \(\deg (f)=k\), by the method given in [3] requires \({d+k+1 \atopwithdelims ()d}-1 = {d+k \atopwithdelims ()k}+{d+k \atopwithdelims ()d-1}-1\) additions and \({d+k \atopwithdelims ()k}-1\) multiplications, hence \(2{d+k \atopwithdelims ()k}-2+{d+k \atopwithdelims ()d-1}\) operations.
In summary, the algorithm of Peña and Sauer requires \({d+k \atopwithdelims ()d-1}\) more operations than the algorithm of de Boor. To prove this statement we first observe that the evaluation method of Peña and Sauer is based on a different term ordering, for which purpose we need to modify the monomial matrices.
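The gap between the two counts in Theorems 3.1 and 3.2 can be tabulated directly; a sketch with binomials via `math.comb` (function names are ours):

```python
from math import comb

def ops_de_boor(k, d):
    # Theorem 3.1: C(d+k, k) - 1 multiplications and as many additions
    return 2 * comb(d + k, k) - 2

def ops_pena_sauer(k, d):
    # Theorem 3.2: C(d+k+1, d) - 1 additions plus C(d+k, k) - 1 multiplications
    return comb(d + k + 1, d) - 1 + comb(d + k, k) - 1

k = 10
for d in (2, 3, 5):
    extra = ops_pena_sauer(k, d) - ops_de_boor(k, d)
    assert extra == comb(d + k, d - 1)  # the difference stated above
    print(d, k, ops_de_boor(k, d), ops_pena_sauer(k, d), extra)
```

The asserted identity is just Pascal's rule, \({d+k+1 \atopwithdelims ()d} = {d+k \atopwithdelims ()d} + {d+k \atopwithdelims ()d-1}\), applied to the addition count of Theorem 3.2.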
Definition 3.3
Let \({\mathbf {A}}_{k,d}^{rlex}\) be the matrix of all multiindices \(\alpha \in {\mathbb {N}}_0^d\), \(|\alpha | \le k\), in decreasing reverse lexicographical order.
Example 3.2
The matrix \({\mathbf {A}}_{k,d}^{rlex}\) is simply a reordered version of the stacked monomial matrix \({\mathbf {A}}_{k,d}^{glex}\). Hence, the size of \({\mathbf {A}}_{k,d}^{rlex}\) can be described as
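So the stacked matrix has \(M = {d+k \atopwithdelims ()d}\) rows, the dimension of \(\Pi _k\). A quick sketch confirming the count; the exact tie-breaking of "reverse lexicographical" varies in the literature, and ours (comparing multiindices by their reversed tuples, largest first) is an assumption of this snippet:

```python
from itertools import product
from math import comb

def rlex_stacked(k, d):
    """All alpha in N_0^d with |alpha| <= k in decreasing reverse
    lexicographical order: compare reversed tuples, largest first."""
    alphas = [a for a in product(range(k + 1), repeat=d) if sum(a) <= k]
    return sorted(alphas, key=lambda a: a[::-1], reverse=True)

k, d = 4, 3
assert len(rlex_stacked(k, d)) == comb(d + k, d)  # M = dim Pi_k = 35
```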
Based on this matrix we can formulate the evaluation method, denoting by \(f_0,\dots ,f_{M-1}\) the coefficients of \(f\) in reverse lexicographical order and by \({\mathbf {A}}_{k,d}^{rlex}(i,j)\) the \((i,j)\)th element of the modified monomial matrix.
Proposition 3.1
The evaluation of \(f \in \Pi _k\), by Algorithm 3 requires \({d+k \atopwithdelims ()k}-1\) multiplications.
This is easy to see since the only multiplication in Algorithm 3 appears in line 5, and this statement is executed \(\# A_{k,d}^{rlex}-1\) times. To count the additions, we need some more preliminary work, as the number of additions depends on the value of \(K \in \{ 1,\ldots ,d \}\).
Lemma 3.3
For \(f \in \Pi _k\) Algorithm 3 assigns \(K\) the value \((d-j+1)\), \(j=1,\ldots ,d\), exactly \({k+j-1 \atopwithdelims ()j}\) times.
To prove Lemma 3.3 we need a recursion formula for the matrices \({\mathbf {A}}_{k,d}^{rlex}\).
Lemma 3.4
For \(k, d \in {\mathbb {N}}\),
Proof
The decreasing reverse lexicographical order partitions the matrix \({\mathbf {A}}_{k,d}^{rlex}\) into \(d+1\) blocks in the following way: The multiindex \(\alpha \in {\mathbf {A}}_{k,d}^{rlex}\) is a part of block \(j\), \(j=0,\dots ,d\), if and only if \(\alpha _{d-j} \ne 0 = \alpha _{d-j+1} = \cdots = \alpha _d\). Thus,
Let \(\alpha \) be in the \((d-j+1)\)th block of \({\mathbf {A}}_{k,d}^{rlex}\), i.e., \(|\alpha | \le k\) and \(\alpha = (\widetilde{\alpha },0)\) with \(\widetilde{\alpha }= (\alpha _1,\ldots ,\alpha _j)\) so that \(|\widetilde{\alpha }| \le k\) and \(\alpha _j \ge 1\). This yields \((\widetilde{\alpha }- \varepsilon _j) \in {\mathbf {A}}_{k-1,j}^{rlex}\) and \(\widetilde{\alpha }\in \widetilde{{\mathbf {A}}_{k-1,j}^{rlex}}\) as well.
Hence, \(\widetilde{{\mathbf {A}}_{k-1,j}^{rlex}}\) and the \((d-j+1)\)th block of \({\mathbf {A}}_{k,d}^{rlex}\) contain exactly the same rows, and equality again follows from the fact that both are in the same order, this time a decreasing reverse lexicographical one. \(\square \)
Proof (of Lemma 3.3)
We use induction on \(k\), where for \(k=1\) we have
hence, the assignment to \(K\) is
Therefore \(K\) takes on each value from \(\{ d-j+1 : j=1,\dots ,d \}\) exactly once.
Now suppose that the hypothesis holds for \((k-1)\), \(k \ge 2\). We consider \({\mathbf {A}}_{k,d}^{rlex}\) and the number of times the value \((d-j+1)\) is taken on by \(K\). By Lemma 3.4 we can count these values for each block of \({\mathbf {A}}_{k,d}^{rlex}\) separately, and it is sufficient to consider the first \(j\) blocks only. Since the value of \(K\) is not affected by increasing a column by the same value, we can use the matrices \({\mathbf {A}}_{k-1,d-i}^{rlex}\), \(i=0,\dots ,j-1\). Then the induction hypothesis yields that \(K\) takes on the value \((d-j+1)=((d-i)-(j-i)+1)\) for
times within the first \(j\) blocks.
Furthermore, we notice that the algorithm also sets \(K\) to \((d-j+1)\) by passing from block \(j\) to \(j+1\). Hence, we have to increase the counter by one:
This completes the induction step and the proof. \(\square \)
Proposition 3.2
The evaluation of \(f \in \Pi _k\) by Algorithm 3 requires \({d+k+1 \atopwithdelims ()d}-1\) additions.
Proof
The algorithm computes
additions in line 9. Besides that, we need \(d\) additions to create the output value:
\(\square \)
Now the results of Propositions 3.1 and 3.2 immediately yield Theorem 3.2.
4 Numerical stability
In this section we analyze the numerical stability of both methods. For that reason we need some notions from (backward) error analysis, which can be found in [2].
Definition 4.1
For \(a \in {\mathbb {R}}\) the expression \({{\mathrm{fl}}}(a)\) denotes the computed value in a floating point arithmetic. The computations in this arithmetic follow the standard model
\[ {{\mathrm{fl}}}(a \,\mathrm{op}\, b) = (a \,\mathrm{op}\, b)(1+\delta ), \qquad |\delta | \le u, \qquad \mathrm{op} \in \{ +,-,\times ,/ \}, \quad (4.1) \]
where \(u\) denotes the unit roundoff.
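In IEEE double precision the unit roundoff is \(u = 2^{-53}\). A small sanity check of the standard model, assuming Python floats are IEEE doubles (they are on all common platforms):

```python
import sys

# unit roundoff u = eps/2 = 2**-53 for IEEE double precision
u = sys.float_info.epsilon / 2
assert u == 2.0 ** -53

# fl(a + b) = (a + b)(1 + delta), |delta| <= u: the computed sum 0.1 + 0.2
# is not the double nearest to 3/10, but it sits within a few units of
# roundoff of it, consistent with the model above
a, b = 0.1, 0.2
computed = a + b
assert computed != 0.3                      # famous floating point artifact
assert abs(computed - 0.3) <= 4 * u * 0.3   # but only a few u away
```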
The following two results are standard tools for handling roundoff errors in backward error analysis.
Lemma 4.1
([2], p. 63) Let \(|\delta _i| \le u\), \(\rho _i = \pm 1\), \(i=1,\ldots ,n\), and \(nu < 1\); then
\[ \prod _{i=1}^{n} (1+\delta _i)^{\rho _i} = 1+\theta _n, \]
where
\[ |\theta _n| \le \gamma _n := \frac{nu}{1-nu}. \]
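The quantities \(\gamma _n = nu/(1-nu)\) grow essentially linearly in \(n\) as long as \(nu \ll 1\). A sketch computing them exactly and checking one of the calculus rules below numerically, using rational arithmetic so the check itself is free of rounding:

```python
from fractions import Fraction

U = Fraction(1, 2 ** 53)  # unit roundoff of IEEE double precision

def gamma(n):
    """gamma_n = n*u / (1 - n*u), defined for n*u < 1."""
    assert n * U < 1
    return n * U / (1 - n * U)

# the rule gamma_k + gamma_j + gamma_k*gamma_j <= gamma_{k+j} holds exactly,
# since (1 - k*u)(1 - j*u) >= 1 - (k+j)*u
for k, j in [(3, 4), (10, 20), (1000, 1)]:
    assert gamma(k) + gamma(j) + gamma(k) * gamma(j) <= gamma(k + j)
```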
Lemma 4.2
([2], p. 67) Let \(\theta _k\), \(\gamma _k\) be as above; then
1. \((1+\theta _k)(1+\theta _j) = 1+\theta _{k+j},\)
2. \(\gamma _k+\gamma _j+\gamma _k\gamma _j \le \gamma _{k+j},\)
3. \(\gamma _k+u \le \gamma _{k+1}.\)
Finally, we introduce another simplifying notation from [2, p. 68], namely
which gives the convenient computational rule
In [3] the backward error of Algorithm 3 was given as
In comparison, we perform the backward error analysis of de Boor's method, Algorithm 1, which turns out to perform slightly better.
Theorem 4.1
For \(f(x) = \sum _{|\alpha | \le m} f_{\alpha }x^{\alpha }\) suppose that \(m := \deg (f)\) satisfies \((2m+d-1)u < 1\) and denote by \({{\mathrm{fl}}}(f(x)) = \sum _{\alpha \in {\mathbb {N}}_0^d} \hat{f}_{\alpha } x^{\alpha }\) the computed result of Algorithm 1. Then
where \(\mu (\alpha ) := \min \{ j : \alpha _j \ne 0 \}\).
Proof
We proceed by induction on \(m = \deg (f)\). For \(m=1\) let \(f(x) = f_0+f_{\varepsilon _1}x^{\varepsilon _1} + \cdots + f_{\varepsilon _d}x^{\varepsilon _d}\) where \(\varepsilon _j \in {\mathbb {N}}_0^d\), \(j=1,\dots ,d\), denotes the \(j\)th unit vector. The evaluation in the floating point model can be written by means of (4.1) as
and a comparison of the coefficients yields
that is,
Now suppose that the claim is proved for \((m-1)\), \(m \ge 2\), and let \(f(x) = \sum _{|\alpha | \le m} f_{\alpha }x^{\alpha }\). We perform one iteration step of Algorithm 1 and compute new coefficients
without changing the value of \(f(x)\). Therefore we can consider \(f\) to be a polynomial of degree \((m-1)\),
and apply the induction hypothesis
The last identity holds because \(\mu (\alpha +\varepsilon _i)=i\), \(i=1, \dots , \mu (\alpha )\). Comparison of coefficients again yields
and
This completes the induction step and the proof. \(\square \)
In [3] the backward error estimates have been used to also derive a forward error bound. One can copy these methods almost verbatim and use the absolute and relative condition numbers
respectively. It then follows that in the case of de Boor's method we have, whenever \(f(x) \ne 0\),
It is also possible to give global estimates for the forward error of evaluation on a bounded domain; again we refer to [3], where this issue is discussed in detail. Here, however, lies the only disadvantage of de Boor's method: since it is designed for monomials, the improved backward stability is compensated to some extent by the higher condition number of the monomial basis.
5 Design aspects and numerical experiments
In this section we show how the shortcomings of Peña and Sauer's variant affect practical applications. To this end, Algorithm 2 and Algorithm 3 have been implemented in GNU Octave and applied to the following scenario: let \(d=3\) and let \(f_0,\ldots ,f_{10}\) be polynomials in \(d\) variables with \(\deg (f_k) = k\). These polynomials have been evaluated at the point \(\xi = (1,2,3)\) by the methods described above, and the computation time has been measured for each polynomial (see the note below for the hardware). The results are given in Fig. 1.
This experiment shows that for \(d=3\) and \(k=10\) the evaluation method by de Boor is about 10 times faster than the variant by Peña and Sauer. This becomes even more significant if many evaluations have to be performed, e.g. when setting up Vandermonde matrices.
But there is another important point that is not taken into account by the results above: both algorithms are restricted to a specific coefficient ordering. To avoid costly order conversions, the design of any algorithm working with multivariate polynomials should take the compatibility of the evaluation method and the coefficient ordering into account.
6 Summary
We have shown that de Boor's variant of the multivariate Horner scheme outperforms the one by Peña and Sauer in terms of computational effort and backward stability. Even if these differences are minor ones, they are “measurable”. The only disadvantage, if any, of de Boor's approach is that it is tied to the monomial basis, while the approach from [3] allows for a more general extension in the sense of (2.5) and thus for bases with a better condition number and hence a better forward error estimate.
Notes
Reference: Intel i5 @ 2.7 GHz, GNU Octave, version 3.4.0.
References
de Boor, C.: Computational aspects of multivariate polynomial interpolation: indexing the coefficients. Adv. Comput. Math. 12(4), 289–301 (2000)
Higham, N.J.: Accuracy and Stability of Numerical Algorithms, 2nd edn. SIAM, Philadelphia (2002)
Peña, J., Sauer, T.: On the multivariate Horner scheme. SIAM J. Numer. Anal. 37, 1186–1197 (2000)
Peña, J.M., Sauer, T.: On the multivariate Horner scheme II: running error analysis. Computing 65, 313–322 (2000)
Sauer, T.: Computational aspects of multivariate polynomial interpolation. Adv. Comput. Math. 3, 219–238 (1995)
Schumaker, L.L., Volk, W.: Efficient evaluation of multivariate polynomials. Comput. Aided Geom. Design 3, 149–154 (1986)
Communicated by Michael Floater.
Czekansky, J., Sauer, T. The multivariate Horner scheme revisited. Bit Numer Math 55, 1043–1056 (2015). https://doi.org/10.1007/s10543-014-0533-x