1 Introduction

The evaluation of polynomials at a given point is a classical problem, and the univariate method by means of nested multiplications or synthetic division, known as the Horner scheme, is well investigated and covered, for example, in [2, p. 94ff]. A multivariate variant of the Horner scheme, tailored especially to the evaluation in Bernstein–Bézier form and its comparison with the de Casteljau algorithm, has been given in [6], however without a detailed analysis of the roundoff error. The numerical behavior of the multivariate Horner scheme was first considered in detail in [3, 4], where it was shown, based on an observation from [5], that the recursive structure of the multivariate Horner scheme leads to a backwards error proportional to the total degree of the polynomial. In high dimensions this is orders of magnitude smaller than the number of operations performed in the evaluation. Somewhat more recently, de Boor gave another approach to the multivariate Horner scheme in [1] as a byproduct of a method to properly and efficiently index the coefficients of a multivariate polynomial in the monomial basis.

In this paper, we compare the two variants in terms of computational effort and numerical stability, showing that de Boor’s method outperforms the one from [3] in both respects. Nevertheless, like the differences in the algorithms themselves, the advantages provided by de Boor’s method are subtle and not a matter of orders of magnitude. On the other hand, it is quite surprising and therefore interesting that such subtle differences can lead to “measurable” effects.

2 Notation and basic facts

A polynomial in \(d\) variables is a finite linear combination of monomials,

$$\begin{aligned} f (x) = \sum _{\alpha \in I} f_\alpha \, x^\alpha , \qquad I \subset {\mathbb {N}}_0^d, \quad \# I < \infty . \end{aligned}$$
(2.1)

The length or absolute value of a multiindex \(\alpha \in {\mathbb {N}}_0^d\) is as usual defined as \(|\alpha | = \alpha _1 + \cdots + \alpha _d\) and the (total) degree of a polynomial \(f \ne 0\) as

$$\begin{aligned} \deg f = \max \left\{ |\alpha | \;:\; f_\alpha \ne 0 \right\} . \end{aligned}$$

Indexing the coefficients of polynomials by multiindices is elegant and useful for theoretical purposes, but it is quite impractical in any implementation, where one has to rely on linearly ordered arrays. Therefore, we have to map the multiindices to \({\mathbb {N}}\). This can be done in various ways; in this context we will use the lexicographical (lex) and graded lexicographical (glex) orderings as well as the reverse lexicographical (rlex) ordering, defined by

$$\begin{aligned} \alpha \prec _{lex} \beta \,&:\Leftrightarrow \,&\alpha _k < \beta _k \text { for some } k \text { and } \alpha _j = \beta _j, \; j=1,\ldots ,k-1, \end{aligned}$$
(2.2)
$$\begin{aligned} \alpha \prec _{glex} \beta \,&:\Leftrightarrow \,&|\alpha | < |\beta | \text { or } |\alpha |=|\beta | \text { and } \alpha \prec _{lex} \beta , \end{aligned}$$
(2.3)
$$\begin{aligned} \alpha \prec _{rlex} \beta \,&:\Leftrightarrow \,&\alpha _k < \beta _k \text { for some } k \text { and } \alpha _j = \beta _j, \; j=k+1,\ldots ,d. \end{aligned}$$
(2.4)
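
In other words, (2.2) and (2.4) compare the first coordinate in which the two multiindices differ, read from the left for lex and from the right for rlex. Purely for illustration, the three orderings can be realized by the following small Python sketch; the function names are chosen here and do not appear in any of the cited references.

```python
def lex_less(a, b):
    """alpha <_lex beta as in (2.2): the first differing coordinate, read from the left, is smaller."""
    for ai, bi in zip(a, b):
        if ai != bi:
            return ai < bi
    return False

def glex_less(a, b):
    """alpha <_glex beta as in (2.3): smaller total degree, ties broken lexicographically."""
    if sum(a) != sum(b):
        return sum(a) < sum(b)
    return lex_less(a, b)

def rlex_less(a, b):
    """alpha <_rlex beta as in (2.4): the first differing coordinate, read from the right, is smaller."""
    for ai, bi in zip(reversed(a), reversed(b)):
        if ai != bi:
            return ai < bi
    return False

# Example in d = 3: (0,1,1) precedes (0,0,2) in rlex but not in lex.
assert rlex_less((0, 1, 1), (0, 0, 2)) and not lex_less((0, 1, 1), (0, 0, 2))
```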

We denote by \(\Pi _n\) the vector space of all polynomials of total degree at most \(n\). The definition of polynomials in (2.1) can be generalized by replacing the monomial basis by any basis of the form

$$\begin{aligned} \ell _\alpha (x) := \prod _{j=1}^d \prod _{k_j=0}^{\alpha _j-1} \ell _{j,k_j} (x), \qquad \ell _{j,k} (x) := \left( v_{j,k}^T x + w_{j,k} \right) , \quad v_{j,k} \in {\mathbb {R}}^d, \, w_{j,k} \in {\mathbb {R}}, \end{aligned}$$

as long as the vectors \(v_{j,k} \in {\mathbb {R}}^d\), \(j=1,\dots ,d\), are linearly independent for any \(k \in {\mathbb {N}}_0\), cf. [3]. This corresponds to a linear change of variables in each step of the recursive evaluation and can, as shown in [3], be used to obtain better bounds on the forward error of the recursive evaluation process, depending, for example, on a certain a priori knowledge of a region \(\Omega \subseteq {\mathbb {R}}^d\) where the polynomial is to be evaluated.

The Horner scheme performs the evaluation of the polynomial by splitting off linear factors and thus decomposing the problem recursively into simpler subproblems. This splitting can be done in several ways, however, resulting in algorithms of significantly different performance.

The Horner scheme proposed in [3] makes use of the straightforward nested recursive factorization

$$\begin{aligned} f(x) = \sum _{\alpha \in I} f_\alpha \, \ell _\alpha (x) = q_0 (x) + \ell _{d,0} (x) \left( q_1 (x) + \ell _{d,1} (x) \left( \cdots + \ell _{d,n} (x) \, q_n (x) \right) \right) \end{aligned}$$
(2.5)

where the subpolynomials

$$\begin{aligned} q_k := \sum _{\alpha _d = k} f_\alpha \, \prod _{j=1}^{d-1} \prod _{k_j=0}^{\alpha _j -1} \ell _{j,k_j}, \quad k=0,\dots ,n, \end{aligned}$$
(2.6)

are evaluated recursively and rely on the coefficient subvectors \(f_{(\beta ,k)}\) being ordered with respect to a reverse lexicographical ordering. Hence, a recursive evaluation that first completely evaluates \(q_0\), then \(q_1\) and so on accesses the coefficients of the polynomial in reverse lexicographical order and can thus avoid any costly conversions between the multiindex and the linear ordering, cf. [1]. Unfortunately, for many applications a graded ordering of the coefficients is more useful, especially when polynomials of different total degree have to be added, and then this advantage of the above evaluation method is lost.
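
To make the recursive structure of (2.5) and (2.6) concrete, the following Python sketch evaluates a polynomial given as a dictionary of multiindex/coefficient pairs, specialized for brevity to the monomial case \(\ell _{j,k}(x) = x_j\). It is only meant to illustrate the splitting into the subpolynomials \(q_k\); in particular it processes the nesting from the inside out and makes no attempt to reproduce the coefficient access order or the general basis functions used in [3]. The function name is chosen here for illustration.

```python
def horner_recursive(coeffs, x):
    """Evaluate f(x) by the nested factorization (2.5)/(2.6), monomial case.

    coeffs: dict mapping multiindex tuples of length d to coefficients.
    x:      point of evaluation, a sequence of length d.
    """
    d = len(x)
    if d == 1:
        # classical univariate Horner scheme
        n = max(a[0] for a in coeffs)
        result = coeffs.get((n,), 0.0)
        for k in range(n - 1, -1, -1):
            result = result * x[0] + coeffs.get((k,), 0.0)
        return result
    # group the coefficients by the exponent of the last variable: the q_k of (2.6)
    n = max(a[-1] for a in coeffs)
    subpolys = [{} for _ in range(n + 1)]
    for alpha, c in coeffs.items():
        subpolys[alpha[-1]][alpha[:-1]] = c
    # nested evaluation (2.5): q_0 + x_d ( q_1 + x_d ( ... + x_d q_n ) )
    result = horner_recursive(subpolys[n], x[:-1])
    for k in range(n - 1, -1, -1):
        q_k = horner_recursive(subpolys[k], x[:-1]) if subpolys[k] else 0.0
        result = result * x[-1] + q_k
    return result

# f(x,y) = 1 + 2xy + 3y^2 at (1,2): 1 + 4 + 12 = 17
print(horner_recursive({(0, 0): 1.0, (1, 1): 2.0, (0, 2): 3.0}, (1.0, 2.0)))
```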

The most natural and most widely used situation is, of course, to return to the monomial basis by setting

$$\begin{aligned} v_{j,k} = e_j, \quad w_{j,k} = 0, \end{aligned}$$

so that \(\ell _\alpha (x) = x^\alpha \), \(\alpha \in {\mathbb {N}}_0^d\). In this special case, the Horner scheme from [3] can be improved in terms of efficiency and numerical stability by using the slightly different nested evaluation method from [1].

3 Another evaluation method and performance analysis

In this section we will analyze the computational effort of the two evaluation methods by counting the additions and multiplications that have to be performed. Starting with de Boor’s method, we first give some elementary facts on monomial matrices.

Definition 3.1

Let \(k \in {\mathbb {N}}_0\), \(d \in {\mathbb {N}}\). The index matrix or monomial matrix

$$\begin{aligned} A_{k,d} = [ \alpha \in {\mathbb {N}}_0^d : |\alpha | = k] \in {\mathbb {N}}_0^{\# A_{k,d} \times d} \end{aligned}$$

is the matrix obtained by arranging all multiindices \(\alpha \in {\mathbb {N}}_0^d\), \(|\alpha |=k\), in lexicographical order.

We write \(\# A_{k,d} \in {\mathbb {N}}\) for the number of rows of \(A_{k,d}\). Moreover, we define the matrices

$$\begin{aligned} A_{k,d}^{(i)} = [ \alpha \in {\mathbb {N}}_0^d : |\alpha | = k, \alpha _1 = \cdots = \alpha _i = 0 < \alpha _{i+1} ], \quad i=0, \dots , d-1, \end{aligned}$$

as the submatrices of \(A_{k,d}\) consisting of those rows whose first \(i\) entries vanish while the \((i+1)\)st does not.

We identify these matrices with the (ordered) set of their rows, so the notation \(\alpha \in A_{k,d}\) means that \(\alpha \) is a row of \(A_{k,d}\). Since \(\alpha \in A_{k,d}^{(i)}\) requires \(\alpha \in {\mathbb {N}}_0^d\) and \(|\alpha |=k\), we also have \(\alpha \in A_{k,d}\).

Example 3.1

$$\begin{aligned} A_{0,3} = [0,0,0], \quad A_{1,3} = \begin{bmatrix} 0&\quad 0&\quad 1 \\ 0&\quad 1&\quad 0 \\ 1&\quad 0&\quad 0 \end{bmatrix}, \quad A_{2,3} = \begin{bmatrix} 0&\quad 0&\quad 2 \\ 0&\quad 1&\quad 1 \\ 0&\quad 2&\quad 0 \\ 1&\quad 0&\quad 1 \\ 1&\quad 1&\quad 0 \\ 2&\quad 0&\quad 0 \end{bmatrix} \end{aligned}$$
$$\begin{aligned} A_{2,3}^{(2)} = [0,0,2], \quad A_{2,3}^{(1)} = \begin{bmatrix} 0&\quad 1&\quad 1 \\ 0&\quad 2&\quad 0 \end{bmatrix}, \quad A_{2,3}^{(0)} = \begin{bmatrix} 1&\quad 0&\quad 1 \\ 1&\quad 1&\quad 0 \\ 2&\quad 0&\quad 0 \end{bmatrix} \end{aligned}$$
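
The matrices of Example 3.1 are easily generated by a recursion over the dimension. The following Python sketch (the function name is chosen here) lists the rows of \(A_{k,d}\) in the lexicographical order of Definition 3.1 and extracts one of the blocks \(A_{k,d}^{(i)}\).

```python
from math import comb

def monomial_matrix(k, d):
    """Rows of A_{k,d}: all alpha in N_0^d with |alpha| = k, in increasing lex order."""
    if d == 1:
        return [(k,)]
    rows = []
    for first in range(k + 1):                      # the first coordinate varies slowest
        for tail in monomial_matrix(k - first, d - 1):
            rows.append((first,) + tail)
    return rows

A23 = monomial_matrix(2, 3)
print(A23)                              # [(0,0,2), (0,1,1), (0,2,0), (1,0,1), (1,1,0), (2,0,0)]
print(len(A23) == comb(2 + 3 - 1, 2))   # True, the first identity in (3.1)

# A_{2,3}^(1): rows whose first entry vanishes while the second does not
print([a for a in A23 if a[0] == 0 and a[1] > 0])   # [(0,1,1), (0,2,0)], as in Example 3.1
```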

There exist many useful properties of monomial matrices, but in our context we only need a small selection:

$$\begin{aligned} \# A_{k,d} = {k+d-1 \atopwithdelims ()k}, \quad \sum _{j=0}^k \# A_{j,d} = \# A_{k,d+1}, \quad A_{k,d} = \begin{bmatrix} A_{k,d}^{(d-1)} \\ \vdots \\ A_{k,d}^{(0)} \end{bmatrix}. \end{aligned}$$
(3.1)

The formulas in (3.1) are well known; a connection between \(A_{k,d}\) and \(A_{k,d}^{(i)}\) is shown in the following lemma.

Lemma 3.1

For \(j \in {\mathbb {N}}\), \(d \in {\mathbb {N}}\) and \(0 \le i \le d-1\),

$$\begin{aligned} A_{j,d}^{(i)} = \begin{bmatrix} \underbrace{ \begin{bmatrix} 0&\quad \cdots&\quad 0 \\ \vdots&\quad \ddots&\quad \vdots \\ 0&\quad \cdots&\quad 0 \end{bmatrix}}_{i},&\underbrace{ \begin{bmatrix} 1&\quad 0&\quad \cdots&\quad 0 \\ \vdots&\quad \vdots&\quad \ddots&\quad \vdots \\ 1&\quad 0&\quad \cdots&\quad 0 \end{bmatrix}}_{d-i} + A_{j-1,d-i} \end{bmatrix} \in {\mathbb {N}}_0^{(\# A_{j-1,d-i}) \times d} . \end{aligned}$$

Proof

Let \(\alpha \in {\mathbb {N}}_0^d\) be a row of the right hand side matrix, then \(\alpha _1 = \cdots = \alpha _i = 0 < \alpha _{i+1}\) and \(|\alpha | = \sum _{k=1}^d \alpha _k = 0 + 1 + (j-1) = j\), hence \(\alpha \in A_{j,d}^{(i)}\).

For \(\alpha \in A_{j,d}^{(i)}\) we have \(\alpha = (0,\widetilde{\alpha })\) with \(\widetilde{\alpha }= (\alpha _{i+1}, \dots ,\alpha _d) \in A_{j,d-i}\). Since \(\alpha _{i+1} > 0\) by definition of \(A_{j,d}^{(i)}\), this can be written as \(\widetilde{\alpha }= \varepsilon _1 + (\widetilde{\alpha }- \varepsilon _1)\), where \(\varepsilon _1 \in {\mathbb {N}}_0^{d-i}\) denotes the first unit vector and \((\widetilde{\alpha }- \varepsilon _1) \in A_{j-1,d-i}\). Consequently, \(\alpha \) must be a row of the matrix on the right hand side. Therefore both matrices contain the same rows and, since both are arranged in the same lexicographical order, they must be identical. \(\square \)

The following algorithm describes de Boor’s evaluation method [1, p. 297ff]:

Algorithm 1: de Boor’s evaluation method (pseudocode omitted; see [1, p. 297ff])

This algorithm terminates since all loops are finite, and that it returns \(f_{[0,\ldots ,0]} = f(\xi )\) follows readily from the observation that the coefficients of the polynomial can be rearranged as

$$\begin{aligned} f(\xi ) = \cdots + f_{\alpha -\varepsilon _i} \xi ^{\alpha -\varepsilon _i} + \cdots + f_{\alpha } \xi ^{\alpha } + \cdots = \cdots + \left( f_{\alpha -\varepsilon _i} + \xi _i f_{\alpha } \right) \xi ^{\alpha -\varepsilon _i} + \cdots \end{aligned}$$

whenever \(\alpha _i > 0\).
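
Applying this rearrangement with \(i = \mu (\beta ) := \min \{ j : \beta _j \ne 0 \}\) to every coefficient \(\beta \) of the current top degree gives the following Python sketch of the evaluation in multiindex form. It is not a transcription of de Boor’s pseudocode, which works on linearly indexed coefficient arrays (see Lemma 3.2 below); the function names and the dictionary representation of the coefficients are chosen here purely for illustration.

```python
def mu(beta):
    """Position of the first nonzero entry of beta (1-based); beta must be nonzero."""
    return next(j for j, b in enumerate(beta, start=1) if b != 0)

def deboor_eval(coeffs, xi):
    """Sketch of de Boor's nested multiplication in multiindex form.

    In each sweep every coefficient f_beta of the current top degree m is folded
    into the coefficient of beta - e_i with i = mu(beta),
        f_{beta - e_i} += xi_i * f_beta,
    which leaves the value f(xi) unchanged and lowers the degree by one.
    For a polynomial with all coefficients up to degree k this costs one
    multiplication and one addition per folded coefficient, i.e. the count
    of Theorem 3.1 below.
    """
    d = len(xi)
    f = dict(coeffs)
    f.setdefault((0,) * d, 0.0)
    m = max(sum(a) for a in f)
    while m > 0:
        for beta in [a for a in f if sum(a) == m]:
            i = mu(beta)
            alpha = tuple(b - (1 if j == i else 0) for j, b in enumerate(beta, start=1))
            f[alpha] = f.get(alpha, 0.0) + xi[i - 1] * f.pop(beta)
        m -= 1
    return f[(0,) * d]

# f(x,y,z) = 1 + x + 2yz + z^3 at xi = (1,2,3): 1 + 1 + 12 + 27 = 41
print(deboor_eval({(0, 0, 0): 1.0, (1, 0, 0): 1.0, (0, 1, 1): 2.0, (0, 0, 3): 1.0},
                  (1.0, 2.0, 3.0)))
```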

As mentioned before, multiindices can be avoided by a linear mapping. Following de Boor [1], we use a graded lexicographical ordering. This means that every \(\alpha \in {\mathbb {N}}_0^d\) can be identified with the index of the corresponding row in the stacked monomial matrix

$$\begin{aligned} {\mathbf {A}}_{k,d}^{glex} := \left[ A_{0,d}^T, A_{1,d}^T, \ldots , A_{k,d}^T \right] ^T. \end{aligned}$$

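In practice one needs the row index of a given multiindex in \({\mathbf {A}}_{k,d}^{glex}\): all multiindices of smaller total degree come first, followed by the lexicographic rank within the block of the same degree. A Python sketch of this index computation (the function names are chosen here; the block sizes \(\# A_{j,d}\) are taken from (3.1)) could look as follows.

```python
from math import comb

def n_monomials(k, d):
    """#A_{k,d}: number of multiindices in N_0^d of total degree exactly k, cf. (3.1)."""
    return comb(k + d - 1, k)

def glex_index(alpha):
    """0-based row index of alpha in the stacked monomial matrix A^glex_{k,d}."""
    d, k = len(alpha), sum(alpha)
    index = sum(n_monomials(j, d) for j in range(k))   # all blocks of smaller degree
    deg = k
    for pos in range(d - 1):
        # rows of A_{k,d} agreeing with alpha so far but with a smaller entry at this position
        for smaller in range(alpha[pos]):
            index += n_monomials(deg - smaller, d - pos - 1)
        deg -= alpha[pos]
    return index

# Reproduces the stacked ordering A_{0,3}, A_{1,3}, A_{2,3} of Example 3.1:
glex = [(0,0,0), (0,0,1), (0,1,0), (1,0,0), (0,0,2), (0,1,1), (0,2,0), (1,0,1), (1,1,0), (2,0,0)]
print([glex_index(a) for a in glex])   # [0, 1, 2, ..., 9]
```
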
Applying this mapping to Algorithm 1 leads to the following Lemma.

Lemma 3.2

([1], p. 298) Let \(f \in \Pi _k\) and let \(f_j\) denote the \(j\)th coefficient of \(f\) in the linear ordering described above. Then Algorithm 1 can be reformulated as follows.

Algorithm 2: de Boor’s evaluation method in linear (glex) indexing (pseudocode omitted; see [1, p. 298])

Algorithm 2 now requires the computation of \(\# A_{i,j}\) for many \(i,j \in {\mathbb {N}}\). This task can be performed efficiently by providing a lookup table:

Definition 3.2

Let \(k \in {\mathbb {N}}_0\), \(d \in {\mathbb {N}}\). The matrix

$$\begin{aligned} P_{k,d} := \left[ {i+j \atopwithdelims ()j} : \begin{array}{c} i=0,\dots ,d \\ j=0,\dots ,k \end{array} \right] \in {\mathbb {N}}^{(d+1) \times (k+1)} \end{aligned}$$

is called Pascal matrix.

In this representation we have \(\# A_{i,j} = {i+j-1 \atopwithdelims ()i}\), which is the entry of \(P_{k,d}\) in row \(j-1\) and column \(i\) (in the zero-based indexing of Definition 3.2). The Pascal matrix \(P_{k,d}\) can be computed efficiently since it is symmetric and follows the well-known recursion formula of Pascal’s triangle, and it is even directly available in Matlab. Moreover, the matrix depends only on the total degree and the number of variables and can be reused and extended to higher degrees or dimensions by appending columns or rows, respectively. Since the Pascal matrix can easily be precomputed, we neglect it in the complexity analysis.
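
A minimal Python sketch of such a lookup table (the function name is chosen here) builds \(P_{k,d}\) by the recursion of Pascal’s triangle and retrieves \(\# A_{2,3}\) as an example.

```python
def pascal_matrix(k, d):
    """P_{k,d} of Definition 3.2: (d+1) x (k+1) table with entries C(i+j, j)."""
    P = [[1] * (k + 1) for _ in range(d + 1)]      # first row and first column are all ones
    for i in range(1, d + 1):
        for j in range(1, k + 1):
            P[i][j] = P[i - 1][j] + P[i][j - 1]    # C(i+j, j) = C(i+j-1, j) + C(i+j-1, j-1)
    return P

P = pascal_matrix(10, 3)
# #A_{i,j} = C(i+j-1, i) sits in row j-1, column i (zero-based), e.g. #A_{2,3}:
print(P[3 - 1][2])   # 6, the number of rows of A_{2,3} in Example 3.1
```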

In [1], Algorithm 1 was stated to attain a lower bound for the number of additions and multiplications of the symmetric nested multiplication algorithm that is also presented there. Since Algorithm 2 contains only one addition and one multiplication, which are executed once per pass through its loops, this number can be determined by simply counting the loop passes:

$$\begin{aligned} \sum _{j=1}^k \sum _{i=0}^{d-1} \# A_{j,d}^{(i)} = \sum _{j=1}^k \# A_{j,d} = \# A_{k,d+1}-1 = {d+k \atopwithdelims ()k}-1. \end{aligned}$$

Theorem 3.1

The evaluation of \(f \in \Pi _k\), \(\deg (f)=k\), by Algorithm 2 requires \({d+k \atopwithdelims ()k}-1\) multiplications and the same number of additions, hence \(2{d+k \atopwithdelims ()k}-2\) operations.

This count of operations quantifies the computational cost of Algorithm 2. To compare the methods, we also analyze the algorithm given in [3] with the following result.

Theorem 3.2

The evaluation of \(f \in \Pi _k\), \(\deg (f)=k\), by the method given in [3] requires \({d+k+1 \atopwithdelims ()d}-1 = {d+k \atopwithdelims ()k}+{d+k \atopwithdelims ()d-1}-1\) additions and \({d+k \atopwithdelims ()k}-1\) multiplications, hence \(2{d+k \atopwithdelims ()k}-2+{d+k \atopwithdelims ()d-1}\) operations.
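
For instance, for \(d=3\) and \(k=10\), which is the largest case in the experiments of Sect. 5, Theorem 3.1 gives \({13 \atopwithdelims ()10}-1 = 285\) multiplications and \(285\) additions for Algorithm 2, while the method from [3] needs the same \(285\) multiplications but \({14 \atopwithdelims ()3}-1 = 363\) additions, i.e., \({13 \atopwithdelims ()2} = 78\) additional operations.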

In summary, the algorithm of Peña and Sauer requires \({d+k \atopwithdelims ()d-1}\) more operations than the algorithm of de Boor. To prove this statement we first observe that the evaluation method of Peña and Sauer is based on a different term ordering, for which purpose we need to modify the monomial matrices.

Definition 3.3

Let \({\mathbf {A}}_{k,d}^{rlex}\) be the matrix of all multiindices \(\alpha \in {\mathbb {N}}_0^d\) with \(|\alpha | \le k\), arranged in decreasing reverse lexicographical order.

Example 3.2

$$\begin{aligned} {\mathbf {A}}_{0,3}^{rlex} = \begin{bmatrix} 0&\quad 0&\quad 0 \end{bmatrix}, \quad {\mathbf {A}}_{1,3}^{rlex} = \begin{bmatrix} 0&\quad 0&\quad 1 \\ 0&\quad 1&\quad 0 \\ 1&\quad 0&\quad 0 \\ 0&\quad 0&\quad 0 \end{bmatrix}, \quad {\mathbf {A}}_{2,3}^{rlex} = \begin{bmatrix} 0&\quad 0&\quad 2 \\ 0&\quad 1&\quad 1 \\ 1&\quad 0&\quad 1 \\ 0&\quad 0&\quad 1 \\ 0&\quad 2&\quad 0 \\ 1&\quad 1&\quad 0 \\ 0&\quad 1&\quad 0 \\ 2&\quad 0&\quad 0 \\ 1&\quad 0&\quad 0 \\ 0&\quad 0&\quad 0 \end{bmatrix}. \end{aligned}$$

The matrix \({\mathbf {A}}_{k,d}^{rlex}\) is simply a reordered version of the stacked monomial matrix \({\mathbf {A}}_{k,d}^{glex}\). Hence, the size of \({\mathbf {A}}_{k,d}^{rlex}\) can be described as

$$\begin{aligned} \# {\mathbf {A}}_{k,d}^{rlex} = \# {\mathbf {A}}_{k,d}^{glex} = \sum _{j=0}^k \# A_{j,d} = \# A_{k,d+1} = {d+k \atopwithdelims ()k}. \end{aligned}$$

Based on this matrix we can formulate the evaluation method, denoting by \(f_0,\dots ,f_{M-1}\), \(M := \# {\mathbf {A}}_{k,d}^{rlex}\), the coefficients of \(f\) in reverse lexicographical order and by \({\mathbf {A}}_{k,d}^{rlex}(i,j)\) the \((i,j)\)th element of the modified monomial matrix.

Algorithm 3: the evaluation method of Peña and Sauer [3] in reverse lexicographical ordering (pseudocode omitted)

Proposition 3.1

The evaluation of \(f \in \Pi _k\) by Algorithm 3 requires \({d+k \atopwithdelims ()k}-1\) multiplications.

This is easy to see since the only multiplication in Algorithm 3 appears in line 5, and this statement is executed \(\# {\mathbf {A}}_{k,d}^{rlex}-1\) times. To count the additions we need some more preliminary work, as the number of additions performed in each step depends on the value of \(K \in \{ 1,\ldots ,d \}\).
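
The following Python sketch reproduces the matrix \({\mathbf {A}}_{2,3}^{rlex}\) of Example 3.2 together with the resulting values of \(K\), and thereby illustrates the count stated in Lemma 3.3 below; the function name and the brute-force enumeration are chosen here for illustration only.

```python
from itertools import product

def stacked_rlex(k, d):
    """All alpha in N_0^d with |alpha| <= k, in decreasing reverse lexicographical order."""
    alphas = [a for a in product(range(k + 1), repeat=d) if sum(a) <= k]
    # rlex compares coordinates from the last one backwards, so sort by the reversed tuples
    return sorted(alphas, key=lambda a: tuple(reversed(a)), reverse=True)

A = stacked_rlex(2, 3)
print(A)   # reproduces A^rlex_{2,3} of Example 3.2, from (0,0,2) down to (0,0,0)

# K: largest (1-based) position in which two consecutive rows differ (here d = 3)
K = [max(i + 1 for i in range(3) if A[n + 1][i] != A[n][i]) for n in range(len(A) - 1)]
print(K)   # [3, 2, 1, 3, 2, 1, 2, 1, 1]: the value 3 occurs C(2,1)=2 times,
           # 2 occurs C(3,2)=3 times and 1 occurs C(4,3)=4 times, as in Lemma 3.3
```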

Lemma 3.3

For \(f \in \Pi _k\) Algorithm 3 assigns \(K\) the value \((d-j+1)\), \(j=1,\ldots ,d\), exactly \({k+j-1 \atopwithdelims ()j}\) times.

To prove Lemma 3.3 we need a recursion formula for the matrices \({\mathbf {A}}_{k,d}^{rlex}\).

Lemma 3.4

For \(k, d \in {\mathbb {N}}\),

$$\begin{aligned} {\mathbf {A}}_{k,d}^{rlex} = \begin{bmatrix} \widetilde{{\mathbf {A}}_{k-1,d}^{rlex}} \\ \widetilde{{\mathbf {A}}_{k-1,d-1}^{rlex}}&0 \\ \vdots&\ddots \\ \widetilde{{\mathbf {A}}_{k-1,0}^{rlex}}&0&\cdots&0 \\ \end{bmatrix}, \quad \widetilde{{\mathbf {A}}_{k,j}^{rlex}} = {\mathbf {A}}_{k,j}^{rlex}+\begin{bmatrix} 0&\cdots&0&1 \\ \vdots&\vdots&\vdots \\ 0&\cdots&0&1 \end{bmatrix}, \; j=0,1,\dots ,d. \end{aligned}$$

Proof

The decreasing reverse lexicographical order partitions the matrix \({\mathbf {A}}_{k,d}^{rlex}\) into \(d+1\) blocks in the following way: The multiindex \(\alpha \in {\mathbf {A}}_{k,d}^{rlex}\) is a part of block \(j\), \(j=0,\dots ,d\), if and only if \(\alpha _{d-j} \ne 0 = \alpha _{d-j+1} = \cdots = \alpha _d\). Thus,

$$\begin{aligned} {\mathbf {A}}_{k,d}^{rlex} = \left[ \begin{array}{cccc} * &{}\quad \cdots &{}\quad \cdots &{}\quad * \\ * &{}\quad \cdots &{}\quad * &{}\quad 0 \\ \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ * &{}\quad 0 &{} \quad \cdots &{}\quad 0 \\ 0 &{}\quad 0 &{} \quad \cdots &{}\quad 0 \\ \end{array}\right] . \end{aligned}$$

Let \(\alpha \) be in the \((d-j+1)\)th block of \({\mathbf {A}}_{k,d}^{rlex}\), i.e., \(|\alpha | \le k\) and \(\alpha = (\widetilde{\alpha },0)\) with \(\widetilde{\alpha }= (\alpha _1,\ldots ,\alpha _j)\) so that \(|\widetilde{\alpha }| \le k\) and \(\alpha _j \ge 1\). This yields \((\widetilde{\alpha }- \varepsilon _j) \in {\mathbf {A}}_{k-1,j}^{rlex}\) and \(\widetilde{\alpha }\in \widetilde{{\mathbf {A}}_{k-1,j}^{rlex}}\) as well.

Hence, \(\widetilde{{\mathbf {A}}_{k-1,j}^{rlex}}\) and the \((d-j+1)\)th block of \({\mathbf {A}}_{k,d}^{rlex}\) contain exactly the same rows, and equality again follows from the fact that both are arranged in the same order, this time a decreasing reverse lexicographical one. \(\square \)

Proof (of Lemma 3.3)

We use induction on \(k\), where for \(k=1\) we have

$$\begin{aligned} {\mathbf {A}}_{1,d}^{rlex} = \begin{bmatrix} 0&\quad \cdots&\quad 0&\quad 1 \\ 0&\quad \cdots&\quad 1&\quad 0 \\ \vdots&\quad \ddots&\quad \ddots&\quad \vdots \\ 1&\quad 0&\quad \cdots&\quad 0 \\ 0&\quad 0&\quad \cdots&\quad 0 \end{bmatrix} \in {\mathbb {N}}_0^{(d+1) \times d}, \end{aligned}$$

hence, the assignment to \(K\) is

$$\begin{aligned} \max \{ 1 \le i \le d : {\mathbf {A}}_{1,d}^{rlex}(n+1,i) \ne {\mathbf {A}}_{1,d}^{rlex}(n,i) \} = d-n+1, \qquad n=1,\dots ,d. \end{aligned}$$

Therefore \(K\) takes on each value from \(\{ d-j+1 : j=1,\dots ,d \}\) exactly once.

Now suppose that the hypothesis holds for \((k-1)\), \(k \ge 2\). We consider \({\mathbf {A}}_{k,d}^{rlex}\) and the number of times the value \((d-j+1)\) is taken on by \(K\). By Lemma 3.4 we can count these values for each block of \({\mathbf {A}}_{k,d}^{rlex}\) separately, and it is sufficient to consider the first \(j\) blocks only: within the \((i+1)\)st block only the first \(d-i\) columns can change between consecutive rows, so that \(K \le d-i\) there. Since the value of \(K\) is not affected by adding a constant to an entire column, we can use the matrices \({\mathbf {A}}_{k-1,d-i}^{rlex}\), \(i=0,\dots ,j-1\), instead of the matrices \(\widetilde{{\mathbf {A}}_{k-1,d-i}^{rlex}}\). Then the induction hypothesis yields that \(K\) takes on the value \((d-j+1)=((d-i)-(j-i)+1)\) for

$$\begin{aligned} \sum _{i=0}^{j-1} {(k-1)+(j-i)-1 \atopwithdelims ()j-i} = \sum _{i=1}^{j} {(k-1)+i-1 \atopwithdelims ()i} \end{aligned}$$

times within the first \(j\) blocks.

Furthermore, we notice that the algorithm also sets \(K\) to \((d-j+1)\) by passing from block \(j\) to \(j+1\). Hence, we have to increase the counter by one:

$$\begin{aligned} 1+\sum _{i=1}^j {(k-1)+i-1 \atopwithdelims ()i} = \sum _{i=0}^j {(k-1) +i-1 \atopwithdelims ()i} = { k+j-1 \atopwithdelims ()j }. \end{aligned}$$

This completes the induction step and thus the proof. \(\square \)

Proposition 3.2

The evaluation of \(f \in \Pi _k\) by Algorithm 3 requires \({d+k+1 \atopwithdelims ()d}-1\) additions.

Proof

The algorithm computes

$$\begin{aligned} \sum _{j=1}^d {k+j-1 \atopwithdelims ()j} (d-j+1) \end{aligned}$$

additions in line 9. Besides that, we need \(d\) additions to create the output value:

$$\begin{aligned} d + \sum _{j=1}^d {k+j-1 \atopwithdelims ()j} (d-j+1)&= {k-1 \atopwithdelims ()0} \cdot (d+1) -1\\&\quad + \sum _{j=1}^d {k+j-1 \atopwithdelims ()j} (d-j+1) \\&= -1+\sum _{j=0}^d {k+j-1 \atopwithdelims ()j} (d-j+1) \\&= -1+{d+k+1 \atopwithdelims ()d}. \end{aligned}$$

\(\square \)

Now the results of Propositions 3.1 and 3.2 immediately yield Theorem 3.2.

4 Numerical stability

In this section we analyze the numerical stability of both methods. For that reason we need some notions from (backwards) error analysis which can be found in [2].

Definition 4.1

For \(a \in {\mathbb {R}}\) the expression \({{\mathrm{fl}}}(a)\) denotes the computed value in a floating point arithmetic. The computations in this arithmetic follow the standard model

$$\begin{aligned} {{\mathrm{fl}}}(a \cdot b) =: a \odot b = (a \cdot b)(1+\delta ), \quad |\delta | \le u, \quad \cdot = +,-,\times ,/, \end{aligned}$$

where \(u\) denotes the unit roundoff.

The following two results are standard methods to computationally handle roundoff errors in backwards error analysis.

Lemma 4.1

([2], p. 63) Let \(|\delta _i| \le u\), \(\rho _i = \pm 1\), \(i=1,\ldots ,n\) and \(nu < 1\), then

$$\begin{aligned} \prod _{i=1}^n (1+\delta _i)^{\rho _i} = 1+\theta _n, \end{aligned}$$

where

$$\begin{aligned} |\theta _n| \le \frac{nu}{1-nu} =: \gamma _n. \end{aligned}$$

Lemma 4.2

([2], p. 67) Let \(\theta _k\), \(\gamma _k\) be like above, then

  1. \((1+\theta _k)(1+\theta _j) = 1+\theta _{k+j},\)
  2. \(\gamma _k+\gamma _j+\gamma _k\gamma _j \le \gamma _{k+j},\)
  3. \(\gamma _k+u \le \gamma _{k+1}.\)

Finally, we introduce another simplifying notation from [2, p. 68], namely

$$\begin{aligned}{}[k] := \prod _{i=1}^k (1+\delta _i)^{\rho _i}, \quad \rho _i=\pm 1, \quad |\delta _i| \le u, \end{aligned}$$

which gives the convenient computational rule

$$\begin{aligned}{}[j][k] = [j+k]. \end{aligned}$$
(4.1)
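
For instance, a single fused step \(a \oplus (b \otimes c)\) of the standard model satisfies

$$\begin{aligned} a \oplus (b \otimes c) = \big ( a + bc(1+\delta _1) \big ) (1+\delta _2) = a[1] + bc[2], \end{aligned}$$

which is precisely the kind of bookkeeping used repeatedly in the proof of Theorem 4.1 below.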

In [3] the backwards error of Algorithm 3 was given, with \(m := \deg (f)\), as

$$\begin{aligned} \frac{| f_{\alpha } - \hat{f}_{\alpha } |}{|f_{\alpha }|} \le {\left\{ \begin{array}{ll} \gamma _{2|\alpha |+d}, &{}\quad |\alpha | < m, \\ \gamma _{2|\alpha |+d-1}, &{}\quad |\alpha | = m. \end{array}\right. } \end{aligned}$$

For comparison, we now perform the backwards error analysis of de Boor’s method, Algorithm 1, which turns out to perform even slightly better.

Theorem 4.1

For \(f(x) = \sum _{|\alpha | \le m} f_{\alpha }x^{\alpha }\) suppose that \(m := \deg (f)\) satisfies \((2m+d-1)u < 1\) and denote by \({{\mathrm{fl}}}(f(x)) = \sum _{\alpha \in {\mathbb {N}}_0^d} \hat{f}_{\alpha } x^{\alpha }\) the computed result of Algorithm 1. Then

$$\begin{aligned} \frac{|f_{\alpha }-\hat{f}_{\alpha }|}{|f_{\alpha }|} \le {\left\{ \begin{array}{ll} \gamma _{2|\alpha |+d}, &{}\quad |\alpha | <m, \\ \gamma _{2|\alpha |+d-\mu (\alpha )}, &{}\quad |\alpha | = m, \end{array}\right. } \end{aligned}$$

where \(\mu (\alpha ) := \min \{ j : \alpha _j \ne 0 \}\).

Proof

We proceed by induction on \(m = \deg (f)\). For \(m=1\) let \(f(x) = f_0+f_{\varepsilon _1}x^{\varepsilon _1} + \cdots + f_{\varepsilon _d}x^{\varepsilon _d}\) where \(\varepsilon _j \in {\mathbb {N}}_0^d\), \(j=1,\dots ,d\), denotes the \(j\)th unit vector. The evaluation in the floating point model can be written by means of (4.1) as

$$\begin{aligned} {{\mathrm{fl}}}(f(x))&= f_0 \oplus (f_{\varepsilon _1} \otimes x^{\varepsilon _1}) \oplus (f_{\varepsilon _2} \otimes x^{\varepsilon _2}) \oplus \cdots \oplus (f_{\varepsilon _d} \otimes x^{\varepsilon _d}) \\&= \Big ( \big ( (f_0 \oplus f_{\varepsilon _1}x^{\varepsilon _1}[1]) \oplus f_{\varepsilon _2}x^{\varepsilon _2}[1] \big ) \oplus \cdots \Big ) \oplus (f_{\varepsilon _d}x^{\varepsilon _d}[1]) \\&= \Big ( \big ( (f_0[1] + f_{\varepsilon _1}x^{\varepsilon _1}[2]) \oplus f_{\varepsilon _2}x^{\varepsilon _2}[1] \big ) \oplus \cdots \Big ) \oplus (f_{\varepsilon _d}x^{\varepsilon _d}[1]) \\&= \Big ( \big ( f_0[2] + f_{\varepsilon _1}x^{\varepsilon _1}[3] + f_{\varepsilon _2}x^{\varepsilon _2}[2] \big ) \oplus \cdots \Big ) \oplus (f_{\varepsilon _d}x^{\varepsilon _d}[1]) \\&= \cdots \\&=\! \Big ( f_0[d\!-\!1] \!+\! f_{\varepsilon _1}x^{\varepsilon _1}[d] \!+\! f_{\varepsilon _2}x^{\varepsilon _2}[d\!-\!1] \!+\! \cdots \!+\! f_{\varepsilon _{d\!-\!1}}x^{\varepsilon _{d\!-\!1}}[2] \Big ) \oplus (f_{\varepsilon _d}x^{\varepsilon _d}[1]) \\&= f_0[d] + f_{\varepsilon _1}x^{\varepsilon _1}[d+1] + f_{\varepsilon _2}x^{\varepsilon _2}[d] + \cdots + f_{\varepsilon _d}x^{\varepsilon _d}[2] \\&=: \hat{f}_0 + \hat{f}_{\varepsilon _1} x^{\varepsilon _1} + \hat{f}_{\varepsilon _2} x^{\varepsilon _2} + \cdots + \hat{f}_{\varepsilon _d} x^{\varepsilon _d}, \end{aligned}$$

and a comparison of the coefficients yields

$$\begin{aligned} \hat{f}_0 = f_0[d] = f_0(1+\theta _{d,0}), \end{aligned}$$

and

$$\begin{aligned} \hat{f}_{\varepsilon _j} = f_{\varepsilon _j}[2+d-j] = f_{\varepsilon _j}(1+\theta _{2|\varepsilon _j|+d-\mu (\varepsilon _j),\varepsilon _j}). \end{aligned}$$

Now suppose that the claim is proved for \((m-1)\), \(m \ge 2\), and let \(f(x) = \sum _{|\alpha | \le m} f_{\alpha }x^{\alpha }\). We perform one iteration step of Algorithm 1 and compute new coefficients

$$\begin{aligned} \widetilde{f}_{\alpha } = {\left\{ \begin{array}{ll} f_{\alpha }, &{}\quad |\alpha | < m-1, \\ f_{\alpha }+\sum _{i=1}^{\mu (\alpha )} f_{\alpha +\varepsilon _i}x^{\varepsilon _i}, &{}\quad |\alpha | = m-1,\\ 0, &{}\quad |\alpha | = m, \end{array}\right. } \end{aligned}$$

without changing the value of \(f(x)\). Therefore we can consider \(f\) to be a polynomial of degree \((m-1)\),

$$\begin{aligned} f(x) = \sum _{|\alpha | \le m} f_{\alpha }x^{\alpha } = \sum _{|\alpha | \le m-1} \widetilde{f}_{\alpha }x^{\alpha }, \end{aligned}$$

and apply the induction hypothesis

$$\begin{aligned} {{\mathrm{fl}}}(f(x))&= \sum _{|\alpha | \le m-2} {{\mathrm{fl}}}(\widetilde{f}_{\alpha }) x^{\alpha } [2|\alpha |+d] + \sum _{|\alpha | = m-1} {{\mathrm{fl}}}(\widetilde{f}_{\alpha }) x^{\alpha } [2|\alpha |+d-\mu (\alpha )] \\&= \sum _{|\alpha | \le m-2} f_{\alpha } x^{\alpha } [2|\alpha |+d] \\&\quad +\! \sum _{|\alpha | \!=\! m-1} \Big ( f_{\alpha }[\mu (\alpha )]+\sum _{i=1}^{\mu (\alpha )} f_{\alpha +\varepsilon _i}x^{\varepsilon _i}[2+\mu (\alpha )-i] \Big ) x^{\alpha } [2|\alpha |+d-\mu (\alpha )] \\&= \sum _{|\alpha | \le m-2} f_{\alpha } x^{\alpha } [2|\alpha |+d] + \sum _{|\alpha | = m-1} f_{\alpha }x^{\alpha }[2|\alpha |+d] \\&\quad + \sum _{i=1}^{\mu (\alpha )} f_{\alpha +\varepsilon _i}x^{\alpha +\varepsilon _i}[2(|\alpha |+1)+d-i]\\&= \sum _{|\alpha | \le m-1} f_{\alpha } x^{\alpha } [2|\alpha |+d] + \sum _{|\alpha | = m-1} \sum _{i=1}^{\mu (\alpha )} f_{\alpha +\varepsilon _i}x^{\alpha +\varepsilon _i}[2(|\alpha |+1)+d-i]\\&= \sum _{|\alpha | \le m-1} f_{\alpha } x^{\alpha } [2|\alpha |+d] + \sum _{|\alpha | = m} f_{\alpha }x^{\alpha }[2|\alpha |+d-\mu (\alpha )] =: \sum _{|\alpha | \le m} \hat{f}_{\alpha } x^{\alpha }. \end{aligned}$$

The last identity holds because \(\mu (\alpha +\varepsilon _i)=i\), \(i=1, \dots , \mu (\alpha )\). Comparison of coefficients again yields

$$\begin{aligned} \hat{f}_{\alpha } = f_{\alpha }[2|\alpha |+d] = f_{\alpha }(1+\theta _{(2|\alpha |+d),\alpha }), \quad |\alpha | \le m-1, \end{aligned}$$

and

$$\begin{aligned} \hat{f}_{\alpha } = f_{\alpha }[2|\alpha |+d-\mu (\alpha )] = f_{\alpha }(1+\theta _{(2|\alpha |+d-\mu (\alpha )),\alpha }), \quad |\alpha | = m. \end{aligned}$$

This completes the induction step and thus the proof. \(\square \)
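
Compared with the estimate from [3] quoted above, the two bounds coincide for \(|\alpha | < m\) and for top-degree coefficients with \(\alpha _1 \ne 0\), while for the remaining top-degree coefficients de Boor’s method gains \(\mu (\alpha )-1\) in the index of \(\gamma \). For example, for \(d=3\), \(m=10\) and \(\alpha = (0,0,10)\) the bound improves from \(\gamma _{22}\) to \(\gamma _{20}\).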

In [3] the backwards error estimates have been used to also derive a forward error bound. One can copy these methods almost verbatim and use the absolute and relative condition numbers

$$\begin{aligned} \kappa _a (f,x) := \sum _{|\alpha | \le m} | f_\alpha \, x^\alpha |, \qquad \kappa _r (f,x) := \dfrac{\sum _{|\alpha | \le m} | f_\alpha \, x^\alpha |}{| \sum _{|\alpha | \le m} f_\alpha \, x^\alpha |}, \end{aligned}$$

respectively. It then follows that, in the case of de Boor’s method, whenever \(f(x) \ne 0\) we have

$$\begin{aligned} \frac{|f(x) - \hat{f} (x)|}{| f(x) |} \le \gamma _{2m+d} \, \kappa _r (f,x). \end{aligned}$$

It is also possible to give global estimates for the forward error of the evaluation on a bounded domain; again we refer to [3], where this issue is discussed in detail. Here, however, lies the only disadvantage of de Boor’s method: since it is designed for the monomial basis, the improved backwards stability is compensated to some extent by the higher condition number of the monomials.

5 Design aspects and numerical experiments

In this section we will show how the shortcomings of Peña and Sauer’s variant affect practical applications. To this end, Algorithms 2 and 3 have been implemented in GNU Octave and applied to the following scenario: let \(d=3\) and \(f_k \in \Pi _k\), \(\deg (f_k) = k\), \(k=0,\ldots ,10\). These polynomials have been evaluated at the point \(\xi = (1,2,3)\) by the methods described above, and the computation time has been measured for each polynomial. The results are given in Fig. 1.
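
The measurements themselves were made in GNU Octave. Purely as an illustration of the setup, and not as the code actually used for Fig. 1, a corresponding timing loop could be sketched in Python as follows; the evaluation routines eval_deboor and eval_pena_sauer are hypothetical placeholders for implementations of Algorithms 2 and 3.

```python
import random
import time
from math import comb

def average_time(evaluate, coeffs, xi, repeats=1000):
    """Average wall-clock time of evaluate(coeffs, xi) over a number of repetitions."""
    start = time.perf_counter()
    for _ in range(repeats):
        evaluate(coeffs, xi)
    return (time.perf_counter() - start) / repeats

d, xi = 3, (1.0, 2.0, 3.0)
for k in range(11):
    # dense polynomial of total degree k in d = 3 variables: C(d+k, k) random coefficients
    coeffs = [random.uniform(-1.0, 1.0) for _ in range(comb(d + k, k))]
    # t2 = average_time(eval_deboor, coeffs, xi)       # Algorithm 2 (hypothetical routine)
    # t3 = average_time(eval_pena_sauer, coeffs, xi)   # Algorithm 3 (hypothetical routine)
```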

Fig. 1 Computation time of polynomial evaluation in \(d=3\) variables by Algorithms 2 and 3

This experiment shows that for \(d=3\) and \(k=10\) the evaluation method by de Boor is about ten times faster than the variant by Peña and Sauer. This becomes even more significant if many evaluations have to be performed, e.g., when setting up Vandermonde matrices.

But there is another important point that is not taken into account by the results above: both algorithms are tied to a specific coefficient ordering. To avoid costly order conversions, the design of any algorithm working with multivariate polynomials should take into account the compatibility of the evaluation method with the coefficient ordering used elsewhere.

6 Summary

We have shown that de Boor’s variant of the multivariate Horner scheme outperforms the one by Peña and Sauer in terms of computational effort and backwards stability. Even if these differences are minor ones, they are “measurable”. The only disadvantage, if any, of de Boor’s approach is that it is tied to the monomial basis while the approach from [3] allows for a more general extension in the sense of (2.5) and thus for bases with a better condition number and hence a better forward error estimate.