1 Introduction

The goal of the original Prony method is to compute a parameter estimation for a finite univariate exponential sum

$$\begin{aligned} f(x) = \sum _{j=1}^M f_j \, e^{\omega _j x}, \end{aligned}$$
(1)

from sampled values \(f(x_j)\), \(j=1,\dots ,N\), where neither the frequencies \(\omega _1,\dots ,\omega _M\) nor the coefficients \(f_1,\dots ,f_M\) are known. By a simple but ingenious trick [34], due to Gaspard Riche, Baron de Prony, in 1795, the frequencies can be determined by computing the kernel of a certain Hankel matrix and then finding the zeros of a polynomial whose coefficients are formed from the kernel vector; afterwards, the coefficients can be determined easily by solving a linear system.
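To make this classical procedure concrete, the following minimal Octave sketch runs through the kernel–roots–linear-system pipeline just described; the function name prony1d is ours, and it assumes that the number \(M\) of terms is known, that the samples \(f(0),\dots ,f(2M-1)\) are available, and that the data are generic enough for the kernel to be one dimensional.

```
function [omega, coeff] = prony1d (fs, M)
  % fs : samples f(0),...,f(2M-1);  M : number of exponential terms
  fs = fs(:);
  H  = hankel (fs(1:M), fs(M:2*M));   % Hankel matrix H(j,k) = f(j+k-2)
  p  = null (H);                      % kernel vector = Prony polynomial coefficients
  p  = p(:, 1);
  z  = roots (flipud (p));            % its zeros are z_j = exp(omega_j)
  omega = log (z);
  V  = z(:).' .^ ((0:2*M-1).');       % Vandermonde matrix V(k,j) = z_j^(k-1)
  coeff = V \ fs;                     % coefficients from a linear least squares fit
endfunction
```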

Recently, Prony’s method has gained new interest in the context of sparsity, where (1) can be interpreted as a signal f that consists of a moderate number of simple oscillations and hence permits a sparse representation once the frequencies and the coefficients are known.

The recent survey [28] connects Prony’s method and its variations to sparsity problems and also refers to recent developments, for example [29]. Here, we aim at extending the method to the multivariate case. A first step in this direction has been made in [31], but it uses projections onto single variables; in contrast, this paper proceeds differently by pointing out and using the strong relationship between Prony’s method and multivariate polynomial interpolation through constructive ideal theory. In the end, this leads to an algorithm that, though it solves a nonlinear problem, relies entirely on procedures from Linear Algebra, in particular on orthogonal projections, which results in a relatively stable method; nonlinearity only enters when certain eigenvalue problems for a commuting family of matrices have to be solved. These multiplication tables modulo an ideal for a given normal form space are the natural generalization of the Frobenius companion matrix, whose role in the numerical realization of Prony’s method is well known, cf. [28].

In one variable, Prony’s method in its simplest version consists of determining the coefficients of the so-called Prony polynomial as a zero eigenvector of a certain Hankel matrix formed by samples of the function and then finding the zeros of this polynomial, which are \(e^\omega \), \(\omega \in \Omega \). In the multivariate case the kernel of the Hankel matrix is an ideal and the eigenvalues have to be determined for the operation of multiplication modulo this ideal. Computing such multiplication tables requires a well-defined analog of Euclidean division, a problem that actually triggered the invention of Gröbner bases in [4]. We will study these multivariate algebraic issues, which are closely related to minimal degree interpolation, and derive an algorithm that extends the idea of finding a “Prony ideal” and determines the frequencies by means of generalized companion matrices. Even though the algorithm is of an algebraic nature, the choice of orthogonal H-bases makes it possible to implement it in a floating point environment relying on standard procedures from Numerical Linear Algebra like SVD and QR factorizations.

The paper is organized as follows: in Sect. 2, we recall the necessary tools from computational numerical ideal theory and show how to apply them to Prony’s problem. This leads to an algorithm which works entirely on vectors and matrices and can be implemented in quite a straightforward fashion in Matlab or Octave [10]. To illustrate the main ideas of the concept, we take a more careful look at the simplest possible case in Sect. 3 and show some of the results obtained by the numerical implementation. A short remark on how this can be applied to reconstruct sparse polynomials from sampling and a short summary conclude the paper.

2 Basic concepts and first solution

The problem to be solved by Prony’s method in several variables is still easy to state: for a finite set \(\emptyset \ne \Omega \subset {\mathbb {C}}^s\) of frequencies and coefficients \(f_\omega \in {\mathbb {C}}{\setminus }\{0 \}\), the goal is to reconstruct a function

$$\begin{aligned} f = \sum _{\omega \in \Omega } f_\omega \, e^{\omega ^T \cdot }, \quad f_\omega \ne 0, \, \quad \omega \in \Omega , \end{aligned}$$
(2)

from measurements of f, i.e., from point evaluations f(z), \(z \in Z\), where \(Z \subset {\mathbb {C}}^s\) has to be a finite set as well. In the classical version that we consider first, Z will even be a subset of the grid \({\mathbb {Z}}^s\).

Since the function f does not change if \(\omega \) is replaced by \(\omega + 2 i \pi \alpha \), \(\alpha \in {\mathbb {Z}}^s\), the frequencies have to be different modulo \(2 i \pi {\mathbb {Z}}^s\) so that the imaginary parts can be restricted, for example, to \([0,2\pi )^s\). In other words, we have to choose \(\Omega \) as a finite subset of \(( {\mathbb {R}}+ i {\mathbb {T}})^s\) where \({\mathbb {T}}:= {\mathbb {R}}/ 2 \pi {\mathbb {Z}}\).

To obtain an extension of Prony’s approach to the multivariate case, we fix some notation. For notational simplicity, we will restrict ourselves to the real case, i.e., \(\Omega \subset {\mathbb {R}}^s\), but the method can be extended to the complex field in a straightforward manner by adding complex conjugation where needed. In fact, the “real” implementation of the algorithm in Octave works for complex data even without any changes.

By \(\varPi = {\mathbb {R}}[z] = {\mathbb {R}}[z_1,\dots ,z_s]\) we denote the ring of all polynomials over the field \({\mathbb {R}}\) of reals, i.e., all polynomials with real coefficients. With the usual multiindex notation, where

$$\begin{aligned} z^\alpha = z_1^{\alpha _1} \cdots z_s^{\alpha _s}, \quad |\alpha | = \alpha _1 + \cdots + \alpha _s, \end{aligned}$$

for a given multiindex \(\alpha \in {\mathbb {N}}_0^s\), we denote by

$$\begin{aligned} \varPi _n := \left\{ p(z) = \sum _{|\alpha | \le n} p_\alpha \, z^\alpha : p_\alpha \in {\mathbb {R}}\right\} , \quad n \in {\mathbb {N}}_0, \end{aligned}$$

the vector space of all polynomials of total degree

$$\begin{aligned} \deg p := \max \{ |\alpha | : p_\alpha \ne 0 \} \end{aligned}$$

at most n. The dimension of this vector space will be abbreviated as \(d_n := { n+s \atopwithdelims ()s}\). Moreover, we will write

$$\begin{aligned} \varPi _n^0 := \left\{ p(z) = \sum _{|\alpha | = n} p_\alpha \, z^\alpha : p_\alpha \in {\mathbb {R}}\right\} , \quad n \in {\mathbb {N}}_0, \end{aligned}$$

for the homogeneous polynomials or forms of degree n, a space of dimension \(d_n^0 := { n+s-1 \atopwithdelims ()s-1 }\), and denote by \(\Lambda :\varPi \rightarrow \varPi _{\deg p}^0\) the mapping that extracts the leading form of a polynomial:

$$\begin{aligned} p (z) = \sum _{|\alpha | \le \deg p} p_\alpha \, z^\alpha \quad \Rightarrow \quad \Lambda (p)(z) = \sum _{|\alpha | = \deg p} p_\alpha \, z^\alpha . \end{aligned}$$

2.1 Kernels and ideals

The fundamental tool is the (multidimensional) Hankel matrix

$$\begin{aligned} {\mathbf {F}}_n := \left[ f(\alpha +\beta ) : \begin{array}{l} |\alpha | \le n \\ |\beta | \le n \end{array} \right] \in {\mathbb {R}}^{d_n \times d_n}, \quad n \in {\mathbb {N}}. \end{aligned}$$
(3)
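For illustration only, the Hankel matrix (3) can be assembled in Octave along the following lines; the helper names multiindices and hankel_nd are ours, and fh stands for a function handle that returns the sample \(f(\alpha )\) for a row vector \(\alpha \), e.g., fh = @(a) sum (fcoef .* exp (Omega * a(:))) for an assumed frequency matrix Omega (one frequency per row) and coefficient column vector fcoef.

```
function A = multiindices (n, s)
  % rows of A: all alpha in N_0^s with |alpha| <= n (any fixed order will do)
  if s == 1
    A = (0:n).';
  else
    A = zeros (0, s);
    for k = 0:n
      B = multiindices (n - k, s - 1);
      A = [A; k * ones(rows (B), 1), B];
    endfor
  endif
endfunction

function F = hankel_nd (fh, n, s)
  % multidimensional Hankel matrix, F(alpha,beta) = f(alpha+beta), cf. (3)
  A = multiindices (n, s);
  d = rows (A);                       % d = d_n = nchoosek (n + s, s)
  F = zeros (d, d);
  for i = 1:d
    for j = 1:d
      F(i, j) = fh (A(i,:) + A(j,:));
    endfor
  endfor
endfunction
```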

For any polynomial \(p \in \varPi _n\), \(p(z) = \sum p_\alpha z^\alpha \), we write its coefficient vector as \({\mathbf {p}}= \left[ p_\alpha : |\alpha | \le n \right] \) and obtain for \(|\alpha | \le n\) that

$$\begin{aligned} \left( {\mathbf {F}}_n {\mathbf {p}}\right) _\alpha&= \sum _{|\beta | \le n} f(\alpha +\beta ) \, p_\beta = \sum _{|\beta | \le n} \sum _{\omega \in \Omega } f_\omega \, e^{\omega ^T (\alpha + \beta )} p_\beta = \sum _{\omega \in \Omega } f_\omega e^{\omega ^T \alpha } \, \sum _{|\beta | \le n} p_\beta \, e^{\omega ^T \beta } \\&= \sum _{\omega \in \Omega } f_\omega e^{\omega ^T \alpha } \, p( e^\omega ). \end{aligned}$$

For abbreviation we set \(z_\omega := e^{\omega } = \left( e^{\omega _1},\ldots ,e^{\omega _s} \right) \) as well as \(Z_\Omega := e^\Omega = \{ z_\omega : \omega \in \Omega \}\) and then observe that by the above simple computation the zero dimensional ideal

$$\begin{aligned} I_\Omega := \{ p \in \varPi : p(z_\omega ) = 0, \, \omega \in \Omega \} = I (Z_\Omega ) \end{aligned}$$

plays an important role that can be stated as follows.

Lemma 1

If \(p \in I_\Omega \cap \varPi _n\) then \({\mathbf {F}}_n {\mathbf {p}}= 0\).

Corollary 1

For \(n \in {\mathbb {N}}\) we have \( \dim \ker {\mathbf {F}}_n \ge \dim ( I_\Omega \cap \varPi _n ). \)

In general, the converse of Lemma 1 does not hold true. This is most easily seen for \(n = 0\), where \(p = 1\) yields \({\mathbf {p}}= 1\) and

$$\begin{aligned} {\mathbf {F}}_0 {\mathbf {p}}= \sum _{\omega \in \Omega } f_\omega . \end{aligned}$$

Hence, if the coefficients of the unknown function happen to sum to zero, then the ideal structure in \(\varPi _0\) cannot be recovered from information on \({\mathbf {F}}_0\) alone.

2.2 Ideals, bases and interpolation

For a converse of Lemma 1 under some additional restrictions, we have a closer look at the ideal \(I_\Omega \) from the point of view of multivariate polynomial interpolation, cf. [11, 38, 39]. To that end, we will construct and use an H-basis H for the ideal \(I_\Omega \). Recall that an H-basis for an ideal \(I \subset \varPi \) is a finite set \(H \subset \varPi \) such that

$$\begin{aligned} p \in I \quad \Leftrightarrow \quad p = \sum _{h \in H} p_h \, h, \quad \deg p_h + \deg h \le \deg p, \, h \in H, \end{aligned}$$
(4)

for properly chosen polynomials \(p_h\). The important point of an H-basis is the non-redundant representation of p in (4) by means of

$$\begin{aligned} \left\langle {H} \right\rangle = \left\{ \sum _{h \in H} p_h \, h : p_h \in \varPi \right\} , \end{aligned}$$

the ideal generated by H with respect to the total degree: no summand on the right hand side of (4) has a larger total degree than p and therefore there is no cancellation of redundant terms of higher degree in the sum.

H-bases were already introduced by Macaulay in [20] and studied by Gröbner [14, 15] especially in the context of homogenization and dehomogenization, see also [22]. H-bases without term orders were investigated in [37], but in terms of more conventional Computer Algebra any Gröbner basis with respect to a graded term order, i.e., any term order \(\prec \) such that \(|\alpha | < |\beta |\) implies \(\alpha \prec \beta \), is also an H-basis, cf. [7].

Let \((\cdot ,\cdot ):\varPi \times \varPi \rightarrow {\mathbb {R}}\) denote the inner product

$$\begin{aligned} (p,q) = \sum _{\alpha \in {\mathbb {N}}_0^s} p_\alpha \, q_\alpha , \end{aligned}$$
(5)

where

$$\begin{aligned} p(z) = \sum _{|\alpha | \le \deg p} p_\alpha z^\alpha , \quad q(z) = \sum _{|\alpha | \le \deg q} q_\alpha z^\alpha . \end{aligned}$$

As shown in [37], there exists, for any ideal \(I \subset \varPi \), an H-basis H of I such that any polynomial \(p \in \varPi \) can be written as

$$\begin{aligned} p = \sum _{h \in H} p_h h + r, \quad \deg r \le \deg p, \end{aligned}$$
(6)

where the remainder r is orthogonal to the ideal in the sense that any homogeneous component

$$\begin{aligned} r_k^0 (z) := \sum _{|\alpha | = k} r_\alpha z^\alpha \in \varPi _k^0, \quad k=0,\dots ,\deg r, \end{aligned}$$

of r is orthogonal to all leading terms in the ideal:

$$\begin{aligned} 0 = \left( r_k^0,\Lambda (p) \right) , \quad p \in I_\Omega \cap \varPi _k, \quad k=0,\dots ,\deg r. \end{aligned}$$
(7)

The remainder r can be computed in an efficient and numerically stable way by the orthogonal reduction process introduced in [37], see [23, 24] for more algorithmic and numerical details. We briefly recall this process, which will be adapted to our specific needs here. For a given finite set \(H \subset \varPi \) of polynomials, which does not necessarily have to be an H-basis, and \(k := \deg p\), one considers the homogeneous subspace

$$\begin{aligned} V_k (H) := \left\{ \sum _{h \in H} q_h^0 \Lambda (h) : q_h^0 \in \varPi _{k - \deg h}^0 \right\} \subset \varPi _k^0 \end{aligned}$$

and computes an orthogonal projection of \(\Lambda (p)\) onto this vector space, i.e., chooses particular polynomials \(q_{k,h}^0 = q_{k,h}^0 (p) \in \varPi _{k - \deg h}^0\), \(h \in H\), depending on p such that

$$\begin{aligned} ( r_k^0, V_k (H) ) = 0, \quad r_k^0 := r_k^0 (p) := \Lambda (p) - \sum _{h \in H} q_{k,h}^0 \Lambda (h). \end{aligned}$$

Then one replaces p by

$$\begin{aligned} p - \sum _{h \in H} q_{k,h}^0 \, h - r_k^0, \end{aligned}$$

which eliminates \(\Lambda (p)\) and thus reduces the total degree of p by at least one. After repeating this process at most \(\deg p\) times, we end up with a decomposition

$$\begin{aligned} p = \sum _{h \in H} \sum _{k=0}^{\deg p} q_{k,h}^0 \, h + \sum _{k=0}^{\deg p} r_k^0 (p) =: \sum _{h \in H} p_h \, h + r, \end{aligned}$$
(8)

where, by construction, each homogeneous component of r is in the orthogonal complement of the respective \(V_k (H)\). In general, however, the remainder depends on H and on the particular way in which the orthogonal projections, i.e., the polynomials \(q_{k,h}^0 (p)\), are chosen in each step, but things simplify significantly once H is an H-basis for \(\left\langle {H} \right\rangle \).

Theorem 1

([37]) If H is an H-basis for \(\left\langle {H} \right\rangle \) then the remainder r computed by reduction depends only on \(\left\langle {H} \right\rangle \) and the choice of the inner product and is zero iff \(p \in \left\langle {H} \right\rangle \).

Consequently, \(\nu (p) := r\) is a normal form for p modulo the ideal \(\left\langle {H} \right\rangle \) whenever H is an H-basis. If we consider \(\nu :\varPi \rightarrow \varPi \), \(p \mapsto \nu (p)\), as a mapping, its image, the linear space \(N := \nu (\varPi ) \subset \varPi \) is a canonical interpolation space. With the specific canonical choice (5) of the inner product, the normal form space is the Macaulay inverse system, as it was named in [14, 15].

Theorem 2

The normal form space \(N := \nu (\varPi )\) has dimension \(\# \Omega \) and is a degree reducing interpolation space for \(Z_\Omega = e^\Omega \), i.e., for any \(p \in \varPi \) there exists a unique \(r \in N\) such that

$$\begin{aligned} r(z_\omega ) = p (z_\omega ), \quad \omega \in \Omega , \quad \text{ and } \quad \deg r \le \deg p. \end{aligned}$$
(9)

Moreover, we have the direct sum decompositions

$$\begin{aligned} \varPi _n = ( N \cap \varPi _n ) \oplus ( I_\Omega \cap \varPi _n ). \end{aligned}$$
(10)

To make the reader a bit more acquainted with the simple arguments behind this theorem, we give a short proof.

Proof

Since \(\nu (p) = \nu (p')\) whenever \(p-p' \in \left\langle {H} \right\rangle \), cf. [37], the mapping \(\nu \) is indeed well-defined and the interpolation property

$$\begin{aligned} p (z_\omega ) = \sum _{h \in H} p_h (z_\omega ) \, h (z_\omega ) + \nu (p) (z_\omega ) = \nu (p) (z_\omega ), \quad \omega \in \Omega , \end{aligned}$$

as well as the degree reduction follow directly from (6). If there were two interpolants \(r,r'\) for p in N, then \(r - r' \in I_\Omega \cap N = \{ 0 \}\), hence the interpolant is unique and therefore \(\dim N = {\mathop {codim}\nolimits \,}I_\Omega = \# \Omega \), see also [3]. Finally, (10) follows from the more general homogeneous formula

$$\begin{aligned} \varPi _n^0 = \left( \Lambda (N) \cap \varPi _n^0 \right) \oplus \left( \Lambda (I_\Omega ) \cap \varPi _n^0 \right) , \quad n \in {\mathbb {N}}_0, \end{aligned}$$
(11)

which has been proved in a wider context in [39, Theorem 5.11]. \(\square \)

Since N is a finite dimensional space, it has a finite basis, the most common one being the Lagrange fundamental polynomials \(\ell _\omega \in N\), \(\omega \in \Omega \), defined by

$$\begin{aligned} \ell _\omega (z_{\omega '}) = \delta _{\omega ,\omega '}, \quad \omega ,\omega ' \in \Omega . \end{aligned}$$
(12)

These polynomials can be given explicitly as

$$\begin{aligned} \ell _\omega = \nu \left( \prod _{\omega ' \in \Omega {\setminus }\{ \omega \}} \frac{( \cdot - z_{\omega '} )^T ( z_\omega - z_{\omega '} )}{( z_\omega - z_{\omega '} )^T ( z_\omega - z_{\omega '} )} \right) , \quad \omega \in \Omega , \end{aligned}$$
(13)

and obviously form a basis of N. Thus,

$$\begin{aligned} \deg N := \max \{ \deg r : r \in N \} = \max \{ \deg \ell _\omega : \omega \in \Omega \} \end{aligned}$$

is a well defined number.

2.3 Back to kernels

Based on the concepts above, we can now give the converse of Lemma 1 under the additional requirement that n is sufficiently large.

Theorem 3

If \(n \ge \deg N\) then

$$\begin{aligned} {\mathbf {F}}_n {\mathbf {p}}= 0 \quad \Leftrightarrow \quad p \in ( I_\Omega \cap \varPi _n ). \end{aligned}$$
(14)

Proof

The direction “\(\Leftarrow \)” has already been shown in Lemma 1. For the converse, we obtain for the coefficient vectors \({\mathbf {\ell }}_\omega \) of the polynomials \(\ell _\omega \), \(\omega \in \Omega \) that

$$\begin{aligned} \left( {\mathbf {F}}_n \, {\mathbf {\ell }}_\omega \right) _\alpha = \sum _{\omega ' \in \Omega } f_{\omega '} e^{\alpha ^T \omega '} \, \ell _{\omega } (z_{\omega '}) = f_\omega e^{\alpha ^T \omega } = f_\omega \, z_\omega ^\alpha , \end{aligned}$$

hence

$$\begin{aligned} {\mathbf {F}}_n {\mathbf {\ell }}_\omega = f_\omega {\mathbf {v}}_\omega ^n \ne 0, \quad {\mathbf {v}}_\omega ^n := \left[ z_\omega ^\alpha : |\alpha | \le n \right] . \end{aligned}$$

Since we can write any \(r \in N{\setminus }\{ 0 \}\) as

$$\begin{aligned} r = \sum _{\omega \in \Omega } r(z_\omega ) \, \ell _\omega , \quad [ r(z_\omega ) : \omega \in \Omega ] \ne 0, \end{aligned}$$

we can conclude that

$$\begin{aligned} {\mathbf {F}}_n {\mathbf {r}}= \sum _{\omega \in \Omega } f_\omega \, r (z_\omega ) \, {\mathbf {v}}_\omega ^n \ne 0, \end{aligned}$$

since the vectors \({\mathbf {v}}_\omega ^n\) are the rows of the Vandermonde matrix

$$\begin{aligned} {\mathbf {V}}_n (\Omega ) := \left[ z_\omega ^\alpha : \begin{array}{c} \omega \in \Omega \\ |\alpha | \le n \end{array} \right] \in {\mathbb {R}}^{\# \Omega \times d_n}, \end{aligned}$$

which has rank \(\# \Omega \), yielding that the vectors \({\mathbf {v}}_\omega ^n\) are linearly independent. \(\square \)

In \(\varPi _{\# \Omega -1}\), polynomial interpolation at the sites \(z_\omega \) is always possible, for example by means of Kergin interpolation, cf. [21], or simply by noting that, similar to (13), the polynomial

$$\begin{aligned} \sum _{\omega \in \Omega } p(z_\omega ) \, \prod _{\omega ' \in \Omega {\setminus }\{ \omega \}} \frac{( \cdot - z_{\omega '} )^T ( z_\omega - z_{\omega '})}{( z_\omega - z_{\omega '} )^T ( z_\omega - z_{\omega '} )} \in \varPi _{\# \Omega - 1} \end{aligned}$$

interpolates p at \(Z_\Omega = e^\Omega \). Note, however, that \(\# \Omega \) is usually much larger than \(\deg N\), as in the generic case we have the relationship that

$$\begin{aligned} {\deg N + s \atopwithdelims ()s} \le \# \Omega < {\deg N + s + 1 \atopwithdelims ()s}. \end{aligned}$$

If we assume, as in the classical univariate Prony method, that \(\# \Omega \) is known, we can reconstruct the ideal from the Hankel matrix.

Corollary 2

A polynomial \(p \in \varPi _{\#\Omega }\) belongs to \(I_\Omega \) if and only if \({\mathbf {p}}\in \ker {\mathbf {F}}_{\# \Omega }\).

Next, we define the numbers

$$\begin{aligned} v_k := v_k ( I_\Omega ) := \dim ( I_\Omega \cap \varPi _k ), \quad k=0,\dots ,n. \end{aligned}$$
(15)

The mapping \(k \mapsto v_k (I_\Omega )\) is called the (affine) volume function of the ideal \(I_\Omega \), [15, p. 159], while its complement function

$$\begin{aligned} k \mapsto h_k (I_\Omega ) := d_k - v_k (I_\Omega ) = \dim \varPi _k / I_\Omega \end{aligned}$$

is the affine Hilbert function of the ideal, cf. [7, p. 447]. In this terminology, we can summarize our findings as follows: for zero dimensional ideals the affine Hilbert function becomes constant once k is large enough.

Lemma 2

For \(k \in {\mathbb {N}}_0\) we have

$$\begin{aligned} h_k (I_\Omega ) \left\{ \begin{array}{ll}< h_{k+1} (I_\Omega ), &{} \quad k < \deg N, \\ = h_{k+1} (I_\Omega ), &{}\quad k \ge \deg N. \end{array} \right. \end{aligned}$$
(16)

Proof

From (10) it follows that

$$\begin{aligned} h_k ( I_\Omega ) = \dim \varPi _k /I_\Omega = \dim ( N \cap \varPi _k ), \end{aligned}$$

which is clearly monotonically increasing in k and constant once \(k \ge \deg N\). Now suppose that for some k we have \(h_k ( I_\Omega ) = h_{k+1} ( I_\Omega )\); then, by (11), it follows that \(\Lambda ( N ) \cap \varPi _{k+1}^0 = \{ 0 \}\), hence \(\Lambda ( I_\Omega ) \cap \varPi _{k+1}^0 = \varPi _{k+1}^0\), and since \(I_\Omega \) is an ideal, the forms

$$\begin{aligned} \sum _{j=1}^s (\cdot )_j \, \Lambda ( I_\Omega ) = \sum _{j=1}^s (\cdot )_j \, \varPi _{k+1}^0 = \varPi _{k+2}^0 \end{aligned}$$

also generate \(\varPi _{k+2}^0\), hence \(\Lambda ( N ) \cap \varPi _{k+2}^0 = \{ 0 \}\) and therefore \(h_k ( I_\Omega ) = h_{k+2} (I_\Omega )\). By iteration we conclude that \(h_k ( I_\Omega ) = h_{k+1} ( I_\Omega )\) implies that \(h_k ( I_\Omega ) = h_{k'} ( I_\Omega )\), \(k' > k\). In particular, this yields that \(N \cap \varPi _k\) is a proper subspace of \(N \cap \varPi _{k+1}\) as long as \(k < \deg N\), from which (16) follows. \(\square \)

We remark that Lemma 2 can also be interpreted as the statement that minimal degree interpolation spaces have no “gaps”.

2.4 Graded bases

From now on suppose that \(n > \deg N\) is chosen properly. The next step will be to construct a graded basis for \(\ker {\mathbf {F}}_n\). To that end, we define the matrices

$$\begin{aligned} {\mathbf {F}}_{n,k} := \left[ f (\alpha +\beta ) : \begin{array}{c} |\alpha | \le n \\ |\beta | \le k \end{array} \right] \in {\mathbb {R}}^{d_n \times d_k}, \quad k=0,\dots ,n, \end{aligned}$$

and note that for \(p \in \varPi _k\)

$$\begin{aligned} {\mathbf {F}}_n {\mathbf {p}}= {\mathbf {F}}_{n,k} {\mathbf {p}}, \end{aligned}$$
(17)

where, strictly speaking, the two coefficient vectors in (17) are of different size as on the left hand side we view p as a polynomial of degree n while on the right hand side it is seen as a polynomial of degree k. Nevertheless, we prefer this ambiguity, which is typical for polynomials, to introducing further subscripts.

In particular, since \(n > \deg N\), it follows that

$$\begin{aligned} p \in (I_\Omega \cap \varPi _k) \quad \Rightarrow \quad 0 = {\mathbf {F}}_n {\mathbf {p}}= {\mathbf {F}}_{n,k} {\mathbf {p}}. \end{aligned}$$

Lemma 3

(Hilbert function) If \(n > \deg N\) then \(d_k-\dim \ker {\mathbf {F}}_{n,k} = h_k (I_\Omega )\), \(k \in {\mathbb {N}}_0\).

Proof

Since for \(p \in \varPi _k\) we have \(0 = {\mathbf {F}}_n {\mathbf {p}}= {\mathbf {F}}_{n,k} {\mathbf {p}}\) if and only if \(p \in I_\Omega \) by Theorem 3, it follows that \({\mathbf {p}}\in \ker {\mathbf {F}}_{n,k}\) if and only if \(p \in (I_\Omega \cap \varPi _k)\), i.e., \((I_\Omega \cap \varPi _k) \simeq \ker {\mathbf {F}}_{n,k}\). Hence, the dimensions of the two vector spaces have to coincide. \(\square \)

We now build a graded ideal basis in an inductive way. To that end, we first note that \(\ker {\mathbf {F}}_{n,0} \ne \{ 0 \}\) if and only if \({\mathbf {F}}_{n,0} = 0\), which would yield that either \(1 \in I_\Omega \), i.e., \(\Omega = \emptyset \), or \(f_\omega = 0\), \(\omega \in \Omega \). Since both are excluded by assumption, we always have \({\mathbf {F}}_{n,0} \ne 0\). Thus, we set \({\mathbf {P}}_0 = []\).

Next, we consider \(k=1\) and let \({\mathbf {p}}_1^1,\dots ,{\mathbf {p}}^1_{v_1 - v_0}\) be a basis of \(\ker {\mathbf {F}}_{n,1}\), which we arrange into a matrix \({\mathbf {P}}_1 := [ {\mathbf {p}}_1^1,\dots ,{\mathbf {p}}^1_{v_1 - v_0} ] \in {\mathbb {R}}^{d_1 \times (v_1 - v_0)}\), where \(v_k\) is defined in (15). If \(\ker {\mathbf {F}}_{n,1}\) is trivial, i.e., \(\ker {\mathbf {F}}_{n,1} = \{ 0 \}\), we have \(v_1 = v_0 = 0\) and write \({\mathbf {P}}_1 = []\). Since \(\Omega \) is finite and hence \(I_\Omega \ne \{ 0 \}\), there exists some index \(k_0\) such that \(\ker {\mathbf {F}}_{n,k} \ne \{ 0 \}\), \(k \ge k_0\).

Now suppose that we have constructed matrices \({\mathbf {P}}_j \in {\mathbb {R}}^{d_j \times w_j}\), \(w_j := v_j - v_{j-1}\), \(j = 1,\dots ,k\), \(k \ge k_0\), such that the columns of \({\mathbf {P}}_0,\dots ,{\mathbf {P}}_k\) form a basis of \(\ker {\mathbf {F}}_{n,k} \ne \{ 0 \}\). We arrange these bases into the block upper triangular matrix

$$\begin{aligned} {\mathbf {K}}_k := \left[ {\mathbf {P}}_1,\dots ,{\mathbf {P}}_k \right] \in {\mathbb {R}}^{d_{k} \times v_{k}} \end{aligned}$$
(18)

from which we will derive \({\mathbf {P}}_{k+1}\) and eventually \({\mathbf {K}}_{k+1}\) in an inductive step. Like above, we use the convention that “empty columns” \({\mathbf {P}}_j = []\) are omitted and that the column vectors \({\mathbf {p}}^j_\ell \in {\mathbb {R}}^{d_j}\) of \({\mathbf {P}}_j\), \(\ell = 1,\dots ,w_j\), are embedded into \({\mathbb {R}}^{d_k}\) by appending zeros, which is again consistent with the way in which polynomials of degree \(< k\) are embedded in \(\varPi _k\).

To advance the construction to \(k+1\), let \(\tilde{\mathbf {P}}_{k+1} \in {\mathbb {R}}^{d_{k+1} \times v_{k+1}}\) be a matrix whose columns form a basis of the \(v_{k+1}\)-dimensional subspace \(\ker {\mathbf {F}}_{n,k+1}\) of \({\mathbb {R}}^{d_{k+1}}\), determined, for example, by means of an SVD

$$\begin{aligned} {\mathbf {F}}_{n,k} = {\mathbf {U}}\Sigma {\mathbf {V}}^T, \end{aligned}$$
(19)

where the columns of \({\mathbf {V}}\) that correspond to zero or negligible singular values even form an orthonormal basis of the subspace. Recall that this is also the standard procedure for numerical rank computation, which was used to determine approximate ideals as well, cf. [16, 40].

Then, \({\mathbf {K}}_k \, {\mathbb {R}}^{v_k} = \ker {\mathbf {F}}_{n,k} \subseteq \ker {\mathbf {F}}_{n,k+1} = \tilde{\mathbf {P}}_{k+1} \, {\mathbb {R}}^{v_{k+1}}\) implies that there exists a matrix \(\tilde{\mathbf {X}}\in {\mathbb {R}}^{v_{k+1} \times v_k}\) such that

$$\begin{aligned} \left[ \begin{array}{c} {\mathbf {K}}_k \\ 0 \end{array} \right] = \tilde{\mathbf {P}}_{k+1} \, \tilde{\mathbf {X}}\end{aligned}$$

and since \({\mathop {rank}\nolimits \,}\tilde{\mathbf {P}}_{k+1} = v_{k+1}\), the pseudoinverse or Moore–Penrose inverse \(\tilde{\mathbf {P}}_{k+1}^+\) of this matrix is a left inverse of \(\tilde{\mathbf {P}}_{k+1}\), hence

$$\begin{aligned} \tilde{\mathbf {X}}= \tilde{\mathbf {P}}_{k+1}^+ \tilde{\mathbf {P}}_{k+1} \, \tilde{\mathbf {X}}= \tilde{\mathbf {P}}_{k+1}^+ \left[ \begin{array}{c} {\mathbf {K}}_k \\ 0 \end{array} \right] . \end{aligned}$$
(20)

Now we can complete the columns of \(\tilde{\mathbf {X}}\) orthogonally to a basis of \({\mathbb {R}}^{v_{k+1}}\) by computing a QR-factorization

$$\begin{aligned} \tilde{\mathbf {X}}= {\mathbf {Q}}\left[ \begin{array}{c} {\mathbf {R}}\\ 0 \end{array} \right] , \quad {\mathbf {Q}}^T {\mathbf {Q}}= {\mathbf {I}}, \quad {\mathbf {Q}}=: [ {\mathbf {Q}}_1,{\mathbf {Q}}_2 ] \end{aligned}$$

so that the last \(w_{k+1}\) columns \({\mathbf {Q}}_2\) of \({\mathbf {Q}}\) complete \(\tilde{\mathbf {X}}\) orthogonally to a basis of \({\mathbb {R}}^{v_{k+1}}\): \({\mathbf {Q}}_2^T \tilde{\mathbf {X}}= 0\) and \({\mathbf {X}}= [ \tilde{\mathbf {X}}, {\mathbf {Q}}_2 ] \in {\mathbb {R}}^{v_{k+1} \times v_{k+1}}\) is nonsingular. Setting \({\mathbf {P}}_{k+1} = \tilde{\mathbf {P}}_{k+1} {\mathbf {Q}}_2\) thus yields the graded completion

$$\begin{aligned} {\mathbf {K}}_{k+1} = [ {\mathbf {P}}_1,\dots ,{\mathbf {P}}_k,{\mathbf {P}}_{k+1} ] = \widetilde{\mathbf {P}}_{k+1} {\mathbf {X}}\in {\mathbb {R}}^{d_{k+1} \times v_{k+1}}. \end{aligned}$$
(21)
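In Octave, one such extension step could be sketched as follows; the function name extend_kernel_basis and the threshold tol for the numerical kernel are illustrative choices of ours, it is assumed that \(d_n \ge d_{k+1}\), and Kk is the matrix \({\mathbf {K}}_k\) of coefficient vectors with respect to \(|\alpha | \le k\).

```
function [Kk1, Pk1] = extend_kernel_basis (Fnk1, Kk, tol)
  % numerical kernel of F_{n,k+1} via SVD thresholding, cf. (19)
  [~, S, V] = svd (Fnk1);
  sv = diag (S);
  Pt = V(:, sv <= tol * sv(1));         % ~P_{k+1}: orthonormal kernel basis
  % embed the old kernel basis by appending zero coefficients
  Kk0 = [Kk; zeros(columns (Fnk1) - rows (Kk), columns (Kk))];
  Xt  = Pt' * Kk0;                      % (20); Pt' is the pseudoinverse of Pt
  [Q, ~] = qr (Xt);                     % complete the columns of Xt orthogonally
  Q2  = Q(:, columns (Xt)+1:end);
  Pk1 = Pt * Q2;                        % new basis polynomials of degree k+1
  Kk1 = [Kk0, Pk1];                     % graded completion, cf. (21)
endfunction
```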

This bit of Linear Algebra has an interesting ideal theoretic interpretation concerning the sets \(P_j\) of polynomials corresponding to the coefficient matrices \({\mathbf {P}}_j\):

$$\begin{aligned} P_j := \{ p \in \varPi : {\mathbf {p}}\in {\mathbf {P}}_j \}. \end{aligned}$$

Theorem 4

If \(n \ge k > \deg N\) then \(P_0,\dots ,P_k\) form an H-basis for \(I_\Omega \).

Proof

Let \(p \in I_\Omega \). If \(\deg p \le k\) then

$$\begin{aligned} p \in I_\Omega \cap \varPi _k \simeq \ker {\mathbf {F}}_{n,k} = \mathrm {span}\,{\mathbf {K}}_k, \end{aligned}$$

hence

$$\begin{aligned} {\mathbf {p}}= \sum _{j=0}^{\deg p} {\mathbf {P}}_j \, {\mathbf {c}}_j \quad \text{ i.e. } \quad p = \sum _{j=0}^{\deg p} \sum _{\ell =0}^{w_j} c_{j,\ell } p_\ell ^j \end{aligned}$$

for appropriate coefficients \({\mathbf {c}}_j = ( c_{j,\ell } : \ell =0,\dots ,w_j ) \in {\mathbb {R}}^{w_j}\), \(j=0,\dots ,\deg p\). In particular,

$$\begin{aligned} \Lambda (p) = \sum _{\ell =0}^{w_{\deg p}} c_{\deg p,\ell } \, \Lambda (p_\ell ^{\deg p}) \in \mathrm {span}\,\Lambda \left( P_{\deg p} \right) \subset \Lambda ( \left\langle {P_1,\dots ,P_k} \right\rangle ). \end{aligned}$$

In the case \(\deg p > k\), we first note that \(h_k (I_\Omega ) = h_{k-1} (I_\Omega )\) by Lemma 2 since \(k > \deg N\). Hence, \(\Lambda ( P_k )\) spans \(\varPi _k^0\) so that the polynomials

$$\begin{aligned} \left\{ (\cdot )^\alpha p : |\alpha | = \deg p - k, \, p \in P_k \right\} \end{aligned}$$

span \(\varPi _{\deg p}^0\). Hence,

$$\begin{aligned} \Lambda (p) = \sum _{|\alpha | = \deg p - k} \sum _{j=0}^{w_k} c_{\alpha ,j} (\cdot )^\alpha \, \Lambda ( p_j^k ) = \sum _{|\alpha | = \deg p - k} \sum _{j=0}^{w_k} c_{\alpha ,j} \Lambda \left( (\cdot )^\alpha \, p_j^k \right) , \end{aligned}$$

that is \(\Lambda (p) \in \Lambda \left( \left\langle {P_1,\dots ,P_k} \right\rangle \right) \).

Combining the two cases and noting that \(\left\langle {P_1,\dots ,P_k} \right\rangle \subset \left\langle {I_\Omega } \right\rangle \) trivially yields \(\Lambda ( I_\Omega ) \supset \Lambda \left( \left\langle {P_1,\dots ,P_k} \right\rangle \right) \) we thus have that

$$\begin{aligned} \Lambda ( I_\Omega ) = \Lambda \left( \left\langle {P_1,\dots ,P_k} \right\rangle \right) , \end{aligned}$$
(22)

which is a well-known characterization of H-bases, cf. [15] or, specifically, [37, Proposition 4.2]. \(\square \)

Remark 1

The H-basis \([ P_1,\dots ,P_k ]\) is far from minimal but contains many redundant polynomials. Indeed, if \({\mathbf {P}}_j \ne []\) at some level, then the polynomials \((\cdot )^\alpha P_j\), \(|\alpha | = k-j\), belong to \(I_\Omega \) as well and could be removed from the \(P_k\) without losing the H-basis property. However, we will soon see that the redundant ideal basis we generated so far eases the following computations significantly.

The next step is to construct a homogeneous basis for the inverse system \(N = \nu (\varPi )\). To that end, we return to the inner product \((\cdot ,\cdot )\) on \(\varPi \times \varPi \) defined in (5). The goal is to construct homogeneous bases \(N_j \subseteq \varPi _j^0\), \(j=0,\dots ,k\), such that \(\left( N_j, \Lambda (P_j) \right) = 0\) and \(\varPi _j^0 = \mathrm {span}\,N_j \oplus \mathrm {span}\,\Lambda (P_j)\), \(j=0,\dots ,\deg N\). Again, we compute a QR factorization, namely

$$\begin{aligned} \Lambda ( {\mathbf {P}}_j ) = {\mathbf {Q}}\left[ \begin{array}{c} {\mathbf {R}}\\ 0 \end{array} \right] =: [ {\mathbf {Q}}_{j,1}, {\mathbf {Q}}_{j,2} ] \left[ \begin{array}{c} {\mathbf {R}}_j \\ 0 \end{array} \right] , \quad {\mathbf {Q}}_{j,1} \in {\mathbb {R}}^{d_j^0 \times w_j}, \, {\mathbf {Q}}_{j,2} \in {\mathbb {R}}^{d_j^0 \times (d_j^0 - w_j)}, \end{aligned}$$
(23)

where \({\mathbf {R}}\) is nonsingular since the leading terms in \(\Lambda (P_j)\) are linearly independent by construction. Setting

$$\begin{aligned} {\mathbf {N}}_j = {\mathbf {Q}}_{j,2}, \end{aligned}$$
(24)

we note that

$$\begin{aligned} \left( N_j, \Lambda ( P_j ) \right) = {\mathbf {N}}_j^T \Lambda ({\mathbf {P}}_j ) = {\mathbf {N}}_j^T \, [ {\mathbf {Q}}_{j,1}, {\mathbf {Q}}_{j,2} ] \left[ \begin{array}{c} {\mathbf {R}}_j \\ 0 \end{array} \right] = [ 0, {\mathbf {I}}] \left[ \begin{array}{c} {\mathbf {R}}_j \\ 0 \end{array} \right] = 0_{d_j^0 - w_j, w_j}, \end{aligned}$$

hence \(N_j\) is a basis of the orthogonal complement of \(\Lambda ( P_j)\) in \(\varPi _j^0\).
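A possible Octave sketch of this splitting reads as follows, where LamPj is assumed to hold the coefficient vectors of the leading forms \(\Lambda (P_j)\) as its columns:

```
function [Nj, Qj1, Rj] = normal_form_basis (LamPj)
  % split Pi_j^0 into span Lambda(P_j) and its orthogonal complement, cf. (23)-(24)
  [Q, R] = qr (LamPj);            % full QR factorization, Q is d_j^0 x d_j^0
  wj  = columns (LamPj);
  Qj1 = Q(:, 1:wj);               % spans the same space as Lambda(P_j)
  Rj  = R(1:wj, :);               % nonsingular upper triangular factor
  Nj  = Q(:, wj+1:end);           % orthonormal basis of the complement = N_j
endfunction
```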

2.5 Reduced polynomials

For a more explicit description of the space \(N = \nu (\varPi )\), we continue with a definition.

Definition 1

A polynomial \(p \in \varPi _{\deg N+1}\) is called reduced if \(p = \nu (p)\).

Lemma 4

A polynomial \(p \in \varPi \) is reduced if and only if \(p \in N = \nu (\varPi )\).

Proof

Let H be an H-basis for \(I_\Omega \). If \(p \in N\), hence \(p = \nu (q)\) for some q, we have that \((p_k^0, V_k (H) ) = 0\) for any homogeneous component \(p_k^0\) of p, and it follows that \(\nu _k^0 (p) = p_k^0\), \(k=0,\dots ,\deg p\), hence \(\nu (p) = p\) and thus any polynomial in N is reduced. Conversely, \(p = \nu (p)\) trivially implies that \(p \in \nu (\varPi ) = N\). \(\square \)

Lemma 5

With the matrices \({\mathbf {N}}_j = {\mathbf {Q}}_{j,2}\) from (23) we have

$$\begin{aligned} N = \bigoplus _{j=0}^{\deg N} \mathrm {span}\,N_j. \end{aligned}$$
(25)

Proof

Let

$$\begin{aligned} r = \sum _{j=0}^{\deg N} r_j^0, \quad r_j^0 \in \mathrm {span}\,N_j, \end{aligned}$$

that is, \({\mathbf {r}}_j^0 = {\mathbf {N}}_j \, {\mathbf {c}}_j\), \({\mathbf {c}}_j \in {\mathbb {R}}^{d_j^0-w_j}\). Then

$$\begin{aligned} ( r_j^0,\Lambda (P_j) ) = {\mathbf {c}}_j^T {\mathbf {N}}_j^T \Lambda ({\mathbf {P}}_j ) = 0, \end{aligned}$$

hence \(r_j^0\) is reproduced in the reduction modulo the H-basis \([ P_1,\dots ,P_k ]\) and therefore r is reduced, which shows that the inclusion \(\supseteq \) holds in (25). Conversely, suppose that \(r = r_0^0 + \cdots + r_{\deg N}^0\) is reduced. Then reproduction of the homogeneous component \(r_{\deg N}^0\) in the first reduction step yields that

$$\begin{aligned} r_{\deg N}^0 \perp \mathrm {span}\,P_{\deg N} \quad \text{ i.e. } \quad r_{\deg N}^0 \in \mathrm {span}\,N_{\deg N}, \end{aligned}$$

and an iterative application of this reduction yields that r must be contained in the space on the right hand side of (25), hence also \(\subseteq \) is valid there. \(\square \)

With the H-basis \(P = [ P_1,\dots ,P_m ]\), \(m := \deg N+1\), the reduction of polynomials from \(\varPi _m\) simplifies significantly. Since \(\Lambda (P_k)\) spans \(\Lambda ( I_\Omega \cap \varPi _k ) = V_k ( P )\), the orthogonal projection of \(\Lambda (p)\) onto \(V_k (P)\), \(k := \deg p\), can be written as

$$\begin{aligned} \Lambda (P_k ) {\mathbf {c}}:= \sum _{\ell =0}^{w_k} c_\ell \, \Lambda (p_\ell ^k), \end{aligned}$$

or, in terms of coefficient vectors \(\Lambda ( {\mathbf {P}}_k ) {\mathbf {c}}\), where, as known from standard least squares approximation, cf. [13],

$$\begin{aligned} {\mathbf {c}}= {\mathbf {R}}_k^{-1} ( {\mathbf {Q}}_{k,1} )^T \Lambda ({\mathbf {p}}), \end{aligned}$$
(26)

which can be computed in a stable way by solving \({\mathbf {R}}_k {\mathbf {c}}= ({\mathbf {Q}}_{k,1})^T \Lambda ({\mathbf {p}})\).

Therefore, we can already compute the reduction modulo \(I_\Omega \) for given \(p \in \varPi _m\) in the following simple manner.

Algorithm 1

(Reduction)

  1. While \(p \ne 0\):

     (a) Set \(k = \deg p\).

     (b) Compute \({\mathbf {c}}= {\mathbf {R}}_k^{-1} ( {\mathbf {Q}}_{k,1} )^T \Lambda ({\mathbf {p}})\).

     (c) Set
     $$\begin{aligned} {\mathbf {r}}_k^0 := \Lambda ({\mathbf {p}}) - \Lambda ({\mathbf {P}}_k) \, {\mathbf {c}}. \end{aligned}$$

     (d) Replace \({\mathbf {p}}\) by
     $$\begin{aligned} {\mathbf {p}}- {\mathbf {P}}_k \, {\mathbf {c}}- {\mathbf {r}}_k^0. \end{aligned}$$
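A single pass through steps (a)–(d) could be sketched in Octave as follows; the name reduction_step and the assumed storage are ours: p and the columns of Pk are coefficient vectors indexed by the multiindices \(|\alpha | \le k\), the index vector hom marks the positions of degree exactly k, and Qk1, Rk are the QR factors from (23).

```
function [p, rk0] = reduction_step (p, Pk, Qk1, Rk, hom)
  lam = p(hom);                 % leading form Lambda(p) as a coefficient vector
  c   = Rk \ (Qk1' * lam);      % least squares coefficients, cf. (26)
  rk0 = lam - Pk(hom, :) * c;   % homogeneous remainder r_k^0
  p   = p - Pk * c;             % subtract the projected ideal part ...
  p(hom) = p(hom) - rk0;        % ... and the remainder: the degree drops
endfunction
```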

To summarize what we obtained so far: Based on the evaluation matrices \({\mathbf {F}}_{n,k}\) we constructed a graded H-basis for the ideal and, at the same time, a graded homogeneous basis for the inverse system N.

2.6 Multiplication tables

Now we are ready to compute the points \(z_\omega = e^\omega \), \(\omega \in \Omega \), and therefore also the frequencies \(\Omega \). To that end, we make use of the eigenvalue method and multiplication tables as introduced in [42], see also [25]. This is based on observing that multiplication by coordinate polynomials modulo the ideal, i.e., the operation \(r \mapsto \nu ( (\cdot )_j r )\), \(r \in N\), \(j=1,\dots ,s\), is an automorphism on N and thus can be represented by a matrix \({\mathbf {M}}_j \in {\mathbb {R}}^{\dim N \times \dim N}\). Since they represent multiplication, the matrices form a commuting family and are the multivariate extension of the Frobenius companion matrix. The following result, attributed to Stickelberger in [6], was brought to wider attention in [42]. Since the proof is very short, simple and elementary, we repeat it here for the sake of completeness.

Theorem 5

Let N be a normal form space modulo \(I_\Omega \) and let \({\mathbf {M}}_j\), \(j=1,\dots ,s\), be the multiplication tables with respect to a basis of N. Then the eigenvalues of \({\mathbf {M}}_j\) are \((z_\omega )_j = e^{\omega _j}\), \(\omega \in \Omega \), and the associated common eigenvectors are the coefficient vectors of \(\ell _\omega \) with respect to this basis.

Proof

Since N is an interpolation space, we can write the normal forms as interpolants,

$$\begin{aligned} \nu (p) = \sum _{\omega \in \Omega } p(z_\omega ) \, \ell _\omega , \quad p \in \varPi , \end{aligned}$$

where, as in (12), \(\ell _\omega \in N\) is the unique solution of \(\ell _\omega ( z_{\omega '} ) = \delta _{\omega ,\omega '}\), \(\omega ,\omega ' \in \Omega \). Hence, for \(\omega \in \Omega \) and \(j \in \{ 1,\dots ,s \}\),

$$\begin{aligned} \nu \left( (\cdot )_j \ell _\omega \right) = \sum _{\omega ' \in \Omega } (z_{\omega '})_j \ell _\omega (z_{\omega '}) \, \ell _{\omega '} = (z_\omega )_j \, \ell _\omega , \end{aligned}$$

because \(\ell _\omega (z_{\omega '}) = \delta _{\omega ,\omega '}\). \(\square \)

The matrices \({\mathbf {M}}_j\) have a block structure

$$\begin{aligned} {\mathbf {M}}_j = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} {\mathbf {M}}_{0,0}^j &{} {\mathbf {M}}_{0,1}^j &{} \dots &{} {\mathbf {M}}_{0,m-1}^j &{} {\mathbf {M}}_{0,m}^j \\ {\mathbf {M}}_{1,0}^j &{} {\mathbf {M}}_{1,1}^j &{} \dots &{} {\mathbf {M}}_{1,m-1}^j &{} {\mathbf {M}}_{1,m}^j \\ &{} {\mathbf {M}}_{2,1}^j &{} \dots &{} {\mathbf {M}}_{2,m-1}^j &{} {\mathbf {M}}_{2,m}^j \\ &{} &{} \ddots &{} \vdots &{} \vdots \\ &{} &{} &{} {\mathbf {M}}_{m,m-1}^j &{} {\mathbf {M}}_{m,m}^j \end{array} \right] , \quad {\mathbf {M}}_{k,\ell }^j \in {\mathbb {R}}^{d_k^0 \times d_\ell ^0}, \end{aligned}$$
(27)

and can be conveniently computed by means of the matrices \({\mathbf {N}}_j\), \(j=0,\dots ,m := \deg N\), of homogeneous basis polynomials that were constructed in the preceding sections. The matrices

$$\begin{aligned} {\mathbf {L}}_{k,j} = \sum _{|\alpha | = k} {\mathbf {e}}_{\alpha + \epsilon _j} {\mathbf {e}}_\alpha ^T \in {\mathbb {R}}^{d_{k+1}^0 \times d_k^0}, \quad k \in {\mathbb {N}}_0, \, j=1,\dots ,s, \end{aligned}$$

that represent multiplication of a homogeneous polynomial of degree k by the monomial \((\cdot )_j\) on the coefficient level, are a well-known tool in the study of multivariate orthogonal polynomials as well, cf. [44]. For \(k=0,\dots ,m\), one has to reduce the polynomials corresponding to the columns of \({\mathbf {L}}_{k,j} {\mathbf {N}}_k\). The first reduction step gives

$$\begin{aligned} {\mathbf {R}}_{k+1}^0 = \left( {\mathbf {I}}- \Lambda ( {\mathbf {P}}_{k+1} ) {\mathbf {R}}_{k+1}^{-1} {\mathbf {Q}}_{k+1,1}^T \right) {\mathbf {L}}_{k,j} {\mathbf {N}}_k \end{aligned}$$

which yields the matrix block

$$\begin{aligned} ( {\mathbf {M}}_j )_{k+1,k} = {\mathbf {N}}_{k+1}^T {\mathbf {R}}_{k+1}^0 = {\mathbf {N}}_{k+1}^T \left( {\mathbf {I}}- \Lambda ( {\mathbf {P}}_{k+1} ) {\mathbf {R}}_{k+1}^{-1} {\mathbf {Q}}_{k+1,1}^T \right) {\mathbf {L}}_{k,j} {\mathbf {N}}_k. \end{aligned}$$
(28)

After this first reduction, we have to continue with a standard nonhomogeneous reduction starting with the matrix

$$\begin{aligned} {\mathbf {T}}_{k+1} = \left( \left[ \begin{array}{c} \Lambda ( {\mathbf {P}}_{k+1} ) \\ 0 \end{array} \right] - {\mathbf {P}}_{k+1} \right) {\mathbf {R}}_{k+1}^{-1} {\mathbf {Q}}_{k+1,1}^T {\mathbf {L}}_{k,j} {\mathbf {N}}_k. \end{aligned}$$

Denoting for \({\mathbf {T}}= [ {\mathbf {t}}_1,\dots ,{\mathbf {t}}_r ]\) the \(\ell \)-homogeneous part of this matrix by

$$\begin{aligned} ( {\mathbf {T}})_\ell ^0 = \left[ ( {\mathbf {t}}_1 )_\ell ^0, \dots , ( {\mathbf {t}}_r )_\ell ^0 \right] \in {\mathbb {R}}^{d_\ell ^0 \times r}, \end{aligned}$$

we get, for \(\ell = k,k-1,\dots ,0\) the recurrence

$$\begin{aligned} {\mathbf {R}}_\ell ^0 = \left( {\mathbf {I}}- \Lambda ( {\mathbf {P}}_\ell ) {\mathbf {R}}_\ell ^{-1} {\mathbf {Q}}_{\ell ,1}^T \right) ({\mathbf {T}}_{\ell +1})_\ell ^0 \end{aligned}$$
(29)

and

$$\begin{aligned} \nonumber {\mathbf {T}}_\ell&= {\mathbf {T}}_{\ell +1} - {\mathbf {P}}_{\ell } {\mathbf {R}}_\ell ^{-1} {\mathbf {Q}}_{\ell ,1}^T ({\mathbf {T}}_{\ell +1} )_\ell ^0 - {\mathbf {R}}_\ell ^0 \\&= {\mathbf {T}}_{\ell +1} - \left[ \begin{array}{c} ( {\mathbf {T}}_{\ell +1} )_\ell ^0 \\ 0 \end{array} \right] - \left( {\mathbf {P}}_\ell - \left[ \begin{array}{c} \Lambda ( {\mathbf {P}}_\ell ) \\ 0 \end{array} \right] \right) {\mathbf {R}}_\ell ^{-1} {\mathbf {Q}}_{\ell ,1}^T ( {\mathbf {T}}_{\ell +1} )_\ell ^0. \end{aligned}$$
(30)

In particular,

$$\begin{aligned} ( {\mathbf {M}}_j )_{\ell ,k} = {\mathbf {N}}_{\ell }^T {\mathbf {R}}_{\ell }^0 = {\mathbf {N}}_{\ell }^T \left( {\mathbf {I}}- \Lambda ( {\mathbf {P}}_\ell ) {\mathbf {R}}_\ell ^{-1} {\mathbf {Q}}_{\ell ,1}^T \right) ( {\mathbf {T}}_{\ell +1} )_\ell ^0, \end{aligned}$$
(31)

builds the kth block column of the matrix \({\mathbf {M}}_j\). With the matrices \({\mathbf {M}}_j\) at hand, the points \(z_\omega \), \(\omega \in \Omega \), and therefore also \(\Omega \) can be determined by means of standard eigenvalue methods, cf. [13, pp. 308–390].
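For completeness, the shift matrices \({\mathbf {L}}_{k,j}\) used above can be assembled, for instance, as follows; this is an illustrative Octave sketch that reuses the multiindex enumeration sketched in Sect. 2.1, and the helper name shift_matrix is ours.

```
function L = shift_matrix (k, s, j)
  % L maps the coefficients of a form of degree k to those of ()_j times it
  Ak = multiindices (k, s);     Ak = Ak(sum (Ak, 2) == k, :);      % |alpha| = k
  A1 = multiindices (k + 1, s); A1 = A1(sum (A1, 2) == k + 1, :);  % |alpha| = k+1
  L  = zeros (rows (A1), rows (Ak));
  for i = 1:rows (Ak)
    a = Ak(i,:);  a(j) = a(j) + 1;                 % alpha + epsilon_j
    [~, pos] = ismember (a, A1, 'rows');
    L(pos, i) = 1;
  endfor
endfunction
```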

2.7 The coefficients

Once the set \(Z_\Omega \) is known, the remaining problem of determining the coefficients is a linear one. To solve it, we simply set up the Vandermonde matrix

$$\begin{aligned} {\mathbf {V}}:= \left[ z_\omega ^\alpha : \begin{array}{l} \omega \in \Omega \\ |\alpha | \le \deg N \end{array} \right] \in {\mathbb {R}}^{\# \Omega \times d_m}, \quad m := \deg N. \end{aligned}$$

Since \(N \subseteq \varPi _{\deg N}\) is an interpolation space for \(Z_\Omega \), the matrix \({\mathbf {V}}\) has rank \(\# \Omega \) and therefore the overdetermined system

$$\begin{aligned} {\mathbf {V}}^T \left[ f_\omega : \omega \in \Omega \right] = \left[ f(\alpha ) : |\alpha | \le m \right] , \end{aligned}$$

obtained by substituting \(\alpha \) into (2), \(|\alpha | \le m\), has a unique solution which gives the coefficients \(f_\omega \), \(\omega \in \Omega \).
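As an illustration, this final linear step could be realized in Octave as follows, where Z is assumed to contain the recovered points \(z_\omega \) as its rows, fh returns the samples as above, and multiindices is the helper sketched in Sect. 2.1:

```
function coeff = prony_coefficients (Z, fh, m)
  A = multiindices (m, columns (Z));       % all |alpha| <= m = deg N
  V = zeros (rows (Z), rows (A));
  b = zeros (rows (A), 1);
  for i = 1:rows (A)
    V(:, i) = prod (Z .^ A(i,:), 2);       % V(omega,alpha) = z_omega^alpha
    b(i)    = fh (A(i,:));                 % sampled value f(alpha)
  endfor
  coeff = V.' \ b;                         % least squares solution for f_omega
endfunction
```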

2.8 The algorithm

We can collect the building blocks from the preceding sections into an algorithm that solves Prony’s problem in several variables, which we formalize as follows. We start with an unknown finite set \(\Omega \subset \left( {\mathbb {R}}+ i {\mathbb {T}}\right) ^s\) and coefficients \(f_\omega \in {\mathbb {R}}\), \(\omega \in \Omega \).

Algorithm 2

(Prony’s method in several variables)

  1. Guess a number \(n > \deg N\), for example \(n = \# \Omega \).

  2. For \(k=0,1,\dots ,n\):

     (a) Determine \({\mathbf {F}}_{n,k}\).

     (b) Extend the H-basis to \({\mathbf {K}}_{k+1} = [ {\mathbf {P}}_0,\dots ,{\mathbf {P}}_{k+1} ]\) according to (21).

     (c) Extend the graded normal form basis to \([ {\mathbf {N}}_0,\dots ,{\mathbf {N}}_k ]\) according to (24).

     Stop as soon as \({\mathop {rank}\nolimits \,}{\mathbf {F}}_{n,k} = {\mathop {rank}\nolimits \,}{\mathbf {F}}_{n,k+1}\).

  3. Compute the multiplication tables \({\mathbf {M}}_1,\dots ,{\mathbf {M}}_s\) by means of (31).

  4. Compute and match the eigenvalues to determine \(z_\omega \), \(\omega \in \Omega \).

  5. Solve the Vandermonde system to obtain the coefficients \(f_\omega \), \(\omega \in \Omega \).

Now we can summarize the preceding results as follows.

Theorem 6

For any finite set \(\Omega \subset ( {\mathbb {R}}+ i {\mathbb {T}})^s\), Algorithm 2 reconstructs \(\Omega \) and the coefficients \(f_\omega \) of the function

$$\begin{aligned} f (x) = \sum _{\omega \in \Omega } f_\omega \, e^{\omega ^T x} \end{aligned}$$

from a subset of the values \(f(\alpha )\), \(|\alpha | \le 2n\).

Remark 2

The number n depends not only on the number \(\# \Omega \) of different frequencies but also on the geometry of the points \(Z_\Omega = e^\Omega \). If the points are in general position or generic, then n is the smallest number such that \(\# \Omega < d_n = {n+s \atopwithdelims ()s}\); a “safe” choice, on the other hand, is always \(n = \# \Omega \). This, however, leads to huge matrices when the number of variables increases and stops being tractable quite early. Note that those generic configurations of the points \(z_\omega \) are open and dense among all point distributions, cf. [11], hence a separation distance between the points only affects the number \(\deg N\) in a very marginal way, quite in contrast to the univariate case.

Remark 3

It is worthwhile to emphasize that the validity of the following steps of the algorithm relies on the proper choice of n. Only then does the sequence of ranks coincide with the Hilbert function. Building matrices \({\mathbf {F}}_k\) until their rank stabilizes is not sufficient. A simple example is to choose \(\Omega \) in such a way that

$$\begin{aligned} Z_\Omega = \left\{ \frac{\alpha }{2} : |\alpha | \le 2 \right\} \end{aligned}$$

is the triangular grid of order 2 and

$$\begin{aligned} {\mathbf {f}}:= \left[ f_\omega : \omega \in \Omega \right] = V_2^{-T} \left[ \begin{array}{c} 1 \\ \vdots \\ 1 \end{array} \right] , \quad V_2 = \left[ z_\omega ^\alpha : \begin{array}{c} \omega \in \Omega \\ |\alpha | \le 2 \end{array} \right] . \end{aligned}$$

Then,

$$\begin{aligned} f(\alpha ) = \sum _{\omega \in \Omega } f_\omega z_\omega ^\alpha = \left( V_2^T {\mathbf {f}}\right) _\alpha = 1, \quad |\alpha | \le 2, \end{aligned}$$

and therefore

$$\begin{aligned} {\mathbf {F}}_0 = [ 1 ], \quad {\mathbf {F}}_1 = \left[ \begin{array}{ccc} f(0,0) &{} f(1,0) &{} f(0,1) \\ f(1,0) &{} f(2,0) &{} f(1,1) \\ f(0,1) &{} f(1,1) &{} f(0,2) \end{array} \right] = \left[ \begin{array}{ccc} 1 &{} 1 &{} 1 \\ 1 &{} 1 &{} 1 \\ 1 &{} 1 &{} 1 \end{array} \right] \end{aligned}$$

hence \({\mathop {rank}\nolimits \,}{\mathbf {F}}_0 = {\mathop {rank}\nolimits \,}{\mathbf {F}}_1 = 1\) while still \(\deg N = 2\).

This observation has an interesting interpretation in terms of moment problems: \({\mathbf {F}}_1\) is a flat extension, see [19], of the moment sequence defined by evaluation at the origin and therefore the measurements would give only the representation \(f = 1\). This shows that for this approach, too, a good guess for n is necessary to obtain correct reconstructions.

Remark 4

To relate the eigenvalues of the different multiplication tables and to combine them into the points \(z_\Omega \), one can make use of the eig function in Matlab/Octave which gives the respective eigenvalues and matrices \({\mathbf {V}}_j\), \(j=1,\dots ,s\), containing the normalized eigenvectors. If these eigenvalues are sufficiently well separated, filtering the matrix \(\left| {\mathbf {V}}_j^T {\mathbf {V}}_k \right| \) for a value 1 gives a permutation that relates the eigenvalues appropriately. This simple trick fails in the case of multiple eigenvalues when intersections of the eigenspaces have to be computed. How to do this, however, has already been pointed out by Möller and Tenberg [26].
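A minimal Octave sketch of this matching step, assuming simple and well separated eigenvalues and two variables, could read:

```
function Z = match_eigenvalues (M1, M2)
  [V1, D1] = eig (M1);             % eig returns unit norm eigenvectors
  [V2, D2] = eig (M2);
  G = abs (V1' * V2);              % entries close to 1 mark matching eigenvectors
  [~, perm] = max (G, [], 2);      % best match for each eigenvector of M1
  d2 = diag (D2);
  Z  = [diag(D1), d2(perm)];       % row omega contains (z_omega)_1 and (z_omega)_2
endfunction
```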

The procedure described in Algorithm 2 has been implemented prototypically in Octave [10]. The code can be downloaded for checking and verification from

www.fim.uni-passau.de/digitale-bildverarbeitung/forschung/downloads

All tests in the following section refer to this software.

2.9 Comparison to existing methods

As mentioned in the introduction, the algorithm is the canonical multivariate extension of the well-known procedure “compute the coefficients of the Prony polynomial as kernel of a matrix and determine the zeros of the polynomial by means of a companion matrix”. In this respect it can be considered an extension of the well-known MUSIC [41] and ESPRIT [35] methods, where zero eigenvectors of a symmetric and positive semidefinite measurement covariance matrix are determined and used to set up an eigenvalue problem. A more sophisticated variant of ESPRIT can be found in [33]. As pointed out in [32], the Frobenius companion matrix of the Prony polynomial interacts nicely with the Hankel matrix of the samples, which can be used to define a generalized eigenvalue problem for matrix pencils, see [17].

Approaches to the multivariate situation are more recent. One approach, established in [19], is to interpret Prony’s problem as a truncated moment problem and to build Hankel matrices of increasing size which are then checked for the flat extension criterion. The normal form space and the ideal can be defined by means of border bases, which can be checked via the commutativity of candidates for the multiplication tables. For details see [19]. Unfortunately, the flat extension approach can run into the problem pointed out in Remark 3, which means that an extension can be flat at intermediate steps as well, which cannot be detected. Of course, once the critical degree n is known, the extension becomes flat beyond that degree. The moment problem formulation is also used in [18]. There, however, a “maximal degree” is used, which has the severe disadvantage that it does not turn the polynomials into a graded ring which, in turn, is very useful for defining good graded ideal bases like Gröbner bases and H-bases. As a consequence, Theorem 3 in terms of the total degree is stronger than [18, Theorem 3.1] in terms of the maximal “degree”, and the matrices \(T_n\) there are even significantly larger than the related \({\mathbf {F}}_n\); in addition, [18, Algorithm 1] only computes the kernel of the Hankel matrix but does not indicate how to obtain the common zeros of the ideal defined by this kernel.

Another approach is given by projection methods [9, 30, 31], used mostly in two variables, where the function is sampled along a straight line and the solutions of the resulting univariate Prony problems are recombined into a solution of the multivariate problem. In this situation, separation of the points becomes useful, as it can be carried over to the univariate projections.

Finally, the solvability of the multivariate Prony problem has already been considered in [5] in the context of identification of parameters in sums of shifts of a given function, however without giving a concrete algorithm for its solution.

Algorithm 2 differs from all the above approaches, mostly due to its multivariate and algebraic nature. It is an algebraic method that recovers frequencies and coefficients exactly, provided that the measurements are noiseless and that all computations can be done exactly. On the other hand, appropriate techniques from Linear Algebra allow for a fast and still relatively stable implementation even in a floating point environment; as long as the affine Hilbert function does not change, the H-basis approach based on orthogonal projections ensures that the basis and the normal form space, and therefore also the multiplication tables, change continuously, see [24]. The construction of the Hankel matrix by adding block columns, enforced by the “curse of dimension”, seems to be new even in the univariate case. The core of the multivariate algorithm is to successively build an H-basis of the ideal and a graded basis of the normal form space from the kernels of the \({\mathbf {F}}_{n,k}\). The computation of the multiplication tables is then straightforward as long as a reduction can be computed, which can be done with good bases for arbitrary polynomial gradings, see [38]. Closest to this is the implicit use of border bases in the flat extension approach from [19].

2.10 Short remarks on noisy measurements

Prony’s method has a reputation for being numerically unstable with respect to perturbed measurements

$$\begin{aligned} \widehat{f} ( \alpha + \beta ) = f ( \alpha + \beta ) + \varepsilon _{\alpha +\beta }, \quad |\alpha | \le n, \, |\beta | \le k, \end{aligned}$$

yielding a perturbed matrix

$$\begin{aligned} \widehat{\mathbf {F}}_{n,k} := {\mathbf {F}}_{n,k} + {\mathbf {E}}_{n,k}. \end{aligned}$$
(32)

The critical spot in the algorithm is the place where the distinction is made whether a certain polynomial belongs to the ideal or to the normal form space. As mentioned before, this happens by considering the SVD \({\mathbf {F}}_{n,k} = {\mathbf {U}}\Sigma {\mathbf {V}}^H\) from (19) and by thresholding the singular values. In the case of \(\widehat{\mathbf {F}}_{n,k}\) we recall the standard perturbation result for singular values from [13, Corollary 8.6.2] that the singular values satisfy the estimate

$$\begin{aligned} \left| \sigma _j ( \widehat{\mathbf {F}}_{n,k} ) - \sigma _j ( {\mathbf {F}}_{n,k} ) \right| \le \sigma _1 ( {\mathbf {E}}_{n,k} ) = \Vert {\mathbf {E}}_{n,k} \Vert _2, \end{aligned}$$

where \(\sigma _j\) denotes the singular values in descending order. Hence, as long as the perturbation is small relative to the conditioning of the problem, i.e.,

$$\begin{aligned} \Vert {\mathbf {E}}_{n,k} \Vert _2 \le \min \{ \sigma _j ({\mathbf {F}}_{n,k}) : \sigma _j ({\mathbf {F}}_{n,k}) \ne 0 \}, \end{aligned}$$

the ideal structure is recovered with an appropriately adapted threshold value, provided an estimate for the perturbation matrix \({\mathbf {E}}_{n,k}\) is available. If the threshold level were set too high, the situation would be falsely interpreted to be more generic than it really is.
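In an implementation this amounts to a simple thresholding of the computed singular values; a minimal Octave sketch, with eps_norm standing for an available estimate of \(\Vert {\mathbf {E}}_{n,k} \Vert _2\), could be:

```
function r = numerical_rank (F, eps_norm)
  sv = svd (F);              % singular values in descending order
  r  = sum (sv > eps_norm);  % values below the noise level count as zero
endfunction
```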

To get an idea which quantities influence the SVD of \({\mathbf {F}}_{n,k}\), we recall the straightforward observation that

$$\begin{aligned} {\mathbf {F}}_{n,k} = \sum _{\omega \in \Omega } f_\omega \, {\mathbf {V}}_n^T {\mathbf {e}}_\omega \, {\mathbf {e}}_\omega ^T {\mathbf {V}}_k = {\mathbf {V}}_n^T \left( \sum _{\omega \in \Omega } f_\omega {\mathbf {e}}_\omega \, {\mathbf {e}}_\omega ^T \right) {\mathbf {V}}_k = {\mathbf {V}}_n^T {\mathbf {F}}_\Omega \, {\mathbf {V}}_k, \end{aligned}$$

where \( {\mathbf {F}}_\Omega := {\mathop {diag}\nolimits \,}\left[ f_\omega : \omega \in \Omega \right] \) is the matrix with \(f_\omega \) on the diagonal. Let \({\mathbf {x}}\) be a singular vector of \({\mathbf {V}}_k\) for the singular value \(\sigma \), that is \(\Vert {\mathbf {x}}\Vert = 1\) and \({\mathbf {V}}_k {\mathbf {x}}= \sigma {\mathbf {y}}\) for some \({\mathbf {y}}\) with \(\Vert {\mathbf {y}}\Vert = 1\), then

$$\begin{aligned} \Vert {\mathbf {F}}_{n,k} {\mathbf {x}}\Vert _2 = \sigma \Vert {\mathbf {V}}_n^T {\mathbf {F}}_\Omega {\mathbf {y}}\Vert _2 \le \sigma \Vert {\mathbf {V}}_n^T {\mathbf {F}}_\Omega \Vert _2 \le \sigma \Vert {\mathbf {V}}_n^T \Vert _2 \max _{\omega \in \Omega } | {\mathbf {f}}_\omega | \end{aligned}$$

which shows that if \(\sigma \) is small relative to \(\Vert {\mathbf {V}}_n^T \Vert _2\) or if all coefficients \(f_\omega \) are small, then the smallest singular value

$$\begin{aligned} \sigma _{min} ( {\mathbf {F}}_{n,k} ) = \min _{\Vert {\mathbf {x}}\Vert = 1, {\mathbf {F}}_{n,k} {\mathbf {x}}\ne 0} \Vert {\mathbf {F}}_{n,k} {\mathbf {x}}\Vert _2 \end{aligned}$$

of \({\mathbf {F}}_{n,k}\) is small as well. The first case means that the interpolation problem is ill-conditioned, the second case means that the coefficients are too close to zero to be relevant numerically. Also, if we can find some \({\mathbf {x}}\) such that \({\mathbf {F}}_\Omega {\mathbf {V}}_k {\mathbf {x}}\) becomes small, then this can also lead to a small singular value. This latter situation can even be reached when only a single coefficient is close to zero.

This is, of course, only a first, rough reasoning that shows that Prony’s method can perform quite well numerically if the perturbation is small relative to the stability of the interpolation problem. This can be verified by numerical experiments, see Table 4 in the following section.

3 Examples

The first example is to illustrate the basic idea of the procedure in the simplest possible case. To that end, we set \(\Omega = \{ 0,\omega \}\), \(\omega \ne 0\), so that

$$\begin{aligned} f (\alpha ) = f_0 + f_\omega e^{\omega ^T \alpha }, \end{aligned}$$

and

$$\begin{aligned} {\mathbf {F}}_{n,0} = \left[ f(\alpha ) : |\alpha | \le n \right] . \end{aligned}$$

The first component of this matrix is nonzero if \(f_0 \ne -f_\omega \); otherwise there exists some j such that \(\omega _j \ne 0\), and then at the unit multiindex \(\epsilon _j\) the function evaluates to \(f (\epsilon _j) = f_0 ( 1 - e^{\omega _j} ) \ne 0\), so that, as mentioned before, the rank of this matrix is 1. Hence, \({\mathop {rank}\nolimits \,}{\mathbf {F}}_{n,0} = 1\) and \(N_0 = \{ 1 \}\), i.e., the constant function is a member of the normal form space. In the next step, we already consider the matrix

$$\begin{aligned} {\mathbf {F}}_{n,1} = \left[ f(\alpha ), f(\alpha +\epsilon _1), \dots , f(\alpha +\epsilon _s) : |\alpha | \le n \right] \end{aligned}$$

and since

$$\begin{aligned} f(\alpha +\epsilon _j) = f_0 + f_\omega \, e^{\omega _j} \, e^{\omega ^T \alpha }, \end{aligned}$$

we have \({\mathbf {F}}_{n,1} {\mathbf {p}}= 0\) for \({\mathbf {p}}= [ p_\alpha : |\alpha | \le 1 ]\) if and only if, with the abbreviations \(p_j = p_{\epsilon _j}\) and \(\omega _0 = 0\),

$$\begin{aligned} 0&= \left( f_0 + f_\omega e^{\omega ^T \alpha } \right) p_0 + \sum _{j=1}^s \left( f_0 + f_\omega e^{\omega _j} \, e^{\omega ^T \alpha } \right) p_j \\&= f_0 \, \sum _{j=0}^s p_j + f_\omega e^{\omega ^T \alpha } \sum _{j=0}^s e^{\omega _j} p_j \end{aligned}$$

holds for all \(|\alpha | \le n\). This can be rephrased as

$$\begin{aligned} 0 = \left[ \begin{array}{cccc} 1 &{} 1 &{} \dots &{} 1 \\ 1 &{} e^{\omega _1} &{} \dots &{} e^{\omega _s} \end{array} \right] {\mathbf {p}}\end{aligned}$$

which shows that the kernel has dimension \(s-1\): the above matrix consists of the first two rows of the Fourier matrix for the frequencies \(0,\omega _1,\dots ,\omega _s\) and has rank 2 since at least one of the \(\omega _j\) is nonzero. On the other hand, we have, for any \({\mathbf {p}}\) with only \(p_{\epsilon _j} \ne 0\), normalized to \(p_{\epsilon _j} = 1\), that

$$\begin{aligned} {\mathbf {F}}_{n,1} {\mathbf {p}}= \left[ f_0 + \left( f_\omega e^{\omega _j} \right) \, e^{\omega ^T \alpha } : |\alpha | \le n \right] = {\mathbf {F}}_{n,0} + f_\omega ( e^{\omega _j} - 1 ) \left[ e^{\omega ^T \alpha } : |\alpha | \le n \right] , \end{aligned}$$

which is linearly independent of \({\mathbf {F}}_{n,0}\) iff \(\omega _j \ne 0\). This reflects the fact that we can choose various subspaces of \(\varPi _1\) that allow for unique degree reducing interpolation at \(Z_\Omega = \{ 1,e^\omega \}\).
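
To see these statements in action, the following Octave sketch sets up \({\mathbf {F}}_{n,0}\) and \({\mathbf {F}}_{n,1}\) for a single arbitrary nonzero frequency in \(s = 3\) variables and computes rank and kernel dimension numerically; the concrete values of \(\omega \), \(f_0\) and \(f_\omega \) are irrelevant as long as \(\omega \ne 0\).

```matlab
% Simplest case Omega = {0, omega} in s = 3 variables (illustrative values).
s = 3; n = 2;
omega = [0.4; -0.7; 0.2];
f0 = 1.2; fw = -0.8;
f  = @(x) f0 + fw * exp(omega' * x(:));

% multi-indices with |alpha| <= n, generated by brute force
[g1, g2, g3] = ndgrid(0:n, 0:n, 0:n);
A = [g1(:) g2(:) g3(:)];  A = A(sum(A, 2) <= n, :);

F0 = arrayfun(@(i) f(A(i,:)'), (1:size(A,1))');   % F_{n,0}, a single column
E  = [zeros(1, s); eye(s)];                       % shifts 0, eps_1, ..., eps_s
F1 = zeros(size(A,1), s + 1);                     % F_{n,1}
for j = 1:s+1
  for i = 1:size(A,1)
    F1(i,j) = f(A(i,:)' + E(j,:)');
  end
end

rank(F0)                 % 1: the constant lies in the normal form space
size(null(F1), 2)        % s - 1 = 2, the kernel dimension derived above
```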

Next, we report and interpret some numerical results of the test implementation. There, we simply picked a number of random frequencies \(\Omega \) and coefficients \(f_\omega \), chosen by Octave's rand function, as well as an estimate for n. Then the algorithm was run, and the maximal and average errors in the frequencies and coefficients (measured in the Frobenius norm of the frequency matrix and the norm of the coefficient vector, respectively) were determined. This process was repeated 100 times. Though this procedure is far from statistically meaningful, it gives a reasonable first idea of how the algorithm behaves.
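
The test loop itself is straightforward; the sketch below outlines it in Octave, where prony_solver is a hypothetical placeholder for the reconstruction routine of Sect. 2, and where the matching of reconstructed to true frequencies, needed before errors can be measured, is omitted.

```matlab
% Sketch of the random test loop; prony_solver is a hypothetical name for
% the reconstruction algorithm, not an actual function of this paper's code.
s = 2; M = 10; n = 4; trials = 100;
errF = zeros(trials, 1); errC = zeros(trials, 1);
for t = 1:trials
  Omega = rand(M, s);                            % random real frequencies (rows)
  coef  = rand(M, 1);                            % random coefficients
  f     = @(x) sum(coef .* exp(Omega * x(:)));   % the exponential sum
  [OmegaRec, coefRec] = prony_solver(f, s, n);   % hypothetical reconstruction
  % reconstructed frequencies must be matched/sorted against the true ones here
  errF(t) = norm(OmegaRec - Omega, 'fro');       % error in the frequency matrix
  errC(t) = norm(coefRec - coef);                % error in the coefficient vector
end
[max(errF), mean(errF), max(errC), mean(errC)]   % quantities reported in the tables
```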

Table 1 Numerical tests with real frequencies

Table 1 records what happens if all parameters are chosen as real numbers in [0, 1]. Already for 20 frequencies in two variables the problem becomes numerically unsolvable due to the ill-conditioning of the associated matrices. A closer inspection of \({\mathbf {F}}_{n,k}\) explains why: some entries in the matrix become very large, which is also reflected in the distribution of the “meaningful” singular values of that matrix. The largest one gets huge, around \(10^{30}\), while the smallest one is around \(10^{-10}\) and becomes almost indistinguishable from the largest of the singular values corresponding to the kernel of the matrix. In some cases even the number of frequencies is not determined correctly due to this effect.

As the table shows, things become better if the number of variables is increased. This is a typical effect in the numerical stability of multivariate polynomials, from evaluation [2, 8, 27] to interpolation [36]: the stability of polynomial algorithms often depends on the total degree of the polynomials and not so much on the number of coefficients involved. In fact, the experiments show that in 5 variables one still obtains quite reasonable results even with 150 frequencies.

Table 2 Numerical tests with purely imaginary frequencies

One potential application and a main motivation for Prony’s method is the reconstruction of frequencies of functions with a sparse Fourier transform, which means that \(\Omega \subset i {\mathbb {R}}^s\) is a set of purely imaginary frequencies. Table 2 shows that in this situation the method behaves significantly better and now provides a remarkable amount of numerical stability. The reason is that the matrices \({\mathbf {F}}_{n,k}\) now only contain complex numbers of modulus 1 and the singular values are much better distributed. Here the numerical stability of orthogonal projections, which are the core ingredient of the Linear Algebra within the algorithm, can apparently be exploited.

Table 3 Numerical tests with points on a line through origin

Things change dramatically when the points in \(Z_\Omega \) are not in general position. The extremal case is that all points are on a line, which results in \(\deg R\) assuming its maximal value \(\# \Omega -1\). This situation is considered in Table 3, where frequencies of the form

$$\begin{aligned} \omega = \widetilde{\omega }+ i \lambda _\omega \left[ \begin{array}{l} 1 \\ \vdots \\ 1 \end{array} \right] , \quad \lambda _\omega \in {\mathbb {R}}, \end{aligned}$$
(33)

were chosen as then

$$\begin{aligned} z_\omega = e^{\omega } = e^{\widetilde{\omega }} \, e^{i \lambda _\omega } \end{aligned}$$

are all points on the line through the origin and \(e^{\widetilde{\omega }}\). Since in this case the guess for n has to be the maximal value \(n = \# \Omega \), we use the third column to count the number of fails, i.e., the cases where the number of reconstructed frequencies was not correct. The table shows that again the degree is relevant and that, in contrast to points in general position, things do not get better but worse with an increasing number of variables. In two variables the method is again surprisingly stable for purely imaginary frequencies, but in more than three variables the eigenvalue routines crash as several matrices in the process become severely ill-conditioned. This is to be expected as a one-dimensional subspace has to be found in spaces of dimension \({k+s \atopwithdelims ()k}\). Things become even worse when the frequencies from (33) are slightly perturbed, as then they are generic but the systems are almost arbitrarily ill-conditioned.
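
Generating such collinear configurations is straightforward; the small Octave sketch below, with arbitrary illustrative values, produces frequencies of the form (33) and confirms that the resulting points \(z_\omega \) span only a one-dimensional space.

```matlab
% Frequencies as in (33): common real part plus i*lambda*(1,...,1)^T, so that
% all points z_omega = e^omega are scalar multiples of e^{wtilde} (values made up).
s = 3; M = 6;
wtilde = [0.2, -0.4, 0.1];                            % fixed real part
lambda = 2*pi*rand(M, 1);                             % one real parameter per frequency
Omega  = repmat(wtilde, M, 1) + 1i*lambda*ones(1, s); % rows are the omega
Z      = exp(Omega);                                  % rows are the points z_omega
rank(Z, 1e-12)                                        % 1: all points on one complex line
```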

Table 4 Numerical tests with 100 random imaginary frequencies and random absolute perturbation

The last example, whose results are shown in Table 4, considers the case of randomly perturbed input data. The coefficients were chosen randomly in \([-2,-1] \cup [1,2]\) to avoid instability due to zero coefficients. A fail in this table means that the structure of the problem was not recognized properly, i.e., \(\# \Omega \) was not detected correctly, or that the Vandermonde matrix used to determine the coefficients became singular; sometimes the results were even Inf or NaN. The threshold on the singular values was adapted to the perturbation level \(\varepsilon \) as \(d_n d_k \varepsilon \). The results are in accordance with the earlier observations that the algorithm performs surprisingly well and that the numerical stability improves with the number of variables. However, it should be mentioned that \(\varepsilon = 10^{-3}\) consistently produced fails as then the noise level exceeded the smallest singular value of the unperturbed problem. If, on the other hand, the coefficients were chosen randomly in \([-1,1]\), small coefficients led to a larger number of fails and a much worse overall reconstruction quality.
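
Within the implementation, this adapted threshold simply replaces the default tolerance in the SVD-based rank decision; a minimal sketch, assuming the current matrix is stored in Fnk with row and column dimensions dn and dk and the perturbation level in epsilon, could look as follows.

```matlab
% Numerical rank of F_{n,k} with a threshold adapted to the perturbation level;
% Fnk, dn, dk and epsilon are assumed to be provided by the surrounding algorithm.
sv  = svd(Fnk);
tol = dn * dk * epsilon;      % threshold d_n * d_k * epsilon from the text
r   = sum(sv > tol);          % replaces Octave's default rank(Fnk)
```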

These tests are only snapshots, hence of restricted practical relevance, and do not provide truly reliable information. Nevertheless, they show that the method works quite well in principle. A crucial point that needs to be studied and adapted further is the numerical computation of the rank of the \({\mathbf {F}}_{n,k}\), which at the moment only uses Octave’s standard rank procedure based on an SVD and thresholding. Better adapted methods that take into account the construction of the \({\mathbf {F}}_{n,k}\) and also consider other rank revealing factorizations are currently under investigation.

In addition, the algorithms are quite fast due to their numerical nature. For example, when reconstructing 100 frequencies in 10 variables (second to last example in Table 4), the procedure determines 901 polynomials of degree 4 in 10 variables and computes their 100 common zeros together with the associated coefficients (which only means solving an additional \(100 \times 100\) system) in about 47 s on a standard PC.

4 Sparse polynomials

An immediate byproduct of Prony’s method is the possibility to determine sparse polynomials, also called oligonomials or fewnomials, cf. [43], from sampled values. Here we look for a polynomial

$$\begin{aligned} f(z) = \sum _{\alpha \in A} f_\alpha \, z^\alpha , \quad A \subset {\mathbb {N}}_0^s, \end{aligned}$$
(34)

where A is assumed to be of small cardinality but not necessarily to consist only of small multiindices. Again, the task is to determine A and \(f_\alpha \) from measurements of f. Let \(X \in {\mathbb {C}}^{s \times s}\) be an arbitrary nonsingular matrix; then we consider the matrices

$$\begin{aligned} {\mathbf {F}}_n = \left[ f \left( e^{X (\beta + \gamma ) } \right) : |\beta |,|\gamma | \le n \right] , \quad n \in {\mathbb {N}}_0. \end{aligned}$$

Since

$$\begin{aligned} f ( e^{X \beta } ) = \sum _{\alpha \in A} f_\alpha e^{\beta ^T X \alpha } = \sum _{\alpha \in A} f_\alpha \left( e^{X^T \alpha } \right) ^\beta \end{aligned}$$

we are in the Prony situation with \(\omega = \omega (\alpha ) = X^T \alpha \) and \(f_\omega = f_\alpha \). Hence, after reconstructing \(\Omega \), the exponent set A can be obtained as \(A = X^{-T} \Omega \), rounded to the nearest integers. Of course, rounding can even compensate for small numerical errors occurring in the approximation process. The choice of X can be used to improve the conditioning of the problem; with the experience from the previous section, a purely imaginary choice of X could be helpful. Note, however, that this changes the sampling grid.
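
As an illustration, the following Octave sketch performs this substitution for a small made-up fewnomial in two variables; prony_solver again stands for a hypothetical reconstruction routine that returns the frequencies as rows of a matrix together with the coefficients.

```matlab
% Sparse polynomial reconstruction via the substitution z = e^{X beta}; the
% exponent set A, the coefficients and the matrix X are made up, and
% prony_solver is a hypothetical placeholder for the multivariate Prony routine.
s  = 2; n = 3;                                   % n: guess for the degree bound
A  = [7 0; 2 11; 0 5];                           % few, but possibly large exponents
fA = [1.5; -0.4; 2.2];
X  = 1i * [0.031 0.007; -0.013 0.029];           % purely imaginary, nonsingular

p  = @(z) sum(fA .* prod(repmat(z(:).', size(A,1), 1) .^ A, 2));  % f(z) = sum f_a z^a
g  = @(beta) p(exp(X * beta(:)));                % samples of the exponential sum

[Omega, coef] = prony_solver(g, s, n);           % rows of Omega: omega^T = alpha^T X
Arec = round(real(Omega / X));                   % alpha^T = omega^T X^{-1}, then round
```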

Reconstruction of sparse polynomials has received a lot of attention since the algorithm by Ben-Or and Tiwari [1], which uses a univariate Prony method on data \(f(k) = f \left( \omega _1^k,\dots ,\omega _s^k \right) \), where \(\omega _1,\dots ,\omega _s\) are coprime, and reconstructs the exponents by divisibility. Closest to the approach here is the generalization in [12], which uses roots of unity of the form \(\omega ^{2\pi /p}\) but applies, in the spirit of [1], a univariate Prony method and a reconstruction by means of the Chinese remainder theorem.

The obvious difference from these methods is that here a multivariate approach is used and that the total degree of the polynomials used in the method is usually much smaller than \(\# A\), which would be the degree in the univariate case. The examples of Sect. 3 indicate that this leads to better numerical behavior, so that fast numerical methods can be used instead of symbolic ones, the final result being obtained by rounding \(X^{-T} \Omega \) to the nearest integers. A detailed study of these questions, the choice of an optimal matrix X, and quantitative estimates are beyond the scope of this paper, however.

5 Summary

We have shown that frequencies and coefficients of (2) can be reconstructed from sampling the function on the integer lattice

$$\begin{aligned} \varGamma _n := \{ \alpha + \beta : |\alpha |,|\beta | \le n \}, \end{aligned}$$
(35)

where n is such that \(n \ge \deg \varPi /I_\Omega \) and \(I_\Omega \) denotes the ideal of all polynomials vanishing on \(Z_\Omega = e^\Omega \). In contrast to the univariate case, this is a structural quantity that depends on the geometry of \(Z_\Omega \). In particular, it is not to be expected that these sampling sets are minimal unless \(\# \Omega = {n+s \atopwithdelims ()s}\) and the points in \(Z_\Omega \) are in general position, as then and only then do the numbers of parameters and measurement points coincide. On the other hand, for \(s=1\) these conditions are always fulfilled. Moreover, even if there were smaller sampling sets, they would depend on the unknown geometry of \(Z_\Omega \).

In addition, we showed that the extended method can be implemented by using standard procedures of Numerical Linear Algebra and runs with reasonable numerical stability. There is still room to adapt some of the tools and probably make use of higher precision arithmetic if needed. Such additions should be designed in the context of a particular application, however.

All the methods shown here are based on the numerical realization of a purely algebraic algorithm that in principle solves Prony’s problem in an arbitrary number of variables and for an arbitrary number of frequencies. It goes without saying that a careful and quantitative analysis of the algorithm with respect to minimality, complexity and numerical accuracy is a reasonable next step once the underlying theory is understood. It would also be extremely interesting to compare this algorithm with other multivariate approaches like the projection method. Some of these issues are currently under investigation.