1 Introduction

The approximation of functions in a Hilbert space typically assumes a basis for that space. The non-redundancy of a basis ensures that linear systems associated with approximation problems are non-singular. In addition, suitable structure—ideally the basis is orthonormal, more generally it may be a Riesz basis—renders these systems well-conditioned. There is a unique solution, it is stably computable, and there is a close correspondence between properties of the continuous function and of the coefficients in the representation, for example the Parseval identity.

In contrast, for a redundant set of functions the corresponding linear systems may be ill-conditioned or even singular, and uniqueness may be lost. Still, good approximations may exist in the span of the set. It may even be much easier to ensure that this is the case than it is for a basis, and in fact this is a popular approach in a wide range of applications. For example, a basis can be ‘enriched’ by adding a few functions that capture a singularity [10]. A periodic Fourier basis can be augmented with a few polynomials to capture the non-periodicity of the function being approximated [23]. Ill-conditioning and redundancy also frequently appear in solution methods for partial differential equations (PDEs). In Trefftz methods, solutions of a PDE are approximated using other solutions of the same PDE, and this is often successful yet notoriously ill-conditioned [16]. Several methods are based on embedding a domain \(\Omega \) with complicated geometry, for which a basis is unknown, into a simple bounding box D. A basis for \(L^2(D)\) yields a (redundant) frame for \(L^2(\Omega )\). Examples include embedded/fictitious domain methods, immersed boundary methods and others [6, 17, 24].

As mentioned, redundant representations necessarily lead to ill-conditioning. To what extent are the corresponding function approximations computable? What convergence behaviour and accuracy can be expected? In this paper, we continue a line of investigation that commenced in [3] on numerical approximation of functions using redundant function sets in general and frames in particular. The main contribution of [3] was a detailed analysis of the accuracy and conditioning of the computation of best approximations with regularization, with a chosen threshold \(\epsilon \). We now briefly recall the main results of [3] in Sect. 1.1, followed by an overview of the theoretical results of this paper in Sect. 1.2.

1.1 Best Approximation with Regularization

The main concern of [3] was the computation of the best approximation, i.e. the orthogonal projection, in the space \(\mathrm {H}_N : = \mathrm {span}(\Phi _N)\) spanned by a set of N elements \(\Phi _N : = \{ \phi _n \}^{N}_{n=1}\). This approximation is given by \(\mathcal {P}_N f = \sum ^{N}_{n=1} x_n \phi _n\), where \(\varvec{x} = (x_n)^{N}_{n=1}\) is a solution of the linear system

$$\begin{aligned} \varvec{G}_N \varvec{x} = \varvec{y}, \end{aligned}$$
(1.1)

where \(\varvec{y} = \{ \langle f, \phi _n \rangle \}^{N}_{n=1}\) and \(\varvec{G}_N = \left\{ \langle \phi _n, \phi _m \rangle \right\} ^{N}_{m,n=1}\) is the Gram matrix of \(\Phi _N\).

If the elements of \(\Phi _N\) are nearly or exactly linearly dependent, then \(\varvec{G}_N\) is ill-conditioned or even singular. Moreover, the coefficients \(\varvec{x}\) can also grow arbitrarily large, making them impossible to compute in floating point arithmetic for sufficiently large N. The remedy proposed in [3] was to regularize (1.1) by using a truncated Singular Value Decomposition (SVD) of \(\varvec{G}_N\) with a threshold parameter \(\epsilon >0\) below which all the singular values are discarded. This results in a new projection \(\mathcal {P}^{\epsilon }_N f = \sum ^{N}_{n=1} (\varvec{x}^\epsilon )_n \phi _n\), where \(\varvec{x}^{\epsilon }\) is the regularized solution of (1.1).
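As a brief illustration, the following Python/NumPy sketch (our own; the helper name tsvd_solve and the toy example are not taken from [3]) implements this truncated-SVD regularization of (1.1):

```python
import numpy as np

def tsvd_solve(G, y, eps):
    """Regularized solve of G x = y: discard all singular values <= eps."""
    U, s, Vh = np.linalg.svd(G)
    keep = s > eps                                   # truncation threshold
    return Vh[keep].conj().T @ ((U[:, keep].conj().T @ y) / s[keep])

# Toy example: phi_1 = e_1, phi_2 = e_2, phi_3 = e_1 + e_2 in C^2. The set is
# exactly redundant, so the Gram matrix G_N is singular, yet the regularized
# coefficients are uniquely defined and bounded.
G = np.array([[1., 0., 1.], [0., 1., 1.], [1., 1., 2.]])
y = np.array([1., 0., 1.])                           # <f, phi_n> for f = e_1
print(tsvd_solve(G, y, eps=1e-12))
```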

The main result of [3] concerning this regularized best approximation is as follows:

Theorem 1.1

[3] The truncated SVD projection \(\mathcal {P}^{\epsilon }_N\) satisfies

$$\begin{aligned} { \Vert f - \mathcal {P}^{\epsilon }_N f \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \Big \{ \Big \Vert f - \sum ^{N}_{n=1} z_n \phi _n \Big \Vert + \sqrt{\epsilon } \Big \Vert \varvec{z} \Big \Vert \Big \}. } \end{aligned}$$
(1.2)

Moreover, the (absolute) condition number of the mapping \(\varvec{y} \mapsto \mathcal {P}^{\epsilon }_N f\) is at most \(1/\sqrt{\epsilon }\).

Observe that the right-hand side of (1.2) contains two terms. Theorem 1.1 states that the regularized projection behaves like the best approximation to f in the span of \(\Phi _N\) (the first term), as long as the coefficients have sufficiently small norm (the second term). Furthermore, convergence can only be expected to an accuracy up to the order of \(\sqrt{\epsilon }\). Whether or not this accuracy is achieved depends on the existence of a representation \(\sum ^{N}_{n=1} z_n \phi _n\) in the span of \(\Phi _N\) with that accuracy and with small norm \({\left\| \varvec{z}\right\| }\) of its coefficients. This question can be studied on a case-by-case basis, as done in [3] for a variety of examples.

To answer this question more generally, it is natural to impose additional structure on \(\Phi _N\). In [3], this was done via frames. Recall that an indexed family \(\Phi := \{ \phi _n \}_{n=1}^\infty \) is a frame for a Hilbert space \(\mathrm {H}\) if it satisfies the frame condition

$$\begin{aligned} A {\left\| f\right\| }^2 \le \sum _{n=1}^\infty | \langle f, \phi _n \rangle |^2 \le B {\left\| f\right\| }^2, \qquad \forall f \in \mathrm {H}, \end{aligned}$$
(1.3)

where \(A,B>0\) are positive constants and \({\left\| \cdot \right\| }\) is the norm on \(\mathrm {H}\). The frame condition ensures the existence of bounded representations, to any accuracy, for all functions in the space. In particular, this yields the following:

Corollary 1.2

If \(\Phi := \{ \phi _n\}_{n=1}^\infty \) is a frame for \(\mathrm {H}\), and \(\Phi _N := \{ \phi _n \}_{n=1}^N\), then

$$\begin{aligned} \limsup _{N\rightarrow \infty } \Vert f - \mathcal {P}^{\epsilon }_N f \Vert \le \sqrt{\frac{\epsilon }{A}}\, \Vert f \Vert , \qquad \forall f \in \mathrm {H}. \end{aligned}$$
(1.4)

Unlike in the general setting, the frame condition imposes sufficient structure so that accuracy to order \(\sqrt{\epsilon }\) is now guaranteed for all functions in \(\mathrm {H}\). For this reason, as well as the fact that frames occur in numerous computational problems, one can think of frames as an ideal setting in which to apply Theorem 1.1. Of course, function approximation with redundancy can be successful without a frame property. For example, in the absence of a frame, one may still use Theorem 1.1 to show accuracy in a subspace of \(\mathrm {H}\) consisting of functions with bounded-norm coefficient representations. But this raises the matter of whether functions of interest to a given problem belong to this space. In the absence of a frame structure, this question must then be answered on a case-by-case basis.

1.2 Main Results

In this paper, we generalize Theorem 1.1 using oversampling and allowing for generalized samples (or indirect data). In doing so, we not only allow a much broader class of samples, including, for example, pointwise evaluations, but we also overcome the \(\sqrt{\epsilon }\) bottleneck.

Approximation from generalized samples. In Sect. 3, instead of inner products with the elements \(\phi _n\), as was used in Theorem 1.1 [see (1.1)], the ‘data’ about the function f is now given by bounded linear functionals \(\ell _{m,M} : \mathrm {G}\rightarrow \mathbb {C}\), \(m=1,\ldots ,M\), which may depend on M and which may be defined only on a dense subspace \(\mathrm {G}\) of \(\mathrm {H}\) (e.g. in the case of pointwise evaluations when \(\mathrm {H}= \mathrm {L}^2(\Omega )\) we consider \(\mathrm {G}=\mathrm {C}(\overline{\Omega })\)). The strongest general statements can be made when this data is sufficiently ‘rich’ so as to stably recover f, in particular when, very much reminiscent of the frame condition (1.3), it satisfies

$$\begin{aligned} A' \Vert f \Vert ^2 \le \liminf _{M \rightarrow \infty } \sum ^{M}_{m=1} | \ell _{m,M}(f) |^2 \le \limsup _{M \rightarrow \infty } \sum ^{M}_{m=1} | \ell _{m,M}(f) |^2 \le B' \Vert f \Vert ^2,\quad \forall f \in \mathrm {G}, \end{aligned}$$
(1.5)

for constants \(A',B'>0\). We refer to this as a Marcinkiewicz–Zygmund condition. A key ingredient is to allow oversampling, i.e. to take \(M >N\), and consider the \(M \times N\) linear system

$$\begin{aligned} \varvec{G}_{M,N} \varvec{x} \approx \varvec{y},\qquad \varvec{y} = \{\ell _{m,M}(f) \}^{M}_{m=1}, \end{aligned}$$
(1.6)

where \(\varvec{G}_{M,N} = \{ \ell _{m,M}(\phi _n) \}^{M,N}_{m,n=1}\). As we shall subsequently explain, this system generally remains ill-conditioned for large N, even when \(M \gg N\). Hence we construct an approximation by singular value thresholding. This leads to a regularized approximation \(\mathcal {P}^{\epsilon }_{M,N} f\) whose coefficients \(\varvec{x}^{\epsilon }\) are the solution of the SVD-regularized least-squares problem corresponding to (1.6). Our main result for this setup is the following:

Theorem 1.3

The truncated SVD projection \(\mathcal {P}^{\epsilon }_{M,N} f\) satisfies

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N}f \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ {\left\| f - \sum ^{N}_{n=1} z_n \phi _n \right\| } + \kappa ^{\epsilon }_{M,N} {\left\| f - \sum ^{N}_{n=1} z_n \phi _n\right\| }_{M} + \epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \right\} , \end{aligned}$$
(1.7)

for constants \(\kappa ^{\epsilon }_{M,N},\lambda ^{\epsilon }_{M,N} > 0\). The (absolute) condition number of the mapping \(\varvec{y} \mapsto \mathcal {P}^{\epsilon }_{M,N} f\) is precisely \(\kappa ^{\epsilon }_{M,N}\). Moreover, these constants satisfy

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} \le \frac{\sqrt{B_N}}{\epsilon },\qquad \lambda ^{\epsilon }_{M,N} \le \frac{\sqrt{B_N}}{\epsilon }, \end{aligned}$$

for all M and N, \(M \ge N\), where \(B_N\) is the Bessel constant of \(\Phi _N\) over \(\mathrm {H}_N = \mathrm {span}(\Phi _N)\). If the sampling functionals satisfy (1.5) then, for fixed N,

$$\begin{aligned} \limsup _{M \rightarrow \infty } \kappa ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}},\qquad \limsup _{M \rightarrow \infty } \lambda ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}}. \end{aligned}$$

Here, \({\left\| g\right\| }^2_{M} = \sum ^{M}_{m=1} |\ell _{m,M}(g)|^2\) is the discrete semi-norm defined by the data. We recall that \(\Phi _N\) is a Bessel sequence, since it is finite, and therefore it has a finite Bessel constant \(B_N > 0\), defined as the smallest constant for which \(\sum ^{N}_{n=1} | \langle f, \phi _n \rangle |^2 \le B_N {\left\| f\right\| }^2\), \(\forall f \in \mathrm {H}\).

This result is very general, but its main conclusion is the following. Provided M is sufficiently large and the samples satisfy (1.5), the approximation error depends on \(f - \sum ^{N}_{n=1} z_n \phi _n\) (measured in some norm) and \(\epsilon {\left\| \varvec{z}\right\| }\). The constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) can be large when M is insufficiently large (behaving like \(\sqrt{B_N} / \epsilon \) in the worst case), but they can be made arbitrarily close to \(1/ \sqrt{A'}\) by taking M large, i.e. by increasing oversampling. Furthermore, by comparison with Theorem 1.1, we notice that \(\sqrt{\epsilon }\) in the error bound has been replaced by \(\epsilon \). Hence, under suitable assumptions on f, \(\Phi _N\) and M, we expect order \(\epsilon \) accuracy in the limit, as opposed to order \(\sqrt{\epsilon }\) accuracy.

This result raises the question of how large M needs to be in comparison to N. Later, we quantify this via the so-called stable sampling rate. Unsurprisingly, the question ‘how large is sufficiently large’ depends completely on the system \(\Phi _N\) and the samples \(\{\ell _{m,M} \}\), and thus must be analyzed on a case-by-case basis. In Sect. 5, we present an example for which the stable sampling rate is provably linear, i.e. there exists a \(C\ge 1\) such that setting \(M \ge C N\) implies that the constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) are bounded independently of \(\epsilon \). Alternatively, this rate can also be computed numerically, as we explain in Sect. 3.6.

Frame approximation from frame samples. Theorem 1.3 applies for arbitrary \(\Phi _N\) and any samples satisfying (1.5). In order to make statements about the limiting behaviour as \(M,N \rightarrow \infty \) we need two ingredients. First, we need a sequence \(\varvec{z}\) for which \(\sum ^{N}_{n=1} z_n \phi _n\) converges to f in the Hilbert space norm and for which the coefficient norm \({\left\| \varvec{z}\right\| }\) does not blow up. Second, we need additional regularity of the samples and/or f so that the M-norm can be controlled by a suitable norm in which one also has \(\sum ^{N}_{n=1} z_n \phi _n \rightarrow f\).

Whether or not such conditions hold could be answered on a case-by-case basis for particular types of sampling and approximation systems. But instead, we now address them in a general scenario where both the sampling functionals and the approximation system \(\Phi _N\) are endowed with a frame structure. Specifically, let \(\Phi = \{ \phi _n \}^{\infty }_{n=1}\) and \(\Psi = \{ \psi _m \}^{\infty }_{m=1}\) be frames for \(\mathrm {H}\) and set \(\Phi _N = \{ \phi _n \}^{N}_{n=1}\) and \(\ell _{m,M} = \langle \cdot , \psi _m \rangle \) for \(m = 1,\ldots ,M\). Note that (1.5) now automatically holds with \(\mathrm {G}= \mathrm {H}\), with \(A'\) and \(B'\) being the frame bounds for \(\Psi \).

Our main result in this case is the following:

Theorem 1.4

In the above setting, the truncated SVD projection \(\mathcal {P}^{\epsilon }_{M,N} f\) satisfies

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N}f \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ (1+ \sqrt{B'}\kappa ^{\epsilon }_{M,N}) {\left\| f - \sum ^{N}_{n=1} z_n \phi _n \right\| } + \epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \right\} , \end{aligned}$$

for certain constants \(\kappa ^{\epsilon }_{M,N},\lambda ^{\epsilon }_{M,N} > 0\). The (absolute) condition number of the mapping \(\varvec{y} \mapsto \mathcal {P}^{\epsilon }_{M,N} f\) is precisely \(\kappa ^{\epsilon }_{M,N}\). These constants satisfy

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} , \lambda ^{\epsilon }_{M,N} \le \left\{ \begin{array}{ll} \sqrt{B}/\epsilon , & \Psi \ne \Phi , \\ 1/\sqrt{\epsilon }, & \Psi = \Phi , \end{array} \right. \end{aligned}$$

for all M and N, \(M \ge N\), and

$$\begin{aligned} \limsup _{M \rightarrow \infty } \kappa ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}},\qquad \limsup _{M \rightarrow \infty } \lambda ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}}. \end{aligned}$$

Moreover, for each \(1< \theta < \infty \) and \(N \in \mathbb {N}\) there exists an integer \(\Theta ^{\epsilon }(N,\theta ) \in \mathbb {N}\) such that

$$\begin{aligned} \limsup _{\begin{array}{c} M,N \rightarrow \infty \\ M \ge \Theta ^{\epsilon }(N,\theta ) \end{array}} \Vert f - \mathcal {P}^{\epsilon }_{M,N}f \Vert \le \epsilon \frac{\theta }{\sqrt{A A'}} {\left\| f\right\| }. \end{aligned}$$
(1.8)

The quantity \(\Theta ^{\epsilon }(N,\theta )\) is what we term the stable sampling rate. As noted above, it can be computed numerically. Observe that in the special case \(\Psi = \Phi \), the setting of Theorem 1.1 (i.e. sampling and approximating with the same functions) is restored. However, oversampling according to (1.8) overcomes the \(\sqrt{\epsilon }\) bottleneck in (1.4), thus improving the limiting accuracy to order \(\epsilon \).

1.3 Relation to Other Work

This paper is a continuation of [3], in which the systematic study of numerical frame approximation was commenced. This study had its origins in earlier work on so-called Fourier extensions [5, 15], which are particular frames arising as restrictions of the Fourier basis on a box to a subdomain.

Our use of oversampling here is inspired by earlier work on generalized sampling in Hilbert spaces by the first author and Hansen [1, 2, 4]. That work considered both sampling and approximation using orthonormal bases and frames, introducing the stable sampling rate as well, but did not address the ill-conditioning issue for approximation in the latter. Note that the matrices \(\varvec{G}_{M,N}\) (in the case of Theorem 1.4 with \(\Psi = \Phi \)) and \(\varvec{G}_N\) are so-called uneven and finite sections respectively of the infinite Gram matrix of the full frame \(\Phi \). Using uneven as opposed to finite sections is a well-known trick in computational spectral theory [13, 14, 18].

For a more in-depth discussion of relations between this work and standard frame theory, we refer to [3].

Our focus in this paper is accuracy and conditioning of the regularized frame approximations. We do not consider efficiency, i.e. computational time, which is very much dependent on the particular frame under consideration. There are efficient numerical methods for solving (1.6) for certain frames and sampling functionals [8, 9, 20, 21, 22]. In the absence of a more efficient method, the cost of computing the SVD of an \(M \times N\) matrix with \(M > N\) scales as \({\mathcal O} (M N^2)\). However, based in particular on [9], some of the examples at the end of this paper can be implemented in \({\mathcal O}(N (\log N)^2)\) operations. We refer to [9] for more examples.

2 Preliminaries

We first introduce some notation and useful concepts from frame theory.

2.1 Bases and Frames

For the remainder of this paper, \(\mathrm {H}\) is a separable Hilbert space over the field \(\mathbb {C}\). We write \(\langle \cdot , \cdot \rangle \) and \({\left\| \cdot \right\| }\) for the inner product and norm on \(\mathrm {H}\) respectively. Recall that an indexed family \(\Phi = \{ \phi _n \}_{n \in I}\), where I is a countable index set, is a Riesz basis if \(\mathrm {span}(\Phi )\) is dense in \(\mathrm {H}\), and there exist constants \(A,B>0\) such that

$$\begin{aligned} A \Vert \varvec{x} \Vert ^2 \le {\left\| \sum _{n \in I} x_n \phi _n \right\| }^2 \le B \Vert \varvec{x} \Vert ^2,\quad \forall \varvec{x} = \{ x_n \}_{n \in I} \in \ell ^2(I). \end{aligned}$$
(2.1)

Here and throughout, \(\ell ^2(I)\) denotes the space of square-summable sequences indexed over I, and \({\left\| \cdot \right\| }\) denotes its norm, i.e. \({\left\| \varvec{x}\right\| } = \sqrt{\sum _{n \in I} |x_n|^2 }\). In this paper, we assume that the constants appearing in (2.1), and in the frame condition (2.2) below, are the optimal constants for which the inequalities hold. We note that a Riesz basis is an orthonormal basis if (2.1) holds with \(A = B = 1\). In general, we may view (2.1) as a relaxed version of Parseval’s identity.

An indexed family \(\Phi \) is a frame if

$$\begin{aligned} A \Vert f \Vert ^2 \le \sum _{n \in I} | \langle f, \phi _n \rangle |^2 \le B \Vert f \Vert ^2,\quad \forall f \in \mathrm {H}, \end{aligned}$$
(2.2)

for constants \(A,B>0\). A frame is tight if \(A = B\). We refer to (2.2) as the frame condition. Note that (2.2) implies that \(\mathrm {span}(\Phi )\) is dense in \(\mathrm {H}\). It follows from (2.1) that a Riesz basis is also a frame with the same constants A, B [7, Prop. 3.6.4]. But, in general, a frame need not be a Riesz basis.

2.2 Operators on Frames

Associated to any frame \(\Phi \) (and therefore any Riesz basis) is the so-called synthesis operator

$$\begin{aligned} \mathcal {T}: \ell ^2(I) \rightarrow \mathrm {H},\quad \varvec{y} = \{ y_n \}_{n \in I} \mapsto \sum _{n \in I} y_n \phi _n. \end{aligned}$$

Its adjoint, the analysis operator, is given by

$$\begin{aligned} \mathcal {T}^* : \mathrm {H}\rightarrow \ell ^2(I),\quad f \mapsto \{ \langle f, \phi _n \rangle \}_{n \in I}, \end{aligned}$$

and the composition \(\mathcal {S}= \mathcal {T}\mathcal {T}^*\), known as the frame operator, is

$$\begin{aligned} \mathcal {S}: \mathrm {H}\rightarrow \mathrm {H},\quad f \mapsto \sum _{n \in I} \langle f, \phi _n \rangle \phi _n. \end{aligned}$$

This operator is self-adjoint, bounded, invertible and positive with

$$\begin{aligned} A {\left\| f\right\| }^2 \le \langle \mathcal {S}f, f \rangle \le B {\left\| f\right\| }^2. \end{aligned}$$
(2.3)

See [7, Lemma 5.1.5]. Note that this inequality is equivalent to the frame condition (2.2). Note also that \(\mathcal {S}= \mathcal {I}\), the identity operator, when \(\Phi \) is an orthonormal basis. Similarly, \(\mathcal {S}= A \mathcal {I}\) for a tight frame. However, for a general Riesz basis or frame, \(\mathcal {S}\ne \mathcal {I}\).

2.3 Dual Frames

A frame \(\Psi = \{ \psi _n \}_{n \in I} \subseteq \mathrm {H}\) is a dual frame for a given frame \(\Phi \) if

$$\begin{aligned} f = \sum _{n \in I} \langle f, \psi _n \rangle \phi _n = \sum _{n \in I} \langle f, \phi _n \rangle \psi _n,\qquad \forall f \in \mathrm {H}. \end{aligned}$$
(2.4)

If a frame \(\Phi \) is also a Riesz basis then it has a unique dual frame \(\Psi \), which is also a Riesz basis. In this case, the pair \((\Phi ,\Psi )\) is biorthogonal:

$$\begin{aligned}\langle \phi _n, \psi _m \rangle = \delta _{n,m}, \quad n,m \in I. \end{aligned}$$

Note that an orthonormal basis is self-dual, i.e. \(\Psi = \Phi \). In contrast, a frame and any of its duals are typically not biorthogonal. A frame may also have more than one dual. The so-called canonical dual frame of \(\Phi \) is the frame \(\Psi = \{ \mathcal {S}^{-1} \phi _n \}_{n \in I}\). This is indeed a frame [7, Lemma 5.1.5], and its frame bounds are \(B^{-1}\) and \(A^{-1}\) respectively. In this case, (2.4) reads

$$\begin{aligned} f = \sum _{n \in I} \langle f, \mathcal {S}^{-1} \phi _n \rangle \phi _n = \sum _{n \in I} \langle \mathcal {S}^{-1} f, \phi _n \rangle \phi _n. \end{aligned}$$
(2.5)

We refer to the coefficients \(\varvec{a} = \{ \langle f, \mathcal {S}^{-1} \phi _n \rangle \}_{n \in I}\) as the canonical frame coefficients of f. These coefficients have the property that, amongst all possible representations of f in \(\Phi \), they have the smallest norm [7, Lemma 5.4.2]. Specifically, if \(f = \sum _{n \in I} a_n \phi _n = \sum _{n \in I} c_n \phi _n\) for some \(\varvec{c} = \{ c_n \}_{n \in I}\), then \(\Vert \varvec{c} \Vert \ge \Vert \varvec{a} \Vert \). Moreover, from the frame condition of the dual we have \(\Vert \varvec{a} \Vert \le \frac{1}{\sqrt{A}} \Vert f \Vert \).
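In finite dimensions, these properties are easy to check numerically. The following sketch (our own illustration; the ‘Mercedes-Benz’ frame is a standard example and is not taken from [7]) computes the canonical coefficients and verifies the minimal-norm property:

```python
import numpy as np

# Mercedes-Benz frame: three unit vectors in R^2 at 120-degree angles.
# It is a tight frame with A = B = 3/2.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
T = np.stack([np.cos(angles), np.sin(angles)])     # synthesis operator (2 x 3)

f = np.array([1.0, 2.0])
S = T @ T.T                                        # frame operator, here (3/2) I
a = T.T @ np.linalg.solve(S, f)                    # canonical coefficients
assert np.allclose(T @ a, f)                       # a represents f

# Any other representation adds a null-space vector of T and has larger norm.
null = np.ones(3) / np.sqrt(3)                     # the three vectors sum to 0
c = a + 0.7 * null
assert np.allclose(T @ c, f)
print(np.linalg.norm(a) <= np.linalg.norm(c))      # True: minimal-norm property
print(np.linalg.norm(a), np.linalg.norm(f) / np.sqrt(1.5))  # ||a|| = ||f||/sqrt(A)
```

For this tight frame the bound \(\Vert \varvec{a} \Vert \le \Vert f \Vert / \sqrt{A}\) holds with equality, as the last line confirms.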

3 Approximation from Indirect Data

In this section, we describe the general setup, which will lead eventually to Theorem 1.3. Throughout this section, the approximation system is defined by \(\Phi _N = \{ \phi _n \}_{n \in I_N } \subset \mathrm {H}\). It is an indexed family of N elements in \(\mathrm {H}\), where \(I_N\) is an index set of cardinality N. For convenience, we now make a mild generalization over Sect. 1, allowing \(I_N\) to be an arbitrary index set rather than just \(\{1,\ldots ,N\}\). As noted, \(\Phi _N\) is a Bessel sequence, since it is finite. We write \(B_N>0\) for its Bessel constant, i.e. the smallest constant for which \(\sum _{n \in I_N} | \langle f, \phi _n \rangle |^2 \le B_N {\left\| f\right\| }^2\), \(\forall f \in \mathrm {H}\). We also define the operators

$$\begin{aligned}&\mathcal {T}_N : \mathbb {C}^{N} \rightarrow \mathrm {H},\ \varvec{z} = (z_n)_{n \in I_N} \mapsto \sum _{n \in I_N} z_n \phi _n\\&\mathcal {T}^*_N : \mathrm {H}\rightarrow \mathbb {C}^N,\ f \mapsto \left( \langle f, \phi _n \rangle \right) _{n \in I_N}\\&\mathcal {S}_N = \mathcal {T}_N \mathcal {T}^*_N : \mathrm {H}\rightarrow \mathrm {H},\ f \mapsto \sum _{n \in I_N} \langle f, \phi _n \rangle \phi _n. \end{aligned}$$

3.1 Indirect Data

Let \(\mathrm {G}\) be a dense subspace of the Hilbert space \(\mathrm {H}\) endowed with a norm \({\left| \left| \left| \cdot \right| \right| \right| }\). Suppose that f, the function we seek to approximate, and the elements of \(\Phi _N\) all belong to \(\mathrm {G}\). For each \(M \in \mathbb {N}\), let \(J_M\) be an index set of cardinality \(|J_M| = M\), and

$$\begin{aligned} \ell _{m,M} : \mathrm {G}\rightarrow \mathbb {C},\quad m \in J_M, \end{aligned}$$

be a set of linear functionals which are bounded with respect to \({\left| \left| \left| \cdot \right| \right| \right| }\), i.e.

$$\begin{aligned} | \ell _{m,M}(f) | \le c_{m,M} {\left| \left| \left| f \right| \right| \right| }, \qquad f \in \mathrm {G}, \end{aligned}$$
(3.1)

for some constants \(c_{m,M} > 0\). The data for f is given by

$$\begin{aligned} \varvec{y} = \{ \ell _{m,M}(f) \}_{m \in J_M}. \end{aligned}$$

Write \(\mathcal {M}_{M} : \mathrm {G}\rightarrow \mathbb {C}^M\) for the mapping \(\mathcal {M}_{M} f = \{ \ell _{m,M}(f) \}_{m \in J_M}\). Our goal is to compute an approximation to f in the span of \(\Phi _N\), for some \(N \le M\), from this data.

In order to make meaningful general statements about the subsequent approximations we define, we require the data to be sufficiently rich. In analogy to the frame bounds (2.2), we shall often assume that there exist constants \(A',B' > 0\) such that

$$\begin{aligned} A' \Vert f \Vert ^2 \le \liminf _{M \rightarrow \infty } \sum _{m \in J_M} | \ell _{m,M}(f) |^2 \le \limsup _{M \rightarrow \infty } \sum _{m \in J_M} | \ell _{m,M}(f) |^2 \le B' \Vert f \Vert ^2,\quad \forall f \in \mathrm {G}. \end{aligned}$$
(3.2)

We term this a Marcinkiewicz–Zygmund condition, because it is similar (although not identical) to the classical Marcinkiewicz–Zygmund inequality in approximation theory [19]; see also [12]. We comment further on this assumption and the constants involved in Sect. 3.5.

Before going any further, let us mention several examples of this framework:

Example 3.1

Suppose that \(M = N\) and the samples of f are inner products with respect to the functions \(\phi _m\), i.e. \(\ell _{m,M}(f) = \langle f, \phi _m \rangle \), \(m=1,\ldots ,M\). Then this is precisely the setting of Theorem 1.1, and is a special case of the present setup with \(\mathrm {G}= \mathrm {H}\), \({\left| \left| \left| \cdot \right| \right| \right| } = {\left\| \cdot \right\| }\). Note that (3.2) is precisely the frame condition (2.2); in particular, it holds with \(A' = A\) and \(B' = B\).

Example 3.2

Let \(\Psi = \{ \psi _m \}_{m \in J}\) be another frame (or Riesz/orthonormal basis) of \(\mathrm {H}\) and consider samples of the form \(\ell _{m,M}(f) = \langle f, \psi _m \rangle \), \(m \in J_M\), where \(\{ J_M \}_{M \in \mathbb {N}}\) are nested index sets with \(|J_M | = M\) and \(\bigcup ^{\infty }_{M=1} J_M = J\). This problem corresponds to sampling according to the frame \(\Psi \) and reconstructing in \(\Phi \), as in the original framework of generalized sampling [1, 2]. We also have \(\mathrm {G}= \mathrm {H}\) and \({\left| \left| \left| \cdot \right| \right| \right| } = {\left\| \cdot \right\| }\) in this case, and (3.2) holds with \(A'\), \(B'\) being the frame bounds for \(\Psi \). In fact, it is straightforward to see that the upper bound holds for any M in this case, i.e.

$$\begin{aligned} \Vert f \Vert _M \le \sqrt{B'} \Vert f \Vert , \qquad \forall f \in \mathrm {H}. \end{aligned}$$
(3.3)

Example 3.3

Consider a frame for the Hilbert space \(\mathrm {H}= \mathrm {L}^2(-1,1)\) of square-integrable functions on \((-1,1)\). Suppose that the samples are pointwise evaluations at equispaced points. In this case, we may take \(\mathrm {G}= \mathrm {C}([-1,1])\) with its usual norm, and \(\ell _{m,M}(f) = \sqrt{2/M} f(-1+2(m-1)/M)\), \(m=1,\ldots ,M\). Then (3.2) holds with \(A' = B' = 1\), since \(\sum ^{M}_{m=1} | \ell _{m,M}(f) |^2\) is a Riemann sum approximation to \(\int ^{1}_{-1} | f(x) |^2 \,\mathrm {d}x\). More generally, if \(-1 \le t_{1,M}< \ldots < t_{M,M} \le 1\) are (not necessarily equispaced) sample points, then (3.2) can be achieved with \(A' = B' = 1\) if

$$\begin{aligned} h_{M} = \max _{m = 0,\ldots ,M} \{ t_{m+1,M} - t_{m,M} \} \rightarrow 0,\qquad M \rightarrow \infty , \end{aligned}$$

where \(t_{0,M} = t_{M,M}-2\) and \(t_{M+1,M} = t_{1,M}+2\). Indeed, in this case choosing the linear functionals as

$$\begin{aligned} \ell _{m,M}(f) = \sqrt{\frac{1}{2}(t_{m+1,M} - t_{m-1,M})} f(t_{m,M}) \end{aligned}$$

gives a convergent approximation \(\sum ^{M}_{m=1} | \ell _{m,M}(f) |^2\) to \({\left\| f\right\| }^2\) as \(M \rightarrow \infty \).
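A quick numerical confirmation of this Riemann-sum argument (a self-contained sketch; the test function is an arbitrary choice):

```python
import numpy as np

f = lambda x: np.exp(np.cos(2 * x))            # any function in C([-1, 1])

# Reference value of ||f||^2 via a fine midpoint rule.
K = 10**6
xm = -1 + 2 * (np.arange(K) + 0.5) / K
norm_sq = (2 / K) * np.sum(np.abs(f(xm))**2)

for M in [10, 100, 1000, 10000]:
    t = -1 + 2 * np.arange(M) / M              # equispaced points -1 + 2(m-1)/M
    discrete = (2 / M) * np.sum(np.abs(f(t))**2)   # sum_m |l_{m,M}(f)|^2
    print(M, discrete - norm_sq)               # tends to 0, so A' = B' = 1
```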

3.2 Best Approximation with Regularization from Indirect Data

Given \(f \in \mathrm {G}\) and data \(\mathcal {M}_{M}f\), we construct the approximation \(\mathcal {P}^{\epsilon }_{M,N}f\) as follows. Let

$$\begin{aligned} \varvec{G}_{M,N} = \mathcal {M}_M \mathcal {T}_N = \{ \ell _{m,M}(\phi _n) \}_{m \in J_M,n \in I_N} \in \mathbb {C}^{M \times N}. \end{aligned}$$

As we explain below, much like in the setting of Theorem 1.1, this matrix is generally ill-conditioned. This arises from the inherent redundancy of \(\Phi _N\), independently of the samples—in particular, it cannot be avoided by taking \(M \gg N\). Hence we need to regularize. Suppose that \(\varvec{G}_{M,N}\) has singular value decomposition

$$\begin{aligned} \varvec{G}_{M,N} = \varvec{U} \varvec{\Sigma } \varvec{V}^*, \end{aligned}$$
(3.4)

and let \(\epsilon > 0\) be fixed. Then we set

$$\begin{aligned} \varvec{x}^{\epsilon } = (\varvec{G}^{\epsilon }_{M,N})^{\dag } \varvec{y} = \varvec{V} (\varvec{\Sigma }^{\epsilon })^{\dag } \varvec{U}^* \varvec{y}, \end{aligned}$$
(3.5)

where \(\dag \) denotes the pseudoinverse and \(\varvec{\Sigma }^{\epsilon }\) is the diagonal matrix with \(n^{\mathrm {th}}\) entry \(\sigma _n\) if

$$\begin{aligned} \sigma _n > \epsilon , \end{aligned}$$
(3.6)

and zero otherwise. The corresponding approximation to f is

$$\begin{aligned} f \approx \mathcal {P}^{\epsilon }_{M,N} f = \mathcal {T}_N \varvec{x}^{\epsilon }. \end{aligned}$$

Observe that both the solution vector \(\varvec{x}^{\epsilon }\) and the approximation \(\mathcal {P}^{\epsilon }_{M,N} f\) are uniquely defined by construction, even if the family \(\Phi _N\) happens to be linearly dependent. For convenience, we now define the mapping

$$\begin{aligned} \mathcal {L}^{\epsilon }_{M,N} : \mathbb {C}^M \rightarrow \mathrm {H}_N,\ \varvec{y} \mapsto \mathcal {T}_N (\varvec{G}^{\epsilon }_{M,N})^\dag \varvec{y}, \end{aligned}$$

so that

$$\begin{aligned} \mathcal {P}^{\epsilon }_{M,N} f = \mathcal {T}_{N} \varvec{x}^{\epsilon } = \mathcal {L}^{\epsilon }_{M,N} \mathcal {M}_M f. \end{aligned}$$
(3.7)
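To make the construction concrete, here is a minimal Python/NumPy sketch of (3.4)–(3.7) (our own; the restricted Fourier basis and the equispaced point samples of Example 3.3 are a choice of test case, not prescribed by the theory):

```python
import numpy as np

def regularized_fit(GMN, y, eps):
    """Coefficients x^eps = (G_{M,N}^eps)^dag y via the truncated SVD (3.5)."""
    U, s, Vh = np.linalg.svd(GMN, full_matrices=False)
    keep = s > eps                                   # threshold (3.6)
    return Vh[keep].conj().T @ ((U[:, keep].conj().T @ y) / s[keep])

N, M, eps = 41, 164, 1e-14
n = np.arange(-(N // 2), N // 2 + 1)
t = -1 + 2 * np.arange(M) / M                        # equispaced sample points
w = np.sqrt(2 / M)                                   # weights from Example 3.3
G = w * np.exp(1j * np.pi * np.outer(t, n) / 2)      # G_{M,N} = l_{m,M}(phi_n)
f = lambda x: 1 / (1 + 25 * x**2)                    # function to approximate
x_eps = regularized_fit(G, w * f(t), eps)

xx = np.linspace(-1, 1, 2001)                        # assess the error
err = np.abs(np.exp(1j * np.pi * np.outer(xx, n) / 2) @ x_eps - f(xx)).max()
print(err)  # decays as N grows, plateauing near eps rather than sqrt(eps)
```

Here the \(\phi _n\) are the complex exponentials \(\mathrm {e}^{\mathrm {i} \pi n x / 2}\) restricted to \((-1,1)\), i.e. a restriction of a Fourier basis on a larger interval, so \(\Phi _N\) is highly redundant and \(\varvec{G}_{M,N}\) is severely ill-conditioned.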

Remark 3.4

To see why \(\varvec{G}_{M,N}\) is generally ill-conditioned, consider the setting of Example 3.2 and suppose that \(\Psi \) is a tight frame, i.e. \(A' = B'\). This assumption is made for convenience: the following argument readily extends to general frames. Let \(\widetilde{\mathcal {S}}_M : f \mapsto \sum _{m \in J_M} \langle f, \psi _m \rangle \psi _m\) be the partial frame operator for \(\Psi \). Since \(\Psi \) is tight, \(\widetilde{\mathcal {S}}_{M} \rightarrow A'\mathcal {I}\) strongly as \(M \rightarrow \infty \), and therefore, for fixed N,

$$\begin{aligned} (\varvec{G}_{M,N})^* \varvec{G}_{M,N} \rightarrow A' \varvec{G}_N,\qquad M \rightarrow \infty , \end{aligned}$$

where \(\varvec{G}_N\) is the Gram matrix of \(\Phi _N\). Hence, whenever \(\varvec{G}_N\) is ill-conditioned (i.e. \(\Phi _N\) is near-redundant), we expect \(\varvec{G}_{M,N}\) to inherit the same ill-conditioning for large M.
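Continuing the sketch after (3.7), one can observe this numerically; here the equispaced samples of Example 3.3 (for which \(A' = B' = 1\)) play the role of the tight frame samples, and the Gram matrix of the restricted Fourier basis is known in closed form:

```python
# Exact Gram matrix <phi_n, phi_m> = int_{-1}^{1} e^{i pi (n-m) x / 2} dx,
# using np.sinc(x) = sin(pi x) / (pi x).
GN = 2 * np.sinc((n[:, None] - n[None, :]) / 2)
print(np.linalg.cond(GN))                  # enormous: Phi_N is near-redundant
for M2 in [200, 2000, 20000]:
    t2 = -1 + 2 * np.arange(M2) / M2
    G2 = np.sqrt(2 / M2) * np.exp(1j * np.pi * np.outer(t2, n) / 2)
    print(M2, np.abs(G2.conj().T @ G2 - GN).max())   # -> 0 as M grows
```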

Remark 3.5

This argument gives some insight into the advantage of oversampling. For a tight frame, \((\varvec{G}_{M,N})^* \varvec{G}_{M,N}\) is, up to the factor \(A'\), an approximate factorization of \(\varvec{G}_N\). Thus, solving the linear system (1.1) is akin to solving the normal equations of the least-squares problem \(\varvec{G}_{M,N} \varvec{x} \approx \varvec{y}\). In this sense it is not surprising that oversampling yields \(\mathcal {O}\left( \epsilon \right) \) accuracy, whereas solving (1.1) yields only \(\mathcal {O}\left( \sqrt{\epsilon } \right) \) accuracy. Indeed, this is reminiscent of the typical squaring of the condition number incurred when forming the normal equations of a least-squares problem [11, §5.3].

3.3 The Solution as an Orthogonal Projection

A key element of our subsequent analysis is the reinterpretation of the operator \(\mathcal {P}^{\epsilon }_{M,N}\) as a projection with respect to a semi-definite sesquilinear form. Specifically, we now define the data-dependent sesquilinear form \(\langle \cdot , \cdot \rangle _M\) on \(\mathrm {G}\times \mathrm {G}\) as

$$\begin{aligned} \langle f, g \rangle _M = \langle \mathcal {M}_M f, \mathcal {M}_M g \rangle = \sum _{m \in J_M} \ell _{m,M}(f) \overline{\ell _{m,M}(g)},\quad f,g \in \mathrm {G}, \end{aligned}$$

with corresponding discrete semi-norm \({\left\| f\right\| }_M = \sqrt{\langle f, f \rangle _M} = {\left\| \mathcal {M}_M f\right\| }\). Note that, in general, \(\langle \cdot , \cdot \rangle _M\) is only semi-definite, even when restricted to \(\mathrm {H}_N \times \mathrm {H}_N\): we merely have

$$\begin{aligned} \langle g, g \rangle _M = {\left\| g\right\| }_M^2 = \sum _{m \in J_M} |\ell _{m,M}(g) |^2 \ge 0,\quad \forall g \in \mathrm {H}_N. \end{aligned}$$

In particular, for poorly chosen functionals it may be that \(\Vert g \Vert _M = 0\) for some non-trivial function \(g \in \mathrm {H}_N\). However, with assumption (3.2) we do have the limiting behaviour

$$\begin{aligned} \liminf _{M \rightarrow \infty } \langle g, g \rangle _M = \liminf _{M \rightarrow \infty } {\left\| g\right\| }_M^2 \ge A' \Vert g \Vert ^2 > 0,\quad \forall g \in \mathrm {H}_N, g \ne 0. \end{aligned}$$

This means that, for sufficiently large M, the sesquilinear form becomes an inner product on all of \(\mathrm {H}_N\).

Recall the singular value decomposition (3.4), and let \(\varvec{u}_1,\ldots ,\varvec{u}_M\), \(\varvec{v}_1,\ldots ,\varvec{v}_N\) and \(\sigma _1,\ldots ,\sigma _N\) be the left and right singular vectors and singular values of \(\varvec{G}_{M,N}\) respectively, with \(\sigma _n \ge 0\), \(n=1,\ldots ,N\). To the right singular vectors, we associate the functions

$$\begin{aligned} \xi _n = \mathcal {T}_N \varvec{v}_n \in \mathrm {H}_N,\quad n=1,\ldots ,N. \end{aligned}$$
(3.8)

By construction, these functions are orthogonal with respect to \(\langle \cdot , \cdot \rangle _M\). Indeed, by orthogonality of the singular vectors, we have

$$\begin{aligned} \langle \xi _m, \xi _n \rangle _M = \langle \mathcal {M}_M \mathcal {T}_N \varvec{v}_m, \mathcal {M}_M \mathcal {T}_N \varvec{v}_n \rangle = \sigma _m \sigma _n \langle \varvec{u}_m, \varvec{u}_n \rangle = \sigma _m \sigma _n \delta _{m,n},\quad m,n = 1,\ldots ,N. \end{aligned}$$
(3.9)

Here, too, it may be that \(\Vert \xi _n \Vert _M = 0\). This is the case if \(\sigma _n = 0\).
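A short check of (3.9), continuing the numerical sketch from Sect. 3.2 (the matrix G below is the \(\varvec{G}_{M,N}\) built there):

```python
# The columns of G @ V are M_M(xi_n) = G_{M,N} v_n; their Gram matrix in the
# discrete inner product is diag(sigma_n^2), confirming (3.9) numerically.
U, s, Vh = np.linalg.svd(G, full_matrices=False)
Xi = G @ Vh.conj().T                       # column n holds the samples of xi_n
print(np.abs(Xi.conj().T @ Xi - np.diag(s**2)).max())   # ~ machine epsilon
```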

We shall, for convenience, let \(\varvec{x} = \varvec{x}^{0}\) be the solution of the unregularized problem, given by

$$\begin{aligned} \varvec{x} =(\varvec{G}_{M,N})^{\dag } \varvec{y}. \end{aligned}$$

We also write \(\mathcal {P}_{M,N} = \mathcal {P}^{0}_{M,N}\) so that \(\mathcal {P}_{M,N} f = \mathcal {T}_N \varvec{x}\). Using the expression for the pseudoinverse in terms of the SVD, we can write both \(\varvec{x}\) and \(\varvec{x}^{\epsilon }\) in terms of the left and right singular vectors:

$$\begin{aligned} \varvec{x} = \sum _{\sigma _n> 0} \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \varvec{v}_n,\qquad \varvec{x}^{\epsilon } = \sum _{\sigma _n > \epsilon } \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \varvec{v}_n. \end{aligned}$$
(3.10)

Furthermore, for \(\sigma _n > 0\) we also have

$$\begin{aligned} \langle \varvec{y}, \varvec{u}_n \rangle = \frac{\langle \mathcal {M}_M f, \varvec{G}_{M,N} \varvec{v}_n \rangle }{\sigma _n} = \frac{\langle \mathcal {M}_M f, \mathcal {M}_M \mathcal {T}_N \varvec{v}_n \rangle }{\sigma _n} = \frac{\langle f, \xi _n \rangle _M}{\sigma _n}, \end{aligned}$$

where in the last step we use (3.8). In particular, this gives

$$\begin{aligned} \mathcal {P}^{\epsilon }_{M,N} f = \mathcal {T}_N \varvec{x}^{\epsilon } = \sum _{\sigma _n> \epsilon } \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \mathcal {T}_N \varvec{v}_n = \sum _{\sigma _n > \epsilon } \frac{\langle f, \xi _n \rangle _M}{\sigma ^2_n} \xi _n. \end{aligned}$$
(3.11)

Similarly, we have

$$\begin{aligned} \mathcal {P}_{M,N} f = \mathcal {T}_N \varvec{x} = \sum _{\sigma _n > 0} \frac{\langle f, \xi _n \rangle _M}{\sigma ^2_n} \xi _n. \end{aligned}$$

Finally, we define the regularized spaces

$$\begin{aligned} \mathrm {H}^{\epsilon }_{M,N} = \mathrm {span}\left\{ \xi _n : \sigma _n > \epsilon \right\} . \end{aligned}$$

Since \(\{\varvec{v}_n \}^{N}_{n=1}\) is a basis of \(\mathbb {C}^N\), we see that the functions \(\{ \xi _n \}_{\sigma _n > 0}\) are an orthogonal basis of \(\mathrm {H}_{M,N}^0\) with respect to \(\langle \cdot , \cdot \rangle _M\). We can conclude that \(\mathcal {P}_{M,N}\) is the orthogonal projection onto \(\mathrm {H}_{M,N}^0 \subseteq \mathrm {H}_N \subset \mathrm {G}\). In turn, \(\mathcal {P}^{\epsilon }_{M,N}\) is the orthogonal projection onto the subspace \(\mathrm {H}^{\epsilon }_{M,N} \subseteq \mathrm {H}^0_{M,N}\).

A property relevant to the analysis that follows is that these orthogonal projections do not increase the M-norm:

Lemma 3.6

For any \(\epsilon \ge 0\), we have

$$\begin{aligned} \Vert \mathcal {P}_{M,N}^\epsilon f \Vert _M \le \Vert f \Vert _M, \qquad \forall f \in \mathrm {G}. \end{aligned}$$
(3.12)

This follows immediately from the fact that \(\mathcal {P}^{\epsilon }_{M,N}\) is an orthogonal projection with respect to \(\langle \cdot , \cdot \rangle _M\).

3.4 Theoretical Results

We now define the constants

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} = \max _{\begin{array}{c} \varvec{y} \in \mathrm {Ran}(\mathcal {M}_M) \\ \Vert \varvec{y} \Vert =1 \end{array}} \Vert \mathcal {L}^{\epsilon }_{M,N} \varvec{y} \Vert ,\quad \lambda ^{\epsilon }_{M,N} = \epsilon ^{-1} \max _{\begin{array}{c} \varvec{z} \in \mathbb {C}^N \\ \Vert \varvec{z} \Vert =1 \end{array}} \Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \Vert . \end{aligned}$$
(3.13)

Note that \(\kappa ^{\epsilon }_{M,N} \) is precisely the operator norm of \(\mathcal {L}_{M,N}^{\epsilon } : \mathrm {Ran}(\mathcal {M}_M) \rightarrow \mathrm {H}_N\). Since \(\mathcal {L}_{M,N}^{\epsilon }\) is linear, it is also its absolute condition number, i.e. \(\kappa ^{\epsilon }_{M,N} \) measures the absolute effect of perturbations in the data \(\varvec{y}\) on the final approximation. The constant \(\lambda ^{\epsilon }_{M,N}\) measures how close \(\mathcal {P}^{\epsilon }_{M,N}\) is to being a projection on the subspace \(\mathrm {H}_N = \mathcal {T}_N(\mathbb {C}^N)\).

Our first result concerns the approximation error of \(\mathcal {P}^{\epsilon }_{M,N}f\):

Theorem 3.7

Let \(f \in \mathrm {G}\). The truncated SVD approximation \(\mathcal {P}^{\epsilon }_{M,N}f\) satisfies

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N}f \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ \Vert f - \mathcal {T}_N \varvec{z} \Vert + \kappa ^{\epsilon }_{M,N} \Vert f - \mathcal {T}_N \varvec{z} \Vert _M + \epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \right\} . \end{aligned}$$
(3.14)

This result differs from Theorem 1.1 in several respects. On the one hand, if the constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) are order one, the error depends on \(\epsilon {\left\| \varvec{z}\right\| }\), not \(\sqrt{\epsilon } {\left\| \varvec{z}\right\| }\), thus overcoming the \(\sqrt{\epsilon }\) bottleneck. We will discuss when this occurs in the next subsection. On the other hand, the error bound involves the discrete data norm \(\Vert f - \mathcal {T}_N \varvec{z} \Vert _M\). In general, this cannot be bounded by \(\Vert f - \mathcal {T}_N \varvec{z}\Vert \). However, one clearly has

$$\begin{aligned} \Vert f - \mathcal {T}_N \varvec{z} \Vert _M \le C_M {\left| \left| \left| f - \mathcal {T}_N \varvec{z} \right| \right| \right| },\qquad C_M = \sqrt{\sum _{m \in J_M} (c_{m,M})^2}, \end{aligned}$$

where \(c_{m,M}\) are the norms of the functionals \(\ell _{m,M}\); recall (3.1). In particular, for Example 3.3 it follows that \(\Vert f - \mathcal {T}_N \varvec{z} \Vert _M \le \sqrt{2} \Vert f - \mathcal {T}_N \varvec{z} \Vert _{\mathrm {L}^\infty }\).

Proof of Theorem 3.7

For any \(\varvec{z} \in \mathbb {C}^N\),

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N}f \Vert \le \Vert f - \mathcal {T}_N \varvec{z} \Vert + \Vert \mathcal {P}^{\epsilon }_{M,N}(f-\mathcal {T}_N \varvec{z}) \Vert + \Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \Vert . \end{aligned}$$

Consider the second term. We have

$$\begin{aligned} \Vert \mathcal {P}^{\epsilon }_{M,N}(f-\mathcal {T}_N \varvec{z}) \Vert&= \Vert \mathcal {L}^{\epsilon }_{M,N} \mathcal {M}_M (f-\mathcal {T}_N \varvec{z}) \Vert \le \kappa ^{\epsilon }_{M,N} \Vert \mathcal {M}_M (f-\mathcal {T}_N \varvec{z}) \Vert \\&= \kappa ^{\epsilon }_{M,N} \Vert f - \mathcal {T}_N \varvec{z} \Vert _M, \end{aligned}$$

which gives the corresponding term in (3.14). Now consider the third term. It follows immediately from the definition of \(\lambda ^{\epsilon }_{M,N}\) that \(\Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \Vert \le \epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \), as required. \(\square \)

We now consider the coefficients \(\varvec{x}^{\epsilon }\):

Theorem 3.8

Let \(f \in \mathrm {G}\). The coefficients \(\varvec{x}^{\epsilon }\) of the truncated SVD projection \(\mathcal {P}^{\epsilon }_{M,N}\) satisfy

$$\begin{aligned} \Vert \varvec{x}^{\epsilon } \Vert \le \inf _{ \varvec{z} \in \mathbb {C}^N} \left\{ \frac{1}{\epsilon } \, \Vert f - \mathcal {T}_N \varvec{z} \Vert _M + \Vert \varvec{z} \Vert \right\} . \end{aligned}$$
(3.15)

Moreover, if \(\Phi = \{ \phi _n \}_{n \in I}\) is a frame, \(\Phi _N = \{ \phi _n \}_{n \in I_N}\) and if \(\varvec{a} = \{ \langle f, \mathcal {S}^{-1}\phi _n \rangle \}_{n \in I}\) are the canonical frame coefficients of f and \(\varvec{a}^{\epsilon }_{M,N} \in \ell ^2(I)\) is the extension of \(\varvec{x}^\epsilon \) by zero, then

$$\begin{aligned} \Vert \varvec{a} - \varvec{a}^{\epsilon }_{M,N} \Vert \le \sqrt{\sum _{n \in I \backslash I_N} | a_n |^2} + \frac{1}{\epsilon } \Vert (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f \Vert _M + \epsilon \frac{\lambda ^{\epsilon }_{M,N}}{\sqrt{A}} \Vert \varvec{a} \Vert . \end{aligned}$$
(3.16)

For general measurements, (3.16) does not imply convergence of the coefficients \(\varvec{a}^{\epsilon }_{M,N}\) to the canonical frame coefficients \(\varvec{a}\) since the term \(\Vert (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f \Vert _M\) cannot typically be bounded by \(\Vert (\mathcal {S}- \mathcal {S}_N ) \mathcal {S}^{-1} f \Vert \). There is also no guarantee that \({\left| \left| \left| (\mathcal {S}- \mathcal {S}_N ) \mathcal {S}^{-1} f \right| \right| \right| } \rightarrow 0\) as \(N \rightarrow \infty \). This does hold, however, when the data arises from sampling with another frame \(\{ \psi _n \}_{n \in I}\), as in Example 3.2, due to (3.3). We will discuss this case further in Sect. 4.

Proof of Theorem 3.8

For the first part, we use (3.10) to write

$$\begin{aligned} \varvec{x}^{\epsilon } = \sum _{\sigma _n> \epsilon } \frac{\langle f, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n =\sum _{\sigma _n> \epsilon } \frac{\langle f-\mathcal {T}_N \varvec{z}, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n + \sum _{\sigma _n > \epsilon } \frac{\langle \mathcal {T}_N \varvec{z}, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n. \end{aligned}$$
(3.17)

Consider the first term on the right-hand side. Since the \(\varvec{v}_n\) are orthonormal, we have

$$\begin{aligned} {\left\| \sum _{\sigma _n> \epsilon } \frac{\langle f-\mathcal {T}_N \varvec{z}, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n\right\| }^2 = \sum _{\sigma _n> \epsilon } \frac{|\langle f-\mathcal {T}_N \varvec{z}, \xi _n \rangle _M|^2}{\sigma ^4_n} \le \frac{1}{\epsilon ^2} \sum _{\sigma _n > \epsilon } \frac{|\langle f-\mathcal {T}_N \varvec{z}, \xi _n \rangle _M|^2}{\sigma ^2_n}. \end{aligned}$$

It follows from (3.9) and (3.11) that

$$\begin{aligned} \sum _{\sigma _n > \epsilon } \frac{|\langle g, \xi _n \rangle _M|^2}{\sigma ^2_n} = {\left\| \mathcal {P}^{\epsilon }_{M,N} g\right\| }^2_{M},\quad g \in \mathrm {G}. \end{aligned}$$

Hence

$$\begin{aligned} {\left\| \sum _{\sigma _n > \epsilon } \frac{\langle f-\mathcal {T}_N \varvec{z}, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n\right\| }^2 \le \frac{1}{\epsilon ^2} {\left\| \mathcal {P}^{\epsilon }_{M,N}(f-\mathcal {T}_N \varvec{z})\right\| }^2_{M} \le \frac{1}{\epsilon ^2} \Vert f - \mathcal {T}_N \varvec{z} \Vert ^2_M, \end{aligned}$$

where in the second step we use the fact that \(\mathcal {P}^{\epsilon }_{M,N}\) is the orthogonal projection with respect to \(\langle \cdot , \cdot \rangle _M\) (recall Lemma 3.6). This gives the first term of (3.15). Next, consider the second term of the right-hand side of (3.17). Since

$$\begin{aligned} \langle \mathcal {T}_N \varvec{z}, \xi _n \rangle _M = \langle \mathcal {T}_N \varvec{z}, \mathcal {T}_N \varvec{v}_n \rangle _M = \sigma ^2_n \langle \varvec{z}, \varvec{v}_n \rangle , \end{aligned}$$
(3.18)

it follows that

$$\begin{aligned} {\left\| \sum _{\sigma _n> \epsilon } \frac{\langle \mathcal {T}_N \varvec{z}, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n\right\| }^2 = \sum _{\sigma _n > \epsilon } | \langle \varvec{z}, \varvec{v}_n \rangle |^2 \le \Vert \varvec{z} \Vert ^2. \end{aligned}$$

This gives the second term of (3.15).

For (3.16), of course the canonical frame coefficients are well defined since we now assume that \(\Phi \) is a frame. We first note that

$$\begin{aligned} \Vert \varvec{a} - \varvec{a}^{\epsilon }_{M,N} \Vert \le \sqrt{\sum _{n \in I \backslash I_N} | a_n |^2} + \Vert \varvec{a}_N- \varvec{x}^{\epsilon } \Vert , \end{aligned}$$

where \(\varvec{a}_N = \{ a_n \}_{n \in I_N}\). Therefore it suffices to consider \(\Vert \varvec{a}_N - \varvec{x}^{\epsilon } \Vert \). Observe that

$$\begin{aligned} \varvec{a}_N = \sum _{n \in I_N} \langle \varvec{a}_N, \varvec{v}_n \rangle \varvec{v}_n = \sum _{n \in I_N} \langle \mathcal {S}^{-1} f , \xi _n \rangle \varvec{v}_n. \end{aligned}$$

Now \(\langle f, \xi _n \rangle _M= \langle \mathcal {S}\mathcal {S}^{-1} f, \xi _n \rangle _M\) and therefore

$$\begin{aligned} \langle f, \xi _n \rangle _M&= \langle \mathcal {S}_N \mathcal {S}^{-1} f, \xi _n \rangle _M + \langle (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f, \xi _n \rangle _M\\&= \sigma ^2_n \langle \mathcal {T}^*_N \mathcal {S}^{-1} f, \varvec{v}_n \rangle + \langle (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f, \xi _n \rangle _M. \end{aligned}$$

Notice that \(\mathcal {S}_N \mathcal {S}^{-1} f \in \mathrm {G}\) and \((\mathcal {S}- \mathcal {S}_N)\mathcal {S}^{-1} f = f - \mathcal {S}_N \mathcal {S}^{-1} f \in \mathrm {G}\). Therefore all the terms above are well defined. Hence, by (3.17),

$$\begin{aligned} \varvec{x}^{\epsilon } = \sum _{\sigma _{n}> \epsilon } \langle \mathcal {S}^{-1} f, \xi _n \rangle \varvec{v}_n + \sum _{\sigma _{n} > \epsilon } \frac{\langle (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n , \end{aligned}$$

which gives

$$\begin{aligned} {\left\| \varvec{a}_N-\varvec{x}^{\epsilon }\right\| } \le {\left\| \sum _{\sigma _n \le \epsilon } \langle \mathcal {S}^{-1} f , \xi _n \rangle \varvec{v}_n\right\| } + {\left\| \sum _{\sigma _n > \epsilon } \frac{\langle (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n \right\| } . \end{aligned}$$
(3.19)

Consider the first term. Let \(\varvec{z} \in \mathbb {C}^N\) be given by \(\varvec{z} = \sum _{\sigma _n \le \epsilon } \langle \mathcal {S}^{-1} f , \xi _n \rangle \varvec{v}_n\), so that the first term is merely \(\Vert \varvec{z} \Vert \). By the definition of \(\lambda ^{\epsilon }_{M,N}\), we have \(\epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \ge \Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \Vert \). Now, since \(\mathcal {T}_N \varvec{z} \perp \mathrm {H}^{\epsilon }_{M,N}\) with respect to \(\langle \cdot , \cdot \rangle _M\), we have that \(\mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} = 0\). Hence

$$\begin{aligned} \epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \ge \Vert \mathcal {T}_N \varvec{z} \Vert = \sup _{\begin{array}{c} g \in \mathrm {H}\\ g \ne 0 \end{array}} \frac{| \langle \mathcal {T}_N \varvec{z}, g \rangle |}{\Vert g \Vert }. \end{aligned}$$

Set \(g = \mathcal {S}^{-1} f\). Then \(\langle \mathcal {T}_N \varvec{z}, g \rangle = \sum _{\sigma _n \le \epsilon } | \langle \mathcal {S}^{-1} f, \xi _n \rangle |^2 = \Vert \varvec{z} \Vert ^2\) and therefore we obtain \(\epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \ge \Vert \varvec{z} \Vert ^2/\Vert \mathcal {S}^{-1} f \Vert \). It follows that

$$\begin{aligned} \Vert \varvec{z} \Vert = {\left\| \sum _{\sigma _n \le \epsilon } \langle \mathcal {S}^{-1} f , \xi _n \rangle \varvec{v}_n\right\| } \le \epsilon \lambda ^{\epsilon }_{M,N} {\left\| \mathcal {S}^{-1} f \right\| } \le \frac{\epsilon \lambda ^{\epsilon }_{M,N}}{\sqrt{A}} \Vert \varvec{a} \Vert . \end{aligned}$$
(3.20)

Now consider the second term of (3.19). We have

$$\begin{aligned} {\left\| \sum _{\sigma _n> \epsilon } \frac{\langle (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f, \xi _n \rangle _M}{\sigma ^2_n} \varvec{v}_n \right\| }^2&= \sum _{\sigma _n > \epsilon } \frac{|\langle (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f, \xi _n \rangle _M |^2}{\sigma ^4_n} \\&\le \frac{1}{\epsilon ^2} \Vert \mathcal {P}^{\epsilon }_{M,N} (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f \Vert ^2_M\\&\le \frac{1}{\epsilon ^2} \Vert (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f \Vert ^2_M. \end{aligned}$$

In the last line, we used Lemma 3.6 again. Combining this with (3.20) now gives the result. \(\square \)

Remark 3.9

These results extend to the setting of noisy measurements. Suppose the measurements are \(\varvec{y} + \varvec{n}\) where \(\varvec{y} = \mathcal {M}_M f\) and \(\varvec{n}\) is a vector of noise. We assume that \(\varvec{n} \in \mathrm {Ran}(\mathcal {M}_M)\), in other words it takes the form \(\varvec{n} = \mathcal {M}_M g\) for some \(g \in \mathrm {G}\). Then, by linearity, the reconstruction is

$$\begin{aligned} \tilde{f} = \mathcal {P}^{\epsilon }_{M,N} f + \mathcal {L}^{\epsilon }_{M,N} \varvec{n}, \end{aligned}$$

where \(\mathcal {P}^{\epsilon }_{M,N} f\) is the standard reconstruction from the noiseless data \(\varvec{y}\). Hence, by Theorem 3.7, the error satisfies

$$\begin{aligned} {{\Vert f - \tilde{f}\Vert } \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ \Vert f - \mathcal {T}_N \varvec{z} \Vert + \kappa ^{\epsilon }_{M,N} \Vert f - \mathcal {T}_N \varvec{z} \Vert _M + \epsilon \lambda ^{\epsilon }_{M,N} \Vert \varvec{z} \Vert \right\} + \kappa ^{\epsilon }_{M,N} {\left\| \varvec{n}\right\| }. } \end{aligned}$$
(3.21)

In particular, when \(\kappa ^{\epsilon }_{M,N}\) is order one, the effect of the noise is proportional to its \(\ell ^2\)-norm.

3.5 Behaviour of the Constants

We now consider the behaviour of the constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\). To do so, we define the constant \(A'_{M,N}\) as follows:

$$\begin{aligned} A'_{M,N} = \inf _{\begin{array}{c} g \in \mathrm {H}_N \\ \Vert g \Vert =1 \end{array}} \Vert g \Vert ^2_M. \end{aligned}$$
(3.22)

In general, with a poor choice of sampling functionals, it may be that \(A'_{M,N} = 0\). However, even in that case, with assumption (3.2) we have that \(\liminf _{M \rightarrow \infty } A'_{M,N} \ge A'\) for any fixed N.

Proposition 3.10

The constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) defined in (3.13) satisfy

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} \le \frac{\sqrt{B_N}}{\epsilon }, \qquad \lambda ^{\epsilon }_{M,N} \le \frac{\sqrt{B_N}}{\epsilon }, \end{aligned}$$
(3.23)

for all M and N, \(M \ge N\), where \(B_N\) is the Bessel bound for \(\Phi _N\). Moreover,

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'_{M,N}}},\qquad \lambda ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'_{M,N}}}, \end{aligned}$$
(3.24)

and if the sampling functionals satisfy (3.2) then, for fixed N,

$$\begin{aligned} \limsup _{M \rightarrow \infty } \kappa ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}} \quad \text{ and } \quad \limsup _{M \rightarrow \infty } \lambda ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}}. \end{aligned}$$

Proof

Let \(\varvec{y} \in \mathrm {Ran}(\mathcal {M}_M)\) be given and write \(\varvec{y} = \mathcal {M}_M f\) for some \(f \in \mathrm {G}\). Then, by (3.11),

$$\begin{aligned} \Vert \mathcal {L}^{\epsilon }_{M,N} \varvec{y} \Vert = \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert = {\left\| \mathcal {T}_N \sum _{\sigma _n > \epsilon } \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \varvec{v}_n \right\| }. \end{aligned}$$

Notice that \(\Vert \mathcal {T}_N \varvec{x} \Vert \le \sqrt{B_N} \Vert \varvec{x} \Vert \). This follows since any Bessel sequence automatically satisfies the upper Riesz basis condition with constant equal to the Bessel bound. Hence

$$\begin{aligned} \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert ^2 \le B_N \sum _{\sigma _n > \epsilon } \frac{| \langle \varvec{y}, \varvec{u}_n \rangle |^2}{\sigma ^2_n} \le \frac{B_N}{\epsilon ^2} \Vert \varvec{y} \Vert ^2, \end{aligned}$$

which gives (3.23) for \(\kappa ^{\epsilon }_{M,N}\). For (3.24), we let \(\varvec{y} \in \mathrm {Ran}(\mathcal {M}_M)\) and write \(\varvec{y} = \mathcal {M}_M f\) for some \(f \in \mathrm {G}\) once more. From the definition (3.22) of \(A'_{M,N}\) and by Lemma 3.6 we find

$$\begin{aligned} \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert \le \frac{1}{\sqrt{A'_{M,N}}} \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert _M \le \frac{1}{\sqrt{A'_{M,N}}} {\left\| f\right\| }_{M} = \frac{1}{\sqrt{A'_{M,N}}} {\left\| \varvec{y}\right\| }. \end{aligned}$$

This gives (3.24).

We now consider \(\lambda ^{\epsilon }_{M,N}\). Let \(\varvec{z} \in \mathbb {C}^N\) be arbitrary. Using (3.11) and (3.18) we have

$$\begin{aligned} \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} = \mathcal {T}_N \sum _{\sigma _n \le \epsilon } \langle \varvec{z}, \varvec{v}_n \rangle \varvec{v}_n. \end{aligned}$$

Arguing as above, this implies that \(\Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \Vert ^2 \le B_N \Vert \varvec{z} \Vert ^2\), which gives (3.23). For (3.24), we again let \(\varvec{z} \in \mathbb {C}^N\) be arbitrary. Then

$$\begin{aligned} {\left\| \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z}\right\| }^2_M = \sum _{\sigma _n \le \epsilon } \sigma ^2_n | \langle \varvec{z}, \varvec{v}_n \rangle |^2 \le \epsilon ^2 \Vert \varvec{z} \Vert ^2. \end{aligned}$$

Moreover, since \(\mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \in \mathrm {H}_N\) we obtain

$$\begin{aligned} \Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z}\Vert ^2 \le \frac{1}{A'_{M,N}} \Vert \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} \Vert ^2_M \le \frac{\epsilon ^2 }{A'_{M,N}} \Vert \varvec{z} \Vert ^2 , \end{aligned}$$

as required. \(\square \)

3.6 The Stable Sampling Rate

Suppose that the sampling functionals satisfy (3.2). Motivated by Proposition 3.10 we now introduce the following concept:

Definition 3.11

For \(1< \theta < \infty \) and \(N \in \mathbb {N}\), the stable sampling rate is

$$\begin{aligned} { \Theta ^{\epsilon }(N,\theta ) = \min \left\{ M \in \mathbb {N}: M \ge N,\ \kappa ^{\epsilon }_{M,N} \le \frac{\theta }{\sqrt{A'}},\ \lambda ^{\epsilon }_{M,N} \le \frac{\theta }{\sqrt{A'}} \right\} . } \end{aligned}$$

For a fixed N, suppose that M is chosen so that \(M \ge \Theta ^{\epsilon }(N,\theta )\). Then this guarantees an error bound of the form

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N} f \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ \Vert f - \mathcal {T}_N \varvec{z} \Vert +\frac{\theta }{\sqrt{A'}} \Vert f - \mathcal {T}_N \varvec{z} \Vert _M+ \epsilon \frac{\theta }{\sqrt{A'}} \Vert \varvec{z} \Vert \right\} . \end{aligned}$$

Hence, sampling according to the stable sampling rate ensures that the error decays down to roughly \(\epsilon \) as \(N \rightarrow \infty \). This holds under the additional condition that the term \(\Vert f - \mathcal {T}_N \varvec{z} \Vert _M \rightarrow 0\); see the discussion after Theorem 3.7. Furthermore, sampling according to \( \Theta ^{\epsilon }(N,\theta )\) means that the rate of decay of the error for finite N depends completely on how well f can be approximated by elements of \(\mathrm {H}_N\) with bounded coefficients. As discussed, this depends on the frame \(\Phi \) and the element f being approximated. For estimates in certain cases, see [3].

Remark 3.12

If the data is noisy as in Remark 3.9 and \(M \ge \Theta ^{\epsilon }(N,\theta )\) then (3.21) becomes

$$\begin{aligned} {\Vert f - \tilde{f}\Vert } \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ \Vert f - \mathcal {T}_N \varvec{z} \Vert +\frac{\theta }{\sqrt{A'}} \Vert f - \mathcal {T}_N \varvec{z} \Vert _M+ \epsilon \frac{\theta }{\sqrt{A'}} \Vert \varvec{z} \Vert \right\} + \frac{\theta }{\sqrt{A'}} {\left\| \varvec{n}\right\| }. \end{aligned}$$

Note that \(\epsilon \) does not enter into the noise term. Recall that the first term decreases down to a limiting accuracy of at best \(\mathcal {O}( \epsilon ) \). Hence, in the noisy case the limiting accuracy will depend on the maximum of \(\epsilon \) and \({\left\| \varvec{n}\right\| }\). In particular, this yields a simple strategy for choosing \(\epsilon \) in the noisy case: take it proportional to the noise level.

The behaviour of \(\Theta ^{\epsilon }(N,\theta )\) as a function of N depends completely on \(\Phi \) and the sampling functionals, so theoretical estimates for this quantity can only be established on a case-by-case basis. We consider this issue further in Sect. 5 for a particular class of problems. There is no general recipe for providing such estimates; moreover, even when possible, doing so typically only reveals the asymptotic growth rate of \(\Theta ^{\epsilon }(N,\theta )\) with N and not the precise constant.

On the other hand, \(\Theta ^{\epsilon }(N,\theta )\) can always be computed. To see this, we observe the following:

Lemma 3.13

The constant \(\kappa ^{\epsilon }_{M,N} \) satisfies

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} \le \sqrt{\lambda _{\max } \left( (\varvec{B}^{\epsilon }_{M,N})^* \varvec{G}_N \varvec{B}^{\epsilon }_{M,N} \right) }, \end{aligned}$$

where \(\varvec{B}^{\epsilon }_{M,N} = (\varvec{G}^{\epsilon }_{M,N})^{\dag }\) and \(\varvec{G}_N \in \mathbb {C}^{N \times N}\) is the Gram matrix of \(\Phi _N\). If \(\mathrm {Ran}(\mathcal {M}_M) = \mathbb {C}^M\) this holds with equality. The constant \(\lambda ^{\epsilon }_{M,N}\) satisfies

$$\begin{aligned} \lambda ^{\epsilon }_{M,N} = \epsilon ^{-1} \sqrt{\lambda _{\max } \left( (\varvec{C}^{\epsilon }_{M,N})^* \varvec{G}_N \varvec{C}^{\epsilon }_{M,N} \right) }, \end{aligned}$$

where \(\varvec{C}^{\epsilon }_{M,N} = \varvec{V} (\varvec{I} - \varvec{I}^{\epsilon }) \varvec{V}^*\) and \(\varvec{I}^{\epsilon }\) is the diagonal matrix with nth entry 1 if \(\sigma _n > \epsilon \) and zero otherwise.

Proof

Let \(\varvec{y} \in \mathrm {Ran}(\mathcal {M}_M)\) with \({\left\| \varvec{y}\right\| } = 1\). Then

$$\begin{aligned} {\left\| \mathcal {L}^{\epsilon }_{M,N} \varvec{y}\right\| }^2 = {\left\| \mathcal {T}_N \varvec{B}^{\epsilon }_{M,N} \varvec{y} \right\| }^2 = \varvec{y}^* (\varvec{B}^{\epsilon }_{M,N})^* \varvec{G}_N \varvec{B}^{\epsilon }_{M,N} \varvec{y} \le \lambda _{\max } \left( (\varvec{B}^{\epsilon }_{M,N})^* \varvec{G}_N \varvec{B}^{\epsilon }_{M,N} \right) , \end{aligned}$$

since \(\varvec{G}_N = \mathcal {T}^*_N \mathcal {T}_N\). This gives the first result.

Let \(\varvec{z} \in \mathbb {C}^N\), \({\left\| \varvec{z}\right\| }=1\). Then

$$\begin{aligned} {\left\| \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z}\right\| }^2&= {\left\| \mathcal {T}_N \left( \varvec{I}- (\varvec{G}^{\epsilon }_{M,N})^{\dag } \varvec{G}_{M,N} \right) \varvec{z} \right\| }^2 = {\left\| \mathcal {T}_N \left( \varvec{I} - \varvec{V} \varvec{I}^{\epsilon } \varvec{V}^* \right) \varvec{z}\right\| }^2\\&= {\left\| \mathcal {T}_N \varvec{C}^{\epsilon }_{M,N} \varvec{z} \right\| }^2 = \varvec{z}^* (\varvec{C}^{\epsilon }_{M,N})^* \varvec{G}_N \varvec{C}^{\epsilon }_{M,N} \varvec{z}. \end{aligned}$$

Maximizing over \(\varvec{z}\) now gives the result. \(\square \)

This result implies that \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) can be computed, and therefore so can \(\Theta ^{\epsilon }(N,\theta )\), provided the matrix \(\varvec{G}_N\) has been computed.
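Since both constants reduce to finite-dimensional eigenvalue problems, Lemma 3.13 translates directly into a few lines of numerical linear algebra. The following is a minimal Python sketch (our own illustration, not code from any reference implementation), assuming \(\varvec{G}_{M,N} = \{ \ell _m(\phi _n) \}_{m,n}\) is the \(M \times N\) system matrix whose SVD defines \(\sigma _n\), \(\varvec{u}_n\) and \(\varvec{v}_n\); the function names kappa_lambda and stable_sampling_rate are hypothetical:

```python
import numpy as np

def kappa_lambda(G_MN, G_N, eps):
    """Upper bound on kappa (equality if Ran(M_M) = C^M) and the exact
    value of lambda from Lemma 3.13, given the M x N system matrix G_MN
    and the N x N Gram matrix G_N of Phi_N."""
    U, s, Vh = np.linalg.svd(G_MN, full_matrices=False)
    V, keep = Vh.conj().T, s > eps
    # B = (G^eps_{M,N})^dagger = V (Sigma^eps)^dagger U^*
    B = (V[:, keep] / s[keep]) @ U[:, keep].conj().T
    kappa = np.sqrt(np.linalg.eigvalsh(B.conj().T @ G_N @ B).max())
    # C = V (I - I^eps) V^*: projector onto the discarded right singular vectors
    C = V[:, ~keep] @ V[:, ~keep].conj().T
    lam = np.sqrt(np.linalg.eigvalsh(C @ G_N @ C).max()) / eps
    return kappa, lam

def stable_sampling_rate(G_MN_of, G_N, eps, theta, N, A1=1.0, M_max=10000):
    """Smallest M >= N with kappa, lambda <= theta / sqrt(A'), as in
    Definition 3.11; G_MN_of(M) returns the M x N system matrix and A1
    plays the role of the lower sampling bound A'."""
    for M in range(N, M_max + 1):
        kappa, lam = kappa_lambda(G_MN_of(M), G_N, eps)
        if max(kappa, lam) <= theta / np.sqrt(A1):
            return M
    return None
```

Each candidate M requires one SVD of an \(M \times N\) matrix, so in practice one would increase M geometrically rather than one at a time; the linear sweep above is only for clarity.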

Remark 3.14

In practice, it may be difficult to compute \(\varvec{G}_N\), since its entries are inner products which may for instance be integrals. This may be overcome by a further approximation, e.g. a quadrature. Specifically, let \(K \ge 1\) and \(\jmath _{k,K}\) be a family of linear functionals such that

$$\begin{aligned} \lim _{K \rightarrow \infty } \sum ^{K}_{k=1} \jmath _{k,K}(f) \overline{\jmath _{k,K}(g)} = \langle f, g \rangle ,\qquad \forall f,g \in \mathrm {G}. \end{aligned}$$

Let \(\varvec{H}_{K,N} = \{ \jmath _{k,K}(\phi _n) \}^{K}_{k=1, n \in I_N} \in \mathbb {C}^{K \times N}\). Then \((\varvec{H}_{K,N})^* \varvec{H}_{K,N} \approx \varvec{G}_N\) for large K. Hence, by the previous lemma (assuming \(\mathrm {Ran}(\mathcal {M}_M) = \mathbb {C}^M\) for ease of presentation), we have

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} \approx {\left\| \varvec{H}_{K,N} \varvec{B}^{\epsilon }_{M,N}\right\| }_{2} = {\left\| \varvec{H}_{K,N} \varvec{V} (\varvec{\Sigma }^{\epsilon })^{\dag } \right\| }_{2},\qquad \lambda ^{\epsilon }_{M,N} \approx \epsilon ^{-1} {\left\| \varvec{H}_{K,N} \varvec{V} ( \varvec{I} - \varvec{I}^{\epsilon } ) \right\| }_{2}, \end{aligned}$$

for sufficiently large K. If, for instance, the functionals \(\jmath _{k,K}\) correspond to pointwise evaluations as part of a quadrature, this gives a means of numerically approximating \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\).
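In the same hypothetical setting as the sketch after Lemma 3.13, these quadrature-based estimates need only the matrix \(\varvec{H}_{K,N}\) and the SVD factors of \(\varvec{G}_{M,N}\); zero columns of \((\varvec{\Sigma }^{\epsilon })^{\dag }\) and \(\varvec{I} - \varvec{I}^{\epsilon }\) can simply be dropped without changing the spectral norm:

```python
def kappa_lambda_quad(G_MN, H_KN, eps):
    """Quadrature-based estimates of kappa and lambda (Remark 3.14),
    with H_KN[k, n] = j_{k,K}(phi_n) and assuming Ran(M_M) = C^M."""
    U, s, Vh = np.linalg.svd(G_MN, full_matrices=False)
    V, keep = Vh.conj().T, s > eps
    kappa = np.linalg.norm(H_KN @ (V[:, keep] / s[keep]), 2)  # ||H V (Sigma^eps)^dag||_2
    lam = np.linalg.norm(H_KN @ V[:, ~keep], 2) / eps         # eps^{-1} ||H V (I - I^eps)||_2
    return kappa, lam
```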

4 Frame Approximation from Frame Samples

In this section, we discuss Example 3.2, in which both the approximation system and sampling functionals arise from frames of \(\mathrm {H}\). We write \(\Phi = \{ \phi _n \}_{n \in I}\) for the approximation frame (with bounds A and B) and \(\Psi = \{ \psi _m \}_{m \in J}\) for the sampling frame (with bounds \(A'\) and \(B'\)). We assume that \(\{ I_N \}_{N \in \mathbb {N}}\) is a sequence of nested index sets with

$$\begin{aligned} { I_N \subset I,\ |I_N| = N,\ \forall N \in \mathbb {N},\qquad \bigcup ^{\infty }_{N=1} I_N = I, } \end{aligned}$$

and similarly, we assume that \(\{ J_M \}_{M \in \mathbb {N}}\) is a sequence of nested index sets with

$$\begin{aligned} {J_M \subset J,\ |J_M| = M,\ \forall M \in \mathbb {N},\qquad \bigcup ^{\infty }_{M=1} J_M = J. } \end{aligned}$$

As before, we write \(\mathcal {P}^{\epsilon }_{M,N} f\) for the truncated SVD approximation of \(f \in \mathrm {H}\) (note that \(\mathrm {G}= \mathrm {H}\) in this case, since the sampling functionals arise from a frame of \(\mathrm {H}\)). We write \(\varvec{x}^{\epsilon }\) for its coefficients.

4.1 Error and Coefficient Bounds

Theorem 4.1

In the above setting, the truncated SVD projection \(\mathcal {P}^{\epsilon }_{M,N} f\) of \(f \in \mathrm {H}\) satisfies

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N}f \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ \left( 1 + \sqrt{B'} \kappa ^{\epsilon }_{M,N} \right) {\left\| f - \mathcal {T}_{N} \varvec{z}\right\| } + \epsilon \lambda ^{\epsilon }_{M,N} {\left\| \varvec{z}\right\| } \right\} . \end{aligned}$$
(4.1)

Its coefficients \(\varvec{x}^{\epsilon }\) satisfy

$$\begin{aligned} \Vert \varvec{x}^{\epsilon } \Vert \le \inf _{\varvec{z} \in \mathbb {C}^N} \left\{ \frac{\sqrt{B'}}{\epsilon } {\left\| f - \mathcal {T}_{N} \varvec{z} \right\| } + {\left\| \varvec{z}\right\| } \right\} , \end{aligned}$$
(4.2)

and, with \(\varvec{a}^{\epsilon }_{M,N} \in \ell ^2(I)\) denoting the extension of \(\varvec{x}^\epsilon \) by zero,

$$\begin{aligned} \Vert \varvec{a} - \varvec{a}^{\epsilon }_{M,N} \Vert \le \left( 1 + \frac{\sqrt{B B'}}{\epsilon } \right) \sqrt{\sum _{n \in I \backslash I_N} | a_n |^2} + \epsilon \frac{\lambda ^{\epsilon }_{M,N}}{\sqrt{A}} \Vert \varvec{a} \Vert , \end{aligned}$$
(4.3)

where \(\varvec{a} = \{ \langle f, \mathcal {S}^{-1}\phi _n \rangle \}_{n \in I}\) are the canonical frame coefficients of \(f \in \mathrm {H}\).

Proof

Recall (3.3). The first two bounds now follow immediately from Theorems 3.7 and 3.8 respectively. For the third bound, we use (3.16) and then observe that

$$\begin{aligned} { {\left\| (\mathcal {S}- \mathcal {S}_N) \mathcal {S}^{-1} f \right\| }^2 = {\left\| \sum _{n \in I \backslash I_N} a_n \phi _n \right\| }^2 \le B \sum _{n \in I \backslash I_N} |a_n |^2, } \end{aligned}$$

where for the final step we recall that a frame with upper frame bound B satisfies the upper Riesz basis condition with constant B. The result now follows immediately.

\(\square \)

4.2 Behaviour of the Coefficients and \(\mathcal {O}\left( \epsilon \right) \) Accuracy

We now consider the constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) and the stable sampling rate:

Proposition 4.2

The constants \(\kappa ^{\epsilon }_{M,N}\) and \(\lambda ^{\epsilon }_{M,N}\) satisfy

$$\begin{aligned} \kappa ^{\epsilon }_{M,N} , \lambda ^{\epsilon }_{M,N} \le \left\{ \begin{array}{ll} \sqrt{B}/\epsilon , & \Psi \ne \Phi , \\ 1/\sqrt{\epsilon }, & \Psi = \Phi , \end{array} \right. \end{aligned}$$
(4.4)

for all M and N, \(M \ge N\). Moreover, for fixed N,

$$\begin{aligned} \limsup _{M \rightarrow \infty } \kappa ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}} \quad \text{ and } \quad \limsup _{M \rightarrow \infty } \lambda ^{\epsilon }_{M,N} \le \frac{1}{\sqrt{A'}}. \end{aligned}$$

This result is essentially a special case of Proposition 3.10, except when \(\Psi = \Phi \), in which case we obtain slightly improved worst-case behaviour, with the right-hand side of (4.4) scaling like \(1/\sqrt{\epsilon }\) as opposed to \(1/\epsilon \). As the proof makes clear, this improvement stems from the fact that in this case the measurements are inner products with respect to the same frame. We prove this in a moment. First, however, we consider its implications for the limiting accuracy:

Corollary 4.3

For each \(1< \theta < \infty \) the truncated SVD approximation \(\mathcal {P}^{\epsilon }_{M,N} f\) satisfies

$$\begin{aligned} \limsup _{\begin{array}{c} M,N \rightarrow \infty \\ M \ge \Theta ^{\epsilon }(N,\theta ) \end{array}}\Vert f - \mathcal {P}^{\epsilon }_{M,N} f \Vert \le \epsilon \frac{\theta }{\sqrt{A A'}} \Vert f \Vert , \end{aligned}$$

where \(\Theta ^{\epsilon }(N,\theta )\) is as in Definition 3.11. Moreover, the coefficients \(\varvec{x}^{\epsilon }\) satisfy

$$\begin{aligned} {\limsup _{\begin{array}{c} M,N \rightarrow \infty \\ M \ge \Theta ^{\epsilon }(N,\theta ) \end{array}} {\left\| \varvec{x}^{\epsilon }\right\| } \le \frac{1}{\sqrt{A}} {\left\| f\right\| },\qquad \limsup _{\begin{array}{c} M,N \rightarrow \infty \\ M \ge \Theta ^{\epsilon }(N,\theta ) \end{array}} {\left\| \varvec{a} - \varvec{a}^{\epsilon }_{M,N}\right\| } \le \epsilon \frac{\theta }{\sqrt{AA'}} {\left\| \varvec{a}\right\| }. } \end{aligned}$$

Proof

The proof is based on the canonical frame coefficients \(\varvec{a} = \{ \langle f, \mathcal {S}^{-1} \phi _n \rangle \}_{n \in I}\). Let \(\varvec{z} = \{ a_n \}_{n \in I_N}\). Then \({\left\| \varvec{z}\right\| } \le {\left\| \varvec{a}\right\| } \le {\left\| f\right\| }/\sqrt{A}\), since the dual frame has upper frame bound \(A^{-1}\) (see Sect. 2.3). Therefore (4.1) and Proposition 4.2 give

$$\begin{aligned} \Vert f - \mathcal {P}^{\epsilon }_{M,N} f \Vert \le \left( 1+\sqrt{\frac{B'}{A'}} \theta \right) {\left\| f - \sum _{n \in I_N} \langle f, \mathcal {S}^{-1} \phi _n \rangle \phi _n \right\| } + \epsilon \frac{\theta }{\sqrt{AA'}}{\left\| f\right\| } . \end{aligned}$$

As \(N \rightarrow \infty \), (2.5) gives that the first term vanishes. Hence we obtain the result for f. For the other results, we use (4.2) and (4.3) instead. \(\square \)

In summary, provided M is chosen above the stable sampling rate \(\Theta ^{\epsilon }(N,\theta )\), the approximation \(\mathcal {P}^{\epsilon }_{M,N} f\) converges to within roughly \(\epsilon \) of f and the coefficients converge to within roughly \(\epsilon \) of the frame coefficients \(\varvec{a}\), and in particular are small in norm for large N. As a consequence, in the setting of Theorem 1.1 where \(\Psi = \Phi \), we overcome the \(\sqrt{\epsilon }\) bottleneck, with the limiting accuracy bounded by \(\epsilon \theta {\left\| f\right\| } / A\) as opposed to \(\sqrt{\epsilon } {\left\| f\right\| } / \sqrt{A}\) (see Corollary 1.2).

This result also illuminates the role that the frame structure plays in both the approximation and the sampling. Indeed, the limiting error depends on both of the lower frame bounds A and \(A'\), while the limiting size of the coefficients depends only on A. This is as expected: the existence of small-norm coefficients depends only on the approximation frame \(\Phi \), while a small limiting error depends on both the sampling frame \(\Psi \) and the approximation frame \(\Phi \).

Proof of Proposition 4.2

All results follow immediately from Proposition 3.10, except for (4.4) in the case \(\Psi = \Phi \) for which we require a different argument.

Consider \(\kappa ^{\epsilon }_{M,N}\) first. Let \(\varvec{y} \in \mathrm {Ran}(\mathcal {M}_M)\) be given and notice that we may write \(\varvec{y} = \mathcal {T}^*_M f\) for some \(f \in \mathrm {H}\) so that \( \Vert \mathcal {L}^{\epsilon }_{M,N} \varvec{y} \Vert = \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert \). By (3.10) we have

$$\begin{aligned} { \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert ^2 = \sum _{\sigma _m,\sigma _n > \epsilon } \frac{\langle \varvec{y}, \varvec{u}_m \rangle \overline{\langle \varvec{y}, \varvec{u}_n \rangle } }{\sigma _m \sigma _n} \langle \xi _m, \xi _n \rangle . } \end{aligned}$$

Recall that \(\mathcal {M}_M \mathcal {T}_N \varvec{v}_m = \sigma _m \varvec{u}_m\). Since \(\mathcal {M}_M = \mathcal {T}^*_M\) in this case, we have \(\mathcal {T}^*_N \mathcal {T}_N \varvec{v}_m = \sigma _m \varvec{\tilde{u}}_m\), where \(\varvec{\tilde{u}}_m \in \mathbb {C}^N\) is the vector with entries \((\varvec{\tilde{u}}_m)_k = (\varvec{u}_m)_k\) for \(k \in I_N\). Hence \(\langle \xi _m, \xi _n \rangle = \langle \mathcal {T}^*_N \mathcal {T}_N \varvec{v}_m, \varvec{v}_n \rangle = \sigma _m \langle \varvec{\tilde{u}}_m, \varvec{v}_n \rangle \) and this gives

$$\begin{aligned} { \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert ^2 = \left\langle \sum _{\sigma _m> \epsilon } \langle \varvec{y}, \varvec{u}_m \rangle \varvec{\tilde{u}}_m , \sum _{\sigma _n> \epsilon } \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \varvec{v}_n \right\rangle \le {\left\| \sum _{\sigma _m> \epsilon } \langle \varvec{y}, \varvec{u}_m \rangle \varvec{\tilde{u}}_m \right\| } {\left\| \sum _{\sigma _n > \epsilon } \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \varvec{v}_n \right\| }. } \end{aligned}$$

By orthogonality, the second term satisfies

$$\begin{aligned} { {\left\| \sum _{\sigma _n> \epsilon } \frac{\langle \varvec{y}, \varvec{u}_n \rangle }{\sigma _n} \varvec{v}_n \right\| } = \sqrt{\sum _{\sigma _n > \epsilon } \frac{| \langle \varvec{y}, \varvec{u}_n \rangle |^2}{\sigma ^2_n}} \le \frac{{\left\| \varvec{y}\right\| }}{\epsilon }. } \end{aligned}$$

Consider the first term. Let \(\mathcal {Q}_N : \mathbb {C}^M \rightarrow \mathbb {C}^M\) be the projection defined by \((\mathcal {Q}_N \varvec{x})_{m} = x_m\), \(m \in I_N\) and \((\mathcal {Q}_N \varvec{x})_m = 0\), \(m \in I_M \backslash I_N\). Then

$$\begin{aligned} { {\left\| \sum _{\sigma _m> \epsilon } \langle \varvec{y}, \varvec{u}_m \rangle \varvec{\tilde{u}}_m \right\| } = {\left\| \mathcal {Q}_N \left( \sum _{\sigma _m> \epsilon } \langle \varvec{y}, \varvec{u}_m \rangle \varvec{u}_m \right) \right\| } \le {\left\| \sum _{\sigma _m > \epsilon } \langle \varvec{y}, \varvec{u}_m \rangle \varvec{u}_m\right\| } \le {\left\| \varvec{y}\right\| }. } \end{aligned}$$

Therefore, we deduce that

$$\begin{aligned} { \Vert \mathcal {P}^{\epsilon }_{M,N} f \Vert ^2 \le {\left\| \varvec{y}\right\| }^2/\epsilon , } \end{aligned}$$

and the result for \(\kappa ^{\epsilon }_{M,N}\) now follows from its definition.

Now consider \(\lambda ^{\epsilon }_{M,N}\). Let \(\varvec{z} \in \mathbb {C}^N\) be arbitrary and recall that

$$\begin{aligned} { \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z} = \sum _{\sigma _n \le \epsilon } \langle \varvec{z}, \varvec{v}_n \rangle \xi _n. } \end{aligned}$$

Hence

$$\begin{aligned} { {\left\| \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z}\right\| }^2 = \sum _{\sigma _m,\sigma _n \le \epsilon } \langle \varvec{z}, \varvec{v}_m \rangle \overline{\langle \varvec{z}, \varvec{v}_n \rangle } \langle \xi _m, \xi _n \rangle . } \end{aligned}$$

As above, we note that \(\langle \xi _m, \xi _n \rangle = \sigma _m \langle \varvec{\tilde{u}}_m, \varvec{v}_n \rangle \), and therefore

$$\begin{aligned} {{\left\| \mathcal {T}_N \varvec{z} - \mathcal {P}^{\epsilon }_{M,N} \mathcal {T}_N \varvec{z}\right\| }^2 = \left\langle \sum _{\sigma _m \le \epsilon } \sigma _m \langle \varvec{z}, \varvec{v}_m \rangle \varvec{\tilde{u}}_m ,\sum _{\sigma _n \le \epsilon } \langle \varvec{z}, \varvec{v}_n \rangle \varvec{v}_n \right\rangle \le \epsilon \Vert \varvec{z} \Vert ^2. } \end{aligned}$$

Since \(\varvec{z}\) was arbitrary, we now obtain the result for \(\lambda ^{\epsilon }_{M,N}\). \(\square \)

5 ONB\(+1\) and ONB\(+K\) Frames

We conclude this paper with several examples to illustrate the stable sampling rate.

First, let \(\{ \varphi _{n} \}_{n \in \mathbb {N}}\) be an orthonormal basis of \(\mathrm {H}\) and \(\psi \in \mathrm {H}\), \({\left\| \psi \right\| } = 1\), be such that \(\langle \psi , \varphi _n \rangle \ne 0\) for infinitely many \(n \in \mathbb {N}\). Then the indexed family

$$\begin{aligned} { \Phi = \{ \phi _0,\phi _1,\ldots \} = \{ \psi , \varphi _1 , \varphi _2 ,\ldots \}, } \end{aligned}$$

is a frame for \(\mathrm {H}\) with frame bounds \(A = 1\) and \(B = 2\). We refer to this frame as the ONB\(+1\) frame. Note that it was previously used in [3] to show that the Gram matrix of a frame can be arbitrarily badly conditioned. It is motivated by the idea of ‘enriching’ an orthonormal basis to better capture a certain feature of a function under approximation (e.g. a singularity or oscillation).

Throughout this section, we let \(\mathcal {Q}_{N}\) denote the projection onto \(\mathrm {span}\{ \varphi _1,\ldots ,\varphi _{N} \}\), i.e.

$$\begin{aligned} { \mathcal {Q}_{N} f = \sum ^{N}_{n=1} \langle f, \varphi _n \rangle \varphi _n. } \end{aligned}$$

5.1 The Stable Sampling Rate for the ONB\(+1\) Frame

A problem of interest is that in which the samples are inner products with respect to the orthonormal basis \(\{ \varphi _m \}_{m \in \mathbb {N}}\). That is,

$$\begin{aligned} {\ell _{m,M}(f) = \ell _m(f) =\langle f, \varphi _m \rangle ,\quad m = 1,\ldots ,M,\ M \in \mathbb {N}. } \end{aligned}$$
(5.1)

For instance, these are Fourier coefficients if \(\{ \varphi _m \}_{m \in \mathbb {N}}\) is the Fourier basis, and hence the goal would be to compute a better approximation in the frame \(\Phi \) from the given Fourier data. Note that this is an instance of Sect. 4 with \(A' = B' = 1\). Note also that \({\left\| g\right\| }_M = {\left\| \mathcal {Q}_M g\right\| }\). Recalling Proposition 3.10, to determine the stable sampling rate we note that it suffices to estimate

$$\begin{aligned} {A'_{M,N}= \inf _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} {\left\| \mathcal {Q}_M g\right\| }^2 = 1 - \sup _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} {\left\| g - \mathcal {Q}_M g\right\| }^2, } \end{aligned}$$
(5.2)

where \(\mathrm {H}_N = \mathrm {span}\{ \psi , \varphi _1,\ldots ,\varphi _{N-1} \}\).

Lemma 5.1

For \(M \ge N\), we have

$$\begin{aligned} {\sup _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} {\left\| g - \mathcal {Q}_M g\right\| } = \frac{{\left\| \psi - \mathcal {Q}_M \psi \right\| }}{{\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }}. } \end{aligned}$$

Proof

Let \(g \in \mathrm {H}_N\) and write \(g = x_0 \psi + \sum ^{N-1}_{n=1} x_{n} \varphi _{n}\). Then

$$\begin{aligned}{ \langle g, \varphi _n \rangle = x_0 \langle \psi , \varphi _n \rangle + x_{n},\quad n = 1,\ldots ,N-1, } \end{aligned}$$

and hence

$$\begin{aligned}{ g = x_0 \psi + \sum ^{N-1}_{n=1} \left( \langle g, \varphi _n \rangle - x_0 \langle \psi , \varphi _n \rangle \right) \varphi _n = x_0 (\psi - \mathcal {Q}_{N-1} \psi )+ \mathcal {Q}_{N-1} g. } \end{aligned}$$

Rearranging gives

$$\begin{aligned} { g - \mathcal {Q}_{N-1} g = x_0 (\psi - \mathcal {Q}_{N-1} \psi ), } \end{aligned}$$

and taking the norm of both sides, we find that

$$\begin{aligned} { |x_0| = \frac{{\left\| g-\mathcal {Q}_{N-1} g\right\| }}{{\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }} \le \frac{{\left\| g\right\| }}{{\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }} = \frac{1}{{\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }}. } \end{aligned}$$

Also, we have

$$\begin{aligned} {\langle g, \varphi _n \rangle = x_0 \langle \psi , \varphi _n \rangle ,\quad n \ge N. } \end{aligned}$$

Therefore

$$\begin{aligned}{ {\left\| g - \mathcal {Q}_{M} g\right\| } = |x_0| {\left\| \psi - \mathcal {Q}_M \psi \right\| }. } \end{aligned}$$

Combining this with the bound for \(|x_0|\) gives that

$$\begin{aligned} {\sup _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} {\left\| g - \mathcal {Q}_M g\right\| } \le \frac{{\left\| \psi - \mathcal {Q}_M \psi \right\| }}{{\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }}. } \end{aligned}$$

To show equality, we divide into two cases. Suppose first that \(\psi \perp \mathrm {span}\{ \varphi _1,\ldots ,\varphi _{N-1}\}\). Then \(\mathcal {Q}_{N-1} \psi = 0\) and so we may take \(g = \psi / {\left\| \psi \right\| }\) to obtain equality. On the other hand, if \(\psi \not \perp \mathrm {span}\{ \varphi _1,\ldots ,\varphi _{N-1}\}\) then there exists a \(g \in \mathrm {H}_N\), \({\left\| g\right\| } = 1\), with \(\mathcal {Q}_{N-1} g = 0\). In this case, the above arguments give that \({\left\| g - \mathcal {Q}_{M} g\right\| } = {\left\| \psi - \mathcal {Q}_M \psi \right\| } / {\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }\), which implies the result. \(\square \)
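The sup in Lemma 5.1 is easy to evaluate numerically from the coefficients \(c_n = \langle \psi , \varphi _n \rangle \), since \({\left\| \psi - \mathcal {Q}_m \psi \right\| }^2 = \sum _{n > m} |c_n|^2\). A small numpy illustration, with an assumed algebraic decay \(c_n = n^{-1}\) chosen purely for the example (and the infinite tail truncated at \(10^4\) terms):

```python
import numpy as np

c = 1.0 / np.arange(1, 10001)                     # assumed coefficients <psi, varphi_n>
tail = np.sqrt(np.cumsum((c ** 2)[::-1])[::-1])   # tail[m] = ||psi - Q_m psi|| (sum over n > m)
N, M = 20, 60
print(tail[M] / tail[N - 1])                      # the sup in Lemma 5.1
```

For this decay the tail behaves like \(m^{-1/2}\), so the ratio is roughly \(\sqrt{(N-1)/M}\), consistent with the linear stable sampling rate established in Theorem 5.2 below.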

This leads us to the following result:

Theorem 5.2

Suppose that \(\psi \) is such that \(| \langle \psi , \varphi _n \rangle | \sim c n^{-\alpha }\) as \(n \rightarrow \infty \) for some \(c>0\) and \(\alpha > 1/2\). Then the stable sampling rate satisfies

$$\begin{aligned} { \Theta ^{\epsilon }(N,\theta ) \le C N, } \end{aligned}$$

for some constant \(C>0\) depending on c, \(\alpha \) and \(\theta \) only. If instead \(| \langle \psi , \varphi _n \rangle | \sim c \rho ^{-n}\) as \(n \rightarrow \infty \) for some \(c>0\) and \(\rho > 1\), then

$$\begin{aligned} {\Theta ^{\epsilon }(N,\theta ) \le N + C, } \end{aligned}$$

where \(C>0\) depends on c, \(\rho \) and \(\theta \) only.

Proof

In the first case, the condition on the coefficients gives

$$\begin{aligned} { {\left\| \psi - \mathcal {Q}_{N} \psi \right\| }^2 = \sum _{n > N} | \langle \psi , \varphi _n \rangle |^2 \sim c' N^{1-2\alpha },\quad N \rightarrow \infty , } \end{aligned}$$

where \(c'\) depends on c and \(\alpha \). Hence Lemma 5.1 and the bound (5.2) give

$$\begin{aligned} { A'_{M,N} \ge 1 - c'' \left( \frac{M^{1/2-\alpha } }{N^{1/2-\alpha }} \right) ^2, } \end{aligned}$$

for some constant \(c''\) depending on c and \(\alpha \). Recalling that \(B' = A' = A = 1\) and \(B = 2\) for this frame and using Proposition 3.10 gives the first result. For the second result, we notice that

$$\begin{aligned} { {\left\| \psi - \mathcal {Q}_{N} \psi \right\| }^2 = \sum _{n > N} | \langle \psi , \varphi _n \rangle |^2 \sim \frac{c^2 \rho ^{-2N}}{\rho ^2 - 1},\quad N \rightarrow \infty . } \end{aligned}$$

We now argue as in the previous case. \(\square \)

This result shows that the stable sampling rate is linear when \(\psi \) has algebraically or exponentially decaying coefficients in the orthonormal basis \(\{ \varphi _n \}_{n \in \mathbb {N}}\). Furthermore, the better \(\psi \) is approximated in this basis, the smaller the stable sampling rate, as evidenced by the case of exponentially decaying coefficients. In fact, Lemma 5.1 makes explicit the connection between the stable sampling rate and how well \(\psi \) is approximated in the orthonormal basis \(\{ \varphi _n \}_{n \in \mathbb {N}}\): the faster the projection errors \({\left\| \psi - \mathcal {Q}_M \psi \right\| }\) decay, the smaller \(M \ge N\) needs to be so that \(\frac{{\left\| \psi - \mathcal {Q}_M \psi \right\| }}{{\left\| \psi - \mathcal {Q}_{N-1} \psi \right\| }} \le \delta \) for a constant \(0< \delta < 1\). This is intuitive: the better \(\psi \) is approximated by this basis, the more information the data, i.e. inner products with the \(\varphi _n\), carries about the element g.

On the other hand, the worse \(\psi \) is approximated, the higher the stable sampling rate. Indeed, if \({\left\| \psi - \mathcal {Q}_M \psi \right\| } \asymp (\log (M))^{-1}\) then it is a simple exercise to show that the stable sampling rate is algebraic in N with the power depending on \(\theta \), i.e. \(\Theta ^{\epsilon }(N,\theta ) = \mathcal {O}\left( N^{h(\theta )} \right) \) for some function \(h(\theta ) \ge 1\) with \(h(\theta ) \rightarrow \infty \) as \(\theta \rightarrow 1^+\).

Remark 5.3

One can also determine a bound on the stable sampling rate for the special case \(\Psi = \Phi \), also discussed in Sect. 4. In this case, the data is given by the inner products

$$\begin{aligned} { \langle f, \psi \rangle ,\quad \langle f, \varphi _m \rangle ,\quad m=1,\ldots ,M-1. } \end{aligned}$$

Indeed, observe that

$$\begin{aligned} A'_{M,N} = \inf _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} \left( | \langle g, \psi \rangle |^2 + {\left\| \mathcal {Q}_{M-1} g\right\| }^2 \right) \ge \inf _{\begin{array}{c} g \in \mathrm {H}_N\\ {\left\| g\right\| } = 1 \end{array}} {\left\| \mathcal {Q}_{M-1} g\right\| }^2. \end{aligned}$$

The right-hand side is precisely the constant in (5.2) with M replaced by \(M-1\). Hence, up to an additive shift of one, the stable sampling rate for this problem satisfies the same bounds as those of Theorem 5.2.

5.2 The Approximation of Functions with Logarithmic Singularities

Let \(\mathrm {H}= L^2(0,1)\). The scaled Legendre polynomials, \(\varphi _{n}(x) = \sqrt{2n-1} P_{n-1}(2x-1)\), \(n \in \mathbb {N}\), form an orthonormal basis for \(\mathrm {H}\). Here \(P_n(x)\) is the usual Legendre polynomial, with normalization \(P_n(1) = 1\). This basis is extremely good at approximating smooth functions. However, many functions that arise in applications, such as Green's functions or solutions to PDEs on domains with corners, fail to be smooth at a point, yet possess a known type of singularity there. That is, in these applications we may want to approximate functions of the form

$$\begin{aligned} {f(x) = g(x) + w(x) h(x),\quad x \in (0,1) } \end{aligned}$$
(5.3)

where g, h are smooth functions, and \(w \in L^2(0,1)\) is a known function which may be singular at, say, \(x = 0\). Such functions cannot generally be accurately approximated using polynomials alone. However, they can be more accurately captured by enriching the polynomial basis with the function w. This gives a frame

$$\begin{aligned} {\Phi = \{ \varphi _{n} \}^{\infty }_{n=1} \cup \{ w \}, } \end{aligned}$$
(5.4)

for \(\mathrm {H}\). Indeed, since the \(\varphi _n\) are an orthonormal basis, it quickly follows that

$$\begin{aligned} { {\left\| f\right\| }^2 = \sum ^{\infty }_{n=1} | \langle f, \varphi _{n} \rangle |^2 \le \sum ^{\infty }_{n=1} | \langle f, \varphi _{n} \rangle |^2 + | \langle f, w \rangle |^2 \le {\left\| f\right\| }^2 + {\left\| f\right\| }^2 {\left\| w\right\| }^2 . } \end{aligned}$$

Hence this is a frame with bounds \(A \ge 1\) and \(B \le 1 + {\left\| w\right\| }^2\).

Fig. 1

Pointwise error as a function of the polynomial degree N for the approximation of the logarithmically singular function (5.5) on [0, 1] using Legendre polynomials (left panel) and Legendre polynomials augmented with \(\log x\) (right panel). The error is shown in four points in the interval [0, 1]. In both cases, the approximation problem is solved using generalized sampling (5.1) with \(M=2N\). The generalized samples \((\langle f,\varphi _m\rangle )_{m=1}^{M}\) were evaluated using adaptive numerical integration. The regularization threshold is \(\epsilon = 2 \times 10^{-13}\)

The case of a logarithmic singularity, i.e. \(w(x) = \log (x)\), is an important instance of the problem. Figure 1 gives an illustration of the benefits of this frame over just the polynomial basis for approximating the simple yet singular function

$$\begin{aligned} {f(x) = e^x + \log (x) \cos (x). } \end{aligned}$$
(5.5)

The polynomial approximation of f converges slowly, as expected. However, adding just the single element \(w(x)=\log (x)\) to the basis results in significantly faster convergence, shown in the right panel of Fig. 1. Importantly, note that the approximation scheme does not evaluate the smooth parts of f separately: they are implicitly approximated simultaneously when approximating f from its samples. Indeed, if the smooth parts of f were known separately in an application, the approximation problem would simplify and there would be no need to construct a frame. Note also that the evaluation of the generalized samples (5.1) requires the evaluation of integrals, and this step is computationally demanding because the integrals are weakly singular. In subsequent examples, we shall consider a fully-discrete approximation based on function samples.

Using Theorem 5.2, we may estimate the stable sampling rate for this problem:

Proposition 5.4

Let \(\mathrm {H}= L^2(0,1)\), \(w(x) = \log (x)\), \(\{ \varphi _n \}\) be the Legendre basis on \(\mathrm {H}\), \(\Phi \) be as in (5.4) and consider the sampling functionals (5.1). Then the stable sampling rate for this problem is linear in N, and specifically,

$$\begin{aligned} \Theta ^{\epsilon }(N,\theta ) \le \max \left\{ N , \frac{N-1}{\sqrt{1-1/\theta ^2}} \right\} ,\quad \forall \theta > 1,\ N \ge 2. \end{aligned}$$

Proof

The Legendre polynomials satisfy

$$\begin{aligned} {\int ^{1}_{0} P_n(2x-1) \log (x) \,\mathrm {d}x = \frac{(-1)^{n+1}}{n(n+1)},\quad n \ge 1.} \end{aligned}$$

This follows from the differential equation \(((1-x^2) P'_n(x))' + n(n+1) P_n(x) = 0\) after two integrations by parts. Let \(\psi (x) = \log (x)\). Then, for \(M \ge 1\),

$$\begin{aligned} {{\left\| \psi - \mathcal {Q}_{M} \psi \right\| }^2 = \sum _{m > M} | \langle \psi , \varphi _m \rangle |^2 = \sum _{m \ge M} \frac{2m+1}{m^2(m+1)^2} = \sum _{m \ge M} \left( \frac{1}{m^2} - \frac{1}{(m+1)^2} \right) = \frac{1}{M^2},} \end{aligned}$$

where in the second equality we substituted \(m \rightarrow m+1\) and used \(| \langle \psi , \varphi _{m+1} \rangle |^2 = (2m+1)/(m^2(m+1)^2)\), and the final equality follows since the sum telescopes.

Lemma 5.1 now gives

$$\begin{aligned} {\sup _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} {\left\| g - \mathcal {Q}_M g\right\| } = \frac{N-1}{M}.} \end{aligned}$$

Therefore

$$\begin{aligned} {\sup _{\begin{array}{c} g \in \mathrm {H}_N \\ {\left\| g\right\| } = 1 \end{array}} {\left\| g - \mathcal {Q}_M g\right\| } \le \sqrt{1-1/\theta ^2},} \end{aligned}$$

provided

$$\begin{aligned} {\frac{N-1}{M} \le \sqrt{1-1/\theta ^2} \qquad \Leftrightarrow \qquad M \ge \frac{N-1}{\sqrt{1-1/\theta ^2}}, } \end{aligned}$$

as required.\(\square \)
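As a sanity check, the integral identity used at the start of the proof is easily verified numerically; a short snippet of our own, in which scipy's adaptive quadrature handles the integrable logarithmic singularity at \(x=0\):

```python
import numpy as np
from scipy.integrate import quad
from numpy.polynomial.legendre import Legendre

# verify  int_0^1 P_n(2x - 1) log(x) dx = (-1)^(n + 1) / (n (n + 1))
for n in range(1, 6):
    val, _ = quad(lambda x, n=n: Legendre.basis(n)(2 * x - 1) * np.log(x), 0, 1)
    print(n, val, (-1) ** (n + 1) / (n * (n + 1)))
```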

Fig. 2

Pointwise error as a function of the polynomial degree N for the approximation of the logarithmically singular function (5.5) on [0, 1], using Legendre interpolation (left panel) and the ONB\(+K\) frame \(\Phi \) with Legendre polynomials, \(K=5\) and \(w(x)=\log x\) (right panel). The error is shown in three points in the interval [0, 1] as well as in the \(L^2\)-norm on [0, 1]. In both cases, the samples are function evaluations in the Legendre nodes. The left panel is based on interpolation, the right panel corresponds to a discrete least squares approximation with \(M=2N\) and regularization threshold \(\epsilon = 2 \times 10^{-13}\)

Fig. 3

Pointwise error as a function of the number of samples M for the ONB\(+K\) frame based on Legendre polynomials, with degree \(N=40\), \(K=5\) and \(w(x)=\log x\) as in Fig. 2b. Similar to regular polynomial approximation, using Legendre points yields better accuracy than using equispaced points. They also require less oversampling, i.e., smaller values of M, to achieve the best error

Fig. 4

The values of \(\kappa _{M,N}^\epsilon \) and \(\lambda _{M,N}^\epsilon \) are shown as a function of N with constant oversampling \(M = \gamma N\) and varying factors \(\gamma \). Legendre points (top row) and equispaced points (bottom row) are used for the ONB\(+K\) frame using Legendre polynomials, \(w(x)=\log (x)\) and \(K=5\). The threshold used here is \(\epsilon = 10^{-5}\): the values are bounded for Legendre points but they approach \(1/\epsilon \) for equispaced points. Equispaced points require more than linear oversampling

Fig. 5

The same as Fig. 4 but with threshold \(\epsilon = 10^{-8}\). For Legendre nodes (top row), the intermediate peaks are higher, corresponding to the larger upper bounds in (3.23). However, the values settle down for increasing N, with a clear benefit for larger oversampling factors. This is in agreement with the limit (3.24). For equispaced samples, linear oversampling is not sufficient and, as in the previous figure, the values seem to approach \(1/\epsilon \). This again corresponds to (3.23)

5.3 ONB+K Frames

Functions with logarithmic singularities can be approximated more accurately using the frame (5.4) than with the Legendre polynomial basis alone. However, the accuracy may still be limited, due to the presence of weak logarithmic singularities. To increase the accuracy one may consider a frame of the type

$$\begin{aligned} {\Phi = \{ \varphi _{n} \}^{\infty }_{n=1} \cup \{ \psi _k \}^{K}_{k=1},} \end{aligned}$$
(5.6)

for fixed \(K \ge 1\), where \(\psi _k(x) = w(x) \varphi _k(x)\).

Proposition 5.5

Let \(\{ \varphi _n \}\) be the Legendre basis on \(\mathrm {H}\) and \(w \in L^2(0,1)\). Then (5.6) is a frame for any fixed \(K \ge 1\), with frame bounds

$$\begin{aligned} { 1 \le A \le B \le 1 + {\left\| w\right\| }^2 K^2. } \end{aligned}$$

Proof

First observe that \(\psi _k \in L^2(0,1)\) since \(w \in L^2(0,1)\) and \(\varphi _{k} \in L^{\infty }(0,1)\). Second, we have

$$\begin{aligned} {{\left\| f\right\| }^2 = \sum ^{\infty }_{n=1} | \langle f, \varphi _n \rangle |^2 \le \sum ^{\infty }_{n=1} | \langle f, \varphi _n \rangle |^2 + \sum ^{K}_{k=1} | \langle f, \psi _k \rangle |^2 , } \end{aligned}$$

and therefore the lower frame condition holds with \(A \ge 1\). Moreover,

$$\begin{aligned} \sum ^{\infty }_{n=1} | \langle f, \varphi _n \rangle |^2 + \sum ^{K}_{k=1} | \langle f, \psi _k \rangle |^2&\le {\left\| f\right\| }^2 + {\left\| f\right\| }^2 {\left\| w\right\| }^2 \sum ^{K}_{k=1} {\left\| \varphi _k\right\| }^2_{L^{\infty }}\\&= {\left\| f\right\| }^2 \left( 1 + {\left\| w\right\| }^2 \sum ^{K}_{k=1} (2k-1) \right) \\&= {\left\| f\right\| }^2 \left( 1 + {\left\| w\right\| }^2 K^2 \right) . \end{aligned}$$

Here, in the penultimate step, we use the fact that \(| \varphi _{n}(x) | \le \varphi _n(1) = \sqrt{2n-1}\) for \(0 \le x \le 1\). This completes the proof. \(\square \)

In Fig. 2 we demonstrate the benefits of this frame for approximating the singular function given by (5.5). Here, rather than the inner products (5.1), we take the sampling functionals to be pointwise evaluations at the Legendre nodes, i.e., at the roots of a high degree Legendre polynomial, mapped from \([-1,1]\) to [0, 1]. We also compare these approximations with the polynomial interpolant at these nodes. As is evident, polynomial interpolation performs poorly. In contrast, the convergence of the ONB\(+K\) frame approximation is significantly faster. Figure 2 also illustrates the stability of the numerical approximation using oversampling with \(M=2N\). The accuracy reaches machine precision, despite requiring the solution of an extremely ill-conditioned linear system of equations, and this high level of accuracy is maintained as N grows.
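To indicate how such an experiment can be set up, the following is a self-contained sketch of the discrete least-squares approximation in the ONB\(+K\) frame (our own illustrative code, not the implementation behind the figures; the name onbk_lstsq and the choice \(M = 2(N+K)\) are ours). Anticipating the normalization discussed at the end of this section, the rows are weighted by square roots of the Gauss–Legendre quadrature weights, so that the discrete norm mimics the \(L^2(0,1)\) norm:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss, legvander

def onbk_lstsq(f, N, K, eps):
    """Truncated-SVD least-squares fit of f on (0,1) in the frame
    {phi_1,...,phi_N} U {log(x) phi_1,...,log(x) phi_K}, sampled at
    M = 2(N + K) Gauss-Legendre nodes mapped to (0,1)."""
    M = 2 * (N + K)
    t, wq = leggauss(M)
    x, sw = (t + 1) / 2, np.sqrt(wq / 2)        # nodes and weight roots on (0,1)
    n = np.arange(1, N + 1)
    P = legvander(2 * x - 1, N - 1) * np.sqrt(2 * n - 1)  # phi_n = sqrt(2n-1) P_{n-1}(2x-1)
    A = sw[:, None] * np.hstack([P, np.log(x)[:, None] * P[:, :K]])
    b = sw * f(x)
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    keep = s > eps                               # truncated SVD with threshold eps
    c = Vh.T[:, keep] @ ((U[:, keep].T @ b) / s[keep])
    return c, np.linalg.norm(A @ c - b)          # coefficients, discrete L^2 residual

f = lambda x: np.exp(x) + np.log(x) * np.cos(x)  # the singular function (5.5)
c, resid = onbk_lstsq(f, N=40, K=5, eps=2e-13)
print(resid, np.linalg.norm(c))
```

One should observe a residual near machine precision with a coefficient vector of modest norm, in line with Fig. 2, even though the matrix A is numerically rank-deficient.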

The influence of the number of samples M is illustrated in Fig. 3. Here, the error is shown as a function of M, for fixed \(N=40\). The best accuracy is only achieved for \(M > N\), i.e., when using some amount of oversampling. In this figure we have used discrete samples at Legendre points (left panel) and at equispaced points on [0, 1] (right panel). It is not unexpected that Legendre points are the better choice: less oversampling is needed to achieve the best accuracy for a given N.

The behaviour shown in Fig. 3 can be explained by computing the corresponding constants \(\kappa _{M,N}^\epsilon \) and \(\lambda _{M,N}^\epsilon \). This also serves to illustrate the stable sampling rate for this problem. Their values are shown in Fig. 4 for several choices of the oversampling factor \(\gamma \), with M and N such that \(M = \gamma N\). The convergence of both values to constants of modest size, in particular much smaller than \(1/\epsilon \), suggests that the stable sampling rate is indeed linear when sampling in the Legendre points. For equispaced points, as could be expected, this does not seem to be the case.

The results in Fig. 4 correspond to the threshold \(\epsilon = 10^{-5}\). For comparison, the experiment is repeated in Fig. 5 for the smaller threshold \(\epsilon = 10^{-8}\). The latter figure illustrates the larger upper bound of (3.23), on the order of \(1/\epsilon \), in the pre-asymptotic regime. Still, for the case of Legendre nodes, linear oversampling is sufficient to reach the \(\epsilon \)-independent (and small) limit (3.24).

The constants were computed following the approach described in Remark 3.14. We have run this experiment in higher precision arithmetic, in order to exclude the possibility of inaccuracies in their computation. An exponentially converging composite hp-graded quadrature rule was used to approximate the singular integrals that arise in the elements of the Gram matrix. Furthermore, in this case we also weighted the discrete samples in Legendre points by the square roots of the corresponding Gauss–Legendre quadrature weights: this discrete normalization ensures that \(A' = B' = 1\) in (3.2) and leads to slightly smaller values of the constants (and, correspondingly, smaller error in the approximation).