1 Introduction

A space of signals with finitely many degrees of freedom per unit of time is called a space with finite rate of innovation (FRI) [2, 5, 12, 15, 17, 19, 23]. The concept was first introduced by Vetterli et al. [23]. The FRI model is ubiquitous and has wide scientific applications, such as radar imaging [15], compression of electrocardiogram signals [2], and curve fitting [12]. The model of signals with finite rate of innovation covers the following cases: (i) band-limited signals [3], (ii) signals in some shift-invariant spaces [1, 7,8,9, 24, 25], (iii) non-uniform splines [16, 20, 21], (iv) streams of pulses \(\sum_{k} a_{k}p(t-t_{k})\), where \(p\) is a pulse shape, as arise in GPS and cellular radio applications, (v) sums of some of the signals above [19], and so on.

In this paper, we consider the signal spaces of finite rate of innovation \(V^{2}(\varLambda,\varPhi)\subset{\mathrm{L}}^{2}(\mathbb {R}^{d})\), defined as the closed linear span of a tuple of generators \(\{\phi_{\lambda}\} _{\lambda\in \varLambda} \subset{\mathrm{L}}^{2}(\mathbb {R}^{d})\), as introduced in [17, 19]:

$$V^{2}(\varLambda,\varPhi)= \biggl\{ \sum _{\lambda\in\varLambda}c(\lambda)\phi_{\lambda}:\bigl(c(\lambda)\bigr)_{\lambda\in\varLambda}\in\ell^{2}(\varLambda) \biggr\} , $$

where \(\varLambda\) is a relatively-separated subset of \(\mathbb{R}^{d}\).

Realistically, we can learn about \(f\in V^{2}(\varLambda,\varPhi)\) only if the samples are taken in the region where most of its \(\mathrm{L}^{2}\)-norm is localized. Motivated by [3, 8], we therefore focus on the sampling problem on the following subset of \(V^{2}(\varLambda,\varPhi)\):

$$ V^{2}_{R,\delta}(\varLambda, \varPhi):= \biggl\{ f\in V^{2}(\varLambda, \varPhi): \int_{C_{R}}\bigl|f(x)\bigr|^{2}dx \ge(1-\delta) \int_{\mathbb {R}^{d}}\bigl|f(x)\bigr|^{2}dx \biggr\} , $$

where \(C_{R}=[-R/2, R/2]^{d}\) and \(0< \delta<1\). Thus \(V^{2}_{R,\delta}(\varLambda, \varPhi)\) represents the subset consisting of those functions whose energy is largely concentrated on \(C_{R}\).

Usually, in order to guarantee successful reconstruction, we require that the following sampling inequality holds:

$$A \| f \|_{2}^{2} \le\sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \le B \| f \|_{2}^{2}, $$

where \(A\) and \(B\) are some positive constants. However, owing to the randomness in the choice of sampling points, the sampling inequality cannot be guaranteed with certainty. Instead, our goal is the following probability estimate:

$$ \mathbb{P} \Biggl(A \| f \|_{2}^{2} \le \sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \le B \| f \|_{2}^{2} \Biggr)\geq1-\epsilon, $$
(1)

where \(\epsilon>0\) can be taken arbitrarily small. This problem has been studied for band-limited signal spaces, shift-invariant signal spaces, and finitely generated shift-invariant signal spaces, respectively [3, 8, 10, 25].

In previous work [3, 4, 6, 8, 13], the probability distribution of the random sampling points is taken to be the uniform distribution on \(C_{R}\). It is often beneficial in statistical sampling and function integration to sample uniformly over the applicable parameter space. However, non-uniform distributions also have real-world applications [11, 13, 14]. A sequence of points which is uniformly distributed can be mapped into another sequence that reflects a desired non-uniform joint probability density function [13] using techniques such as the rejection method or weighted sampling [11, 14]; a minimal sketch of the rejection method is given below. In this paper, we consider a more general class of probability distributions: we only require that the probability density function exists and has a positive lower bound on \(C_{R}\). We will show that uniform random sampling in \(C_{R}\) is a special case of our results on non-uniform random sampling.
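For concreteness, here is a minimal sketch of the rejection method in dimension one (our own illustration; the density \(w\) and the bound \(K\) below are placeholders), mapping uniform proposals on \(C_{R}\) into samples following a prescribed density:

    import numpy as np

    def rejection_sample(w, K, R, s, seed=0):
        """Draw s points on C_R = [-R/2, R/2] with density proportional to w,
        assuming w(x) <= K on C_R; proposals are uniform on C_R."""
        rng = np.random.default_rng(seed)
        samples = []
        while len(samples) < s:
            x = rng.uniform(-R / 2, R / 2)       # uniform proposal on C_R
            if rng.uniform(0.0, K) <= w(x):      # accept with probability w(x)/K
                samples.append(x)
        return np.array(samples)

    # Example: a truncated (unnormalized) Gaussian profile on C_R; rejection
    # sampling only uses the ratio w(x)/K, so normalization is not needed.
    pts = rejection_sample(lambda x: np.exp(-x**2 / 2), K=1.0, R=8.0, s=1000)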

In this paper, we establish the probability estimate (1) for non-uniform random sampling in signal spaces of finite rate of innovation; this is one of the main results of the paper.

In [9], a reconstruction method from non-uniform sampling in shift-invariant spaces is presented; however, the sampling in [9] is not random. In [4], a linear algorithm is obtained for random sampling from regular languages. However, a reconstruction algorithm from uniform random samples is still an open question in band-limited function spaces [3]. In this paper, we present a reconstruction algorithm from non-uniform random sampling in spaces with finite rate of innovation. This is a direct application of our main results.

Many problems arising in industry, statistics, physics, and finance require the computation of multivariate integrals \(\int_{C_{R}} f(x)\,dx\). The sample-mean method is commonly used to approximate \(\int_{C_{R}} f(x)\,dx\). The Koksma–Hlawka inequality gives an upper bound for the approximation error

$$\Biggl\vert \int_{C_{R}} f(x)dx-\frac{1}{s}\sum _{i=1}^{s}f(x_{i}) \Biggr\vert \leq C(\varGamma)P(f), $$

where \(\varGamma=(x_{i})_{i=1}^{s}\subset{C_{R}}\) is the set of sampling points, \(C(\varGamma)\) is the discrepancy of \(\varGamma\) and \(P(f)\) is a measure of the variation of \(f\) [6, 13]. A natural question is: can we use the sample mean of the squares to estimate \(\int_{\mathbb{R}^{d}} |f(x)|^{2} dx\)? In other words, how large is the error between \(\int_{\mathbb{R}^{d}} |f(x)|^{2} dx\) and \(\frac{1}{s}\sum_{i=1}^{s}|f(x_{i})|^{2}\)? In this paper, we show that the estimation error is proportional to the squared \(\mathrm{L}^{2}\)-norm of the function. If \(\varGamma=\{x_{i}\}_{i=1}^{s}\subset{\mathbb{R}^{d}}\) is drawn from a non-uniform distribution fulfilling certain conditions, then with probability at least \(1-\epsilon\), for arbitrarily small \(\epsilon>0\), the estimation error is bounded by a functional depending only on \(f\), as shown below:

$$\mathbb{P} \Biggl( \Biggl\vert \int_{\mathbb{R}^{d}} \bigl|f(x)\bigr|^{2} dx-\frac{1}{s} \sum_{i=1}^{s}\bigl|f(x_{i})\bigr|^{2} \Biggr\vert \leq D(f) \Biggr)\geq1-\epsilon. $$

This is another direct application of our main results.
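For orientation, here is a minimal Monte Carlo sketch of the sample-mean estimator on the unit interval (our own illustration, not taken from [6, 13]); the observed error decay is the classical \(O(s^{-1/2})\) Monte Carlo rate:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.cos(2 * np.pi * x)**2       # exact integral over [0,1] is 1/2

    for s in (10**2, 10**4, 10**6):
        x = rng.uniform(0.0, 1.0, size=s)        # i.i.d. uniform sampling points
        print(s, abs(np.mean(f(x)) - 0.5))       # error decays like O(s**-0.5)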

The paper is organized as follows. In Sect. 2, we state the theorem on random sampling in a space of finite rate of innovation and give notations and preliminaries. In Sect. 3, some examples of generators and probability density functions are given. In Sect. 4, a spectrum estimate based on a Hilbert–Schmidt operator is described. In Sect. 5, we discuss non-uniform random sampling in finite sums of eigenspaces. The proofs of the main results are provided in Sect. 6. In the last section, we give a reconstruction algorithm for a special case and present some numerical tests to verify it. Some properties of the signal spaces \(V^{2}(\varLambda,\varPhi)\) are presented in the Appendix.

2 Statement of the Main Result

2.1 Notations and Preliminaries

We first define relatively-separated point sets, the relative covering index, and the absolute covering index.

We say that a point set \(X=\{x_{j}\} \subset \mathbb {R}^{d}\) is relatively-separated [17] if the relative covering index

$$ D(X):=\sup_{x \in \mathbb {R}^{d}}\sum _{x_{j} \in X}\chi_{x_{j}+[0,1]^{d}}(x) $$
(2)

is finite. The absolute covering index is defined by

$$ N_{0}(X):=\sup_{k \in \mathbb {Z}^{d}}\# \bigl(X \cap \bigl(k+[0,1]^{d} \bigr) \bigr) $$
(3)

We can easily see that \(N_{0}(X)\leq D(X)\) and \(\sup_{x_{0}\in[0,1)^{d}} N_{0}(X+x_{0})=D(X)\) for any relatively-separated set \(X\). The two indices will be used for different purposes.

Throughout the paper, we consider a space of finite rate of innovation \(V^{2}(\varLambda,\varPhi)\subset{\mathrm{L}}^{2}(\mathbb {R}^{d})\), defined as the closed linear span of a tuple of generators \(\{\phi_{\lambda}\}_{\lambda\in\varLambda} \subset{\mathrm{L}}^{2}(\mathbb {R}^{d})\):

$$V^{2}(\varLambda,\varPhi)= \biggl\{ \sum _{\lambda\in\varLambda}c(\lambda)\phi_{\lambda}:\bigl(c(\lambda)\bigr)_{\lambda\in\varLambda}\in\ell^{2}(\varLambda) \biggr\} , $$

where \(\varLambda\) is a relatively-separated subset of \(\mathbb {R}^{d}\) with relative covering index \(D(\varLambda)\).

In this paper, we use the polynomial weight function \(u\) below [17]:

$$ u(x) = u_{\alpha}(x) = \bigl(1+|x|\bigr)^{\alpha}, $$
(4)

where \(\alpha\geq0\). Usually we require that the following norm of \(\varPhi\) is finite for some \(q,p \in[1,\infty]\):

$$ \begin{aligned}[b] &\|\varPhi\|_{q,p,u} \\ &:= \max \biggl\{ \sup_{k\in \mathbb {Z}^{d}} \biggl(\sum _{\lambda\in\varLambda}\bigl\| \phi _{\lambda}(\cdot)u(\cdot-\lambda)\bigr\| _{\mathrm{L}^{q}(k+[0,1]^{d})}^{p} \biggr)^{\frac{1}{p}} , \sup _{\lambda\in\varLambda} \biggl(\sum_{k\in \mathbb {Z}^{d}} \bigl\| \phi_{\lambda}(\cdot)u(\cdot-\lambda) \bigr\| _{\mathrm{L}^{q}(k+[0,1]^{d})}^{p} \biggr)^{\frac{1}{p}} \biggr\} . \end{aligned} $$
(5)

This definition differs slightly from that in [17] and [19]; however, the definition in [17] and the one in this paper are equivalent. Some properties of the signal space \(V^{2}(\varLambda,\varPhi)\) are shown in the Appendix.

We pay particular attention to the subset \(V_{R,\delta}^{2}(\varLambda,\varPhi)\) of \(V^{2}(\varLambda,\varPhi)\):

$$V_{R,\delta}^{2}(\varLambda,\varPhi):= \biggl\{ f\in V^{2}(\varLambda,\varPhi):\; \int_{C_{R}}\bigl|f(x)\bigr|^{2}dx\geq(1-\delta) \int_{\mathbb {R}^{d}}\bigl|f(x)\bigr|^{2}dx \biggr\} . $$

Moreover, we adopt a measurable probability density function (P.D.F.) \(w_{1}:\mathbb{R}^{d} \to\mathbb{R}\) satisfying \(\|w_{1}\|_{\mathrm{L}^{1}(\mathbb {R}^{d})}=1\) and \(w_{1}(x)\geq0\) for all \(x \in \mathbb {R}^{d}\). The density function of the probability distribution we use for sampling is defined as:

$$ w_{R}(x)=\frac{1}{R^{d}}w_{1} \biggl(\frac{x}{R} \biggr). $$
(6)
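Note that \(w_{R}\) is indeed a probability density function: by the change of variables \(y=x/R\),

$$\int_{\mathbb {R}^{d}} w_{R}(x)\,dx= \int_{\mathbb {R}^{d}} \frac{1}{R^{d}}\,w_{1} \biggl(\frac{x}{R} \biggr)dx= \int_{\mathbb {R}^{d}} w_{1}(y)\,dy=1. $$

Equivalently, if \(X\) has density \(w_{1}\), then \(RX\) has density \(w_{R}\).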

We denote \(v_{R}=\sqrt{w_{R}}\), \(\| x\|_{\infty}=\max(|x_{1}|, \ldots, |x_{d}|)\), and \(C_{R}=[-\frac{R}{2},\frac{R}{2}]^{d}\). Throughout this paper, we will frequently refer to the constants listed in Table 1.

Table 1 Some constants

2.2 Assumptions and Main Results

We collect our assumptions on the generators in the following list:

  1. (A.0)

    \(\varPhi\) is a Riesz basis of \(V^{2}(\varLambda,\varPhi)\).

  2. (A.1)

    \(\|\varPhi\|_{\infty,1,u_{0}}<\infty\).

  3. (A.2)

    \(\|\varPhi\|_{2,2,u_{\alpha}}<\infty\) for some \(\alpha>0\).

By (A.1), we have \(\|\varPhi\|_{q,p,u_{0}}\leq\|\varPhi\|_{\infty,1,u_{0}}<\infty\) for all \(1\leq p,q \leq\infty\).

By (A.0) and [19, Theorem 4.1], we have that \(\|\widetilde{\varPhi}\|_{q,p,u_{0}}<\infty\) for all \(1\leq p,q \leq\infty\).

The assumptions on the probability density function are shown in the following:

  1. (B.0)

    \(w_{1}\) is essentially bounded, i.e., \(K_{w}:=\|w_{1}\| _{\mathrm{L}^{\infty}(\mathbb {R}^{d})}<\infty\).

  2. (B.1)

    The essential infimum of \(w_{1}\) in \(C_{1}\) is positive, that is, there exists \(\rho>0\) such that \(\mathrm{ess}\,\mathrm{inf}_{x \in C_{1}}w_{1}(x)\geq\rho K_{w}\), where \(K_{w}\) is defined in (B.0).

According to (6) and the assumptions on \(w_{1}\), we get that \(\|w_{R}\|_{\mathrm{L}^{\infty}(\mathbb {R}^{d})}=R^{-d}K_{w}\) by (B.0), and \(\mathrm{ess}\,\mathrm{inf}_{x \in C_{R}}w_{R}(x)\geq\rho R^{-d} K_{w}\) by (B.1).

Theorem 1

Assume that \(\varPhi\) satisfies assumptions (A.0)–(A.2), and \(w_{1}\) satisfies assumptions (B.0)–(B.1). Let \((x_{j})_{j \in\mathbb{N}} \subset \mathbb {R}^{d}\) be a sequence of i.i.d. random variables with the probability density function \(w_{R}\). Assume that \(R \ge R_{0}\) and that \(\delta, \nu\in(0,1)\) are sufficiently small to guarantee that

$$ \frac{\nu^{2}}{K_{w}+\frac{\nu}{3}}\leq6K_{w} D_{2}^{2}\log{ \biggl(\frac{3}{e} \biggr)} \quad \textit{and} \quad A = K_{w}\rho \biggl(\frac{1}{2}-\delta \biggr)-\nu -12K_{w} D_{3} \delta> 0. $$
(7)

Let \(0 < \epsilon< 1\). If the number \(s\) of samples satisfies

$$ s \ge\max \biggl\{ R^{d} \frac{2D_{2} (K_{w}+\frac{\nu}{3} )}{\nu^{2}} \log \biggl(\frac{2 D(\varLambda) M }{\epsilon}R^{d^{2}/\alpha'} \biggr), \frac{R^{d}}{3 K_{w}} \biggr\} , $$
(8)

then the sampling inequality

$$ \forall f \in V_{R,\delta}^{2}(\varLambda, \varPhi) ~:~ s A R^{-d} \| f \|_{2}^{2} \le \sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \le s D_{2} \| f \|_{2}^{2} $$
(9)

holds with probability at least \(1-\epsilon\).

Remark 2

The result above shows that a sampling rate of \(O(R^{d} \log R)\) suffices in the space of finite rate of innovation under the assumptions above when \(d\) is fixed, even if the distribution of the random sampling is not uniform. If, in addition, \(u_{\alpha}\) is 2-admissible, i.e., \(\alpha>d/2\) (see [17]), then \(\alpha'=d\) and the sampling rate is actually \(O(R^{d} \log{R^{d}})\) even when \(d\) is not fixed.

Remark 3

Theorem 1 recovers the main results of [3, 8, 25] on uniform random sampling in band-limited signal spaces and shift-invariant spaces, respectively.

Corollary 4

Suppose that \(\mathrm{supp}{\;\phi_{\lambda}}\subset C_{R}\) for any \(\lambda\in\varLambda\), so that \(V_{R,\delta}^{2}(\varLambda,\varPhi)=V^{2}(\varLambda,\varPhi)\) for any small \(\delta>0\). For a sampling set \(\varGamma=(x_{j})_{j=1}^{s}\subset \mathbb {R}^{d}\), define the sampling operator \(S:V^{2}(\varLambda,\varPhi) \to \mathbb {C}^{s}\) by \(f \mapsto(f(x_{j}))_{j=1}^{s}\). Denoting by \(\ell\) the sample vector \((f(x_{j}))_{j=1}^{s}\), we have that \(S^{*}S\) is invertible under the conditions in Theorem 1 with probability at least \(1-\epsilon\), and thus \(f=(S^{*}S)^{-1} S^{*} \ell\) in this case. A schematic of random sampling and reconstruction is shown in Fig. 1.

Fig. 1 A schematic showing random sampling and reconstruction

Corollary 5

Denote the condition number of the operator \(S^{*}S\) in Corollary 4 by \(\kappa(S^{*}S)\). Then the condition number satisfies

$$\mathbb {P}\biggl(\kappa \bigl(S^{*}S \bigr)\leq\frac {2D_{2}R^{d}}{K_{w}\rho-2\nu} \biggr)\geq1-\epsilon $$

under the conditions in Theorem 1. The upper bound on the condition number above is the quotient of the upper and lower bounds in the sampling inequality.

Corollary 6

For any \(f\in V_{R,\delta}^{2}(\varLambda,\varPhi)\), we have that

$$\mathbb{P} \Biggl( \Biggl\vert \int_{\mathbb{R}^{d}} \bigl|f(x)\bigr|^{2} dx-\frac{1}{s}\sum _{i=1}^{s}\bigl|f(x_{i})\bigr|^{2} \Biggr\vert \leq\max \bigl\{ \bigl|1-AR^{-d}\bigr|, |D_{2}-1| \bigr\} \|f\|_{2}^{2} \Biggr)\geq1-\epsilon $$

when the conditions in Theorem 1 are fulfilled.

Remark 7

We would expect that \(D(s)\) in the inequality below tends to zero as \(s\to\infty\):

$$\mathbb {P}\Biggl\{ \Biggl\vert \int_{\mathbb{R}^{d}} \bigl|f(x)\bigr|^{2} dx-\frac{1}{s} \sum_{i=1}^{s}\bigl|f(x_{i})\bigr|^{2} \Biggr\vert \leq D(s) \|f\|_{2}^{2} \Biggr\} > 1- \epsilon. $$

However, this cannot be achieved in the FRI space. Thus the estimate in Corollary 6 is rather coarse.

3 Some Examples of Generators and Probability Density Function

We first give two examples of generators of a space of finite rate of innovation fulfilling assumptions (A.0)–(A.2).

Example 8

Consider the space \(S_{m}\) of all \(\mathrm{L}^{2}\) polynomial splines of order \(m\) fulfilling \(C^{m-2}\) continuity conditions at each knot \(t_{i}\), where \(m \in \mathbb {N}^{+}\) and \(T=(t_{i})_{i=-\infty}^{+\infty}\) is a bi-infinite increasing sequence satisfying

$$ 0< T_{0}\leq t_{i+m}-t_{i} \leq T_{1}< \infty. $$
(10)

The space can be written as

$$ S_{m}= \Biggl\{ \sum_{i=-\infty}^{+\infty}c(i)B_{i}(x): \bigl(c(i) \bigr)_{i=-\infty}^{+\infty}\in\ell^{2}( \mathbb {Z}) \Biggr\} $$
(11)

where \(B_{i}\) is the normalized B-spline associated with the knots \(t_{i},\ldots,t_{i+m}\) [16]. We have \(B_{i}(x)=0\) when \(x\leq t_{i}\) or \(x\geq t_{i+m}\). Let \(\varLambda=T\) and \(\phi_{t_{i}}=B_{i}\); then the support of \(B_{i}\) is a subset of \([t_{i},t_{i+m}]\). Obviously, \(D(T)\leq\frac{m}{T_{0}}+m\), so \(T\) is relatively-separated.

Let \(q=\infty\), \(p=1\), and take any weight function \(u(x)\) that increases monotonically with \(|x|\). Then we have that

$$\begin{aligned} \|\varPhi\|_{\infty,1,u} =& \max \biggl\{ \sup_{i\in \mathbb {Z}} \sum_{k\in \mathbb {Z}}\bigl\| B_{i}(\cdot)u(\cdot -t_{i})\bigr\| _{\mathrm{L}^{\infty}(k+[0,1])} , \;\sup_{k\in \mathbb {Z}} \sum _{i\in \mathbb {Z}}\bigl\| B_{i}(\cdot)u(\cdot -t_{i})\bigr\| _{\mathrm{L}^{\infty}(k+[0,1])} \biggr\} \\ \leq& \sup_{i\in \mathbb {Z}} \sum_{k=\lfloor t_{i}\rfloor}^{\lceil t_{i+m}\rceil-1} \bigl\| B_{i}(\cdot)u(\cdot-t_{i})\bigr\| _{\mathrm{L}^{\infty}([t_{i},t_{i+m}])} \\ \; & + \sup_{k \in \mathbb {Z}} \sum_{i: t_{i+m}\geq k\ \text{and}\ t_{i}\leq k+1} \bigl\| B_{i}(\cdot)u(\cdot-t_{i})\bigr\| _{\mathrm{L}^{\infty}([t_{i},t_{i+m}])} \\ \leq& \sup_{i\in \mathbb {Z}} \sum_{k=\lfloor t_{i}\rfloor}^{\lceil t_{i+m}\rceil-1} \|B_{i}\|_{\mathrm{L}^{\infty}([t_{i},t_{i+m}])} u(t_{i+m}-t_{i}) \\ \; & + \sup_{k \in \mathbb {Z}} \sum_{i: t_{i+m}\geq k\ \text{and}\ t_{i}\leq k+1} \|B_{i}\|_{\mathrm{L}^{\infty}([t_{i},t_{i+m}])} u(t_{i+m}-t_{i}) \\ \leq& \sup_{i\in \mathbb {Z}} \bigl(\lceil t_{i+m}\rceil- \lfloor t_{i}\rfloor \bigr) u(t_{i+m}-t_{i}) \\ \; & + \sup_{k \in \mathbb {Z}} \#\{i:t_{i+m}\geq k \ \text{and}\ t_{i}\leq k+1\} u(t_{i+m}-t_{i}) \\ \leq& \sup_{i\in \mathbb {Z}} (T_{1}+2)u(T_{1})+ \sup_{k\in \mathbb {Z}} \biggl(\frac{1}{T_{0}}+2 \biggr)m u(T_{1}) \\ = & \biggl(T_{1}+2+ \biggl(\frac{1}{T_{0}}+2 \biggr)m \biggr)u(T_{1})< \infty \end{aligned}$$

by using that \(\|B_{i}\|_{\infty}\leq1\) according to [16]. So the assumption (A.1) holds.

At the same time, this implies that \(\|\varPhi\|_{q,p,u}\leq\|\varPhi\|_{\infty,1,u}\leq(T_{1}+2+(\frac{1}{T_{0}}+2)m)u(T_{1})<\infty\) for any \(1\leq q,p \leq\infty\). So assumption (A.2) holds.

Moreover, by [16, Theorem 4.41], we have that for any \(f=\sum_{i=-\infty}^{+\infty}c(i)B_{i}\),

$$\begin{aligned} \sum_{i=-\infty}^{+\infty}\bigl|c(i)\bigr|^{2} \leq& \sum_{i=-\infty}^{+\infty}(2m+1)^{2}9^{2(m-1)}(t_{i+m}-t_{i})^{-1} \|f\|_{\mathrm{L}^{2}[t_{i},t_{i+m}]}^{2} \\ \leq& (2m+1)^{2}81^{m-1}T_{0}^{-1} \sum_{i=-\infty}^{+\infty}\|f\|_{\mathrm{L}^{2}[t_{i},t_{i+m}]}^{2} \\ = & m(2m+1)^{2}81^{m-1}T_{0}^{-1} \|f\|_{\mathrm{L}^{2}(\mathbb {R})}^{2}. \end{aligned}$$

On the other hand, we have

$$\begin{aligned} \displaystyle \int_{\mathbb {R}}\bigl\vert f(x) \bigr\vert ^{2} = & \displaystyle \int_{\mathbb {R}}\Biggl\vert \sum_{i=-\infty}^{+\infty}c(i)B_{i}(x) \Biggr\vert ^{2}dx \\ = & \displaystyle\sum_{j=-\infty}^{+\infty} \displaystyle \int_{[t_{j},t_{j+1}]} \Biggl\vert \sum_{i=j-m+1}^{j}c(i)B_{i}(x) \Biggr\vert ^{2}dx \\ \leq& \displaystyle\sum_{j=-\infty}^{+\infty} \displaystyle \int_{[t_{j},t_{j+1}]}\sum_{i=j-m+1}^{j} \bigl\vert c(i) \bigr\vert ^{2}\sum_{i=j-m+1}^{j} \bigl\vert B_{i}(x) \bigr\vert ^{2}dx \\ \leq& \sum_{j=-\infty}^{+\infty}m(t_{j+1}-t_{j}) \sum_{i=j-m+1}^{j} \bigl\vert c(i) \bigr\vert ^{2} \\ \leq& m^{2}T_{1}\sum_{i=-\infty}^{+\infty} \bigl\vert c(i) \bigr\vert ^{2}. \end{aligned}$$

Thus \(\varPhi=\{B_{i}\}_{i=-\infty}^{+\infty}\) is a Riesz basis of \(S_{m}\). Therefore the assumption (A.0) is fulfilled.

Example 9

Consider the space of trigonometric polynomials segmented on \(C_{R}\), i.e., restricted to \(C_{R}\); this is a rather simple example. We consider the following signal spaces:

$$T(k_{0},{t})= \biggl\{ f:f(x)=\sum_{k\in[-k_{0},k_{0}]^{d}\cap \mathbb {Z}^{d}}c_{k} \exp \biggl(\frac{2\pi i}{{t}}x\cdot k \biggr),\ (c_{k}) \in\ell ^{2},\ x\in \mathbb {R}^{d} \biggr\} . $$

We may choose \(\varLambda\) arbitrarily, assign the correspondence between \(\varLambda\) and the basis functions arbitrarily, and still denote the basis by \(\varPhi\). Note that \(\varLambda\) is finite and hence relatively-separated.

If we take \(R=mt\) with \(m\in \mathbb {Z}^{+}\), then \(\varPhi\) is obviously a Riesz basis, since the basis functions are orthogonal. Moreover, for any weight function \(u(x)\) that increases monotonically with \(|x|\), we can easily get that \(\|\phi_{\lambda}(\cdot)u(\cdot-\lambda)\|_{\mathrm {L}^{\infty}(k+[0,1]^{d})}\leq u(R/2+1+\max_{\lambda\in\varLambda}\{\| \lambda\|_{\infty}\})\) is finite for \(|k|\leq\lceil R/2 \rceil\), while \(\|\phi_{\lambda}(\cdot)u(\cdot-\lambda)\|_{\mathrm{L}^{\infty}(k+[0,1]^{d})}=0\) for \(|k|> \lceil R/2 \rceil\). Then \(\|\varPhi\| _{\infty,1,u}\leq\max\{(2k_{0}+1)^{d},(R+2)^{d}\}\cdot u(R/2+1+\max_{\lambda\in\varLambda}\{\|\lambda\|_{\infty}\})\) holds.

Next we give two examples of probability density functions fulfilling assumptions (B.0)–(B.1).

Example 10

Beta distribution in one dimension:

$$w_{1}(x)=B_{a,b,\alpha,\beta}(x)=\left \{ \textstyle\begin{array}{c@{\quad}c} \frac{1}{B(\alpha,\beta)(b-a)^{\alpha+\beta-1}}(x-a)^{\alpha-1}(b-x)^{\beta-1}, & x \in[a,b] \\ 0, & \mbox{otherwise} \end{array}\displaystyle \right . $$

where \(\alpha,\beta\geq1\), \(a<-\frac{1}{2}\), \(b>\frac{1}{2}\) and \(B(\alpha,\beta)=\frac{\varGamma(\alpha)\varGamma(\beta )}{\varGamma(\alpha+\beta)}\) is the Beta function. One can check that \(K_{w}=B_{a,b,\alpha,\beta} (a+(b-a)\frac{\alpha-1}{\alpha+\beta-2} )<\infty\) for \(\alpha+\beta>2\) (the density is constant when \(\alpha=\beta=1\)), and \(\rho=\frac{\min\{B_{a,b,\alpha,\beta}(-\frac{1}{2}),B_{a,b,\alpha,\beta}(\frac{1}{2})\}}{K_{w}}>0\). In particular, taking \(\alpha=\beta=1\), \(a\to-\frac{1}{2}^{-}\) and \(b\to\frac{1}{2}^{+}\), \(w_{R}\) degenerates into the uniform distribution on \(C_{R}\), i.e., the distribution used in [3] and [8]. Figure 2 plots the P.D.F. of a typical Beta distribution with \(a=-1\), \(b=1\), \(\alpha=1\), \(\beta=2\).
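A minimal sketch (our own illustration) of drawing sampling points from the scaled density \(w_{R}\) in this example: NumPy's Beta sampler produces points on \([0,1]\), which are mapped affinely to \([a,b]\) and then dilated by \(R\):

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, alpha, beta, R, s = -1.0, 1.0, 1.0, 2.0, 8.0, 1000

    # X ~ Beta(alpha, beta) on [0,1]; a + (b-a)X has density B_{a,b,alpha,beta},
    # and multiplying by R turns the density w_1 into w_R(x) = w_1(x/R)/R.
    x = R * (a + (b - a) * rng.beta(alpha, beta, size=s))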

Fig. 2 Beta distribution

Example 11

Normal distribution centered at 0:

$$w_{1}(x)=N_{\sigma}(x)=\frac{1}{\sqrt{2\pi\sigma^{2}}}\exp{ \biggl(- \frac{x^{2}}{2\sigma^{2}} \biggr)} $$

where \(\sigma>0\). We have that \(K_{w}=N_{\sigma}(0)=\frac{1}{\sqrt{2\pi\sigma^{2}}}<\infty\) and \(\rho=\frac{N_{\sigma}(\frac{1}{2})}{N_{\sigma}(0)}=\exp{(-\frac{1}{8\sigma^{2}})}>0\). For fixed \(\delta\in(0,\frac{1}{2+24D_{3}})\), in order to optimize the lower bound \(A\) of the sampling inequality, the optimal choice of \(\sigma\) satisfies

$$ \biggl(\frac{1}{2}-\delta \biggr) \bigl(1-4\sigma ^{2} \bigr)+48D_{3}\delta\exp{ \biggl( \frac{1}{8\sigma^{2}} \biggr)}\sigma^{2}=0 $$
(12)

It is easy to check that there exists a unique \(\sigma\) satisfying Eq. (12). By the assumption \(0<\delta<\frac{1}{2+24D_{3}}\), we can guarantee that \(A\) is positive when \(\sigma\) takes this optimal value and \(\nu\) is small enough. Figure 3 plots the P.D.F. of the standard normal distribution.
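A minimal numerical sketch for solving Eq. (12) by bracketing (our own illustration; the value of \(D_{3}\) below is a placeholder, since the true constant from Table 1 depends on the generators):

    import numpy as np
    from scipy.optimize import brentq

    D3, delta = 1.0, 0.01   # placeholder constant; delta < 1/(2 + 24*D3)

    def lhs(sigma):
        # left-hand side of Eq. (12)
        return ((0.5 - delta) * (1 - 4 * sigma**2)
                + 48 * D3 * delta * np.exp(1 / (8 * sigma**2)) * sigma**2)

    sigma_opt = brentq(lhs, 0.2, 10.0)   # bracket on which lhs changes sign
    print(sigma_opt)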

Fig. 3 Standard normal distribution

4 Spectrum Estimation Based on Hilbert–Schmidt Operator

In the following, we will mostly work with the notation

$$ P_{\varPhi}= \sum_{\lambda\in\varLambda} \phi_{\lambda}\otimes\widetilde{\phi}_{\lambda}, $$
(13)

where unconditional convergence in the strong operator topology is provided by the frame properties for all \(f \in{\mathrm{L}}^{2}(\mathbb{R}^{d})\). The tensor product notation \(v \otimes w\) refers to the rank-one operator \((v \otimes w): f \mapsto\langle f, w \rangle v\), and \((\widetilde{\phi}_{\lambda})_{\lambda\in\varLambda}\) denotes the dual basis of \(\varPhi\). By our assumptions (A.0)–(A.2), it is obvious that \(P_{\varPhi}\) is the orthogonal projection onto \(V^{2}(\varLambda,\varPhi)\).

Given \(R>0\), we write \(Q_{R} : \mathrm{L}^{2}(\mathbb {R}^{d}) \to{\mathrm {L}}^{2}(\mathbb {R}^{d})\) for the orthogonal projection operator \(f \mapsto f \cdot\chi_{C_{R}}\). We introduce the localization operator

$$A_{R} = P_{\varPhi}\circ Q_{R} \circ P_{\varPhi}. $$

We will show that \(A_{R}\) is self-adjoint and Hilbert–Schmidt, and thus has a basis of eigenvectors with an associated \(\ell^{2}\) spectrum. We denote the span of the eigenvectors associated with the largest \(N\) eigenvalues by \(\mathcal{P}_{N}\), and establish a random sampling theorem for that space. Next we build the connection between the sampling of elements of \(V_{R,\delta}^{2}(\varLambda,\varPhi)\) and that of elements of \(\mathcal{P}_{N}\). The proper choice of \(N\) depends on the decay of the spectrum of \(A_{R}\). The localization properties of this operator are closely connected with the energy-concentration condition defining \(V_{R,\delta}^{2}(\varLambda,\varPhi)\).

Lemma 12

\(A_{R}\)is a positive-semidefinite Hilbert–Schmidt operator.

Proof

First of all, \(A_{R}\) is positive-semidefinite since

$$\langle A_{R} f, f \rangle= \langle Q_{R} P_{\varPhi}f, P_{\varPhi}f \rangle= \| Q_{R} P_{\varPhi}f \|_{2}^{2} , $$

by using the self-adjointness of \(Q_{R}\).

We next prove that \(Q_{R} P_{\varPhi}\) is a Hilbert–Schmidt operator, which will imply that \(A_{R}\) is a Hilbert–Schmidt operator as well.

For all \(\lambda\in\varLambda\), we have

$$Q_{R} \circ(\phi_{\lambda}\otimes\widetilde{\phi }_{\lambda}) = (Q_{R}\phi_{\lambda}) \otimes \widetilde{\phi}_{\lambda}. $$

By (13) and continuity of \(Q_{R}\), we have

$$Q_{R} P_{\varPhi}= \sum_{\lambda\in\varLambda} (Q_{R} \phi_{\lambda}) \otimes\widetilde{\phi }_{\lambda}$$

with unconditional convergence in the strong operator topology.

Next we will prove that

$$ \sum_{\lambda\in\varLambda} \| Q_{R} \phi_{\lambda}\|_{2}^{2} < \infty. $$
(14)

In fact, we have

$$\begin{aligned} \sum_{\lambda\in\varLambda} \| Q_{R} \phi _{\lambda}\|_{2}^{2} = & \sum _{\lambda\in\varLambda} \| \phi_{\lambda}\|_{\mathrm{L}^{2}(C_{R})}^{2} \\ \leq& \sum_{k\in (\mathbb {Z}\cap [\lfloor-\frac{R}{2}\rfloor,\lceil \frac{R}{2}\rceil ) )^{d}} \sum _{\lambda\in\varLambda} \| \phi_{\lambda}\|_{\mathrm{L}^{2}(k+[0,1]^{d})}^{2} \\ \leq& \sum_{k\in (\mathbb {Z}\cap [\lfloor-\frac{R}{2}\rfloor,\lceil \frac{R}{2}\rceil ) )^{d}} \| \varPhi \|_{2,2,u_{0}}^{2} \leq(R+2)^{d} \| \varPhi \|_{2,2,u_{0}}^{2} < \infty. \end{aligned}$$

Thus, we have

$$\begin{aligned} (Q_{R} P_{\varPhi}) (Q_{R} P_{\varPhi})^{*} = & (Q_{R} P_{\varPhi}) \sum_{\lambda\in\varLambda} \widetilde{\phi }_{\lambda}\otimes(Q_{R} \phi_{\lambda}) \\ = & \sum_{\lambda\in\varLambda} (Q_{R} P_{\varPhi}\widetilde{\phi}_{\lambda}) \otimes (Q_{R} \phi_{\lambda}) \\ = & \sum_{\lambda\in\varLambda} (Q_{R} \widetilde{ \phi}_{\lambda}) \otimes(Q_{R} \phi_{\lambda}) \end{aligned}$$

with convergence in the strong operator topology. Similarly to the proof of Eq. (14), we see that

$$\sum_{\lambda\in\varLambda} \| Q_{R} \widetilde{\phi }_{\lambda}\|_{2}^{2} < \infty. $$

Combining (14) with the Cauchy–Schwarz inequality, we get that

$$\begin{aligned} {\mathrm{trace}}(Q_{R} P_{\varPhi}) (Q_{R} P_{\varPhi})^{*} \leq& \sum_{\lambda\in\varLambda} {\mathrm{trace}} \bigl( \bigl\vert ( Q_{R} \widetilde{\phi }_{\lambda}) \otimes( Q_{R} \phi_{\lambda}) \bigr\vert \bigr) \\ = & \sum_{\lambda\in\varLambda} \Vert Q_{R} \widetilde{\phi}_{\lambda} \Vert \; \Vert Q_{R} \phi _{\lambda} \Vert < \infty. \end{aligned}$$

Hence \(Q_{R} P_{\varPhi}\) is a Hilbert–Schmidt operator. □

It follows that \(A_{R}\) has an orthonormal basis of eigenvectors, whose associated eigenvalues are non-negative and square-summable. Hence the set of cluster points of the spectrum of \(A_{R}\) is either \(\{0\}\) or empty, and the non-zero eigenvalues can be sorted in decreasing order. Denote the non-zero eigenvalues, sorted in decreasing order, by \((\mu_{n})_{n \in I}\) and the associated eigenfunctions by \((\psi_{n})_{n \in I}\), with \(I\) being either \(\mathbb {N}_{+}\) or \(\{1, \ldots, M\}\) for some integer \(M\). Then \(A_{R}\) is given by the sum

$$\sum_{n \in I} \mu_{n} \psi _{n} \otimes\psi_{n}. $$

We can easily check that \(\psi_{n} \in V^{2}(\varLambda,\varPhi)\), as well as \(P_{\varPhi}(\psi_{n}) = \psi_{n}\). Since \(A_{R}\) is a composition of orthogonal projections, we have \(\mu _{1} \le\| A_{R} \|_{op} \le1\). Let

$$\mathcal{P}_{N} = \mathrm{span}_{n\in \mathbb {N}_{+},n\leq N}\{ \psi _{n} \} . $$

For the sampling problem we will need some information on the spectrum of \(A_{R}\). Let

$$N(R) = \max\{ n \in\mathbb{N}~: ~\mu_{n} \ge1/2 \}, $$

and \(N(R)=0\) when \(\mu_{1} < 1/2\). Then whenever \(N(R)>0\), we have \(\mu_{N(R)} \ge1/2 > \mu_{N(R)+1}\).

An upper bound for \(N(R)\) is provided in the following lemma.

Lemma 13

Let \(\beta\) and \(R_{0}\) be as defined in Table 1. Then for all \(R>R_{0}\), the inequalities \(0 < N(R) \le D(\varLambda)\beta^{d} R^{d^{2}/\alpha'}\) hold, where \(D(\varLambda)\) stands for the relative covering index of \(\varLambda\), defined in (2).

Proof

We use the minimax formula for the eigenvalues:

$$\mu_{m} = \inf \bigl\{ \sup \bigl\{ \langle A_{R} f, f \rangle~:~ f \bot\mathcal{H}, \| f \|_{2} = 1 \bigr\} : \mathcal{H} \subset{\mathrm{L}}^{2} \bigl(\mathbb{R}^{d} \bigr), \dim( \mathcal{H}) < m \bigr\} . $$

Now fix \(S\geq R+2\), and denote \(\mathcal{H}_{S} := \mathrm{span} \{ \widetilde{\phi}_{\lambda}: \| \lambda\|_{\infty}\le S/2 \}\). It follows that

$$\mathrm{dim}(\mathcal{H}_{S}) \leq\bigl(\lfloor S \rfloor+1 \bigr)^{d} D(\varLambda). $$

For any unit vector \(f\) in \(\mathcal{H}_{S}^{\bot}\), we obtain

$$\begin{aligned} \langle A_{R} f, f \rangle= \sum_{\lambda, \lambda' \in\varLambda} \langle f,\widetilde{\phi}_{\lambda}\rangle \langle Q_{R} \phi_{\lambda},\phi_{\lambda'}\rangle\overline{ \langle f, \widetilde{\phi}_{\lambda'}\rangle}. \end{aligned}$$

We denote \(c_{\lambda}= \langle f, \widetilde{\phi}_{\lambda}\rangle\), \(c=(c_{\lambda})_{\lambda\in\varLambda}\),

$$a_{\lambda,\lambda'}=\left \{ \textstyle\begin{array}{c@{\quad}c}\langle Q_{R} \phi_{\lambda},\phi_{\lambda'}\rangle , &\min\{\|\lambda\|_{\infty},\|\lambda'\|_{\infty}\}>\frac{S}{2}\\0, &\mbox{otherwise} \end{array}\displaystyle \right . $$

and an infinite, positive semidefinite matrix

$$\mathcal{A} = (a_{\lambda,\lambda'})_{\lambda,\lambda'\in \varLambda}, $$

by recalling the assumption that \(f \bot\widetilde{\phi}_{\lambda}\) for \(\| \lambda\|_{\infty}\le S/2\). In particular, we get

$$\langle A_{R} f, f \rangle= c \mathcal{A} c^{*} \le\| c \|_{2}^{2} \| \mathcal{A} \|_{\mathrm{op}} \le\| \widetilde{\varPhi}\|_{2,1,u_{0}}^{2} \| \mathcal{A} \|_{\mathit{HS}} $$

by using Proposition 22 in Appendix. To estimate the Hilbert–Schmidt norm of the matrix, we have

$$\begin{aligned} \| \mathcal{A} \|_{\mathit{HS}}^{2} = & \sum _{ \begin{subarray}{c}\lambda,\lambda'\in\varLambda\\ \|\lambda\| _{\infty}> S/2\\ \|\lambda' \|_{\infty}\geq S/2 \end{subarray} } |a_{\lambda,\lambda'}|^{2} \\ \le& \sum_{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> S/2 \end{subarray} } \sum _{\lambda'\in\varLambda} \bigl\vert \langle Q_{R} \phi _{\lambda}, \phi_{\lambda'} \rangle \bigr\vert ^{2} \\ \le& \sum_{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> S/2 \end{subarray} } \sum _{\lambda'\in\varLambda} \| Q_{R} \phi_{\lambda}\|_{2}^{2} \|\phi_{\lambda'}\|_{2}^{2} \\ = & \sum_{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> S/2 \end{subarray} } \| Q_{R} \phi _{\lambda}\|_{2}^{2} \sum _{\lambda'\in\varLambda} \|\phi_{\lambda'}\|_{2}^{2} \\ \le& \sum_{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> S/2 \end{subarray} } \|\varPhi \|_{2,1,u_{0}}^{2} \| Q_{R} \phi_{\lambda}\|_{2}^{2} ~ \end{aligned}$$

where Proposition 22 in Appendix is used in the above argument again.

Now we will estimate \(\sum_{\lambda,\|\lambda\|_{\infty}>S/2}\| Q_{R}\phi_{\lambda}\|_{2}^{2}\) by the following:

$$\begin{aligned} &\sum_{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> \frac{S}{2} \end{subarray} } \| Q_{R} \phi _{\lambda}\|_{2}^{2} \\ &\quad = \sum_{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> \frac{S}{2} \end{subarray} } \|\phi_{\lambda}\|_{\mathrm{L}^{2}(C_{R})}^{2} \\ &\quad \leq \sum_{k \in (\mathbb {Z}\cap [\lfloor-\frac{R}{2}\rfloor ,\lceil\frac{R}{2}\rceil ) )^{d}}\sum _{ \begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> \frac{S}{2} \end{subarray} }\|\phi_{\lambda}\|_{\mathrm{L}^{2}(k+[0,1]^{d})}^{2} \\ &\quad \leq \sum_{k \in (\mathbb {Z}\cap [\lfloor-\frac{R}{2}\rfloor ,\lceil\frac{R}{2}\rceil ) )^{d}}\sum _{\begin{subarray}{c}\lambda\in\varLambda\\ \|\lambda\|_{\infty}> \frac{S}{2} \end{subarray} } \bigl\| \phi_{\lambda}u_{\alpha}(\cdot- \lambda)\bigr\| _{\mathrm{L}^{2}(k+[0,1]^{d})}^{2} \sup_{x \in k+[0,1]^{d}} \bigl(u_{\alpha}(x- \lambda) \bigr)^{-2} \\ &\quad \leq \Bigl(\inf_{\|x\|_{\infty}\geq\frac{S}{2}-\frac{R}{2}-1}u_{\alpha}(x) \Bigr)^{-2}(R+2)^{d}\|\varPhi\|_{2,2,u_{\alpha}}^{2}. \end{aligned}$$

Therefore we have

$$\begin{aligned} \langle A_{R} f,f \rangle^{2} \le& \|\widetilde{ \varPhi}\|_{2,1,u_{0}}^{4} \| \mathcal{A} \|_{\mathit{HS}}^{2} \\ \le& \|\widetilde{\varPhi}\|_{2,1,u_{0}}^{4} \|\varPhi \|_{2,1,u_{0}}^{2} \|\varPhi\|_{2,2,u_{\alpha}}^{2} (R+2)^{d} \biggl(u_{\alpha}\biggl(\frac{S}{2}- \frac{R}{2}-1 \biggr) \biggr)^{-2} \\ = & D_{1} (R+2)^{d} \biggl(\frac{S}{2}- \frac{R}{2} \biggr)^{-2\alpha}. \end{aligned}$$

For \(S = (\beta-\frac{3}{2}) R^{d/\alpha'}+2\) with \(\beta\) as in Table 1, since \(R>R_{0}\geq2\) and by the definition of \(\alpha'\), we have \(R \le R^{d/\alpha'}\), and hence

$$\begin{aligned} D_{1} (R+2)^{d} \biggl(\frac{S}{2}- \frac{R}{2} \biggr)^{-2\alpha} \leq& D_{1} (R+2)^{d} \biggl(\frac{\beta-\frac{3}{2}}{2}R^{d/\alpha'}+1- \frac{R}{2} \biggr)^{-2\alpha} \\ < & D_{1} (R+2)^{d} \biggl(\frac{\beta-\frac{3}{2}}{2}R^{d/\alpha'}- \frac{1}{2}R^{d/\alpha'} \biggr)^{-2\alpha} \\ \leq& D_{1} (R+2)^{d} \biggl(\frac{\beta-\frac{5}{2}}{2} \biggr)^{-2\alpha}R^{-\frac{2\alpha d}{\alpha'}} \\ \leq& \frac{1}{4}. \end{aligned}$$

Hence we have shown

$$\langle A_{R} f,f \rangle< 1/2~ $$

for all unit vectors \(f \in\mathcal{H}_{S}^{\bot}\).

If \(R>2\), then

$$\begin{aligned} N = & \biggl( \biggl\lfloor \biggl(\beta-\frac{3}{2} \biggr) R^{d/\alpha'} +2 \biggr\rfloor + 1 \biggr)^{d} D(\varLambda) \\ \leq& \biggl( \biggl(\beta-\frac{3}{2} \biggr) R^{d/\alpha'} +3 \biggr)^{d} D(\varLambda) \\ \leq& \beta^{d} R^{d^{2}/\alpha'} D(\varLambda), \end{aligned}$$

and the minimax estimate yields \(\mu_{N+1} < 1/2\), so that \(N(R)\leq N\).

On the other hand, if \(R>R_{0}\geq2\|\lambda^{*}\|_{\infty}+2\), then

$$\begin{aligned} \mu_{1} \geq& \frac{\langle A_{R} \phi_{\lambda^{*}},\phi_{\lambda^{*}}\rangle}{ \Vert \phi_{\lambda^{*}} \Vert _{\mathrm {L}^{2}(\mathbb {R}^{d})}^{2}} \\ = & \frac{\langle Q_{R} \phi_{\lambda^{*}},\phi_{\lambda^{*}}\rangle }{ \Vert \phi_{\lambda^{*}} \Vert _{\mathrm{L}^{2}(\mathbb {R}^{d})}^{2}} \\ = & 1-\frac{\|\phi_{\lambda^{*}}\|_{\mathrm{L}^{2}(\mathbb {R}^{d}\backslash C_{R})}^{2}}{ \Vert \phi_{\lambda^{*}} \Vert _{\mathrm {L}^{2}(\mathbb {R}^{d})}^{2}} \\ \geq& 1-\frac{\sum_{k \notin (\mathbb {Z}\cap [\lceil-\frac{R}{2}\rceil ,\lfloor\frac{R}{2}\rfloor ) )^{d}}\|\phi_{\lambda^{*}}\|_{\mathrm {L}^{2}(k+[0,1]^{d})}^{2}}{ \Vert \phi_{\lambda^{*}} \Vert _{\mathrm {L}^{2}(\mathbb {R}^{d})}^{2}} \\ \geq& 1-\frac{\sum_{k \notin (\mathbb {Z}\cap [\lceil-\frac{R}{2}\rceil ,\lfloor\frac{R}{2}\rfloor ) )^{d}}\|\phi_{\lambda^{*}}u_{\alpha}(\cdot-\lambda^{*})\|_{\mathrm{L}^{2}(k+[0,1]^{d})}^{2} (\inf_{x\in k+[0,1]^{d}}u_{\alpha}(x-\lambda^{*}) )^{-2}}{ \Vert \phi_{\lambda ^{*}} \Vert _{\mathrm{L}^{2}(\mathbb {R}^{d})}^{2}} \\ \geq& 1-\frac{\|\varPhi\|_{2,2,u_{\alpha}}^{2} (u_{\alpha}(\lfloor \frac{R}{2}\rfloor-\|\lambda^{*}\|_{\infty}) )^{-2}}{ \Vert \phi _{\lambda^{*}} \Vert _{\mathrm{L}^{2}(\mathbb {R}^{d})}^{2}} > \frac{1}{2}. \end{aligned}$$

So for all \(R>R_{0}\) the following inequality holds

$$0 < N(R) \le\beta^{d} R^{d^{2}/\alpha'} D(\varLambda) . $$

 □

5 Non-uniform Random Sampling in Finite Sums of Eigenspaces \(\mathcal{P}_{N}\)

Recall from the previous section that \(\mu_{1} \ge\mu_{2} \ge\cdots\) are the eigenvalues of \(A_{R}\), with corresponding eigenfunctions \(\psi_{1},\psi_{2},\ldots\) . The span of the first \(N\) eigenfunctions is denoted by \(\mathcal{P}_{N}\). We let \(\Delta_{N} = \mathrm{diag}(\mu_{1},\ldots,\mu_{N})\).

The goal of this section is to show a non-uniform random sampling statement for \(\mathcal{P}_{N}\). We will show it by applying a matrix Bernstein inequality which will be stated in Theorem 14, using the following notation: For \(A \in \mathbb{C}^{N \times N}\), we let \(\| A \|\) denote the operator norm with respect to the Euclidean norm. Further, the inequality \(A \le B\) for two matrices \(A,B\) of equal size means that \(B-A\) is positive semidefinite.

Theorem 14

(Matrix Bernstein inequality [22, Theorem 1.4]) Let \(X_{j}\) be a sequence of independent, random self-adjoint \(N\times N\)-matrices. Suppose that \(\mathbb{E}X_{j} =0\) and \(\| X_{j}\|\leq B\) a.s. Then for all \(t>0\),

$$\mathbb{P} \Biggl(\lambda_{\max} \Biggl( \sum \limits _{j=1}^{s}X_{j} \Biggr)\geq t \Biggr)\leq N \exp \biggl(-\frac{t^{2}/2}{\sigma^{2}+Bt/3} \biggr) $$

holds, where \(\lambda_{\max}(U)\) denotes the largest eigenvalue of a self-adjoint matrix \(U\), \(\|U\|=\lambda_{\max}(U^{*}U)^{1/2}\) is the operator norm, and \(\sigma^{2}= \Vert \sum_{j=1}^{s}\mathbb{E}(X_{j}^{2}) \Vert \).
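A minimal numerical sketch (our own illustration, not from [22]) comparing the empirical tail of \(\lambda_{\max}(\sum_{j} X_{j})\) with the Bernstein bound, for the toy choice \(X_{j}=e_{k_{j}}e_{k_{j}}^{T}-\frac{1}{N}I\) with \(k_{j}\) uniform on \(\{1,\ldots,N\}\), so that \(B=1\) and \(\sigma^{2}=s(N-1)/N^{2}\):

    import numpy as np

    rng = np.random.default_rng(0)
    N, s, t, trials = 8, 200, 12.0, 5000
    B = 1.0                                # a.s. bound on ||X_j||
    sigma2 = s * (N - 1) / N**2            # || sum_j E(X_j^2) ||

    hits = 0
    for _ in range(trials):
        counts = np.bincount(rng.integers(0, N, size=s), minlength=N)
        # sum_j X_j is diagonal with entries counts[k] - s/N
        hits += np.max(counts - s / N) >= t
    print("empirical tail:", hits / trials)
    print("Bernstein bound:", N * np.exp(-(t**2 / 2) / (sigma2 + B * t / 3)))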

The random matrices under consideration are constructed as follows: For each \(j \in\mathbb{N}\) and \(k,l \in\{ 1, \ldots, N \}\), we introduce the \(N \times N\) rank-one random matrix \(M_{j}\) defined by

$$ (M_{j})_{k,l} = \psi _{k}(x_{j}) \overline{\psi_{l}(x_{j})}. $$
(15)

Here the \(x_{j}\)’s denote i.i.d. random variables, and are distributed in \(\mathbb {R}^{d}\) with probability density function \(w_{R}\). Let

$$ X_{j} = M_{j} - \mathbb{E}(M_{j}). $$
(16)

We can now formulate and prove the non-uniform random sampling statement for \(\mathcal{P}_{N}\) by using Theorem 14.

Theorem 15

Let \((x_{j})_{j \in\mathbb{N}}\subset \mathbb {R}^{d}\) be a sequence of independent and identically distributed random variables with probability density function \(w_{R}\). Then, for all \(\nu\ge0\) and \(s \in \mathbb {N}\):

$$ \mathbb {P}\Biggl( \inf_{f \in\mathcal{P}_{N}, \| f \| _{2} = 1} \frac{1}{s} \sum_{j=1}^{s} \bigl( \bigl|f(x_{j})\bigr|^{2}- \| v_{R} f \|_{2}^{2} \bigr) \le- R^{-d} \nu \Biggr) \le N \exp \biggl( -\frac{\nu^{2} s}{2D_{2}R^{d} (K_{w}+\frac{\nu}{3} )} \biggr) $$
(17)

holds, where \(v_{R}\) denotes the square root of \(w_{R}\) and \(\| v_{R} f \|_{2}^{2}\) is the expectation of \(|f(x_{j})|^{2}\).

Proof

By the definition of \(M_{j}\) in (15), we have

$$\begin{aligned} \bigl( \mathbb{E} (M_{j}) \bigr)_{k,\ell} = & \int_{\mathbb {R}^{d}} w_{R}(x) \psi_{\ell}(x) \overline{\psi_{k}(x)} dx \\ = & \int_{\mathbb {R}^{d}} v_{R}(x) \psi_{\ell}(x) \overline{v_{R}(x)} \; \overline{\psi_{k}(x)} dx \\ = & \langle v_{R}\psi_{\ell}, v_{R}\psi _{k} \rangle. \end{aligned}$$

Furthermore, any unit vector \(f \in\mathcal{P}_{N}\) can be written as \(f = \sum_{n=1}^{N} c_{n} \psi_{n}\) with a unit coefficient vector \((c_{n})_{n}\), and

$$\langle c, M_{j} c \rangle= \bigl| f(x_{j}) \bigr| ^{2}, $$

and similarly

$$\begin{aligned} \langle c,\mathbb{E}M_{j}c \rangle = & \sum _{k} \sum_{\ell}c_{k} \overline{c_{\ell}} \; \overline{( \mathbb{E}M_{j})_{k,\ell}} \\ = & \sum_{k} \sum _{\ell}c_{k} \overline{c_{\ell}} \langle v_{R} \psi_{k}, v_{R} \psi _{\ell}\rangle \\ = & \biggl\langle v_{R} \sum_{k} c_{k} \psi_{k}, v_{R} \sum _{\ell}c_{\ell}\psi_{\ell}\biggr\rangle \\ = & \|v_{R} f\|_{2}^{2}, \end{aligned}$$

which implies that both \(M_{j}\) and \(\mathbb{E}M_{j}\) are positive semi-definite.

Thus we have

$$\begin{aligned} & \inf_{f \in\mathcal{P}_{N}, \| f \|_{2} = 1} \frac{1}{s} \sum _{j=1}^{s} \bigl( \bigl|f(x_{j})\bigr|^{2}- \| v_{R} f \|_{2}^{2} \bigr) \\ & \quad = \inf_{\| c \|_{2} = 1} \frac{1}{s} \sum _{j=1}^{s} \bigl( \langle c, M_{j} c \rangle- \langle c,\mathbb{E}M_{j}c \rangle \bigr) \\ & \quad = \lambda_{\mathrm{min}} \Biggl( \frac{1}{s} \sum _{j=1}^{s} X_{j} \Biggr), \end{aligned}$$

where \(\lambda_{\mathrm{min}}\) represents the smallest eigenvalue of a self-adjoint matrix.

Next we estimate \(\|X_{j}\|\). For all \(\|c\|_{2}=1\) and \(f=\sum_{k=1}^{N} c_{k} \psi_{k}\), we have

$$\langle c,M_{j} c \rangle= \bigl|f(x_{j})\bigr|^{2} \leq\|f\|_{\infty}^{2}. $$

Since \(x_{j}\) has a probability density function, we may ignore a subset of measure zero, so the bound above holds almost surely. Moreover,

$$\begin{aligned} \langle c,\mathbb{E}M_{j} c \rangle = & \| v_{R} f \|_{2}^{2} \\ \leq& \|v_{R}\|_{2}^{2} \; \|f \|_{\infty}^{2} = \|f\|_{\infty}^{2} . \end{aligned}$$

Thus we have

$$\begin{aligned} \bigl|\langle c,X_{j} c \rangle\bigr| = & \bigl| \langle c,M_{j} c \rangle- \langle c,\mathbb{E}M_{j} c \rangle\bigr| \\ \leq& \|f\|_{\infty}^{2} \leq D_{2} \end{aligned}$$

by Proposition 24 in Appendix. Therefore \(\|X_{j}\| \leq D_{2}\) holds.

Finally we will estimate \(\sigma^{2}\). In fact, we have that

$$\mathbb{E} \bigl(X_{j}^{2} \bigr) = \mathbb{E} \bigl(M_{j}^{2} \bigr) - \bigl[\mathbb{E}(M_{j}) \bigr]^{2} \leq\mathbb{E} \bigl(M_{j}^{2} \bigr) , $$

and

$$\begin{aligned} \bigl(M_{j}^{2} \bigr)_{\mathit{km}} = & \sum _{\ell=1}^{N} (M_{j})_{\mathit{kl}} (M_{j})_{\mathit{lm}} \\ = & \sum_{\ell=1}^{N} \psi _{k}(x_{j}) \overline{\psi_{\ell}(x_{j})} \psi_{\ell}(x_{j}) \overline{\psi _{m}(x_{j})} \\ = & \sum_{\ell=1}^{N} \bigl|\psi_{\ell}(x_{j})\bigr|^{2} (M_{j})_{\mathit{km}} . \end{aligned}$$

Thus \(M_{j}^{2} = \sum_{\ell=1}^{N} |\psi_{\ell}(x_{j})|^{2} \, M_{j}\). Moreover, we have the following estimate:

$$\begin{aligned} \sum_{\ell=1}^{N} \bigl|\psi_{\ell}(x_{j})\bigr|^{2} = & \sum_{\ell=1}^{N} \bigl|\langle\psi_{\ell},v_{x_{j}} \rangle\bigr|^{2} \\ \leq& \sum_{\ell=1}^{\infty}\bigl|\langle\psi _{\ell},v_{x_{j}} \rangle\bigr|^{2} \\ = & \|v_{x_{j}}\|_{2}^{2} \leq D_{2} \end{aligned}$$

almost surely by Proposition 24 in the Appendix.

Therefore

$$M_{j}^{2} \leq D_{2} M_{j} , $$

as well as

$$\mathbb{E} \bigl(M_{j}^{2} \bigr) \leq D_{2} \mathbb{E}(M_{j}) . $$

By the arguments above, we can get that

$$\begin{aligned} \sigma^{2} = & \Biggl\Vert \sum_{j=1}^{s} \mathbb{E} \bigl(X_{j}^{2} \bigr) \Biggr\Vert \\ \leq& \Biggl\Vert \sum_{j=1}^{s} \mathbb{E} \bigl(M_{j}^{2} \bigr) \Biggr\Vert \\ \leq& D_{2} \Biggl\Vert \sum_{j=1}^{s} \mathbb{E}(M_{j}) \Biggr\Vert \\ \leq& sD_{2} \sup_{\|f\|_{2}=1} \|v_{R} f\|_{2}^{2} \\ \leq& sD_{2}\|v_{R}\|_{\infty}^{2} = sD_{2}R^{-d}K_{w}. \end{aligned}$$

Now the statement of the theorem follows from the matrix Bernstein inequality formulated in Theorem 14, with the estimates for \(B\) and \(\sigma^{2}\) provided above. □

6 Proofs of the Main Results

It remains to transfer the random sampling statements from the spaces \(\mathcal{P}_{N}\) to the set \(V_{R,\delta}^{2}(\varLambda,\varPhi)\). The following lemma is a first step in this direction, providing a norm estimate for the projection onto \(\mathcal{P}_{N}\).

Lemma 16

Let \(N \in\mathbb{N}\) and \(\gamma\in\mathbb{R}\) with \(\mu_{N} \ge\gamma\ge\mu_{N+1}\). Denote the orthogonal projection onto \(\mathcal{P}_{N}\) by \(E_{N}\), and the orthogonal projection onto its orthogonal complement by \(F_{N}\). Then for all \(f \in V_{R,\delta}^{2}(\varLambda,\varPhi)\), we have

$$\begin{aligned} \| E_{N} f \|_{2}^{2} \ge& \biggl( 1 - \frac{\delta}{1-\gamma} \biggr) \| f \|_{2}^{2}, \\ \| Q_{R} E_{N} f \|_{2}^{2} \ge& \gamma \biggl( 1 - \frac{\delta}{1-\gamma} \biggr) \| f \|_{2}^{2}, \\ \| F_{N} f \|_{2}^{2} \le& \frac{\delta}{1-\gamma} \| f \|_{2}^{2}. \end{aligned}$$

If \(N=N(R) \neq0\), then these estimates can be simplified to

$$\begin{aligned} \| E_{N} f \|_{2}^{2} \ge& (1 - 2 \delta ) \| f \|_{2}^{2}, \\ \| Q_{R} E_{N} f \|_{2}^{2} \ge& \biggl( \frac{1}{2}- \delta \biggr) \| f \|_{2}^{2}, \\ \| F_{N} f \|_{2}^{2} \le& 2 \delta\| f \|_{2}^{2}. \end{aligned}$$

Proof

Let \(f \in V_{R,\delta}^{2}(\varLambda,\varPhi)\), w.l.o.g. \(\| f \|_{2} = 1\). Since \(f = P_{\varPhi}f\), we obtain

$$1 - \delta\le\| Q_{R} f \|_{2}^{2} = \| Q_{R} P_{\varPhi}f \|_{2}^{2} = \langle Q_{R} P_{\varPhi}f, Q_{R} P_{\varPhi}f \rangle= \langle A_{R} f , f \rangle= \sum _{j} \bigl|\langle f, \psi_{j} \rangle\bigr|^{2} \mu_{j} . $$

Let \(c_{j} = \langle f, \psi_{j} \rangle\), and define

$$A = \| E_{N} f \|_{2}^{2} = \sum _{j=1}^{N} | c_{j}|^{2}, $$

and \(B = 1- A = \| F_{N} f \|_{2}^{2}\). Then \(\sum_{j=N+1}^{\infty}|c_{j}|^{2} \le\| F_{N} f \|_{2}^{2} = 1-A\). Using \(\gamma\ge\mu_{N+1} \ge\mu_{N+2} \ge\cdots\) and \(\mu_{j} \le1\), we find

$$\begin{aligned} A = & \sum_{j=1}^{N} \bigl|\langle f, \psi_{j} \rangle\bigr|^{2} \ge\sum _{j=1}^{N} \bigl|\langle f, \psi_{j} \rangle\bigr|^{2} \mu_{j} \\ = & \sum_{j=1}^{\infty}|c_{j}|^{2} \mu_{j} - \sum _{j=N+1}^{\infty}|c_{j}|^{2} \mu_{j} \\ \ge& 1- \delta- \gamma \Biggl( \sum_{j=N+1}^{\infty}|c_{j}|^{2} \Biggr) \\ \ge& 1-\delta- \gamma(1-A). \end{aligned}$$

Solving this inequality for \(A\) yields \(A \ge 1-\frac{\delta}{1-\gamma}\), which implies \(B \le \frac{\delta}{1-\gamma}\). Finally, \(\gamma\le\mu_{N}\) holds, which implies

$$\| Q_{R} E_{N} f \|_{2}^{2} = \sum_{j=1}^{N} \mu_{j} |c_{j}|^{2} \ge \gamma A \ge \gamma\biggl( 1- \frac{\delta}{1-\gamma} \biggr). $$

For \(N=N(R) \neq0\), we may pick \(\gamma= 1/2\), which results in the estimates given for this case. □

To connect the sampling inequality on \(V_{R,\delta}^{2}(\varLambda,\varPhi)\) with that on the eigenspace \(\mathcal{P}_{N}\), we need the following statement.

Lemma 17

Let \(X=\{x_{j}\}_{j=1}^{s}\) be a finite subset of \(\mathbb {R}^{d}\) and \(\gamma\in[\mu_{N+1},\mu_{N}]\). Assume that the inequality

$$\frac{1}{s} \sum_{j=1}^{s} \bigl( \bigl|g(x_{j})\bigr|^{2}- \| v_{R} g \|_{2}^{2} \bigr) \ge- R^{-d} \nu\|g \|_{2}^{2} $$

holds for all \(g \in\mathcal{P}_{N}\). Then the inequality

$$\sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \geq\hat{A} \|f\|_{2}^{2} $$

holds for all \(f \in V_{R,\delta}^{2}(\varLambda,\varPhi)\), where

$$\hat{A}=sR^{-d} \biggl(K_{w}\rho\gamma \biggl(1-\frac{\delta}{1-\gamma} \biggr)-\nu \biggr)-2D_{3}N_{0}(X) \frac{\delta}{1-\gamma} . $$

Proof

Denote by \(\mathit{Ef}\) the orthogonal projection of a square-integrable function \(f\) onto \(\mathcal{P}_{N}\). By Lemma 16 above and Proposition 23 in the Appendix, we have

$$\begin{aligned} \sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \geq& \sum_{j=1}^{s} \bigl|E f(x_{j})\bigr|^{2} - D_{3}N_{0}(X) \|f-\mathit{Ef}\|_{2}^{2} \\ \geq& \sum_{j=1}^{s} \bigl|E f(x_{j})\bigr|^{2} - 2D_{3}N_{0}(X) \frac{\delta}{1-\gamma}\|f\|_{2}^{2} \\ \geq& s \|v_{R} E f\|_{\mathrm{L}^{2}(C_{R})}^{2} - \frac{\nu s}{R^{d}} \|E f\|_{2}^{2} - 2D_{3}N_{0}(X) \frac{\delta}{1-\gamma}\|f \|_{2}^{2} \\ \geq& s K_{w} R^{-d} \rho\|E f\|_{\mathrm{L}^{2}(C_{R})}^{2} - \frac{\nu s}{R^{d}} \|E f\|_{2}^{2} - 2D_{3}N_{0}(X) \frac{\delta}{1-\gamma}\|f \|_{2}^{2} \\ \geq& K_{w} R^{-d} \rho s \gamma \biggl(1- \frac{\delta}{1-\gamma} \biggr)\|f\|_{2}^{2} - \frac{\nu s}{R^{d}} \|f\|_{2}^{2} - 2D_{3}N_{0}(X) \frac{\delta}{1-\gamma}\|f\|_{2}^{2}. \end{aligned}$$

Then we get the conclusion. □

Before proving the main result, we need the following statement about the absolute covering index.

Lemma 18

Let \(X=\{x_{j}\}_{j=1}^{s}\) be independent and identically distributed random variables in \(\mathbb {R}^{d}\). Let \(P\geq\sup_{k \in \mathbb {Z}^{d}}\mathbb {P}(x_{j} \in k+[0,1]^{d} )\) and \(a>e P\) with \(as\geq1\). Then we have

$$\mathbb {P}\bigl(N_{0}(X)>as \bigr)\leq\frac{1}{P} \biggl( \frac{eP}{a} \biggr)^{as} $$

where \(N_{0}(X)\) represents the absolute covering index of \(X\) defined in (3).

Proof

Let \(P_{k}=\mathbb {P}(x_{j} \in k+[0,1]^{d} )\) for \(k \in \mathbb {Z}^{d}\). Combining this with Chebyshev’s inequality, we have that

$$\begin{aligned} \mathbb {P}\bigl(N_{0}(X)>as \bigr) \leq& \sum _{k \in \mathbb {Z}^{d}}\mathbb {P}\bigl(\# \bigl(\{x_{j}\}\cap \bigl(k+[0,1]^{d} \bigr) \bigr)>as \bigr) \\ = & \sum_{k \in \mathbb {Z}^{d}}\mathbb {P}\Biggl(\sum _{j=1}^{s}\chi_{k+[0,1]^{d}}(x_{j})>as \Biggr) \\ = & \sum_{k \in \mathbb {Z}^{d}}\mathbb {P}\Biggl(\exp \Biggl(b_{k}\sum_{j=1}^{s} \chi_{k+[0,1]^{d}}(x_{j}) \Biggr)>e^{b_{k}as} \Biggr) \\ \leq& \sum_{k \in \mathbb {Z}^{d}}e^{-b_{k}as}\mathbb{E} \exp \Biggl(b_{k}\sum_{j=1}^{s} \chi_{k+[0,1]^{d}}(x_{j}) \Biggr) \\ = & \sum_{k \in \mathbb {Z}^{d}}e^{-b_{k}as}\prod _{j=1}^{s}\mathbb{E}\exp \bigl(b_{k} \chi_{k+[0,1]^{d}}(x_{j}) \bigr) \\ = & \sum_{k \in \mathbb {Z}^{d}}e^{-b_{k}as} \bigl(1+ \bigl(e^{b_{k}}-1 \bigr)P_{k} \bigr)^{s} \\ \leq& \sum_{k \in \mathbb {Z}^{d}}e^{-b_{k}as} \bigl(\exp \bigl( \bigl(e^{b_{k}}-1 \bigr)P_{k} \bigr) \bigr)^{s} . \end{aligned}$$

Taking the optimal choice \(b_{k}=\log (\frac{a}{P_{k}} )>0\), we obtain that

$$\begin{aligned} \mathbb {P}\bigl(N_{0}(X)>as \bigr) \leq& \sum _{k \in \mathbb {Z}^{d}} \exp \biggl(-s \biggl(a \log \biggl( \frac{a}{P_{k}} \biggr)- (a-P_{k} ) \biggr) \biggr) \\ = & \sum_{k \in \mathbb {Z}^{d}} \exp \biggl(-sa \log \biggl( \frac{a}{P_{k}} \biggr)+sa-sP_{k} \biggr) \\ \leq& \sum_{k \in \mathbb {Z}^{d}} \exp \biggl(as \biggl(1-\log \biggl(\frac{a}{P_{k}} \biggr) \biggr) \biggr) \\ = & \sum_{k \in \mathbb {Z}^{d}} \exp \biggl(as \biggl(\log \biggl(\frac{eP_{k}}{a} \biggr) \biggr) \biggr) \\ = & \sum_{k \in \mathbb {Z}^{d}} \biggl(\frac{eP_{k}}{a} \biggr)^{as} = \sum_{k \in \mathbb {Z}^{d}} \biggl( \frac{eP_{k}}{a} \biggr) \biggl(\frac{eP_{k}}{a} \biggr)^{as-1} \\ \leq& \sum_{k \in \mathbb {Z}^{d}} \biggl(\frac{eP_{k}}{a} \biggr) \biggl(\frac{eP}{a} \biggr)^{as-1} = \frac{1}{P} \biggl(\frac{eP}{a} \biggr)^{as} . \end{aligned}$$

 □

Remark 19

For a distribution with probability density function \(w(x)\), we can take \(P=\|w\|_{\infty}\), so that for \(a\) and \(s\) satisfying the conditions of Lemma 18 we have

$$\mathbb {P}\bigl(N_{0}(X)>as \bigr)\leq\frac{1}{\|w\|_{\infty}} \biggl( \frac{e\|w\|_{\infty}}{a} \biggr)^{as} . $$

Remark 20

If we take the special example of the uniform distribution on \(C_{R}\), then \(P\) can be taken as \(\lfloor R \rfloor ^{-d}\). So we have

$$\mathbb {P}\bigl(N_{0}(X)>as \bigr)\leq\lfloor R \rfloor ^{d} \biggl(\frac{e\lfloor R \rfloor^{-d}}{a} \biggr)^{as} \leq R^{d}\exp \bigl(-s \bigl(a \log \bigl(a\lfloor R \rfloor ^{d} \bigr)-a \bigr) \bigr) . $$

This provides an estimate different from [3, Lemma 8]. Note that we impose stricter conditions on \(a\) than in [3], but our estimate can be better than the one there, and our result applies to a large class of non-uniform probability distributions.
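A minimal Monte Carlo sketch (our own illustration) of the absolute covering index and the tail bound of Remark 20, for the uniform distribution on \(C_{R}\) with \(d=1\); the bound is typically far from tight, and for small \(as\) it can exceed 1 and is then trivial:

    import numpy as np

    rng = np.random.default_rng(0)
    R, s, a, trials = 8, 50, 0.4, 5000     # need a > e/R and a*s >= 1
    P = 1.0 / R                            # = floor(R)^(-d), R being an integer

    hits = 0
    for _ in range(trials):
        x = rng.uniform(-R / 2, R / 2, size=s)
        # N_0(X): maximal number of points in a unit cube k + [0,1)
        counts = np.bincount(np.floor(x).astype(int) + R // 2, minlength=R)
        hits += counts.max() > a * s
    print("empirical tail:", hits / trials)
    print("bound:", (1 / P) * (np.e * P / a)**(a * s))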

We can now prove the following statement.

Theorem 21

Let \((x_{j})_{j \in\mathbb{N}}\subset \mathbb {R}^{d}\) be a sequence of independent random variables with probability density function \(w_{R}\). Assume that \(R>R_{0}\), and furthermore that

$$\delta< \frac{\rho}{2(\rho+12D_{3})}, \qquad \nu< K_{w} \biggl(\rho \biggl( \frac{1}{2} - \delta \biggr) - 12 D_{3} \delta \biggr). $$

Then, for any \(s \in\mathbb{N}\) with \(s \geq\frac{R^{d}}{3 K_{w}}\),

$$A = K_{w} \rho \biggl( \frac{1}{2} - \delta \biggr) - \nu - 12 K_{w} D_{3} \delta $$

is strictly positive, and the sampling estimate

$$ sAR^{-d} \| f \|_{2}^{2} \le\sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \le s D_{2} \| f \|_{2}^{2}, \quad \forall f \in V_{R,\delta}^{2}(\varLambda,\varPhi) $$
(18)

holds with probability at least

$$ 1- D(\varLambda) R^{d^{2}/\alpha'} \beta^{d} \exp \biggl( -\frac{\nu^{2} s}{2 D_{2} R^{d} (K_{w}+\frac{\nu}{3} )} \biggr) - \frac{R^{d}}{K_{w}} \biggl( \frac{e}{3} \biggr)^{3sR^{-d}K_{w}}. $$
(19)

Proof

Define the random variable \(N_{0}\) as the absolute covering index of \(x_{1},\ldots,x_{s}\). Fix \({N = N(R)}\), and consider the events

$$V_{1} = \Biggl\{ \inf_{f \in\mathcal{P}_{N}, \| f \|_{2} = 1} \frac{1}{s} \sum_{j=1}^{s} \bigl( \bigl|f(x_{j})\bigr|^{2} - \| v_{R} f \|_{2}^{2} \bigr) \le- \nu R^{-d} \Biggr\} $$

and

$$V_{2} = \bigl\{ N_{0} \ge3 s R^{-d} K_{w} \bigr\} . $$

Following Lemma 17 and taking \(\gamma=\frac{1}{2}\), we see that for all \((x_{1},\ldots,x_{s})\) in the complement of \(V_{1} \cup V_{2}\) and all \(f \in V_{R,\delta}^{2}(\varLambda,\varPhi)\),

$$\frac{1}{s} \sum_{j=1}^{s} \bigl|f(x_{j})\bigr|^{2} \ge AR^{-d} \| f \|_{2}^{2}. $$

Theorem 15 combined with Lemma 13 implies that \(V_{1}\) occurs with probability at most

$$D(\varLambda) R^{d^{2}/\alpha'} \beta^{d} \exp \biggl( - \frac{\nu^{2} s}{2 D_{2} R^{d} (K_{w}+\frac{\nu}{3} )} \biggr). $$

Furthermore, Lemma 18 yields that \(V_{2}\) occurs with probability at most

$$\frac{R^{d}}{K_{w}} \biggl( \frac{e}{3} \biggr)^{3sR^{-d}K_{w}}. $$

Thus the lower estimate in (18) holds with at least the probability given in (19), whereas the upper estimate follows from Proposition 24 in the Appendix. □

Proof of Theorem 1

By the definitions of \(M\) and \(\beta\) in Table 1, we have \(D(\varLambda)R^{d^{2}/\alpha'}\beta^{d}\leq D(\varLambda )R^{d^{2}/\alpha'}M\) as well as \(\frac{R^{d}}{K_{w}}\leq D(\varLambda )R^{d^{2}/\alpha'}M\). By the assumptions in (7), we have \(( \frac{e}{3} )^{3sR^{-d}K_{w}} \leq\exp ( -\frac{\nu^{2} s}{2 D_{2} R^{d} (K_{w}+\frac{\nu}{3} )} )\). Hence, as soon as

$$s \ge R^{d} \frac{2 D_{2} (K_{w}+\frac{\nu}{3} )}{\nu^{2}} \log \biggl(\frac{2 D(\varLambda) M R^{d^{2}/\alpha'}}{\epsilon} \biggr) $$

holds, we have \(D(\varLambda)R^{d^{2}/\alpha'}M \exp ( -\frac {\nu^{2} s}{2 D_{2} R^{d} (K_{w}+\frac{\nu}{3} )} ) \le\epsilon/2\); together with the estimate for the second term above, the total failure probability in (19) is at most \(\epsilon\). The theorem is proved. □

Proof of Corollaries 4 and 5

Obviously \(\ell=Sf\), therefore \(S^{*}\ell=S^{*}Sf\). Now we estimate the spectrum of \(S^{*}S\). For all \(f \in V^{2}(\varLambda,\varPhi)\), we can see that

$$\bigl\langle f, S^{*}Sf \bigr\rangle = \langle Sf, Sf \rangle= \| \ell\|_{2}^{2} = \sum_{j=1}^{s}\bigl|f(x_{j})\bigr|^{2} . $$

By Theorem 1, we have that

$$s A R^{-d}\langle f, f \rangle\leq \bigl\langle f, S^{*}Sf \bigr\rangle = \sum_{j=1}^{s}\bigl|f(x_{j})\bigr|^{2} \leq sD_{2}\langle f, f \rangle $$

holds under the conditions in Theorem 1 with probability at least \(1-\epsilon\). This implies that \(\|S^{*}S\|\leq sD_{2}\), that \(S^{*}S\) is invertible, and that \(\|(S^{*}S)^{-1}\|\leq(sAR^{-d})^{-1}\) in this case. This proves Corollary 4.

In this case, the condition number satisfies

$$\begin{aligned} \kappa \bigl(S^{*}S \bigr) = & \bigl\| S^{*}S \bigr\| \bigl\| \bigl(S^{*}S \bigr)^{-1}\bigr\| \\ \leq& sD_{2} \bigl(AsR^{-d} \bigr)^{-1} \\ = & \frac{D_{2}R^{d}}{A} \\ = & \frac{D_{2}R^{d}}{K_{w}\rho(\frac{1}{2}-\delta)-\nu-12K_{w} D_{3} \delta}. \end{aligned}$$

Letting \(\delta\rightarrow0\) proves Corollary 5. □

Proof of Corollary 6

By the conclusions of Theorem 1, we have that \(AR^{-d}\|f\| _{2}^{2} \leq\frac{1}{s}\sum_{i=1}^{s}|f(x_{i})|^{2} \leq D_{2}\|f\| _{2}^{2}\) holds for all \(f \in V_{R,\delta}^{2}(\varLambda,\varPhi)\) with probability at least \(1-\epsilon\) under the conditions in Theorem 1. Therefore we have that

$$(1-D_{2})\|f\|_{2}^{2} \leq\|f\|_{2}^{2}-\frac{1}{s}\sum _{i=1}^{s}\bigl|f(x_{i})\bigr|^{2} \leq \bigl(1-AR^{-d} \bigr)\|f\|_{2}^{2} $$

in this case. The corollary is proved. □

7 Reconstruction Algorithm from Non-uniformly Random Sampling

Realistically, one can sample \(f\) only on a bounded set; furthermore, the signals under consideration decay at infinity, so samples taken far out contribute little to sampling and reconstruction. We can learn about \(f\) only if the samples are taken in the “essential support” of \(f\). Let \(\varLambda\) be a finite set, so that we can write \(\varPhi\) as \(\{\phi_{1},\phi_{2},\ldots,\phi_{i},\ldots,\phi_{n}\}\), where \(n=|\varLambda|\). Suppose the generators are bounded and the support of every \(\phi_{i}\) is contained in \(C_{R}\). Then for any small \(\delta>0\) we have \(V_{R,\delta}^{2}(\varLambda,\varPhi)=V^{2}(\varLambda,\varPhi)\), since all the energy of these functions lies in \(C_{R}\). In addition, we assume that \(\varPhi\) is a Riesz basis of \(V^{2}(\varLambda,\varPhi)\).

For a signal \(f=\sum_{i=1}^{n}c_{i}\phi_{i} \in V^{2}(\varLambda ,\varPhi)\), denote

$$F=\left ( \textstyle\begin{array}{ccccc}f(x_{1})&\cdots&f(x_{j})&\cdots&f(x_{s}) \end{array}\displaystyle \right ) $$

to be our sample value,

$$C=\left ( \textstyle\begin{array}{ccccc} c_{1}&\cdots&c_{i}&\cdots&c_{n} \end{array}\displaystyle \right ) $$

to be the coefficients, and the matrix

$$ U= \bigl(\phi_{i}(x_{j}) \bigr)_{ \begin{subarray}{c}i=1,\ldots,n\\j=1,\ldots,s \end{subarray} } . $$
(20)

Then we can easily see that \(F=CU\).

Since \(\varPhi\) is a Riesz basis, we have \(c_{i}=\langle f,\widetilde{\phi}_{i}\rangle\). By Proposition 22 in the Appendix, we have

$$\|C\|_{\ell^{2}(\varLambda)}^{2}\leq\|\widetilde{\varPhi} \|_{2,1,u_{0}}^{2} \|f\|_{2}^{2} . $$

On the other hand, using the same method as in Proposition 22 in the Appendix, we have

$$\begin{aligned} \|f\|_{2}^{2} = & \Biggl\Vert \sum _{i=1}^{n}c_{i}\phi_{i} \Biggr\Vert _{2}^{2} \\ = & \sum_{k\in \mathbb {Z}^{d}} \Biggl\Vert \sum _{i=1}^{n}c_{i}\phi_{i} \Biggr\Vert _{\mathrm{L}^{2}(k+[0,1]^{d})}^{2} \\ \leq& \sum_{k\in \mathbb {Z}^{d}} \Biggl(\sum _{i=1}^{n}|c_{i}|\|\phi _{i}\|_{\mathrm{L}^{2}(k+[0,1]^{d})} \Biggr)^{2} \\ \leq& \|\varPhi\|_{2,1,u_{0}}^{2} \|C\|_{\ell^{2}(\varLambda)}^{2}. \end{aligned}$$

By Theorem 1 (in the setting of Corollary 4), for \(\nu\) small enough and \(s\) large enough, we have that

$$A_{1}\|f\|_{2}^{2}\leq\sum _{j=1}^{s}\bigl|f(x_{j})\bigr|^{2} \leq B_{1}\|f\|_{2}^{2} $$

holds with probability at least \(1-\epsilon\), where \(A_{1} = \frac {s}{R^{d}} (\frac{K_{w} \rho}{2} - \nu )\) and \(B_{1} = s D_{2}\). Combining this with the arguments above, we have that

$$A_{2}\|C\|_{\ell^{2}(\varLambda)}^{2}\leq\|f\|_{2}^{2} \leq B_{2} \|C\|_{\ell^{2}(\varLambda)}^{2}~ , $$

where \(A_{2}=\|\widetilde{\varPhi}\|_{2,1,u_{0}}^{-2}\) and \(B_{2}=\| \varPhi\|_{2,1,u_{0}}^{2}\). Therefore

$$A_{1} A_{2}\|C\|_{\ell^{2}(\varLambda)}^{2}\leq\sum _{j=1}^{s}\bigl|f(x_{j})\bigr|^{2} \leq B_{1} B_{2} \|C\|_{\ell^{2}(\varLambda)}^{2} $$

holds with probability at least \(1-\epsilon\). Similarly to the arguments proving Corollary 4 and Corollary 5, we get that \(UU^{*}\) is invertible and that its condition number, denoted by \(\kappa(UU^{*})\), is not larger than \(\frac{B_{1} B_{2}}{A_{1} A_{2}}=\frac {2 R^{d} D_{2} \|\varPhi\|_{2,1,u_{0}}^{2} \|\widetilde{\varPhi}\| _{2,1,u_{0}}^{2}}{K_{w} \rho- 2\nu}\) with probability at least \(1-\epsilon\). That is,

$$\mathbb {P}\biggl(\kappa \bigl(UU^{*} \bigr)>\frac{2 R^{d} D_{2} \|\varPhi \|_{2,1,u_{0}}^{2} \|\widetilde{\varPhi}\|_{2,1,u_{0}}^{2}}{K_{w} \rho - 2\nu} \biggr)\leq\epsilon. $$

Then one can reconstruct the original function from the sample by

$$C= \bigl(FU^{*} \bigr) \bigl(UU^{*} \bigr)^{-1} $$

with probability at least \(1-\epsilon\).

The reconstruction algorithm from non-uniform random sampling is summarized in Table 2; a code sketch is given after the table.

Table 2 Reconstruction algorithm
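As a concrete illustration of the algorithm (our own sketch, using the quadratic B-spline generators of the numerical test in Sect. 7.1 below and hypothetical parameter choices), one may implement the reconstruction as follows:

    import numpy as np

    def phi0(x):
        """Quadratic B-spline chi*chi*chi centered at 0 (cf. Sect. 7.1)."""
        ax = np.abs(x)
        return np.where(ax <= 0.5, 0.75 - ax**2,
                        np.where(ax <= 1.5, 0.5 * (1.5 - ax)**2, 0.0))

    rng = np.random.default_rng(0)
    R, sigma, s = 8, 0.5, 60
    lam = np.arange(-R / 2 - 0.5, R / 2 + 1)   # Lambda = (1/2 + Z) in [-R/2-1, R/2+1]

    def phi(l, x):
        # generator phi_lambda = phi0(. - lambda) restricted to C_R
        return phi0(x - l) * (np.abs(x) <= R / 2)

    # ground-truth coefficients and signal f = sum_i c_i phi_i
    C_true = rng.standard_normal(lam.size)
    f = lambda x: sum(c * phi(l, x) for c, l in zip(C_true, lam))

    # draw points from w_R = N(0, (R*sigma)^2), discard those outside C_R
    x = rng.normal(0.0, R * sigma, size=4 * s)
    x = x[np.abs(x) <= R / 2][:s]

    # sample vector F, matrix U, and recovery C = (F U*)(U U*)^{-1} via pinv
    F = f(x)
    U = np.array([phi(l, x) for l in lam])     # U[i, j] = phi_i(x_j)
    C_hat = F @ np.linalg.pinv(U)
    print("reconstruction error:", np.linalg.norm(C_hat - C_true))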

7.1 Numerical Test

Recall the spline functions of Example 8; we use them as a numerical model for the generators of \(V_{R,\delta}^{2}(\varLambda,\varPhi)\). Let the spline knots be the integer points and \(m=3\), and restrict the splines to \(C_{R}\), where \(R\) is an even positive integer. More precisely, let \(d=1\) and define the spline function

$$\phi_{0} = \chi_{[-\frac{1}{2},\frac{1}{2}]}*\chi_{[-\frac {1}{2},\frac{1}{2}]}*\chi _{[-\frac{1}{2},\frac{1}{2}]}, $$

where “∗” denotes the convolution product. The generators are given by

$$\phi_{\lambda}(x) = \phi_{0}(x-\lambda)\chi _{C_{R}}(x), \qquad \lambda\in\frac{1}{2}+\mathbb {Z}. $$

Let \(\varLambda=\{\lambda:\phi_{\lambda}\not\equiv0\}\). We can easily check that \(\varLambda=(\frac{1}{2}+\mathbb {Z})\cap[-\frac{R}{2}-1,\frac {R}{2}+1]\) and \({\lambda^{*}=1/2}\).

We will test two kinds of probability distributions for the sampling. One is the normal distribution with mean 0 and standard deviation \(\sigma\) as in Example 11. In this case, the optimal choice of \(\sigma\) for the lower bound of the sampling inequality is \(1/2\) (taking \(\delta\rightarrow0\)), so that \(K_{w}=\sqrt{\frac{2}{\pi}}\) and \(\rho=\exp(-\frac{1}{2})\). Note that in practice we discard the sampling points falling outside \(C_{R}\). The other distribution is the uniform distribution on \(C_{R}\), as used in [3] and [8]. In this case, we have \(K_{w}=1\) and \(\rho=1\). Figure 4 shows examples of the different distributions of sampling points with \(R=8\).

Fig. 4 Different distributions of sampling points

We can now run a fully practical trial to measure the actual success rate. We use the algorithm in Table 2 with a numerical pseudo-inverse in Step 5, such as the “PseudoInverse” function in Wolfram Mathematica or the “pinv” function in MATLAB. We take the space \(\varPhi\) as above, while the sampling distribution is the normal distribution with a specific \(\sigma\) as defined in Example 11. More specifically, we take \(c(\lambda)\) periodic with period 8. The signal is also shown in Fig. 4.

In each trial, we run the algorithm and compute the Euclidean distance between the coefficient vector \(C\) computed numerically in Step 7 and the true \(C\) of the original function \(f\); this distance is regarded as the reconstruction error. A trial is regarded as successful when the reconstruction error is less than \(10^{-9}\).

First, we consider two settings: one uses the normal distribution above with \(\sigma=1/2\), taking \(R=8,16,24\) and \(s=20,30,\ldots,100\); the other uses the uniform distribution on \(C_{R}\) with the same parameters. We run 10000 trials for each case and record the actual success rate. Figure 5 plots the success rate against \(s\) in each case.

Fig. 5 The relation between success rate and sampling rate

From Fig. 5 we can see that: (i) the success rate increases as \(s\) increases; (ii) the success rate decreases as \(R\) increases with all other conditions unchanged; (iii) due to the non-uniformity of the distribution, more sampling points or a smaller \(R\) are needed to reach the same success rate as with the uniform distribution. The numerical results match Theorem 1 and Corollary 5.

Second, we fix \(R=8\), take \(s=40,60\) and \(\sigma=1/80,2/80,\ldots,120/80\), and run 10000 trials for each case with the same normal distribution; we then measure the actual success rate in each case. Figure 6 shows the success rate as a function of \(\sigma\).

Fig. 6 The relation between the practical success rate and \(\sigma\)

From Fig. 6 we can see that, for a given \(s\), the success rate first increases and then decreases as \(\sigma\) increases. From the data for \(s=60\), the choice that optimizes the actual success rate is \(\sigma=40/80=1/2\). At the same time, the success rate can be improved by increasing the number of sampling points for the same \(\sigma\). The numerical results match Theorem 1 and Corollary 5.

Next, we explore the “threshold” of \(s\), i.e., the smallest \(s\) for which the actual success rate reaches \(95\%\), for different values of \(R\). We fix \(\sigma=1/2\), take \(R=8,16,32,64\), try different sampling rates around a pre-estimated threshold with 10000 trials for each value of \(s\), and locate the threshold. Table 3 lists the thresholds for these values of \(R\).

Table 3 Thresholds of \(s\) with different \(R\)’s

From Table 3 we can see that each threshold is slightly more than twice the preceding one. Since doubling \(R\) under an \(O(R^{d} \log R^{d})\) law with \(d=1\) predicts a ratio of \(2\log(2R)/\log R>2\), the \(O(R^{d} \log R^{d})\) sampling rate in Theorem 1 matches the numerical tests.

8 Conclusions

In this paper, we considered the non-uniform random sampling and reconstruction problem for signal spaces with finite rate of innovation. Under mild assumptions on the generators and the probability density function, we showed that for \(R\) sufficiently large, taking \(O(R^{d} \log(R^{d}))\) random samples yields a sampling set for \(V^{2}_{R,\delta}(\varLambda,\varPhi)\) with high probability. As an application of our main results, we provided a reconstruction algorithm from non-uniform random sampling, which succeeds with high probability, and implemented it for compactly supported generators.