1 Introduction

A standard approach to solving a continuous problem numerically – the Galerkin method – suggests looking for an approximate solution from a given finite-dimensional subspace. A typical way to measure the approximation error is an appropriate \(L_p\) norm, \(1\le p\le \infty \). Thus, the problem of discretization of the \(L_p\) norms of functions from a given finite-dimensional subspace arises in a very natural way. Approximation by elements of a linear subspace falls in the category of linear approximation.

It was understood in numerical analysis and approximation theory that in many problems from signal/image processing it is beneficial to use an m-term approximant with respect to a given system of elements (dictionary) \({{\mathcal {D}}}_N:=\{g_i\}_{i=1}^N\). This means that for \(f\in X\) we look for an approximant of the form

$$\begin{aligned} a_m(f):=\sum _{k\in \Lambda (f)}c_kg_k \end{aligned}$$
(1.1)

where \(\Lambda (f) \subset [1,N]\) is a set of m indices which is determined by f. The complexity of this approximant is characterized by the cardinality \(|\Lambda (f)|=m\) of \(\Lambda (f)\). Approximation of this type is referred to as nonlinear approximation because, for a fixed m, the approximant \(a_m(f)\) comes from different linear subspaces spanned by \(g_k\), \(k\in \Lambda (f)\), which depend on f. The cardinality \(|\Lambda (f)|\) is a fundamental characteristic of \(a_m(f)\) called sparsity of \(a_m(f)\) with respect to \({{\mathcal {D}}}_N\). It is now well understood that we need to study nonlinear sparse approximation in order to significantly increase our ability to process (compress, denoise, etc.) large data sets. Sparse approximations of a function are not only a powerful analytic tool but they are utilized in many applications in image/signal processing and numerical computation.

This brings us to an important ingredient of the discretization problem that is desirable in practical applications. Suppose we have a finite dictionary \({{\mathcal {D}}}_N:=\{g_j\}_{j=1}^N\) of functions from \(L_p(\Omega ,\mu )\). Applying our strategy of sparse m-term approximation with respect to \({{\mathcal {D}}}_N\), we obtain the collection of all subspaces spanned by at most m elements of \({{\mathcal {D}}}_N\) as a possible source of approximating (representing) elements. Thus, we would like to build a discretization scheme that works well for all such subspaces. This kind of discretization falls in the category of universal discretization. The present paper is devoted to the problem of universal sampling discretization.

Let \(\Omega \) be a nonempty set equipped with a probability measure \(\mu \). For \(1\le p\le \infty \), let \(L_p(\Omega ):=L_p(\Omega ,\mu )\) denote the real Lebesgue space \(L_p\) defined with respect to the measure \(\mu \) on \(\Omega \), and let \(\Vert \cdot \Vert _p\) be the norm of \(L_p(\Omega )\). By discretization of the \(L_p\) norm we understand a replacement of the measure \(\mu \) by a discrete measure \(\mu _m\) supported on a set \(\xi =\{\xi ^j\}_{j=1}^m \subset \Omega \). This means that integration with respect to the measure \(\mu \) is replaced by an appropriate cubature formula. Thus, integration is replaced by evaluation of a function f at a finite set \(\xi \) of points, which is why this way of discretization is called sampling discretization. The problem of sampling discretization is a classical one. The first results in this direction were obtained in the 1930s by Bernstein, by Marcinkiewicz, and by Marcinkiewicz and Zygmund for discretization of the \(L_p\) norms of univariate trigonometric polynomials. Even though this problem is very important in applications, its systematic study has begun only recently (see the survey paper [5]). We now give explicit formulations of the sampling discretization problem (also known as the Marcinkiewicz discretization problem) and of the problem of universal discretization.

The sampling discretization problem. Let \((\Omega ,\mu )\) be a probability space and let \(X_N\subset L_p\) be an N-dimensional subspace of \(L_p(\Omega ,\mu )\) with \(1\le p \le \infty \) (here the index N usually stands for the dimension of \(X_N\)). We shall always assume that every function in \(X_N\) is defined everywhere on \(\Omega \), and

$$\begin{aligned} f\in X_N, \ \Vert f\Vert _p =0\implies f=0\in X_N. \end{aligned}$$

We say that \(X_N\) admits the Marcinkiewicz-type discretization with parameters \(m\in {\mathbb {N}}\) and p and positive constants \(C_1\le C_2\) if there exists a set \(\xi := \{\xi ^j\}_{j=1}^m \subset \Omega \) such that for any \(f\in X_N\) we have in the case of \(1\le p <\infty \),

$$\begin{aligned} C_1\Vert f\Vert _p^p \le \frac{1}{m} \sum _{j=1}^m |f(\xi ^j)|^p\le C_2\Vert f\Vert _p^p, \end{aligned}$$
(1.2)

and in the case of \(p=\infty \),

$$\begin{aligned} C_1\Vert f\Vert _\infty \le \max _{1\le j\le m} |f(\xi ^j)| \le \Vert f\Vert _\infty . \end{aligned}$$
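For univariate trigonometric polynomials and \(p=2\) — the classical Marcinkiewicz–Zygmund situation — \(m=2n+1\) equispaced points discretize the \(L_2\) norm of polynomials of degree n exactly, i.e., (1.2) holds with \(C_1=C_2=1\), because equispaced quadrature is exact for trigonometric polynomials of degree less than m. A minimal numerical sketch of this fact (the degree and random coefficients below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                  # polynomial degree
m = 2 * n + 1                          # number of equispaced sample points
a = rng.standard_normal(n + 1)         # cosine coefficients a_0, ..., a_n
b = rng.standard_normal(n)             # sine coefficients b_1, ..., b_n

def f(x):
    k = np.arange(1, n + 1)
    return a[0] + np.cos(np.outer(x, k)) @ a[1:] + np.sin(np.outer(x, k)) @ b

xi = 2 * np.pi * np.arange(m) / m      # equispaced nodes on [0, 2*pi)
lhs = np.mean(f(xi) ** 2)              # (1/m) * sum_j |f(xi_j)|^2
# ||f||_2^2 with respect to the normalized measure dx/(2*pi)
norm_sq = a[0] ** 2 + 0.5 * (np.sum(a[1:] ** 2) + np.sum(b ** 2))
assert abs(lhs - norm_sq) < 1e-10 * norm_sq
```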

The problem of universal discretization. Let \({\mathcal {X}}:= \{X(n)\}_{n=1}^k\) be a collection of finite-dimensional linear subspaces X(n) of the space \(L_p(\Omega )\) for a given \(1\le p \le \infty \). We say that a set \(\xi := \{\xi ^j\}_{j=1}^m \subset \Omega \) provides universal discretization for the collection \({\mathcal {X}}\) if there are two positive constants \(C_i\), \(i=1,2\), such that for each \(n\in \{1,\dots ,k\}\) and any \(f\in X(n)\) we have

$$\begin{aligned} C_1\Vert f\Vert _p^p \le \frac{1}{m} \sum _{j=1}^m |f(\xi ^j)|^p \le C_2\Vert f\Vert _p^p \end{aligned}$$

in the case of \(1\le p<\infty \), and

$$\begin{aligned} C_1\Vert f\Vert _\infty \le \max _{1\le j\le m} |f(\xi ^j)| \le \Vert f\Vert _\infty \end{aligned}$$
(1.3)

in the case of \(p=\infty \).

Note that the problem of universal discretization for the collection \({\mathcal {X}}:= \{X(n)\}_{n=1}^k\) is the sampling discretization problem for the set \(\cup _{n=1}^k X(n)\). Also, we point out that the concept of universality is well known in approximation theory. For instance, the reader can find a discussion of universal cubature formulas in [28], Section 6.8.

The problem of universal discretization for some special subspaces of the trigonometric polynomials was studied in [5, 29]. To describe the results of [5, 29] we need to introduce some notation. First, for a given finite subset Q of \({\mathbb {Z}}^d\) we set

$$\begin{aligned} {\mathcal {T}}(Q):= \Bigl \{f: f=\sum _{{\textbf{k}}\in Q}c_{\textbf{k}}e^{i({\textbf{k}},{\textbf{x}})},\ \ c_{\textbf{k}}\in \mathbb {C},\ \ {\textbf{k}}\in Q\Bigr \}. \end{aligned}$$

For \({\textbf{s}}=(s_1, \cdots , s_d)\in {\mathbb {Z}}^d_+\) we define

$$\begin{aligned} R({\textbf{s}}):= \{{\textbf{k}}\in {\mathbb {Z}}^d: |k_j| < 2^{s_j}, \quad j=1,\dots ,d\}. \end{aligned}$$

The following result, proved in [29], solves the universal discretization problem for the collection

$$\begin{aligned} {{\mathcal {C}}}(n,d):= \left\{ {\mathcal {T}}(R({\textbf{s}})): s_1+\cdots +s_d=n\right\} \end{aligned}$$

of subspaces of trigonometric polynomials.

Theorem 1.1

[29] For every \(1\le p\le \infty \) there exists a large enough constant \(C(d,p)\), which depends only on d and p, such that for any \(n\in {\mathbb {N}}\) there is a set \(\xi :=\{\xi ^\nu \}_{\nu =1}^m\subset {\mathbb {T}}^d\), with \(m\le C(d,p)2^n\), that provides universal discretization in \(L_p\) for the collection \({{\mathcal {C}}}(n,d)\).

Second, for \(n\in {\mathbb {N}}\) let

$$\begin{aligned} \Pi _n:=[-2^{n-1}+1, 2^{n-1} -1]^d\cap {\mathbb {Z}}^d. \end{aligned}$$

For a positive integer \(v\le |\Pi _n|\) define

$$\begin{aligned} {\mathcal {S}}(v,n):= \left\{ Q: \ \ Q\subset \Pi _n, \ |Q|=v\right\} . \end{aligned}$$

Then it is easily seen that

$$\begin{aligned} |{\mathcal {S}}(v,n)| =\left( {\begin{array}{c}|\Pi _n|\\ v\end{array}}\right) <2^{dnv}. \end{aligned}$$
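The inequality follows from \(\binom{M}{v}\le M^v\) with \(M=|\Pi _n|=(2^n-1)^d<2^{dn}\). A quick numerical spot check (the values of d and n are arbitrary illustrative choices):

```python
import math

d, n = 2, 3
M = (2 ** n - 1) ** d                  # |Pi_n| = (2^n - 1)^d
for v in range(1, M + 1):
    assert math.comb(M, v) < 2 ** (d * n * v)
```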

The following theorem provides universal discretization of \(L_1\) and \(L_2\) norms for the collection \(\{{\mathcal {T}}(Q):\ \ Q\in {\mathcal {S}}(v,n)\}\).

Theorem 1.2

[5, 27, Theorem 7.4] For positive integers n and \(1\le v\le |\Pi _n|\) let

$$\begin{aligned} M_p(n,v):={\left\{ \begin{array}{ll} v^2 n^{9/2}, \ \,&{}\text {if }p=1,\\ v^2n,\ \,&{}\text {if }p=2. \end{array}\right. } \end{aligned}$$

Then there exist three positive constants \(C_i(d)\), \(i=1,2,3\), such that for any \(n,v\in {\mathbb {N}}\) with \(v\le |\Pi _n|\), and for \(p=1\) and \(p=2\) there is a set \(\xi =\{\xi ^\nu \}_{\nu =1}^m \subset {\mathbb {T}}^d\), with \(m\le C_1(d) M_p(n,v)\) such that for any \(f\in \cup _{Q\in {\mathcal {S}}(v,n)} {\mathcal {T}}(Q)\)

$$\begin{aligned} C_2(d)\Vert f\Vert _p^p \le \frac{1}{m} \sum _{\nu =1}^m |f(\xi ^\nu )|^p \le C_3(d)\Vert f\Vert _p^p. \end{aligned}$$

Let us denote by \({{\mathcal {D}}}_N=\{g_i\}_{i=1}^N\) a system of functions from \(L_p\). Denote the set of all v-term approximants with respect to \({{\mathcal {D}}}_N\) as

$$\begin{aligned} \Sigma _v({{\mathcal {D}}}_N):= \left\{ f:\, f = \sum _{i\in G} c_ig_i,\quad \text {with}\ G\subset [1,N] \ \ \text {such that}\ |G|=v\right\} . \end{aligned}$$

Theorem 1.2 provides universal discretization for the collection \(\{{\mathcal {T}}(Q):\, Q\in {\mathcal {S}}(v,n)\}\), which is equivalent to the sampling discretization of the \(L_p\) norm of elements from the set \(\Sigma _v({{\mathcal {D}}}_N)\) with \(N=|\Pi _n|\), \({{\mathcal {D}}}_N =\{e^{i({\textbf{k}},{\textbf{x}})}\}_{{\textbf{k}}\in \Pi _n}\). The proof of Theorem 1.2 in the case \(p=2\) is based on deep results on random matrices and in the case \(p=1\) is based on the chaining technique. We point out that in both cases \(p=2\) and \(p=1\) Theorem 1.2 provides universal discretization with the number of points growing as \(v^2\).

On the other hand, while Theorem 1.1 provides universal discretization only for the subcollection \({{\mathcal {C}}}(n,d)\) of \(\{{\mathcal {T}}(Q):\ \ Q\in {\mathcal {S}}(v,n+1)\}\) (rather than for the whole collection \(\{{\mathcal {T}}(Q):\ \ Q\in {\mathcal {S}}(v,n+1)\}\)) with \(v=2^n\) and

$$\begin{aligned} {{\mathcal {D}}}:=\Bigl \{ e^{i({\textbf{k}},{\textbf{x}})}:\ \ {\textbf{k}}\in {\mathbb {Z}}^d,\ \ |k_j|<2^n,\ \ 1\le j\le d\Bigr \}, \end{aligned}$$

it gives a better estimate, \(m\le C v\), on the number of points, which is linear in v, and it applies to the full range \(1\le p\le \infty \).

In this paper we prove the following estimate (see below for definitions and notations).

Theorem 1.3

Let \(1\le p\le 2\). Assume that \(\Phi _N\) is a uniformly bounded Riesz basis of \(X_N:={\text {span}}(\Phi _N)\) satisfying (2.8) for some constants \(0<R_1\le R_2\). Then for a large enough constant \(C=C(p,R_1,R_2)\) and any integer \(1\le v\le N\) there exist m points \(\xi ^1,\cdots , \xi ^m \in \Omega \) with

$$\begin{aligned} m \le Cv(\log N)^2(\log (2v))^2 \end{aligned}$$

such that for any \(f\in \Sigma _v(\Phi _N)\) we have

$$\begin{aligned} \frac{1}{2}\Vert f\Vert _p^p \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)|^p \le \frac{3}{2}\Vert f\Vert _p^p. \end{aligned}$$
(1.4)

In particular, Theorem 1.3 gives the bound

$$\begin{aligned} m\ll v(\log N)^2(\log (2v))^2, \end{aligned}$$

which is linear in v up to logarithmic factors in N and v. This is much better than the previously known bounds (see Theorem 1.2), which are quadratic in v. Note that even for each individual subspace from \(\{{\mathcal {T}}(Q):\ \ Q\in {\mathcal {S}}(v,n)\}\) we have the lower bound \(m\ge v\) for sampling discretization.
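To get a feel for the improvement, one can tabulate, ignoring all constants, the quadratic bound \(v^2n\) of Theorem 1.2 for \(p=2\) against the near-linear bound \(v(\log _2 N)^2(\log _2 (2v))^2\) of Theorem 1.3 with \(N=|\Pi _n|\); the choices of d, n, and v below are illustrative:

```python
import math

d, n = 2, 10
N = (2 ** n - 1) ** d                  # N = |Pi_n|
for v in [2 ** 4, 2 ** 8, 2 ** 12, 2 ** 16]:
    quad = v ** 2 * n                  # Theorem 1.2 bound (p=2), up to constants
    lin = v * math.log2(N) ** 2 * math.log2(2 * v) ** 2  # Theorem 1.3 bound
    print(f"v=2^{int(math.log2(v)):2d}  quadratic={quad:.3e}  near-linear={lin:.3e}")

# For large v the near-linear bound is smaller by orders of magnitude.
assert lin < quad
```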

Finally, we point out that very recent progress related to universal discretization has been made in our follow-up papers [8, 9]. More precisely, in [8] we prove that, in the setting of Theorem 1.3, independent random points \(\xi _1,\cdots , \xi _m\in \Omega \), identically distributed according to a given probability measure \(\mu \), provide the universal discretization (1.4) with high probability under a condition on the number of points slightly weaker than that of Theorem 1.3, namely

$$\begin{aligned} m\ll v(\log N)(\log (2v))^2 (\log (2v) +\log \log N). \end{aligned}$$

Also, in [8] we relaxed the condition on the Riesz basis \(\Phi _N\). In [9] we show how universal discretization can be applied to deduce interesting results on sparse sampling recovery. In particular, we demonstrate that a simple greedy-type algorithm based on good points for universal discretization provides good recovery in the square norm.

The rest of this paper is organized as follows. Sections 2 and 3 are devoted to estimating the entropy numbers \(\varepsilon _k(\Sigma _v^p(\Phi _N),L_\infty )\) of the sets

$$\begin{aligned} \Sigma _v^p(\Phi _N):= \{f\in \Sigma _v(\Phi _N):\, \Vert f\Vert _p\le 1\},\ \ 1\le p\le 2, \end{aligned}$$

in the \(L_\infty \)-norm, where \(\Phi _N\) is a uniformly bounded Riesz basis of \(X_N:=[\Phi _N]\subset L_2\) and \(1\le v\le N\) is an integer. Such estimates play an important role in the proof of Theorem 1.3. To be more precise, in Sect. 2 we prove, under the additional condition (2.9) on the space \(X_N={\text {span}}(\Phi _N)\), that for \(p=2\),

$$\begin{aligned} \varepsilon _k(\Sigma _v^2(\Phi _N),L_\infty ) \le C (\log N) \Bigl (\frac{v}{k}\Bigr )^{1/2},\quad k=1,2,\dots . \end{aligned}$$
(1.5)

The proof of (1.5) uses a known result from greedy approximation in smooth Banach spaces and its connection with entropy numbers. In Sect. 3, we show how the estimate (1.5) can be extended to the case \(1\le p<2\) under condition (2.9). This extension step is based on a general inequality for the entropy, which is given in Lemma 3.1 and appears to be of independent interest. In Sect. 4 we prove Theorem 1.3, using the estimates on entropy numbers established in the previous two sections and a conditional theorem on sampling discretization. A main step in the proof is to show that the condition (2.9), which is assumed in our estimates of entropy numbers, can be dropped in sampling discretization. The conditional Theorem 2.2 used in the proof of Theorem 1.3 is stated in Sect. 2 without proof. In Sect. 5, we prove a refined conditional theorem for sampling discretization of all integral norms \(L_p\) of functions from a subset \(\mathcal {W}\subset L_\infty \) satisfying certain conditions, which allows us to estimate the number of points required for sampling discretization in terms of an integral of the \(\varepsilon \)-entropy \(\mathcal {H}_\varepsilon (\mathcal {W}, L_\infty )\), \(\varepsilon >0\). This is an extension of the conditional result proved in [7, 27] for the unit ball of the space \(X_N\subset L_p\). In particular, it also allows us to prove a refined version of Theorem 1.3, where the constants \(\frac{1}{2}\) and \(\frac{3}{2}\) in (1.4) are replaced by \(1-\varepsilon \) and \(1+\varepsilon \), respectively, for an arbitrary \(\varepsilon \in (0, 1)\). Finally, in Sect. 6 we give a few remarks on universal sampling discretization of \(L_p\) norms for \(p>2\).

Throughout this paper the letter C denotes a general positive constant depending only on the parameters indicated as arguments or subscripts. We use the notation |A| to denote the cardinality of a finite set A.

2 Some General Entropy Bounds and the Case \(p=2\)

It is well known that bounds of the entropy numbers of the unit ball of an N-dimensional subspace \(X_N\subset L_p\)

$$\begin{aligned} X_N^p:= \{f\in X_N:\, \Vert f\Vert _p \le 1\} \end{aligned}$$

play an important role in sampling discretization of the \(L_p\) norm of elements of \(X_N\) (see [6, 26, 27], and [7]).

Recall the definition of entropy numbers in Banach spaces. Let X be a Banach space and \(B_X(g,r)\) denote the closed ball \(\{f\in X:\Vert f-g\Vert \le r\}\) with center \(g\in X\) and radius \(r>0\). Given a positive number \(\varepsilon \), the covering number \(N_\varepsilon (A, X)\) of a compact set \(A\subset X\) is defined as

$$\begin{aligned} N_\varepsilon (A,X):=\min \left\{ n\in {\mathbb {N}}:\ \ \exists \; g^1,\ldots , g^n\in A, \ A\subset \bigcup _{j=1}^n B_X(g^j, \varepsilon )\right\} . \end{aligned}$$

We denote by \({\mathcal {N}}_\varepsilon (A,X)\) the corresponding minimal \(\varepsilon \)-net of the set A in X; namely, \({\mathcal {N}}_\varepsilon (A,X)\) is a finite subset of A such that \(A\subset \bigcup _{y\in {\mathcal {N}}_\varepsilon (A,X)}B_X(y,\varepsilon )\) and \(N_\varepsilon (A,X)= |{\mathcal {N}}_\varepsilon (A,X)|\). The \(\varepsilon \)-entropy \(\mathcal {H}_\varepsilon (A, X)\) of the compact set A in X is defined as \(\log _2 N_\varepsilon (A,X)\), and the entropy numbers \(\varepsilon _k(A,X)\) of the set A in X are defined as

$$\begin{aligned} \varepsilon _k(A,X) :&=\inf \{\varepsilon >0: {{\mathcal {H}}}_\varepsilon (A, X)\le k\},\ \ k=1,2,\ldots . \end{aligned}$$
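As a toy illustration of these definitions, take \(A=[0,1]\) inside \(X={\mathbb {R}}\) with the absolute value norm: then \(N_\varepsilon (A,X)=\lceil 1/(2\varepsilon )\rceil \) (centers \((2j-1)\varepsilon \)), so \({{\mathcal {H}}}_\varepsilon (A,X)=\log _2\lceil 1/(2\varepsilon )\rceil \) and \(\varepsilon _k(A,X)=2^{-(k+1)}\). A short sketch verifying the covering and the resulting entropy numbers numerically:

```python
import math
import numpy as np

def covering_number(eps):
    # Minimal number of closed eps-balls (intervals) with centers in [0,1]
    # needed to cover A = [0,1]: centers (2j-1)*eps, j = 1..n.
    return math.ceil(1 / (2 * eps))

eps = 0.03
n = covering_number(eps)
centers = np.minimum((2 * np.arange(1, n + 1) - 1) * eps, 1.0)  # keep centers in A
grid = np.linspace(0.0, 1.0, 10001)
dist = np.abs(grid[:, None] - centers[None, :]).min(axis=1)
assert dist.max() <= eps + 1e-12       # the balls B(g_j, eps) indeed cover A

# Entropy numbers: eps_k(A) = inf{eps : log2 N_eps <= k} = 2^{-(k+1)}.
for k in range(1, 8):
    assert math.log2(covering_number(2.0 ** (-(k + 1)))) <= k
```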

The following conditional result was proved in [27] for \(p=1\) and in [6] for the full range of \(1\le p<\infty \).

Theorem 2.1

[27, 6, Theorem 1.3] Let \(1\le p<\infty \). Suppose that a subspace \(X_N\subset L_p(\Omega ,\mu )\) satisfies the condition

$$\begin{aligned} \varepsilon _k(X^p_N,L_\infty ) \le B (N/k)^{1/p}, \quad 1\le k\le N, \end{aligned}$$
(2.1)

where \(B\ge 1\). Then for a large enough constant C(p) there exist m points \(\xi ^1,\cdots , \xi ^m \in \Omega \) with

$$\begin{aligned} m \le C(p)NB^{p}(\log _2(2BN))^2 \end{aligned}$$

such that for any \(f\in X_N\) we have

$$\begin{aligned} \frac{1}{2}\Vert f\Vert _p^p \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)|^p \le \frac{3}{2}\Vert f\Vert _p^p. \end{aligned}$$

As explained above, the problem of universal discretization for the collection \(\{X(n)\}_{n=1}^k\) is equivalent to the sampling discretization of the union \(\cup _{n=1}^k X(n)\) of the corresponding subspaces. Therefore, instead of bounds on the entropy numbers of the unit ball \(X_N^p\) we are interested in entropy bounds for the "unit ball"

$$\begin{aligned} \Sigma _v^p({{\mathcal {D}}}_N):= \{f\in \Sigma _v({{\mathcal {D}}}_N):\, \Vert f\Vert _p\le 1\}, \end{aligned}$$

which is the union of the corresponding unit balls.

The following version of Theorem 2.1 follows directly from its proof.

Theorem 2.2

Let \(1\le p<\infty \) and \(1\le v\le N\). Suppose that a dictionary \({{\mathcal {D}}}_N\) is such that the set \(\Sigma _v^p({{\mathcal {D}}}_N)\) satisfies the condition

$$\begin{aligned} \varepsilon _k(\Sigma _v^p({{\mathcal {D}}}_N),L_\infty ) \le B_1 (v/k)^{1/p}, \quad 1\le k <\infty , \end{aligned}$$
(2.2)

where \(B_1\ge 1\). Assume in addition that there exists a constant \(B_2\ge 1\) such that

$$\begin{aligned} \Vert f\Vert _\infty \le B_2 v^{1/p} \Vert f\Vert _p,\ \ \forall f\in \Sigma _v^p({{\mathcal {D}}}_N). \end{aligned}$$
(2.3)

Then for a large enough constant C(p) there exist m points \(\xi ^1,\cdots , \xi ^m\in \Omega \) with

$$\begin{aligned} m \le C(p)B_1^{p} v(\log (2B_2v))^2 \end{aligned}$$

such that for any \(f\in \Sigma _v({{\mathcal {D}}}_N)\) we have

$$\begin{aligned} \frac{3}{4}\Vert f\Vert _p^p \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)|^p \le \frac{5}{4}\Vert f\Vert _p^p. \end{aligned}$$

Theorem 2.2 also follows from a more general conditional theorem that will be proved in Sect. 5 (see Corollary 5.1).

Remark 2.1

We point out that (2.2) implies

$$\begin{aligned} \Vert f\Vert _\infty \le 3 B_1 v^{1/p}\Vert f\Vert _p,\ \ \forall f\in \Sigma ^p_v({{\mathcal {D}}}_N). \end{aligned}$$
(2.4)

Therefore, assumption (2.3) can be dropped with \(B_2\) replaced by \(3B_1\) in the bound on m. However, in applications the constant \(B_2\) in (2.3) may be significantly smaller than \(3B_1\). For example, if \({{\mathcal {D}}}_N\) is a uniformly bounded orthonormal system with \(\max _{f\in {{\mathcal {D}}}_N}\Vert f\Vert _\infty =1\), then we can take \(B_2=1\).
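For instance, for the trigonometric system and \(p=2\), condition (2.3) with \(B_2=1\) reads \(\Vert f\Vert _\infty \le v^{1/2}\Vert f\Vert _2\) for v-sparse f, which follows from \(\Vert f\Vert _\infty \le \sum |c_{{\textbf{k}}}|\le v^{1/2}(\sum |c_{{\textbf{k}}}|^2)^{1/2}\). A randomized sanity check on a grid in the univariate case (the sparsity, frequency range, and grid size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
v, K = 6, 64                                    # sparsity and frequency range
for _ in range(200):
    freqs = rng.choice(K, size=v, replace=False)
    c = rng.standard_normal(v) + 1j * rng.standard_normal(v)
    f = np.exp(1j * np.outer(x, freqs)) @ c     # a v-sparse trigonometric polynomial
    l2 = np.sqrt(np.sum(np.abs(c) ** 2))        # exact ||f||_2 for dx/(2*pi)
    assert np.max(np.abs(f)) <= np.sqrt(v) * l2 + 1e-9
```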

Proof of (2.4)

For \(\Lambda \subset [1,N]\cap {\mathbb {N}}\) denote \(X(\Lambda ):= {\text {span}}(g_i)_{i\in \Lambda }\) and \(X(\Lambda )^p:= \{f\in X(\Lambda ):\, \Vert f\Vert _p\le 1\}\). Clearly, (2.2) implies the same bound for each \(X(\Lambda )^p\) with \(|\Lambda |=v\). Thus, it is sufficient to prove (2.4) for a v-dimensional subspace \(X_v\). With a slightly worse constant \(4B_1\) instead of \(3B_1\), this was proved in [7, Remark 1.1]. We now show how to get the better constant. Setting \(\varepsilon _1:= \varepsilon _1(X_v^p,L_\infty )\), we can find two functions \(f_1, f_2 \in X_v^p\) such that \(X_v^p \subset B_{L_\infty } (f_1, \varepsilon _1) \cup B_{L_\infty } (f_2, \varepsilon _1)\). Since \(0\in X_v^p\), 0 is contained in one of the two balls. Without loss of generality we may assume that \(0\in B_{L_\infty } (f_1, \varepsilon _1)\), so that \(\Vert f_1\Vert _\infty \le \varepsilon _1\). Since \(-f_2\in X_v^p\), we have either \(-f_2 \in B_{L_\infty } (f_1, \varepsilon _1)\) or \(-f_2 \in B_{L_\infty } (f_2, \varepsilon _1)\), which implies \(\Vert f_2\Vert _\infty \le 2\varepsilon _1\). It then follows that \(\Vert f\Vert _\infty \le 3 \varepsilon _1\) for all \(f\in X_v^p\). This together with (2.2) proves (2.4). \(\square \)

Theorem 2.2 motivates us to estimate the characteristics \(\varepsilon _k(\Sigma _v^p({{\mathcal {D}}}_N),L_\infty )\). We now recall some known general results, which turn out to be useful for that purpose. Let \({{\mathcal {D}}}_N=\{g_j\}_{j=1}^N\) be a system of elements of cardinality \(|{{\mathcal {D}}}_N|=N\) in a Banach space X. Consider the best m-term approximations of f with respect to \({{\mathcal {D}}}_N\)

$$\begin{aligned} \sigma _m(f,{{\mathcal {D}}}_N)_X:= \inf _{\{c_j\};\Lambda :|\Lambda |=m}\Vert f-\sum _{j\in \Lambda }c_jg_j\Vert . \end{aligned}$$

For a set \(W\subset X\) we define

$$\begin{aligned} \sigma _m(W,{{\mathcal {D}}}_N)_X:=\sup _{f\in W}\sigma _m(f,{{\mathcal {D}}}_N)_X,\ \ m=1,2,\cdots , \end{aligned}$$

and \( \sigma _0(W,{{\mathcal {D}}}_N)_X=\sup _{f\in W}\Vert f\Vert _X\). The following Theorem 2.3 was proved in [24] (see also [28], p.331, Theorem 7.4.3).

Theorem 2.3

Let a compact \(W\subset X\) be such that there exist a system \({{\mathcal {D}}}_N\subset X\) with \(|{{\mathcal {D}}}_N|=N\), and a number \(r>0\) such that

$$\begin{aligned} \sigma _m(W,{{\mathcal {D}}}_N)_X \le (m+1)^{-r},\quad m=0,1,\cdots , N. \end{aligned}$$

Then for \(k\le N\)

$$\begin{aligned} \varepsilon _k(W,X) \le C(r) \left( \frac{\log (2N/k)}{k}\right) ^r. \end{aligned}$$
(2.5)

For a given set \({{\mathcal {D}}}_N=\{g_j\}_{j=1}^N\) of elements we introduce the octahedron (generalized octahedron)

$$\begin{aligned} A_1({{\mathcal {D}}}_N) := \left\{ f\,:\, f=\sum _{j=1}^N c_jg_j,\quad \sum _{j=1}^N |c_j|\le 1\right\} \end{aligned}$$
(2.6)

and the norm \(\Vert \cdot \Vert _A\) on \(X_N\)

$$\begin{aligned} \Vert f\Vert _A:= \inf \left\{ \sum _{j=1}^N |c_j|:\, f=\sum _{j=1}^N c_jg_j \right\} . \end{aligned}$$

We now use a known general result for a smooth Banach space. For a Banach space X we define the modulus of smoothness

$$\begin{aligned} \rho (u):= \rho (X,u):= \sup _{\Vert x\Vert =\Vert y\Vert =1}\left( \frac{1}{2}(\Vert x+uy\Vert +\Vert x-uy\Vert )-1\right) . \end{aligned}$$

A Banach space X is called uniformly smooth if

$$\begin{aligned} \lim _{u\rightarrow 0}\rho (u)/u =0. \end{aligned}$$

In this paper we only consider uniformly smooth Banach spaces with power-type moduli of smoothness \(\rho (u) \le \gamma u^s\), \(1< s\le 2\). The following bound is a corollary of results on greedy approximation (see, for instance, [28], p. 455).

Theorem 2.4

Let X be s-smooth: \(\rho (X,u) \le \gamma u^s\), \(1<s\le 2\). Then for any normalized system \({{\mathcal {D}}}_N\) of cardinality \(|{{\mathcal {D}}}_N|=N\) we have

$$\begin{aligned} \sigma _m(A_1({{\mathcal {D}}}_N), X) \le C(s)\gamma ^{1/s}m^{1/s-1}. \end{aligned}$$

Note that it is known that in the case \(X=L_p\) we have

$$\begin{aligned} \rho (L_p,u) \le (p-1)u^2/2,\quad 2\le p<\infty . \end{aligned}$$
(2.7)
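Inequality (2.7) holds for \(L_p\) of an arbitrary measure, in particular for the sequence space \(\ell _p^n\). A randomized sanity check: random unit pairs sample the supremum defining \(\rho (u)\) from below, so the sampled values must stay under the bound \((p-1)u^2/2\) (the dimension, exponent, and sample count below are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, u = 4.0, 5, 0.1

def norm_p(x):
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

worst = 0.0
for _ in range(5000):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    x, y = x / norm_p(x), y / norm_p(y)           # normalize: ||x|| = ||y|| = 1
    worst = max(worst, 0.5 * (norm_p(x + u * y) + norm_p(x - u * y)) - 1.0)

assert worst <= (p - 1) * u ** 2 / 2              # sampled rho(u) under the bound (2.7)
```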

We now proceed to a special case when \(X=L_p\) and \({{\mathcal {D}}}_N=\Phi _N:=\{\varphi _j\}_{j=1}^N\) is a uniformly bounded Riesz basis of \(X_N:=[\Phi _N]:={\text {span}}(\varphi _1,\dots ,\varphi _N)\). Namely, we assume that \(\Vert \varphi _j\Vert _\infty \le 1\), \(1\le j\le N\) and for any \((a_1,\cdots , a_N) \in {{\mathbb {R}}}^N\)

$$\begin{aligned} R_1 \left( \sum _{j=1}^N |a_j|^2\right) ^{1/2} \le \left\| \sum _{j=1}^N a_j\varphi _j\right\| _2 \le R_2 \left( \sum _{j=1}^N |a_j|^2\right) ^{1/2}, \end{aligned}$$
(2.8)

where \(0< R_1 \le R_2 <\infty \). Assume in addition that for any \(f\in X_N\) we have

$$\begin{aligned} \Vert f\Vert _\infty \le C_0\Vert f\Vert _{\log N}. \end{aligned}$$
(2.9)

Theorem 2.5

Assume that \(\Phi _N\) is a uniformly bounded Riesz basis of \(X_N:=[\Phi _N]\) satisfying (2.9). Then we have

$$\begin{aligned} \varepsilon _k(\Sigma _v^2(\Phi _N),L_\infty ) \le C(R_1,C_0) (\log N) \Bigl (\frac{v}{k}\Bigr )^{1/2},\quad k=1,2,\dots . \end{aligned}$$
(2.10)

Proof

First of all, for any \(f=\sum _{j\in G }a_j\varphi _j\) with \(|G|=v\) we get

$$\begin{aligned} \Vert f\Vert _A \le \sum _{j\in G} |a_j| \le v^{1/2} \left( \sum _{j\in G} |a_j|^2\right) ^{1/2} \le R_1^{-1}v^{1/2}\Vert f\Vert _2. \end{aligned}$$
(2.11)

Therefore,

$$\begin{aligned} \Sigma _v^2(\Phi _N) \subset R_1^{-1}v^{1/2}\Sigma _v^A(\Phi _N), \end{aligned}$$

where

$$\begin{aligned} R\Sigma _v^A(\Phi _N):=\{f\in \Sigma _v(\Phi _N):\, \Vert f\Vert _A \le R\}. \end{aligned}$$

By Theorem 2.4 with \(s=2\) and by (2.7) we have that for \(p\in [2,\infty )\)

$$\begin{aligned} \sigma _m(\Sigma _v^2(\Phi _N),L_p) \le C R_1^{-1} v^{1/2} \sqrt{p} m^{-\frac{1}{2}},\quad m=1,2,\cdots ,N. \end{aligned}$$
(2.12)

Thus, Theorem 2.3 implies that for \(p\in [2,\infty )\)

$$\begin{aligned} \varepsilon _k(\Sigma _v^2(\Phi _N),L_p) \le C(R_1) (p \log (2N/k))^{1/2}(v/k)^{1/2},\quad k=1,2,\dots ,N. \end{aligned}$$
(2.13)

Second, by (2.9) we obtain

$$\begin{aligned} \varepsilon _k(\Sigma _v^2(\Phi _N),L_\infty )\le C_0\varepsilon _k(\Sigma _v^2(\Phi _N),L_{\log N}). \end{aligned}$$
(2.14)

Combining (2.13) and (2.14) we get

$$\begin{aligned} \varepsilon _k(\Sigma _v^2(\Phi _N),L_\infty ) \le C(R_1,C_0) (\log N) (v/k)^{1/2},\quad k=1,2,\dots ,N. \end{aligned}$$
(2.15)

Finally, for \(k>N\) we use the inequalities

$$\begin{aligned} \varepsilon _k(W,L_\infty ) \le \varepsilon _N(W,L_\infty )\varepsilon _{k-N}(X_N^\infty ,L_\infty ) \end{aligned}$$

and

$$\begin{aligned} \varepsilon _{n}(X_N^\infty ,L_\infty ) \le 3\cdot 2^{-n/N},\qquad 2^{-x} \le 1/x,\quad x\ge 1, \end{aligned}$$

to obtain (2.15) for all k. This completes the proof. \(\square \)

3 A Step From \(p=2\) to \(1\le p<2\)

In this section we show how Theorem 2.5 proved in Sect. 2 for \(p=2\) can be extended to the case \(1\le p<2\). This extension step is based on a general inequality for the entropy. For convenience, we set \(\Sigma _v({{\mathcal {D}}}_N)=X_N:=[{{\mathcal {D}}}_N]\) for \(v> N\).

Lemma 3.1

For \(v=1,2,\dots , N\), \(1\le p< 2<q\le \infty \), and \(\theta :=(\frac{1}{2}-\frac{1}{q})/(\frac{1}{p}-\frac{1}{q})\), we have for \(\varepsilon >0\)

$$\begin{aligned} {{\mathcal {H}}}_{\varepsilon } (\Sigma _v^p({{\mathcal {D}}}_N); L_{q}) \le \sum _{s=0}^\infty {{\mathcal {H}}}_{ 2^{-3}a^{s-1} \varepsilon ^{\theta } } (\Sigma _{2v}^2({{\mathcal {D}}}_N); L_{q})+{{\mathcal {H}}}_{\varepsilon ^\theta } (\Sigma _{2v}^2({{\mathcal {D}}}_N); L_{q}), \end{aligned}$$
(3.1)

where \(a=a(\theta ) =2^{\frac{\theta }{1-\theta }}\).

Proof

In the case \(v=N\) Lemma 3.1 was proved in [7, Lemma 3.3]. A slight modification of the proof there works equally well in the general case \(1\le v\le N\). For completeness, we include the proof of this lemma here. First, we note that for any \(\varepsilon _1, \varepsilon _2>0\)

$$\begin{aligned} {{\mathcal {H}}}_{\varepsilon _1\varepsilon _2} (\Sigma _v^p({{\mathcal {D}}}_N); L_q) \le {{\mathcal {H}}}_{\varepsilon _1} (\Sigma _v^p({{\mathcal {D}}}_N); L_2)+{{\mathcal {H}}}_{\varepsilon _2}(\Sigma _{2v}^2({{\mathcal {D}}}_N); L_q). \end{aligned}$$
(3.2)

To see this, let \(x_1,\cdots , x_{N_1}\in \Sigma _v^p({{\mathcal {D}}}_N)\) and \(y_1,\cdots , y_{N_2}\in \Sigma _{2v}^2({{\mathcal {D}}}_N)\) be such that

$$\begin{aligned} \Sigma _v^p({{\mathcal {D}}}_N) \subset \bigcup _{i=1}^{N_1} \left( x_i + \varepsilon _1 B_{L_2}\right) \ \ \text {and}\ \ \Sigma _{2v}^2({{\mathcal {D}}}_N) \subset \bigcup _{j=1}^{N_2} (y_j + \varepsilon _2 B_{L_q}), \end{aligned}$$

where \(N_1=N_{\varepsilon _1}(\Sigma _v^p({{\mathcal {D}}}_N), L_2)\) and \(N_2=N_{\varepsilon _2}(\Sigma _{2v}^2({{\mathcal {D}}}_N), L_q)\). Since \(\Sigma _v({{\mathcal {D}}}_N) +\Sigma _v({{\mathcal {D}}}_N) \subset \Sigma _{2v}({{\mathcal {D}}}_N)\), we have

$$\begin{aligned} \Sigma _v^p({{\mathcal {D}}}_N)&\subset \bigcup _{i=1}^{N_1} \left( x_i + \varepsilon _1 B_{L_2}\right) \cap \Sigma _v({{\mathcal {D}}}_N) \subset \bigcup _{i=1}^{N_1} \left( x_i + \varepsilon _1 \Sigma _{2v}^2({{\mathcal {D}}}_N)\right) \\&\subset \bigcup _{i=1}^{N_1} \bigcup _{j=1}^{N_2} \left( x_i + \varepsilon _1y_j +\varepsilon _1\varepsilon _2 B_{L_q}\right) . \end{aligned}$$

Inequality (3.2) then follows.

Next, setting \(\varepsilon _1:=\varepsilon ^{1-\theta }\) and \(\varepsilon _2=\varepsilon ^\theta \) in (3.2), we reduce the problem to showing that

$$\begin{aligned} {{\mathcal {H}}}_{\varepsilon _1} (\Sigma _v^p({{\mathcal {D}}}_N); L_2) \le \sum _{s=0}^\infty {{\mathcal {H}}}_{ 2^{-3}a^{s-1} \varepsilon ^{\theta } } (\Sigma _{2v}^2({{\mathcal {D}}}_N); L_{q}). \end{aligned}$$
(3.3)

It will be shown that for \(s=0,1,\ldots ,\)

$$\begin{aligned} {{\mathcal {H}}}_{2^{s} \varepsilon _1} (\Sigma _v^p({{\mathcal {D}}}_N); L_2) -{{\mathcal {H}}}_{2^{s+1} \varepsilon _1} (\Sigma _v^p({{\mathcal {D}}}_N); L_2)\le {{\mathcal {H}}}_{ 2^{-3}a^{s-1} \varepsilon ^{\theta } } (\Sigma _{2v}^2({{\mathcal {D}}}_N); L_{q}) , \end{aligned}$$
(3.4)

from which (3.3) will follow by taking the sum over \(s=0,1,\ldots \)

To show (3.4), for each nonnegative integer s let \({{\mathcal {F}}}_s\subset \Sigma _v^p({{\mathcal {D}}}_N)\) be a maximal \(2^s \varepsilon _1\)-separated subset of \(\Sigma _v^p({{\mathcal {D}}}_N)\) in the metric of \(L_2\); that is, \(\Vert f-g\Vert _2\ge 2^s\varepsilon _1\) for any two distinct functions \(f, g\in {{\mathcal {F}}}_s\), and \(\Sigma _v^p({{\mathcal {D}}}_N)\subset \bigcup _{f\in {{\mathcal {F}}}_s} B_{L_2} (f, 2^s\varepsilon _1)\). Then

$$\begin{aligned} {{\mathcal {H}}}_{2^s \varepsilon _1}(\Sigma _v^p({{\mathcal {D}}}_N); L_2) \le \log _2 |{{\mathcal {F}}}_s| \le {{\mathcal {H}}}_{2^{s-1}\varepsilon _1}(\Sigma _v^p({{\mathcal {D}}}_N); L_2). \end{aligned}$$
(3.5)

Let \(f_s\in {{\mathcal {F}}}_{s+2}\) be such that

$$\begin{aligned} \left| B_{L_2}(f_s, 2^{s+2} \varepsilon _1)\cap {{\mathcal {F}}}_s\right| =\max _{f\in {{\mathcal {F}}}_{s+2}} \left| B_{L_2}(f, 2^{s+2} \varepsilon _1)\cap {{\mathcal {F}}}_s\right| . \end{aligned}$$

Since

$$\begin{aligned} {{\mathcal {F}}}_s =\bigcup _{f\in {{\mathcal {F}}}_{s+2}}\left( B_{L_2}(f, 2^{s+2} \varepsilon _1)\cap {{\mathcal {F}}}_s\right) \subset \Sigma _v^p({{\mathcal {D}}}_N), \end{aligned}$$

it follows that

$$\begin{aligned} |{{\mathcal {F}}}_s|\le |{{\mathcal {F}}}_{s+2}| \left| B_{L_2}(f_s, 2^{s+2} \varepsilon _1)\cap {{\mathcal {F}}}_s\right| . \end{aligned}$$
(3.6)

Set

$$\begin{aligned} {{\mathcal {A}}}_s:=\left\{ \frac{f-f_s}{2^{s+2} \varepsilon _1}:\ \ f\in B_{L_2} (f_s, 2^{s+2} \varepsilon _1) \cap {{\mathcal {F}}}_s\right\} \subset \Sigma _{2v}({{\mathcal {D}}}_N). \end{aligned}$$

Clearly, for any \(g\in {{\mathcal {A}}}_s\)

$$\begin{aligned} \Vert g\Vert _2\le 1,\ \ \Vert g\Vert _p\le (2^{s+1} \varepsilon _1)^{-1}. \end{aligned}$$
(3.7)

On the one hand, using (3.5) and (3.6), we obtain that

$$\begin{aligned} \log _2|{{\mathcal {A}}}_s|&\ge \log _2|{{\mathcal {F}}}_s|-\log _2|{{\mathcal {F}}}_{s+2}|\nonumber \\&\ge {{\mathcal {H}}}_{2^{s} \varepsilon _1} (\Sigma _v^p({{\mathcal {D}}}_N); L_2) -{{\mathcal {H}}}_{2^{s+1} \varepsilon _1} (\Sigma _v^p({{\mathcal {D}}}_N); L_2) . \end{aligned}$$
(3.8)

On the other hand, since \(\frac{1}{2} =\frac{\theta }{p}+\frac{1-\theta }{q}\), using (3.7) and the fact that \({{\mathcal {F}}}_s\) is \(2^s\varepsilon _1\)-separated in the \(L_2\)-metric, we have that for any two distinct \(g', g\in {{\mathcal {A}}}_s\)

$$\begin{aligned} 2^{-2} \le \Vert g'-g\Vert _2\le \Vert g'-g\Vert _p^\theta \Vert g-g'\Vert _q^{1-\theta }\le \left( 2^{s+1} \varepsilon _1\right) ^{-\theta } \Vert g-g'\Vert _q^{1-\theta }, \end{aligned}$$

which implies that

$$\begin{aligned} \Vert g'-g\Vert _q \ge 2^{-2}(2^{s-1} \varepsilon _1)^{\frac{\theta }{1-\theta }}=2^{-2} a^{s-1} \varepsilon ^\theta . \end{aligned}$$

This together with (3.7) means that \({{\mathcal {A}}}_s\) is a \(2^{-2} a^{s-1} \varepsilon ^\theta \)-separated subset of \(\Sigma _{2v}^2({{\mathcal {D}}}_N)\) in the metric \(L_{q}\). We obtain

$$\begin{aligned} \log _2 |{{\mathcal {A}}}_s| \le {{\mathcal {H}}}_{ 2^{-3} a^{s-1} \varepsilon ^\theta } (\Sigma _{2v}^2({{\mathcal {D}}}_N); L_q). \end{aligned}$$
(3.9)

Thus, combining (3.9) with (3.8), we obtain inequality (3.4). \(\square \)
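The interpolation step used above is the classical log-convexity of the \(L_p\) norms: \(\Vert g\Vert _2\le \Vert g\Vert _p^\theta \Vert g\Vert _q^{1-\theta }\) whenever \(\frac{1}{2}=\frac{\theta }{p}+\frac{1-\theta }{q}\). The following minimal numerical sanity check of this inequality on a discrete uniform probability measure is an illustration only, not part of the proof:

```python
import random

def lp_norm(g, p):
    # L_p norm with respect to the uniform probability measure on len(g) points
    if p == float("inf"):
        return max(abs(x) for x in g)
    return (sum(abs(x) ** p for x in g) / len(g)) ** (1.0 / p)

random.seed(0)
for _ in range(200):
    g = [random.uniform(-1.0, 1.0) for _ in range(50)]
    for p, q in [(1.0, float("inf")), (1.5, 10.0), (1.0, 4.0)]:
        inv_q = 0.0 if q == float("inf") else 1.0 / q
        theta = (0.5 - inv_q) / (1.0 / p - inv_q)  # solves 1/2 = theta/p + (1-theta)/q
        lhs = lp_norm(g, 2.0)
        rhs = lp_norm(g, p) ** theta * lp_norm(g, q) ** (1.0 - theta)
        assert lhs <= rhs * (1.0 + 1e-12)
```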

Lemma 3.1 with \(1\le p<2\), \(q=\infty \), \(\theta =p/2\) and Theorem 2.5 imply the following bound for the entropy numbers (for \(p=2\) the bound is given directly by Theorem 2.5).

Theorem 3.1

Assume that \(\Phi _N\) is a uniformly bounded Riesz basis of \(X_N:=[\Phi _N]\) satisfying (2.9). Then for \(1\le p\le 2\) we have

$$\begin{aligned} \varepsilon _k(\Sigma _v^p(\Phi _N),L_\infty ) \le C(p,R_1,C_0) (\log N)^{2/p} (v/k)^{1/p},\quad k=1,2,\dots . \end{aligned}$$
(3.10)

4 Proof of Theorem 1.3

Theorem 3.1 provides bounds on the entropy numbers \( \varepsilon _k(\Sigma _v^p(\Phi _N),L_\infty )\) under the additional assumption (2.9). Thus, a combination of Theorem 3.1 with Theorem 2.2 implies the statement of Theorem 1.3 under the extra assumption (2.9). However, (2.9) is not assumed in Theorem 1.3. Below we give a proof of Theorem 1.3 that does not rely on it.

We need the following lemma proved in [7].

Lemma 4.1

[7, Lemma 4.3] Let \(1\le p<\infty \) be a fixed number. Assume that \(X_N\) is an N-dimensional subspace of \(L_\infty (\Omega )\) satisfying the following condition: For some parameter \(\beta >0\) and constant \(K\ge 2\)

$$\begin{aligned} \Vert f\Vert _\infty \le (KN)^{\frac{\beta }{p}} \Vert f\Vert _p,\ \ \forall f\in X_N. \end{aligned}$$
(4.1)

Let \(\{\xi _j\}_{j=1}^\infty \) be a sequence of independent random points distributed in accordance with \(\mu \). Then there exists a positive constant \(C_\beta \) depending only on \(\beta \) such that for any \(0< \varepsilon \le \frac{1}{2}\) and any integer

$$\begin{aligned} m\ge C_\beta K^\beta \varepsilon ^{-2}( \log \frac{2}{\varepsilon }) N^{\beta +1}\log N \end{aligned}$$
(4.2)

the inequality

$$\begin{aligned} (1-\varepsilon ) \Vert f\Vert _p^p\le \frac{1}{m} \sum _{j=1}^m| f(\xi _j)|^p \le (1+\varepsilon )\Vert f\Vert _p^p \end{aligned}$$
(4.3)

holds with probability \( \ge 1-m^{-N/\log K}\).

For a set \(\Omega _m:=\{x_1,\cdots , x_m\}\subset \Omega \) and a function \(f:\Omega _m\rightarrow {{\mathbb {R}}}\) we define \(\Vert f\Vert _{L_\infty (\Omega _m)}:=\max _{1\le j\le m} |f(x_j)|\) and

$$\begin{aligned} \Vert f\Vert _{L_p(\Omega _m)}:=\left( \frac{1}{m} \sum _{j=1}^m |f(x_j)|^p\right) ^{\frac{1}{p}}\ \ \text {for }p<\infty . \end{aligned}$$
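These discrete norms are easy to compute. The following sketch illustrates, under toy assumptions (\(\Omega =[0,1]\) with the Lebesgue measure and a single trigonometric function; these specific choices are made here only for illustration), how the discrete norm over i.i.d. sample points approximates the continuous one, in the spirit of Lemma 4.1:

```python
import math
import random

def discrete_lp(f, pts, p):
    # ||f||_{L_p(Omega_m)} = (m^{-1} sum_j |f(x_j)|^p)^{1/p}; max for p = infinity
    if p == float("inf"):
        return max(abs(f(x)) for x in pts)
    return (sum(abs(f(x)) ** p for x in pts) / len(pts)) ** (1.0 / p)

random.seed(1)
f = lambda x: math.sqrt(2.0) * math.cos(2.0 * math.pi * x)  # ||f||_2 = 1 on [0, 1]
pts = [random.random() for _ in range(20000)]  # i.i.d. points sampled from mu
val = discrete_lp(f, pts, 2.0) ** 2
assert abs(val - 1.0) < 0.05  # a (4.3)-type two-sided estimate holds empirically
```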

Now we turn to the proof of Theorem 1.3. Recall that we do not assume (2.9). First, since the Riesz basis \(\Phi _N:=\{\varphi _j\}_{j=1}^{N}\) is uniformly bounded by 1 on \(\Omega \), we have by (2.8)

$$\begin{aligned} \Vert f\Vert _\infty \le R_1^{-1} N^{\frac{1}{2}} \Vert f\Vert _2,\ \ \forall f\in X_N, \end{aligned}$$

which in turn implies that

$$\begin{aligned} \Vert f\Vert _\infty \le C(R_1) N^{\frac{1}{p}} \Vert f\Vert _p,\ \ \forall f\in X_N,\ \ 1\le p\le 2. \end{aligned}$$

Thus, by Lemma 4.1 with \(\beta =1\) there exists a discrete set \(\Omega _{m_1}:=\{\xi ^1,\cdots , \xi ^{m_1}\}\subset \Omega \) with

$$\begin{aligned} C^{-1} N^{2}\log N\le m_1\le C N^{2}\log N \end{aligned}$$

such that for all \(f\in X_N\)

$$\begin{aligned} \frac{4}{5} \Vert f\Vert _p^p\le \Vert f\Vert _{L_p(\Omega _{m_1})}^p \le \frac{6}{5}\Vert f\Vert _p^p \ \ \text {and}\ \ \frac{4}{5} \Vert f\Vert _2^2\le \Vert f\Vert _{L_2(\Omega _{m_1})}^2 \le \frac{6}{5}\Vert f\Vert _2^2, \end{aligned}$$
(4.4)

where \(C>1\) is an absolute constant.

Second, we consider the discrete norm \(\Vert \cdot \Vert _{L_p(\Omega _{m_1})}\) instead of the norm \(\Vert \cdot \Vert _{L_p(\Omega )}\). By (4.4) \(\Phi _N\) is a uniformly bounded Riesz basis of the space \((X_N, \Vert \cdot \Vert _{L_2(\Omega _{m_1})})\) and moreover

$$\begin{aligned} \Vert f\Vert _{L_\infty (\Omega _{m_1})} \le C(p, R_1, R_2) v^{1/p} \Vert f\Vert _{L_p(\Omega _{m_1})},\ \ \forall f\in \Sigma _v(\Phi _N). \end{aligned}$$

Since \(\log m_1 \sim \log N\), by the standard Nikolskii inequality for the norms \(\ell _p^{m_1}\), \(1\le p\le \infty \), we also have

$$\begin{aligned} \Vert f\Vert _{L_\infty (\Omega _{m_1})} \le C \Vert f\Vert _{L_{\log N}(\Omega _{m_1})},\ \ \forall f\in X_N, \end{aligned}$$

where \(C>1\) is an absolute constant. Thus, by Theorem 2.2 and Theorem 3.1 applied to the discrete norm \( \Vert \cdot \Vert _{L_p(\Omega _{m_1})}\) we can find a subset \(\Omega _m\subset \Omega _{m_1}\) with

$$\begin{aligned} m=|\Omega _m| \le C(p, R_1, R_2) v (\log N)^2 (\log (2v))^2 \end{aligned}$$

such that for any \(f\in \Sigma _v(\Phi _N)\)

$$\begin{aligned} \frac{3}{4} \Vert f\Vert _{L_p(\Omega _{m_1})}^p \le \Vert f\Vert _{L_p(\Omega _{m})}^p \le \frac{5}{4} \Vert f\Vert _{L_p(\Omega _{m_1})}^p. \end{aligned}$$
(4.5)

Combining (4.4) with (4.5), we obtain the stated result of Theorem 1.3.

5 A Refined Version of the Conditional Theorem

Let us first recall some notation. Let \((\Omega , \mu )\) be a probability space. For \(1\le p\le \infty \) denote by \(L_p(\Omega )\) the usual Lebesgue space \(L_p\) defined with respect to the measure \(\mu \) on \(\Omega \) and by \(\Vert \cdot \Vert _p\) the norm of \(L_p(\Omega )\). We also set

$$\begin{aligned} B_{L_p}:=\{f\in L_p(\Omega ):\ \ \Vert f\Vert _p\le 1\},\ \ 1\le p\le \infty . \end{aligned}$$

In this section we prove a refined version of the conditional Theorem 2.2 for sampling discretization of all integral norms \(L_p\) of functions from a more general subset \(\mathcal {W}\subset L_\infty \), which allows us to estimate the number of points needed for the sampling discretization in terms of an integral of the \(\varepsilon \)-entropy \(\mathcal {H}_\varepsilon (\mathcal {W}, L_\infty )\), \(\varepsilon >0\).

Theorem 5.1

Let \(1\le p<\infty \) and let \(\mathcal {W}\) be a set of uniformly bounded functions on \(\Omega \) with

$$\begin{aligned} 1\le R:=\sup _{f\in \mathcal {W}}\sup _{x\in \Omega } |f(x)|<\infty . \end{aligned}$$

Assume that \({{\mathcal {H}}}_{ t}(\mathcal {W},L_\infty )<\infty \) for every \(t>0\), and

$$\begin{aligned} (\lambda \cdot \mathcal {W})\cap B_{L_p} \subset \mathcal {W}\subset B_{L_p},\ \ \ \forall \lambda >0. \end{aligned}$$
(5.1)

Then there exist positive constants \(C_p, c_p\) depending only on p such that for any \(\varepsilon \in (0, 1)\) and any integer

$$\begin{aligned} m\ge C_p \varepsilon ^{-5 }\left( \int _{10^{-1}\varepsilon ^{1/p}} ^{R} u^{\frac{p}{2}-1} \left( \int _{ u}^{ R }\frac{{{\mathcal {H}}}_{c_p \varepsilon t}(\mathcal {W},L_\infty )}{t}\, dt\right) ^{\frac{1}{2}} du\right) ^2, \end{aligned}$$
(5.2)

there exist m points \(x_1,\cdots , x_m\in \Omega \) such that for all \(f\in \mathcal {W}\),

$$\begin{aligned} (1-\varepsilon ) \Vert f\Vert _p^p \le \frac{1}{m} \sum _{j=1}^m |f(x_j)|^p\le (1+\varepsilon ) \Vert f\Vert _p^p. \end{aligned}$$
(5.3)

In particular, Theorem 5.1 allows us to prove refined versions of Theorem 2.2 and Theorem 1.3, where the constants in the Marcinkiewicz type discretization are replaced by \(1-\varepsilon \) and \(1+\varepsilon \) for an arbitrarily given \(\varepsilon \in (0, 1)\).

First, we have the following refined version of Theorem 2.2.

Corollary 5.1

Let \(1\le p<\infty \) and \(1\le v\le N\). Suppose that a dictionary \({{\mathcal {D}}}_N\) is such that

$$\begin{aligned} \varepsilon _k(\Sigma _v^p({{\mathcal {D}}}_N),L_\infty ) \le B_1 (v/k)^{1/p}, \quad k=1,2,\cdots , \end{aligned}$$
(5.4)

where \(B_1\ge 1\). Assume in addition that there exists a constant \(B_2\ge 1\) such that

$$\begin{aligned} \Vert f\Vert _\infty \le B_2 v^{1/p} \Vert f\Vert _p,\ \ \forall f\in \Sigma _v^p({{\mathcal {D}}}_N). \end{aligned}$$
(5.5)

Then for a large enough constant C(p) and any \(\varepsilon \in (0, 1)\) there exist m points \(\xi ^1,\cdots , \xi ^m\in \Omega \) with

$$\begin{aligned} m \le C(p) \varepsilon ^{-5-p} v B_1^{p}(\log (B_2v/\varepsilon ))^2 \end{aligned}$$

such that for any \(f\in \Sigma _v({{\mathcal {D}}}_N)\)

$$\begin{aligned} (1-\varepsilon )\Vert f\Vert _p^p \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)|^p \le (1+\varepsilon ) \Vert f\Vert _p^p. \end{aligned}$$

Proof of Corollary 5.1

We apply Theorem 5.1 to \(\mathcal {W}:=\Sigma _v^p({{\mathcal {D}}}_N)\) and \(R=B_2 v^{1/p}\). It is clear that \(\mathcal {W}\) satisfies (5.1). Furthermore, (5.4) implies that (see [7, Lemma 2.1])

$$\begin{aligned} {{\mathcal {H}}}_{t} ( \mathcal {W}, L_\infty ) \le C(p) v\cdot (B_1/t)^p,\ \ t>0. \end{aligned}$$
(5.6)

Finally, a straightforward calculation using (5.6) then shows that

$$\begin{aligned}&\varepsilon ^{-5} \left( \int _{10^{-1}\varepsilon ^{1/p}} ^{B_2 v^{1/p}} u^{\frac{p}{2}-1} \left( \int _{ u}^{ B_2 v^{1/p} }\frac{{{\mathcal {H}}}_{c_p \varepsilon t}(\mathcal {W},L_\infty )}{t} \, dt\right) ^{\frac{1}{2}} du\right) ^2\\&\le C (p) \varepsilon ^{-5-p} B_1^p v (\log (B_2 v/\varepsilon ))^2. \end{aligned}$$

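In more detail, the calculation can be carried out as follows (a sketch; the constants \(C(p)\) may change from step to step). By (5.6), the inner integral satisfies

$$\begin{aligned} \int _{u}^{B_2 v^{1/p}} \frac{{{\mathcal {H}}}_{c_p \varepsilon t}(\mathcal {W},L_\infty )}{t}\, dt \le C(p) v B_1^p \varepsilon ^{-p} \int _{u}^{\infty } t^{-p-1}\, dt = C(p) v B_1^p \varepsilon ^{-p} u^{-p}. \end{aligned}$$

Hence

$$\begin{aligned} \int _{10^{-1}\varepsilon ^{1/p}}^{B_2 v^{1/p}} u^{\frac{p}{2}-1} \left( C(p) v B_1^p \varepsilon ^{-p} u^{-p}\right) ^{\frac{1}{2}} du \le C(p) v^{\frac{1}{2}} B_1^{\frac{p}{2}} \varepsilon ^{-\frac{p}{2}} \int _{10^{-1}\varepsilon ^{1/p}}^{B_2 v^{1/p}} \frac{du}{u} \le C(p) v^{\frac{1}{2}} B_1^{\frac{p}{2}} \varepsilon ^{-\frac{p}{2}} \log \frac{B_2 v}{\varepsilon }, \end{aligned}$$

and squaring and multiplying by \(\varepsilon ^{-5}\) yields the stated bound \(C(p) \varepsilon ^{-5-p} B_1^p v (\log (B_2 v/\varepsilon ))^2\).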
Corollary 5.1 then follows from Theorem 5.1. \(\square \)

Using Corollary 5.1 and following the proof in Sect. 4, we can also obtain the \(\varepsilon \)-version of Theorem 1.3.

Corollary 5.2

Let \(\Phi _N\) be a uniformly bounded Riesz basis of \(X_N:={\text {span}}(\Phi _N)\subset L_2(\Omega )\) satisfying (2.8) for some constants \(0<R_1\le R_2\). Let \(1\le p\le 2\) and let \(1\le v\le N\) be an integer. Then for a large enough constant \(C=C(p,R_1,R_2)\) and any \(\varepsilon \in (0, 1)\) there exist m points \(\xi ^1,\cdots , \xi ^m\in \Omega \) with

$$\begin{aligned} m \le C\varepsilon ^{-p-5} v(\log N)^2(\log (2v\varepsilon ^{-1}))^2 \end{aligned}$$

such that for any \(f\in \Sigma _v(\Phi _N)\) we have

$$\begin{aligned} (1-\varepsilon )\Vert f\Vert _p^p \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)|^p \le (1+\varepsilon )\Vert f\Vert _p^p. \end{aligned}$$

The rest of this section is devoted to the proof of Theorem 5.1, which is close to the proof of Theorem 1.3 of [6]. We need the following lemma:

Lemma 5.1

[6, Lemma 2.4] Let \(\{{{\mathcal {F}}}_j\}_{j\in G}\) be a collection of finite sets of bounded functions from \(L_1(\Omega ,\mu )\). Assume that for each \(j\in G\) and all \(f\in {{\mathcal {F}}}_j\) we have

$$\begin{aligned} \Vert f\Vert _1 \le 1,\quad \Vert f\Vert _\infty := \sup _{x\in \Omega }|f(x)| \le M_j. \end{aligned}$$

Suppose that numbers \(\eta _j\in (0,1)\), \(j\in G\), and a natural number m satisfy the condition

$$\begin{aligned} 2\sum _{j\in G} |{{\mathcal {F}}}_j| \exp \left( -\frac{m\eta _j^2}{8M_j}\right) <1. \end{aligned}$$

Then there exists a set \(\xi =\{\xi ^\nu \}_{\nu =1}^m \subset \Omega \) such that for each \(j\in G\) and for all \(f\in {{\mathcal {F}}}_j\) we have

$$\begin{aligned} \left| \Vert f\Vert _1 - \frac{1}{m}\sum _{\nu =1}^m |f(\xi ^\nu )|\right| \le \eta _j. \end{aligned}$$
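Lemma 5.1 is proved in [6] by a standard concentration inequality for sums of independent random variables combined with a union bound over \(\bigcup _{j\in G}{{\mathcal {F}}}_j\). The following toy sketch (the three functions and the tolerance are invented purely for illustration) shows the empirical means \(\frac{1}{m}\sum _{\nu =1}^m |f(\xi ^\nu )|\) tracking \(\Vert f\Vert _1\) with deviations of the order \(\sqrt{M_j/m}\):

```python
import math
import random

random.seed(2)
m = 5000
xs = [random.random() for _ in range(m)]  # i.i.d. uniform points on Omega = [0, 1]

# a toy finite family: nonnegative f with ||f||_1 = 1 and ||f||_infty = M_j
family = [
    (lambda x: 2.0 * x, 2.0),
    (lambda x: 3.0 * x * x, 3.0),
    (lambda x: 1.0, 1.0),
]
for f, M in family:
    emp = sum(f(x) for x in xs) / m
    # deviation of order sqrt(M/m), matching the exponent in the lemma's condition
    assert abs(emp - 1.0) <= 5.0 * math.sqrt(M / m)
```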

Proof of Theorem 5.1

Let

$$\begin{aligned} \mathcal {W}_1:= \{ f/ {\Vert f\Vert _p}:\ \ f\in \mathcal {W}\setminus \{0\}\}. \end{aligned}$$

Clearly, (5.1) implies that \(\mathcal {W}_1\subset \mathcal {W}\), and it suffices to prove (5.3) for all \(f\in \mathcal {W}_1\). Let \(c^*=c_p^*\in (0, \frac{1}{2})\) be a sufficiently small constant depending only on p. Let \(a:=c^*\varepsilon \). Let \(J, j_0\) be two integers such that \(j_0<0\le J\),

$$\begin{aligned} (1+a)^{J-1}\le R < (1+a)^J\ \ \text {and}\ \ (1+a)^{j_0 p} \le \frac{\varepsilon }{10} \le (1+a)^{(j_0+1) p }. \end{aligned}$$
(5.7)

For \(j\in {\mathbb {Z}}\), let

$$\begin{aligned} {{\mathcal {A}}}_j:= {\mathcal {N}}_{2a(1+a)^j}(\mathcal {W}_1,L_\infty )\subset \mathcal {W}_1 \end{aligned}$$

denote the minimal \(2a(1+a)^j\)-net of \(\mathcal {W}_1\) in the norm of \(L_\infty \). For \(j\in {\mathbb {Z}}\) and \(f\in \mathcal {W}_1\) we define \(A_j(f)\) to be the function in \( {{{\mathcal {A}}}}_j\) that is closest to f in the \(L_\infty \) norm. Thus, \(\Vert A_j (f)-f\Vert _\infty \le 2a(1+a)^j\) for all \(f\in \mathcal {W}_1\) and \(j\in {\mathbb {Z}}\).

Next, for \(f\in \mathcal {W}_1\) and \(j> j_0\) define

$$\begin{aligned} U_j(f):= \{{\textbf{x}}\in \Omega : |A_j(f)({\textbf{x}})| \ge (1+a)^{j-1}\}, \end{aligned}$$

and

$$\begin{aligned} D_j(f):= U_j(f) \setminus \bigcup _{k\ge j+1} U_k(f). \end{aligned}$$

We also set

$$\begin{aligned} D_{j_0}(f):= \Omega \setminus \bigcup _{k>j_0} U_k(f). \end{aligned}$$

Note that by (5.7) \(U_j(f)=\emptyset \) for \(j> J\). Thus, \(\{D_{j}(f):\ \ j=j_0,\cdots , J\}\) forms a partition of the domain \(\Omega \). Define

$$\begin{aligned} h(f,{\textbf{x}}) := \sum _{j=j_0+1}^{J} (1+a)^j \chi _{D_j(f)}({\textbf{x}}), \end{aligned}$$
(5.8)

where \(\chi _E({\textbf{x}})\) denotes the characteristic function of the set E.
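In the idealized situation where the nets are exact, i.e. \(A_j(f)=f\), the set \(D_j(f)\) is simply \(\{{\textbf{x}}:\ (1+a)^{j-1}\le |f({\textbf{x}})|<(1+a)^{j}\}\), so \(h\) rounds \(|f|\) up to the nearest power of \(1+a\). A sketch of this idealized quantizer (illustration only; the general case carries the extra \(2a(1+a)^j\) net errors):

```python
import math

def h_value(t, a):
    # idealized h: round t = |f(x)| > 0 up to the nearest power of (1 + a);
    # on D_j = {(1+a)^(j-1) <= t < (1+a)^j} the value is (1+a)^j
    j = math.floor(math.log(t) / math.log(1.0 + a)) + 1
    return (1.0 + a) ** j

a = 0.01  # a = c* eps for a small constant c*, as in the proof
for t in [0.003, 0.1, 0.5, 7.3, 99.9]:
    h = h_value(t, a)
    # the two-sided comparison underlying (5.9): t <= h <= (1 + a) t
    assert t <= h <= (1.0 + a) * t * (1.0 + 1e-9)
```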

For \({\textbf{x}}\in D_{j_0} (f)\) we have

$$\begin{aligned} |f({\textbf{x}})|&\le |A_{j_0+1} f({\textbf{x}})| + 2a(1+a)^{j_0+1} \le (1+a)^{j_0} + 2a(1+a)^{j_0+1}\\&\le (1+a)^{j_0} (1+4a), \end{aligned}$$

which in turn implies that

$$\begin{aligned} |f({\textbf{x}})|^p \le (1+a)^{j_0p} (1+4a)^p\le (1+a)^{j_0p} (1+C_p a) \le \frac{\varepsilon }{10} (1+C_p a). \end{aligned}$$

On the other hand, for \({\textbf{x}}\in D_j(f)\) and \(j_0<j\le J\) we have

$$\begin{aligned} |f({\textbf{x}})|&\ge |A_j f ({\textbf{x}})| -2a(1+a)^j \ge (1+a)^j (1-3a)\ \ \text {and}\\ |f({\textbf{x}})|&\le |A_{j+1} f({\textbf{x}})|+2a(1+a)^{j+1} \le (1+a)^j (1+3a), \end{aligned}$$

which implies

$$\begin{aligned} (1+3a)^{-p} |f({\textbf{x}})|^p \le |h(f,{\textbf{x}})|^p\le (1-3a)^{-p} |f({\textbf{x}})|^p. \end{aligned}$$

Therefore, choosing \(c^*=c_p^*\) small enough, we have

$$\begin{aligned} \left( 1-\frac{\varepsilon }{8}\right) |f({\textbf{x}})|^p \le |h(f,{\textbf{x}})|^p \le \left( 1+\frac{\varepsilon }{8}\right) |f({\textbf{x}})|^p,\ \ \ \ \forall {\textbf{x}}\in \bigcup _{j_0<j\le J} D_j(f), \end{aligned}$$
(5.9)

and

$$\begin{aligned} |f({\textbf{x}})|^p\le \frac{\varepsilon }{8},\ \ \forall {\textbf{x}}\in D_{j_0}(f). \end{aligned}$$

In particular, this implies that for any probability measure \(\nu \) on \(\Omega \) and any \(f\in \mathcal {W}_1\)

$$\begin{aligned} \Bigl | \Vert h(f)\Vert _{L_p(\nu )}^p- \Vert f\Vert _{L_p(\nu )}^p\Bigr | \le \frac{\varepsilon }{8} \Vert f\Vert _{L_p(\nu )}^p+\frac{\varepsilon }{8}. \end{aligned}$$
(5.10)

For \(j_0+1\le j\le J\) let

$$\begin{aligned} {{\mathcal {F}}}_j^p:= \left\{ (1+a)^{pj}\chi _{D_j(f)}:\ f\in \mathcal {W}_1\right\} . \end{aligned}$$

Our aim is to show that, for each m satisfying (5.2), one can find m points \(\xi ^1, \cdots , \xi ^m\in \Omega \) so that the following inequality holds for all \(f\in {{\mathcal {F}}}_j^p\) and \( j_0<j\le J\):

$$\begin{aligned}&\Bigl | \frac{1}{m} \sum _{k=1}^m f(\xi ^k) -\int _{\Omega } f(x) \, d\mu (x) \Bigr |\le \varepsilon _j, \end{aligned}$$
(5.11)

where \(\{\varepsilon _j\}_{j=j_0+1}^{J} \subset (0, 1)\) satisfies \(\sum _{j=j_0+1}^{J} \varepsilon _j \le \varepsilon /4\). Once (5.11) is proved, we obtain by (5.8) that

$$\begin{aligned} \Bigl | \frac{1}{m} \sum _{k=1}^m |h(f, \xi ^k)|^p-\Vert h(f)\Vert _p^p\Bigr | \le \frac{\varepsilon }{4}, \end{aligned}$$
(5.12)

which, combined with (5.10), proves the desired inequality (5.3).

To see this, we apply Lemma 5.1 to the collection of the sets \({{\mathcal {F}}}_j^p\) defined above and notice that for \(j_0<j\le J\)

$$\begin{aligned} \Vert (1+a)^{pj}\chi _{D_j(f)}\Vert _1 \le \Vert h(f)\Vert _p^p\le \Vert f\Vert _p^p +\frac{\varepsilon }{4}\le 2 \end{aligned}$$

and

$$\begin{aligned} \Vert (1+a)^{pj}\chi _{D_j(f)}\Vert _\infty \le (1+a)^{pj} =: M_j. \end{aligned}$$

Thus, by Lemma 5.1 it suffices to show that for each integer m satisfying (5.2) one can find a sequence \(\{\varepsilon _j\}_{j_0<j\le J}\subset (0, 1)\) such that

$$\begin{aligned}&\sum _{j_0<j\le J} \varepsilon _j \le \frac{\varepsilon }{4} \end{aligned}$$
(5.13)
$$\begin{aligned}&\sum _{j=j_0+1}^{J} |{{\mathcal {F}}}_j^p| \exp \Bigl ( - \frac{m\varepsilon _j^2}{8M_j} \Bigr )<\frac{1}{2}. \end{aligned}$$
(5.14)

To this end we need to estimate the cardinalities of the sets \({{\mathcal {F}}}_j^p\). By definition, for each \(j_0< j\le J\), the set \(D_j(f)\) is uniquely determined by the functions \(A_k(f)\in {{\mathcal {A}}}_k\), \(j\le k\le J\). As a result, we have

$$\begin{aligned} |{{\mathcal {F}}}_j^p| \le |{{\mathcal {A}}}_j|\times \cdots \times |{{\mathcal {A}}}_{J}|=:L_j, \end{aligned}$$

and

$$\begin{aligned} \log L_j&\le \sum _{k=j}^{J} \log |{{\mathcal {A}}}_k|\le \sum _{k=j}^{J} {{\mathcal {H}}}_{a(1+a)^k}(\mathcal {W},L_\infty )\nonumber \\&\le \frac{1}{\log (1+a)} \sum _{k=j}^{J} \int _{ a( 1+a)^{k-1}}^{a(1+a)^{k}}{{\mathcal {H}}}_{t}(\mathcal {W},L_\infty )\frac{dt}{t}\nonumber \\&\le C\varepsilon ^{-1} \int _{ ( 1+a)^{j-1}}^{R }{{\mathcal {H}}}_{at}(\mathcal {W},L_\infty )\frac{dt}{t}. \end{aligned}$$
(5.15)

For each \(j_0<j\le J\) we choose \(\varepsilon _j>0\) so that

$$\begin{aligned}&\log (\lambda L_j) = \frac{m\varepsilon _j^2}{16 M_j},\ \ \ \text {that is}\ \ \varepsilon _j:=4\sqrt{M_j} \sqrt{\log (\lambda L_j)} m^{-\frac{1}{2}}, \end{aligned}$$
(5.16)

where \(\lambda >1\) is a large absolute constant to be specified later. Then

$$\begin{aligned} \sum _{j=j_0+1}^{J}\varepsilon _j&=4m^{-1/2} \sum _{j=j_0+1}^{J} (M_j \log (\lambda L_j))^{\frac{1}{2}}\\&\le 4 m^{-1/2} \sqrt{\log \lambda }\sum _{j=j_0+1}^{J} (M_j \log ( L_j))^{\frac{1}{2}} \end{aligned}$$

and hence (5.13) is ensured once

$$\begin{aligned} m\ge \varepsilon ^{-2} \Bigl (\sqrt{\log \lambda } \sum _{j=j_0+1}^{J} (M_j \log (L_j))^{\frac{1}{2}}\Bigr )^2. \end{aligned}$$
(5.17)

However, using (5.15), we have

$$\begin{aligned} \sum _{j=j_0+1}^{J} (M_j \log L_j)^{\frac{1}{2}}&\le C\varepsilon ^{-\frac{1}{2}}\sum _{j=j_0+1}^{J} (1+a)^{pj/2} \Bigl (\int _{ ( 1+a)^{j-1}}^{R }{{\mathcal {H}}}_{at}(\mathcal {W},L_\infty )\frac{dt}{t}\Bigr )^{\frac{1}{2}}\\&\le C_p \varepsilon ^{-\frac{3}{2}}\sum _{j=j_0+1}^{J} \int _{(1+a)^{j-2}} ^{(1+a)^{j-1}} u^{\frac{p}{2}-1} \Bigl (\int _{ u}^{ R }{{\mathcal {H}}}_{at}(\mathcal {W},L_\infty )\frac{dt}{t}\Bigr )^{\frac{1}{2}} du \\&\le C_p \varepsilon ^{-\frac{3}{2}}\int _{10^{-1}\varepsilon ^{1/p}} ^{R} u^{\frac{p}{2}-1} \Bigl (\int _{ u}^{ R }{{\mathcal {H}}}_{c_p \varepsilon t}(\mathcal {W},L_\infty )\frac{dt}{t}\Bigr )^{\frac{1}{2}} du. \end{aligned}$$

This combined with (5.17) implies that (5.13) is ensured by (5.2).

Finally, we prove (5.14). Indeed, using (5.16), we have

$$\begin{aligned}&\sum _{j=j_0+1}^{J} |{{\mathcal {F}}}_j^p| \exp \Bigl ( - \frac{m\varepsilon _j^2}{8M_j} \Bigr )\le \lambda \sum _{j=j_0+1}^{J} L_j \exp \Bigl ( - \frac{m\varepsilon _j^2}{8M_j} \Bigr )\\&\quad =\sum _{j=j_0+1}^{J} \exp \Bigl (\log (\lambda L_j) -\frac{m\varepsilon _j^2}{8M_j} \Bigr ) = \sum _{j=j_0+1}^{J} \exp \Bigl ( - \log (\lambda L_j) \Bigr )=\frac{1}{\lambda }\sum _{j=j_0+1}^{J} \frac{1}{L_j}\\&\quad \le \frac{1}{\lambda } \sum _{j=j_0+1}^{J} \frac{1}{N_{a(1+a)^j}(\mathcal {W}, L_\infty )}, \end{aligned}$$

where the last step uses the fact that

$$\begin{aligned} L_j \ge |{{\mathcal {A}}}_j| =N_{a(1+a)^j}(\mathcal {W}, L_\infty ). \end{aligned}$$

We claim that

$$\begin{aligned} {{\mathcal {H}}}_{t} (\mathcal {W}, L_\infty ) \ge \log \frac{R}{4t},\ \ \ \forall 0<t<R. \end{aligned}$$
(5.18)

To see this, let \(f^*\in \mathcal {W}\) be such that \(\Vert f^*\Vert _\infty =R\). Let \(k=[\frac{R}{t}]\). Define \(f_j=\frac{ 2jt}{R} f^*\) for \(0\le j\le k/2\). By (5.1) we have \(\{f_j\}_{0\le j\le \frac{k}{2}} \subset \mathcal {W}\) (since \(\Vert f_j\Vert _p\le \frac{2jt}{R}\Vert f^*\Vert _p\le 1\)), and this family is 2t-separated in the \(L_\infty \) norm. It follows that

$$\begin{aligned} N_{t} (\mathcal {W}, L_\infty ) \ge \frac{k}{2} \ge \frac{1}{4} \frac{R}{t}, \end{aligned}$$

which shows (5.18).

Now using (5.18), we obtain

$$\begin{aligned} \frac{1}{\lambda } \sum _{j=j_0+1}^{J} \frac{1}{N_{a(1+a)^j}(\mathcal {W}, L_\infty )} \le C a R^{-1} \lambda ^{-1} \sum _{j=j_0+1}^{J} (1+a)^j\le C \lambda ^{-1}<1, \end{aligned}$$

provided that \(\lambda >1\) is large enough. This proves (5.14). \(\square \)

6 Concluding Remarks on Sampling Discretization of \(L_p\) norms for \(2<p<\infty \)

In this section, we give a few remarks on sampling discretization of \(L_p\) norms for \(2<p<\infty \).

  1.

    The following Nikolskii type inequality plays an important role in the proof of Theorem 1.3:

    $$\begin{aligned} \Vert f\Vert _\infty \le C v^{\frac{1}{p}}\Vert f\Vert _p,\ \ \ \forall f\in \Sigma _v(\Phi _N), \end{aligned}$$
    (6.1)

    where the constant C is independent of f, v and N. This inequality holds for \(1\le p\le 2\) whenever \(\Phi _N\) is a uniformly bounded Riesz basis of \(X_N\). However, this is no longer true for \(p>2\). For example, take \(N=2^v\) and consider the system

    $$\begin{aligned} \Phi _N =\{ e^{2\pi i j x}\}_{j=1}^N \end{aligned}$$

    on the interval [0, 1] equipped with the usual Lebesgue measure. By the Littlewood-Paley inequality we have that for \(f(x)=\sum _{j=1}^v e^{2\pi {\textbf{i}} 2^j x}\in \Sigma _v(\Phi _N)\) and \(2<p<\infty \),

    $$\begin{aligned} \Vert f\Vert _\infty =v,\ \ \text {while}\ \ v^{\frac{1}{p}}\Vert f\Vert _p\asymp v^{\frac{1}{2}+\frac{1}{p}}. \end{aligned}$$

    Since \(\frac{1}{2}+\frac{1}{p}<1\) for \(p>2\), inequality (6.1) fails for v large enough.
  2.

    Let \(\Phi _N\) be a uniformly bounded Riesz basis of \(X_N\subset L_2\) satisfying (2.9). By monotonicity of the \(L_p\) norms, we have that for any integer \(1\le v\le N\),

    $$\begin{aligned} \Sigma ^p_v(\Phi _N)\subset \Sigma ^2_v(\Phi _N),\ \ p>2, \end{aligned}$$

    which in particular implies that

    $$\begin{aligned} \sup _{f\in \Sigma _v^p(\Phi _N)} \Vert f\Vert _\infty \le \sup _{f\in \Sigma _v^2(\Phi _N)} \Vert f\Vert _\infty \le C v^{1/2}. \end{aligned}$$

    Moreover, using Theorem 2.5, we have that for \(p>2\) and all integer \(k\ge 1\),

    $$\begin{aligned} \varepsilon _k(\Sigma _v^p(\Phi _N),L_\infty ) \le \varepsilon _k(\Sigma _v^2(\Phi _N),L_\infty )\le C\cdot (\log N) \Bigl (\frac{v}{k}\Bigr )^{1/2}, \end{aligned}$$
    (6.2)

    which also yields

    $$\begin{aligned} {{\mathcal {H}}}_{t} ( \Sigma _v^p(\Phi _N), L_\infty ) \le C(p) v\cdot \Bigl (\frac{\log N}{t}\Bigr )^2,\ \ \ \forall t>0. \end{aligned}$$
    (6.3)

    On the other hand, a straightforward calculation shows that for any \(\varepsilon \in (0, 1)\) and \(p>2\),

    $$\begin{aligned}&\varepsilon ^{-5} \left( \int _{10^{-1}\varepsilon ^{1/p}} ^{C v^{1/2}} u^{\frac{p}{2}-1} \Bigl (\int _{ u}^{ C v^{1/2} }\frac{{{\mathcal {H}}}_{c_p \varepsilon t}( \Sigma _v^p(\Phi _N),L_\infty )}{t} \, dt\Bigr )^{\frac{1}{2}} du\right) ^2\\&\quad \le C (p) \varepsilon ^{-7} v^{p/2} (\log N)^2. \end{aligned}$$

    Thus, an application of Theorem 5.1 leads to

Theorem 6.1

Assume that \(\Phi _N\) is a uniformly bounded Riesz basis of \(X_N:={\text {span}}(\Phi _N)\) satisfying (2.8) for some constants \(0<R_1\le R_2\). Let \(2<p<\infty \) and let \(1\le v\le N\) be an integer. Then for a large enough constant \(C=C(p,R_1,R_2)\) and any \(\varepsilon \in (0, 1)\) there exist m points \(\xi ^1,\cdots , \xi ^m\in \Omega \) with

$$\begin{aligned} m\le C\varepsilon ^{-7} v^{p/2} (\log N)^2 \end{aligned}$$

such that for any \(f\in \Sigma _v(\Phi _N)\)

$$\begin{aligned} (1-\varepsilon ) \Vert f\Vert _p^p \le \frac{1}{m} \sum _{j=1}^m |f(\xi ^j)|^p\le (1+\varepsilon ) \Vert f\Vert _p^p. \end{aligned}$$
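The lacunary example from Remark 1 above can also be observed numerically. The sketch below (the grid size and the exponent \(p=4\) are ad hoc choices) computes \(\Vert f\Vert _p\) for \(f(x)=\sum _{j=1}^v e^{2\pi {\textbf{i}} 2^j x}\) by quadrature and checks that it stays of order \(v^{1/2}\) while \(\Vert f\Vert _\infty =v\); for \(p=4\), counting solutions of \(2^{j_1}+2^{j_2}=2^{j_3}+2^{j_4}\) gives the exact value \(\Vert f\Vert _4=(2v^2-v)^{1/4}\).

```python
import cmath
import math

def lp_norm_lacunary(v, p, n=1 << 14):
    # f(x) = sum_{j=1}^{v} exp(2 pi i 2^j x) on [0, 1]; uniform-grid quadrature
    # (exact here, since |f|^p is a trigonometric polynomial of degree < n)
    total = 0.0
    for k in range(n):
        x = k / n
        val = abs(sum(cmath.exp(2j * math.pi * (2 ** jj) * x) for jj in range(1, v + 1)))
        total += val ** p
    return (total / n) ** (1.0 / p)

for v in [4, 6, 8]:
    lp = lp_norm_lacunary(v, 4.0)
    assert abs(lp - (2.0 * v * v - v) ** 0.25) < 1e-6  # ||f||_4 of order v^{1/2}
    assert float(v) > lp  # ||f||_infty = v, attained at x = 0, exceeds ||f||_4
```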