1 Introduction

Discretization is a crucial step in making a continuous problem computationally feasible. The construction of good sets of points in a multidimensional domain is a fundamental problem of both theoretical and computational mathematics. A prominent example of a classical discretization problem is the problem of metric entropy (covering numbers, entropy numbers). Bounds for the entropy numbers of function classes are important by themselves and also have important connections to other fundamental problems (see, for instance, [15, Ch. 3] and [3, Ch. 6]). Another prominent example of a discretization problem is numerical integration. Numerical integration in the mixed smoothness classes requires deep number-theoretical results for constructing optimal (in the sense of order) cubature formulas (see, for instance, [3, Ch. 8]). A typical approach to solving a continuous problem numerically—the Galerkin method—suggests looking for an approximate solution in a given finite dimensional subspace. A standard way to measure the approximation error is an appropriate \(L_q\) norm, \(1\le q\le \infty \). Thus, the problem of discretizing the \(L_q\) norms of functions from a given finite dimensional subspace arises in a very natural way.

The main goal of this paper is to study the discretization problem for a finite dimensional subspace \(X_N\) of a Banach space X. We are interested in discretizing the \(L_q\), \(1\le q\le \infty \), norm of elements of \(X_N\). We call such results the Marcinkiewicz-type discretization theorems because the first result in this direction was obtained by Marcinkiewicz (see [23, Ch.10, §7]). He proved the following inequalities for the univariate trigonometric polynomials f of degree n: for \(1<q<\infty \) there are two positive constants \(C_1(q)\) and \(C_2(q)\) such that for the \(L_q\) norm \(\Vert f\Vert _q\) of these polynomials, we have

$$\begin{aligned} C_1(q)\Vert f\Vert _q^q \le \frac{1}{2n+1}\sum _{\nu =1}^{2n+1} \left| f\left( \frac{2\pi \nu }{2n+1}\right) \right| ^q \le C_2(q)\Vert f\Vert _q^q. \end{aligned}$$
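For \(q=2\), the inequalities above hold with \(C_1(2)=C_2(2)=1\): by the discrete orthogonality of the exponentials on the \((2n+1)\)-point uniform grid, the equal-weight quadrature sum reproduces \(\Vert f\Vert _2^2\) exactly (a discrete Parseval identity). The following small numerical check is our own illustration (Python with NumPy, arbitrary parameters), not part of the paper:

```python
import numpy as np

# Discrete Parseval check: for a trigonometric polynomial of degree n,
# the equal-weight quadrature on the (2n+1)-point uniform grid reproduces
# the L_2 norm (w.r.t. dx/(2*pi)) exactly.
rng = np.random.default_rng(0)
n = 6
k = np.arange(-n, n + 1)                       # frequencies |k| <= n
c = rng.normal(size=k.size) + 1j * rng.normal(size=k.size)

m = 2 * n + 1
nodes = 2 * np.pi * np.arange(1, m + 1) / m    # the Marcinkiewicz nodes
f_nodes = (c * np.exp(1j * k * nodes[:, None])).sum(axis=1)

discrete = np.mean(np.abs(f_nodes) ** 2)       # (1/m) * sum_nu |f(x_nu)|^2
exact = np.sum(np.abs(c) ** 2)                 # ||f||_2^2 by Parseval
assert abs(discrete - exact) < 1e-10 * exact
```

For \(q\ne 2\) the same grid gives only the two-sided bounds with constants \(C_1(q)\), \(C_2(q)\), as stated above.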

There are different settings and different ingredients that play important roles in this problem. We now discuss these issues.

Marcinkiewicz Problem Let \(\Omega \) be a compact subset of \({\mathbb {R}}^d\) with the probability measure \(\mu \). We say that a linear subspace \(X_N\) of the \(L_q(\Omega ):=L_q(\Omega ,\mu )\), \(1\le q < \infty \), admits the Marcinkiewicz-type discretization theorem with parameters m and q if there exist a set \(\{\xi ^\nu \in \Omega , \nu =1,\dots ,m\}\) and two positive constants \(C_j(d,q)\), \(j=1,2\), such that for any \(f\in X_N\), we have

$$\begin{aligned} C_1(d,q)\Vert f\Vert _q^q \le \frac{1}{m} \sum _{\nu =1}^m |f(\xi ^\nu )|^q \le C_2(d,q)\Vert f\Vert _q^q. \end{aligned}$$
(1.1)

In the case \(q=\infty \), we define \(L_\infty \) as the space of functions continuous on \(\Omega \) and require

$$\begin{aligned} C_1(d)\Vert f\Vert _\infty \le \max _{1\le \nu \le m} |f(\xi ^\nu )| \le \Vert f\Vert _\infty . \end{aligned}$$
(1.2)

We will also use a brief way to express the above property: the \(\mathcal {M}(m,q)\) theorem holds for a subspace \(X_N\) or \(X_N \in \mathcal {M}(m,q)\).

Numerical Integration Problem In the case \(1\le q<\infty \), the above problem can be reformulated as a problem on numerical integration of special classes of functions. Define a class \(|X_N|^q := \{|f|^q: f\in X_N, \Vert f\Vert _q \le 1\}\), and consider the numerical integration problem: for a given \(\varepsilon >0\), find \(m = m(N,q,\varepsilon )\) such that

$$\begin{aligned} \inf _{\xi ^1,\dots ,\xi ^m}\sup _{f\in X_N,\Vert f\Vert _q\le 1} \left| \frac{1}{m}\sum _{\nu =1}^m |f(\xi ^\nu )|^q -\Vert f\Vert _q^q\right| \le \varepsilon . \end{aligned}$$
(1.3)

In (1.3) we limit our search for good numerical integration methods to cubature formulas with equal weights \(1/m\). These special kinds of cubature formulas are called Quasi-Monte Carlo methods. In numerical integration, general cubature formulas (with weights) are also very important. In this case the above problem (1.3) is reformulated as follows:

$$\begin{aligned} \inf _{\xi ^1,\dots ,\xi ^m;\lambda _1,\dots ,\lambda _m}\sup _{f\in X_N,\Vert f\Vert _q\le 1} \left| \sum _{\nu =1}^m\lambda _\nu |f(\xi ^\nu )|^q -\Vert f\Vert _q^q\right| \le \varepsilon . \end{aligned}$$
(1.4)

Thus, in this case we are optimizing both over the knots \(\xi ^1,\dots ,\xi ^m\) and over the weights \(\lambda _1,\dots ,\lambda _m\).

Marcinkiewicz Problem with Weights The above remark on numerical integration encourages us to consider the following variant of the Marcinkiewicz problem. We say that a linear subspace \(X_N\) of the \(L_q(\Omega )\), \(1\le q < \infty \), admits the weighted Marcinkiewicz-type discretization theorem with parameters m and q if there exist a set of knots \(\{\xi ^\nu \in \Omega \}\), a set of weights \(\{\lambda _\nu \}\), \(\nu =1,\dots ,m\), and two positive constants \(C_j(d,q)\), \(j=1,2\), such that for any \(f\in X_N\), we have

$$\begin{aligned} C_1(d,q)\Vert f\Vert _q^q \le \sum _{\nu =1}^m \lambda _\nu |f(\xi ^\nu )|^q \le C_2(d,q)\Vert f\Vert _q^q. \end{aligned}$$
(1.5)

Then we also say that the \(\mathcal {M}^w(m,q)\) theorem holds for a subspace \(X_N\) or \(X_N \in \mathcal {M}^w(m,q)\). Obviously, \(X_N\in \mathcal {M}(m,q)\) implies that \(X_N\in \mathcal {M}^w(m,q)\).

Marcinkiewicz Problem with \(\varvec{\varepsilon }\) We write \(X_N\in \mathcal {M}(m,q,\varepsilon )\) if (1.1) holds with \(C_1(d,q)=1-\varepsilon \) and \(C_2(d,q)=1+\varepsilon \). Respectively, we write \(X_N\in \mathcal {M}^w(m,q,\varepsilon )\) if (1.5) holds with \(C_1(d,q)=1-\varepsilon \) and \(C_2(d,q)=1+\varepsilon \). We note that the most powerful results are for \(\mathcal {M}(m,q,0)\), when the \(L_q\) norm of \(f\in X_N\) is discretized exactly by the formula with equal weights \(1/m\).

In this paper we mostly concentrate on the Marcinkiewicz problem and on its variant with \(\varepsilon \). Our main results are for \(q=1\). We now make some general remarks for the case \(q=2\) that illustrate the problem. We discuss the case \(q=2\) in more detail in Sect. 6. We describe the properties of the subspace \(X_N\) in terms of a system \(\mathcal {U}_N:=\{u_i\}_{i=1}^N\) of functions such that \(X_N = {\text {span}}\{u_i, i=1,\dots ,N\}\). In the case \(X_N \subset L_2\), we assume that the system is orthonormal on \(\Omega \) with respect to measure \(\mu \). In the case of real functions we associate with \(x\in \Omega \) the matrix \(G(x) := [u_i(x)u_j(x)]_{i,j=1}^N\). Clearly, G(x) is a symmetric positive semi-definite matrix of rank 1. It is easy to see that for a set of points \(\xi ^k\in \Omega \), \(k=1,\dots ,m\), and \(f=\sum _{i=1}^N b_iu_i\), we have

$$\begin{aligned} \sum _{k=1}^m\lambda _k f(\xi ^k)^2 - \int _\Omega f(x)^2 \mathrm{d}\mu = {\mathbf b}^T\left( \sum _{k=1}^m \lambda _k G(\xi ^k)-I\right) {\mathbf b}, \end{aligned}$$

where \({\mathbf b} = (b_1,\dots ,b_N)^T\) is the column vector and I is the identity matrix. Therefore, the \(\mathcal {M}^w(m,2)\) problem is closely connected to a problem of approximation (representation) of the identity matrix I by an m-term approximant with respect to the system \(\{G(x)\}_{x\in \Omega }\). It is easy to understand that under our assumptions on the system \(\mathcal {U}_N\), there exist a set of knots \(\{\xi ^k\}_{k=1}^m\) and a set of weights \(\{\lambda _k\}_{k=1}^m\), with \(m\le N^2\), such that

$$\begin{aligned} I = \sum _{k=1}^m \lambda _k G(\xi ^k), \end{aligned}$$

and, therefore, we have for any \(X_N \subset L_2\) that

$$\begin{aligned} X_N \in \mathcal {M}^w(N^2,2,0). \end{aligned}$$
(1.6)

However, we do not know a characterization of those \(X_N\) for which \(X_N \in \mathcal {M}(N^2,2,0)\).
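The existence statement behind (1.6) can be made concrete on a toy example. For the orthonormal real trigonometric system \(u_1=1\), \(u_2=\sqrt{2}\cos x\), \(u_3=\sqrt{2}\sin x\) on \([0,2\pi )\) with measure \(\mathrm{d}x/(2\pi )\) (our illustrative choice, \(N=3\)), already \(m=6\) generic points admit weights with \(\sum _k \lambda _k G(\xi ^k)=I\), which then discretizes the \(L_2\) norm exactly. A minimal sketch:

```python
import numpy as np

# Toy instance of the representation I = sum_k lambda_k G(xi^k):
# orthonormal real trigonometric system u = (1, sqrt(2) cos, sqrt(2) sin)
# on [0, 2*pi) with measure dx/(2*pi), N = 3 (our illustrative choice).
def U(x):
    return np.array([np.ones_like(x), np.sqrt(2) * np.cos(x), np.sqrt(2) * np.sin(x)])

rng = np.random.default_rng(5)
m = 6                                   # generic points; here m < N^2 already works
xi = rng.uniform(0, 2 * np.pi, size=m)
Ux = U(xi)                              # shape (N, m)

# columns are vec(G(xi^k)); solve sum_k lambda_k G(xi^k) = I by least squares
A = np.stack([np.outer(Ux[:, j], Ux[:, j]).ravel() for j in range(m)], axis=1)
lam, *_ = np.linalg.lstsq(A, np.eye(3).ravel(), rcond=None)
assert np.allclose(A @ lam, np.eye(3).ravel())      # representation is exact

# consequently the weighted formula discretizes the L_2 norm exactly
b = rng.normal(size=3)                  # f = b_1 u_1 + b_2 u_2 + b_3 u_3
f_xi = b @ Ux
assert np.isclose((lam * f_xi ** 2).sum(), (b ** 2).sum())
```

Note that the weights \(\lambda _k\) obtained this way need not be positive; only the exact representation is used here.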

In the above formulations of the problems, we ask only about the existence of either good \(\{\xi ^\nu \}\) or good \(\{\xi ^\nu ,\lambda _\nu \}\). Certainly, it is important to have either explicit constructions of good \(\{\xi ^\nu \}\) (\(\{\xi ^\nu ,\lambda _\nu \}\)) or deterministic ways to construct good \(\{\xi ^\nu \}\) (\(\{\xi ^\nu ,\lambda _\nu \}\)). Thus, the Marcinkiewicz-type problem can be split into the following four problems: under some assumptions on \(X_N\),

  (I) Find a condition on m for \(X_N \in \mathcal {M}(m,q)\);

  (II) Find a condition on m for \(X_N \in \mathcal {M}^w(m,q)\);

  (III) Find a condition on m such that there exists a deterministic construction of \(\{\xi ^\nu \}_{\nu =1}^m\) satisfying (1.1) for all \(f\in X_N\);

  (IV) Find a condition on m such that there exists a deterministic construction of \(\{\xi ^\nu ,\lambda _\nu \}_{\nu =1}^m\) satisfying (1.5) for all \(f\in X_N\).

The main results of this paper address the problem (I) in the case \(q=1\). Our method is probabilistic.

We impose the following assumptions on the system \(\{u_i\}_{i=1}^N\) of real functions:

  • A. There exist \(\alpha >0\), \(\beta \), and \(K_1\) such that for all \(i\in [1,N]\), we have

    $$\begin{aligned} |u_i(\mathbf x)-u_i(\mathbf y)| \le K_1N^\beta \Vert \mathbf x-\mathbf y\Vert _\infty ^\alpha ,\quad \mathbf x,\mathbf y\in \Omega . \end{aligned}$$
    (1.7)
  • B. There exists a constant \(K_2\) such that \(\Vert u_i\Vert _\infty ^2 \le K_2\), \(i=1,\dots ,N\).

  • C. Define \(X_N:= {\text {span}}(u_1,\dots ,u_N)\). There exist two constants \(K_3\) and \(K_4\) such that the following Nikol’skii-type inequality holds for all \(f\in X_N\):

    $$\begin{aligned} \Vert f\Vert _\infty \le K_3N^{K_4/p}\Vert f\Vert _p,\quad p\in [2,\infty ). \end{aligned}$$
    (1.8)

The main result of this paper is the following theorem (see Theorem 5.9).

Theorem 1.1

Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then for large enough \(C_1=C_1(d,K_1,K_2,K_3,K_4,\Omega ,\alpha ,\beta )\), there exists a set of \(m \le C_1N(\log N)^{7/2}\) points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), such that for any \(f\in X_N\), we have

$$\begin{aligned} \frac{1}{2}\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le \frac{3}{2}\Vert f\Vert _1. \end{aligned}$$
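Theorem 1.1 is an existence result, proved by a probabilistic method. Its conclusion is easy to observe experimentally: for univariate trigonometric polynomials, i.i.d. uniform points \(\xi ^j\) with m of order \(N\,\mathrm{polylog}(N)\) typically discretize the \(L_1\) norm within the stated factors. A simulation sketch (all parameters are our own illustrative choices):

```python
import numpy as np

# Monte Carlo illustration of Theorem 1.1 for univariate trigonometric
# polynomials: i.i.d. uniform nodes typically discretize the L_1 norm
# within the factors 1/2 and 3/2.
rng = np.random.default_rng(4)
n = 8
k = np.arange(-n, n + 1)                      # dimension N = 2n + 1 = 17
m = 2000                                      # ~ N * polylog(N) random points
xi = rng.uniform(0, 2 * np.pi, size=m)
fine = 2 * np.pi * np.arange(4096) / 4096     # reference grid for ||f||_1

for _ in range(5):
    c = rng.normal(size=k.size) + 1j * rng.normal(size=k.size)
    fvals = lambda x: (c * np.exp(1j * k * x[:, None])).sum(axis=1)
    norm1 = np.mean(np.abs(fvals(fine)))      # ||f||_1 w.r.t. dx/(2*pi)
    disc = np.mean(np.abs(fvals(xi)))         # (1/m) sum_j |f(xi^j)|
    assert 0.5 * norm1 <= disc <= 1.5 * norm1
```

Of course, such a simulation only illustrates the typical behavior of random nodes; it neither proves the theorem nor attains the stated bound on m.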

An important particular case for application of Theorem 1.1 is the case when \(X_N\) is a subspace of trigonometric polynomials. For a finite \(Q\subset \mathbb {Z}^d\), set

$$\begin{aligned} \mathcal {T}(Q) :=\left\{ f: f(\mathbf x) = \sum _{\mathbf k\in Q}c_\mathbf ke^{i(\mathbf k,\mathbf x)}\right\} . \end{aligned}$$

The hyperbolic cross polynomials \(\mathcal {T}(Q_n)\) are of special interest (see, for instance, [3]): let \(\mathbf s\in \mathbb {Z}^d_+\),

$$\begin{aligned} Q_n := \cup _{\Vert \mathbf s\Vert _1 \le n} \rho (\mathbf s), \end{aligned}$$

and

$$\begin{aligned} \rho (\mathbf s):= \{\mathbf k=(k_1,\dots ,k_d)\in \mathbb {Z}^d : [2^{s_j-1}]\le |k_j|<2^{s_j},\quad j=1,\dots ,d\}, \end{aligned}$$

where [a] denotes the integer part of a number a.
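The blocks \(\rho (\mathbf s)\) and the hyperbolic cross \(Q_n\) can be enumerated directly. The following sketch is our own (it uses the convention \(\rho (\mathbf s)\subset \mathbb {Z}^d\), as the absolute values \(|k_j|\) indicate):

```python
from itertools import product

def rho(s):
    # dyadic block rho(s): all k in Z^d with [2^{s_j - 1}] <= |k_j| < 2^{s_j}
    per_coord = []
    for sj in s:
        lo = 2 ** (sj - 1) if sj > 0 else 0   # [2^{-1}] = 0
        hi = 2 ** sj
        per_coord.append([k for k in range(-hi + 1, hi) if lo <= abs(k)])
    return set(product(*per_coord))

def Q(n, d):
    # hyperbolic cross Q_n: union of the blocks rho(s) over ||s||_1 <= n
    return set().union(*(rho(s) for s in product(range(n + 1), repeat=d)
                         if sum(s) <= n))

assert len(Q(3, 1)) == 15        # d = 1: all |k| < 2^3, i.e. 2^4 - 1 frequencies
assert len(Q(2, 2)) == 17        # blocks (0,0),(1,0),(0,1),(2,0),(0,2),(1,1)
```

For \(d=2\) the cardinality \(|Q_n|\) grows like \(2^n n\), which is the quantity appearing in the bounds on m below.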

The following two theorems were proved in [21]. Denote by \(\mathbb {T}^d\) the d-dimensional torus.

Theorem 1.2

Let \(d=2\). For any \(n\in {\mathbb {N}}\) and large enough absolute constant \(C_1\), there exists a set of \(m \le C_1|Q_n|n^{7/2}\) points \(\xi ^j\in \mathbb {T}^2\), \(j=1,\dots ,m\), such that for any \(f\in \mathcal {T}(Q_n)\), we have

$$\begin{aligned} C_2\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le C_3\Vert f\Vert _1. \end{aligned}$$

Theorem 1.3

For any \(d\in {\mathbb {N}}\) and \(n\in {\mathbb {N}}\), for large enough constant \(C_1(d)\), there exists a set of \(m \le C_1(d)|Q_n|n^{d/2+3}\) points \(\xi ^j\in \mathbb {T}^d\), \(j=1,\dots ,m\), such that for any \(f\in \mathcal {T}(Q_n)\), we have

$$\begin{aligned} C_2\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le C_3\Vert f\Vert _1. \end{aligned}$$

Theorem 1.2 addresses the case \(d=2\), and Theorem 1.3 extends it to all d. We point out that for \(d=2\), Theorem 1.3 is weaker than Theorem 1.2. Theorem 1.1 gives Theorem 1.2 and improves Theorem 1.3 by replacing the extra factor \(n^{d/2+3}\) by \(n^{7/2}\) in the bound for m. The technique for proving Theorem 1.1 presented in this paper is a development of the technique from [21]. It is a combination of a probabilistic technique based on chaining and results on the entropy numbers. We present this technique in the following way. In Sect. 2 we prove a conditional theorem (Theorem 2.1), which provides \(X_N \in \mathcal {M}(m,1)\) with an appropriate m under conditions on the entropy numbers of the \(L_1\) ball \(X^1_N := \{f\in X_N:\Vert f\Vert _1 \le 1\}\). The proof of this conditional theorem is based on the chaining technique. In Sect. 3 we discuss new elements of a method that gives good upper bounds for the entropy numbers of the unit \(L_1\) ball \(\mathcal {T}(Q)_1\) in the \(L_\infty \) norm. In Sect. 4 we present results on the Marcinkiewicz-type theorems for trigonometric polynomials. In Sect. 5 we show how the technique developed in Sect. 3 for trigonometric polynomials can be generalized to subspaces \(X_N\) satisfying conditions A, B, and C. The main results of the paper, which concern discretization theorems in \(L_1\), are in Sects. 2–5. In Sect. 6 we give some comments on the discretization theorems in \(L_2\). The \(L_2\) case is much better understood than the \(L_1\) case, and it has nice connections to recent strong results on submatrices of orthogonal matrices and on random matrices.

2 Conditional Theorem

We begin with the definition of the entropy numbers. Let X be a Banach space, and let \(B_X\) denote the unit ball of X with the center at 0. Denote by \(B_X(y,r)\) a ball with center y and radius r: \(\{x\in X:\Vert x-y\Vert \le r\}\). For a compact set A and a positive number \(\varepsilon \), we define the covering number \(N_\varepsilon (A)\) as follows:

$$\begin{aligned} N_\varepsilon (A) := N_\varepsilon (A,X) :=\min \left\{ n : \exists y^1,\dots ,y^n, y^j\in A :A\subseteq \cup _{j=1}^n B_X(y^j,\varepsilon )\right\} . \end{aligned}$$

It is convenient to consider along with the entropy \(H_\varepsilon (A,X):= \log _2 N_\varepsilon (A,X)\) the entropy numbers \(\varepsilon _k(A,X)\):

$$\begin{aligned} \varepsilon _k(A,X) :=\inf \left\{ \varepsilon : \exists y^1,\dots ,y^{2^k} \in A : A \subseteq \cup _{j=1} ^{2^k} B_X(y^j,\varepsilon )\right\} . \end{aligned}$$

In our definition of \(N_\varepsilon (A)\) and \(\varepsilon _k(A,X)\) we require \(y^j\in A\). In a standard definition of \(N_\varepsilon (A)\) and \(\varepsilon _k(A,X)\) this restriction is not imposed. However, it is well known (see [15, p. 208]) that these characteristics may differ at most by a factor of 2.
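To make the definitions concrete: for the unit interval \(A=[0,1]\subset \mathbb {R}\) with the absolute-value metric, equally spaced centers show \(N_\varepsilon (A)\le \lceil 1/(2\varepsilon )\rceil \) (and correspondingly \(\varepsilon _k(A,\mathbb {R})\le 2^{-k-1}\), with centers inside A). A small sketch of our own verifying such a cover:

```python
import math
import numpy as np

# For A = [0, 1], the centers eps, 3*eps, 5*eps, ... cover A with
# n = ceil(1/(2*eps)) balls of radius eps, so N_eps(A) <= ceil(1/(2*eps)).
def cover_interval(eps):
    n = math.ceil(1 / (2 * eps))
    return [min((2 * j + 1) * eps, 1.0) for j in range(n)]  # clip to stay in A

eps = 0.3
centers = cover_interval(eps)                  # two centers: 0.3 and 0.9
xs = np.linspace(0, 1, 10001)                  # dense test points in A
dist = np.min(np.abs(xs[:, None] - np.array(centers)), axis=1)
assert len(centers) == 2 and dist.max() <= eps + 1e-12
```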

Theorem 2.1

Suppose that a subspace \(X_N\) satisfies the condition \((B\ge 1)\)

$$\begin{aligned} \varepsilon _k(X^1_N,L_\infty ) \le B\left\{ \begin{array}{ll} N/k, &{}\quad k\le N,\\ 2^{-k/N},&{}\quad k\ge N.\end{array} \right. \end{aligned}$$

Then for large enough absolute constant C, there exists a set of

$$\begin{aligned} m \le CNB(\log _2(2N\log _2(8B)))^2 \end{aligned}$$

points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), such that for any \(f\in X_N\), we have

$$\begin{aligned} \frac{1}{2}\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le \frac{3}{2}\Vert f\Vert _1. \end{aligned}$$

Proof

Our proof uses the chaining technique and proceeds along the lines of the proof of Theorem 3.1 from [21]. We use the following results from [21]. Lemma 2.1 is from [2].

Lemma 2.1

Let \(\{g_j\}_{j=1}^m\) be independent random variables with \(\mathbb {E}g_j=0\), \(j=1,\dots ,m\), which satisfy

$$\begin{aligned} \Vert g_j\Vert _1\le 2,\quad \Vert g_j\Vert _\infty \le M,\quad j=1,\dots ,m. \end{aligned}$$

Then for any \(\eta \in (0,1)\), we have the following bound on the probability:

$$\begin{aligned} \mathbb {P}\left\{ \left| \sum _{j=1}^m g_j\right| \ge m\eta \right\} < 2\exp \left( -\frac{m\eta ^2}{8M}\right) . \end{aligned}$$
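Lemma 2.1 is a Bernstein-type concentration inequality. As a quick sanity check (our own simulation, not part of the argument), one can compare the empirical tail probability with the stated bound for a simple choice of the \(g_j\):

```python
import numpy as np

# Sanity check of the Bernstein-type bound: g_j uniform on [-1, 1],
# so E g_j = 0, E|g_j| = 1/2 <= 2, and |g_j| <= M = 1.
rng = np.random.default_rng(1)
m, M, eta, trials = 100, 1.0, 0.3, 2000

sums = rng.uniform(-1, 1, size=(trials, m)).sum(axis=1)
empirical = np.mean(np.abs(sums) >= m * eta)       # P(|sum_j g_j| >= m*eta)
bound = 2 * np.exp(-m * eta ** 2 / (8 * M))
assert empirical <= bound
```

With these parameters the bound is about \(0.65\) while the empirical tail probability is essentially zero; the lemma trades sharpness for the generality needed in the chaining argument below.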

We now consider measurable functions \(f(\mathbf x)\), \(\mathbf x\in \Omega \). For \(1\le q<\infty \), define

$$\begin{aligned} L^q_\mathbf z(f) := \frac{1}{m}\sum _{j=1}^m |f(\mathbf x^j)|^q -\Vert f\Vert _q^q,\quad \mathbf z:= (\mathbf x^1,\dots ,\mathbf x^m). \end{aligned}$$

Let \(\mu \) be a probability measure on \(\Omega \). Denote by \(\mu ^m := \mu \times \cdots \times \mu \) the product probability measure on \(\Omega ^m := \Omega \times \cdots \times \Omega \). We will need the following inequality, which is a corollary of Lemma 2.1 (see [21]).

Proposition 2.1

Let \(f_j\in L_1(\Omega )\) be such that

$$\begin{aligned} \Vert f_j\Vert _1 \le 1/2,\quad j=1,2;\quad \Vert f_1-f_2\Vert _\infty \le \delta . \end{aligned}$$

Then

$$\begin{aligned} \mu ^m\left\{ \mathbf z: \left| L^1_\mathbf z(f_1) -L^1_\mathbf z(f_2)\right| \ge \eta \right\} < 2\exp \left( -\frac{m\eta ^2}{16\delta }\right) . \end{aligned}$$
(2.1)

We consider the case where X is \({\mathcal {C}}(\Omega )\), the space of functions continuous on a compact subset \(\Omega \) of \({\mathbb {R}}^d\), with the norm

$$\begin{aligned} \Vert f\Vert _\infty := \sup _{\mathbf x\in \Omega }|f(\mathbf x)|. \end{aligned}$$

Sometimes we write \(L_\infty \) instead of \({\mathcal {C}}\) for the space of continuous functions. We use the abbreviated notation

$$\begin{aligned} \varepsilon _n(W):= \varepsilon _n(W,{\mathcal {C}}). \end{aligned}$$

In our case,

$$\begin{aligned} W:= \{t\in X_N: \Vert t\Vert _1 = 1/2\}. \end{aligned}$$
(2.2)

Set

$$\begin{aligned} \varepsilon _k:= \frac{B}{2}\left\{ \begin{array}{ll} N/k, &{}\quad k\le N,\\ 2^{-k/N},&{}\quad k\ge N.\end{array} \right. \end{aligned}$$

Specify \(\eta =1/4\). Define \(\delta _j := \varepsilon _{2^j}\), \(j=0,1,\dots \), and consider minimal \(\delta _j\)-nets \({\mathcal {N}}_j \subset W\) of W in \({\mathcal {C}}(\Omega )\). We use the notation \(N_j:= |{\mathcal {N}}_j|\). Let \(J:=J(N,B)\) be the minimal j satisfying \(\delta _j \le 1/16\). For \(j=1,\dots ,J\), we define a mapping \(A_j\) that associates with a function \(f\in W\) a function \(A_j(f) \in {\mathcal {N}}_j\) closest to f in the \({\mathcal {C}}\) norm. Then, clearly,

$$\begin{aligned} \Vert f-A_j(f)\Vert _\infty \le \delta _j. \end{aligned}$$

We use the mappings \(A_j\), \(j=1,\dots , J\), to associate with a function \(f\in W\) a sequence (a chain) of functions \(f_J, f_{J-1},\dots , f_1\) in the following way:

$$\begin{aligned} f_J := A_J(f),\quad f_j:=A_j(f_{j+1}),\quad j=J-1,\dots ,1. \end{aligned}$$

Let us find an upper bound for J, defined above. Our assumption that \(B\ge 1\) and the definition of J imply that \(2^J\ge N\) and

$$\begin{aligned} B2^{-2^{J-1}/N} \ge 1/8. \end{aligned}$$
(2.3)

We derive from (2.3),

$$\begin{aligned} 2^J \le 2N\log (8B),\quad J \le \log (2N\log (8B)). \end{aligned}$$
(2.4)

Set

$$\begin{aligned} \eta _j := \frac{1}{8J},\quad j=1,\dots ,J. \end{aligned}$$

We now proceed to the estimate of \(\mu ^m\{\mathbf z:\sup _{f\in W}|L^1_\mathbf z(f)|\ge 1/4\}\). First of all, by the following simple Proposition 2.2, the assumption \(\delta _J\le 1/16\) implies that if \(|L^1_\mathbf z(f)| \ge 1/4\), then \(|L^1_\mathbf z(f_J)|\ge 1/8\).

Proposition 2.2

If \(\Vert f_1-f_2\Vert _\infty \le \delta \), then

$$\begin{aligned} \left| L^1_\mathbf z(f_1)-L^1_\mathbf z(f_2)\right| \le 2\delta . \end{aligned}$$

Rewriting

$$\begin{aligned} L^1_\mathbf z(f_J) = L^1_\mathbf z(f_J)-L^1_\mathbf z(f_{J-1}) +\dots +L^1_\mathbf z(f_{2})-L^1_\mathbf z(f_1)+L^1_\mathbf z(f_1), \end{aligned}$$

we conclude that if \(|L^1_\mathbf z(f)| \ge 1/4\), then at least one of the following events occurs:

$$\begin{aligned} \left| L^1_\mathbf z(f_j)-L^1_\mathbf z(f_{j-1})\right| \ge \eta _j\quad \text {for some }j\in (1,J] \quad \text {or}\quad \left| L^1_\mathbf z(f_1)\right| \ge \eta _1. \end{aligned}$$

Therefore,

$$\begin{aligned}&\mu ^m\left\{ \mathbf z:\sup _{f\in W}\left| L^1_\mathbf z(f)\right| \ge 1/4\right\} \le \mu ^m\left\{ \mathbf z:\sup _{f\in {\mathcal {N}}_1}\left| L^1_\mathbf z(f)\right| \ge \eta _1\right\} \nonumber \\&\quad +\sum _{j\in (1,J]}\sum _{f\in {\mathcal {N}}_j}\mu ^m \left\{ \mathbf z:\left| L^1_\mathbf z(f)-L^1_\mathbf z(A_{j-1}(f))\right| \ge \eta _j\right\} \nonumber \\&\quad \le \mu ^m\left\{ \mathbf z:\sup _{f\in {\mathcal {N}}_1}\left| L^1_\mathbf z(f)\right| \ge \eta _1\right\} \nonumber \\&\qquad +\sum _{j\in (1,J]} N_j\sup _{f\in W}\mu ^m \left\{ \mathbf z:\left| L^1_\mathbf z(f)-L^1_\mathbf z(A_{j-1}(f))\right| \ge \eta _j\right\} . \end{aligned}$$
(2.5)

Applying Proposition 2.1, we obtain

$$\begin{aligned} \sup _{f\in W} \mu ^m\left\{ \mathbf z:\left| L^1_\mathbf z(f)-L^1_\mathbf z(A_{j-1}(f))\right| \ge \eta _j\right\} \le 2\exp \left( -\frac{m\eta _j^2}{16\delta _{j-1}}\right) . \end{aligned}$$

We now make further estimates for a specific \(m\ge C_1NBJ^2\) with large enough \(C_1\). For j such that \(2^j\le N\), we obtain from the definition of \(\delta _j\),

$$\begin{aligned} \frac{m\eta _j^2}{\delta _{j-1}} \ge \frac{C_1NBJ^22^{j-1}}{32J^2BN} \ge \frac{C_1}{64}2^j. \end{aligned}$$

By our choice of \(\delta _j=\varepsilon _{2^j}\), we get \(N_j\le 2^{2^j} <e^{2^j}\) and, therefore,

$$\begin{aligned} 2N_j\exp \left( -\frac{m\eta _j^2}{16\delta _{j-1}}\right) \le \exp (-2^j) \end{aligned}$$
(2.6)

for sufficiently large \(C_1\).

In the case \(2^j\in (N, 2^J]\), we have

$$\begin{aligned} \frac{m\eta _j^2}{\delta _{j-1}} \ge \frac{C_1NBJ^2}{32J^2B2^{-2^{j-1}/N}} \ge \frac{C_1}{C'}2^j \end{aligned}$$

and

$$\begin{aligned} 2N_j\exp \left( -\frac{m\eta _j^2}{16\delta _{j-1}}\right) \le \exp (-2^j) \end{aligned}$$
(2.7)

for sufficiently large \(C_1\).

We now estimate \(\mu ^m\{\mathbf z:\sup _{f\in {\mathcal {N}}_1}|L^1_\mathbf z(f)|\ge \eta _1\}\). We use Lemma 2.1 with \(g_j(\mathbf z) = |f(\mathbf x^j)|-\Vert f\Vert _1\). To estimate \(\Vert g_j\Vert _\infty \), we use that by our assumption, \(\varepsilon _1:= \varepsilon _1(X_N^1,{\mathcal {C}}) \le BN\). This means that there exist two points \(y^1\) and \(y^2\) in \(X_N^1\) such that \(X_N^1\subset B_{\mathcal {C}}(y^1,\varepsilon _1)\cup B_{\mathcal {C}}(y^2,\varepsilon _1)\). This implies that the 0 element belongs to one of these balls, say, \(B_{\mathcal {C}}(y^1,\varepsilon _1)\), and, therefore, \(\Vert y^1\Vert _\infty \le \varepsilon _1\). Next, \((y^1+y^2)/2\) belongs to one of those balls, and, therefore, \(\Vert y^1-y^2\Vert _\infty \le 2\varepsilon _1\). Thus, \(\Vert y^i\Vert _\infty \le 3\varepsilon _1\), \(i=1,2\). This implies that for any \(f\in W\), we have

$$\begin{aligned} \Vert f\Vert _\infty \le \frac{1}{2}(4\varepsilon _1) \le 2NB. \end{aligned}$$

Then Lemma 2.1 gives

$$\begin{aligned} \mu ^m\left\{ \mathbf z:\sup _{f\in {\mathcal {N}}_1}\left| L^1_\mathbf z(f)\right| \ge \eta _1\right\} \le N_1\exp \left( -\frac{m\eta _1^2}{CNB}\right) \le 1/4 \end{aligned}$$

for sufficiently large \(C_1\). Substituting the above estimates into (2.5), we obtain

$$\begin{aligned} \mu ^m\left\{ \mathbf z:\sup _{f\in W}\left| L^1_\mathbf z(f)\right| \ge 1/4\right\} <1. \end{aligned}$$

Therefore, there exists \(\mathbf z_0=(\xi ^1,\dots ,\xi ^m)\) such that for any \(f\in W\), we have

$$\begin{aligned} \left| L^1_{\mathbf z_0}(f)\right| \le 1/4. \end{aligned}$$

Taking into account that \(\Vert f\Vert _1=1/2\) for \(f\in W\), we obtain the statement of Theorem 2.1 with constants \(1/2\) and \(3/2\). \(\square \)

In the above proof of Theorem 2.1, we specified \(\eta =1/4\). We can carry out that proof for \(\eta \in (0,1/4]\). In this case we define \(J:=J(N,B,\eta )\) to be the minimal j satisfying \(\delta _j \le \eta /4\) and set \(\eta _j:= \frac{\eta }{2J}\), \(j=1,\dots , J\). Then under assumptions \(\eta \ge 2^{-N}\) and \(\log (2B) \le N\), we obtain the following bound on J:

$$\begin{aligned} J\le 2 \log (2N) \end{aligned}$$

instead of (2.4). Further, we make the estimates for \(m\ge C_1NBJ^2 \eta ^{-2}\). This modification gives the following version of Theorem 2.1.

Theorem 2.2

Suppose that a subspace \(X_N\) satisfies the condition \((B\ge 1)\)

$$\begin{aligned} \varepsilon _k(X^1_N,L_\infty ) \le B\left\{ \begin{array}{ll} N/k, &{}\quad k\le N,\\ 2^{-k/N},&{}\quad k\ge N.\end{array} \right. \end{aligned}$$

Then for large enough absolute constant C and for \(\epsilon \in (0,1)\), there exists a set of

$$\begin{aligned} m \le CNB(\log (2N))^2\epsilon ^{-2} \end{aligned}$$

points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), such that for any \(f\in X_N\), we have

$$\begin{aligned} (1-\epsilon )\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le (1+\epsilon )\Vert f\Vert _1. \end{aligned}$$

3 The Entropy Numbers of \(\mathcal {T}(Q)_1\)

We use the technique developed in [20], which is based on the following two-step strategy. In the first step, we obtain bounds on the best m-term approximations with respect to a dictionary. In the second step, we use general inequalities relating the entropy numbers to the best m-term approximations. We begin the detailed discussion with the second step of the above strategy. Let \({\mathcal {D}}=\{g_j\}_{j=1}^N\) be a system of elements of cardinality \(|{\mathcal {D}}|=N\) in a Banach space X. Consider best m-term approximations of f with respect to \({\mathcal {D}}\),

$$\begin{aligned} \sigma _m(f,{\mathcal {D}})_X:= \inf _{\{c_j\};\Lambda :|\Lambda |=m}\left\| f-\sum _{j\in \Lambda }c_jg_j\right\| _X. \end{aligned}$$

For a function class F, set

$$\begin{aligned} \sigma _m(F,{\mathcal {D}})_X:=\sup _{f\in F}\sigma _m(f,{\mathcal {D}})_X. \end{aligned}$$

The following results are from [16].

Theorem 3.1

Let a compact \(F\subset X\) be such that there exists a system \({\mathcal {D}}\), \(|{\mathcal {D}}|=N\), and a number \(r>0\) such that

$$\begin{aligned} \sigma _m(F,{\mathcal {D}})_X \le m^{-r},\quad m\le N. \end{aligned}$$

Then for \(k\le N\),

$$\begin{aligned} \varepsilon _k(F,X) \le C(r) \left( \frac{\log (2N/k)}{k}\right) ^r. \end{aligned}$$
(3.1)

Remark 3.1

Suppose that a compact F from Theorem 3.1 belongs to an N-dimensional subspace \(X_N:={\text {span}}({\mathcal {D}})\). Then in addition to (3.1) we have for \(k\ge N\),

$$\begin{aligned} \varepsilon _k(F,X) \le C(r)N^{-r}2^{-k/(2N)}. \end{aligned}$$
(3.2)

We point out that Remark 3.1 is formulated for a complex Banach space X. In the case of real Banach space X, we have \(2^{-k/N}\) instead of \(2^{-k/(2N)}\) in (3.2).

We begin with the best m-term approximation of elements of \(\mathcal {T}(Q)_1:=\{f\in \mathcal {T}(Q):\Vert f\Vert _1\le 1\}\) in \(L_2\) with respect to a special dictionary \({\mathcal {D}}^1:={\mathcal {D}}^1(Q)\) associated with Q. Define

$$\begin{aligned} {\mathcal {D}}_Q(\mathbf x) := \sum _{\mathbf k\in Q} e^{i(\mathbf k,\mathbf x)},\quad w_Q := |Q|^{-1/2}{\mathcal {D}}_Q. \end{aligned}$$

Then \(\Vert w_Q\Vert _2 =1\). Consider the dictionary

$$\begin{aligned} {\mathcal {D}}^1:= {\mathcal {D}}^1(Q):= \{w_Q(\mathbf x-\mathbf y)\}_{\mathbf y\in \mathbb {T}^d}. \end{aligned}$$

For a dictionary \({\mathcal {D}}\) in a Hilbert space H with an inner product \(\langle \cdot ,\cdot \rangle \), denote by \(A_1({\mathcal {D}})\) the closure of the convex hull of the dictionary \({\mathcal {D}}\). In the case of a complex Hilbert space, define the symmetrized dictionary \({\mathcal {D}}^s := \{e^{i\theta } g: g\in {\mathcal {D}}, \theta \in [0,2\pi ]\}\). We use the weak orthogonal greedy algorithm (weak orthogonal matching pursuit) for m-term approximation. We recall the corresponding definition and formulate the known result, which we will use.

Weak Orthogonal Greedy Algorithm (WOGA) Let \(t\in (0,1]\) be a weakness parameter. We define \(f^{o,t}_0 :=f\). Then for each \(m\ge 1\), we inductively define:

  (1) \(\varphi ^{o,t}_m \in {\mathcal {D}}\) is any element satisfying

    $$\begin{aligned} \left| \left\langle f^{o,t}_{m-1},\varphi ^{o,t}_m\right\rangle \right| \ge t \sup _{g\in {\mathcal {D}}} \left| \left\langle f^{o,t}_{m-1},g\right\rangle \right| . \end{aligned}$$

  (2) Let \(H_m^t := {\text {span}}(\varphi _1^{o,t},\dots ,\varphi ^{o,t}_m)\), and let \(P_{H_m^t}(f)\) denote the operator of orthogonal projection onto \(H_m^t\). Define

    $$\begin{aligned} G_m^{o,t}(f,{\mathcal {D}}) := P_{H_m^t}(f). \end{aligned}$$

  (3) Define the residual after the mth iteration of the algorithm:

    $$\begin{aligned} f^{o,t}_m := f-G_m^{o,t}(f,{\mathcal {D}}). \end{aligned}$$

In the case \(t=1\), the WOGA is called the orthogonal greedy algorithm (OGA). The following theorem is from [12] (see also [15, Ch. 2]).

Theorem 3.2

Let \({\mathcal {D}}\) be an arbitrary dictionary in H. Then for each \(f \in A_1({\mathcal {D}}^s)\), we have

$$\begin{aligned} \left\| f-G^{o,t}_m(f,{\mathcal {D}})\right\| \le (1+mt^2)^{-1/2}. \end{aligned}$$
(3.3)
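With \(t=1\) the WOGA is the orthogonal greedy algorithm, known in signal processing as orthogonal matching pursuit, and the guarantee (3.3) can be observed numerically. A sketch with a finite random dictionary of our own choosing (for f in the convex hull of unit-norm atoms we have \(f\in A_1({\mathcal {D}}^s)\), so the bound applies):

```python
import numpy as np

def oga(f, D, m):
    """Orthogonal greedy algorithm (OGA, t = 1) over a dictionary D
    whose columns are unit-norm atoms; returns the residual f_m."""
    idx, res = [], f.copy()
    for _ in range(m):
        j = int(np.argmax(np.abs(D.T @ res)))          # most correlated atom
        if j not in idx:
            idx.append(j)
        P = D[:, idx]
        coef, *_ = np.linalg.lstsq(P, f, rcond=None)   # orthogonal projection
        res = f - P @ coef
    return res

rng = np.random.default_rng(2)
n, K = 50, 200
D = rng.normal(size=(n, K))
D /= np.linalg.norm(D, axis=0)                     # unit-norm dictionary
f = D @ rng.dirichlet(np.ones(K))                  # f in the convex hull A_1(D)

for m in range(1, 8):                              # bound (3.3) with t = 1
    assert np.linalg.norm(oga(f, D, m)) <= (1 + m) ** -0.5 + 1e-9
```

In the proof of Theorem 3.3 below, this algorithm is applied not to a generic random dictionary but to the translates \(w_Q(\mathbf x-\mathbf y)\).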

We now prove the following assertion.

Theorem 3.3

For any finite \(Q\subset \mathbb {Z}^d\), we have

$$\begin{aligned} \sigma _m(\mathcal {T}(Q)_1,{\mathcal {D}}^1(Q))_2 \le (|Q|/m)^{1/2}. \end{aligned}$$

Proof

Each \(f\in \mathcal {T}(Q)_1\) has a representation

$$\begin{aligned} f(\mathbf x) = (2\pi )^{-d} \int _{\mathbb {T}^d} f(\mathbf y){\mathcal {D}}_Q(\mathbf x-\mathbf y)\mathrm{d}\mathbf y= |Q|^{1/2}(2\pi )^{-d} \int _{\mathbb {T}^d} f(\mathbf y)w_Q(\mathbf x-\mathbf y)\mathrm{d}\mathbf y. \end{aligned}$$
(3.4)

It follows from \(\Vert f\Vert _1 \le 1\) and (3.4) that \(f|Q|^{-1/2} \in A_1(({\mathcal {D}}^1)^s)\). Therefore, by Theorem 3.2 we get the required bound. \(\square \)

The dictionary \({\mathcal {D}}^1(Q)\) is infinite. In our further applications we would like to have a finite dictionary. Here we consider \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), where \(\Pi (\mathbf N):=[-N_1,N_1]\times \cdots \times [-N_d,N_d]\), \(\mathbf N=(N_1,\dots ,N_d)\). We write

$$\begin{aligned} P(\mathbf N)&:= \bigl \{\mathbf n = (n_1 ,\dots ,n_d):\ n_j \text { are nonnegative integers},\\&\;\quad 0\le n_j\le 2N_j ,\quad j = 1,\dots ,d \bigr \}, \end{aligned}$$

and set

$$\begin{aligned} \mathbf x^{\mathbf n}:=\left( \frac{2\pi n_1}{2N_1+1},\dots ,\frac{2\pi n_d}{2N_d+1}\right) ,\quad \mathbf n\in P(\mathbf N) . \end{aligned}$$

Then for any \(t\in \mathcal {T}(\Pi (\mathbf N))\) (see [23, Ch. 10]),

$$\begin{aligned} \vartheta (\mathbf N)^{-1}\sum _{\mathbf n\in P(\mathbf N)} \bigl |t(\mathbf x^{\mathbf n})\bigr |\le C(d)\Vert t\Vert _1, \end{aligned}$$
(3.5)

where \(\vartheta (\mathbf N) := \prod _{j=1}^d (2N_j + 1)=\dim \mathcal {T}(\Pi (\mathbf N))\). Specify \(\mathbf N:=(2^n,\dots ,2^n)\) and define

$$\begin{aligned} {\mathcal {D}}^2:={\mathcal {D}}^2(Q):= \left\{ w_Q(\mathbf x-\mathbf x^\mathbf n)\right\} _{\mathbf n\in P(\mathbf N)}. \end{aligned}$$

Then, clearly, \(|{\mathcal {D}}^2(Q)|=\vartheta (\mathbf N) = (2^{n+1}+1)^d\). Also, it is well known that for \(f\in \mathcal {T}(\Pi (\mathbf N))\), one has

$$\begin{aligned} f(\mathbf x) = \vartheta (\mathbf N)^{-1}\sum _{\mathbf n\in P(\mathbf N)} f(\mathbf x^{\mathbf n}){\mathcal {D}}_{\Pi (\mathbf N)}(\mathbf x-\mathbf x^\mathbf n), \end{aligned}$$
(3.6)

and, therefore, for \(f\in \mathcal {T}(Q)\), \(Q\subset \Pi (\mathbf N)\),

$$\begin{aligned} f(\mathbf x) = \vartheta (\mathbf N)^{-1}\sum _{\mathbf n\in P(\mathbf N)} f(\mathbf x^{\mathbf n}){\mathcal {D}}_Q(\mathbf x-\mathbf x^\mathbf n). \end{aligned}$$
(3.7)

In particular, (3.5) and (3.7) imply that there exists \(C(d)>0\) such that for every \(f\in \mathcal {T}(Q)_1\), we have \(C(d)^{-1}|Q|^{-1/2} f\in A_1(({\mathcal {D}}^2)^s)\). Therefore, we have the following version of Theorem 3.3.

Theorem 3.4

For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), we have

$$\begin{aligned} \sigma _m(\mathcal {T}(Q)_1,{\mathcal {D}}^2(Q))_2 \le C(d)(|Q|/m)^{1/2} \end{aligned}$$

and \(|{\mathcal {D}}^2(Q)|\le C'(d)2^{nd}\).
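Theorems 3.3 and 3.4 rest on the exact reproducing formulas (3.4) and (3.6). For \(d=1\), (3.6) is the classical interpolation of a trigonometric polynomial with frequencies \(|k|\le N_1\) from its values at \(2N_1+1\) equispaced nodes via the Dirichlet kernel, which can be checked directly (a sketch with our own parameters):

```python
import numpy as np

# d = 1 instance of (3.6): a trigonometric polynomial with frequencies
# |k| <= N1 is reconstructed exactly from its values at the 2*N1 + 1
# equispaced nodes x^n via the Dirichlet kernel D_{Pi(N)}.
N1 = 4
theta = 2 * N1 + 1                                 # vartheta(N)
k = np.arange(-N1, N1 + 1)
rng = np.random.default_rng(3)
c = rng.normal(size=k.size) + 1j * rng.normal(size=k.size)

def f(x):
    return (c * np.exp(1j * k * np.atleast_1d(x)[:, None])).sum(axis=1)

def dirichlet(x):
    return np.exp(1j * k * np.atleast_1d(x)[:, None]).sum(axis=1)

nodes = 2 * np.pi * np.arange(theta) / theta       # x^n, n = 0, ..., 2*N1
x = rng.uniform(0, 2 * np.pi, size=7)              # arbitrary test points
interp = np.array([(f(nodes) * dirichlet(t - nodes)).sum() / theta for t in x])
assert np.allclose(interp, f(x))
```

Replacing the full Dirichlet kernel \({\mathcal {D}}_{\Pi (\mathbf N)}\) by \({\mathcal {D}}_Q\) as in (3.7) projects the same samples onto the frequencies in Q.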

Theorems 3.3 and 3.4 provide bounds for the best m-term approximation of elements of \(\mathcal {T}(Q)_1\) in the \(L_2\) norm. For applications in the Marcinkiewicz discretization theorem, we need bounds for the entropy numbers in the \(L_\infty \) norm. As we explained above, we derive appropriate bounds for the entropy numbers from the corresponding bounds on the best m-term approximations with the help of Theorem 3.1. Thus we need bounds on the best m-term approximations in the \(L_\infty \) norm. We proceed in the same way as in [20] and use the following dictionary:

$$\begin{aligned} {\mathcal {D}}^\mathcal {T}:={\mathcal {D}}^\mathcal {T}(Q):= \{e^{i(\mathbf k,\mathbf x)}: \mathbf k\in Q\}. \end{aligned}$$

In order to obtain the bounds in the \(L_\infty \) norm, we use the following theorem from [20], which in turn is a corollary of the corresponding result from [18].

Theorem 3.5

Let \(\Lambda \subset \Pi (\mathbf N)\), with \(N_j=2^n\), \(j=1,\dots ,d\). There exist constructive greedy-type approximation methods \(G^\infty _m(\cdot )\) that provide m-term polynomials with respect to \(\mathcal {T}^d\) with the following properties:

for \(f\in \mathcal {T}(\Lambda )\), we have \(G^\infty _m(f)\in \mathcal {T}(\Lambda )\) and

$$\begin{aligned} \left\| f-G^\infty _m(f)\right\| _\infty \le C_3(d)({\bar{m}})^{-1/2}n^{1/2}|\Lambda |^{1/2}\Vert f\Vert _2,\quad \bar{m} := \max (m,1). \end{aligned}$$

We now consider a dictionary

$$\begin{aligned} {\mathcal {D}}^3:={\mathcal {D}}^3(Q):= {\mathcal {D}}^2(Q)\cup {\mathcal {D}}^\mathcal {T}(Q). \end{aligned}$$

Lemma 3.1

For any \(Q\subset \Pi (\mathbf N)\), with \(\mathbf N=(2^n,\dots ,2^n)\), we have

$$\begin{aligned} \sigma _m\left( \mathcal {T}(Q)_1,{\mathcal {D}}^3(Q)\right) _\infty \le C(d)n^{1/2}|Q|/m \end{aligned}$$

and \(|{\mathcal {D}}^3(Q)|\le C'(d)2^{nd}\).

Proof

Take \(f\in \mathcal {T}(Q)_1\). Applying first Theorem 3.4 with \([m/2]\) and then Theorem 3.5 with \(\Lambda = Q\) and \([m/2]\), we obtain

$$\begin{aligned} \sigma _m(f,{\mathcal {D}}^3(Q))_\infty \ll n^{1/2}(| Q|/m)\Vert f\Vert _1, \end{aligned}$$

which proves the lemma. \(\square \)

Lemma 3.1, Theorem 3.1, and Remark 3.1 imply the following result on the entropy numbers.

Theorem 3.6

For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), we have

$$\begin{aligned} \varepsilon _k(\mathcal {T}( Q)_1,L_\infty ) \ll \left\{ \begin{array}{ll} n^{3/2}(|Q|/k), &{}\quad k\le 2| Q|,\\ n^{3/2}2^{-k/(2| Q|)},&{}\quad k\ge 2| Q|.\end{array} \right. \end{aligned}$$

The above theorem with \(Q=Q_n\) can be used for proving the upper bounds for the entropy numbers of the mixed smoothness classes. We define the classes that were studied in [19, 20].

Let \(\mathbf s=(s_1,\dots ,s_d)\) be a vector with non-negative integer coordinates (\(\mathbf s\in \mathbb {Z}^d_+\)) and, as above,

$$\begin{aligned} \rho (\mathbf s):= \left\{ \mathbf k=(k_1,\dots ,k_d)\in \mathbb {Z}^d : [2^{s_j-1}]\le |k_j|<2^{s_j},\ j=1,\dots ,d\right\} , \end{aligned}$$

where [a] denotes the integer part of a number a. Define for \(f\in L_1\),

$$\begin{aligned} \delta _\mathbf s(f) := \sum _{\mathbf k\in \rho (\mathbf s)} \hat{f}(\mathbf k) e^{i(\mathbf k,\mathbf x)} \end{aligned}$$

and

$$\begin{aligned} f_l:=\sum _{\Vert \mathbf s\Vert _1=l}\delta _\mathbf s(f), \quad l\in {\mathbb {N}}_0,\quad {\mathbb {N}}_0:={\mathbb {N}}\cup \{0\}. \end{aligned}$$
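The blocks \(\rho(\mathbf s)\), \(\mathbf s\in\mathbb Z^d_+\), partition \(\mathbb Z^d\): every frequency \(\mathbf k\) lies in exactly one block, so the sum \(\sum_{\mathbf s}\delta_{\mathbf s}(f)\) recovers f. A brute-force check for \(d=2\) (the box \([-15,15]^2\) and the range \(s_j\le 4\) are illustrative and suffice since \(15<2^4\)):

```python
from itertools import product

# [2^{s-1}] <= |k| < 2^s per coordinate, where [.] is the integer part
# (so s = 0 gives the single value k = 0).
def in_rho(k, s):
    return all((2 ** (sj - 1) if sj > 0 else 0) <= abs(kj) < 2 ** sj
               for kj, sj in zip(k, s))

S = list(product(range(5), repeat=2))      # all s = (s1, s2) with 0 <= s_j <= 4
for k in product(range(-15, 16), repeat=2):
    hits = sum(in_rho(k, s) for s in S)
    assert hits == 1                       # k lies in exactly one rho(s)
print("partition verified")
```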

Consider the class (see [19])

$$\begin{aligned} \mathbf W^{a,b}_q:=\left\{ f: \Vert f_l\Vert _q \le 2^{-al}(\bar{l})^{(d-1)b}\right\} ,\quad \bar{l}:=\max (l,1). \end{aligned}$$

Define

$$\begin{aligned} \Vert f\Vert _{\mathbf W^{a,b}_q} := \sup _l \Vert f_l\Vert _q 2^{al}(\bar{l})^{-(d-1)b}. \end{aligned}$$

Here is one more class, which is equivalent to \(\mathbf W^{a,b}_q\) in the case \(1<q<\infty \) (see [19]). Consider a class \({\bar{\mathbf W}}^{a,b}_q\), which consists of functions f with a representation

$$\begin{aligned} f=\sum _{n=1}^\infty t_n, \quad t_n\in \mathcal {T}(Q_n), \quad \Vert t_n\Vert _q \le 2^{-an} n^{b(d-1)}. \end{aligned}$$

In the case \(q=1\), classes \({\bar{\mathbf W}}^{a,b}_1\) are wider than \(\mathbf W^{a,b}_1\).

The following theorem was proved in [20].

Theorem 3.7

Let \(d=2\) and \(a>1\). Then

$$\begin{aligned} \varepsilon _k\left( \mathbf W^{a,b}_1,L_\infty \right) \asymp \varepsilon _k\left( {\bar{\mathbf W}}^{a,b}_1,L_\infty \right) \asymp k^{-a} (\log k)^{a+b+1/2}. \end{aligned}$$
(3.8)

We prove here an extension of Theorem 3.7 to all d. We note that this extension—Theorem 3.8—is weaker than Theorem 3.7 in the case \(d=2\).

Theorem 3.8

Let \(a>1\). Then

$$\begin{aligned} \varepsilon _k\left( \mathbf W^{a,b}_1,L_\infty \right) \le \varepsilon _k\left( {\bar{\mathbf W}}^{a,b}_1,L_\infty \right) \ll k^{-a} (\log k)^{(a+b)(d-1)+3/2}. \end{aligned}$$
(3.9)

Proof

The proof is based on the following general result from [20]. Let X and Y be two Banach spaces. We discuss a problem of estimating the entropy numbers of an approximation class, defined in the space X, in the norm of the space Y. Suppose a sequence of finite dimensional subspaces \(X_n \subset X\), \(n=1,\dots \), is given. Define the following class:

$$\begin{aligned} {\bar{\mathbf W}}^{a,b}_X:={\bar{\mathbf W}}^{a,b}_X\{X_n\}:= \left\{ f\in X: f=\sum _{n=1}^\infty f_n,\quad f_n\in X_n,\quad \Vert f_n\Vert _X \le 2^{-an}n^{b},\ n=1,2,\dots \right\} . \end{aligned}$$

In particular,

$$\begin{aligned} {\bar{\mathbf W}}^{a,b}_q = {\bar{\mathbf W}}^{a,b(d-1)}_{L_q}\{\mathcal {T}(Q_n)\} . \end{aligned}$$

Set \(D_n:=\dim X_n\), and assume that for the unit balls \(B(X_n):=\{f\in X_n: \Vert f\Vert _X\le 1\}\), we have the following upper bounds for the entropy numbers: there exist real \(\alpha \) and non-negative \(\gamma \) and \(\beta \in (0,1]\) such that

$$\begin{aligned} \varepsilon _k(B(X_n),Y) \ll n^\alpha \left\{ \begin{array}{ll} (D_n/(k+1))^\beta (\log (4D_n/(k+1)))^\gamma , &{}\quad k\le 2D_n,\\ 2^{-k/(2D_n)},&{}\quad k\ge 2D_n.\end{array} \right. \end{aligned}$$
(3.10)

Theorem 3.9

Assume \(D_n \asymp 2^n n^c\), \(c\ge 0\), \(a>\beta \), and subspaces \(\{X_n\}\) satisfy (3.10). Then

$$\begin{aligned} \varepsilon _k\left( \bar{\mathbf W}^{a,b}_X\{X_n\},Y\right) \ll k^{-a} (\log k)^{ac+b+\alpha }. \end{aligned}$$
(3.11)

Theorem 3.6 with \(Q=Q_n\) yields (3.10) with \(\alpha =3/2\), \(\beta =1\), \(\gamma =0\). It remains to apply Theorem 3.9 with \(X_n=\mathcal {T}(Q_n)\) and \(c=d-1\). \(\square \)

4 The Marcinkiewicz-Type Discretization Theorem for the Trigonometric Polynomials

In this section we improve Theorem 1.3 from the introduction, which was proved in [21], in two directions: we prove the Marcinkiewicz-type discretization theorem for \(\mathcal {T}(Q)\) with a rather general Q instead of \(\mathcal {T}(Q_n)\), and, even in this more general setting, we improve the bound from \(m \le C_1(d)|Q_n|n^{d/2+3}\) to \(m \le C_1(d)|Q_n|n^{7/2}\), similar to the bound in Theorem 1.2. Our proof is based on the conditional Theorems 2.1 and 3.6.

We now prove the Marcinkiewicz-type theorem for discretization of the \(L_1\) norm of polynomials from \(\mathcal {T}(Q)\).

Theorem 4.1

There exists a constant \(C_1(d)\) with the following property: for any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), there exists a set of \(m \le C_1(d)|Q|n^{7/2}\) points \(\xi ^j\in \mathbb {T}^d\), \(j=1,\dots ,m\), such that for any \(f\in \mathcal {T}(Q)\), we have

$$\begin{aligned} C_2\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le C_3\Vert f\Vert _1. \end{aligned}$$

Proof

We use Theorem 3.6, proved in Sect. 3, which we restate here as Theorem 4.2 for the reader’s convenience. We stress that Theorem 4.2 is the only result about the specific features of \(\mathcal {T}(Q)\) used in the proof of Theorem 4.1.

Theorem 4.2

For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), we have

$$\begin{aligned} \varepsilon _k(\mathcal {T}( Q)_1,L_\infty ) \le C_4(d) \left\{ \begin{array}{ll} n^{3/2}(|Q|/k), &{}\quad k\le 2| Q|,\\ n^{3/2}2^{-k/(2| Q|)},&{}\quad k\ge 2| Q|.\end{array} \right. \end{aligned}$$

We now apply Theorem 2.1 with \(N=2|Q|\) and \(B=C_4(d)n^{3/2}\). This completes the proof. \(\square \)

In the above proof of Theorem 2.1, we specified \(\eta =1/4\). If instead we take \(\eta \in [2^{-2^{nd/2}},1/4]\), define \(J(\eta )\) to be the minimal j satisfying \(\delta _j \le \eta /4\), and set

$$\begin{aligned} \eta _j := \frac{\eta }{4nd}, \end{aligned}$$

then we obtain the following generalization of Theorem 4.1.

Theorem 4.3

For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\) and \(\epsilon \in [2^{1-2^{nd/2}},1/2]\), there exists a set of \(m \le C_1(d)|Q|n^{7/2}\epsilon ^{-2}\) points \(\xi ^j\in \mathbb {T}^d\), \(j=1,\dots ,m\), such that for any \(f\in \mathcal {T}(Q)\), we have

$$\begin{aligned} (1-\epsilon )\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le (1+\epsilon )\Vert f\Vert _1. \end{aligned}$$

5 Some Marcinkiewicz-Type Discretization Theorems for General Polynomials

In this section we extend the technique developed in Sects. 3 and 4 to the case of a general orthonormal system \(\{u_i\}_{i=1}^N\) on a compact \(\Omega \subset {\mathbb {R}}^d\), which satisfies conditions A, B, and C from the introduction. Let \(\mu \) be a probability measure on \(\Omega \). It is convenient for us to assume that \(u_i\), \(i=1,\dots ,N\), are real functions and set

$$\begin{aligned} \langle u,v\rangle := \int _\Omega u(\mathbf x)v(\mathbf x) \mathrm{d}\mu , \quad \Vert u\Vert _2 := \langle u,u\rangle ^{1/2}. \end{aligned}$$

Denote the unit \(L_p\) ball in \(X_N\) by

$$\begin{aligned} X^p_N :=\{f\in X_N: \Vert f\Vert _p\le 1\}. \end{aligned}$$

We begin with the estimates of the entropy numbers \(\varepsilon _k(X^1_N,L_\infty )\). We use the same strategy as above: first we get bounds on m-term approximations for \(X^1_N\) in \(L_2\) with respect to a dictionary \({\mathcal {D}}^1\), second we obtain bounds on m-term approximations for \(X^2_N\) in \(L_\infty \) with respect to a dictionary \({\mathcal {D}}^2\), third we get bounds on m-term approximations for \(X^1_N\) in \(L_\infty \) with respect to a dictionary \({\mathcal {D}}^3={\mathcal {D}}^1\cup {\mathcal {D}}^2\). Then we apply Theorem 3.1 to obtain the entropy numbers estimates.

5.1 Sparse Approximation in \(L_2\)

We begin with the study of m-term approximations with respect to the dictionary

$$\begin{aligned} {\mathcal {D}}^0:= \{g_\mathbf y\}_{\mathbf y\in \Omega },\quad g_\mathbf y(\mathbf x):= (K_2N)^{-1/2}{\mathcal {D}}_N(\mathbf x,\mathbf y), \end{aligned}$$

where

$$\begin{aligned} {\mathcal {D}}_N(\mathbf x,\mathbf y) := \sum _{i=1}^N u_i(\mathbf x)u_i(\mathbf y) \end{aligned}$$

is the Dirichlet kernel for the system \(\{u_i\}_{i=1}^N\). Then assumption B guarantees that \(\Vert g_\mathbf y\Vert _2 \le 1\). We now use the following greedy-type algorithm (see [15, p. 82]).

Relaxed Greedy Algorithm (RGA) Let \(f^r_0:=f\) and \(G_0^r(f):= 0\). For a function h from a real Hilbert space H, let \(g=g(h)\) denote the function from \({\mathcal {D}}^\pm :=\{\pm g:g\in {\mathcal {D}}\}\) that maximizes \(\langle h,g\rangle \) (we assume the existence of such an element). Then, for each \(m\ge 1\), we inductively define

$$\begin{aligned} G_m^r(f):= \left( 1-\frac{1}{m}\right) G_{m-1}^r(f)+\frac{1}{m}g\left( f_{m-1}^r\right) , \quad f_m^r:= f-G_m^r(f). \end{aligned}$$

We use the following known result (see [15, p. 90]).

Theorem 5.1

For the relaxed greedy algorithm, we have, for each \(f\in A_1({\mathcal {D}}^\pm )\), the estimate

$$\begin{aligned} \left\| f-G_m^r(f)\right\| \le \frac{2}{\sqrt{m}},\quad m\ge 1. \end{aligned}$$
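As a sanity check of Theorem 5.1, the RGA can be run in a small finite dimensional Hilbert space. The sketch below (in \(\mathbb R^n\) with a finite random dictionary; all sizes and the seed are illustrative assumptions) builds a target \(f\in A_1({\mathcal D}^\pm)\) as a convex combination of dictionary elements and verifies the bound \(\Vert f-G_m^r(f)\Vert \le 2/\sqrt m\) at every step.

```python
import numpy as np

# Minimal sketch of the Relaxed Greedy Algorithm in the Hilbert space R^n.
rng = np.random.default_rng(1)
n, K = 20, 50
D = rng.standard_normal((K, n))
D /= np.linalg.norm(D, axis=1, keepdims=True)    # dictionary of unit vectors
w = rng.random(K); w /= w.sum()
f = w @ D                    # convex combination  =>  f is in A_1(D^pm)

G = np.zeros(n)              # G_0^r(f) = 0
for m in range(1, 401):
    s = D @ (f - G)          # inner products <f_{m-1}^r, g> over the dictionary
    j = np.argmax(np.abs(s))
    g = np.sign(s[j]) * D[j]                 # maximizer over D^pm
    G = (1 - 1 / m) * G + g / m              # relaxation step of the RGA
    # The guarantee of Theorem 5.1:
    assert np.linalg.norm(f - G) <= 2 / np.sqrt(m) + 1e-12
print(np.linalg.norm(f - G))                 # final error after 400 iterations
```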

In our application of the above RGA, the Hilbert space H is the \(X_N\) with the \(L_2\) norm and the dictionary \({\mathcal {D}}\) is the \({\mathcal {D}}^0\) defined above. Using the representation

$$\begin{aligned} f(\mathbf x) = \int _{\Omega } f(\mathbf y) {\mathcal {D}}_N(\mathbf x,\mathbf y)\mathrm{d}\mu (\mathbf y), \end{aligned}$$
(5.1)

we see that the search for \(g\in ({\mathcal {D}}^0)^\pm \) maximizing \(\langle h,g\rangle \), \(h\in X_N\), is equivalent to the search for \(\mathbf y\in \Omega \) maximizing \(|h(\mathbf y)|\). A function h from \(X_N\) is continuous on the compact \(\Omega \), and therefore such a maximizing \(\mathbf y_{\max }\) exists. This means that we can run the RGA.

For \(f\in X^1_N\), by representation (5.1) we obtain

$$\begin{aligned} f(\mathbf x)&= \int _{\Omega } f(\mathbf y) {\mathcal {D}}_N(\mathbf x,\mathbf y)\mathrm{d}\mu (\mathbf y)\\&= (K_2N)^{1/2}\int _{\Omega } f(\mathbf y) (K_2N)^{-1/2}{\mathcal {D}}_N(\mathbf x,\mathbf y)\mathrm{d}\mu (\mathbf y). \end{aligned}$$

Therefore,

$$\begin{aligned} (K_2N)^{-1/2} f \in A_1(({\mathcal {D}}^0)^\pm )\quad \text {or}\quad f\in A_1(({\mathcal {D}}^0)^\pm ,(K_2N)^{1/2}). \end{aligned}$$

Applying Theorem 5.1, we get the following result.

Theorem 5.2

For the relaxed greedy algorithm with respect to \({\mathcal {D}}^0\), we have, for each \(f\in X^1_N\), the estimate

$$\begin{aligned} \Vert f-G_m^r(f)\Vert \le 2(K_2N/m)^{1/2},\quad m\ge 1. \end{aligned}$$

We need an analog of Theorem 5.2 for a discrete version of \({\mathcal {D}}^0\). Take a \(\delta >0\) and let \(\{\mathbf y^1,\dots ,\mathbf y^M\}\), \(M=M(\delta )\), be a \(\delta \)-net of points in \(\Omega \), which means that for any \(\mathbf y\in \Omega \), there is a \(\mathbf y^j\) from the net such that \(\Vert \mathbf y-\mathbf y^j\Vert _\infty \le \delta \). It is clear that

$$\begin{aligned} M(\delta ) \le (C(\Omega )/\delta )^d. \end{aligned}$$
(5.2)

It follows from the definition of the RGA that \(G_m^r(f)\in A_1({\mathcal {D}}^\pm )\) provided \(f\in A_1({\mathcal {D}}^\pm )\). Let \(f\in X^1_N\), and let \(G_m^r(f)\) be its approximant from Theorem 5.2. Then

$$\begin{aligned} G_m^r(f) = \sum _{k=1}^m c_kg_{\mathbf y(k)},\quad \sum _{k=1}^m |c_k| \le (K_2N)^{1/2}. \end{aligned}$$
(5.3)

For each \(\mathbf y(k)\), find \(\mathbf y^{j(k)}\) from the net such that \(\Vert \mathbf y(k)-\mathbf y^{j(k)}\Vert _\infty \le \delta \). Then, using assumption A, we get

$$\begin{aligned} \Vert g_{\mathbf y(k)}-g_{\mathbf y^{j(k)}}\Vert _2^2 = (K_2N)^{-1}\sum _{i=1}^N |u_i(\mathbf y(k))-u_i(\mathbf y^{j(k)})|^2 \end{aligned}$$
$$\begin{aligned} \le (K_2N)^{-1} K_1^2 N^{1+2\beta } \delta ^{2\alpha }. \end{aligned}$$
(5.4)

Let

$$\begin{aligned} t_m(f) := \sum _{k=1}^m c_kg_{\mathbf y^{j(k)}}. \end{aligned}$$

Combining (5.3) and (5.4), we obtain

$$\begin{aligned} \Vert G_m^r(f)-t_m(f)\Vert _2 \le (K_2N)^{1/2}K_2^{-1/2} K_1N^\beta \delta ^\alpha . \end{aligned}$$
(5.5)

Choosing \(\delta \) such that

$$\begin{aligned} \delta _0^\alpha = K_1^{-1}N^{-1/2-\beta }, \end{aligned}$$

we obtain by Theorem 5.2 and (5.5) that for \(f\in X^1_N\),

$$\begin{aligned} \Vert f-t_m(f)\Vert _2 \le 3(K_2N/m)^{1/2},\quad m\le N. \end{aligned}$$
(5.6)

Inequality (5.2) gives

$$\begin{aligned} M(\delta _0) \le C(K_1,\Omega ,d) N^{c(\alpha ,\beta ,d)}. \end{aligned}$$
(5.7)

Define the dictionary \({\mathcal {D}}^1\) as follows:

$$\begin{aligned} {\mathcal {D}}^1 := \{g_{\mathbf y^j}\}_{j=1}^M. \end{aligned}$$

Relation (5.6) gives us the following theorem.

Theorem 5.3

We have

$$\begin{aligned} \sigma _m(X^1_N,{\mathcal {D}}^1)_2 \le 3(K_2N/m)^{1/2}. \end{aligned}$$

5.2 Sparse Approximation in \(L_\infty \)

In this subsection we study m-term approximations of \(f\in X^2_N\) in the \(L_\infty \) norm with respect to the following dictionary:

$$\begin{aligned} {\mathcal {D}}^2 := \{\pm g_i\}_{i=1}^N,\quad g_i:= u_i K_2^{-1/2}. \end{aligned}$$

Then by property B, for all p we have \(\Vert g_i\Vert _p \le 1\).

In this subsection we use greedy algorithms in Banach spaces. We recall some notation from the theory of greedy approximation in Banach spaces; the reader can find a systematic presentation of this theory in [15, Chapter 6]. Let X be a Banach space with norm \(\Vert \cdot \Vert \). We say that a set of elements (functions) \({\mathcal D}\) from X is a dictionary if each \(g\in {\mathcal D}\) has norm at most one (\(\Vert g\Vert \le 1\)) and the closure of \({\text {span}}{\mathcal D}\) coincides with X. We note that in [13] the definition of a dictionary required normalization of its elements (\(\Vert g\Vert =1\)). However, as pointed out in [14], it is easy to check that the arguments from [13] work under the assumption \(\Vert g\Vert \le 1\) instead of \(\Vert g\Vert =1\). In applications, the assumption \(\Vert g\Vert \le 1\) is more convenient for us than normalization of the dictionary.

For an element \(f\in X\), we denote by \(F_f\) a norming (peak) functional for f:

$$\begin{aligned} \Vert F_f\Vert =1,\quad F_f(f) =\Vert f\Vert . \end{aligned}$$

The existence of such a functional is guaranteed by the Hahn–Banach theorem.

We proceed to the incremental greedy algorithm (see [14] and [15, Chapter 6]). Let \(\epsilon =\{\epsilon _n\}_{n=1}^\infty \), \(\epsilon _n> 0\), \(n=1,2,\dots \). For a Banach space X and a dictionary \({\mathcal {D}}\), define the following algorithm IA(\(\epsilon \)) \(:=\) IA(\(\epsilon ,X,{\mathcal {D}}\)).

Incremental Algorithm with Schedule \(\varvec{\epsilon }\) (IA(\(\varvec{\epsilon ,X},{\mathcal {D}}\))) Set \(f_0^{i,\epsilon }:= f\) and \(G_0^{i,\epsilon } :=0\). Then, for each \(m\ge 1\), we have the following inductive definition:

(1) \(\varphi _m^{i,\epsilon } \in {\mathcal {D}}\) is any element satisfying

    $$\begin{aligned} F_{f_{m-1}^{i,\epsilon }}\left( \varphi _m^{i,\epsilon }-f\right) \ge -\epsilon _m. \end{aligned}$$

(2) Define

    $$\begin{aligned} G_m^{i,\epsilon }:= (1-1/m)G_{m-1}^{i,\epsilon } +\varphi _m^{i,\epsilon }/m. \end{aligned}$$

(3) Let

    $$\begin{aligned} f_m^{i,\epsilon } := f- G_m^{i,\epsilon }. \end{aligned}$$

We consider here approximation in uniformly smooth Banach spaces. For a Banach space X, we define the modulus of smoothness

$$\begin{aligned} \rho (u) := \sup _{\Vert x\Vert =\Vert y\Vert =1}\left( \frac{1}{2}(\Vert x+uy\Vert +\Vert x-uy\Vert )-1\right) . \end{aligned}$$

A uniformly smooth Banach space is one with the property

$$\begin{aligned} \lim _{u\rightarrow 0}\rho (u)/u =0. \end{aligned}$$

It is well known (see for instance [4, Lemma B.1]) that in the case \(X=L_p\), \(1\le p < \infty \), we have

$$\begin{aligned} \rho (u) \le {\left\{ \begin{array}{ll} u^p/p &{} \text {if }1\le p\le 2 ,\\ (p-1)u^2/2 &{} \text {if } 2\le p<\infty . \end{array}\right. } \end{aligned}$$
(5.8)
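The IA(\(\epsilon\)) can be sketched in a concrete uniformly smooth space. Below, \(X=(\mathbb R^n,\Vert\cdot\Vert_p)\) with \(p=4\), so by (5.8) we have \(\rho(u)\le\gamma u^2\) with \(\gamma=(p-1)/2\); the norming functional of x is the vector \(\mathrm{sign}(x_i)|x_i|^{p-1}/\Vert x\Vert_p^{p-1}\). The dictionary, target, and sizes are illustrative assumptions. Since f is taken in \(A_1({\mathcal D})\), the maximizer of \(F_{f_{m-1}^{i,\epsilon}}(g)\) over \({\mathcal D}\) automatically satisfies step (1), so the schedule \(\epsilon\) does not enter this sketch explicitly.

```python
import numpy as np

# Sketch of the Incremental Algorithm IA(eps) in (R^n, ||.||_p), p = 4.
rng = np.random.default_rng(2)
n, K, p = 30, 60, 4
D = rng.standard_normal((K, n))
D /= np.linalg.norm(D, ord=p, axis=1, keepdims=True)   # ||g||_p = 1
w = np.zeros(K); w[:3] = [0.5, 0.3, 0.2]
f = w @ D                      # convex combination => f in A_1(D), Condition B holds

def F(x):                      # norming (peak) functional of x in l_p, as a vector
    nrm = np.linalg.norm(x, ord=p)
    return np.sign(x) * np.abs(x) ** (p - 1) / nrm ** (p - 1)

G = np.zeros(n)                # G_0^{i,eps} = 0
for m in range(1, 501):
    r = f - G                  # residual f_{m-1}^{i,eps}
    # Step (1): the maximizer of F_r(g) over D works, since Condition B
    # gives sup_g F_r(g) >= F_r(f) >= F_r(f) - eps_m.
    phi = D[np.argmax(D @ F(r))]
    G = (1 - 1 / m) * G + phi / m          # steps (2)-(3)
err = np.linalg.norm(f - G, ord=p)
print(err)                     # decays like m^{-1/2}, as in Theorem 5.4
```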

Denote by \(A_1({\mathcal {D}}):=A_1({\mathcal {D}})(X)\) the closure in X of the convex hull of \({\mathcal {D}}\). In order to run the IA(\(\epsilon \)) for all iterations, we need an element \(\varphi _m^{i,\epsilon } \in {\mathcal {D}}\) to exist at step (1) of the algorithm for every m. It is clear that the following condition guarantees such existence (see [17]).

Condition B We say that for a given dictionary \({\mathcal {D}}\), an element f satisfies Condition B if for all \(F\in X^*\), we have

$$\begin{aligned} F(f) \le \sup _{g\in {\mathcal {D}}}F(g). \end{aligned}$$

It is well known (see, for instance, [15, p. 343]) that any \(f\in A_1({\mathcal {D}})\) satisfies Condition B. For completeness we give this simple argument here. Take any \(f\in A_1({\mathcal {D}})\). Then for any \(\epsilon >0\), there exist \(g_1^\epsilon ,\dots ,g_N^\epsilon \in {\mathcal {D}}\) and numbers \(a_1^\epsilon ,\dots ,a_N^\epsilon \) such that \(a_i^\epsilon >0\), \(a_1^\epsilon +\dots +a_N^\epsilon = 1\) and

$$\begin{aligned} \Vert f-\sum _{i=1}^Na_i^\epsilon g_i^\epsilon \Vert \le \epsilon . \end{aligned}$$

Thus

$$\begin{aligned} F(f) \le \Vert F\Vert \epsilon + F\left( \sum _{i=1}^Na_i^\epsilon g_i^\epsilon \right) \le \epsilon \Vert F\Vert +\sup _{g\in {\mathcal {D}}} F(g), \end{aligned}$$

which proves Condition B.

We note that Condition B is equivalent to the property \(f\in A_1({\mathcal {D}})\). Indeed, as we showed above, the property \(f\in A_1({\mathcal {D}})\) implies Condition B. Let us show that Condition B implies that \(f\in A_1({\mathcal {D}})\). Assuming the contrary, \(f\notin A_1({\mathcal {D}})\), by the separation theorem for convex bodies, we find \(F\in X^*\) such that

$$\begin{aligned} F(f) > \sup _{\phi \in A_1({\mathcal {D}})} F(\phi ) \ge \sup _{g\in {\mathcal {D}}}F(g), \end{aligned}$$

which contradicts Condition B.

We formulate results on the IA(\(\epsilon \)) in terms of Condition B because in the applications it is easy to check Condition B.

Theorem 5.4

Let X be a uniformly smooth Banach space with modulus of smoothness \(\rho (u)\le \gamma u^q\), \(1<q\le 2\). Define

$$\begin{aligned} \epsilon _n := \beta \gamma ^{1/q}n^{-1/p},\quad p=\frac{q}{q-1},\quad n=1,2,\dots . \end{aligned}$$

Then, for every f satisfying Condition B, we have

$$\begin{aligned} \Vert f_m^{i,\epsilon }\Vert \le C(\beta ) \gamma ^{1/q}m^{-1/p},\quad m=1,2,\dots . \end{aligned}$$

In the case \(f\in A_1({\mathcal {D}})\), this theorem is proved in [14] (see also [15, Chapter 6]). As we mentioned above, Condition B is equivalent to \(f\in A_1({\mathcal {D}})\).

For \(f\in X_N\), write \(f=\sum _{i=1}^N c_ig_i\) and define

$$\begin{aligned} \Vert f\Vert _A := \sum _{i=1}^N |c_i|. \end{aligned}$$

Theorem 5.5

Assume that \(X_N\) satisfies C. For any \(t\in X_N\) the IA(\(\epsilon ,X_N\cap L_p,{\mathcal {D}}^2\)) with an appropriate p and schedule \(\epsilon \) applied to \(f:=t/\Vert t\Vert _A\) yields, after m iterations, an m-term polynomial \(G_m(t):=G^{i,\epsilon }_m(f)\Vert t\Vert _A\) with the following approximation property:

$$\begin{aligned} \Vert t-G_m(t)\Vert _\infty \le Cm^{-1/2}(\ln N)^{1/2}\Vert t\Vert _A, \quad \Vert G_m(t)\Vert _A =\Vert t\Vert _A, \end{aligned}$$

with a constant \(C=C(K_3,K_4)\).

Proof

It is clear that it is sufficient to prove Theorem 5.5 for \(t\in X_N\) with \(\Vert t\Vert _A =1\). Then \(t\in A_1({\mathcal {D}}^2)(X_N\cap L_p)\) for all \(p\in [2,\infty )\). Applying the IA(\(\epsilon \)) to f with respect to \({\mathcal {D}}^2\), we obtain by Theorem 5.4 after m iterations

$$\begin{aligned} \left\| t -\sum _{j\in \Lambda } \frac{a_j}{m}g_j\right\| _p \le C\gamma ^{1/2}m^{-1/2},\quad \sum _{j\in \Lambda }a_j = m, \end{aligned}$$
(5.9)

where \(\sum _{j\in \Lambda } \frac{a_j}{m}g_j\) is the approximant \(G^{i,\epsilon }_m(f)\) (recall that \(f=t\), since \(\Vert t\Vert _A =1\)). By (5.8), we find that \(\gamma \le p/2\). Next, by the Nikol’skii inequality from assumption C, we get from (5.9),

$$\begin{aligned} \left\| t -\sum _{j\in \Lambda } \frac{a_j}{m}g_j\right\| _\infty \le K_3N^{K_4/p}\Vert t -\sum _{j\in \Lambda } \frac{a_j}{m}g_j\Vert _p \le Cp^{1/2}K_3N^{K_4/p}m^{-1/2}. \end{aligned}$$

Choosing \(p\asymp \ln N\), we obtain the bound desired in Theorem 5.5. \(\square \)
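The choice \(p\asymp \ln N\) at the end of the proof balances the two factors in \(p^{1/2}N^{K_4/p}\): calculus gives the exact minimizer \(p^* = 2K_4\ln N\), at which \(N^{K_4/p^*}=e^{1/2}\) is a constant. A quick numerical confirmation (the values \(N=10^6\), \(K_4=1\) are illustrative):

```python
import numpy as np

# Minimize the factor sqrt(p) * N^(K4/p) over p on a fine grid.
N, K4 = 10**6, 1.0
p = np.linspace(2, 60, 10**5)
bound = np.sqrt(p) * N ** (K4 / p)
p_star = p[np.argmin(bound)]
print(p_star, 2 * K4 * np.log(N))   # both ~ 27.6
```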

Using the following simple relations:

$$\begin{aligned} \Vert f\Vert _2^2&= \left\| \sum _{i=1}^N c_ig_i\right\| _2^2 = \left\| K_2^{-1/2}\sum _{i=1}^N c_iu_i\right\| _2^2 = K_2^{-1}\sum _{i=1}^N |c_i|^2, \\ \sum _{i=1}^N |c_i|&\le N^{1/2} \left( \sum _{i=1}^N |c_i|^2\right) ^{1/2} = (K_2N)^{1/2}\Vert f\Vert _2, \end{aligned}$$

we obtain from Theorem 5.5 the following estimates.

Theorem 5.6

We have

$$\begin{aligned} \sigma _m\left( X^2_N,{\mathcal {D}}^2\right) _\infty \ll (N/m)^{1/2} (\ln N)^{1/2}. \end{aligned}$$

Combining Theorems 5.3 and 5.6, we obtain:

Theorem 5.7

We have

$$\begin{aligned} \sigma _m\left( X^1_N,{\mathcal {D}}^1\cup {\mathcal {D}}^2\right) _\infty \ll (N/m) (\ln N)^{1/2}. \end{aligned}$$

5.3 The Entropy Numbers

By our construction (see (5.7)), we obtain

$$\begin{aligned} |{\mathcal {D}}^1\cup {\mathcal {D}}^2| \ll N^{c(\alpha ,\beta ,d)}. \end{aligned}$$

Theorem 5.7, Theorem 3.1, and Remark 3.1 (its version for the real case) imply the following result on the entropy numbers.

Theorem 5.8

Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then we have

$$\begin{aligned} \varepsilon _k(X^1_N,L_\infty ) \ll \left\{ \begin{array}{ll} (\log N)^{3/2}(N/k), &{}\quad k\le N,\\ (\log N)^{3/2}2^{-k/N},&{}\quad k\ge N.\end{array} \right. \end{aligned}$$

In the same way that Theorem 4.1 was derived from Theorems 3.6 and 2.1, the following Theorem 5.9 can be derived from Theorems 5.8 and 2.1.

Theorem 5.9

Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then there exists a set of \(m \le C_1N(\log N)^{7/2}\) points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), \(C_1=C(d,K_1,K_2,K_3,K_4,\Omega ,\alpha ,\beta )\), such that for any \(f\in X_N\), we have

$$\begin{aligned} C_2\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le C_3\Vert f\Vert _1. \end{aligned}$$

The following analog of Theorem 4.3 holds for general systems.

Theorem 5.10

Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then for \(\epsilon \in [2^{-N},1/2]\), there exists a set of \(m \le C_1N(\log N)^{7/2}\epsilon ^{-2}\) points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), \(C_1=C(d,K_1,K_2,K_3,K_4,\Omega ,\alpha ,\beta )\), such that for any \(f\in X_N\), we have

$$\begin{aligned} (1-\epsilon )\Vert f\Vert _1 \le \frac{1}{m}\sum _{j=1}^m |f(\xi ^j)| \le (1+\epsilon )\Vert f\Vert _1. \end{aligned}$$

6 The Marcinkiewicz-Type Theorem in \(L_2\)

In this section we discuss some known results directly connected to the discretization theorems and demonstrate how recent results on random matrices can be used to obtain the Marcinkiewicz-type theorem in \(L_2\). We begin with the formulation of the Rudelson result from [11]. In [11], it is formulated in terms of submatrices of an orthogonal matrix. We reformulate it in our notation. Let \(\Omega _M=\{x^j\}_{j=1}^M\) be a discrete set with the probability measure \(\mu (x^j)=1/M\), \(j=1,\dots ,M\). Assume that \(\{u_i(x)\}_{i=1}^N\) is a real orthonormal on \(\Omega _M\) system satisfying the following condition: for all j,

$$\begin{aligned} \sum _{i=1}^Nu_i(x^j)^2 \le Nt^2, \end{aligned}$$
(6.1)

with some \(t\ge 1\). Then for every \(\epsilon >0\), there exists a set \(J\subset \{1,\dots ,M\}\) of indices with cardinality

$$\begin{aligned} m:=|J| \le C\frac{t^2}{\epsilon ^2}N\log \frac{Nt^2}{\epsilon ^2} \end{aligned}$$
(6.2)

such that for any \(f=\sum _{i=1}^N c_iu_i\), we have

$$\begin{aligned} (1-\epsilon )\Vert f\Vert _2^2 \le \frac{1}{m} \sum _{j\in J} f(x^j)^2 \le (1+\epsilon )\Vert f\Vert _2^2. \end{aligned}$$

In particular, the above result implies that for any orthonormal system \(\{u_i\}_{i=1}^N\) on \(\Omega _M\), satisfying (6.1), we have

$$\begin{aligned} {\mathcal U}_N := {\text {span}}(u_1,\dots ,u_N) \in \mathcal {M}(m,2)\quad \text {provided}\quad m\ge CN\log N, \end{aligned}$$

with large enough C. We note that (6.1) is satisfied if the system \(\{u_i\}_{i=1}^N\) is uniformly bounded: \(\Vert u_i\Vert _\infty \le t\), \(i=1,\dots ,N\).

We first demonstrate how the Bernstein-type concentration inequality for matrices can be used to prove an analog of Rudelson’s result above for a general \(\Omega \). Our proof is based on a different idea than Rudelson’s proof. Let \(\{u_i\}_{i=1}^N\) be an orthonormal system on \(\Omega \), satisfying the condition

D. For \(x\in \Omega \), we have

$$\begin{aligned} w(x):=\sum _{i=1}^N u_i(x)^2 = N. \end{aligned}$$
(6.3)

With each \(x\in \Omega \), we associate the matrix \(G(x) := [u_i(x)u_j(x)]_{i,j=1}^N\). Clearly, G(x) is a symmetric matrix. We will also need the matrix \(G(x)^2\). For the \((k,l)\) entry of \(G(x)^2\), we have

$$\begin{aligned} (G(x)^2)_{k,l} = \sum _{j=1}^N u_k(x)u_j(x)u_j(x)u_l(x) = w(x)u_k(x)u_l(x). \end{aligned}$$

Therefore,

$$\begin{aligned} G(x)^2 = w(x)G(x)\quad \text {and}\quad \Vert G(x)\Vert = w(x). \end{aligned}$$
(6.4)

We use the following Bernstein-type concentration inequality for matrices (see [22]).

Theorem 6.1

Let \(\{T_k\}_{k=1}^n\) be a sequence of independent random symmetric \(N\times N\) matrices. Assume that each \(T_k\) satisfies:

$$\begin{aligned} \mathbb {E}(T_k) =0 \quad \text {and}\quad \Vert T_k\Vert \le R \quad \text {almost surely}. \end{aligned}$$

Then for all \(\eta \ge 0\),

$$\begin{aligned} \mathbb {P}\left\{ \left\| \sum _{k=1}^n T_k \right\| \ge \eta \right\} \le N\exp \left( -\frac{\eta ^2}{2\sigma ^2 +(2/3)R\eta }\right) , \end{aligned}$$

where \(\sigma ^2 := \left\| \sum _{k=1}^n \mathbb {E}(T_k^2)\right\| \).

We now consider a sequence \(T_k := G(x^k)-I\), \(k=1,\dots ,m\) of independent random symmetric matrices. Orthonormality of the system \(\{u_i\}_{i=1}^N\) implies that \(\mathbb {E}(T_k)=0\) for all k. Relation (6.4) and our assumption D imply for all k,

$$\begin{aligned} \Vert T_k\Vert \le \Vert G(x^k)\Vert +1 = N+1=: R. \end{aligned}$$
(6.5)

Define \(T(x):= G(x)-I\), and, using (6.3) and (6.4), we can write

$$\begin{aligned} T(x)^2 = G(x)^2 -2G(x) +I = (N-2)G(x) +I. \end{aligned}$$

Then by the orthonormality of the system \(\{u_i\}_{i=1}^N\), we get

$$\begin{aligned} \mathbb {E}(T(x)^2) = (N-1)I, \end{aligned}$$

and, therefore, we obtain

$$\begin{aligned} \Vert \mathbb {E}(T^2)\Vert \le N-1. \end{aligned}$$
(6.6)

Thus, by Theorem 6.1 we obtain for \(\eta \le 1\),

$$\begin{aligned} \mathbb {P}\left\{ \left\| \sum _{k=1}^n (G(x^k)-I) \right\| \ge n\eta \right\} \le N\exp \left( -\frac{n\eta ^2}{cN}\right) , \end{aligned}$$
(6.7)

with an absolute constant c.

For a set of points \(\xi ^k\in \Omega \), \(k=1,\dots ,m\), and \(f=\sum _{i=1}^N b_iu_i\), we have

$$\begin{aligned} \frac{1}{m}\sum _{k=1}^m f(\xi ^k)^2 - \int _\Omega f(x)^2 \mathrm{d}\mu = {\mathbf b}^T\left( \frac{1}{m}\sum _{k=1}^m G(\xi ^k)-I\right) {\mathbf b}, \end{aligned}$$

where \({\mathbf b} = (b_1,\dots ,b_N)^T\) is the column vector. Therefore,

$$\begin{aligned} \left| \frac{1}{m}\sum _{k=1}^m f(\xi ^k)^2 - \int _\Omega f(x)^2 \mathrm{d}\mu \right| \le \left\| \frac{1}{m}\sum _{k=1}^m G(\xi ^k)-I\right\| \Vert {\mathbf b}\Vert _2^2. \end{aligned}$$
(6.8)

We now take \(m = [CN\eta ^{-2}\log N]\) with a large enough C. Then (6.7) with \(n=m\) shows that the corresponding probability is less than 1, so there exists a set of points \(\xi ^1,\dots ,\xi ^m\) with \(\left\| \frac{1}{m}\sum _{k=1}^m G(\xi ^k)-I\right\| <\eta \). Combined with (6.8), this proves the following theorem.

Theorem 6.2

Let \(\{u_i\}_{i=1}^N\) be an orthonormal system, satisfying condition D. Then for every \(\epsilon \in (0,1)\), there exists a set \(\{\xi ^j\}_{j=1}^m \subset \Omega \) with

$$\begin{aligned} m \le C \epsilon ^{-2}N\log N \end{aligned}$$

such that for any \(f=\sum _{i=1}^N c_iu_i\), we have

$$\begin{aligned} (1-\epsilon )\Vert f\Vert _2^2 \le \frac{1}{m} \sum _{j=1}^m f(\xi ^j)^2 \le (1+\epsilon )\Vert f\Vert _2^2. \end{aligned}$$
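The random-points mechanism behind Theorem 6.2 is easy to observe numerically. The sketch below (a minimal \(d=1\) experiment; the system, sizes, and seed are illustrative) uses the real trigonometric system \(u=(1,\sqrt2\cos kx,\sqrt2\sin kx)_{k\le 4}\), for which condition D holds since \(\sum_i u_i(x)^2 = 1+2K = N\) at every x, and checks the operator-norm deviation in (6.8) together with the resulting two-sided bound on the discrete average of \(f^2\).

```python
import numpy as np

# Random-point L2 discretization for the d = 1 trigonometric system.
rng = np.random.default_rng(3)
K = 4
N = 2 * K + 1

def u(x):                  # orthonormal w.r.t. dx/(2*pi) on [0, 2*pi)
    x = np.atleast_1d(x)
    cols = [np.ones_like(x)]
    for k in range(1, K + 1):
        cols += [np.sqrt(2) * np.cos(k * x), np.sqrt(2) * np.sin(k * x)]
    return np.stack(cols, axis=1)          # shape (len(x), N)

m = 2000                                   # m >> N log N
xi = rng.uniform(0, 2 * np.pi, m)
U = u(xi)
E = np.linalg.norm(U.T @ U / m - np.eye(N), 2)   # ||(1/m) sum_k G(xi_k) - I||
print(E)                                   # small for m >> N log N

# By (6.8), for f = sum c_i u_i the discrete average of f^2 matches
# ||f||_2^2 = ||c||_2^2 up to the factor E (a deterministic inequality):
c = rng.standard_normal(N)
f_vals = U @ c
assert abs(np.mean(f_vals ** 2) - c @ c) <= E * (c @ c) + 1e-12
```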

We note that Theorem 6.2 treats a special case, when (6.3) is satisfied instead of (6.1). This is the case, for instance, for the trigonometric and the Walsh systems. In this special case, Theorem 6.2 is more general and slightly stronger than the Rudelson theorem discussed at the beginning of this section: Theorem 6.2 yields the Marcinkiewicz-type discretization theorem for a general domain \(\Omega \) instead of a discrete set \(\Omega _M\), and in Theorem 6.2 we have an extra \(\log N\) instead of \(\log \frac{Nt^2}{\epsilon ^2}\) in (6.2).

In [21] we showed how to derive the following result from the recent paper by Nitzan et al. [10], which in turn is based on the paper of Marcus et al. [9].

Theorem 6.3

Let \(\Omega _M=\{x^j\}_{j=1}^M\) be a discrete set with the probability measure \(\mu (x^j)=1/M\), \(j=1,\dots ,M\). Assume that \(\{u_i(x)\}_{i=1}^N\) is an orthonormal on \(\Omega _M\) system (real or complex). Assume in addition that this system has the following property: for all \(j=1,\dots , M\), we have

$$\begin{aligned} \sum _{i=1}^N \left| u_i(x^j)\right| ^2 = N. \end{aligned}$$
(6.9)

Then there is an absolute constant \(C_1\) such that there exists a subset \(J\subset \{1,2,\dots ,M\}\) with the property: \(m:=|J| \le C_1 N\), and for any \(f\in X_N:= {\text {span}}\{u_1,\dots ,u_N\}\), we have

$$\begin{aligned} C_2 \Vert f\Vert _2^2 \le \frac{1}{m}\sum _{j\in J} |f(x^j)|^2 \le C_3 \Vert f\Vert _2^2, \end{aligned}$$

where \(C_2\) and \(C_3\) are absolute positive constants.

Theorem 6.3 is based on the following lemma from [10].

Lemma 6.1

Let a system of vectors \(\mathbf v_1,\dots ,\mathbf v_M\) from \(\mathbb {C}^N\) have the following properties: for all \(\mathbf w\in \mathbb {C}^N\),

$$\begin{aligned} \sum _{j=1}^M |\langle \mathbf w,\mathbf v_j\rangle |^2 = \Vert \mathbf w\Vert _2^2 \end{aligned}$$
(6.10)

and

$$\begin{aligned} \Vert \mathbf v_j\Vert _2^2 = N/M,\quad j=1,\dots ,M. \end{aligned}$$
(6.11)

Then there is a subset \(J\subset \{1,2,\dots ,M\}\) such that for all \(\mathbf w\in \mathbb {C}^N\),

$$\begin{aligned} c_0 \Vert \mathbf w\Vert _2^2 \le \frac{M}{N} \sum _{j\in J} |\langle \mathbf w,\mathbf v_j\rangle |^2 \le C_0\Vert \mathbf w\Vert _2^2, \end{aligned}$$
(6.12)

where \(c_0\) and \(C_0\) are some absolute positive constants.
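Lemma 6.1 is an existence statement with absolute constants, but its hypotheses (6.10) and (6.11) are easy to verify on a concrete system. The sketch below uses rows of the discrete Fourier matrix restricted to N columns, a hand-picked structured example rather than the general construction; for this frame the equally spaced subset J of size N even realizes (6.12) with \(c_0 = C_0 = 1\).

```python
import numpy as np

M, N = 20, 5                                   # M vectors in C^N, N/M = 1/4
rows = np.arange(M)[:, None]
cols = np.arange(N)[None, :]
V = np.exp(2j * np.pi * rows * cols / M) / np.sqrt(M)  # V[j] = v_j (DFT rows)

# (6.10): sum_j v_j v_j^* = I, i.e. the system is a tight (Parseval) frame.
S = V.T @ V.conj()
assert np.allclose(S, np.eye(N))

# (6.11): ||v_j||_2^2 = N/M for every j.
assert np.allclose((np.abs(V)**2).sum(axis=1), N / M)

# A subset J realizing (6.12): equally spaced rows, |J| = N.
J = np.arange(0, M, M // N)
SJ = V[J].T @ V[J].conj()
# Here (M/N) sum_{j in J} |<w, v_j>|^2 = ||w||^2 exactly, so c_0 = C_0 = 1.
assert np.allclose((M / N) * SJ, np.eye(N))
```

The point of the lemma is that some subset with absolute two-sided bounds exists for every system satisfying (6.10) and (6.11), not only for such highly structured ones.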

The above Lemma 6.1 was derived from the following theorem from Marcus et al. [9], which solved the Kadison–Singer problem.

Theorem 6.4

Let a system of vectors \(\mathbf v_1,\dots ,\mathbf v_M\) from \(\mathbb {C}^N\) have the following properties: for all \(\mathbf w\in \mathbb {C}^N\), we have

$$\begin{aligned} \sum _{j=1}^M |\langle \mathbf w,\mathbf v_j\rangle |^2 = \Vert \mathbf w\Vert _2^2 \end{aligned}$$

and \(\Vert \mathbf v_j\Vert _2^2 \le \epsilon \).

Then there exists a partition of \(\{1,\dots ,M\}\) into two sets \(S_1\) and \(S_2\), such that for each \(i=1,2\), we have for all \(\mathbf w\in \mathbb {C}^N\),

$$\begin{aligned} \sum _{j\in S_i} |\langle \mathbf w,\mathbf v_j\rangle |^2 \le \frac{(1+\sqrt{2\epsilon })^2}{2}\Vert \mathbf w\Vert _2^2. \end{aligned}$$

Second, we demonstrate a method of proof that is different from the above proof of Theorem 6.2 and that allows us to replace condition D by the following more general condition E, which is similar to (6.1).

E. There exists a constant t such that

$$\begin{aligned} w(x):=\sum _{i=1}^N u_i(x)^2 \le Nt^2. \end{aligned}$$
(6.13)

The new proof method uses the fact that the matrix G(x) is positive semi-definite. It is based on the following result on random matrices (see [22, Theorem 1.1]).

Theorem 6.5

Consider a finite sequence \(\{T_k\}_{k=1}^m\) of independent, random, self-adjoint matrices of dimension N. Assume that each random matrix is positive semi-definite and satisfies

$$\begin{aligned} \lambda _{\max }(T_k) \le R\quad \text {almost surely}. \end{aligned}$$

Define

$$\begin{aligned} s_{\min } := \lambda _{\min }\left( \sum _{k=1}^m \mathbb {E}(T_k)\right) \quad \text {and}\quad s_{\max } := \lambda _{\max }\left( \sum _{k=1}^m \mathbb {E}(T_k)\right) . \end{aligned}$$

Then

$$\begin{aligned} \mathbb {P}\left\{ \lambda _{\min }\left( \sum _{k=1}^m T_k\right) \le (1-\eta )s_{\min }\right\} \le N\left( \frac{e^{-\eta }}{(1-\eta )^{1-\eta }}\right) ^{s_{\min }/R} \end{aligned}$$

for \(\eta \in [0,1)\), and for \(\eta \ge 0\),

$$\begin{aligned} \mathbb {P}\left\{ \lambda _{\max }\left( \sum _{k=1}^m T_k\right) \ge (1+\eta )s_{\max }\right\} \le N\left( \frac{e^{\eta }}{(1+\eta )^{1+\eta }}\right) ^{s_{\max }/R}. \end{aligned}$$

As above, we consider the matrix \(G(x) := [u_i(x)u_j(x)]_{i,j=1}^N\). Clearly, G(x) is a symmetric matrix. Consider a sequence \(T_k := G(x^k)\), \(k=1,\dots ,m\), of independent random symmetric matrices. It is easy to see that the \(T_k\) are positive semi-definite. Orthonormality of the system \(\{u_i\}_{i=1}^N\) implies that \(\mathbb {E}(T_k)=I\) for all k. This implies that \(s_{\min }=s_{\max } = m\). Relation (6.4) shows that we can take \(R:=Nt^2\). Then Theorem 6.5 implies, for \(\eta \le 1\),

$$\begin{aligned} \mathbb {P}\left\{ \left\| \sum _{k=1}^m (G(x^k)-I) \right\| \ge m\eta \right\} \le N\exp \left( -\frac{m\eta ^2}{ct^2N}\right) , \end{aligned}$$
(6.14)

with an absolute constant c (we can take \(c=2/\ln 2\)). Using inequality (6.8), which was used in the above proof of Theorem 6.2, we derive from here the following theorem.

Theorem 6.6

Let \(\{u_i\}_{i=1}^N\) be an orthonormal system satisfying condition E. Then for every \(\epsilon >0\), there exists a set \(\{\xi ^j\}_{j=1}^m \subset \Omega \) with

$$\begin{aligned} m \le C\frac{t^2}{\epsilon ^2}N\log N \end{aligned}$$

such that for any \(f=\sum _{i=1}^N c_iu_i\), we have

$$\begin{aligned} (1-\epsilon )\Vert f\Vert _2^2 \le \frac{1}{m} \sum _{j=1}^m f(\xi ^j)^2 \le (1+\epsilon )\Vert f\Vert _2^2. \end{aligned}$$

We note that Theorem 6.6 is more general and slightly stronger than the Rudelson theorem discussed at the beginning of this section. Theorem 6.6 yields the Marcinkiewicz-type discretization theorem for a general domain \(\Omega \) instead of a discrete set \(\Omega _M\). Also, in Theorem 6.6 we have an extra \(\log N\) instead of the \(\log \frac{Nt^2}{\epsilon ^2}\) in (6.2).
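The key step behind Theorem 6.6 is the concentration estimate (6.14) for \(\frac{1}{m}\sum _{k=1}^m G(x^k)\) around I, combined with inequality (6.8). The following Monte Carlo sketch checks this spectral-norm concentration for the trigonometric system (where one can take \(t=1\)), with an illustrative constant in the choice of m.

```python
import numpy as np

rng = np.random.default_rng(1)

n, eta = 8, 0.3
N = 2 * n + 1                                  # t = 1 for the trig system
m = int(12 * N * np.log(N) / eta**2)           # m ~ C eta^{-2} N log N

x = rng.uniform(0, 2 * np.pi, size=m)          # i.i.d. nodes x^k
U = np.concatenate(
    [np.ones((m, 1))]
    + [np.sqrt(2) * np.c_[np.cos(k * x), np.sin(k * x)] for k in range(1, n + 1)],
    axis=1,
)                                              # U[j, i] = u_i(x^j)

# (1/m) sum_k G(x^k) = (1/m) U^T U, and E G(x^k) = I by orthonormality.
G_avg = U.T @ U / m
dev = np.linalg.norm(G_avg - np.eye(N), 2)     # spectral norm deviation
assert dev < eta                               # fails only with small probability, cf. (6.14)
```

By (6.8), a deviation below \(\eta \) immediately gives the two-sided discretization of \(\Vert f\Vert _2^2\) with \(\epsilon = \eta \) for all f in the subspace at once.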

The Marcinkiewicz Theorem and Sparse Approximation Our above argument, in particular, inequality (6.8), shows that the Marcinkiewicz-type discretization theorem in \(L_2\) is closely related to approximation of the identity matrix I by an m-term approximant of the form \(\frac{1}{m}\sum _{k=1}^m G(\xi ^k)\) in the operator norm from \(\ell ^N_2\) to \(\ell ^N_2\) (spectral norm). Therefore, we can consider the following sparse approximation problem. Assume that the system \(\{u_i(x)\}_{i=1}^N\) satisfies (6.13), and consider the dictionary

$$\begin{aligned} {\mathcal {D}}^u := \{g_x\}_{x\in \Omega },\quad g_x:= G(x)(Nt^2)^{-1},\quad G(x):=[u_i(x)u_j(x)]_{i,j=1}^N. \end{aligned}$$

Then condition (6.13) guarantees that for the Frobenius norm of \(g_x\), we have

$$\begin{aligned} \Vert g_x\Vert _F = w(x)(Nt^2)^{-1} \le 1. \end{aligned}$$
(6.15)

Our assumption on the orthonormality of the system \(\{u_i\}_{i=1}^N\) gives

$$\begin{aligned} I = \int _\Omega G(x)\mathrm{d}\mu = Nt^2\int _\Omega g_x \mathrm{d}\mu , \end{aligned}$$

which implies that \(I \in A_1({\mathcal {D}}^u,Nt^2)\). Define the Hilbert space H as the closure, in the Frobenius norm, of \({\text {span}}\{g_x, x\in \Omega \}\), with the inner product generated by the Frobenius norm: for \(A=[a_{i,j}]_{i,j=1}^N\) and \(B=[b_{i,j}]_{i,j=1}^N\),

$$\begin{aligned} \langle A,B\rangle = \sum _{i,j=1}^N a_{i,j}b_{i,j} \end{aligned}$$

in the case of real matrices (with standard modification in the case of complex matrices).

By Theorem 5.1, for any \(m\in {\mathbb {N}}\), we can constructively find (by the Relaxed Greedy Algorithm (RGA)) points \(\xi ^1,\dots ,\xi ^m\) such that

$$\begin{aligned} \left\| \frac{1}{m}\sum _{k=1}^m G(\xi ^k)-I\right\| _F \le 2Nt^2 m^{-1/2}. \end{aligned}$$
(6.16)

Taking into account the inequality \(\Vert A\Vert \le \Vert A\Vert _F\), we obtain from this and from (6.8) the following proposition.

Proposition 6.1

Let \(\{u_i\}_{i=1}^N\) be an orthonormal system satisfying condition E. Then one can constructively find a set \(\{\xi ^j\}_{j=1}^m \subset \Omega \) with \(m\le C(t)N^2\) such that for any \(f=\sum _{i=1}^N c_iu_i\), we have

$$\begin{aligned} \frac{1}{2}\Vert f\Vert _2^2 \le \frac{1}{m} \sum _{j=1}^m f(\xi ^j)^2 \le \frac{3}{2}\Vert f\Vert _2^2. \end{aligned}$$
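The construction behind Proposition 6.1 can be sketched as a simple greedy version of the m-term approximation in (6.16): at each step, average in the rank-one matrix \(G(\xi )\) that most reduces the Frobenius distance to I. The code below does this for the trigonometric system (t = 1) over a finite candidate grid; the grid, the small degree, and the plain best-candidate greedy rule are simplifications of the RGA of Theorem 5.1, not the algorithm itself.

```python
import numpy as np

n = 2
N = 2 * n + 1                                  # t = 1 for the trig system

def u_vec(x):
    """(u_1(x), ..., u_N(x)) for the orthonormal trigonometric system."""
    u = [1.0]
    for k in range(1, n + 1):
        u += [np.sqrt(2) * np.cos(k * x), np.sqrt(2) * np.sin(k * x)]
    return np.array(u)

grid = np.linspace(0, 2 * np.pi, 512, endpoint=False)        # candidate points
Gs = np.array([np.outer(u_vec(x), u_vec(x)) for x in grid])  # all G(x), rank one

m = 50
A = np.zeros((N, N))                           # running average (1/k) sum_k G(xi^k)
for k in range(1, m + 1):
    trial = ((k - 1) * A + Gs) / k             # try every candidate at step k
    errs = np.linalg.norm(trial - np.eye(N), axis=(1, 2))
    A = trial[np.argmin(errs)]                 # greedy: keep the best one

err = np.linalg.norm(A - np.eye(N))            # Frobenius norm (numpy default)
assert err <= 2 * N / np.sqrt(m)               # the rate 2 N t^2 m^{-1/2} of (6.16)
```

Since the grid quadrature reproduces \(\int _\Omega G(x)\,\mathrm{d}\mu = I\) exactly here, a standard averaging argument already guarantees the greedy error is at most \(\sqrt{(N^2-N)/m}\), within the bound (6.16).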

The Marcinkiewicz-Type Theorem with Weights We now comment on a recent breakthrough result by Batson et al. [1]. We formulate their result in our notation. Let, as above, \(\Omega _M=\{x^j\}_{j=1}^M\) be a discrete set with the probability measure \(\mu (x^j)=1/M\), \(j=1,\dots ,M\). Assume that \(\{u_i(x)\}_{i=1}^N\) is a real orthonormal system on \(\Omega _M\). Then for any number \(d>1\), there exists a set of weights \(w_j\ge 0\) with \(|\{j: w_j\ne 0\}| \le dN\) such that for any \(f\in {\text {span}}\{u_1,\dots ,u_N\}\), we have

$$\begin{aligned} \Vert f\Vert _2^2 \le \sum _{j=1}^M w_jf(x^j)^2 \le \frac{d+1+2\sqrt{d}}{d+1-2\sqrt{d}}\Vert f\Vert _2^2. \end{aligned}$$

The proof of this result is based on a delicate study of the m-term approximation of the identity matrix I with respect to the system \({\mathcal {D}}:= \{G(x)\}_{x\in \Omega }\), \(G(x):=[u_i(x)u_j(x)]_{i,j=1}^N\), in the spectral norm. The authors of [1] control the change of the maximal and minimal eigenvalues of a matrix when a rank-one matrix of the form wG(x) is added. Their proof provides an algorithm for the construction of the weights \(\{w_j\}\). In particular, this implies that

$$\begin{aligned} X_N(\Omega _M) \in \mathcal {M}^w(m,2,\epsilon )\quad \text {provided } m \ge CN\epsilon ^{-2}, \end{aligned}$$

with large enough C.

In this section we discussed two deep general results (the Rudelson theorem and the Batson–Spielman–Srivastava theorem) about submatrices of orthogonal matrices, which provide very good Marcinkiewicz-type discretization theorems for \(L_2\). The reader can find the corresponding historical comments in [11]. We also refer the reader to the paper [5] for a discussion of recent outstanding progress in the theory of submatrices of orthogonal matrices.

7 Discussion

As we pointed out in the introduction, the main results of this paper concern the Marcinkiewicz-type discretization theorems in \(L_1\). We proved here that under certain conditions on an N-dimensional subspace \(X_N\), we can get the corresponding discretization theorems with the number of knots \(m\ll N(\log N)^{7/2}\). This result falls short of the ideal case \(m=N\) only by the \((\log N)^{7/2}\) factor. We point out that the situation with the discretization theorems in the \(L_\infty \) case is fundamentally different. A very nontrivial and surprising negative result was proved for the \(L_\infty \) case (see [6,7,8]). The authors proved that a necessary condition for \(\mathcal {T}(Q_n)\in \mathcal {M}(m,\infty )\) is \(m\gg |Q_n|^{1+c}\) with an absolute constant \(c>0\).

Theorem 2.1 shows that an important ingredient of our technique for proving the Marcinkiewicz discretization theorems in \(L_1\) is the study of the entropy numbers \(\varepsilon _k(X^1_N,L_\infty )\). We note that this is a nontrivial problem in itself. We illustrate this with the example of trigonometric polynomials. It is proved in [20] that in the case \(d=2\), we have

$$\begin{aligned} \varepsilon _k(\mathcal {T}( Q_n)_1,L_\infty )\ll n^{1/2} \left\{ \begin{array}{ll} (| Q_n|/k) \log (4| Q_n|/k), &{}\quad k\le 2| Q_n|,\\ 2^{-k/(2| Q_n|)},&{}\quad k\ge 2| Q_n|.\end{array} \right. \end{aligned}$$
(7.1)

The proof of estimate (7.1) is based on an analog, for the trigonometric system, of the small ball inequality proved for a wavelet-type system (see [20]). This proof uses specific two-dimensional features of the problem, and we do not know how to extend it to the case \(d>2\). Estimate (7.1) is used in the proof of the upper bounds in Theorem 3.7. Theorem 3.7 gives the right order of the entropy numbers for the classes of mixed smoothness. This means that (7.1) cannot be substantially improved. The trivial inequality \(\log (4| Q_n|/k) \ll n\) shows that (7.1) implies the following estimate:

$$\begin{aligned} \varepsilon _k(\mathcal {T}( Q_n)_1,L_\infty )\ll n^{3/2} \left\{ \begin{array}{ll} | Q_n|/k , &{}\quad k\le 2| Q_n|,\\ 2^{-k/(2| Q_n|)},&{}\quad k\ge 2| Q_n|.\end{array} \right. \end{aligned}$$
(7.2)

Estimate (7.2) is not as good as (7.1) when applied to proving the upper bounds for the entropy numbers of smoothness classes. For instance, instead of the bound in Theorem 3.7, the use of (7.2) would give

$$\begin{aligned} \varepsilon _k(\mathbf W^{a,b}_1,L_\infty ) \ll k^{-a}(\log k)^{a+b+3/2}. \end{aligned}$$

However, it turns out that in application to the Marcinkiewicz-type discretization theorems, estimates (7.1) and (7.2) give the same bound on the number of knots, \(m\ll |Q_n|n^{7/2}\) (see Theorems 1.2 and 4.1).

As we pointed out above, we do not have an extension of (7.1) to the case \(d>2\). A somewhat straightforward technique presented in [21] gives the following result for all d:

$$\begin{aligned} \varepsilon _k(\mathcal {T}( Q_n)_1,L_\infty )\ll n^{d/2} \left\{ \begin{array}{ll} (| Q_n|/k) \log (4| Q_n|/k), &{}\quad k\le 2| Q_n|,\\ 2^{-k/(2| Q_n|)},&{}\quad k\ge 2| Q_n|.\end{array} \right. \end{aligned}$$
(7.3)

This result is used in [21] to prove Theorem 1.3. An interesting contribution of this paper is the proof of (7.2) for all d and for rather general sets \(\mathcal {T}(Q)_1\) instead of \(\mathcal {T}(Q_n)_1\). An important new ingredient here is the use, in m-term approximations, of the dictionary \({\mathcal {D}}^2(Q)\) consisting of shifts of normalized Dirichlet kernels associated with Q. Certainly, it would be nice to understand, even in the special case of the hyperbolic cross polynomials \(\mathcal {T}(Q_n)\), whether the embedding \(\mathcal {T}(Q_n) \in \mathcal {M}(m,1)\) holds with \(m\asymp |Q_n|\). The results of this paper only show that the above embedding holds with \(m \gg |Q_n|n^{7/2}\). We obtained the extra factor \(n^{7/2}\) as a result of using (7.2), which contributed \(n^{3/2}\), and of using the chaining technique, which contributed \(n^2\).