Abstract
This paper is devoted to discretization of integral norms of functions from a given finite dimensional subspace. This problem is very important in applications, but there is no systematic study of it. We present here a new technique, which works well for discretization of the integral norm. It is a combination of probabilistic technique, based on chaining, and results on the entropy numbers in the uniform norm.
Similar content being viewed by others
Avoid common mistakes on your manuscript.
1 Introduction
Discretization is a very important step in making a continuous problem computationally feasible. The problem of construction of good sets of points in a multidimensional domain is a fundamental problem of mathematics and computational mathematics. A prominent example of a classical discretization problem is a problem of metric entropy (covering numbers, entropy numbers). Bounds for the entropy numbers of function classes are important by themselves and also have important connections to other fundamental problems (see, for instance, [15, Ch. 3] and [3, Ch. 6]). Another prominent example of a discretization problem is the problem of numerical integration. Numerical integration in the mixed smoothness classes requires deep number theoretical results for constructing optimal (in the sense of order) cubature formulas (see, for instance, [3, Ch. 8]). A typical approach to solving a continuous problem numerically—the Galerkin method—suggests looking for an approximate solution from a given finite dimensional subspace. A standard way to measure an error of approximation is an appropriate \(L_q\) norm, \(1\le q\le \infty \). Thus, the problem of discretization of the \(L_q\) norms of functions from a given finite dimensional subspace arises in a very natural way.
The main goal of this paper is to study the discretization problem for a finite dimensional subspace \(X_N\) of a Banach space X. We are interested in discretizing the \(L_q\), \(1\le q\le \infty \), norm of elements of \(X_N\). We call such results the Marcinkiewicz-type discretization theorems because the first result in this direction was obtained by Marcinkiewicz (see [23, Ch.10, §7]). He proved the following inequalities for the univariate trigonometric polynomials f of degree n: for \(1<q<\infty \) there are two positive constants \(C_1(q)\) and \(C_2(q)\) such that for the \(L_q\) norm \(\Vert f\Vert _q\) of these polynomials, we have
There are different settings and different ingredients that play important roles in this problem. We now discuss these issues.
Marcinkiewicz Problem Let \(\Omega \) be a compact subset of \({\mathbb {R}}^d\) with the probability measure \(\mu \). We say that a linear subspace \(X_N\) of the \(L_q(\Omega ):=L_q(\Omega ,\mu )\), \(1\le q < \infty \), admits the Marcinkiewicz-type discretization theorem with parameters m and q if there exist a set \(\{\xi ^\nu \in \Omega , \nu =1,\dots ,m\}\) and two positive constants \(C_j(d,q)\), \(j=1,2\), such that for any \(f\in X_N\), we have
In the case \(q=\infty \), we define \(L_\infty \) as the space of continuous on \(\Omega \) functions and require
We will also use a brief way to express the above property: the \(\mathcal {M}(m,q)\) theorem holds for a subspace \(X_N\) or \(X_N \in \mathcal {M}(m,q)\).
Numerical Integration Problem In the case \(1\le q<\infty \), the above problem can be reformulated as a problem on numerical integration of special classes of functions. Define a class \(|X_N|^q := \{|f|^q: f\in X_N, \Vert f\Vert _q \le 1\}\), and consider the numerical integration problem: for a given \(\varepsilon >0\), find \(m = m(N,q,\varepsilon )\) such that
In (1.3) we limit our search for good numerical integration methods to cubature formulas with equal weights 1 / m. These special kinds of cubature formulas are called Quasi-Monte Carlo methods. In numerical integration, general cubature formulas (with weights) are also very important. In this case the above problem (1.3) is reformulated as follows:
Thus, in this case we are optimizing both over the knots \(\xi ^1,\dots ,\xi ^m\) and over the weights \(\lambda _1,\dots ,\lambda _m\).
Marcinkiewicz Problem with Weights The above remark on numerical integration encourages us to consider the following variant of the Marcinkiewicz problem. We say that a linear subspace \(X_N\) of the \(L_q(\Omega )\), \(1\le q < \infty \), admits the weighted Marcinkiewicz-type discretization theorem with parameters m and q if there exist a set of knots \(\{\xi ^\nu \in \Omega \}\), a set of weights \(\{\lambda _\nu \}\), \(\nu =1,\dots ,m\), and two positive constants \(C_j(d,q)\), \(j=1,2\), such that for any \(f\in X_N\), we have
Then we also say that the \(\mathcal {M}^w(m,q)\) theorem holds for a subspace \(X_N\) or \(X_N \in \mathcal {M}^w(m,q)\). Obviously, \(X_N\in \mathcal {M}(m,q)\) implies that \(X_N\in \mathcal {M}^w(m,q)\).
Marcinkiewicz Problem with \(\varvec{\varepsilon }\) We write \(X_N\in \mathcal {M}(m,q,\varepsilon )\) if (1.1) holds with \(C_1(d,q)=1-\varepsilon \) and \(C_2(d,q)=1+\varepsilon \). Respectively, we write \(X_N\in \mathcal {M}^w(m,q,\varepsilon )\) if (1.5) holds with \(C_1(d,q)=1-\varepsilon \) and \(C_2(d,q)=1+\varepsilon \). We note that the most powerful results are for \(\mathcal {M}(m,q,0)\), when the \(L_q\) norm of \(f\in X_N\) is discretized exactly by the formula with equal weights 1 / m.
In this paper we mostly concentrate on the Marcinkiewicz problem and on its variant with \(\varepsilon \). Our main results are for \(q=1\). We now make some general remarks for the case \(q=2\) that illustrate the problem. We discuss the case \(q=2\) in more detail in Sect. 6. We describe the properties of the subspace \(X_N\) in terms of a system \(\mathcal {U}_N:=\{u_i\}_{i=1}^N\) of functions such that \(X_N = {\text {span}}\{u_i, i=1,\dots ,N\}\). In the case \(X_N \subset L_2\), we assume that the system is orthonormal on \(\Omega \) with respect to measure \(\mu \). In the case of real functions we associate with \(x\in \Omega \) the matrix \(G(x) := [u_i(x)u_j(x)]_{i,j=1}^N\). Clearly, G(x) is a symmetric positive semi-definite matrix of rank 1. It is easy to see that for a set of points \(\xi ^k\in \Omega \), \(k=1,\dots ,m\), and \(f=\sum _{i=1}^N b_iu_i\), we have
where \({\mathbf b} = (b_1,\dots ,b_N)^T\) is the column vector and I is the identity matrix. Therefore, the \(\mathcal {M}^w(m,2)\) problem is closely connected to a problem of approximation (representation) of the identity matrix I by an m-term approximant with respect to the system \(\{G(x)\}_{x\in \Omega }\). It is easy to understand that under our assumptions on the system \(\mathcal {U}_N\), there exist a set of knots \(\{\xi ^k\}_{k=1}^m\) and a set of weights \(\{\lambda _k\}_{k=1}^m\), with \(m\le N^2\), such that
and, therefore, we have for any \(X_N \subset L_2\) that
However, we do not know a characterization of those \(X_N\) for which \(X_N \in \mathcal {M}(N^2,2,0)\).
In the above formulations of the problems we only ask about existence of either good \(\{\xi ^\nu \}\) or good \(\{\xi ^\nu ,\lambda _\nu \}\). Certainly, it is important to have either explicit constructions of good \(\{\xi ^\nu \}\) (\(\{\xi ^\nu ,\lambda _\nu \}\)) or deterministic ways to construct good \(\{\xi ^\nu \}\) (\(\{\xi ^\nu ,\lambda _\nu \}\)). Thus, the Marcinkiewicz-type problem can be split into the following four problems: Under some assumptions on \(X_N\),
-
(I)
Find a condition on m for \(X_N \in \mathcal {M}(m,q)\);
-
(II)
Find a condition on m for \(X_N \in \mathcal {M}^w(m,q)\);
-
(III)
Find a condition on m such that there exists a deterministic construction of \(\{\xi ^\nu \}_{\nu =1}^m\) satisfying (1.1) for all \(f\in X_N\);
-
(IV)
Find a condition on m such that there exists a deterministic construction of \(\{\xi ^\nu ,\lambda _\nu \}_{\nu =1}^m\) satisfying (1.5) for all \(f\in X_N\).
The main results of this paper address the problem (I) in the case \(q=1\). Our method is probabilistic.
We impose the following assumptions on the system \(\{u_i\}_{i=1}^N\) of real functions:
-
A. There exist \(\alpha >0\), \(\beta \), and \(K_1\) such that for all \(i\in [1,N]\), we have
$$\begin{aligned} |u_i(\mathbf x)-u_i(\mathbf y)| \le K_1N^\beta \Vert \mathbf x-\mathbf y\Vert _\infty ^\alpha ,\quad \mathbf x,\mathbf y\in \Omega . \end{aligned}$$(1.7) -
B. There exists a constant \(K_2\) such that \(\Vert u_i\Vert _\infty ^2 \le K_2\), \(i=1,\dots ,N\).
-
C. Define \(X_N:= {\text {span}}(u_1,\dots ,u_N)\). There exist two constants \(K_3\) and \(K_4\) such that the following Nikol’skii-type inequality holds for all \(f\in X_N\):
$$\begin{aligned} \Vert f\Vert _\infty \le K_3N^{K_4/p}\Vert f\Vert _p,\quad p\in [2,\infty ). \end{aligned}$$(1.8)
The main result of this paper is the following theorem (see Theorem 5.9).
Theorem 1.1
Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then for large enough \(C_1=C(d,K_1,K_2,K_3,K_4,\Omega ,\alpha ,\beta )\), there exists a set of \(m \le C_1N(\log N)^{7/2}\) points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), such that for any \(f\in X_N\), we have
An important particular case for application of Theorem 1.1 is the case when \(X_N\) is a subspace of trigonometric polynomials. For a finite \(Q\subset \mathbb {Z}^d\), set
The hyperbolic cross polynomials \(\mathcal {T}(Q_n)\) are of special interest (see, for instance, [3]): let \(\mathbf s\in \mathbb {Z}^d_+\),
and
where [a] denotes the integer part of a number a.
The following two theorems were proved in [21]. Denote by \(\mathbb {T}^d\) the d-dimensional torus.
Theorem 1.2
Let \(d=2\). For any \(n\in {\mathbb {N}}\) and large enough absolute constant \(C_1\), there exists a set of \(m \le C_1|Q_n|n^{7/2}\) points \(\xi ^j\in \mathbb {T}^2\), \(j=1,\dots ,m\) such that for any \(f\in \mathcal {T}(Q_n)\), we have
Theorem 1.3
For any \(d\in {\mathbb {N}}\) and \(n\in {\mathbb {N}}\) for large enough absolute constant \(C_1(d)\), there exists a set of \(m \le C_1(d)|Q_n|n^{d/2+3}\) points \(\xi ^j\in \mathbb {T}^d\), \(j=1,\dots ,m\) such that for any \(f\in \mathcal {T}(Q_n)\), we have
Theorem 1.2 addresses the case \(d=2\), and Theorem 1.3 extends Theorem 1.2 to the case of all d. We point out that for \(d=2\), Theorem 1.3 is weaker than Theorem 1.2. Theorem 1.1 gives Theorem 1.2 and improves Theorem 1.3 by replacing an extra factor \(n^{d/2+3}\) by \(n^{7/2}\) in the bound for m. The technique for proving Theorem 1.1 presented in this paper is a development of a technique from [21]. It is a combination of probabilistic technique, based on chaining, and results on the entropy numbers. We present this technique in the following way. In Sect. 2 we prove a conditional theorem (Theorem 2.1), which provides \(X_N \in \mathcal {M}(m,1)\) with an appropriate m under conditions on the entropy numbers of the \(L_1\) ball \(X^1_N := \{f\in X_N:\Vert f\Vert _1 \le 1\}\). The proof of this conditional theorem is based on the chaining technique. In Sect. 3 we discuss new elements of a method that gives good upper bounds for the entropy numbers of the unit \(L_1\) ball \(\mathcal {T}(Q)_1\) in the \(L_\infty \) norm. In Sect. 4 we present results on the Marcinkiewicz-type theorems for the trigonometric polynomials. In Sect. 5 we show how the technique developed in Sect. 3 for the trigonometric polynomials can be generalized for subspaces \(X_N\) satisfying conditions A, B, and C. The main results of the paper are in Sects. 2–5. They are about discretization theorems in \(L_1\). In Sect. 6 we give some comments on the discretization theorems in \(L_2\). This case (the \(L_2\) case) is much better understood than the \(L_1\) case, and it has nice connections to recent strong results on submatrices of orthogonal matrices and on random matrices.
2 Conditional Theorem
We begin with the definition of the entropy numbers. Let X be a Banach space, and let \(B_X\) denote the unit ball of X with the center at 0. Denote by \(B_X(y,r)\) a ball with center y and radius r: \(\{x\in X:\Vert x-y\Vert \le r\}\). For a compact set A and a positive number \(\varepsilon \), we define the covering number \(N_\varepsilon (A)\) as follows:
It is convenient to consider along with the entropy \(H_\varepsilon (A,X):= \log _2 N_\varepsilon (A,X)\) the entropy numbers \(\varepsilon _k(A,X)\):
In our definition of \(N_\varepsilon (A)\) and \(\varepsilon _k(A,X)\) we require \(y^j\in A\). In a standard definition of \(N_\varepsilon (A)\) and \(\varepsilon _k(A,X)\) this restriction is not imposed. However, it is well known (see [15, p. 208]) that these characteristics may differ at most by a factor of 2.
Theorem 2.1
Suppose that a subspace \(X_N\) satisfies the condition \((B\ge 1)\)
Then for large enough absolute constant C, there exists a set of
points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), such that for any \(f\in X_N\), we have
Proof
Our proof uses the chaining technique and proceeds along the lines of the proof of Theorem 3.1 from [21]. We use the following results from [21]. Lemma 2.1 is from [2].
Lemma 2.1
Let \(\{g_j\}_{j=1}^m\) be independent random variables with \(\mathbb {E}g_j=0\), \(j=1,\dots ,m\), which satisfy
Then for any \(\eta \in (0,1)\), we have the following bound on the probability:
We now consider measurable functions \(f(\mathbf x)\), \(\mathbf x\in \Omega \). For \(1\le q<\infty \), define
Let \(\mu \) be a probabilistic measure on \(\Omega \). Denote by \(\mu ^m := \mu \times \cdots \times \mu \) the probabilistic measure on \(\Omega ^m := \Omega \times \cdots \times \Omega \). We will need the following inequality, which is a corollary of Lemma 2.1 (see [21]).
Proposition 2.1
Let \(f_j\in L_1(\Omega )\) be such that
Then
We consider the case where X is \({\mathcal {C}}(\Omega )\) the space of functions continuous on a compact subset \(\Omega \) of \({\mathbb {R}}^d\) with the norm
Sometimes we write \(L_\infty \) instead of \({\mathcal {C}}\) for the space of continuous functions. We use the abbreviated notation
In our case,
Set
Specify \(\eta =1/4\). Define \(\delta _j := \varepsilon _{2^j}\), \(j=0,1,\dots \), and consider minimal \(\delta _j\)-nets \({\mathcal {N}}_j \subset W\) of W in \({\mathcal {C}}(\Omega )\). We use the notation \(N_j:= |{\mathcal {N}}_j|\). Let \(J:=J(N,B)\) be the minimal j satisfying \(\delta _j \le 1/16\). For \(j=1,\dots ,J\), we define a mapping \(A_j\) that associates with a function \(f\in W\) a function \(A_j(f) \in {\mathcal {N}}_j\) closest to f in the \({\mathcal {C}}\) norm. Then, clearly,
We use the mappings \(A_j\), \(j=1,\dots , J\), to associate with a function \(f\in W\) a sequence (a chain) of functions \(f_J, f_{J-1},\dots , f_1\) in the following way:
Let us find an upper bound for J, defined above. Our assumption that \(B\ge 1\) and the definition of J imply that \(2^J\ge N\) and
We derive from (2.3),
Set
We now proceed to the estimate of \(\mu ^m\{\mathbf z:\sup _{f\in W}|L^1_\mathbf z(f)|\ge 1/4\}\). First of all, by the following simple Proposition 2.2, the assumption \(\delta _J\le 1/16\) implies that if \(|L^1_\mathbf z(f)| \ge 1/4\), then \(|L^1_\mathbf z(f_J)|\ge 1/8\).
Proposition 2.2
If \(\Vert f_1-f_2\Vert _\infty \le \delta \), then
Rewriting
we conclude that if \(|L^1_\mathbf z(f)| \ge 1/4\), then at least one of the following events occurs:
Therefore,
Applying Proposition 2.1, we obtain
We now make further estimates for a specific \(m\ge C_1NBJ^2\) with large enough \(C_1\). For j such that \(2^j\le N\), we obtain from the definition of \(\delta _j\),
By our choice of \(\delta _j=\varepsilon _{2^j}\), we get \(N_j\le 2^{2^j} <e^{2^j}\) and, therefore,
for sufficiently large \(C_1\).
In the case \(2^j\in (N, 2^J]\), we have
and
for sufficiently large \(C_1\).
We now estimate \(\mu ^m\{\mathbf z:\sup _{f\in {\mathcal {N}}_1}|L^1_\mathbf z(f)|\ge \eta _1\}\). We use Lemma 2.1 with \(g_j(\mathbf z) = |f(\mathbf x^j)|-\Vert f\Vert _1\). To estimate \(\Vert g_j\Vert _\infty \), we use that by our assumption, \(\varepsilon _1:= \varepsilon _1(X_N^1,{\mathcal {C}}) \le BN\). This means that there exist two points \(y^1\) and \(y^2\) in \(X_N^1\) such that \(X_N^1\subset B_{\mathcal {C}}(y^1,\varepsilon _1)\cup B_{\mathcal {C}}(y^2,\varepsilon _1)\). This implies that the 0 element belongs to one of these balls, say, \(B_{\mathcal {C}}(y^1,\varepsilon _1)\), and, therefore, \(\Vert y^1\Vert _\infty \le \varepsilon _1\). Next, \((y^1+y^2)/2\) belongs to one of those balls, and, therefore, \(\Vert y^1-y^2\Vert _\infty \le 2\varepsilon _1\). Thus, \(\Vert y^i\Vert _\infty \le 3\varepsilon _1\), \(i=1,2\). This implies that for any \(f\in W\), we have
Then Lemma 2.1 gives
for sufficiently large \(C_1\). Substituting the above estimates into (2.5), we obtain
Therefore, there exists \(\mathbf z_0=(\xi ^1,\dots ,\xi ^m)\) such that for any \(f\in W\), we have
Taking into account that \(\Vert f\Vert _1=1/2\), for \(f\in W\) we obtain the statement of Theorem 4.1 with \(C_2 =1/2\), \(C_3=3/2\). \(\square \)
In the above proof of Theorem 2.1, we specified \(\eta =1/4\). We can carry out that proof for \(\eta \in (0,1/4]\). In this case we define \(J:=J(N,B,\eta )\) to be the minimal j satisfying \(\delta _j \le \eta /4\) and set \(\eta _j:= \frac{\eta }{2J}\), \(j=1,\dots , J\). Then under assumptions \(\eta \ge 2^{-N}\) and \(\log (2B) \le N\), we obtain the following bound on J:
instead of (2.4). Further, we make the estimates for \(m\ge C_1NBJ^2 \eta ^{-2}\). This modification of Theorem 2.1 gives the following version of Theorem 2.1.
Theorem 2.2
Suppose that a subspace \(X_N\) satisfies the condition \((B\ge 1)\)
Then for large enough absolute constant C and for \(\epsilon \in (0,1)\), there exists a set of
points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), such that for any \(f\in X_N\), we have
3 The Entropy Numbers of \(\mathcal {T}(Q)_1\)
We use the technique developed in [20], which is based on the following two-step strategy. In the first step, we obtain bounds of the best m-term approximations with respect to a dictionary. In the second step, we use general inequalities relating the entropy numbers to the best m-term approximations. We begin the detailed discussion with the second step of the above strategy. Let \({\mathcal {D}}=\{g_j\}_{j=1}^N\) be a system of elements of cardinality \(|{\mathcal {D}}|=N\) in a Banach space X. Consider best m-term approximations of f with respect to \({\mathcal {D}}\),
For a function class F, set
The following results are from [16].
Theorem 3.1
Let a compact \(F\subset X\) be such that there exists a system \({\mathcal {D}}\), \(|{\mathcal {D}}|=N\), and a number \(r>0\) such that
Then for \(k\le N\),
Remark 3.1
Suppose that a compact F from Theorem 3.1 belongs to an N-dimensional subspace \(X_N:={\text {span}}({\mathcal {D}})\). Then in addition to (3.1) we have for \(k\ge N\),
We point out that Remark 3.1 is formulated for a complex Banach space X. In the case of real Banach space X, we have \(2^{-k/N}\) instead of \(2^{-k/(2N)}\) in (3.2).
We begin with the best m-term approximation of elements of \(\mathcal {T}(Q)_1:=\{f\in \mathcal {T}(Q):\Vert f\Vert _1\le 1\}\) in \(L_2\) with respect to a special dictionary \({\mathcal {D}}^1:={\mathcal {D}}^1(Q)\) associated with Q. Define
Then \(\Vert w_Q\Vert _2 =1\). Consider the dictionary
For a dictionary \({\mathcal {D}}\) in a Hilbert space H with an inner product \(\langle \cdot ,\cdot \rangle \), denote by \(A_1({\mathcal {D}})\) the closure of the convex hull of the dictionary \({\mathcal {D}}\). In the case of a complex Hilbert space, define the symmetrized dictionary \({\mathcal {D}}^s := \{e^{i\theta } g: g\in {\mathcal {D}}, \theta \in [0,2\pi ]\}\). We use the weak orthogonal greedy algorithm (weak orthogonal matching pursuit) for m-term approximation. We recall the corresponding definition and formulate the known result, which we will use.
Weak Orthogonal Greedy Algorithm (WOGA) Let \(t\in (0,1]\) be a weakness parameter. We define \(f^{o,t}_0 :=f\). Then for each \(m\ge 1\), we inductively define:
-
(1)
\(\varphi ^{o,t}_m \in {\mathcal {D}}\) is any element satisfying
$$\begin{aligned} \left| \left\langle f^{o,t}_{m-1},\varphi ^{o,t}_m\right\rangle \right| \ge t \sup _{g\in {\mathcal {D}}} \left| \left\langle f^{o,t}_{m-1},g\right\rangle \right| . \end{aligned}$$ -
(2)
Let \(H_m^t := {\text {span}}(\varphi _1^{o,t},\dots ,\varphi ^{o,t}_m)\), and let \(P_{H_m^t}(f)\) denote an operator of orthogonal projection onto \(H_m^t\). Define
$$\begin{aligned} G_m^{o,t}(f,{\mathcal {D}}) := P_{H_m^t}(f). \end{aligned}$$ -
(3)
Define the residual after mth iteration of the algorithm
$$\begin{aligned} f^{o,t}_m := f-G_m^{o,t}(f,{\mathcal {D}}). \end{aligned}$$
In the case \(t=1\), the WOGA is called the orthogonal greedy algorithm (OGA). The following theorem is from [12] (see also [15, Ch. 2]).
Theorem 3.2
Let \({\mathcal {D}}\) be an arbitrary dictionary in H. Then for each \(f \in A_1({\mathcal {D}}^s)\), we have
We now prove the following assertion.
Theorem 3.3
For any finite \(Q\subset \mathbb {Z}^d\), we have
Proof
Each \(f\in \mathcal {T}(Q)_1\) has a representation
It follows from \(\Vert f\Vert _1 \le 1\) and (3.4) that \(f|Q|^{-1/2} \in A_1(({\mathcal {D}}^1)^s)\). Therefore, by Theorem 3.2 we get the required bound. \(\square \)
Dictionary \({\mathcal {D}}^1(Q)\) is an infinite dictionary. In our further applications we would like to have a finite dictionary. Here we consider \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), where \(\Pi (\mathbf N):=[-N_1,N_1]\times \cdots \times [-N_d,N_d]\), \(\mathbf N=(N_1,\dots ,N_d)\). We write
and set
Then for any \(t\in \mathcal {T}(\Pi (\mathbf N))\) (see [23, Ch. 10]),
where \(\vartheta (\mathbf N) := \prod _{j=1}^d (2N_j + 1)=\dim \mathcal {T}(\Pi (\mathbf N))\). Specify \(\mathbf N:=(2^n,\dots ,2^n)\) and define
Then, clearly, \(|{\mathcal {D}}^2(Q)|=\vartheta (\mathbf N) = (2^{n+1}+1)^d\). Also, it is well known that for \(f\in \mathcal {T}(\Pi (\mathbf N))\), one has
and, therefore, for \(f\in \mathcal {T}(Q)\), \(Q\subset \Pi (\mathbf N)\),
In particular, (3.5) and (3.7) imply that there exists \(C(d)>0\) such that for every \(f\in \mathcal {T}(Q)_1\), we have \(C(d)^{-1}|Q|^{-1/2} f\in A_1(({\mathcal {D}}^2)^s)\). Therefore, we have the following version of Theorem 3.3.
Theorem 3.4
For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), we have
and \(|{\mathcal {D}}^2(Q)|\le C'(d)2^{nd}\).
Theorems 3.3 and 3.4 provide bounds for the best m-term approximation of elements of \(\mathcal {T}(Q)_1\) in the \(L_2\) norm. For applications in the Marcinkiewicz discretization theorem, we need bounds for the entropy numbers in the \(L_\infty \) norm. As we explained above, we derive appropriate bounds for the entropy numbers from the corresponding bounds on the best m-term approximations with the help of Theorem 3.1. Thus we need bounds on the best m-term approximations in the \(L_\infty \) norm. We proceed in the same way as in [20] and use the following dictionary:
In order to obtain the bounds in the \(L_\infty \) norm, we use the following theorem from [20], which in turn is a corollary of the corresponding result from [18].
Theorem 3.5
Let \(\Lambda \subset \Pi (\mathbf N)\), with \(N_j=2^n\), \(j=1,\dots ,d\). There exist constructive greedy-type approximation methods \(G^\infty _m(\cdot )\) that provide m-term polynomials with respect to \(\mathcal {T}^d\) with the following properties:
for \(f\in \mathcal {T}(\Lambda )\), we have \(G^\infty _m(f)\in \mathcal {T}(\Lambda )\) and
We now consider a dictionary
Lemma 3.1
For any \(Q\subset \Pi (\mathbf N)\), with \(\mathbf N=(2^n,\dots ,2^n)\), we have
and \(|{\mathcal {D}}^3(Q)|\le C'(d)2^{nd}\).
Proof
Take \(f\in \mathcal {T}(Q)_1\). Applying first Theorem 3.4 with [m / 2] and then applying Theorem 3.5 with \(\Lambda = Q\) and [m / 2], we obtain
which proves the lemma. \(\square \)
Lemma 3.1, Theorem 3.1, and Remark 3.1 imply the following result on the entropy numbers.
Theorem 3.6
For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), we have
The above theorem with \(Q=Q_n\) can be used for proving the upper bounds for the entropy numbers of the mixed smoothness classes. We define the classes that were studied in [19, 20].
Let \(\mathbf s=(s_1,\dots ,s_d)\) be a vector with non-negative integer coordinates (\(\mathbf s\in \mathbb {Z}^d_+\)) and, as above,
where [a] denotes the integer part of a number a. Define for \(f\in L_1\),
and
Consider the class (see [19])
Define
Here is one more class, which is equivalent to \(\mathbf W^{a,b}_q\) in the case \(1<q<\infty \) (see [19]). Consider a class \({\bar{\mathbf W}}^{a,b}_q\), which consists of functions f with a representation
In the case \(q=1\), classes \({\bar{\mathbf W}}^{a,b}_1\) are wider than \(\mathbf W^{a,b}_1\).
The following theorem was proved in [20].
Theorem 3.7
Let \(d=2\) and \(a>1\). Then
We prove here an extension of Theorem 3.7 to all d. We note that this extension—Theorem 3.8—is weaker than Theorem 3.7 in the case \(d=2\).
Theorem 3.8
Let \(a>1\). Then
Proof
The proof is based on the following general result from [20]. Let X and Y be two Banach spaces. We discuss a problem of estimating the entropy numbers of an approximation class, defined in the space X, in the norm of the space Y. Suppose a sequence of finite dimensional subspaces \(X_n \subset X\), \(n=1,\dots \), is given. Define the following class:
In particular,
Set \(D_n:=\dim X_n\), and assume that for the unit balls \(B(X_n):=\{f\in X_n: \Vert f\Vert _X\le 1\}\), we have the following upper bounds for the entropy numbers: there exist real \(\alpha \) and non-negative \(\gamma \) and \(\beta \in (0,1]\) such that
Theorem 3.9
Assume \(D_n \asymp 2^n n^c\), \(c\ge 0\), \(a>\beta \), and subspaces \(\{X_n\}\) satisfy (3.10). Then
Theorem 3.6 with \(Q=Q_n\) yields (3.10) with \(\alpha =3/2\), \(\beta =1\), \(\gamma =0\). It remains to apply Theorem 3.9 with \(X_n=\mathcal {T}(Q_n)\) and \(c=d-1\). \(\square \)
4 The Marcinkiewicz-Type Discretization Theorem for the Trigonometric Polynomials
In this section we improve Theorem 1.3 from the introduction, which was proved in [21], in two directions. We prove the Marcinkiewicz-type discretization theorem for \(\mathcal {T}(Q)\) instead of \(\mathcal {T}(Q_n)\) for a rather general Q. Also, even in a more general situation, we improve the bound from \(m \le C_1(d)|Q_n|n^{d/2+3}\) to \(m \le C_1(d)|Q_n|n^{7/2}\) similar to that in Theorem 1.2. Our proof is based on the conditional Theorems 2.1 and 3.6.
We now prove the Marcinkiewicz-type theorem for discretization of the \(L_1\) norm of polynomials from \(\mathcal {T}(Q)\).
Theorem 4.1
There is a large enough constant \(C_1(d)\) with the property: For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), there exists a set of \(m \le C_1(d)|Q|n^{7/2}\) points \(\xi ^j\in \mathbb {T}^d\), \(j=1,\dots ,m\) such that for any \(f\in \mathcal {T}(Q)\), we have
Proof
We use Theorem 3.6, proved in Sect. 3. We formulate it here for the reader’s convenience. We stress that Theorem 4.2 is the only result on the specific features of the \(\mathcal {T}(Q)\) that we use in the proof of Theorem 4.1.
Theorem 4.2
For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\), we have
We now apply Theorem 2.1 with \(N=2|Q|\) and \(B=C_4(d)n^{3/2}\). This completes the proof. \(\square \)
In the above proof of Theorem 2.1, we specified \(\eta =1/4\). If instead we take \(\eta \in [2^{-2^{nd/2}},1/4]\), define \(J(\eta )\) to be the minimal j satisfying \(\delta _j \le \eta /4\), and set
then we obtain the following generalization of Theorem 4.1.
Theorem 4.3
For any \(Q\subset \Pi (\mathbf N)\) with \(\mathbf N=(2^n,\dots ,2^n)\) and \(\epsilon \in [2^{1-2^{nd/2}},1/2]\), there exists a set of \(m \le C_1(d)|Q|n^{7/2}\epsilon ^{-2}\) points \(\xi ^j\in \mathbb {T}^d\), \(j=1,\dots ,m\), such that for any \(f\in \mathcal {T}(Q)\), we have
5 Some Marcinkiewicz-Type Discretization Theorems for General Polynomials
In this section we extend the technique developed in Sects. 3 and 4 to the case of a general orthonormal system \(\{u_i\}_{i=1}^N\) on a compact \(\Omega \subset {\mathbb {R}}^d\), which satisfies conditions A, B, and C from the introduction. Let \(\mu \) be a probability measure on \(\Omega \). It is convenient for us to assume that \(u_i\), \(i=1,\dots ,N\), are real functions and set
Denote the unit \(L_p\) ball in \(X_N\) by
We begin with the estimates of the entropy numbers \(\varepsilon _k(X^1_N,L_\infty )\). We use the same strategy as above: first we get bounds on m-term approximations for \(X^1_N\) in \(L_2\) with respect to a dictionary \({\mathcal {D}}^1\), second we obtain bounds on m-term approximations for \(X^2_N\) in \(L_\infty \) with respect to a dictionary \({\mathcal {D}}^2\), third we get bounds on m-term approximations for \(X^1_N\) in \(L_\infty \) with respect to a dictionary \({\mathcal {D}}^3={\mathcal {D}}^1\cup {\mathcal {D}}^2\). Then we apply Theorem 3.1 to obtain the entropy numbers estimates.
5.1 Sparse Approximation in \(L_2\)
We begin with the study of m-term approximations with respect to the dictionary
where
is the Dirichlet kernel for the system \(\{u_i\}_{i=1}^N\). Then assumption B guarantees that \(\Vert g_\mathbf y\Vert _2 \le 1\). We now use the following greedy-type algorithm (see [15, p. 82]).
Relaxed Greedy Algorithm (RGA) Let \(f^r_0:=f\) and \(G_0^r(f):= 0\). For a function h from a real Hilbert space H, let \(g=g(h)\) denote the function from \({\mathcal {D}}^\pm :=\{\pm g:g\in {\mathcal {D}}\}\) that maximizes \(\langle h,g\rangle \) (we assume the existence of such an element). Then, for each \(m\ge 1\), we inductively define
We use the following known result (see [15, p. 90]).
Theorem 5.1
For the relaxed greedy algorithm, we have, for each \(f\in A_1({\mathcal {D}}^\pm )\), the estimate
In our application of the above RGA, the Hilbert space H is the \(X_N\) with the \(L_2\) norm and the dictionary \({\mathcal {D}}\) is the \({\mathcal {D}}^0\) defined above. Using the representation
we see that the search for \(g\in ({\mathcal {D}}^0)^\pm \) maximizing \(\langle h,g\rangle \), \(h\in X_N\), is equivalent to the search for \(\mathbf y\in \Omega \) maximizing \(|h(\mathbf y)|\). A function h from \(X_N\) is continuous on the compact \(\Omega \), and therefore such a maximizing \(\mathbf y_{\max }\) exists. This means that we can run the RGA.
For \(f\in X^1_N\), by representation (5.1) we obtain
Therefore,
Applying Theorem 5.1, we get the following result.
Theorem 5.2
For the relaxed greedy algorithm with respect to \({\mathcal {D}}^0\), we have, for each \(f\in X^1_N\), the estimate
We need an analog of Theorem 5.2 for a discrete version of \({\mathcal {D}}^0\). Take a \(\delta >0\) and let \(\{\mathbf y^1,\dots ,\mathbf y^M\}\), \(M=M(\delta )\), be a \(\delta \)-net of points in \(\Omega \), which means that for any \(\mathbf y\in \Omega \), there is a \(\mathbf y^j\) from the net such that \(\Vert \mathbf y-\mathbf y^j\Vert _\infty \le \delta \). It is clear that
It follows from the definition of the RGA that \(G_m^r(f)\in A_1({\mathcal {D}}^\pm )\) provided \(f\in A_1({\mathcal {D}}^\pm )\). Let \(f\in X^1_N\), and let \(G_m^r(f)\) be its approximant from Theorem 5.2. Then
For each \(\mathbf y(k)\), find \(\mathbf y^{j(k)}\) from the net such that \(\Vert \mathbf y(k)-\mathbf y^{j(k)}\Vert _\infty \le \delta \). Then, using assumption A, we get
Let
Combining (5.3) and (5.4), we obtain
Choosing \(\delta \) such that
we obtain by Theorem 5.2 and (5.5) that for \(f\in X^1_N\),
Inequality (5.2) gives
Define the dictionary \({\mathcal {D}}^1\) as follows:
Relation (5.6) gives us the following theorem.
Theorem 5.3
We have
5.2 Sparse Approximation in \(L_\infty \)
In this subsection we study m-term approximations of \(f\in X^2_N\) in the \(L_\infty \) norm with respect to the following dictionary:
Then by property B, for all p we have \(\Vert g_i\Vert _p \le 1\).
In this subsection we use greedy algorithms in Banach spaces. We recall some notation from the theory of greedy approximation in Banach spaces. The reader can find a systematic presentation of this theory in [15, Chapter 6]. Let X be a Banach space with norm \(\Vert \cdot \Vert \). We say that a set of elements (functions) \({\mathcal D}\) from X is a dictionary if each \(g\in {\mathcal D}\) has norm less than or equal to one (\(\Vert g\Vert \le 1\)) and the closure of \({\text {span}}{\mathcal D}\) coincides with X. We note that in [13] we required in the definition of a dictionary normalization of its elements (\(\Vert g\Vert =1\)). However, it is pointed out in [14] that it is easy to check that the arguments from [13] work under assumption \(\Vert g\Vert \le 1\) instead of \(\Vert g\Vert =1\). In applications it is more convenient for us to have an assumption \(\Vert g\Vert \le 1\) than normalization of a dictionary.
For an element \(f\in X\), we denote by \(F_f\) a norming (peak) functional for f:
The existence of such a functional is guaranteed by the Hahn–Banach theorem.
We proceed to the incremental greedy algorithm (see [14] and [15, Chapter 6]). Let \(\epsilon =\{\epsilon _n\}_{n=1}^\infty \), \(\epsilon _n> 0\), \(n=1,2,\dots \) . For a Banach space X and a dictionary \({\mathcal {D}}\), define the following algorithm IA(\(\epsilon \)) \(:=\) IA(\(\epsilon ,X,{\mathcal {D}}\)).
Incremental Algorithm with Schedule \(\varvec{\epsilon }\) (IA(\(\varvec{\epsilon ,X},{\mathcal {D}}\))) Set \(f_0^{i,\epsilon }:= f\) and \(G_0^{i,\epsilon } :=0\). Then, for each \(m\ge 1\), we have the following inductive definition:
-
(1)
\(\varphi _m^{i,\epsilon } \in {\mathcal {D}}\) is any element satisfying
$$\begin{aligned} F_{f_{m-1}^{i,\epsilon }}\left( \varphi _m^{i,\epsilon }-f\right) \ge -\epsilon _m. \end{aligned}$$ -
(2)
Define
$$\begin{aligned} G_m^{i,\epsilon }:= (1-1/m)G_{m-1}^{i,\epsilon } +\varphi _m^{i,\epsilon }/m. \end{aligned}$$ -
(3)
Let
$$\begin{aligned} f_m^{i,\epsilon } := f- G_m^{i,\epsilon }. \end{aligned}$$We consider here approximation in uniformly smooth Banach spaces. For a Banach space X, we define the modulus of smoothness
$$\begin{aligned} \rho (u) := \sup _{\Vert x\Vert =\Vert y\Vert =1}\left( \frac{1}{2}(\Vert x+uy\Vert +\Vert x-uy\Vert )-1\right) . \end{aligned}$$The uniformly smooth Banach space is the one with the property
$$\begin{aligned} \lim _{u\rightarrow 0}\rho (u)/u =0. \end{aligned}$$
It is well known (see for instance [4, Lemma B.1]) that in the case \(X=L_p\), \(1\le p < \infty \), we have
Denote by \(A_1({\mathcal {D}}):=A_1({\mathcal {D}})(X)\) the closure in X of the convex hull of \({\mathcal {D}}\). In order to be able to run the IA(\(\epsilon \)) for all iterations, we need the existence of an element \(\varphi _m^{i,\epsilon } \in {\mathcal {D}}\) at the step (1) of the algorithm for all m. It is clear that the following condition guarantees such existence (see [17]).
Condition B We say that for a given dictionary \({\mathcal {D}}\), an element f satisfies Condition B if for all \(F\in X^*\), we have
It is well known (see, for instance, [15, p. 343]) that any \(f\in A_1({\mathcal {D}})\) satisfies Condition B. For completeness we give this simple argument here. Take any \(f\in A_1({\mathcal {D}})\). Then for any \(\epsilon >0\), there exist \(g_1^\epsilon ,\dots ,g_N^\epsilon \in {\mathcal {D}}\) and numbers \(a_1^\epsilon ,\dots ,a_N^\epsilon \) such that \(a_i^\epsilon >0\), \(a_1^\epsilon +\dots +a_N^\epsilon = 1\) and
Thus
which proves Condition B.
We note that Condition B is equivalent to the property \(f\in A_1({\mathcal {D}})\). Indeed, as we showed above, the property \(f\in A_1({\mathcal {D}})\) implies Condition B. Let us show that Condition B implies that \(f\in A_1({\mathcal {D}})\). Assuming the contrary, \(f\notin A_1({\mathcal {D}})\), by the separation theorem for convex bodies, we find \(F\in X^*\) such that
which contradicts Condition B.
We formulate results on the IA(\(\epsilon \)) in terms of Condition B because in the applications it is easy to check Condition B.
Theorem 5.4
Let X be a uniformly smooth Banach space with modulus of smoothness \(\rho (u)\le \gamma u^q\), \(1<q\le 2\). Define
Then, for every f satisfying Condition B, we have
In the case \(f\in A_1({\mathcal {D}})\), this theorem is proved in [14] (see also [15, Chapter 6]). As we mentioned above, Condition B is equivalent to \(f\in A_1({\mathcal {D}})\).
For \(f\in X_N\), write \(f=\sum _{i=1}^N c_ig_i\) and define
Theorem 5.5
Assume that \(X_N\) satisfies C. For any \(t\in X_N\) the IA(\(\epsilon ,X_N\cap L_p,{\mathcal {D}}^2\)) with an appropriate p and schedule \(\epsilon \) applied to \(f:=t/\Vert t\Vert _A\) yields, after m iterations, an m-term polynomial \(G_m(t):=G^{i,\epsilon }_m(f)\Vert t\Vert _A\) with the following approximation property:
with a constant \(C=C(K_3,K_4)\).
Proof
It is clear that it is sufficient to prove Theorem 5.5 for \(t\in X_N\) with \(\Vert t\Vert _A =1\). Then \(t\in A_1({\mathcal {D}}^2)(X_N\cap L_p)\) for all \(p\in [2,\infty )\). Applying the IA(\(\epsilon \)) to f with respect to \({\mathcal {D}}^2\), we obtain by Theorem 5.4 after m iterations
where \(\sum _{j\in \Lambda } \frac{a_j}{m}g_j\) is the \(G^{i,\epsilon }_m(t)\). By (5.8) we find \(\gamma \le p/2\). Next, by the Nikol’skii inequality from assumption C we get from (5.9),
Choosing \(p\asymp \ln N\), we obtain the bound desired in Theorem 5.5. \(\square \)
Using the following simple relations:
we obtain from Theorem 5.5 the following estimates.
Theorem 5.6
We have
Combining Theorems 5.3 and 5.6, we obtain:
Theorem 5.7
We have
5.3 The Entropy Numbers
By our construction (see (5.7)), we obtain
Theorem 5.7, Theorem 3.1, and Remark 3.1 (its version for the real case) imply the following result on the entropy numbers.
Theorem 5.8
Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then we have
In the same way that Theorem 4.1 was derived from Theorems 3.6 and 2.1, the following Theorem 5.9 can be derived from Theorems 5.8 and 2.1.
Theorem 5.9
Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then there exists a set of \(m \le C_1N(\log N)^{7/2}\) points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), \(C_1=C(d,K_1,K_2,K_3,K_4,\Omega ,\alpha ,\beta )\), such that for any \(f\in X_N\), we have
The following analog of Theorem 4.3 holds for general systems.
Theorem 5.10
Suppose that a real orthonormal system \(\{u_i\}_{i=1}^N\) satisfies conditions A, B, and C. Then for \(\epsilon \in [2^{-N},1/2]\), there exists a set of \(m \le C_1N(\log N)^{7/2}\epsilon ^{-2}\) points \(\xi ^j\in \Omega \), \(j=1,\dots ,m\), \(C_1=C(d,K_1,K_2,K_3,K_4,\Omega ,\alpha ,\beta )\), such that for any \(f\in X_N\), we have
6 The Marcinkiewicz-Type Theorem in \(L_2\)
In this section we discuss some known results directly connected to the discretization theorems and demonstrate how recent results on random matrices can be used to obtain the Marcinkiewicz-type theorem in \(L_2\). We begin with the formulation of the Rudelson result from [11]. In [11], it is formulated in terms of submatrices of an orthogonal matrix. We reformulate it in our notation. Let \(\Omega _M=\{x^j\}_{j=1}^M\) be a discrete set with the probability measure \(\mu (x^j)=1/M\), \(j=1,\dots ,M\). Assume that \(\{u_i(x)\}_{i=1}^N\) is a real orthonormal on \(\Omega _M\) system satisfying the following condition: for all j,
with some \(t\ge 1\). Then for every \(\epsilon >0\), there exists a set \(J\subset \{1,\dots ,M\}\) of indices with cardinality
such that for any \(f=\sum _{i=1}^N c_iu_i\), we have
In particular, the above result implies that for any orthonormal system \(\{u_i\}_{i=1}^N\) on \(\Omega _M\), satisfying (6.1), we have
with large enough C. We note that (6.1) is satisfied if the system \(\{u_i\}_{i=1}^N\) is uniformly bounded: \(\Vert u_i\Vert _\infty \le t\), \(i=1,\dots ,N\).
We first demonstrate how the Bernstein-type concentration inequality for matrices can be used to prove an analog of Rudelson’s result above for a general \(\Omega \). Our proof is based on a different idea than Rudelson’s proof. Let \(\{u_i\}_{i=1}^N\) be an orthonormal system on \(\Omega \), satisfying the condition
D. For \(x\in \Omega \), we have
With each \(x\in \Omega \) we associate the matrix \(G(x) := [u_i(x)u_j(x)]_{i,j=1}^N\). Clearly, G(x) is a symmetric matrix. We will also need the matrix \(G(x)^2\). We have for the (k, l) element of \(G(x)^2\),
Therefore,
We use the following Bernstein-type concentration inequality for matrices (see [22]).
Theorem 6.1
Let \(\{T_k\}_{k=1}^n\) be a sequence of independent random symmetric \(N\times N\) matrices. Assume that each \(T_k\) satisfies:
Then for all \(\eta \ge 0\),
where \(\sigma ^2 := \left\| \sum _{k=1}^n \mathbb {E}(T_k^2)\right\| \).
We now consider a sequence \(T_k := G(x^k)-I\), \(k=1,\dots ,m\) of independent random symmetric matrices. Orthonormality of the system \(\{u_i\}_{i=1}^N\) implies that \(\mathbb {E}(T_k)=0\) for all k. Relation (6.4) and our assumption D imply for all k,
Define \(T(x):= G(x)-I\), and, using (6.3) and (6.4), we can write
Then by the orthonormality of the system \(\{u_i\}_{i=1}^N\), we get
and, therefore, we obtain
Thus, by Theorem 6.1 we obtain for \(\eta \le 1\),
with an absolute constant c.
For a set of points \(\xi ^k\in \Omega \), \(k=1,\dots ,m\), and \(f=\sum _{i=1}^N b_iu_i\), we have
where \({\mathbf b} = (b_1,\dots ,b_N)^T\) is the column vector. Therefore,
We now make \(m = [CN\eta ^{-2}\log N]\) with large enough C. Then, using (6.7) with \(n=m\), we get the corresponding probability \(<1\). Thus, we have proved the following theorem.
Theorem 6.2
Let \(\{u_i\}_{i=1}^N\) be an orthonormal system, satisfying condition D. Then for every \(\epsilon \in (0,1)\), there exists a set \(\{\xi ^j\}_{j=1}^m \subset \Omega \) with
such that for any \(f=\sum _{i=1}^N c_iu_i\), we have
We note that Theorem 6.2 treats a special case, when (6.3) instead of (6.1) is satisfied. This is the case, for instance, for the trigonometric and the Walsh systems. In this special case, Theorem 6.2 is more general and slightly stronger than the Rudelson theorem discussed in the beginning of this section. Theorem 6.2 yields the Marcinkiewicz-type discretization theorem for a general domain \(\Omega \) instead of a discrete set \(\Omega _M\). Also, in Theorem 6.2 we have an extra \(\log N\) instead of \(\log \frac{Nt^2}{\epsilon ^2}\) in (6.2).
In [21] we showed how to derive the following result from the recent paper by Nitzan et al. [10], which in turn is based on the paper of Marcus et al. [9].
Theorem 6.3
Let \(\Omega _M=\{x^j\}_{j=1}^M\) be a discrete set with the probability measure \(\mu (x^j)=1/M\), \(j=1,\dots ,M\). Assume that \(\{u_i(x)\}_{i=1}^N\) is an orthonormal on \(\Omega _M\) system (real or complex). Assume in addition that this system has the following property: for all \(j=1,\dots , M\), we have
Then there is an absolute constant \(C_1\) such that there exists a subset \(J\subset \{1,2,\dots ,M\}\) with the property: \(m:=|J| \le C_1 N\), and for any \(f\in X_N:= {\text {span}}\{u_1,\dots ,u_N\}\), we have
where \(C_2\) and \(C_3\) are absolute positive constants.
Theorem 6.3 is based on the following lemma from [10].
Lemma 6.1
Let a system of vectors \(\mathbf v_1,\dots ,\mathbf v_M\) from \(\mathbb {C}^N\) have the following properties: for all \(\mathbf w\in \mathbb {C}^N\),
and
Then there is a subset \(J\subset \{1,2,\dots ,M\}\) such that for all \(\mathbf w\in \mathbb {C}^N\),
where \(c_0\) and \(C_0\) are some absolute positive constants.
The above Lemma 6.1 was derived from the following theorem from Marcus et al. [9], which solved the Kadison–Singer problem.
Theorem 6.4
Let a system of vectors \(\mathbf v_1,\dots ,\mathbf v_M\) from \(\mathbb {C}^N\) have the following properties: for all \(\mathbf w\in \mathbb {C}^N\), we have
and \(\Vert \mathbf v_j\Vert _2^2 \le \epsilon \).
Then there exists a partition of \(\{1,\dots ,M\}\) into two sets \(S_1\) and \(S_2\), such that for each \(i=1,2\), we have for all \(\mathbf w\in \mathbb {C}^N\),
Second, we demonstrate a method of proof that is different from the above proof of Theorem 6.2 and that allows us to replace condition D by the following more general condition E that is similar to (6.1).
E. There exists a constant t such that
The new proof method uses the fact that the matrix G(x) is a semi-definite matrix. It is based on the following result (see [22, Theorem 1.1]) on random matrices.
Theorem 6.5
Consider a finite sequence \(\{T_k\}_{k=1}^m\) of independent, random, self-adjoint matrices with dimension N. Assume that each random matrix is semi-positive and satisfies
Define
Then
for \(\eta \in [0,1)\), and for \(\eta \ge 0\),
As above, we consider the matrix \(G(x) := [u_i(x)u_j(x)]_{i,j=1}^N\). Clearly, G(x) is a symmetric matrix. Consider a sequence \(T_k := G(x^k)\), \(k=1,\dots ,m\), of independent random symmetric matrices. It is easy to see that \(T_k\) are semi-positive definite. Orthonormality of the system \(\{u_i\}_{i=1}^N\) implies that \(\mathbb {E}(T_k)=I\) for all k. This implies that \(s_{\min }=s_{\max } = m\). Relation (6.4) shows that we can take \(R:=Nt^2\). Then Theorem 6.5 implies for \(\eta \le 1\),
with an absolute constant c (we can take \(c=2/\ln 2\)). Using inequality (6.8), which was used in the above proof of Theorem 6.2, we derive from here the following theorem.
Theorem 6.6
Let \(\{u_i\}_{i=1}^N\) be an orthonormal system satisfying condition E. Then for every \(\epsilon >0\), there exists a set \(\{\xi ^j\}_{j=1}^m \subset \Omega \) with
such that for any \(f=\sum _{i=1}^N c_iu_i\), we have
We note that Theorem 6.6 is more general and slightly stronger than the Rudelson theorem discussed in the beginning of this section. Theorem 6.6 yields the Marcinkiewicz-type discretization theorem for a general domain \(\Omega \) instead of a discrete set \(\Omega _M\). Also, in Theorem 6.6 we have an extra \(\log N\) instead of \(\log \frac{Nt^2}{\epsilon ^2}\) in (6.2).
The Marcinkiewicz Theorem and Sparse Approximation Our above argument, in particular, inequality (6.8), shows that the Marcinkiewicz-type discretization theorem in \(L_2\) is closely related to approximation of the identity matrix I by an m-term approximant of the form \(\frac{1}{m}\sum _{k=1}^m G(\xi ^k)\) in the operator norm from \(\ell ^N_2\) to \(\ell ^N_2\) (spectral norm). Therefore, we can consider the following sparse approximation problem. Assume that the system \(\{u_i(x)\}_{i=1}^N\) satisfies (6.13), and consider the dictionary
Then condition (6.13) guarantees that for the Frobenius norm of \(g_x\), we have
Our assumption on the orthonormality of the system \(\{u_i\}_{i=1}^N\) gives
which implies that \(I \in A_1({\mathcal {D}}^u,Nt^2)\). Consider the Hilbert space H to be a closure in the Frobenius norm of \({\text {span}}\{g_x, x\in \Omega \}\) with the inner product generated by the Frobenius norm: for \(A=[a_{i,j}]_{i,j=1}^N\) and \(B=[b_{i,j}]_{i,j=1}^N\),
in the case of real matrices (with standard modification in the case of complex matrices).
By Theorem 5.1, for any \(m\in {\mathbb {N}}\), we constructively find (by the RGA) points \(\xi ^1,\dots ,\xi ^m\) such that
Taking into account the inequality \(\Vert A\Vert \le \Vert A\Vert _F\), we get from here and from (6.8) the following proposition.
Proposition 6.1
Let \(\{u_i\}_{i=1}^N\) be an orthonormal system satisfying condition E. Then there exists a constructive set \(\{\xi ^j\}_{j=1}^m \subset \Omega \) with \(m\le C(t)N^2\) such that for any \(f=\sum _{i=1}^N c_iu_i\), we have
The Marcinkiewicz-Type Theorem with Weights We now comment on a recent breakthrough result by Batson et al. [1]. We formulate their result in our notation. Let, as above, \(\Omega _M=\{x^j\}_{j=1}^M\) be a discrete set with the probability measure \(\mu (x^j)=1/M\), \(j=1,\dots ,M\). Assume that \(\{u_i(x)\}_{i=1}^N\) is a real orthonormal system on \(\Omega _M\). Then for any number \(d>1\), there exist a set of weights \(w_j\ge 0\) such that \(|\{j: w_j\ne 0\}| \le dN\), so that for any \(f\in {\text {span}}\{u_1,\dots ,u_N\}\), we have
The proof of this result is based on a delicate study of the m-term approximation of the identity matrix I with respect to the system \({\mathcal {D}}:= \{G(x)\}_{x\in \Omega }\), \(G(x):=[u_i(x)u_j(x)]_{i,j=1}^N\) in the spectral norm. The authors of [1] control the change of the maximal and minimal eigenvalues of a matrix, when they add a rank one matrix of the form wG(x). Their proof provides an algorithm for construction of the weights \(\{w_j\}\). In particular, this implies that
with large enough C.
In this section we discussed two deep general results—the Rudelson theorem and the Batson–Spielman–Srivastava theorem—about submatrices of orthogonal matrices, which provide very good Marcinkiewicz-type discretization theorems for \(L_2\). The reader can find corresponding historical comments in [11]. We also refer the reader to the paper [5] for a discussion of a recent outstanding progress on the theory of submatrices of orthogonal matrices.
7 Discussion
As we pointed out in the introduction, the main results of this paper are on the Marcinkiewicz-type discretization theorems in \(L_1\). We proved here that under certain conditions on an N-dimensional subspace \(X_N\) we can get the corresponding discretization theorems with the number of knots \(m\ll N(\log N)^{7/2}\). This result is only off from the ideal case \(m=N\) by the \((\log N)^{7/2}\) factor. We point out that the situation with the discretization theorems in the \(L_\infty \) case is fundamentally different. A very nontrivial surprising negative result was proved for the \(L_\infty \) case (see [6,7,8]). The authors proved that the necessary condition for \(\mathcal {T}(Q_n)\in \mathcal {M}(m,\infty )\) is \(m\gg |Q_n|^{1+c}\) with absolute constant \(c>0\).
Theorem 2.1 shows that an important ingredient of our technique of proving the Marcinkiewicz discretization theorems in \(L_1\) consists in the study of the entropy numbers \(\varepsilon _k(X^1_N,L_\infty )\). We note that this problem is a nontrivial problem by itself. We demonstrate this on the example of the trigonometric polynomials. It is proved in [20] that in the case \(d=2\), we have
The proof of estimate (7.1) is based on an analog of the small ball inequality for the trigonometric system proved for the wavelet type system (see [20]). This proof uses the two-dimensional specific features of the problem, and we do not know how to extend this proof to the case \(d>2\). Estimate (7.1) is used in the proof of the upper bounds in Theorem 3.7. Theorem 3.7 gives the right order of the entropy numbers for the classes of mixed smoothness. This means that (7.1) cannot be substantially improved. The trivial inequality \(\log (4| Q_n|/k) \ll n\) shows that (7.1) implies the following estimate:
Estimate (7.2) is not as good as (7.1) in application for proving the upper bounds of the entropy numbers of smoothness classes. For instance, instead of the bound in Theorem 3.7, use of (7.2) will give
However, it turns out that in application to the Marcinkiewicz-type discretization theorems, estimates (7.1) and (7.2) give the same bounds on the number of knots \(m\ll |Q_n|n^{7/2}\) (see Theorems 1.2 and 4.1).
As we pointed out above, we do not have an extension of (7.1) to the case \(d>2\). A somewhat straightforward technique presented in [21] gives the following result for all d:
This result is used in [21] to prove Theorem 1.3. An interesting contribution of this paper is the proof of (7.2) for all d and for rather general sets \(\mathcal {T}(Q)_1\) instead of \(\mathcal {T}(Q_n)_1\). An important new ingredient here is the use of dictionary \({\mathcal {D}}^2(Q)\), consisting of shifts of normalized Dirichlet kernels associated with Q, in m-term approximations. Certainly, it would be nice to understand, even in the special case of the hyperbolic cross polynomials \(\mathcal {T}(Q_n)\), whether the embedding \(\mathcal {T}(Q_n) \in \mathcal {M}(m,1)\) with \(m\asymp |Q_n|\) holds. Results of this paper only show that the above embedding holds with \(m \gg |Q_n|n^{7/2}\). We got the extra factor \(n^{7/2}\) as a result of using (7.2), which contributed \(n^{3/2}\), and of using the chaining technique, which contributed \(n^2\).
References
Batson, J., Spielman, D.A., Srivastava, N.: Twice-Ramanujan sparsifiers. SIAM J. Comput. 41, 1704–1721 (2012)
Bourgain, J., Lindenstrauss, J., Milman, V.: Approximation of zonoids by zonotopes. Acta Math. 162, 73–141 (1989)
Dũng, D., Temlyakov, V.N., Ullrich, T.: Hyperbolic cross approximation (2016). arXiv:1601.03978v2 [math.NA]
Donahue, M., Gurvits, L., Darken, C., Sontag, E.: Rate of convex approximation in non-Hilbert spaces. Constr. Approx. 13, 187–220 (1997)
Kashin, B.S.: Lunin’s method for selecting large submatrices with small norm. Matem. Sb. 206, 95–102 (2015)
Kashin, B.S., Temlyakov, V.N.: On a norm and related applications. Mat. Zametki 64, 637–640 (1998)
Kashin, B.S., Temlyakov, V.N.: On a norm and approximation characteristics of classes of functions of several variables. Metric theory of functions and related problems in analysis, Izd. Nauchno-Issled. Aktuarno-Finans. Tsentra (AFTs), Moscow, pp. 69–99 (1999)
Kashin, B.S., Temlyakov, V.N.: The volume estimates and their applications. East J. Approx. 9, 469–485 (2003)
Marcus, A., Spielman, D.A., Srivastava, N.: Interlacing families II: mixed characteristic polynomials and the Kadison–Singer problem. Ann. Math. 182, 327–350 (2015)
Nitzan, S., Olevskii, A., Ulanovskii, A.: Exponential frames on unbounded sets. Proc. Am. Math. Soc. 144, 109–118 (2016)
Rudelson, M.: Almost orthogonal submatrices of an orthogonal matrix. Izr. J. Math. 111, 143–155 (1999)
Temlyakov, V.N.: Weak greedy algorithms. Adv. Comput. Math. 12, 213–227 (2000)
Temlyakov, V.N.: Greedy algorithms in Banach spaces. Adv. Comput. Math. 14, 277–292 (2001)
Temlyakov, V.N.: Greedy-type approximation in Banach spaces and applications. Constr. Approx. 21, 257–292 (2005)
Temlyakov, V.N.: Greedy Approximation. Cambridge University Press, Cambridge (2011)
Temlyakov, V.N.: An inequality for the entropy numbers and its application. J. Approx. Theory 173, 110–121 (2013)
Temlyakov, V.N.: Incremental greedy algorithm and its applications in numerical integration. In: Springer Proceedings in Mathematics and Statistics, Monte Carlo and Quasi-Monte Carlo Methods, MCQMC, Leuven, Belgium, Apr 2014, pp. 557–570 (2014)
Temlyakov, V.N.: Constructive sparse trigonometric approximation and other problems for functions with mixed smoothness. Matem. Sb. 206, 131–160 (2015). arXiv:1412.8647v1 [math.NA]
Temlyakov, V.N.: Constructive sparse trigonometric approximation for functions with small mixed smoothness. Constr. Approx. 45, 467–495 (2017). arXiv:1503.0282v1 [math.NA]
Temlyakov, V.N.: On the entropy numbers of the mixed smoothness function classes. J. Approx. Theory 207, 26–56 (2017). arXiv:1602.08712v1 [math.NA]
Temlyakov, V.N.: The Marcinkewiecz-type discretization theorems for the hyperbolic cross polynomials. Jaen J. Approx. 9(1), 37–63 (2017). arXiv:1702.01617v2 [math.NA]
Tropp, J.A.: User-friendly tail bounds for sums of random matrices. Found. Comput. Math. 12, 389–434 (2012)
Zygmund, A.: Trigonometric Series. Cambridge University Press, Cambridge (1959)
Acknowledgements
The work was supported by the Russian Federation Government Grant No. 14.W03.31.0031.
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Ronald A. DeVore.
Rights and permissions
About this article
Cite this article
Temlyakov, V.N. The Marcinkiewicz-Type Discretization Theorems. Constr Approx 48, 337–369 (2018). https://doi.org/10.1007/s00365-018-9446-2
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00365-018-9446-2