Abstract
We introduce the notion of “stochastic quasi-interpolation on compact Hausdorff spaces”, and establish Gaussian-type \(L^p\)-concentration inequalities (\(1 \le p \le \infty \)) for stochastic Bernstein polynomials in terms of the modulus of continuity of a target function \(f \in C[0,1]\). For p in the range \(1 \le p < \infty ,\) these inequalities hold true unconditionally, in the sense that no additional assumption on a given target function is required. For the case \(p=\infty \), our proof calls for a crucial application of the Dvoretzky–Kiefer–Wolfowitz inequality (Dvoretzky et al. in Ann Math Stat 27(3):642–669, 1956; Massart in Ann Probab 18(3):1269–1283, 1990), and requires a moderate decay condition on the modulus of continuity. Our result for the case \(p=\infty \) confirms a similar conjecture raised in Sun and Wu (Proc Am Math Soc 147(2):671–679, 2019). As a corollary, we show that for all \(1 \le p \le \infty \) the expected \(L^p\)-approximation order of stochastic Bernstein polynomials is comparable to that given by the classical Bernstein polynomials.
1 Introduction
Let \(\Omega \) be a compact Hausdorff space, and let \(\mathcal R_p(\Omega )\) denote the totality of Radon probability measures on \(\Omega \). For an \(f \in C(\Omega )\) and \(n \in \mathbb N\), we prescribe a set of n functions \(\psi _{n,1},\ldots , \psi _{n,n} \in C(\Omega )\), and define stochastic quasi-interpolants \(Q^X_n(f)\) of f by
$$Q^X_n(f) := \sum ^{n}_{j=1} f(X_{n,j})\, \psi _{n,j}.$$
Here \(\{f(X_{n,j})\psi _{n,j}\}_{j=1}^n \) are \( C(\Omega )\)-valued random variables obeying respectively laws \(\{\nu _{n,j}\}_{j=1}^n \subset \mathcal R_p(\Omega )\), the design of which caters to problems on hand and may require some ingenuity. Ideally, one selects functions \(\psi _{n,1},\ldots , \psi _{n,n} \in C(\Omega )\) following the “partition of unity” principle. That is,
$$\sum ^n_{j=1} \psi _{n,j}(x) = 1 \quad \text{for all } x \in \Omega .$$
In many theoretical and practical problems, however, an exact partition of unity may be either hard to obtain or cumbersome to implement. One then uses a set of functions that forms an approximate partition of unity in the sense that the sequence of functions \(\sum ^n_{j=1} \psi _{n,j} (x)\) approximates 1 within a desirable error bound under an appropriately-selected topology on \(C(\Omega )\). We will call this general type of stochastic approximation scheme “stochastic quasi-interpolation”.
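As a concrete instance of the partition-of-unity principle on \(\Omega = [0,1]\), the Bernstein basis functions \(p_{n,k}(x) = \binom{n}{k}x^k(1-x)^{n-k}\) sum to 1 by the binomial theorem. The following sketch (ours, not from the paper; it assumes NumPy, and the function name is our own) verifies this numerically:

```python
import numpy as np
from math import comb

def bernstein_partition_residual(n, grid=101):
    """Largest deviation from 1 of sum_k p_{n,k}(x) on a grid over [0,1],
    where p_{n,k}(x) = C(n,k) x^k (1-x)^(n-k) is the Bernstein basis."""
    x = np.linspace(0.0, 1.0, grid)
    total = sum(comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1))
    return float(np.max(np.abs(total - 1.0)))
```

The residual is at the level of floating-point rounding for moderate n, confirming an exact partition of unity.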
Let \(\mu \) be the uniform probability measure on \(\Omega \) as defined by Niederreiter [27]. For \(1 \le p \le \infty \), let \(L^p(\Omega )\) denote the Banach spaces consisting of all real-valued Borel measurable functions f on \(\Omega \) for which \(\Vert f\Vert _p < \infty \). Here \(\Vert f\Vert _p:=\Vert f\Vert _{L^p(\Omega )}\) is defined by:
$$\Vert f\Vert _p := \left( \int _\Omega |f|^p \, d\mu \right) ^{1/p} \ (1 \le p< \infty ), \qquad \Vert f\Vert _\infty := \mathop {\mathrm{ess\,sup}}_{x \in \Omega } |f(x)|.$$
Let \(\mathbb P(A)\) and \(\mathbb E(Z)\) denote, respectively, the probability of the event A and the expectation of the random variable Z. A coveted type of estimate in the study of stochastic quasi-interpolation is Gaussian-style \(L^p\)-concentration inequalities for some target functions f of interest: For any given \(\epsilon >0\), there holds
Here \(\alpha \ge 1,\) and \(c_1, c_2\) are absolute constants, which can be made explicit with effort. As a corollary of the above inequality, we derive
where \(c_3\) is an absolute constant depending on \(c_1\) and \(c_2\). This amounts to saying that the “average approximation order” of the stochastic quasi-interpolation scheme is: \(n^{-\alpha /2}\). Of course, one needs to carry out “Bochner integrals” on the underlying Banach space valued random variables en route. Under certain circumstances, it may be impossible to obtain the desired \(L^\infty \)-concentration inequalities. One then settles on the next best thing: For some p in \([1,\infty )\), and any given \(\epsilon >0\), there holds
Due to the compactness of \(\Omega \), if the above inequality holds true for a \(p_1 \in [1,\infty ),\) then it does for all \(p_2\) in the range \(1 \le p_2 \le p_1.\) Many theoretical and practical problems may be modeled under this stochastic quasi-interpolation framework. Literature abounds in implicit applications of stochastic quasi-interpolation techniques. Notably, Wagner [37], and Bourgain and Lindenstrauss [7] applied a stochastic quasi-interpolation method to show the existence of certain node sets giving rise to a near-optimal number of segments of equal length in Minkowski sums needed to approximate a Euclidean ball under the Hausdorff metric within a prescribed error bound; see also [5, 6]. Gao et al. [21] studied the case in which \( X_{n,1}, \ldots , X_{n,n}\) are n independent copies of the random variable uniformly distributed in \(\Omega .\)
To mitigate uncertainties brought on by unreliable sampling sites, Wu et al. [38] proposed and studied a class of stochastic Bernstein polynomials
$$B^X_n f(x) := \sum ^{n}_{k=0} f(X_{n,k})\, p_{n,k}(x), \quad x \in [0,1], \quad (1.2)$$
in which \(X_{n,0}, X_{n,1}, \cdots , X_{n,n}\) are the order statistics (see [9, 14]) of \((n+1)\) independent copies of the random variable uniformly distributed in (0, 1), and
$$p_{n,k}(x) := \binom{n}{k} x^k (1-x)^{n-k}, \quad k=0,1,\ldots ,n.$$
These are stochastic cousins of the classical Bernstein polynomial \(B_n f\) defined by
$$B_n f(x) := \sum ^{n}_{k=0} f\left( \frac{k}{n}\right) p_{n,k}(x).$$
Stochastic Bernstein polynomials have a simple structure and therefore are nimble in a wide range of applications. Furthermore, the inherent randomness makes them suitable for Monte Carlo simulations [34]. Authors of [1, 20, 33, 39] have established algebraic and exponential convergence rates for probabilistic convergence of these stochastic Bernstein polynomials.
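To fix ideas, here is a minimal numerical sketch (ours, not from the paper; it assumes NumPy, and all function names are our own) of the classical polynomial \(B_n f\) and its stochastic cousin \(B^X_n f\), in which the equally spaced knots k/n are replaced by the order statistics of \(n+1\) uniform samples:

```python
import numpy as np
from math import comb

def bernstein_basis(n, x):
    """p_{n,k}(x) = C(n,k) x^k (1-x)^(n-k) for k = 0..n, at a scalar x in [0,1]."""
    return np.array([comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1)])

def classical_bernstein(f, n, x):
    """B_n f(x) = sum_k f(k/n) p_{n,k}(x)."""
    return float(np.dot(f(np.arange(n + 1) / n), bernstein_basis(n, x)))

def stochastic_bernstein(f, n, x, rng):
    """B^X_n f(x): the knots k/n are replaced by the order statistics of
    n+1 i.i.d. Uniform(0,1) samples."""
    knots = np.sort(rng.uniform(size=n + 1))  # X_{n,0} <= ... <= X_{n,n}
    return float(np.dot(f(knots), bernstein_basis(n, x)))
```

For instance, \(B_n(t^2)(x) = x^2 + x(1-x)/n\) exactly, which provides a simple correctness check for the classical version, while repeated draws of the stochastic version scatter around the same value.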
Authors of [34] introduced the notion “\(L^p\)-probabilistic convergence” (\(1 \le p \le \infty \)) of stochastic Bernstein polynomials (Definition 2.1) and established various \(L^p\)-probabilistic convergence rates, a highlight of which is as follows. For any given \(\varepsilon >0\), p in the range \(1 \le p \le 2\), and any \( f \in C[0,1]\), the following inequality:
holds true under the assumption that
in which \(\omega \left( f, \cdot \right) \) denotes the modulus of continuity of f defined by
$$\omega (f, h) := \sup \left\{ |f(x)-f(y)| : x, y \in [0,1],\ |x-y| \le h \right\} , \quad 0 \le h \le 1.$$
These are Gaussian-type concentration inequalities expressed in terms of the modulus of continuity, which in the current article will be called \(L^p\)-concentration inequalities (or simply concentration inequalities) for stochastic Bernstein polynomials. However, both the restriction on \(p\ (1 \le p \le 2)\) and the assumption (1.5) are impractical. Concerning the latter, we have received constructive feedback from practitioners in the field, a gist of which goes as follows. In modeling many real-world problems, one often finds a priori information on the smoothness of an unknown target function elusive. Thus, assumption (1.5) is sometimes impossible to verify beforehand, which has hindered applications and undermined the effectiveness of estimate (1.4).
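For readers who wish to experiment, the modulus of continuity \(\omega (f,h) = \sup \{|f(x)-f(y)| : |x-y| \le h\}\) appearing in these estimates can be approximated on a grid. The sketch below is ours (it assumes NumPy, and the function name is our own) and is not part of the paper's argument:

```python
import numpy as np

def modulus_of_continuity(f, h, grid_size=2001):
    """Grid approximation of omega(f, h) = sup{|f(x)-f(y)| : x, y in [0,1], |x-y| <= h}."""
    x = np.linspace(0.0, 1.0, grid_size)
    fx = f(x)
    step = 1.0 / (grid_size - 1)
    max_shift = int(round(h / step))  # number of grid steps within distance h
    best = 0.0
    for s in range(1, max_shift + 1):
        best = max(best, float(np.max(np.abs(fx[s:] - fx[:-s]))))
    return best
```

As quick checks: a 1-Lipschitz function such as \(f(x)=x\) has \(\omega (h)=h\), and \(\omega (\sqrt{\cdot }, h)=\sqrt{h}\), attained at \(x=0\), \(y=h\).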
The proof given in [34] depends on assumption (1.5) because it requires harnessing the approximation power of the classical Bernstein polynomial \(B_n f\). In so doing, the authors of [34] inadvertently followed a familiar methodology in the learning theory paradigm advocated by Cucker and Smale [12], and Cucker and Zhou [13], which decomposes an overall error into two parts: the approximation error and the sampling error. Another caveat of the approach is that it implicitly assumes that under no circumstance will any other type of Bernstein polynomial (spawned from its stochastic cousins) outperform the classical one. But this is not true. Guided by theoretical analysis, we carried out many rounds of numerical simulations, which show that Bernstein polynomials built upon some carefully-designed triangular arrays of knots converge to a Heaviside-type function exponentially fast in the space \(L^1 [0,1]\). We will study this interesting problem in the near future.
Approximation theorists are interested in knowing the average approximation order of the stochastic Bernstein polynomials (1.2). For this goal, assumption (1.5) is nearly harmless. The main result of Sect. 4 (Theorem 4.2) asserts that inequality (1.4) holds true for all \(1 \le p \le \infty \) under a similar assumption to (1.5). This implies the following inequality (see Corollary 4.3):
where C is an absolute constant, which will be specified in Corollary 4.3, and \(\omega (h)\) is an abbreviation for \(\omega (f, h)\). (We will use the abbreviation throughout the article.) The above inequality implies that the expected (average) \(L^p\)-approximation order \((1 \le p \le \infty )\) of stochastic Bernstein polynomials is the same as that achieved by the classical Bernstein polynomial \(B_n f\) (see Footnote 1). This implication does not depend on assumption (1.5).
In summary, there are good arguments to be made on both sides (with or without assumption (1.5)) insofar as \(L^p\)-concentration inequalities are developed. Henceforth, we will use the phrase “unconditional \(L^p\)-concentration inequalities” to refer to estimates obtained without using assumption (1.5). To facilitate broader applications of stochastic Bernstein polynomials, we will push forward as much as we can to prove unconditional \(L^p\)-concentration inequalities, using assumption (1.5) only as a last resort. Dispensing with assumption (1.5) increases the level of difficulty. For p in the range \(1 \le p \le 2\), Gaussian bounds for mean square beta distributions [3, p. 125] play a key role in our proof. But the same method does not carry over to the cases \(2< p <\infty \), for which our proof is much more involved, calling for a host of techniques including a crucial application of the Dvoretzky–Kiefer–Wolfowitz inequality [17, 26]. Finally, we prove an \(L^\infty \)-concentration inequality using a slightly weaker version of assumption (1.5).
Dvoretzky et al. [17] proved the following remarkable result. Let \(X_1, X_2,\ldots , X_n\) be real-valued independent and identically distributed random variables with c.d.f (cumulative distribution function) F. Let \(F_n\) denote the empirical distribution function associated with F defined by
$$F_n(x) := \frac{1}{n}\sum ^{n}_{j=1} \mathbb 1_{\{X_j \le x\}}, \quad x \in \mathbb R.$$
Then for any given \(\varepsilon > 0\), there holds:
$$\mathbb P\left\{ \sup _{x \in \mathbb R} |F_n(x) - F(x)| > \varepsilon \right\} \le 2 e^{-2n\varepsilon ^2}. \quad (1.6)$$
We remark that Dvoretzky, Kiefer, and Wolfowitz only established the above inequality with an unspecified multiplicative constant C. The inequality with the sharp constant \(C=2\) was proved by Massart [26], which confirms a conjecture due to Birnbaum and McCarty [2], and has since been known as Dvoretzky–Kiefer–Wolfowitz inequality.
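A quick Monte Carlo illustration of the inequality (ours, not from the paper; it assumes NumPy, and the function name is our own): for Uniform(0,1) samples, the empirical violation rate of the event \(\sup _x |F_n(x)-F(x)| > \varepsilon \) stays below the bound \(2e^{-2n\varepsilon ^2}\):

```python
import numpy as np
from math import exp

def dkw_violation_rate(n, eps, trials, seed=0):
    """Monte Carlo estimate of P(sup_x |F_n(x) - F(x)| > eps) for n i.i.d.
    Uniform(0,1) samples (so F(x) = x), to compare with 2*exp(-2*n*eps**2)."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n + 1)
    count = 0
    for _ in range(trials):
        x = np.sort(rng.uniform(size=n))
        # sup deviation of the empirical step function from the line y = x
        dev = max(np.max(k / n - x), np.max(x - (k - 1) / n))
        if dev > eps:
            count += 1
    return count / trials
```

Note that the Massart constant \(C=2\) is asymptotically sharp, so the empirical rate is typically close to (but not above) the bound, up to Monte Carlo noise.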
The current investigation has drawn inspiration from previous research done in the theory of random approximation of functions; see [10, 18, 19, 23, 24, 29, 30, 36], of which we mention particularly the idea of employing Bernstein polynomials for density and distribution estimation on the interval [0, 1]:
where \(m, n \in \mathbb N\), and \(\widehat{F}_m\) is the empirical distribution of order m. There are close relationships as well as fundamental differences between the two classes of stochastic Bernstein polynomials. Addressing these interesting topics, however, will take us far afield.
The arrangement of the current article is as follows. In Sect. 2, we prove unconditional \(L^p\)-concentration inequalities for stochastic Bernstein polynomials (1.2) for p in the range \(1 \le p \le 2\). Among other results, we obtain the unconditional version of inequality (1.4) using the theory of the mean square beta distribution. In Sect. 3, we prove unconditional \(L^p\)-concentration inequalities for p in the range \(2< p < \infty \). In Sect. 4, we prove an \(L^\infty \)-concentration inequality with an assumption similar to (1.5). As a corollary, we show that for all p in the range \(1 \le p \le \infty \), the average \(L^p\) approximation order of stochastic Bernstein polynomials is the same as that achieved by the classical Bernstein polynomial \(B_n f\) under a mild restriction.
2 Mean Square Beta Distribution and Unconditional \(L^p\)-Concentration Inequalities
Let X be the random variable uniformly distributed in (0, 1). Let \(X_{n,0}, X_{n,1},\ldots , X_{n,n}\) be the order statistics of \((n+1)\) independent copies of X. Then \(X_{n,k}\) is distributed according to the beta distribution \(\mathrm{Beta}(k+1, n-k+1)\) with parameters \((k+1)\) and \((n-k+1)\) that has density function
$$t \mapsto \frac{(n+1)!}{k!\,(n-k)!}\, t^{k}(1-t)^{n-k}, \quad 0< t <1.$$
Beta distributions belong to the wider class of sub-Gaussian distributions that enjoy Gaussian-type probability concentration inequalities; see [4, 8, 28]. For a fixed \(0 \le x \le 1\), we have
Thus, the random variable \(B^X_n f\) can be studied effectively as a convex linear combination of the \((n+1)\) random variables \(f(X_{n,k})\).
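The Beta law of the order statistics is easy to check by simulation; for instance, \(X_{n,k} \sim \mathrm{Beta}(k+1, n-k+1)\) implies \(\mathbb E(X_{n,k}) = (k+1)/(n+2)\). The sketch below is ours (it assumes NumPy, and the function name is our own):

```python
import numpy as np

def order_statistic_mean(n, k, trials, seed=1):
    """Monte Carlo mean of X_{n,k}, the k-th order statistic (0-indexed) of
    n+1 i.i.d. Uniform(0,1) samples; Beta(k+1, n-k+1) predicts mean (k+1)/(n+2)."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(size=(trials, n + 1))
    return float(np.mean(np.sort(samples, axis=1)[:, k]))
```

For example, \(X_{9,4} \sim \mathrm{Beta}(5,6)\) has mean 5/11, and \(X_{9,0} \sim \mathrm{Beta}(1,10)\) has mean 1/11.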
For \(1 \le p \le \infty \) and \(f \in C[0,1]\), we define
Definition 2.1
Let \(1 \le p \le \infty \), and \(f\in C[0, 1]\) be given. If \(\forall \varepsilon >0\),
$$\lim _{n \rightarrow \infty } \mathbb P\left\{ \Vert B^X_n f - f\Vert _p > \varepsilon \right\} = 0,$$
then we say that \(B^X_nf\) converges to f in probability under the \(L^p\)-norm. We also refer to such probabilistic convergence as “\(L^p\)-probabilistic convergence” of \(B^X_nf\) to f.
The bulk of our effort in the current article is devoted to finding exponential decay rates (see Footnote 2) (in terms of the modulus of continuity of an \(f \in C[0,1]\)) for \(L^p\)-probabilistic convergence. We use throughout the article two basic properties of the modulus of continuity. The first one is called “positivity”, which asserts that if \(f \in C[0,1]\) is not a constant, then \(\omega (h)>0\) for \(0<h\le 1\). The second one is referred to as “subadditivity”, as shown in the following inequality:
$$\omega (\lambda h) \le (1+\lambda )\, \omega (h), \quad \lambda> 0,\ h > 0,\ \lambda h \le 1.$$
Neither of the two properties is hard to prove. Authors of [34] showed the following result.
Theorem 2.2
If \(X\sim \mathrm{Beta}(\alpha ,\beta )\), then
Let \({\mathcal {B}}_n\) denote the function defined by
Let (X, Y) be the random vector with the joint density function \({\mathcal {B}}_n\). The resulting distribution on \([0,1]^2\) is often called the “mean square beta distribution”; see [3]. With the newly updated Gaussian tail bound shown in Theorem 2.2, we reiterate a result of Bobkov and Ledoux [3, Proposition B.12].
Proposition 2.3
If (X, Y) is the random vector with the joint density function \({\mathcal {B}}_n\), then for all \(r \ge 0,\)
We will utilize the random vector (X, Y) as a “majorant” for our stochastic Bernstein polynomials. The best effect of this approach can be seen via its implementation on the uniform distribution in (0, 1). We denote by U the c.d.f. of the uniform distribution in (0, 1).
Lemma 2.4
Let (X, Y) be the random vector that has joint density function \({\mathcal {B}}_n(x,y).\) Then for any \(1 \le p < \infty \), we have
Proof
We first use the triangle inequality to write
By Jensen’s inequality, we have
This completes the proof. \(\square \)
The result of the above lemma amounts to saying that for any \(1\le p < \infty \), the p-moment of the random variable \(\Vert B^X_n U -U\Vert _p\) is majorized by that of the random variable \((X-Y)\). The following result shows that the majorization extends to exponential moments.
Lemma 2.5
For any \(1 \le p < \infty \) and \(r \ge 0\), the following inequality holds true:
Proof
Similar to the proof of Lemma 2.4, we first write the following inequality:
Applying Jensen’s inequality twice (first the integral form then the discrete form), we have
It follows that
which is the desired result. \(\square \)
The majorization of \(\Vert B^X_n U -U\Vert _p\) by \(|X-Y|\) as shown in Lemmas 2.4 and 2.5 can be passed on to the random variable \(\Vert B^X_n f -f\Vert _p\) for any \(f \in C[0,1]\) via manipulation of its modulus of continuity. Our main goal in the current section is to find upper bounds for the quantities
where \(\varepsilon >0\) and \(n \in \mathbb N\) are fixed. If f is a constant, then \(B^X_nf \equiv f\). In such a case, the upper bounds hold true automatically. We hereby declare once and for all that in the sequel our target functions are not constant, which justifies writing \(\omega (\frac{1}{\sqrt{n}})\) in denominators. For aesthetic reasons, we will use the equivalent form \(\omega (n^{-1/2})\) for \(\omega (\frac{1}{\sqrt{n}})\) in mathematical contexts in which the latter expression would result in unusually large displayed equations.
Proposition 2.6
For any \(1 \le p < \infty \), and any given \(f \in C[0,1]\), the following inequalities hold true:
Proof
The proofs of (2.3) and (2.4) use basically the same techniques as those in the proofs of Lemmas 2.4 and 2.5. We will prove (2.4) as it is more involved than the other. For a fixed \(x \in [0,1]\), by the subadditivity of modulus of continuity of f and Jensen’s inequality, we write
It follows that
which implies that
This completes the proof. \(\square \)
In the rest of this section, we will first develop upper bounds for \(\mathbb E\left( |X-Y|^p \right) \; (1 \le p < \infty )\) (Lemma 2.7 below), and \(\mathbb E\left[ \exp \left( \frac{1}{2} n^{p/2} |X-Y|^p \right) \right] \ (1 \le p \le 2)\) (Lemma 2.9 below). We will then use Chebyshev inequality to obtain unconditional \(L^p\)-concentration inequalities for stochastic Bernstein polynomials (Theorems 2.8, 2.11 below). En route we will be using variants of the following identity
$$\mathbb E(V) = \int ^{\infty }_0 \mathbb P\{V > t\}\, dt, \quad (2.5)$$
where V is a nonnegative random variable.
Lemma 2.7
Let \(1 \le p < \infty \). Then
Proof
We make use of identity (2.5) and Theorem 2.2 to write
which is the desired result. \(\square \)
Theorem 2.8
For any \(\varepsilon >0,\) \(1 \le p < \infty ,\) and \(f \in C[0,1],\) the following inequalities hold true:
-
1.
\({\displaystyle \mathbb P\{\Vert B^X_n f -f \Vert _p > \varepsilon \} \le \frac{ 2^{p-1}\left[ 1+ p\ \Gamma \left( \frac{p}{2}\right) \right] \omega ^p \left( n^{-1/2}\right) }{\varepsilon ^p}.}\)
-
2.
\({\displaystyle \mathbb E\left( \Vert B^X_n f -f \Vert _p \right) \le 2^{1-\frac{1}{p}} \left[ 1+ p\ \Gamma \left( \frac{p}{2}\right) \right] ^{\frac{1}{p}} \omega \left( n^{-1/2}\right) .}\)
Part 1 of Theorem 2.8 gives an \(L^p\)-probabilistic convergence rate (\(1 \le p < \infty )\) for stochastic Bernstein polynomials; Part 2 shows that the average \(L^p\)-approximation order \((1 \le p < \infty )\) of stochastic Bernstein polynomials is the same as that given by the classical Bernstein polynomials. The result in Part 2 fails to work for the case \(p=\infty \). The reason is that the constant, by Stirling's formula, is of the order \(\sqrt{p}\) as \(p \rightarrow \infty .\) In Sect. 4, we will obtain an average \(L^p\)-approximation order with an absolute constant for stochastic Bernstein polynomials (Corollary 4.3).
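Part 2 with \(p=1\) reads \(\mathbb E\left( \Vert B^X_n f - f\Vert _1\right) \le \left( 1+\Gamma (1/2)\right) \omega (n^{-1/2})\). Below is a Monte Carlo sketch of this bound (ours, not from the paper; it assumes NumPy, the function name is our own, and the \(L^1\) norm is approximated on a grid) for \(f(t)=|t-1/2|\), whose modulus of continuity is \(\omega (h)=h\):

```python
import numpy as np
from math import comb, gamma, sqrt

def stochastic_bernstein_mean_l1_error(f, n, trials, seed=2, grid=201):
    """Monte Carlo estimate of E||B^X_n f - f||_1 over [0,1] (grid Riemann sum)."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, grid)
    # basis matrix P[k, i] = p_{n,k}(x_i)
    P = np.array([[comb(n, k) * t**k * (1 - t)**(n - k) for t in x]
                  for k in range(n + 1)])
    fx = f(x)
    errs = [np.mean(np.abs(f(np.sort(rng.uniform(size=n + 1))) @ P - fx))
            for _ in range(trials)]
    return float(np.mean(errs))
```

With \(n = 100\), the observed average error sits comfortably below the theoretical bound \((1+\sqrt{\pi })/10 \approx 0.277\).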
Proof
To prove Part 1, we use Chebyshev inequality, inequality (2.3), and Lemma 2.7 to derive
To prove Part 2, we use Jensen’s inequality, inequality (2.3), and Lemma 2.7 to derive
This completes the proof. \(\square \)
Lemma 2.9
Suppose that Z is a random variable satisfying the following inequality:
Then we have
Proof
We use identity (2.5) to derive
This completes the proof. \(\square \)
Proposition 2.3 and Lemma 2.9 yield the following result.
Corollary 2.10
The following inequality holds true:
We now state and prove the main result of the section.
Theorem 2.11
For any given \(\varepsilon >0\) and p in the range \(1 \le p \le 2\), and \(f \in C[0,1]\), the following inequality holds true.
Proof
For any given \(\varepsilon >0\) and p in the range \(1 \le p \le 2\), and \(f \in C[0,1]\), we have
Thus, it suffices to prove the theorem for the case \(p=2.\) By (2.4) and Corollary 2.10, we have
Applying Chebyshev inequality to the right hand side of (2.6), we obtain
which is the desired result. \(\square \)
Unlike (1.4), we do not need assumption (1.5) in the proof of Theorem 2.11. The price we paid for this is the multiplicative constant \(\sqrt{e}\).
3 Gaussian-Type \(L^p\)-Concentration Inequalities \( (1 \le p < \infty )\)
In an essential way, the proof of Theorem 2.11 depends on the boundedness of the sequence
This is no longer true for \(p > 2.\) To prove unconditional \(L^p\)-concentration inequalities for \(2<p < \infty \), we start with the random process \(V_n(x)\) defined by
Lemma 3.1
The following inequality holds true almost surely:
Proof
By subadditivity of the modulus of continuity, we have
We then apply Minkowski's and Jensen's inequalities to get the desired result. \(\square \)
We use a triangle inequality to write
Let \( Y_n(x)\) denote the random process as is expressed by the first sum on the right hand side of (3.1), and \(W_n\) the random variable \(\max \nolimits _{0 \le k \le n} |X_{n,k} - \frac{k}{n}|.\) For a fixed \(x \in [0,1]\), let T(x) be the random variable taking values 0 and 1, with respective probabilities \(1-x\) and x. Let \(\{T_k(x)\}_{k=1}^{n}\) be n independent copies of T(x). Let \(Z_n=Z_n(x)\) be the random variable defined by
Then \(Z_n(x) \) takes values \(\frac{k}{n}\) with respective probabilities \(p_{n,k}(x)\), \(k=0,1,\ldots ,n\). These lead to the following result.
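In other words, \(nZ_n(x)\) is binomially distributed with parameters n and x. A simulation check (ours, not from the paper; it assumes NumPy, and the function name is our own) comparing the empirical law of \(Z_n\) with the probabilities \(p_{n,k}(x)\):

```python
import numpy as np
from math import comb

def check_zn_distribution(n, x, trials, seed=3):
    """Z_n = (1/n) * (sum of n Bernoulli(x) variables) takes the value k/n with
    probability p_{n,k}(x) = C(n,k) x^k (1-x)^(n-k); return the largest gap
    between empirical and exact probabilities."""
    rng = np.random.default_rng(seed)
    counts = np.bincount(rng.binomial(n, x, size=trials), minlength=n + 1)
    empirical = counts / trials
    exact = np.array([comb(n, k) * x**k * (1 - x)**(n - k) for k in range(n + 1)])
    return float(np.max(np.abs(empirical - exact)))
```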
Lemma 3.2
The following inequality holds true almost surely:
Proof
For all \(x \in [0,1]\), we have
which implies that \(\Vert Y_n\Vert _\infty \le W_n\) almost surely. By Minkowski’s and Jensen’s inequalities, we have
which is the desired result. \(\square \)
We will deal with the two terms on the right hand side of the above inequality separately, and start with the first. The key strategy is to tie the random variable \(W_n\) to the empirical process of the uniform distribution in (0, 1).
Lemma 3.3
Let \(U_{n+1}\) be the empirical distribution function of order \((n+1)\) associated with the uniform distribution in (0, 1) defined by
$$U_{n+1}(x) := \frac{1}{n+1}\sum ^{n}_{j=0} \mathbb 1_{\{X_{n,j} \le x\}}, \quad x \in [0,1].$$
Then the following inequality holds true almost surely,
Proof
Let \(X^*_{n,0}< X^*_{n,1}< \cdots < X^*_{n,n}\) be a set of observed values of the random variables \(X_{n,0}, X_{n,1}, \ldots , X_{n,n}\). Let \(U^*_{n+1}\) be the corresponding (observed) empirical distribution function for the uniform distribution U on [0, 1]. Let \(I_0:=[0,X^*_{n,0})\), \(I_k:=[X^*_{n,k-1}, X^*_{n,k}), \; 1 \le k <n,\) and \(I_n:=[X^*_{n,n}, 1]\). Then we have
The equations above show that
Since \({X^*_{n,k} -\frac{k}{n} \le X^*_{n,k} -\frac{k}{n+1},}\) we have \( \max _{0 \le k \le n} \left( X^*_{n,k} -\frac{k}{n}\right) \le \sup _{x \in [0, 1]}\left( x - U^*_{n+1}(x)\right) \). This inequality holds true for every set of such observed values of the order statistics. Hence we conclude that
Similarly we show that
Combining (3.5) and (3.6), we get the desired result. \(\square \)
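Under our reading of Lemma 3.3, \(W_n = \max _{0\le k\le n}|X_{n,k}-\frac{k}{n}|\) is bounded almost surely by the sup deviation of the empirical c.d.f. of the same \(n+1\) uniform samples. The following sketch (ours, not from the paper; it assumes NumPy, and the function name is our own) checks this pathwise on simulated data:

```python
import numpy as np

def wn_vs_empirical_sup(n, trials, seed=4):
    """Largest observed value of W_n minus the sup deviation of the empirical
    c.d.f. of the same n+1 Uniform(0,1) samples; should never be positive."""
    rng = np.random.default_rng(seed)
    k = np.arange(n + 1)
    i = np.arange(1, n + 2)
    worst_gap = -np.inf
    for _ in range(trials):
        xs = np.sort(rng.uniform(size=n + 1))  # order statistics X_{n,0..n}
        wn = np.max(np.abs(xs - k / n))
        # sup_x |U_{n+1}(x) - x|, evaluated at the jump points of the step function
        sup_dev = max(np.max(i / (n + 1) - xs), np.max(xs - (i - 1) / (n + 1)))
        worst_gap = max(worst_gap, wn - sup_dev)
    return float(worst_gap)
```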
Lemma 3.4
Let \(\varepsilon > 0\) and \(n \in \mathbb N\) be given. Then we have the following inequality:
Proof
By Lemma 3.3 and Dvoretzky–Kiefer–Wolfowitz inequality (see (1.6)), we have
which is the desired result. \(\square \)
We now turn our attention to estimating the term \(\Vert \mathbb E\left( |Z_n - \mathbb E(Z_n)| \right) \Vert ^2_p \) in Lemma 3.2. En route, we need a well-known concentration inequality known as Bernstein's inequality; see [12].
Theorem 3.5
Let \(c>0\). Let \(X_1, \ldots , X_n\) be independent random variables. If \(\mathbb P(|X_j| \le c)=1\), then for any \(\varepsilon > 0\),
in which \(S_n:= \frac{1}{n} \sum \nolimits ^n_{j=1}X_j\), \(\mathbb E(S_n) = \mu \), and \(\sigma ^2:= \frac{1}{n} \sum ^n_{j=1}\mathrm{Var} (X_j).\)
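A quick sanity check of the bound (ours, not from the paper; it assumes NumPy, the function name is our own, and the standard form \(\mathbb P\{|S_n - \mu | > \varepsilon \} \le 2\exp \left( -\frac{n\varepsilon ^2}{2(\sigma ^2 + c\varepsilon /3)}\right) \) of Bernstein's inequality is assumed) for means of Uniform\([-1,1]\) samples, where \(c=1\), \(\sigma ^2 = 1/3\), and \(\mu =0\):

```python
import numpy as np
from math import exp

def bernstein_bound_check(n, eps, trials, seed=5):
    """Empirical tail P(|S_n - mu| > eps) for means of Uniform[-1,1] samples
    versus the standard Bernstein bound 2*exp(-n*eps^2 / (2*(sigma2 + c*eps/3)))."""
    rng = np.random.default_rng(seed)
    means = rng.uniform(-1.0, 1.0, size=(trials, n)).mean(axis=1)
    rate = float(np.mean(np.abs(means) > eps))
    sigma2, c = 1.0 / 3.0, 1.0
    bound = 2.0 * exp(-n * eps**2 / (2.0 * (sigma2 + c * eps / 3.0)))
    return rate, bound
```

With \(n=100\) and \(\varepsilon =0.2\) the bound evaluates to \(2e^{-5} \approx 0.0135\), while the empirical tail is far smaller.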
In the next lemma, we apply Bernstein inequality to control a certain segment of sums of \(p_{n,k}.\)
Lemma 3.6
For a fixed \(x \in [0,1]\), the following inequality holds true:
Proof
Consider the random variable \(Z_n\) as defined in (3.2). We have \(\mathbb E(Z_n)=x\), and, in the notation of Theorem 3.5, \(\sigma ^2 = x(1-x)\), since each summand has variance \(x(1-x)\). We apply Bernstein's inequality to the random variable \(Z_n\) to get the desired inequality. \(\square \)
Denote
Theorem 3.7
Let \(0<p<\infty \) be given. The central p-moment of the random variable \(Z_n\) satisfies the following inequality:
where
Proof
If \(x=0\) or \(x=1\), then the desired inequality holds true trivially. In the rest of the proof, we assume that \(x(1-x) > 0.\) Denote
where \(J_{n,x}:= \lceil \Delta ^{-1}_{n,x}\rceil \), and use Lemma 3.6 to write
In what follows, we derive an upper bound (independent of x and n) for the sum above by considering two separate cases. Case (i): \(x(1-x) \le \frac{1}{n}\). We have \( \Delta _{n,x} \le 1/n.\) It follows that
Case (ii): \(x(1-x) > \frac{1}{n}\). We have
which yields
The proof is complete. \(\square \)
Lemma 3.8
The following inequality holds true:
where \(C_p\) is as defined in (3.7).
Proof
By Jensen’s inequality and Theorem 3.7, we have
which is the desired result. \(\square \)
Here is the main result of the section.
Theorem 3.9
Let \(\varepsilon > 0\) and \(n \in \mathbb N\) be given. Then the following inequality holds true:
in which \(D_p:= \exp \left[ \left( 1 + \frac{C^{2/p}_p}{2}\right) /2\right] .\)
Proof
By Lemmas 3.1, 3.2, and 3.8, the following inequalities hold true almost surely:
It follows from Lemmas 3.4 and 2.9 that
Applying Chebyshev inequality, we get
which is the desired result. \(\square \)
Two remarks about Theorem 3.9 are in order.
-
1.
For \(p=2,\) the multiplicative constant in the result of Theorem 3.9 is \(\exp \left[ \left( 1 + \frac{C_2}{2}\right) /2\right] \), which is noticeably larger than \(\sqrt{e}\) as given in Theorem 2.11.
-
2.
The multiplicative constant \(D_p\) is of the order \(\exp (\sqrt{p})\) as \(p \rightarrow \infty ,\) which is the typical growth rate of the pth order of exponential moments of a sub-Gaussian random variable. This has ruled out the feasibility of obtaining an \(L^\infty \)-concentration inequality for stochastic Bernstein polynomials by taking the limit \(p \rightarrow \infty .\)
4 Exponential \(L^\infty \)-Probabilistic Convergence Rates
The approach undertaken in this section needs to harness the deterministic approximation power of classical Bernstein polynomials, which is the subject of many publications; see [11, 15, 16, 22, 25, 35], and the references therein. Sikkema [31] proved the following remarkable result:
We denote this constant by \(c_s\). Sikkema [32] also showed that the value on the right hand side of the above equation is attained only for \(n=6\). But we will not use the latter result in the current article.
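Sikkema's result gives \(\Vert B_n f - f\Vert _\infty \le c_s\, \omega (f, n^{-1/2})\) with the commonly quoted value \(c_s = (4306+837\sqrt{6})/5832 \approx 1.0899\). The sketch below (ours, not from the paper; it assumes NumPy, and the function name is our own) checks the bound numerically for \(f(t)=|t-1/2|\), for which \(\omega (h)=h\):

```python
import numpy as np
from math import comb, sqrt

C_S = (4306 + 837 * sqrt(6)) / 5832  # Sikkema's constant, about 1.0898873

def bernstein_sup_error(f, n, grid=1001):
    """Grid estimate of ||B_n f - f||_inf on [0,1]."""
    x = np.linspace(0.0, 1.0, grid)
    P = np.array([[comb(n, k) * t**k * (1 - t)**(n - k) for t in x]
                  for k in range(n + 1)])
    approx = f(np.arange(n + 1) / n) @ P
    return float(np.max(np.abs(approx - f(x))))
```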
Lemma 4.1
Let \(\varepsilon >0\) and \(f\in C[0, 1]\) be given. Suppose that \(\omega (\frac{1}{\sqrt{n}})<\frac{\varepsilon }{2(c_s +1)}.\) Then the following inequality holds true:
The authors of [34, Lemma 2.10] proved a slightly different version of the above lemma. We will omit the proof, and refer interested readers to the proof of Lemma 2.10 in [34].
Theorem 4.2
Let \(\varepsilon >0\) and \(f\in C[0, 1]\) be given. Suppose that
Then the following inequality holds true:
Proof
Let \(1 \le p \le \infty \) be given. We use Lemmas 4.1 and 3.4 in a successive fashion to derive the following inequalities:
This completes the proof. \(\square \)
The result of Theorem 4.2 confirms a similar conjecture raised in [33].
Corollary 4.3
Let \(f\in C[0, 1]\) be given. Then the following inequality holds true:
Proof
Denote \(a_n:=2(c_s +1)\omega (n^{-1/2})\). By (2.5) and Theorem 4.2, we have
This completes the proof. \(\square \)
Notes
Let us take a brief moment off the main topic to reflect on a list of standard criteria for evaluating the robustness of a stochastic approximation operator. We suggest that the list include the requirement that the expected approximation power of the stochastic operator match that of a standard deterministic operator.
The type of exponential decay rates we derive here is often referred to as Gaussian tail bounds.
References
Adell, J.A., Cardenas-Morales, D.: Stochastic Bernstein polynomials: uniform convergence in probability with rates. Adv. Comput. Math. 46, 16 (2020)
Birnbaum, Z.W., McCarty, R.C.: A distribution-free upper confidence bound for \(\text{ Pr }\{Y<X\}\), based on independent samples of \(X\) and \(Y\). Ann. Math. Stat. 29, 558–562 (1958)
Bobkov, S., Ledoux, M.: One-dimensional empirical measures, order statistics and Kantorovich transport distances. Mem. Am. Math. Soc. 261, 1259 (2019)
Boucheron, S., Lugosi, G., Massart, P.: Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, Oxford (2013)
Bourgain, J., Lindenstrauss, J.: Distribution of points on spheres and approximation by zonotopes. Israel J. Math. 64, 25–31 (1988)
Bourgain, J., Lindenstrauss, J., Milman, V.: Approximation of zonoids by zonotopes. Acta Math. 162, 73–141 (1989)
Bourgain, J., Lindenstrauss, J.: Approximating the ball by a Minkowski sum of segments with equal length. Discret. Comput. Geom. 9, 131–144 (1993)
Buldygin, V.V., Kozachenko, I.V.: Metric Characterization of Random Variables and Random Processes, Translations of Mathematical Monographs, vol. 188. American Mathematical Society, Providence (2000)
Casella, G., Berger, R.: Statistical Inference, 2nd edn. Duxbury Advanced Series (2002)
Cenusa, Gh., Sacuiu, I.: On some stochastic approximations for random functions. Rend. Mat. (Roma) Ser. VI 12, 143–156 (1979)
Cheney, E.W.: Introduction to Approximation Theory, 2nd edn. Chelsea, New York (1982)
Cucker, F., Smale, S.: On the mathematical foundations of learning. Bull. (New Ser.) Am. Math. Soc. 39, 1–49 (2001)
Cucker, F., Zhou, D.X.: Learning Theory: An Approximation Theory Viewpoint. Cambridge University Press, Cambridge (2007)
David, H.A., Nagaraja, H.N.: Order Statistics, Wiley Series in Probability and Statistics, 3rd edn. Wiley, Hoboken (2003)
Ditzian, Z., Ivanov, K.G.: Strong converse inequalities. J. Anal. Math. 61, 61–111 (1993)
Ditzian, Z., Totik, V.: Moduli of Smoothness. Springer, Berlin (2012)
Dvoretzky, A., Kiefer, J., Wolfowitz, J.: Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. Ann. Math. Stat. 27(3), 642–669 (1956)
Gal, S.G.: Jackson type estimates in the approximation of random functions by random polynomials. Rend. Mat. Appl. Roma 4, 543–556 (1994)
Gal, S.G.: Approximation Theory in Random Setting, Chapter 12 in Handbook of Analytic Computational Methods in Applied Mathematics, pp. 571–616. Chapman and Hall, Boca Raton (2000)
Gal, S.G., Niculescu, P.C.: Approximation of random functions by stochastic Bernstein polynomials in capacity spaces. Carpathian J. Math. 37(2), 185–194 (2021)
Gao, W., Sun, X., Wu, Z., Zhou, X.: Multivariate Monte Carlo approximation based on scattered data. SIAM J. Sci. Comput. 42, 2262–2280 (2020)
Gavrea, I., Ivan, M.: The Bernstein Voronovskaja-type theorem for positive linear operators. J. Approx. Theory 192, 291–296 (2015)
Ignatov, Z.G., Mills, T.M., Tzankova, I.P.: On the rate of approximation of random functions. Serdica Bulg. Math. Publ. 18, 240–247 (1992)
Kamolov, A.I.: On exact estimates of approximation of random processes (in Russian). Dokl. Akad. Nauk UzSSR 11, 4–6 (1986)
Lorentz, G.G.: Bernstein Polynomials, Mathematical Expositions, vol. 8. University of Toronto Press, Toronto (1953)
Massart, P.: The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 18(3), 1269–1283 (1990)
Niederreiter, H.: On the existence of uniformly distributed sequences in compact spaces. Compos. Math. 25(1), 93–99 (1972)
Marchal, O., Arbel, J.: On the sub-Gaussianity of the Beta and Dirichlet distributions. Electron. Commun. Probab. 22(54), 1–14 (2017)
Onicescu, O., Istratescu, V.I.: Approximation theorems for random functions. Rend. Mat. (Roma) Ser. VI 8, 65–81 (1975)
Onicescu, O., Istratescu, V.I.: Approximation theorems for random functions. II. Rend. Mat. (Roma) Ser. VI 11(4), 585–589 (1978)
Sikkema, P.C.: Der Wert einiger Konstanten in der Theorie der Approximation mit Bernstein-Polynomen, (German). Numer. Math. 3, 107–116 (1961)
Sikkema, P.C.: Über den Grad der Approximation mit Bernstein-Polynomen (German). Numer. Math. 1, 221–239 (1959)
Sun, X., Wu, Z.: Chebyshev type inequality for stochastic Bernstein polynomials. Proc. Am. Math. Soc. 147(2), 671–679 (2019)
Sun, X., Wu, Z., Zhou, X.: On probabilistic convergence rates of stochastic Bernstein polynomials. Math. Comp. 90(328), 813–830 (2021)
Totik, V.: Approximation by Bernstein polynomials. Am. J. Math. 116, 995–1018 (1994)
Vitale, R.A.: A Bernstein polynomial approach to density function estimation, statistical inference and related topics 2. In: Proceedings of the Summer Research Institute on Statistical Inference for Stochastic Processes, Bloomington, Indiana, July 31–August 9, pp. 87–99 (1975)
Wagner, G.: On a new method for constructing good point sets on spheres. Discret. Comput. Geom. 9, 111–129 (1993)
Wu, Z., Sun, X., Ma, L.: Sampling scattered data with Bernstein polynomials: stochastic and deterministic error estimates. Adv. Comput. Math. 38, 187–205 (2013)
Wu, Z., Zhou, X.: Polynomial convergence order of stochastic Bernstein approximation. Adv. Comput. Math. 46, 25 (2020)
Acknowledgements
We thank Professor Zongmin Wu, Fudan University, for fruitful discussions on topics related to the article.
Sun, X., Zhou, X. Stochastic Quasi-Interpolation with Bernstein Polynomials. Mediterr. J. Math. 19, 240 (2022). https://doi.org/10.1007/s00009-022-02150-y
Keywords
- Dvoretzky–Kiefer–Wolfowitz inequality
- modulus of continuity
- order statistics
- stochastic quasi-interpolation