Rates for Bayesian Estimation of Location-Scale Mixtures of Super-Smooth Densities

Scricciolo, Catia

doi:10.1007/978-3-319-27274-0_5

Catia Scricciolo⁸

Part of the book series: Studies in Theoretical and Applied Statistics ((STASSPSS))

1153 Accesses
1 Citations

Abstract

We consider Bayesian nonparametric density estimation with a Dirichlet process kernel mixture as a prior on the class of Lebesgue univariate densities, the emphasis being on the achievability of the error rate $n^{-1/2}$, up to a logarithmic factor, depending upon the kernel. We derive rates of convergence for the Bayes’ estimator of super-smooth densities that are location-scale mixtures of densities whose Fourier transforms have sub-exponential tails. We show that a nearly parametric rate is attainable in the L ¹-norm, under weak assumptions on the tail decay of the true mixing distribution and the overall Dirichlet process base measure.

Access provided by Autonomous University of Puebla. Download conference paper PDF

Nonparametric estimation of the mixing density using polynomials

Article 01 July 2015

Bayes and maximum likelihood for $L^1$-Wasserstein deconvolution of Laplace mixtures

Article 15 September 2017

A non-parametric Bayesian approach to decompounding from high frequency data

Article Open access 16 December 2016

Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Consider the estimation of a density f ₀ on $\mathbb{R}$ from observations X ₁, …, X _n taking a Bayesian nonparametric approach. A prior is defined on a metric space of probability measures with Lebesgue density and a summary of the posterior, e.g., the posterior expected density, is employed. The so-called what if approach, which consists in investigating frequentist asymptotic properties of the posterior, under the non-Bayesian assumption that the data are generated from a fixed density, provides a way to validate priors on infinite-dimensional spaces. Desirable asymptotic properties of posterior distributions are consistency, minimax-optimal concentration rate of the posterior mass around the “truth” as the sample size grows, possibly with full adaptation to the regularity level of f ₀, if unknown, and distributional convergence. For bounded and convex distances, posterior contraction rates yield upper bounds on convergence rates of the Bayes’ estimator, thus motivating the interest in their study. Since the seminal articles of Ferguson [2] and Lo [4], the idea of constructing priors on spaces of densities by convoluting a fixed kernel with a random distribution has been successfully exploited in density estimation. Even if much progress has been done during the last decade in understanding frequentist asymptotic properties of mixture models, the choice of the kernel is a topic largely ignored in the literature, except for the article of Wu and Ghosal [9], mainly focussed on consistency. Posterior contraction rates for Dirichlet process kernel mixture priors have been investigated by Ghosal and van der Vaart [3] and Scricciolo [5]. One key message is that some constraints on the regularity of the kernel and on the tail decay of the true mixing distribution are necessary to accurately estimate a density. Most of the literature has dealt with the estimation of mixtures, with normal (or generalized normal) kernel and mixing distribution having either compact support or sub-exponential tails, finding a nearly parametric rate, up to a logarithmic factor, in the L ¹-distance, but there are almost no results beyond the Gaussian kernel. The aim of this work is to contribute to the understanding of the role of the kernel choice in density estimation with a Dirichlet process mixture prior. The main result states that a nearly parametric rate can be attained to estimate mixtures of super-smooth densities having Fourier transforms that decay exponentially, whatever the kernel tail decay, heavy tailed distributions, like Student’s-t or Cauchy, being included, which have been proved to be extremely useful in accurately modeling different kinds of financial data. For example, individual stock indices can be modeled as stable laws. Multivariate stable laws have been fruitfully used in computer networks, see Bickson and Guestrin [1]. The assumption on the exponential tail decay of the true mixing distribution seems unavoidable in order to find a finite approximating mixture with a sufficiently restricted number of points. This step is a delicate mathematical point in the proof, see Lemma 1. Such an approximation result, which is reported in the Appendix, may be of autonomous interest as well. In Sect. 2, we fix the notation and present the result.

2 Main Result

We derive rates for location-scale mixtures of super-smooth densities. The model is $f_{F,\,G}(x):=\int _{ 0}^{\infty }(F {\ast} K_{\sigma })(x)\,\mathrm{d}G(\sigma )$, $x \in \mathbb{R}$, where K is a kernel density, $F \sim D_{\alpha }$ is a Dirichlet process with base measure $\alpha:=\alpha (\mathbb{R})\bar{\alpha }$, for $0 <\alpha (\mathbb{R}) <\infty$ and $\bar{\alpha }$ a probability measure on $\mathbb{R}$, and $G \sim D_{\beta }$, with finite and positive base measure β on $(0,\,\infty )$. We assume that $f_{0} = f_{F_{0},\,G_{0}}$, with F ₀ and G ₀ denoting the true mixing distributions for the location and scale parameters, respectively. We use the following assumptions.

(A)
The true mixing distribution G ₀ for the scale parameter satisfies
$$\displaystyle{ \int _{0}^{\infty }\sigma \,\mathrm{d}G_{ 0}(\sigma ) <\infty \qquad \mathrm{and}\qquad \int _{0}^{\infty }\frac{1} {\sigma } \,\mathrm{d}G_{0}(\sigma ) <\infty. }$$
(1)
Also, for constants d ₁, d ₂ > 0 and $0 <\gamma _{ 1}^{0},\,\gamma _{2}^{0} \leq \infty$,
$$\displaystyle{G_{0}(s) \lesssim e^{-d_{1}s^{-\gamma _{1}^{0}} }\quad \mathrm{as\ }s \rightarrow 0\quad \,\mathrm{and}\quad \,1 - G_{0}(s) \lesssim e^{-d_{2}s^{\gamma _{2}^{0}} }\quad \mathrm{as\ }s \rightarrow \infty.}$$
(B)
The base measure β of the Dirichlet process prior for G has a continuous and positive Lebesgue density β′ on $(0,\,\infty )$ such that, for constants C _j, D _j > 0, j = 1, …, 4, q ₁, q ₂, r ₁, r ₂ ≥ 0 and $0 <\gamma _{1},\,\gamma _{2} \leq \infty$,
$$\displaystyle{ C_{1}\sigma ^{-q_{1} }e^{-C_{2}\sigma ^{-\gamma _{1}}(\log (1/\sigma ))^{r_{1}} } \leq \beta '(\sigma ) \leq C_{3}\sigma ^{-q_{1} }e^{-C_{4}\sigma ^{-\gamma _{1}}(\log (1/\sigma ))^{r_{1}} } }$$
(2)
for all $\sigma$ in a neighborhood of 0, and
$$\displaystyle{ D_{1}\sigma ^{q_{2} }e^{-D_{2}\sigma ^{\gamma _{2}}(\log \sigma )^{r_{2}} } \leq \beta '(\sigma ) \leq D_{3}\sigma ^{q_{2} }e^{-D_{4}\sigma ^{\gamma _{2}}(\log \sigma )^{r_{2}} } }$$
(3)
for all $\sigma$ large enough.

Remark 1

The right-hand side requirement in (1) has also been postulated by Tokdar [7], see condition 3 of Lemma 5.1 and condition 4 of Theorem 5.2, pp. 102–103. If, for example, G ₀ is an $\mathrm{IG}(\nu,\,\lambda )$, with shape parameter ν > 0 and scale parameter $\lambda> 0$, then $\int _{0}^{\infty }\sigma ^{-1}\,\mathrm{d}G_{0}(\sigma ) = (\nu /\lambda ) <\infty$. If G ₀ is a right-truncated distribution, then the requirement on the upper tail is satisfied with $\gamma _{2}^{0} = \infty$. A right-truncated Inverse-Gamma distribution meets all the requirements of assumption (A).

Remark 2

Condition (2) is satisfied (with r ₁ = 0) if β′ is an Inverse-Gamma distribution. It can be seen that (2) implies that

$$\displaystyle{\beta ((0,\,s]) \leq \exp \left \{-\frac{C_{4}} {2} s^{-\gamma _{1} }\left (\log \frac{1} {s}\right )^{r_{1} }\right \} \lesssim e^{-\frac{1} {2} C_{4}s^{-\gamma _{1}} }\qquad \mathrm{as\ }s \rightarrow 0.}$$

Condition (3) has been considered by van der Vaart and van Zanten [8], p. 2660, and implies that $\beta ((s,\,\infty )) \lesssim \exp \{-D_{4}s^{\gamma _{2}}/2\}$ as $s \rightarrow \infty$, see Lemma 4.9, p. 2669.

We assess rates for location-scale mixtures of symmetric stable laws. The result goes through to location-scale mixtures of Student’s-t distributions.

Theorem 1

Let K be the density of a symmetric stable law of index 0 < r ≤ 2. Suppose that $f_{0} =\int _{ 0}^{\infty }(F_{0} {\ast} K_{\sigma })\,\mathrm{d}G_{0}(\sigma )$ , with the true mixing distribution F ₀ for the location parameter satisfying the tail condition

$$\displaystyle{ F_{0}(\{\theta:\, \vert \theta \vert> t\}) \lesssim \exp \{-c_{0}t^{1+I_{(1,\,2]}(r)/(r-1)}\}\qquad \mathrm{for\ large}\,t> 0, }$$

(4)

for some constant c ₀ > 0, and the true mixing distribution G ₀ for the scale parameter satisfying assumption (A ), with $\gamma _{2}^{0} = \infty$ . If the base measure α has a density α′ such that, for constants b > 0 and $0 <\delta \leq 1 + I_{(1,\,2]}(r)/(r - 1)$ , satisfies

$$\displaystyle{ \alpha '(\theta ) \propto e^{-b\vert \theta \vert ^{\delta }},\qquad \theta \in \mathbb{R}, }$$

(5)

the base measure β satisfies assumption (B ), with $0 <\gamma _{j} \leq \gamma _{j}^{0} \leq \infty$ and γ _j < γ _j ⁰ if r _j > 0, j = 1, 2, then the posterior rate of convergence relative to the Hellinger distance is $\varepsilon _{n} = n^{-1/2}(\log n)^{\kappa }$ , with κ > 0 depending on γ ₁ ⁰ , γ ₁ , γ ₂ , and r.

Proof

The proof is in the same spirit as that of Theorem 4.1 in Scricciolo [6], which, for space limitations, cannot be reported here. Let $\bar{\varepsilon }_{n} = n^{-1/2}(\log n)^{\kappa }$ and $\tilde{\varepsilon }_{n} = n^{-1/2}(\log n)^{\tau }$, with κ > τ > 0 whose rather lengthy expressions we refrain from writing down. Let $0 <s_{n} \leq E(\log (1/\bar{\varepsilon }_{n}))^{-2\tau /\gamma _{1}}$, $0 <S_{n} \leq F(\log (1/\bar{\varepsilon }_{n}))^{2\tau /\gamma _{2}}$, and $0 <a_{n} \leq L(\log (1/\bar{\varepsilon }_{n}))^{2\tau /\delta }$, with E, F, L > 0 suitable constants. Replacing the expression of N in (A.19) of Lemma A.7 of Scricciolo [6], with that in Lemma 1, we can estimate the covering number of the sieve set

$$\displaystyle{\mathcal{F}_{n}:=\{\, f_{F,\,G}:\, F([-a_{n},\,a_{n}]) \geq 1 -\bar{\varepsilon }_{n}/2,\quad G([s_{n},\,S_{n}]) \geq 1 -\bar{\varepsilon }_{n}/2\}}$$

and show that $\log D(\bar{\varepsilon }_{n},\,\mathcal{F}_{n},\,d_{\mathrm{H}}) \lesssim (\log n)^{2\kappa } = n\bar{\varepsilon }_{n}^{2}$. Verification of the remaining mass condition $\pi (\mathcal{F}_{n}^{c}) \lesssim \exp \{-(c_{2} + 4)n\tilde{\varepsilon }_{n}^{2}\}$ can proceed as in the aforementioned theorem using, among others, the fact that 2τ > 1.

We now turn to consider the small ball probability condition. For $0 <\varepsilon <1/4$, let $a_{\varepsilon }:= (c_{0}^{-1}\log (1/(s_{\varepsilon }\varepsilon )))^{1/(1+I_{(1,\,2]}(r)/(r-1))}$ and $s_{\varepsilon }:= (d_{1}^{-1}\log (1/\varepsilon ))^{-1/\gamma _{1}^{0} }$. Let G ₀ ^∗ be the re-normalized restriction of G ₀ to $[s_{\varepsilon },\,S_{0}]$, with S ₀ the upper endpoint of the support of G ₀, and F ₀ ^∗ the re-normalized restriction of F ₀ to $[-a_{\varepsilon },\,a_{\varepsilon }]$. Then, $\|f_{F_{0}^{{\ast}},\,G_{0}^{{\ast}}}- f_{0}\|_{1} \lesssim \varepsilon$. By Lemma 1, there exist discrete distributions $F'_{0}:=\sum _{ j=1}^{N}p_{j}\delta _{\theta _{j}}$ on $[-a_{\varepsilon },\,a_{\varepsilon }]$ and $G_{0}':=\sum _{ k=1}^{N}q_{k}\delta _{\sigma _{k}}$ on $[s_{\varepsilon },\,S_{0}]$, with at most $N \lesssim (\log (1/\varepsilon ))^{2\tau -1}$ support points, such that $\|\,f_{F'_{0},\,G'_{0}} - f_{F_{0}^{{\ast}},\,G_{0}^{{\ast}}}\|_{\infty }\lesssim \varepsilon$. For $T_{\varepsilon }:= (2a_{\varepsilon } \vee \varepsilon ^{-1/(r+I_{(0,\,1]}(r))})$,

$$\displaystyle{\|f_{F'_{0},\,G'_{0}} - f_{F_{0}^{{\ast}},\,G_{0}^{{\ast}}}\|_{1} \lesssim T_{\varepsilon }\|f_{F'_{0},\,G'_{0}} - f_{F_{0}^{{\ast}},\,G_{0}^{{\ast}}}\|_{\infty } + T_{\varepsilon }^{-r} \lesssim \varepsilon ^{1-1/(r+I_{(0,\,1]}(r))}.}$$

Without loss of generality, the $\theta _{j}$’s and $\sigma _{k}$’s can be taken to be at least $2\varepsilon$-separated. For any distribution F on $\mathbb{R}$ and G on $(0,\,\infty )$ such that

$$\displaystyle{\sum _{j=1}^{N}\vert F([\theta _{ j}-\varepsilon,\,\theta _{j}+\varepsilon ]) - p_{j}\vert \leq \varepsilon \quad \mathrm{and}\quad \sum _{k=1}^{N}\vert G([\sigma _{ k}-\varepsilon,\,\sigma _{k}+\varepsilon ]) - q_{k}\vert \leq \varepsilon,}$$

by the same arguments as in the proof of Theorem 4.1 in Scricciolo [6],

$$\displaystyle{\|f_{F,\,G} - f_{F'_{0},\,G'_{0}}\|_{1} \lesssim \varepsilon.}$$

Consequently,

$$\displaystyle\begin{array}{rcl} d_{\mathrm{H}}^{2}(\,f_{ F,\,G},\,f_{0})& \leq & \|f_{F,\,G} - f_{F_{0}',\,G_{0}'}\|_{1} +\| f_{F_{0}',\,G_{0}'} - f_{F_{0}^{{\ast}},\,G_{0}^{{\ast}}}\|_{1} +\| f_{F_{0}^{{\ast}},\,G_{0}^{{\ast}}}- f_{0}\|_{1} {}\\ & \lesssim & \varepsilon ^{1-1/(r+I_{(0,\,1]}(r))}. {}\\ \end{array}$$

By an analogue of the last part of the same proof, we get that $\pi (B_{\mathrm{KL}}(\,f_{0};\,\tilde{\varepsilon }_{n}^{2})) \gtrsim \exp \{-c_{2}n\tilde{\varepsilon }_{n}^{2}\}$.

Remark 3

Assumptions (4) on F ₀ and (5) on α′ imply that $\mathrm{supp}(F_{0}) \subseteq \mathrm{ supp}(\alpha )$, thus, F ₀ is in the weak support of D _α. Analogously, assumptions (A) on G ₀ and (B) on β′, together with the restrictions on γ _j ⁰, γ _j, j = 1, 2, imply that $\mathrm{supp}(G_{0}) \subseteq \mathrm{ supp}(\beta )$, thus, G ₀ is in the weak support of D _β.

Remark 4

If $\gamma _{1} =\gamma _{2} = \infty$, then also $\gamma _{1}^{0} =\gamma _{ 2}^{0} = \infty$, i.e., the true mixing distribution G ₀ for $\sigma$ is compactly supported on an interval [s ₀, S ₀], for some $0 <s_{0} \leq S_{0} <\infty$, and (an upper bound on) the rate is given by $\varepsilon _{n} = n^{-1/2}(\log n)^{\kappa }$, with κ whose value for Gaussian mixtures (r = 2) reduces to the same found by Ghosal and van der Vaart [3] in Theorem 6.1, p. 1255.

References

Bickson, D., Guestrin, C.: Inference with multivariate heavy-tails in linear models. In: Lafferty, J.D., Williams, C.K.I., Shawe-Taylor, J., Zemel, R.S., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 208–216. Curran Associates, Inc., Red Hook (2010). http://papers.nips.cc/paper/3949-inference-with-multivariate-heavy-tails-in-linear-models.pdf
Google Scholar
Ferguson, T.S.: Bayesian density estimation by mixtures of normal distributions. In: Rizvi, M.H., Rustagi, J.S., Siegmund, D. (eds.) Recent Advances in Statistics, pp. 287–302. Academic, New York (1983)
Google Scholar
Ghosal, S., van der Vaart, A.W.: Entropies and rates of convergence for maximum likelihood and Bayes estimation for mixtures of normal densities. Ann. Stat. 29, 1233–1263 (2001)
Article MathSciNet MATH Google Scholar
Lo, A.Y.: On a class of Bayesian nonparametric estimates: I. Density estimates. Ann. Stat. 12, 351–357 (1984)
Article MathSciNet MATH Google Scholar
Scricciolo, C.: Posterior rates of convergence for Dirichlet mixtures of exponential power densities. Electron. J. Stat. 5, 270–308 (2011)
Article MathSciNet MATH Google Scholar
Scricciolo, C.: Rates of convergence for Bayesian density estimation with Dirichlet process mixtures of super-smooth kernels. Working Paper No.1. DEC, Bocconi University (2011)
Google Scholar
Tokdar, S.T.: Posterior consistency of Dirichlet location-scale mixture of normals in density estimation and regression. Sankhyā 68, 90–110 (2006)
MathSciNet MATH Google Scholar
van der Vaart, A.W., van Zanten, J.H.: Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Stat. 37, 2655–2675 (2009)
Article MathSciNet MATH Google Scholar
Wu, Y., Ghosal, S.: Kullback Leibler property of kernel mixture priors in Bayesian density estimation. Electron. J. Stat. 2, 298–331 (2008)
Article MathSciNet MATH Google Scholar

Download references

Author information

Authors and Affiliations

Bocconi University, Via Roentgen n. 1, 20136, Milan, Italy
Catia Scricciolo

Authors

Catia Scricciolo
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Catia Scricciolo .

Editor information

Editors and Affiliations

MEMOTEF, Sapienza University of Rome, Rome, Italy
Giorgio Alleva
Dept. of Statistcs & Informatics, University of Florence, Florence, Italy
Andrea Giommi

Appendix

The following lemma provides an upper bound on the number of mixing components of finite location-scale mixtures of symmetric stable laws that uniformly approximate densities of the same type with compactly supported mixing distributions. We use $\mathbb{E}$ and $\mathbb{E}'$ to denote expectations corresponding to priors G and G′ for the scale parameter $\varSigma$, respectively.

Lemma 1

Let K be a density with Fourier transform such that, for constants A, ρ > 0 and 0 < r < 2, $\varPhi _{K}(t) = Ae^{-\rho \vert t\vert ^{r} }$, $t \in \mathbb{R}$ . Let 0 < ε < 1, $0 <a <\infty$ and $0 <s \leq S <\infty$ be given, with (a∕s) ≥ 1. For any pair of probability measures F on [−a, a] and G on [s, S], there exist discrete probability measures F′ on [−a, a] and G′ on [s, S], with at most

$$\displaystyle\begin{array}{rcl} N \lesssim \left \{\begin{array}{lll} \frac{a} {s}\, \times \,\left (\frac{S} {s} \right )^{r}\,\left (\log \frac{1} {s\epsilon }\right )^{1+1/r}, &\,\,\,\,\,\,\,\,\,\,\,\,\mathrm{if}\,\,\,\,\,\,\,\,\,\,\,\,0 <r \leq 1, \\ \max \left \{\left (\frac{a} {s}\right )^{r/(r-1)},\,\left (\frac{S} {s} \right )^{r/(r-1)}\left (\log \frac{1} {s^{r}\epsilon }\right )^{1/(r-1)}\right \},&\,\,\,\,\,\,\,\,\,\,\,\,\mathrm{if}\,\,\,\,\,\,\,\,\,\,\,\,1 <r <2, \end{array} \right.& & {}\\ \end{array}$$

support points, such that $\|\mathbb{E}[F {\ast} K_{\varSigma }] - \mathbb{E}'[F' {\ast} K_{\varSigma }]\|_{\infty }\lesssim \epsilon$.

Proof

We consider first the case where 1 < r < 2 because, since (a∕s) ≥ 1 by assumption, we can appeal to Lemma A.1 of Scricciolo [6]. The arguments of the first part of the proof can be then used to deal also with the case where 0 < r ≤ 1. For each $s \leq \sigma \leq S$, since $\int _{-\infty }^{\infty }\vert \varPhi _{K}(\sigma t)\vert \,\mathrm{d}t <\infty$, the inversion formula can be applied to recover both $F {\ast} K_{\sigma }$ and $F' {\ast} K_{\sigma }$. For any M > 0 and $x \in \mathbb{R}$,

$$\displaystyle\begin{array}{rcl} & & \vert \mathbb{E}[(F {\ast} K_{\varSigma })(x)] - \mathbb{E}'[(F' {\ast} K_{\varSigma })(x)]\vert {}\\ & & = \frac{1} {2\pi }\Bigg\vert \int _{s}^{S}\int _{ -\infty }^{\infty }e^{-itx}\varPhi _{ K}(\sigma t)\varPhi _{F}(t)\,\mathrm{d}t\,\mathrm{d}G(\sigma ) {}\\ & & \quad -\int _{s}^{S}\int _{ -\infty }^{\infty }e^{-itx}\varPhi _{ K}(\sigma t)\varPhi _{F'}(t)\,\mathrm{d}t\,\mathrm{d}G'(\sigma )\Bigg\vert {}\\ & & = \frac{1} {2\pi }\Bigg\vert \left (\int _{\vert t\vert \leq M} +\int _{\vert t\vert>M}\right )e^{-itx}\left [\varPhi _{ F}(t)\mathbb{E}[\varPhi _{K}(\varSigma t)] -\varPhi _{F'}(t)\mathbb{E}'[\varPhi _{K}(\varSigma t)]\right ]\,\mathrm{d}t\Bigg\vert. {}\\ \end{array}$$

Let

$$\displaystyle{U:= \frac{1} {2\pi }\Bigg\vert \int _{\vert t\vert \leq M}e^{-itx}\left [\varPhi _{ F}(t)\mathbb{E}[\varPhi _{K}(\varSigma t)] -\varPhi _{F'}(t)\mathbb{E}'[\varPhi _{K}(\varSigma t)]\right ]\,\mathrm{d}t\Bigg\vert }$$

and

$$\displaystyle{V:= \frac{1} {2\pi }\Bigg\vert \int _{\vert t\vert>M}e^{-itx}\left [\varPhi _{ F}(t)\mathbb{E}[\varPhi _{K}(\varSigma t)] -\varPhi _{F'}(t)\mathbb{E}'[\varPhi _{K}(\varSigma t)]\right ]\,\mathrm{d}t\Bigg\vert.}$$

For $M \geq (\rho ^{1/r}s)^{-1}(\log (1/(s^{r}\varepsilon )))^{1/r}$,

$$\displaystyle{V \leq \frac{A} {2\pi } \int _{\vert t\vert>M}\int _{s}^{S}e^{-\rho (\sigma \vert t\vert )^{r} }\,\mathrm{d}(G + G')(\sigma )\,\mathrm{d}t\, \lesssim \,\varepsilon.}$$

In order to find an upper bound on U, we apply Lemma A.1 of Ghosal and van der Vaart [3], p. 1260, to both F and G. There exists a discrete probability measure F′ on [−a, a], with at most N ₁ + 1 support points, where N ₁ is a positive integer to be suitably chosen later on, such that it matches the (finite) moments of F up to the order N ₁, i.e., $\mathbb{E}'[\varTheta ^{j}] = \mathbb{E}[\varTheta ^{j}]$ for all j = 1, …, N ₁. Analogously, there exists a discrete probability measure G′ on [s, S], with at most N ₂ support points, where N ₂ is a positive integer to be suitably chosen later on, such that

$$\displaystyle{\mathbb{E}'[\varSigma ^{r\ell}]:=\int _{ s}^{S}\sigma ^{r\ell}\,\mathrm{d}G'(\sigma ) =\int _{ s}^{S}\sigma ^{r\ell}\,\mathrm{d}G(\sigma ) =: \mathbb{E}[\varSigma ^{r\ell}],\qquad \ell = 1,\,\ldots,\,N_{ 2} - 1.}$$

Both N ₁ and N ₂ will be chosen to be increasing functions of $\varepsilon$. In virtue of the latter matching conditions,

$$\displaystyle\begin{array}{rcl} \left \vert \mathbb{E}[\varPhi _{K}(\varSigma t)] - \mathbb{E}'[\varPhi _{K}(\varSigma t)]\right \vert & \leq & \left \vert \mathbb{E}\left [\varPhi _{K}(\varSigma t) - A\sum _{\ell=0}^{N_{2}-1}\frac{(\rho (\varSigma \vert t\vert )^{r})^{\ell}} {\ell!} \right ]\right \vert \\ & & +\left \vert \mathbb{E}'\left [\varPhi _{K}(\varSigma t) - A\sum _{\ell=0}^{N_{2}-1}\frac{(\rho (\varSigma \vert t\vert )^{r})^{\ell}} {\ell!} \right ]\right \vert \\ & \leq & \frac{2A} {(N_{2})!}(\rho (S\vert t\vert )^{r})^{N_{2} },\qquad t \in \mathbb{R}. {}\end{array}$$

(6)

Using arguments of Lemma A.1 in Scricciolo [6] and inequality (6),

$$\displaystyle\begin{array}{rcl} U& \leq & \frac{1} {2\pi }\int _{\vert t\vert \leq M}\left \vert \varPhi _{F}(t)\mathbb{E}[\varPhi _{K}(\varSigma t)] -\varPhi _{F'}(t)\mathbb{E}'[\varPhi _{K}(\varSigma t)]\right \vert \,\mathrm{d}t {}\\ & \leq & \frac{1} {2\pi }\int _{\vert t\vert \leq M}\left \vert \varPhi _{F}(t) -\sum _{j=0}^{N_{1} } \frac{(it)^{j}} {j!} \mathbb{E}[\varTheta ^{j}]\right \vert \mathbb{E}[\vert \varPhi _{ K}(\varSigma t)\vert ]\,\mathrm{d}t {}\\ & & +\,\frac{1} {2\pi }\int _{\vert t\vert \leq M}\left \vert \varPhi _{F'}(t) -\sum _{j=0}^{N_{1} } \frac{(it)^{j}} {j!} \mathbb{E}'[\varTheta ^{j}]\right \vert \mathbb{E}'[\vert \varPhi _{ K}(\varSigma t)\vert ]\,\mathrm{d}t {}\\ & & +\,\frac{1} {2\pi }\sum _{j=0}^{N_{1} } \frac{\vert \mathbb{E}[\varTheta ^{j}]\vert } {j!} \int _{\vert t\vert \leq M}\vert t^{j}\vert \left \vert \mathbb{E}[\varPhi _{ K}(\varSigma t)] - \mathbb{E}'[\varPhi _{K}(\varSigma t)]\right \vert \,\mathrm{d}t {}\\ & \leq & \frac{4Aa^{N_{1}}} {\pi r(\rho s^{r})^{(N_{1}+1)/r}} \frac{\varGamma ((N_{1} + 1)/r)} {\varGamma (N_{1} + 1)} + \frac{2A} {\pi (N_{2})!}(1 + aM)^{N_{1} }(\rho (SM)^{r})^{N_{2} }M {}\\ & \sim & \frac{4Aa^{N_{1}}} {\pi r(\rho s^{r})^{(N_{1}+1)/r}} \frac{\varGamma ((N_{1} + 1)/r)} {\varGamma (N_{1} + 1)} + \frac{\sqrt{2}A(1 + aM)^{N_{1}}(\rho (SM)^{r})^{N_{2}}M} {\pi ^{3/2}e^{-N_{2}}N_{2}^{N_{2}-1/2}}, {}\\ \end{array}$$

where, in the last line, we have used Stirling’s approximation for (N ₂)! , assuming N ₂ is large enough. For $N_{1} \lesssim \max \{\log (1/(s\varepsilon )),\,(a/s)^{r/(r-1)}\}$,

$$\displaystyle{U_{1}:= \frac{4Aa^{N_{1}}} {\pi r(\rho s^{r})^{(N_{1}+1)/r}} \frac{\varGamma ((N_{1} + 1)/r)} {\varGamma (N_{1} + 1)} \lesssim \varepsilon.}$$

Let M be such that aM ≥ 1 and (ρ ^1∕r SM) ≥ 2a. Then, for $N_{2} \geq \max \{ (2N_{1} + 1)(r - 1)/(r(2 - r)),\,e^{3}(\rho ^{1/r}SM)^{r/(r-1)},\,\log (1/\varepsilon )\}$,

$$\displaystyle{(1 + aM)^{N_{1} }(\rho (SM)^{r})^{N_{2} }M \leq (\rho ^{1/r}SM)^{rN_{2}+2N_{1}+1} \leq (\rho ^{1/r}SM)^{rN_{2}/(r-1)}}$$

and

$$\displaystyle{U_{2}:= \frac{\sqrt{2}A(1 + aM)^{N_{1}}(\rho (SM)^{r})^{N_{2}}M} {\pi ^{3/2}e^{-N_{2}}N_{2}^{N_{2}-1/2}} \lesssim \varepsilon.}$$

Hence, $N_{2} \lesssim \max \{ (a/s)^{r/(r-1)},\,\left ((S/s)\right )^{r/(r-1)}(\log (1/s^{r}\varepsilon ))^{1/(r-1)}\}$.

In the case where 0 < r ≤ 1, since (a∕s) ≥ 1, we need to restrict the support of the mixing distribution F. To the aim, we consider a partition of [−a, a] into $k = \lceil (a/s)(\log (1/(s\varepsilon )))^{1/r-1}\rceil$ subintervals I ₁, …, I _k of equal length $0 <l \leq 2s(\log (1/(s\varepsilon )))^{-(1-r)/r}$ and, possibly, a final interval I _k+1 of length 0 ≤ l _k+1 < l. Let J be the number of intervals in the partition, which can be either k or k + 1. Write $F =\sum _{ j=1}^{J}F(I_{j})F_{j}$, where F _j denotes the re-normalized restriction of F to I _j. Then, for each $s \leq \sigma \leq S$, we have $(F {\ast} K_{\sigma })(x) =\sum _{ j=1}^{J}F(I_{j})(F_{j} {\ast} K_{\sigma })(x)$, $x \in \mathbb{R}$. For any probability measure F′ such that F′(I _j) = F(I _j), j = 1, …, J,

$$\displaystyle\begin{array}{rcl} & & \vert \mathbb{E}[(F {\ast} K_{\varSigma })(x)] - \mathbb{E}'[(F' {\ast} K_{\varSigma })(x)]\vert {}\\ & &\quad \leq \sum _{j=1}^{J}F(I_{ j})\vert \mathbb{E}[(F_{j} {\ast} K_{\varSigma })(x)] - \mathbb{E}'[(F'_{j} {\ast} K_{\varSigma })(x)]\vert,\qquad \,x \in \mathbb{R}. {}\\ \end{array}$$

Reasoning as in the case where 1 < r < 2, with a to be understood as l∕2 and N ₁ as the number of support points of the generic F _j, for $M \geq ((\rho /2)^{1/r}s)^{-1}(\log (1/\varepsilon ))^{1/r}$,

$$\displaystyle{\vert \mathbb{E}[(F_{j} {\ast} K_{\varSigma })(x)] - \mathbb{E}'[(F'_{j} {\ast} K_{\varSigma })(x)]\vert \lesssim U + V \lesssim (U_{1} + U_{2})+\varepsilon,\qquad x \in \mathbb{R}.}$$

Since $(a/s) \lesssim (\log (1/(s\varepsilon )))^{-(1-r)/r}$ by construction, for $N_{1} =\log (1/(s\varepsilon ))$, it turns out that $U_{1} \lesssim \varepsilon$. For $N_{2} \geq \max \{ N_{1},\,2e^{4}\rho (SM)^{r}\log (1/(s\varepsilon )),\,\log (1/\varepsilon )\}$,

$$\displaystyle{(1 + aM)^{N_{1} }(\rho (SM)^{r})^{N_{2} }M \leq M(2\rho (SM)^{r}\log (1/(s\varepsilon )))^{N_{2} }}$$

and $U_{2} \lesssim \varepsilon$. Then, $N_{2} \lesssim (S/s)^{r}(\log (1/(s\varepsilon )))^{2}$ and the total number N ^T of support points of F′ is bounded above by

$$\displaystyle{J \times N_{1} \lesssim J \times N_{2} \lesssim \frac{a} {s} \times \left (\frac{S} {s} \right )^{r}\left (\log \frac{1} {s\varepsilon }\right )^{1+1/r}.}$$

The proof is thus complete.

Remark 5

Lemma 1 does not cover the case where r = 2, i.e., the kernel is Gaussian: this might possibly be due to the arguments laid out in the proof. This case can be retrieved from Lemma A.2 in Scricciolo [6] when p = 2.

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Scricciolo, C. (2016). Rates for Bayesian Estimation of Location-Scale Mixtures of Super-Smooth Densities. In: Alleva, G., Giommi, A. (eds) Topics in Theoretical and Applied Statistics. Studies in Theoretical and Applied Statistics(). Springer, Cham. https://doi.org/10.1007/978-3-319-27274-0_5

Download citation

DOI: https://doi.org/10.1007/978-3-319-27274-0_5
Published: 20 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-27272-6
Online ISBN: 978-3-319-27274-0
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Rates for Bayesian Estimation of Location-Scale Mixtures of Super-Smooth Densities

Abstract

Similar content being viewed by others

Nonparametric estimation of the mixing density using polynomials

Bayes and maximum likelihood for \(L^1\)-Wasserstein deconvolution of Laplace mixtures

A non-parametric Bayesian approach to decompounding from high frequency data

Keywords

1 Introduction

2 Main Result

Remark 1

Remark 2

Theorem 1

Proof

Remark 3

Remark 4

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Lemma 1

Proof

Remark 5

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Navigation

Rates for Bayesian Estimation of Location-Scale Mixtures of Super-Smooth Densities

Abstract

Similar content being viewed by others

Nonparametric estimation of the mixing density using polynomials

Bayes and maximum likelihood for \(L^1\)-Wasserstein deconvolution of Laplace mixtures

A non-parametric Bayesian approach to decompounding from high frequency data

Keywords

1 Introduction

2 Main Result

Remark 1

Remark 2

Theorem 1

Proof

Remark 3

Remark 4

References

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

Lemma 1

Proof

Remark 5

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation