Random Dimension Low Sample Size Asymptotics

Christoph, Gerd; Ulyanov, Vladimir V.

doi:10.1007/978-3-030-83266-7_16


Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 371)

Included in the following conference series:

International Conference on Stochastic Methods


Abstract

In a first investigation of high-dimensional low-sample-size (HDLSS) asymptotics, Hall, Marron and Neeman (2005) discovered a surprisingly rigid geometric structure: a sample of size k taken from the standard m-dimensional normal distribution is, for large m, close to the vertices of the k-dimensional simplex in m-dimensional vector space. This follows from the analysis of three geometric statistics: the length of an observation, the distance between any two independent observations and the angle between these vectors. We generalize and refine these results by constructing second order Chebyshev-Edgeworth expansions under the assumption that the data dimension is random and different scaling factors are chosen.


1 Three Geometric Statistics of Gaussian Vectors

We continue to study properties of high-dimensional Gaussian random vectors. In our earlier papers Christoph, Prokhorov and Ulyanov [8] and Bobkov, Naumov and Ulyanov [5] two-sided bounds were constructed for a probability density function of the distance of a Gaussian random element Y with zero mean from a point a in a Hilbert space $\mathbb {H}$. We get new results for basic geometric statistics connected with high-dimensional random normal vectors.

Let $\mathbf {X}_1=(X_{1,1},...,X_{1,m})^T$,..., $\mathbf {X}_k=(X_{k,1},...,X_{k,m})^T$ be a random sample.

In the high-dimension low-sample-size (HDLSS) setting it is assumed that the dimension m tends to infinity while the sample size k is fixed.

One of the first investigations of HDLSS data was carried out by Hall, Marron and Neeman (2005) [14]. It became the basis of research in high-dimensional mathematical statistics. See the recent survey on HDLSS asymptotics and its applications in Aoshima et al. [1]. For further developments when both m and k may tend to infinity see e.g. Fujikoshi, Ulyanov and Shimizu [12]. This is an important framework of current data analysis, known as Big Data. In [14] a surprisingly rigid geometric structure was discovered: a sample of size k taken from the standard m-dimensional normal distribution is, for large m, close to the vertices of the k-dimensional simplex in $\mathbb {R}^m$. This follows from the analysis of three geometric statistics:

the length $||\mathbf {X}_i||_m$ of an observation,
the distance $||\mathbf {X}_i- \mathbf {X}_j||_m$ between any two independent observations,
and the angle $\theta _m = \mathrm {ang}(\mathbf {X}_i,\mathbf {X}_j)$ between these vectors.

We generalize and refine these results by constructing second order Chebyshev-Edgeworth expansions under the assumption that the data dimension is random and different scaling factors are chosen.

In case of $\dim \mathbb {H} < \infty $ we consider a sample of size k when the dimension of the observations is a random variable $N_n$ with values in $\mathbb {N}_+ = \{1, 2, \ldots \}$.

The present work continues our investigations in Christoph and Ulyanov [9] on these three geometric statistics of Gaussian vectors with randomly distributed dimension $N_n$ which depends on parameter $n \in \mathbb {N}_+$ and $N_n \rightarrow \infty $ in probability as $n \rightarrow \infty $. Let the vectors $\mathbf {X}_1, ..., \mathbf {X}_k$ and $N_1,N_2, ...$ be defined on one and the same probability space and it is assumed that they are independent. If $T_m := T_m \left( \mathbf {X}_1,..., \mathbf {X}_k\right) $ is some statistic of the vectors $\mathbf {X}_1,..., \mathbf {X}_k$ with non-random dimension $m \in \mathbb {N}_+$ then the random variable $T_{N_n}=T_{N_n}(\omega )$ is defined as:

$$ T_{N_n} (\omega ) := T_{N_n(\omega )} \left( \mathbf {X}_1(\omega ),..., \mathbf {X}_k (\omega )\right) ,\quad \omega \in \varOmega \quad \text{ and } \quad n \in \mathbb {N}_+. $$

Therefore, the statistics $T_{N_n}$ based on statistics $T_m$ are constructed from the sample $\{\mathbf {X}_1,..., \mathbf {X}_k\}$, where these vectors have the dimension $N_n$.
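The construction of $T_{N_n}$ can be sketched in a few lines of Python: the dimension is drawn first, independently of the data, and the statistic is then evaluated on k standard normal vectors of that dimension. This is only an illustration under stated assumptions. The names `length_statistic`, `statistic_with_random_dimension` and `draw_geometric_dimension` are hypothetical, and the geometric law of the dimension is chosen purely as an example.

```python
import math
import random

def length_statistic(vectors):
    """T_m: Euclidean length of the first vector of the sample."""
    return math.sqrt(sum(x * x for x in vectors[0]))

def statistic_with_random_dimension(stat, draw_dimension, k, rng):
    """Evaluate T_{N_n}: draw the dimension, then k independent N(0, I) vectors."""
    m = draw_dimension(rng)  # realisation of N_n, independent of the vectors
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]
    return m, stat(vectors)

def draw_geometric_dimension(n):
    """Illustrative sampler on {1, 2, ...} with success probability 1/n (an assumption)."""
    def draw(rng):
        m = 1
        while rng.random() >= 1.0 / n:
            m += 1
        return m
    return draw

rng = random.Random(0)
m, t = statistic_with_random_dimension(
    length_statistic, draw_geometric_dimension(50), k=3, rng=rng)
```

Passing a constant sampler such as `lambda _rng: 400` recovers the statistic $T_m$ with non-random dimension.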

In [9], the distribution function of the normalized angle $\theta _m =\mathrm {ang}(\mathbf {X}_i,\mathbf {X}_j)$ was approximated by a second order Chebyshev-Edgeworth expansion with a bound $\le C m^{-2}$ for all $m \in \mathbb {N}_+$. Furthermore, the fixed dimension m of the Gaussian vectors was substituted by a random number $N_n$ and expansions for statistics $\theta _{N_n}$ were proved.

A natural question arises whether similar results hold for the length $||\mathbf {X}_i||_{N_n}$ and the distance $||\mathbf {X}_i- \mathbf {X}_j||_{N_n}$ of Gaussian vectors with random dimension ${N_n}$.

Two cases of random dimensions (or random sample sizes) $N_n$ are considered as e.g. in Bening, Galieva and Korolev [2], Christoph, Monakhov and Ulyanov [7] and Christoph and Ulyanov [9]:

i)
The random dimension $N_n=N_n(r) \in \mathbb {N}_+$ has negative binomial distribution displaced by 1 with probability of success 1/n, positive parameter $r>0$ and probabilities
$$\begin{aligned} \mathbb {P}(N_n(r)=j)= \frac{\varGamma (j+r-1)}{\varGamma (j) \, \varGamma (r)}\left( \frac{1}{n} \right) ^{r}\left( 1 - \frac{1}{n} \right) ^{j-1}, \,\,\, j \in \mathbb {N}_+ . \end{aligned}$$
(1)
ii)
The random dimension $N_n=N_n(s) \in \mathbb {N}_+$ is discrete Pareto-like distributed with parameters $n \in \mathbb {N}_+$, $ s>0$ and distribution function
$$\begin{aligned} \mathbb {P}(N_n(s)\le k)=\left( \frac{k}{s+k}\right) ^{n}\quad \text{ where } \quad N_n(s)= \max _{1\le j \le n} Y_j(s), \end{aligned}$$
(2)
and $Y(s), Y_1(s), Y_2(s), . . . $, are independent discrete Pareto II distributed random variables with the common distribution
$$\begin{aligned} \mathbb {P}\big (Y(s)\le k\big ) =\frac{k}{s+k} \quad \text{ and } \quad \mathbb {P}(Y(s)=k)=\frac{s}{(s+k)\,(s+k-1)}, \,\,k \in \mathbb {N_+}. \end{aligned}$$
(3)
The discrete Y(s) on integers is the discretized continuous Pareto II (Lomax) random variable, see Buddana and Kozubowski [6].

Both cases of random dimensions of the Gaussian vectors are also interesting because $\mathbb {E}N_n(r) = r (n-1) +1 < \infty $ and $\mathbb {E}N_n(s) = \infty $, which has an influence on the normalization factors.
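The formulas (1) to (3) can be checked numerically. The following sketch, with hypothetical function names and arbitrarily chosen parameter values, evaluates the probabilities (1) through log-gamma functions, confirms that they sum to one with mean $r(n-1)+1$, and confirms that the probabilities in (3) telescope to the distribution function $k/(s+k)$.

```python
import math

def negbin_pmf(j, n, r):
    """P(N_n(r) = j) from (1), computed on the log scale; requires n >= 2."""
    log_p = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
             + r * math.log(1.0 / n) + (j - 1) * math.log(1.0 - 1.0 / n))
    return math.exp(log_p)

def pareto_like_cdf(k, n, s):
    """P(N_n(s) <= k) from (2)."""
    return (k / (s + k)) ** n

def pareto_pmf(k, s):
    """P(Y(s) = k) from (3)."""
    return s / ((s + k) * (s + k - 1))

n, r, s = 5, 2.0, 1.5          # illustrative parameter values
jmax = 2000                    # truncation point; the neglected tail is negligible here
total = sum(negbin_pmf(j, n, r) for j in range(1, jmax + 1))
mean = sum(j * negbin_pmf(j, n, r) for j in range(1, jmax + 1))
pareto_partial = sum(pareto_pmf(k, s) for k in range(1, 101))
```

Since $\mathbb {E}N_n(s) = \infty $, no analogous finite mean exists for the distribution (2); only its distribution function is evaluated here.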

The rest of the paper is organized as follows: In Sect. 2, Chebyshev-Edgeworth expansions are proved for the geometric statistics of Gaussian vectors with fixed dimension m. Section 3 presents the transfer theorem for results with fixed sample size (in our case the dimension of the vectors) m to those with random sample size $N_n$. The main results are given in Sects. 4 and 5 when the random sample size is negative binomial $N_n(r)$ or discrete Pareto-like $N_n(s)$ distributed, respectively. In Sect. 6 the main results are proved.

2 Approximation for Geometric Statistics of m-Dimensional Normal Vectors

Let $\mathbf {X}_i=(X_{i,1},...,X_{i,m})^T$ and $\mathbf {X}_j=(X_{j,1},...,X_{j,m})^T$ be m-dimensional vectors chosen from a sample $\{\mathbf {X}_1, ... , \mathbf {X}_k\}$ from the normal distribution $\mathcal {N}(\mathbf {0}_m, {\mathrm{I}}_m)$ with mean vector $\mathbf {0}_m$ and covariance matrix $\mathrm{I}_m$, where $1 \le i<j \le k\le m$.

The length of the vector $\mathbf {X}_i$ is defined by the Euclidean norm $|| \cdot ||_m$:

$$\begin{aligned} ||\mathbf {X}_i||_m = S_m^{1/2} \quad \text{ with } \quad S_m = \sum \nolimits _{k=1}^m X_{i,k}^2\,. \end{aligned}$$

(4)

and similarly the distance $||\mathbf {X}_i- \mathbf {X}_j||_m$ between any two independent vectors

$$\begin{aligned} ||\mathbf {X}_i - \mathbf {X}_j||_m = \Big (\sum \nolimits _{k=1}^m \left( X_{i,k} - X_{j,k}\right) ^2\Big )^{1/2}\,. \end{aligned}$$

(5)

The distribution of distance $||\mathbf {X}_i- \mathbf {X}_j||_m$ is closely linked to the distribution of length $||\mathbf {X}_i||_m$, since $(X_{i,k} - X_{j,k})/\sqrt{2}$ has also standard normal distribution $\varPhi (x)$. Therefore

$$\begin{aligned} \mathbb {P}(||\mathbf {X}_i- \mathbf {X}_j||_m/\sqrt{2} \le x) = \mathbb {P}(||\mathbf {X}_i||_m \le x). \end{aligned}$$

(6)

The angle $\theta _m = \mathrm {ang}(\mathbf {X}_i,\mathbf {X}_j)$ between these two independent vectors with vertex at the origin and the sample correlation coefficient $R_m(\mathbf {X}_i, \mathbf {X}_j)$ are connected by:

$$\begin{aligned} \cos \theta _m = \frac{||\mathbf {X}_i||_m^2+||\mathbf {X}_j||_m^2 - ||\mathbf {X}_i -\mathbf {X}_j||_m^2}{2\,||\mathbf {X}_i||_m\,\,||\mathbf {X}_j||_m} =R_m(\mathbf {X}_i,\mathbf {X}_j)=R_m. \end{aligned}$$

(7)

Hall, Marron and Neeman [14] showed

for the length $||\mathbf {X}_i||_m = \sqrt{m} +\mathcal {O}_p(1)$,
for the distance $||\mathbf {X}_i- \mathbf {X}_j||_m =\sqrt{2m} +\mathcal {O}_p(1)$ with $i \not = j$ and
for the angle $\theta _m = \mathrm {ang}(\mathbf {X}_i,\mathbf {X}_j) = \frac{1}{2}\pi + \mathcal {O}_p(m^{-1/2})$ with $i \not = j$,

where $1 \le i<j\le k \le m$ and $\mathcal {O}_p$ refers to the stochastic boundedness.
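The three statements above are easy to observe in simulation. The sketch below draws two independent standard normal vectors of dimension m = 2000 (an arbitrary choice, as is the seed), computes the angle through the law of cosines as in (7), and compares it with the normalised inner product. The function names are hypothetical.

```python
import math
import random

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))

def geometric_statistics(m, rng):
    """Length, distance and angle for two independent N(0, I_m) vectors."""
    xi = [rng.gauss(0.0, 1.0) for _ in range(m)]
    xj = [rng.gauss(0.0, 1.0) for _ in range(m)]
    length = norm(xi)
    distance = norm([a - b for a, b in zip(xi, xj)])
    # cos(theta) through the law of cosines, as in (7)
    cos_theta = (length ** 2 + norm(xj) ** 2 - distance ** 2) / (2 * length * norm(xj))
    # the same quantity written as a normalised inner product
    cos_check = sum(a * b for a, b in zip(xi, xj)) / (length * norm(xj))
    return length, distance, math.acos(cos_theta), cos_theta - cos_check

rng = random.Random(2005)
m = 2000
length, distance, theta, gap = geometric_statistics(m, rng)
```

For this m the length stays within a few units of $\sqrt{m} \approx 44.7$, the distance within a few units of $\sqrt{2m} \approx 63.2$, and the angle within a few hundredths of $\pi /2$, in line with the stochastic orders stated above.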

The length of the vector $\mathbf {X}_i$ drawn from an m-dimensional normal distribution $\mathcal {N}({\mathbf {0},\mathrm{I}}_m)$ is defined in (4) as $||\mathbf {X}_i||_m = S_m^{1/2}$, where the statistic $S_m$, as a sum of the squares of m independent standard normal random variables, has a chi-square distribution with m degrees of freedom and

$$\begin{aligned} V_m = \frac{S_m -m}{\sqrt{2\,m} } \end{aligned}$$

(8)

is asymptotically standard normally distributed. With the two-term Chebyshev-Edgeworth expansions in the central limit theorem for the distribution function of $V_m$, the following inequality results for all $m \in \mathbb {N}$

$$ \left| P\Big (V_m \le x\Big ) - \varPhi (x) - \varphi (x)\,\Big (\frac{\lambda _3\,H_2(x)}{6\,\sqrt{m}} + \frac{\lambda _3^2 \,H_5(x)}{72\,m} + \frac{\lambda _4\,H_3(x)}{24\,m}\Big )\right| \le \frac{C}{m^{3/2}}$$

where $H_2(x) =x^2 -1$, $H_3(x)=x^3 - 3 x$, $H_5(x)= x^5 - 10x^3 + 15 x$ are the Chebyshev-Hermite polynomials, skewness $\lambda _3 = \sqrt{8}$ and excess kurtosis $\lambda _4 = 12$ of $S_1$, see Petrov [19, Sec. 5.7, Theorem 5.18].
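The values $\lambda _3 = \sqrt{8}$ and $\lambda _4 = 12$ follow from the even moments of a standard normal variable. A short exact computation, given here as an illustrative sketch with a hypothetical helper name, confirms them.

```python
from fractions import Fraction

def gaussian_even_moment(k):
    """E[Z^(2k)] = (2k-1)!! for a standard normal Z."""
    result = 1
    for odd in range(1, 2 * k, 2):
        result *= odd
    return result

# raw moments of S_1 = Z^2: E[S_1^p] = E[Z^(2p)], p = 1, ..., 4
m1, m2, m3, m4 = (Fraction(gaussian_even_moment(p)) for p in range(1, 5))
variance = m2 - m1 ** 2
mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                        # third central moment
mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4     # fourth central moment
skewness = float(mu3) / float(variance) ** 1.5
excess_kurtosis = float(mu4 / variance ** 2) - 3.0
```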

Then $S_m =m(1+ \sqrt{2/m}\,V_m)$ and the Taylor expansion of $(1+u)^{1/2}$ lead to

$$\begin{aligned} ||\mathbf {X}_i||_m = S_m^{1/2} = \sqrt{m} \left( 1+ \frac{1}{\sqrt{2\,m}}\,V_m - \frac{1}{4\,m}\,V^2_m + \frac{\sqrt{2}}{8\,m^{3/2}}V^3_m + ...\right) \end{aligned}$$

(9)

Define the statistics

$$\begin{aligned} Z_m = \sqrt{2}\Big (\frac{\displaystyle ||\mathbf {X}_i||_m}{\displaystyle \sqrt{m}}\,- 1\Big ) \quad \text{ and } \quad Z_m^* = \sqrt{2}\Big (\frac{\displaystyle ||\mathbf {X}_i - \mathbf {X}_j||_m}{\displaystyle \sqrt{2\,m}}\,- 1\Big ), \end{aligned}$$

(10)

then (6) results in

$$\begin{aligned} P\Big (\sqrt{m}\,Z_m \le x\Big ) = P\Big (\sqrt{m}\,Z_m^* \le x\Big ). \end{aligned}$$

(11)

It follows from (9) that the statistic $T_1=\sqrt{m} Z_m$ satisfies

$$\begin{aligned} T_1=\sqrt{m} Z_m = V_m- \frac{\sqrt{2}}{4 \sqrt{m}} \,V_m^2 + \frac{1}{4 \,m} \,V_m^3 + ... \end{aligned}$$

(12)
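The truncated expansions (9) and (12) can be compared with the exact values for a fixed value of $V_m$. In the sketch below (hypothetical function names, illustrative value v = 1.3) the first neglected term of (12) is of order $m^{-3/2}$, so the error should shrink by roughly a factor 8 each time m is multiplied by 4.

```python
import math

def exact_length(m, v):
    """||X_i||_m = sqrt(S_m) with S_m = m + sqrt(2 m) * V_m, from (8)."""
    return math.sqrt(m + math.sqrt(2.0 * m) * v)

def length_series(m, v):
    """Right-hand side of (9), truncated after the cubic term."""
    return math.sqrt(m) * (1.0 + v / math.sqrt(2.0 * m) - v ** 2 / (4.0 * m)
                           + math.sqrt(2.0) * v ** 3 / (8.0 * m ** 1.5))

def t1_exact(m, v):
    """sqrt(m) * Z_m with Z_m from (10)."""
    return math.sqrt(m) * math.sqrt(2.0) * (exact_length(m, v) / math.sqrt(m) - 1.0)

def t1_series(m, v):
    """Right-hand side of (12), truncated after the cubic term."""
    return v - math.sqrt(2.0) * v ** 2 / (4.0 * math.sqrt(m)) + v ** 3 / (4.0 * m)

v = 1.3
errors = {m: abs(t1_exact(m, v) - t1_series(m, v)) for m in (100, 400, 1600)}
```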

Following the sketch of the proof in Kawaguchi, Ulyanov and Fujikoshi [16, Theorem 1] (where the coefficients of the polynomial $l_2(x)$ are incorrect) and calculating the characteristic function $f_{T_1}(t)$, we obtain

$$\begin{aligned} f_{T_1}(t)&= \mathbb {E}\left[ {\mathrm{e}}^{{\mathrm{i}} t V_m}\,\left( 1 - \frac{\sqrt{2}({\mathrm{i}} t)}{4 \sqrt{m}} V_m^2 + \frac{({\mathrm{i}} t)}{4\,m} V_m^3 + \frac{({\mathrm{i}} t)^2}{16\,m} V_m^4 + \mathcal {O}_p(m^{-3/2})\right) \right] \nonumber \\&= {\mathrm{e}}^{-t^2/2}\left( 1 - \frac{\sqrt{2}(({\mathrm{i}} t)^3 + 3 ({\mathrm{i}} t))}{12 \sqrt{m}} + \frac{({\mathrm{i}} t)^6 - 6 ({\mathrm{i}} t)^4 - 9({\mathrm{i}} t)^2}{144\,m} \right) + \mathcal {O}(m^{-3/2}). \end{aligned}$$

(13)

This results in the related expansion of the corresponding distribution function:

Proposition 1

Let $\mathbf {X}_i$ be a vector drawn from an m-dimensional normal distribution $\mathcal {N}({\mathbf {0}_m,\mathrm{I}}_m)$. Then with the asymptotic expansion for the distribution of normalized length $Z_m = \sqrt{2}\Big (\frac{\displaystyle ||\mathbf {X}_i||_m}{\displaystyle \sqrt{m}}\,- 1\Big )$ we obtain the following inequality for all $m \in \mathbb {N}$:

$$\begin{aligned} \left| P\Big (\sqrt{m}\,Z_m \le x\Big ) - \varPhi (x) - \varphi (x)\,\Big (\frac{x^2 - 4}{6 \, \sqrt{2\,m}}\, + \frac{x^5 -16 x^3+ 24 x}{144\,m}\Big )\right| \le \frac{C}{m^{3/2}}. \end{aligned}$$

(14)

Corollary 1

Let $\mathbf {X}_i$ and $\mathbf {X}_j$, $i \not = j$, be independent random vectors with an m-dimensional normal distribution $\mathcal {N}({\mathbf {0}_m,\mathrm{I}}_m)$. Due to (11), the distribution function of the normalized distance $Z_m^* = \sqrt{2}\Big (\frac{\displaystyle ||\mathbf {X}_i - \mathbf {X}_j||_m}{\displaystyle \sqrt{2\,m}}\,- 1\Big )$ has the same asymptotic expansion as that of the normalized length $Z_m$, and inequality (14) holds with $Z_m$ replaced by $Z_m^*$.

Second order Chebyshev-Edgeworth expansions for the angle $\theta _m = \mathrm {ang}(\mathbf {X}_i,\mathbf {X}_j)$ between independent vectors $\mathbf {X}_i$ and $\mathbf {X}_j$ with vertex at the origin and for the corresponding sample correlation coefficient $R_m(\mathbf {X}_i, \mathbf {X}_j)$, with computable error bounds of approximation, are given in Christoph and Ulyanov [9, Section 2], using results of Konishi [17, Sect. 4], Johnson, Kotz and Balakrishnan [15, Chap. 32], Christoph, Ulyanov and Fujikoshi [11]:

$$\begin{aligned} \sup \nolimits _x \left| P\Big (\sqrt{m}\,\,R_m \le x\Big ) \,-\, \varPhi (x) - \frac{x^3 - 5 x}{4\, m} \,\varphi (x) \right| \le \frac{B_1}{m^2} \end{aligned}$$

(15)

and

$$\begin{aligned} \sup \limits \nolimits _{x} \Big | P\left( \sqrt{m}(\theta _m - \frac{\pi }{2}) \le x\right) - \varPhi (x) - \frac{x^3 -15 x}{12 \,m} \,\varphi (x) \Big |\le \frac{B_2}{m^2}. \end{aligned}$$

(16)

The estimates (15) and (16) were used in Christoph and Ulyanov [9] to obtain second order approximations for the statistics $R_{N_n}$ and $\varTheta _{N_n}= \theta _{N_n} - \pi /2$ when the non-random dimension m of the vectors is replaced by a random dimension $N_n$, where $N_n \rightarrow \infty $ in probability as the parameter $n \rightarrow \infty $.

Analogous results for the statistics $||\mathbf {X}_i||_m$ and $||\mathbf {X}_i- \mathbf {X}_j||_m$ are proven in Sects. 4 and 5 below, when the non-random dimension m is replaced by a random dimension $N_n$.

3 Auxiliary Proposition

In this section, expansions are obtained for the distribution function of statistics $T_{N_n}$ computed from samples with random sample size (here the random dimension $N_n$ of the considered vectors $\mathbf {X}_i$). These depend directly on the expansions for the statistics $T_m$ based on non-random sample size m and on the expansions for the random sample size $N_n$.

First we formulate the conditions determining expansions for the statistic $T_m$ with $\mathbb {E} T_m =0$ and the normalized random dimension $N_n$:

Assumption A:

Given $\gamma \in \{- 1/2, 0, 1/2\}$, $a > 1$, $C_1 > 0$ and differentiable functions $f_1(x), f_2(x)$ with bounded derivatives $f'_1(x), f'_2(x)$ such that

$$\begin{aligned} \sup \nolimits _{x}\Big |\mathbb {P}\big (m^{\gamma } T_m \le x \big ) - \varPhi (x) - \frac{f_1(x)}{\sqrt{m}} - \frac{f_2(x)}{m} \Big | \le \frac{C_1}{m^{a}}\quad \text{ for } \text{ all } \quad m \in \mathbb {N}. \end{aligned}$$

(17)

Remark 1

Statistics satisfying Assumption A are shown in (14), (15) and (16).

Assumption B:

Given constants $b > 0$ and $C_2 > 0$, real numbers $g_n$ with $0<g_n \uparrow \infty $ as $n \rightarrow \infty $, a distribution function H(y) with $H(0+) = 0$ and a function $h_2(y)$ of bounded variation such that

$$\begin{aligned} \sup \limits _{y \ge 0} \left| \mathbb {P}\left( \frac{N_n}{g_n} \le y\right) - H(y) - \frac{h_2(y) \, \mathbb {I}_{\{b>1\}}(b)}{n} \right| \le \frac{C_2}{n^{b}} \quad \text{ for } \text{ all } \quad n \ge 1. \end{aligned}$$

(18)

where $\mathbb {I}_{A}(x)= \left\{ \begin{array}{ll} 1,\quad &{} x\in A \\ 0, &{} x \notin A \end{array}\right. $ defines the indicator function of a set $A \subset \mathbb {R}$.

Remark 2

The random dimensions $N_n(r)$ and $N_n(s)$ given in (1) and (2), respectively, fulfill Assumption B as shown in [9, Propositions 1 and 2], see (29) and (39) below.

Proposition 2

Let $\gamma \in \{1/2,\, 0,\, -1/2\}$ and let both Assumptions A and B as well as the following requirements on H(.) and $h_2(.)$ be fulfilled:

$$\begin{aligned} \left. \begin{array}{llll} i: \quad &{} H(1/g_n) \le c_1\,g_n^{- b} \qquad &{} \text{ for } &{} b> 0,\\ ii: \quad &{} \int \nolimits _0^{1/g_n} y^{-\,1/2} dH(y) \le c_2\,g_n^{- b + 1/2} \qquad &{} \text{ for } &{} b> 1/2,\\ iii: \quad &{} \int \nolimits _0^{1/g_n} y^{-\,1} dH(y)\le c_3\,g_n^{- b + 1} \qquad &{} \text{ for } &{} b>1, \end{array}\,\right\} \end{aligned}$$

(19)

$$\begin{aligned} \left. \begin{array}{llll} i: &{} h_2(0) = 0, \quad \text{ and } \quad |h_2(1/g_n)| \le c_4\,n\,g_n^{- b}&{} \text{ for } &{} b>1,\\ ii: &{} \int \nolimits _0^{1/g_n} y^{-\,1} |h_2(y)| dy \le c_5\,n \,g_n^{- b}&{} \text{ for } &{} b>1, \end{array}\,\right\} \end{aligned}$$

(20)

where b is the convergence rate in (18). Then for all $n \ge 1$ the following inequality holds:

$$\begin{aligned} \sup \limits _{x \in \mathbb {R}}\Big | \mathbb {P}\Big (g_n^{\gamma } T_{N_n} \le x\Big ) - G_{n,2}(x)\Big | \le C_1 \,\mathbb {E} \left( N_n^{- a}\right) + (C_3 D_n + C_4)\, n^{- b}+ I_n , \end{aligned}$$

(21)

where $a>1$, $b>0$, $f_1(z)$, $f_2(z)$ and $h_{2}(y)$ are given in (17) and (18),

$$\begin{aligned} G_{n,2}(x) = \left\{ \begin{array}{lr} \int \limits ^{\infty }_{0} \varPhi (x\,y^\gamma ) \mathrm{d}H(y), &{} 0<b \le 1/2, \\ \int \limits _{0}^{\infty } \Big (\!\varPhi (xy^\gamma ) + \frac{\displaystyle f_1(x\,y^\gamma )}{\displaystyle \sqrt{g_n y}} \Big ) \mathrm{d}H(y) =: G_{n,1}(x), &{} 1/2 < b \le 1,\\ G_{n,1}(x) + \int \limits ^{\infty }_{0} \frac{\displaystyle f_2(x\,y^\gamma )}{\displaystyle g_n y}\mathrm{d}H(y) + \int \limits _{0}^{\infty } \frac{\displaystyle \varPhi (x\,y^\gamma )}{\displaystyle n} \mathrm{d}h_2(y), &{} b > 1,\end{array} \right\} \end{aligned}$$

(22)

$$\begin{aligned} D_n= \sup _x \int _{1/g_n}^\infty \left| \frac{\displaystyle \partial }{\displaystyle \partial y} \left( \varPhi (xy^{\gamma }) + \frac{f_1(xy^{\gamma })}{\sqrt{g_n y}} + \frac{f_2(xy^{\gamma })}{y g_n} \right) \right| \mathrm{d}y, \end{aligned}$$

(23)

$$\begin{aligned} I_n = \sup \nolimits _x \left( |I_1(x,n)| + |I_2(x,n)|\right) , \end{aligned}$$

(24)

$$\begin{aligned} I_1(x,n) = \int \nolimits _{1/g_n}^\infty \Big (\frac{f_1(x\,y^\gamma )\,\,{\mathbb {I}}_{(0 , 1/2]}(b)}{\sqrt{g_n y\,}} + \frac{f_2(x\,y^\gamma )}{g_n\,y}\Big ) \mathrm{d}H(y), \quad \quad b\le 1, \end{aligned}$$

(25)

and

$$\begin{aligned} I_2(x,n) = \int \nolimits _{1/g_n}^\infty \Big (\frac{f_1(x\,y^\gamma )}{n\,\sqrt{g_n y\,}} + \frac{f_2(x\,y^\gamma )}{n\,g_n y}\Big ) \mathrm{d}h_2(y), \quad \quad b > 1. \end{aligned}$$

(26)

The constants $C_1, C_3, C_4$ are independent of n.

Proof

The proof is based on the statement in [2, Theorem 3.1] for $\gamma \ge 0$. Since the case $\gamma = - 1/2$ is also considered in Theorems 1 and 2 of the present paper as well as in Christoph and Ulyanov [9, Theorems 1 and 2], the proof was adapted to $\gamma \in \{1/2,\, 0,\, -1/2\}$ in [9]. The conditions (19) and (20) guarantee the integration range $(0 , \infty )$ of the integrals in (22). The approximation function $G_{n,2}(x)$ in (22) is now a polynomial in $g_n^{-1/2}$ and $n^{-1/2}$. The present Proposition 2 differs from Theorems 1 and 2 in [9] only by the term $f_1(xy^{\gamma })\,(g_n y)^{- 1/2}$ and the added condition (19ii) to estimate this term. Therefore the details are omitted here. $\square $

Remark 3

The domain $[1/g_n, \infty )$ of integration depends on $g_n$ in (23), (25) and (26). Some of the integrals in (25) and (26) could tend to infinity with $1/g_n \rightarrow 0$ as $n \rightarrow \infty $ and thus worsen the convergence rates of the corresponding terms. See (47) in Sect. 6.

In the next two sections we consider the statistics $Z_m$ and $Z_m^*$ defined in (10) and the cases when the random dimension $N_n$ is given in either (1) or (2). We use Proposition 2 when the limit distributions of scaled statistics $Z_{N_n}$ are scale mixtures $G_\gamma (x) = \int _0^\infty \varPhi (x\, y^{\gamma }) \mathrm{d}H(y)$ with $\gamma \in \{1/2,\, 0,\, -1/2\}$ that can be expressed in terms of well-known distributions. We obtain non-asymptotic results for the statistics $Z_{N_n}$ and $Z_{N_n}^*$, using the second order approximations for the statistics $Z_m$ and $Z_m^*$ given in (14) as well as for the random sample size $N_n$. In both cases the jumps of the distribution function of the random sample size $N_n$ only affect the function $h_2(y)$ in formula (18).

4 The Random Dimension $N_n(r)$ is Negative Binomial Distributed

The negative binomial distributed dimension $N_n(r)$ has probability mass function (1) and $g_n = \mathbb {E}(N_n(r))= r\,(n-1) +1$. Schluter and Trede [21] (Sect. 2.1) underline the advantage of this distribution compared to the Poisson distribution for counting processes. They showed in a general unifying framework

$$\begin{aligned} \lim \nolimits _{n\rightarrow \infty } \sup \nolimits _y\left| \mathbb {P}(N_n(r)/g_n \le y) - G_{r,r}(y)\right| = 0, \end{aligned}$$

(27)

where $G_{r,r}(y)$ is the Gamma distribution function with the identical shape and scale parameters $r >0$ and density

$$\begin{aligned} g_{r,r}(y)=\frac{r^r}{\varGamma (r)}\,\ y^{r - 1} \, e^{- r y}\,\,{\mathbb {I}}_{(0\,,\,\infty )}(y)\quad \text{ for } \text{ all } \quad y \in \mathbb {R}. \end{aligned}$$

(28)

Statement (27) was proved earlier in Bening and Korolev [3, Lemma 2.2].

In [9, Proposition 1] the following inequality was proved for $r >0$:

$$\begin{aligned} \sup \limits _{y \ge 0} \left| \mathbb {P} \left( \frac{N_n(r)}{g_n} \le y\right) - G_{r,r}(y) - \frac{h_{2;r}(y)\,{\mathbb {I}}_{\{r > 1\}}(r)}{n} \right| \le \frac{C_2(r)}{n^{\min \{r , 2\}}}, \end{aligned}$$

(29)

where $h_{2;r}(y) = \frac{\displaystyle 1}{\displaystyle 2\,r}\, g_{r,r}(y) \left( (y-1)(2 - r) + 2 Q_1\big (g_n \,y\big )\right) $ for $r > 1$,

$$\begin{aligned} Q_1(y) = 1/2 - (y - [y]) \quad \text {and}\,\,\,[y]\,\,\text { is the integer part of a value }y. \end{aligned}$$

(30)

Both Bening, Galieva and Korolev [2] and Gavrilenko, Zubov and Korolev [13] showed the rate of convergence in (29) for $r \le 1$. In Christoph, Monakhov and Ulyanov [7, Theorem 1] the Chebyshev-Edgeworth expansion (29) for $r>1$ is proved.
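The effect of the correction term $h_{2;r}(y)/n$ in (29) can be observed numerically for r = 2, where $G_{2,2}$ has a closed form. The sketch below is restricted to the lattice points $y = j/g_n$, at which $Q_1(g_n y) = 1/2$ by (30); it therefore gives a lower estimate of the supremum over all $y \ge 0$ and is meant as an illustration only. All function names are hypothetical.

```python
import math

def negbin_cdf_values(n, r, jmax):
    """Cumulative sums of the pmf (1): [P(N_n(r) <= j) for j = 1..jmax]."""
    p, q = 1.0 / n, 1.0 - 1.0 / n
    out, acc = [], 0.0
    for j in range(1, jmax + 1):
        log_pmf = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
                   + r * math.log(p) + (j - 1) * math.log(q))
        acc += math.exp(log_pmf)
        out.append(acc)
    return out

def gamma22_cdf(y):
    """G_{2,2}(y): Gamma distribution function with shape 2 and rate 2."""
    return 1.0 - math.exp(-2.0 * y) * (1.0 + 2.0 * y)

def gamma22_pdf(y):
    """Density (28) for r = 2."""
    return 4.0 * y * math.exp(-2.0 * y)

def lattice_errors(n):
    """Max error over y = j / g_n, without and with the h_{2;r} / n term (r = 2)."""
    r = 2.0
    g_n = r * (n - 1) + 1
    plain = corrected = 0.0
    for j, cdf in enumerate(negbin_cdf_values(n, r, int(20 * g_n)), start=1):
        y = j / g_n
        # g_n * y = j is an integer here, so Q_1(g_n y) = 1/2 in (30)
        h2 = gamma22_pdf(y) * ((y - 1.0) * (2.0 - r) + 2.0 * 0.5) / (2.0 * r)
        plain = max(plain, abs(cdf - gamma22_cdf(y)))
        corrected = max(corrected, abs(cdf - gamma22_cdf(y) - h2 / n))
    return plain, corrected

plain20, corrected20 = lattice_errors(20)
plain200, corrected200 = lattice_errors(200)
```

Without the correction the error decays like $n^{-1}$; with it, the decay on the lattice is visibly faster, consistent with the order $n^{-2}$ in (29).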

Remark 4

The random dimension $N_n(r)$ satisfies Assumption B of the transfer Proposition 2 with $g_n=\mathbb {E}N_n(r)$, $H(y)=G_{r,r}(y)$, $h_2(y) =h_{2;r}(y)$ and $b=2$.

In (21), the negative moment $\mathbb {E}(N_n(r))^{-a}$ is required, where $m^{-a}$ is the rate of convergence of the Chebyshev-Edgeworth expansion for $T_m$ in (17). The negative moments $\mathbb {E}(N_n(r))^{-a}$ fulfill the estimate:

$$\begin{aligned} \mathbb {E}\big (N_n(r)\big )^{- a} \le C(a, r) \left\{ \begin{array}{ll} n^{- \min \{r, \, a\}}, &{} r \not = a\\ \ln (n) \, n^{- a}, &{} r = a \end{array} \right. \quad \text{ for } \text{ all } \quad r>0 \quad \text{ and } \quad a > 0. \end{aligned}$$

(31)

For $r = a $ the factor $\ln \,n$ cannot be removed. In Christoph, Ulyanov and Bening [10, Corollary 4.2] leading terms for the negative moments of $\mathbb {E}\big (N_n(r)\big )^{- p}$ were derived for any $p>0$ that lead to (31).
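Two special cases of (31) can be evaluated in closed form, which makes the logarithmic factor visible. For $r = a = 1$ the dimension is geometric and $\mathbb {E}N_n(1)^{-1} = \ln (n)/(n-1)$; for $r = 2$, $a = 1$ one finds $\mathbb {E}N_n(2)^{-1} = 1/n$. The sketch below confirms both by direct summation of (1); the function name and the truncation points are assumptions made for illustration.

```python
import math

def negative_moment(n, r, a, jmax):
    """E[N_n(r)^(-a)] by direct summation of the pmf (1) up to jmax."""
    p, q = 1.0 / n, 1.0 - 1.0 / n
    total = 0.0
    for j in range(1, jmax + 1):
        log_pmf = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
                   + r * math.log(p) + (j - 1) * math.log(q))
        total += math.exp(log_pmf) * j ** (-a)
    return total

n = 100
# r = a = 1: sum_j p q^(j-1) / j = (p / q) * ln(1 / p) = ln(n) / (n - 1)
moment_r1 = negative_moment(n, r=1.0, a=1.0, jmax=20000)
closed_form = math.log(n) / (n - 1)
# r = 2, a = 1: the pmf is j p^2 q^(j-1), hence E[1/N] = p = 1/n
scaled_r2 = [k * negative_moment(k, r=2.0, a=1.0, jmax=40 * k) for k in (50, 100, 200)]
```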

The expansions for the normalized length $Z_m$ in (14) as well as for the sample correlation coefficient $R_m$ in (15) and the angle $\theta _m$ in (16) have the standard normal distribution $\varPhi (x)$ as limit distribution. Therefore, with $g_n= \mathbb {E}N_n(r)$ and $\gamma \in \{1/2,\, 0,\, -1/2\}$, the limit distributions for

$$\mathbb {P}\Big (g_n^{\gamma } (N_n(r))^{1/2 - \gamma } Z_{N_n(r)} \le x\Big ) \quad \text{ are } \quad G_\gamma (x,r) =\int \nolimits _0^\infty \varPhi (x\,y^\gamma ) \mathrm{d}G_{r,r}(y).$$

These scale mixture distributions $G_\gamma (x,r)$ are calculated in Christoph and Ulyanov [9, Theorems 3–5]. We apply Proposition 2 to the statistics

$$T_{N_n(r)} = N_n(r)^{1/2 - \gamma }\, Z_{N_n(r)} \quad \text{ with } \text{ the } \text{ normalizing } \text{ factor } \quad g_n^{\gamma }=\mathbb {E}(N_n(r))^\gamma .$$

The limit distributions are:

for $\gamma = 1/2$ and $r>0$ the Student’s t-distribution $S_{2\,r}(x)$ with density
$$\begin{aligned} s_{2\,r}(x) = \frac{\displaystyle \varGamma (r+1/2)}{\displaystyle \sqrt{2\,r \pi } \, \varGamma (r)}\,\, \Big (1 + \frac{\displaystyle x^2}{\displaystyle 2\,r}\Big )^{- (r+1/2)}, \quad x \in \mathbb {R}, \end{aligned}$$
(32)
for $\gamma =0$ the normal law $\varPhi (x)$,
for $\gamma = - 1/2$ and $r=2$ the generalized Laplace distribution $L_2(x)$ with density $l_2(x)$:
$$L_2(x)= \frac{1}{2} + \frac{1}{2}\, \mathrm{sign}(x)\,(1 - (1 + |x|)\, e^{- 2\,|x|} ) \quad \text{ and } \quad l_2(x) = \left( \frac{\displaystyle 1}{\displaystyle 2} + |x|\right) \, e^{- 2\,|x|}.$$
For arbitrary $r>0$ Macdonald functions $K_{r-1/2}(x)$ occur in the density $l_r(x)$, which can be calculated in closed form for integer values of r.

The standard Laplace density with variance 1 is $l_1(x) = \frac{\displaystyle 1}{\displaystyle \sqrt{2}} \, {\mathrm{e}}^{- \sqrt{2}\,|x|}$.
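The three limit laws can be checked by evaluating the scale mixture $\int _0^\infty \varPhi (x\,y^\gamma ) \mathrm{d}G_{r,r}(y)$ numerically. The sketch below uses a plain midpoint rule on a truncated range (both the rule and the truncation point are ad hoc choices) and compares the result with Student's t-distribution with 2 degrees of freedom for $r = 1$, with $\varPhi $ for $\gamma = 0$, and with $L_2$ for $r = 2$. The function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gamma_rr_pdf(y, r):
    """Density (28) of the Gamma law with shape r and rate r, for y > 0."""
    return math.exp(r * math.log(r) - math.lgamma(r) + (r - 1.0) * math.log(y) - r * y)

def scale_mixture(x, gamma, r, upper=40.0, steps=100000):
    """Midpoint rule for the integral of Phi(x y^gamma) g_{r,r}(y) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h
        total += std_normal_cdf(x * y ** gamma) * gamma_rr_pdf(y, r)
    return total * h

def student_t2_cdf(x):
    """Student's t distribution function with 2 degrees of freedom (S_{2r} for r = 1)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))

def generalized_laplace_cdf(x):
    """L_2(x) as stated in the text."""
    sign = (x > 0) - (x < 0)
    return 0.5 + 0.5 * sign * (1.0 - (1.0 + abs(x)) * math.exp(-2.0 * abs(x)))
```

For example, `scale_mixture(1.0, 0.5, 1.0)` can be compared with `student_t2_cdf(1.0)`, and `scale_mixture(0.7, -0.5, 2.0)` with `generalized_laplace_cdf(0.7)`.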

Theorem 1

Let $Z_m$ and $N_n(r)$ with $r>0$ be defined by (10) and (1), respectively. Suppose that (14) is satisfied for $Z_m$ and (29) for $N_n(r)$. Then the following statements hold for all $n \in \mathbb {N}_+$:

(i)
Student’s t approximation using scaling factor $\sqrt{\mathbb {E}N_n(r)}$ by $Z_{N_n(r)}$
$$\begin{aligned} \sup \nolimits _{x} \left| \mathbb {P}\left( \sqrt{g_n}\,Z_{N_n(r)}\le x\right) - S_{2r; n}(x) \right| \le C_r \left\{ \begin{array}{ll} n^{- \min \{r, 3/2\}}, &{} r \not = 3/2, \\ \ln (n) \, n^{- 3/2}, &{} r = 3/2, \end{array} \right. \end{aligned}$$
(33)
where
$$\begin{aligned} S_{2r; n}(x)= & {} S_{2 r}(x) + \, s_{2 r}(x) \left( \frac{\sqrt{2}\,((2r - 5) x^2 - 8 r)}{12\,(2 r - 1) \sqrt{g_n}}\right. \mathbb {I}_{\{r> 1/2\}}(r)\nonumber \\&\,\, \left. + \,\frac{96 r^2 x + (-64 r^2 + 128 r) x^3 + (4 r^2 - 32 r + 39) x^5}{(x^2 + 2 r) (2 r - 1)\,g_n} \mathbb {I}_{\{r > 1\}}(r)\right) , \end{aligned}$$
(34)
(ii)
Normal approximation with random scaling factor $\sqrt{N_n(r)}$ by $Z_{N_n(r)}$
$$\begin{aligned} \sup \nolimits _{x} \left| \mathbb {P}(\sqrt{N_n(r)}\,Z_{N_n(r)} \le x) - \varPhi _{n\,,\,2}(x)\right| \le C_r \left\{ \begin{array}{ll} n^{- \min \{r, 3/2\}}, &{} r \not = 3/2, \\ \ln (n) \, n^{- 3/2}, &{} r = 3/2, \end{array}\right. \end{aligned}$$
(35)
where
$$\begin{aligned} \varPhi _{n\,,\,2}(x)= & {} \varPhi (x) + \frac{\sqrt{2}\, r \, \varGamma (r - 1/2)}{12\,\varGamma (r)\,\sqrt{g_n}} \,(x^2 -4) \varphi (x)\,\mathbb {I}_{\{r>1/2\}}(r) \nonumber \\& \,+ \,\frac{x^5 - 16 \,x^3 + 24\,x}{144\,g_n}\,\left( \frac{r}{r-1} \,\mathbb {I}_{\{r>1\}}(r)\,+\,\ln \,n \, \mathbb {I}_{\{r=1\}}(r)\right) . \end{aligned}$$
(36)
(iii)
Generalized Laplace approximation if $r=2$ with mixed scaling factor $g_n^{- 1/2}\,N_n(2)$ by $Z_{N_n(2)}$
$$\begin{aligned} \sup \nolimits _{x} \left| \mathbb {P}\left( g_n^{- 1/2}\,N_n(2)\,Z_{N_n(2)}\,\le \,x\right) - L_{n;2}(x) \right| \le C_2\, n^{- 3/2} \end{aligned}$$
(37)
where
$$\begin{aligned} L_{n;2}(x)= & {} L_2(x) - \frac{1}{3\,\sqrt{g_n}} \,\left( \frac{1}{\sqrt{2}} + \sqrt{2} |x| - \,x^2\right) \,{\mathrm{e}}^{- 2 |x|}\nonumber \\&\,+ \,\frac{1}{33\,g_n}\,( 12\,\sqrt{2}\,x - 15 |x|\,x +2\,x^3 )\, {\mathrm{e}}^{- 2 |x|}\,. \end{aligned}$$
(38)

5 The Random Dimension $N_n(s)$ is Discrete Pareto-Like Distributed

The Pareto-like distributed dimension $N_n(s)$ has distribution function (2) and $\mathbb {E}(N_n(s))=\infty $. Hence $g_n = n$ is chosen as normalizing sequence for $N_n(s)$.

Bening and Korolev [4, Sect. 4.3] showed that for integer $s \ge 1$

$$ \lim \nolimits _{n \rightarrow \infty } \sup \nolimits _{y>0}|\mathbb {P}(N_n(s)\le n\,y) - H_s(y)| =0. $$

where $H_s(y) = {\mathrm{e}}^{-s/y} {\mathbb {I}}_{(0\,,\,\infty )}(y)$ is the continuous distribution function of the inverse exponential random variable $W(s)=1/V(s)$, with exponentially distributed V(s) having rate parameter $s>0$. Like $\mathbb {P}(N_n(s) \le y)$, the limit $H_s(y)$ is heavy-tailed with shape parameter 1 and $\mathbb {E}W(s) = \infty $.

Lyamin [18] proved the bound $|\mathbb {P}(N_n(s)\le n\,y) - H_s(y)| \le C/n$ with $C< 0.37$ for integer $s \ge 1$.

In [9, Proposition 2] the following results are presented for $s >0$:

$$\begin{aligned} \sup _{y>0}\left| \mathbb {P}\left( \frac{N_n(s)}{n} \le y\right) - H_s(y) - \frac{h_{2;s}(y)}{n}\right| \le \frac{C_3(s)}{n^2}, \quad \text{ for } \text{ all } \quad n \in \mathbb {N}_+, \end{aligned}$$

(39)

with $H_s(y) = {\mathrm{e}}^{-s/y}$ and $h_{2;s}(y) = s\, {\mathrm{e}}^{-s/y}\,\big (s-1 + 2 Q_1(n\,y)\big )/\big (2\,y^2\big )$ for $y>0$, where $Q_1(y)$ is defined in (30). Moreover

$$\begin{aligned} \mathbb {E}\big (N_n(s)\big )^{- p} \le C(p) \, n^{- \min \{p, 2\}}, \end{aligned}$$

(40)

where for $0< p \le 2$ the order of the bound is optimal.

The Chebyshev-Edgeworth expansion (39) is proved in Christoph, Monakhov and Ulyanov [7, Theorem 4]. The leading terms for the negative moments $\mathbb {E}\big (N_n(s)\big )^{- p}$ were derived in Christoph, Ulyanov and Bening [10, Corollary 5.2] that lead to (40).
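Since the distribution function (2) is explicit, the role of the term $h_{2;s}(y)/n$ in (39) can be illustrated directly. The sketch below evaluates the error at the lattice points $y = k/n$, where $Q_1(n\,y) = 1/2$ by (30), so it only gives a lower estimate of the supremum over all $y > 0$. The function name, the value s = 1.5 and the cut-off of the lattice are assumptions made for illustration.

```python
import math

def pareto_lattice_errors(n, s, kmax_factor=200):
    """Max error in (39) over y = k / n, without and with the term h_{2;s} / n."""
    plain = corrected = 0.0
    for k in range(1, kmax_factor * n + 1):
        y = k / n
        exact = (k / (s + k)) ** n        # distribution function (2)
        limit = math.exp(-s / y)          # H_s(y)
        # n * y = k is an integer here, so Q_1(n y) = 1/2 in (30)
        h2 = s * limit * (s - 1.0 + 2.0 * 0.5) / (2.0 * y * y)
        plain = max(plain, abs(exact - limit))
        corrected = max(corrected, abs(exact - limit - h2 / n))
    return plain, corrected

plain100, corrected100 = pareto_lattice_errors(100, 1.5)
plain400, corrected400 = pareto_lattice_errors(400, 1.5)
```

Quadrupling n reduces the uncorrected error by about a factor 4 and the corrected error by about a factor 16, in agreement with the orders $n^{-1}$ and $n^{-2}$.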

Remark 5

The random dimension $N_n(s)$ satisfies Assumption B of the transfer Proposition 2 with $H_s(y) = {\mathrm{e}}^{-s/y}$, $h_2(y) =h_{2;s}(y)$, $g_n=n$ and $b=2$.

With $g_n= n$ and $\gamma \in \{1/2,\, 0,\, -1/2\}$, the limit distributions for

$$\mathbb {P}\left( n^{\gamma } N_n(s)^{1/2 - \gamma } Z_{N_n(s)} \le x\right) \quad \text{ are } \text{ now } \quad G_\gamma (x,s) =\int \nolimits _0^\infty \varPhi (x\,y^\gamma ) {\mathrm{d}} H_s(y).$$

These scale mixture distributions $G_\gamma (x,s)$ are calculated in Christoph and Ulyanov [9, Theorems 6–8]. We apply Proposition 2 to the statistics

$$T_{N_n(s)} = N_n(s)^{1/2 - \gamma }\, Z_{N_n(s)}\quad \text{ with } \text{ the } \text{ normalizing } \text{ factor } \,\,\, n^{\gamma }.$$

The limit distributions are:

for $\gamma = 1/2$ Laplace distributions $L_{1/\sqrt{s}}(x)$ with density
$$l_{1/\sqrt{s}}(x)=\sqrt{s/2}\,{\mathrm{e}}^{-\sqrt{2\,s}|x|},$$
for $\gamma =0$ the standard normal law $\varPhi (x)$ and
for $\gamma = - 1/2$ the scaled Student’s t-distribution $S_2^*(x; \sqrt{s})$ with density
$$s^*_{2}(x; \sqrt{s})= \frac{\displaystyle 1}{\displaystyle 2\,\sqrt{2\,s}}\Big (1 + \frac{\displaystyle x^2}{\displaystyle 2\,s}\Big )^{- 3/2}.$$
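These limit laws have a simple stochastic representation: with Z standard normal and V exponentially distributed with rate s, independent of Z, and $W = 1/V$, the mixture for $\gamma = 1/2$ is the law of $Z\sqrt{V}$ and the mixture for $\gamma = -1/2$ is the law of $Z/\sqrt{V}$. The Monte Carlo sketch below compares empirical distribution functions with the Laplace and scaled Student's t distribution functions obtained by integrating the two densities above. The seed, the sample size and the function names are arbitrary illustrative choices.

```python
import math
import random

def laplace_cdf(x, s):
    """Distribution function with density sqrt(s/2) * exp(-sqrt(2 s) |x|)."""
    tail = 0.5 * math.exp(-math.sqrt(2.0 * s) * abs(x))
    return 1.0 - tail if x >= 0 else tail

def scaled_t2_cdf(x, s):
    """Distribution function with density (2 sqrt(2 s))^(-1) (1 + x^2 / (2 s))^(-3/2)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 * s + x * x))

def empirical_mixture_cdfs(x, s, draws, rng):
    """Empirical P(Z sqrt(V) <= x) and P(Z / sqrt(V) <= x) with V ~ Exp(rate s)."""
    below_laplace = below_t = 0
    for _ in range(draws):
        v = rng.expovariate(s)                # V(s); W(s) = 1 / V(s) has law H_s
        z = rng.gauss(0.0, 1.0)
        below_laplace += z * math.sqrt(v) <= x    # gamma = 1/2
        below_t += z <= x * math.sqrt(v)          # gamma = -1/2, written without division
    return below_laplace / draws, below_t / draws

rng = random.Random(7)
x, s = 0.8, 1.5
emp_laplace, emp_t = empirical_mixture_cdfs(x, s, draws=200000, rng=rng)
```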

Theorem 2

Let $Z_m$ and $N_n(s)$ with $s>0$ be defined by (10) and (2), respectively. Suppose that (14) is satisfied for $Z_m$ and (39) for $N_n(s)$. Then the following statements hold for all $n \in \mathbb {N}_+$:

(i)
Laplace approximation with non-random scaling factor $n^\gamma $ by $Z_{N_n(s)}$:
$$\begin{aligned} \sup \nolimits _{x} \left| \mathbb {P}\left( \sqrt{n}\,Z_{N_n(s)}\le x\right) - L_{1/\sqrt{s}; n}(x) \right| \le C_s\, n^{- 3/2} \end{aligned}$$
(41)
where
$$\begin{aligned} L_{1/\sqrt{s}; n}(x)= & {} L_{1/\sqrt{s}}(x) + \, l_{1/\sqrt{s}}(x)\left( \frac{\sqrt{2}}{12\,s\,\sqrt{n}}\,(s x^2 - 2 \,(1 + \sqrt{2\,s\, }\,|x|))\right. \nonumber \\&\left. \,+\,\frac{s}{72 \, n}\left( \frac{x^3\,|x|}{\sqrt{2\,s}} - \frac{8\, x^2}{s} + \frac{6\,x}{s^2} \,(1 + \sqrt{2\,s\,}\,|x|)\,\right) \right) , \end{aligned}$$
(42)
(ii) Normal approximation with random scaling factor $\sqrt{N_n(s)}$ for $Z_{N_n(s)}$:
$$\begin{aligned} \sup \nolimits _{x} \left| \mathbb {P}\left( \sqrt{N_n(s)}\,Z_{N_n(s)}\,\le \,x\right) - \varPhi _{n, 2}(x)\right| \le C_s \, n^{- 3/2}, \end{aligned}$$
(43)
where
$$\begin{aligned} \varPhi _{n, 2}(x) = \varPhi (x) + \varphi (x) \left( \frac{\sqrt{2\,\pi } (x^2 - 4)}{24 \, \sqrt{s\,n}} + \frac{x^5 - 16 x^3 +24 x}{144\,s\, n} \right) , \end{aligned}$$
(44)
(iii) Scaled Student’s t-approximation with mixed scaling factor $n^{-1/2}\,N_n(s)$ for $Z_{N_n(s)}$:
$$\begin{aligned} \sup \nolimits _{x} \left| \mathbb {P}\left( n^{- 1/2}\,N_n(s)\,Z_{N_n(s)}\,\le \,x\right) - S^*_{n;2}(x;\sqrt{s}) \right| \le C_s \, n^{- 3/2}, \end{aligned}$$
(45)
where
$$\begin{aligned} S^*_{n;2}(x;\sqrt{s})= & {} S^*_2(x;\sqrt{s}) + s_2^*(x; \sqrt{s})\left( -\frac{\sqrt{2}\,(x^2 + 8\,s)}{12 (2\,s + x^2) \,\sqrt{n}}\,\right. \nonumber \\&\left. + \,\frac{1}{144 \,n}\left( \frac{105 x^5}{(2\,s + x^2)^3} + \frac{240 x^3}{(2\,s + x^2)^2} + \frac{72 x}{2\,s + x^2}\right) \right) . \end{aligned}$$
(46)
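
The $n^{-1/2}$ terms in (42) and (46) can be cross-checked numerically. The sketch below rests on two assumptions that are not quoted from the text: the first-order term has the form $n^{-1/2}\int _0^\infty f_1(x\,y^\gamma )\,\varphi (x\,y^\gamma )\,y^{-1/2}\,{\mathrm{d}}H_s(y)$, and $f_1(z)=\sqrt{2}\,(z^2-4)/12$. This $f_1$ is inferred from the shape of (42) and (46) and should be compared with the expansion (14) before relying on it. All function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def f1(z):
    """Assumed first-order polynomial (inferred from (42) and (46), not quoted from (14))."""
    return SQRT2 * (z * z - 4.0) / 12.0


def first_order_integral(x, s, gamma, t_min=-40.0, t_max=80.0, step=0.01):
    """int_0^inf f1(x y^gamma) phi(x y^gamma) y^(-1/2) dH_s(y), computed with y = exp(t)."""
    n_steps = int(round((t_max - t_min) / step))
    total = 0.0
    for i in range(n_steps + 1):
        t = t_min + i * step
        z = x * math.exp(gamma * t)
        # y^(-1/2) * y^(-2) * exp(-s/y) * dy/dt together with the Gaussian exponent.
        log_w = -1.5 * t - 0.5 * z * z - s * math.exp(-t)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * f1(z) * math.exp(log_w)
    return s / math.sqrt(2.0 * math.pi) * total * step


def laplace_term(x, s):
    """Coefficient of n^(-1/2) in (42), with the bracket closed after |x|."""
    dens = math.sqrt(s / 2.0) * math.exp(-math.sqrt(2.0 * s) * abs(x))
    return dens * SQRT2 / (12.0 * s) * (s * x * x - 2.0 * (1.0 + math.sqrt(2.0 * s) * abs(x)))


def student_term(x, s):
    """Coefficient of n^(-1/2) in (46)."""
    dens = (1.0 + x * x / (2.0 * s)) ** -1.5 / (2.0 * math.sqrt(2.0 * s))
    return -dens * SQRT2 * (x * x + 8.0 * s) / (12.0 * (2.0 * s + x * x))


if __name__ == "__main__":
    s = 3.0
    for x in (-1.0, 0.0, 2.0):
        print(x,
              first_order_integral(x, s, 0.5) - laplace_term(x, s),
              first_order_integral(x, s, -0.5) - student_term(x, s))
```

Agreement of the two columns with zero, up to quadrature error, indicates that both first-order terms are consistent with the same assumed $f_1$.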

6 Proofs of Main Results

Proof

The proofs of Theorems 1 and 2 are based on Proposition 2. The structure of the functions $f_1$, $f_2$ and $h_2$ in Assumptions A and B is similar to the structure of the corresponding functions in Conditions 1 and 2 in [9]. Therefore, the estimates of the term $D_n$ and of the integrals $I_1(x,n)$ and $I_2(x,n)$ in (23), (24) and (25), as well as the validity of (19) and (20) in Proposition 2 when $H(y)$ is $G_{r,r}(y)$ or $H_s(y)$, can be shown analogously to the proofs of Lemmas 1, 2 and 4 in [9]. In Remark 3 above it was pointed out that the integrals in (24) and (25) can degrade the convergence rate. Let $r<1$. With $|f_2(x\,y^\gamma )| \le c^*$ we get

$$\begin{aligned} \int \nolimits _{1/g_n}^\infty \frac{|f_2(x\,y^\gamma )|}{g_n\,y}\mathrm{d}G_{r,r}(y) \le \frac{c^*r^r}{\varGamma (r)\,g_n} \int \nolimits _{1/g_n}^\infty y^{r-2} \mathrm{d}y \le \frac{c^*r^r}{(1-r)\varGamma (r)} \,g_n^{- r}. \end{aligned}$$

(47)
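
The bound (47) can be illustrated numerically. The sketch below assumes that $G_{r,r}$ is the gamma law with density $r^r y^{r-1}{\mathrm{e}}^{-ry}/\varGamma (r)$, as suggested by the constants in (47) and (48), and replaces $|f_2|$ by its bound $c^*$. The function names, the truncation point and the step size are illustrative choices.

```python
import math


def truncated_integral(r, g, c_star=1.0, t_max=6.0, step=0.001):
    """Left-hand side of (47) with |f_2| replaced by c_star:
    (c_star / g) * int_{1/g}^inf y**-1 dG_{r,r}(y), where G_{r,r} is assumed to have
    density r**r / Gamma(r) * y**(r - 1) * exp(-r * y). Uses y = exp(t)."""
    t_min = -math.log(g)
    n_steps = int(math.ceil((t_max - t_min) / step))
    h = (t_max - t_min) / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        t = t_min + i * h
        # Integrand y**(r - 2) * exp(-r * y) times the Jacobian dy = exp(t) dt.
        value = math.exp((r - 1.0) * t - r * math.exp(t))
        total += (0.5 if i in (0, n_steps) else 1.0) * value
    return c_star * r ** r / (math.gamma(r) * g) * total * h


def bound_47(r, g, c_star=1.0):
    """Right-hand side of (47); requires 0 < r < 1."""
    return c_star * r ** r / ((1.0 - r) * math.gamma(r)) * g ** (-r)


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.8):
        for g in (10.0, 1e3, 1e5):
            print(r, g, truncated_integral(r, g) / bound_47(r, g))
```

The printed ratios stay below one and approach one as $g_n$ grows, which suggests that the order $g_n^{-r}$ in (47) cannot be improved for $r<1$ by this estimate.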

The additional term $f_1(xy^{\gamma })\,(g_n y)^{- 1/2}$ in (17) in Assumption A is to be estimated with condition (19ii).

Moreover, the bounds for $\mathbb {E}(N_n)^{-3/2}$ follow from (31) and (40), since $a=3/2$ in Assumption A, considering the approximation (14).

The integrals in (22) in Proposition 2 are still to be calculated. Similar integrals are calculated in great detail in the proofs of Theorems 3–8 in [9]. To obtain (34), we compute the integrals with Formula 2.3.3.1 in Prudnikov et al. [20]

$$\begin{aligned} M_{\alpha }(x) = \frac{r^r}{\varGamma (r)\,\sqrt{2 \pi }}\,\int \limits ^{\infty }_{0}y^{\alpha -1} {\mathrm{e}}^{- (r+x^2/2)y} dy = \frac{\varGamma (\alpha ) \, r^{r-\alpha } }{\varGamma (r)\,\sqrt{2\, \pi }}\, \big (1+x^2/(2r)\big )^{- \alpha }\, \end{aligned}$$

(48)

for $\alpha = r-1/2,\ r+1/2,\ r+3/2$ with parameter $p=r+x^2/2$.
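
Formula (48) is the gamma integral $\int _0^\infty y^{\alpha -1}{\mathrm{e}}^{-py}{\mathrm{d}}y=\varGamma (\alpha )\,p^{-\alpha }$ with $p=r+x^2/2=r\,(1+x^2/(2r))$. A short numerical confirmation for the three values of $\alpha $ is sketched below; the function names and quadrature settings are assumptions, and the check requires $\alpha >0$, that is $r>1/2$.

```python
import math


def m_alpha_numeric(alpha, r, x, t_min=-60.0, t_max=6.0, step=0.005):
    """Left-hand side of (48) by the trapezoidal rule after substituting y = exp(t)."""
    p = r + 0.5 * x * x
    n_steps = int(round((t_max - t_min) / step))
    total = 0.0
    for i in range(n_steps + 1):
        t = t_min + i * step
        # y**(alpha - 1) * exp(-p * y) times the Jacobian dy = exp(t) dt.
        value = math.exp(alpha * t - p * math.exp(t))
        total += (0.5 if i in (0, n_steps) else 1.0) * value
    return r ** r / (math.gamma(r) * math.sqrt(2.0 * math.pi)) * total * step


def m_alpha_closed(alpha, r, x):
    """Right-hand side of (48)."""
    return (math.gamma(alpha) * r ** (r - alpha)
            / (math.gamma(r) * math.sqrt(2.0 * math.pi))
            * (1.0 + x * x / (2.0 * r)) ** (-alpha))


if __name__ == "__main__":
    r = 2.0
    for alpha in (r - 0.5, r + 0.5, r + 1.5):
        for x in (0.0, 1.0, 2.5):
            print(alpha, x, m_alpha_numeric(alpha, r, x) - m_alpha_closed(alpha, r, x))
```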

Lemma 2 in [9] and $ \int \nolimits _0^\infty y^{- 1} d G_{r,r}(y) = r/(r-1)$ for $r>1$ lead to (36).

To show (38) we use Formula 2.3.16.2 in [20] with $n=0, 1$ and Formula 2.3.16.3 in [20] with $n=1, 2$ and $p=2$ and $q=x^2/2$.

To obtain (42), we calculate the integrals again with Formula 2.3.16.3 in [20], with $p=x^2/2>0$, $q=s>0$, $n =0, 1, 2$.

Lemma 4 in [9] and $\int \nolimits _0^\infty y^{- a-1} {\mathrm{e}}^{-s/y} d y = s^{-a} \varGamma (a) $ for $a=3/2,\,2$ lead to (44).
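
The inverse-moment identity used here follows from the substitution $u=s/y$. Since ${\mathrm{d}}H_s(y)=s\,y^{-2}{\mathrm{e}}^{-s/y}{\mathrm{d}}y$, the cases $a=3/2$ and $a=2$ give $\int _0^\infty y^{-1/2}{\mathrm{d}}H_s(y)=\sqrt{\pi }/(2\sqrt{s})$ and $\int _0^\infty y^{-1}{\mathrm{d}}H_s(y)=1/s$. The following sketch checks both numerically; the function name and quadrature settings are illustrative assumptions.

```python
import math


def inverse_moment_integral(a, s, t_min=-40.0, t_max=60.0, step=0.01):
    """int_0^inf y**(-a - 1) * exp(-s / y) dy, computed with y = exp(t)."""
    n_steps = int(round((t_max - t_min) / step))
    total = 0.0
    for i in range(n_steps + 1):
        t = t_min + i * step
        # y**(-a - 1) * exp(-s / y) times the Jacobian dy = exp(t) dt.
        value = math.exp(-a * t - s * math.exp(-t))
        total += (0.5 if i in (0, n_steps) else 1.0) * value
    return total * step


if __name__ == "__main__":
    for s in (0.5, 1.0, 4.0):
        for a in (1.5, 2.0):
            exact = s ** (-a) * math.gamma(a)
            print(s, a, inverse_moment_integral(a, s) - exact)
        # Moments of H_s: multiply by the factor s from dH_s(y).
        print(s, s * inverse_moment_integral(1.5, s) - math.sqrt(math.pi) / (2.0 * math.sqrt(s)))
```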

Finally, in $\int \nolimits _0^\infty f_k(x/\,y^\gamma ) y^{-2-k/2} {\mathrm{e}}^{-s/y}\mathrm{d}y$ we use the substitution $s/y = u$ to obtain, with (48), the terms in (46). $\square $

References

[1] Aoshima, M., Shen, D., Shen, H., Yata, K., Zhou, Y.-H., Marron, J.S.: A survey of high dimension low sample size asymptotics. Aust. N. Z. J. Stat. 60(1), 4–19 (2018). https://doi.org/10.1111/anzs.12212
[2] Bening, V.E., Galieva, N.K., Korolev, V.Y.: Asymptotic expansions for the distribution functions of statistics constructed from samples with random sizes [in Russian]. Inf. Appl. IPI RAN 7(2), 75–83 (2013)
[3] Bening, V.E., Korolev, V.Y.: On the use of Student’s distribution in problems of probability theory and mathematical statistics. Theory Probab. Appl. 49(3), 377–391 (2005)
[4] Bening, V.E., Korolev, V.Y.: Some statistical problems related to the Laplace distribution [in Russian]. Inf. Appl. IPI RAN 2(2), 19–34 (2008)
[5] Bobkov, S.G., Naumov, A.A., Ulyanov, V.V.: Two-sided inequalities for the density function’s maximum of weighted sum of chi-square variables. arXiv:2012.10747v1 (2020). https://arxiv.org/pdf/2012.10747.pdf
[6] Buddana, A., Kozubowski, T.J.: Discrete Pareto distributions. Econ. Qual. Control 29(2), 143–156 (2014)
[7] Christoph, G., Monakhov, M.M., Ulyanov, V.V.: Second-order Chebyshev-Edgeworth and Cornish-Fisher expansions for distributions of statistics constructed with respect to samples of random size. J. Math. Sci. (N.Y.) 244(5), 811–839 (2020). Translated from Zapiski Nauchnykh Seminarov POMI, 466, Veroyatnost i Statistika. 26, 167–207 (2017)
[8] Christoph, G., Prokhorov, Yu., Ulyanov, V.: On distribution of quadratic forms in Gaussian random variables. Theory Probab. Appl. 40(2), 250–260 (1996)
[9] Christoph, G., Ulyanov, V.V.: Second order expansions for high-dimension low-sample-size data statistics in random setting. Mathematics 8(7), 1151 (2020)
[10] Christoph, G., Ulyanov, V.V., Bening, V.E.: Second order expansions for sample median with random sample size. arXiv:1905.07765v2 (2020). https://arxiv.org/pdf/1905.07765.pdf
[11] Christoph, G., Ulyanov, V.V., Fujikoshi, Y.: Accurate approximation of correlation coefficients by short Edgeworth-Chebyshev expansion and its statistical applications. In: Shiryaev, A.N., Varadhan, S.R.S., Presman, E.L. (eds.) Prokhorov and Contemporary Probability Theory. In Honor of Yuri V. Prokhorov. Springer Proceedings in Mathematics & Statistics, vol. 33, pp. 239–260. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-33549-5_13
[12] Fujikoshi, Y., Ulyanov, V.V., Shimizu, R.: Multivariate Statistics. High-Dimensional and Large-Sample Approximations. Wiley Series in Probability and Statistics. Wiley, Hoboken (2010)
[13] Gavrilenko, S.V., Zubov, V.N., Korolev, V.Y.: The rate of convergence of the distributions of regular statistics constructed from samples with negatively binomially distributed random sizes to the Student distribution. J. Math. Sci. (N.Y.) 220(6), 701–713 (2017)
[14] Hall, P., Marron, J.S., Neeman, A.: Geometric representation of high dimension, low sample size data. J. R. Stat. Soc. Ser. B 67, 427–444 (2005)
[15] Johnson, N.L., Kotz, S., Balakrishnan, N.: Continuous Univariate Distributions, vol. 2, 2nd edn. Wiley, New York (1995)
[16] Kawaguchi, Y., Ulyanov, V.V., Fujikoshi, Y.: Asymptotic distributions of basic statistics in geometric representation for high-dimensional data and their error bounds [in Russian]. Inf. Appl. 4, 12–17 (2010)
[17] Konishi, S.: Asymptotic expansions for the distributions of functions of a correlation matrix. J. Multivar. Anal. 9, 259–266 (1979)
[18] Lyamin, O.O.: On the rate of convergence of the distributions of certain statistics to the Laplace distribution. Mosc. Univ. Comput. Math. Cybern. 34(3), 126–134 (2010)
[19] Petrov, V.V.: Limit Theorems of Probability Theory. Sequences of Independent Random Variables. Clarendon Press, Oxford (1995)
[20] Prudnikov, A.P., Brychkov, Y.A., Marichev, O.I.: Integrals and Series, Volume 1: Elementary Functions, 3rd edn. Gordon & Breach Science Publishers, New York (1992)
[21] Schluter, C., Trede, M.: Weak convergence to the student and Laplace distributions. J. Appl. Probab. 53(1), 121–129 (2016)

Acknowledgements

Theorem 1 was obtained with the support of the Ministry of Education and Science of the Russian Federation as part of the program of the Moscow Center for Fundamental and Applied Mathematics under agreement No. 075-15-2019-1621. Theorem 2 was proved within the framework of the HSE University Basic Research Program.

Author information

Authors and Affiliations

Department of Mathematics, University of Magdeburg, Magdeburg, Germany
Gerd Christoph
Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow, Russian Federation
Vladimir V. Ulyanov
Faculty of Computer Science, HSE University, 109028, Moscow, Russian Federation
Vladimir V. Ulyanov

Corresponding author

Correspondence to Vladimir V. Ulyanov.

Editor information

Editors and Affiliations

Steklov Mathematical Institute of RAS, Moscow, Russia
Albert N. Shiryaev
Applied Probability & Informatics, Department of Applied Probability and Informatics, RUDN University, Moscow, Russia
Konstantin E. Samouylov
Applied Probability & Informatics, Department of Applied Probability and Informatics, RUDN University, Moscow, Russia
Dmitry V. Kozyrev

Cite this paper

Christoph, G., Ulyanov, V.V. (2021). Random Dimension Low Sample Size Asymptotics. In: Shiryaev, A.N., Samouylov, K.E., Kozyrev, D.V. (eds) Recent Developments in Stochastic Methods and Applications. ICSM-5 2020. Springer Proceedings in Mathematics & Statistics, vol 371. Springer, Cham. https://doi.org/10.1007/978-3-030-83266-7_16

DOI: https://doi.org/10.1007/978-3-030-83266-7_16
Published: 03 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-83265-0
Online ISBN: 978-3-030-83266-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
