1 Introduction

Let \(W=\frac{1}{\sqrt{n}}\sum _{i=1}^n X_i\) be a standardized sum of independent and identically distributed \(d\)-dimensional random vectors such that \({\mathbb {E}}(X_1)=0\), \(\mathop {\mathrm {Cov}}(X_1)=\mathbb {I}_d\), and \({\mathbb {E}}( |X_1|^3)<\infty \), where \(\mathbb {I}_d\) denotes the \(d\)-dimensional identity matrix and \(\vert \cdot \vert \) denotes the Euclidean norm of a vector. Bentkus [5] proved the following bound on a non-smooth function distance between the distribution of \(W\) and the standard \(d\)-dimensional Gaussian distribution:

$$\begin{aligned} \sup _{A\in {\mathcal {A}}}|{\mathbb {P}}(W\in A)-{\mathbb {P}}(Z \in A)| \leqslant C {\mathbb {E}}( |X_1|^3) d^{\frac{1}{4}} n^{-\frac{1}{2}} \end{aligned}$$
(1.1)

where \({\mathcal {A}}\) denotes the collection of all the convex sets in \(\mathrm{I\!R}^d\), \(Z\) is a \(d\)-dimensional standard Gaussian vector, and \(C\) is an absolute constant.

In this paper, we aim to prove a multivariate central limit theorem with an error bound having the same order of magnitude in terms of \(d\) and \(n\) for the same non-smooth function distance as in (1.1), but for general bounded dependent random vectors. The dependence structure we consider is similar to that of Barbour et al. [4] and is more general than the local dependence structure considered in Chen and Shao [10]. The approach we use is the recursive approach in Stein’s method for multivariate normal approximation.

Stein’s method was introduced by Stein [18] for normal approximation and has become a powerful tool in proving distributional approximations. We refer to Barbour and Chen [3] for an introduction to Stein’s method. Stein’s method for multivariate normal approximation was first studied in Götze [13]. Along with an inductive argument for sums of independent random vectors, he proved an error bound whose dependence on the dimension is \(d^{\frac{3}{2}}\) for the distance in (1.1). The dependence on the dimension was improved to \(d^{\frac{1}{4}}\) in Bentkus [5] [cf. (1.1)] and Bentkus [6] by using the same inductive approach and a Lindeberg-type argument. Using the recursive approach in Stein’s method, Rinott and Rotar [17] proved a multivariate normal approximation result for sums of bounded random vectors that allow for a certain decomposition. The error bound they obtained for the distance in (1.1) is typically of the order \(O_d( n^{-\frac{1}{2}} \log n)\), with unspecified dependence on \(d\) and an additional logarithmic factor. Recently, Fang and Röllin [11] proved a multivariate normal approximation result under the general framework of Stein coupling [cf. Chen and Röllin [9]]. For sums of locally dependent bounded random vectors, their bound for the distance in (1.1) is typically of the order \(d^{\frac{7}{4}} n^{-\frac{1}{2}}\). Compared with the existing literature on bounding the non-smooth function distance in (1.1) for multivariate normal approximation for sums of bounded random vectors, our new result not only applies to a decomposition structure more general than local dependence, but also obtains an error bound typically of the same order as for sums of independent and identically distributed random vectors in (1.1). Stein’s method has also been used to prove bounds on smooth function distances for multivariate normal approximation under various dependence structures; see, for example, Goldstein and Rinott [12], Raič [15], Chatterjee and Meckes [7], and Reinert and Röllin [16].

Many problems in random graph counting satisfy the decomposition structure considered in this paper. See, for example, Barbour et al. [4], Janson and Nowicki [14], Avram and Bertsimas [1], and Rinott and Rotar [17]. We will study an example from Rinott and Rotar [17] and show that the error bound obtained from our main theorem improves on theirs.

The paper is organized as follows. In the next section, we state our main result. In Sect. 3, we study a random graph counting problem. In Sect. 4, we prove our main result. Throughout this article, \(\vert \cdot \vert \) denotes the Euclidean norm of a vector or the cardinality of a set, and \(\mathbb {I}_d\) denotes the \(d\)-dimensional identity matrix. Define \([n]:=\{1,\dots , n\}\) and \(X_N:=\sum _{i\in N} X_i\) for an index set \(N\).

2 Main Results

For a sum of standardized \(d\)-dimensional random vectors \(W=\sum _{i=1}^n X_i\) with a certain decomposition structure, we aim to bound the quantity

$$\begin{aligned} d_{c}\bigl (\fancyscript{L}(W), \fancyscript{L}(Z)\bigr ) =\sup _{A\in {\mathcal {A}}}|{\mathbb {P}}(W\in A)-{\mathbb {P}}(Z \in A)|, \end{aligned}$$
(2.1)

where \(Z\) has standard \(d\)-dimensional Gaussian distribution and \({\mathcal {A}}\) denotes the collection of all the convex sets in \(\mathrm{I\!R}^d\). The following is our main theorem.

Theorem 2.1

Let \(W=\sum _{i=1}^n X_i\) be a sum of \(d\)-dimensional random vectors such that \({\mathbb {E}}(X_i)=0\) and \(\mathop {\mathrm {Cov}}(W)=\mathbb {I}_d\). Suppose \(W\) can be decomposed as follows:

$$\begin{aligned}&\forall \ i\in [n],\ \exists \ i\in N_i\subset [n]\ \text {such that}\ W-X_{N_i}\ \text {is independent of}\ X_i; \end{aligned}$$
(2.2)
$$\begin{aligned}&\forall \ i\in [n], j\in N_i,\ \exists \ N_i\subset N_{ij}\subset [n]\ \text {such that}\ W-X_{N_{ij}}\nonumber \\&\quad \text {is independent of}\ \{X_i, X_j\}; \end{aligned}$$
(2.3)
$$\begin{aligned}&\forall \ i\in [n], j\in N_i, k\in N_{ij},\ \exists \ N_{ij}\subset N_{ijk} \subset [n]\ \text {such that}\ W-X_{N_{ijk}}\nonumber \\&\quad \text {is independent of}\ \{X_i, X_j, X_k\}. \end{aligned}$$
(2.4)

Suppose further that for each \(i\in [n], j\in N_i\) and \(k\in N_{ij}\),

$$\begin{aligned} |X_i|\leqslant \beta , \quad \ |N_i|\leqslant n_1, \quad \ |N_{ij}|\leqslant n_2, \quad \ |N_{ijk}|\leqslant n_3. \end{aligned}$$
(2.5)

Then there is a universal constant \(C\) such that

$$\begin{aligned} d_c(\fancyscript{L}(W), \fancyscript{L}(Z))\leqslant C d^{1/4} n \beta ^3 n_1\left( n_2+\frac{n_3}{d}\right) , \end{aligned}$$
(2.6)

where \(d_c\) is defined as in (2.1) and \(Z\) is a \(d\)-dimensional standard Gaussian random vector.

Remark 2.2

Under the conditions of Theorem 2.1 but with \(\mathop {\mathrm {Cov}}(W)=\Sigma \), by considering \(\Sigma ^{-1/2}W\), we have

$$\begin{aligned} d_c(\fancyscript{L}(W), \fancyscript{L}(\Sigma ^{1/2} Z))\leqslant C d^{1/4} n ||\Sigma ^{-1/2}||^3\beta ^3 n_1\left( n_2+\frac{n_3}{d}\right) , \end{aligned}$$

where \(||\Sigma ^{-1/2}||\) is the operator norm of \(\Sigma ^{-1/2}\).
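
The reduction is elementary: the vectors \(Y_i:=\Sigma ^{-1/2}X_i\) satisfy \({\mathbb {E}}(Y_i)=0\) and \(\mathop {\mathrm {Cov}}(\sum _{i=1}^n Y_i)=\mathbb {I}_d\), they admit the same neighborhoods \(N_i\), \(N_{ij}\), \(N_{ijk}\) as the \(X_i\), and

$$\begin{aligned} |Y_i|=|\Sigma ^{-1/2}X_i|\leqslant ||\Sigma ^{-1/2}||\, |X_i|\leqslant ||\Sigma ^{-1/2}||\beta . \end{aligned}$$

Since the class \({\mathcal {A}}\) is invariant under invertible linear maps, \(d_c(\fancyscript{L}(W), \fancyscript{L}(\Sigma ^{1/2} Z))=d_c(\fancyscript{L}(\Sigma ^{-1/2}W), \fancyscript{L}(Z))\), and the displayed bound follows from Theorem 2.1 applied to \(\sum _{i=1}^n Y_i\).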

Remark 2.3

The decomposition in (2.2) and (2.3) is the same as that of Barbour et al. [4] and, as observed there, is more general than the local dependence structure studied in Chen [8] and later in Chen and Shao [10]. Condition (2.4) is a natural extension of the decomposition of Barbour et al. [4]. We need this extra condition to obtain the bound in (2.6).
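
For illustration (a simple example of ours, not taken from [4]): if \(X_i=f(Y_i, Y_{i+1})\), \(1\leqslant i\leqslant n\), for independent random elements \(Y_1,\dots , Y_{n+1}\) and some function \(f\), then (2.2)–(2.4) hold with

$$\begin{aligned} N_i=\{i-1,i,i+1\}\cap [n],\qquad N_{ij}=N_i\cup N_j,\qquad N_{ijk}=N_{ij}\cup N_k, \end{aligned}$$

so that \(n_1\leqslant 3\), \(n_2\leqslant 4\) and \(n_3\leqslant 5\).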

Remark 2.4

As mentioned at the beginning of the Introduction, the dependence on the dimension \(d\) in (2.6) is the best known dependence even for sums of independent and identically distributed random vectors. In typical applications, \(\beta \) is of the order \(O(n^{-\frac{1}{2}})\). From the decomposition (2.2)–(2.4), the neighborhood sizes \(n_2\) and \(n_3\) are typically of the same order as \(n_1\). On the other hand, in the local dependence structure studied in Chen [8] and Chen and Shao [10], quantities corresponding to \(n_2\) and \(n_3\) are typically of the order \(O(n_1^2)\) and \(O(n_1^3)\), respectively. Rinott and Rotar [17] proved a bound similar to (2.6) for a different decomposition structure. Their bound does not have explicit dependence on \(d\) and has an additional \(\log n\) term. Recently, Fang and Röllin [11] proved a general multivariate central limit theorem for the non-smooth function distance under the framework of Stein coupling (cf. Chen and Röllin [9]) with boundedness conditions. Although their bound is more widely applicable, it does not yield optimal dependence on \(d\) and does not directly apply to the decomposition structure considered in this paper.

3 An Application

Let \(n\geqslant 2, m\geqslant 1, d\geqslant 2\) be positive integers. Consider a regular graph with \(n\) vertices and vertex degree \(m\). Let \(N=nm/2\) be the total number of edges. We color each vertex independently with one of the colors \(c_i\), \(1\leqslant i\leqslant d\), where color \(c_i\) is chosen with probability \(\pi _i\) and \(\sum _{i=1}^d \pi _i=1\). For \(i\in \{1,\dots , d\}\), let \(W_i\) be the number of edges whose two endpoints are both of color \(c_i\). Formally, with edges indexed by \(j\in \{1,\dots , N\}\), we set

$$\begin{aligned} W_i=\sum _{j=1}^N X_{ji}, \end{aligned}$$

where \(X_{ji}\) is the indicator of the event that the edge \(j\) connects two vertices both of color \(c_i\). Let

$$\begin{aligned} W=(W_1,\dots , W_d)^t. \end{aligned}$$
(3.1)

Let \(\lambda ={\mathbb {E}}(W)\), \(\Sigma =\mathop {\mathrm {Cov}}(W)\). It is known that

$$\begin{aligned} \lambda =(N\pi _1^2,\dots , N\pi _d^2)^t, \end{aligned}$$

and [cf. (3.1) of Rinott and Rotar [17]]

$$\begin{aligned} \mathop {\mathrm {Var}}\nolimits (W_i)= & {} N\pi _i^2(1-\pi _i^2)+2N(m-1)(\pi _i^3-\pi _i^4),\\ \mathop {\mathrm {Cov}}(W_i, W_j)= & {} -N(2m-1)\pi _i^2 \pi _j^2, \quad \ \text {for}\ i\ne j. \end{aligned}$$
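
These formulas can also be checked numerically. The following minimal simulation sketch (our own illustration, not part of the argument; it takes the 2-regular case of a cycle graph and arbitrarily chosen color probabilities) compares the empirical mean, variance, and covariance of \(W\) with the formulas above.

```python
import numpy as np

# Informal Monte Carlo check of the moment formulas for W (illustrative only).
# We take a 2-regular graph (a cycle), so m = 2 and N = nm/2 = n edges.
rng = np.random.default_rng(0)
n, m, d = 200, 2, 3
pi = np.array([0.5, 0.3, 0.2])                  # color probabilities, summing to 1
edges = [(v, (v + 1) % n) for v in range(n)]    # edges of the cycle graph
N = len(edges)

def sample_W():
    colors = rng.choice(d, size=n, p=pi)        # color each vertex independently
    W = np.zeros(d)
    for u, v in edges:
        if colors[u] == colors[v]:
            W[colors[u]] += 1.0                 # edge whose endpoints share color c_i
    return W

S = np.array([sample_W() for _ in range(5000)])
print("empirical mean         :", S.mean(axis=0))
print("formula   mean         :", N * pi**2)
print("empirical Var          :", S.var(axis=0))
print("formula   Var          :", N * pi**2 * (1 - pi**2) + 2 * N * (m - 1) * (pi**3 - pi**4))
print("empirical Cov(W_1, W_2):", np.cov(S[:, 0], S[:, 1])[0, 1])
print("formula   Cov(W_1, W_2):", -N * (2 * m - 1) * pi[0]**2 * pi[1]**2)
```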

We prove the following bound on the non-smooth function distance between the standardized distribution of \(W\) and the standard \(d\)-dimensional Gaussian distribution.

Proposition 3.1

Let \(W\) be defined as in (3.1). Let \(\lambda \) and \(\Sigma \) be the mean and covariance matrix of \(W\), respectively. We have

$$\begin{aligned} d_c (\mathcal {L}(\Sigma ^{-1/2}(W-\lambda )), \mathcal {L}(Z) )\leqslant Cd^{7/4}m^{3/2} L^3 n^{-1/2} \end{aligned}$$
(3.2)

where \(d_c\) is defined as in (2.1), \(Z\) is a \(d\)-dimensional standard Gaussian vector, \(C\) is an absolute constant, and

$$\begin{aligned} L=\left[ \min _{1\leqslant i\leqslant d} \{\pi _i^2(1-\pi _i)\}\right] ^{-1/2}. \end{aligned}$$

Remark 3.2

Rinott and Rotar [17] proved an upper bound for the left-hand side of (3.2) as follows [cf. (3.2) of Rinott and Rotar [17]]:

$$\begin{aligned} d_c (\mathcal {L}(\Sigma ^{-1/2}(W-\lambda )), \mathcal {L}(Z) )\leqslant c_d m^{3/2} L^3 (|\log L|+\log n)n^{-1/2}, \end{aligned}$$
(3.3)

where \(c_d\) is an unspecified constant depending on \(d\). Compared to (3.3), our bound in (3.2) has explicit dependence on \(d\) and does not have the logarithmic terms.

Proof of Proposition 3.1

Recall that

$$\begin{aligned} \Sigma ^{-1/2} (W-\lambda )=\sum _{j=1}^N \Sigma ^{-1/2} \left( (X_{j1},\dots , X_{jd})^t-(\pi _1^2,\dots , \pi _d^2)^t \right) =:\sum _{j=1}^N \xi _j. \end{aligned}$$

Observe that at most one of \(\{X_{j1},\dots , X_{jd}\}\) can be nonzero. Together with \(\sum _{i=1}^d \pi _i=1\) and the fact that the elements of \(\Sigma ^{-1/2}\) are bounded in modulus by \(N^{-1/2} L\) (cf. page 339 of Rinott and Rotar [17]), we have

$$\begin{aligned} |\xi _j|\leqslant 2 d^{1/2} N^{-1/2} L. \end{aligned}$$

Moreover, since each edge shares a vertex with at most \(2(m-1)\) other edges, the summation \(\sum _{j=1}^N \xi _j\) can easily be seen to satisfy the decomposition structure (2.2)–(2.4), with the neighborhood of an edge consisting of the edge itself and the edges meeting it and with the larger neighborhoods formed by taking unions, so that

$$\begin{aligned} |N_i|\leqslant 2m, \quad |N_{ij}|\leqslant 3m, \quad |N_{ijk}|\leqslant 4m. \end{aligned}$$
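
Plugging these quantities into Theorem 2.1, applied to \(\sum _{j=1}^N \xi _j\) with \(n\) there replaced by \(N\), \(\beta =2d^{1/2}N^{-1/2}L\), \(n_1=2m\), \(n_2=3m\) and \(n_3=4m\), gives

$$\begin{aligned} d_c (\mathcal {L}(\Sigma ^{-1/2}(W-\lambda )), \mathcal {L}(Z) )\leqslant C d^{1/4} N \bigl (2d^{1/2}N^{-1/2}L\bigr )^3 (2m)\Bigl (3m+\frac{4m}{d}\Bigr )\leqslant C d^{7/4} m^2 L^3 N^{-1/2}, \end{aligned}$$

and since \(N=nm/2\), the right-hand side is at most \(C d^{7/4} m^{3/2} L^3 n^{-1/2}\).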

This proves the bound (3.2). \(\square \)

4 Proof of Main Theorem

For a given test function \(h\), we consider the Stein equation

$$\begin{aligned} \Delta f(w)-w^t\nabla f(w)=h(w)-{\mathbb {E}}[h(Z)],\qquad w\in \mathrm{I\!R}^d, \end{aligned}$$
(4.1)

where \(\Delta \) denotes the Laplacian operator and \(\nabla \) the gradient operator. If \(h\) is not continuous (as for the indicator function of a convex set), then \(f\) is not smooth enough for Taylor expansions of the required order, and more refined techniques are needed.

We follow the smoothing technique of Bentkus [5]. Recall that \({\mathcal {A}}\) is the collection of all the convex sets in \(\mathrm{I\!R}^d\). For \(A\in {\mathcal {A}}\), let \(h_A(x)=I_A(x)\), and define the smoothed function

$$\begin{aligned} h_{A,\varepsilon }(w) = \psi \biggl (\frac{\mathop {\mathrm {dist}}(w, A)}{\varepsilon }\biggr ), \end{aligned}$$
(4.2)

where \(\mathop {\mathrm {dist}}(w,A) = \inf _{v\in A}\vert w-v\vert \) and

$$\begin{aligned} \psi (x)= {\left\{ \begin{array}{ll} 1, &{} x<0,\\ 1-2x^2, &{} 0\leqslant x<\frac{1}{2},\\ 2(1-x)^2, &{} \frac{1}{2} \leqslant x <1,\\ 0, &{} 1\leqslant x. \end{array}\right. } \end{aligned}$$
(4.3)

Define also

$$\begin{aligned} A^\varepsilon = \{x\in \mathrm{I\!R}^d{:}\, \mathop {\mathrm {dist}}(x,A)\leqslant \varepsilon \},\qquad A^{-\varepsilon } = \{x\in A{:}\, \mathop {\mathrm {dist}}(x,\mathrm{I\!R}^d\setminus A) > \varepsilon \} \end{aligned}$$

[note that in general \((A^{-\varepsilon })^\varepsilon \ne A\)].
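
To make the smoothing concrete, here is a minimal numerical sketch (our own illustration, not part of the argument), taking \(A\) to be a Euclidean ball so that \(\mathop {\mathrm {dist}}(w,A)\) has a closed form.

```python
import numpy as np

# Illustration of the smoothed indicator h_{A,eps} of (4.2)-(4.3) for the
# special case A = {x : |x| <= rho}, where dist(w, A) = max(|w| - rho, 0).

def psi(x):
    # the piecewise quadratic function of (4.3)
    if x < 0:
        return 1.0
    if x < 0.5:
        return 1.0 - 2.0 * x ** 2
    if x < 1.0:
        return 2.0 * (1.0 - x) ** 2
    return 0.0

def h_ball(w, rho, eps):
    # smoothed indicator of (4.2) for the ball of radius rho centered at 0
    dist = max(np.linalg.norm(w) - rho, 0.0)
    return psi(dist / eps)

print(h_ball(np.zeros(3), rho=1.0, eps=0.1))         # w in A            -> 1.0
print(h_ball(np.array([1.05, 0.0, 0.0]), 1.0, 0.1))  # w in A^eps \ A    -> 0.5
print(h_ball(np.array([1.2, 0.0, 0.0]), 1.0, 0.1))   # w outside A^eps   -> 0.0
```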

We will use the following lemmas in the proof of Theorem 2.1.

Lemma 4.1

(Lemma 2.3 of Bentkus [5]) The function \(h_{A,\varepsilon }\) as defined above has the following properties:

$$\begin{aligned} (\mathrm{i})&h_{A,\varepsilon }(w)=1 \quad \text { for all } w\in A, \end{aligned}$$
(4.4)
$$\begin{aligned} (\mathrm{ii})&h_{A,\varepsilon }(w)=0 \quad \text { for all } w\in \mathrm{I\!R}^d\setminus A^\varepsilon ,\end{aligned}$$
(4.5)
$$\begin{aligned} (\mathrm{iii})&0\leqslant h_{A,\varepsilon }(w) \leqslant 1 \quad \text { for all } w\in A^\varepsilon \setminus A, \end{aligned}$$
(4.6)
$$\begin{aligned} (\mathrm{iv})&|\nabla h_{A,\varepsilon } (w)|\leqslant 2\varepsilon ^{-1} \quad \text { for all } w\in \mathrm{I\!R}^d, \end{aligned}$$
(4.7)
$$\begin{aligned} (\mathrm{v})&|\nabla h_{A,\varepsilon } (v)-\nabla h_{A,\varepsilon } (w)|\leqslant 8|v-w|\varepsilon ^{-2} \quad \text { for all } v,w\in \mathrm{I\!R}^d. \end{aligned}$$
(4.8)

Lemma 4.2

(Ball  [2], Bentkus  [5]) We have

$$\begin{aligned} { \sup _{A\in {\mathcal {A}}}\max \{{\mathbb {P}}(Z\in A^\varepsilon \setminus A),{\mathbb {P}}(Z\in A\setminus A^{-\varepsilon })\} \leqslant 4d^{1/4} \varepsilon , } \end{aligned}$$
(4.9)

and the dependence on \(d\) in (4.9) is optimal.

Lemma 4.3

(Lemma 4.2 of Fang and Röllin [11]) For any \(d\)-dimensional random vector \(W\),

$$\begin{aligned} d_c(\fancyscript{L}(W), \fancyscript{L}(Z)) \leqslant 4d^{1/4}\varepsilon + \sup _{A\in {\mathcal {A}}}\bigl \vert {\mathbb {E}}[h_{A,\varepsilon }(W)]-{\mathbb {E}}[h_{A,\varepsilon }(Z)]\bigr \vert . \end{aligned}$$
(4.10)

Lemma 4.4

(Lemma 4.3 of Fang and Röllin [11]) For each map \(a{:}\,\{1,\dots ,d\}^k\rightarrow \mathrm{I\!R}\), we have

$$\begin{aligned} \int _{\mathrm{I\!R}^d}\left( \sum _{i_1,\dots , i_k=1}^d a(i_1,\dots , i_k)\frac{\varphi _{i_1\dots i_k}(z)}{\varphi (z)} \right) ^2\varphi (z)\hbox {d}z \leqslant k! \sum _{i_1,\dots , i_k=1}^d \left( a(i_1,\dots , i_k) \right) ^2, \end{aligned}$$
(4.11)

where \(\varphi (z)\) is the density of \(d\)-dimensional standard normal distribution and

$$\begin{aligned} \varphi _{i_1\dots i_k}(z)=\partial ^k \varphi (z)/(\partial z_{i_1} \dots \partial z_{i_k}). \end{aligned}$$

Now fix \(\varepsilon \) and a convex set \(A\subset \mathrm{I\!R}^d\). It can be verified directly that defining

$$\begin{aligned} g_{A,\varepsilon }(w,\tau )=-\frac{1}{2(1-\tau )} \int _{\mathrm{I\!R}^d} \left\{ h_{A,\varepsilon }(\sqrt{1-\tau }w+\sqrt{\tau }z)-{\mathbb {E}}[h_{A,\varepsilon } (Z)] \right\} \varphi (z)\hbox {d}z, \end{aligned}$$
(4.12)

the function

$$\begin{aligned} f_{A,\varepsilon }(w)=\int _0^1 g_{A, \varepsilon }(w,\tau ) \hbox {d}\tau \end{aligned}$$
(4.13)

is a solution to (4.1) (cf. Götze [13]); a brief verification is sketched below. In what follows, we keep the dependence on \(A\) and \(\varepsilon \) implicit and write \(g=g_{A, \varepsilon }, f=f_{A,\varepsilon }\) and \(h=h_{A,\varepsilon }\). For real-valued functions on \(\mathrm{I\!R}^d\), we write \(f_r(x)\) for \(\partial f(x)/\partial x_r\), \(f_{rs}(x)\) for \(\partial ^2 f(x)/(\partial x_r\partial x_s)\) and so forth. We also write \(g_r(w,\tau )=\partial g(w,\tau )/\partial w_r\) and so on. Moreover, let \(\nabla g(w,\tau )=(g_1(w,\tau ),\dots , g_d(w,\tau ))^t\) and let \(\Delta g(w,\tau )=\sum _{r=1}^d g_{rr}(w,\tau )\).
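
Here is the verification sketch referred to above. Write \(u(w,\tau )=\int _{\mathrm{I\!R}^d} h(\sqrt{1-\tau }w+\sqrt{\tau }z)\varphi (z)\hbox {d}z\), so that \(g(w,\tau )=-\frac{1}{2(1-\tau )}\{u(w,\tau )-{\mathbb {E}}[h(Z)]\}\). For \(0<\tau <1\), differentiating under the integral sign and using the Gaussian integration by parts identity \(\int z_r q(z)\varphi (z)\hbox {d}z=\int q_r(z)\varphi (z)\hbox {d}z\) give

$$\begin{aligned} \Delta u(w,\tau )-w^t\nabla u(w,\tau )=2(1-\tau )\,\partial _\tau u(w,\tau ), \end{aligned}$$

so that \(\Delta g(w,\tau )-w^t\nabla g(w,\tau )=-\partial _\tau u(w,\tau )\) and hence

$$\begin{aligned} \Delta f(w)-w^t\nabla f(w)=-\int _0^1 \partial _\tau u(w,\tau )\hbox {d}\tau =u(w,0)-u(w,1)=h(w)-{\mathbb {E}}[h(Z)]. \end{aligned}$$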

Using this notation and the integration by parts formula, we have for \(1\leqslant r,s,t\leqslant d\) that

$$\begin{aligned} g_{rs} (w,\tau )= & {} - \frac{1}{2\tau } \int _{\mathrm{I\!R}^d} h (\sqrt{1-\tau }w+\sqrt{\tau }z) \varphi _{rs} (z) \hbox {d}z \nonumber \\= & {} \frac{1}{2\sqrt{\tau }} \int _{\mathrm{I\!R}^d} h_s(\sqrt{1-\tau }w+\sqrt{\tau }z) \varphi _r (z) \hbox {d}z \end{aligned}$$
(4.14)

and

$$\begin{aligned} g_{rst} (w,\tau )= & {} \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} h (\sqrt{1-\tau }w+\sqrt{\tau }z) \varphi _{rst} (z) \hbox {d}z \nonumber \\= & {} \frac{\sqrt{1-\tau }}{2\sqrt{\tau }} \int _{\mathrm{I\!R}^d} h_{st}(\sqrt{1-\tau }w+\sqrt{\tau }z) \varphi _r (z) \hbox {d}z. \end{aligned}$$
(4.15)

Proof of Theorem 2.1

Fix \(A\in {\mathcal {A}}\) and \(\varepsilon >0\) (to be chosen later) and let \(f=f_{A,\varepsilon }\) be the solution to the Stein equation (4.1) corresponding to \(h=h_{A,\varepsilon }\) as defined by (4.2). Let

$$\begin{aligned} \kappa := d_c(\fancyscript{L}(W),\fancyscript{L}(Z)). \end{aligned}$$
(4.16)

To avoid confusion, we will always use \(r,s,t\) to index the components of \(d\)-dimensional vectors. Define

$$\begin{aligned} W_i:=W-X_{N_i},\quad W_{ij}:=W-X_{N_{ij}},\quad W_{ijk}:=W-X_{N_{ijk}}. \end{aligned}$$

By assumption (2.2) and because \({\mathbb {E}}(X_i)=0\), we have

$$\begin{aligned} - {\mathbb {E}}[ W^t \nabla g(W,\tau ) ]= & {} -\sum _{r=1}^d {\mathbb {E}}[ (W)_r g_r (W,\tau ) ]\\= & {} -\sum _{r=1}^d \sum _{i=1}^n {\mathbb {E}}[(X_i)_r g_r(W, \tau ) ] \\= & {} \sum _{r=1}^d \sum _{i=1}^n {\mathbb {E}}\{ (X_i)_r [g_r(W_i, \tau ) - g_r (W,\tau )] \}. \end{aligned}$$

Here and subsequently, we use \((X)_r\) to denote the \(r\)th component of a vector \(X\). By assumption (2.2) and \(\mathop {\mathrm {Cov}}(W)=\mathbb {I}_d\), we have \(\sum _{i=1}^n \sum _{j\in N_i} {\mathbb {E}}\{ (X_i)_r (X_j)_s \} =\delta _{rs}\), where \(\delta \) denotes the Kronecker delta. This implies that

$$\begin{aligned} {\mathbb {E}}[\Delta g(W,\tau )] ={\mathbb {E}}\left\{ \sum _{r,s=1}^d \sum _{i=1}^n \sum _{j\in N_i} ({\mathbb {E}}[(X_i)_r(X_j)_s]) g_{rs} (W,\tau )\right\} . \end{aligned}$$

Adding and subtracting the corresponding terms and using (2.3), we have

$$\begin{aligned}&{\mathbb {E}}\left\{ \Delta g(W,\tau ) -W^t \nabla g(W,\tau ) \right\} \\&= {\mathbb {E}}\left\{ \sum _{r=1}^d \sum _{i=1}^n (X_i)_r \big [g_r(W_i,\tau ) -g_r(W,\tau ) +\sum _{s=1}^d\sum _{j\in N_i} (X_j)_s g_{rs} (W,\tau ) \big ] \right\} \\&\quad +\, {\mathbb {E}}\left\{ \sum _{r,s=1}^d \sum _{i=1}^n \sum _{j\in N_i} ({\mathbb {E}}[(X_i)_r(X_j)_s] -(X_i)_r (X_j)_s) [g_{rs} (W,\tau )-g_{rs}(W_{ij},\tau ) ] \right\} \\&=: R_1(\tau )+R_2(\tau ). \end{aligned}$$

Taking \(g(w,\tau )=g_{A,\varepsilon }(w,\tau )\) in (4.12), it follows from (4.1) and (4.13) that

$$\begin{aligned} {\mathbb {E}}[ h(W) ]-{\mathbb {E}}[h(Z)]=\int _0^1 (R_1(\tau )+R_2(\tau ))\hbox {d}\tau . \end{aligned}$$
(4.17)

In the following, we will first give an upper bound for \(|\int _0^1 R_1(\tau ) \hbox {d}\tau |\) and then argue that an upper bound for \(|\int _0^1 R_2(\tau ) \hbox {d}\tau |\) can be derived similarly.

To estimate \(\int _0^1 R_1(\tau ) \hbox {d}\tau \), we consider the cases \(\varepsilon ^2< \tau \leqslant 1\) and \(0< \tau \leqslant \varepsilon ^2\) separately. For the first case, we use the first expression of \(g_{rs}(w,\tau )\) in (4.14), Taylor’s expansion

$$\begin{aligned} f(x+a)-f(x)=\int _0^1 a^t \nabla f(x+ua) \hbox {d}u={\mathbb {E}}\{a^t \nabla f(x+U a) \}, \end{aligned}$$

and the integration by parts formula to get

$$\begin{aligned} \int _{\varepsilon ^2}^1 R_1(\tau )\hbox {d}\tau= & {} {\mathbb {E}}\left\{ \sum _{r,s=1}^d \sum _{i=1}^n \sum _{j\in N_i} \int _{\varepsilon ^2}^1 \frac{1}{2\tau } \int _{\mathrm{I\!R}^d} [h(\sqrt{1-\tau }(W-U X_{N_i})\right. \\&\left. +\, \sqrt{\tau }z)-h(\sqrt{1-\tau }W+\sqrt{\tau }z)]\right. \\&\left. \quad \times (X_i)_r (X_j)_s \varphi _{rs} (z) \hbox {d}z \hbox {d}\tau \right\} \\= & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} h(\sqrt{1-\tau }W\right. \\&\left. +\, \sqrt{\tau }z-\sqrt{1-\tau }UVX_{N_i})\right. \\&\left. \quad \times U( X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z)\hbox {d}z\hbox {d}\tau \right\} , \end{aligned}$$

where \(U\) and \(V\) are independent random variables distributed uniformly on \([0,1]\). By writing \(h(\sqrt{1-\tau }W +\sqrt{\tau }z-\sqrt{1-\tau }UVX_{N_i})\) as a sum of differences and using the independence assumption (2.4), we have

$$\begin{aligned} \int _{\varepsilon ^2}^1 R_1(\tau )\hbox {d}\tau =R_{1,1}+R_{1,2}+R_{1,3}+R_{1,4} \end{aligned}$$

where

$$\begin{aligned} R_{1,1}= & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} \big [ h(\sqrt{1-\tau }W +\sqrt{\tau }z-\sqrt{1-\tau }UVX_{N_i})\right. \\&\left. -\, h(\sqrt{1-\tau }W_{ijk}+\sqrt{\tau }z) \big ] U(X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z)\hbox {d}z\hbox {d}\tau \right\} ,\\ R_{1,2}= & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} \left\{ {\mathbb {E}}[h(\sqrt{1-\tau }W_{ijk}+\sqrt{\tau }z) ]\right. \right. \\&\left. \left. -\, {\mathbb {E}}[ h(\sqrt{1-\tau }W+\sqrt{\tau }z)] \right\} U(X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z)\hbox {d}z\hbox {d}\tau \right\} ,\\ R_{1,3}= & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} \left\{ {\mathbb {E}}[h(\sqrt{1-\tau }W+\sqrt{\tau }z)]\right. \right. \\&\left. \left. -\, {\mathbb {E}}[h(\sqrt{1-\tau }Z+\sqrt{\tau }z)]\right\} U(X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z)\hbox {d}z\hbox {d}\tau \right\} ,\\ R_{1,4}= & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} h(\sqrt{1-\tau }Z +\sqrt{\tau }z)\right. \\&\left. \times U(X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z)\hbox {d}z\hbox {d}\tau \right\} , \end{aligned}$$

where \(Z\) is an independent \(d\)-dimensional standard Gaussian random vector.

By the properties of \(h\) in (4.4) and (4.5), the boundedness condition (2.5), and the independence assumption (2.4),

$$\begin{aligned} |R_{1,1}|\leqslant & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{4\tau ^{3/2}} \int _{\mathrm{I\!R}^d} \mathrm{I}(\hbox {dist}(\sqrt{1-\tau } W_{ijk}\right. \\&+\sqrt{\tau }z, A^\varepsilon \backslash A)\leqslant \sqrt{1-\tau } n_3 \beta )\\&\left. \times \big | ( X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z) \big |\hbox {d}z\hbox {d}\tau \right\} . \end{aligned}$$

By the boundedness condition (2.5), the definition of \(\kappa \) in (4.16) and (4.9),

$$\begin{aligned}&{\mathbb {E}}\big [ \mathrm{I}(\hbox {dist}(\sqrt{1-\tau } W_{ijk} +\sqrt{\tau }z, A^\varepsilon \backslash A)\leqslant \sqrt{1-\tau } n_3 \beta ) \big ]\nonumber \\&\leqslant {\mathbb {E}}\big [ \mathrm{I}(\hbox {dist}(\sqrt{1-\tau }W +\sqrt{\tau }z, A^\varepsilon \backslash A)\leqslant 2\sqrt{1-\tau } n_3 \beta ) \big ]\nonumber \\&\leqslant 4d^{1/4} \left( \frac{\varepsilon }{\sqrt{1-\tau }}+4n_3\beta \right) +2\kappa . \end{aligned}$$
(4.18)

By the Cauchy–Schwarz inequality, (4.11), and the boundedness condition (2.5),

$$\begin{aligned} \int _{\mathrm{I\!R}^d} \sum _{r,s,t=1}^d | ( X_i)_r(X_j)_s(X_k)_t \varphi _{rst}(z)| \hbox {d}z \leqslant \sqrt{6} \beta ^3. \end{aligned}$$

Therefore, using \(\int _{\varepsilon ^2}^1 \frac{1}{\tau ^{3/2}}\hbox {d}\tau \leqslant C \frac{1}{\varepsilon }\),

$$\begin{aligned} |R_{1,1}|\leqslant C n \beta ^3 n_1^2 \frac{1}{\varepsilon } \left[ d^{1/4} (\varepsilon +n_3 \beta )+\kappa \right] . \end{aligned}$$
(4.19)

Here and in the remainder of the proof, \(C\) denotes an absolute constant, which may differ from line to line.

By the same argument, \(|R_{1,2}|\) has the same upper bound as \(|R_{1,1}|\).

By the properties of \(h\) in (4.4) and (4.5), the definition of \(\kappa \) in (4.16), and (4.9),

$$\begin{aligned}&{\mathbb {E}}[h(\sqrt{1-\tau }W+\sqrt{\tau }z)-h(\sqrt{1-\tau }Z+\sqrt{\tau }z) ]\\&\quad \leqslant {\mathbb {E}}[\mathrm{I}(\sqrt{1-\tau }W+\sqrt{\tau }z \in A^\varepsilon ) -\mathrm{I}(\sqrt{1-\tau }Z +\sqrt{\tau }z \in A)]\\&\quad = {\mathbb {E}}[ \mathrm{I}(\sqrt{1-\tau }W+\sqrt{\tau }z \in A^\varepsilon ) -\mathrm{I}(\sqrt{1-\tau }Z +\sqrt{\tau }z \in A^\varepsilon )\\&\qquad + \mathrm{I}(\sqrt{1-\tau }Z +\sqrt{\tau }z \in A^\varepsilon \backslash A) ]\\&\quad \leqslant \kappa + 4 d^{1/4} \frac{\varepsilon }{\sqrt{1-\tau }}. \end{aligned}$$

A matching lower bound holds by a symmetric argument; together with an argument similar to that for \(R_{1,1}\), this gives

$$\begin{aligned} |R_{1,3}|\leqslant C n \beta ^3 n_1^2 \frac{1}{\varepsilon } (d^{1/4} \varepsilon +\kappa ). \end{aligned}$$
(4.20)

Using the first expression of \(g_{rst}(w,\tau )\) in (4.15), we have

$$\begin{aligned} R_{1,4}={\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _{\varepsilon ^2}^1 U {\mathbb {E}}[(X_i)_r (X_j)_s (X_k)_t] g_{rst}(Z,\tau ) \hbox {d}\tau \right\} . \end{aligned}$$

Observe that from (4.12),

$$\begin{aligned} {\mathbb {E}}[g(Z+w,\tau )]= & {} -\frac{1}{2(1-\tau )}\int _{\mathrm{I\!R}^d} \left\{ {\mathbb {E}}[h(\sqrt{1-\tau }(Z+w)+\sqrt{\tau }z)]\right. \\&\quad \left. -{\mathbb {E}}[h(Z)] \right\} \varphi (z)\hbox {d}z\\= & {} -\frac{1}{2(1-\tau )}\int _{\mathrm{I\!R}^d} h(\sqrt{1-\tau }w+z)\varphi (z)\hbox {d}z+\frac{1}{2(1-\tau )}{\mathbb {E}}[h(Z)]\\= & {} -\frac{1}{2(1-\tau )}\int _{\mathrm{I\!R}^d} h(x)\varphi (x-\sqrt{1-\tau }w)\hbox {d}x+\frac{1}{2(1-\tau )}{\mathbb {E}}[h(Z)]. \end{aligned}$$

Differentiating with respect to \(w_r, w_s\), and \(w_t\) and evaluating at \(w=0\), we obtain

$$\begin{aligned} {\mathbb {E}}[g_{rst}(Z,\tau )]=\frac{\sqrt{1-\tau }}{2}\int _{\mathrm{I\!R}^d} h(x)\varphi _{rst}(x)\hbox {d}x. \end{aligned}$$

Now with (4.11) and (2.5),

$$\begin{aligned} |R_{1,4}|\leqslant Cn \beta ^3 n_1^2 . \end{aligned}$$
(4.21)
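
In slightly more detail: for fixed \(i\in [n]\) and \(j,k\in N_i\), write \(a_{rst}={\mathbb {E}}[(X_i)_r(X_j)_s(X_k)_t]\), so that \(\sum _{r,s,t=1}^d a_{rst}^2\leqslant {\mathbb {E}}(|X_i|^2|X_j|^2|X_k|^2)\leqslant \beta ^6\) by Jensen's inequality and (2.5). By the Cauchy–Schwarz inequality, \(0\leqslant h\leqslant 1\), and (4.11) applied with third-order derivatives,

$$\begin{aligned} \Big |\sum _{r,s,t=1}^d a_{rst}\int _{\mathrm{I\!R}^d} h(x)\varphi _{rst}(x)\hbox {d}x\Big | \leqslant \left( \int _{\mathrm{I\!R}^d}\Big (\sum _{r,s,t=1}^d a_{rst}\frac{\varphi _{rst}(x)}{\varphi (x)}\Big )^2\varphi (x)\hbox {d}x\right) ^{1/2} \leqslant \Big (6\sum _{r,s,t=1}^d a_{rst}^2\Big )^{1/2}\leqslant \sqrt{6}\,\beta ^3, \end{aligned}$$

and summing over \(i\in [n]\) and \(j,k\in N_i\), using \({\mathbb {E}}(U)=\frac{1}{2}\) and \(\int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2}\hbox {d}\tau \leqslant \frac{1}{2}\), yields (4.21).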

For the case \(0< \tau \leqslant \varepsilon ^2\), we use the second expression of \(g_{rs}(w,\tau )\) in (4.14) and Taylor’s expansion to obtain

$$\begin{aligned} \int _0^{\varepsilon ^2} R_1(\tau ) \hbox {d}\tau= & {} -{\mathbb {E}}\left\{ \sum _{r,s=1}^d \sum _{i=1}^n \sum _{j\in N_i} \int _0^{\varepsilon ^2} \frac{1}{2\sqrt{\tau }}\int _{\mathrm{I\!R}^d} [h_s(\sqrt{1-\tau }(W-UX_{N_i})+\sqrt{\tau }z)\right. \\&\left. - \, h_s(\sqrt{1-\tau }W+\sqrt{\tau }z)] (X_i)_r(X_j)_s \varphi _r (z)\hbox {d}zd \tau \right\} \\= & {} {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j,k\in N_i} \int _0^{\varepsilon ^2} \frac{\sqrt{1-\tau }}{2\sqrt{\tau }} \int _{\mathrm{I\!R}^d} h_{st}(\sqrt{1-\tau }W\right. \\&+\sqrt{\tau }z-\sqrt{1-\tau }UVX_{N_i})\\&\left. \times \, U (X_i)_r(X_j)_s(X_k)_t \varphi _r (z) \hbox {d}z\hbox {d}\tau \right\} , \end{aligned}$$

where we recall that \(U\) and \(V\) are independent random variables distributed uniformly on \([0,1]\). By (4.4), (4.5), (4.8), and (2.5),

$$\begin{aligned}&\Big | \int _0^{\varepsilon ^2} R_1(\tau ) d \tau \Big | \leqslant \frac{8}{\varepsilon ^2} \beta ^2 n_1^2 {\mathbb {E}}\left\{ \sum _{r=1}^d \sum _{i=1}^n \int _0^{\varepsilon ^2} \frac{\sqrt{1-\tau }}{2\sqrt{\tau }} \int _{\mathrm{I\!R}^d} \mathrm{I}(\hbox {dist}(\sqrt{1-\tau }W_i\right. \\&\left. \quad +\,\sqrt{\tau }z, A^\varepsilon \backslash A) \leqslant \sqrt{1-\tau } n_1 \beta ) \quad \times U |(X_i)_r \varphi _r(z)| \hbox {d}z\hbox {d}\tau \right\} . \end{aligned}$$

Much as in (4.18),

$$\begin{aligned}&{\mathbb {E}}[\mathrm{I}(\hbox {dist}(\sqrt{1-\tau } W_{i} +\sqrt{\tau }z, A^\varepsilon \backslash A)\leqslant \sqrt{1-\tau } n_1 \beta )]\\&\quad \leqslant {\mathbb {E}}[\mathrm{I}(\hbox {dist}(\sqrt{1-\tau } W +\sqrt{\tau }z, A^\varepsilon \backslash A)\leqslant 2\sqrt{1-\tau } n_1 \beta )]\\&\quad \leqslant 4d^{1/4} (\frac{\varepsilon }{\sqrt{1-\tau }}+4n_1\beta ) +2\kappa . \end{aligned}$$

Together with (2.2), (2.5), and (4.11), we obtain

$$\begin{aligned} \Big | \int _0^{\varepsilon ^2} R_1(\tau ) d \tau \Big | \leqslant Cn \beta ^3 n_1^2 \frac{1}{\varepsilon } [d^{1/4}(\varepsilon + n_1\beta )+\kappa ], \end{aligned}$$

and hence, combining this with (4.19), (4.20), (4.21), and the same bound for \(|R_{1,2}|\),

$$\begin{aligned} \Big |\int _0^1 R_1(\tau )d \tau \Big |\leqslant C n \beta ^3 n_1^2 \frac{1}{\varepsilon } [d^{1/4} (\varepsilon +n_3 \beta )+\kappa ]. \end{aligned}$$
(4.22)

Now we turn to bounding \(|\int _0^1 R_2(\tau ) \hbox {d}\tau |\). Much as for \(R_1(\tau )\),

$$\begin{aligned} \int _0^1 R_2(\tau ) \hbox {d}\tau= & {} - {\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j\in N_i}\sum _{k\in N_{ij}} \int _{\varepsilon ^2}^1 \frac{\sqrt{1-\tau }}{2\tau ^{3/2}} \int _{\mathrm{I\!R}^d} h(\sqrt{1-\tau } W\right. \\&\left. +\, \sqrt{\tau }z-\sqrt{1-\tau } U X_{N_{ij}})\right. \\&\left. \times (X_k)_t \left\{ (X_i)_r(X_j)_s-{\mathbb {E}}[(X_i)_r(X_j)_s]\right\} \varphi _{rst} (z)\hbox {d}z\hbox {d}\tau \right\} \\&- \,{\mathbb {E}}\left\{ \sum _{r,s,t=1}^d \sum _{i=1}^n \sum _{j\in N_i}\sum _{k\in N_{ij}} \int _0^{\varepsilon ^2} \frac{\sqrt{1-\tau }}{2\sqrt{\tau }} \int _{\mathrm{I\!R}^d} h_{st}(\sqrt{1-\tau } W\right. \\&+\sqrt{\tau }z-\sqrt{1-\tau } U X_{N_{ij}})\\&\left. \times \,(X_k)_t \left\{ (X_i)_r(X_j)_s-{\mathbb {E}}[(X_i)_r(X_j)_s]\right\} \varphi _{r} (z)\hbox {d}z\hbox {d}\tau \right\} \end{aligned}$$

where \(U\) is an independent random variable distributed uniformly on \([0,1]\). By the same arguments used in bounding \(|\int _0^1 R_1(\tau )\hbox {d}\tau |\), we have

$$\begin{aligned} \Big |\int _0^1 R_2(\tau )d \tau \Big |\leqslant C n \beta ^3 n_1 n_2 \frac{1}{\varepsilon } [d^{1/4} (\varepsilon +n_3 \beta )+\kappa ]. \end{aligned}$$
(4.23)

By (4.10), (4.17), (4.22), and (4.23),

$$\begin{aligned} \kappa \leqslant 4d^{1/4} \varepsilon + C n \beta ^3 n_1 n_2 \frac{1}{\varepsilon } [d^{1/4} (\varepsilon +n_3 \beta )+\kappa ]. \end{aligned}$$
(4.24)

The final bound (2.6) follows by choosing \(\varepsilon =2C n \beta ^3 n_1 n_2\) for the same \(C\) as in (4.24). With this choice, (4.24) becomes \(\kappa \leqslant 4d^{1/4}\varepsilon +\frac{1}{2}d^{1/4}(\varepsilon +n_3\beta )+\frac{1}{2}\kappa \), so that \(\kappa \leqslant 9d^{1/4}\varepsilon +d^{1/4}n_3\beta \leqslant 18C d^{1/4} n\beta ^3 n_1 n_2+d^{1/4}n_3\beta \). Moreover, \(\mathop {\mathrm {Cov}}(W)=\mathbb {I}_d\) and (2.5) imply \(d={\mathbb {E}}(|W|^2)\leqslant n \beta ^2 n_1\), so that \(d^{1/4}n_3\beta \leqslant d^{1/4} n\beta ^3 n_1 n_3/d\), and (2.6) follows. \(\square \)