Abstract
In the present paper, we consider the problem of the estimation of a parameter \(\varvec{\theta }\), in Banach spaces, maximizing some criterion function which depends on an unknown nuisance parameter h, possibly infinite-dimensional. The classical estimation methods are mainly based on maximizing the corresponding empirical criterion by substituting the nuisance parameter by a nonparametric estimator. We show that the M-estimators converge weakly to maximizers of Gaussian processes under rather general conditions. The conventional bootstrap method fails in general to consistently estimate the limit law. We show that the m out of n bootstrap, in this extended setting, is weakly consistent under conditions similar to those required for weak convergence of the M-estimators. The aim of this paper is therefore to extend the existing theory on the bootstrap of the M-estimators. Examples of applications from the literature are given to illustrate the generality and the usefulness of our results. Finally, we investigate the performance of the methodology for small samples through a short simulation study.
1 Introduction
Semiparametric modeling has proved to be a flexible tool and provides a powerful statistical framework in a variety of applied and theoretical contexts; refer to Pfanzagl (1990), Bickel et al. (1993), van der Vaart and Wellner (1996), van de Geer (2000), and Kosorok (2008). An important work to be cited is the paper of Pakes and Pollard (1989), where a general central limit theorem is proved for estimators defined by minimization of the length of a vector-valued, random criterion function with no smoothness assumptions. This reference was extended in different settings by, among many others, Pakes and Olley (1995), Zhan (2002) and Chen et al. (2003). Recall that semiparametric models are statistical models in which at least one parameter of interest is not Euclidean. The term “M-estimation” refers to a general method of estimation in which the estimators are obtained by maximizing (or minimizing) certain criterion functions. The most widely used M-estimators include the maximum likelihood (MLE), ordinary least-squares (OLS) and least absolute deviation estimators. Notice that the major practical problem of maximum likelihood estimators is their lack of robustness, while many robust estimators achieve robustness at some cost in first-order efficiency. The appeal of the M-estimation method is that, in addition to the statistical efficiency of the estimators when the parametric model is correctly specified, these estimators are robust to contamination when the objective function is appropriately chosen. Throughout the available literature, investigations of the asymptotic properties of M-estimators, as well as of the relevant test statistics, have privileged the parametric case. In practice, however, we need more flexible models that contain both parametric and nonparametric components. This paper concentrates on this specific problem. To formulate the problem treated in this paper, we need the following notation. 
Let \(\mathcal {X} = (\mathbf {X}_{1},\ldots , \mathbf {X}_{n})\) be n independent copies of a random element \(\mathbf {X}\) in a probability space \((\mathcal {S}, \mathcal {A}, \mathbb {P})\). For Banach spaces \(\mathcal {B}\) and \(\mathcal {H}\), equipped with a norm \(\Vert \cdot \Vert\) and a metric \(d_{\mathcal {H}}(\cdot ,\cdot )\) respectively, let \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\) be a class of Borel measurable functions \(\mathbf {m}_{{\varvec{\theta }},h} : \mathcal {S} \rightarrow \mathbb {R}\), indexed by \({\varvec{\theta }}\) in some parameter space \({\varvec{\Theta }} \subset \mathcal {B}\) and by \(h \in \mathcal {H}\), where \(\varvec{\theta }\) is the parameter of interest and \(h^{0}\) denotes the true value of the nuisance parameter h. We define the empirical measure to be
where, for \(\mathbf {x}\in \mathcal {S}\), \(\delta _\mathbf { x}\) is the measure that assigns mass 1 at \(\mathbf { x}\) and zero elsewhere. Let \(f:\mathcal {S}\rightarrow \mathbb {R}\) be a real-valued measurable function. In the modern theory of empirical processes, it is usual to identify \(\mathbb {P}\) and \(\mathbb {P}_{n}\) with the mappings given by
The M-estimand of interest \({\varvec{\theta }}_{0}\) and its corresponding M-estimator \({\varvec{\theta }}_{n}\) are assumed to be well-separated maximizers of the processes \(\Big \{\mathbb {P}\mathbf {m}_{{\varvec{\theta }},h^{0}} : {\varvec{\theta }} \in {\varvec{\Theta }}\Big \}\) and \(\Big \{\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}} : {\varvec{\theta }} \in {\varvec{\Theta }}\Big \}\), respectively, for a given consistent sequence of estimators \(\widehat{h}\) of \(h^{0}\). Under suitable entropy conditions on \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\) (defined below) and moment conditions on its envelope, we show that there exist norming sequences \(\{\alpha _{n}\}\) and \(\{r_{n}\}\) such that the random process \(\Big \{\alpha _{n}\mathbb {P}_{n}(\mathbf {m}_{{\varvec{\theta }}_{0}+\gamma /r_{n},\widehat{h}}- \mathbf {m}_{{\varvec{\theta }}_{0},\widehat{h}}) : \gamma \in K \Big \}\) converges weakly, in the sense of Hoffmann-Jørgensen (1991) [see van der Vaart and Wellner (1996), in particular their Definition 1.3.3], to the process \(\{\mathbb {Z}(\gamma ) : \gamma \in K\}\), for each closed bounded subset \(K \subset \mathcal {B}\). It follows by an argmax continuous mapping theorem [refer to Kosorok (2008), in particular Chapter 14] that \(r_{n} ({\varvec{\theta }}_{n} -{\varvec{\theta }}_{0} )\) converges weakly to \(\arg \max _{\gamma } \mathbb {Z}(\gamma )\). The latter weak limit has a complicated form in general and does not permit explicit computation. It would therefore be of interest to estimate the sampling distribution of \(r_{n} ({\varvec{\theta }}_{n} -{\varvec{\theta }}_{0} )\) by the bootstrap for inference purposes. The bootstrap was introduced and first investigated in Efron (1979). Since this seminal paper, bootstrap methods have been proposed, discussed, investigated and applied in a huge number of papers in the literature. 
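To make the setup concrete, the following sketch computes an M-estimator \({\varvec{\theta }}_{n}\) maximizing \(\theta \mapsto \mathbb {P}_{n}\mathbf {m}_{\theta ,\widehat{h}}\) with a plug-in nuisance estimate. It is illustrative only: the Huber-type criterion, the MAD scale estimate playing the role of \(\widehat{h}\), and the grid maximization are our own choices, not part of the paper's formal framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def criterion(theta, h, x):
    # Huber-type criterion m_{theta,h}(x) = -rho((x - theta)/h), where the
    # scale h plays the role of the (here one-dimensional) nuisance parameter.
    z = (x - theta) / h
    return -np.where(np.abs(z) <= 1.345,
                     0.5 * z**2,
                     1.345 * (np.abs(z) - 0.5 * 1.345))

def m_estimate(x, grid):
    # Plug in a consistent nuisance estimate (robust MAD scale), then
    # maximize the empirical criterion P_n m_{theta, h_hat} over a grid.
    h_hat = 1.4826 * np.median(np.abs(x - np.median(x)))
    values = [criterion(t, h_hat, x).mean() for t in grid]
    return grid[int(np.argmax(values))]

x = rng.normal(loc=2.0, scale=1.0, size=500)
grid = np.linspace(0, 4, 801)
theta_n = m_estimate(x, grid)   # close to the true location 2.0
```

Here \(\widehat{h}\) is estimated once and substituted into the empirical criterion, in the spirit of the substitution method described above.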
Being one of the most important ideas in the practice of statistics, the bootstrap also introduced a wealth of innovative probability problems, which in turn formed the basis for the creation of new mathematical theories. The bootstrap can be described briefly as follows. Let \(T(\mathbb {F})\) be a functional of an unknown distribution function \(\mathbb {F}(\cdot )\), \(\mathbf { X}_{1},\ldots ,\mathbf { X}_{n}\) a sample from \(\mathbb {F}(\cdot )\), and \(\mathbf { Y}_{1},\ldots ,\mathbf { Y}_{n}\) an independent and identically distributed [i.i.d.] sample with common distribution given by the empirical distribution \(\mathbb {F}_{n}(\cdot )\) of the original sample. The distribution of \(\{T(\mathbb {F}_{n})-T(\mathbb {F})\}\) is then approximated by that of \(\{T(\mathbb {F}_{n}^{*})-T(\mathbb {F}_{n})\}\) conditionally on \(\mathbf { X}_{1},\ldots ,\mathbf { X}_{n}\), with \(\mathbb {F}^{*}_{n}(\cdot )\) being the empirical distribution of \(\mathbf { Y}_{1},\ldots ,\mathbf { Y}_{n}\). The key idea behind the bootstrap is that if a sample is representative of the underlying population, then one can make inferences about the population characteristics by resampling from the current sample. The asymptotic theory of the bootstrap, with statistical applications, has been reviewed in the books of, among others, Efron and Tibshirani (1993), Shao and Tu (1995), Chernick (2008), Davison and Hinkley (1997), van der Vaart and Wellner (1996), Hall (1992) and Kosorok (2008). A major application of an estimator is in the calculation of confidence intervals. By far the most favored confidence interval is the standard one based on a normal or a Student t-distribution. Such standard intervals are useful tools, but they rest on an approximation that can be quite inaccurate in practice. Bootstrap procedures are an attractive alternative. 
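The resampling scheme just described can be sketched as follows; this is a minimal illustration in which the choice of the median as the functional \(T(\cdot )\), the exponential data and all tuning constants are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_distribution(x, stat, n_boot=2000):
    # Approximate the law of T(F_n) - T(F) by the conditional law of
    # T(F_n^*) - T(F_n): resample n points with replacement from x.
    n = len(x)
    t_n = stat(x)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        reps[b] = stat(rng.choice(x, size=n, replace=True)) - t_n
    return reps

x = rng.exponential(scale=1.0, size=300)
reps = bootstrap_distribution(x, np.median)
# Basic bootstrap 95% interval for the population median (= log 2 here):
lo = np.median(x) - np.quantile(reps, 0.975)
hi = np.median(x) - np.quantile(reps, 0.025)
```

The interval \([T(\mathbb {F}_{n})-q_{0.975},\,T(\mathbb {F}_{n})-q_{0.025}]\) uses the quantiles of the resampled differences in place of the unknown quantiles of \(T(\mathbb {F}_{n})-T(\mathbb {F})\).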
One way to look at them is as procedures for handling data when one is not willing to make assumptions about the parameters of the populations from which one sampled. The most that one is willing to assume is that the data are a reasonable representation of the population from which they come. One then resamples from the data and draws inferences about the corresponding population and its parameters. The resulting confidence intervals have received the most theoretical study of any topic in bootstrap analysis. Roughly speaking, it is known that the bootstrap works in the i.i.d. case if and only if the central limit theorem holds for the random variable under consideration. For further discussion we refer the reader to the landmark paper by Giné and Zinn (1989). It is worth noticing that some special examples reveal that the conventional bootstrap, based on resamples of size n, breaks down; see, for example, Bose and Chatterjee (2001) and El Bantli (2004). We focus on a modified form of the bootstrap, known as the m out of n bootstrap, with a view to remedying the problem of inconsistency. The m out of n scheme modifies the conventional scheme by drawing bootstrap resamples of size \(m = o(n)\). See, for example, Bickel et al. (1997) for a review of this technique in a variety of contexts. For more recent references on the bootstrap, one can refer to Bouzebda (2010), Bouzebda and Limnios (2013), Bouzebda et al. (2018), Alvarez-Andrade and Bouzebda (2013, 2015, 2019) and the references therein. Denote by \({\varvec{\widehat{\theta }}}_m\) the M-estimator calculated from a bootstrap resample of size m. 
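A classical toy case of this inconsistency, not taken from the present paper, is the sample maximum of uniform data: an n out of n resample reproduces the sample maximum with probability tending to \(1-e^{-1}\approx 0.632\), so the conditional law of the bootstrapped maximum has an atom that the limit law lacks, whereas an m out of n resample with \(m = o(n)\) reproduces it with vanishing probability. The sketch below (our own illustration, with \(m \approx n^{2/3}\)) checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(0, 1, size=n)
m = int(n ** (2 / 3))  # resample size m = o(n), here m ~ n^(2/3)

def frac_boot_max_ties(x, size, n_boot=2000):
    # Fraction of resamples whose maximum equals the sample maximum.
    hits = sum(rng.choice(x, size=size, replace=True).max() == x.max()
               for _ in range(n_boot))
    return hits / n_boot

p_n = frac_boot_max_ties(x, n)   # near 1 - e^{-1}: conventional bootstrap
p_m = frac_boot_max_ties(x, m)   # near m/n, small: m out of n bootstrap
```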
Weak convergence in probability of the conditional distribution of \(r_{m}({\varvec{\widehat{\theta }}}_{m} - {\varvec{\theta }}_{n})\) to the distribution of \(\arg \max _g \mathbb {Z}(g )\) is established under essentially the same conditions as those for weak convergence of \(r_{n} ({\varvec{\theta }}_{n}-{\varvec{\theta }}_{0})\), provided that \(m = o(n)\), \(m \rightarrow \infty\) and \(a^{2}_m m^{-1/2} \log n/ \log (n/m + 1) = o(1)\) for a fixed sequence \(\{a_{m}\}\) depending on the size of the envelope for \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\). The asymptotic properties of \({\varvec{\theta }}_{n}\) have been established by, among many others, Bose and Chatterjee (2001) and El Bantli (2004), under appropriate concavity or differentiability conditions. Empirical process methods are instrumental tools for evaluating the large sample properties of estimators based on semiparametric models, including consistency, distributional convergence and validity of the bootstrap. In particular, modern empirical process theory provides a more general approach to the theoretical investigation of general M-estimators; see, for example, Dudley (1999), Kim and Pollard (1990), Pollard (1985), van de Geer (2000) and van der Vaart and Wellner (1996). Most results obtained thus far are, however, restricted to the cases where the Gaussian process \(\mathbb {Z}\) has either a quadratic mean function or a quadratic covariance function. In order to establish stronger results covering cases where the covariance and mean functions of \(\mathbb {Z}\) may take on more general structures, we will use the empirical process approach. Applications of the bootstrap to M-estimation have been investigated extensively in the literature. Relevant theoretical results concern mostly M-estimators with \(r_{n} = n^{1/2}\) and asymptotically Gaussian limits. The most common technique for studying bootstrapped M-estimators is linearization, which cannot be used in a nonstandard setting. 
Under standard regularity conditions, Edgeworth expansions for bootstrap distributions of finite-dimensional M-estimators were derived by Lahiri (1992). Under a weak form of differentiability condition, Arcones and Giné (1992) investigated bootstrapping finite-dimensional \(n^{1/2}\)-consistent M-estimators and established an almost sure bootstrap central limit theorem. An in-probability bootstrap central limit theorem for possibly infinite-dimensional Z-estimators was established by Wellner and Zhan (1996). In the setting of nonregular vector-valued M-estimators obtained from \(\mathbf {m}_{{\varvec{\theta }}}\) concave in \({\varvec{\theta }}\), Bose and Chatterjee (2001) investigated a weighted form of the bootstrap of which the m out of n bootstrap is a special case. M-estimation for linear models under nonstandard conditions was considered by El Bantli (2004), who proved that the m out of n bootstrap is consistent while the conventional bootstrap in general is not. The results of Bose and Chatterjee (2001) and El Bantli (2004) are restricted, under concavity and differentiability assumptions, to the case where \(\mathbb {Z}\) has a quadratic covariance function. Lee and Pun (2006) proved m out of n bootstrap consistency for vector-valued M-estimators under twice-differentiability of the process \(\mathbb {P}\mathbf {m}_{{\varvec{\theta }}}\), where \({\varvec{\theta }}\) may contain a subvector of nuisance parameters, in which case the process \(\mathbb {Z}\) has a quadratic mean function. Lee (2012) gave a general result on the m out of n bootstrap of M-estimators without the presence of a nuisance parameter. Under nonstandard conditions, Lee and Yang (2020) proposed a one-dimensional pivot derived from the criterion function, and proved that its distribution can be consistently estimated by the m out of n bootstrap, or by a modified version of the perturbation bootstrap. 
They provided a new method for constructing confidence regions that are transformation equivariant and have shapes driven solely by the criterion function.
The main purpose of the present work is to consider a general framework of non-smooth semiparametric M-estimators, extending the setting of Lee (2012) to \(\mathcal {B}\)-valued M-estimators in the presence of a nuisance parameter whose rate of convergence may differ from that of the parameter of interest. More precisely, we consider the m out of n bootstrapped version of the M-estimator investigated in Delsol and Van Keilegom (2020), where these authors showed that their M-estimator converges weakly to a process composed of a Gaussian process and a deterministic continuous function, a limit which is hard to evaluate for practical use. As a solution to this problem, we propose the m out of n bootstrap. We mention at this stage that the parameter \(\varvec{\theta }\), in the present paper, belongs to some Banach space, in contrast to the last mentioned work, where the parameter of interest is Euclidean. Hence, we restate the results of Delsol and Van Keilegom (2020) under more general conditions. The main aim of the present paper is to provide a first full theoretical justification of the m out of n bootstrap consistency of M-estimators with criterion functions that are non-smooth in the parameters, and to give the consistency rate together with the asymptotic distribution of the estimators of the parameter of interest \(\varvec{\theta }_0\). This requires the effective application of large sample theory techniques developed for empirical processes. The results of Lee (2012) are not directly applicable here, since the estimation procedures depend on nuisance parameters. These empirical process techniques are not only useful in their own right but also essential for the derivation of our asymptotic results.
The paper is organized as follows. Section 2 introduces the notation and assumptions. Section 3 states the main theorems. Though our main objective in the paper is theoretical, we provide in Sect. 4 Monte Carlo simulations to examine the method’s performance in practice. Some concluding remarks are given in Sect. 5. All proofs are gathered in Sect. 6. In the Appendix we apply our theorems and prove, as corollaries, new m out of n bootstrap consistency results for three examples.
2 Notation
We abuse notation slightly by identifying the underlying probability space \((\mathcal {S},\mathcal {A},\mathbb {P})\) with the product space \((\mathcal {S}^{\infty },\mathcal {A}^{\infty },\mathbb {P}^{\infty })\times (\mathcal {Z}, \mathcal {C} , \widetilde{P} )\). Now \(\mathbf {X}_{1},\dots , \mathbf {X}_{n}\) are equal to the coordinate projections on the first n coordinates. All auxiliary variables, assumed to be independent of the \(\mathbf {X}_{i}\), depend only on the last coordinate. We will use the usual notation of the empirical processes of van der Vaart and Wellner (1996). Let \(\mathbb {Q}\) denote some signed measure on \(\mathcal {S}\). Let \(\mathcal {F}\) be a class of measurable functions \(f : \mathcal {S} \rightarrow \mathbb {R}\). Define
For any \(r \ge 1\), denote by \(L^{r} (\mathbb {Q})\) the class of measurable functions \(f : \mathcal {S} \rightarrow \mathbb {R}\) with
where \(\mathbb {Q}\) is a probability measure. The \(L^{r}(\mathbb {Q})\)-norm \(\Vert \cdot \Vert _{\mathbb {Q},r}\) is defined by
for \(f \in L^{r}(\mathbb {Q})\). The essential supremum of \(f \in L^{\infty } (\mathbb {Q} )\) is denoted by \(\Vert f \Vert _{\mathbb {Q},\infty }\).
The covering number \(N(\epsilon ,\mathcal {F},L^{r}(\mathbb {Q}))\) of a function class \(\mathcal {F} \subset L^{r}(\mathbb {Q})\) is computed with respect to the \(L^{r}(\mathbb {Q})\)-norm for radius \(\epsilon > 0\). To be more precise, \(N(\epsilon ,\mathcal {F},L^{r}(\mathbb {Q}))\) is the minimum number of balls \(\{g : \Vert g-h\Vert _{\mathbb {Q},r} < \epsilon \}\) of radius \(\epsilon\) covering \(\mathcal {F}\).
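For a finite collection of functions evaluated on a sample, a covering number with respect to an empirical \(L^{2}\)-norm can be bounded by a greedy covering whose centers belong to the class. The sketch below is our own illustration (the class of half-line indicators, the sample and all sizes are assumptions); the greedy count is an upper bound for the covering number at radius \(\epsilon\).

```python
import numpy as np

rng = np.random.default_rng(3)

def covering_number(F, eps):
    # Greedy upper bound on N(eps, F, L^2(Q_n)) for a finite class F,
    # represented as an array of function values on the n sample points,
    # with Q_n the uniform (empirical) measure on those points.
    n = F.shape[1]
    w = np.full(n, 1 / n)
    uncovered = np.ones(F.shape[0], dtype=bool)
    balls = 0
    while uncovered.any():
        center = F[np.flatnonzero(uncovered)[0]]
        dist = np.sqrt(((F - center) ** 2 * w).sum(axis=1))  # L^2(Q_n)
        uncovered &= dist >= eps   # points within eps of the center are covered
        balls += 1
    return balls

# Indicator functions x -> 1{x <= t} evaluated on a sample: a VC class,
# whose covering numbers grow only polynomially in 1/eps.
x = np.sort(rng.uniform(size=200))
ts = np.linspace(0, 1, 500)
F = (x[None, :] <= ts[:, None]).astype(float)
N_eps = covering_number(F, eps=0.1)
```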
For some random element \(\mathbf {Z}\), the probability measure induced by \(\mathbf {Z}\) is denoted by \(\mathbb {P}_\mathbf {Z}\), conditional on all other variables. The empirical process is defined to be
The outer and inner probability measures derived from \(\mathbb {P}\) are designated by \(\mathbb {P}^{*}\) and \(\mathbb {P}_{*}\), respectively. Outer and inner probability measures are to be understood in the sense used in the monograph by van der Vaart and Wellner (1996), in particular the definitions on their page 6. Let T be any map from the underlying probability space to the extended real line \(\overline{\mathbb {R}}\). The minimal measurable majorant and maximal measurable minorant of T are denoted by \(T^{*}\) and \(T_{*}\), respectively. For any subset B of the probability space, with similar notation, its indicator function satisfies \(\mathbbm {1}_{B^{*}} = \mathbbm {1}^{*}_{B}\) and \(\mathbbm {1}_{B_{*}} = (\mathbbm {1}_{B})_{*}\). We draw randomly, with replacement from \(\mathcal {X}\), independent bootstrap observations \(\mathbf {Y}_{1},\ldots , \mathbf {Y}_{m}\). Let us define
so that
where \(mW = m(W_{1},\ldots , W_{n})\) is a multinomial vector with m trials and parameters \((n^{-1},\ldots ,n^{-1})\), independent of the \(\mathbf {X}_{i}\). The probability measure induced by bootstrap resampling conditional on \(\mathcal {X}\) is denoted by \(P_W\). Let us define the bootstrapped empirical process by
Let \(T_{n}\) denote a sequence of maps. Let \(\mathbb {D}\) be a metric space. Let T be a \(\mathbb {D}\)-valued measurable map from the underlying probability space. If \(T_{n}\) is bounded in outer probability, we will write \(T_{n} = O_{\mathbb {P}^{*}} (1)\); in a similar way, if \(T_{n}\) converges in outer probability to zero, we will write \(T_{n} = o_{\mathbb {P}^{*}} (1)\). Assume that
If (1) holds along almost every sequence \(\mathbf {X}_{1},\mathbf {X}_{2},\ldots\), we write \(T_{n} = O_{\mathbb {P}_{W}^{*}} (1)\) a.s. (almost surely). If for any subsequence \(\{T_{n^{\prime }}\}\), there exists a further subsequence \(\{T_{n^{\prime \prime }} \}\) with \(T_{n^{\prime \prime }} = O_{\mathbb {P}_{W}^{*}} (1)\) a.s., we write \(T_{n} = O_{\mathbb {P}_{W}^{*}} (1)\) i.p. (in probability). We write \(T_{n} =o_{\mathbb {P}_{W}^{*}}(1)\) a.s., if, for any \(\epsilon > 0\), we have
almost surely. We write \(T_{n} =o_{\mathbb {P}_{W}^{*}}(1)\) i.p. in the case when the convergence (2) holds in probability. The weak convergence of \(T_{n}\) to T, in the sense of Hoffmann-Jørgensen (1991), is denoted by \(T_{n} \Rightarrow T\). The space of real-valued functions on \(\mathbb {D}\) bounded by 1 in the Lipschitz norm is denoted by \(BL_{1}(\mathbb {D})\). The conditional weak convergence of \(T_{n}\) to a separable T in \(\mathbb {D}\) is characterized by the condition
If the convergence (3) holds in outer probability, we will write \(T_{n} \Rightarrow T\) i.p.; in a similar way, if it holds outer almost surely, we write \(T_{n} \Rightarrow T\) a.s.
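The multinomial representation of the bootstrap weights introduced above can be checked numerically: drawing \(\mathbf {Y}_{1},\ldots ,\mathbf {Y}_{m}\) with replacement from \(\mathcal {X}\) and averaging \(f(\mathbf {Y}_{j})\) coincides exactly with the weighted sum \(m^{-1}\sum _{i}(mW)_{i}f(\mathbf {X}_{i})\). A minimal sketch, in which the sample sizes and the function f are our own choices:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 50, 20
x = rng.normal(size=n)
f = np.cos  # any measurable f

# Drawing Y_1,...,Y_m with replacement from the sample is equivalent to
# weighting the X_i by a Multinomial(m; 1/n,...,1/n) count vector mW:
idx = rng.integers(0, n, size=m)            # m uniform draws of indices
mW = np.bincount(idx, minlength=n)          # multinomial counts, sum = m
lhs = f(x[idx]).mean()                      # bootstrap mean via the resample
rhs = (mW * f(x)).sum() / m                 # same functional via weights
assert np.isclose(lhs, rhs)
```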
Define \(\mathcal {M}_{S,\mathcal {H}} = \{\mathbf {m}_{{\varvec{\theta }},h} : {\varvec{\theta }} \in S, h \in {\mathcal {H}}\} \subset \mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\), where \(S \subset {\varvec{\Theta }}\). For any \(\delta , \delta _{1},\eta > 0\), let us denote by \(\mathcal {M}_{\delta ,\delta _{1}}(\eta )\) and \(\mathcal {M}_{\delta ,\delta _{1}}\) the classes of functions
The envelope function of \(\mathcal {M}_{\delta ,\delta _{1}}\) is denoted by \(M_{\delta ,\delta _{1}}\). For each \(\varvec{\psi } \in \mathcal {B}\) and \(h \in \mathcal {H}\) with \({\varvec{\theta }}_{0} +\varvec{\psi } \in {\varvec{\Theta }}\), define \(\widetilde{\mathbf {m}}_{\varvec{\psi },h} =\mathbf {m}_{{\varvec{\theta }}_{0}+\varvec{\psi },h}- \mathbf {m}_{{\varvec{\theta }}_{0},h}\). For any \(\mathcal {T} \subset \mathcal {B}\), the class of bounded functions from \(\mathcal {T}\) to \(\mathbb {R}\) is denoted by \(l^{\infty }(\mathcal {T})\), equipped with the sup norm. In the sequel, for all \(x\in \mathcal {S}\) and closed bounded \(K \subset {\varvec{\Theta }}\), assume that
In the sequel, we denote by C a positive constant that may be different from line to line. The choice of the bootstrap sample size m is theoretically governed by (AB1) and (C4). These conditions are typically satisfied by taking \(m \propto n^{c}\) for some sufficiently small \(c \in (0, 1)\). Empirical determination of m has long been an important problem which has not yet been fully resolved; for more comments, see Remark 3.12 below.
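As a purely numerical illustration of the preceding remark, the sketch below takes \(m = \lfloor n^{c}\rfloor\) with \(c = 0.7\) and, assuming for illustration \(a_{m} = \log m\), evaluates the rate quantity \(a_{m}^{2} m^{-1/2} \log n / \log (n/m+1)\) from the Introduction along a grid of sample sizes; the specific choices of c, \(a_m\) and the grid are ours.

```python
import numpy as np

# Check numerically that m = floor(n^c), c in (0,1), gives m -> infinity
# and m/n -> 0, and that (with a_m = log m, an illustrative choice) the
# quantity a_m^2 m^{-1/2} log n / log(n/m + 1) decreases along the grid.
c = 0.7
ns = np.array([10**3, 10**4, 10**5, 10**6, 10**7], dtype=float)
ms = np.floor(ns ** c)
a_m = np.log(ms)
rate = a_m**2 * ms**-0.5 * np.log(ns) / np.log(ns / ms + 1)
```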
3 Main Results
In this section, we present four main theorems, each of independent interest, which lead eventually to the weak convergence of \(r_{n}({\varvec{\theta }}_n - {\varvec{\theta }}_{0})\) and to in-outer-probability m out of n bootstrap consistency in the context of general M-estimation, by applying the argmax theorems in van der Vaart and Wellner (1996) and in Lee (2012), respectively. Let us recall the basic idea. If the argmax functional is continuous with respect to some metric on the space of criterion functions, then convergence in distribution of the criterion functions implies convergence in distribution of their points of maximum, the M-estimators, to the point of maximum of the limit criterion function. First, we establish consistency of \({\varvec{\theta }}_{n}\) and \({\varvec{\theta }}_{m}\) for \({\varvec{\theta }}_{0}\) in the following theorem.
3.1 Consistency
In our analysis, we consider the following assumptions. Throughout, \(\{r_{n}\}\) is a sequence of positive constants with \(r_{n} \uparrow \infty\), \(\nu > 1\) is fixed, and \(\ell : (0, \infty ) \rightarrow [0, \infty )\) is a function slowly varying at \(\infty\).
-
(A1) \(\mathbb {P}\left( \widehat{h}\in \mathcal {H}\right) \longrightarrow 1\) as \(n \longrightarrow \infty\) and \(d_{\mathcal {H}}(\widehat{h},h^{0}){\mathop {\longrightarrow }\limits ^{\mathbb {P}^{*}}}0\).
-
(A2) \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\) is Glivenko-Cantelli:
$$\Vert \mathbb {P}_{n}-\mathbb {P}\Vert _{\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}}=o_{\mathbb {P}^{*}} (1).$$ -
(A3) \(\lim _{d_{\mathcal {H}}(h,h_{0})\rightarrow 0}\sup _{\varvec{\theta }\in \varvec{\Theta }}|\mathbb {P}\mathbf {m}_{\varvec{\theta },h}-\mathbb {P}\mathbf {m}_{\varvec{\theta },h^{0}}|=0\).
-
(A4) The parameter of interest \({\varvec{\theta }}_{0}\) lies in the interior of \({\varvec{\Theta }}\) and satisfies, for every open \(\mathcal {O}\) containing \({\varvec{\theta }}_{0}\),
$$\mathbb {P}\mathbf {m}_{{\varvec{\theta }}_{0},h^{0}} > \sup _{{\varvec{\theta }}\notin \mathcal {O}} \mathbb {P}\mathbf {m}_{{\varvec{\theta }},h^{0}}.$$ -
(A5) The M-estimator \({\varvec{\theta }}_{n}\) satisfies \(\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }}_{n},\widehat{h}} \ge \mathbb {P}_{n} \mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}-R_{n}\), with
$$r_{n}^{\nu } \ell (r_{n} )R_{n} = o_{\mathbb {P}^{*}}(1).$$ -
(AB1) \(m=m_{n}\rightarrow \infty\), \(m=o(n)\) and \(r_{m}^{\nu }\ell (r_{m})=o\left( r_{n}^{\nu }\ell (r_{n})\right)\).
-
(AB2) \(d_{\mathcal {H}}(\widehat{h}_{m},h^{0})=o_{\mathbb {P}_W^{*}} (1)\) i.p.
-
(AB3) The m out of n bootstrap M-estimator \({\varvec{\theta }}_{m}\) satisfies \(\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }}_{m},\widehat{h}_{m}} \ge \widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}_{m}}-\widehat{R}_{n}\), with
$$r_{m}^{\nu }\ell (r_{m})\widehat{R}_{n} = o_{\mathbb {P}_W^{*}} (1),~ \text{ i.p. }$$
Remark 3.1
-
(i)
Assumption (A2) is fulfilled under some entropy and moment conditions; see, for example, Theorem 2.4.3 (p. 123) of van der Vaart and Wellner (1996).
-
(ii)
Assumption (A3) holds automatically if, for example, there exists a function \(\mathfrak {G}(\cdot )\) such that, for any h in a neighborhood of \(h^{0}\) and any \(\varvec{\theta } \in \varvec{\Theta }\), we have:
$$|\mathbf {m}(\mathbf {X}_{i}, \varvec{\theta }, h)-\mathbf {m}(\mathbf {X}_{i}, \varvec{\theta }, h^{0})| \le \mathfrak {G}(\mathbf {X}_{i})d_{\mathcal {H}}(h,h^{0}).$$The function \(\mathfrak {G}(\cdot )\) satisfies
$$\mathbb {P}\mathfrak {G}(\mathbf {X}) <\infty ,$$or if the function \(h \mapsto \mathbf {m}(x,\varvec{\theta },h)\) is Lipschitz uniformly over x and \(\varvec{\theta }\).
-
(iii)
Assumptions (A5) and (AB3) are trivially fulfilled when
$$\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }}_{n},\widehat{h}} \ge \sup _{\varvec{\theta } \in \varvec{\Theta }} \mathbb {P}_{n}\mathbf {m}_{\varvec{\theta },\widehat{h}}-R_{n},$$and
$$\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }}_{m},\widehat{h}_{m}} \ge \sup _{\varvec{\theta } \in \varvec{\Theta }}\widehat{\mathbb {P}}_{m}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}-\widehat{R}_{n},$$respectively, which allows one to deal with approximations of the values that actually maximize \(\varvec{\theta } \mapsto \mathbb {P}_{n}\mathbf {m}_{\varvec{\theta },\widehat{h}}\) and \(\varvec{\theta } \mapsto \widehat{\mathbb {P}}_{m}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}\), respectively.
-
(iv)
Assumption (AB2) poses no difficulty in practice and is met trivially by, for example, setting \(\widehat{h}_{m}=\widehat{h}\).
-
(v)
For finite-dimensional \(\theta\), (A5) and (AB3) can be achieved by a global maximization of the processes \(\mathbb {P}_{n} \mathbf {m}_{\varvec{\theta },\widehat{h}}\) and \(\widehat{\mathbb {P}}_{m} \mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}\), in which case \(R_{n}=\widehat{R}_{n}=0\). For infinite-dimensional \(\varvec{\theta }\), the maximization of these processes may be very complex or not practically feasible. To circumvent this, sophisticated algorithms are needed to construct \(\varvec{\theta }_{n}\) and \(\varvec{\theta }_{m}\) fulfilling (A5) and (AB3).
-
(vi)
Finally, it is possible to replace assumptions (A2) and (A4) by the following:
-
(A1’) For every compact \(K \subset \varvec{\Theta }\), \(\mathcal {M}_{K,\mathcal {H}}\) is Glivenko-Cantelli.
-
(A2’) The map \(\theta \mapsto \mathbb {P} \mathbf {m}_{\varvec{\theta },h^{0}}\) is upper semicontinuous with a unique maximum at \(\varvec{\theta }_{0}\).
-
(A3’) \(\varvec{\theta }_{n}\) is uniformly tight.
-
(AB1’) \(\varvec{\theta }_{m}\) is uniformly tight i.p.
Theorem 3.2
-
(i)
Assume (A1)-(A5). Then
$${\varvec{\theta }}_{n} - {\varvec{\theta }}_{0} = o_{\mathbb {P}^{*}}(1).$$ -
(ii)
Assume (A2), (A3), (A4) and (AB1)-(AB3). Then
$${\varvec{\theta }}_{m} - {\varvec{\theta }}_{0}= o_{\mathbb {P}_W^{*}} (1)~~ {i.p.}$$
Note that the result of part (i) holds if we replace (A2) and (A4) by (A1’)-(A3’), and the result of part (ii) holds if we replace (A2) and (A4) by (A1’), (A2’) and (AB1’).
In the sequel, we refer to the sets of assumptions implying parts (i) and (ii) as (C) and (CP), respectively. Next, we give the set of assumptions needed to identify the rates of convergence of \({\varvec{\theta }}_{n}\) and \({\varvec{\theta }}_{m}\) to \({\varvec{\theta }}_{0}\), an important step in studying the weak convergence of these estimators.
Remark 3.3
We highlight that the parameter of interest \(\varvec{\theta }\) is not restricted to belong to a Euclidean space, as it is in Delsol and Van Keilegom (2020). More precisely, we consider the general framework in which \(\varvec{\theta } \in \varvec{\Theta }\), where \(\varvec{\Theta }\) is a subset of some Banach space \(\mathcal {B}\). Notice that result (i) of Theorem 3.2 is somewhat more general than its analogue stated in the last reference, since the conditions imposed in our setting are more general and extend those of Lee (2012) to semiparametric models.
3.2 Rates of Convergence
Let us introduce the following assumptions:
-
(B1) \(v_{n}d_{\mathcal {H}}(\widehat{h},h^{0})=O_{\mathbb {P}^{*}}(1)\) for some \(v_{n}\longrightarrow \infty\).
-
(B2) For all \(\delta _{1}>0\), there exist \(\varvec{\alpha }<\nu\), \(K>0\) and \(\delta _{0}>0\) such that, for all \(n \in \mathbb {N}\), there exists a function \(\varphi\) for which \(\delta \mapsto \frac{\varphi (\delta )}{\delta ^{\varvec{\alpha }}}\) is decreasing on \((0,\delta _{0}]\), \(r_{n}^{\nu }\ell (r_{n})n^{-1/2}\varphi (1/r_{n}) \le C\) for n sufficiently large and some positive constant C, and, for all \(\delta \le \delta _{0}\),
$$\mathbb {P}^{*}\left[ \sup \limits _{\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta , d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{n}}}|\mathbb {G}_n\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|\right] \le K\varphi (\delta ).$$ -
(B3) There exist \(\eta _{0}>0\), \(C>0\) and two positive and non-decreasing functions \(\varvec{\psi }_{1}\) and \(\varvec{\psi }_{2}\) on \((0,\eta _{0}]\) such that for all \(\varvec{\theta } \in \varvec{\Theta }\) satisfying \(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta _{0}\):
$$\mathbb {P}\widetilde{\mathbf {m}}_{{\varvec{\theta }}- {\varvec{\theta }}_{0},\widehat{h}} \le W_{n}\varvec{\psi }_{1}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ) -(C+o_{\mathbb {P}^{*}}(1))\varvec{\psi }_{2}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ).$$Moreover, there exist \(\beta _{2}>\alpha , \beta _{1}<\beta _{2}, \delta _{0}>0\) such that \(\delta \mapsto \varvec{\psi }_{1}(\delta ) \delta ^{-\beta _{1}}\) is non-increasing and \(\delta \mapsto \varvec{\psi }_{2}(\delta ) \delta ^{-\beta _{2}}\) is non-decreasing on \(\left( 0, \delta _{0}\right] ,\) and such that, for some sequence \(r_{n} \rightarrow \infty\),
$$\varvec{\psi }_{1}\left( r_{n}^{1-\nu }\ell ^{-1}(r_{n})\right) W_{n}=O_{\mathbb {P}^{*}}\left( \varvec{\psi }_{2}\left( r_{n}^{1-\nu }\ell ^{-1}(r_{n})\right) \right) .$$See van der Vaart and Wellner (1996) for the definition of \(\mathbb {P}\)-measurability.
-
(BB1) \(v_{m}d_{\mathcal {H}}(\widehat{h}_{m},h^{0})=O_{\mathbb {P}_W^{*}}(1)\) i.p. for some \(v_{m}\longrightarrow \infty\).
-
(BB2) With the same notation as in assumption (B2), replacing \(r_{n}\) (resp. \(v_{n}\)) by \(r_{m}\) (resp. \(v_{m}\)), and with assumption (AB1) in mind, we have
$$\mathbb {P}^{*}\mathbb {P}_{W}^{*}\left[ \sup \limits _{\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta , d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{m}}}|\widehat{\mathbb {G}}_m\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|\right] \le K\varphi (\delta ).$$ -
(BB3) With the same notation as in assumption (B3), replacing \(r_{n}\) by \(r_{m}\), and with assumption (AB1) in mind, we have:
$$\mathbb {P}\widetilde{\mathbf {m}}_{{\varvec{\theta }}- {\varvec{\theta }}_{0},\widehat{h}_{m}} \le W_{m}\varvec{\psi }_{1}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ) -(C+o_{\mathbb {P}^{*}}(1))\varvec{\psi }_{2}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ),$$where for some sequence \(r_{m} \rightarrow \infty\),
$$\varvec{\psi }_{1}\left( r_{m}^{1-\nu }\ell ^{-1}(r_{m})\right) W_{m}=O_{\mathbb {P}_W^{*}} \left( \varvec{\psi }_{2}\left( r_{m}^{1-\nu }\ell ^{-1}(r_{m})\right) \right) , ~~\text{ i.p. }$$
Remark 3.4
-
(i)
Assumption (B1) is a high-level assumption. Such a condition on the nuisance parameter \(\widehat{h}\) can be obtained from many asymptotic results. In the present paper, we are primarily concerned with the cases where the convergence rate of the M-estimator of \(\varvec{\theta }\) is not affected by the estimation of the nuisance parameter h.
-
(ii)
Assumption (B2) is fulfilled if we assume that for any \(\varvec{x}\) the function \((\varvec{\theta }, h) \rightarrow \mathbf {m}(\varvec{x}, \varvec{\theta }, h(\varvec{x}, \varvec{\theta }))-\mathbf {m}\left( \varvec{x}, \varvec{\theta }_{0}, h\left( \varvec{x}, \varvec{\theta }_{0}\right) \right)\) is uniformly bounded on an open neighborhood of \(\left( \varvec{\theta }_{0}, h^{0}\right)\), i.e., on
$$\left\{ (\varvec{\theta }, h): \Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta _{0}, d_{\mathcal {H}}\left( h, h^{0}\right) \le \delta _{1}^{\prime }\right\} ,$$for some \(\delta _{0}, \delta _{1}^{\prime }>0.\) We consider the class \(\mathcal {M}_{\delta ,\delta ^{\prime }_{1}}\) for any \(\delta \le \delta _{0}\) and its envelope \(M_{\delta , \delta _{1}^{\prime }}.\) For any \(\delta _{1}\), we have, for n large enough, \(\delta _{1} v_{n}^{-1} \le \delta _{1}^{\prime }\). Then, by the entropy conditions on \(\mathcal {M}_{\delta ,\delta _{1}^{\prime }},\)
$$\begin{aligned} \int _{0}^{1}\sup _{\delta<\delta _{0}} \sup _{\mathbb Q} \sqrt{1+\log N\left( \epsilon \left\| M_{\delta , \delta _{1}^{\prime }}\right\| _{\mathbb {L}_{2}\left( \mathbb {Q}\right) }, \mathcal {M}_{\delta ,\delta _{1}^{\prime }}, \mathbb {L}_{2}(\mathbb {Q})\right) } d \epsilon <+\infty , \end{aligned}$$(4)where the second supremum is taken over all finitely discrete probability measures \(\mathbb {Q}\) on \(\mathcal {S}\). We obtain
$$\mathbb {P}^{*}\left[ \sup \limits _{\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta , d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{n}}}|\mathbb {G}_n\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|\right] \le K_{1} \sqrt{\mathbb {P}^{*}\left[ M_{\delta , \delta _{1}^{\prime }}^{2}\right] },$$see Theorems 2.14.1 and 2.14.2 in van der Vaart and Wellner (1996). Then the last part of (B2) holds if \(\varphi (\delta )\) can be chosen such that
$$\begin{aligned} \exists K_{0}, \forall \delta \le \delta _{0}: \sqrt{\mathbb {P}^{*}\left[ M_{\delta , \delta _{1}^{\prime }}^{2}\right] } \le K_{0}\varphi (\delta ). \end{aligned}$$(5)Note that all the rates of convergence \(r_{n}\) encountered in the literature, for smooth or non-smooth criterion functions, satisfy the last requirement in assumption (B2).
-
(iii)
For simplicity, we choose \(\varvec{\psi }_{1}(x)=Id(x)=x\) and \(\varvec{\psi }_{2}(x)=x^{2}\) in assumption (B3); it then holds under the following conditions:
-
(a)
\(\varvec{\Theta } \subset \mathcal {B}\), where \(\mathcal {B}\) is a Banach space.
-
(b)
There exists \(\delta _{2}>0\) such that for any h satisfying \(d_{\mathcal {H}}(h,h^{0})\le \delta _{2}\), the function \(\varvec{\theta }\mapsto \mathbb {P}(\mathbf {m}(\mathbf {X}, \varvec{\theta }, h))\) is twice Fréchet differentiable on an open neighborhood of \(\varvec{\theta }_{0}\),
$$\begin{aligned} \lim \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \rightarrow 0}\sup \limits _{d_{\mathcal {H}}(h,h^{0}) \le \delta _{2}}\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert ^{-2}&\left| \mathbb {P}\mathbf {m}_{\varvec{\theta },h}-\mathbb {P}\mathbf {m}_{\varvec{\theta }_{0},h}-\Gamma (\varvec{\theta }_{0},h)(\varvec{\theta }-\varvec{\theta }_{0})\right. \\&\left. +\frac{1}{2}\Lambda (\varvec{\theta }_{0},h)(\varvec{\theta }-\varvec{\theta }_{0},\varvec{\theta }-\varvec{\theta }_{0})\right| =0. \end{aligned}$$For more detail see (Allaire 2005, p.306).
-
(c)
\(\Gamma (\varvec{\theta }_{0},h)(\cdot )\) is a continuous linear form, with \(\Vert \Gamma (\varvec{\theta }_{0},\widehat{h})\Vert =O_{\mathbb {P}^{*}}\left( \frac{1}{r^{\nu -1}_{n}\ell (r_{n})}\right)\) and \(\Gamma (\varvec{\theta }_{0},h^{0})=0\).
-
(d)
\(\Lambda (\varvec{\theta }_{0},h)(\cdot ,\cdot )\) is a bilinear form such that \(\Lambda (\varvec{\theta }_{0},h^{0})\) is bounded, symmetric, positive definite and elliptic (i.e. \(\Lambda (\varvec{\theta }_{0},h^{0})(u,u)\ge C\Vert u\Vert ^{2}\)), and \(h\mapsto \Lambda (\varvec{\theta }_{0},h)\) is continuous at \(h^{0}\), i.e.,
$$\lim _{d_{\mathcal {H}}\left( h, h^{0}\right) \rightarrow 0} \sup _{u \in \mathbb {R}^{k},\Vert u\Vert =1}\left\| \left( \Lambda \left( \varvec{\theta }_{0}, h\right) -\Lambda \left( \varvec{\theta }_{0}, h^{0}\right) \right) u\right\| =0.$$From these assumptions and the boundedness of the bilinear form, it follows that, when \(d_{\mathcal {H}}(\widehat{h},h^{0})\le \delta _{2}\),
$$\begin{aligned} \mathbb {P}\mathbf {m}_{\varvec{\theta },\widehat{h}}-\mathbb {P}\mathbf {m}_{\varvec{\theta }_{0},\widehat{h}}&= \Gamma (\varvec{\theta }_{0},\widehat{h})(\gamma _{\varvec{\theta }})-\frac{1}{2}\Lambda (\varvec{\theta }_{0},h^{0})(\gamma _{\varvec{\theta }},\gamma _{\varvec{\theta }})+o_{\mathbb {P}^{*}}(\Vert \gamma _{\varvec{\theta }}\Vert ^{2})+o(\Vert \gamma _{\varvec{\theta }}\Vert ^{2})\\&\le W_{n}\Vert \gamma _{\varvec{\theta }}\Vert -C\Vert \gamma _{\varvec{\theta }}\Vert ^{2}+\beta _{n}(\Vert \gamma _{\varvec{\theta }}\Vert ), \end{aligned}$$where \(\gamma _{\varvec{\theta }}=\varvec{\theta }-\varvec{\theta }_{0}\). So (B3) holds with
$$W_{n}=\left\| \Gamma \left( \varvec{\theta }_{0}, \widehat{h}\right) \right\| .$$Note that when \(\varvec{\Theta } \subset \mathbf {E}\), where \(\mathbf {E}\) is some Euclidean space, the Fréchet derivatives \(\Gamma (\varvec{\theta }_{0},h)\) and \(\Lambda (\varvec{\theta }_{0},h)\) become the usual derivatives, i.e., the gradient and the Hessian matrix, respectively, as given in Remark 2(v) of Delsol and Van Keilegom (2020).
-
(iv)
Assumption (BB1) poses no difficulty in practice and is met trivially by, for example, setting \(\widehat{h}_{m}=\widehat{h}\), as in Remark 3.1(iv).
-
(v)
Assumption (BB2) is a high-level assumption. It serves to control the modulus of continuity of the bootstrapped empirical process, which is needed to derive the rate of convergence of the bootstrapped estimator \(\varvec{\theta }_{m}\). Note that in the n out of n bootstrap setting this condition is given in Ma and Kosorok (2005) and, more generally for exchangeable bootstrap weights, in Lemma 1 of Cheng and Huang (2010). In our setting, it is fulfilled under some entropy conditions. For brevity, with the same notation as in (ii), let \(\widetilde{N}_{1}, \widetilde{N}_{2}, \ldots\) be i.i.d. symmetrized Poisson variables with parameter \(\frac{1}{2} m / n\), and let \(\varepsilon _{1}, \varepsilon _{2}, \ldots\) be i.i.d. Rademacher variables independent of \(\widetilde{N}_{1}, \widetilde{N}_{2}, \ldots\) and \(\mathbf{X}_{1}, \mathbf{X}_{2}, \ldots\). Denote by \(R=\left( R_{1}, \ldots , R_{n}\right)\) a random permutation of \(\{1,2, \ldots , n\}\), independent of all other variables. Let us introduce
$$\mathbb {P}_{k}^{R}=k^{-1} \sum _{i=1}^{k}\delta _{\mathbf {X}_{R_{i}}},$$for each \(k \in \{1, \ldots , n\}.\) By Lemma 3.6.6 of van der Vaart and Wellner (1996), together with the paragraph preceding it and the sub-Gaussian inequality for Rademacher processes, we obtain
$$\begin{aligned} \mathbb {P}_{W}^{*}\left\| \widehat{\mathbb {G}}_{m}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}} \le 4 \mathbb {P}_{\widetilde{N}}\left\| \frac{1}{\sqrt{m}} \sum _{i=1}^{n} |\widetilde{N}_{i}|\varepsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}. \end{aligned}$$(6)Applying now Lemma 3.6.7 of van der Vaart and Wellner (1996) to the right-hand side of (6) with \(n_{0}\) set to 1, we get
$$\begin{aligned} \begin{aligned} \mathbb {P}_{W}^{*}\left\| \widehat{\mathbb {G}}_{m}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}&\le 4 \mathbb {P}_{\widetilde{N}}\left\| \frac{1}{\sqrt{m}} \sum _{i=1}^{n} |\widetilde{N}_{i}|\varepsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}\\&\le \sqrt{\frac{n}{m}}\left\| \widetilde{N}_{i}\right\| _{2,1} \max _{1 \le k \le n}\mathbb {P}_{R} \mathbb {P}_{\varepsilon }\left\| \frac{1}{\sqrt{k}} \sum _{i=1}^{k} \varepsilon _{i} \delta _{\mathbf {X}_{R_{i}}}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}^{*}\\&\le C \max _{1 \le k \le n} \mathbb {P}_{R}\left( \mathbb {P}_{k}^{R} M^{2}_{\delta , \delta _{1}^{\prime }}\right) ^{1 / 2}\\&\le C\left( \mathbb {P}_{n}M^{2}_{\delta , \delta _{1}^{\prime }} \right) ^{1 / 2}, \end{aligned} \end{aligned}$$(7)where \(C>\sqrt{\frac{n}{m}}\left\| \widetilde{N}_{i}\right\| _{2,1}\); see Problem 3.6.3 of van der Vaart and Wellner (1996). By Jensen's inequality, the outer expectation of the left-hand side of (7) is bounded by \(C\sqrt{\mathbb {P}[M^{2}_{\delta ,\delta ^{'}_{1}}]},\) for every \(\delta < \delta _{1}\). The inequality in assumption (BB2) then holds for every \(n \in \mathbb {N}\), since the variables we consider are i.i.d.
-
(vi)
Finally, for assumption (BB3), the same discussion as in (iii) applies; only the choice \(W_{n}=\left\| \Gamma \left( \varvec{\theta }_{0}, \widehat{h}\right) \right\|\) becomes
$$W_{m}=\left\| \Gamma \left( \varvec{\theta }_{0}, \widehat{h}_{m}\right) \right\| ,$$with \(W_{m}=O_{\mathbb {P}_W^{*}}\left( \frac{1}{r^{\nu -1}_{m}\ell (r_{m})}\right)\) i.p.
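The Poissonization device used in (v) can be checked numerically: the m out of n bootstrap weights are multinomially distributed, and each coordinate is close in mean and variance to a Poisson variable with parameter m/n. The following Python sketch is our own illustration (not code from the paper); all variable names are ours:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, B = 200, 50, 20000

# m out of n bootstrap weights: (W_1,...,W_n) ~ Multinomial(m, (1/n,...,1/n)),
# so W_i counts how often observation X_i appears in a resample of size m.
W = rng.multinomial(m, np.full(n, 1.0 / n), size=B)

# Marginally, W_i has mean m/n and variance (m/n)(1 - 1/n); for large n this
# matches a Poisson(m/n) variable, which is the basis of the Poissonization
# bound sketched in Remark 3.4(v).
mean_w, var_w = W[:, 0].mean(), W[:, 0].var()
print(mean_w, var_w)
```

The closeness of the multinomial weights to independent Poisson weights is what allows the bound (6) in terms of symmetrized Poisson variables.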
Theorem 3.5
-
(i)
Assume (C) and (B1)-(B3). Then
$$r_{n}({\varvec{\theta }}_{n} - {\varvec{\theta }}_{0}) = O_{\mathbb {P}^{*}}(1).$$ -
(ii)
Assume (CP) and (BB1)-(BB3). Then
$$r_{m}({\varvec{\theta }}_{m}-{\varvec{\theta }}_{0}) = O_{\mathbb {P}_W^{*}} (1)~~\text{ i.p. }$$
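In practice, part (ii) licenses the following resampling scheme: recompute the M-estimator on B resamples of size m drawn with replacement, and rescale by \(r_{m}\). The sketch below is our own illustrative code (not from the paper), using the sample median, an M-estimator for the least-absolute-deviation criterion with no nuisance parameter, with \(m=n^{2/3}\) and \(r_{m}=\sqrt{m}\):

```python
import numpy as np

def m_out_of_n(data, estimator, m, B, rng):
    """Recompute an M-estimator on B resamples of size m drawn with
    replacement from the n original observations."""
    n = len(data)
    out = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=m)   # m out of n, with replacement
        out[b] = estimator(data[idx])
    return out

# Toy usage: the sample median, with m = n^{2/3} and rate r_m = sqrt(m).
rng = np.random.default_rng(1)
x = rng.normal(size=1000)                  # theta_0 = 0
theta_n = np.median(x)
m = int(len(x) ** (2 / 3))
boot = m_out_of_n(x, np.median, m, B=2000, rng=rng)
scaled = np.sqrt(m) * (boot - theta_n)     # approximates the law of r_n(theta_n - theta_0)
print(scaled.std())                        # roughly sqrt(pi/2), the asymptotic sd of the median
```

The empirical distribution of `scaled` then serves as an approximation of the limit law of \(r_{n}({\varvec{\theta }}_{n}-{\varvec{\theta }}_{0})\).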
Remark 3.6
Result (i) of this theorem still holds when \(\varvec{\theta }\) belongs to a Banach space, which generalizes Theorem 2 of Delsol and Van Keilegom (2020), where the authors are interested in the finite-dimensional case. Note that the choice \(\nu =2\) and \(\ell \equiv 1\) in assumptions (B2) and (B3) reduces them to assumptions (B2) and (B3), respectively, of the last reference.
3.3 Weak Convergence
We start this section by introducing some notation. For any \(\varvec{\theta } \in \varvec{\Theta }\) and \(h \in \mathcal {H}\), let \(\mathcal {K}=\{ \varvec{\gamma }\in \mathcal {B} : \Vert \varvec{\gamma }\Vert \le K \}\) for \(K > 0\). Define, for sufficiently large n and for \(\varvec{\gamma } \in \mathcal {K}\), the empirical processes
which can be treated as random elements in \(\ell ^{\infty }(\mathcal {K})\). Also define, for any \(\delta >0\),
Finally, for any \(p \in \mathbb {N}\) and any \(f : \varvec{\Theta } \longrightarrow \mathbb {R}\) and for any \(\varvec{\gamma }=(\varvec{\gamma }_{1},\ldots ,\varvec{\gamma }_{p}) \in \varvec{\Theta }^{p}\), denote
We now give the set of assumptions needed for the asymptotic distribution of the processes given in (8) and of their maximizers.
-
(C1) \(r_{n}\Vert \varvec{\theta }_{n}-\varvec{\theta }_{0}\Vert =O_{\mathbb {P}^{*}}(1)\) and \(v_{n}d_{\mathcal {H}}(\widehat{h},h^{0})=O_{\mathbb {P}^{*}}(1)\) for some sequences \(r_{n} \longrightarrow \infty\) and \(v_{n} \longrightarrow \infty ,\) and \(r^{\nu -2}_{n}\ell (r_{n})< C\) for some \(C>0\).
-
(C2) \(\varvec{\theta }_{0}\) lies in the interior of \(\varvec{\Theta }\), where \(\varvec{\Theta } \subset (\mathcal {B},\Vert \cdot \Vert )\).
-
(C3) For all \(\delta _{2},\delta _{3}>0\),
$$\sup \limits _{\begin{array}{c} \Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}\\ d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}} \end{array} }\frac{\displaystyle |(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}{\displaystyle r^{-\nu }_{n}\ell ^{-1}(r_{n})+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|+|\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}=o_{\mathbb {P}^{*}}(1).$$ -
(C4) There exists a sequence \(\{a_{n}\}\) with
$$a_{m}^{2}m^{-1/2}\log n/ \log (n/m + 1) = o(1)~~ \text{ and } ~~a_{n}^{-1} = O(1),$$such that, for all \(C,\eta >0\) and for every sequence \(\{j_{n}\}\) with \(a_{n}= o(j_{n})\),
$$\begin{aligned} \begin{array}{cc} \frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\frac{C}{r_{n}}}\right] =O(1)~~\text{ and }~~\frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\frac{C}{r_{n}}} \mathrm{1\!I}_{\Big \{\mathbf {M}_{\frac{C}{r_{n}}} > \frac{\eta j_{n} n^{1/2}}{r^{\nu }_{n}\ell (r_{n})} \Big \}} \right] =o(1). \end{array} \end{aligned}$$ -
(C5) For all K and for any \(\eta _{n}\longrightarrow 0\),
$$\sup \limits _{\Vert \varvec{\gamma }_{1}-\varvec{\gamma }_{2}\Vert <\eta _{n}, \Vert \varvec{\gamma }_{1}\Vert \vee \Vert \varvec{\gamma }_{2}\Vert \le K}\frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}\left[ \mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\frac{\varvec{\gamma }_{1}}{r_{n}},h^{0}\right) -\mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\frac{\varvec{\gamma }_{2}}{r_{n}},h^{0}\right) \right] ^{2}=o(1).$$ -
(C6) For each \(\varvec{x}\), the function \(\varvec{\theta } \mapsto \mathbf {m}(\varvec{x},\varvec{\theta },h^{0})\) and almost all paths of the two processes \(\varvec{\theta } \mapsto \mathbf {m}(\varvec{x},\varvec{\theta },\widehat{h})\) and \(\varvec{\theta } \mapsto \mathbf {m}(\varvec{x},\varvec{\theta },\widehat{h}_{m})\) are uniformly bounded (over \(\varvec{\theta }\)) on closed bounded sets.
-
(C7) There exist a random linear function \(W_{n} : \mathcal {B}\longrightarrow \mathbb {R}\), a deterministic bilinear function \(V : \mathcal {B}\times \mathcal {B} \longrightarrow \mathbb {R}\) and \(\beta _{n}=o_{\mathbb {P}^{*}}(1)\) such that, for all \(\varvec{\theta } \in \varvec{\Theta }\),
$$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}}=W_{n}(\varvec{\gamma }_{\varvec{\theta }})+V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+\beta _{n}\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})$$and
$$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}=V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}),$$where \(\varvec{\gamma }_{\varvec{\theta }}=\varvec{\theta }-\varvec{\theta }_{0}\) and the notation \(o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})\) means
$$\lim \limits _{\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert \longrightarrow 0}\frac{\displaystyle o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})}{\displaystyle \Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}}=0.$$Moreover, for any bounded closed set \(\mathcal {K} \subset \mathcal {B}\),
$$\begin{aligned} \begin{array}{cc} \exists \tau , \delta _{1}>0, r^{\nu -1}_{n}\ell (r_{n}) \underset{\Vert \varvec{\gamma }\Vert \le \delta }{\underset{\varvec{\varvec{\gamma }}\in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\left| \frac{\displaystyle W_{n}(\varvec{\gamma })}{\displaystyle \delta ^{\tau }}\right| =O_{\mathbb {P}}(1) ~~\text{ and }~~ \underset{\Vert \varvec{\gamma }-\varvec{\gamma }^{'}\Vert \le \delta }{\underset{\varvec{\gamma }, \varvec{\gamma }^{'} \in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\frac{\displaystyle |V(\varvec{\gamma },\varvec{\gamma })-V(\varvec{\gamma }^{'},\varvec{\gamma }^{'})|}{\displaystyle \delta ^{\tau }}<\infty . \end{array} \end{aligned}$$ -
(C8) There exists a zero-mean Gaussian process \(\mathbb {G}\) defined on \(\mathcal {B}\) and a continuous function \(\Lambda\) such that for all \(p \in \mathbb {N}\) and for all \(\varvec{\gamma }=(\varvec{\gamma }_{1},\ldots ,\varvec{\gamma }_{p}) \in \mathcal {K}^{p}\),
$$r^{\nu -1}_{n}\ell (r_{n})\overline{W_{n}}_{\varvec{\gamma }}+r^{\nu }_{n}\ell (r_{n})\overline{\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\cdot }{r_{n}},h^{0}}}_{\varvec{\gamma }}\Rightarrow \overline{\Lambda }_{\varvec{\gamma }}+\overline{\mathbb {G}}_{\varvec{\gamma }}.$$Moreover, \(\mathbb {G}(\varvec{\gamma })=\mathbb {G}(\varvec{\gamma }^{'})\) a.s. implies that \(\varvec{\gamma }=\varvec{\gamma }^{'}\), and
$$\mathbb {P}^{*}\left( \limsup \limits _{\Vert \varvec{\gamma }\Vert \longrightarrow \infty }{(\Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }))}<\sup \limits _{\varvec{\varvec{\gamma }}\in \mathcal {B}}(\Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }))\right) =1.$$ -
(C9) There exists a \(\delta _{0}>0\) such that
$$\int \limits _{0}^{\infty }\sup \limits _{\delta \le \delta _{0}}\sup _{\mathbb Q}\sqrt{\log \left( N(\epsilon \Vert \mathbf {M}_{\delta }\Vert _{\mathbb {Q},2}, \mathcal {M}_{\delta }, \mathbb {L}^{2}(\mathbb {Q}))\right) }d\epsilon <\infty .$$ -
(C10) For all \(\delta ,\eta > 0\), the classes \(\mathcal {M}_{\delta }, \mathcal {M}_{\delta }(\eta )\) and \(\mathcal {M}_{\delta }(\eta )^{2}\) are \(\mathbb {P}\)-measurable, see (van der Vaart and Wellner 1996, p.110) for definition of \(\mathbb {P}\)-measurability.
-
(C11) For all \(C>0\), there exists \(n_{0} \in \mathbb {N}\) such that for all \(n \ge n_{0}\),
$$\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{n}},\widehat{h}} \ge \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{C}{r_{n}}}\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-R_{n},$$where \(R_{n}\) is given in (A5).
-
(CB1) \(r_{m}\Vert \varvec{\theta }_{m}-\varvec{\theta }_{0}\Vert =O_{\mathbb {P}_W^{*}}(1)\) i.p. and \(v_{m}d_{\mathcal {H}}(\widehat{h}_{m},h^{0})=O_{\mathbb {P}_W^{*}}(1)\) i.p. for some sequences \(r_{m} \longrightarrow \infty\) and \(v_{m} \longrightarrow \infty\) and \(r^{\nu -2}_{m}\ell (r_{m}) \le C\).
-
(CB2) For all \(\delta _{2},\delta _{3}>0\),
$$\sup \limits _{\begin{array}{c} \Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}\\ d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}} \end{array} }\frac{\displaystyle |(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}{\displaystyle r^{-\nu }_{m}\ell ^{-1}(r_{m})+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|+|\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}=o_{\mathbb {P}^{*}}(1).$$ -
(CB3) There exist a random linear function \(W_{m} : \mathcal {B}\longrightarrow \mathbb {R}\) and \(\beta _{m}=o_{\mathbb {P}^{*}}(1)\) such that, for all \(\varvec{\theta } \in \varvec{\Theta }\),
$$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}=W_{m}(\varvec{\gamma }_{\varvec{\theta }})+V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+\beta _{m}\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})$$and
$$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}=V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}).$$ -
Moreover, for any closed bounded set \(\mathcal {K} \subset \mathcal {B}\),
$$\begin{aligned} \begin{array}{cc} \exists \tau , \delta _{1}>0, r^{\nu -1}_{m}\ell (r_{m}) \underset{\Vert \varvec{\gamma }\Vert \le \delta }{\underset{\varvec{\varvec{\gamma }}\in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\left| \frac{\displaystyle W_{m}(\varvec{\gamma })}{\displaystyle \delta ^{\tau }}\right| =O_{\mathbb {P}_W^{*}}(1)~ \text{ i.p. }, \underset{\Vert \varvec{\gamma }-\varvec{\gamma }^{'}\Vert \le \delta }{\underset{\varvec{\gamma }, \varvec{\gamma }^{'} \in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\frac{\displaystyle |V(\varvec{\gamma },\varvec{\gamma })-V(\varvec{\gamma }^{'},\varvec{\gamma }^{'})|}{\displaystyle \delta ^{\tau }}<\infty . \end{array} \end{aligned}$$ -
(CB4)
$$r^{\nu -1}_{m}\ell (r_{m})\overline{W_{m}}_{\varvec{\gamma }}+r^{\nu }_{m}\ell (r_{m})\overline{\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\cdot }{r_{m}},h^{0}}}_{\varvec{\gamma }}\Rightarrow \overline{\Lambda }_{\varvec{\gamma }}+\overline{\mathbb {G}}_{\varvec{\gamma }}~\text {i.p.},$$where \(\Lambda\) and \(\mathbb {G}\) are given in (C8) and the weak convergence is conditional on the sample.
-
(CB5) For all \(C>0\), there exists \(m_{0} \in \mathbb {N}\) such that for all \(m \ge m_{0}\),
$$\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }_{m}},\widehat{h}_{m}} \ge \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{C}{r_{m}}}\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }},\widehat{h}_{m}}-\widehat{R}_{n},$$where \(\widehat{R}_{n}\) is given in (AB3).
Remark 3.7
-
(i)
The first part of condition (C1) can be obtained from part (i) of Theorem 3.5.
-
(ii)
Assumption (C3) holds under the common condition that, for all \(\delta _{2},\delta _{3}>0,\)
$$\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}}}\left| (\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| =o_{\mathbb {P}^{*}}(r^{-\nu }_{n}\ell ^{-1}(r_{n})),$$which is implied by the existence of a function f and a constant \(\delta _{0}>0\) such that, for all \(\delta _{2},\delta _{3}<\delta _{0}\),
$$r^{\nu }_{n}\ell (r_{n})f\left( \frac{\delta _{2}}{r_{n}},\frac{\delta _{3}}{v_{n}}\right) =o\left( \sqrt{n}\right) ,$$and
$$\begin{aligned} \mathbb {P}^{*}&\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}}}\left| (\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| \right] \\&\le 2\mathbb {P}^{*}\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \right] \\&\le \frac{1}{\sqrt{n}}{f\left( \frac{\delta _{2}}{r_{n}},\frac{\delta _{3}}{v_{n}}\right) }. \end{aligned}$$Using the same arguments as in Remark 3.3(ii), we get the last inequality.
-
(iii)
If we take \(j_{n}=\sqrt{n}\) and note that \(\gamma \mapsto \mathbb {M}_{n}(\gamma ,h^{0})=\frac{r_{n}^{\nu }\ell (r_{n})}{\sqrt{n}} \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\gamma /r_{n},h^{0}}\) is the empirical process indexed by the class \(\frac{r_{n}^{\nu }\ell (r_{n})}{\sqrt{n}}\mathcal {M}_{\frac{C}{r_{n}}}\), then, under assumption (B2), assumptions (C4) and (C5) hold under the following conditions: there exists a \(\delta _{4}>0\) such that for all \(\delta \le \delta _{4}\), \(\mathbb {P}^{*}(\mathbf {M}^{2}_{\delta }) \le K\varphi ^{2}(\delta )\) for some \(K>0\),
$$\lim \limits _{\delta \longrightarrow 0}\frac{\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\delta } \mathrm{1\!I}_{\{\mathbf {M}_{\delta }>\eta \delta ^{-2} \varphi ^{2}(\delta ) \}} \right] }{\varphi ^{2}(\delta )}=0,$$for all \(\eta >0\) and
$$\lim \limits _{\epsilon \longrightarrow 0} \lim \limits _{\delta \longrightarrow 0} \sup \limits _{\Vert \varvec{\gamma }_{1}-\varvec{\gamma }_{2}\Vert <\epsilon , \Vert \varvec{\gamma }_{1}\Vert \vee \Vert \varvec{\gamma }_{2}\Vert \le K}\frac{\mathbb {P}\left[ \mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\varvec{\gamma }_{1}\delta ,h^{0}\right) -\mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\varvec{\gamma }_{2}\delta ,h^{0}\right) \right] ^{2}}{\varphi ^{2}(\delta )}=0,$$for all \(C>0\); this corresponds to the setting of Theorem 3.2.10 in van der Vaart and Wellner (1996).
-
(iv)
Let \(\mathcal {K}\) be an arbitrary closed bounded subset of \(\mathcal {B}\). The first part of condition (C8) assumes the convergence of the marginals of the process \(\gamma \mapsto r^{\nu -1}_{n}\ell (r_{n})W_{n}(\gamma )+r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}\), from which its weak convergence in \(\ell ^{\infty }(\mathcal {K})\) is derived via asymptotic tightness; the latter is fulfilled by using (C4), (C5), (C9) and the preceding discussion in (iii). If
$$r^{\nu -1}_{n}\ell (r_{n})\sup \limits _{\varvec{\gamma }\in \mathcal {K},\varvec{\gamma }\ne 0}|W_{n}(\varvec{\gamma })|\,\Vert \varvec{\gamma }\Vert ^{-1}=o_{\mathbb {P}}(1),$$we treat the given process as in the parametric case, where its marginals converge provided that
$$\begin{aligned} \begin{array}{l} \lim \limits _{n \rightarrow \infty } \frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n} \mathbb {P}\left\{ \left[ \mathbf {m}\left( \mathbf {X}, \varvec{\theta }_{0}+\frac{\gamma _{1}}{r_{n}}, h^{0}\right) -\mathbf {m}\left( \mathbf {X}, \varvec{\theta }_{0}+\frac{\gamma _{2}}{r_{n}}, h^{0}\right) \right] ^{2}\right\} \\ \quad =\mathbb {P}\left[ \left( \mathbb {G}\left( \gamma _{1}\right) -\mathbb {G}\left( \gamma _{2}\right) \right) ^{2}\right] , \end{array} \end{aligned}$$for all \(\gamma _{1}, \gamma _{2}\); this leads to a rate of convergence \(r_{n}\) given as the solution of
$$r_{n}^{\nu }\ell (r_{n})\varphi (1/r_{n})=\sqrt{n},$$for more detail see Theorem 3.2.10 of van der Vaart and Wellner (1996). Note that almost all sample paths of the process \(\varvec{\varvec{\gamma }}\mapsto \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })\) have a supremum affiliated to their attitude on bounded closed set, which is guaranteed by the last assumption. The dominant term of the deterministic part \(\Lambda\) is usually a negative definite quadratic form and hence exponential inequalities could lead to such result, for example when we are in the situation of the smooth function, one can refer to Lee and Pun (2006), Ma and Kosorok (2005), Kosorok (2008), Kristensen and Salanié (2017) among many others.
-
(v)
Assumption (C9) is a technical assumption, identical to the one needed in the parametric case where the nuisance parameter \(h^{0}\) is known; it is needed to show that the process \(\varvec{\gamma }\mapsto r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}\) is asymptotically tight; see Theorem 3.2.10 of van der Vaart and Wellner (1996).
-
(vi)
The first part of (CB1) follows from part (ii) of Theorem 3.5.
-
(vii)
Assumption (CB2) automatically holds under the condition: for all \(\delta _{2},\delta _{3}>0,\)
$$\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}}}\left| (\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| =o_{\mathbb {P}_{W}^{*}}(r^{-\nu }_{m}\ell ^{-1}(r_{m})) ~\text {i.p}.$$This condition holds if there exist a function g and a constant \(\delta _{0}>0\) such that, for all \(\delta _{2},\delta _{3}<\delta _{0}\),
$$r^{\nu }_{m}\ell (r_{m})g\left( \frac{\delta _{2}}{r_{m}},\frac{\delta _{3}}{v_{m}}\right) =o\left( \sqrt{m}\right) ,$$and
$$\begin{aligned} \mathbb {P}^{*}\mathbb {P}_{W}^{*}&\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}}}\left| (\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| \right] \\&\le 2\mathbb {P}^{*}\mathbb {P}_{W}^{*}\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}}}\left| \widehat{\mathbb {G}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \right] \\&\le \frac{1}{\sqrt{m}}{g\left( \frac{\delta _{2}}{r_{m}},\frac{\delta _{3}}{v_{m}}\right) }. \end{aligned}$$Using the same arguments as in Remark 3.3(v), we get the last inequality.
-
(viii)
Following a discussion similar to that of condition (C7) in Remark 3(iv) of Delsol and Van Keilegom (2020), we only change the random function \(W_{n}(\varvec{\gamma })\) to its bootstrap version \(W_{m}(\gamma )=\langle \Gamma (\varvec{\theta }_{0}, \widehat{h}_{m}),\varvec{\gamma }\rangle\). If \(\widehat{h}_{m}\) is computed from a dataset independent of the bootstrapped sample \((\mathbf {X}_{1}^{*},\ldots ,\mathbf {X}_{m}^{*})\), then it is sufficient for assumption (CB4) to assume the conditional weak convergence of each term, \(r^{\nu -1}_{m}\ell (r_{m})\overline{W_{m}}_{\varvec{\gamma }}\) and \(r^{\nu }_{m}\ell (r_{m})\overline{\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\cdot }{r_{m}},h^{0}}}_{\varvec{\gamma }}\), separately. The convergence of the second term can be obtained as in the situation without the nuisance parameter; the interested reader is referred to Lemma 1 of Lee (2012). Note that if \(r^{\nu -1}_{m}\ell (r_{m}) \Gamma (\varvec{\theta }_{0}, \widehat{h}_{m}) \rightarrow W\) conditionally in distribution, the marginals of the process \(\gamma \mapsto \langle r^{\nu -1}_{m}\ell (r_{m}) \Gamma (\varvec{\theta }_{0}, \widehat{h}_{m}), \gamma \rangle\) tend in distribution to the marginals of \(\gamma \mapsto \langle W, \gamma \rangle .\) Furthermore, if \(r_{m}=\sqrt{m}\) and \(\ell \equiv 1,\) it is common to assume that
$$\Gamma (\varvec{\theta }_{0}, \widehat{h}_{m})=m^{-1} \sum \limits _{i=1}^{m} U_{i, m}+ o_{\mathbb {P}_{W}^{*}}\left( m^{-1 / 2}\right) ,$$where \(U_{i, m}, i=1,\ldots ,m,\) are independent and centered random variables. The convergence follows from Lindeberg’s condition.
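The rate equation \(r_{n}^{\nu }\ell (r_{n})\varphi (1/r_{n})=\sqrt{n}\) appearing in (iv) above can be solved in closed form in the standard cases. A minimal Python check (our own, for illustration): with \(\nu =2\), \(\ell \equiv 1\) and \(\varphi (\delta )=\sqrt{\delta }\) (the typical modulus for non-smooth criteria) one gets \(r_{n}=n^{1/3}\), the cube-root rate, while \(\varphi (\delta )=\delta\) gives the parametric rate \(r_{n}=\sqrt{n}\):

```python
def rate_equation(r, n, nu=2.0, ell=lambda r: 1.0, phi=lambda d: d ** 0.5):
    """Residual of r^nu * ell(r) * phi(1/r) = sqrt(n); zero when r solves it."""
    return r ** nu * ell(r) * phi(1.0 / r) - n ** 0.5

n = 10_000
r_cube = n ** (1.0 / 3.0)       # non-smooth case: r^2 * (1/r)^{1/2} = r^{3/2} = n^{1/2}
r_param = n ** 0.5              # smooth case, phi(d) = d: r^2 * (1/r) = r = n^{1/2}
res_cube = rate_equation(r_cube, n)
res_param = rate_equation(r_param, n, phi=lambda d: d)
print(res_cube, res_param)      # both ~ 0 up to floating-point error
```

This is the same balancing argument as in Theorem 3.2.10 of van der Vaart and Wellner (1996).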
Theorem 3.8
(Weak Convergence of Empirical Processes). For all \(K>0\), let \(\mathcal {K}=\{\varvec{\gamma }\in \mathcal {B} : \Vert \varvec{\gamma }\Vert \le K\}\) be a closed bounded subset of \(\mathcal {B}\). Treating \(\varvec{\gamma }\mapsto \mathbb {M}_{n}(\gamma ,\widehat{h})\) and \(\varvec{\gamma }\mapsto \widehat{\mathbb {M}}_{m}(\gamma ,\widehat{h}_{m})\) as random elements in \(\ell ^{\infty }(\mathcal {K})\) for sufficiently large n, we have the following results:
-
(i)
Assume (C1)-(C10). Then
$$r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},\widehat{h}}\Rightarrow \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }).$$ -
(ii)
Assume (A2), (AB1), (B2), (C2)-(C6),(C9)- (C11) and (CB1)-(CB4). Then
$$r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}} \Rightarrow \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })~~ \text{ i.p. }$$
Our main results concerning weak convergence of \(r_{n}({\varvec{\theta }}_{n} -{\varvec{\theta }}_{0})\) and m out of n bootstrap consistency are embodied in the following theorem.
Theorem 3.9
Assume, for any such \(\mathcal {K}\), that almost every sample path of the process \(\varvec{\gamma }\mapsto \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })\) achieves its supremum at a unique random point \(\gamma _{0}= \arg \max \limits _{\gamma \in \mathcal {B}} \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })\). Then:
-
(i)
Assume (C1)-(C11). Then
$$r_{n}({\varvec{\theta }}_{n}-{\varvec{\theta }}_{0}) \Rightarrow \gamma _{0}.$$ -
(ii)
Assume (A2), (AB1), (B2), the first part of (C1), (C2)-(C6),(C9)- (C11) and (CB1)-(CB5). Then
$$r_{m}({\varvec{\theta }}_{m} -{\varvec{\theta }}_{n}) \Rightarrow \gamma _{0} ~~\text{ i.p. }$$
Remark 3.10
The result (i) of Theorem 3.8 coincides with Lemma 1 of Delsol and Van Keilegom (2020), where the parameter of interest \(\varvec{\theta }\) lies in a Euclidean space, in the particular case \(\nu =2\) and \(\ell \equiv 1\). In that setting, by an application of Theorem 3.2.2 of van der Vaart and Wellner (1996) and the uniform tightness of the sequence \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\), the authors established the weak convergence to some tight random variable \(\gamma _{0}\) in \(\ell ^{\infty }(\mathcal {K})\) for compact sets \(\mathcal {K}\) in their Theorem 3, which corresponds to the result (i) of Theorem 3.9 in this case. In our setting, we provide the weak convergence of the same sequence for a Banach-valued parameter by using the theorem of van der Vaart and Wellner (1996) in which the compact sets and the uniform tightness of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) are replaced, respectively, by closed bounded sets with a structure similar to that of the set \(\mathcal {K}\) and by \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})=O_{\mathbb {P}^{*}}(1)\), as in Lee (2012) without the nuisance parameter \(h^{0}\).
Note that (i) still holds if (C4) is replaced by the following weaker condition:
In order to prove the conditional stochastic equicontinuity of the bootstrapped process \(\widehat{\mathbb {M}}_{n}\), we need the condition (C4), which is fulfilled if the uniform integrability condition is imposed for \(j_{n} \ge n^{c}\), for some \(0<c < 1/4\).
Remark 3.11
It is well known that Theorem 3.9 can be used easily through routine bootstrap sampling, which we describe briefly as follows. More precisely, this can be used, for example, to form confidence bands for the true parameter \(\varvec{\theta }\) based on \(\mathfrak {N}\) bootstrap samples \(\mathbf { Y}_{1}^{k},\ldots ,\mathbf { Y}_{m}^{k}\), \(k=1,\ldots ,\mathfrak {N}\), where \(\mathfrak {N}\) is a large integer. Let \(\varvec{\theta }_{m}^{k}\) denote the bootstrapped estimator of \(\varvec{\theta }\) based on the sample \(\mathbf { Y}_{1}^{k},\ldots ,\mathbf { Y}_{m}^{k}\), \(k=1,\ldots ,\mathfrak {N}\). An application of Theorem 3.9 implies that
where \(\gamma _{0}^{1},\ldots ,\gamma _{0}^{\mathfrak {N}}\) are independent copies of \(\gamma _{0}\). Notice that we have
In order to approximate \(c(\alpha )\), one can use the sampling estimator \(\widehat{c}(\alpha )\), of \(c(\alpha )\), as the smallest z such that
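The resampling scheme described in this remark can be sketched in a few lines. The following Python sketch is only illustrative: the names `m_out_of_n_ci`, `estimator` and `rate`, and the percentile-type construction of \(\widehat{c}(\alpha )\) as an empirical quantile of the absolute bootstrap roots, are assumptions of this sketch, not part of the paper's formal development.

```python
import numpy as np

def m_out_of_n_ci(sample, estimator, m, rate, n_rep=1000, alpha=0.05, seed=0):
    """Percentile-type confidence interval for theta via the m out of n
    bootstrap: draw n_rep resamples of size m, form the roots
    rate(m) * (theta_m^k - theta_n), and take c_hat(alpha) as the smallest z
    whose empirical coverage is at least 1 - alpha."""
    rng = np.random.default_rng(seed)
    theta_n = estimator(sample)
    roots = np.empty(n_rep)
    for k in range(n_rep):
        boot = rng.choice(sample, size=m, replace=True)
        roots[k] = rate(m) * (estimator(boot) - theta_n)
    c_hat = np.quantile(np.abs(roots), 1 - alpha)  # sampling estimator of c(alpha)
    half = c_hat / rate(len(sample))
    return theta_n - half, theta_n + half
```

For a regular root-n statistic one would take `rate = np.sqrt`; for the cube-root asymptotics of the simulation section, `rate` would be \(k \mapsto k^{1/3}\).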
Remark 3.12
In nonregular problems where the conventional n out of n bootstrap is inconsistent, the m out of n bootstrap provides a useful remedy to restore consistency. In practice, however, choosing an appropriate m requires careful attention. Asymptotically, \(\sqrt{n}\), \(\log n\) or \(20 \log n\) all satisfy the o(n) requirement, but in finite samples the actual results can vary dramatically depending on the choice. Let \(\mathcal {X}_{n}=\left( X_{1}, \ldots , X_{n}\right)\) be a random sample drawn from an unknown distribution F, and \(T_{n}\left( \mathcal {X}_{n}, F\right)\) be a statistical functional of interest. Under mild conditions the m out of n bootstrap distribution \(\mathcal {L}_{m, n}^{*}\) provides a consistent estimator of the distribution \(\mathcal {L}_{n}\) of \(T_{n}\left( \mathcal {X}_{n}, F\right) ,\) provided that the bootstrap sample size m is properly chosen; refer to Götze and Račkauskas (2001) and Bickel et al. (1997). Empirical selection of m has long been an important problem, which has been discussed by, for example, Datta and McCormick (1995), Hall et al. (1995) and Politis et al. (1999) in different contexts. The prevailing idea is to estimate a theoretically optimal sample size m, defined in a frequentist sense to be the value of m which minimises the expected value of some metric measure \(d\left( \mathcal {L}_{n}, \mathcal {L}_{m, n}^{*}\right)\) between \(\mathcal {L}_{n}\) and \(\mathcal {L}_{m, n}^{*}\). The problem can be solved using bootstrap samples of size m, where \(m \rightarrow \infty\) and \(m / n \rightarrow 0\). Bickel and Sakov (2008) proposed an adaptive rule to select a value \(\hat{m}\) and discussed its properties. The authors show, under some conditions, that \(\widehat{m} / n {\mathop {\rightarrow }\limits ^{P}} 1\) when the n out of n bootstrap works, but \(\widehat{m} \rightarrow \infty\) and \(\widehat{m} / n \rightarrow 0\) when it does not.
More precisely, the authors suggested the following rule for choosing m:
-
1.
Consider a sequence of m’s of the form
$$m_{j}=\left\lceil q^{j} n \right\rceil , \quad \text{ for } j=0,1,2, \ldots , \quad 0<q<1,$$where \(\lceil \alpha \rceil\) denotes the smallest integer \(\ge \alpha\).
-
2.
For each \(m_{j},\) find \(L_{m_{j}, n}^{*}\) (in practice this is done by Monte Carlo).
-
3.
Let d be some metric consistent with convergence in law, and set
$$\hat{m}=\underset{m_{j}}{{\text {argmin }}} d \left( L_{m_{j}, n}^{*}, L_{m_{j+1}, n}^{*}\right) .$$
If there is more than one value of m which minimizes the difference, then we pick the largest one. These results mean that the rule behaves well in both situations. Swanepoel (1986) proposed \(m = (2/3) n\) to obtain the desired coverage probability of a confidence interval. Alin et al. (2017) considered \(m=n^{j}\), where the value j satisfies \(n^{j}=\frac{2}{3} n\). Solving this equation for j leads to the choice
$$j=\frac{\log \left( \frac{2}{3} n\right) }{\log n},$$
for which we note \(0<j<1\), so that \(m=o(n)\). Götze and Račkauskas (2001) suggested estimating m by minimizing \(d\left( \mathcal {L}_{m, n}^{*}, \mathcal {L}_{m / 2, n}^{*}\right) ,\) yielding an optimal bootstrap sample size in the sense of Wei et al. (2016), provided that the latter has order \(o_{p}(n)\). Wei et al. (2016) investigated a stochastic version of the optimal bootstrap sample size, defined as the minimiser of an error measure calculated directly from the observed sample. The authors developed procedures for calculating the stochastically optimal value of m. The performance of their methodology is illustrated for the special forms of Edgeworth-type expansions which are typically satisfied by statistics of the shrinkage type.
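The three-step adaptive rule of Bickel and Sakov (2008) described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: the default \(q=0.75\), the Monte-Carlo size, the Kolmogorov metric as the choice of d, and the user-supplied function `root` (returning one bootstrap draw of the rescaled statistic for a given m) are all illustrative and model-specific.

```python
import numpy as np

def ks_distance(a, b):
    """Kolmogorov (sup-norm) distance between the empirical cdfs of a and b."""
    grid = np.sort(np.concatenate([a, b]))
    Fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    Fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(Fa - Fb))

def bickel_sakov_m(sample, root, q=0.75, n_boot=500, rng=None):
    """Adaptive choice of m: m_j = ceil(q**j * n); pick the m_j minimising the
    distance between consecutive bootstrap laws, taking the largest minimiser.
    `root(data, m)` returns one bootstrap draw of the rescaled statistic."""
    rng = np.random.default_rng(rng)
    n = len(sample)
    # Step 1: the grid m_j = ceil(q^j n), j = 0, 1, 2, ...
    ms, j = [], 0
    while True:
        m = int(np.ceil(q ** j * n))
        if ms and m == ms[-1]:
            break
        ms.append(m)
        j += 1
    # Step 2: Monte-Carlo approximation of L*_{m_j, n} for each m_j.
    laws = []
    for m in ms:
        draws = np.array([root(rng.choice(sample, size=m, replace=True), m)
                          for _ in range(n_boot)])
        laws.append(draws)
    # Step 3: minimise d(L*_{m_j, n}, L*_{m_{j+1}, n}); ties -> largest m
    # (m_j decreases in j, so ties are broken toward the smallest j).
    dists = [ks_distance(laws[k], laws[k + 1]) for k in range(len(ms) - 1)]
    best = min(range(len(dists)), key=lambda k: (dists[k], k))
    return ms[best]
```

The function `root` encodes the model-specific statistic, e.g. \(\sqrt{m}\,(\bar{X}^{*}_{m}-\bar{X}_{n})\) in a regular mean-estimation problem.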
4 Numerical Results
We provide numerical illustrations regarding the asymptotic distribution of the estimators in the classification with missing data; details are provided in Sect. 7.2. The computer codes were implemented in R. In our simulation, we show that resampling bootstrap samples of size n fails, while resampling with a size m satisfying the conditions given in the previous sections yields a consistent bootstrap. Let us describe the model. Define
where \(U \sim \mathcal {U}[0,1]\), \(\epsilon \sim \mathcal {U}[-.1,.1]\) and \(X_{1} \sim \mathcal {U}[0,1]\), with \(X_{1}\), \(\epsilon\) and U are independent. Let
where \(f_{\theta }(x_{1})=\theta x_{1}\) for some \(\theta\). We define
The data are composed of \(Z_{i}=\left( X_{i1}, X_{i2}, Y_{i}\Delta _{i}, \Delta _{i}\right)\), \(i=1,2,\ldots ,n\), from the described model. For the bandwidth, we use \(h_{n}=\frac{c_{h}}{\sqrt{n}}\) (\(h_{m}=\frac{c_{h}}{\sqrt{m}}\)), which satisfies the regularity conditions required by the asymptotic theory. In this simulation, we use the quadratic kernel defined by
which is a density function with support \([-1,1]\). The results given below are based on three different values of n: we took \(n=250\), \(n=1000\) and \(n=2000\), and the true value to be \(\theta ^{0}=1\). We chose \(c_{h}=3.5\) and \(\alpha _{0}=0.5\); this choice is not restrictive, as we can obtain the same desired result with different values of \(c_{h}\) and \(\alpha _{0}\), for example \(c_{h}=2~~ \text {or} ~~5\) and \(\alpha _{0}=.25~~ \text {or} ~~.75\). The bootstrap procedure is as follows: for each value of m, we generate B independent bootstrap samples \(\left\{ Z_{i b}^{*}: i \le m\right\}\), \(b=1,\ldots ,{B}\), using some method of bootstrapping, and for each given value of m, we compute an estimator \(\theta ^{(b)}_{m}\) based on the b-th bootstrapped sample. Our main objective is to compare the distribution of \(n^{1/3}(\theta _{n}-\theta ^{0})\) with the m out of n bootstrap distribution of \(m^{1/3}(\theta _{m}-\theta _{n})\). To achieve this goal, we have used the Kolmogorov distance between the distributions of \(n^{1/3}(\theta _{n}-\theta ^{0})\) and \(m^{1/3}(\theta _{m}-\theta _{n})\), averaging over 1000 and 1500 m out of n bootstrap samples drawn from one arbitrarily chosen random sample. Table 1 displays the results for \(n=250\), \(n=1000\) and \(n=2000\), which show that the most accurate estimates are given by the choices \(m=50\), \(m=60\) and \(m=110\), respectively. Deviations from these choices in either direction result in deteriorating accuracy. In Figs. 1, 2 and 3, we plot the empirical distribution of the true statistic and the empirical distribution of the bootstrapped one for some values of m given in Table 1; each figure compares the estimated bootstrap empirical distribution with that of \(n^{1/3}(\theta _{n}-\theta ^{0})\) for the different values of n. All these figures show that the classical bootstrap (n out of n bootstrap) fails while the m out of n bootstrap is consistent.
Figures 4, 5 and 6 show the root mean squared error (RMSE) of the estimator \(\theta _{m}\) for several values of m given in Table 1, for each value of n. One can see that, as in any other inferential context, the larger the sample size, the better the performance.
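The comparison underlying Table 1 can be sketched generically as follows. This Python sketch is only illustrative: the function names, the use of `scipy.stats.ks_2samp` as the Kolmogorov distance, and the stand-in `estimator` and `sampler` (which replace the missing-data classification model of Sect. 7.2) are assumptions of the sketch.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_true_vs_bootstrap(estimator, sampler, theta0, n, m,
                         n_mc=1000, n_boot=1000, seed=0):
    """Kolmogorov distance between the Monte-Carlo law of
    n^{1/3}(theta_n - theta0) and the m out of n bootstrap law of
    m^{1/3}(theta_m - theta_n), the latter computed from one arbitrarily
    chosen sample, mirroring the comparison reported in Table 1.
    `sampler(rng, n)` draws one sample of size n from the model."""
    rng = np.random.default_rng(seed)
    # Monte-Carlo approximation of the "true" law of n^{1/3}(theta_n - theta0).
    truth = np.array([n ** (1 / 3) * (estimator(sampler(rng, n)) - theta0)
                      for _ in range(n_mc)])
    data = sampler(rng, n)          # one fixed sample for the bootstrap side
    theta_n = estimator(data)
    boot = np.array([m ** (1 / 3) *
                     (estimator(rng.choice(data, size=m, replace=True)) - theta_n)
                     for _ in range(n_boot)])
    return ks_2samp(truth, boot).statistic
```

Scanning this distance over a grid of m values (including m = n) reproduces the kind of comparison shown in Table 1, where the n out of n choice performs worst.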
5 Concluding Remarks
In the present work, we have considered the estimation of a parameter \(\varvec{\theta }\) that maximizes a certain criterion function depending on an unknown, possibly infinite-dimensional nuisance parameter h. We have followed the common estimation procedure of maximizing the corresponding empirical criterion, in which the nuisance parameter is replaced by some nonparametric estimator. We have shown that the M-estimators converge weakly to maximizers of Gaussian processes in an abstract setting that permits great flexibility in applications. We have established that the m out of n bootstrap, in this extended setting, is weakly consistent under conditions similar to those required for weak convergence of the M-estimators in the general framework of Lee (2012), where an additional difficulty comes from the nuisance parameters. The goal of this paper is therefore to extend the existing theory on the bootstrap of M-estimators; this generalization is far from trivial, since controlling the nuisance parameter in a non-standard framework is harder and was essentially an unsolved open problem in the literature. This requires the effective application of large-sample theory techniques developed for empirical processes. Examples of applications are given to illustrate the generality and the usefulness of our results. It would be interesting to extend the results to a dependent framework; this would require further theory which is beyond the scope of the present article. An important question is how to extend our findings to the setting of incomplete data (censored data, missing data, etc.). This will be a subject of investigation for future work.
6 Mathematical Developments
In this section, we give the proofs of the asymptotic results of our M-estimator \(\varvec{\theta }_{n}\) and its bootstrap version.
Proof of Theorem 3.2
Part (i) follows directly from Theorem 1 of Delsol and Van Keilegom (2020). For (ii), note that (AB1) and (A2) imply that
By using the result in Lemma 3.6.16 of van der Vaart and Wellner (1996), we have that for every \(\eta >0\) there is \(\delta >0\) such that
Making use of the assumption (AB3), there is \(n_{0} \in \mathbb {N}\) such that for every \(n\ge n_{0}\) there exists \(\delta ^{\prime }>0\) with \(\delta -\widehat{R}_{n}\ge 4\delta ^{\prime }\) i.p., and the last expression is bounded by:
By using the assumptions (AB1), (A3), (AB3) in combination with (10), we obtain the desired result.\(\square\)
Proof of Theorem 3.5
First, note that we give the proof of this theorem for the particular choice of function
It is worth noticing that this condition is in agreement with those used in Lee (2012) in the parametric setting. Let \(\beta _{n}\) be the \(o_{\mathbb {P}^{*}}(1)\) term in assumption (B3) and define the sets
we observe
Our objective is to show that; for any \(\epsilon >0\), there exists \(\tau _{\epsilon }>0\) such that
for any n sufficiently large. In the sequel, we work with arbitrary fixed \(\epsilon >0\). For any \(\delta ,\delta _{1},M,K,K^{'}>0\), by using the condition (A5), we readily obtain
where
Indeed, we can write
Condition (C) implies, for all \(\delta >0\), that there exists \(n_{\epsilon }\), such that, for \(n>n_{\epsilon }\), we have
By the definitions of \(R_{n}\), \(W_{n}\) and under condition (B1), there exist \(\delta _{1}, K_{\epsilon }, K^{'}_{\epsilon }\) and \(K_{2,\epsilon }\) such that we have
for all n larger than some \(n_{1}\). We fix \(\delta <\delta _{0}\) and suppose \(n \ge \max (n_{0},n_{1},n_{\epsilon })\); for \(2^{j} \le \delta r_ {n}\), the assumptions (B2) and (B3) are fulfilled on each \(S_{j,n}\). For each fixed j such that \(2^{j} \le \delta r_{n}\), under assumption (B3), for all \(\varvec{\theta } \in S_{j,n}\), we then have
Consequently, we obtain the following inequalities;
For any \(\lambda >0\), we can find a non-decreasing function \(\xi\) such that
It follows that \(\frac{2^{-j\lambda }\ell (2^{-j}r_{n})}{\ell (r_{n})}\) is uniformly bounded for \(M \le j \le \log _{2}\delta r_{n}\) and for all n. Making use of the condition (B2) in combination with Chebyshev’s inequality and the fact that \(\varphi (c\delta ) \le c^{\alpha }\delta\) for all \(c \ge 1\), for any \(\lambda >0\) there exists a positive constant \(C^{\prime }\) such that
where \(\nu ^{\prime }=\nu -\lambda >\alpha\). By choosing a small value of \(\lambda\) and by using the properties of the function \(\varphi (\cdot )\), we infer that
the last expression tends to 0 as \(M\rightarrow \infty\), so we obtain the result (i) of our theorem for sufficiently large values of M and n.
For (ii), we have:
We obtain from assumption (BB3), for each fixed j such that \(2^{j}< \delta r_{m}\) and for all \(\varvec{\theta }\in S_{m,j}\)
This gives us, by using Chebyshev’s inequality, for some \(C^{\prime }>0\)
From assumptions (B2) and (BB2), the outer expectation of the first term on the right-hand side of (13) is bounded by
With assumption (AB1) in mind, the last two terms converge to 0 as \(M,n \rightarrow \infty\); the outer expectations of the other terms in (13) are \(o_{\mathbb {P}_{W}^{*}}(1)\) i.p., by Lemma 3 of Cheng and Huang (2010), which completes the proof of Theorem 3.5.\(\square\)
Proof of Theorem 3.8
The proof of the first part (i) of Theorem 3.8 is given in Lemmas 1, 2 and 3 of Delsol and Van Keilegom (2020), where in our setting we use bounded closed subsets in place of compact subsets. By their Lemma 2, we obtain the existence of \(\xi _{1,n},\xi _{2,n},\xi _{3,n}\) such that
and the following decomposition
By their Lemma 3, the properties of the function \(\gamma \mapsto W_{n}(\gamma )\) and the assumptions of Theorem 3.8, we obtain the weak convergence of the process
Briefly, we have the following decomposition;
where
and
The process \(\gamma \mapsto T_{1,n}(\gamma )\) does not depend on the estimation of the nuisance parameter, so it can be studied in a similar way as in the parametric model: by Theorem 2.11.1 of van der Vaart and Wellner (1996) and the use of assumptions (C4), (C5), (C9) and (C10), we obtain its uniform asymptotic equicontinuity. For the process \(\gamma \mapsto T_{2,n}(\gamma )\), we can show that it is asymptotically uniformly equicontinuous by the same method as in the proof of their Lemma 3. By Theorems 1.5.7 and 1.5.4 of van der Vaart and Wellner (1996), we obtain the asymptotic tightness and the weak convergence of \(T_{n}\) to \(\Lambda +\mathbb {G}\) in \(\ell ^{\infty }(\mathcal {K})\), and using Addendum 1.5.8 in the same reference, almost all paths of the limiting process on \(\mathcal {K}\) are uniformly continuous with respect to \(\Vert \cdot \Vert\). Finally, by Slutsky’s theorem, we obtain the desired result.\(\square\)
For part (ii), we are in a position to show the weak convergence of the bootstrapped process, which follows directly from Slutsky’s theorem and Lemmas 6.1 and 6.2 given below.
Lemma 6.1
Let \(\mathcal {K}=\{\varvec{\gamma }\in \mathbf {E} : \Vert \varvec{\gamma }\Vert \le K\}.\) Then, under the assumptions of part (ii) of Theorem 3.8, for all \(\gamma \in \mathcal {K}\), there exist \(z_{0,m}\), \(z_{1,m}\), \(z_{2,m}\), such that
and
Proof of Lemma 6.1
We need to introduce the following notation
Since the set \(\mathcal {K}\) is bounded and \(\varvec{\theta }_{0}\) belongs to the interior of \(\Theta\), there exists \(m_{\mathcal {K}}\) such that for all \(m \ge m_{\mathcal {K}}\) and for all \(\gamma \in \mathcal {K}\), the quantity \(\varvec{\theta }_{0}+\frac{\gamma }{r_{m}}\) is in \(\Theta\). Then for all \(\gamma \in \mathcal {K}\) we have:
This can be rewritten as follows
where;
We get from the assumption (CB1) and (CB3) that
By combining (14) and (15), we infer that
where
and
It is easy to show that
by using assumptions (A2), (AB1), (C3), (CB2), (CB3) and Lemma 3 of Cheng and Huang (2010).\(\square\)
Lemma 6.2
Under the assumptions of Lemma 6.1, the process
converges weakly conditionally in probability to the process
Proof of Lemma 6.2
Making use of the assumption (CB4), we need only show the equicontinuity of the process
One can see that the process \(T_m\) can be decomposed into the sum of three processes in the following way
where
We shall study the properties of each process separately. Firstly, we note that by assumptions (C6), (CB3) and (AB1), for sufficiently large m, we have \(\varvec{\theta }_{0}+\frac{\mathcal {K}}{r_{m}} \subset \Theta\), and then the processes \(T_{1,m}\), \(T_{2,m}\) and \(T_{3,m}\) take values in \(\ell ^{\infty }(\mathcal {K})\). The process \(T_{2,m}\) can be treated as in the proof of part (i) by rewriting it in the form
As in the proof of (i), apply Theorem 2.11.22 of van der Vaart and Wellner (1996) to the process
By assumptions (C4), (C5), (C9) and (C10), we get its uniform equicontinuity with respect to \(\Vert \cdot \Vert\) on \(\mathcal {K}\), and by the use of assumption (AB1), we obtain our main result for the process \(T_{2,m}\). The process \(T_{1,m}\) also does not depend on the estimation of the nuisance parameter, so it can be treated in the same way as in part (ii) of Lemma 2 in Lee (2012). Briefly, we want to show that
Define the class
and let
be its envelope function. Making use of the condition (B2), we readily infer that
It follows by Lemmas 2.9.1 and 3.6.6 of van der Vaart and Wellner (1996) that, for \(1 \le n_{0} \le n\);
where \(\tilde{N}=(\tilde{N}_{1}, \tilde{N}_{2}, \ldots )\) and \(\varepsilon =\left( \epsilon _{1}, \epsilon _{2}, \ldots \right)\) are independent sequences of i.i.d. symmetrized Poisson variables with parameter m/(2n) and i.i.d. Rademacher variables, respectively, both being independent of \(\mathbf {X}_{1},\ldots , \mathbf {X}_{n}\). By Jensen’s inequality, Problem 3.6.3 of van der Vaart and Wellner (1996) and (B2), we readily get
By taking
it follows, by condition (C4), that
We denote the integrand in (C9) by \(\mathcal {N}(\epsilon )\) for \(\epsilon >0\). By using the triangle inequality, the sub-Gaussianity of Rademacher processes under (C10), and the Cauchy-Schwarz inequality with (B2), we obtain
where
Our aim is to show that
for an arbitrary subsequence \(\left\{ n_{k}: k=1,2, \ldots \right\}\) of \(\{n\},\) and any arbitrary sequence \(\left\{ j_{k}\right\}\) such that \(n_{0, n_{k}} \le j_{k} \le n_{k}\) for all \(k=\) \(1,2, \ldots .\) Write \(m_{k}^{*}=m_{n_{k}}.\) Define, for any \(\gamma \in \mathcal {K}\),
As in the proof of part (ii) of Lemma 2 of Lee (2012), these variables satisfy the conditions of Theorem 2.11.1 of van der Vaart and Wellner (1996), which implies our result in (20) for an arbitrary subsequence \(n_{k}\) and \(j_{k} \in \left[ n_{0, n_{k}}, n\right]\), by arguing as in the proof of that theorem. It then follows by the dominated convergence theorem that the bound in (19) has limsup equal to 0 as \(n \rightarrow \infty\). Substituting this and (18) into (17) yields the desired result. Finally, for the process \(T_{3,m}\), for large values of m, we have \(\varvec{\theta }_{0}+\frac{\gamma }{r_{m}} \in \Theta\); by using the assumption (CB3), we get, for all \(0< \delta < \delta _{1}\),
where
and \(\alpha _{m}=O_{\mathbb {P}_W^{*}}(1)\) i.p. uniformly over \(\delta \le \delta _{1}\). From this, we obtain, for any \(\epsilon >0\) and \(\eta >0\),
By choosing \(C_{\eta }\) such that the last term is bounded by \(\eta\) for large values of m, and taking \(\delta \le \delta _{1} \wedge \left( \frac{\epsilon }{2C_{\eta }} \right) ^{\frac{1}{\tau }}\), we obtain the main result for the process \(T_{3,m}\). Finally, by using the fact that
we obtain the desired result on the process \(T_{m}\).\(\square\)
Proof of Theorem 3.9
Making use of the result (i) in Theorem 3.8 in connection with the assumption (C8), we infer that almost all paths of the process \(\gamma \mapsto \mathbb {G}(\gamma )+\Lambda (\gamma )\) are uniformly continuous on every \(\mathcal {K} \subset \mathcal {B}\) and reach their supremum at a unique point \(\gamma _{0}\). For part (i), an application of (i) in Theorem 3.8, for any closed bounded \(\mathcal {K} \subset \mathcal {B}\), gives
We get from the assumption (C11) that
Noting that \(\gamma _{0}\) is the unique, well-separated maximizer of \(\mathbb {G}(\gamma )+\Lambda (\gamma )\), part (i) follows by Theorem 3.2.2 of van der Vaart and Wellner (1996), where compact sets and uniform tightness of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) are replaced, respectively, by closed bounded sets with a structure similar to that of the set \(\mathcal {K}\) and by
For part (ii), we infer that
By combining the assumption (CB5) with the first part of (CB1), we have respectively
and
An application of Lemma 4(ii) of Lee (2012) gives
It follows from the first part of the assumptions (C1), (AB1) and Slutsky’s theorem that
Hence the proof of the statement (ii) is complete.\(\square\)
References
Alin A, Martin MA, Beyaztas U, Pathak PK (2017) Sufficient m-out-of-n(m/n) bootstrap. J Stat Comput Simul 87(9):1742–1753
Allaire G (2005) Analyse numérique et optimisation: une introduction à la modélisation mathématique et à la simulation numérique. Editions Ecole (Polytechnique)
Alvarez-Andrade S, Bouzebda S (2013) Strong approximations for weighted bootstrap of empirical and quantile processes with applications. Stat Methodol 11:36–52
Alvarez-Andrade S, Bouzebda S (2015) On the local time of the weighted bootstrap and compound empirical processes. Stoch Anal Appl 33(4):609–629
Alvarez-Andrade S, Bouzebda S (2019) Some selected topics for the bootstrap of the empirical and quantile processes. Theory Stoch Process 24(1):19–48
Arcones MA, Giné E (1992) On the bootstrap of M-estimators and other statistical functionals. In Exploring the limits of bootstrap (East Lansing, MI, 1990), Wiley Ser Probab Math Statist Probab. Math. Statist., pages 13–47. Wiley, New York
Bickel PJ, Sakov A (2008) On the choice of m in the m out of n bootstrap and confidence bounds for extrema. Statist Sinica 18(3):967–985
Bickel PJ, Klaassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation for semiparametric models. Johns Hopkins Series in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD
Bickel PJ, Götze F, van Zwet WR (1997) Resampling fewer than n observations: gains, losses, and remedies for losses. Statist Sinica 7(1), 1–31. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995)
Bose A, Chatterjee S (2001) Generalised bootstrap in non-regular M-estimation problems. Statist Probab Lett 55(3):319–328
Bouzebda S (2010) Bootstrap de l’estimateur de Hill: théorèmes limites. Ann ISUP 54(1-2), 61–72
Bouzebda S, Limnios N (2013) On general bootstrap of empirical estimator of a semi-Markov kernel with applications. J Multivariate Anal 116:52–62
Bouzebda S, Papamichail C, Limnios N (2018) On a multidimensional general bootstrap for empirical estimator of continuous-time semi-Markov kernels with applications. J Nonparametr Stat 30(1):49–86
Chen X, Linton O, Van Keilegom I (2003) Estimation of semiparametric models when the criterion function is not smooth. Econometrica 71(5):1591–1608
Cheng G, Huang JZ (2010) Bootstrap consistency for general semiparametric M-estimation. Ann Statist 38(5):2884–2915
Chernick MR (2008) Bootstrap methods: a guide for practitioners and researchers. Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, second edition
Datta S, McCormick WP (1995) Bootstrap inference for a first-order autoregression with positive innovations. J Amer Statist Assoc 90(432):1289–1300
Davison AC, Hinkley DV (1997) Bootstrap methods and their application, volume 1 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge. With 1 IBM-PC floppy disk (3.5 inch; HD)
Delsol L, Van Keilegom I (2020) Semiparametric M-estimation with non-smooth criterion functions. Ann Inst Statist Math 72(2):577–605
Dudley RM (1999) Uniform central limit theorems, volume 63 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge
Efron B (1979) Bootstrap methods: another look at the jackknife. Ann Statist 7(1):1–26
Efron B, Tibshirani RJ (1993) An introduction to the bootstrap, volume 57 of Monographs on Statistics and Applied Probability. Chapman and Hall, New York
El Bantli F (2004) M-estimation in linear models under nonstandard conditions. J Statist Plann Inference 121(2):231–248
Giné E, Zinn J (1989) Necessary conditions for the bootstrap of the mean. Ann Statist 17(2):684–691
Götze F, Račkauskas A (2001) Adaptive choice of bootstrap sample sizes. In State of the art in probability and statistics (Leiden, 1999), volume 36 of IMS Lecture Notes Monogr. Ser., pages 286–309. Inst. Math. Statist., Beachwood, OH
Hall P (1992) The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York
Hall P, Horowitz JL, Jing B-Y (1995) On blocking rules for the bootstrap with dependent data. Biometrika 82(3):561–574
Hoffmann-Jørgensen J (1991) Stochastic processes on Polish spaces. Various publications series. Aarhus Universitet, Matematisk Institut
Ichimura H (1993) Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. J Econometrics 58(1–2):71–120
Kim J, Pollard D (1990) Cube root asymptotics. Ann Statist 18(1):191–219
Kosorok MR (2008) Introduction to empirical processes and semiparametric inference. Springer Series in Statistics, Springer, New York
Koul HL, Müller UU, Schick A et al (2012) The transfer principle: a tool for complete case analysis. Ann Stat 40(6):3031–3049
Kristensen D, Salanié B (2017) Higher-order properties of approximate estimators. J Econometrics 198(2):189–208
Lahiri SN (1992) On bootstrapping M-estimators. Sankhyā Ser A 54(2):157–170
Lee SMS (2012) General M-estimation and its bootstrap. J Korean Statist Soc 41(4):471–490
Lee SMS, Pun MC (2006) On m out of n bootstrapping for nonstandard M-estimation with nuisance parameters. J Amer Statist Assoc 101(475):1185–1197
Lee SMS, Yang P (2020) Bootstrap confidence regions based on M-estimators under nonstandard conditions. Ann Statist 48(1):274–299
Ma S, Kosorok MR (2005) Robust semiparametric m-estimation and the weighted bootstrap. J Multivar Anal 96(1):190–217
Müller UU et al (2009) Estimating linear functionals in nonlinear regression with responses missing at random. Ann Stat 37(5A):2245–2277
Pakes A, Olley S (1995) A limit theorem for a smooth class of semiparametric estimators. J. Econometrics 65(1):295–332
Pakes A, Pollard D (1989) Simulation and the asymptotics of optimization estimators. Econometrica 57(5):1027–1057
Pérez-González A, Vilar-Fernández JM, González-Manteiga W (2009) Asymptotic properties of local polynomial regression with missing data and correlated errors. Ann Inst Stat Math 61(1):85–109
Pfanzagl J (1990) Estimation in semiparametric models, vol 63. Lecture Notes in Statistics. Springer-Verlag, New York, Some recent developments
Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer Series in Statistics. Springer-Verlag, New York
Pollard D (1985) New ways to prove central limit theorems. Economet Theor 1(3):295–313
Shao J, Tu DS (1995) The jackknife and bootstrap. Springer Series in Statistics. Springer-Verlag, New York
Swanepoel JWH (1986) A note on proving that the (modified) bootstrap works. Comm Statist A-Theory Methods 15(11):3193–3203
van de Geer SA (2000) Applications of empirical process theory, volume 6 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge
van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer Series in Statistics. Springer-Verlag, New York. With applications to statistics
Wei B, Lee SMS, Wu X (2016) Stochastically optimal bootstrap sample size for shrinkage-type statistics. Stat Comput 26(1–2):249–262
Wellner JA, Zhan Y (1996) Bootstrapping Z-estimators. Preprint
Zhan Y (2002) Central limit theorems for functional Z-estimators. Statist. Sinica 12(2):609–634
Acknowledgements
The authors are indebted to the Editor-in-Chief, the Associate Editor and the referee for their very valuable comments, suggestions and careful reading of the article, which led to a considerable improvement of the manuscript. The third author gratefully acknowledges the funding received towards his PhD from the Algerian government PhD fellowship.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix: Applications
We present in this section some examples which cannot be handled with the classical theory of semiparametric estimators, and for which the m out of n bootstrap cannot be applied by existing results, while the theory of this paper applies; this illustrates the usefulness of our results. Delsol and Van Keilegom (2020) provided some examples of situations in which the existing theory on semiparametric estimators cannot be applied, whereas their result could be applied. It is worth noticing that the aim of this section is to verify the bootstrap conditions, which are different from those used for the non-bootstrapped estimators checked in the last mentioned reference. Although only three examples are given here, they stand as archetypes for a variety of models that can be investigated by the methodology of the present paper.
1.1 Single Index Model with Monotone Link Function
The single index regression models are typical examples, given by
where \(\mathbb {E}(\varepsilon | \mathbf{X})=0, {\text {Var}}(\varepsilon | \mathbf{X})<\infty\), and we assume that the unknown function \(g(\cdot )\) is monotone; we refer to Ichimura (1993) for more details. On the basis of the sample \(\left( \mathbf{X}_{1}, Y_{1}\right) , \ldots ,\left( \mathbf{X}_{n}, Y_{n}\right)\) coming from the model (22), we make use of the pool-adjacent-violators algorithm to construct an estimator of the function \(g(\cdot )\). This gives a non-smooth estimator \(\widehat{g}_{\varvec{\beta }}(\cdot )\) of \(g_{\varvec{\beta }}(\mathbf {z})=\mathbb E\left[ Y |\mathbf{X}^{\top } \varvec{\beta }=\mathbf {z}\right] .\) Next, using the least-squares estimation method, we estimate \(\varvec{\beta }\)
The non-smooth nature of \(\widehat{g}_{\varvec{\beta }}(\cdot )\) implies that the criterion function is not smooth in \(\varvec{\beta }\). This is a situation where the theory of the present paper can be applied.
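The pool-adjacent-violators algorithm mentioned above can be sketched as follows. This is a minimal generic implementation of the isotonic (non-decreasing) least-squares fit, not the authors' exact estimation procedure; the function name `pava` and the block representation are illustrative choices. To estimate \(g_{\varvec{\beta }}\), one would sort the responses by the index values \(\mathbf{X}_i^{\top}\varvec{\beta }\) before applying it.

```python
def pava(y, w=None):
    """Pool-adjacent-violators algorithm: weighted isotonic
    (non-decreasing) least-squares fit to the sequence y."""
    if w is None:
        w = [1.0] * len(y)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge adjacent blocks while monotonicity is violated.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(w1 * m1 + w2 * m2) / wt, wt, c1 + c2])
    # Expand each block back to one fitted value per observation.
    fit = []
    for m, _, c in blocks:
        fit.extend([m] * c)
    return fit
```

Note that the fitted function is piecewise constant, which is precisely why \(\widehat{g}_{\varvec{\beta }}(\cdot )\), and hence the criterion in \(\varvec{\beta }\), is non-smooth.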
1.2 Classification with Missing Data
Let \(\mathbf {X}_{1}=(\mathbf {X}_{11},\mathbf {X}_{12}),\ldots ,\mathbf {X}_{n}=(\mathbf {X}_{n1},\mathbf {X}_{n2})\) be independent and identically distributed random copies of the random vector \(\mathbf {X}=(\mathbf {X}_{1},\mathbf {X}_{2})\), coming from two underlying populations. For \(j=0,1\), let \(\mathbf {Y}_{i}=j\) when \(\mathbf{X}_i\) comes from population j. Let us denote by \(\mathbf {Y}\) the population indicator associated with the vector \(\mathbf {X}\). Using the information in the available data, we seek a classification method for new observations whose true population is unknown.
The classification is performed by regressing \(\mathbf {X}_{2}\) on \(\mathbf {X}_{1}\), making use of the parametric criterion function \(f_{\varvec{\theta }}(\cdot )\) and choosing the \(\varvec{\theta }\) that maximizes the following
Let \(\varvec{\theta }_{0}\) denote the maximizer of (23) over all \(\varvec{\theta } \in \varvec{\Theta }\), where \(\varvec{\Theta }\) is assumed to be a compact subset of \(\mathbb {R}^{k}\) containing \(\varvec{\theta }_{0}\) as an interior point. Now assume that the \(\mathbf {Y}_{i}\)'s are subject to some missing mechanism. Let \(\Delta _{i}\) (respectively \(\Delta\)) be a random variable equal to 1 when we observe the random variable \(\mathbf {Y}_{i}\) (respectively \(\mathbf {Y}\)), and 0 otherwise. Let \(\mathbf {Z}_{1}=(\mathbf {X}_{1},\mathbf {Y}_{1}\Delta _{1},\Delta _{1}),\ldots , \mathbf {Z}_{n}=(\mathbf {X}_{n},\mathbf {Y}_{n}\Delta _{n},\Delta _{n})\) be the observations at hand. The missing at random mechanism is considered in the following sense
Note that relation (23) can be written as
We define
where the infinite-dimensional nuisance parameter \(p(\cdot )\) belongs to some functional space \(\mathcal {P}\) to be specified later. Consequently, the estimator \(\varvec{\theta }_{n}\) of \(\varvec{\theta }_{0}\) is given by
where, for any x and a bandwidth sequence \(h=h_{n}\),
where the kernel function \(K(\cdot )\) is assumed to be a density function with support \([-1,1]\), and \(K_{h}(u)=\frac{K\left( \frac{u}{h}\right) }{h}\). Nonparametric regression with missing data has long attracted a great deal of attention; for good sources of references to the research literature in this area, along with statistical applications, consult Müller (2009), Pérez-González et al. (2009) and Koul et al. (2012), among many others.
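A kernel estimator of the missingness probability \(p(x)=\mathbb {P}(\Delta =1 | \mathbf {X}_{1}=x)\) of the Nadaraya-Watson type, as used above, can be sketched as follows. This is a minimal illustration, assuming the Epanechnikov kernel (a density supported on \([-1,1]\), as required); the function names are illustrative.

```python
def epanechnikov(u):
    """Epanechnikov kernel: a density supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def p_hat(x, x_obs, delta, h):
    """Nadaraya-Watson estimate of p(x) = P(Delta = 1 | X = x):
    a kernel-weighted average of the missingness indicators delta."""
    weights = [epanechnikov((x - xi) / h) / h for xi in x_obs]
    denom = sum(weights)
    if denom == 0.0:
        return 0.0  # no observations within bandwidth h of x
    return sum(w * d for w, d in zip(weights, delta)) / denom
```

The bandwidth \(h=h_{n}\) governs the trade-off between bias and variance of \(\widehat{p}(\cdot )\), which is exactly what drives the rate conditions on \(h_m\) verified later in this appendix.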
1.3 Binary Choice Model with Missing Data
Let us define the binary choice model, in the linear regression function framework, by
where we assume that \(\varepsilon\) has zero median conditionally on \(\mathbf{X} .\) The random variable Y is missing at random, with the probability of observing Y depending on \(\mathbf{X}\) via the following relation
where \(\Delta =1\) when we observe Y and 0 otherwise. The observed data for the preceding model are given by the i.i.d. triplets \(\left( \mathbf {X}_{1}, Y_{1} \Delta _{1}, \Delta _{1}\right) ,\ldots ,\left( \mathbf {X}_{n}, Y_{n} \Delta _{n}, \Delta _{n}\right)\). To estimate \(p_{\gamma }(z)= \mathbb {P}\left( \mathrm{1\!I}_{ \{\Delta =1\}} | \mathbf {X}^{\top } \varvec{\gamma }=z\right) ,\) we use the following
The parameter estimate is given by
where
The existing theory cannot be applied here because the function \(\mathbf {m}_{\varvec{\beta }, \varvec{\gamma }, p}\) is smooth in \(\varvec{\gamma }\) but non-smooth in \(\varvec{\beta }\).
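The non-smoothness in \(\varvec{\beta }\) comes from indicator terms of the form \(\mathrm{1\!I}_{\{x_{2} \ge f_{\varvec{\theta }}(x_{1})\}}\): the empirical criterion is then piecewise constant in the parameter, so gradient-based optimizers are unusable and one must resort to direct search. The following toy sketch (not the authors' criterion; names and the scalar linear \(f_{\theta }\) are illustrative assumptions) makes this concrete.

```python
def indicator_criterion(theta, x1, x2):
    """Empirical criterion (1/n) * sum_i 1{x2_i >= theta * x1_i}:
    piecewise constant, hence non-smooth, in theta."""
    n = len(x1)
    return sum(1.0 for a, b in zip(x1, x2) if b >= theta * a) / n

def maximize_on_grid(x1, x2, grid):
    """Maximize the non-smooth criterion by direct search over a grid,
    since its gradient is zero almost everywhere."""
    return max(grid, key=lambda t: indicator_criterion(t, x1, x2))
```

Estimators maximizing such indicator-based criteria typically converge at the cube-root rate, which is why the rates \(r_{n}=n^{1/3}\) and \(r_{m}=m^{1/3}\) appear in the verification below.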
We now study in full detail the example of Sect. 7.2 and work out the verification of the conditions of Theorems 3.2, 3.5, 3.8 and 3.9. Most of these conditions were verified in Sect. 7 of Delsol and Van Keilegom (2020), noting that \(\nu =2\) and \(\ell \equiv 1\), so our focus is on verifying the conditions needed for the m out of n bootstrapped version. We begin with some information about the nuisance function, its space, and some notation. Let \(\mathcal {P}\) be the space of functions \(p:\mathbf {R}_{\mathbf {X}_{1}}\rightarrow \mathbb {R}\) that are continuously differentiable and for which
where
and \(\mathbf {R}_{\mathbf {X}_{1}}\) is the support of \(\mathbf {X}_{1}\), which we suppose to be a compact subset of \(\mathbb {R}\). We equip the space \(\mathcal {P}\) with the supremum norm:
Next, the consistency conditions are verified as follows. (A1) holds provided the functions \(p_{0}(\cdot )\) and \(K(\cdot )\) are continuously differentiable. For assumption (A2), we can show that the bracketing number \(N_{[~]}\left( \epsilon , \mathcal {F}, \mathbb {L}_{\mathbb {P}}\right)\) of the class \(\mathcal {F}=\{\mathbf {m}_{\varvec{\theta },p}, \varvec{\theta }\in \Theta , p\in \mathcal {P}\}\) is finite for all \(\epsilon >0\); using Corollary 2.7.2 of van der Vaart and Wellner (1996), we get
and
by the properties of the set \(\mathcal {P}\) and the fact that \(\mathbf {x}\mapsto f_{\varvec{\theta }}(\mathbf {x})\) is continuously differentiable in \(\varvec{\theta }\) with bounded derivative; as a consequence, it is easy to show that
for the class \(\mathcal {T}=\left\{ \left( \mathbf {x}_{1}, \mathbf {x}_{2}\right) \mapsto \mathrm{1\!I}_{\{x_{2} \ge f_{\varvec{\theta }}\left( \mathbf {x}_{1}\right) \}}: \varvec{\theta } \in \varvec{\Theta }\right\}\). From (24) and (25) we get
Then assumption (A3) is straightforward. Assumption (A4) is an identifiability condition ensuring the uniqueness of \(\varvec{\theta }_{0}\), and (A5) is verified by construction of the estimator \(\varvec{\theta }_{n}\). The consistency of \(\varvec{\theta }_{n}\) then follows. The conditions of the bootstrap version are verified as follows. The first part of assumption (AB1) is satisfied by definition of the m out of n bootstrap, while the second part follows directly in this situation by noting that if \(r_{n}=n^{\kappa }\) for some \(\kappa >0\), then \(r_{m}=m^{\kappa }\), and consequently \(r^{2}_{m}=o(r^{2}_{n})\). For (AB2), as mentioned in Remark 3.1(v), we take \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\) with the variables \(\mathbf {X}_{1i}\) and \(\Delta _{i}\) replaced by \(\mathbf {X}^{*}_{1i}\) and \(\Delta ^{*}_{i}\), respectively; i.e.,
we remark that
which implies \(d_{\mathcal {H}}\left( \widehat{p}_{m},\widehat{p}\right) =o_{\mathbb {P}^{*}_{W}}(1)\) in probability. By the triangle inequality we get
(AB3) is verified by construction of the estimator \(\varvec{\theta }_{m}\), which implies the consistency of \(\varvec{\theta }_{m}\). Next, for the rate of convergence, we show only conditions (B2) and (B3). For (B2), it suffices by Remark 3.3(ii) to show (4) and (5). To that end, using the relation between covering and bracketing numbers and Corollary 2.7.2 of van der Vaart and Wellner (1996), we get that
for every probability measure \(\mathbb {Q}\) on \(\mathbb {R}^{4}\), which implies the relation in (4); (5) is verified by the choice \(\varphi (\delta )=\sqrt{\delta }\), and as a consequence we get (B2). (B3) follows directly as in Sect. 7 of the same reference, which describes this example, by the choice of the two functions \(\varvec{\psi }_{1}(\cdot )\) and \(\varvec{\psi }_{2}(\cdot )\) given in Remark 3.3(iii). From their discussion of the rates \(r_{n}\), \(v_{n}\) and the bandwidth h of the kernel, it follows that
We verify assumption (BB1) as in the verification of condition (AB2): choosing \(\widehat{p}_{m}(\cdot )=\widehat{p}(\cdot )\), we get \(v^{-1}_{m}=\sqrt{\frac{\log m}{mh}}+h\), where \(h=h_{m}.\) Assumption (BB2) holds by the same argument given for (B2). For assumption (BB3), we check conditions (b)-(d) of Remark 3.3(iii). We obtain
and
provided the derivatives in \(\Lambda \left( \varvec{\theta }_{0}, p\right)\) all exist. By the definition of the maximum, it follows that \(\Gamma \left( \varvec{\theta }_{0}, p^{0}\right) =0\) and \(\Lambda \left( \varvec{\theta }_{0}, p^{0}\right)\) is negative. Noting that
if \(r_{m}\) satisfies
noting that the expectation in (26) is taken with respect to \(\mathbf {Z}\) and \(\mathbf {W}\) when we work with \(\widehat{p}_{m}\); since our functions are measurable, we obtain this result by applying Fubini's theorem. This condition on \(r_{m}\) and the one given in (BB2), which is satisfied for \(r_{m}=O(m^{1/3})\), are compatible provided
Note that if we assume that \(p^{0}(\cdot )\) is twice continuously differentiable, we can weaken the first condition to \(mh_{m}^{6}=O(1)\); as a consequence, the rate \(v^{-1}_{m}\) of \(\widehat{p}_{m}\) would be \(O\left( \sqrt{\frac{\log m}{mh_{m}}}+h^{2}_{m}\right)\), which is faster than the rate \(r^{-1}_{m}=m^{-1/3}\) of \(\varvec{\theta }_{m}\) provided \(mh_{m}^{3} \longrightarrow \infty\). The level of complexity of the latter case is lower than the case where \(p^{0}\) is only once differentiable, and we therefore do not discuss it any further. We conclude that
Finally, for the weak convergence of \(\varvec{\theta }_{n}\), we note that assumption (C4) is satisfied for \(j_{n}=\sqrt{n}\) as in Remark 3.5(iii), and (C9) holds similarly to (B2). Consequently, \(n^{1/3}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) converges weakly. Assumption (CB1) follows from part (ii) of Theorem 3.5 and condition (BB1), and by a proof similar to that of condition (BB2) we get (CB2). From Remark 3.3(iii), (vi) and Remark 3.5(viii), assumption (CB3) holds, provided that
Clearly, we have \(m^{-1/3} < C\) for some positive constant \(C>0\). For assumption (CB4), we have
provided \(mh^{3}_{m}=o(1)\) and \(\frac{\log ^{3/2}m}{mh^{3/2}_{m}}=o(m^{-1/2})\), using what we already discussed for (BB3). Next, by the result for the process in (16), i.e., that the process \(\gamma \mapsto \mathbb {G}_{n}\frac{r^{2}_{m}}{\sqrt{m}}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\) converges weakly to the process \(\mathbb {G}(\gamma )\), and condition (AB1), we get
with \(\Gamma (\varvec{\theta }_{0},p^{0})=0\) and
The process \(\gamma \mapsto r^{2}_{m}\left( \widehat{\mathbb {P}}_{m}-\mathbb {P}_{n}\right) \widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},p^{0}}\) is the same as that given in Lee (2012), where no nuisance parameter is present. Hence, we can follow the same steps as in Lemma 1 of Lee (2012) and obtain the convergence of the marginals using Lindeberg's condition and some regularity assumptions on \(f_{\mathbf {X}_{1}/\mathbf {X}_{2}}\) and \(\varvec{\theta }\mapsto f_{\varvec{\theta }}\). By construction of the estimator \(\varvec{\theta }_{m}\), condition (CB5) follows. We then get the asymptotic distribution of \(r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{n})\) from part (ii) of Theorem 3.9.
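The m out of n bootstrap scheme that underlies the whole verification above can be sketched generically as follows: draw \(m < n\) observations with replacement, recompute the estimator, and repeat. This is a minimal sketch, not the paper's full procedure (it omits the nuisance-parameter re-estimation \(\widehat{p}_{m}\)); the function names and the choice of estimator in the usage note are illustrative.

```python
import random

def m_out_of_n_bootstrap(data, estimator, m, n_boot, seed=0):
    """m out of n bootstrap: draw m < n observations with replacement
    n_boot times and recompute the estimator on each resample, giving
    a Monte Carlo approximation of the law of the bootstrap replicates
    around theta_n (rescaled by r_m, e.g. m**(1/3), in applications)."""
    rng = random.Random(seed)
    theta_n = estimator(data)          # estimate on the full sample
    replicates = []
    for _ in range(n_boot):
        resample = [rng.choice(data) for _ in range(m)]
        replicates.append(estimator(resample))
    return theta_n, replicates
```

For instance, with `estimator` the sample mean and \(m = n^{2/3}\), the empirical distribution of the rescaled differences `r_m * (rep - theta_n)` approximates the limit law; for cube-root-rate estimators such as \(\varvec{\theta }_{m}\) here, this subsampled scheme is consistent where the conventional (\(m=n\)) bootstrap fails.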
Bouzebda, S., Elhattab, I. & Ferfache, A.A. General M-Estimator Processes and their m out of n Bootstrap with Functional Nuisance Parameters. Methodol Comput Appl Probab 24, 2961–3005 (2022). https://doi.org/10.1007/s11009-022-09965-y
Keywords
- Gaussian process
- M-estimation
- Empirical process
- m out of n bootstrap
- Asymptotic distribution
- Nuisance parameter
- Semiparametric estimation
- Non-standard distribution
- Missing data