1 Introduction

Semiparametric modeling has proved to be a flexible tool and provides a powerful statistical modeling framework in a variety of applied and theoretical contexts [refer to Pfanzagl (1990), Bickel et al. (1993), van der Vaart and Wellner (1996), van de Geer (2000), and Kosorok (2008)]. An important work to be cited is the paper of Pakes and Pollard (1989), where a general central limit theorem is proved for estimators defined by minimization of the length of a vector-valued, random criterion function, with no smoothness assumptions. That work has been extended in different settings by, among many others, Pakes and Olley (1995), Zhan (2002), and Chen et al. (2003). Recall that semiparametric models are statistical models in which at least one parameter of interest is not Euclidean. The term “M-estimation” refers to a general method of estimation in which the estimators are obtained by maximizing (or minimizing) certain criterion functions. The most widely used M-estimators include the maximum likelihood (MLE), ordinary least-squares (OLS), and least absolute deviation estimators. The major practical problem of maximum likelihood estimators is their lack of robustness, while many robust estimators achieve robustness at some cost in first-order efficiency. The appeal of the M-estimation method is that, in addition to the statistical efficiency of the estimators when the parametric model is correctly specified, these estimators are also robust to contamination when the objective function is appropriately chosen. Throughout the available literature, investigations of the asymptotic properties of M-estimators, as well as of the relevant test statistics, have focused primarily on the parametric case. In practice, however, we need more flexible models that contain both parametric and nonparametric components. This paper concentrates on this specific problem. To formulate the problem treated in this paper, we need the following notation.
Let \(\mathcal {X} = (\mathbf {X}_{1},\ldots , \mathbf {X}_{n})\) be n independent copies of a random element \(\mathbf {X}\) in a probability space \((\mathcal {S}, \mathcal {A}, \mathbb {P})\). For Banach spaces \(\mathcal {B}\) and \(\mathcal {H}\), equipped with a norm \(\Vert \cdot \Vert\) and a metric \(d_{\mathcal {H}}(\cdot ,\cdot )\) respectively, let \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\) be a class of Borel measurable functions \(\mathbf {m}_{{\varvec{\theta }},h} : \mathcal {S} \rightarrow \mathbb {R}\), indexed by \({\varvec{\theta }}\) over some parameter space \({\varvec{\Theta }} \subset \mathcal {B}\) and by \(h \in \mathcal {H}\), where \(\varvec{\theta }\) is the parameter of interest and h is a nuisance parameter with true value \(h^{0}\). We define the empirical measure to be

$$\mathbb {P}_{n} = n^{-1} \sum _{i=1}^{n} \delta _{\mathbf { X}_{i}},$$

where, for \(\mathbf {x}\in \mathcal {S}\), \(\delta _\mathbf { x}\) is the measure that assigns mass 1 at \(\mathbf { x}\) and zero elsewhere. Let \(f(\cdot )\) be a real-valued measurable function, \(f:\mathcal {S}\rightarrow \mathbb {R}\). In the modern theory of empirical processes, it is usual to identify \(\mathbb {P}\) and \(\mathbb {P}_{n}\) with the mappings given by

$$f\rightarrow \mathbb {P}f=\int fd\mathbb {P}, ~~~\text {and}~~~f\rightarrow \mathbb {P}_{n}f=\int fd\mathbb {P}_{n}=\frac{1}{n}\sum _{i=1}^{n}f(\mathbf {X}_{i}).$$
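In concrete terms, this identification says that integrating f against the empirical measure amounts to averaging f over the sample. A minimal numerical sketch (Python/NumPy, for illustration only; the distribution and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=1000)   # i.i.d. sample X_1, ..., X_n; P = N(0, 1) is an arbitrary choice

def Pn(f, sample):
    """Empirical measure applied to f: P_n f = (1/n) * sum_i f(X_i)."""
    return np.mean(f(sample))

# For f(x) = x^2 we have Pf = E[X^2] = 1, and P_n f is its sample analogue.
print(Pn(np.square, X))   # close to 1 for large n
```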

The M-estimand of interest \({\varvec{\theta }}_{0}\) and its corresponding M-estimator \({\varvec{\theta }}_{n}\) are assumed to be well-separated maximizers of the processes \(\Big \{\mathbb {P}\mathbf {m}_{{\varvec{\theta }},h^{0}} : {\varvec{\theta }} \in {\varvec{\Theta }}\Big \}\) and \(\Big \{\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}} : {\varvec{\theta }} \in {\varvec{\Theta }}\Big \}\), respectively, for a given consistent sequence of estimators \(\widehat{h}\) of \(h^{0}\). Under suitable entropy conditions on \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\) (defined below) and moment conditions on its envelope, we show that there exist norming sequences \(\{\alpha _{n}\}\) and \(\{r_{n}\}\) such that the random process \(\Big \{\alpha _{n}\mathbb {P}_{n}(\mathbf {m}_{{\varvec{\theta }}_{0}+\gamma /r_{n},\widehat{h}}- \mathbf {m}_{{\varvec{\theta }}_{0},\widehat{h}}) : \gamma \in K \Big \}\) converges weakly, in the sense of Hoffmann-Jørgensen (1991) [see van der Vaart and Wellner (1996), in particular their Definition 1.3.3], to the process \(\{\mathbb {Z}(\gamma ) : \gamma \in K\}\), for each closed bounded subset \(K \subset \mathcal {B}\). It follows from an argmax continuous mapping theorem [refer to Kosorok (2008), in particular Chapter 14] that \(r_{n} ({\varvec{\theta }}_{n} -{\varvec{\theta }}_{0} )\) converges weakly to \(\arg \max _{\gamma } \mathbb {Z}(\gamma )\). The latter weak limit has a complicated form in general and does not permit explicit computation. It is therefore of interest to estimate the sampling distribution of \(r_{n} ({\varvec{\theta }}_{n} -{\varvec{\theta }}_{0} )\) by the bootstrap for inference purposes. The bootstrap was introduced and first investigated in Efron (1979). Since this seminal paper, bootstrap methods have been proposed, discussed, investigated and applied in a huge number of papers in the literature.
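To fix ideas with a classical non-smooth example without a nuisance parameter, take \(\mathbf {m}_{\varvec{\theta }}(x) = -|x-\varvec{\theta }|\): the M-estimand maximizing \(\mathbb {P}\mathbf {m}_{\varvec{\theta }}\) is the population median, and the maximizer of \(\mathbb {P}_{n}\mathbf {m}_{\varvec{\theta }}\) is the sample median. A toy sketch (the sample size and grid are arbitrary choices, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal(500)

def criterion(theta, sample):
    """P_n m_theta with m_theta(x) = -|x - theta|, non-smooth in theta."""
    return -np.mean(np.abs(sample - theta))

# Maximize theta -> P_n m_theta over a fine grid of candidate values.
grid = np.linspace(-2, 2, 4001)
theta_n = grid[np.argmax([criterion(t, X) for t in grid])]

# The grid maximizer agrees with the sample median up to grid resolution.
print(theta_n, np.median(X))
```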
Being one of the most important ideas in the practice of statistics, the bootstrap has also introduced a wealth of innovative probability problems, which in turn have formed the basis for the creation of new mathematical theories. The bootstrap can be described briefly as follows. Let \(T(\mathbb {F})\) be a functional of an unknown distribution function \(\mathbb {F}(\cdot )\), \(\mathbf { X}_{1},\ldots ,\mathbf { X}_{n}\) a sample from \(\mathbb {F}(\cdot )\), and \(\mathbf { Y}_{1},\ldots ,\mathbf { Y}_{n}\) an independent and identically distributed [i.i.d.] sample with common distribution given by the empirical distribution \(\mathbb {F}_{n}(\cdot )\) of the original sample. The distribution of \(\{T(\mathbb {F}_{n})-T(\mathbb {F})\}\) is then approximated by that of \(\{T(\mathbb {F}_{n}^{*})-T(\mathbb {F}_{n})\}\) conditionally on \(\mathbf { X}_{1},\ldots ,\mathbf { X}_{n}\), with \(\mathbb {F}^{*}_{n}(\cdot )\) being the empirical distribution of \(\mathbf { Y}_{1},\ldots ,\mathbf { Y}_{n}\). The key idea behind the bootstrap is that if a sample is representative of the underlying population, then one can make inferences about the population characteristics by resampling from the current sample. The asymptotic theory of the bootstrap, with statistical applications, has been reviewed in, among other books, Efron and Tibshirani (1993), Shao and Tu (1995), Davison and Hinkley (1997), Hall (1992), van der Vaart and Wellner (1996), Chernick (2008) and Kosorok (2008). A major application of an estimator is in the calculation of confidence intervals. By far the most favored confidence interval is the standard one based on a normal or a Student t-distribution. Such standard intervals are useful tools, but they rest on an approximation that can be quite inaccurate in practice. Bootstrap procedures are an attractive alternative.
One way to look at them is as procedures for handling data when one is not willing to make assumptions about the parameters of the populations from which one sampled. The most that one is willing to assume is that the data are a reasonable representation of the population from which they come. One then resamples from the data and draws inferences about the corresponding population and its parameters. The resulting confidence intervals have received the most theoretical study of any topic in bootstrap analysis. Roughly speaking, it is known that the bootstrap works in the i.i.d. case if and only if the central limit theorem holds for the random variable under consideration. For further discussion we refer the reader to the landmark paper by Giné and Zinn (1989). It is worth noticing that some special examples reveal that the conventional bootstrap based on resamples of size n breaks down; see, for example, Bose and Chatterjee (2001) and El Bantli (2004). We focus on a modified form of the bootstrap, known as the m out of n bootstrap, with a view to remedying the problem of inconsistency. The m out of n scheme modifies the conventional scheme by drawing bootstrap resamples of size \(m = o(n)\). See, for example, Bickel et al. (1997) for a review of this technique in a variety of contexts. For more recent references on the bootstrap, one can refer to Bouzebda (2010), Bouzebda and Limnios (2013), Bouzebda et al. (2018), Alvarez-Andrade and Bouzebda (2013, 2015, 2019) and the references therein. Denote by \({\varvec{\widehat{\theta }}}_m\) the M-estimator calculated from a bootstrap resample of size m.
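The m out of n scheme can be sketched as follows; this is an illustrative toy in which the statistic (the sample median), the rate \(r_{m}=\sqrt{m}\), the resample size \(m \approx n^{2/3}\) and the number of resamples are all hypothetical choices, not prescriptions from the theory below:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
X = rng.standard_normal(n)      # original sample
theta_n = np.median(X)          # estimator from the full sample

m = round(n ** (2 / 3))         # resample size m = o(n); the exponent is illustrative
B = 2000                        # number of bootstrap resamples

# Draw B resamples of size m with replacement and recompute the estimator.
boot = np.array([np.median(rng.choice(X, size=m, replace=True))
                 for _ in range(B)])

# The conditional law of r_m (theta_m - theta_n), here with r_m = sqrt(m),
# is used to approximate that of r_n (theta_n - theta_0).
pivot = np.sqrt(m) * (boot - theta_n)
print(np.quantile(pivot, [0.025, 0.975]))   # bootstrap quantiles for inference
```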
Weak convergence in probability of the conditional distribution of \(r_{m}({\varvec{\widehat{\theta }}}_{m} - {\varvec{\theta }}_{n})\) to the distribution of \(\arg \max _{\gamma } \mathbb {Z}(\gamma )\) is established under essentially the same conditions as those required for the weak convergence of \(r_{n} ({\varvec{\theta }}_{n}-{\varvec{\theta }}_{0})\), provided that \(m = o(n)\), \(m \rightarrow \infty\) and \(a^{2}_m m^{-1/2} \log n/ \log (n/m + 1) = o(1)\) for a fixed sequence \(\{a_{m}\}\) depending on the size of the envelope of \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\). The asymptotic properties of \({\varvec{\theta }}_{n}\) have been established by, among many others, Bose and Chatterjee (2001) and El Bantli (2004), under appropriate concavity or differentiability conditions. Empirical process methods are instrumental tools for evaluating the large sample properties of estimators based on semiparametric models, including consistency, distributional convergence, and validity of the bootstrap. In particular, modern empirical process theory provides a more general approach to the theoretical investigation of general M-estimators; see, for example, Dudley (1999), Kim and Pollard (1990), Pollard (1985), van de Geer (2000) and van der Vaart and Wellner (1996). Most results obtained thus far are, however, restricted to the cases where the Gaussian process \(\mathbb {Z}\) has either a quadratic mean function or a quadratic covariance function. In order to establish stronger results covering cases where the covariance and mean functions of \(\mathbb {Z}\) may take on more general structures, we will use the empirical process approach. Applications of the bootstrap to M-estimation have been investigated extensively in the literature. The relevant theoretical results concern mostly M-estimators with \(r_{n} = n^{1/2}\) and asymptotically Gaussian limits. The most common technique for studying bootstrapped M-estimators is linearization, which cannot be used in a nonstandard setting.
Under standard regularity conditions, Edgeworth expansions for bootstrap distributions of finite-dimensional M-estimators were obtained by Lahiri (1992). Under a weak form of differentiability condition, Arcones and Giné (1992) investigated bootstrapping finite-dimensional \(n^{1/2}\)-consistent M-estimators and established an almost sure bootstrap central limit theorem. An in-probability bootstrap central limit theorem for possibly infinite-dimensional Z-estimators was established by Wellner and Zhan (1996). In the setting of nonregular vector-valued M-estimators obtained from \(\mathbf {m}_{{\varvec{\theta }}}\) concave in \({\varvec{\theta }}\), Bose and Chatterjee (2001) investigated a weighted form of the bootstrap of which the m out of n bootstrap is a special case. M-estimation for linear models under nonstandard conditions was considered by El Bantli (2004), who proved that the m out of n bootstrap is consistent while the conventional bootstrap is not in general. The results of Bose and Chatterjee (2001) and El Bantli (2004) are restricted to the case where \(\mathbb{Z}\) has a quadratic covariance function, and rely on concavity and differentiability assumptions. Lee and Pun (2006) proved m out of n bootstrap consistency for vector-valued M-estimators under twice-differentiability of the process \(\mathbb {P}\mathbf {m}_{{\varvec{\theta }}}\), where \({\varvec{\theta }}\) may contain a subvector of nuisance parameters, in which case the process \(\mathbb {Z}\) has a quadratic mean function. Lee (2012) gave a general m out of n bootstrap result for M-estimators in the absence of nuisance parameters. Under nonstandard conditions, Lee and Yang (2020) proposed a one-dimensional pivot derived from the criterion function and proved that its distribution can be consistently estimated by the m out of n bootstrap, or by a modified version of the perturbation bootstrap.
They provide a new method for constructing confidence regions which are transformation equivariant and have shapes driven solely by the criterion function.

The main purpose of the present work is to consider a general framework of non-smooth semiparametric M-estimators, extending the setting of Lee (2012) to \(\mathcal {B}\)-valued M-estimators in the presence of a nuisance parameter, where the rate of convergence of the nuisance parameter may differ from that of the parameter of interest. More precisely, we consider the m out of n bootstrapped version of the M-estimator investigated in Delsol and Van Keilegom (2020), where these authors showed that their M-estimator converges weakly to a process composed of a Gaussian process and a deterministic continuous function, which is hard to evaluate for practical use. As a solution to this problem, we propose in this paper the m out of n bootstrap. We mention at this stage that the parameter \(\varvec{\theta }\), in the present paper, belongs to some Banach space, in contrast with the last-mentioned work, where the parameter of interest is Euclidean. Hence, we restate the results of Delsol and Van Keilegom (2020) under more general conditions. The main aim of the present paper is to provide a first full theoretical justification of the m out of n bootstrap consistency of M-estimators with criterion functions that are non-smooth in the parameters, and to give the consistency rate together with the asymptotic distribution of the estimator of the parameter of interest \(\varvec{\theta }_0\). This requires the effective application of large sample theory techniques developed for empirical processes. The results of Lee (2012) are not directly applicable here, since the estimation procedures depend on nuisance parameters. These results are not only useful in their own right but also essential for the derivation of our asymptotic results.

The paper is organized as follows. Section 2 introduces the notation and assumptions. Section 3 states the main theorems. Though our main objective in the paper is theoretical, we provide in Sect. 4 Monte Carlo simulations to examine the method’s performance in practice. Some concluding remarks are given in Sect. 5. All proofs are gathered in Sect. 6. In the Appendix we apply our theorems and prove, as corollaries, new m out of n bootstrap consistency results for three examples.

2 Notation

We abuse notation slightly by identifying the underlying probability space \((\mathcal {S},\mathcal {A},\mathbb {P})\) with the product space \((\mathcal {S}^{\infty },\mathcal {A}^{\infty },\mathbb {P}^{\infty })\times (\mathcal {Z}, \mathcal {C} , \widetilde{P} )\). Now \(\mathbf {X}_{1},\dots , \mathbf {X}_{n}\) are equal to the coordinate projections on the first n coordinates. All auxiliary variables, assumed to be independent of the \(\mathbf {X}_{i}\), depend only on the last coordinate. We will use the usual notation of the empirical processes of van der Vaart and Wellner (1996). Let \(\mathbb {Q}\) denote some signed measure on \(\mathcal {S}\). Let \(\mathcal {F}\) be a class of measurable functions \(f : \mathcal {S} \rightarrow \mathbb {R}\). Define

$$\Vert \mathbb {Q}f\Vert _{\mathcal {F}} = \sup _{f \in \mathcal {F}} |\mathbb {Q}f |.$$

For any \(r \ge 1\), denote by \(L^{r} (\mathbb {Q})\) the class of measurable functions \(f : \mathcal {S} \rightarrow \mathbb {R}\) with

$$\int |f|^{r}d\mathbb {Q}< \infty ,$$

where \(\mathbb {Q}\) is a probability measure. The \(L^{r}(\mathbb {Q})\)-norm \(\Vert \cdot \Vert _{\mathbb {Q},r}\) is defined by

$$\Vert f\Vert _{\mathbb {Q},r} =\left( \int |f|^{r} d\mathbb {Q}\right) ^{1/r},$$

for \(f \in L^{r}(\mathbb {Q})\). The essential supremum of \(f \in L^{\infty } (\mathbb {Q} )\) is denoted by \(\Vert f \Vert _{\mathbb {Q},\infty }\).

The covering number \(N(\epsilon ,\mathcal {F},L^{r}(\mathbb {Q}))\) of a function class \(\mathcal {F} \subset L^{r}(\mathbb {Q})\) is computed with respect to the \(L^{r}(\mathbb {Q})\)-norm for radius \(\epsilon > 0\). To be more precise, \(N(\epsilon ,\mathcal {F},L^{r}(\mathbb {Q}))\) is the minimum number of balls \(\{g : \Vert g-h\Vert _{\mathbb {Q},r} < \epsilon \}\) of radius \(\epsilon\) covering \(\mathcal {F}\).
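As a toy illustration of this definition (the function class, sample and radii are hypothetical choices): for the class of indicators \(f_{t}(x) = \mathbbm {1}\{x \le t\}\) and \(\mathbb {Q}\) an empirical measure on [0, 1], a greedy construction yields a valid \(\epsilon\)-cover, and hence an upper bound on \(N(\epsilon ,\mathcal {F},L^{1}(\mathbb {Q}))\), which here grows roughly like \(1/\epsilon\):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(size=200))   # support points of the empirical measure Q_n

# Function class: indicators f_t(x) = 1{x <= t}, evaluated on the support of Q_n.
ts = np.linspace(0, 1, 401)
F = (X[None, :] <= ts[:, None]).astype(float)   # each row is one function f_t

def covering_number(F, eps):
    """Greedy eps-cover in L1(Q_n); its size is an upper bound on N(eps, F, L1(Q_n))."""
    remaining = list(range(len(F)))
    centers = 0
    while remaining:
        c = remaining[0]
        # L1(Q_n)-distance of every remaining function to the chosen center.
        dist = np.mean(np.abs(F[remaining] - F[c]), axis=1)
        remaining = [r for r, d in zip(remaining, dist) if d >= eps]
        centers += 1
    return centers

for eps in (0.2, 0.1, 0.05):
    print(eps, covering_number(F, eps))   # grows roughly like 1/eps
```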

For some random element \(\mathbf {Z}\), the probability measure induced by \(\mathbf {Z}\) is denoted by \(\mathbb {P}_\mathbf {Z}\), conditional on all other variables. The empirical process is defined to be

$$\mathbb {G}_n = n^{1/2} (\mathbb {P}_{n} - \mathbb {P} ).$$

The outer and inner probability measures derived from \(\mathbb {P}\) are designated by \(\mathbb {P}^{*}\) and \(\mathbb {P}_{*}\), respectively. Outer and inner probability measures are to be understood in the sense of the monograph by van der Vaart and Wellner (1996), in particular their definitions on page 6. Let T be any map from the underlying probability space to the extended real line \(\overline{\mathbb {R}}\). The minimal measurable majorant and the maximal measurable minorant of T are denoted by \(T^{*}\) and \(T_{*}\), respectively. For any subset B of the probability space, with similar notation, its indicator function satisfies \(\mathbbm {1}_{B^{*}} = \mathbbm {1}^{*}_{B}\) and \(\mathbbm {1}_{B_{*}} = (\mathbbm {1}_{B})_{*}\). We draw randomly with replacement from \(\mathcal {X}\) independent bootstrap observations \(\mathbf {Y}_{1},\ldots , \mathbf {Y}_{m}\). Let us define

$$\widehat{\mathbb {P}}_{m}=m^{-1}\sum _{i=1}^m \delta _{\mathbf {Y}_{i}},$$

so that

$$\widehat{\mathbb {P}}_{m} = \sum _{i=1}^{n} W_{i} \delta _{\mathbf {X}_{i}},$$

where \(mW = m(W_{1},\ldots , W_{n})\) is a multinomial vector with m trials and parameters \((n^{-1},\ldots ,n^{-1})\), independent of the \(\mathbf {X}_{i}\). The probability measure induced by bootstrap resampling conditional on \(\mathcal {X}\) is denoted by \(P_W\). Let us define the bootstrapped empirical process by

$$\widehat{\mathbb {G}}_m = m^{1/2}\Big (\widehat{\mathbb {P}}_{m} - \mathbb {P}_{n}\Big ).$$
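The equivalence between resampling with replacement and multinomial weighting can be checked numerically; a small sketch under assumed toy values of n, m and f (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 8, 5
X = rng.normal(size=n)

# Route 1: draw Y_1, ..., Y_m with replacement from X and average f over them.
idx = rng.choice(n, size=m, replace=True)
f = np.exp
Pm_hat_resample = np.mean(f(X[idx]))

# Route 2: the same resample expressed via weights W_i = N_i / m, where N_i
# counts how often X_i was drawn, so mW = N ~ Multinomial(m, (1/n, ..., 1/n)).
N = np.bincount(idx, minlength=n)
W = N / m
Pm_hat_weights = np.sum(W * f(X))

# Both routes give the same value of the bootstrap empirical measure at f.
print(np.isclose(Pm_hat_resample, Pm_hat_weights))
```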

Let \(T_{n}\) denote a sequence of maps, let \(\mathbb {D}\) be a metric space, and let T be a \(\mathbb {D}\)-valued measurable map from the underlying probability space. If \(T_{n}\) is bounded in outer probability, we write \(T_{n} = O_{\mathbb {P}^{*}} (1)\); similarly, if \(T_{n}\) converges in outer probability to zero, we write \(T_{n} = o_{\mathbb {P}^{*}} (1)\). Assume that

$$\begin{aligned} \lim _{M\rightarrow \infty }\liminf _{n\rightarrow \infty }\mathbb {P}_{W}\Big \{\Vert T_{n}\Vert <M\Big \}_{*}=1. \end{aligned}$$
(1)

If (1) holds along almost every sequence \(\mathbf {X}_{1},\mathbf {X}_{2},\ldots\), we write \(T_{n} = O_{\mathbb {P}_{W}^{*}} (1)\) a.s. (almost surely). If for any subsequence \(\{T_{n^{\prime }}\}\), there exists a further subsequence \(\{T_{n^{\prime \prime }} \}\) with \(T_{n^{\prime \prime }} = O_{\mathbb {P}_{W}^{*}} (1)\) a.s., we write \(T_{n} = O_{\mathbb {P}_{W}^{*}} (1)\) i.p. (in probability). We write \(T_{n} =o_{\mathbb {P}_{W}^{*}}(1)\) a.s., if, for any \(\epsilon > 0\), we have

$$\begin{aligned} \mathbb {P}_{W}\Big \{\Vert T_{n}\Vert >\epsilon \Big \}^{*}\rightarrow 0,~~\text{ as }~~n\rightarrow \infty \end{aligned}$$
(2)

almost surely. We write \(T_{n} =o_{\mathbb {P}_{W}^{*}}(1)\) i.p. when the convergence (2) holds in probability. The weak convergence of \(T_{n}\) to T, in the sense of Hoffmann-Jørgensen (1991), is denoted by \(T_{n} \Rightarrow T\). The space of real-valued functions on \(\mathbb {D}\) bounded by 1 in the Lipschitz norm is denoted by \(BL_{1}(\mathbb {D})\). The conditional weak convergence of \(T_{n}\) to a separable T in \(\mathbb {D}\) is characterized by the condition

$$\begin{aligned} \sup _{f\in BL_{1}(\mathbb {D})}\Big |\mathbb {P}_{W}^{*}f(T_{n})-\mathbb {P}f(T)\Big |\rightarrow 0. \end{aligned}$$
(3)

When the convergence (3) holds in outer probability, we write \(T_{n} \Rightarrow T\) i.p.; similarly, if it is outer almost sure, we write \(T_{n} \Rightarrow T\) a.s.

Define \(\mathcal {M}_{S,\mathcal {H}} = \{\mathbf {m}_{{\varvec{\theta }},h} : {\varvec{\theta }} \in S, h \in {\mathcal {H}}\} \subset \mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\), where \(S \subset {\varvec{\Theta }}\). For any \(\delta , \delta _{1},\eta > 0\), let us denote by \(\mathcal {M}_{\delta ,\delta _{1}}\) and \(\mathcal {M}_{\delta ,\delta _{1}}(\eta )\) the classes of functions

$$\mathcal {M}_{\delta ,\delta _{1}} = \Big \{\mathbf {m}_{{\varvec{\theta }},h}-\mathbf {m}_{{\varvec{\theta }}_{0},h} : \Vert {\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert \le \delta , d_{\mathcal {H}}(h,h^{0}) \le \delta _{1}, {\varvec{\theta }} \in {\varvec{\Theta }}, h\in \mathcal {H} \Big \},$$
$$\mathcal {M}_{\delta ,\delta _{1}}(\eta )=\Big \{\mathbf {m}_{{\varvec{\theta }},h}- \mathbf {m}_{\varvec{\psi },h} :\Vert {\varvec{\theta }}-\varvec{\psi }\Vert<\eta ,\Vert {\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert \vee \Vert \varvec{\psi }-{\varvec{\theta }}_{0}\Vert <\delta , d_{\mathcal {H}}(h,h^{0}) \le \delta _{1}, {\varvec{\theta }},\varvec{\psi } \in {\varvec{\Theta }}, h\in \mathcal {H}\Big \}.$$

The envelope function of \(\mathcal {M}_{\delta ,\delta _{1}}\) is denoted by \(M_{\delta ,\delta _{1}}\). For each \(\varvec{\psi } \in \mathcal {B}\) and \(h \in \mathcal {H}\) with \({\varvec{\theta }}_{0} +\varvec{\psi } \in {\varvec{\Theta }}\), define \(\widetilde{\mathbf {m}}_{\varvec{\psi },h} =\mathbf {m}_{{\varvec{\theta }}_{0}+\varvec{\psi },h}- \mathbf {m}_{{\varvec{\theta }}_{0},h}\). For any \(\mathcal {T} \subset \mathcal {B}\), the class of bounded functions from \(\mathcal {T}\) to \(\mathbb {R}\) is denoted by \(l^{\infty }(\mathcal {T})\), equipped with the sup norm. In the sequel, for all \(x\in \mathcal {S}\) and closed bounded \(K \subset {\varvec{\Theta }}\), assume that

$$\sup _{{\varvec{\theta }} \in K, h \in \mathcal {H}} |\mathbf {m}_{{\varvec{\theta }},h}(x) - \mathbb {P}\mathbf {m}_{{\varvec{\theta }},h} | < \infty .$$

In the sequel, we denote by C a positive constant that may differ from line to line. The choice of the bootstrap sample size m is theoretically governed by (AB1) and (C4). These conditions are typically satisfied by taking \(m \propto n^{c}\) for some sufficiently small \(c \in (0, 1)\). The empirical determination of m has long been an important problem that has not yet been fully resolved; for further comments, see Remark 3.12 below.

3 Main Results

In this section, we present four main theorems, each of independent interest, which eventually lead to the weak convergence of \(r_{n}({\varvec{\theta }}_n - {\varvec{\theta }}_{0})\) and to in-outer-probability m out of n bootstrap consistency in the context of general M-estimation, by applying the argmax theorems in van der Vaart and Wellner (1996) and in Lee (2012), respectively. Let us recall the basic idea. If the argmax functional is continuous with respect to some metric on the space of criterion functions, then convergence in distribution of the criterion functions implies convergence in distribution of their points of maximum, the M-estimators, to the point of maximum of the limit criterion function. First, we establish the consistency of \({\varvec{\theta }}_{n}\) and \({\varvec{\theta }}_{m}\) for \({\varvec{\theta }}_{0}\) in the following theorem.

3.1 Consistency

In our analysis, we consider the following assumptions, in which \(r_{n}\) is a sequence of positive constants with \(r_{n} \uparrow \infty\), \(\nu > 1\) is a fixed constant, and \(\ell : (0, \infty ) \rightarrow [0, \infty )\) is a function slowly varying at \(\infty\).

  • (A1) \(\mathbb {P}\left( \widehat{h}\in \mathcal {H}\right) \longrightarrow 1\) as \(n \longrightarrow \infty\) and \(d_{\mathcal {H}}(\widehat{h},h^{0}){\mathop {\longrightarrow }\limits ^{\mathbb {P}^{*}}}0\).

  • (A2) \(\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}\) is Glivenko-Cantelli:

    $$\Vert \mathbb {P}_{n}-\mathbb {P}\Vert _{\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}}=o_{\mathbb {P}^{*}} (1).$$
  • (A3) \(\lim _{d_{\mathcal {H}}(h,h^{0})\rightarrow 0}\sup _{\varvec{\theta }\in \varvec{\Theta }}|\mathbb {P}\mathbf {m}_{\varvec{\theta },h}-\mathbb {P}\mathbf {m}_{\varvec{\theta },h^{0}}|=0\).

  • (A4) The parameter of interest \({\varvec{\theta }}_{0}\) lies in the interior of \({\varvec{\Theta }}\) and satisfies, for every open \(\mathcal {O}\) containing \({\varvec{\theta }}_{0}\),

    $$\mathbb {P}\mathbf {m}_{{\varvec{\theta }}_{0},h^{0}} > \sup _{{\varvec{\theta }}\notin \mathcal {O}} \mathbb {P}\mathbf {m}_{{\varvec{\theta }},h^{0}}.$$
  • (A5) The M-estimator \({\varvec{\theta }}_{n}\) satisfies \(\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }}_{n},\widehat{h}} \ge \mathbb {P}_{n} \mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}-R_{n}\), with

    $$r_{n}^{\nu } \ell (r_{n} )R_{n} = o_{\mathbb {P}^{*}}(1).$$
  • (AB1) \(m=m_{n}\rightarrow \infty\), \(m=o(n)\) and \(r_{m}^{\nu }\ell (r_{m})=o\left( r_{n}^{\nu }\ell (r_{n})\right)\).

  • (AB2) \(d_{\mathcal {H}}(\widehat{h}_{m},h^{0})=o_{\mathbb {P}_W^{*}} (1)\) i.p.

  • (AB3) The m out of n bootstrap M-estimator \({\varvec{\theta }}_{m}\) satisfies \(\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }}_{m},\widehat{h}_{m}} \ge \widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}_{m}}-\widehat{R}_{n}\), with

    $$r_{m}^{\nu }\ell (r_{m})\widehat{R}_{n} = o_{\mathbb {P}_W^{*}} (1),~ \text{ i.p. }$$

Remark 3.1

  1. (i)

    Assumption (A2) is fulfilled under some entropy and moment conditions; see, for example, Theorem 2.4.3 (p. 123) of van der Vaart and Wellner (1996).

  2. (ii)

    Assumption (A3) holds automatically if, for example, there exists a function \(\mathfrak {G}(\cdot )\) such that, for any h in a neighborhood of \(h^{0}\) and any \(\varvec{\theta } \in \varvec{\Theta }\), we have

    $$|\mathbf {m}(\mathbf {X}_{i}, \varvec{\theta }, h)-\mathbf {m}(\mathbf {X}_{i}, \varvec{\theta }, h^{0})| \le \mathfrak {G}(\mathbf {X}_{i})d_{\mathcal {H}}(h,h^{0}).$$

    where the function \(\mathfrak {G}(\cdot )\) satisfies

    $$\mathbb {P}\mathfrak {G}(\mathbf {X}) <\infty ,$$

    or, alternatively, if the function \(h \mapsto \mathbf {m}(x,\varvec{\theta },h)\) is Lipschitz in h uniformly over x and \(\varvec{\theta }\).

  3. (iii)

    Assumptions (A5) and (AB3) are trivially fulfilled when

    $$\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }}_{n},\widehat{h}} \ge \sup _{\varvec{\theta } \in \varvec{\Theta }} \mathbb {P}_{n}\mathbf {m}_{\varvec{\theta },\widehat{h}}-R_{n},$$

    and

    $$\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }}_{m},\widehat{h}_{m}} \ge \sup _{\varvec{\theta } \in \varvec{\Theta }}\widehat{\mathbb {P}}_{m}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}-\widehat{R}_{n},$$

    respectively. This allows one to work with approximations of the values that actually maximize \(\theta \mapsto \mathbb {P}_{n}\mathbf {m}_{\varvec{\theta },\widehat{h}}\) and \(\theta \mapsto \widehat{\mathbb {P}}_{m}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}\).

  4. (iv)

    Assumption (AB2) poses no difficulty in practice and is met trivially by, for example, setting \(\widehat{h}_{m}=\widehat{h}\).

  5. (v)

    For finite-dimensional \(\varvec{\theta }\), (A5) and (AB3) can be achieved by a global maximization of the processes \(\mathbb {P}_{n} \mathbf {m}_{\varvec{\theta },\widehat{h}}\) and \(\widehat{\mathbb {P}}_{m} \mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}\), in which case \(R_{n}=\widehat{R}_{n}=0\). For infinite-dimensional \(\varvec{\theta }\), the maximization of these processes may be very complex or not practically feasible. To circumvent this, sophisticated algorithms are needed to construct \(\varvec{\theta }_{n}\) and \(\varvec{\theta }_{m}\) fulfilling (A5) and (AB3).

  6. (vi)

    Finally, it is possible to replace assumptions (A2) and (A4) by the following:

    • (A1’) For every compact \(K \subset \varvec{\Theta }\), \(\mathcal {M}_{K,\mathcal {H}}\) is Glivenko-Cantelli.

    • (A2’) The map \(\theta \mapsto \mathbb {P} \mathbf {m}_{\varvec{\theta },h^{0}}\) is upper semicontinuous with a unique maximum at \(\varvec{\theta }_{0}\).

    • (A3’) \(\varvec{\theta }_{n}\) is uniformly tight.

    • (AB1’) \(\varvec{\theta }_{m}\) is uniformly tight i.p.

Theorem 3.2

  1. (i)

    Assume (A1)-(A5). Then

    $${\varvec{\theta }}_{n} - {\varvec{\theta }}_{0} = o_{\mathbb {P}^{*}}(1).$$
  2. (ii)

    Assume (A2), (A3), (A4) and (AB1)-(AB3). Then

    $${\varvec{\theta }}_{m} - {\varvec{\theta }}_{0}= o_{\mathbb {P}_W^{*}} (1)~~ {i.p.}$$

Note that the result of part (i) holds if (A2) and (A4) are replaced by (A1’)-(A3’), and the result of part (ii) holds if (A2) and (A4) are replaced by (A1’), (A2’) and (AB1’).

In the sequel, we refer to the sets of assumptions implying parts (i) and (ii) as (C) and (CP), respectively. Next, we give the set of assumptions needed to identify the rates of convergence of \({\varvec{\theta }}_{n}\) and \({\varvec{\theta }}_{m}\) to \({\varvec{\theta }}_{0}\), an important step in the study of the weak convergence of these estimators.

Remark 3.3

We highlight that the parameter of interest \(\varvec{\theta }\) is not restricted to belong to a Euclidean space as in Delsol and Van Keilegom (2020). More precisely, we consider the general framework in which \(\varvec{\theta } \in \varvec{\Theta }\), where \(\varvec{\Theta }\) is a subset of some Banach space \(\mathcal {B}\). Notice that result (i) of Theorem 3.2 is somewhat more general than the analogous result stated in the last reference, since the conditions imposed in our setting are more general and extend those of Lee (2012) to semiparametric models.

3.2 Rates of Convergence

Let us introduce the following assumptions:

  • (B1) \(v_{n}d_{\mathcal {H}}(\widehat{h},h^{0})=O_{\mathbb {P}^{*}}(1)\) for some \(v_{n}\longrightarrow \infty\).

  • (B2) For all \(\delta _{1}>0\), there exist \(\varvec{\alpha }<\nu\), \(K>0\), \(\delta _{0}>0\), a positive constant C and, for each \(n \in \mathbb {N}\), a function \(\varphi\) such that \(\delta \mapsto \frac{\varphi (\delta )}{\delta ^{\varvec{\alpha }}}\) is decreasing on \((0,\delta _{0}]\), \(r_{n}^{\nu }\ell (r_{n})n^{-1/2}\varphi (1/r_{n}) \le C\) for n sufficiently large, and, for all \(\delta \le \delta _{0}\),

    $$\mathbb {P}^{*}\left[ \sup \limits _{\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta , d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{n}}}|\mathbb {G}_n\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|\right] \le K\varphi (\delta ).$$
  • (B3) There exist \(\eta _{0}>0\), \(C>0\) and two positive and non-decreasing functions \(\varvec{\psi }_{1}\) and \(\varvec{\psi }_{2}\) on \((0,\eta _{0}]\) such that for all \(\varvec{\theta } \in \varvec{\Theta }\) satisfying \(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta _{0}\):

    $$\mathbb {P}\widetilde{\mathbf {m}}_{{\varvec{\theta }}- {\varvec{\theta }}_{0},\widehat{h}} \le W_{n}\varvec{\psi }_{1}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ) -(C+o_{\mathbb {P}^{*}}(1))\varvec{\psi }_{2}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ).$$

    Moreover, there exist \(\beta _{2}>\alpha , \beta _{1}<\beta _{2}, \delta _{0}>0\) such that \(\delta \mapsto \varvec{\psi }_{1}(\delta ) \delta ^{-\beta _{1}}\) is non-increasing and \(\delta \mapsto \varvec{\psi }_{2}(\delta ) \delta ^{-\beta _{2}}\) is non-decreasing on \(\left( 0, \delta _{0}\right] ,\) and such that, for some sequence \(r_{n} \rightarrow \infty\),

    $$\varvec{\psi }_{1}\left( r_{n}^{1-\nu }\ell ^{-1}(r_{n})\right) W_{n}=O_{\mathbb {P}^{*}}\left( \varvec{\psi }_{2}\left( r_{n}^{1-\nu }\ell ^{-1}(r_{n})\right) \right) .$$

    See (van der Vaart and Wellner 1996, p.110) for the definition of \(\mathbb {P}\)-measurability.

  • (BB1) \(v_{m}d_{\mathcal {H}}(\widehat{h}_{m},h^{0})=O_{\mathbb {P}_W^{*}}(1)\) i.p. for some \(v_{m}\longrightarrow \infty\).

  • (BB2) With the same notation as in assumption (B2), replacing \(r_{n}\) (resp. \(v_{n}\)) by \(r_{m}\) (resp. \(v_{m}\)), and with assumption (AB1) in mind, we have

    $$\mathbb {P}^{*}\mathbb {P}_{W}^{*}\left[ \sup \limits _{\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta , d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{m}}}|\widehat{\mathbb {G}}_m\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|\right] \le K\varphi (\delta ).$$
  • (BB3) With the same notation as in assumption (B3), replacing \(r_{n}\) by \(r_{m}\), and with assumption (AB1) in mind, we have

    $$\mathbb {P}\widetilde{\mathbf {m}}_{{\varvec{\theta }}- {\varvec{\theta }}_{0},\widehat{h}_{m}} \le W_{m}\varvec{\psi }_{1}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ) -(C+o_{\mathbb {P}^{*}}(1))\varvec{\psi }_{2}(\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0}\Vert ),$$

    where for some sequence \(r_{m} \rightarrow \infty\),

    $$\varvec{\psi }_{1}\left( r_{m}^{1-\nu }\ell ^{-1}(r_{m})\right) W_{m}=O_{\mathbb {P}_W^{*}} \left( \varvec{\psi }_{2}\left( r_{m}^{1-\nu }\ell ^{-1}(r_{m})\right) \right) , ~~\text{ i.p. }$$
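The kind of modulus bound required in (BB2) can be checked numerically in toy cases. The sketch below is an assumed toy setting, not the paper's model: it uses the m out of n multinomial bootstrap and the class \(\{x \mapsto \mathrm{1\!I}\{x\le t\}-\mathrm{1\!I}\{x\le 0\} : |t|\le \delta \}\), whose envelope is essentially \(\mathrm{1\!I}\{|x|\le \delta \}\); the Monte Carlo estimate of \(\mathbb {P}_{W}^{*}\sup _{t}|\widehat{\mathbb {G}}_{m}f_{t}|\) should then be of the same order as \(\sqrt{\mathbb {P}_{n}M^{2}}\).

```python
import numpy as np

# Toy numerical check (assumed setting, not the paper's model) of a
# (BB2)-type modulus bound for the m-out-of-n multinomial bootstrap.
# Class: f_t(x) = 1{x <= t} - 1{x <= 0}, |t| <= delta;
# envelope M_delta(x) = 1{|x| <= delta} (almost surely).

def bootstrap_modulus(x, m, delta, n_boot=200, grid=41, seed=1):
    """Monte Carlo estimate of E_W sup_t |G_hat_m f_t| and of sqrt(P_n M^2)."""
    rng = np.random.default_rng(seed)
    n = x.size
    ts = np.linspace(-delta, delta, grid)
    ind = x[:, None] <= ts                      # n x grid indicators 1{x <= t}
    ind0 = x <= 0.0
    pn_f = ind.mean(axis=0) - ind0.mean()       # P_n f_t
    sups = []
    for _ in range(n_boot):
        w = rng.multinomial(m, np.full(n, 1.0 / n))   # bootstrap counts
        pm_f = (w @ ind) / m - (w @ ind0) / m         # P_hat_m f_t
        sups.append(np.sqrt(m) * np.max(np.abs(pm_f - pn_f)))
    env_l2 = np.sqrt(np.mean(np.abs(x) <= delta))     # sqrt(P_n M_delta^2)
    return float(np.mean(sups)), float(env_l2)

rng = np.random.default_rng(0)
x = rng.standard_normal(2_000)
modulus, env_l2 = bootstrap_modulus(x, m=500, delta=0.25)
# `modulus` should be of the same order as `env_l2` (a (BB2)-type bound)
```

The constant relating the two quantities is empirical here; the point of the sketch is only that the bootstrapped modulus does not blow up relative to the envelope norm.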

Remark 3.4

  1. (i)

    Assumption (B1) is a high-level assumption. Such condition on the nuisance parameter \(\widehat{h}\) could be obtained by many asymptotic results. In the present paper, we are primarily concerned with the cases where the convergence rate of the M-estimator of \(\varvec{\theta }\) is not affected by the estimation of the nuisance parameter h.

  2. (ii)

    Assumption (B2) is fulfilled if we assume that for any \(\varvec{x}\) the function \((\varvec{\theta }, h) \rightarrow \mathbf {m}(\varvec{x}, \varvec{\theta }, h(\varvec{x}, \varvec{\theta }))-\mathbf {m}\left( \varvec{x}, \varvec{\theta }_{0}, h\left( \varvec{x}, \varvec{\theta }_{0}\right) \right)\) is uniformly bounded on an open neighborhood of \(\left( \varvec{\theta }_{0}, h^{0}\right)\), i.e., on

    $$\left\{ (\varvec{\theta }, h): \Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta _{0}, d_{\mathcal {H}}\left( h, h^{0}\right) \le \delta _{1}^{\prime }\right\} ,$$

    for some \(\delta _{0}, \delta _{1}^{\prime }>0.\) We consider the class \(\mathcal {M}_{\delta ,\delta ^{\prime }_{1}}\) for any \(\delta \le \delta _{0}\) and its envelope \(M_{\delta , \delta _{1}^{\prime }}.\) For any \(\delta _{1}\), we have \(\delta _{1} v_{n}^{-1} \le \delta _{1}^{\prime }\) for n large enough. Then, by the entropy conditions on \(\mathcal {M}_{\delta ,\delta _{1}^{\prime }}\),

    $$\begin{aligned} \int _{0}^{1}\sup _{\delta<\delta _{0}} \sup _{\mathbb Q} \sqrt{1+\log N\left( \epsilon \left\| M_{\delta , \delta _{1}^{\prime }}\right\| _{\mathbb {L}_{2}\left( \mathbb {Q}\right) }, \mathcal {M}_{\delta ,\delta _{1}^{\prime }}, \mathbb {L}_{2}(\mathbb {Q})\right) } d \epsilon <+\infty , \end{aligned}$$
    (4)

    where the second supremum is taken over all finitely discrete probability measures \(\mathbb {Q}\) on \(\mathcal {S}\). We then obtain

    $$\mathbb {P}^{*}\left[ \sup \limits _{\Vert {\varvec{\theta }}- {\varvec{\theta }}_{0} \Vert \le \delta , d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{n}}}|\mathbb {G}_n\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|\right] \le K_{1} \sqrt{\mathbb {P}^{*}\left[ M_{\delta , \delta _{1}^{\prime }}^{2}\right] },$$

    see Theorems 2.14.1 and 2.14.2 in van der Vaart and Wellner (1996). Then the last part of (B2) holds if \(\varphi (\delta )\) can be chosen such that

    $$\begin{aligned} \exists K_{0}, \forall \delta \le \delta _{0}: \sqrt{\mathbb {P}^{*}\left[ M_{\delta , \delta _{1}^{\prime }}^{2}\right] } \le K_{0}\varphi (\delta ). \end{aligned}$$
    (5)

    Note that all the rates of convergence \(r_{n}\) appearing in the literature, for smooth as well as non-smooth functions, satisfy the last requirement in assumption (B2).

  3. (iii)

    For simplicity we choose \(\varvec{\psi }_{1}(x)=Id(x)=x\) and \(\varvec{\psi }_{2}(x)=x^{2}\) in assumption (B3); it then holds under the following conditions:

    1. (a)

      \(\varvec{\Theta } \subset \mathcal {B}\), where \(\mathcal {B}\) is a Banach space.

    2. (b)

      There exists \(\delta _{2}>0\) such that for any h satisfying \(d_{\mathcal {H}}(h,h^{0})\le \delta _{2}\), the function \(\varvec{\theta }\mapsto \mathbb {P}(\mathbf {m}(\mathbf {X}, \varvec{\theta }, h))\) is twice Fréchet differentiable on an open neighborhood of \(\varvec{\theta }_{0}\),

      $$\begin{aligned} \lim \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \rightarrow 0}\sup \limits _{d_{\mathcal {H}}(h,h^{0}) \le \delta _{2}}\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert ^{-2}&\left| \mathbb {P}\mathbf {m}_{\varvec{\theta },h}-\mathbb {P}\mathbf {m}_{\varvec{\theta }_{0},h}-\Gamma (\varvec{\theta }_{0},h)(\varvec{\theta }-\varvec{\theta }_{0})\right. \\&\left. +\frac{1}{2}\Lambda (\varvec{\theta }_{0},h)(\varvec{\theta }-\varvec{\theta }_{0},\varvec{\theta }-\varvec{\theta }_{0})\right| =0. \end{aligned}$$

      For more detail see (Allaire 2005, p.306).

    3. (c)

      \(\Gamma (\varvec{\theta }_{0},h)(\cdot )\) is a continuous linear form, with \(\Vert \Gamma (\varvec{\theta }_{0},\widehat{h})\Vert =O_{\mathbb {P}^{*}}\left( \frac{1}{r^{\nu -1}_{n}\ell (r_{n})}\right)\) and \(\Gamma (\varvec{\theta }_{0},h^{0})=0\).

    4. (d)

      \(\Lambda (\varvec{\theta }_{0},h)(\cdot ,\cdot )\) is a bilinear form such that \(\Lambda (\varvec{\theta }_{0},h^{0})\) is bounded, symmetric, positive definite and elliptic (i.e., \(\Lambda (\varvec{\theta }_{0},h^{0})(u,u)\ge C\Vert u\Vert ^{2}\)), and \(h\mapsto \Lambda (\varvec{\theta }_{0},h)\) is continuous at \(h^{0}\), i.e.,

      $$\lim _{d_{\mathcal {H}}\left( h, h^{0}\right) \rightarrow 0} \sup _{u \in \mathbb {R}^{k},\Vert u\Vert =1}\left\| \left( \Lambda \left( \varvec{\theta }_{0}, h\right) -\Lambda \left( \varvec{\theta }_{0}, h^{0}\right) \right) u\right\| =0.$$

      From these assumptions and the boundedness of the bilinear form, it follows that, whenever \(d_{\mathcal {H}}(\widehat{h},h^{0})\le \delta _{2}\),

      $$\begin{aligned} \mathbb {P}\mathbf {m}_{\varvec{\theta },\widehat{h}}-\mathbb {P}\mathbf {m}_{\varvec{\theta }_{0},\widehat{h}}&= \Gamma (\varvec{\theta }_{0},\widehat{h})(\gamma _{\varvec{\theta }})-\frac{1}{2}\Lambda (\varvec{\theta }_{0},h^{0})(\gamma _{\varvec{\theta }},\gamma _{\varvec{\theta }})+o_{\mathbb {P}^{*}}(\Vert \gamma _{\varvec{\theta }}\Vert ^{2})+o(\Vert \gamma _{\varvec{\theta }}\Vert ^{2})\\&\le W_{n}\Vert \gamma _{\varvec{\theta }}\Vert -C\Vert \gamma _{\varvec{\theta }}\Vert ^{2}+\beta _{n}(\Vert \gamma _{\varvec{\theta }}\Vert ), \end{aligned}$$

      where \(\gamma _{\varvec{\theta }}=\varvec{\theta }-\varvec{\theta }_{0}\). So (B3) holds with

      $$W_{n}=\left\| \Gamma \left( \varvec{\theta }_{0}, \widehat{h}\right) \right\| .$$

      Note that when \(\varvec{\Theta } \subset \mathbf {E}\), where \(\mathbf {E}\) is some Euclidean space, the Fréchet derivatives \(\Gamma (\varvec{\theta }_{0},h)\) and \(\Lambda (\varvec{\theta }_{0},h)\) become the usual derivatives, i.e., the gradient vector and the Hessian matrix respectively, as given in Remark 2(v) of Delsol and Van Keilegom (2020).

  4. (iv)

    Assumption (BB1) poses no difficulty in practice and is met trivially by, for example, setting \(\widehat{h}_{m}=\widehat{h}\), like in Remark 3.1 (iv).

  5. (v)

    Assumption (BB2) is a 'high-level' assumption. It serves to control the modulus of continuity of the bootstrapped empirical process, which is needed to derive the rate of convergence of the bootstrapped estimator \(\varvec{\theta }_{m}\). Note that in the situation of the n out of n bootstrap this condition is given in Ma and Kosorok (2005) and, more generally for exchangeable bootstrap weights, in Lemma 1 of Cheng and Huang (2010). In our setting it is fulfilled under some entropy conditions. For brevity, with the same notation as in (ii), let \(\widetilde{N}_{1}, \widetilde{N}_{2}, \ldots\) be i.i.d. symmetrized Poisson variables with parameter \(\frac{1}{2} m / n\), and let \(\varepsilon _{1}, \varepsilon _{2}, \ldots\) be i.i.d. Rademacher variables independent of \(\widetilde{N}_{1}, \widetilde{N}_{2}, \ldots\) and \(\mathbf{X}_{1}, \mathbf{X}_{2}, \ldots\). Denote by \(R=\left( R_{1}, \ldots , R_{n}\right)\) a random permutation of \(\{1,2, \ldots , n\}\), independent of all other variables. Let us introduce

    $$\mathbb {P}_{k}^{R}=k^{-1} \sum _{i=1}^{k}\delta _{\mathbf {X}_{R_{i}}},$$

    for each \(k \in \{1, \ldots , n\}.\) By Lemma 3.6.6 of van der Vaart and Wellner (1996) and the paragraph preceding it, together with the sub-Gaussian inequality for the Rademacher process, we obtain

    $$\begin{aligned} \mathbb {P}_{W}^{*}\left\| \widehat{\mathbb {G}}_{m}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}} \le 4 \mathbb {P}_{\widetilde{N}}\left\| \frac{1}{\sqrt{k}} \sum _{i=1}^{n} |\widetilde{N}_{i}|\varepsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}. \end{aligned}$$
    (6)

    Applying now Lemma 3.6.7 of van der Vaart and Wellner (1996) to the right-hand side of (6), with \(n_{0}\) set to 1, we get

    $$\begin{aligned} \begin{aligned} \mathbb {P}_{W}^{*}\left\| \widehat{\mathbb {G}}_{m}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}&\le 4 \mathbb {P}_{\widetilde{N}}\left\| \frac{1}{\sqrt{k}} \sum _{i=1}^{n} |\widetilde{N}_{i}|\varepsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}\\&\le \sqrt{\frac{n}{k}}\left\| \widetilde{N}_{i}\right\| _{2,1} \max _{1 \le k \le n}\mathbb {P}_{R} \mathbb {P}_{\varepsilon }\left\| \frac{1}{\sqrt{k}} \sum _{i=1}^{k} \varepsilon _{i} \delta _{\mathbf {X}_{R_{i}}}\right\| _{\mathcal {M}_{\delta , \delta _{1}^{\prime }}}^{*}\\&\le C \max _{1 \le k \le n} \mathbb {P}_{R}\left( \mathbb {P}_{k}^{R} M_{\delta , \delta _{1}^{\prime }}^{2}\right) ^{1 / 2}\\&\le C\left( \mathbb {P}_{n}M_{\delta , \delta _{1}^{\prime }}^{2} \right) ^{1 / 2}, \end{aligned} \end{aligned}$$
    (7)

    where \(C>\sqrt{\frac{n}{k}}\left\| \widetilde{N}_{i}\right\| _{2,1}\); see Problem 3.6.3 of van der Vaart and Wellner (1996). By Jensen's inequality, the outer expectation of the left-hand side of (7) is bounded by \(C\sqrt{\mathbb {P}[M_{\delta ,\delta ^{'}_{1}}^{2}]}\) for every \(\delta < \delta _{1}\). The inequality in assumption (BB2) holds for every \(n \in \mathbb {N}\); this is implied by the fact that the variables we consider are i.i.d.

  6. (vi)

    Finally, assumption (BB3) follows from the same discussion as the one given in (iii); only the choice \(W_{n}=\left\| \Gamma \left( \varvec{\theta }_{0}, \widehat{h}\right) \right\|\) becomes

    $$W_{m}=\left\| \Gamma \left( \varvec{\theta }_{0}, \widehat{h}_{m}\right) \right\| ,$$

    with \(W_{m}=O_{\mathbb {P}_W^{*}}\left( \frac{1}{r^{\nu -1}_{m}\ell (r_{m})}\right)\) i.p.
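To make condition (5) concrete, consider a toy least-absolute-deviation criterion (an assumed example, not taken from the paper): \(\mathbf {m}(x,\theta )=|x|-|x-\theta |\) with \(\theta _{0}=0\) and no nuisance parameter. The envelope of the centred class over \(\Vert \theta \Vert \le \delta\) can be taken equal to \(\delta\), so (5) holds with \(\varphi (\delta )=\delta\) and \(K_{0}=1\); the last part of (B2) with \(\nu =2\) and \(\ell \equiv 1\) then yields \(r_{n}=\sqrt{n}\). A minimal numerical sketch:

```python
import numpy as np

# Assumed toy example (not from the paper): LAD criterion
# m(x, theta) = |x| - |x - theta|, theta_0 = 0, no nuisance parameter.
# The class {m(., theta) - m(., theta_0) : |theta| <= delta} has envelope
# M_delta = delta, so condition (5) holds with phi(delta) = delta, K_0 = 1.

def envelope_rms(x, delta, grid=101):
    """Estimate sqrt(P_n M_delta^2): pointwise sup of |m(x,t) - m(x,0)|
    over a grid of t in [-delta, delta], then the empirical L2 norm."""
    env = np.zeros_like(x, dtype=float)
    for t in np.linspace(-delta, delta, grid):
        env = np.maximum(env, np.abs(np.abs(x) - np.abs(x - t)))
    return float(np.sqrt(np.mean(env ** 2)))

rng = np.random.default_rng(0)
x = rng.standard_normal(20_000)
for delta in (0.5, 0.1, 0.01):
    # phi(delta) = delta dominates the envelope L2 norm with K_0 = 1
    assert envelope_rms(x, delta) <= delta + 1e-9
```

Here the inequality holds pointwise (the 1-Lipschitz property of \(\theta \mapsto |x-\theta |\) bounds the envelope by \(\delta\)), so the Monte Carlo check is exact up to rounding.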

Theorem 3.5

  1. (i)

    Assume (C) and (B1)-(B3). Then

    $$r_{n}({\varvec{\theta }}_{n} - {\varvec{\theta }}_{0}) = O_{\mathbb {P}^{*}}(1).$$
  2. (ii)

    Assume (CP) and (BB1)-(BB3). Then

    $$r_{m}({\varvec{\theta }}_{m}-{\varvec{\theta }}_{0}) = O_{\mathbb {P}_W^{*}} (1)~~\text{ i.p. }$$

Remark 3.6

Result (i) of this theorem still holds when \(\varvec{\theta }\) belongs to a Banach space, which is more general than Theorem 2 of Delsol and Van Keilegom (2020), where the authors are interested in the finite-dimensional case. Note that the choice \(\nu =2\) and \(\ell \equiv 1\) in assumptions (B2) and (B3) reduces them to assumptions (B2) and (B3), respectively, of the latter reference.
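As a numerical illustration of Theorem 3.5(i), under assumed classical choices rather than the paper's model: with \(\nu =2\), \(\ell \equiv 1\) and \(\varphi (\delta )=\sqrt{\delta }\), the last part of (B2) yields the cube-root rate \(r_{n}=n^{1/3}\), attained for instance by the Chernoff-type mode estimator \(\theta _{n}={\arg \max }_{\theta }\,\mathbb {P}_{n}\mathrm{1\!I}\{|X-\theta |\le 1\}\) with \(X\sim N(0,1)\) and \(\theta _{0}=0\). The sketch below checks that the scaled errors \(n^{1/3}|\theta _{n}-\theta _{0}|\) remain of the same order as n grows.

```python
import numpy as np

# Assumed classical example (not the paper's model): Chernoff's mode-type
# estimator theta_n = argmax_theta P_n 1{|X - theta| <= 1}, theta_0 = 0,
# known to converge at the cube-root rate r_n = n**(1/3).

def mode_estimator(x, grid):
    """Maximise the empirical criterion P_n 1{|x - theta| <= 1} over a grid."""
    frac = (np.abs(x[:, None] - grid) <= 1.0).mean(axis=0)
    return grid[np.argmax(frac)]

rng = np.random.default_rng(2)
grid = np.linspace(-2.0, 2.0, 801)
scaled = []
for n in (500, 5_000, 50_000):
    errs = [abs(mode_estimator(rng.standard_normal(n), grid)) for _ in range(30)]
    scaled.append(n ** (1 / 3) * float(np.median(errs)))
# each entry of `scaled` estimates the median of r_n |theta_n - theta_0|;
# the entries should stay bounded (no drift to infinity) across n
```

By contrast, rescaling by \(\sqrt{n}\) would make the scaled errors diverge, which is the practical signature of a slower-than-root-n rate.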

3.3 Weak Convergence

We start this section by introducing some notation. For any \(\varvec{\theta } \in \varvec{\Theta }\) and \(h \in \mathcal {H}\), let \(\mathcal {K}=\{ \varvec{\gamma }\in \mathbf {E} : \Vert \varvec{\gamma }\Vert \le K \}\) for \(K > 0\). Define, for sufficiently large n and for \(\varvec{\gamma } \in \mathcal {K}\), the empirical processes

$$\begin{aligned} \begin{aligned} \mathbb {M}_{n}(\gamma ,h)&= r_{n}^{\nu }\ell (r_{n})(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\gamma /r_{n},h}, \\ \widehat{\mathbb {M}}_{n}(\gamma ,h)&= r_{m}^{\nu } \ell (r_{m})(\widehat{\mathbb {P}}_{m} - \mathbb {P}_{n})\widetilde{\mathbf {m}}_{\gamma /r_{m},h}, \end{aligned} \end{aligned}$$
(8)

which can be treated as random elements in \(\ell ^{\infty }(\mathcal {K})\). Also let for any \(\delta >0\);

$$\begin{aligned} \mathbf {M}_{\delta }(\cdot )&\ge {} \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \delta }\left| \mathbf {m}(\cdot ,{\varvec{\theta }},h^{0})- \mathbf {m}(\cdot ,{\varvec{\theta }}_{0},h^{0})\right| , \\ \mathcal {M}_{\delta }&= {} \Big \{ \mathbf {m}(\cdot ,{\varvec{\theta }},h^{0})- \mathbf {m}(\cdot ,{\varvec{\theta }}_{0},h^{0}) : \Vert {\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert \le \delta \Big \},\\ \mathcal {M}_{\delta }(\eta )&= {} \Big \{\mathbf {m}_{{\varvec{\theta }},h^{0}}- \mathbf {m}_{\varvec{\psi },h^{0}} :\Vert {\varvec{\theta }}-\varvec{\psi }\Vert<\eta ,\Vert {\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert \vee \Vert \varvec{\psi }-{\varvec{\theta }}_{0}\Vert <\delta , {\varvec{\theta }},\varvec{\psi } \in {\varvec{\Theta }}\Big \}. \end{aligned}$$

Finally, for any \(p \in \mathbb {N}\) and any \(f : \varvec{\Theta } \longrightarrow \mathbb {R}\) and for any \(\varvec{\gamma }=(\varvec{\gamma }_{1},\ldots ,\varvec{\gamma }_{p}) \in \varvec{\Theta }^{p}\), denote

$$\overline{f}_{\varvec{\gamma }}=(f(\varvec{\gamma }_{1}),\ldots ,f(\varvec{\gamma }_{p}))^{\top }.$$

We give the set of assumptions for the asymptotic distribution of the processes given in (8) and their maximum.

  • (C1) \(r_{n}\Vert \varvec{\theta }_{n}-\varvec{\theta }_{0}\Vert =O_{\mathbb {P}^{*}}(1)\) and \(v_{n}d_{\mathcal {H}}(\widehat{h},h^{0})=O_{\mathbb {P}^{*}}(1)\) for some sequences \(r_{n} \longrightarrow \infty\) and \(v_{n} \longrightarrow \infty ,\) and \(r^{\nu -2}_{n}\ell (r_{n})< C\) for some \(C>0\).

  • (C2) \(\varvec{\theta }_{0}\) lies in the interior of \(\varvec{\Theta }\), where \(\varvec{\Theta } \subset (\mathcal {B},\Vert \cdot \Vert )\).

  • (C3) For all \(\delta _{2},\delta _{3}>0\),

    $$\sup \limits _{\begin{array}{c} \Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}\\ d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}} \end{array} }\frac{\displaystyle |(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}{\displaystyle r^{-\nu }_{n}\ell ^{-1}(r_{n})+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|+|\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}=o_{\mathbb {P}^{*}}(1).$$
  • (C4) There exists a sequence \(\{a_{n}\}\) with

    $$a_{m}^{2}m^{-1/2}\log n/ \log (n/m + 1) = o(1)~~ \text{ and } ~~a_{n}^{-1} = O(1),$$

    such that, for all \(C,\eta >0\) and for every sequence \(\{j_{n}\}\) with \(a_{n}= o(j_{n})\),

    $$\begin{aligned} \begin{array}{cc} \frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\frac{C}{r_{n}}}\right] =O(1)~~\text{ and }~~\frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\frac{C}{r_{n}}} \mathrm{1\!I}_{\Big \{\mathbf {M}_{\frac{C}{r_{n}}} > \frac{\eta j_{n} n^{1/2}}{r^{\nu }_{n}\ell (r_{n})} \Big \}} \right] =o(1). \end{array} \end{aligned}$$
  • (C5) For all K and for any \(\eta _{n}\longrightarrow 0\),

    $$\sup \limits _{\Vert \varvec{\gamma }_{1}-\varvec{\gamma }_{2}\Vert <\eta _{n}, \Vert \varvec{\gamma }_{1}\Vert \vee \Vert \varvec{\gamma }_{2}\Vert \le K}\frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}\left[ \mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\frac{\varvec{\gamma }_{1}}{r_{n}},h^{0}\right) -\mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\frac{\varvec{\gamma }_{2}}{r_{n}},h^{0}\right) \right] ^{2}=o(1).$$
  • (C6) For any \(\varvec{x}\), the function \(\varvec{\theta } \mapsto \mathbf {m}(\varvec{x},\varvec{\theta },h^{0})\) and almost all paths of the two processes \(\varvec{\theta } \mapsto \mathbf {m}(\varvec{x},\varvec{\theta },\widehat{h})\) and \(\varvec{\theta } \mapsto \mathbf {m}(\varvec{x},\varvec{\theta },\widehat{h}_{m})\) are uniformly bounded on closed bounded sets (with respect to \(\varvec{\theta }\)).

  • (C7) There exist a random linear function \(W_{n} : \mathcal {B}\longrightarrow \mathbb {R}\), a deterministic bilinear function \(V : \mathcal {B}\times \mathcal {B} \longrightarrow \mathbb {R}\) and \(\beta _{n}=o_{\mathbb {P}^{*}}(1)\) such that, for all \(\varvec{\theta } \in \varvec{\Theta }\),

    $$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}}=W_{n}(\varvec{\gamma }_{\varvec{\theta }})+V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+\beta _{n}\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})$$

    and

    $$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}=V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}),$$

    where \(\varvec{\gamma }_{\varvec{\theta }}=\varvec{\theta }-\varvec{\theta }_{0}\) and the notation \(o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})\) means

    $$\lim \limits _{\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert \longrightarrow 0}\frac{\displaystyle o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})}{\displaystyle \Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}}=0.$$

    Moreover, for any bounded closed set \(\mathcal {K} \subset \mathcal {B}\),

    $$\begin{aligned} \begin{array}{cc} \exists \tau , \delta _{1}>0, r^{\nu -1}_{n}\ell (r_{n}) \underset{\Vert \varvec{\gamma }\Vert \le \delta }{\underset{\varvec{\varvec{\gamma }}\in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\left| \frac{\displaystyle W_{n}(\varvec{\gamma })}{\displaystyle \delta ^{\tau }}\right| =O_{\mathbb {P}}(1) ~~\text{ and }~~ \underset{\Vert \varvec{\gamma }-\varvec{\gamma }^{'}\Vert \le \delta }{\underset{\varvec{\gamma }, \varvec{\gamma }^{'} \in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\frac{\displaystyle |V(\varvec{\gamma },\varvec{\gamma })-V(\varvec{\gamma }^{'},\varvec{\gamma }^{'})|}{\displaystyle \delta ^{\tau }}<\infty . \end{array} \end{aligned}$$
  • (C8) There exists a zero-mean Gaussian process \(\mathbb {G}\) defined on \(\mathcal {B}\) and a continuous function \(\Lambda\) such that for all \(p \in \mathbb {N}\) and for all \(\varvec{\gamma }=(\varvec{\gamma }_{1},\ldots ,\varvec{\gamma }_{p}) \in \mathcal {K}^{p}\),

    $$r^{\nu -1}_{n}\ell (r_{n})\overline{W_{n}}_{\varvec{\gamma }}+r^{\nu }_{n}\ell (r_{n})\overline{\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\cdot }{r_{n}},h^{0}}}_{\varvec{\gamma }}\Rightarrow \overline{\Lambda }_{\varvec{\gamma }}+\overline{\mathbb {G}}_{\varvec{\gamma }}.$$

    Moreover, \(\mathbb {G}(\varvec{\gamma })=\mathbb {G}(\varvec{\gamma }^{'})\) a.s. implies that \(\varvec{\gamma }=\varvec{\gamma }^{'}\), and

    $$\mathbb {P}^{*}\left( \limsup \limits _{\Vert \varvec{\gamma }\Vert \longrightarrow \infty }{(\Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }))}<\sup \limits _{\varvec{\varvec{\gamma }}\in \mathcal {B}}(\Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }))\right) =1.$$
  • (C9) There exists a \(\delta _{0}>0\) such that

    $$\int \limits _{0}^{\infty }\sup \limits _{\delta \le \delta _{0}}\sup _{\mathbb Q}\sqrt{\log \left( N(\epsilon \Vert \mathbf {M}_{\delta }\Vert _{\mathbb {Q},2}, \mathcal {M}_{\delta }, \mathbb {L}^{2}(\mathbb {Q}))\right) }d\epsilon <\infty .$$
  • (C10) For all \(\delta ,\eta > 0\), the classes \(\mathcal {M}_{\delta }, \mathcal {M}_{\delta }(\eta )\) and \(\mathcal {M}_{\delta }(\eta )^{2}\) are \(\mathbb {P}\)-measurable, see (van der Vaart and Wellner 1996, p.110) for definition of \(\mathbb {P}\)-measurability.

  • (C11) For all \(C>0\), there exists \(n_{0} \in \mathbb {N}\) such that for all \(n \ge n_{0}\),

    $$\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{n}},\widehat{h}} \ge \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{C}{r_{n}}}\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-R_{n},$$

    where \(R_{n}\) is given in (A5).

  • (CB1) \(r_{m}\Vert \varvec{\theta }_{m}-\varvec{\theta }_{0}\Vert =O_{\mathbb {P}_W^{*}}(1)\) i.p. and \(v_{m}d_{\mathcal {H}}(\widehat{h}_{m},h^{0})=O_{\mathbb {P}_W^{*}}(1)\) i.p. for some sequences \(r_{m} \longrightarrow \infty\) and \(v_{m} \longrightarrow \infty\) and \(r^{\nu -2}_{m}\ell (r_{m}) \le C\).

  • (CB2) For all \(\delta _{2},\delta _{3}>0\),

    $$\sup \limits _{\begin{array}{c} \Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}\\ d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}} \end{array} }\frac{\displaystyle |(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}{\displaystyle r^{-\nu }_{m}\ell ^{-1}(r_{m})+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|+|\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}|+|\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}|}=o_{\mathbb {P}^{*}}(1).$$
  • (CB3) There exist a random linear function \(W_{m} : \mathcal {B}\longrightarrow \mathbb {R}\) and \(\beta _{m}=o_{\mathbb {P}^{*}}(1)\) such that, for all \(\varvec{\theta } \in \varvec{\Theta }\),

    $$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}=W_{m}(\varvec{\gamma }_{\varvec{\theta }})+V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+\beta _{m}\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2})$$

    and

    $$\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}=V(\varvec{\gamma }_{\varvec{\theta }},\varvec{\gamma }_{\varvec{\theta }})+o(\Vert \varvec{\gamma }_{\varvec{\theta }}\Vert ^{2}).$$
  • Moreover, for any closed bounded set \(\mathcal {K} \subset \mathcal {B}\),

    $$\begin{aligned} \begin{array}{cc} \exists \tau , \delta _{1}>0, r^{\nu -1}_{m}\ell (r_{m}) \underset{\Vert \varvec{\gamma }\Vert \le \delta }{\underset{\varvec{\varvec{\gamma }}\in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\left| \frac{\displaystyle W_{m}(\varvec{\gamma })}{\displaystyle \delta ^{\tau }}\right| =O_{\mathbb {P}_W^{*}}(1)~ \text{ i.p. }, \underset{\Vert \varvec{\gamma }-\varvec{\gamma }^{'}\Vert \le \delta }{\underset{\varvec{\gamma }, \varvec{\gamma }^{'} \in \mathcal {K}, \delta \le \delta _{1}}{\sup }}\frac{\displaystyle |V(\varvec{\gamma },\varvec{\gamma })-V(\varvec{\gamma }^{'},\varvec{\gamma }^{'})|}{\displaystyle \delta ^{\tau }}<\infty . \end{array} \end{aligned}$$
  • (CB4)

    $$r^{\nu -1}_{m}\ell (r_{m})\overline{W_{m}}_{\varvec{\gamma }}+r^{\nu }_{m}\ell (r_{m})\overline{\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\cdot }{r_{m}},h^{0}}}_{\varvec{\gamma }}\Rightarrow \overline{\Lambda }_{\varvec{\gamma }}+\overline{\mathbb {G}}_{\varvec{\gamma }}~\text {i.p.},$$

    where \(\Lambda\) and \(\mathbb {G}\) are given in (C8) and the weak convergence is conditionally on the sample.

  • (CB5) For all \(C>0\), there exists \(m_{0} \in \mathbb {N}\) such that for all \(m \ge m_{0}\),

    $$\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }_{m}},\widehat{h}_{m}} \ge \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{C}{r_{m}}}\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }},\widehat{h}_{m}}-\widehat{R}_{n},$$

    where \(\widehat{R}_{n}\) is given in (AB3).
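As a sanity check of condition (C4), under assumed toy choices rather than a claim about the general model, take \(\nu =2\), \(\ell \equiv 1\), \(r_{n}=\sqrt{n}\), \(j_{n}=\sqrt{n}\), and suppose the criterion is Lipschitz in \(\varvec{\theta }\), so that \(\mathbf {M}_{\frac{C}{r_{n}}}\le KC/r_{n}\). Then

$$\frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n}\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\frac{C}{r_{n}}}\right] \le \frac{n^{2}}{n}\cdot \frac{K^{2}C^{2}}{n}=K^{2}C^{2}=O(1),$$

while the truncation threshold equals \(\eta j_{n} n^{1/2}/(r^{\nu }_{n}\ell (r_{n}))=\eta j_{n}/\sqrt{n}=\eta\), so the indicator in the second part of (C4) is eventually empty (since \(\mathbf {M}_{\frac{C}{r_{n}}}\rightarrow 0\)) and the truncated term is \(o(1)\).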

Remark 3.7

  1. (i)

    The first part of condition (C1) can be obtained from part (i) of Theorem 3.5.

  2. (ii)

    Assumption (C3) holds under the common condition: for all \(\delta _{2},\delta _{3}>0,\)

    $$\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}}}\left| (\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| =o_{\mathbb {P}^{*}}(r^{-\nu }_{n}\ell ^{-1}(r_{n})),$$

    which is implied by the following: there exist a function f and a constant \(\delta _{0}>0\) such that, for all \(\delta _{2},\delta _{3}<\delta _{0}\),

    $$r^{\nu }_{n}\ell (r_{n})f\left( \frac{\delta _{2}}{r_{n}},\frac{\delta _{3}}{v_{n}}\right) =o\left( \sqrt{n}\right) ,$$

    and

    $$\begin{aligned} \mathbb {P}^{*}&\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}}}\left| (\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| \right] \\&\le 2\mathbb {P}^{*}\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{n}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{n}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \right] \\&\le \frac{1}{\sqrt{n}}{f\left( \frac{\delta _{2}}{r_{n}},\frac{\delta _{3}}{v_{n}}\right) }. \end{aligned}$$

    Using the same arguments as in Remark 3.3(ii), we get the last inequality.

  3. (iii)

    Assume \(j_{n}=\sqrt{n}\) and note that \(\gamma \mapsto \mathbb {M}_{n}(\gamma ,h^{0})=\frac{r_{n}^{\nu }\ell (r_{n})}{\sqrt{n}} \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\gamma /r_{n},h^{0}}\) is the empirical process indexed by the class \(\frac{r_{n}^{\nu }\ell (r_{n})}{\sqrt{n}}\mathcal {M}_{\frac{C}{r_{n}}}\). Then, under assumption (B2), assumptions (C4) and (C5) hold under the following conditions: there exists a \(\delta _{4}>0\) such that, for all \(\delta \le \delta _{4}\), \(\mathbb {P}^{*}(\mathbf {M}^{2}_{\delta }) \le K\varphi ^{2}(\delta )\) for some \(K>0\),

    $$\lim \limits _{\delta \longrightarrow 0}\frac{\mathbb {P}^{*}\left[ \mathbf {M}^{2}_{\delta } \mathrm{1\!I}_{\{\mathbf {M}_{\delta }>\eta \delta ^{-2} \varphi ^{2}(\delta ) \}} \right] }{\varphi ^{2}(\delta )}=0,$$

    for all \(\eta >0\) and

    $$\lim \limits _{\epsilon \longrightarrow 0} \lim \limits _{\delta \longrightarrow 0} \sup \limits _{\Vert \varvec{\gamma }_{1}-\varvec{\gamma }_{2}\Vert <\epsilon , \Vert \varvec{\gamma }_{1}\Vert \vee \varvec{\gamma }_{2} \le K}\frac{\mathbb {P}\left[ \mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\varvec{\gamma }_{1}\delta ,h^{0}\right) -\mathbf {m}\left( \mathbf {X},\varvec{\theta }_{0}+\varvec{\gamma }_{2}\delta ,h^{0}\right) \right] ^{2}}{\varphi ^{2}(\delta )}=0,$$

    for all \(C>0\); this corresponds to the case of Theorem 3.2.10 in van der Vaart and Wellner (1996).

  4. (iv)

    Let \(\mathcal {K}\) be an arbitrary closed bounded subset of \(\mathcal {B}\). The first part of condition (C8) is used to ensure the convergence of the marginals of the process \(\gamma \mapsto r^{\nu -1}_{n}\ell (r_{n})W_{n}(\gamma )+r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}\), in order to derive its weak convergence in \(\ell ^{\infty }(\mathcal {K})\) from the fact that it is asymptotically tight; the latter is fulfilled using (C4), (C5), (C9) and the preceding discussion in (iii). If

    $$r^{\nu -1}_{n}\ell (r_{n})\sup \limits _{\varvec{\gamma }\in \mathcal {K},\varvec{\gamma }\ne 0}|W_{n}(\varvec{\gamma })|\,\Vert \varvec{\gamma }\Vert ^{-1}=o_{\mathbb {P}}(1),$$

    we treat the given process as in the parametric case, where its marginals converge provided that

    $$\begin{aligned} \begin{array}{l} \lim \limits _{n \rightarrow \infty } \frac{r^{2\nu }_{n}\ell ^{2}(r_{n})}{n} \mathbb {P}\left\{ \left[ \mathbf {m}\left( \mathbf {X}, \varvec{\theta }_{0}+\frac{\gamma _{1}}{r_{n}}, h^{0}\right) -\mathbf {m}\left( \mathbf {X}, \varvec{\theta }_{0}+\frac{\gamma _{2}}{r_{n}}, h^{0}\right) \right] ^{2}\right\} \\ \quad =\mathbb {P}\left[ \left( \mathbb {G}\left( \gamma _{1}\right) -\mathbb {G}\left( \gamma _{2}\right) \right) ^{2}\right] , \end{array} \end{aligned}$$

    for all \(\gamma _{1}, \gamma _{2}\), and we are led to a rate of convergence \(r_{n}\) given as the solution of

    $$r_{n}^{\nu }\ell (r_{n})\varphi (1/r_{n})=\sqrt{n},$$

    for more detail see Theorem 3.2.10 of van der Vaart and Wellner (1996). Note that almost all sample paths of the process \(\varvec{\gamma }\mapsto \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })\) attain their supremum on a bounded closed set, which is guaranteed by the last assumption. The dominant term of the deterministic part \(\Lambda\) is usually a negative definite quadratic form, and hence exponential inequalities can lead to such a result; for instance, in the case of smooth functions, one can refer to Lee and Pun (2006), Ma and Kosorok (2005), Kosorok (2008) and Kristensen and Salanié (2017), among many others.

  5. (v)

    Assumption (C9) is a technical assumption, identical to the one used in the parametric case where the nuisance parameter \(h^{0}\) is known; it is needed to show that the process \(\varvec{\gamma }\mapsto r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}\) is asymptotically tight, see Theorem 3.2.10 of van der Vaart and Wellner (1996).

  6. (vi)

    The first part of (CB1) follows from part (ii) of Theorem 3.5.

  7. (vii)

    Assumption (CB2) holds automatically under the condition: for all \(\delta _{2},\delta _{3}>0,\)

    $$\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}}}\left| (\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| =o_{\mathbb {P}_{W}^{*}}(r^{-\nu }_{m}\ell ^{-1}(r_{m})) ~\text {i.p}.$$

    This condition holds if there exist a function g and a constant \(\delta _{0}>0\) such that, for all \(\delta _{2},\delta _{3}<\delta _{0}\),

    $$r^{\nu }_{m}\ell (r_{m})g\left( \frac{\delta _{2}}{r_{m}},\frac{\delta _{3}}{v_{m}}\right) =o\left( \sqrt{m}\right) ,$$

    and

    $$\begin{aligned} \mathbb {P}^{*}\mathbb {P}_{W}^{*}&\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}}}\left| (\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+(\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h^{0}}\right| \right] \\&\le 2\mathbb {P}^{*}\mathbb {P}_{W}^{*}\left[ \sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{\delta _{2}}{r_{m}}, d_{\mathcal {H}}(h,h^{0}) \le \frac{\delta _{3}}{v_{m}}}\left| \widehat{\mathbb {G}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \right] \\&\le \frac{1}{\sqrt{m}}{g\left( \frac{\delta _{2}}{r_{m}},\frac{\delta _{3}}{v_{m}}\right) }. \end{aligned}$$

    The last inequality is obtained using the same arguments as in Remark 3.3(v).

  8. (viii)

    Following the discussion of condition (C7) provided in Remark 3(iv) of Delsol and Van Keilegom (2020), we only change the random function \(W_{n}(\varvec{\gamma })\) to its bootstrap version \(W_{m}(\varvec{\gamma })=\langle \Gamma (\varvec{\theta }_{0}, \widehat{h}_{m}),\varvec{\gamma }\rangle\). If we are in the situation where \(\widehat{h}_{m}\) is computed from a dataset independent of the bootstrapped sample \((\mathbf {X}_{1}^{*},\ldots ,\mathbf {X}_{m}^{*})\), then it is sufficient for assumption (CB4) to assume the conditional weak convergence of the two terms \(r^{\nu -1}_{m}\ell (r_{m})\overline{W_{m}}_{\varvec{\gamma }}\) and \(r^{\nu }_{m}\ell (r_{m})\overline{\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\cdot }{r_{m}},h^{0}}}_{\varvec{\gamma }}\) separately. The convergence of the second term can be obtained as in the situation without the nuisance parameter; the interested reader is referred to Lemma 1 of Lee (2012). Note that if \(r^{\nu -1}_{m}\ell (r_{m}) \Gamma (\varvec{\theta }_{0}, \widehat{h}_{m}) \rightarrow W\) conditionally in distribution, then the marginals of the process \(\varvec{\gamma } \mapsto \langle r^{\nu -1}_{m}\ell (r_{m}) \Gamma (\varvec{\theta }_{0}, \widehat{h}_{m}), \varvec{\gamma } \rangle\) tend in distribution to the marginals of \(\varvec{\gamma } \mapsto \langle W, \varvec{\gamma } \rangle .\) Furthermore, if \(r_{m}=\sqrt{m}\) and \(\ell \equiv 1,\) it is common to assume that

    $$\Gamma (\varvec{\theta }_{0}, \widehat{h}_{m})=m^{-1} \sum \limits _{i=1}^{m} U_{i, m}+ o_{\mathbb {P}_{W}^{*}}\left( m^{-1 / 2}\right) ,$$

    where \(U_{i, m}, i=1,\ldots ,m,\) are independent and centered random variables. The convergence follows from Lindeberg’s condition.
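
    For completeness, recall the classical Lindeberg condition for such a triangular array, stated here in standard form (with \(s_{m}^{2}\) denoting the total variance, a notation introduced only for this display):

    ```latex
    % Lindeberg condition for the triangular array U_{1,m}, ..., U_{m,m}:
    % with s_m^2 = \sum_{i=1}^m \mathbb{E}[U_{i,m}^2], require, for every \varepsilon > 0,
    \frac{1}{s_{m}^{2}} \sum_{i=1}^{m}
      \mathbb{E}\left[ U_{i,m}^{2}\,
      \mathbbm{1}\left\{ |U_{i,m}| > \varepsilon \, s_{m} \right\} \right]
      \longrightarrow 0 \quad (m \rightarrow \infty).
    ```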

Theorem 3.8

(Weak Convergence of Empirical Processes). For \(K>0\), let \(\mathcal {K}=\{\varvec{\gamma }\in \mathbf {E} : \Vert \varvec{\gamma }\Vert \le K\}\) be a closed bounded subset of \(\mathcal {B}\). Treating \(\varvec{\gamma }\mapsto \mathbb {M}_{n}(\varvec{\gamma },\widehat{h})\) and \(\varvec{\gamma }\mapsto \widehat{\mathbb {M}}_{n}(\varvec{\gamma },\widehat{h}_{m})\) as random elements in \(\ell ^{\infty }(\mathcal {K})\) for sufficiently large n, we have the following results:

  1. (i)

    Assume (C1)-(C10). Then

    $$r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},\widehat{h}}\Rightarrow \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }).$$
  2. (ii)

    Assume (A2), (AB1), (B2), (C2)-(C6), (C9)-(C11) and (CB1)-(CB4). Then

    $$r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}} \Rightarrow \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })~~ \text{ i.p. }$$

Our main results concerning weak convergence of \(r_{n}({\varvec{\theta }}_{n} -{\varvec{\theta }}_{0})\) and m out of n bootstrap consistency are embodied in the following theorem.

Theorem 3.9

Assume, for any such \(\mathcal {K}\), that almost every sample path of the process \(\varvec{\gamma }\mapsto \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })\) achieves its supremum at a unique random point \(\gamma _{0}= \arg \max \limits _{\gamma \in \mathcal {B}} \{\Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })\}\). Then:

  1. (i)

    Assume (C1)-(C11). Then

    $$r_{n}({\varvec{\theta }}_{n}-{\varvec{\theta }}_{0}) \Rightarrow \gamma _{0}.$$
  2. (ii)

    Assume (A2), (AB1), (B2), the first part of (C1), (C2)-(C6), (C9)-(C11) and (CB1)-(CB5). Then

    $$r_{m}({\varvec{\theta }}_{m} -{\varvec{\theta }}_{n}) \Rightarrow \gamma _{0} ~~\text{ i.p. }$$

Remark 3.10

The result (i) of Theorem 3.8 coincides with Lemma 1 of Delsol and Van Keilegom (2020), where the parameter of interest \(\varvec{\theta }\) lies in a Euclidean space, in the particular case \(\nu =2\) and \(\ell \equiv 1\). By applying Theorem 3.2.2 of van der Vaart and Wellner (1996) and the uniform tightness of the sequence \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\), those authors established, in their Theorem 3, the weak convergence to a tight random variable \(\gamma _{0}\) in \(\ell ^{\infty }(\mathcal {K})\) for compact sets \(\mathcal {K}\); this corresponds to result (i) of Theorem 3.9 in that case. In our setting, we obtain the weak convergence of the same sequence for a Banach-valued parameter by using the theorem of van der Vaart and Wellner (1996), where the compact sets and the uniform tightness of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) are replaced, respectively, by closed bounded sets with the same structure as the set \(\mathcal {K}\) and by \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})=O_{\mathbb {P}^{*}}(1)\), as in Lee (2012) without the nuisance parameter \(h^{0}\).

Note that (i) still holds if (C4) is replaced by the following weaker condition:

$$n^{-1}r^{2\nu }_{n}\ell (r_{n})^{2}\mathbb {P}^{*}\mathbf {M}^{2}_{c/r_{n}} \mathbbm {1}\left\{ \mathbf {M}_{c/r_{n}} > \eta nr^{-\nu }_{n}\ell ^{-1}(r_{n})\right\} \rightarrow 0.$$

In order to prove the conditional stochastic equicontinuity of the bootstrapped process \(\widehat{\mathbb {M}}_{n}\), we need condition (C4), which is fulfilled if the uniform integrability condition is imposed for \(j_{n} \ge n^{c}\), for some \(0<c < 1/4\).

Remark 3.11

Theorem 3.9 can be used readily through routine bootstrap sampling, which we briefly describe as follows. It can be used, for example, to form confidence bands for the true parameter \(\varvec{\theta }\) based on \(\mathfrak {N}\) bootstrap samples \(\mathbf {Y}_{1}^{k},\ldots ,\mathbf {Y}_{m}^{k}\), \(k=1,\ldots ,\mathfrak {N}\), where \(\mathfrak {N}\) is a large integer. Let \(\varvec{\theta }_{m}^{k}\) be the bootstrapped estimator of \(\varvec{\theta }\) based on the sample \(\mathbf {Y}_{1}^{k},\ldots ,\mathbf {Y}_{m}^{k}\). An application of Theorem 3.9 implies that

$$(r_{n}({\varvec{\theta }}_{n}-{\varvec{\theta }}_{0}),r_{m}({\varvec{\theta }}_{m}^{1} -{\varvec{\theta }}_{n}),\ldots , r_{m}({\varvec{\theta }}_{m}^{\mathfrak {N}} -{\varvec{\theta }}_{n}))\Rightarrow (\gamma _{0},\gamma _{0}^{1},\ldots ,\gamma _{0}^{\mathfrak {N}})~~ \text{ i.p. },$$

where \(\gamma _{0}^{1},\ldots ,\gamma _{0}^{\mathfrak {N}}\) are independent copies of \(\gamma _{0}\). Notice that we have

$$\lim _{n\rightarrow \infty }\mathbb {P}\left( {\varvec{\theta }}_{n}-r_{n}^{-1}c(\alpha )\le {\varvec{\theta }}_{0}\le {\varvec{\theta }}_{n}+r_{n}^{-1}c(\alpha )\right) =\mathbb {P}(|\gamma _{0}|\le c(\alpha ))=1-\alpha .$$

In order to approximate \(c(\alpha )\), one can use the sampling estimator \(\widehat{c}(\alpha )\) of \(c(\alpha )\), defined as the smallest z such that

$$\frac{1}{\mathfrak N}\sum _{k=1}^{\mathfrak N}\mathbbm {1}_{\left\{ r_{m}({\varvec{\theta }}_{m}^{k} -{\varvec{\theta }}_{n})\le z\right\} } \ge 1-\alpha .$$
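The estimator \(\widehat{c}(\alpha )\) above is simply an empirical \((1-\alpha )\)-quantile of the bootstrap replicates. A minimal sketch in Python, where the array `boot_stats` is a hypothetical container for the \(\mathfrak {N}\) values \(r_{m}({\varvec{\theta }}_{m}^{k}-{\varvec{\theta }}_{n})\):

```python
import numpy as np

def c_hat(boot_stats, alpha):
    """Smallest z such that (1/N) * #{k : boot_stats[k] <= z} >= 1 - alpha."""
    z = np.sort(np.asarray(boot_stats, dtype=float))
    N = len(z)
    # smallest 0-based index k with (k + 1) / N >= 1 - alpha
    k = int(np.ceil((1.0 - alpha) * N)) - 1
    return z[max(k, 0)]

# For the symmetric interval of Remark 3.11, apply c_hat to the absolute
# values |r_m(theta_m^k - theta_n)|; the interval is then
# [theta_n - c_hat(.)/r_n, theta_n + c_hat(.)/r_n].
```

This is only a sketch of the quantile step; the names `c_hat` and `boot_stats` are chosen here for exposition.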

Remark 3.12

In nonregular problems where the conventional n out of n bootstrap is inconsistent, the m out of n bootstrap provides a useful remedy to restore consistency. In practice, however, choosing an appropriate m needs careful attention. Asymptotically, \(\sqrt{n}, \log n\) or \(20 \log n\) satisfy the o(n) requirement, but in finite sample settings the actual results can vary dramatically depending on the choice. Let \(\mathcal {X}_{n}=\left( X_{1}, \ldots , X_{n}\right)\) be a random sample drawn from an unknown distribution F, and \(T_{n}\left( \mathcal {X}_{n}, F\right)\) be a statistical functional of interest. Under mild conditions, the m out of n bootstrap distribution \(\mathcal {L}_{m, n}^{*}\) provides a consistent estimator of the distribution \(\mathcal {L}_{n}\) of \(T_{n}\left( \mathcal {X}_{n}, F\right) ,\) provided that the bootstrap sample size m is properly chosen; refer to Götze and Račkauskas (2001) and Bickel et al. (1997). Empirical selection of m has long been an important problem, which has been discussed by, for example, Datta and McCormick (1995), Hall et al. (1995) and Politis et al. (1999) in different contexts. The prevailing idea is to estimate a theoretically optimal sample size m, defined in a frequentist sense to be the value of m which minimizes the expected value of some metric measure \(d\left( \mathcal {L}_{n}, \mathcal {L}_{m, n}^{*}\right)\) between \(\mathcal {L}_{n}\) and \(\mathcal {L}_{m, n}^{*}\). The problem can be solved using bootstrap samples of size m, where \(m \rightarrow \infty\) and \(m / n \rightarrow 0\). Bickel and Sakov (2008) proposed an adaptive rule to select a value \(\hat{m}\) and discussed its properties. The authors show, under some conditions, that \(\widehat{m} / n {\mathop {\rightarrow }\limits ^{P}} 1\) when the n out of n bootstrap works, but \(\widehat{m} \rightarrow \infty\) and \(\widehat{m} / n \rightarrow 0\) when it does not. 
More precisely, the authors suggested the following rule for choosing m:

  1. 1.

    Consider a sequence of m’s of the form

    $$m_{j}=\left\lceil q^{j} n \right\rceil , \quad \text{ for } j=0,1,2, \ldots , \quad 0<q<1,$$

    where \(\lceil \alpha \rceil\) denotes the smallest integer \(\ge \alpha\).

  2. 2.

    For each \(m_{j},\) find \(L_{m_{j}, n}^{*}\) (in practice this is done by Monte-Carlo).

  3. 3.

    Let d be some metric consistent with convergence in law, and set

    $$\hat{m}=\underset{m_{j}}{{\text {argmin }}} d \left( L_{m_{j}, n}^{*}, L_{m_{j+1}, n}^{*}\right) .$$

If there is more than one value of m which minimizes the difference, then we pick the largest one. These results mean that the rule behaves well in both situations. Swanepoel (1986) proposed \(m = (2/3) n\) to obtain the desired coverage probability of a confidence interval. Alin et al. (2017) considered \(m=n^{j}\), where the value j satisfies \(n^{j}=\frac{2}{3} n\). Solving this equation for j leads to the choice

$$m=n^{j} \quad \text{ for } j=1+\frac{\log \left( \frac{2}{3}\right) }{\log (n)},$$

for which we note \(0<j<1\), so that \(m=o(n)\). Götze and Račkauskas (2001) suggested estimating m by minimizing \(d\left( \mathcal {L}_{m, n}^{*}, \mathcal {L}_{m / 2, n}^{*}\right) ,\) yielding an optimal bootstrap sample size in the sense of Wei et al. (2016), provided that the latter has order \(o_{p}(n)\). Wei et al. (2016) investigated a stochastic version of the optimal bootstrap sample size, defined as the minimizer of an error measure calculated directly from the observed sample. The authors developed procedures for calculating the stochastically optimal value of m. The performance of their methodology is illustrated under special forms of Edgeworth-type expansions, which are typically satisfied by statistics of the shrinkage type.
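The adaptive rule of Bickel and Sakov (2008) described above can be sketched in a few lines. The following is a minimal illustration under assumed names (`samples_by_m` holding Monte Carlo draws of the bootstrap statistic for each \(m_{j}\)), with the Kolmogorov distance as the metric d:

```python
import numpy as np

def ks_distance(x, y):
    """Kolmogorov distance between the empirical distributions of x and y."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])  # sup |F_x - F_y| is attained at sample points
    Fx = np.searchsorted(x, grid, side="right") / len(x)
    Fy = np.searchsorted(y, grid, side="right") / len(y)
    return float(np.max(np.abs(Fx - Fy)))

def m_sequence(n, q, j_max):
    """m_j = ceil(q**j * n), j = 0, 1, ..., j_max, without duplicates."""
    ms = []
    for j in range(j_max + 1):
        m = int(np.ceil(q ** j * n))
        if not ms or m != ms[-1]:
            ms.append(m)
    return ms

def select_m(samples_by_m, ms):
    """argmin over j of d(L*_{m_j}, L*_{m_{j+1}}); largest m on ties."""
    dists = [ks_distance(samples_by_m[ms[j]], samples_by_m[ms[j + 1]])
             for j in range(len(ms) - 1)]
    best = min(range(len(dists)), key=lambda j: (dists[j], -ms[j]))
    return ms[best]
```

This is only a sketch: in practice each `samples_by_m[m]` would be filled by drawing bootstrap resamples of size m and recomputing the statistic, and the tie-breaking rule picks the largest minimizer, as prescribed.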

4 Numerical Results

We provide numerical illustrations regarding the asymptotic distribution of estimators in classification with missing data; details are provided in Sect. 7.2. The computing programs were implemented in R. In our simulation, we show that resampling bootstrap samples of size n fails, while resampling with size m satisfying the conditions given in the previous sections yields a consistent bootstrap. Let us describe the model. Define

$$X_{2}=\max (\min (U+\epsilon ,1),0),$$

where \(U \sim \mathcal {U}[0,1]\), \(\epsilon \sim \mathcal {U}[-.1,.1]\) and \(X_{1} \sim \mathcal {U}[0,1]\), with \(X_{1}\), \(\epsilon\) and U independent. Let

$$\begin{aligned} Y= \mathbbm {1}\{U \ge f_{\theta }(X_{1})\}, \end{aligned}$$
(9)

where \(f_{\theta }(x_{1})=\theta x_{1}\) for some \(\theta\). We define

$$p\left( x_{1}\right) =\mathbb {P}\left( \Delta =1 \mid X_{1}=x_{1}\right) =\alpha _{0}+\left( x_{1}-0.5\right) ^{2}.$$

The data are composed of \(Z_{i}=\left( X_{i1}, X_{i2}, Y_{i}\Delta _{i}, \Delta _{i}\right)\), \(i=1,2,\ldots ,n\), from the described model. For the bandwidth, we use \(h_{n}=\frac{c_{h}}{\sqrt{n}}\) (resp. \(h_{m}=\frac{c_{h}}{\sqrt{m}}\) for the bootstrap), which satisfies the regularity conditions of the asymptotic theory. In this simulation, we use the quartic (biweight) kernel defined by

$$K(u)=\frac{15}{16}\left( 1-u^{2}\right) ^{2} \mathbbm {1}\{|u| \le 1\},$$

which is a density function with support \([-1,1]\). The results given below are based on three different values of n: we took \(n=250\), \(n=1000\) and \(n=2000\), the true value \(\theta ^{0}=1\), \(c_{h}=3.5\) and \(\alpha _{0}=0.5\). This choice is not restrictive; we can obtain the same desired result with different values of \(c_{h}\) and \(\alpha _{0}\), for example \(c_{h}=2\) or 5 and \(\alpha _{0}=.25\) or .75. The bootstrap procedure is as follows: for each value of m, we generate B independent bootstrap samples \(\left\{ Z_{i b}^{*}: i \le m\right\}\), \(b=1,\ldots ,{B}\), using some method of bootstrapping, and for each bootstrapped sample we compute an estimator \(\theta ^{(b)}_{m}\). Our main objective is to compare the distribution of \(n^{1/3}(\theta _{n}-\theta ^{0})\) with the m out of n bootstrap distribution of \(m^{1/3}(\theta _{m}-\theta _{n})\). To achieve this goal, we computed the Kolmogorov distance between the distributions of \(n^{1/3}(\theta _{n}-\theta ^{0})\) and \(m^{1/3}(\theta _{m}-\theta _{n})\) by averaging over 1000 and 1500 m out of n bootstrap samples drawn from one arbitrarily chosen random sample. Table 1 displays the results for \(n=250\), \(n=1000\) and \(n=2000\), which show that the most accurate estimates are obtained for the choices \(m=50\), \(m=60\) and \(m=110\), respectively. Deviations from these choices in either direction result in deteriorating accuracy. In Figs. 1, 2 and 3, we plot the empirical distribution of the true statistic together with the bootstrapped one for some values of m given in Table 1; each figure compares the estimated bootstrap empirical distribution with that of \(n^{1/3}(\theta _{n}-\theta ^{0})\) for the different values of n. All these figures show that the classical bootstrap (n out of n bootstrap) fails while the m out of n bootstrap is consistent. 
Figures 4, 5 and 6 show the root mean squared error (RMSE) of the estimator \(\theta _{m}\) for several values of m given in Table 1, for each value of n. One can see that, as in any other inferential context, the greater the sample size, the better the performance.
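For concreteness, the data-generating model of this section can be sketched as follows. This is a minimal illustration in Python (the paper's simulations were run in R), with function names chosen here for exposition:

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen arbitrarily for this sketch

def simulate(n, theta0=1.0, alpha0=0.5):
    """Draw (X1, X2, Y*Delta, Delta) from the model of Section 4."""
    u = rng.uniform(0.0, 1.0, n)
    eps = rng.uniform(-0.1, 0.1, n)
    x1 = rng.uniform(0.0, 1.0, n)
    x2 = np.clip(u + eps, 0.0, 1.0)           # X2 = max(min(U + eps, 1), 0)
    y = (u >= theta0 * x1).astype(float)      # Y = 1{U >= f_theta(X1)}
    p = alpha0 + (x1 - 0.5) ** 2              # P(Delta = 1 | X1 = x1)
    delta = (rng.uniform(0.0, 1.0, n) < p).astype(float)
    return x1, x2, y * delta, delta

def quartic_kernel(u):
    """K(u) = (15/16) (1 - u^2)^2 on [-1, 1], zero elsewhere."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u * u) ** 2, 0.0)
```

The estimation and bootstrap steps themselves (computing \(\theta _{n}\), resampling m out of n, and evaluating the Kolmogorov distance) are omitted; the sketch only fixes the data-generating mechanism and the kernel.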

Table 1 Kolmogorov Distance (KD) Between Distributions of \(n^{1/3}(\theta _{n}-\theta ^{0})\) and \(m^{1/3}(\theta _{m}-\theta _{n})\), for \(n=250\), \(n=1000\) and \(n=2000\)
Fig. 1

Empirical distribution of \(n^{1/3}(\theta _{n}-\theta ^{0})\) compared with those of \(m^{1/3}(\theta _{m}-\theta _{n})\), \(m=50\), \(m=110\), \(m=200\), \(m=250\) and \(n=250\)

Fig. 2

Empirical distribution of \(n^{1/3}(\theta _{n}-\theta ^{0})\) compared with those of \(m^{1/3}(\theta _{m}-\theta _{n})\), \(m=50\), \(m=60\), \(m=275\), \(m=1000\) and \(n=1000\)

Fig. 3

Empirical distribution of \(n^{1/3}(\theta _{n}-\theta ^{0})\) compared with those of \(m^{1/3}(\theta _{m}-\theta _{n})\), \(m=50\), \(m=110\), \(m=500\), \(m=2000\) and \(n=2000\)

Fig. 4

The RMSE of \(\theta _{m}\) in function of m, for \(n=250\)

Fig. 5

The RMSE of \(\theta _{m}\) in function of m, for \(n=1000\)

Fig. 6

The RMSE of \(\theta _{m}\) in function of m, for \(n=2000\)

5 Concluding Remarks

In the present work, we have considered the estimation of a parameter \(\varvec{\theta }\) that maximizes a certain criterion function depending on an unknown, possibly infinite-dimensional nuisance parameter h. We have followed the common estimation procedure of maximizing the corresponding empirical criterion, in which the nuisance parameter is replaced by some nonparametric estimator. We have shown that the M-estimators converge weakly to maximizers of Gaussian processes in an abstract setting permitting great flexibility for applications. We have established that the m out of n bootstrap, in this extended setting, is weakly consistent under conditions similar to those required for the weak convergence of the M-estimators in the general framework of Lee (2012), where an additional difficulty comes from the nuisance parameters. The goal of this paper is therefore to extend the existing theory on the bootstrap of M-estimators; this generalization is far from trivial, since controlling the nuisance parameter in a non-standard framework is harder and was basically an unsolved open problem in the literature. This requires the effective application of large sample theory techniques developed for empirical processes. Examples of applications are given to illustrate the generality and the usefulness of our results. It would be interesting to extend the results to a dependent framework; this would require further theory, which is beyond the scope of the present article. An important question is how to extend our findings to the setting of incomplete data (censored data, missing data, etc.). This will be a subject of investigation for future work.

6 Mathematical Developments

In this section, we give the proofs of the asymptotic results of our M-estimator \(\varvec{\theta }_{n}\) and its bootstrap version.

Proof of Theorem 3.2

Part (i) follows directly from Theorem 1 of Delsol and Van Keilegom (2020). For (ii), note that (AB1) and (A2) imply that

$$\begin{aligned} \Vert \widehat{\mathbb {P}}_{m} - \mathbb {P}\Vert _{\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}} = o_{\mathbb {P}_W^{*}} (1)~~ \text{ a.s. } \end{aligned}$$
(10)

By using the result in Lemma 3.6.16 of van der Vaart and Wellner (1996), we have that, for every \(\eta >0\), there is \(\delta >0\) such that

$$\begin{aligned} \mathbb {P}_W^{*}&\left( \Vert {\varvec{\theta }}_{m} - {\varvec{\theta }}_{0}\Vert>\eta \right) \\\le & {} \mathbb {P}_W^{*}\left( \mathbb {P}\mathbf {m}_{{\varvec{\theta }}_{0},h^{0}}- \mathbb {P}\mathbf {m}_{{\varvec{\theta }_{m}},h^{0}}>\delta \right) \\\le & {} \mathbb {P}_W^{*}\left( 2\sup _{\varvec{\theta }\in \varvec{\Theta }}|\mathbb {P}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}-\mathbb {P}\mathbf {m}_{\varvec{\theta },h^{0}}|+\mathbb {P}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}_{m}}-\mathbb {P}\mathbf {m}_{{\varvec{\theta }_{m}},\widehat{h}_{m}}>\delta \right) \\\le & {} \mathbb {P}_W^{*}\left( 2\sup _{\varvec{\theta }\in \varvec{\Theta }}|\mathbb {P}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}-\mathbb {P}\mathbf {m}_{\varvec{\theta },h^{0}}|+2\Vert \widehat{\mathbb {P}}_{m} - \mathbb {P}\Vert _{\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}}>\delta -\widehat{R}_{n}\right) . \end{aligned}$$

Making use of assumption (AB3), there is \(n_{0} \in \mathbb {N}\) such that, for every \(n\ge n_{0}\), there exists \(\delta ^{\prime }>0\) with \(\delta -\widehat{R}_{n}\ge 4\delta ^{\prime }\) i.p., and the last expression is bounded by:

$$\begin{aligned} \mathbb {P}_W^{*}&(\Vert {\varvec{\theta }}_{m} - {\varvec{\theta }}_{0}\Vert>\eta )\\\le & {} \mathbb {P}_W^{*}\left( 2\sup _{\varvec{\theta }\in \varvec{\Theta }}|\mathbb {P}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}-\mathbb {P}\mathbf {m}_{\varvec{\theta },h^{0}}|+2\Vert \widehat{\mathbb {P}}_{m} - \mathbb {P}\Vert _{\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}}>4\delta ^{\prime }\right) \\\le & {} \mathbb {P}_W^{*}\left( \sup _{\varvec{\theta }\in \varvec{\Theta }}|\mathbb {P}\mathbf {m}_{\varvec{\theta },\widehat{h}_{m}}-\mathbb {P}\mathbf {m}_{\varvec{\theta },h^{0}}|>\delta ^{\prime }\right) +\mathbb {P}_W^{*}\left( \Vert \widehat{\mathbb {P}}_{m} - \mathbb {P}\Vert _{\mathcal {M}_{{\varvec{\Theta }},\mathcal {H}}}>\delta ^{\prime }\right) . \end{aligned}$$

By using the assumptions (AB1), (A3), (AB3) in combination with (10), we obtain the desired result.\(\square\)

Proof of Theorem 3.5

First, note that we will give the proof of this theorem for the particular choice of functions

$$\varvec{\psi }_{1}(x)=Id(x)=x~ \text{ and }~ \varvec{\psi }_{2}(x)=\frac{x^{\nu }}{\ell (1/x)}~ \text{ for } \text{ every }~x\ne 0.$$

It is worth noticing that this choice is in agreement with those used in Lee (2012) in the parametric setting. Let \(\beta _{n}\) denote the \(o_{\mathbb {P}^{*}}(1)\) term in assumption (B3), and define the sets

$$S_{j,n}=\left\{ \varvec{\theta } \in \varvec{\Theta } : 2^{j-1}<r_{n}\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le 2^{j}\right\} ,$$

we observe

$$\varvec{\Theta } \backslash \varvec{\theta }_{0}=\bigcup _{j=1}^{\infty }S_{j,n}.$$

Our objective is to show that, for any \(\epsilon >0\), there exists \(\tau _{\epsilon }>0\) such that

$$\begin{aligned} \mathbb {P}^{*}\left( r_{n}\Vert \varvec{\theta }_{n}-\varvec{\theta }_{0}\Vert >\tau _{\epsilon }\right) <\epsilon , \end{aligned}$$
(11)

for any n sufficiently large. In the sequel, we work with an arbitrary fixed \(\epsilon >0\). For any \(\delta ,\delta _{1},M,K,K^{'}>0\), by using condition (A5), we readily obtain

$$\begin{aligned} \mathbb {P}^{*}&\left( r_{n}\Vert \varvec{\theta }_{n}-\varvec{\theta }_{0}\Vert>2^{M}\right) \\\le & {} \sum \limits _{M \le j, 2^{j}\le \delta r_{n}}\mathbb {P}^{*}\left( \sup \limits _{\varvec{\theta } \in S_{j,n}}[\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }}_{0},\widehat{h}}] \ge -Kr^{-\nu }_{n}\ell (r_{n})^{-1}, A_{n}\right) \\&+\mathbb {P}^{*}\left( 2\Vert \varvec{\theta }_{n}-\varvec{\theta }_{0}\right\| \ge \delta )+\mathbb {P}^{*}\left( r^{\nu }_{n}\ell (r_{n})|R_{n}|>K\right) +\mathbb {P}^{*}\left( r^{\nu -1}_{n}\ell (r_{n})|W_{n}|>K^{'}\right) \\&+\mathbb {P}^{*}\left( |\beta _{n}|>\frac{C}{2}\right) +\mathbb {P}^{*}\left( d_{\mathcal {H}}\left( \widehat{h},h^{0}\right) >\frac{\delta _{1}}{v_{n}}\right) , \end{aligned}$$

where

$$A_{n}=\left\{ r^{\nu -1}_{n}\ell (r_{n})|W_{n}| \le K^{'}, |\beta _{n}| \le \frac{C}{2}, d_{\mathcal {H}}\left( \widehat{h},h^{0}\right) \le \frac{\delta _{1}}{v_{n}}\right\} .$$

Indeed, we can write

$$\begin{aligned} \mathbb {P}^{*}&\left( r_{n} \Vert \varvec{\theta }_{n}- \varvec{\theta }_{0}\Vert >2^{M}, 2\Vert \varvec{\theta }_{n}- \varvec{\theta }_{0}\Vert <\delta , r^{\nu }_{n}\ell (r_{n})|R_{n}| \le K, A_{n}\right) \\&\le \sum _{j \ge M, 2^{j} \le \delta r_{n}} \mathbb {P}^{*}\left( \varvec{\theta }_{n} \in S_{j, n}, r^{\nu }_{n}\ell (r_{n})|R_{n}| \le K, A_{n}\right) \\&\le \sum _{j \ge M, 2^{j} \le \delta r_{n}} \mathbb {P}^{*}\left( \sup _{\theta \in S_{j, n}}\left[ \mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\right] \ge -R_{n}, r^{\nu }_{n}\ell (r_{n})|R_{n}| \le K, A_{n}\right) \\&\le \sum _{j \ge M, 2^{j} \le \delta r_{n}} \mathbb {P}^{*}\left( \sup _{\theta \in S_{j, n}}\left[ \mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\right] \ge -Kr^{-\nu }_{n}\ell (r_{n})^{-1}, A_{n}\right) . \end{aligned}$$

Condition (C) implies that, for all \(\delta >0\), there exists \(n_{\epsilon }\) such that, for \(n>n_{\epsilon }\), we have

$$\mathbb {P}^{*}\Big (2\Vert \varvec{\theta }_{n}- \varvec{\theta }_{0}\Vert \ge \delta \Big )< \frac{\epsilon }{6}.$$

By the definitions of \(R_{n}\) and \(W_{n}\) and under condition (B1), there exist \(\delta _{1}, K_{\epsilon }, K^{'}_{\epsilon }\) and \(K_{2,\epsilon }\) such that

$$\begin{aligned} \begin{array}{ll} \mathbb {P}^{*}\left( r^{\nu }_{n}\ell (r_{n})|R_{n}|>K_{\epsilon }\right)<\frac{\epsilon }{6}, ~~~~\mathbb {P}^{*}\left( r^{\nu -1}_{n}\ell (r_{n})|W_{n}|>K^{'}_{\epsilon }\right)<\frac{\epsilon }{6},\\ \mathbb {P}^{*}\left( |\beta _{n}|>\frac{C}{2}\right)<\frac{\epsilon }{6}, ~~~~\mathbb {P}^{*}\left( d_{\mathcal {H}}(\widehat{h},h^{0})>\frac{\delta _{1}}{v_{n}}\right) <\frac{\epsilon }{6}, \end{array} \end{aligned}$$
(12)

for n larger than some \(n_{1}\). We fix \(\delta <\delta _{0}\) and suppose \(n \ge \max (n_{0},n_{1},n_{\epsilon })\); for \(2^{j} \le \delta r_{n}\), the assumptions (B2) and (B3) are fulfilled on all \(S_{j,n}\). For each fixed j such that \(2^{j} \le \delta r_{n}\), under assumption (B3), for all \(\varvec{\theta } \in S_{j,n}\), we then have

$$\begin{aligned} \mathbb {P}_{n}&\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\\&\le \mathbb {P}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}+\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{n}}}\left| \mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}-\mathbb {P}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}+\mathbb {P}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\right| \\ &\le |W_{n}|\frac{2^{j}}{r_{n}}-(C-\beta _{n})\frac{2^{\nu (j-1)}}{r^{\nu }_{n}\ell (2^{-(j-1)}r_{n})}+\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{n}}}\left| \mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}-\mathbb {P}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}+\mathbb {P}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\right| \\&\le |W_{n}|\frac{2^{j}}{r_{n}}-(C-\beta _{n})\frac{1}{2^{-j\nu }r^{\nu }_{n}\ell (2^{-j}r_{n})}+n^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{n}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}}\right| . \end{aligned}$$

Consequently, we obtain the following inequalities:

$$\begin{aligned} \mathbb {P}^{*}&\left( \sup _{\theta \in S_{j, n}}\left[ \mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\right] \ge -K_{\epsilon }r^{-\nu }_{n}\ell (r_{n})^{-1}, A_{n}\right) \\\le & {} \mathbb {P}^{*}\left( n^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{n}},d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{n}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \ge \frac{C}{2}\frac{2^{j\nu }}{r^{\nu }_{n}\ell (2^{-j}r_{n})}-K^{\prime }_{\epsilon }\frac{2^{j}}{r^{\nu }_{n}\ell (r_{n})}-K_{\epsilon }\frac{1}{r^{\nu }_{n}\ell (r_{n})}\right) \\\le & {} \mathbb {P}^{*}\left( n^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{n}},d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{n}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \right. \\&\left. \qquad \qquad \qquad \qquad \ge \frac{2^{j\nu }}{r^{\nu }_{n}\ell (2^{-j}r_{n})} \left( \frac{C}{2}-K^{\prime }_{\epsilon }\frac{\ell (2^{-j}r_{n})}{2^{j(\nu -1)}\ell (r_{n})}-K_{\epsilon }\frac{2^{-j\nu }\ell (2^{-j}r_{n})}{\ell (r_{n})} \right) \right) . \end{aligned}$$

For any \(\lambda >0\), we can find a non-decreasing function \(\xi\) such that

$$x^{\lambda }\ell (x)\sim \xi (x)~ \text{ as } ~x\rightarrow \infty .$$

It follows that \(\frac{2^{-j\lambda }\ell (2^{-j}r_{n})}{\ell (r_{n})}\) is uniformly bounded for \(M \le j \le \log _{2}\delta r_{n}\) and for all n. Making use of condition (B2) in combination with Chebyshev's inequality and the fact that \(\varphi (c\delta ) \le c^{\alpha }\varphi (\delta )\) for all \(c \ge 1\), for any \(\lambda >0\) there exists a positive constant \(C^{\prime }\) such that

$$\begin{aligned} \mathbb {P}^{*}&\left( \sup _{\theta \in S_{j, n}}\left[ \mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }_{0}},\widehat{h}}\right] \ge -K_{\epsilon }r^{-\nu }_{n}\ell (r_{n})^{-1}, A_{n}\right) \\\le & {} C^{\prime }2^{-j\nu }r^{\nu }_{n}\ell \left( 2^{-j}r_{n}\right) n^{-1/2}\varphi \left( \frac{2^{j}}{r_{n}}\right) \\\le & {} C^{\prime }2^{-j(\nu -\lambda )}r^{\nu }_{n}\ell \left( r_{n}\right) n^{-1/2}\varphi \left( \frac{2^{j}}{r_{n}}\right) \\ \le & {} C^{\prime }2^{-j(\nu ^{\prime }-\alpha )}r^{\nu }_{n}\ell (r_{n})n^{-1/2}\varphi \left( \frac{1}{r_{n}}\right) , \end{aligned}$$

where \(\nu ^{\prime }=\nu -\lambda >\alpha\). By choosing a small value of \(\lambda\) and using the properties of the function \(\varphi (\cdot )\), we infer that

$$\begin{aligned} \sum \limits _{M \le j, 2^{j}\le \delta r_{n}}&\mathbb {P}^{*}\left( \sup \limits _{\varvec{\theta } \in S_{j,n}}[\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }},\widehat{h}}-\mathbb {P}_{n}\mathbf {m}_{{\varvec{\theta }}_{0},\widehat{h}}] \ge -Kr^{-\nu }_{n}\ell (r_{n})^{-1}, A_{n}\right) \\&\le \sum \limits _{M \le j}2^{-j(\nu ^{\prime }-\alpha )}, \end{aligned}$$

the last expression tends to 0 as \(M\rightarrow \infty\), so we obtain result (i) of our theorem for sufficiently large values of M and n.

For (ii), we have:

$$\begin{aligned} \begin{aligned} \mathbb {P}_{W}^{*}&\left( r_{m}\Vert \varvec{\theta }_{m}-\varvec{\theta }_{0}\Vert>2^{M}\right) \\\le & {} \sum \limits _{M \le j, 2^{j}\le \delta r_{m}}\mathbb {P}_{W}^{*}\left( \sup \limits _{\varvec{\theta } \in S_{j,n}}[\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }},\widehat{h}_{m}}-\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }}_{0},\widehat{h}_{m}}] \ge -Kr^{-\nu }_{m}\ell (r_{m})^{-1}, A_{m}\right) \\&+\mathbb {P}_{W}^{*}\left( 2\Vert \varvec{\theta }_{m}-\varvec{\theta }_{0}\right\| \ge \delta )+\mathbb {P}_{W}^{*}\left( r^{\nu }_{m}\ell (r_{m})|\widehat{R}_{n}|>K\right) +\mathbb {P}_{W}^{*}\left( r^{\nu -1}_{m}\ell (r_{m})|W_{m}|>K^{'}\right) \\&+\mathbb {P}_{W}^{*}\left( |\beta _{n}|>\frac{C}{2}\right) +\mathbb {P}_{W}^{*}\left( d_{\mathcal {H}}\left( \widehat{h}_{m},h^{0}\right) >\frac{\delta _{1}}{v_{m}}\right) . \end{aligned} \end{aligned}$$
(13)

We obtain from assumption (BB3), for each fixed j such that \(2^{j}< \delta r_{m}\) and for all \(\varvec{\theta }\in S_{j,m}\),

$$\begin{aligned} \widehat{\mathbb {P}}_{m}&\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}\\\le & {} \mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{m}}}\left| \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}-\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}+\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}-\mathbb {P}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \\\le & {} |W_{m}|\frac{2^{j}}{r_{m}}-(C-\beta _{n})\frac{1}{2^{-j\nu }r^{\nu }_{m}\ell (2^{-j}r_{m})}+m^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{m}}}\left| \widehat{\mathbb {G}}_m\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}\right| \\&+n^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{m}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}\right| . \end{aligned}$$

This gives us, by using Chebyshev's inequality, for some \(C^{\prime }>0\),

$$\begin{aligned}\mathbb {P}_{W}^{*}&\left( \sup \limits _{\varvec{\theta } \in S_{j,n}}[\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},\widehat{h}_{m}}] \ge -Kr^{-\nu }_{m}\ell (r_{m})^{-1}\right) \\ &\le {} \mathbb {P}_{W}^{*}\left( n^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{m}},d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{m}}}\left| \mathbb {G}_{n}\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| +m^{-1/2}\sup \limits _{\Vert \varvec{\theta }-\varvec{\theta }_{0}\Vert \le \frac{2^{j}}{r_{m}},d_{\mathcal {H}}(h,h^{0})\le \frac{\delta _{1}}{v_{m}}}\left| \widehat{\mathbb {G}}_m\widetilde{\mathbf {m}}_{\varvec{\theta }-\varvec{\theta }_{0},h}\right| \right. \\&\left. \qquad \qquad \qquad \qquad \ge \frac{2^{j\nu }}{r^{\nu }_{m}\ell (2^{-j}r_{m})} \left( \frac{C}{2}-K^{\prime }_{\epsilon }\frac{\ell (2^{-j}r_{m})}{2^{j(\nu -1)}\ell (r_{m})}-K_{\epsilon }\frac{2^{-j\nu }\ell (2^{-j}r_{m})}{\ell (r_{m})} \right) \right) \\& \le {} C^{\prime }2^{-j\nu ^{\prime }}r^{\nu }_{m}\ell (r_{m})m^{-1/2}\left\{ \mathbb {P}_{W}^{*}\left\| \widehat{\mathbb {G}}_{m}\right\| _{\mathcal {M}_{2^{j}/r_{m},\delta _{1}/v_{m}}}+m^{1 / 2} n^{-1 / 2}\left\| \mathbb {G}_{n}\right\| _{\mathcal {M}_{2^{j}/r_{m},\delta _{1}/v_{m}}}\right\} . \end{aligned}$$

From assumptions (B2) and (BB2), the outer expectation of the first term on the right-hand side of (13) is bounded by

$$\begin{aligned} \sum \limits _{M\le j, 2^{j}\le \delta r_{m}}&\mathbb {P}\mathbb {P}_{W}^{*}\left( \sup \limits _{\varvec{\theta } \in S_{j,n}}[\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }},\widehat{h}_{m}}-\widehat{\mathbb {P}}_{m}\mathbf {m}_{{\varvec{\theta }}_{0},\widehat{h}_{m}}] \ge -Kr^{-\nu }_{m}\ell (r_{m})^{-1}, A_{m}\right) \\\le & {} \sum \limits _{M \le j, 2^{j}\le \delta r_{m}}C^{\prime }2^{-j\nu ^{\prime }}r^{\nu }_{m}\ell (r_{m})m^{-1/2}\left\{ \mathbb {P}\mathbb {P}_{W}^{*}\left\| \widehat{\mathbb {G}}_{m}\right\| _{\mathcal {M}_{2^{j}/r_{m},\delta _{1}/v_{m}}}+m^{1 / 2} n^{-1 / 2}\mathbb {P}\left\| \mathbb {G}_{n}\right\| _{\mathcal {M}_{2^{j}/r_{m},\delta _{1}/v_{m}}}\right\} \\\le & {} \sum \limits _{M \le j, 2^{j}\le \delta r_{m}}C^{\prime }2^{-j\nu ^{\prime }}r^{\nu }_{m}\ell (r_{m})m^{-1/2}\varphi \left( \frac{2^{j}}{r_{m}}\right) \\&+\sum \limits _{M \le j, 2^{j}\le \delta r_{m}}C^{\prime }m^{1 / 2} n^{-1 / 2}2^{-j\nu ^{\prime }}r^{\nu }_{m}\ell (r_{m})m^{-1/2}\varphi \left( \frac{2^{j}}{r_{m}}\right) \\\le & {} C^{\prime }\sum \limits _{M \le j}2^{-j(\nu ^{\prime }-\alpha )}+C^{\prime }m^{1 / 2} n^{-1 / 2}\sum \limits _{M \le j}2^{-j(\nu ^{\prime }-\alpha )}, \end{aligned}$$

With assumption (AB1) in mind, the last two terms converge to 0 as \(M,n \rightarrow \infty\); the outer expectations of the other terms in (13) are \(o_{\mathbb {P}_{W}^{*}}(1)\) i.p., by Lemma 3 of Cheng and Huang (2010), which completes the proof of Theorem 3.5.\(\square\)
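A numerical sanity check can make a rate result of this type concrete. The following Python sketch is an illustration only (the mode-type criterion \(\mathbf {m}_{\theta }(x)=\mathbb {1}\{|x-\theta |\le 1\}\), Gaussian data, and grid search are assumed choices, not taken from the paper): it simulates a classical cube-root example, where \(r_{n}=n^{1/3}\), and checks that the scaled deviations \(r_{n}(\widehat{\varvec{\theta }}_{n}-\varvec{\theta }_{0})\) remain stochastically bounded as \(n\) grows.

```python
import numpy as np

def mode_estimator(x, grid):
    # Maximize the empirical criterion P_n 1{|X - theta| <= 1} over a grid
    counts = np.array([np.mean(np.abs(x - t) <= 1.0) for t in grid])
    return grid[int(np.argmax(counts))]

rng = np.random.default_rng(0)
theta0 = 0.0                        # true maximizer for N(0, 1) data
grid = np.linspace(-2.0, 2.0, 401)

scaled = {}
for n in (200, 2000):
    reps = np.array([mode_estimator(rng.normal(theta0, 1.0, n), grid)
                     for _ in range(100)])
    # cube-root scaling r_n = n^{1/3}
    scaled[n] = n ** (1.0 / 3.0) * (reps - theta0)

for n, dev in scaled.items():
    # 90% quantile of |r_n (theta_hat - theta_0)| stays of the same order in n
    print(n, round(float(np.quantile(np.abs(dev), 0.9)), 2))
```

The unscaled deviations shrink roughly like \(n^{-1/3}\), so after rescaling the quantiles stabilize rather than diverge, in line with the boundedness established above.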

Proof of Theorem 3.8

The proof of part (i) of Theorem 3.8 follows from Lemmas 1, 2 and 3 of Delsol and Van Keilegom (2020), where in our setting closed bounded subsets are used in place of compact subsets. By their Lemma 2, we obtain the existence of \(\xi _{1,n},\xi _{2,n},\xi _{3,n}\) such that

$$\sup \limits _{\varvec{\gamma }\in \mathcal {K}}|\xi _{l,n}|=o_{\mathbb {P}}(1), ~\text{ for }~l=1,2,3,$$

and the following decomposition

$$r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},\widehat{h}}(1+\xi _{1,n})=\left[ r^{\nu -1}_{n}\ell (r_{n})W_{n}+r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}\right] (1+\xi _{2,n})+\xi _{3,n}.$$

By their Lemma 3, the properties of the function \(\gamma \mapsto W_{n}(\gamma )\) and the assumptions of Theorem 3.8, we obtain the weak convergence of the process

$$\gamma \mapsto r^{\nu -1}_{n}\ell (r_{n})W_{n}(\gamma )+r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}.$$

Briefly, we have the following decomposition:

$$T_{n}(\gamma )=r^{\nu -1}_{n}\ell (r_{n})W_{n}(\gamma )+r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}=T_{1,n}(\gamma )+T_{2,n}(\gamma ),$$

where

$$T_{1,n}(\gamma )=\mathbb {M}_{n}(\gamma ,h^{0})$$

and

$$T_{2,n}(\gamma )=r^{\nu }_{n}\ell (r_{n})\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}+r^{\nu -1}_{n}\ell (r_{n})W_{n}(\gamma ).$$

The process \(\gamma \mapsto T_{1,n}(\gamma )\) does not depend on the estimation of the nuisance parameter, so it can be studied in the same way as in the parametric model: by Theorem 2.11.1 of van der Vaart and Wellner (1996) and assumptions (C4), (C5), (C9) and (C10), we obtain its asymptotic uniform equicontinuity. The process \(\gamma \mapsto T_{2,n}(\gamma )\) can be shown to be asymptotically uniformly equicontinuous by the same method as in the proof of their Lemma 3. By Theorems 1.5.7 and 1.5.4 of van der Vaart and Wellner (1996), we obtain the asymptotic tightness and the weak convergence of \(T_{n}\) to \(\Lambda +\mathbb {G}\) in \(\ell ^{\infty }(\mathcal {K})\), and, by Addendum 1.5.8 in the same reference, almost all paths of the limiting process on \(\mathcal {K}\) are uniformly continuous with respect to \(\Vert \cdot \Vert\). Finally, by Slutsky’s theorem, we obtain the desired result.\(\square\)

For part (ii), it remains to show the weak convergence of the bootstrapped process, which follows directly from Slutsky’s theorem and Lemmas 6.1 and 6.2 given below.

Lemma 6.1

Let \(\mathcal {K}=\{\varvec{\gamma }\in \mathbf {E} : \Vert \varvec{\gamma }\Vert \le K\}.\) Then, under the assumptions of part (ii) of Theorem 3.8, for all \(\gamma \in \mathcal {K}\), there exist \(z_{0,m}\), \(z_{1,m}\), \(z_{2,m}\) such that

$$\sup _{\gamma \in \mathcal {K}}\left| z_{j, m}\right| =o_{\mathbb {P}_{W}^{*}}(1),~ \text{ i.p. },~ j=0,1,2,$$

and

$$\begin{aligned} r^{\nu }_{m}&\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}}\left( 1+z_{0, m}\right) \\ {}= & {} \left[ r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+r^{\nu -1}_{m}\ell (r_{m}) W_{m}(\gamma )\right] \left( 1+z_{1, m}\right) +z_{2, m}. \end{aligned}$$

Proof of Lemma 6.1

We introduce the following notation:

$$\begin{aligned} \alpha _{0, n}(\gamma ) &= {} \frac{\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}-\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}-\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}}{r^{-\nu }_{n}\ell ^{-1}(r_{n})+\left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}\right| +\left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| +|\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}|+\left| \mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| },\\ \alpha _{0, m}(\gamma )&= {} \frac{\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}-\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}-\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}}{r^{-\nu }_{m}\ell ^{-1}(r_{m})+\left| \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}\right| +\left| \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| +|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}|+\left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| },\\ s_{n, h}(\gamma )&= {} {\text {sign}}\left[ \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h}\right] ,\\ s_{h}(\gamma )&= {} {\text {sign}}\left[ \mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h}\right] ,\\ s_{m, h}(\gamma )&= {} {\text {sign}}\left[ \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h}\right] . \end{aligned}$$

Since the set \(\mathcal {K}\) is bounded and \(\varvec{\theta }_{0}\) belongs to the interior of \(\Theta\), there exists \(m_{\mathcal {K}}\) such that, for all \(m \ge m_{\mathcal {K}}\) and all \(\gamma \in \mathcal {K}\), the quantity \(\varvec{\theta }_{0}+\frac{\gamma }{r_{m}}\) is in \(\Theta\). Then for all \(\gamma \in \mathcal {K}\) we have

$$\begin{aligned} \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}&= \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}-\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},h^{0}}\\&\quad+\alpha _{0, m}(\gamma )\left( r^{-\nu }_{m}\ell ^{-1}(r_{m})+\left| \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}\right| \right. \\&\left. \quad \;+\left| \widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| +|\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}|+\left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| \right) \\&\quad+\alpha _{0, n}(\gamma )\left( r^{-\nu }_{n}\ell ^{-1}(r_{n})+\left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}}\right| +\left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| +|\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}}|+\left| \mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| \right) . \end{aligned}$$

This can be rewritten as follows

$$\begin{aligned}r^{\nu }_{m}&\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}}\left( 1-\alpha _{0, m}(\gamma )s_{m,\widehat{h}_{n}}(\gamma )\right) \\&=r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\left( 1+\alpha _{0, m}(\gamma )s_{m,h^{0}}(\gamma )\right) \\&\quad+r^{\nu }_{m}\ell (r_{m})\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}}\left( 1+\alpha _{0, n}(\gamma )s_{\widehat{h}_{n}}(\gamma )\right) \\&\quad-r^{\nu }_{m}\ell (r_{m})\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}}, h^{0}}\left( 1-\alpha _{0, n}(\gamma )s_{h^{0}}(\gamma )\right) +z^{\prime }_{2,m}, \end{aligned}$$
(14)

where

$$\begin{aligned} z^{\prime }_{2,m}&= \alpha _{0, m}(\gamma )+r^{\nu }_{m}\ell (r_{m})\left( \alpha _{0, m}(\gamma )+\alpha _{0, n}(\gamma )\right) \left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}\right| \\&+r^{\nu }_{m}\ell (r_{m})\left( \alpha _{0, m}(\gamma )+\alpha _{0, n}(\gamma )\right) \left| \mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right| +\frac{r^{\nu }_{m}\ell (r_{m})}{r^{\nu }_{n}\ell (r_{n})}\alpha _{0, n}(\gamma ). \end{aligned}$$

We get from assumptions (CB1) and (CB3) that

$$\begin{aligned} r^{\nu }_{m}\ell (r_{m})\left[ \mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{n}}-\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right]&=r^{\nu -1}_{m}\ell (r_{m})W_{m}(\gamma )+r^{\nu -2}_{m}\ell (r_{m})\beta _{n}\Vert \gamma \Vert ^{2}\nonumber \\&\quad+r^{\nu -2}_{m}\ell (r_{m})o(\Vert \gamma \Vert ^{2})\nonumber \\:&= {} r^{\nu -1}_{m}\ell (r_{m})W_{m}(\gamma )+\alpha _{1,n}(\gamma ). \end{aligned}$$
(15)

By combining (14) and (15), we infer that

$$\begin{aligned}r^{\nu}_{m}&\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}}\left( 1+z_{0, m}\right)\\ {}= & {} \left[ r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+r^{\nu -1}_{m}\ell (r_{m}) W_{m}(\gamma )\right] \left( 1+z_{1, m}\right) +z_{2, m}, \end{aligned}$$

where

$$\begin{aligned} z_{0, m}(\gamma )&=-\alpha _{0, m}(\gamma )s_{m,\widehat{h}_{n}}(\gamma )\\ z_{1, m}(\gamma )&=\alpha _{0, m}(\gamma )s_{m,h^{0}}(\gamma )\\ z_{2, m}(\gamma )&=z^{\prime }_{2, m}(\gamma )+z^{\prime \prime }_{2, m}(\gamma ), \end{aligned}$$

and

$$\begin{aligned} z^{\prime \prime }_{2, m}(\gamma ) &= \alpha _{0, n}(\gamma )\left[ 1+\left( V(\gamma , \gamma )+r^{\nu -2}_{m}\ell (r_{m})o(\Vert \gamma \Vert ^{2})\right) \left( s_{\widehat{h}}+s_{h_{0}}\right) (\gamma )\right. \\&\quad\;\left. +\left( r^{\nu -1}_{m}\ell (r_{m}) W_{m}(\gamma )+\alpha _{1, n}(\gamma )\right) \left( s_{\widehat{h}}-s_{n, h_{0}}\right) (\gamma )\right] \\&\quad+\alpha _{1, n}(\gamma )\left( 1+z_{1, m}(\gamma )\right) . \end{aligned}$$

It is easy to show that

$$\sup _{\gamma \in \mathcal {K}}\left| z_{j, m}\right| =o_{\mathbb {P}_{W}^{*}}(1) ~ \text{ i.p., } \text{ for } ~ j=0,1,2,$$

by using assumptions (A2), (AB1), (C3), (CB2), (CB3) and Lemma 3 of Cheng and Huang (2010).\(\square\)

Lemma 6.2

Under the assumptions of Lemma 6.1, the process

$$\gamma \mapsto r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+r^{\nu -1}_{m}\ell (r_{m})W_{m}(\gamma )$$

converges weakly conditionally in probability to the process

$$\gamma \mapsto \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })~ \text{ in } ~\ell ^{\infty }(\mathcal {K}).$$

Proof of Lemma 6.2

Making use of assumption (CB4), we need only show the equicontinuity of the process

$$T_{m}: \gamma \mapsto r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+r^{\nu -1}_{m}\ell (r_{m})W_{m}(\gamma ).$$

One can see that the process \(T_m\) can be decomposed into the sum of three processes as follows:

$$T_m=\sum _{i=1}^{3} T_{i,m},$$

where

$$\begin{aligned} T_{1,m}&:\gamma \mapsto r^{\nu }_{m}\ell (r_{m})\left( (\widehat{\mathbb {P}}_{m}-\mathbb {P}_{n})\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right) ,\\ T_{2,m}&:\gamma \mapsto r^{\nu }_{m}\ell (r_{m})\left( (\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right) ,\\T_{3,m}&:\gamma \mapsto r^{\nu }_{m}\ell (r_{m})\mathbb {P}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}+r^{\nu -1}_{m}\ell (r_{m})W_{m}(\gamma ). \end{aligned}$$

We shall study the properties of each process separately. First, we note that, by assumptions (C6), (CB3) and (AB1), for sufficiently large m we have \(\varvec{\theta }_{0}+\frac{\mathcal {K}}{r_{m}} \subset \Theta\), and then the processes \(T_{1,m}\), \(T_{2,m}\) and \(T_{3,m}\) take values in \(\ell ^{\infty }(\mathcal {K})\). The process \(T_{2,m}\) can be treated as in the proof of part (i) by rewriting it in the form

$$\begin{aligned} T_{2,m}(\gamma )&=r^{\nu }_{m}\ell (r_{m})\left( (\mathbb {P}_{n}-\mathbb {P})\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}\right) \nonumber \\ {}&=\sqrt{\frac{m}{n}}\mathbb {G}_{n}\frac{r^{\nu }_{m}\ell (r_{m})}{\sqrt{m}}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}}, \end{aligned}$$
(16)

As in the proof of (i), we apply Theorem 2.11.22 of van der Vaart and Wellner (1996) to the process

$$\gamma \mapsto \mathbb {G}_{n}\frac{r^{\nu }_{m}\ell (r_{m})}{\sqrt{m}}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},h^{0}},$$

by assumptions (C4), (C5), (C9) and (C10), we get its uniform equicontinuity with respect to \(\Vert \cdot \Vert\) on \(\mathcal {K}\), and by the use of assumption (AB1), we obtain our main result for the process \(T_{2,m}\). The process \(T_{1,m}\) also does not depend on the estimation of the nuisance parameter; it can be treated in the same way as in part (ii) of Lemma 2 of Lee (2012). Briefly, we want to show that

$$\Delta _{n} {\mathop {=}\limits ^{ \text{ def } }} \mathbb {P}_{W}^{*} \sup \left\{ \left| T_{1,m}(\gamma )-T_{1,m}(\gamma ^{\prime })\right| :\Vert \gamma -\gamma ^{\prime }\Vert \le \delta _{n}, \gamma ,\gamma ^{\prime } \in \mathcal {K}\right\} \rightarrow 0 ~~\text {i.p.}$$

Define the class

$$\mathscr {H}_{n}=r_{m}^{\nu } \ell \left( r_{m}\right) m^{-1 / 2} \mathcal {M}_{d / r_{m}}\left( \delta _{n} / r_{m}\right) ,$$

and let

$$\tilde{M}_{n}=r_{m}^{\nu } \ell \left( r_{m}\right) m^{-1 / 2} M_{d / r_{m}}$$

be its envelope function. Making use of condition (B2), we readily infer that

$$\mathbb {P}^{*} \sup \left\{ |f|: f \in \mathscr {H}_{n}\right\} \le C \mathbb {P}^{*} \tilde{M}_{n} \le C<\infty .$$

It follows by Lemmas 2.9.1 and 3.6.6 of van der Vaart and Wellner (1996) that, for \(1 \le n_{0} \le n\),

$$\begin{aligned} \mathbb {P}^{*} \Delta _{n}\le & {} C n_{0} m^{-1 / 2}\left( \mathbb {E} \max _{1 \le i \le n} \tilde{N}_{i}\right) \mathbb {P}^{*} \tilde{M}_{n} \\&+C n^{1 / 2} m^{-1 / 2} \int _{0}^{\infty }\left( \mathbb {P}\left\{ \left| \tilde{N}_{1}\right| >x\right\} \right) ^{1 / 2} d x \max _{n_{0} \le j \le n} \mathbb {P}^{*}\left\| j^{-1 / 2} \sum _{i=n_{0}}^{j} \epsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathscr {H}_{n}}, \end{aligned}$$

where \(\tilde{N}=(\tilde{N}_{1}, \tilde{N}_{2}, \ldots )\) and \(\varepsilon =\left( \epsilon _{1}, \epsilon _{2}, \ldots \right)\) are independent sequences of i.i.d. symmetrized Poisson variables with parameter m/(2n) and of i.i.d. Rademacher variables, respectively, both independent of \(\mathbf {X}_{1},\ldots , \mathbf {X}_{n}\). By Jensen’s inequality, Problem 3.6.3 of van der Vaart and Wellner (1996) and (B2), we readily get

$$\begin{aligned} \mathbb {P}^{*} \Delta _{n} \le C n_{0} m^{-1 / 2} \log n / \log (n / m+1)+C \max _{n_{0} \le j \le n} \mathbb {P}^{*}\left\| j^{-1 / 2} \sum _{i=n_{0}}^{j} \epsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathscr {H}_{n}}. \end{aligned}$$
(17)
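The symmetrized Poisson variables arise from a Poissonization of the bootstrap multiplicities. As a loose numerical illustration (not the construction used above, where the \(\tilde{N}_{i}\) are additionally symmetrized and carry parameter m/(2n)): in an m-out-of-n bootstrap, the multiplicity of each observation is Binomial(m, 1/n), which is close in law to Poisson(m/n) when m/n is small.

```python
import numpy as np

n, m = 10_000, 500                      # m/n = 0.05, the m/n -> 0 regime
rng = np.random.default_rng(2)

# Multiplicities of the n observations in one m-out-of-n bootstrap draw
counts = rng.multinomial(m, np.full(n, 1.0 / n))

emp = float(np.mean(counts == 0))       # empirical fraction of unused observations
pois = float(np.exp(-m / n))            # Poisson(m/n) probability of zero
print(emp, pois)
```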

By taking

$$n_{0}=n_{0, n}=a_{m} m^{1 / 4}\{\log (n / m+1) / \log n\}^{1 / 2} \in [1, n],$$

it follows, by condition (C4), that

$$\begin{aligned} n_{0} m^{-1 / 2} \log n / \log (n / m+1) \rightarrow 0 \quad \text{ as } n \rightarrow \infty . \end{aligned}$$
(18)
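For completeness, substituting this choice of \(n_{0}\) makes (18) explicit:

$$n_{0} m^{-1 / 2} \frac{\log n}{\log (n / m+1)}=a_{m} m^{1 / 4}\left\{ \frac{\log (n / m+1)}{\log n}\right\} ^{1 / 2} m^{-1 / 2} \frac{\log n}{\log (n / m+1)}=a_{m} m^{-1 / 4}\left\{ \frac{\log n}{\log (n / m+1)}\right\} ^{1 / 2},$$

which tends to 0 as soon as \(a_{m}=o\left( m^{1 / 4}\left\{ \log (n / m+1) / \log n\right\} ^{1 / 2}\right)\); this is our reading of what condition (C4) guarantees here.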

We denote the integrand in (C9) by \(\mathcal {N}(\epsilon )\) for \(\epsilon >0\). By using the triangle inequality, the sub-Gaussianity of Rademacher processes under (C10), and the Cauchy-Schwarz inequality together with (B2), we obtain

$$\begin{aligned} \begin{aligned} \max _{n_{0} \le j \le n} \mathbb {P}^{*}\left\| j^{-1 / 2} \sum _{i=n_{0}}^{j} \epsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathscr {H}_{n}}&= \max _{n_{0} \le j \le n} \mathbb {P}_{X} \mathbb {P}_{\varepsilon }\left\| j^{-1 / 2} \sum _{i=n_{0}}^{j} \epsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathscr {H}_{n}} \\ &\le 2\max _{n_{0} \le j \le n} \mathbb {P}_{X} \mathbb {P}_{\varepsilon }\left\| j^{-1 / 2} \sum _{i=1}^{j} \epsilon _{i} \delta _{\mathbf {X}_{i}}\right\| _{\mathscr {H}_{n}} \\ &\le C \max _{n_{0} \le j \le n}\left\{ \mathbb {P}^{*}\left( \int _{0}^{\varvec{\psi }_{n, j}} \mathcal {N}(\epsilon ) d \epsilon \right) ^{2}\right\} ^{1 / 2}\left( \mathbb {P}^{*} \tilde{M}_{n}^{2}\right) ^{1 / 2} \\ &\le C \max _{n_{0} \le j \le n}\left\{ \mathbb {P}^{*}\left( \int _{0}^{\varvec{\psi }_{n, j}} \mathcal {N}(\epsilon ) d \epsilon \right) ^{2}\right\} ^{1 / 2}, \end{aligned} \end{aligned}$$
(19)

where

$$\varvec{\psi }_{n, j}=\sup \left\{ \Vert f\Vert _{\mathbb {P}_{j}, 2}: f \in \mathscr {H}_{n}\right\} .$$

Our aim is to show that

$$\begin{aligned} \varvec{\psi }_{n_{k}, j_{k}}=o_{\mathbb {P}^{*}}(1) \quad \text{ as } k \rightarrow \infty , \end{aligned}$$
(20)

for an arbitrary subsequence \(\left\{ n_{k}: k=1,2, \ldots \right\}\) of \(\{n\}\), and any sequence \(\left\{ j_{k}\right\}\) such that \(n_{0, n_{k}} \le j_{k} \le n_{k}\) for all \(k=1,2,\ldots\). Write \(m_{k}^{*}=m_{n_{k}}\). Define, for any \(\gamma \in \mathcal {K}\),

$$Z_{k i}(\gamma )=r_{m_{k}^{*}}^{\nu } \ell \left( r_{m_{k}^{*}}\right) m_{k}^{*-1 / 2} j_{k}^{-1 / 2} \widetilde{\mathbf {m}}_{\gamma / r_{m_{k}^{*}}}\left( \mathbf {X}_{i}\right) .$$

As shown in the proof of part (ii) of Lemma 2 of Lee (2012), these variables satisfy the conditions of Theorem 2.11.1 of van der Vaart and Wellner (1996), which implies our result (20) for arbitrary subsequences \(n_{k}\) and \(j_{k} \in \left[ n_{0, n_{k}}, n_{k}\right]\), by arguing as in the proof of that theorem. It then follows by the dominated convergence theorem that the bound in (19) has limit superior equal to 0 as \(n \rightarrow \infty\). Substituting this and (18) into (17) yields the desired result. Finally, consider the process \(T_{3,m}\). For large values of m, we have \(\varvec{\theta }_{0}+\frac{\gamma }{r_{m}} \in \Theta\); using assumption (CB3), we get, for all \(0< \delta < \delta _{1}\),

$$\begin{aligned} &{\sup_{\gamma , \gamma ^{\prime } \in \mathcal {K},\left\| \gamma -\gamma ^{\prime }\right\| \le \delta }\left| T_{3, m}(\gamma )-T_{3, m}\left( \gamma ^{\prime }\right) \right| } \\&=\sup _{\gamma , \gamma ^{\prime } \in \mathcal {K},\left\| \gamma -\gamma ^{\prime }\right\| \le \delta }\Big |r^{\nu -1}_{m}\ell (r_{m})W_{m}\left( \gamma -\gamma ^{\prime }\right) +r^{\nu -2}_{m}\ell (r_{m})\left( V(\gamma , \gamma )-V\left( \gamma ^{\prime }, \gamma ^{\prime }\right) \right) \\&\left. +r^{\nu }_{m}\ell (r_{m})\left( o\left( \frac{\Vert \gamma \Vert ^{2}}{r_{m}^{2}}\right) +o\left( \frac{\left\| \gamma ^{\prime }\right\| ^{2}}{r_{m}^{2}}\right) \right) \right| \\&\le \delta ^{\tau }\left( r^{\nu -1}_{m}\ell (r_{m}) \sup _{\gamma \in \mathcal {K}, \delta \le \delta _{1},\Vert \gamma \Vert \le \delta }\left| \frac{W_{m}(\gamma )}{\delta ^{\tau }}\right| +r^{\nu -2}_{m}\ell (r_{m})\sup _{\gamma , \gamma ^{\prime } \in \mathcal {K}, \delta \le \delta _{1},\left\| \gamma -\gamma ^{\prime }\right\| \le \delta } \frac{\left| V(\gamma , \gamma )-V\left( \gamma ^{\prime }, \gamma ^{\prime }\right) \right| }{\delta ^{\tau }}\right) \\&\quad \, \, +b_{m} \\&:= {} \delta ^{\tau } \alpha _{m}+b_{m}, \end{aligned}$$
(21)

where

$$b_{m} \le \sup \limits _{\gamma , \gamma ^{\prime } \in \mathcal {K}}r^{\nu }_{m}\ell (r_{m})\left( o\left( \frac{\Vert \gamma \Vert ^{2}}{r_{m}^{2}}\right) +o\left( \frac{\left\| \gamma ^{\prime }\right\| ^{2}}{r_{m}^{2}}\right) \right) \rightarrow 0,~ \text{ as }~ m \rightarrow \infty ,$$

and \(\alpha _{m}=O_{\mathbb {P}_W^{*}}(1)\) i.p. uniformly over \(\delta \le \delta _{1}\). From this, we obtain, for any \(\epsilon >0\) and \(\eta >0\),

$$\begin{aligned} \mathbb {P}_W^{*}&\left( \sup _{\gamma , \gamma ^{\prime } \in \mathcal {K},\left\| \gamma -\gamma ^{\prime }\right\| \le \delta }\left| T_{3, m}(\gamma )-T_{3, m}(\gamma ^{\prime })\right|>\epsilon \right) \\&\le \mathbb {P}_W^{*}\left( \delta ^{\tau } \alpha _{m}+b_{m}>\epsilon , \alpha _{m} \le C, |b_{m}| < \frac{\epsilon }{2}\right) +\mathbb {P}_W^{*}\left( \alpha _{m}> C\right) \\&\le \mathbb {P}_W^{*}\left( \delta ^{\tau }> \frac{\epsilon }{2C}\right) +\mathbb {P}_W^{*}\left( \alpha _{m} > C\right) . \end{aligned}$$

Choosing \(C_{\eta }\) such that the last term is bounded by \(\eta\) for large values of m, and taking \(\delta \le \delta _{1} \wedge \left( \frac{\epsilon }{2C_{\eta }} \right) ^{\frac{1}{\tau }}\), implies the main result for the process \(T_{3,m}\). Finally, using the fact that

$$T_{m}=T_{1,m}+T_{2,m}+T_{3,m},$$

we obtain the desired result on the process \(T_{m}\).\(\square\)

Proof of Theorem 3.9

Making use of result (i) in Theorem 3.8 in connection with assumption (C8), we infer that almost all paths of the process \(\gamma \mapsto \mathbb {G}(\gamma )+\Lambda (\gamma )\) are uniformly continuous on every closed bounded \(\mathcal {K} \subset \mathcal {B}\) and reach their supremum at a unique point \(\gamma _{0}\). For part (i), an application of (i) in Theorem 3.8, for any closed bounded \(\mathcal {K} \subset \mathcal {B}\), gives

$$\mathbb {T}_{n}(\gamma )=r^{\nu }_{n}\ell (r_{n})\mathbb {P}_{n}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{n}},\widehat{h}} \Rightarrow \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma }), ~\text{ in }~ \ell ^{\infty }(\mathcal {K}).$$

We get from the assumption (C11) that

$$\mathbb {T}_{n}\left( r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\right) \ge \sup \limits _{\Vert \gamma \Vert \le K}\mathbb {T}_{n}(\gamma )-o_{\mathbb {P}}(1).$$

Since \(\gamma _{0}\) is the unique, well-separated maximizer of \(\mathbb {G}(\gamma )+\Lambda (\gamma )\), part (i) follows by Theorem 3.2.2 of van der Vaart and Wellner (1996), where compact sets and the uniform tightness of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\) are replaced, respectively, by closed bounded sets with the same structure as the set \(\mathcal {K}\) and

$$r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})=O_{\mathbb {P}^{*}}(1).$$

For part (ii), we infer that

$$\widehat{\mathbb {T}}_{m}(\gamma )=r^{\nu }_{m}\ell (r_{m})\widehat{\mathbb {P}}_{m}\widetilde{\mathbf {m}}_{\frac{\gamma }{r_{m}},\widehat{h}_{m}} \Rightarrow \Lambda (\varvec{\gamma })+\mathbb {G}(\varvec{\gamma })~~ \text{ i.p. } \text{ in } \ell ^{\infty }(\mathcal {K}).$$

By combining assumption (CB5) with the first part of (CB1), we have, respectively,

$$\widehat{\mathbb {T}}_{m}\left( r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{0})\right) \ge \sup \limits _{\Vert \gamma \Vert \le K}\widehat{\mathbb {T}}_{m}(\gamma )-o_{\mathbb {P}_W^{*}}(1)$$

and

$$r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{0})=O_{\mathbb {P}_W^{*}}(1),~ \text{ i.p. }$$

An application of Lemma 4(ii) of Lee (2012) gives

$$r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{0}) \Rightarrow \gamma _{0}, ~\text{ i.p. }$$

It follows from the first part of assumption (C1), assumption (AB1) and Slutsky’s theorem that

$$r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{n})=r_{m}(\varvec{\theta }_{m}-\varvec{\theta }_{0})-\frac{r_{m}}{r_{n}}r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0}) \Rightarrow \gamma _{0}~~\text{ i.p. }$$

Hence the proof of the statement (ii) is complete.\(\square\)
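Statement (ii) can be illustrated by a small m-out-of-n bootstrap experiment. The Python sketch below is an illustration under assumed choices (the mode-type criterion, Gaussian data, \(m=\lfloor n^{4/5}\rfloor\) and \(r_{m}=m^{1/3}\) are illustrative, not prescribed by the theorem): resampling m of the n observations and scaling by \(r_{m}\) mimics the sampling law of \(r_{n}(\varvec{\theta }_{n}-\varvec{\theta }_{0})\).

```python
import numpy as np

def mode_estimator(x, grid):
    # Maximize the empirical criterion P_n 1{|X - theta| <= 1} over a grid
    counts = np.array([np.mean(np.abs(x - t) <= 1.0) for t in grid])
    return grid[int(np.argmax(counts))]

rng = np.random.default_rng(1)
grid = np.linspace(-2.0, 2.0, 401)

n = 4000
x = rng.normal(0.0, 1.0, n)
theta_n = mode_estimator(x, grid)       # estimator on the full sample

m = int(n ** 0.8)                       # m -> infinity while m/n -> 0
boot = np.array([mode_estimator(rng.choice(x, size=m, replace=True), grid)
                 for _ in range(200)])

# r_m (theta*_m - theta_n) approximates the law of r_n (theta_n - theta_0)
scaled = m ** (1.0 / 3.0) * (boot - theta_n)
print(round(float(np.quantile(np.abs(scaled), 0.9)), 2))
```

The m-out-of-n scheme is what makes the bootstrap consistent here; the naive n-out-of-n bootstrap is known to fail at such nonstandard rates.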