Abstract
We develop analogs of the two classes of weighted empirical minimum distance (m.d.) estimators of the underlying parameters in linear and nonlinear regression models when covariates are observed with Berkson measurement error. One class is based on the integral of the square of the symmetrized weighted empirical of the residuals while the other is based on a similar integral involving a weighted empirical of the residual ranks. The former class requires the regression and measurement errors to be symmetric around zero while the latter class needs no such assumption. The first class of estimators includes the analogs of the least absolute deviation and Hodges-Lehmann estimators while the second class includes an estimator that is asymptotically more efficient than these two estimators at some error distributions when there is no measurement error. In the case of the linear model, no knowledge of the measurement error distribution is needed. We also develop these estimators for nonlinear models when the measurement error distribution is known and when it is unknown but validation data are available.
1 Introduction
The statistical literature is replete with minimum distance estimation methods in the one and two sample location models. Beran [2, 3] and Donoho and Liu [7, 8] argue that the minimum distance estimators based on \(L_2\) distances involving either density estimators or residual empirical distribution functions have some desirable finite sample properties, tend to be robust against some contaminated models and are also asymptotically efficient at some error distributions.
In the classical regression models without measurement error in the covariates, classes of minimum distance estimators of the underlying parameters based on Cramér-von Mises type distances between certain weighted residual empirical processes were developed in Koul [12,13,14,15]. These classes include some estimators that are robust against outliers in the regression errors and asymptotically efficient at some error distributions.
In practice there are numerous situations when covariates are not observable. Instead one observes their surrogate with some error. The regression models with such covariates are known as the measurement error regression models. Fuller [9], Cheng and Van Ness [6], Carroll et al. [5] and Yi [19] discuss numerous examples of practical importance of these models.
Given the desirable properties of the above minimum distance (m.d.) estimators and the importance of the measurement error regression models, it is desirable to develop their analogs for these models. The next section describes the m.d. estimators of interest and their asymptotic distributions in the classical linear regression model. Their analogs for the linear regression Berkson measurement error (ME) model are developed in Sect. 3. Two classes of m.d. estimators are developed. One assumes the symmetry of the regression model error and ME error distributions and then bases the m.d. estimators on the symmetrized weighted empirical of the residuals. This class includes an analog of the Hodges-Lehmann estimator of the one sample location parameter, see Hodges and Lehmann [11], and the least absolute deviation (LAD) estimator. The second class is based on a weighted empirical of residual ranks and does not need the symmetry of the error distributions. It includes an estimator that is asymptotically more efficient than the analogs of the Hodges-Lehmann and LAD estimators at some error distributions. Neither class needs the knowledge of the measurement error or regression error distributions.
Section 4 discusses analogs of these estimators in the Berkson measurement error nonlinear regression models, where the measurement error distribution is assumed to be known. Section 5 develops their analogs when the ME distribution is unknown but validation data is available. In this case the consistency rate of these estimators is \(\min (n, N)^{1/2}\), where n and N are the primary data and validation data sample sizes, respectively. Section 6 provides an application of the proposed estimators to a real data example. Several proofs are deferred to the Appendix.
2 Linear Regression Model
In this section we recall the definition of the m.d. estimators of interest here in the no measurement error linear regression model and their known asymptotic normality results.
Accordingly, consider the linear regression model where for some \(\theta \in {\mathbb R}^p\), the response variable Y and the p dimensional observable predicting covariate vector X obey the relation
\(Y = X'\theta + \varepsilon , \qquad \qquad (1)\)
where \(\varepsilon \) is independent of X and symmetrically distributed around \(E(\varepsilon )=0\). For an \(x\in {\mathbb R}^p\), \(x'\) and \(\Vert x\Vert \) denote its transpose and Euclidean norm, respectively. Let \((X_i, Y_i), 1\le i\le n\) be a random sample from this model. The two classes of m.d. estimators of \(\theta \) based on weighted empirical processes of the residuals and residual ranks were developed in Koul [12,13,14,15]. To describe these estimators, let G be a nondecreasing right continuous function from \({\mathbb R}\) to \({\mathbb R}\) having left limits and define
This class of estimators, one for each G, includes some celebrated estimators. For example, \(\hat{\theta }\) corresponding to \(G(x)\equiv x\) yields an analog of the one sample location parameter Hodges-Lehmann estimator in the linear regression model. Similarly, \(G(x)\equiv \delta _0(x)\), the degenerate measure at zero, makes \(\hat{\theta }\) equal to the least absolute deviation (LAD) estimator.
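To make the class concrete, here is a minimal numerical sketch in Python. It assumes the dispersion is \(K(\vartheta )=\int \Vert n^{-1/2}\sum _{i=1}^n X_i\{I(Y_i-X_i'\vartheta \le x)-I(-(Y_i-X_i'\vartheta )< x)\}\Vert ^2 dG(x)\), i.e., the integral of the square of the symmetrized weighted empirical described in the abstract; for \(G(x)\equiv x\), Fubini's theorem then reduces \(K\) to a pairwise sum in the residuals. All function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def hl_md_dispersion(theta, X, Y):
    # Pairwise form of the dispersion for G(x) = x (obtained via Fubini):
    # K(theta) = n^{-1} sum_{i,j} X_i'X_j (|e_i + e_j| - |e_i - e_j|).
    e = Y - X @ theta                                  # residuals
    S = X @ X.T                                        # inner products X_i'X_j
    return np.sum(S * (np.abs(e[:, None] + e[None, :])
                       - np.abs(e[:, None] - e[None, :]))) / len(Y)

def md_estimate(X, Y, theta0):
    # K is nonsmooth in theta, so use a derivative-free simplex search.
    return minimize(hl_md_dispersion, theta0, args=(X, Y),
                    method="Nelder-Mead").x
```

Calling `md_estimate(X, Y, theta0)` returns the Hodges-Lehmann type m.d. fit; replacing the pairwise kernel by the sign score at zero would give the LAD case \(G=\delta _0\).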
A class of m.d. estimators when the error distribution is not symmetric and unknown is obtained by using the weighted empirical of the residual ranks defined as follows. Write \(X_i=(X_{i1}, X_{i2}, \ldots , X_{ip})', \, {i=1,\ldots , n}\). Let \(\bar{X}_j:= n^{-1}\sum _{i=1}^nX_{ij}\), \(\bar{X}:=(\bar{X}_1,\ldots , \bar{X}_p)'\) and \(X_{ic}:=X_i-\bar{X}\), \(1\le i\le n\). Let \(R_{i\vartheta }\) denote the rank of the ith residual \(Y_i-X_i'\vartheta \) among \(Y_j-X_j'\vartheta , \, j=1,\ldots , n\). Let \({\varPsi }\) be a distribution function on [0, 1] and define
Yet another m.d. estimator, when the error distribution is unknown and not symmetric, is
If one reduces the model (1) to the two sample location model, then \(\hat{\theta }_c\) is the median of pairwise differences, the so-called Hodges-Lehmann estimator of the two sample location parameter. Thus in general \(\hat{\theta }_c\) is an analog of this estimator in the linear regression model.
The following asymptotic normality results can be deduced from Koul [15] and [16, Sect. 5.4].
Lemma 1
Suppose the model (1) holds and \(E\Vert X\Vert ^2<\infty \).
(a). In addition, suppose \({\varSigma }_X:= E(XX')\) is positive definite and the error d.f. F is symmetric around zero and has density f. Further, suppose the following hold.
Then
(b). In addition, suppose the error d.f. F has uniformly continuous bounded density f, \({\varOmega }:= E\{(X-EX)(X-EX)'\}\) is positive definite and \({\varPsi }\) is a d.f. on [0, 1] such that \(\int \limits _0^1 f^2(F^{-1}(s)) d{\varPsi }(s)>0\). Then
(c). In addition, suppose \({\varOmega }\) is positive definite, F has square integrable density f and \(E|\varepsilon |<\infty \). Then \(n^{1/2}(\hat{\theta }_c - \theta )\rightarrow _D N\big (0, \sigma ^2_I {\varOmega }^{-1}\big )\), where \( \sigma ^2_I:= 1/\big \{12 \big (\int f^2 (x)\, dx\big )^2\big \}. \)
Before proceeding further we now compare the above asymptotic variance factors. Let \(\sigma ^2_{LAD}:=1/(4 f^2(0))\) and \(\sigma _{LSE}^2:= \text{ Var }(\varepsilon )\) denote the factors of the asymptotic covariance matrices of the LAD and the least squares estimators, respectively. Let \(\gamma _I^2\) denote the \(\gamma _{\varPsi }^2\) when \({\varPsi }(s)\equiv s\), i.e.,
Table 1, obtained from Koul [16], gives the values of these factors for some distributions F. From this table one sees that the estimator \(\hat{\theta }_R\) corresponding to \({\varPsi }(s)\equiv s\) is asymptotically more efficient than the LAD at logistic error distribution while it is asymptotically more efficient than the Hodges-Lehmann type estimator at the double exponential and Cauchy error distributions. For these reasons it is desirable to develop analogs of \(\hat{\theta }_R\) also for the ME models.
As argued in Koul [16, Chap. 5], the estimators \(\{\hat{\theta }_G, \, G\,\,\text {a d.f.}\}\) are robust against heavy tails in the error distribution in the general linear regression model. The estimator \(\hat{\theta }_I\), corresponding to \(G(x)\equiv x\), which is not a d.f., is robust against heavy tails and also asymptotically efficient at the logistic errors.
3 Berkson ME Linear Regression Model
In this section we shall develop analogs of the above estimators in the Berkson ME linear regression model, where the response variable Y obeys the relation (1) and where, instead of observing X, one observes a surrogate Z obeying the relation
\(X = Z + \eta . \qquad \qquad (4)\)
In (4), \(Z, \eta , \varepsilon \) are assumed to be mutually independent and \(E(\eta )=0\). Note that \(\eta \) is a \(p\times 1\) vector of errors and its distribution need not be known.
Analog of \(\hat{\theta }\). We shall first develop and derive the asymptotic distribution of the analogs of the estimators \(\hat{\theta }\) in the Berkson ME linear regression model (1) and (4). Rewrite the model as
\(Y = Z'\theta + \xi , \quad \xi := \eta '\theta + \varepsilon . \qquad \qquad (5)\)
Because \(Z,\eta ,\varepsilon \) are mutually independent, \(\xi \) is independent of Z in (5).
Let H denote the distribution function (d.f.) of \(\eta \). Assume that the d.f. F of \(\varepsilon \) is continuous and symmetric around zero and that H is also symmetric around zero, i.e., \(-dH(v)=dH(-v)\), for all \(v\in {\mathbb R}^p\). Then the d.f. of \(\xi \),
\(L(x):= \int F(x - v'\theta )\, dH(v), \quad x\in {\mathbb R},\)
is also continuous and symmetric around zero. This symmetry in turn motivates the following definition of the class of m.d. estimators of \(\theta \) in the model (5), which mimics the definition of \(\hat{\theta }\) by simply replacing \(X_i\) by \(Z_i\). Define
Because L is continuous and symmetric around zero and \(\xi \) is independent of Z, \(E\widetilde{V}(x,\theta )\equiv 0\).
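In code the replacement is literal. Assuming the hypothetical helper `md_estimate` from the sketch in Sect. 2, the Berkson-model estimator is computed by feeding it the surrogates:

```python
import numpy as np

# Under (5), xi is independent of Z, so the no-ME routine applies verbatim
# with the observed surrogates Z_i in place of the unobservable X_i.
theta_tilde = md_estimate(Z, Y, theta0=np.zeros(Z.shape[1]))
```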
The following assumptions are needed for the asymptotic normality of \(\widetilde{\theta }\).
Under (6), \(n^{-1}\sum _{i=1}^nZ_iZ_i'\rightarrow _p {\varGamma }\) and \(n^{-1/2} \max _{1\le i\le n}\Vert Z_i\Vert \rightarrow _p 0\). Use these facts and argue as in Koul [15] to deduce that (2) and (6)–(9) imply
Remark 1
We shall discuss some examples and some sufficient conditions for the above assumptions. The conditions (8) and (9) are satisfied by a large class of densities f, ME distributions H and integrating measure G. If G is a d.f., then f being uniformly continuous and bounded implies these conditions. In this case \(\ell \) is also uniformly continuous, \(\sup _x\ell (x)\le \sup _x f(x)<\infty \) so that \(\int \ell ^jdG\le \sup _xf^j(x)<\infty \) and \(\int \big [\ell (y+z)-\ell (y)\big ]^j dG(y)\le \sup _{|x-y|\le z}|\ell (y)-\ell (x)|^j\rightarrow 0\), as \(z\rightarrow 0\). Moreover, here \(A\le 1\). Thus these two assumptions reduce to assuming \(\int \ell ^jdG>0\), \(j=1,2.\)
Given the importance of the two estimators corresponding to \(G(x)\equiv x,\, G(x)\equiv \delta _0(x)\), it is of interest to provide some easy to verify sufficient conditions that imply conditions (8) and (9) for these two estimators.
Consider the case \(G(x)\equiv x\). Assume f to be continuous and \(\int f^2(x)dx<\infty .\) Then because H is a d.f., \(\ell \) is also continuous and symmetric around zero and \( \int \ell (x+z)dx =\int \ell (x) dx=1\). Moreover, by the Cauchy-Schwarz (C-S) inequality and Fubini’s Theorem,
Finally, because \(\ell \in L_2\), by Theorem 9.5 in Rudin [18], it is shift continuous in \(L_2\), i.e., (8) holds. Hence all conditions of (8) are satisfied.
Next, consider (9). The assumptions \(E(\varepsilon )=0\) and \(E(\eta )=0\) imply that \(\int |x|f(x) dx<\infty \), \(\int \Vert v\Vert dH(v)<\infty \) and hence
This in turn implies (9) in the case \(G(x)\equiv x\).
To summarize, (6), (7), and F having a continuous symmetric square integrable density f imply all of the above conditions needed for the asymptotic normality of the above analog of the Hodges-Lehmann estimator in the Berkson ME linear regression model. This fact is similar to the observation made in Berkson [4] that the naive least squares estimator, where one replaces the \(X_i\)’s by the \(Z_i\)’s, continues to be consistent and asymptotically normal under the same conditions as when there is no ME. But, unlike in the no ME case, here the asymptotic variance
\(\tau _I^2:= 1/\big \{12\big (\textstyle \int \ell ^2(x)\, dx\big )^2\big \}, \qquad \ell (x)=\int f(x - v'\theta )\, dH(v),\)
depends on \(\theta \). If H is degenerate at zero, i.e., if there is no ME, then \(\tau _I^2=\sigma _I^2\), the factor that appears in the asymptotic covariance matrix of the Hodges-Lehmann estimator in this case.
Next, consider the case \(G(x)\equiv \delta _0(x)\), the degenerate measure at 0. Assume f to be continuous and bounded from above and
Then the continuity and symmetry of f imply that as \(z\rightarrow 0\),
Moreover, here \(\int \limits _0^\infty (1-L)dG=1-L(0)=1/2\) so that (9) is also satisfied.
To summarize, (6), (7), (11) and f being continuous, symmetric around zero and bounded from above imply all the conditions needed for the asymptotic normality of the above analog of the LAD estimator in the Berkson ME linear regression model. Moreover, here the relevant variance factor is
\(\tau _0^2:= 1/\big \{4\, \ell ^2(0)\big \}.\)
Consequently, here the asymptotic covariance matrix also depends on \(\theta \), via \(\ell (0)=\int f(v'\theta )\, dH(v)\).
In the case of no ME, \({\varGamma }^{-1}\tau _0^2\) equals the asymptotic covariance matrix of the LAD estimator. Unlike in the case of the previous estimator, here the conditions needed for f are a bit more stringent than those required for the asymptotic normality of the LAD estimator when there is no ME.
Analog of \(\hat{\theta }_R\). Here we shall describe the analogs of the class of estimators \(\hat{\theta }_R\) based on the residual ranks obtained from the model (5). These estimators do not need the errors \(\xi _i\)’s to be symmetrically distributed. Let \(\widetilde{R}_{i\vartheta }\) denote the rank of \(Y_i-Z_i'\vartheta \) among \(Y_j-Z_j'\vartheta , \, j=1,\ldots ,n\), \(\bar{Z}:= n^{-1}\sum _{i=1}^nZ_i\), \(Z_{ic}:=Z_i-\bar{Z}\), \(1\le i\le n\) and define
Use the facts \(\sum _{i=1}^nZ_{ic}=0\), \({\varPsi }(\max (a,b))=\max \{{\varPsi }(a),{\varPsi }(b)\}\) and \(\max (a,b)\) \(=2^{-1}[a+b+|a-b|]\), for any \(a,b\in {\mathbb R}\), to obtain the computational formula
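For a d.f. \({\varPsi }\) on [0, 1], these identities yield a form that is convenient in code. A sketch, assuming the dispersion is the integrated squared centered rank-weighted empirical, so that up to terms free of \(\vartheta \) it reduces to \(-(2n)^{-1}\sum _{i,j} Z_{ic}'Z_{jc}\,|{\varPsi }(\widetilde{R}_{i\vartheta }/n)-{\varPsi }(\widetilde{R}_{j\vartheta }/n)|\); names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def rank_md_dispersion(theta, Z, Y, Psi=lambda s: s):
    # Pairwise computational form, up to theta-free terms:
    # -(2n)^{-1} sum_{i,j} Zc_i'Zc_j |Psi(R_i/n) - Psi(R_j/n)|.
    n = len(Y)
    Zc = Z - Z.mean(axis=0)                            # centered Z_ic
    R = np.argsort(np.argsort(Y - Z @ theta)) + 1.0    # residual ranks
    P = Psi(R / n)
    return -np.sum((Zc @ Zc.T) * np.abs(P[:, None] - P[None, :])) / (2 * n)

def rank_md_estimate(Z, Y, theta0):
    # Ranks make the criterion piecewise constant in theta; use a
    # derivative-free simplex search as for the earlier sketch.
    return minimize(rank_md_dispersion, theta0, args=(Z, Y),
                    method="Nelder-Mead").x
```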
The following result can be deduced from Koul [15]. Suppose \(E\Vert Z\Vert ^2<\infty \), \(\widetilde{\varGamma }:=E(Z-EZ)(Z-EZ)'\) is positive definite, density \(\ell \) of the r.v. \(\xi \) is uniformly continuous and bounded and \(\int \limits _0^1 \ell ^2(L^{-1}(s)) d{\varPsi }(s)>0\). Then \(n^{-1/2}\max _{1\le i \le n}\Vert Z_i\Vert \rightarrow _p 0,\) \( n^{-1}\sum _{i=1}^n(Z_i-\bar{Z})(Z_i-\bar{Z})' \rightarrow _p \widetilde{\varGamma }\) and
The density f of F being uniformly continuous and bounded implies the same for \(\ell (x)=\int f(x-v'\theta ) dH(v)\). It is also worth pointing out that the assumptions on F, H and L needed here are less stringent than those needed for the asymptotic normality of \(\widetilde{\theta }\).
Of special interest is the case \({\varPsi }(s)\equiv s\). Let \(\widetilde{\tau }^2_I\) denote the corresponding \(\widetilde{\tau }^2_{\varPsi }\). Then, by the change of variable formula, \(\int \limits _0^1 \ell ^2(L^{-1}(s))\, ds=\int \ell ^3(x)\, dx\).
An analog of \(\hat{\theta }_c\) here is \(\widetilde{\theta }_c := \text {argmin}_{\vartheta \in {\mathbb R}^p} \widetilde{M}_c(\vartheta )\), where
Arguing as above one obtains that \( n^{1/2}\big (\widetilde{\theta }_c - \theta \big ) \rightarrow _D N\big (0, \tau ^2_I\widetilde{\varGamma }^{-1}\big ). \)
4 Nonlinear Regression with Berkson ME
In this section we shall investigate the analogs of the above m.d. estimators in nonlinear regression models with Berkson ME.
Let \(q\ge 1, p\ge 1\) be known positive integers, \({\varTheta }\subseteq {\mathbb R}^q\) be a subset of the q-dimensional Euclidean space \({\mathbb R}^q\) and consider the model where the unobservable p-dimensional covariate X, its observable surrogate Z and the response variable Y obey the relations
\(Y = m_\theta (X) + \varepsilon , \quad X = Z + \eta , \qquad \qquad (12)\)
for some \(\theta \in {\varTheta }\). Here \(m_\vartheta (x)\) is a known parametric function, nonlinear in x, from \({\varTheta }\times {\mathbb R}^p\) to \({\mathbb R}\) with \(E|m_\vartheta (X)|<\infty \), for all \(\vartheta \in {\varTheta }\). The r.v.’s \(\varepsilon , Z, \eta \) are assumed to be mutually independent, \(E\varepsilon =0\) and \(E\eta =0\). Unlike in the linear case, here we need to assume that the d.f. H of \(\eta \) is known. See Sect. 5 for the unknown H case.
Fix a \(\theta \) for which (12) holds. Let \(\nu _\vartheta (z):= E(m_\vartheta (X)|Z=z)\), \(\vartheta \in {\mathbb R}^q, z\in {\mathbb R}^p\). Under (12), \(E(Y|Z=z)\equiv \nu _\theta (z)\). Moreover, because H is known,
\(\nu _\vartheta (z)= \int m_\vartheta (z+s)\, dH(s)\)
is a known parametric regression function. Thus, under (12), we have the regression model
\(Y = \nu _\theta (Z) + \zeta , \quad \zeta := Y - \nu _\theta (Z), \quad E(\zeta |Z)=0.\)
Unlike in the linear case, the error \(\zeta \) is no longer independent of Z in general.
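When H is known but the integral defining \(\nu _\vartheta \) has no closed form, it can be approximated numerically. A minimal Monte Carlo sketch, assuming only that one can sample from H; the names and the error law in the commented usage are hypothetical.

```python
import numpy as np

def nu(theta, z, m, sample_eta, n_mc=10_000, seed=0):
    # Monte Carlo approximation of nu_theta(z) = E m_theta(z + eta), eta ~ H.
    rng = np.random.default_rng(seed)
    eta = sample_eta(n_mc, rng)            # draws from the known d.f. H
    return np.mean(m(theta, z + eta))

# Hypothetical usage with scalar z and N(0, 0.05^2) measurement error:
# nu_val = nu(theta, z,
#             m=lambda th, x: th[0] * x / (th[1] + x),
#             sample_eta=lambda n, rng: rng.normal(0.0, 0.05, size=n))
```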
To proceed further we assume there is a vector of q functions \(\dot{m}_\vartheta (x)\) such that, with \(\dot{\nu }_\vartheta (z):= \int \dot{m}_\vartheta (z+s) dH(s),\) for every \(0<b<\infty \),
Let
Assume the following. For every \(z\in {\mathbb R}^p\),
Let G be as before and define
In the case \( q=p\) and \( m_\theta (x)=x'\theta \), \(\widehat{\theta }\) agrees with \(\widetilde{\theta }\). Thus the class of estimators \(\widehat{\theta }\), one for each G, is an extension of the class of estimators \(\widetilde{\theta }\) from the linear case to the above nonlinear case.
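As an illustration of the nonlinear criterion, take \(G=\delta _0\): the integral collapses to the point \(x=0\). The sketch below assumes the weighted empirical \(U(x,\vartheta )\) carries the weights \(\dot{\nu }_\vartheta (Z_i)\) (consistent with the form of \({\varGamma }_\theta (x)\) in Sect. 4.1); it is a sketch under that assumption, not the paper's exact display.

```python
import numpy as np

def lad_type_dispersion(theta, Z, Y, nu, nu_dot):
    # D(theta) at G = delta_0: squared norm of a sign-weighted score,
    # || n^{-1/2} sum_i nu_dot_theta(Z_i) * sign(Y_i - nu_theta(Z_i)) ||^2.
    e = Y - np.array([nu(theta, z) for z in Z])
    W = np.array([nu_dot(theta, z) for z in Z])    # n x q gradient weights
    score = W.T @ np.sign(e) / np.sqrt(len(Y))
    return float(score @ score)
```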
Next, consider the extension of \(\hat{\theta }_R\) to the above nonlinear model (12). Let \(S_{i\vartheta }\) denote the rank of \(Y_i-\nu _\vartheta (Z_i)\) among \(Y_j-\nu _\vartheta (Z_j)\), \(j=1, \ldots , n\) and define
The estimator \(\widehat{\theta }_R\) gives an analog of \(\hat{\theta }_R\) in the present set up.
Our goal here is to prove the asymptotic normality of \(\widehat{\theta }, \, \widehat{\theta }_R\). This will be done by following the general method of Sect. 5.4 of Koul [16]. This method requires two steps. In the first step we need to show that the defining dispersions \(D(\vartheta )\) and \(\mathcal{K}(\vartheta )\) are AULQ (asymptotically uniformly locally quadratic) in \(\vartheta -\theta \) for \(\vartheta \in \mathcal{N}_n(b):=\{\vartheta \in {\varTheta }, n^{1/2}\Vert \vartheta -\theta \Vert \le b\}\), for every \(0<b<\infty \). The second step requires showing that \(n^{1/2}\Vert \widehat{\theta }- \theta \Vert =O_p(1)=n^{1/2}\Vert \widehat{\theta }_R - \theta \Vert . \)
4.1 Asymptotic Distribution of \(\widehat{\theta }\)
In this subsection we shall derive the asymptotic normality of \(\widehat{\theta }\). To state the needed assumptions for achieving this goal we need some more notation. Let \(\nu _{nt}(z):= \nu _{\theta +n^{-1/2}t}(z),\, \xi _{it}:=\nu _{nt}(Z_i)-\nu _\theta (Z_i),\, 1\le i\le n,\) \( \dot{\nu }_{nt}(z):= \dot{\nu }_{\theta +n^{-1/2}t}(z),\) and \(\dot{\nu }_{ntj}(z)\) denote the jth coordinate of \(\dot{\nu }_{nt}(z)\), \(1\le j\le q, t\in {\mathbb R}^q\). For any real number a, let \(a^\pm =\max (0,\pm a)\) so that \(a=a^+-a^-\). Also, let \( \beta _i(x):= I(\zeta _i\le x) - L_{Z_i}(x) \) and \(\alpha _i(x,t):= I(\zeta _i\le x+ \xi _{it}) - I(\zeta _i\le x) - L_{Z_i}(x+\xi _{it})+ L_{Z_i}(x) . \)
Because \(dG(x)\equiv -dG(-x)\) and \(U(x,\vartheta )\equiv U(-x, \vartheta ),\) we have
We are now ready to state our assumptions.
For every \(\epsilon >0\) there is a \(\delta >0\) and \(N_\epsilon <\infty \) such that \(\forall \,\Vert s\Vert \le b, n>N_\epsilon \),
For every \(\epsilon>0, \alpha >0\) there exists \(N\equiv N_{\alpha ,\varepsilon }\) and \(b\equiv b_{\alpha ,\epsilon }\) such that
From now onwards we shall write \(\nu \) and \(\dot{\nu }\) for \(\nu _\theta \) and \(\dot{\nu }_\theta \), respectively.
Remark 2
We shall now discuss the above assumptions when \(m_\vartheta (x)=\vartheta 'h(x),\) where \(h=(h_1,\ldots ,h_q)'\) is a vector of q functions on \({\mathbb R}^p\) with \(E\Vert h(X)\Vert ^2<\infty \), first for general G and then for some special cases of G. An example of this is the polynomial regression model with Berkson ME, where \(p=1, h_j(x)=x^j, j=1,\ldots ,q\). Let \(\beta (z):= E(h(X)|Z=z)\). Then \(\nu _\vartheta (z)=\vartheta '\beta (z)\) and \(\dot{\nu }_\vartheta (z)\equiv \beta (z)\), a constant in \(\vartheta \). Therefore (13), (14), (18), (23) and (24) are all vacuously satisfied. The condition (25) also holds here, in a similar way as in the linear regression model, cf. Koul [16, Proof of Lemma 5.5.4, pp. 183–185]. Direct calculations show that (26)–(29) below imply the remaining assumptions (17), (19), (21) and (22), respectively.
Consider further the case \(G(x)\equiv x\). Let \(\sigma :=(E\varepsilon ^2)^{1/2}\). Assume
Then \(E\Vert \beta (Z)\Vert ^j\le E\Vert h(X)\Vert ^j<\infty , \, j=1,2,3\). Let \(\gamma (z):= 2\Vert \theta \Vert \Vert \beta (z)\Vert +\sigma \). Then
Hence
thereby showing that (26) is satisfied. The assumption (30)(b) and \(\ell _z(x)\) being a density in x for each z and Theorem 9.5 of Rudin [18] readily imply (27) here. The left hand side of (28) equals \(2n^{-1/2}bE\big (\Vert \beta (Z)\Vert ^3\big )\rightarrow 0\), by (30)(a).
Next, consider the case \(G(x)=\delta _0(x)\), the measure degenerate at zero. Assume
Then the left hand side of (26) equals \((1/2)E\Vert \beta (Z)\Vert ^2<E\Vert h(X)\Vert ^2<\infty \). Condition (27) is trivially satisfied and the left hand side of (28) equals
by the Dominated Convergence Theorem (DCT) and the continuity of \(L_z(\cdot )\), for each z.
To summarize, in the case \(m_\vartheta (x)=\vartheta 'h(x)\) and \(G(x)\equiv x\), assumptions (30)(a), (b) and \(\int \mathcal{B}(x)\mathcal{B}(x)' dx\) being positive definite imply all of the above assumptions (13), (14) and (17)–(25). Similarly, in the case \(m_\vartheta (x)=\vartheta 'h(x)\) and \(G(x)\equiv \delta _0(x)\), \(E\Vert h(X)\Vert ^2<\infty \), (32) and \( \mathcal{B}(0)\mathcal{B}(0)'\) being positive definite imply all these conditions.
Remark 3
Because of the importance of the estimators \(\widehat{\theta }\) when \(G(x)=x\), and \(G(x)=\delta _0(x)\), it is of interest to give some simple sufficient conditions for a general \(m_\vartheta \) that imply the given assumptions for these two estimators.
Suppose G satisfies \(dG(x)\equiv g(x) dx\), where \(g_\infty :=\sup _{x\in {\mathbb R}}g(x)<\infty \). Note that \(G(x)\equiv x\) corresponds to the case \(g(x)\equiv 1\). Consider the following assumptions.
Because \(\big \Vert \dot{\nu }(Z)\big \Vert ^j \le E\big ( \Vert \dot{m}_\theta (X) \big \Vert ^j\big | Z\big )\), \(E \big \Vert \dot{\nu }(Z)\big \Vert ^j \le E \big \Vert \dot{m}_\theta (X)\big \Vert ^j <\infty , \, j=1,2,3,4\), by (33)(a). Similarly, for every \(t\in {\mathbb R}^q\),
Next, similar to (31), \(E\big (|\zeta | \big | Z\big )=E\big (|Y-\nu (Z)|\big | Z\big ) \le 2|\nu (Z)|+\sigma \) implies that the left hand side of (17) is bounded from above by
by (33)(a), thereby verifying (17) here. Similarly, with C denoting the above upper bound, for every \(t\in {\mathbb R}^q\), the left hand side of (18) is bounded from above by \( C E\Big (\Vert \dot{\nu }_{nt}(Z) - \dot{\nu }_\theta (Z)\Vert ^2\Big ) \rightarrow 0, \) by (36). The left hand side of (21) is bounded from above by \( 2 g_\infty E\big ( \Vert \dot{\nu }_{nt}(Z)\Vert ^2 |\nu _{nt}(Z)-\nu (Z)|\big ) \rightarrow 0,\) by (35).
In other words, when G has a bounded Lebesgue density, conditions (33)–(36) imply assumptions (14) and (17)–(21). Not much simplification occurs in the remaining assumptions (22)–(25). See Remark 2 for some special cases.
Next consider the case when \(G(x)=\delta _0(x)\) and the following assumptions.
In this case (33), (35), (37) and (38) together imply the assumptions (14), (17)–(22). Not much simplification occurs in the remaining three assumptions (23)–(25), except in some special cases as in Remark 2.
We now resume the discussion about the asymptotic normality of \(\widehat{\theta }\). First, we show that \(E(D(\theta ))<\infty \), so that by the Markov inequality, \(D(\theta )\) is bounded in probability. To see this, by (15), \(EU(x,\theta )\equiv 0\) and, for \(x\ge 0\),
By the Fubini Theorem, (16) and (17),
To state the AULQ result for D, we need some more notation. Let
where \({\varGamma }_\theta (x)\) and \({\varOmega }_\theta \) are as in (22). We are ready to state the following lemma.
Lemma 2
Suppose the above set up and assumptions (17)– (24) hold. Then for every \(b<\infty \),
If in addition (25) holds, then, with \({\varSigma }_\theta \) given at (45) below,
Proof
The proof of (41) appears in the Appendix. The proof of the claim (42)(a), which uses (25), (39) and (41), is similar to that of Theorem 5.4.1 of Koul [16], where (25) and (39) are used to show that \(n^{1/2}\Vert \widehat{\theta }-\theta \Vert =O_p(1)\).
Define, for \(y\in {\mathbb R},\,u\in {\mathbb R}^p,\)
By (19), \(0<\psi _u(y)\le \psi _u(\infty )=\int \limits _{-\infty }^\infty \ell _u(x)dG(x)<\infty ,\) for all \(u\in {\mathbb R}^p\). Thus for each u, \(\psi _u(y)\) is an increasing continuous bounded function of y and \(\psi _u(-y) \equiv \psi _u(\infty )-\psi _u(y)\), and \(\varphi _u(y)= \psi _u(\infty )-2\psi _u(y)\), for all \(y\in {\mathbb R}\).
By (15), \(E(\varphi _u(\zeta ) | Z=z)=0\), for all \(u, z\in {\mathbb R}^p\). Let
Next let \(\mu (z):=\dot{\nu }(z) \dot{\nu }(z)'\), Q denote the d.f. of Z and rewrite \({\varGamma }_\theta (x)=E\big (\dot{\nu }_\theta (Z) \dot{\nu }_\theta (Z)'\ell _Z(x)\big )\) \(=\int \mu (z) \ell _z(x) dQ(z)\). By the Fubini Theorem,
Clearly, \(ET_n=0\) and by the Fubini Theorem, the covariance matrix of \(T_n\) is
Thus \(T_n\) is a sum of independent centered \(p\times 1\) random vectors with finite covariance matrices. By the classical CLT, \(T_n\rightarrow _D N(0,{\varSigma }_\theta )\). Hence, the minimizer \(\widetilde{t}\) of the approximating quadratic form \(D(\theta ) + 4 T_n' t+ 4 t' {\varOmega }_\theta t\) with respect to t satisfies \( \tilde{t}= -{\varOmega }_\theta ^{-1} T_n/2 \rightarrow _D N\big (0,4^{-1}{\varOmega }_\theta ^{-1}{\varSigma }_\theta {\varOmega }_\theta ^{-1}\big ). \) The claim (42)(b) now follows from this result and (42)(a). \(\Box \)
4.2 Asymptotic Distribution of \(\widehat{\theta }_R\)
In this subsection we shall establish the asymptotic normality of \(\widehat{\theta }_R\). For this we need the following assumptions, where \(\mathcal{U}(b):=\{t\in {\mathbb R}^q; \Vert t\Vert \le b\}\), and \(0<b<\infty \).
\(\forall \,\epsilon>0, \, \exists \, \delta >0\) and \(n_\epsilon <\infty \) such that for each \(s\in \mathcal{U}(b)\), \(\forall \, n>n_\epsilon \),
\(\forall \,\epsilon >0, \, 0<\alpha <\infty ,\, \exists \, N\equiv N_{\alpha ,\epsilon }\) and \(b\equiv b_{\epsilon ,\alpha }\) such that
Let
We need to have an alternate representation of the covariance matrix of \(\widehat{T}_n\). Let, for \(z\in {\mathbb R}^p, \quad 0\le v\le 1\),
By (46), \(\kappa _z\) is a uniformly continuous increasing and bounded function on [0, 1], for all \(z\in {\mathbb R}^p\). Let U denote a uniform [0, 1] r.v. Conditionally, given Z, \(L_Z(\zeta )\sim _D U\). Hence, \(E\big (\kappa _z\big (L_Z(\zeta )\big )\big |Z\big )=E \kappa _z(U)\) so that \(E\big (\kappa _z^c(L_Z(\zeta ))\big |Z\big )=E\kappa _z^c(U)=0,\) a.s. Let \(\mu ^c(z):= \dot{\nu }^c(z) \dot{\nu }^c(z)'\). Argue as for (44) and use the facts that \(\sum _{i=1}^n\dot{\nu }^c(Z_i)\equiv 0\) and \(\int \limits _0^1 u\, d\kappa _z(u)= \kappa _z(1) - \int \limits _0^1 \kappa _z(u)\, du\) to obtain that
Define
Then argue as in (45) to obtain
We are now ready to state the following asymptotic normality result for \(\widehat{\theta }_R\).
Lemma 3
Suppose the nonlinear Berkson measurement error model (12) and the assumptions (13), (14), (46)–(49) hold. Then the following holds.
In addition, if (50) holds and \(\widehat{\varOmega }_\theta \) is positive definite, then \( n^{1/2}(\widehat{\theta }_R-\theta ) \rightarrow _D N\big (0, \widehat{\varOmega }_\theta ^{-1}\widehat{\varSigma }_\theta \widehat{\varOmega }_\theta ^{-1}\big ). \)
The proof of this lemma is similar to that of Theorem 1.2 of Koul [15], hence no details are given here. Assumption (50) is used to show that \(n^{1/2}\Vert \widehat{\theta }_R-\theta \Vert =O_p(1).\)
Remark 4
As in Remark 2, let \(m_\theta (x)=\theta ' h(x)\). Then \(\nu _\vartheta (z)= \vartheta '\beta (z)\), where \(\beta (z):= E\big (h(X)|Z=z\big )\). Thus \(\dot{\nu }_\vartheta (z)\equiv \beta (z)\) and the assumptions (47)–(49) are vacuously satisfied. The assumption (50) is shown to be satisfied by an argument similar to the one used in the proof of Lemma 5.4.4 of Koul [16, pp. 183–185]. This proof uses the monotonicity in t, for every unit vector \(e\in {\mathbb R}^p\), of simple linear rank statistics based on the ranks of \(Y_i-te'h(X_i)\), \(1\le i\le n\), see Hájek [10, Theorem II.7E].
For the asymptotic normality of \(\widehat{\theta }_R\) here, one only needs (46) and \({\varPsi }\) to be a d.f. such that \(\widehat{\varOmega }\) is positive definite. Note that here \(\mu ^c(z)=\beta ^c(z)\beta ^c(z)'\), with \(\beta ^c(z):=\beta (z) - \bar{\beta }\), \(\bar{\beta }:= n^{-1}\sum _{i=1}^n\beta (Z_i)\), and
do not depend on \(\theta \). Clearly, these assumptions are far less stringent than those needed for the asymptotic normality of \(\widehat{\theta }\) corresponding to \(G(x)\equiv x\).
5 M.D. Estimators with Validation Data
In this section we develop the m.d. estimators of Sect. 4 when the d.f. H of the Berkson ME \(\eta \) is unknown but a validation data set is available. Not knowing H renders \(\nu _\theta \) an unknown function. The validation data are used to estimate this function, which in turn is used to define the m.d. estimators.
Let N be a known positive integer. A set of r.v.’s \(\{(\tilde{X}_k, \tilde{Z}_k), k = 1,\ldots ,N\}\) is said to be validation data if these r.v.’s are independent of the primary sample and both \(\tilde{Z}_k\) and \(\tilde{X}_k\) are observable and obey the model (12). Besides the primary data set \(\{(Y_i,Z_i), 1 \le i \le n\}\), we assume that such a validation data set \(\{(\tilde{X}_k, \tilde{Z}_k), 1\le k \le N\}\) is available. Then \(\tilde{\eta }_k := \tilde{X}_k - \tilde{Z}_k, 1 \le k \le N\), are observable and their empirical d.f. \(H_N(s) := N^{-1}\sum _{k=1}^{N} I(\tilde{\eta }_k \le s), s \in \mathbb {R}\), provides an estimate of H.
Under (13)–(15), we have the following estimates of \(\nu _{\theta }\) and \(\dot{\nu }_\theta \).
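Concretely, the plug-in estimates replace H by \(H_N\), giving \(\hat{\nu }_\vartheta (z)=N^{-1}\sum _{k=1}^N m_\vartheta (z+\tilde{\eta }_k)\) and \(\hat{\dot{\nu }}_\vartheta (z)=N^{-1}\sum _{k=1}^N \dot{m}_\vartheta (z+\tilde{\eta }_k)\) (cf. the bounds used below (57)). A one-function sketch, with names ours:

```python
import numpy as np

def nu_hat(theta, z, m, eta_tilde):
    # Plug-in estimate of nu_theta(z): average m_theta over the observed
    # validation errors eta_tilde_k = X_tilde_k - Z_tilde_k.
    return np.mean(m(theta, z + eta_tilde))
```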
An analog of \(\widehat{\theta }\) in the current set up is defined as follows. Let
To define the analog of \(\hat{\theta }_R\) here, let \(\tilde{S}_{i\vartheta }\) be the rank of \(Y_i - \hat{\nu }_\vartheta (Z_i)\) among \(Y_j - \hat{\nu }_\vartheta (Z_j), 1\le j\le n\) and define
The asymptotic distributions of \(\widehat{\theta }_1\) and \(\tilde{\theta }_R\) as \(n\wedge N\rightarrow \infty \) are described in the next two subsections. In their derivations, the limit \(\lambda := \lim (n/N)\) of the ratio of the sample sizes plays an important role. Some of the proofs are similar to those for \(\widehat{\theta }\) and \(\widehat{\theta }_R\). Some key steps of the proofs can be found in the Appendix.
5.1 Asymptotic Distribution of \(\widehat{\theta }_1\)
In this subsection we derive the asymptotic distribution of \(\widehat{\theta }_1\). In addition to (13)–(15) and (17)–(25), the following assumptions are needed, where \({\varDelta }_\vartheta (z) := \hat{\nu }_\vartheta (z) - \nu _\vartheta (z)\) and \(\theta \) is as in (12).
We also assume that (18)–(24) and (25) hold with \(\dot{\nu }_{nt}\), \(\dot{\nu }_{\theta }\) and D replaced by \(\hat{\dot{\nu }}_{nt}\), \(\hat{\dot{\nu }}\) and \(D_1\), respectively. We denote these assumptions as (18)\(^*\)–(25)\(^*\).
Here we discuss some sufficient conditions for the above assumptions. By the C-S inequality, both the expressions in (52) are bounded from above by \(2 E\big \Vert \dot{m}_\theta (X)\big \Vert ^2 E\big |m_\theta (X)\big |^2\). Thus (52) is implied by (33)(a) and \(E\big |m_\theta (X)\big |^2<\infty \).
Next, under (33)(a), (57) is trivially satisfied when \(G(x)\equiv \delta _0(x)\). In the case \(dG(x)=g(x)dx\) with \(g_\infty :=\sup _{y\in {\mathbb R}}g(y)<\infty \), (57) is implied by (33)(a) and the following conditions.
To see this, note that \(E\big (|{\varDelta }_\theta (Z)|\big |Z\big )\le 2 E\big (|m_\theta (X)|\big | Z\big )\), and \(\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \le N^{-1}\sum _{k=1}^N\Vert \dot{m}_\theta (Z+\tilde{\eta }_k)\Vert ^2\) so that \(E\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \le E\Vert \dot{m}_\theta (X)\Vert ^2\). Now argue as in Remark 3 and use these facts to obtain that the left hand side of (57) is bounded from above by
by (33)(a) and (59). Similarly, the left hand side of (58) is bounded from above by a constant multiple of
We now turn to proving the asymptotic normality of \(\widehat{\theta }_1\). Similar to Sect. 4.1, we first prove that \(E(D_1(\theta ))<\infty \). Recall \({\varDelta }_\vartheta (z) := \hat{\nu }_\vartheta (z) - \nu _\vartheta (z)\) and rewrite
By the independence of the primary and validation data and a conditioning argument, for every \(x>0\),
Hence by (56), \(ED_1(\theta )<\infty \).
Next we sketch the proof of the AULQ property of \(D_1(\vartheta )\). Define
In the Appendix, we show that \(\tilde{T}_n\) is approximated by a U-statistic based on the two independent samples. Theorem 6.1.4 in Lehmann [17] yields
Next, the assumptions (54)–(58) and (18)\(^*\)–(24)\(^*\) ensure that the analog of Lemma 6 holds here also. Hence (41), with \(T_n\) and \(D(\vartheta )\) replaced by \(\widetilde{T}_n\) and \(D_1(\vartheta )\), respectively, holds. Moreover, the analog of (42) can be shown to hold in a similar manner as in Sect. 4 under (25)\(^*\). Consequently, the asymptotic distribution of \(\widehat{\theta }_1\), based on the data sets \(\{(Y_i,Z_i),1\le i\le n\}\) and \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\), is described in the following lemma.
Lemma 4
Suppose model (12) with H unknown holds and an independent validation data \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) obeying (12) is available. In addition assume that (17)–(25) and (52)–(57) hold. Then
The above result shows that the estimation of the regression function \(\nu _\theta (z)\), necessitated by the unknown distribution H, introduces additional variation in the asymptotic distribution of the m.d. estimators. Moreover, the limiting ratio \(\lambda \) of the sample sizes governs this additional variation. When \(\lambda = \lim n/N = 0\), the additional covariance term vanishes and the result reduces to the case when the ME distribution is known. In other words, when the validation sample size N is sufficiently large compared to the primary sample size n, both \(\widehat{\theta }\) and \(\widehat{\theta }_1\) achieve the same asymptotic efficiency. On the other hand, when \(\lambda = \infty \), i.e., when the validation data size is very limited compared to the primary data size, the consistency rate is restricted to \(\sqrt{N}\) instead of \(\sqrt{n}\).
5.2 Asymptotic Distribution of \(\tilde{\theta }_R\)
In this subsection we present the asymptotic distribution of the class of estimators \(\tilde{\theta }_R\). First, we state the additional assumptions. Let \(\hat{\dot{\nu }}_{nt}(z) = N^{-1}\sum _{k=1}^N \dot{m}_{\theta +n^{-1/2}t}(z+\tilde{\eta }_k)\). Consider the following assumptions.
\(\forall \,\epsilon> 0,\,\,\exists \,\,\delta > 0\) and \(n_\epsilon < \infty \) such that for each \(s \in \mathcal{U}(b)\), \(\forall \,n>n_\epsilon \),
For every \(\epsilon >0, \, 0<\alpha <\infty \), there exist an \(N_\epsilon \) and \(b\equiv b_{\epsilon ,\alpha }\) such that
Next, define \(\tilde{\dot{\nu }} := n^{-1} \sum _{i=1}^n\hat{\dot{\nu }}(Z_i), \qquad \tilde{\dot{\nu }}^c (Z_i) := \hat{\dot{\nu }}(Z_i) - \tilde{\dot{\nu }},\)
where \(\widehat{\varGamma }_\theta \) and \(\widehat{\varOmega }_\theta \) are defined in Sect. 4.2. Similar to Lemma 4, we have the following lemma.
Lemma 5
Suppose model (12) with H unknown holds and an independent validation data \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) obeying (12) is available. In addition assume that (54), (55), (61)–(65) hold. Then, for \(0\le \lambda < \infty \),
Moreover, \( N^{1/2}(\tilde{\theta }_R - \theta ) \rightarrow _D N(0,\widehat{\varOmega }_\theta ^{-1}{\varSigma }_2 \widehat{\varOmega }_\theta ^{-1}) \), for \(\lambda =\infty \).
See Appendix for some details of the proof.
6 Data Analysis
Example. We shall now compute the above estimators based on some real data. The data pertain to the study of the relationship between the enzyme reaction speed (Y) and the basal density (X) of the UDP-galactose, see Bates and Watts [1], p. 70. A suitable model commonly used to analyze this data is the Michaelis-Menten model
\(m_\theta (x)= \theta _1 x/(\theta _2 + x), \quad \theta =(\theta _1, \theta _2)'.\)
In the primary data, consisting of \(n=30\) observations, the basal density variable was measured using a simple chemical method. It was believed that this method caused measurement error in the observations. Hence, in the validation data, consisting of \(N=10\) observations, an expensive procedure with a precision machine tool was used to produce precise observations of the basal density. Let Z denote the basal density obtained by the chemical method, in parts per million (ppm), \(\widetilde{Z}\) denote the basal density obtained by the exact measurement (ppm), and Y the reaction speed (counts/min\(^2\)).

Table 2 gives the m.d. estimators \(\widehat{\theta }_1\) with \(G(x)=x\) and \(\tilde{\theta }_R\) with \({\varPsi }(u) = u\), based on the primary and validation data below, and the naive least squares estimator \(\widehat{\theta }_\mathrm{nLS}\) obtained by ignoring the measurement errors. The MSEs are calculated by using the following formulas, where \(\widetilde{\eta }_k=\widetilde{X}_k-\widetilde{Z}_k\):

\(MSE(\widehat{\theta }_1) = \frac{1}{n} \sum _{i=1}^n \Big [Y_i - \frac{1}{N}\sum _{k=1}^N m_{\widehat{\theta }_1}(Z_i+\widetilde{\eta }_k)\Big ]^2,\)

\(MSE(\tilde{\theta }_R) = \frac{1}{n} \sum _{i=1}^n \Big [ Y_i - \frac{1}{N}\sum _{k=1}^N m_{\tilde{\theta }_R}(Z_i+\widetilde{\eta }_k)\Big ]^2,\)

\(MSE(\widehat{\theta }_\mathrm{nLS}) = \frac{1}{n} \sum _{i=1}^n \Big [Y_i - m_{\widehat{\theta }_\mathrm{nLS}}(Z_i)\Big ]^2.\)

Figure 1 presents the fitted regression curves using the three estimators. The primary and validation data are as follows.
Primary data:

Z | 0.02 | 0.02 | 0.04 | 0.04 | 0.06 | 0.06 | 0.08 | 0.08 | 0.11 | 0.11 | 0.14 | 0.14 | 0.18 | 0.18 | 0.22
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Y | 76 | 47 | 82 | 95 | 97 | 107 | 118 | 127 | 123 | 139 | 146 | 149 | 157 | 151 | 159
Z | 0.22 | 0.28 | 0.28 | 0.34 | 0.34 | 0.42 | 0.42 | 0.56 | 0.56 | 0.66 | 0.66 | 0.86 | 0.86 | 1.10 | 1.10
Y | 152 | 173 | 180 | 179 | 182 | 185 | 189 | 191 | 192 | 193 | 196 | 198 | 202 | 207 | 204
Validation data:

\(\widetilde{Z}\) | 0.04 | 0.07 | 0.20 | 0.30 | 0.38 | 0.48 | 0.60 | 0.76 | 0.95 | 1.110
---|---|---|---|---|---|---|---|---|---|---
\(\widetilde{X}\) | 0.035 | 0.076 | 0.207 | 0.295 | 0.388 | 0.486 | 0.601 | 0.754 | 0.952 | 1.112
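For reproducibility, a sketch in Python of the naive least squares fit and the MSE formulas above; the m.d. fits \(\widehat{\theta }_1\) and \(\tilde{\theta }_R\) would replace the squared-error criterion with the dispersions sketched in Sects. 2–5. Starting values and function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

Z = np.array([0.02, 0.02, 0.04, 0.04, 0.06, 0.06, 0.08, 0.08, 0.11, 0.11,
              0.14, 0.14, 0.18, 0.18, 0.22, 0.22, 0.28, 0.28, 0.34, 0.34,
              0.42, 0.42, 0.56, 0.56, 0.66, 0.66, 0.86, 0.86, 1.10, 1.10])
Y = np.array([76, 47, 82, 95, 97, 107, 118, 127, 123, 139,
              146, 149, 157, 151, 159, 152, 173, 180, 179, 182,
              185, 189, 191, 192, 193, 196, 198, 202, 207, 204], float)
Z_tilde = np.array([0.04, 0.07, 0.20, 0.30, 0.38, 0.48, 0.60, 0.76, 0.95, 1.110])
X_tilde = np.array([0.035, 0.076, 0.207, 0.295, 0.388, 0.486, 0.601, 0.754, 0.952, 1.112])
eta_tilde = X_tilde - Z_tilde              # observed validation errors

def mm(theta, x):
    # Michaelis-Menten mean function m_theta(x) = theta_1 x / (theta_2 + x).
    return theta[0] * x / (theta[1] + x)

def mse_naive(theta):
    # MSE(theta_nLS): ignores the measurement error entirely.
    return np.mean((Y - mm(theta, Z)) ** 2)

def mse_corrected(theta):
    # MSE for an ME-corrected fit: the fitted value at Z_i averages m_theta
    # over the validation errors, N^{-1} sum_k m_theta(Z_i + eta_tilde_k).
    fitted = mm(theta, Z[:, None] + eta_tilde[None, :]).mean(axis=1)
    return np.mean((Y - fitted) ** 2)

theta_nLS = minimize(mse_naive, x0=np.array([200.0, 0.1]),
                     method="Nelder-Mead").x
print(theta_nLS, mse_naive(theta_nLS))
```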
References
Bates, D.M., Watts, D.G.: Nonlinear Regression Analysis and Its Applications. Wiley, New York (1988)
Beran, R.J.: Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445–463 (1977)
Beran, R.J.: An efficient and robust adaptive estimator of location. Ann. Statist. 6, 292–313 (1978)
Berkson, J.: Are there two regressions? J. Amer. Statist. Assoc. 45, 164–180 (1950)
Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.: Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. Chapman & Hall/CRC, Boca Raton, FL (2006)
Cheng, C.L., Van Ness, J.W.: Statistical Regression with Measurement Error. Wiley, New York (1999)
Donoho, D.L., Liu, R.C.: Pathologies of some minimum distance estimators. Ann. Statist. 16, 587–608 (1988a)
Donoho, D.L., Liu, R.C.: The “automatic” robustness of minimum distance functionals. Ann. Statist. 16, 552–586 (1988b)
Fuller, W.A.: Measurement Error Models. Wiley, New York (1987)
Hájek, J.: Nonparametric Statistics. Holden Day, San Francisco, USA (1969)
Hodges Jr., J.L., Lehmann, E.L.: Estimates of location based on rank tests. Ann. Math. Statist. 34, 598–611 (1963)
Koul, H.L.: Weighted empirical processes and the regression model. J. Indian Statist. Assoc. 17, 83–91 (1979)
Koul, H.L.: Minimum distance estimation in multiple linear regression. Sankhyā Ser. A 47(1), 57–74 (1985a)
Koul, H.L.: Minimum distance estimation in linear regression with unknown errors. Statist. Prob. Lett. 3, 1–8 (1985b)
Koul, H.L.: Asymptotics of some estimators and sequential residual empiricals in non-linear time series. Ann. Statist. 24, 380–404 (1996)
Koul, H.L.: Weighted Empirical Processes in Dynamic Nonlinear Models. Lecture Notes Series in Statistics, 2nd edn., vol. 166. Springer, New York (2002)
Lehmann, E.L.: Elements of Large-Sample Theory. Springer, New York, N.Y., USA (1999)
Rudin, W.: Real and Complex Analysis, 2nd edn. McGraw-Hill, New York (1974)
Yi, G.Y.: Statistical Analysis with Measurement Error or Misclassification: Strategy, Method and Application. With a foreword by Raymond J. Carroll. Springer Series in Statistics. Springer, New York (2017)
Appendix
This section contains some details of the proofs of the various results.
Proof of (41). Let \(\widetilde{M}(t)= \widetilde{D}(\theta +n^{-1/2}t)\), where \(\widetilde{D}(\vartheta )\) is as in (16). Define
Note that \(EV_s(x,t)\equiv E J_s(x,t), \, EW_s(x,t)\equiv 0\). By (15), \(\forall \, s\in {\mathbb R}^q, x\in {\mathbb R}\),
Define
Because of (15), \(\gamma _{nt}(x)\equiv \gamma _{nt}(-x),\, g_n(x)\equiv g_n(-x)\) and we rewrite
Expand the quadratic of the six summands in the integrand to obtain
where
Recall \(\mathcal{U}(b):=\{ t\in {\mathbb R}^q; \Vert t\Vert \le b\}\), \(b>0\). We shall prove the following lemma shortly.
Lemma 6
Under the assumptions (13) to (18), \(\forall \, 0<b<\infty \),
Unless mentioned otherwise, all suprema below are taken over \(t\in \mathcal{U}(b)\). Lemma 6 together with the C-S inequality implies that the supremum over t of all the cross product terms tends to zero in probability. For example, by the C-S inequality,
by (67) used with \(j=1, 5\). Similarly, by (67) with \(j=1\) and (68),
Consequently, we obtain
Expand the quadratic in \(M_8\) to write
where \( \widetilde{T}_n:=\int \limits _0^\infty g_n(x) \big \{ W(x,0)+W(-x,0)\big \} dG(x). \) Let
By the LLNs and an Extended Dominated Convergence Theorem
Moreover, recall \(\widetilde{M}(0)=\widetilde{D}(\theta )\), so that by (39), \(\widetilde{M}(0)=O_p(1)\). These facts together with the C-S inequality imply that
These facts combined with (22), (69), (70) yield that
Now recall that \(D(\vartheta )=2\widetilde{D}(\vartheta )\), \(\widetilde{M}(t)= \widetilde{D}(\theta +n^{-1/2}t)\), \({\varOmega }_\theta =2\int \limits _0^\infty {\varGamma }_\theta {\varGamma }_\theta dG\) and \(T_n =2T_n^*\), see (40). Hence the above expansion is equivalent to
which is precisely the claim (41). \(\square \)
Proof of Lemma 6. Let \(\delta _{it}:= \xi _{it}-n^{-1/2}t'\dot{\nu }(Z_i).\) By (13) and (14),
Hence,
by (14). Moreover, by (14) and the Law of Large Numbers,
These facts will be used in the sequel.
Consider the term \(M_7\). Write
Hence
where
Similarly, by the C-S inequality,
by (18) and (19). Again, by (19) and (71),
These facts prove (67) for \(j=7\).
Next consider \(M_5\). Let \(D_{it}(x):=L_{Z_i}(x +\xi _{it}) - L_{Z_i}(x) -\xi _{it}\ell _{Z_i}( x)\). Then
By the C-S inequality, Fubini Theorem, (20) and (73),
Upon combining these facts with (75) we obtain \(\sup _t M_5(t)=o_p(1)\), thereby proving (67) for \(j=5\). The proof for \(j=6\) is exactly similar.
Now consider \(M_1\). Let \(\xi _t(Z):= \nu _{nt}(Z)- \nu (Z)\). Then
by (21). Thus
To prove that this holds uniformly in \(t\in \mathcal{U}(b)\), because of the compactness of the ball \(\mathcal{U}(b)\), it suffices to show that for every \(\epsilon >0\) there is a \(\delta >0\) and an \(N_\epsilon \) such that for every \(s\in \mathcal{U}(b)\),
Let \(\dot{\nu }_{ntj}(z)\) denote the jth coordinate of \(\dot{\nu }_{nt}(z)\), \(j=1,\ldots , q\) and let
Then
Thus it suffices to prove (77) with \(M_1\) replaced by \(M_{1j}\) for each \(j=1,\ldots , q\).
Any real number a can be written as \(a=a^+-a^-\), where \(a^+=\max (0,a), a^-= \max (0,-a)\). Note that \(a^\pm \ge 0\). Fix a \(j=1,\ldots ,q\), write \(\dot{\nu }_{ntj}(Z_i)=\dot{\nu }_{ntj}^+(Z_i) - \dot{\nu }_{ntj}^- (Z_i)\) and define
Then
Write
Hence
By (23), the first term here satisfies (77). We proceed to verify it for the second term. Fix an \(s\in \mathcal{U}(b)\), \(\epsilon >0\) and \(\delta >0\). Let
By (18), there exists an \(N_\epsilon \) such that \(P(B_n)>1-\epsilon \), for all \(n>N_\epsilon \). On \(B_n\), \(\xi _{is}-{\varDelta }_{ni}\le \xi _{it}\le \xi _{is}+ {\varDelta }_{ni}\) and, by the nondecreasing property of the indicator function and d.f., we obtain
Let
The above inequalities and \(\dot{\nu }_{njs}^+(Z_i)\) being nonnegative yield that on \(B_n\),
Note that \(\max _{1\le i\le n}(|\xi _{is}|+{\varDelta }_{ni})=o_p(1)\). Argue as for (76) to see that the first two terms in the above bound are \(o_p(1)\), while the last term is bounded from above by
The first summand in the above bound is bounded above by
because the first factor tends to zero in probability by (20) and the second factor satisfies
The second term in the upper bound of (80) is bounded from above by
Since the factor multiplying \(\delta ^2\) is positive, the above term can be made smaller than \(\epsilon \) by the choice of \(\delta \). Hence (77) is satisfied by the second term in the upper bound of (79). This then completes the proof of \(R_j^+\) satisfying (77). The details of the proof for verifying (77) for \(R_j^-\) are exactly similar. These facts together with the upper bound of (78) show that (77) is satisfied by \(M_{1j}\) for each \(j=1,\ldots , q\). This also completes the proof of \(\sup _t M_1(t)=o_p(1),\) thereby proving (67) for \(j=1\). The proof for \(j=3\) is similar.
Next, consider \(M_2\). Recall \(\beta _i(x):= I(\zeta _i\le x)-L_{Z_i}(x)\). Then
Because \(E(\beta _i(x)|Z_i)\equiv 0\), a.s., we have
by (18). Thus
To prove this holds uniformly in \(t\in \mathcal{U}(b)\), we shall verify (77) for \(M_2\). Accordingly, let \(\delta >0\), \(s\in \mathcal{U}(b)\) be fixed. Then for all \(t\in \mathcal{U}(b)\) such that \(\Vert t-s\Vert <\delta \),
This bound, (24) and (81) now readily verify (77) for \(M_2\), which also completes the proof of (67) for \(j=2\). The proof of (67) for \(j=4\) is precisely similar. This in turn completes the proof of Lemma 6. \(\square \)
Proof of (60). Recall (43). Let \(D(z,x):= m_\theta (z + x)-\nu _\theta (z)\). Use the fact \(\widehat{U}(x,\theta ) = \tilde{W}(x,0) + \tilde{W}(-x,0)\), to rewrite
where \(\varphi _z(x)\) is defined as in Sect. 4.1.
For further analysis of \(\tilde{T}_n\), with the two independent samples \(\{(Z_i,\zeta _i),1\le i \le n\}\) and \(\{\tilde{\eta }_k, 1\le k \le N\}\), define the symmetric kernel function \(\phi \) and its projections as follows.
Let \(\tilde{T}_{n1}\) denote the first term in the right hand side of \(\tilde{T}_n\). Then
Note that \(\tilde{T}_{n11}\) is a U-statistic with permutation degree 1 in the primary sample \(\{(Z_i,\zeta _i),1\le i \le n\}\) and permutation degree 2 in the validation sample \(\{\tilde{\eta }_k, 1\le k \le N\}\). Theorem 6.1.4 in Lehmann [17] and (52) yield that, for \(0\le \lambda <\infty \),
Moreover, for \(\lambda = \infty \), Theorem 6.1.4 in Lehmann [17] also yields that \( \sqrt{N/n} \, \tilde{T}_{n11} \rightarrow _D N(0,4{\varSigma }_1). \)
Similarly, \(\tilde{T}_{n12}\) is a U-statistic with permutation degree 1 for both samples. Since (52) implies that \(E\{\Vert \dot{m}_\theta (X) [m_\theta (X) - \nu _{\theta }(Z)]\Vert \}<\infty \), we have \(E \tilde{T}_{n12} = O(n^{-1/2})\). Moreover, Theorem 6.1.3 on U-statistics in Lehmann [17] implies that \(\text{ Var }(\tilde{T}_{n12}) = O(n^{-1})\) and hence \(\tilde{T}_{n12} = o_p(1)\). Hence the claim (60) follows.
Proof of (66). Let \(\dot{D}_{ijk}:= \dot{m}_\theta (Z_i+\tilde{\eta }_k) - \dot{m}_\theta (Z_j+\tilde{\eta }_k)\). Based on the definitions of \(\widehat{\varGamma }_\theta (u)\) and \(\kappa _z(v)\), \(T_{n,R}\) can be rewritten as
First, we study the asymptotic distribution of \(T_{n,R1}\). Define for \(1\le i,j \le n, i\ne j\), and \(1\le k\le N\),
Then \(T_{n,R1}\) can be rewritten as
By the definition of U-statistics in Lehmann [17], \(T_{n,R1}\) is a two sample U-statistic based on the function \(\psi _1\) with permutation degree 2 on the sample \(\{(Z_i,\zeta _i),1\le i\le n\}\) and permutation degree 1 on the sample \(\{\tilde{\eta }_k,1\le k\le N\}\). Because, conditionally on \(Z_i\), \(L_{Z_i}(\zeta _i)\) is a uniformly distributed r.v., we have \(E\big (\kappa _{Z_i}(L_{Z_i}(\zeta _i))\big |Z_i\big ) = \int \limits _0^1 \kappa _{Z_i}(u)\,du =: K(Z_i)\). The conditional expectations of \(\psi _1\) can then be calculated as follows.
It can be seen that Cov\((E(\psi _1|Z_1,\zeta _1)) = \widehat{\varSigma }_\theta \) as defined in Sect. 4.2. Then Theorem 6.1.4 in Lehmann [17] yields that
Next, in order to study \(T_{n,R2}\), define
Then \(T_{n,R2}\) can be rewritten as a two sample U-statistic with permutation degree 2 for both primary sample and validation sample.
The conditional expectations of \(\psi _2\) are calculated as
Then Theorem 6.1.4 in Lehmann [17] shows that, for \(0\le \lambda < \infty \),
The two terms \(T_{n,R1}\) and \(T_{n,R2}\) are asymptotically independent because of the independence between the primary sample and the validation sample. In fact, \(T_{n,R1}\) is based on \(E(\psi _1|Z_1,\zeta _1)\) and \(T_{n,R2}\) is based on \(E(\psi _2|\tilde{\eta }_1)\). Therefore, (66)(a) holds. An argument similar to the one used for (51) yields that \( \sup _{\Vert t\Vert \le b} |\tilde{\mathcal{K}}(\theta + n^{-1/2}t) - \tilde{\mathcal{K}}_R(t)| = o_p(1), \) which in turn yields the claim (66)(b) about \(\tilde{\theta }_R\).
When \(\lambda = \infty \), by Theorem 6.1.4 in Lehmann [17], \( \sqrt{N/n} \, T_{n,R2} \rightarrow _D N(0,{\varSigma }_2). \) Then \(\sqrt{N/n} \,\tilde{T}_{n,R} = \sqrt{N/n} \,\tilde{T}_{n,R1} + \sqrt{N/n} \,\tilde{T}_{n,R2} \rightarrow _D N(0,{\varSigma }_2)\). Therefore, we obtain that \(\sqrt{N}(\tilde{\theta }_R - \theta )\rightarrow _D N(0,\widehat{\varOmega }_\theta ^{-1} {\varSigma }_2 \widehat{\varOmega }_\theta ^{-1})\) for \(\lambda =\infty \). \(\Box \)