
1 Introduction

The statistical literature is replete with various minimum distance estimation methods for the one and two sample location models. Beran [2, 3] and Donoho and Liu [7, 8] argue that the minimum distance estimators based on \(L_2\) distances involving either density estimators or residual empirical distribution functions have some desirable finite sample properties, tend to be robust against some contaminated models and are also asymptotically efficient at some error distributions.

In the classical regression models without measurement error in the covariates, classes of minimum distance estimators of the underlying parameters based on Cramér-von Mises type distances between certain weighted residual empirical processes were developed in Koul [12,13,14,15]. These classes include some estimators that are robust against outliers in the regression errors and asymptotically efficient at some error distributions.

In practice there are numerous situations where the covariates are not observable. Instead, one observes a surrogate measured with some error. Regression models with such covariates are known as measurement error regression models. Fuller [9], Cheng and Van Ness [6], Carroll et al. [5] and Yi [19] discuss numerous examples of the practical importance of these models.

Given the desirable properties of the above minimum distance (m.d.) estimators and the importance of measurement error regression models, it is desirable to develop their analogs for these models. The next section describes the m.d. estimators of interest and their asymptotic distributions in the classical linear regression model. Their analogs for the linear regression Berkson measurement error (ME) model are developed in Sect. 3. Two classes of m.d. estimators are developed. The first assumes symmetry of the regression error and ME error distributions and bases the m.d. estimators on the symmetrized weighted empirical of the residuals. This class includes analogs of the Hodges-Lehmann estimator of the one sample location parameter, see Hodges and Lehmann (1963), and of the least absolute deviation (LAD) estimator. The second class is based on a weighted empirical of the residual ranks and does not need the symmetry of the error distributions. It includes an estimator that is asymptotically more efficient than the analogs of the Hodges-Lehmann and LAD estimators at some error distributions. Neither class needs knowledge of the measurement error or regression error distributions.

Section 4 discusses analogs of these estimators in the Berkson measurement error nonlinear regression models, where the measurement error distribution is assumed to be known. Section 5 develops their analogs when the ME distribution is unknown but validation data is available. In this case the consistency rate of these estimators is \(\min (n, N)^{1/2}\), where n and N are the primary data and validation data sample sizes, respectively. Section 6 provides an application of the proposed estimators to a real data example. Several proofs are deferred to the last section.

2 Linear Regression Model

In this section we recall the definitions of the m.d.  estimators of interest in the linear regression model without measurement error, together with their known asymptotic normality results.

Accordingly, consider the linear regression model where for some \(\theta \in {\mathbb R}^p\), the response variable Y and the p dimensional observable predicting covariate vector X obey the relation

$$\begin{aligned}&Y=X' \theta +\varepsilon , \end{aligned}$$
(1)

where \(\varepsilon \) is independent of X and symmetrically distributed around \(E(\varepsilon )=0\). For an \(x\in {\mathbb R}^p,\,x'\) and \(\Vert x\Vert \) denote its transpose and Euclidean norm, respectively. Let \((X_i, Y_i), 1\le i\le n\) be a random sample from this model. The two classes of m.d.  estimators of \(\theta \) based on weighted empirical processes of the residuals and residual ranks were developed in Koul [12,13,14,15]. To describe these estimators, let G be a nondecreasing right continuous function from \({\mathbb R}\) to \({\mathbb R}\) having left limits and define

$$\begin{aligned} V(x,\vartheta )&:= n^{-1/2} \sum _{i=1}^nX_i\big \{ I(Y_i - X'_i \vartheta \le x) - I(-Y_i + X_i' \vartheta < x)\big \}, \\ M(\vartheta )&:= \int \big \Vert V(x,\vartheta )\big \Vert ^2 dG(x), \qquad \hat{\theta }:= \text {argmin}_{\vartheta \in {\mathbb R}^p}M(\vartheta ). \nonumber \end{aligned}$$

This class of estimators, one for each G, includes some celebrated estimators. For example \(\hat{\theta }\) corresponding to \(G(x)\equiv x\) yields an analog of the one sample location parameter Hodges-Lehmann estimator in the linear regression model. Similarly, \(G(x)\equiv \delta _0(x)\), the degenerate measure at zero, makes \(\hat{\theta }\) equal to the least absolute deviation (LAD) estimator.
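
For illustration, \(\hat{\theta }\) with \(G(x)\equiv x\) can be computed numerically as in the following minimal sketch, which approximates \(M(\vartheta )\) by a Riemann sum over a grid of x-values and minimizes it by a derivative-free search started at the naive least squares fit; the simulated design and all names are illustrative assumptions, not part of the original development.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 2
X = rng.normal(size=(n, p))
theta_true = np.array([1.0, -0.5])
Y = X @ theta_true + rng.logistic(size=n)      # symmetric errors

def M(theta, X, Y, n_grid=400):
    e = Y - X @ theta                          # residuals
    xs = np.linspace(-np.abs(e).max(), np.abs(e).max(), n_grid)
    # V(x, theta) = n^{-1/2} sum_i X_i {I(e_i <= x) - I(-e_i < x)}, one column per x
    ind = (e[:, None] <= xs).astype(float) - ((-e)[:, None] < xs).astype(float)
    V = (X.T @ ind) / np.sqrt(len(Y))
    # Riemann sum approximation of int ||V(x, theta)||^2 dx
    return ((V ** 2).sum(axis=0) * (xs[1] - xs[0])).sum()

theta0 = np.linalg.lstsq(X, Y, rcond=None)[0]  # least squares starting value
fit = minimize(M, x0=theta0, args=(X, Y), method="Nelder-Mead")
print("m.d. estimate:", fit.x)
```

The LAD-type estimator corresponding to \(G(x)\equiv \delta _0(x)\) is obtained from the same sketch by replacing the integral with \(\Vert V(0,\vartheta )\Vert ^2\).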

A class of m.d.  estimators when the error distribution is not symmetric and unknown is obtained by using the weighted empirical of the residual ranks defined as follows. Write \(X_i=(X_{i1}, X_{i2}, \ldots , X_{ip})', \, {i=1,\ldots , n}\). Let \(\bar{X}_j:= n^{-1}\sum _{i=1}^nX_{ij}\), \(\bar{X}:=(\bar{X}_1,\ldots , \bar{X}_p)'\) and \(X_{ic}:=X_i-\bar{X}\), \(1\le i\le n\). Let \(R_{i\vartheta }\) denote the rank of the ith residual \(Y_i-X_i'\vartheta \) among \(Y_j-X_j'\vartheta , \, j=1,\ldots , n\). Let \({\varPsi }\) be a distribution function on [0, 1] and define

$$\begin{aligned} \mathcal{V}(u,\vartheta )&:= n^{-1/2}\sum _{i=1}^nX_{ic} I(R_{i\vartheta }\le nu), \quad K(\vartheta ):= \int \limits _0^1 \big \Vert \mathcal{V}(u,\vartheta )\big \Vert ^2 d{\varPsi }(u), \\ \hat{\theta }_R&:= \text {argmin}_{\vartheta \in {\mathbb R}^p} K(\vartheta ). \end{aligned}$$

Yet another m.d. estimator, when the error distribution is unknown and not symmetric, is

$$\begin{aligned} V_c(x,\vartheta )&:= n^{-1/2}\sum _{i=1}^nX_{ic} I(Y_i-X_i'\vartheta \le x), \\ M_c(\vartheta )&:= \int \big \Vert V_c(x,\vartheta )\big \Vert ^2 dx, \qquad \hat{\theta }_c := \text {argmin}_{\vartheta \in {\mathbb R}^p} M_c(\vartheta ). \end{aligned}$$

If one reduces the model (1) to the two sample location model, then \(\hat{\theta }_c\) is the median of pairwise differences, the so-called Hodges-Lehmann estimator of the two sample location parameter. Thus in general \(\hat{\theta }_c\) is an analog of this estimator in the linear regression model.

The following asymptotic normality results can be deduced from Koul [15] and [16, Sect. 5.4].

Lemma 1

Suppose the model (1) holds and \(E\Vert X\Vert ^2<\infty \).

(a). In addition, suppose \({\varSigma }_X:= E(XX')\) is positive definite and the error d.f.  F is symmetric around zero and has density f. Further, suppose the following hold.

$$\begin{aligned}&G\text { is a nondecreasing right continuous function on }{\mathbb R}\text { to }{\mathbb R}, \end{aligned}$$
(2)
$$\begin{aligned}&\text { having left limits and }dG(x)=-dG(-x), \forall \, x\in {\mathbb R}. \nonumber \\&0<\int f^j dG<\infty , \quad \lim _{z\rightarrow 0}\int \big [f(x+z)-f(x)\big ]^j dG(x)=0,\,j=1,2, \\&\int \limits _0^\infty (1-F) dG<\infty . \nonumber \end{aligned}$$
(3)

Then

$$ n^{1/2}(\hat{\theta }- \theta )\rightarrow _D N\big (0, \sigma ^2_G {\varSigma }_X^{-1}\big ), \quad \sigma _G^2:= \frac{\mathrm{\text{ Var }}\Big (\int \limits _{-\infty }^\varepsilon f(x) dG(x)\Big )}{\big (\int f^2 dG\big )^2}. $$

(b). In addition, suppose the error d.f.  F has uniformly continuous bounded density f, \({\varOmega }:= E\{(X-EX)(X-EX)'\}\) is positive definite and \({\varPsi }\) is a d.f.  on [0, 1] such that \(\int \limits _0^1 f^2(F^{-1}(s)) d{\varPsi }(s)>0\). Then

$$n^{1/2}\big (\hat{\theta }_R - \theta \big ) \rightarrow _D N(0,\gamma _{\varPsi }^2 {\varOmega }^{-1}), \quad \gamma _{\varPsi }^2:= \frac{{\text{ Var }}\big (\int \limits _0^{F(\varepsilon )} f(F^{-1}(s)) d{\varPsi }(s)\big )}{\big (\int \limits _0^1 f^2(F^{-1}(s)) d{\varPsi }(s)\big )^2}. $$

(c). In addition, suppose \({\varOmega }\) is positive definite, F has square integrable density f and \(E|\varepsilon |<\infty \). Then \(n^{1/2}(\hat{\theta }_c - \theta )\rightarrow _D N\big (0, \sigma ^2_I {\varOmega }^{-1}\big )\), where \( \sigma ^2_I:= 1\big /\big (12 \big (\int f^2 (x) dx\big )^2\big ). \)

Before proceeding further we now describe some comparisons of the above asymptotic variances. Let \(\sigma ^2_{LAD}:=1/(4 f^2(0))\) and \(\sigma _{LSE}^2:= \text{ Var }(\varepsilon )\) denote the factors of the asymptotic covariance matrices of the LAD and the least squares estimators, respectively. Let \(\gamma _I^2\) denote the \(\gamma _{\varPsi }^2\) when \({\varPsi }(s)\equiv s\), i.e.,

$$\begin{aligned} \gamma ^2_I= & {} \frac{\int \int \big [F(x\wedge y) -F(x)F(y)\big ] f^2(x) f^2(y) dx dy}{\big (\int f^3(x) dx\big )^2}. \end{aligned}$$

Table 1, obtained from Koul [16], gives the values of these factors for some distributions F. From this table one sees that the estimator \(\hat{\theta }_R\) corresponding to \({\varPsi }(s)\equiv s\) is asymptotically more efficient than the LAD at logistic error distribution while it is asymptotically more efficient than the Hodges-Lehmann type estimator at the double exponential and Cauchy error distributions. For these reasons it is desirable to develop analogs of \(\hat{\theta }_R\) also for the ME models.

Table 1 A comparison of asymptotic variances

As argued in Koul (Chap.  5, [16]), the estimators \(\{\hat{\theta }_G, \, G\,\,\text {a d.f.}\}\) are robust against heavy tails in the error distribution in the general linear regression model. The estimator \(\hat{\theta }_I\), corresponding to \(G(x)\equiv x\), which is not a d.f., is robust against heavy tails and is also asymptotically efficient at the logistic error distribution.

3 Berkson ME Linear Regression Model

In this section we shall develop analogs of the above estimators in the Berkson ME linear regression model, where the response variable Y obeys the relation (1) and where, instead of observing X, one observes a surrogate Z obeying the relation

$$\begin{aligned}&X=Z+\eta . \end{aligned}$$
(4)

In (4), \(Z, \eta , \varepsilon \) are assumed to be mutually independent and \(E(\eta )=0\). Note that \(\eta \) is a \(p\times 1\) vector of errors and its distribution need not be known.

Analog of \(\hat{\theta }\). We shall first develop analogs of the estimator \(\hat{\theta }\) in the Berkson ME linear regression model (1) and (4) and derive their asymptotic distributions. Rewrite the model as

$$\begin{aligned}&Y=Z' \theta + \xi , \qquad \xi := \eta ' \theta +\varepsilon , \,\,\,E(\xi )=0, \quad \exists \, \theta \in {\mathbb R}^p. \end{aligned}$$
(5)

Because \(Z,\eta ,\varepsilon \) are mutually independent, \(\xi \) is independent of Z in (5).

Let H denote the distribution function (d.f.) of \(\eta \). Assume that the d.f.  F of \(\varepsilon \) is continuous and symmetric around zero and that H is also symmetric around zero, i.e., \(-dH(v)=dH(-v)\), for all \(v\in {\mathbb R}^p\). Then the d.f. of \(\xi \)

$$\begin{aligned} L(x):=P(\xi \le x)=P( \eta '\theta +\varepsilon \le x) =\int F(x- v'\theta ) dH(v) \end{aligned}$$

is also continuous and symmetric around zero. This symmetry in turn motivates the following definition of the class of m.d. estimators of \(\theta \) in the model (5), which mimics the definition of \(\hat{\theta }\) by simply replacing \(X_i\) by \(Z_i\). Define

$$\begin{aligned}&\widetilde{V}(x,t):= n^{-1/2} \sum _{i=1}^nZ_i\big \{ I(Y_i - Z_i' t\le x) - I(-Y_i + Z_i't< x)\big \}, \\&\widetilde{M}(t):=\int \big \Vert \widetilde{V}(x,t)\big \Vert ^2 dG(x), \qquad \widetilde{\theta }:= \text {argmin}_{t\in {\mathbb R}^p}\widetilde{M}(t). \nonumber \end{aligned}$$

Because L is continuous and symmetric around zero and \(\xi \) is independent of Z, \(E\widetilde{V}(x,\theta )\equiv 0\).

The following assumptions are needed for the asymptotic normality of \(\widetilde{\theta }\).

$$\begin{aligned}&E\Vert Z\Vert ^2<\infty \,\,\text {and }{\varGamma }:= EZZ'\text { is positive definite.} \end{aligned}$$
(6)
$$\begin{aligned}&H\text { satisfies }dH(v)=-dH(-v), \, \forall \, v\in {\mathbb R}^p. \end{aligned}$$
(7)
$$\begin{aligned}&F\text { has Lebesgue density }f\text {, symmetric around zero, and } \end{aligned}$$
(8)
$$\begin{aligned}&\text {such that the density }\ell (x)=\int f(x- v' \theta ) dH(v)\text { of }L\text { satisfies the following:} \nonumber \\&0<\int \ell ^j dG<\infty , \qquad \lim _{z\rightarrow 0}\int \big [\ell (y+z)-\ell (y)\big ]^j dG(y)=0, \,\,\, j=1,2. \nonumber \\&A:=\int \limits _0^\infty (1-L) dG<\infty . \end{aligned}$$
(9)

Under (6), \(n^{-1}\sum _{i=1}^nZ_iZ_i'\rightarrow _p {\varGamma }\) and \(n^{-1/2} \max _{1\le i\le n}\Vert Z_i\Vert \rightarrow _p 0\). Use these facts and argue as in Koul [15] to deduce that (2) and (6)–(9) imply

$$\begin{aligned} n^{1/2}\big (\widetilde{\theta }- \theta \big ) \rightarrow _D \mathcal{N}(0, \tau ^2_G{\varGamma }^{-1}), \qquad \tau ^2_G :=\frac{\text {Var}\big (\int \nolimits _{-\infty }^\xi \ell dG\big )}{\big ( \int \ell ^2 dG\big )^2}.\\\nonumber \end{aligned}$$
(10)

Remark 1

We shall discuss some examples and some sufficient conditions for the above assumptions. The conditions (8) and (9) are satisfied by a large class of densities f, ME distributions H and integrating measure G. If G is a d.f., then f being uniformly continuous and bounded implies these conditions. In this case \(\ell \) is also uniformly continuous, \(\sup _x\ell (x)\le \sup _x f(x)<\infty \) so that \(\int \ell ^jdG\le \sup _xf^j(x)<\infty \) and \(\int \big [\ell (y+z)-\ell (y)\big ]^j dG(y)\le \sup _{|x-y|\le z}|\ell (y)-\ell (x)|^j\rightarrow 0\), as \(z\rightarrow 0\). Moreover, here \(A\le 1\). Thus these two assumptions reduce to assuming \(\int \ell ^jdG>0\), \(j=1,2.\)

Given the importance of the two estimators corresponding to \(G(x)\equiv x,\, G(x)\equiv \delta _0(x)\), it is of interest to provide some easy to verify sufficient conditions that imply conditions (8) and (9) for these two estimators.

Consider the case \(G(x)\equiv x\). Assume f to be continuous and \(\int f^2(x)dx<\infty .\) Then because H is a d.f., \(\ell \) is also continuous and symmetric around zero and \( \int \ell (x+z)dx =\int \ell (x) dx=1\). Moreover, by the Cauchy-Schwarz (C-S) inequality and Fubini’s Theorem,

$$\begin{aligned} 0<\int \ell ^2(y) dy&=\int \Big (\int f(y- v' \theta ) dH(v)\Big )^2 dy\\&\le \int \int f^2(y- v' \theta )dy dH(v)=\int f^2(x)dx<\infty . \end{aligned}$$

Finally, because \(\ell \in L_2\), by Theorem 9.5 in Rudin [18], it is shift continuous in \(L_2\). Hence all the conditions in (8) are satisfied.

Next, consider (9). The assumptions \(E(\varepsilon )=0\) and \(E(\eta )=0\) imply that \(\int |x|f(x) dx<\infty \), \(\int \Vert v\Vert dH(v)<\infty \) and hence

$$ \int |y|dL(y)=\int |y|\int f(y- v' \theta ) dH(v) dy=\int \int |x+ v' \theta | f(x) dx dH(v)<\infty . $$

This in turn implies (9) in the case \(G(x)\equiv x\).

To summarize, (6), (7), and F having a continuous symmetric square integrable density f imply all of the above conditions needed for the asymptotic normality of the above analog of the Hodges-Lehmann estimator in the Berkson ME linear regression model. This fact is similar to the observation made in Berkson (1950) that the naive least squares estimator, where one replaces \(X_i\)’s by \(Z_i\)’s, continues to be consistent and asymptotically normal under the same conditions as when there is no ME. But, unlike in the no ME case, here the asymptotic variance

$$\begin{aligned} \tau ^2_I :=\frac{\text {Var}\big (L(\xi )\big )}{\big ( \int \ell ^2(y) dy\big )^2} =\frac{1}{12\big ( \int \big (\int f(y- v' \theta ) dH(v)\big )^2 dy\big )^2} \end{aligned}$$

depends on \(\theta \). If H is degenerate at zero, i.e., if there is no ME, then \(\tau _I^2=\sigma _I^2\), the factor that appears in the asymptotic covariance matrix of the Hodges-Lehmann estimator in this case.
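
For a concrete illustration (a special case assumed here, not part of the source), suppose \(\varepsilon \sim N(0,\sigma _\varepsilon ^2)\) and \(\eta \sim N_p(0,{\varSigma }_\eta )\). Then \(\xi \sim N(0,s^2)\) with \(s^2=\theta '{\varSigma }_\eta \theta +\sigma _\varepsilon ^2\), \(\ell \) is the \(N(0,s^2)\) density and \(\int \ell ^2(y)dy=1/(2s\sqrt{\pi })\), so that

$$\begin{aligned} \tau ^2_I= \frac{1}{12\big (\int \ell ^2(y) dy\big )^2} =\frac{\big (2s\sqrt{\pi }\big )^2}{12} =\frac{\pi \big (\theta '{\varSigma }_\eta \theta +\sigma _\varepsilon ^2\big )}{3}, \end{aligned}$$

which reduces to the familiar factor \(\pi \sigma _\varepsilon ^2/3\) of the Hodges-Lehmann estimator when there is no ME.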

Next, consider the case \(G(x)\equiv \delta _0(x)\), the degenerate measure at 0. Assume f to be continuous and bounded from above and

$$\begin{aligned} \ell (0):= \int f( v' \theta ) dH(v) >0. \end{aligned}$$
(11)

Then the continuity and symmetry of f imply that as \(z\rightarrow 0\),

$$\begin{aligned}&\int \ell (y+z) dG(y)= \ell (z)=\int f(z- v' \theta ) dH(v) \rightarrow \int f(- v' \theta ) dH(v)=\ell (0), \\&\int \big [\ell (y+z)-\ell (y)\big ]^2 dG(y)= \Big [\int \big \{f(z- v' \theta ) -f(- v' \theta )\big \} dH(v)\Big ]^2 \\&\le \int \big \{f(z- v' \theta ) -f(-v' \theta )\big \}^2 dH(v) \rightarrow 0. \end{aligned}$$

Moreover, here \(\int \limits _0^\infty (1-L)dG=1-L(0)=1/2\) so that (9) is also satisfied.

To summarize, (6), (7), (11) and f being continuous, symmetric around zero and bounded from above imply all the needed conditions for the asymptotic normality of the above analog of the LAD estimator in the Berkson ME linear regression model. Moreover, here

$$\begin{aligned}&\int \limits _{-\infty }^\xi \ell (x)dG(x) = \ell (0)I(\xi \ge 0), \quad \int \ell ^2(x) dG(x)=\ell ^2(0), \\&\text{ Var }\Big (\int \limits _{-\infty }^\xi \ell (x)dG(x)\Big ) = \ell ^2(0)/4 . \end{aligned}$$

Consequently, here the asymptotic covariance matrix also depends on \(\theta \) via

$$\begin{aligned} \tau ^2_0= & {} 1\big / 4 \ell ^2(0)= 1\Big /4 \Big (\int f( v' \theta ) dH(v)\Big )^2. \end{aligned}$$

In the case of no ME, \({\varGamma }^{-1}\tau _0^2\) equals the asymptotic covariance matrix of the LAD estimator. Unlike in the case of the previous estimator, here the conditions needed for f are a bit more stringent than those required for the asymptotic normality of the LAD estimator when there is no ME.
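
For instance, in the Gaussian illustration used above for \(\tau ^2_I\) (again an assumed special case), \(\ell (0)=1/(s\sqrt{2\pi })\) with \(s^2=\theta '{\varSigma }_\eta \theta +\sigma _\varepsilon ^2\), so that \(\tau ^2_0=\pi s^2/2\), which reduces to the familiar factor \(\pi \sigma _\varepsilon ^2/2\) of the LAD estimator when there is no ME.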

Analog of \(\hat{\theta }_R\). Here we shall describe the analogs of the class of estimators \(\hat{\theta }_R\) based on the residual ranks obtained from the model (5). These estimators do not need the errors \(\xi _i\)’s to be symmetrically distributed. Let \(\widetilde{R}_{i\vartheta }\) denote the rank of \(Y_i-Z_i'\vartheta \) among \(Y_j-Z_j'\vartheta , \, j=1,\ldots ,n\), \(\bar{Z}:= n^{-1}\sum _{i=1}^nZ_i\), \(Z_{ic}:=Z_i-\bar{Z}\), \(1\le i\le n\) and define

$$\begin{aligned} \widetilde{\mathcal{V}}(u,\vartheta )&:= n^{-1/2}\sum _{i=1}^nZ_{ic} I(\widetilde{R}_{i\vartheta }\le nu), \quad \widetilde{K}(\vartheta ):= \int \nolimits _0^1 \big \Vert \widetilde{\mathcal{V}}(u,\vartheta )\Vert ^2 d{\varPsi }(u), \\ \widetilde{\theta }_R&:= \text {argmin}_{\vartheta \in {\mathbb R}^p} \widetilde{K}(\vartheta ). \end{aligned}$$

Use the facts \(\sum _{i=1}^nZ_{ic}=0\), \({\varPsi }(\max (a,b))=\max \{{\varPsi }(a),{\varPsi }(b)\}\) and \(\max (a,b)\) \(=2^{-1}[a+b+|a-b|]\), for any \(a,b\in {\mathbb R}\), to obtain the computational formula

$$\begin{aligned} \widetilde{K}(t)=-\frac{1}{2}\sum _{i=1}^n\sum _{j=1}^n Z_{ic}'Z_{jc}\Big |{\varPsi }\Big (\frac{\widetilde{R}_{it}}{n}\,-\Big ) - {\varPsi }\Big (\frac{\widetilde{R}_{jt}}{n}\,-\Big )\Big |. \end{aligned}$$
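
This formula makes \(\widetilde{\theta }_R\) easy to compute. The following minimal numerical sketch (with an illustrative simulated Berkson design and \({\varPsi }(s)\equiv s\), so that \({\varPsi }(\widetilde{R}_{it}/n-)=\widetilde{R}_{it}/n\)) evaluates \(\widetilde{K}(t)\) directly from the formula and minimizes it by a derivative-free search started at the naive least squares fit; since \(\widetilde{K}\) is a step function of t, a grid or stochastic search is a robust alternative.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import rankdata

rng = np.random.default_rng(1)
n, p = 200, 2
Z = rng.normal(size=(n, p))
X = Z + 0.3 * rng.normal(size=(n, p))          # Berkson ME, X = Z + eta
theta_true = np.array([1.0, -0.5])
Y = X @ theta_true + rng.normal(size=n)

Zc = Z - Z.mean(axis=0)                        # centered surrogates Z_ic
A = Zc @ Zc.T                                  # inner products Z_ic' Z_jc

def K(t):
    R = rankdata(Y - Z @ t)                    # residual ranks R_it
    # K(t) = -(1/2) sum_ij Z_ic'Z_jc |R_it/n - R_jt/n| for Psi(s) = s
    return -0.5 * (A * np.abs(R[:, None] - R[None, :]) / n).sum()

theta0 = np.linalg.lstsq(Z, Y, rcond=None)[0]  # naive least squares start
fit = minimize(K, x0=theta0, method="Nelder-Mead")
print("rank-based m.d. estimate:", fit.x)
```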

The following result can be deduced from Koul [15]. Suppose \(E\Vert Z\Vert ^2<\infty \), \(\widetilde{\varGamma }:=E(Z-EZ)(Z-EZ)'\) is positive definite, density \(\ell \) of the r.v.  \(\xi \) is uniformly continuous and bounded and \(\int \limits _0^1 \ell ^2(L^{-1}(s)) d{\varPsi }(s)>0\). Then \(n^{-1/2}\max _{1\le i \le n}\Vert Z_i\Vert \rightarrow _p 0,\) \( n^{-1}\sum _{i=1}^n(Z_i-\bar{Z})(Z_i-\bar{Z})' \rightarrow _p \widetilde{\varGamma }\) and

$$\begin{aligned}&n^{1/2}\big (\widetilde{\theta }_R -\theta ) \rightarrow _D N\big (0, \widetilde{\tau }_{\varPsi }^2 \widetilde{{\varGamma }}^{-1}\big ), \quad \widetilde{\tau }_{\varPsi }^2:= \frac{\text{ Var }\big (\int \nolimits _0^{L(\xi )} \ell (L^{-1}(s)) d{\varPsi }(s)\big )}{\big (\int \nolimits _0^1 \ell ^2(L^{-1}(s)) d{\varPsi }(s)\big )^2}. \end{aligned}$$

Density f of F being uniformly continuous and bounded implies the same for \(\ell (x)=\int f(x-v'\theta ) dH(v)\). It is also worth pointing out that the assumptions on F, H and L needed here are relatively less stringent than those needed for the asymptotic normality of \(\widetilde{\theta }\).

Of special interest is the case \({\varPsi }(s)\equiv s\). Let \(\widetilde{\tau }^2_I\) denote the corresponding \(\widetilde{\tau }^2_{\varPsi }\). Then by the change of variable formula,

$$\begin{aligned} \widetilde{\tau }^2_I= & {} \frac{\text{ Var }\big (\int \nolimits _0^{L(\xi )} \ell (L^{-1}(s)) ds\big )}{\big (\int \nolimits _0^1 \ell ^2(L^{-1}(s)) ds\big )^2} = \frac{\text{ Var }\big (\int \nolimits _0^\xi \ell ^2(x) dx\big )}{\big (\int \ell ^3(x) dx\big )^2}\\= & {} \frac{\int \int \big [L(x\wedge y) -L(x)L(y)\big ] \ell ^2(x) \ell ^2(y) dx dy}{\big (\int \ell ^3(x) dx\big )^2}. \end{aligned}$$

An analog of \(\hat{\theta }_c\) here is \(\widetilde{\theta }_c := \text {argmin}_{\vartheta \in {\mathbb R}^p} \widetilde{M}_c(\vartheta )\), where

$$\begin{aligned} \widetilde{V}_c(x,\vartheta ):= & {} n^{-1/2}\sum _{i=1}^nZ_{ic} I(Y_i-Z_i'\vartheta \le x), \qquad \widetilde{M}_c(\vartheta ):= \int \big \Vert \widetilde{V}_c(x,\vartheta )\big \Vert ^2 dx. \end{aligned}$$

Arguing as above one obtains that \( n^{1/2}\big (\widetilde{\theta }_c - \theta \big ) \rightarrow _D N\big (0, \tau ^2_I\widetilde{\varGamma }^{-1}\big ). \)

4 Nonlinear Regression with Berkson ME

In this section we shall investigate the analogs of the above m.d.  estimators in nonlinear regression models with Berkson ME.

Let \(q\ge 1, p\ge 1\) be known positive integers, \({\varTheta }\subseteq {\mathbb R}^q\) be a subset of the q-dimensional Euclidean space \({\mathbb R}^q\) and consider the model where the unobservable p-dimensional covariate X, its observable surrogate Z and the response variable Y obey the relations

$$\begin{aligned} Y=m_\theta (X)+\varepsilon , \qquad X=Z+\eta ,\\\nonumber \end{aligned}$$
(12)

for some \(\theta \in {\varTheta }\). Here \(m_\vartheta (x)\) is a known parametric function, nonlinear in x, from \({\varTheta }\times {\mathbb R}^p\) to \({\mathbb R}\) with \(E|m_\vartheta (X)|<\infty \), for all \(\vartheta \in {\varTheta }\). The r.v.’s \(\varepsilon , Z, \eta \) are assumed to be mutually independent, \(E\varepsilon =0\) and \(E\eta =0\). Unlike in the linear case, here we need to assume that the d.f.  H of \(\eta \) is known. See Sect. 5 for the unknown H case.

Fix a \(\theta \) for which (12) holds. Let \(\nu _\vartheta (z):= E(m_\vartheta (X)|Z=z)\), \(\vartheta \in {\mathbb R}^q, z\in {\mathbb R}^p\). Under (12), \(E(Y|Z=z)\equiv \nu _\theta (z)\). Moreover, because H is known,

$$\begin{aligned} \nu _\vartheta (z)=\int m_\vartheta (z+s) dH(s) \end{aligned}$$

is a known parametric regression function. Thus, under (12), we have the regression model

$$\begin{aligned} Y=\nu _\theta (Z) + \zeta , \qquad E(\zeta |Z=z)=0, \qquad z\in {\mathbb R}^p. \end{aligned}$$

Unlike in the linear case, the error \(\zeta \) is no longer independent of Z in general.
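
When H is known, \(\nu _\vartheta (z)\) can be evaluated numerically even if the integral has no closed form, for instance by Monte Carlo averaging over draws from H. The following minimal sketch (assuming, purely for illustration, \(p=q=1\), \(m_\vartheta (x)=\vartheta x^2\) and \(H=N(0,\sigma _\eta ^2)\)) compares the Monte Carlo value with the exact one \(\vartheta (z^2+\sigma _\eta ^2)\).

```python
import numpy as np

rng = np.random.default_rng(2)
sigma_eta = 0.3
eta_draws = sigma_eta * rng.normal(size=100_000)   # draws from the known H

def m(theta, x):                                   # illustrative regression function
    return theta * x ** 2

def nu(theta, z):                                  # nu_theta(z) = E m_theta(z + eta)
    return m(theta, z + eta_draws).mean()

print(nu(1.5, 2.0), 1.5 * (2.0 ** 2 + sigma_eta ** 2))   # Monte Carlo vs. exact
```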

To proceed further we assume there is a vector of q functions \(\dot{m}_\vartheta (x)\) such that, with \(\dot{\nu }_\vartheta (z):= \int \dot{m}_\vartheta (z+s) dH(s),\) for every \(0<b<\infty \),

$$\begin{aligned}&\max _{1\le i\le n,n^{1/2}\Vert \vartheta -\theta \Vert \le b} n^{1/2}\big |\nu _\vartheta (Z_i)-\nu _\theta (Z_i)-(\vartheta -\theta )'\dot{\nu }_\theta (Z_i)\big |=o_p(1), \end{aligned}$$
(13)
$$\begin{aligned}&E\Vert \dot{\nu }_\theta (Z)\Vert ^2<\infty . \end{aligned}$$
(14)

Let

$$\begin{aligned} L_z(x):= P(\zeta \le x|Z=z), \qquad x\in {\mathbb R},\, z\in {\mathbb R}^p. \end{aligned}$$

Assume the following. For every \(z\in {\mathbb R}^p\),

$$\begin{aligned}&L_z(\cdot )\text { is continuous and }L_z(x)=1-L_z(-x), \,\, \forall \, x\in {\mathbb R}. \end{aligned}$$
(15)

Let G be as before and define

$$\begin{aligned} U(x,\vartheta )&:= n^{-1/2}\sum _{i=1}^n\dot{\nu }_\vartheta (Z_i) \big \{ I(Y_i - \nu _\vartheta (Z_i) \le x) - I(-Y_i + \nu _\vartheta (Z_i) < x )\big \}\\ D(\vartheta )&:= \int \big \Vert U(x,\vartheta )\big \Vert ^2 dG(x), \qquad \widehat{\theta }:= \text {argmin}_\vartheta D(\vartheta ). \end{aligned}$$

In the case \( q=p\) and \( m_\theta (x)=x'\theta \), \(\widehat{\theta }\) agrees with \(\widetilde{\theta }\). Thus the class of estimators \(\widehat{\theta }\), one for each G, is an extension of the class of estimators \(\widetilde{\theta }\) from the linear case to the above nonlinear case.

Next, consider the extension of \(\hat{\theta }_R\) to the above nonlinear model (12). Let \(S_{i\vartheta }\) denote the rank of \(Y_i-\nu _\vartheta (Z_i)\) among \(Y_j-\nu _\vartheta (Z_j)\), \(j=1, \ldots , n\) and define

$$\begin{aligned} \mathcal{U}_n(u, \vartheta ):= & {} \frac{1}{\sqrt{n}} \sum _{i=1}^n\dot{\nu }_\vartheta (Z_i) \big \{ I(S_{i\vartheta }\le nu) -u\big \}, \\ \mathcal{K}(\vartheta ):= & {} \int \Vert \mathcal{U}_n(u, \vartheta )\Vert ^2 d{\varPsi }(u), \quad \widehat{\theta }_R := \text {argmin}_\vartheta \mathcal{K}(\vartheta ). \end{aligned}$$

The estimator \(\widehat{\theta }_R\) gives an analog of \(\hat{\theta }_R\) in the present set up.

Our goal here is to prove the asymptotic normality of \(\widehat{\theta }, \, \widehat{\theta }_R\). This will be done by following the general method of Sect. 5.4 of Koul [16]. This method requires two steps. In the first step we need to show that the defining dispersions \(D(\vartheta )\) and \(\mathcal{K}(\vartheta )\) are AULQ (asymptotically uniformly locally quadratic) in \(\vartheta -\theta \) for \(\vartheta \in \mathcal{N}_n(b):=\{\vartheta \in {\varTheta }, n^{1/2}\Vert \vartheta -\theta \Vert \le b\}\), for every \(0<b<\infty \). The second step requires showing that \(n^{1/2}\Vert \widehat{\theta }- \theta \Vert =O_p(1)=n^{1/2}\Vert \widehat{\theta }_R - \theta \Vert . \)

4.1 Asymptotic Distribution of \(\widehat{\theta }\)

In this subsection we shall derive the asymptotic normality of \(\widehat{\theta }\). To state the needed assumptions for achieving this goal we need some more notation. Let \(\nu _{nt}(z):= \nu _{\theta +n^{-1/2}t}(z),\, \xi _{it}:=\nu _{nt}(Z_i)-\nu _\theta (Z_i),\, 1\le i\le n,\) \( \dot{\nu }_{nt}(z):= \dot{\nu }_{\theta +n^{-1/2}t}(z),\) and \(\dot{\nu }_{ntj}(z)\) denote the jth coordinate of \(\dot{\nu }_{nt}(z)\), \(1\le j\le q, t\in {\mathbb R}^q\). For any real number a, let \(a^\pm =\max (0,\pm a)\) so that \(a=a^+-a^-\). Also, let \( \beta _i(x):= I(\zeta _i\le x) - L_{Z_i}(x) \) and \(\alpha _i(x,t):= I(\zeta _i\le x+ \xi _{it}) - I(\zeta _i\le x) - L_{Z_i}(x+\xi _{it})+ L_{Z_i}(x) . \)

Because \(dG(x)\equiv -dG(-x)\) and \(U(x,\vartheta )\equiv U(-x, \vartheta ),\) we have

$$\begin{aligned} D(\vartheta )\equiv 2 \int \limits _0^\infty \big \Vert U(x,\vartheta )\big \Vert ^2 dG(x)\equiv 2\widetilde{D}(\vartheta ), \qquad \text {say}. \end{aligned}$$
(16)

We are now ready to state our assumptions.

$$\begin{aligned}&\int \limits _0^\infty E\Big (\big \Vert \dot{\nu }_\theta (Z)\Vert ^2\big (1-L_Z(x)\big ) \Big ) dG(x)<\infty . \end{aligned}$$
(17)
$$\begin{aligned} \nonumber&\int \limits _0^\infty E\Big (\Vert \dot{\nu }_{nt}(Z) - \dot{\nu }_\theta (Z)\Vert ^2 L_Z(x)(1-L_Z(x))\Big ) dG(x) \rightarrow 0,\,\, \forall \,t\in {\mathbb R}^q. \\&\sup _{\Vert t\Vert \le b, 1\le i\le n} \big \Vert \dot{\nu }_{nt}(Z_i)-\dot{\nu }_\theta (Z_i)\big \Vert \rightarrow _p 0. \end{aligned}$$
(18)
$$\begin{aligned}&\text {Density }\ell _z\text { of }L_z\text { exists for all }z\in {\mathbb R}^p\text { such that } \end{aligned}$$
(19)
$$\begin{aligned}&0<\int \ell _z(x)dG(x)<\infty ,\,\forall \, z\in {\mathbb R}^p,\, \,\,0<\int E(\ell ^2_Z(x))dG(x)<\infty , \nonumber \\&\int E\big (\Vert \dot{\nu }_\theta (Z)\Vert ^2 \ell _Z^j(x)\big ) dG(x)<\infty , \,\,j=1,2. \nonumber \\&\lim _{u\rightarrow 0}\int \limits _{-\infty }^\infty \big (\ell _{z}(x+u) -\ell _{z}(x)\big )^j dG(x) =0,\,\, j=1,2, \forall \, z\in {\mathbb R}^p. \end{aligned}$$
(20)
$$\begin{aligned}&E\Big (\int \limits _{-|\xi _t(Z)|}^{|\xi _t(Z)|} \Vert \dot{\nu }_{nt}(Z)\Vert ^2 \int \limits _{-\infty }^\infty \ell _Z(x+u) dG(x) du \Big )\rightarrow 0, \, \forall \,t\in {\mathbb R}^q, \end{aligned}$$
(21)
$$\begin{aligned}&\text {where }\xi _t(z):= \nu _{nt}(z)-\nu _\theta (z). \nonumber \\&\text {With }{\varGamma }_\theta (x):= E\big (\dot{\nu }_\theta (Z) \dot{\nu }_\theta (Z)'\ell _Z(x)\big ),\text { the matrix} \\&{\varOmega }_\theta :=\int \limits _{-\infty }^\infty {\varGamma }_\theta (x){\varGamma }_\theta (x)' dG(x)\text { is positive definite}. \nonumber \end{aligned}$$
(22)

For every \(\epsilon >0\) there is a \(\delta >0\) and \(N_\epsilon <\infty \) such that \(\forall \,\Vert s\Vert \le b, n>N_\epsilon \),

$$\begin{aligned}&P\Big (\sup _{\Vert t-s\Vert<\delta } \Big ( n^{-1/2}\int \sum _{i=1}^n\big [\dot{\nu }_{ntj}^\pm (Z_i)- \dot{\nu }_{nsj}^\pm (Z_i) \big ] \alpha _i(x,t) dG(x) \Big )^2>\epsilon \Big ) <\epsilon , \end{aligned}$$
(23)
$$\begin{aligned}&P\Big (\sup _{\Vert t-s\Vert<\delta } n^{-1}\int \limits _0^\infty \Big \Vert \sum _{i=1}^n\{\dot{\nu }_{nt}(Z_i) - \dot{\nu }_{ns}(Z_i)\} \beta _i(x) \Big \Vert ^2 dG(x)>\epsilon \Big ) <\epsilon .\\\nonumber \end{aligned}$$
(24)

For every \(\epsilon>0, \alpha >0\) there exists \(N\equiv N_{\alpha ,\varepsilon }\) and \(b\equiv b_{\alpha ,\epsilon }\) such that

$$\begin{aligned} P\Big (\inf _{\Vert t\Vert>b}D(\theta + n^{-1/2}t)\ge \alpha \Big )\ge 1-\epsilon , \quad \forall \, n>N.\\\nonumber \end{aligned}$$
(25)

From now onwards we shall write \(\nu \) and \(\dot{\nu }\) for \(\nu _\theta \) and \(\dot{\nu }_\theta \), respectively.

Remark 2

We shall now discuss the above assumptions when \(m_\vartheta (x)=\vartheta 'h(x),\) where \(h=(h_1,\ldots ,h_q)'\) is a vector of q functions on \({\mathbb R}^p\) with \(E\Vert h(X)\Vert ^2<\infty \), first for general G and then for some special cases of G. An example of this is the polynomial regression model with Berkson ME, where \(p=1, h_j(x)=x^j, j=1,\ldots ,q\). Let \(\beta (z):= E(h(X)|Z=z)\). Then \(\nu _\vartheta (z)=\vartheta '\beta (z)\) and \(\dot{\nu }_\vartheta (z)\equiv \beta (z)\), a constant in \(\vartheta \). Therefore (13), (14), (18), (23) and (24) are all vacuously satisfied. The condition (25) also holds here, in a similar way as in the linear regression model, cf., Koul [16, Proof of Lemma 5.5.4, pp. 183–185]. Direct calculations show that (26)–(29) below imply the remaining assumptions (17), (19), (21) and (22), respectively.

$$\begin{aligned}&\int \limits _0^\infty E\Big (\big \Vert \beta (Z)\Vert ^2\big (1-L_Z(x)\big ) \Big ) dG(x)<\infty . \end{aligned}$$
(26)
$$\begin{aligned}&\forall \, z\in {\mathbb R}^p\text {, density }\ell _z\text { of }L_z\text { exists and satisfies} \end{aligned}$$
(27)
$$\begin{aligned}&0<\int \ell _z^j(x)dG(x)<\infty ,\,\,j=1,2, \quad 0<\int E(\ell ^2_Z(x))dG(x)<\infty , \nonumber \\&\int E\big (\Vert \beta (Z)\Vert ^2 \ell _Z^j(x)\big ) dG(x)<\infty , \,\,j=1,2, \quad \text {and (20) holds.} \nonumber \\&E\Big (\int \limits _{-n^{-1/2}b\Vert \beta (Z)\Vert }^{n^{-1/2}b\Vert \beta (Z)\Vert } \Vert \beta (Z)\Vert ^2 \int \limits _{-\infty }^\infty \ell _Z(x+u) dG(x) du \Big )\rightarrow 0, \end{aligned}$$
(28)
$$\begin{aligned}&\text {for every} \,\,0<b<\infty . \nonumber \\&\text {With }\mathcal{B}(x):= E\big (\beta (Z) \beta (Z)'\ell _Z(x)\big ),\text { the matrix} \\&\int \limits _{-\infty }^\infty \mathcal{B}(x)\mathcal{B}(x)' dG(x)\text { is positive definite.} \nonumber \end{aligned}$$
(29)

Consider further the case \(G(x)\equiv x\). Let \(\sigma :=(E\varepsilon ^2)^{1/2}\). Assume

$$\begin{aligned} \mathrm{(a)} \,\,\, E\Vert h(X)\Vert ^3<\infty , \,\,\, E\zeta ^2<\infty . \quad \mathrm{(b)} \,\,\, C:=\sup _{x\in {\mathbb R},z\in {\mathbb R}^p} \ell _z(x)<\infty . \end{aligned}$$
(30)

Then \(E\Vert \beta (Z)\Vert ^j\le E\Vert h(X)\Vert ^j<\infty , \, j=1,2,3\). Let \(\gamma (z):= 2\Vert \theta \Vert \Vert \beta (z)\Vert +\sigma \). Then

$$\begin{aligned} E\big (|\zeta | \big | Z=z\big )= & {} E\big (|Y-\theta '\beta (Z)|\big | Z=z\big ) \\= & {} E\big (|\theta 'h(X)+\varepsilon -\theta '\beta (Z)|\big | Z=z\big )\le \gamma (z), \quad \forall \, z\in {\mathbb R}^p. \nonumber \end{aligned}$$
(31)

Hence

$$\begin{aligned}&\int \limits _0^\infty E\Big (\big \Vert \beta (Z)\Vert ^2\big (1-L_Z(x)\big ) \Big ) dx \\&\, \le E\Big (\Vert \beta (Z)\Vert ^2 E\big (|\zeta |\big |Z\big )\Big ) \le E\Big (\Vert \beta (Z)\Vert ^2\gamma (Z)\Big ) \nonumber \\&\, \le 2 \Vert \theta \Vert E\big ( \Vert \beta (Z)\Vert ^3\big ) +\sigma E\big ( \Vert \beta (Z)\Vert ^2\big )<\infty , \nonumber \end{aligned}$$

thereby showing that (26) is satisfied. The assumption (30)(b) and \(\ell _z(x)\) being a density in x for each z and Theorem 9.5 of Rudin [18] readily imply (27) here. The left hand side of (28) equals \(2n^{-1/2}bE\big (\Vert \beta (Z)\Vert ^3\big )\rightarrow 0\), by (30)(a).

Next, consider the case \(G(x)=\delta _0(x)\), the measure degenerate at zero. Assume

$$\begin{aligned}&\lim _{u\rightarrow 0}\ell _z(u)=\ell _z(0)>0, \,\, \forall \, z\in {\mathbb R}^p, \quad 0<E\ell _Z^2(0)<\infty , \\&E\big (\Vert \beta (Z)\Vert ^2\ell _Z^j(0)\big )<\infty , \quad j=1,2. \nonumber \end{aligned}$$
(32)

Then the left hand side of (26) equals \((1/2)E\Vert \beta (Z)\Vert ^2<E\Vert h(X)\Vert ^2<\infty \). Condition (27) is trivially satisfied and the left hand side of (28) equals

$$\begin{aligned} E\Big (\Vert \beta (Z)\Vert ^2 \big [L_Z(n^{-1/2}b\Vert \beta (Z)\Vert )-L_Z(-n^{-1/2}b\Vert \beta (Z)\Vert )\big ]\Big )&\rightarrow 0, \end{aligned}$$

by the DCT and the continuity of \(L_z(\cdot )\), for each z.

To summarize, in the case \(m_\vartheta (x)=\vartheta 'h(x)\) and \(G(x)\equiv x\), assumptions (30)(a), (b) and \(\int \mathcal{B}(x)\mathcal{B}(x)' dx\) being positive definite imply all of the above assumptions (13), (14) and (17)–(25). Similarly, in the case \(m_\vartheta (x)=\vartheta 'h(x)\) and \(G(x)\equiv \delta _0(x)\), \(E\Vert h(X)\Vert ^2<\infty \), (32) and \( \mathcal{B}(0)\mathcal{B}(0)'\) being positive definite imply all these conditions.

Remark 3

Because of the importance of the estimators \(\widehat{\theta }\) when \(G(x)=x\), and \(G(x)=\delta _0(x)\), it is of interest to give some simple sufficient conditions for a general \(m_\vartheta \) that imply the given assumptions for these two estimators.

Suppose G satisfies \(dG(x)\equiv g(x) dx\), where \(g_\infty :=\sup _{x\in {\mathbb R}}g(x)<\infty \). Note that \(G(x)\equiv x\) corresponds to the case \(g(x)\equiv 1\). Consider the following assumptions.

$$\begin{aligned}&\mathrm{(a)} \,\,\, E \big \Vert \dot{m}_\theta (X)\big \Vert ^4<\infty , \end{aligned}$$
(33)
$$\begin{aligned}&\mathrm{(b)}\,\,\,E\big \Vert \dot{m}_{\theta +n^{-1/2}t}(X) - \dot{m}_\theta (X)\big \Vert ^2 \rightarrow 0, \,\, \forall \,t\in {\mathbb R}^q. \nonumber \\&\text {Density }\ell _z\text { of }L_z\text { exists for all }z\in {\mathbb R}^p\text { and satisfies } \\&0<\int \ell _z^2(x)dx<\infty , \, \forall \, z\in {\mathbb R}^p,\quad 0< \int E(\ell ^2_Z(x))dx<\infty , \nonumber \end{aligned}$$
(34)
$$\begin{aligned}&0<\int E\big (\Vert \dot{\nu }(Z)\Vert ^2 \ell _Z^2(x)\big ) dx<\infty . \nonumber \\&\,\,\,E\big ( \Vert \dot{\nu }_{nt}(Z)\Vert ^2 |\nu _{nt}(Z)-\nu (Z)|\big ) \rightarrow 0, \quad \forall \,t\in {\mathbb R}^q. \\\nonumber \end{aligned}$$
(35)

Because \(\big \Vert \dot{\nu }(Z)\big \Vert ^j \le E\big ( \Vert \dot{m}_\theta (X) \big \Vert ^j\big | Z\big )\), \(E \big \Vert \dot{\nu }(Z)\big \Vert ^j \le E \big \Vert \dot{m}_\theta (X)\big \Vert ^j <\infty , \, j=1,2,3,4\), by (33)(a). Similarly, for every \(t\in {\mathbb R}^q\),

$$\begin{aligned} E\big \Vert \dot{\nu }_{nt}(Z) - \dot{\nu }(Z)\big \Vert ^2 \le E \big \Vert \dot{m}_{nt}(X)-\dot{m}_\theta (X)\big \Vert ^2\rightarrow 0, \qquad \text {by (34)(b)}. \end{aligned}$$
(36)

Next, similar to (31), \(E\big (|\zeta | \big | Z\big )=E\big (|Y-\nu (Z)|\big | Z\big ) \le 2|\nu (Z)|+\sigma \) implies that the left hand side of (17) is bounded from above by

$$\begin{aligned} g_\infty E\Big (\Vert \dot{\nu }(Z)\Vert ^2 E\big (|\zeta |\big |Z\big )\Big )\le & {} g_\infty E\Big (\Vert \dot{\nu }(Z)\Vert ^2 \big (2|\nu (Z)|+\sigma \big )\Big )\\\le & {} g_\infty \big [2E^{1/2}(\Vert \dot{\nu }(Z)\Vert ^4) E^{1/2}(m_\theta ^2(X)) + \sigma E \Vert \dot{\nu }(Z)\Vert ^2\big ] \\< & {} \infty , \end{aligned}$$

by (33)(a), thereby verifying (17) here. Similarly, with C denoting the above upper bound, for every \(t\in {\mathbb R}^q\), the left hand side of (18) is bounded from above by \( C E\Big (\Vert \dot{\nu }_{nt}(Z) - \dot{\nu }_\theta (Z)\Vert ^2\Big ) \rightarrow 0, \) by (36). The left hand side of (21) is bounded from above by \( 2 g_\infty E\big ( \Vert \dot{\nu }_{nt}(Z)\Vert ^2 |\nu _{nt}(Z)-\nu (Z)|\big ) \rightarrow 0,\) by (35).

In other words, in the case G has bounded Lebesgue density, conditions (33)–(35) imply assumptions (14), (17), (18), (19), (20), and (21). Not much simplification occurs in the remaining assumptions (13) and (22)–(25). See Remark 2 for some special cases.

Next consider the case when \(G(x)=\delta _0(x)\) and the following assumptions.

$$\begin{aligned}&\sup _{x\in {\mathbb R}}\ell _z(x)<\infty ,\,\, 0<\lim _{u\rightarrow 0} \ell _z(u)=\ell _z(0)<\infty , \,\,\forall \, z\in {\mathbb R}^p. \end{aligned}$$
(37)
$$\begin{aligned}&{\varGamma }_\theta (0)\text { is positive definite.} \end{aligned}$$
(38)

In this case (33), (35), (37) and (38) together imply the assumptions (14), (17)–(22). Not much simplification occurs in the remaining three assumptions (23)–(25), except in some special cases as in Remark 2.

We now resume the discussion about the asymptotic normality of \(\widehat{\theta }\). First, we show that \(E(D(\theta ))<\infty \), so that by the Markov inequality, \(D(\theta )\) is bounded in probability. To see this, by (15), \(EU(x,\theta )\equiv 0\) and, for \(x\ge 0\),

$$\begin{aligned} E\Vert U(x, \theta )\Vert ^2&=E\Big (\big \Vert \dot{\nu }(Z)\Vert \big \{I(\zeta \le x) - I(\zeta > - x)\big \}\Big )^2\\&= 2E\Big (\big \Vert \dot{\nu }(Z)\Vert ^2\big (1-L_Z(x)\big ) \Big ). \end{aligned}$$

By the Fubini Theorem, (16) and (17),

$$\begin{aligned} E(D(\theta ))= 2E (\widetilde{D}(\theta ))= 4\int \limits _0^\infty E\Big (\big \Vert \dot{\nu }(Z)\Vert ^2\big (1-L_Z(x)\big ) \Big ) dG(x)<\infty . \end{aligned}$$
(39)

To state the AULQ result for D, we need some more notation. Let

$$\begin{aligned}&W(x,0) := n^{-1/2}\sum _{i=1}^n\dot{\nu }(Z_i) \big \{I\big (\zeta _i \le x) - L_{Z_i}(x)\big \}, \\&T_n:=\int \limits _{-\infty }^\infty {\varGamma }_\theta (x) \big \{ W(x,0)+W(-x,0)\big \} dG(x), \quad \widehat{t}= -{\varOmega }_\theta ^{-1} T_n/2, \nonumber \end{aligned}$$
(40)

where \({\varGamma }_\theta (x)\) and \({\varOmega }_\theta \) are as in (22). We are ready to state the following lemma.

Lemma 2

Suppose the above set up and assumptions (17)– (24) hold. Then for every \(b<\infty \),

$$\begin{aligned} \sup _{\Vert t\Vert \le b}\big |D(\theta +n^{-1/2}t)-D(\theta ) - 4 T_n' t - 4 t' {\varOmega }_\theta t\big | \rightarrow _p 0.\\\nonumber \end{aligned}$$
(41)

If in addition (25) holds, then, with \({\varSigma }_\theta \) given at (45) below,

$$\begin{aligned}&\mathrm{(a)}\quad \Vert n^{1/2}\big (\widehat{\theta }- \theta ) - \widehat{t}\, \Vert \rightarrow _p 0. \\&\mathrm{(b)} \quad n^{1/2}\big (\widehat{\theta }- \theta ) \rightarrow _D N\big (0,4^{-1}{\varOmega }_\theta ^{-1}{\varSigma }_\theta {\varOmega }_\theta ^{-1}\big ). \nonumber \end{aligned}$$
(42)

Proof

The proof of (41) appears in Sect. 6. The proof of the claim (42)(a), which uses (25), (39) and (41), is similar to that of Theorem 5.4.1 of Koul [16], where (25) and (39) are used to show that \(n^{1/2}\Vert \widehat{\theta }-\theta \Vert =O_p(1)\).

Define, for \(y\in {\mathbb R},\,u\in {\mathbb R}^p,\)

$$\begin{aligned} \psi _u(y):= & {} \int \limits _{-\infty }^y \ell _u(x) dG(x),\quad \varphi _u(y):= \psi _u(-y)-\psi _u(y). \end{aligned}$$
(43)

By (19), \(0<\psi _u(y)\le \psi _u(\infty )=\int \limits _{-\infty }^\infty \ell _u(x)dG(x)<\infty ,\) for all \(u\in {\mathbb R}^p\). Thus for each u, \(\psi _u(y)\) is an increasing continuous bounded function of y and \(\psi _u(-y) \equiv \psi _u(\infty )-\psi _u(y)\), and \(\varphi _u(y)= \psi _u(\infty )-2\psi _u(y)\), for all \(y\in {\mathbb R}\).

By (15), \(E(\varphi _u(\zeta ) | Z=z)=0\), for all \(u, z\in {\mathbb R}^p\). Let

$$\begin{aligned} C_z(u,v):= & {} \text {Cov}\big [\big (\varphi _u(\zeta ), \varphi _v(\zeta )\big ) \big | Z=z\big ]=4\text {Cov}\big [\big (\psi _u(\zeta ), \psi _v(\zeta )\big ) \big | Z=z\big ],\\ \mathcal{K}(u,v):= & {} E\big (\dot{\nu }(Z)\dot{\nu }(Z)'C_Z(u,v)\big ), \qquad u,v\in {\mathbb R}^p. \end{aligned}$$

Next let \(\mu (z):=\dot{\nu }(z) \dot{\nu }(z)'\), Q denote the d.f. of Z and rewrite \({\varGamma }_\theta (x)=E\big (\dot{\nu }_\theta (Z) \dot{\nu }_\theta (Z)'\ell _Z(x)\big )\) \(=\int \mu (z) \ell _z(x) dQ(z)\). By the Fubini Theorem,

$$\begin{aligned} T_n:= & {} \int \limits _{-\infty }^\infty {\varGamma }_\theta (x) \big \{ W(x,0)+W(-x,0)\big \} dG(x) \\= & {} \int \int \limits _{-\infty }^\infty \mu (z)\big \{ W(x,0)+W(-x,0)\big \} \ell _z(x) dG(x) dQ(z) \nonumber \end{aligned}$$
(44)
$$\begin{aligned}= & {} \frac{1}{\sqrt{n}} \sum _{i=1}^n\int \mu (z) \dot{\nu }(Z_i) \varphi _z(\zeta _i) dQ(z). \end{aligned}$$

Clearly, \(ET_n=0\) and by the Fubini Theorem, the covariance matrix of \(T_n\) is

$$\begin{aligned}&{\varSigma }_\theta := ET_n T_n' \\&\,\, = E\Big \{ \Big (\int \mu (z) \dot{\nu }(Z) \varphi _z(\zeta ) dQ(z) \Big ) \Big (\int \mu (v) \dot{\nu }(Z) \varphi _v(\zeta ) dQ(v)\Big )'\Big \} \nonumber \\&\,\, = \int \int \mu (z) \mathcal{K}(z,v) \mu (v)' dQ(z) dQ(v). \nonumber \end{aligned}$$
(45)

Thus \(T_n\) is a normalized sum of independent centered random vectors with finite covariance matrix. By the classical CLT, \(T_n\rightarrow _D N(0,{\varSigma }_\theta )\). Hence, the minimizer \(\widehat{t}\) of the approximating quadratic form \(D(\theta ) + 4 T_n' t+ 4 t' {\varOmega }_\theta t\) with respect to t satisfies \( \widehat{t}= -{\varOmega }_\theta ^{-1} T_n/2 \rightarrow _D N\big (0,4^{-1}{\varOmega }_\theta ^{-1}{\varSigma }_\theta {\varOmega }_\theta ^{-1}\big ). \) The claim (42)(b) now follows from this result and (42)(a).  \(\Box \)

4.2 Asymptotic Distribution of \(\widehat{\theta }_R\)

In this subsection we shall establish the asymptotic normality of \(\widehat{\theta }_R\). For this we need the following assumptions, where \(\mathcal{U}(b):=\{t\in {\mathbb R}^q; \Vert t\Vert \le b\}\), and \(0<b<\infty \).

$$\begin{aligned}&\ell _z\text { is uniformly continuous and bounded for every }z\in {\mathbb R}^p. \end{aligned}$$
(46)
$$\begin{aligned}&n^{-1}\sum _{i=1}^nE\Vert \dot{\nu }_{nt}(Z_i) - \dot{\nu }(Z_i)\Vert ^2\rightarrow 0, \quad \forall \, t\in \mathcal{U}(b). \end{aligned}$$
(47)
$$\begin{aligned}&n^{-1/2}\sum _{i=1}^n\Vert \dot{\nu }_{nt}(Z_i) - \dot{\nu }(Z_i)\Vert =O_p(1), \quad \forall \, t\in \mathcal{U}(b). \end{aligned}$$
(48)

\(\forall \,\epsilon>0, \, \exists \, \delta >0\) and \(n_\epsilon <\infty \) such that for each \(s\in \mathcal{U}(b)\), \(\forall \, n>n_\epsilon \),

$$\begin{aligned} P\Big (\sup _{t\in \mathcal{U}(b);\Vert t-s\Vert \le \delta }n^{-1/2}\sum _{i=1}^n\Vert \dot{\nu }_{nt}(Z_i) - \dot{\nu }_{ns}(Z_i)\Vert \le \epsilon \Big )>1-\epsilon . \\\nonumber \end{aligned}$$
(49)

\(\forall \,\epsilon >0, \, 0<\alpha <\infty ,\, \exists \, N\equiv N_{\alpha ,\epsilon }\) and \(b\equiv b_{\epsilon ,\alpha }\) such that

$$\begin{aligned} P\Big (\inf _{\Vert t\Vert>b} \mathcal{K}(\theta +n^{-1/2}t)\ge \alpha \Big ) \ge 1-\epsilon , \quad \forall \, n>N.\\\nonumber \end{aligned}$$
(50)

Let

$$\begin{aligned}&\bar{\dot{\nu }}:= n^{-1}\sum _{i=1}^n\dot{\nu }(Z_i), \quad \dot{\nu }^c(Z_i):= \dot{\nu }(Z_i) - \bar{\dot{\nu }}, \\&\widehat{\varGamma }_\theta (u):= E\Big (\dot{\nu }^c(Z)\dot{\nu }^c(Z)'\ell _Z(L_Z^{-1}(u))\Big ), \quad \widehat{\varOmega }_\theta := \int \limits _0^1\widehat{\varGamma }_\theta (u)\widehat{\varGamma }_\theta (u)'d{\varPsi }(u), \\&\widehat{\mathcal{U}}(u):= n^{-1/2}\sum _{i=1}^n\dot{\nu }^c(Z_i) \big \{I(L_{Z_i}(\zeta _i)\le u) -u\big \}, \quad 0\le u\le 1, \\&\widehat{T}_n:= \int \limits _0^1 \widehat{\varGamma }_\theta (u)\widehat{\mathcal{U}}(u) d{\varPsi }(u) ,\quad \widehat{\mathcal{K}}(t):= \int \limits _0^1 \big \Vert \widehat{\mathcal{U}}(u)\big \Vert ^2 d{\varPsi }(u)+ 2 \widehat{T}_n' t+ t'\widehat{\varOmega }_\theta t. \end{aligned}$$

We need to have an alternate representation of the covariance matrix of \(\widehat{T}_n\). Let, for \(z\in {\mathbb R}^p, \quad 0\le v\le 1\),

$$\begin{aligned} \kappa _z(v):=\int \limits _0^v \ell _z(L_z^{-1}(u))d{\varPsi }(u), \quad \kappa _z^c(v):=\kappa _z(v)-\int \limits _0^1 \kappa _z(u)du. \end{aligned}$$

By (46), \(\kappa _z\) is a uniformly continuous increasing and bounded function on [0, 1], for all \(z\in {\mathbb R}^p\). Let U denote a uniform [0, 1] r.v. Conditionally, given Z, \(L_Z(\zeta )\sim _D U\). Hence, \(E\big (\kappa _z\big (L_Z(\zeta )\big )\big |Z\big )=E \kappa _z(U)\) so that \(E\big (\kappa _z^c(L_Z(\zeta ))\big |Z\big )=E\kappa _z^c(U)=0,\) a.s. Let \(\mu ^c(z):= \dot{\nu }^c(z) \dot{\nu }^c(z)'\). Argue as for (44) and use the facts that \(\sum _{i=1}^n\dot{\nu }^c(Z_i)\equiv 0\) and \(\int \limits _0^1 u d\kappa _z(u)= \kappa _z(1) - \int \nolimits _0^1 \kappa _z(u) du\) to obtain that

$$\begin{aligned} \widehat{T}_n= & {} -n^{-1/2} \sum _{i=1}^n\int \mu ^c(z) \dot{\nu }^c(Z_i) \kappa _z^c\big (L_{Z_i}(\zeta _i)\big ) dQ(z). \end{aligned}$$

Define

$$\begin{aligned}&\widehat{C}_z(s,t):= E\big [\kappa _s^c( L_Z(\zeta ))\kappa _t^c( L_Z(\zeta ))\big |Z=z\big ]= E\big [\kappa _s^c(U)\kappa _t^c(U)\big ], \\&\widehat{K}(s,t):= E\big (\dot{\nu }^c(Z) \dot{\nu }^c(Z)' \widehat{C}_Z(s,t)\big ). \end{aligned}$$

Then argue as in (45) to obtain

$$\begin{aligned} \widehat{\varSigma }_\theta :=E\widehat{T}_n \widehat{T}_n '= \int \int \mu ^c(z) \widehat{K}(z,v) \mu ^c(v)' dQ(z) dQ(v). \end{aligned}$$

We are now ready to state the following asymptotic normality result for \(\widehat{\theta }_R\).

Lemma 3

Suppose the nonlinear Berkson measurement error model (12) and the assumptions (13), (14), (46)–(49) hold. Then the following holds.

$$\begin{aligned}&\sup _{\Vert t\Vert \le b}\big |\mathcal{K}(\theta +n^{-1/2}t)-\widehat{\mathcal{K}}(t)\big |=o_p(1). \end{aligned}$$
(51)

In addition, if (50) holds and \(\widehat{\varOmega }_\theta \) is positive definite then \( n^{1/2}(\widehat{\theta }_R-\theta ) \rightarrow _d N\big (0, \widehat{\varOmega }_\theta ^{-1}\widehat{\varSigma }_\theta \widehat{\varOmega }_\theta ^{-1}\big ). \)

The proof of this lemma is similar to that of Theorem 1.2 of Koul [15], hence no details are given here. Assumption (50) is used to show that \(n^{1/2}\Vert \widehat{\theta }_R-\theta \Vert =O_p(1).\)

Remark 4

As in Remark 2, let \(m_\theta (x)=\theta ' h(x)\). Then \(\nu _\vartheta (z)= \vartheta '\beta (z)\), where \(\beta (z):= E\big (h(X)|Z=z\big )\). Thus \(\dot{\nu }_\vartheta (z)\equiv \beta (z)\) and the assumptions (47)–(49) are vacuously satisfied. The assumption (50) is shown to be satisfied by an argument similar to the one used in the proof of Lemma 5.4.4 of Koul [16, pp. 183–185]. This proof uses the monotonicity in t for every unit vector \(e\in {\mathbb R}^p\) of simple linear rank statistics based on the ranks of \(Y_i-te'h(X_i)\), \(1\le i\le n\), see Hájek [10, Theorem II.7E].

For the asymptotic normality of \(\widehat{\theta }_R\) here, one only needs (46) and \({\varPsi }\) to be a d.f.  such that \(\widehat{\varOmega }\) is positive definite. Note that here \(\dot{\nu }^c(z)=\beta ^c(z):=\beta (z) - \bar{\beta }\), \(\bar{\beta }:= n^{-1}\sum _{i=1}^n\beta (Z_i)\), so that \(\mu ^c(z)=\beta ^c(z)\beta ^c(z)'\), and

$$\begin{aligned} \widehat{K}(s,t)&:= E\big (\beta ^c(Z) \beta ^c(Z)' \widehat{C}_Z(s,t)\big ), \\ \widehat{\varSigma }&= \int \int \beta ^c(z)\beta ^c(z)' \widehat{K}(z,v) \beta ^c(v)\beta ^c(v)' dQ(z) dQ(v), \\ \widehat{\varGamma }(u)&:= E\Big (\beta ^c(Z)\beta ^c(Z)'\ell _Z(L_Z^{-1}(u))\Big ), \quad \widehat{\varOmega }=\int \limits _0^1\widehat{\varGamma }(u)\widehat{\varGamma }(u)'d{\varPsi }(u), \end{aligned}$$

do not depend on \(\theta \). Clearly, these assumptions are far less stringent than those needed for the asymptotic normality of \(\widehat{\theta }\) corresponding to \(G(x)\equiv x\).

5 M.D. Estimators with Validation Data

In this section we develop the m.d.  estimators of Sect. 4 when the d.f. H of the Berkson ME \(\eta \) is unknown but a validation data set is available. Not knowing H renders \(\nu _\theta \) an unknown function. Validation data is used to estimate this function, which in turn is used to define the m.d.  estimators.

Let N be a known positive integer. A set of r.v.’s \(\{(\tilde{X}_k, \tilde{Z}_k), k = 1, ...,N\}\) is said to be validation data if these r.v.’s are independent of the original sample and both \(\tilde{Z}_k\) and \(\tilde{X}_k\) are observable and obey the model (12). Besides having the primary data set \(\{(Y_i,Z_i), 1 \le i \le n\}\), we assume that a validation data set of the covariate \(\{(\tilde{X}_k, \tilde{Z}_k), 1\le k \le N\}\) is available. Then \(\tilde{\eta }_k := \tilde{X}_k - \tilde{Z}_k, 1 \le k \le N\) are observable and their empirical d.f. \(H_N(s) := N^{-1}\sum _{k=1}^{N} I(\tilde{\eta }_k \le s), s \in \mathbb {R}\), provides an estimate of H.

Under (13)–(15), we have the following estimates of \(\nu _{\theta }\) and \(\dot{\nu }_\theta \).

$$\begin{aligned}&\hat{\nu }_\vartheta (z) := N^{-1} \sum _{k=1}^N m_\vartheta (z + \tilde{\eta }_k),\qquad \hat{\dot{\nu }}_\vartheta (z) := N^{-1} \sum _{k=1}^N \dot{m}_\vartheta (z + \tilde{\eta }_k). \end{aligned}$$
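
A minimal sketch of these plug-in estimates (with illustrative names and a simulated one dimensional validation design, not from the source) is as follows: the unknown H is replaced by the empirical distribution of the observed validation errors \(\tilde{\eta }_k=\tilde{X}_k-\tilde{Z}_k\).

```python
import numpy as np

rng = np.random.default_rng(3)
N = 500
Z_tilde = rng.normal(size=N)
X_tilde = Z_tilde + 0.3 * rng.normal(size=N)   # validation pairs obeying X = Z + eta
eta_tilde = X_tilde - Z_tilde                  # observed ME residuals, e.d.f. H_N

def m(theta, x):                               # illustrative regression function
    return theta * x ** 2

def m_dot(theta, x):                           # its derivative in theta
    return x ** 2

def nu_hat(theta, z):                          # N^{-1} sum_k m_theta(z + eta_tilde_k)
    return m(theta, z + eta_tilde).mean()

def nu_dot_hat(theta, z):                      # N^{-1} sum_k m_dot_theta(z + eta_tilde_k)
    return m_dot(theta, z + eta_tilde).mean()

print(nu_hat(1.5, 2.0), nu_dot_hat(1.5, 2.0))
```

These plug-ins are then substituted for \(\nu _\vartheta \) and \(\dot{\nu }_\vartheta \) in the dispersions of Sect. 4, as in the definitions of \(\widehat{\theta }_1\) and \(\tilde{\theta }_R\) below.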

An analog of \(\widehat{\theta }\) in the current set up is defined as follows. Let

$$\begin{aligned} \widehat{U}(x,\vartheta )&:= n^{-1/2} \sum _{i=1}^n \hat{\dot{\nu }}_\vartheta (Z_i) \{ I(Y_i - \hat{\nu }_\vartheta (Z_i) \le x) - I(-Y_i + \hat{\nu }_\vartheta (Z_i) < x)\}, \\ D_1(\vartheta )&:= \int \Vert \widehat{U}(x,\vartheta )\Vert ^2 dG(x), \qquad \widehat{\theta }_1 = \text {argmin}_\vartheta D_1(\vartheta ). \end{aligned}$$

To define the analog of \(\hat{\theta }_R\) here, let \(\tilde{S}_{i\vartheta }\) be the rank of \(Y_i - \hat{\nu }_\vartheta (Z_i)\) among \(Y_j - \hat{\nu }_\vartheta (Z_j), 1\le j\le n\) and define

$$\begin{aligned} \tilde{\mathcal {U}}_n(u,\vartheta )&:= \frac{1}{\sqrt{n}} \sum _{i=1}^n \hat{\dot{\nu }}_\vartheta (Z_i) \{I(\tilde{S}_{i\vartheta }\le nu)-u\}, \quad 0\le u\le 1,\\ \tilde{\mathcal {K}}(\vartheta )&:= \int \limits _0^1 \Vert \tilde{\mathcal {U}}_n(u,\vartheta )\Vert ^2 d{\varPsi }(u), \quad \tilde{\theta }_R := \text {argmin}_\vartheta \tilde{\mathcal {K}}(\vartheta ). \end{aligned}$$

The asymptotic distributions of \(\widehat{\theta }_1\) and \(\tilde{\theta }_R\) as \(n\wedge N\rightarrow \infty \) are described in the next two subsections. In their derivations, the limit \(\lambda :=\lim (n/N)\) of the ratio of the two sample sizes plays an important role. Some of the proofs are similar to those for \(\widehat{\theta }\) and \(\widehat{\theta }_R\). Some key steps of the proofs can be found in the Appendix.

5.1 Asymptotic Distribution of \(\widehat{\theta }_1\)

In this subsection we derive the asymptotic distribution of \(\widehat{\theta }_1\). In addition to (13)–(15) and (17)–(25), the following assumptions are needed, where \({\varDelta }_\vartheta (z) := \hat{\nu }_\vartheta (z) - \nu _\vartheta (z)\) and \(\theta \) is as in (12).

$$\begin{aligned}&E\big \Vert E\big \{\dot{m}_\theta (X)[\nu _\theta (Z) - m_\theta (X)] |Z\big \}\big \Vert ^2 <\infty ,\end{aligned}$$
(52)
$$\begin{aligned}&E\big \Vert E\big \{\dot{m}_\theta (X)[\nu _\theta (Z) - m_\theta (X)] |\eta \big \}\big \Vert ^2 < \infty . \nonumber \\&\text {The matrix } \end{aligned}$$
(53)
$$\begin{aligned}&{\varSigma }_1 := \text {Cov}\Big (E\big [ \textstyle \int \int \mu (z)\dot{\nu }_\theta (Z)\ell _z(x) \ell _Z(x) [m_\theta (Z+\eta ) - \nu _\theta (Z)]dxdQ(z)\big |\eta \big ]\Big ) \nonumber \\&\text { is positive definite.} \nonumber \\&\lambda := \lim ( n/N)\ge 0. \end{aligned}$$
(54)
$$\begin{aligned}&\max _{1\le i\le n} \Big |N^{-1}\sum _{k=1}^N m_\theta (Z_i + \tilde{\eta }_k) - \nu _\theta (Z_i)\Big | = o_p(1). \end{aligned}$$
(55)
$$\begin{aligned}&E \Big \{\Vert \dot{\nu }_\theta (Z)\Vert ^2\big (m_\theta (X) - \nu _\theta (Z)\big )^2\Big \} <\infty . \end{aligned}$$
(56)
$$\begin{aligned}&\int \limits _0^\infty E\Big (\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \big (1 - L_Z(x\pm {\varDelta }_\theta (Z))\big )\Big ) dG(x)<\infty . \end{aligned}$$
(57)
$$\begin{aligned}&\int \limits _0^\infty E\Big (\Vert \hat{\dot{\nu }}_{nt}(Z) - \hat{\dot{\nu }}_\theta (Z)\Vert ^2 L_Z(x\pm {\varDelta }_\theta (Z)) \\&\qquad \qquad \qquad \qquad \quad \times (1-L_Z(x\pm {\varDelta }_\theta (Z))\Big ) dG(x) \rightarrow 0,\,\, \forall \,t\in {\mathbb R}^q. \nonumber \end{aligned}$$
(58)

We also assume that (18)–(24) and (25) hold with \(\dot{\nu }_{nt}\), \(\dot{\nu }_{\theta }\) and D replaced by \(\hat{\dot{\nu }}_{nt}\), \(\hat{\dot{\nu }}\) and \(D_1\), respectively. We denote these assumptions as (18)\(^*\)–(25)\(^*\).

Here we discuss some sufficient conditions for the above assumptions. By the C-S inequality, both expressions in (52) are bounded from above by \(2 E\big \Vert \dot{m}_\theta (X)\big \Vert ^2 E\big |m_\theta (X)\big |^2\). Thus (52) is implied by (33)(a) and \(E\big |m_\theta (X)\big |^2<\infty \).

Next, under (33)(a), (57) is trivially satisfied when \(G(x)\equiv \delta _0(x)\). In the case \(dG(x)=g(x)dx\) with \(g_\infty :=\sup _{y\in {\mathbb R}}g(y)<\infty \), (57) is implied by (33)(a) and the following condition.

$$\begin{aligned} E\big (\big \Vert \dot{m}_\theta (X)\Vert ^2 |m_\theta (X)|\big )<\infty .\\\nonumber \end{aligned}$$
(59)

To see this, note that \(E\big (|{\varDelta }_\theta (Z)|\big |Z\big )\le 2 E\big (|m_\theta (X)|\big | Z\big )\), and \(\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \le N^{-1}\sum _{k=1}^N\Vert \dot{m}_\theta (Z+\tilde{\eta }_k)\Vert ^2\) so that \(E\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \le E\Vert \dot{m}_\theta (X)\Vert ^2\). Now argue as in Remark 3 and use these facts to obtain that the left hand side of (57) is bounded from above by

$$\begin{aligned}&g_\infty E\Big (\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 E\Big (|\zeta |+|{\varDelta }_\theta (Z)|\big |Z\Big )\Big )\\&\le g_\infty \Big [C E \big \Vert \dot{m}_\theta (X)\Vert ^2 + 2 E\big (\big \Vert \dot{m}_\theta (X)\Vert ^2 |m_\theta (X)|\big )\Big ]<\infty , \end{aligned}$$

by (33)(a) and (59). Similarly, the left hand side of (58) is bounded from above by a constant multiple of

$$\begin{aligned} \,\,E\big \Vert \hat{\dot{\nu }}_{nt}(Z)- \hat{\dot{\nu }}_\theta (Z)\big \Vert ^2 \le E\big \Vert \dot{m}_{\theta +n^{-1/2}t}(X)-\dot{m}_{\theta }(X)\big \Vert ^2\rightarrow 0, \,\,\, \text {by (34)(b)}. \end{aligned}$$

We now turn to proving the asymptotic normality of \(\widehat{\theta }_1\). Similar to Sect. 4.1, we first prove that \(E(D_1(\theta ))<\infty \). Recall \({\varDelta }_\vartheta (z) := \hat{\nu }_\vartheta (z) - \nu _\vartheta (z)\) and rewrite

$$\begin{aligned} \widehat{U}(x,\theta )= & {} \frac{1}{\sqrt{n}} \sum _{i=1}^n \hat{\dot{\nu }}_\theta (Z_i) \Big \{ I(\zeta _i \le x + {\varDelta }_\theta (Z_i)) - I( - \zeta _i < x - {\varDelta }_\theta (Z_i))\Big \}. \end{aligned}$$

By the independence of the primary and validation data and a conditioning argument, for every \(x>0\),

$$\begin{aligned} E\Vert \widehat{U}(x,\theta )\Vert ^2&= E\Big (\Vert \hat{\dot{\nu }}_\theta (Z)\Vert \{I(\zeta \le x + {\varDelta }_\theta (Z)) - I( - \zeta < x - {\varDelta }_\theta (Z))\}\Big )^2 \\&= E\Big (\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \{1 - L_Z(x+{\varDelta }_\theta (Z)) + 1 - L_Z(x-{\varDelta }_\theta (Z))\}\Big ). \end{aligned}$$

Hence by (57), \(ED_1(\theta )<\infty \).

Next we sketch the proof of the AULQ property of \(D_1(\vartheta )\). Define

$$\begin{aligned} \tilde{W}(x,0):= & {} n^{-1/2} \sum _{i=1}^n \hat{\dot{\nu }}(Z_i) \{I(\zeta _i \le x + {\varDelta }_\theta (Z_i)) - L_{Z_i}(x)\}, \\ \tilde{T}_n:= & {} \int {\varGamma }_\theta (x) \{\tilde{W}(x,0) + \tilde{W}(-x,0)\} dG(x). \end{aligned}$$

In the Appendix, we show that \(\tilde{T}_n\) is approximated by a U-statistic based on the two independent samples. Theorem 6.1.4 in Lehmann [17] yields

$$\begin{aligned}&\mathrm{(a)} \quad \tilde{T}_n \rightarrow N(0, {\varSigma }_\theta + 4 \lambda {\varSigma }_1), \quad \lambda <\infty , \\&\mathrm{(b)}\quad \sqrt{N/n} \,\tilde{T}_n \rightarrow N(0, 4 {\varSigma }_1), \quad \lambda = \infty . \nonumber \end{aligned}$$
(60)

Next, the assumptions (54)–(58) and (18)\(^*\)–(24)\(^*\) ensure that the analog of Lemma 2 holds here also. Hence (41) with \(T_n\) and \(D(\vartheta )\) replaced by \(\widetilde{T}_n\) and \(D_1(\vartheta )\), respectively, holds. Moreover, an analog of (42) can be shown to hold in a similar manner as in Sect. 4 under (25)\(^*\). Consequently, the asymptotic distribution of \(\widehat{\theta }_1\) based on the data sets \(\{(Y_i,Z_i),1\le i\le n\}\) and \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) is described in the following lemma.

Lemma 4

Suppose model (12) with H unknown holds and an independent validation data set \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) obeying (12) is available. In addition, assume that (17)–(25) and (52)–(57) hold. Then

$$\begin{aligned}&\sqrt{n}(\widehat{\theta }_1 - \theta ) \rightarrow N(0,4^{-1}{\varOmega }_\theta ^{-1}({\varSigma }_\theta + 4 \lambda {\varSigma }_1){\varOmega }_\theta ^{-1}), \,\,\mathrm{for} \,\, 0\le {\lambda } < \infty ;\\[.2cm]&\sqrt{N} (\widehat{\theta }_1 - \theta ) \rightarrow N(0,16^{-1} {\varOmega }_\theta ^{-1}{\varSigma }_1{\varOmega }_\theta ^{-1}), \,\,\mathrm{for}\,\,{\lambda } = \infty . \end{aligned}$$

The above result shows that the step of estimating the regression function \(\nu _\theta (z)\), necessitated by the unknown ME distribution H, introduces additional variation into the asymptotic distribution of the m.d.  estimators. Moreover, the limiting ratio \(\lambda \) of the sample sizes governs the size of this additional variation. When \(\lambda = \lim n/N = 0\), the additional covariance term vanishes and the result reduces to the case of a known ME distribution. In other words, when the validation sample size N is sufficiently large compared to the primary sample size n, both \(\hat{\theta }\) and \(\widehat{\theta }_1\) achieve the same asymptotic efficiency. On the other hand, when \(\lambda = \infty \), i.e., when the validation data size is very limited compared to the primary data size, the consistency rate is restricted to \(\sqrt{N}\) instead of \(\sqrt{n}\).
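To make this estimation step concrete, the following is a minimal Python sketch, not part of the original development, of how \(\nu _\theta (z)\) can be approximated by averaging \(m_\theta (z+\widetilde{\eta }_k)\) over the validation residuals \(\widetilde{\eta }_k=\widetilde{X}_k-\widetilde{Z}_k\), the same average that appears in the MSE formulas of Sect. 6. The Michaelis-Menten mean function, the function names and the trial value of \(\theta \) are illustrative assumptions.

import numpy as np

# Illustrative mean function m_theta(x); the Michaelis-Menten form of Sect. 6
# is used here only as an example.
def m(x, alpha, beta):
    return alpha * x / (beta + x)

def nu_hat(z, theta, eta_tilde):
    # Approximate nu_theta(z) by the validation-sample average of
    # m_theta(z + eta_tilde_k), where eta_tilde_k = X_tilde_k - Z_tilde_k.
    alpha, beta = theta
    return np.mean(m(z + eta_tilde, alpha, beta))

# Hypothetical usage with a few validation pairs and a trial value of theta.
Z_tilde = np.array([0.04, 0.07, 0.20, 0.30])
X_tilde = np.array([0.035, 0.076, 0.207, 0.295])
eta_tilde = X_tilde - Z_tilde
print(nu_hat(0.10, (200.0, 0.05), eta_tilde))

The larger the validation sample size N relative to n, the better this average approximates \(\nu _\theta (z)\), which is the source of the \(\lambda \)-dependent terms in the limiting covariance above.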

5.2 Asymptotic Distribution of \(\tilde{\theta }_R\)

In this subsection we present the asymptotic distribution of the class of estimators \(\tilde{\theta }_R\). First, we provide the additional assumptions. Let \(\hat{\dot{\nu }}_{nt}(z) = N^{-1}\sum _{k=1}^N \dot{\nu }_{\theta +n^{-1/2}t}(z+\tilde{\eta }_k)\). Consider the following assumptions.

$$\begin{aligned}&n^{-1}\sum _{i=1}^n E\Vert \hat{\dot{\nu }}_{nt}(Z_i) - \hat{\dot{\nu }}(Z_i)\Vert ^2 \rightarrow 0,\quad \forall \, t\in \mathcal{U}(b).\end{aligned}$$
(61)
$$\begin{aligned}&n^{-1/2}\sum _{i=1}^n\Vert \hat{\dot{\nu }}_{nt}(Z_i) - \hat{\dot{\nu }}(Z_i)\Vert = O_p(1),\quad \forall \, t\in \mathcal{U}(b).\end{aligned}$$
(62)

\(\forall \,\epsilon> 0,\,\,\exists \,\,\delta > 0\) and \(n_\epsilon < \infty \) such that for each \(s \in \mathcal{U}(b)\), \(\forall \,n>n_\epsilon \),

$$\begin{aligned}&P\Big (\sup _{t\in \mathcal{U}(b);\Vert t-s\Vert \le \delta }n^{-1/2}\sum _{i=1}^n \Vert \hat{\dot{\nu }}_{nt}(Z_i) - \hat{\dot{\nu }}_{ns}(Z_i)\Vert \le \epsilon \Big )>1-\epsilon . \end{aligned}$$
(63)

For every \(\epsilon >0, \, 0<\alpha <\infty \), there exist an \(N_\epsilon \) and \(b\equiv b_{\epsilon ,\alpha }\) such that

$$\begin{aligned}&P\Big (\inf _{\Vert t\Vert>b} \tilde{\mathcal {K}}(\theta +n^{-1/2}t)\ge \alpha \Big ) \ge 1-\epsilon , \quad \forall \, n>N_\epsilon .\end{aligned}$$
(64)
$$\begin{aligned}&\text {The matrix }\; {\varSigma }_2 := \text {Cov}\Big (E\Big [\int \!\!\int \mu ^c(z) \{\dot{\nu }_\theta (Z) - E(\dot{\nu }_\theta (Z))\} \ell _z(x)\ell _Z(x)\\&\qquad \qquad \qquad \qquad \times \{m_\theta (Z+\eta ) - \nu _\theta (Z)\}\, dx\, dQ(z)\Big |\eta \Big ]\Big )\; \text { is positive definite}. \end{aligned}$$
(65)

Next, define   \(\tilde{\dot{\nu }} := n^{-1} \sum _{i=1}^n\hat{\dot{\nu }}(Z_i), \qquad \tilde{\dot{\nu }}^c (Z_i) := \hat{\dot{\nu }}(Z_i) - \tilde{\dot{\nu }},\)

$$\begin{aligned}&\tilde{T}_{n,R} := \int \limits _0^1 \widehat{{\varGamma }}_\theta (u)\tilde{\mathcal {U}}_R(u) d{\varPsi }(u), \\&\tilde{\mathcal{K}}_R(t) := \int \limits _0^1 \Vert \tilde{\mathcal {U}}_R(u)\Vert ^2 d{\varPsi }(u) + 2 \tilde{T}_{n,R}'t + t'\widehat{\varOmega }_\theta t, \end{aligned}$$

where \(\widehat{\varGamma }_\theta \) and \(\widehat{\varOmega }_\theta \) are defined in Sect. 4.2. Similar to Lemma 4, we have the following lemma.

Lemma 5

Suppose model (12) with H unknown holds and an independent validation data set \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) obeying (12) is available. In addition, assume that (54), (55), and (61)–(65) hold. Then, for \(0\le \lambda < \infty \),

$$\begin{aligned}&\mathrm{(a)} \quad \tilde{T}_{n,R} \rightarrow N(0,\widehat{\varSigma }_\theta + \lambda {\varSigma }_2), \\&\mathrm{(b)} \quad n^{1/2}(\tilde{\theta }_R - \theta ) \rightarrow N(0,\widehat{\varOmega }_\theta ^{-1}(\widehat{\varSigma }_\theta + \lambda {\varSigma }_2)\widehat{\varOmega }_\theta ^{-1}). \nonumber \end{aligned}$$
(66)

Moreover, \( N^{1/2}(\tilde{\theta }_R - \theta ) \rightarrow N(0,\widehat{\varOmega }_\theta ^{-1}{\varSigma }_2 \widehat{\varOmega }_\theta ^{-1}) \), for \(\lambda =\infty \).

See Appendix for some details of the proof.

6 Data Analysis

Example. We shall now compute the above estimators based on some real data. The data pertain to a study of the relationship between the enzyme reaction speed (Y) and the basal density (X) of UDP-galactose; see Bates and Watts [1], p. 70. A model commonly used to analyze these data is the Michaelis-Menten model

$$\begin{aligned} m_\theta (x) = \frac{\alpha x}{\beta + x}, \quad \theta :=(\alpha , \beta )', \,\alpha> 0, \, \beta>0, \, x > 0. \end{aligned}$$

In the primary data, consisting of \(n=30\) observations, the basal density was measured by a simple chemical method, which was believed to introduce measurement error into the observations. Hence, in the validation data, consisting of \(N=10\) observations, an expensive procedure using a precision machine tool was employed to obtain precise measurements of the basal density. Let Z denote the basal density obtained by the chemical method in parts per million (ppm), \(\widetilde{Z}\) the basal density obtained by the exact measurement (ppm), and Y the reaction speed (counts/min\(^2\)). The primary and validation data are as follows.

Primary data (Z = basal density by the chemical method, Y = reaction speed):

Z:  0.02  0.02  0.04  0.04  0.06  0.06  0.08  0.08  0.11  0.11  0.14  0.14  0.18  0.18  0.22
Y:    76    47    82    95    97   107   118   127   123   139   146   149   157   151   159

Z:  0.22  0.28  0.28  0.34  0.34  0.42  0.42  0.56  0.56  0.66  0.66  0.86  0.86  1.10  1.10
Y:   152   173   180   179   182   185   189   191   192   193   196   198   202   207  2.04

Validation data (\(\widetilde{Z}\) = basal density by the chemical method, \(\widetilde{X}\) = exact basal density):

\(\widetilde{Z}\):  0.04   0.07   0.20   0.30   0.38   0.48   0.60   0.76   0.95   1.110
\(\widetilde{X}\):  0.035  0.076  0.207  0.295  0.388  0.486  0.601  0.754  0.952  1.112

Table 2 gives the m.d.  estimators \(\hat{\theta }_1\) with \(G(x)=x\) and \(\tilde{\theta }_R\) with \({\varPsi }(u) = u\), based on the above primary and validation data, and the naive least squares estimator \(\hat{\theta }_\mathrm{nLS}\) obtained by ignoring the measurement error. The MSEs are computed using the following formulas, where \(\widetilde{\eta }_k=\widetilde{X}_k-\widetilde{Z}_k\):

$$\begin{aligned} MSE(\hat{\theta }_1)&= \frac{1}{n} \sum _{i=1}^n \Big [Y_i - \frac{1}{N}\sum _{k=1}^N m_{\hat{\theta }_1}(Z_i+\widetilde{\eta }_k)\Big ]^2,\\ MSE(\tilde{\theta }_R)&= \frac{1}{n} \sum _{i=1}^n \Big [Y_i - \frac{1}{N}\sum _{k=1}^N m_{\tilde{\theta }_R}(Z_i+\widetilde{\eta }_k)\Big ]^2,\\ MSE(\hat{\theta }_\mathrm{nLS})&= \frac{1}{n} \sum _{i=1}^n \Big [Y_i - m_{\hat{\theta }_\mathrm{nLS}}(Z_i)\Big ]^2. \end{aligned}$$

Figure 1 presents the fitted regression curves based on the three estimators.

Table 2 M.D. and naive estimators and their MSE
Fig. 1 Fitted regressions based on the three estimators
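As a rough illustration of how the MSE formulas above can be evaluated, here is a short Python sketch; the parameter values in the usage comments are placeholders, and the m.d. estimators themselves are obtained by minimizing the corresponding distance criteria, a step not reproduced here.

import numpy as np

def michaelis_menten(x, alpha, beta):
    # m_theta(x) = alpha * x / (beta + x)
    return alpha * x / (beta + x)

def mse_md(theta, Z, Y, eta_tilde):
    # MSE of an m.d. fit: the fitted value at Z_i is the validation-sample
    # average (1/N) * sum_k m_theta(Z_i + eta_tilde_k).
    alpha, beta = theta
    fitted = np.array([michaelis_menten(z + eta_tilde, alpha, beta).mean() for z in Z])
    return np.mean((Y - fitted) ** 2)

def mse_naive(theta, Z, Y):
    # MSE of the naive least squares fit, which ignores the measurement error.
    alpha, beta = theta
    return np.mean((Y - michaelis_menten(Z, alpha, beta)) ** 2)

# Hypothetical usage: Z, Y hold the primary data listed above, Z_tilde and X_tilde
# the validation data; the theta values below are placeholders, not the estimates of Table 2.
# eta_tilde = X_tilde - Z_tilde
# print(mse_md((200.0, 0.05), Z, Y, eta_tilde))
# print(mse_naive((200.0, 0.05), Z, Y))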