Abstract
We develop analogs of the two classes of weighted empirical minimum distance (m.d.) estimators of the underlying parameters in linear and nonlinear regression models when covariates are observed with Berkson measurement error. One class is based on the integral of the square of the symmetrized weighted empirical of the residuals while the other is based on a similar integral involving a weighted empirical of the residual ranks. The former class requires the regression and measurement errors to be symmetric around zero while the latter class needs no such assumption. The first class of estimators includes the analogs of the least absolute deviation and Hodges-Lehmann estimators while the second class includes an estimator that is asymptotically more efficient than these two estimators at some error distributions when there is no measurement error. In the case of the linear model, no knowledge of the measurement error distribution is needed. We also develop these estimators for nonlinear models when the measurement error distribution is known and when it is unknown but validation data are available.
1 Introduction
The statistical literature is replete with minimum distance estimation methods in the one and two sample location models. Beran [2, 3] and Donoho and Liu [7, 8] argue that the minimum distance estimators based on \(L_2\) distances involving either density estimators or residual empirical distribution functions have some desirable finite sample properties, tend to be robust against some contaminated models and are also asymptotically efficient at some error distributions.
In the classical regression models without measurement error in the covariates, classes of minimum distance estimators of the underlying parameters based on Cramér-von Mises type distances between certain weighted residual empirical processes were developed in Koul [12,13,14,15]. These classes include some estimators that are robust against outliers in the regression errors and asymptotically efficient at some error distributions.
In practice there are numerous situations when covariates are not observable. Instead one observes their surrogate with some error. The regression models with such covariates are known as the measurement error regression models. Fuller [9], Cheng and Van Ness [6], Carroll et al. [5] and Yi [19] discuss numerous examples of practical importance of these models.
Given the desirable properties of the above minimum distance (m.d.) estimators and the importance of the measurement error regression models, it is desirable to develop their analogs for these models. The next section describes the m.d. estimators of interest and their asymptotic distributions in the classical linear regression model. Their analogs for the linear regression Berkson measurement error (ME) model are developed in Sect. 3. Two classes of m.d. estimators are developed. One assumes the symmetry of the regression model error and ME error distributions and then bases the m.d. estimators on the symmetrized weighted empirical of the residuals. This class includes an analog of the Hodges-Lehmann estimator of the one sample location parameter, see Hodges and Lehmann [11], and the least absolute deviation (LAD) estimator. The second class is based on a weighted empirical of residual ranks and does not need the symmetry of the error distributions. It includes an estimator that is asymptotically more efficient than the analogs of the Hodges-Lehmann and LAD estimators at some error distributions. Neither class needs the knowledge of the measurement error or regression error distributions.
Section 4 discusses analogs of these estimators in the Berkson measurement error nonlinear regression models, where the measurement error distribution is assumed to be known. Section 5 develops their analogs when the ME distribution is unknown but validation data is available. In this case the consistency rate of these estimators is \(\min (n, N)^{1/2}\), where n and N are the primary data and validation data sample sizes, respectively. Section 6 provides an application of the proposed estimators to a real data example. Several proofs are deferred to the Appendix.
2 Linear Regression Model
In this section we recall the definition of the m.d. estimators of interest here in the no measurement error linear regression model and their known asymptotic normality results.
Accordingly, consider the linear regression model where for some \(\theta \in {\mathbb R}^p\), the response variable Y and the p dimensional observable predicting covariate vector X obey the relation
\(Y = X'\theta + \varepsilon , \qquad \qquad (1)\)
where \(\varepsilon \) is independent of X and symmetrically distributed around \(E(\varepsilon )=0\). For an \(x\in {\mathbb R}^p\), \(x'\) and \(\Vert x\Vert \) denote its transpose and Euclidean norm, respectively. Let \((X_i, Y_i), 1\le i\le n\) be a random sample from this model. The two classes of m.d. estimators of \(\theta \) based on weighted empirical processes of the residuals and residual ranks were developed in Koul [12,13,14,15]. To describe these estimators, let G be a nondecreasing right continuous function from \({\mathbb R}\) to \({\mathbb R}\) having left limits and define
This class of estimators, one for each G, includes some celebrated estimators. For example, \(\hat{\theta }\) corresponding to \(G(x)\equiv x\) yields an analog of the one sample location parameter Hodges-Lehmann estimator in the linear regression model. Similarly, \(G(x)\equiv \delta _0(x)\), the degenerate measure at zero, makes \(\hat{\theta }\) equal to the least absolute deviation (LAD) estimator.
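To make the class concrete, here is a minimal numerical sketch in Python. It assumes the dispersion is \(K(\vartheta )=\int \Vert n^{-1/2}\sum _{i=1}^n X_i\{I(Y_i-X_i'\vartheta \le x)-I(-(Y_i-X_i'\vartheta )< x)\}\Vert ^2 dG(x)\), i.e., the integral of the square of the symmetrized weighted empirical described in the abstract; for \(G(x)\equiv x\), Fubini's theorem then reduces \(K\) to a pairwise sum in the residuals. All function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def hl_md_dispersion(theta, X, Y):
    # Pairwise form of the dispersion for G(x) = x (obtained via Fubini):
    # K(theta) = n^{-1} sum_{i,j} X_i'X_j (|e_i + e_j| - |e_i - e_j|).
    e = Y - X @ theta                                  # residuals
    S = X @ X.T                                        # inner products X_i'X_j
    return np.sum(S * (np.abs(e[:, None] + e[None, :])
                       - np.abs(e[:, None] - e[None, :]))) / len(Y)

def md_estimate(X, Y, theta0):
    # K is nonsmooth in theta, so use a derivative-free simplex search.
    return minimize(hl_md_dispersion, theta0, args=(X, Y),
                    method="Nelder-Mead").x
```

Calling `md_estimate(X, Y, theta0)` returns the Hodges-Lehmann type m.d. fit; replacing the pairwise kernel by the sign score at zero would give the LAD case \(G=\delta _0\).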
A class of m.d. estimators when the error distribution is not symmetric and unknown is obtained by using the weighted empirical of the residual ranks defined as follows. Write \(X_i=(X_{i1}, X_{i2}, \ldots , X_{ip})', \, {i=1,\ldots , n}\). Let \(\bar{X}_j:= n^{-1}\sum _{i=1}^nX_{ij}\), \(\bar{X}:=(\bar{X}_1,\ldots , \bar{X}_p)'\) and \(X_{ic}:=X_i-\bar{X}\), \(1\le i\le n\). Let \(R_{i\vartheta }\) denote the rank of the ith residual \(Y_i-X_i'\vartheta \) among \(Y_j-X_j'\vartheta , \, j=1,\ldots , n\). Let \({\varPsi }\) be a distribution function on [0, 1] and define
Yet another m.d. estimator, when the error distribution is unknown and not symmetric, is
If one reduces the model (1) to the two sample location model, then \(\hat{\theta }_c\) is the median of pairwise differences, the so-called Hodges-Lehmann estimator of the two sample location parameter. Thus in general \(\hat{\theta }_c\) is an analog of this estimator in the linear regression model.
The following asymptotic normality results can be deduced from Koul [15] and [16, Sect. 5.4].
Lemma 1
Suppose the model (1) holds and \(E\Vert X\Vert ^2<\infty \).
(a). In addition, suppose \({\varSigma }_X:= E(XX')\) is positive definite and the error d.f. F is symmetric around zero and has density f. Further, suppose the following hold.
Then
(b). In addition, suppose the error d.f. F has uniformly continuous bounded density f, \({\varOmega }:= E\{(X-EX)(X-EX)'\}\) is positive definite and \({\varPsi }\) is a d.f. on [0, 1] such that \(\int \limits _0^1 f^2(F^{-1}(s)) d{\varPsi }(s)>0\). Then
(c). In addition, suppose \({\varOmega }\) is positive definite, F has square integrable density f and \(E|\varepsilon |<\infty \). Then \(n^{1/2}(\hat{\theta }_c - \theta )\rightarrow _D N\big (0, \sigma ^2_I {\varOmega }^{-1}\big )\), where \( \sigma ^2_I:= 1/\big \{12 \big (\int f^2 (x)\, dx\big )^2\big \}. \)
Before proceeding further we now compare the above asymptotic variance factors. Let \(\sigma ^2_{LAD}:=1/(4 f^2(0))\) and \(\sigma _{LSE}^2:= \text{ Var }(\varepsilon )\) denote the factors of the asymptotic covariance matrices of the LAD and the least squares estimators, respectively. Let \(\gamma _I^2\) denote the \(\gamma _{\varPsi }^2\) when \({\varPsi }(s)\equiv s\), i.e.,
Table 1, obtained from Koul [16], gives the values of these factors for some distributions F. From this table one sees that the estimator \(\hat{\theta }_R\) corresponding to \({\varPsi }(s)\equiv s\) is asymptotically more efficient than the LAD at logistic error distribution while it is asymptotically more efficient than the Hodges-Lehmann type estimator at the double exponential and Cauchy error distributions. For these reasons it is desirable to develop analogs of \(\hat{\theta }_R\) also for the ME models.
As argued in Koul [16, Chap. 5], the estimators \(\{\hat{\theta }_G, \, G\,\,\text {a d.f.}\}\) are robust against heavy tails in the error distribution in the general linear regression model. The estimator \(\hat{\theta }_I\), corresponding to \(G(x)\equiv x\), which is not a d.f., is robust against heavy tails and also asymptotically efficient at the logistic errors.
3 Berkson ME Linear Regression Model
In this section we shall develop analogs of the above estimators in the Berkson ME linear regression model, where the response variable Y obeys the relation (1) and where, instead of observing X, one observes a surrogate Z obeying the relation
\(X = Z + \eta . \qquad \qquad (4)\)
In (4), \(Z, \eta , \varepsilon \) are assumed to be mutually independent and \(E(\eta )=0\). Note that \(\eta \) is a \(p\times 1\) vector of errors and its distribution need not be known.
Analog of \(\hat{\theta }\). We shall first develop and derive the asymptotic distribution of the analogs of the estimators \(\hat{\theta }\) in the Berkson ME linear regression model (1) and (4). Rewrite the model as
\(Y = Z'\theta + \xi , \quad \xi := \eta '\theta + \varepsilon . \qquad \qquad (5)\)
Because \(Z,\eta ,\varepsilon \) are mutually independent, \(\xi \) is independent of Z in (5).
Let H denote the distribution function (d.f.) of \(\eta \). Assume that the d.f. F of \(\varepsilon \) is continuous and symmetric around zero and that H is also symmetric around zero, i.e., \(-dH(v)=dH(-v)\), for all \(v\in {\mathbb R}^p\). Then the d.f. of \(\xi \),
\(L(x):= \int F(x - v'\theta )\, dH(v), \quad x\in {\mathbb R},\)
is also continuous and symmetric around zero. This symmetry in turn motivates the following definition of the class of m.d. estimators of \(\theta \) in the model (5), which mimics the definition of \(\hat{\theta }\) by simply replacing \(X_i\) by \(Z_i\). Define
Because L is continuous and symmetric around zero and \(\xi \) is independent of Z, \(E\widetilde{V}(x,\theta )\equiv 0\).
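In code the replacement is literal. Assuming the hypothetical helper `md_estimate` from the sketch in Sect. 2, the Berkson-model estimator is computed by feeding it the surrogates:

```python
import numpy as np

# Under (5), xi is independent of Z, so the no-ME routine applies verbatim
# with the observed surrogates Z_i in place of the unobservable X_i.
theta_tilde = md_estimate(Z, Y, theta0=np.zeros(Z.shape[1]))
```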
The following assumptions are needed for the asymptotic normality of \(\widetilde{\theta }\).
Under (6), \(n^{-1}\sum _{i=1}^nZ_iZ_i'\rightarrow _p {\varGamma }\) and \(n^{-1/2} \max _{1\le i\le n}\Vert Z_i\Vert \rightarrow _p 0\). Use these facts and argue as in Koul [15] to deduce that (2) and (6)–(9) imply
Remark 1
We shall discuss some examples and some sufficient conditions for the above assumptions. The conditions (8) and (9) are satisfied by a large class of densities f, ME distributions H and integrating measure G. If G is a d.f., then f being uniformly continuous and bounded implies these conditions. In this case \(\ell \) is also uniformly continuous, \(\sup _x\ell (x)\le \sup _x f(x)<\infty \) so that \(\int \ell ^jdG\le \sup _xf^j(x)<\infty \) and \(\int \big [\ell (y+z)-\ell (y)\big ]^j dG(y)\le \sup _{|x-y|\le z}|\ell (y)-\ell (x)|^j\rightarrow 0\), as \(z\rightarrow 0\). Moreover, here \(A\le 1\). Thus these two assumptions reduce to assuming \(\int \ell ^jdG>0\), \(j=1,2.\)
Given the importance of the two estimators corresponding to \(G(x)\equiv x,\, G(x)\equiv \delta _0(x)\), it is of interest to provide some easy to verify sufficient conditions that imply conditions (8) and (9) for these two estimators.
Consider the case \(G(x)\equiv x\). Assume f to be continuous and \(\int f^2(x)dx<\infty .\) Then because H is a d.f., \(\ell \) is also continuous and symmetric around zero and \( \int \ell (x+z)dx =\int \ell (x) dx=1\). Moreover, by the Cauchy-Schwarz (C-S) inequality and Fubini’s Theorem,
Finally, because \(\ell \in L_2\), by Theorem 9.5 in Rudin [18], it is shift continuous in \(L_2\), i.e., (8) holds. Hence all conditions of (8) are satisfied.
Next, consider (9). The assumptions \(E(\varepsilon )=0\) and \(E(\eta )=0\) imply that \(\int |x|f(x) dx<\infty \), \(\int \Vert v\Vert dH(v)<\infty \) and hence
This in turn implies (9) in the case \(G(x)\equiv x\).
To summarize, (6), (7), and F having a continuous symmetric square integrable density f imply all of the above conditions needed for the asymptotic normality of the above analog of the Hodges-Lehmann estimator in the Berkson ME linear regression model. This fact is similar to the observation made in Berkson [4] that the naive least squares estimator, where one replaces the \(X_i\)’s by the \(Z_i\)’s, continues to be consistent and asymptotically normal under the same conditions as when there is no ME. But, unlike in the no ME case, here the asymptotic variance
\(\tau _I^2:= 1/\big \{12\big (\textstyle \int \ell ^2(x)\, dx\big )^2\big \}, \qquad \ell (x)=\int f(x - v'\theta )\, dH(v),\)
depends on \(\theta \). If H is degenerate at zero, i.e., if there is no ME, then \(\tau _I^2=\sigma _I^2\), the factor that appears in the asymptotic covariance matrix of the Hodges-Lehmann estimator in this case.
Next, consider the case \(G(x)\equiv \delta _0(x)\), the degenerate measure at 0. Assume f to be continuous and bounded from above and
Then the continuity and symmetry of f imply that as \(z\rightarrow 0\),
Moreover, here \(\int \limits _0^\infty (1-L)dG=1-L(0)=1/2\) so that (9) is also satisfied.
To summarize, (6), (7), (11) and f being continuous, symmetric around zero and bounded from above imply all the conditions needed for the asymptotic normality of the above analog of the LAD estimator in the Berkson ME linear regression model. Moreover, here the relevant variance factor is
\(\tau _0^2:= 1/\big \{4\, \ell ^2(0)\big \}.\)
Consequently, here the asymptotic covariance matrix also depends on \(\theta \), via \(\ell (0)=\int f(v'\theta )\, dH(v)\).
In the case of no ME, \({\varGamma }^{-1}\tau _0^2\) equals the asymptotic covariance matrix of the LAD estimator. Unlike in the case of the previous estimator, here the conditions needed for f are a bit more stringent than those required for the asymptotic normality of the LAD estimator when there is no ME.
Analog of \(\hat{\theta }_R\). Here we shall describe the analogs of the class of estimators \(\hat{\theta }_R\) based on the residual ranks obtained from the model (5). These estimators do not need the errors \(\xi _i\)’s to be symmetrically distributed. Let \(\widetilde{R}_{i\vartheta }\) denote the rank of \(Y_i-Z_i'\vartheta \) among \(Y_j-Z_j'\vartheta , \, j=1,\ldots ,n\), \(\bar{Z}:= n^{-1}\sum _{i=1}^nZ_i\), \(Z_{ic}:=Z_i-\bar{Z}\), \(1\le i\le n\) and define
Use the facts \(\sum _{i=1}^nZ_{ic}=0\), \({\varPsi }(\max (a,b))=\max \{{\varPsi }(a),{\varPsi }(b)\}\) and \(\max (a,b)\) \(=2^{-1}[a+b+|a-b|]\), for any \(a,b\in {\mathbb R}\), to obtain the computational formula
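For a d.f. \({\varPsi }\) on [0, 1], these identities yield a form that is convenient in code. A sketch, assuming the dispersion is the integrated squared centered rank-weighted empirical, so that up to terms free of \(\vartheta \) it reduces to \(-(2n)^{-1}\sum _{i,j} Z_{ic}'Z_{jc}\,|{\varPsi }(\widetilde{R}_{i\vartheta }/n)-{\varPsi }(\widetilde{R}_{j\vartheta }/n)|\); names are ours.

```python
import numpy as np
from scipy.optimize import minimize

def rank_md_dispersion(theta, Z, Y, Psi=lambda s: s):
    # Pairwise computational form, up to theta-free terms:
    # -(2n)^{-1} sum_{i,j} Zc_i'Zc_j |Psi(R_i/n) - Psi(R_j/n)|.
    n = len(Y)
    Zc = Z - Z.mean(axis=0)                            # centered Z_ic
    R = np.argsort(np.argsort(Y - Z @ theta)) + 1.0    # residual ranks
    P = Psi(R / n)
    return -np.sum((Zc @ Zc.T) * np.abs(P[:, None] - P[None, :])) / (2 * n)

def rank_md_estimate(Z, Y, theta0):
    # Ranks make the criterion piecewise constant in theta; use a
    # derivative-free simplex search as for the earlier sketch.
    return minimize(rank_md_dispersion, theta0, args=(Z, Y),
                    method="Nelder-Mead").x
```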
The following result can be deduced from Koul [15]. Suppose \(E\Vert Z\Vert ^2<\infty \), \(\widetilde{\varGamma }:=E(Z-EZ)(Z-EZ)'\) is positive definite, density \(\ell \) of the r.v. \(\xi \) is uniformly continuous and bounded and \(\int \limits _0^1 \ell ^2(L^{-1}(s)) d{\varPsi }(s)>0\). Then \(n^{-1/2}\max _{1\le i \le n}\Vert Z_i\Vert \rightarrow _p 0,\) \( n^{-1}\sum _{i=1}^n(Z_i-\bar{Z})(Z_i-\bar{Z})' \rightarrow _p \widetilde{\varGamma }\) and
The density f of F being uniformly continuous and bounded implies the same for \(\ell (x)=\int f(x-v'\theta ) dH(v)\). It is also worth pointing out that the assumptions on F, H and L needed here are less stringent than those needed for the asymptotic normality of \(\widetilde{\theta }\).
Of special interest is the case \({\varPsi }(s)\equiv s\). Let \(\widetilde{\tau }^2_I\) denote the corresponding \(\widetilde{\tau }^2_{\varPsi }\). Then, by the change of variable formula, \(\int \limits _0^1 \ell ^2(L^{-1}(s))\, ds=\int \ell ^3(x)\, dx\).
An analog of \(\hat{\theta }_c\) here is \(\widetilde{\theta }_c := \text {argmin}_{\vartheta \in {\mathbb R}^p} \widetilde{M}_c(\vartheta )\), where
Arguing as above one obtains that \( n^{1/2}\big (\widetilde{\theta }_c - \theta \big ) \rightarrow _D N\big (0, \tau ^2_I\widetilde{\varGamma }^{-1}\big ). \)
4 Nonlinear Regression with Berkson ME
In this section we shall investigate the analogs of the above m.d. estimators in nonlinear regression models with Berkson ME.
Let \(q\ge 1, p\ge 1\) be known positive integers, \({\varTheta }\subseteq {\mathbb R}^q\) be a subset of the q-dimensional Euclidean space \({\mathbb R}^q\) and consider the model where the unobservable p-dimensional covariate X, its observable surrogate Z and the response variable Y obey the relations
\(Y = m_\theta (X) + \varepsilon , \quad X = Z + \eta , \qquad \qquad (12)\)
for some \(\theta \in {\varTheta }\). Here \(m_\vartheta (x)\) is a known parametric function, nonlinear in x, from \({\varTheta }\times {\mathbb R}^p\) to \({\mathbb R}\) with \(E|m_\vartheta (X)|<\infty \), for all \(\vartheta \in {\varTheta }\). The r.v.’s \(\varepsilon , Z, \eta \) are assumed to be mutually independent, \(E\varepsilon =0\) and \(E\eta =0\). Unlike in the linear case, here we need to assume that the d.f. H of \(\eta \) is known. See Sect. 5 for the unknown H case.
Fix a \(\theta \) for which (12) holds. Let \(\nu _\vartheta (z):= E(m_\vartheta (X)|Z=z)\), \(\vartheta \in {\mathbb R}^q, z\in {\mathbb R}^p\). Under (12), \(E(Y|Z=z)\equiv \nu _\theta (z)\). Moreover, because H is known,
\(\nu _\vartheta (z)= \int m_\vartheta (z+s)\, dH(s)\)
is a known parametric regression function. Thus, under (12), we have the regression model
\(Y = \nu _\theta (Z) + \zeta , \quad \zeta := Y - \nu _\theta (Z), \quad E(\zeta |Z)=0.\)
Unlike in the linear case, the error \(\zeta \) is no longer independent of Z in general.
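When H is known but the integral defining \(\nu _\vartheta \) has no closed form, it can be approximated numerically. A minimal Monte Carlo sketch, assuming only that one can sample from H; the names and the error law in the commented usage are hypothetical.

```python
import numpy as np

def nu(theta, z, m, sample_eta, n_mc=10_000, seed=0):
    # Monte Carlo approximation of nu_theta(z) = E m_theta(z + eta), eta ~ H.
    rng = np.random.default_rng(seed)
    eta = sample_eta(n_mc, rng)            # draws from the known d.f. H
    return np.mean(m(theta, z + eta))

# Hypothetical usage with scalar z and N(0, 0.05^2) measurement error:
# nu_val = nu(theta, z,
#             m=lambda th, x: th[0] * x / (th[1] + x),
#             sample_eta=lambda n, rng: rng.normal(0.0, 0.05, size=n))
```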
To proceed further we assume there is a vector of q functions \(\dot{m}_\vartheta (x)\) such that, with \(\dot{\nu }_\vartheta (z):= \int \dot{m}_\vartheta (z+s) dH(s),\) for every \(0<b<\infty \),
Let
Assume the following. For every \(z\in {\mathbb R}^p\),
Let G be as before and define
In the case \( q=p\) and \( m_\theta (x)=x'\theta \), \(\widehat{\theta }\) agrees with \(\widetilde{\theta }\). Thus the class of estimators \(\widehat{\theta }\), one for each G, is an extension of the class of estimators \(\widetilde{\theta }\) from the linear case to the above nonlinear case.
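As an illustration of the nonlinear criterion, take \(G=\delta _0\): the integral collapses to the point \(x=0\). The sketch below assumes the weighted empirical \(U(x,\vartheta )\) carries the weights \(\dot{\nu }_\vartheta (Z_i)\) (consistent with the form of \({\varGamma }_\theta (x)\) in Sect. 4.1); it is a sketch under that assumption, not the paper's exact display.

```python
import numpy as np

def lad_type_dispersion(theta, Z, Y, nu, nu_dot):
    # D(theta) at G = delta_0: squared norm of a sign-weighted score,
    # || n^{-1/2} sum_i nu_dot_theta(Z_i) * sign(Y_i - nu_theta(Z_i)) ||^2.
    e = Y - np.array([nu(theta, z) for z in Z])
    W = np.array([nu_dot(theta, z) for z in Z])    # n x q gradient weights
    score = W.T @ np.sign(e) / np.sqrt(len(Y))
    return float(score @ score)
```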
Next, consider the extension of \(\hat{\theta }_R\) to the above nonlinear model (12). Let \(S_{i\vartheta }\) denote the rank of \(Y_i-\nu _\vartheta (Z_i)\) among \(Y_j-\nu _\vartheta (Z_j)\), \(j=1, \ldots , n\) and define
The estimator \(\widehat{\theta }_R\) gives an analog of \(\hat{\theta }_R\) in the present set up.
Our goal here is to prove the asymptotic normality of \(\widehat{\theta }, \, \widehat{\theta }_R\). This will be done by following the general method of Sect. 5.4 of Koul [16]. This method requires two steps. In the first step we need to show that the defining dispersions \(D(\vartheta )\) and \(\mathcal{K}(\vartheta )\) are AULQ (asymptotically uniformly locally quadratic) in \(\vartheta -\theta \) for \(\vartheta \in \mathcal{N}_n(b):=\{\vartheta \in {\varTheta }, n^{1/2}\Vert \vartheta -\theta \Vert \le b\}\), for every \(0<b<\infty \). The second step requires showing that \(n^{1/2}\Vert \widehat{\theta }- \theta \Vert =O_p(1)=n^{1/2}\Vert \widehat{\theta }_R - \theta \Vert . \)
4.1 Asymptotic Distribution of \(\widehat{\theta }\)
In this subsection we shall derive the asymptotic normality of \(\widehat{\theta }\). To state the needed assumptions for achieving this goal we need some more notation. Let \(\nu _{nt}(z):= \nu _{\theta +n^{-1/2}t}(z),\, \xi _{it}:=\nu _{nt}(Z_i)-\nu _\theta (Z_i),\, 1\le i\le n,\) \( \dot{\nu }_{nt}(z):= \dot{\nu }_{\theta +n^{-1/2}t}(z),\) and \(\dot{\nu }_{ntj}(z)\) denote the jth coordinate of \(\dot{\nu }_{nt}(z)\), \(1\le j\le q, t\in {\mathbb R}^q\). For any real number a, let \(a^\pm =\max (0,\pm a)\) so that \(a=a^+-a^-\). Also, let \( \beta _i(x):= I(\zeta _i\le x) - L_{Z_i}(x) \) and \(\alpha _i(x,t):= I(\zeta _i\le x+ \xi _{it}) - I(\zeta _i\le x) - L_{Z_i}(x+\xi _{it})+ L_{Z_i}(x) . \)
Because \(dG(x)\equiv -dG(-x)\) and \(U(x,\vartheta )\equiv U(-x, \vartheta ),\) we have
We are now ready to state our assumptions.
For every \(\epsilon >0\) there is a \(\delta >0\) and \(N_\epsilon <\infty \) such that \(\forall \,\Vert s\Vert \le b, n>N_\epsilon \),
For every \(\epsilon>0, \alpha >0\) there exists \(N\equiv N_{\alpha ,\varepsilon }\) and \(b\equiv b_{\alpha ,\epsilon }\) such that
From now onwards we shall write \(\nu \) and \(\dot{\nu }\) for \(\nu _\theta \) and \(\dot{\nu }_\theta \), respectively.
Remark 2
We shall now discuss the above assumptions when \(m_\vartheta (x)=\vartheta 'h(x),\) where \(h=(h_1,\ldots ,h_q)'\) is a vector of q functions on \({\mathbb R}^p\) with \(E\Vert h(X)\Vert ^2<\infty \), first for general G and then for some special cases of G. An example of this is the polynomial regression model with Berkson ME, where \(p=1, h_j(x)=x^j, j=1,\ldots ,q\). Let \(\beta (z):= E(h(X)|Z=z)\). Then \(\nu _\vartheta (z)=\vartheta '\beta (z)\) and \(\dot{\nu }_\vartheta (z)\equiv \beta (z)\), a constant in \(\vartheta \). Therefore (13), (14), (18), (23) and (24) are all vacuously satisfied. The condition (25) also holds here, in a similar way as in the linear regression model, cf. Koul [16, Proof of Lemma 5.5.4, pp. 183–185]. Direct calculations show that (26)–(29) below imply the remaining assumptions (17), (19), (21) and (22), respectively.
Consider further the case \(G(x)\equiv x\). Let \(\sigma :=(E\varepsilon ^2)^{1/2}\). Assume
Then \(E\Vert \beta (Z)\Vert ^j\le E\Vert h(X)\Vert ^j<\infty , \, j=1,2,3\). Let \(\gamma (z):= 2\Vert \theta \Vert \Vert \beta (z)\Vert +\sigma \). Then
Hence
thereby showing that (26) is satisfied. The assumption (30)(b) and \(\ell _z(x)\) being a density in x for each z and Theorem 9.5 of Rudin [18] readily imply (27) here. The left hand side of (28) equals \(2n^{-1/2}bE\big (\Vert \beta (Z)\Vert ^3\big )\rightarrow 0\), by (30)(a).
Next, consider the case \(G(x)=\delta _0(x)\), the measure degenerate at zero. Assume
Then the left hand side of (26) equals \((1/2)E\Vert \beta (Z)\Vert ^2<E\Vert h(X)\Vert ^2<\infty \). Condition (27) is trivially satisfied and the left hand side of (28) equals
by the Dominated Convergence Theorem (DCT) and the continuity of \(L_z(\cdot )\), for each z.
To summarize, in the case \(m_\vartheta (x)=\vartheta 'h(x)\) and \(G(x)\equiv x\), assumptions (30)(a), (b) and \(\int \mathcal{B}(x)\mathcal{B}(x)' dx\) being positive definite imply all of the above assumptions (13), (14) and (17)–(25). Similarly, in the case \(m_\vartheta (x)=\vartheta 'h(x)\) and \(G(x)\equiv \delta _0(x)\), \(E\Vert h(X)\Vert ^2<\infty \), (32) and \( \mathcal{B}(0)\mathcal{B}(0)'\) being positive definite imply all these conditions.
Remark 3
Because of the importance of the estimators \(\widehat{\theta }\) when \(G(x)=x\), and \(G(x)=\delta _0(x)\), it is of interest to give some simple sufficient conditions for a general \(m_\vartheta \) that imply the given assumptions for these two estimators.
Suppose G satisfies \(dG(x)\equiv g(x) dx\), where \(g_\infty :=\sup _{x\in {\mathbb R}}g(x)<\infty \). Note that \(G(x)\equiv x\) corresponds to the case \(g(x)\equiv 1\). Consider the following assumptions.
Because \(\big \Vert \dot{\nu }(Z)\big \Vert ^j \le E\big ( \Vert \dot{m}_\theta (X) \big \Vert ^j\big | Z\big )\), \(E \big \Vert \dot{\nu }(Z)\big \Vert ^j \le E \big \Vert \dot{m}_\theta (X)\big \Vert ^j <\infty , \, j=1,2,3,4\), by (33)(a). Similarly, for every \(t\in {\mathbb R}^q\),
Next, similar to (31), \(E\big (|\zeta | \big | Z\big )=E\big (|Y-\nu (Z)|\big | Z\big ) \le 2|\nu (Z)|+\sigma \) implies that the left hand side of (17) is bounded from above by
by (33)(a), thereby verifying (17) here. Similarly, with C denoting the above upper bound, for every \(t\in {\mathbb R}^q\), the left hand side of (18) is bounded from above by \( C E\Big (\Vert \dot{\nu }_{nt}(Z) - \dot{\nu }_\theta (Z)\Vert ^2\Big ) \rightarrow 0, \) by (36). The left hand side of (21) is bounded from above by \( 2 g_\infty E\big ( \Vert \dot{\nu }_{nt}(Z)\Vert ^2 |\nu _{nt}(Z)-\nu (Z)|\big ) \rightarrow 0,\) by (35).
In other words, when G has a bounded Lebesgue density, conditions (33)–(36) imply assumptions (14) and (17)–(21). Not much simplification occurs in the remaining assumptions (22)–(25). See Remark 2 for some special cases.
Next consider the case when \(G(x)=\delta _0(x)\) and the following assumptions.
In this case (33), (35), (37) and (38) together imply the assumptions (14), (17)–(22). Not much simplification occurs in the remaining three assumptions (23)–(25), except in some special cases as in Remark 2.
We now resume the discussion about the asymptotic normality of \(\widehat{\theta }\). First, we show that \(E(D(\theta ))<\infty \), so that by the Markov inequality, \(D(\theta )\) is bounded in probability. To see this, by (15), \(EU(x,\theta )\equiv 0\) and, for \(x\ge 0\),
By the Fubini Theorem, (16) and (17),
To state the AULQ result for D, we need some more notation. Let
where \({\varGamma }_\theta (x)\) and \({\varOmega }_\theta \) are as in (22). We are ready to state the following lemma.
Lemma 2
Suppose the above set up and assumptions (17)– (24) hold. Then for every \(b<\infty \),
If in addition (25) holds, then, with \({\varSigma }_\theta \) given at (45) below,
Proof
The proof of (41) appears in the Appendix. The proof of the claim (42)(a), which uses (25), (39) and (41), is similar to that of Theorem 5.4.1 of Koul [16], where (25) and (39) are used to show that \(n^{1/2}\Vert \widehat{\theta }-\theta \Vert =O_p(1)\).
Define, for \(y\in {\mathbb R},\,u\in {\mathbb R}^p,\)
By (19), \(0<\psi _u(y)\le \psi _u(\infty )=\int \limits _{-\infty }^\infty \ell _u(x)dG(x)<\infty ,\) for all \(u\in {\mathbb R}^p\). Thus for each u, \(\psi _u(y)\) is an increasing continuous bounded function of y and \(\psi _u(-y) \equiv \psi _u(\infty )-\psi _u(y)\), and \(\varphi _u(y)= \psi _u(\infty )-2\psi _u(y)\), for all \(y\in {\mathbb R}\).
By (15), \(E(\varphi _u(\zeta ) | Z=z)=0\), for all \(u, z\in {\mathbb R}^p\). Let
Next let \(\mu (z):=\dot{\nu }(z) \dot{\nu }(z)'\), Q denote the d.f. of Z and rewrite \({\varGamma }_\theta (x)=E\big (\dot{\nu }_\theta (Z) \dot{\nu }_\theta (Z)'\ell _Z(x)\big )\) \(=\int \mu (z) \ell _z(x) dQ(z)\). By the Fubini Theorem,
Clearly, \(ET_n=0\) and by the Fubini Theorem, the covariance matrix of \(T_n\) is
Thus \(T_n\) is a sum of independent centered \(p\times 1\) random vectors with finite covariance matrices. By the classical CLT, \(T_n\rightarrow _D N(0,{\varSigma }_\theta )\). Hence, the minimizer \(\widetilde{t}\) of the approximating quadratic form \(D(\theta ) + 4 T_n' t+ 4 t' {\varOmega }_\theta t\) with respect to t satisfies \( \tilde{t}= -{\varOmega }_\theta ^{-1} T_n/2 \rightarrow _D N\big (0,4^{-1}{\varOmega }_\theta ^{-1}{\varSigma }_\theta {\varOmega }_\theta ^{-1}\big ). \) The claim (42)(b) now follows from this result and (42)(a). \(\Box \)
4.2 Asymptotic Distribution of \(\widehat{\theta }_R\)
In this subsection we shall establish the asymptotic normality of \(\widehat{\theta }_R\). For this we need the following assumptions, where \(\mathcal{U}(b):=\{t\in {\mathbb R}^q; \Vert t\Vert \le b\}\), and \(0<b<\infty \).
\(\forall \,\epsilon>0, \, \exists \, \delta >0\) and \(n_\epsilon <\infty \) such that for each \(s\in \mathcal{U}(b)\), \(\forall \, n>n_\epsilon \),
\(\forall \,\epsilon >0, \, 0<\alpha <\infty ,\, \exists \, N\equiv N_{\alpha ,\epsilon }\) and \(b\equiv b_{\epsilon ,\alpha }\) such that
Let
We need to have an alternate representation of the covariance matrix of \(\widehat{T}_n\). Let, for \(z\in {\mathbb R}^p, \quad 0\le v\le 1\),
By (46), \(\kappa _z\) is a uniformly continuous increasing and bounded function on [0, 1], for all \(z\in {\mathbb R}^p\). Let U denote a uniform [0, 1] r.v. Conditionally, given Z, \(L_Z(\zeta )\sim _D U\). Hence, \(E\big (\kappa _z\big (L_Z(\zeta )\big )\big |Z\big )=E \kappa _z(U)\) so that \(E\big (\kappa _z^c(L_Z(\zeta ))\big |Z\big )=E\kappa _z^c(U)=0,\) a.s. Let \(\mu ^c(z):= \dot{\nu }^c(z) \dot{\nu }^c(z)'\). Argue as for (44) and use the facts that \(\sum _{i=1}^n\dot{\nu }^c(Z_i)\equiv 0\) and \(\int \limits _0^1 u\, d\kappa _z(u)= \kappa _z(1) - \int \limits _0^1 \kappa _z(u)\, du\) to obtain that
Define
Then argue as in (45) to obtain
We are now ready to state the following asymptotic normality result for \(\widehat{\theta }_R\).
Lemma 3
Suppose the nonlinear Berkson measurement error model (12) and the assumptions (13), (14), (46)–(49) hold. Then the following holds.
In addition, if (50) holds and \(\widehat{\varOmega }_\theta \) is positive definite, then \( n^{1/2}(\widehat{\theta }_R-\theta ) \rightarrow _D N\big (0, \widehat{\varOmega }_\theta ^{-1}\widehat{\varSigma }_\theta \widehat{\varOmega }_\theta ^{-1}\big ). \)
The proof of this lemma is similar to that of Theorem 1.2 of Koul [15], hence no details are given here. Assumption (50) is used to show that \(n^{1/2}\Vert \widehat{\theta }_R-\theta \Vert =O_p(1).\)
Remark 4
As in Remark 2, let \(m_\theta (x)=\theta ' h(x)\). Then \(\nu _\vartheta (z)= \vartheta '\beta (z)\), where \(\beta (z):= E\big (h(X)|Z=z\big )\). Thus \(\dot{\nu }_\vartheta (z)\equiv \beta (z)\) and the assumptions (47)–(49) are vacuously satisfied. The assumption (50) is shown to be satisfied by an argument similar to the one used in the proof of Lemma 5.4.4 of Koul [16, pp. 183–185]. This proof uses the monotonicity in t, for every unit vector \(e\in {\mathbb R}^p\), of simple linear rank statistics based on the ranks of \(Y_i-te'h(X_i)\), \(1\le i\le n\), see Hájek [10, Theorem II.7E].
For the asymptotic normality of \(\widehat{\theta }_R\) here, one only needs (46) and \({\varPsi }\) to be a d.f. such that \(\widehat{\varOmega }\) is positive definite. Note that here \(\mu ^c(z)=\beta ^c(z)\beta ^c(z)'\), with \(\beta ^c(z):=\beta (z) - \bar{\beta }\), \(\bar{\beta }:= n^{-1}\sum _{i=1}^n\beta (Z_i)\), and
do not depend on \(\theta \). Clearly, these assumptions are far less stringent than those needed for the asymptotic normality of \(\widehat{\theta }\) corresponding to \(G(x)\equiv x\).
5 M.D. Estimators with Validation Data
In this section we develop the m.d. estimators of Sect. 4 when the d.f. H of the Berkson ME \(\eta \) is unknown but a validation data set is available. Not knowing H renders \(\nu _\theta \) an unknown function. The validation data are used to estimate this function, which in turn is used to define the m.d. estimators.
Let N be a known positive integer. A set of r.v.’s \(\{(\tilde{X}_k, \tilde{Z}_k), k = 1,\ldots ,N\}\) is said to be validation data if these r.v.’s are independent of the primary sample and both \(\tilde{Z}_k\) and \(\tilde{X}_k\) are observable and obey the model (12). Besides the primary data set \(\{(Y_i,Z_i), 1 \le i \le n\}\), we assume that such a validation data set \(\{(\tilde{X}_k, \tilde{Z}_k), 1\le k \le N\}\) is available. Then \(\tilde{\eta }_k := \tilde{X}_k - \tilde{Z}_k, 1 \le k \le N\), are observable and their empirical d.f. \(H_N(s) := N^{-1}\sum _{k=1}^{N} I(\tilde{\eta }_k \le s), s \in \mathbb {R}\), provides an estimate of H.
Under (13)–(15), we have the following estimates of \(\nu _{\theta }\) and \(\dot{\nu }_\theta \).
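Concretely, the plug-in estimates replace H by \(H_N\), giving \(\hat{\nu }_\vartheta (z)=N^{-1}\sum _{k=1}^N m_\vartheta (z+\tilde{\eta }_k)\) and \(\hat{\dot{\nu }}_\vartheta (z)=N^{-1}\sum _{k=1}^N \dot{m}_\vartheta (z+\tilde{\eta }_k)\) (cf. the bounds used below (57)). A one-function sketch, with names ours:

```python
import numpy as np

def nu_hat(theta, z, m, eta_tilde):
    # Plug-in estimate of nu_theta(z): average m_theta over the observed
    # validation errors eta_tilde_k = X_tilde_k - Z_tilde_k.
    return np.mean(m(theta, z + eta_tilde))
```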
An analog of \(\widehat{\theta }\) in the current set up is defined as follows. Let
To define the analog of \(\hat{\theta }_R\) here, let \(\tilde{S}_{i\vartheta }\) be the rank of \(Y_i - \hat{\nu }_\vartheta (Z_i)\) among \(Y_j - \hat{\nu }_\vartheta (Z_j), 1\le j\le n\) and define
The asymptotic distributions of \(\widehat{\theta }_1\) and \(\tilde{\theta }_R\) as \(n\wedge N\rightarrow \infty \) are described in the next two subsections. In their derivations, the limit \(\lambda := \lim (n/N)\) of the ratio of the sample sizes plays an important role. Some of the proofs are similar to those for \(\widehat{\theta }\) and \(\widehat{\theta }_R\). Some key steps of the proofs can be found in the Appendix.
5.1 Asymptotic Distribution of \(\widehat{\theta }_1\)
In this subsection we derive the asymptotic distribution of \(\widehat{\theta }_1\). In addition to (13)–(15) and (17)–(25), the following assumptions are needed, where \({\varDelta }_\vartheta (z) := \hat{\nu }_\vartheta (z) - \nu _\vartheta (z)\) and \(\theta \) is as in (12).
We also assume that (18)–(24) and (25) hold with \(\dot{\nu }_{nt}\), \(\dot{\nu }_{\theta }\) and D replaced by \(\hat{\dot{\nu }}_{nt}\), \(\hat{\dot{\nu }}\) and \(D_1\), respectively. We denote these assumptions as (18)\(^*\)–(25)\(^*\).
Here we discuss some sufficient conditions for the above assumptions. By the C-S inequality, both the expressions in (52) are bounded from above by \(2 E\big \Vert \dot{m}_\theta (X)\big \Vert ^2 E\big |m_\theta (X)\big |^2\). Thus (52) is implied by (33)(a) and \(E\big |m_\theta (X)\big |^2<\infty \).
Next, under (33)(a), (57) is trivially satisfied when \(G(x)\equiv \delta _0(x)\). In the case \(dG(x)=g(x)dx\) with \(g_\infty :=\sup _{y\in {\mathbb R}}g(y)<\infty \), (57) is implied by (33)(a) and the following conditions.
To see this, note that \(E\big (|{\varDelta }_\theta (Z)|\big |Z\big )\le 2 E\big (|m_\theta (X)|\big | Z\big )\), and \(\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \le N^{-1}\sum _{k=1}^N\Vert \dot{m}_\theta (Z+\tilde{\eta }_k)\Vert ^2\) so that \(E\Vert \hat{\dot{\nu }}_\theta (Z)\Vert ^2 \le E\Vert \dot{m}_\theta (X)\Vert ^2\). Now argue as in Remark 3 and use these facts to obtain that the left hand side of (57) is bounded from above by
by (33)(a) and (59). Similarly, the left hand side of (58) is bounded from above by a constant multiple of
We now turn to proving the asymptotic normality of \(\widehat{\theta }_1\). Similar to Sect. 4.1, we first prove that \(E(D_1(\theta ))<\infty \). Recall \({\varDelta }_\vartheta (z) := \hat{\nu }_\vartheta (z) - \nu _\vartheta (z)\) and rewrite
By the independence of the primary and validation data and a conditioning argument, for every \(x>0\),
Hence by (56), \(ED_1(\theta )<\infty \).
Next we sketch the proof of the AULQ property of \(D_1(\vartheta )\). Define
In the Appendix, we show that \(\tilde{T}_n\) is approximated by a U-statistic based on the two independent samples. Theorem 6.1.4 in Lehmann [17] yields
Next, the assumptions (54)–(58) and (18)\(^*\)–(24)\(^*\) ensure that the analog of Lemma 6 holds here also. Hence (41), with \(T_n\) and \(D(\vartheta )\) replaced by \(\widetilde{T}_n\) and \(D_1(\vartheta )\), respectively, holds. Moreover, the analog of (42) can be shown to hold in a similar manner as in Sect. 4 under (25)\(^*\). Consequently, the asymptotic distribution of \(\widehat{\theta }_1\), based on the data sets \(\{(Y_i,Z_i),1\le i\le n\}\) and \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\), is described in the following lemma.
Lemma 4
Suppose model (12) with H unknown holds and an independent validation data \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) obeying (12) is available. In addition assume that (17)–(25) and (52)–(57) hold. Then
The above result shows that the estimation of the regression function \(\nu _\theta (z)\), necessitated by the unknown distribution H, introduces additional variation in the asymptotic distribution of the m.d. estimators. Moreover, the limiting ratio \(\lambda \) of the sample sizes governs this additional variation. When \(\lambda = \lim n/N = 0\), the additional covariance term vanishes and the result reduces to the case when the ME distribution is known. In other words, when the validation sample size N is sufficiently large compared to the primary sample size n, both \(\widehat{\theta }\) and \(\widehat{\theta }_1\) achieve the same asymptotic efficiency. On the other hand, when \(\lambda = \infty \), i.e., when the validation data size is very limited compared to the primary data size, the consistency rate is restricted to \(\sqrt{N}\) instead of \(\sqrt{n}\).
5.2 Asymptotic Distribution of \(\tilde{\theta }_R\)
In this subsection we present the asymptotic distribution of the class of estimators \(\tilde{\theta }_R\). First, we state the additional assumptions. Let \(\hat{\dot{\nu }}_{nt}(z) = N^{-1}\sum _{k=1}^N \dot{m}_{\theta +n^{-1/2}t}(z+\tilde{\eta }_k)\). Consider the following assumptions.
\(\forall \,\epsilon> 0,\,\,\exists \,\,\delta > 0\) and \(n_\epsilon < \infty \) such that for each \(s \in \mathcal{U}(b)\), \(\forall \,n>n_\epsilon \),
For every \(\epsilon >0, \, 0<\alpha <\infty \), there exist an \(N_\epsilon \) and \(b\equiv b_{\epsilon ,\alpha }\) such that
Next, define \(\tilde{\dot{\nu }} := n^{-1} \sum _{i=1}^n\hat{\dot{\nu }}(Z_i), \qquad \tilde{\dot{\nu }}^c (Z_i) := \hat{\dot{\nu }}(Z_i) - \tilde{\dot{\nu }},\)
where \(\widehat{\varGamma }_\theta \) and \(\widehat{\varOmega }_\theta \) are defined in Sect. 4.2. Similar to Lemma 4, we have the following lemma.
Lemma 5
Suppose model (12) with H unknown holds and an independent validation data \(\{(\tilde{X}_k,\tilde{Z}_k),1\le k\le N\}\) obeying (12) is available. In addition assume that (54), (55), (61)–(65) hold. Then, for \(0\le \lambda < \infty \),
Moreover, \( N^{1/2}(\tilde{\theta }_R - \theta ) \rightarrow _D N(0,\widehat{\varOmega }_\theta ^{-1}{\varSigma }_2 \widehat{\varOmega }_\theta ^{-1}) \), for \(\lambda =\infty \).
See Appendix for some details of the proof.
6 Data Analysis
Example. We shall now compute the above estimators based on some real data. The data pertain to the study of the relationship between the enzyme reaction speed (Y) and the basal density (X) of the UDP-galactose, see Bates and Watts [1], p. 70. A suitable model commonly used to analyze this data is the Michaelis-Menten model
\(m_\theta (x)= \theta _1 x/(\theta _2 + x), \quad \theta =(\theta _1, \theta _2)'.\)
In the primary data, consisting of \(n=30\) observations, the basal density variable was measured using a simple chemical method. It was believed that this method caused measurement error in the observations. Hence, in the validation data, consisting of \(N=10\) observations, an expensive procedure with a precision machine tool was used to produce precise observations of the basal density. Let Z denote the basal density obtained by the chemical method, in parts per million (ppm), \(\widetilde{Z}\) denote the basal density obtained by the exact measurement (ppm), and Y the reaction speed (counts/min\(^2\)).

Table 2 gives the m.d. estimators \(\widehat{\theta }_1\) with \(G(x)=x\) and \(\tilde{\theta }_R\) with \({\varPsi }(u) = u\), based on the primary and validation data below, and the naive least squares estimator \(\widehat{\theta }_\mathrm{nLS}\) obtained by ignoring the measurement errors. The MSEs are calculated by using the following formulas, where \(\widetilde{\eta }_k=\widetilde{X}_k-\widetilde{Z}_k\):

\(MSE(\widehat{\theta }_1) = \frac{1}{n} \sum _{i=1}^n \Big [Y_i - \frac{1}{N}\sum _{k=1}^N m_{\widehat{\theta }_1}(Z_i+\widetilde{\eta }_k)\Big ]^2,\)

\(MSE(\tilde{\theta }_R) = \frac{1}{n} \sum _{i=1}^n \Big [ Y_i - \frac{1}{N}\sum _{k=1}^N m_{\tilde{\theta }_R}(Z_i+\widetilde{\eta }_k)\Big ]^2,\)

\(MSE(\widehat{\theta }_\mathrm{nLS}) = \frac{1}{n} \sum _{i=1}^n \Big [Y_i - m_{\widehat{\theta }_\mathrm{nLS}}(Z_i)\Big ]^2.\)

Figure 1 presents the fitted regression curves using the three estimators. The primary and validation data are as follows.
Primary data:

Z | 0.02 | 0.02 | 0.04 | 0.04 | 0.06 | 0.06 | 0.08 | 0.08 | 0.11 | 0.11 | 0.14 | 0.14 | 0.18 | 0.18 | 0.22
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Y | 76 | 47 | 82 | 95 | 97 | 107 | 118 | 127 | 123 | 139 | 146 | 149 | 157 | 151 | 159
Z | 0.22 | 0.28 | 0.28 | 0.34 | 0.34 | 0.42 | 0.42 | 0.56 | 0.56 | 0.66 | 0.66 | 0.86 | 0.86 | 1.10 | 1.10
Y | 152 | 173 | 180 | 179 | 182 | 185 | 189 | 191 | 192 | 193 | 196 | 198 | 202 | 207 | 204
Validation data:

\(\widetilde{Z}\) | 0.04 | 0.07 | 0.20 | 0.30 | 0.38 | 0.48 | 0.60 | 0.76 | 0.95 | 1.110
---|---|---|---|---|---|---|---|---|---|---
\(\widetilde{X}\) | 0.035 | 0.076 | 0.207 | 0.295 | 0.388 | 0.486 | 0.601 | 0.754 | 0.952 | 1.112
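For reproducibility, a sketch in Python of the naive least squares fit and the MSE formulas above; the m.d. fits \(\widehat{\theta }_1\) and \(\tilde{\theta }_R\) would replace the squared-error criterion with the dispersions sketched in Sects. 2–5. Starting values and function names are illustrative, not from the paper.

```python
import numpy as np
from scipy.optimize import minimize

Z = np.array([0.02, 0.02, 0.04, 0.04, 0.06, 0.06, 0.08, 0.08, 0.11, 0.11,
              0.14, 0.14, 0.18, 0.18, 0.22, 0.22, 0.28, 0.28, 0.34, 0.34,
              0.42, 0.42, 0.56, 0.56, 0.66, 0.66, 0.86, 0.86, 1.10, 1.10])
Y = np.array([76, 47, 82, 95, 97, 107, 118, 127, 123, 139,
              146, 149, 157, 151, 159, 152, 173, 180, 179, 182,
              185, 189, 191, 192, 193, 196, 198, 202, 207, 204], float)
Z_tilde = np.array([0.04, 0.07, 0.20, 0.30, 0.38, 0.48, 0.60, 0.76, 0.95, 1.110])
X_tilde = np.array([0.035, 0.076, 0.207, 0.295, 0.388, 0.486, 0.601, 0.754, 0.952, 1.112])
eta_tilde = X_tilde - Z_tilde              # observed validation errors

def mm(theta, x):
    # Michaelis-Menten mean function m_theta(x) = theta_1 x / (theta_2 + x).
    return theta[0] * x / (theta[1] + x)

def mse_naive(theta):
    # MSE(theta_nLS): ignores the measurement error entirely.
    return np.mean((Y - mm(theta, Z)) ** 2)

def mse_corrected(theta):
    # MSE for an ME-corrected fit: the fitted value at Z_i averages m_theta
    # over the validation errors, N^{-1} sum_k m_theta(Z_i + eta_tilde_k).
    fitted = mm(theta, Z[:, None] + eta_tilde[None, :]).mean(axis=1)
    return np.mean((Y - fitted) ** 2)

theta_nLS = minimize(mse_naive, x0=np.array([200.0, 0.1]),
                     method="Nelder-Mead").x
print(theta_nLS, mse_naive(theta_nLS))
```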
References
Bates, D.M., Watts, D.G.: Nonlinear Regression Analysis and Its Applications. Wiley, New York (1988)
Beran, R.J.: Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5, 445–463 (1977)
Beran, R.J.: An efficient and robust adaptive estimator of location. Ann. Statist. 6, 292–313 (1978)
Berkson, J.: Are there two regressions? J. Amer. Statist. Assoc. 45, 164–180 (1950)
Carroll, R.J., Ruppert, D., Stefanski, L.A., Crainiceanu, C.M.: Measurement Error in Nonlinear Models: A Modern Perspective, 2nd edn. Chapman & Hall/CRC, Boca Raton, FL (2006)
Cheng, C.L., Van Ness, J.W.: Statistical Regression with Measurement Error. Wiley, New York (1999)
Donoho, D.L., Liu, R.C.: Pathologies of some minimum distance estimators. Ann. Statist. 16, 587–608 (1988a)
Donoho, D.L., Liu, R.C.: The “automatic” robustness of minimum distance functionals. Ann. Statist. 16, 552–586 (1988b)
Fuller, W.A.: Measurement Error Models. Wiley, New York (1987)
Hájek, J.: Nonparametric Statistics. Holden Day, San Francisco, USA (1969)
Hodges Jr., J.L., Lehmann, E.L.: Estimates of location based on rank tests. Ann. Math. Statist. 34, 598–611 (1963)
Koul, H.L.: Weighted empirical processes and the regression model. J. Indian Statist. Assoc. 17, 83–91 (1979)
Koul, H.L.: Minimum distance estimation in multiple linear regression. Sankhyā Ser. A 47(1), 57–74 (1985a)
Koul, H.L.: Minimum distance estimation in linear regression with unknown errors. Statist. Prob. Lett. 3, 1–8 (1985b)
Koul, H.L.: Asymptotics of some estimators and sequential residual empiricals in non-linear time series. Ann. Statist. 24, 380–404 (1996)
Koul, H.L.: Weighted Empirical Processes in Dynamic Nonlinear Models. Lecture Notes Series in Statistics, 2nd edn., vol. 166. Springer, New York (2002)
Lehmann, E.L.: Elements of Large-Sample Theory. Springer, New York, N.Y., USA (1999)
Rudin, W.: Real and Complex Analysis, 2nd edn. McGraw-Hill, New York (1974)
Yi, G.Y.: Statistical Analysis with Measurement Error or Misclassification: Strategy, Method and Application. With a foreword by Raymond J. Carroll. Springer Series in Statistics. Springer, New York (2017)
Appendix
This section contains some details of the proofs of the various results.
Proof of (41). Let \(\widetilde{M}(t)= \widetilde{D}(\theta +n^{-1/2}t)\), where \(\widetilde{D}(\vartheta )\) is as in (16). Define
Note that \(EV_s(x,t)\equiv E J_s(x,t), \, EW_s(x,t)\equiv 0\). By (15), \(\forall \, s\in {\mathbb R}^q, x\in {\mathbb R}\),
Define
Because of (15), \(\gamma _{nt}(x)\equiv \gamma _{nt}(-x),\, g_n(x)\equiv g_n(-x)\) and we rewrite
Expand the quadratic of the six summands in the integrand to obtain
where
Recall \(\mathcal{U}(b):=\{ t\in {\mathbb R}^q; \Vert t\Vert \le b\}\), \(b>0\). We shall prove the following lemma shortly.
Lemma 6
Under the assumptions (13) to (18), \(\forall \, 0<b<\infty \),
Unless mentioned otherwise, all suprema below are taken over \(t\in \mathcal{U}(b)\). Lemma 6 together with the C-S inequality implies that the supremum over t of all the cross product terms tends to zero in probability. For example, by the C-S inequality,
by (67) used with \(j=1, 5\). Similarly, by (67) with \(j=1\) and (68),
Consequently, we obtain
Expand the quadratic in \(M_8\) to write
where \( \widetilde{T}_n:=\int \limits _0^\infty g_n(x) \big \{ W(x,0)+W(-x,0)\big \} dG(x). \) Let
By the LLNs and an Extended Dominated Convergence Theorem
Moreover, recall \(\widetilde{M}(0)=\widetilde{D}(\theta )\), so that by (39), \(\widetilde{M}(0)=O_p(1)\). These facts together with the C-S inequality imply that
These facts combined with (22), (69), (70) yield that
Now recall that \(D(\vartheta )=2\widetilde{D}(\vartheta )\), \(\widetilde{M}(t)= \widetilde{D}(\theta +n^{-1/2}t)\), \({\varOmega }_\theta =2\int \limits _0^\infty {\varGamma }_\theta {\varGamma }_\theta dG\) and \(T_n =2T_n^*\), see (40). Hence the above expansion is equivalent to
which is precisely the claim (41). \(\square \)
Proof of Lemma 6. Let \(\delta _{it}:= \xi _{it}-n^{-1/2}t'\dot{\nu }(Z_i).\) By (13) and (14),
Hence,
by (14). Moreover, by (14) and the Law of Large Numbers,
These facts will be used in the sequel.
Consider the term \(M_7\). Write
Hence
where
Similarly, by the C-S inequality,
by (18) and (19). Again, by (19) and (71),
These facts prove (67) for \(j=7\).
Next consider \(M_5\). Let \(D_{it}(x):=L_{Z_i}(x +\xi _{it}) - L_{Z_i}(x) -\xi _{it}\ell _{Z_i}( x)\). Then
By the C-S inequality, Fubini Theorem, (20) and (73),
Upon combining these facts with (75) we obtain \(\sup _t M_5(t)=o_p(1)\), thereby proving (67) for \(j=5\). The proof for \(j=6\) is exactly similar.
Now consider \(M_1\). Let \(\xi _t(Z):= \nu _{nt}(Z)- \nu (Z)\). Then
by (21). Thus
To prove that this holds uniformly in \(t\in \mathcal{U}(b)\), because of the compactness of the ball \(\mathcal{U}(b)\), it suffices to show that for every \(\epsilon >0\) there is a \(\delta >0\) and an \(N_\epsilon \) such that for every \(s\in \mathcal{U}(b)\),
Let \(\dot{\nu }_{ntj}(z)\) denote the jth coordinate of \(\dot{\nu }_{nt}(z)\), \(j=1,\ldots , q\) and let
Then
Thus it suffices to prove (77) with \(M_1\) replaced by \(M_{1j}\) for each \(j=1,\ldots , q\).
Any real number a can be written as \(a=a^+-a^-\), where \(a^+=\max (0,a), a^-= \max (0,-a)\). Note that \(a^\pm \ge 0\). Fix a \(j=1,\ldots ,q\), write \(\dot{\nu }_{ntj}(Z_i)=\dot{\nu }_{ntj}^+(Z_i) - \dot{\nu }_{ntj}^- (Z_i)\) and define
Then
Write
Hence
By (23), the first term here satisfies (77). We proceed to verify it for the second term. Fix an \(s\in \mathcal{U}(b)\), \(\epsilon >0\) and \(\delta >0\). Let
By (18), there exists an \(N_\epsilon \) such that \(P(B_n)>1-\epsilon \), for all \(n>N_\epsilon \). On \(B_n\), \(\xi _{is}-{\varDelta }_{ni}\le \xi _{it}\le \xi _{is}+ {\varDelta }_{ni}\) and, by the nondecreasing property of the indicator function and d.f., we obtain
Let
The above inequalities and \(\dot{\nu }_{njs}^+(Z_i)\) being nonnegative yield that on \(B_n\),
Note that \(\max _{1\le i\le n}(|\xi _{is}|+{\varDelta }_{ni})=o_p(1)\). Argue as for (76) to see that the first two terms in the above bound are \(o_p(1)\), while the last term is bounded from above by
The first summand in the above bound is bounded above by
because the first factor tends to zero in probability by (20) and the second factor satisfies
The second term in the upper bound of (80) is bounded from above by
Since the factor multiplying \(\delta ^2\) is positive, the above term can be made smaller than \(\epsilon \) by the choice of \(\delta \). Hence (77) is satisfied by the second term in the upper bound of (79). This then completes the proof of \(R_j^+\) satisfying (77). The details of the proof for verifying (77) for \(R_j^-\) are exactly similar. These facts together with the upper bound of (78) show that (77) is satisfied by \(M_{1j}\) for each \(j=1,\ldots , q\). This also completes the proof of \(\sup _t M_1(t)=o_p(1),\) thereby proving (67) for \(j=1\). The proof for \(j=3\) is similar.
Next, consider \(M_2\). Recall \(\beta _i(x):= I(\zeta _i\le x)-L_{Z_i}(x)\). Then
Because \(E(\beta _i(x)|Z_i)\equiv 0\), a.s., we have
by (18). Thus
To prove this holds uniformly in \(t\in \mathcal{U}(b)\), we shall verify (77) for \(M_2\). Accordingly, let \(\delta >0\), \(s\in \mathcal{U}(b)\) be fixed. Then for all \(t\in \mathcal{U}(b)\) such that \(\Vert t-s\Vert <\delta \),
This bound, (24) and (81) now readily verify (77) for \(M_2\), which also completes the proof of (67) for \(j=2\). The proof of (67) for \(j=4\) is precisely similar. This in turn completes the proof of Lemma 6. \(\square \)
Proof of (60). Recall (43). Let \(D(z,x):= m_\theta (z + x)-\nu _\theta (z)\). Use the fact \(\widehat{U}(x,\theta ) = \tilde{W}(x,0) + \tilde{W}(-x,0)\), to rewrite
where \(\varphi _z(x)\) is defined as in Sect. 4.1.
For further analysis of \(\tilde{T}_n\), with the two independent samples \(\{(Z_i,\zeta _i),1\le i \le n\}\) and \(\{\tilde{\eta }_k, 1\le k \le N\}\), define the symmetric kernel function \(\phi \) and its projections as follows.
Let \(\tilde{T}_{n1}\) denote the first term in the right hand side of \(\tilde{T}_n\). Then
Note that \(\tilde{T}_{n11}\) is a U-statistic with permutation degree 1 in the primary sample \(\{(Z_i,\zeta _i),1\le i \le n\}\) and permutation degree 2 in the validation sample \(\{\tilde{\eta }_k, 1\le k \le N\}\). Theorem 6.1.4 in Lehmann [17] and (52) yield that, for \(0\le \lambda <\infty \),
Moreover, for \(\lambda = \infty \), Theorem 6.1.4 in Lehmann [17] also yields that \( \sqrt{N/n} \, \tilde{T}_{n11} \rightarrow _D N(0,4{\varSigma }_1). \)
Similarly, \(\tilde{T}_{n12}\) is a U-statistic with permutation degree 1 for both samples. Since (52) implies that \(E\{\Vert \dot{m}_\theta (X) [m_\theta (X) - \nu _{\theta }(Z)]\Vert \}<\infty \), we have \(E \tilde{T}_{n12} = O(n^{-1/2})\). Moreover, Theorem 6.1.3 on U-statistics in Lehmann [17] implies that \(\text{ Var }(\tilde{T}_{n12}) = O(n^{-1})\) and hence \(\tilde{T}_{n12} = o_p(1)\). Hence the claim (60) follows.
Proof of (66). Let \(\dot{D}_{ijk}:= \dot{m}_\theta (Z_i+\tilde{\eta }_k) - \dot{m}_\theta (Z_j+\tilde{\eta }_k)\). Based on the definitions of \(\widehat{\varGamma }_\theta (u)\) and \(\kappa _z(v)\), \(T_{n,R}\) can be rewritten as
First, we study the asymptotic distribution of \(T_{n,R1}\). Define for \(1\le i,j \le n, i\ne j\), and \(1\le k\le N\),
Then \(T_{n,R1}\) can be rewritten as
By the definition of U-statistics in Lehmann [17], \(T_{n,R1}\) is a two sample U-statistic based on the function \(\psi _1\) with permutation degree 2 on the sample \(\{(Z_i,\zeta _i),1\le i\le n\}\) and permutation degree 1 on the sample \(\{\tilde{\eta }_k,1\le k\le N\}\). Because, conditionally on \(Z_i\), \(L_{Z_i}(\zeta _i)\) is a uniformly distributed r.v., we have \(E\big (\kappa _{Z_i}(L_{Z_i}(\zeta _i))\big |Z_i\big ) = \int \limits _0^1 \kappa _{Z_i}(u)\,du =: K(Z_i)\). The conditional expectations of \(\psi _1\) can then be calculated as follows.
It can be seen that Cov\((E(\psi _1|Z_1,\zeta _1)) = \widehat{\varSigma }_\theta \) as defined in Sect. 4.2. Then Theorem 6.1.4 in Lehmann [17] yields that
Next, in order to study \(T_{n,R2}\), define
Then \(T_{n,R2}\) can be rewritten as a two sample U-statistic with permutation degree 2 for both primary sample and validation sample.
The conditional expectations of \(\psi _2\) are calculated as
Then Theorem 6.1.4 in Lehmann [17] shows that, for \(0\le \lambda < \infty \),
The two terms \(T_{n,R1}\) and \(T_{n,R2}\) are asymptotically independent because of the independence between the primary sample and the validation sample. In fact, \(T_{n,R1}\) is based on \(E(\psi _1|Z_1,\zeta _1)\) and \(T_{n,R2}\) is based on \(E(\psi _2|\tilde{\eta }_1)\). Therefore, (66)(a) holds. An argument similar to the one used for (51) yields that \( \sup _{\Vert t\Vert \le b} |\tilde{\mathcal{K}}(\theta + n^{-1/2}t) - \tilde{\mathcal{K}}_R(t)| = o_p(1), \) which in turn yields the claim (66)(b) about \(\tilde{\theta }_R\).
When \(\lambda = \infty \), by Theorem 6.1.4 in Lehmann [17], \( \sqrt{N/n} \, T_{n,R2} \rightarrow _D N(0,{\varSigma }_2). \) Then \(\sqrt{N/n} \,\tilde{T}_{n,R} = \sqrt{N/n} \,\tilde{T}_{n,R1} + \sqrt{N/n} \,\tilde{T}_{n,R2} \rightarrow _D N(0,{\varSigma }_2)\). Therefore, we obtain that \(\sqrt{N}(\tilde{\theta }_R - \theta )\rightarrow _D N(0,\widehat{\varOmega }_\theta ^{-1} {\varSigma }_2 \widehat{\varOmega }_\theta ^{-1})\) for \(\lambda =\infty \). \(\Box \)