Abstract
For a linear regression model subject to left-truncation and right-censoring where the truncation and censoring points are known constants (or always observed if random), Karlsson and Laitila (Stat Probab Lett 78:2567–2571, 2008) proposed a semiparametric estimator which deals with left-truncation by trimming and right-censoring by ‘winsorizing’. The estimator was motivated by a zero moment condition in which a transformed error term appears with trimmed and winsorized tails. This paper takes the semiparametric estimator further by deriving the asymptotic distribution that was not shown in Karlsson and Laitila (Stat Probab Lett 78:2567–2571, 2008) and briefly discusses aspects of its implementation in practice.
1 Introduction
Left-truncated and right-censored (LTRC) data occur in many situations due to, e.g., the sampling procedure, the study design and/or the measuring instrument. Left-truncation of a response variable in a regression model means that neither the response nor the explanatory variables are observed if the value of the response is smaller than a truncation point \(t\). Right-censoring of the response variable means that the observed value, \(Y\), is the minimum of the (latent) value of the response, \(Y^{*}\), and the censoring point \(c\), i.e., whenever the value of the response is larger than \(c\) its true value is not observed but \(c\) is recorded instead.
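As a concrete illustration of these definitions, the following sketch simulates LTRC sampling; the coefficient values, error distribution and sample size are arbitrary choices for illustration only.

```python
# Simulate Y* = b0 + b1*x + U; keep a unit only if Y* > t (left-truncation:
# otherwise neither Y* nor x is observed), then record min(Y*, c)
# (right-censoring).  All numeric values here are hypothetical.
import random

random.seed(0)
t, c = 0.0, 2.0            # truncation and censoring points (known constants)
b0, b1 = 0.5, 1.0          # illustrative "true" coefficients

sample = []
while len(sample) < 200:
    x = random.uniform(-1, 1)
    u = random.gauss(0, 1)
    y_star = b0 + b1 * x + u
    if y_star <= t:        # truncated unit: dropped entirely from the sample
        continue
    y = min(y_star, c)     # censored: c is recorded whenever y* exceeds c
    sample.append((x, y, y == c))
```

Every retained response thus satisfies \(t<Y\le c\), with the flag marking censored observations.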
One example of LTRC can be seen in an income survey where only individuals with income above a certain level get sampled (truncation) and then their income is top-coded (censoring). Another example is studies of damage sizes measured as claims paid by insurance companies, where only those damages larger than the deductibles are reported to the insurance companies (truncation) and where there is an upper limit of indemnification (censoring). LTRC also occurs when an individual is observed for an event \(A_{1}\), but gets sampled only when another event \(A_{0}\) holds. For instance, in studying the duration of unemployment, suppose the sampling is done only from individuals that have been unemployed for more than a specific number of days, e.g., long enough to receive unemployment insurance benefits. Here the truncation point is a known constant, i.e., fixed. Depending on how the study ends, the censoring point can be fixed or random, and known or unknown. If the study ends on a fixed calendar date, the censoring point is random, but known for everybody. If the study follows everybody until a fixed duration of unemployment (e.g., until the compensation days from unemployment insurance have run out), the censoring point is known and fixed. Suppose instead that the sampling is done among the individuals (still) unemployed at a fixed calendar date; then the truncation point is random and unknown.
In this paper we consider linear regression models where the truncation and censoring points are both known constants or always observed (i.e., known) if they are random.
Ignoring the truncation and/or censoring, e.g., by using the least squares (LS) approach to estimate a regression model, when the response variable is in fact LTRC, can yield very misleading estimates. Moreover, maximum likelihood based estimators are rather sensitive to distributional misspecification, i.e., the specification of the distribution of the error term, when data is LTRC. The goal of this paper is to estimate the regression coefficients without specifying the distribution of the error term in the regression model under LTRC. This is accomplished by using the semiparametric estimator proposed by Karlsson and Laitila (2008). This paper takes the estimator further by deriving the asymptotic distribution that was not shown in their paper.
There are many estimators for LTRC data (see e.g., Shen 2009, 2012, and the references therein) but they are mostly for two-sample location differences. For regression parameters, the estimators are rather complicated (Lai and Ying 1991, 1994; Gross and Lai 1996), involving estimation of the \(Y^{*}\) distribution. Moreover, most of them require random LTRC data. The estimator of Karlsson and Laitila (2008) is a simple semiparametric estimator for LTRC when the truncation and censoring points are fixed (or always observed if random). While its simplicity is its advantage, the disadvantage is that it is inapplicable when the censoring point is unknown whenever it exceeds \(Y^{*}\).
Section 2 explains the estimator, and Sect. 3 examines its asymptotic distribution and discusses its implementation. The paper ends with a section with conclusions.
2 The estimator and its main idea
For each individual \(i\) in the population,
where \(X_{i}= (1,\tilde{X}_{i}^{\prime })^{\prime }\), \(\tilde{X}_{i}\equiv (X_{i2},...,X_{ik})^{\prime }\) is a \((k-1)\times 1\) regressor vector, \(\beta \equiv (\beta _{1},\tilde{\beta }^{\prime })^{\prime }\) is a \(k\times 1\) coefficient vector with \(\tilde{\beta }\) being the \((k-1)\times 1\) slope coefficients, and \(U_{i}\) is an error term independent of \(X_{i}\) (“\(U\amalg X\)”) with its density function (pdf) and distribution function (cdf) denoted by \(f_{U}\) and \(F_{U}\).
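The model display (1) did not survive extraction; from the definitions in this paragraph and the recap in Sect. 2.2 (\(Y^{*}=\beta _{1}+\tilde{X}^{\prime }\tilde{\beta }+U\)), it presumably reads

\[
Y_{i}^{*}=X_{i}^{\prime }\beta +U_{i}=\beta _{1}+\tilde{X}_{i}^{\prime }\tilde{\beta }+U_{i}. \qquad (1)
\]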
Now, consider a LTRC linear regression model with, for simplicity, left-truncation at \(t=0\) and right-censoring at a known constant \(c>0\) for all individuals. Let \(Y_{i}\equiv \min (Y_{i}^{*},c)\) and sample \(N\) units \((X_{i},Y_{i})\) from the sub-population (or stratum) where \(Y_{i}^{*}>t=0\). Henceforth, if possible, the subscripts \(i\) of \(Y_{i}^{*}, Y_{i}, X_{i}\) and \(U_{i}\) are dropped for convenience.
It is easy to allow for a known non-zero truncation constant by subtracting it from the response, the intercept and from the censoring point. Similarly, we can, as long as they are observed (i.e., known), allow left-truncation at \(t_{i}\) and right-censoring at \(c_{i}\), i.e., truncation and censoring points varying across individuals. The main finding of this paper still holds conditionally on \((t_{i},c_{i})\) so long as \(Y_{i}^{*}\amalg (t_{i},c_{i})\) given \(X_{i}\).
Without truncation/censoring, an estimate of \(\beta \) in (1) can be calculated by applying the LS estimator under the moment condition of \(E(XU)=0\). With LTRC data, Karlsson and Laitila (2008) proposed an estimator motivated by a zero moment condition where a transformed error trimmed at the lower end and winsorized at the upper tail appears. Henceforth, the estimator of Karlsson and Laitila (2008) is called the “trimmed and winsorized estimator” (TWE). The moment condition of TWE is a combination of the two moment conditions under which, the two estimators suggested in Karlsson (2006) for left truncation and left censoring respectively, are derived. They, in turn, are generalizations of the “quadratic mode regression estimator” (QME) by Lee (1993) and the “winsorized mean estimator” (WME) by Lee (1992). In order to explain the TWE, we first take a closer look at QME and WME and their properties.
2.1 The QME and the WME
If data is left-truncated at \(0\) (but not right-censored), i.e., only observed if \(0<Y^{*}\) or equivalently only observed if \(-X^{\prime }\beta <U\), the QME can be used. The QME is derived from the moment condition
where \(1[A]\) denotes an indicator function taking value \(1\) if condition \(A\) is valid and \(0\) otherwise and \(w>0\) is a “trimming” constant. This means that the “trimmed error” \(1[-w<U<w]U\) is orthogonal to \(X\). The factor \(1[X^{\prime }\beta >w]\) in (2) is for \(-w<U\) to hold when \(-X^{\prime }\beta <U\). QME is consistent under semiparametric assumptions—e.g., partial symmetry of \(U|X\) up to \(\pm w\).
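The display (2) itself was lost in extraction; the description above (the trimmed error orthogonal to \(X\), with the factor \(1[X^{\prime }\beta >w]\)) implies that it plausibly reads

\[
E\big\{X\,1[X^{\prime }\beta >w]\,1[-w<U<w]\,U\big\}=0. \qquad (2)
\]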
In contrast to QME, when there is only left censoring at \(0\), the WME uses another modified moment condition
instead of trimming \(U\) when \(|U|\ge w\), \(U\) gets replaced by \(\pm w\). The “winsorized error” \(-w1[U\le -w]+1[-w<U<w]U+w1[w\le U]\) is orthogonal to \(X\) under the partial symmetry of \(U\) up to \(\pm w\) and \(P(U\le -w)=P(U\ge w)\).
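The display (3) was also lost in extraction. A plausible reconstruction, using the winsorized error just described (the factor \(1[X^{\prime }\beta >w]\) presumably appears so that the winsorized error is computable from censored observations), is

\[
E\big\{X\,1[X^{\prime }\beta >w]\,\big(-w1[U\le -w]+1[-w<U<w]U+w1[w\le U]\big)\big\}=0. \qquad (3)
\]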
Whereas the LS estimator has \(E(XX^{\prime })\) as its second order matrix, QME has (\(-\)1 times)
where \(1-F_{U}(-X^{\prime }\beta )\) is the truncation normalizing factor (see the left panel of Fig. 1). For \(H_{QME}\) to be positive definite, \(f_{U}\) should be strictly unimodal at \(0\), explaining the word “mode” in QME.
The asymptotic variance of WME has as its second order matrix
(see right panel of Fig. 1) which differs somewhat from (4).
Although we used \(f_{U}\) for QME and WME, QME and WME allow heteroskedasticity of unknown form, in which case \(f_{U|X}\) replaces \(f_{U}\). It is the local symmetry assumption up to \(\pm w\) that assures \(E(1[|U|<w]U)=0\) while allowing for heteroskedasticity of unknown form.
2.2 The trimmed and winsorized estimator
In QME and WME, where only one tail of \(U\) is subject to truncation/censoring, symmetric trimming/winsorizing is done artificially at the other tail of \(U\). But symmetric trimming/winsorizing is not feasible under LTRC as both tails of \(U\) are subject to truncation/censoring. The left-truncation and right-censoring points give the interval \((-X^{\prime }\beta ,c-X^{\prime }\beta ]\) for \(U\). In this case, a transformed error trimmed at the lower end and winsorized at the upper end can be used, for some chosen trimming constants \(w_{l},w_{u}>0\). The trimming and winsorizing interval \((-w_{l},w_{u})\) should be placed within the range of \(U\), such that,
For this, it is necessary to have
for which it is in turn necessary to have \(w_{l}+w_{u}<c\). Added to this condition is ‘\(0<w_{u}<w_{l}\)’ to help the expected value of the trimmed and winsorized error
to be equal to zero conditional on \(X\). Because of the restrictions \(w_{l}+w_{u}<c\) and \(0<w_{u}<w_{l}\), we can choose \(w_{u}\) from \((0,c/2)\) first, and then \(w_{l}\) over \((w_{u},c-w_{u})\).
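Collecting this paragraph's omitted displays, plausible reconstructions are: the placement condition \(-w_{l}>-X^{\prime }\beta \) and \(w_{u}<c-X^{\prime }\beta \), i.e., for (6),

\[
w_{l}<X^{\prime }\beta <c-w_{u}, \qquad (6)
\]

and, for the trimmed and winsorized error in (7),

\[
1[-w_{l}<U<w_{u}]\,U+w_{u}1[w_{u}\le U]. \qquad (7)
\]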
In view of (6), one may try to use
instead of (7), but this fails because the intercept that makes its expected value, conditional on \(X\), equal to zero depends on \(X\). Using the constant interval \( (-w_{l},w_{u})\) falling within the \(X\)-varying interval \((-X^{\prime }\beta ,c-X^{\prime }\beta )\) is the key idea.
The restriction on \(X^{\prime }\beta \) in (6) combined with the trimmed and winsorized error in (7), yields the key unconditional moment condition under LTRC; \(E\{m(X,Y^*;\beta )\}=0\) (or just \(E\{m(\beta )\}=0\) if \((X,Y^*)\) is omitted), where
This is thus a combination of the corresponding moment conditions of QME and WME, (2) and (3) respectively.
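The moment function \(m\) can be sketched in code; since the displayed formula was omitted above, the exact form below is inferred from the restriction (6) and the transformed error (7), and all names and trimming constants are illustrative.

```python
# Sketch of the TWE moment function inferred from the text: psi trims the
# residual below -w_l and winsorizes it above w_u, and m multiplies psi(y-x'b)
# by x on the region w_l < x'b < c - w_u (restriction (6) as an indicator).
def psi(r, w_l, w_u):
    if r <= -w_l:
        return 0.0            # trimmed lower tail
    if r >= w_u:
        return w_u            # winsorized upper tail
    return r

def m(x, y, b, w_l, w_u, c):
    xb = sum(xj * bj for xj, bj in zip(x, b))
    if not (w_l < xb < c - w_u):   # outside the retained region: contributes 0
        return [0.0] * len(x)
    return [xj * psi(y - xb, w_l, w_u) for xj in x]
```

Because \(w_{u}\le c-X^{\prime }b\) on the retained region, \(\psi (Y-X^{\prime }b)\) is computable from the censored response \(Y=\min (Y^{*},c)\).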
Recall from (1) that \(Y^{*}=\beta _{1}+\tilde{X}^{\prime }\tilde{\beta }+U\) and consider
with \(\varDelta \equiv \beta _{1}^{*}-\beta _{1}\), where \(\beta _{1}^{*}\) is a shifted intercept that does not necessarily equal \(\beta _{1}\) and is assumed to exist (Assumption A1, Karlsson and Laitila 2008) for \(E\{m(\beta _1^{*}, \tilde{\beta })\}=0\) to hold. If \(U\) is not independent of \(X\), then \(\beta _{1}^{*}\) may have to change depending on \(X\) because \(w_{l}\) and \(w_{u}\) are fixed. This would then hamper the identification of \(\tilde{\beta }\), which is why \(U\amalg X\) is invoked in this paper although it was not assumed in Karlsson and Laitila (2008). ‘\(U\amalg X\)’ can be relaxed if the form of heteroskedasticity of \(U|X\) is known. For instance, if \(U=\exp (X_{k})V\) where \(X_{k}\) is the \(k\)th element of \(X\) and \(V\) is an error term with \(V\amalg X\), we can replace \(w_{l}\) and \(w_{u}\) with \(w_{l}\exp (X_{k})\) and \(w_{u}\exp (X_{k})\), respectively. Then the above moment condition becomes a moment condition involving \(V\), \(w_{l}\) and \(w_{u}\). But since a known form of heteroskedasticity is rare, we will proceed with \(U\amalg X\).
The TWE is defined using a minimand in Karlsson and Laitila (2008),
where
This minimand has three components depending on where \(X^{\prime }b\) locates relative to \(w_{l}\) and \(c-w_{u}\), and each component can take three different forms as can be seen in \(g(\cdot )\). The first term \(w_{l}^{2}/2\) of \(g(\cdot )\) appears for the QME maximand to make use of the truncated part, the second term \(r^{2}/2\) is the usual squared residual in the LS estimator, and the last term \(w_{u}r-w_{u}^{2}/2\) appears for the WME minimand to make use of the censored part.
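The piecewise function \(g\) can be sketched as follows. This is a hedged reading of the description above: constant on the trimmed part, quadratic in the middle, linear on the winsorized part, with the pieces joining continuously so that the derivative is exactly the trimmed and winsorized error (7).

```python
# Piecewise loss g inferred from the text: flat beyond -w_l (trimmed part,
# capped at w_l^2/2), the usual squared residual in the middle, and linear
# beyond w_u (winsorized part).  g'(r) is the trimmed-and-winsorized error.
def g(r, w_l, w_u):
    if r <= -w_l:
        return w_l ** 2 / 2.0              # trimmed: loss capped at w_l^2/2
    if r < w_u:
        return r ** 2 / 2.0                # ordinary least-squares part
    return w_u * r - w_u ** 2 / 2.0        # winsorized: linear in r
```

Continuity at the knots (e.g., \(g(w_{u})=w_{u}^{2}/2\) from both sides) is what makes the minimand smooth; the full minimand of Karlsson and Laitila (2008) also has boundary components for \(X^{\prime }b\) outside \((w_{l},c-w_{u})\), omitted in this sketch.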
Karlsson and Laitila (2008) obtained this minimand by integrating the population moment condition back. Their proof uses the proofs in Newey (2001), who showed that integrating a population moment condition back yields a minimand with a unique minimum, and that the estimator minimizing the sample version is \(\sqrt{N}\)-consistent and satisfies the sample moment condition \(N^{-1/2}\sum _{i}m(X_{i},Y_{i};b_{N})=o_{p}(1)\). Karlsson and Laitila (2008) defined \(\beta ^{*}\equiv (\beta _{1}^{*},\tilde{\beta }^{\prime })^{\prime }\) and showed that \(b_{N}\) converges in probability to \(\beta ^{*}\), i.e., \(b_{N}\rightarrow ^{p}\beta ^{*}\).
3 Asymptotic distribution
In addition to the assumptions in Karlsson and Laitila (2008) for the consistency of TWE, with \(|X|\) denoting the Euclidean norm of \(X\), suppose the following assumptions hold:
Assumption U
\(U\amalg X\), \(E|U|<\infty \), and \(f_{U}\) is continuous and satisfies, for \(\varDelta \equiv \beta _{1}^{*}-\beta _{1}\), \(P(\varDelta -w_{l}<U<\varDelta +w_{u})>w_{l}f_{U}(\varDelta -w_{l})/\{1-F_{U}(-X^{\prime }\beta )\}\).
Assumption X
\(E\{XX^{\prime }1[w_{l}<X^{\prime }\beta ^{*}<c-w_{u}]\}\) is positive definite, \(E|X|^{2}<\infty \) and for some constants \(\nu _{0},\nu _{1},\nu _{2}>0\),
(i) \(E\{1[|w_{l}-X^{\prime }\beta ^{*}|<|X|\,|b-\beta ^{*}|]\cdot |X|\}\le \nu _{1}|b-\beta ^{*}|\) and

(ii) \(E\{1[|c-w_{u}-X^{\prime }\beta ^{*}|<|X|\,|b-\beta ^{*}|]\cdot |X|\}\le \nu _{2}|b-\beta ^{*}|\)
for all \(|b-\beta ^{*}|<\nu _{0}\).
Assumption X is for smoothness of expectations of indicator functions of \(b\) such as \(E(1[X^{\prime }b\ge w_{l}])\). Almost the same assumptions appeared in Powell (1984, p. 310), Newey and Powell (1990, p. 304) and Lee (1993, p. 4) in the proofs of asymptotic normality of the estimators of censored or truncated regression models suggested in those papers. Assumption X can be replaced with another assumption that makes \(X^{\prime }\beta \) smooth, e.g., \(F_{k|-k}(x_{k}|x_{-k})\) having a continuously differentiable derivative \(f_{k|-k}(x_{k}|x_{-k})\) in \(x_{k}\) that is uniformly bounded over \(x=(x_{-k}^{\prime },x_{k})^{\prime }\), where \(F_{k|-k}\) is the distribution function of \(X_{k}\) given the other elements \(X_{-k}\) of \(X\). The motivation for Assumption U will be explained shortly.
Theorem AD
Under Assumptions U and X, with ‘\(\leadsto \)’ denoting convergence in law,
where
Theorem AD is proven in the Appendix. Roughly speaking, \(H_{TWE}\) falls between \(H_{QME}\) and \(H_{WME}\), see (4) and (5), because the \(X\)-conditional mean of \( \{\cdot \}\) in \(H_{TWE}\) is “area under the density \(\frac{f_{U}}{1-F_{U}(-X^{\prime }\beta )}\) between \(\varDelta -w_{l}\) and \(\varDelta +w_{u}\)” minus \(w_{l}\frac{ f_{U}(\varDelta -w_{l})}{1-F_{U}(-X^{\prime }\beta )}\). If \(\beta _{1}^{*}=\beta _{1}\Longleftrightarrow \varDelta =0\), then the strict unimodality of \(U\) at \(0\) is enough to make this display positive. But \(\beta _{1}^{*}\) might differ much from \(\beta _{1}\), depending on \( f_{U}\) and \((-w_{l},w_{u})\) so that the interval \((-w_{l},w_{u})\) may fall under a downward sloping part of \(f_{U}\) to make the last display non-positive. This is why Assumption U was imposed.
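The omitted displays of Theorem AD can be plausibly reconstructed from the description in this paragraph and the sandwich form \(\hat{H}_{TWE}^{-1}\hat{G}\hat{H}_{TWE}^{-1}\) estimated in Sect. 3.1:

\[
\sqrt{N}\,(b_{N}-\beta ^{*})\leadsto N\big(0,\;H_{TWE}^{-1}\,E\{m(\beta ^{*})m(\beta ^{*})^{\prime }\}\,H_{TWE}^{-1}\big),
\]

where

\[
H_{TWE}=E\left[XX^{\prime }\,1[w_{l}<X^{\prime }\beta ^{*}<c-w_{u}]\,\frac{F_{U}(\varDelta +w_{u})-F_{U}(\varDelta -w_{l})-w_{l}\,f_{U}(\varDelta -w_{l})}{1-F_{U}(-X^{\prime }\beta )}\right].
\]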
3.1 Estimation and implementation aspects of TWE
Let \(\hat{U}_{i}\equiv Y_{i}-X_{i}^{\prime }b_{N}\rightarrow ^{p}Y_{i}-X_{i}^{\prime }\beta +\beta _{1}-\beta _{1}^{*}=U_{i}-\varDelta \). Then \(\hat{H}_{TWE}^{-1}\hat{G}\hat{H}_{TWE}^{-1}\) can be used to estimate the asymptotic variance, where
and
for a bandwidth \(h\rightarrow 0^{+}\) as \(N\rightarrow \infty \). The expression ‘\(h^{-1}1[-w_{l}\le \hat{U}_{i}\le -w_{l}+h]\)’ in \(\hat{H}_{TWE}\) estimates \(f_{U}(\varDelta -w_{l})/\{1-F_{U}(-X^{\prime }\beta )\}\) nonparametrically. As \(\hat{H}_{TWE}\) is bandwidth-dependent, practitioners may prefer the nonparametric bootstrap instead. This is one reason why we omit proving \(\hat{H}_{TWE}\rightarrow ^{p}H_{TWE}\) and \(\hat{G}\rightarrow ^{p}E\{m(\beta ^{*})m(\beta ^{*})^{\prime }\}\), which would require further assumptions.
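Since the displayed formulas for \(\hat{H}_{TWE}\) and \(\hat{G}\) were omitted above, here is one plausible sample analogue consistent with the bandwidth term quoted in the text. This is a sketch under stated assumptions, not the authors' exact estimator; `psi` is the trimmed-and-winsorized transformation, and the data, coefficients and bandwidth are hypothetical.

```python
# Sample analogues of H_TWE and G as k x k nested lists: H uses the
# indicator-kernel h^{-1} 1[-w_l <= u <= -w_l + h] for the density term,
# and both sums are restricted to the region w_l < x'b < c - w_u.
def psi(r, w_l, w_u):
    return 0.0 if r <= -w_l else (w_u if r >= w_u else r)

def sandwich_parts(data, b, w_l, w_u, c, h):
    k, n = len(b), len(data)
    H = [[0.0] * k for _ in range(k)]
    G = [[0.0] * k for _ in range(k)]
    for x, y in data:
        xb = sum(xj * bj for xj, bj in zip(x, b))
        if not (w_l < xb < c - w_u):      # indicator 1[w_l < x'b < c - w_u]
            continue
        u = y - xb                        # residual, estimating U - Delta
        kern = 1.0 / h if -w_l <= u <= -w_l + h else 0.0   # density estimate
        wgt = (1.0 if -w_l < u < w_u else 0.0) - w_l * kern
        mi = [xj * psi(u, w_l, w_u) for xj in x]
        for a in range(k):
            for d in range(k):
                H[a][d] += wgt * x[a] * x[d] / n
                G[a][d] += mi[a] * mi[d] / n
    return H, G
```

The asymptotic variance is then estimated by \(\hat{H}_{TWE}^{-1}\hat{G}\hat{H}_{TWE}^{-1}\); the matrix inversion is omitted in the sketch.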
The above derivation suggests an iteration algorithm: with \(U_{i}(b_{0})\equiv Y_{i}-X_{i}^{\prime }b_{0}\),
which is to be iterated until convergence; \(b_{N}\) in \(\hat{H}_{TWE}\) should be replaced by \(b_{0}\) as well. In the iteration, \(\hat{H}_{TWE}\) may pose a problem as it may not be positive definite, in which case \(w_{l}h^{-1}1[-w_{l}\le U_{i}(b_{0})\le -w_{l}+h]\) in \(\hat{H}_{TWE}\) may be dropped. If the problem persists, \(1[w_{l}<X_{i}^{\prime }b_{0}<c-w_{u}]\) may be dropped as well.
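The omitted iteration display is presumably a Newton-type step. A hedged reconstruction, with \(\psi \) denoting the trimming and winsorizing transformation in (7), is

\[
b_{1}=b_{0}+\hat{H}_{TWE}^{-1}\,N^{-1}\sum _{i}X_{i}\,1[w_{l}<X_{i}^{\prime }b_{0}<c-w_{u}]\,\psi \{U_{i}(b_{0})\},
\]

i.e., the sample moment evaluated at \(b_{0}\), premultiplied by \(\hat{H}_{TWE}^{-1}\).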
The finite sample bias could be large for the intercept estimator, since it plays the role of satisfying \(E\{m(\beta _{1}^{*},\tilde{\beta })\}=0\), and the magnitude of the intercept bias would depend on \((-w_{l},w_{u})\). Instead of modifying the intercept, TWE might try to adjust the scale of \(U\), but this cannot be done easily because the scale of \(Y\) is fixed and there is no scale parameter to estimate. What could still happen, though, is that all slope estimates get scaled up/down to alter the scale of the residual \(Y-X^{\prime }b_{N}\); this is how the intercept bias affects the slope estimates. In this case, all slope estimates will be biased up/down by the same positive factor. This, however, would not bias the signs of the slope estimates, nor their ratios.
4 Conclusions
LTRC data are a common problem facing researchers, and it is important to use an estimator that takes the truncation and censoring into account so as not to draw incorrect conclusions from misleading results. In this paper we derive the asymptotic distribution of the semiparametric estimator TWE, suggested by Karlsson and Laitila (2008). \(\sqrt{N}\)-consistency of the slope estimates was already shown in Karlsson and Laitila (2008). The results in this paper, together with those in Karlsson and Laitila (2008), suggest that the TWE is a suitable estimator when data is LTRC. However, there are also situations in practice where it has drawbacks, as discussed in the previous section.
References
Gross ST, Lai TL (1996) Nonparametric estimation and regression analysis with left-truncated and right-censored data. J Am Stat Assoc 91:1166–1180
Karlsson M (2006) Estimators of regression parameters for truncated and censored data. Metrika 63:329–341
Karlsson M, Laitila T (2008) A semiparametric regression estimator under left truncation and right censoring. Stat Probab Lett 78:2567–2571
Lai T, Ying Z (1991) Rank regression methods for left-truncated and right-censored data. Ann Stat 19:531–556
Lai TL, Ying Z (1994) A missing information principle and M-estimators in regression analysis with censored and truncated data. Ann Stat 22:1222–1255
Lee MJ (1992) Winsorized mean estimator for censored regression. Econ Theory 8:368–382
Lee MJ (1993) Quadratic mode regression. J Econ 57:1–19
Newey WK (2001) Conditional moment restrictions in censored and truncated regression models. Econ Theory 17:863–888
Newey WK, Powell JL (1990) Efficient estimation of linear and type I censored regression models under conditional quantile restrictions. Econ Theory 6:295–317
Powell J (1984) Least absolute deviations estimation for the censored regression model. J Econ 25:303–325
Shen P (2009) A class of rank-based test for left-truncated and right-censored data. Ann Inst Stat Math 61:461–476
Shen P (2012) Median regression model with left truncated and right censored data. J Stat Plan Inference 142:1757–1766
Van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge
Acknowledgments
Myoung-jae Lee’s research has been supported by a Korea University grant.
Appendix
We will first show that \(E\{m(b)\}\) is continuously differentiable at \(b=\beta ^{*}\) with derivative matrix \(H_{TWE}\). Rewrite \(E\{m(b)\}-E\{m(\beta ^{*})\}\) as
The last two terms are 0 because \(U\amalg X\) and the transformed error with \( Y-X^{\prime }\beta ^{*}\) has mean zero. The first two terms share \( 1[w_{l}<X^{\prime }b<c-w_{u}]X\), and thus we rewrite them as
The following shows that the first two terms give \(H_{TWE}\) and that the remainder is \(o(|b-\beta ^{*}|)\).
With \(f_{U,X^{\prime }\beta }(\cdot )\equiv f_{U}(\cdot )/\{1-F_{U}(-X^{\prime }\beta )\}\), observe, for the first two terms of (8),
Differentiate this for \(b\) to get a continuous derivative vector
Set \(b=\beta ^{*}\), attach \(1[w_{l}<X^{\prime }\beta ^{*}<c-w_{u}]X\) and then take \(E(\cdot )\) to get \(-H_{TWE}\).
Since the transformed error is bounded by \(-w_{l}\) and \(w_{l}\), the last two terms of (8) are less than a constant times
Since the transformed error difference is \(O(|b-\beta ^{*}|)\), this shows that the last two terms of (8) are \(O(|b-\beta ^{*}|^{2})\), and thus \(o(|b-\beta ^{*}|)\).
Consider the stochastic equicontinuity of the empirical process
Since \(m(b)\) consists of indicator and polynomial functions, \(\{m(b),b\in B\}\) with \(B\) being a compact parameter space is a Donsker class with a square-integrable envelope under \(E|X|^{2}<\infty \); see, e.g., Van der Vaart (1998). From the stochastic equicontinuity, \(o_{p}(1)=G_{N}(b_{N})-G_{N}(\beta ^{*})\) holds. Substitute \(N^{-1/2}\sum _{i}m(X_{i},Y_{i};b_{N})=o_{p}(1)\) into this to obtain
From this, Theorem AD follows.
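For completeness, the chain of steps just described can be sketched; this is a hedged reconstruction of the omitted displays. With

\[
G_{N}(b)\equiv N^{-1/2}\sum _{i}\big[m(X_{i},Y_{i};b)-E\{m(b)\}\big],
\]

stochastic equicontinuity gives \(G_{N}(b_{N})-G_{N}(\beta ^{*})=o_{p}(1)\); combined with \(N^{-1/2}\sum _{i}m(X_{i},Y_{i};b_{N})=o_{p}(1)\), \(E\{m(\beta ^{*})\}=0\) and the expansion \(E\{m(b)\}=-H_{TWE}(b-\beta ^{*})+o(|b-\beta ^{*}|)\), this yields

\[
\sqrt{N}\,(b_{N}-\beta ^{*})=H_{TWE}^{-1}\,N^{-1/2}\sum _{i}m(X_{i},Y_{i};\beta ^{*})+o_{p}(1),
\]

from which the asymptotic normality in Theorem AD follows by the central limit theorem.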
Lee, Mj., Karlsson, M. Trimmed and winsorized semiparametric estimator for left-truncated and right-censored regression models. Metrika 78, 485–495 (2015). https://doi.org/10.1007/s00184-014-0513-9