1 Introduction

Left-truncated and right-censored (LTRC) data occur in many situations due to, e.g., the sampling procedure, the study design and/or the measuring instrument. Left-truncation of the response variable in a regression model means that neither the response nor the explanatory variables are observed if the value of the response is smaller than a truncation point \(t\). Right-censoring of the response variable means that the observed value, \(Y\), is the minimum of the (latent) value of the response, \(Y^{*}\), and the censoring point \(c\); i.e., whenever the value of the response is larger than \(c\), its true value is not observed and \(c\) is recorded instead.

One example of LTRC is an income survey where only individuals with income above a certain level get sampled (truncation) and their income is then top-coded (censoring). Another example is studies of damage sizes measured as claims paid by insurance companies, where only those damages larger than the deductibles are reported to the insurance companies (truncation) and where there is an upper limit of indemnification (censoring). LTRC also occurs when an individual is observed for an event \(A_{1}\), but gets sampled only when another event \(A_{0}\) holds. For instance, when studying the duration of unemployment, suppose the sampling is done only from individuals that have been unemployed for more than a specific number of days, e.g., long enough to receive unemployment insurance compensation. Here the truncation point is a known constant, i.e., fixed. Depending on how the study ends, the censoring point can be fixed or random, and known or unknown. If the study ends on a fixed calendar date, the censoring point is random, but known for everybody. If the study follows everybody until a fixed duration of unemployment (e.g., until the compensation days from unemployment insurance have run out), the censoring point is known and fixed. Suppose instead that the sampling is done among the individuals (still) unemployed at a fixed calendar date; then the truncation point is random and unknown.

In this paper we consider linear regression models where the truncation and censoring points are both known constants or always observed (i.e., known) if they are random.

Ignoring the truncation and/or censoring, e.g., by using the least squares (LS) approach to estimate a regression model when the response variable is in fact LTRC, can yield very misleading estimates. Moreover, maximum likelihood-based estimators are rather sensitive to distributional misspecification, i.e., the specification of the distribution of the error term, when data is LTRC. The goal of this paper is to estimate the regression coefficients without specifying the distribution of the error term in the regression model under LTRC. This is accomplished by using the semiparametric estimator proposed by Karlsson and Laitila (2008). This paper takes the estimator further by deriving its asymptotic distribution, which was not given in their paper.

There are many estimators for LTRC data (see, e.g., Shen 2009, 2012, and the references therein), but they are mostly for two-sample location differences. For regression parameters, the estimators are rather complicated (Lai and Ying 1991, 1994; Gross and Lai 1996), involving estimation of the \(Y^{*}\) distribution. Moreover, most of them require random LTRC data. The estimator of Karlsson and Laitila (2008) is a simple semiparametric estimator for LTRC data when the truncation and censoring points are fixed (or always observed if random). While its simplicity is its advantage, its disadvantage is that it is inapplicable if the censoring point is unknown whenever it is greater than \(Y^{*}\).

Section 2 explains the estimator, and Sect. 3 examines its asymptotic distribution and discusses its implementation. The paper ends with a concluding section.

2 The estimator and its main idea

For each individual \(i\) in the population,

$$\begin{aligned} Y_{i}^{*}=X_{i}^{\prime }\beta + U_{i}, \text {independent and identically distributed}\ (iid), \end{aligned}$$
(1)

where \(X_{i}= (1,\tilde{X}_{i}^{\prime })^{\prime }\), \(\tilde{X}_{i}\equiv (X_{i2},...,X_{ik})^{\prime }\) is a \((k-1)\times 1\) regressor vector, \(\beta \equiv (\beta _{1},\tilde{\beta }^{\prime })^{\prime }\) is a \(k\times 1\) coefficient vector with \(\tilde{\beta }\) being the \((k-1)\times 1\) slope coefficients, and \(U_{i}\) is an error term independent of \(X_{i}\) (“\(U\amalg X\)”) with its density function (pdf) and distribution function (cdf) denoted by \(f_{U}\) and \(F_{U}\).

Now, consider a LTRC linear regression model with, for simplicity, left-truncation at \(t=0\) and right-censoring at a known constant \(c>0\) for all individuals. Let \(Y_{i}\equiv \min (Y_{i}^{*},c)\) and sample \(N\) units \((X_{i},Y_{i})\) from the sub-population (or stratum) where \(Y_{i}^{*}>t=0\). Henceforth, if possible, the subscripts \(i\) of \(Y_{i}^{*}, Y_{i}, X_{i}\) and \(U_{i}\) are dropped for convenience.
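As an illustration of this sampling scheme, the following sketch simulates LTRC data; the coefficient values, the censoring point and the normal error are our own choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = np.array([1.0, 0.5])   # illustrative intercept and slope
c = 3.0                       # illustrative known censoring point

def draw_ltrc_sample(N, beta, c, rng):
    """Draw units until N satisfy Y* > 0 (left truncation at t = 0);
    record Y = min(Y*, c) (right censoring at c)."""
    X_list, Y_list = [], []
    while len(Y_list) < N:
        x = np.array([1.0, rng.normal()])    # X = (1, X~)'
        y_star = x @ beta + rng.normal()     # latent response with U ~ N(0, 1)
        if y_star > 0.0:                     # only truncated units enter the sample
            X_list.append(x)
            Y_list.append(min(y_star, c))    # censored observation
    return np.array(X_list), np.array(Y_list)

X, Y = draw_ltrc_sample(200, beta, c, rng)
```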

It is easy to allow for a known non-zero truncation constant by subtracting it from the response, the intercept and from the censoring point. Similarly, we can, as long as they are observed (i.e., known), allow left-truncation at \(t_{i}\) and right-censoring at \(c_{i}\), i.e., truncation and censoring points varying across individuals. The main finding of this paper still holds conditionally on \((t_{i},c_{i})\) so long as \(Y_{i}^{*}\amalg (t_{i},c_{i})\) given \(X_{i}\).

Without truncation/censoring, an estimate of \(\beta \) in (1) can be calculated by applying the LS estimator under the moment condition \(E(XU)=0\). With LTRC data, Karlsson and Laitila (2008) proposed an estimator motivated by a zero moment condition in which a transformed error, trimmed at the lower end and winsorized at the upper tail, appears. Henceforth, the estimator of Karlsson and Laitila (2008) is called the “trimmed and winsorized estimator” (TWE). The moment condition of the TWE is a combination of the two moment conditions under which the two estimators suggested in Karlsson (2006), for left-truncation and left-censoring respectively, are derived. They, in turn, are generalizations of the “quadratic mode regression estimator” (QME) by Lee (1993) and the “winsorized mean estimator” (WME) by Lee (1992). In order to explain the TWE, we first take a closer look at QME and WME and their properties.

2.1 The QME and the WME

If data is left-truncated at \(0\) (but not right-censored), i.e., only observed if \(0<Y^{*}\), or equivalently only if \(-X^{\prime }\beta <U\), the QME can be used. The QME is derived from the moment condition

$$\begin{aligned} E\{1[X^{\prime }\beta >w]X\cdot 1[-w<U<w]U\}=0 \end{aligned}$$
(2)

where \(1[A]\) denotes an indicator function taking value \(1\) if condition \(A\) holds and \(0\) otherwise, and \(w>0\) is a “trimming” constant. This means that the “trimmed error” \(1[-w<U<w]U\) is orthogonal to \(X\). The factor \(1[X^{\prime }\beta >w]\) in (2) ensures that the trimming interval \((-w,w)\) lies within the observed support of \(U\), i.e., that \(-X^{\prime }\beta <-w\) whenever the indicator is one. QME is consistent under semiparametric assumptions, e.g., partial symmetry of \(U|X\) up to \(\pm w\).

In contrast to QME, when there is only left-censoring at \(0\), the WME uses another modified moment condition

$$\begin{aligned} E\{1[X^{\prime }\beta >w]X\cdot (-w1[U\le -w]+1[-w<U<w]U+w1[w\le U])\}=0; \end{aligned}$$
(3)

instead of trimming \(U\) when \(|U|\ge w\), \(U\) gets replaced by \(\pm w\). The “winsorized error” \(-w1[U\le -w]+1[-w<U<w]U+w1[w\le U]\) is orthogonal to \(X\) under the partial symmetry of \(U\) up to \(\pm w\) and \(P(U\le -w)=P(U\ge w)\).
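The trimmed error in (2) and the winsorized error in (3) are easy to compute; the sketch below (function names are ours) checks numerically that both have mean zero for a symmetric error, which is what the moment conditions exploit.

```python
import numpy as np

def trimmed(u, w):
    """1[-w < u < w] u, the trimmed error in (2)."""
    return np.where((u > -w) & (u < w), u, 0.0)

def winsorized(u, w):
    """-w 1[u <= -w] + 1[-w < u < w] u + w 1[w <= u], the winsorized error in (3)."""
    return np.clip(u, -w, w)

# Standard normal draws as an illustrative symmetric error distribution.
rng = np.random.default_rng(1)
u = rng.normal(size=100_000)
```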

Whereas the LS estimator has \(E(XX^{\prime })\) as its second order matrix, QME has (\(-\)1 times)

$$\begin{aligned} H_{\textit{QME}}&\equiv E \Bigg [XX^{\prime } \Bigg \{\left( \text {``area under density}\,\frac{f_{U}}{ 1-F_{U}(-X^{\prime }\beta )}\,\text {between}\,\pm w\text {''}\right) \nonumber \\&\quad -\, 2w\frac{f_{U}(w)}{ 1-F_{U}(-X^{\prime }\beta )}\Bigg \}\Bigg ], \end{aligned}$$
(4)

where \(1-F_{U}(-X^{\prime }\beta )\) is the truncation normalizing factor (see the left panel of Fig. 1). For \(H_{QME}\) to be positive definite, \(f_{U}\) should be strictly unimodal at \(0\), explaining the word “mode” in QME.

The asymptotic variance of WME has as its second order matrix

$$\begin{aligned} H_{\textit{WME}}\equiv E\Big [XX^{\prime } \Big \{ \left( \text {``area under}\,f_{U}\,\text {between}\,\pm w\text {''}\right) \Big \}\Big ], \end{aligned}$$
(5)

(see right panel of Fig. 1) which differs somewhat from (4).

Fig. 1 Density areas for QME (left) and WME (right) Hessians

Although we used \(f_{U}\) for QME and WME, both estimators allow heteroskedasticity of unknown form, in which case \(f_{U|X}\) replaces \(f_{U}\). It is the local symmetry assumption up to \(\pm w\) that ensures \(E(1[|U|<w]U)=0\) while allowing for heteroskedasticity of unknown form.

2.2 The trimmed and winsorized estimator

In QME and WME, where only one tail of \(U\) is subject to truncation/censoring, symmetric trimming/winsorizing is done artificially at the other tail of \(U\). But symmetric trimming/winsorizing is not feasible under LTRC, as both tails of \(U\) are subject to truncation/censoring. The left-truncation and right-censoring points give the interval \((-X^{\prime }\beta ,c-X^{\prime }\beta ]\) for \(U\). In this case, a transformed error trimmed at the lower end and winsorized at the upper end can be used. For some chosen trimming constants \(w_{l},w_{u}>0\), the trimming and winsorizing interval \((-w_{l},w_{u})\) should be placed within the range of \(U\), such that,

$$\begin{aligned} -X^{\prime }\beta <-w_{l}<U<w_{u}\le c-X^{\prime }\beta . \end{aligned}$$

For this, it is necessary to have

$$\begin{aligned} w_{l} < X^{\prime }\beta \le c-w_{u} \end{aligned}$$
(6)

for which it is in turn necessary to have \(w_{l}+w_{u}<c\). Added to this condition is ‘\(0<w_{u}<w_{l}\)’, which helps the expected value of the trimmed and winsorized error

$$\begin{aligned} 1[-w_{l}<U<w_{u}]U+w_{u}1[w_{u}\le U] \end{aligned}$$
(7)

to be equal to zero conditional on \(X\). Because of the restrictions \(w_{l}+w_{u}<c\) and \(0<w_{u}<w_{l}\), we can choose \(w_{u}\) from \((0,c/2)\) first, and then \(w_{l}\) over \((w_{u},c-w_{u})\).
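These restrictions on \((w_{l},w_{u})\) are easy to check mechanically; a minimal sketch (the helper name and example values are ours):

```python
# Ordering restrictions on the trimming constants from Sect. 2.2:
# choose w_u from (0, c/2) first, then w_l from (w_u, c - w_u).
def valid_trimming(w_l, w_u, c):
    """True iff 0 < w_u < w_l and w_l + w_u < c."""
    return 0.0 < w_u < w_l and w_l + w_u < c

c = 3.0                               # illustrative censoring point
print(valid_trimming(1.2, 0.8, c))    # True: w_u = 0.8 in (0, 1.5), w_l = 1.2 in (0.8, 2.2)
print(valid_trimming(0.5, 0.8, c))    # False: w_u >= w_l
```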

In view of (6), one may try to use

$$\begin{aligned} 1[-X^{\prime }\beta <U<c-X^{\prime }\beta ]U+(c-X^{\prime }\beta )1[c-X^{\prime }\beta \le U] \end{aligned}$$

instead of (7), but this fails because the intercept that makes its expected value, conditional on \(X\), equal to zero depends on \(X\). Using the constant interval \( (-w_{l},w_{u})\) falling within the \(X\)-varying interval \((-X^{\prime }\beta ,c-X^{\prime }\beta )\) is the key idea.

The restriction on \(X^{\prime }\beta \) in (6), combined with the trimmed and winsorized error in (7), yields the key unconditional moment condition under LTRC: \(E\{m(X,Y^*;\beta )\}=0\) (or just \(E\{m(\beta )\}=0\) when the arguments \((X,Y^*)\) are omitted), where

$$\begin{aligned} m(\beta )&\equiv 1\left[ w_{l}<X^{\prime }\beta <c-w_{u}\right] X \cdot \Big ( 1\left[ -w_{l}<Y^{*}-X^{\prime }\beta <w_{u}\right] \left( Y^{*}-X^{\prime }\beta \right) \nonumber \\&\quad +\, w_{u}1\left[ w_{u}\le Y^{*}-X^{\prime }\beta \right] \Big ). \end{aligned}$$

This is thus a combination of the corresponding moment conditions of QME and WME, (2) and (3) respectively.
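A sample analogue of \(m(\beta )\) can be sketched as follows; all names and numbers are illustrative. Note that evaluating it at the observed \(Y=\min (Y^{*},c)\) rather than \(Y^{*}\) leaves the value unchanged on the retained observations, since \(X^{\prime }b<c-w_{u}\) places a censored residual above \(w_{u}\).

```python
import numpy as np

def m_bar(b, X, Y, w_l, w_u, c):
    """Sample average of m(X_i, Y_i; b). On observations with
    w_l < X'b < c - w_u, the censored Y = min(Y*, c) gives the same value
    as Y*, since both residuals land above w_u whenever Y* >= c."""
    r = Y - X @ b
    keep = (X @ b > w_l) & (X @ b < c - w_u)              # 1[w_l < X'b < c - w_u]
    tw = np.where((r > -w_l) & (r < w_u), r, 0.0) + w_u * (r >= w_u)
    return (X * (keep * tw)[:, None]).mean(axis=0)

# Tiny worked example (all numbers illustrative): two observations, the
# second censored at c = 3, evaluated at b = (1.5, 0)'.
X = np.array([[1.0, 0.0], [1.0, 1.0]])
Y = np.array([1.2, 3.0])
val = m_bar(np.array([1.5, 0.0]), X, Y, w_l=1.0, w_u=0.5, c=3.0)
```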

Recall from (1) that \(Y^{*}=\beta _{1}+\tilde{X}^{\prime }\tilde{\beta }+U\) and consider

$$\begin{aligned}&E\left( 1\left[ -w_{l}<Y^{*}\!-\!\beta _{1}^{*}\!-\!\tilde{X}^{\prime }\tilde{\beta } <w_{u}\right] \left( Y^{*}\!-\!\beta _{1}^{*}\!-\!\tilde{X}^{\prime }\tilde{\beta }\right) \!+\!w_{u}1\left[ w_{u}\le Y^{*}\!-\!\beta _{1}^{*}\!-\!\tilde{X}^{\prime }\tilde{\beta }\right] |X\right) \\&\quad =E\left( 1\left[ -w_{l}<U-\varDelta <w_{u}\right] \left( U-\varDelta \right) +w_{u}1\left[ w_{u}\le U-\varDelta \right] \right) \end{aligned}$$

with \(\varDelta \equiv \beta _{1}^{*}-\beta _{1}\), where \(\beta _{1}^{*}\) is the shifted intercept (i.e., \(\beta _{1}^{*}\) does not necessarily equal \(\beta _{1}\)), which is assumed to exist (Assumption A1, Karlsson and Laitila 2008) for \(E\{m(\beta _1^{*}, \tilde{\beta })\}=0\) to hold. If \(U\) is not independent of \(X\), then \(\beta _{1}^{*}\) may have to change with \(X\) because \(w_{l}\) and \(w_{u}\) are fixed. This would hamper the identification of \(\tilde{\beta }\), which is why \(U\amalg X\) is invoked in this paper although it was not assumed in Karlsson and Laitila (2008). ‘\(U\amalg X\)’ can be relaxed if the form of heteroskedasticity of \(U|X\) is known. For instance, if \(U=\exp (X_{k})V\), where \(X_{k}\) is the \(k\)th element of \(X\) and \(V\) is an error term with \(V\amalg X\), we can replace \(w_{l}\) and \(w_{u}\) with \(w_{l}\exp (X_{k})\) and \(w_{u}\exp (X_{k})\), respectively. Then the above moment condition becomes a moment condition involving \(V\), \(w_{l}\) and \(w_{u}\). But since a known form of heteroskedasticity is rare in practice, we proceed with \(U\amalg X\).

The TWE is defined using a minimand in Karlsson and Laitila (2008),

$$\begin{aligned}&\frac{1}{N}\sum _{i}\left\{ 1\left[ X_{i}^{\prime }b\le w_{l}\right] g\left( Y_{i}-w_{l}\right) + 1\left[ w_{l}<X_{i}^{\prime }b<c-w_{u}\right] g\left( Y_{i}-X_{i}^{\prime }b\right) \right. \\&\quad \left. + 1\left[ c-w_{u}\le X_{i}^{\prime }b\right] g(Y_{i}-c+w_{u}) \right\} \end{aligned}$$

where

$$\begin{aligned} g(r)\equiv 1[r\le -w_{l}]\frac{w_{l}^{2}}{2} +1[-w_{l}<r<w_{u}]\frac{r^{2}}{2}+1[w_{u}\le r]\left( w_{u}r-\frac{w_{u}^{2}}{2}\right) . \end{aligned}$$

This minimand has three components depending on where \(X^{\prime }b\) is located relative to \(w_{l}\) and \(c-w_{u}\), and each component can take three different forms, as can be seen in \(g(\cdot )\). The first term \(w_{l}^{2}/2\) of \(g(\cdot )\) appears, as in the QME maximand, to make use of the truncated part; the second term \(r^{2}/2\) is the usual squared residual in the LS estimator; and the last term \(w_{u}r-w_{u}^{2}/2\) appears, as in the WME minimand, to make use of the censored part.
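The minimand and \(g(\cdot )\) translate directly into code; the sketch below (names ours) also reflects that continuity of \(g\) at \(r=w_{u}\) requires the constant \(w_{u}^{2}/2\) in its last branch.

```python
import numpy as np

def g(r, w_l, w_u):
    """The piecewise function g(r); continuous at r = -w_l and r = w_u."""
    r = np.asarray(r, dtype=float)
    return np.where(r <= -w_l, w_l**2 / 2,
                    np.where(r < w_u, r**2 / 2, w_u * r - w_u**2 / 2))

def twe_objective(b, X, Y, w_l, w_u, c):
    """Sample minimand: the argument passed to g depends on where X'b lies
    relative to w_l and c - w_u, as in the displayed sum."""
    xb = X @ b
    arg = np.where(xb <= w_l, Y - w_l,
                   np.where(xb < c - w_u, Y - xb, Y - c + w_u))
    return g(arg, w_l, w_u).mean()
```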

Karlsson and Laitila (2008) obtained this minimand by integrating the population moment condition back. Their proof uses the results in Newey (2001), who showed that integrating a population moment condition back yields a minimand with a unique minimum, and that the estimator minimizing the sample version is \(\sqrt{N}\)-consistent and satisfies the sample moment condition \(N^{-1/2}\sum _{i}m(X_{i},Y_{i};b_{N})=o_{p}(1)\). Karlsson and Laitila (2008) defined \(\beta ^{*}\equiv (\beta _{1}^{*},\tilde{\beta }^{\prime })^{\prime }\) and showed that \(b_{N}\) converges in probability to \(\beta ^{*}\), i.e., \(b_{N}\rightarrow ^{p}\beta ^{*}\).

3 Asymptotic distribution

In addition to the assumptions in Karlsson and Laitila (2008) for the consistency of TWE, with \(|X|\) denoting the Euclidean norm of \(X\), suppose the following assumptions hold:

Assumption U

\(U\amalg X\), \(E(U)<\infty \), and \(f_{U}\) is continuous and satisfies, for \(\varDelta \equiv \beta _{1}^{*}-\beta _{1}\), \(P(\varDelta -w_{l}<U<\varDelta +w_{u})>w_{l}f_{U}(\varDelta -w_{l})/\{1-F_{U}(-X^{\prime }\beta )\}\).

Assumption X

\(E\{XX^{\prime }1[w_{l}<X^{\prime }\beta ^{*}<c-w_{u}]\}\) is positive definite, \(E|X|^{2}<\infty \) and for some constants \(\nu _{0},\nu _{1},\nu _{2}>0\),

(i) \(E\{1[|w_{l}-X^{\prime }\beta ^{*}|<|X| |b-\beta ^{*}|]\cdot |X|\}\le \nu _{1}|b-\beta ^{*}|\) and

(ii) \(E\{1[|c-w_{u}-X^{\prime }\beta ^{*}|<|X| |b-\beta ^{*}|]\cdot |X|\}\le \nu _{2}|b-\beta ^{*}|\)

for all \(|b-\beta ^{*}|<\nu _{0}\).

Assumption X ensures smoothness in \(b\) of expectations of indicator functions such as \(E(1[X^{\prime }b\ge w_{l}])\). Almost the same assumptions appeared in Powell (1984, p. 310), Newey and Powell (1990, p. 304), and Lee (1993, p. 4) in the proofs of asymptotic normality of the estimators of censored or truncated regression models suggested in those papers. Assumption X can be replaced with another assumption that makes \(X^{\prime }\beta \) smooth, e.g., \(F_{k|-k}(x_{k}|x_{-k})\) having a continuously differentiable derivative \(f_{k|-k}(x_{k}|x_{-k})\) for \(x_{k}\) that is uniformly bounded over \(x=(x_{-k}^{\prime },x_{k})^{\prime }\), where \(F_{k|-k}\) is the distribution function of \(X_{k}\) given the other elements \(X_{-k}\) of \(X\). The motivation for Assumption U will be explained shortly.

Theorem AD

Under Assumptions U and X, with ‘\(\leadsto \)’ denoting convergence in law,

$$\begin{aligned} \sqrt{N}(b_{N}-\beta ^{*})\leadsto N\left[ 0, H_{\textit{TWE}}^{-1}E\{m(\beta ^{*})m(\beta ^{*})^{\prime }\}H_{\textit{TWE}}^{-1}\right] \end{aligned}$$

where

$$\begin{aligned} H_{\textit{TWE}}&\equiv E\Bigg [1\left[ w_{l}<X^{\prime }\beta ^{*}<c-w_{u}\right] XX^{\prime } \\&\times \left\{ 1\left[ \varDelta -w_{l}<U<\varDelta +w_{u}\right] -w_{l}\frac{f_{U}(\varDelta -w_{l})}{ 1-F_{U}(-X^{\prime }\beta )}\right\} \Bigg ]. \end{aligned}$$

Theorem AD is proven in the Appendix. Roughly speaking, \(H_{TWE}\) falls between \(H_{QME}\) and \(H_{WME}\), see (4) and (5), because the \(X\)-conditional mean of \(\{\cdot \}\) in \(H_{TWE}\) is “area under the density \(\frac{f_{U}}{1-F_{U}(-X^{\prime }\beta )}\) between \(\varDelta -w_{l}\) and \(\varDelta +w_{u}\)” minus \(w_{l}\frac{f_{U}(\varDelta -w_{l})}{1-F_{U}(-X^{\prime }\beta )}\). If \(\beta _{1}^{*}=\beta _{1}\Longleftrightarrow \varDelta =0\), then strict unimodality of \(f_{U}\) at \(0\) is enough to make this expression positive. But \(\beta _{1}^{*}\) may differ considerably from \(\beta _{1}\), depending on \(f_{U}\) and \((-w_{l},w_{u})\), so that the interval \((-w_{l},w_{u})\) may fall under a downward-sloping part of \(f_{U}\), making the expression non-positive. This is why Assumption U was imposed.

3.1 Estimation and implementation aspects of TWE

Let \(\hat{U}_{i}\equiv Y_{i}-X_{i}^{\prime }b_{N}\rightarrow ^{p}Y_{i}-X_{i}^{\prime }\beta +\beta _{1}-\beta _{1}^{*}=U_{i}-\varDelta \). Then \(\hat{H}_{TWE}^{-1}\hat{G}\hat{H}_{TWE}^{-1}\) can be used to estimate the asymptotic variance, where

$$\begin{aligned} \hat{G}&\equiv \frac{1}{N}\sum _{i}1\left[ w_{l}<X_{i}^{\prime }b_{N}<c-w_{u}\right] X_{i}X_{i}^{\prime } \\&\times \left\{ 1\left[ -w_{l}<\hat{U}_{i}<w_{u}\right] \hat{U} _{i}+w_{u}1\left[ w_{u}\le \hat{U}_{i}\right] \right\} ^{2} \end{aligned}$$

and

$$\begin{aligned} \hat{H}_{\textit{TWE}}&\equiv \frac{1}{N}\sum _{i}1\left[ w_{l}<X_{i}^{\prime }b_{N}<c-w_{u}\right] X_{i}X_{i}^{\prime } \\&\times \left\{ 1\left[ -w_{l}<\hat{U}_{i}<w_{u}\right] -w_{l}\frac{1 }{h}1\left[ -w_{l}\le \hat{U}_{i}\le -w_{l}+h\right] \right\} \end{aligned}$$

for a bandwidth \(h\rightarrow ^{+}0\) as \(N\rightarrow \infty \). The expression ‘\(h^{-1}1[-w_{l}\le \hat{U}_{i}\le -w_{l}+h]\)’ in \(\hat{H}_{TWE}\) estimates \(f_{U}(\varDelta -w_{l})/\{1-F_{U}(-X^{\prime }\beta )\}\) nonparametrically. As \(\hat{H}_{TWE}\) is bandwidth-dependent, practitioners may prefer the nonparametric bootstrap instead. This is one reason why we omit proving \(\hat{H}_{TWE}\rightarrow ^{p}H_{TWE}\) and \(\hat{G} \rightarrow ^{p}E\{m(\beta ^{*})m(\beta ^{*})^{\prime }\}\), which would require additional assumptions.
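The estimators \(\hat{G}\) and \(\hat{H}_{TWE}\) translate into code directly; below is a sketch in which the residuals, the index values standing in for \(X_{i}^{\prime }b_{N}\), and the bandwidth are purely illustrative (all names are ours).

```python
import numpy as np

def sandwich_variance(X, U_hat, xb, w_l, w_u, c, h):
    """H^{-1} G H^{-1}: estimated asymptotic variance of sqrt(N)(b_N - beta*)."""
    N = len(U_hat)
    keep = (xb > w_l) & (xb < c - w_u)                    # 1[w_l < X'b_N < c - w_u]
    tw = np.where((U_hat > -w_l) & (U_hat < w_u), U_hat, 0.0) + w_u * (U_hat >= w_u)
    s = (keep * tw)[:, None] * X                          # summands behind G-hat
    G = s.T @ s / N
    dens = ((U_hat >= -w_l) & (U_hat <= -w_l + h)) / h    # kernel-type density term
    hess = keep * (((U_hat > -w_l) & (U_hat < w_u)).astype(float) - w_l * dens)
    H = (X * hess[:, None]).T @ X / N                     # H-hat-TWE
    H_inv = np.linalg.inv(H)
    return H_inv @ G @ H_inv

# Illustrative inputs only (stand-ins for X'b_N and the residuals U-hat).
rng = np.random.default_rng(2)
N = 500
X = np.column_stack([np.ones(N), rng.normal(size=N)])
xb = X @ np.array([1.5, 0.2])
U_hat = rng.normal(size=N)
V = sandwich_variance(X, U_hat, xb, w_l=1.0, w_u=0.5, c=3.0, h=0.5)
```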

The above derivation suggests an iteration algorithm: with \(U_{i}(b_{0})\equiv Y_{i}-X_{i}^{\prime }b_{0}\),

$$\begin{aligned} b_{1}&= b_{0}+\hat{H}_{TWE}^{-1}\frac{1}{N}\sum _{i}1\left[ w_{l}<X_{i}^{\prime }b_{0}<c-w_{u}\right] X_{i}\\&\times \left\{ 1\left[ -w_{l}<U_{i}(b_{0})<w_{u}\right] U_{i}(b_{0})+w_{u}1\left[ w_{u} \le U_{i}(b_{0})\right] \right\} \end{aligned}$$

which is to be iterated until convergence; \(b_{N}\) in \(\hat{H}_{TWE}\) should be replaced by \(b_{0}\) as well. In the iteration, \(\hat{H}_{TWE}\) may cause trouble as it may not be positive definite, in which case \(w_{l}h^{-1}1[-w_{l}\le U_{i}(b_{0})\le -w_{l}+h]\) in \(\hat{H}_{TWE}\) may be dropped. If the problem persists, \(1[w_{l}<X_{i}^{\prime }b_{0}<c-w_{u}]\) may be dropped as well.
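One pass of this iteration can be sketched as follows; the data-generating values, trimming constants and bandwidth are our own illustrative choices, not recommendations from the paper.

```python
import numpy as np

def twe_step(b0, X, Y, w_l, w_u, c, h):
    """One Newton-type update b1 = b0 + H-hat^{-1} * (sample moment at b0)."""
    xb = X @ b0
    U0 = Y - xb                                           # U_i(b_0)
    keep = (xb > w_l) & (xb < c - w_u)                    # 1[w_l < X'b_0 < c - w_u]
    tw = np.where((U0 > -w_l) & (U0 < w_u), U0, 0.0) + w_u * (U0 >= w_u)
    score = (X * (keep * tw)[:, None]).mean(axis=0)       # sample moment at b_0
    dens = ((U0 >= -w_l) & (U0 <= -w_l + h)) / h          # kernel-type density term
    hess = keep * (((U0 > -w_l) & (U0 < w_u)).astype(float) - w_l * dens)
    H = (X * hess[:, None]).T @ X / len(Y)                # H-hat evaluated at b_0
    return b0 + np.linalg.solve(H, score)

# Illustrative LTRC data: Y* = 1.5 + 0.5 X~ + U, truncated at 0, censored at c = 4.
rng = np.random.default_rng(3)
X_rows, Y_obs = [], []
while len(Y_obs) < 500:
    x = np.array([1.0, rng.normal()])
    y_star = x @ np.array([1.5, 0.5]) + rng.normal()
    if y_star > 0.0:
        X_rows.append(x)
        Y_obs.append(min(y_star, 4.0))
X, Y = np.array(X_rows), np.array(Y_obs)
b1 = twe_step(np.array([1.2, 0.3]), X, Y, w_l=1.0, w_u=0.8, c=4.0, h=0.5)
```

In practice the step would be repeated until \(|b_{1}-b_{0}|\) is negligible, with the fallback modifications of \(\hat{H}_{TWE}\) described above if it fails to be positive definite.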

The finite-sample bias could be large for the intercept estimator, since it plays the role of satisfying \(E\{m(\beta _{1}^{*},\tilde{\beta })\}=0\), and the magnitude of the intercept bias would depend on \((-w_{l},w_{u})\). Instead of modifying the intercept, TWE might try to adjust the scale of \(U\), but this cannot be done easily because the scale of \(Y\) is fixed and there is no scale parameter to estimate. What could still happen, though, is that all slope estimates are scaled up/down to alter the scale of the residual \(Y-X^{\prime }b_{N}\); this is how the intercept bias would affect the slope estimates. In this case, all slope estimates would be biased up/down by the same positive factor. This, however, would bias neither the signs of the slope estimates nor their ratios.

4 Conclusions

LTRC data is a common problem faced by researchers, and it is important to use an estimator that takes the truncation and censoring into account in order not to draw incorrect conclusions from misleading results. In this paper we derive the asymptotic distribution of the semiparametric estimator TWE, suggested by Karlsson and Laitila (2008). \(\sqrt{N}\)-consistency of the slope estimates was already shown in Karlsson and Laitila (2008). The results in this paper, together with those in Karlsson and Laitila (2008), suggest that the TWE is a suitable estimator when data is LTRC. However, there are also situations in practice where it has drawbacks, as discussed in the previous section.