Linear censored regression models with scale mixtures of normal distributions

Garay, Aldo M.; Lachos, Victor H.; Bolfarine, Heleno; Cabral, Celso R. B.

doi:10.1007/s00362-015-0696-9

Linear censored regression models with scale mixtures of normal distributions

Regular Article
Published: 11 June 2015

Volume 58, pages 247–278, (2017)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Statistical Papers Aims and scope Submit manuscript

Linear censored regression models with scale mixtures of normal distributions

Download PDF

Aldo M. Garay¹,
Victor H. Lachos¹,
Heleno Bolfarine² &
…
Celso R. B. Cabral³

1135 Accesses
35 Citations
Explore all metrics

Abstract

In the framework of censored regression models the random errors are routinely assumed to have a normal distribution, mainly for mathematical convenience. However, this method has been criticized in the literature because of its sensitivity to deviations from the normality assumption. Here, we first establish a new link between the censored regression model and a recently studied class of symmetric distributions, which extend the normal one by the inclusion of kurtosis, called scale mixtures of normal (SMN) distributions. The Student-t, Pearson type VII, slash, contaminated normal, among others distributions, are contained in this class. A member of this class can be a good alternative to model this kind of data, because they have been shown its flexibility in several applications. In this work, we develop an analytically simple and efficient EM-type algorithm for iteratively computing maximum likelihood estimates of the parameters, with standard errors as a by-product. The algorithm has closed-form expressions at the E-step, that rely on formulas for the mean and variance of certain truncated SMN distributions. The proposed algorithm is implemented in the R package SMNCensReg. Applications with simulated and a real data set are reported, illustrating the usefulness of the new methodology.

Flexible regression modeling for censored data based on mixtures of student-t distributions

Article 03 December 2018

Finite mixture of regression models for censored data based on scale mixtures of normal distributions

Article 24 August 2018

Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution

Article 17 June 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Regression models with censored dependent variable (hereafter CR models) are applied in many fields, like econometric analysis, clinical essays, medical surveys, engineering studies, among others. For example, in econometrics, the study of the labor force participation of married women is usually conducted under the ordinary Tobit model Greene (2012). In this case, the observed response is the wage rate, which is typically considered as censored below zero, i.e., for working women, positive values for the wage rates are registered, whereas for the non-working women the observed wage rates are zero (see Mroz 1987). In AIDS research, the viral load measures may be subjected to some upper and lower detection limits, below or above which they are not quantifiable. As a result, the viral load responses are either left or right censored depending on the diagnostic assays used (see Wu 2010).

In general, for mathematical tractability reasons, it is assumed that the random errors have a normal distribution Wei and Tanner (1990). However, it is well-known that several phenomena are not always in agreement with this assumption, yielding data with a distribution with heavier tails. The problem of longer-than-normal tails (or outliers) can be circumvented by data transformations (namely, Box–Cox, etc.), which can render approximate normality with reasonable empirical results. However, some possible drawbacks of these methods are: (i) transformations provide reduced information on the underlying data generation scheme; (ii) component wise transformations may not guarantee joint normality; (iii) parameters may lose interpretability on a transformed scale and (iv) transformations may not be universal and usually vary with the data set. Hence, from a practical perspective, there is a necessity to seek an appropriate theoretical model that avoids data transformations, yet presenting a robustified “Gaussian” framework.

To deal with the problem of atypical observations in regression models with complete responses, proposals have been made in the literature to replace normality with more flexible classes of distributions. For instance, Lange et al. (1989) discussed the use of the Student-t distribution in multivariate regression models. In this case, the degrees of freedom parameter is the natural choice to control kurtosis. Ibacache-Pulgar and Paula (2011), proposed some local influence measures in Student-t partially linear models. Villegas et al. (2012) proposed the generalized symmetric linear models, in which a link function is defined to establish a relationship between the mean values of symmetric distributions and linear predictors. Recently, Arellano-Valle et al. (2012) advocated the use of the Student-t distribution in the context of truncated regression models. More recently, Massuia et al. (2014) developed diagnostic measures for censored regression models using the Student-t distribution, including the implementation of an interesting (and simple) expectation-maximization (EM) algorithm for maximum likelihood (ML) estimation. They demonstrated its robustness aspects against outliers through extensive simulations.

Although there are some proposals that overcome the problem of atypical observations in CR models, there are no studies taking into account, at the same time, censored responses and observational errors modeled by a distribution in the scale mixture of normal class, which is, maybe, the most important family of symmetric distributions. SMN distributions are extensions of the normal one, incorporating kurtosis. The Student-t (T), Pearson type VII (PVII), slash (SL), power exponential (PE), contaminated normal (CN) and, obviously, the normal (N) distributions are included in this class. Comprehensive surveys are available in Fang and Zhang (1990), Arellano-Valle (1994) and Meza et al. (2012), among others. In this paper, we propose a CR model where the observational errors have a SMN distribution (hereafter we will call it the SMN-CR model). A fully likelihood-based approach is carried out, including the implementation of an exact EM-type algorithm for ML estimation. As in Massuia et al. (2014), we show that the E-step reduces to computing the first two moments of certain truncated SMN distributions. The general formulas for these moments were derived in closed form by Genç (2012). The likelihood function and the asymptotic standard errors (SE) are easily computed as a by-product of the E-step and are used for monitoring convergence and for model selection using the akaike information criterion (AIC) or the bayesian information criterion (BIC). The theoretical justification of the proposal rests on the facts that the SMN class stochastically attributes varying weights to each subject, i.e., lower weight for outliers and thus controls the influence of atypical observations on the overall inference. Moreover, every member of the SMN class tends to the normal case, for example, as the Student-t degrees of freedom tends to the infinity, it approaches normality.

The rest of the paper is organized as follows. Section 2 briefly outlines some preliminary properties of the SMN and truncated SMN distributions. The SMN-CR model is presented in Sect. 3, including the implementation of the ECME algorithm Liu and Rubin (1994) for ML estimation, which is a simple extension/modification of the EM algorithm. In Sect. 4, we derive approximate SE for the regression parameters of the SMN-CR model. Sect. 5, presents some simulation studies to compare the performance of our methods with other normality-based methods. In Sect. 6, advantages of the proposed methodology is illustrated through the analysis of a real data set on housewives wages, previously analyzed under normal errors. Section 7 concludes with a short discussion on the issues raised by our study and some possible directions for future research.

2 Preliminaries

Throughout this paper $X \sim \text {N}(\mu ,\sigma ^2)$ denotes a random variable X with normal distribution with mean $\mu $ and variance $\sigma ^2$ and $\phi \left( \cdot |\mu ,\sigma ^2\right) $ denotes its probability density function (pdf). $\phi (\cdot )$ and $\Phi (\cdot )$ denote, respectively, the pdf and the cumulative distribution function (cdf) of the standard normal distribution. In general, we use the traditional convention denoting a random variable (or a random vector) by an upper case letter and its realization by the corresponding lower case. Random vectors and matrices are denoted by boldface letters. $\mathbf {X}^{\top }$ is the transpose of $\mathbf {X}$. $X \bot Y$ indicates that the random variables X and Y are independent.

We start by defining the SMN distributions, through their hierarchical formulation, and then we introduce some further properties.

Definition 1

We say that a random variable X has a SMN distribution with location parameter $\mu $ and scale parameter $\sigma ^2>0$ if it has the following stochastic representation:

$$\begin{aligned} X=\mu +U^{-\frac{1}{2}}Z,\,\,\,\,\, Z \bot U, \end{aligned}$$

(1)

where $Z\sim {N}(0,\sigma ^2)$, U is a positive random variable with cdf $H(\cdot |{\varvec{\nu }})$ and ${\varvec{\nu }}$ is a scalar or vector parameter indexing the distribution of U.

We use the notation $X \sim \text {SMN}(\mu , \sigma ^2,{\varvec{\nu }}).$ When $\mu =0$ and $\sigma ^2=1$ we have the so-called standard SMN distribution. Note from (1) that $X|U=u~\sim ~N(\mu ,u^{-1}\sigma ^2)$. Thus, integrating out U from the joint density of X and U will lead to the following marginal density of X:

$$\begin{aligned} f_{SMN}\left( x|\mu ,\sigma ^2,{\varvec{\nu }}\right) = \big (2 \pi \sigma ^2\big )^{-\frac{1}{2}} \int ^{\infty }_{0} u^{\frac{1}{2}} \exp \left\{ -\big (u/2 \sigma ^2\big )(x-\mu )^2\right\} dH\left( u|{\varvec{\nu }}\right) , \end{aligned}$$

(2)

where $H(\cdot |{\varvec{\nu }})$ is the cdf of U, which determines the form of the SMN distribution. U is called the scale factor and $H(\cdot |{\varvec{\nu }})$ is called the mixture distribution.

It is important to notice that there exists a relation between SMN distributions and elliptical distributions. We say that the random variable X has a univariate elliptical distribution with location parameter $\mu $ and scale parameter $\sigma ^2$, when its density is given by

$$\begin{aligned} f(x) = \sigma ^{-1} g\left( z \right) \!, \end{aligned}$$

(3)

where $z=(x-\mu )^2/\sigma ^2$ and $g: \mathbb {R} \rightarrow [0,\infty )$ satisfies $\int _{0}^{\infty } z^{-\frac{1}{2}}g(z) dz < \infty $. It easy to see that (2) has the form (3). The relation between SMN and elliptical distributions will be used in Sect. 4, to obtain SE for the regression parameters.

Definition 2

Let $X \sim \text {SMN}(\mu , \sigma ^2,{\varvec{\nu }})$ and $a<b$ such that $P(a<X<b)>0$. A random variable Y has a truncated SMN distribution in the interval (a, b) if it has the same distribution as $X|X \in (a,b) $. In this case we write $Y \sim \text {TSMN}_{(a,b)}(\mu , \sigma ^2,{\varvec{\nu }})$.

As an obvious consequence of Definition 2, we can obtain the density of $Y \sim \text {TSMN}_{(a,b)}(\mu ,\sigma ^2,{\varvec{\nu }})$, given by

$$\begin{aligned}&f_{\text {TSMN}}(y|\mu ,\sigma ^2,{\varvec{\nu }};(a,b))\\&\quad =f_{SMN}(y|\mu ,\sigma ^2,{\varvec{\nu }}) \left[ {F}_{SMN}\left( \frac{b-\mu }{\sigma }\right) -{F}_{SMN}\left( \frac{a-\mu }{\sigma }\right) \right] ^{-1},~~ a < y < b, \nonumber \end{aligned}$$

(4)

and $f_{\text {TSMN}}(y|\mu ,\sigma ^2,{\varvec{\nu }};(a,b))=0$ otherwise, where ${F}_{SMN}(\cdot )$ denotes the cdf of the standard SMN distribution. Now we establish the following proposition, which is crucial to the development of our proposed theory. It is a natural extension of Theorem 1 (and Corollary 1) of Genç (2012). In what follows $\text {E}[\cdot ]$ denotes expectation, $\text {E}_X[\cdot ]$ denotes expectation relative to the distribution of X and, for the sake of notation simplicity, we denote all pdf’s by $f(\cdot )$. Thus, for example, f(u, x) denotes the joint pdf of U and X and $f(u|X \in {\mathcal A})$ denotes the pdf of U given the event $\left\{ X \in {\mathcal A}\right\} $.

Proposition 1

Let $X \sim \text {SMN}(0,1,{\varvec{\nu }})$ with scale factor U and mixture distribution $H(\cdot |{\varvec{\nu }})$. Then, for $a <b$, the $\text {E}\left[ U^{r}X^{s}|X\in (a,b)\right] $ for $r\ge 1$ and $s=0,1,2$ is given by:

$$\begin{aligned} \text {E}\left[ U^{r}|X\in (a,b)\right]= & {} \tau (a,b) \left[ \text {E}_{\Phi }\left( r,b\right) -\text {E}_{\Phi }\left( r,a\right) \right] ,\\ \text {E}\left[ U^{r}X|X\in (a,b)\right]= & {} \tau (a,b) \left[ \text {E}_{\phi }\left( r-\frac{1}{2},a\right) - \text {E}_{\phi }\left( r-\frac{1}{2},b\right) \right] \!,\\ \text {E}\left[ U^{r}X^{2}|X\in (a,b)\right]= & {} \tau (a,b) \left[ \text {E}_{\Phi }\left( r-1,b\right) -\text {E}_{\Phi }\left( r-1,a\right) + a\text {E}_{\phi }\left( r-\frac{1}{2},a\right) \right. \\&\left. - b\text {E}_{\phi }\left( r-\frac{1}{2},b\right) \right] , \end{aligned}$$

where

$$\begin{aligned} \tau (a,b)&= \left( {F}_{SMN}\left( b\right) -{F}_{SMN}\left( a\right) \right) ^{-1}; \end{aligned}$$

(5)

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\text {E}\left[ U^{r}\phi \left( h\, U^{\frac{1}{2}}\right) \right] =\int ^{\infty }_{0}u^{r}\phi \left( h \,u^{\frac{1}{2}}\right) dH\left( u|{\varvec{\nu }}\right) ; \end{aligned}$$

(6)

$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\text {E}\left[ U^{r}\Phi \left( h \,U^{\frac{1}{2}} \right) \right] =\int ^{\infty }_{0}u^{r}\Phi \left( h\, u^{\frac{1}{2}} \right) dH\left( u|{\varvec{\nu }}\right) . \end{aligned}$$

(7)

Proof

Let ${\mathcal A}=(a,b)$. From Definitions 1 and 2, we have that $X|U=u\sim \text { N}(0,u^{-1})$, $X|X\in \mathcal{A}\sim \text {TSMN}_{\mathcal{A}}(0,1,{\varvec{\nu }})$ and $X|U=u,X\in \mathcal{A}\sim \text {TN}_{\mathcal{A}}(0,u^{-1})$, that is, a truncated normal distribution in $\mathcal{A}$, being 0 and $u^{-1}$ the mean and variance, respectively, before truncation. Then,

$$\begin{aligned} \text {E}\left[ U^{r}X^{s}|X\in \mathcal{A}\right]&\quad {} = \text { E}_{U}\left[ U^{r}\text { E}_{X}\left[ X^{s}|U=u,X\in \mathcal{A}\right] |X\in \mathcal{A}\right] \nonumber \\&\quad {}= \int ^{\infty }_{0}u^{r}\text {E}_{X}\left[ X^{s}|U=u,X\in \mathcal{A}\right] f(u|X\in \mathcal{A})du. \end{aligned}$$

(8)

The pdf in the integral sign takes the following form:

$$\begin{aligned} f(u|X\in \mathcal{A})&=\int f(u,x|X\in \mathcal{A})dx \end{aligned}$$

(9)

$$\begin{aligned}&=\int f(u|X=x,X \in {\mathcal A})f(x|X \in {\mathcal A}) dx \nonumber \\&= \tau (a,b) \int f(u|X=x,X \in {\mathcal A}) f(x) \mathbb {I}_{\mathcal A}(x) dx \end{aligned}$$

(10)

$$\begin{aligned}&= \tau (a,b) \int f\left( u,x\right) \mathbb {I}_{\mathcal A}(x) dx \\&=\tau (a,b) \int _{\mathcal{A}}f\left( u\right) \phi \left( x|0,u^{-1}\right) dx =\tau (a,b) f\left( u\right) \int _{\mathcal{A}^*}\phi \left( z\right) dz \nonumber \\&=\tau (a,b) f\left( u\right) \left[ \Phi \left( b u^{\frac{1}{2}} \right) -\Phi \left( a u^{\frac{1}{2}}\right) \right] , \nonumber \end{aligned}$$

(11)

where $\mathcal{A}^*=(a u^{\frac{1}{2}},b u^{\frac{1}{2}})$ and $\mathbb {I}_{A}(\cdot )$ is the indicator function of the set A. Equation (10) is obtained using the pdf’s expression of $X| X \in \mathcal A$. Equation (11) is consequence of the fact that, if $\left\{ x \in \mathcal A\right\} $, then $\left\{ X \in \mathcal A, X=x\right\} =\left\{ X=x\right\} $, implying that $f(u,x) =f(u|X=x)f(x) =f(u|X=x, X \in \mathcal A)f(x)$. If $\left\{ x \notin \mathcal A\right\} $ then $\mathbb {I}_{\mathcal A}(x) =0$ and the integrands in (10) and (11) are equal to zero. By (8) and Lemma 1 given in Appendix A, it follows that

for $s=0$,
$$\begin{aligned} \text {E}\left[ U^{r}|X\in \mathcal{A}\right]&=\int ^{\infty }_{0}u^{r}f(u|X \in {\mathcal A})du \\&=\tau (a,b) \text {E}_U\left\{ U^{r}\left[ \Phi \left( b {U} ^{\frac{1}{2}}\right) -\Phi \left( a {U} ^{\frac{1}{2}}\right) \right] \right\} \!; \end{aligned}$$
for $s=1$,
$$\begin{aligned} \text {E}\left[ U^{r}X|X\in \mathcal{A}\right]&= \int ^{\infty }_{0}\frac{u^{r}}{u^{\frac{1}{2}}} \frac{\phi \left( a {u}^{\frac{1}{2}} \right) -\phi \left( b {u}^{\frac{1}{2}}\right) }{\Phi \left( b {u}^{\frac{1}{2}}\right) -\Phi \left( a {u}^{\frac{1}{2}} \right) } f(u|X \in {\mathcal A})(u)du\\&=\tau (a,b) \text {E}_U\left\{ U^{r-\frac{1}{2}}\left[ \phi \left( a {U}^{\frac{1}{2}} \right) -\phi \left( b {U}^{\frac{1}{2}}\right) \right] \right\} \!. \end{aligned}$$
for $s=2$,
$$\begin{aligned} \text {E}\left[ U^{r}X^2|X\in \mathcal{A}\right]&=\int ^{\infty }_{0}\left[ u^{r-1}+\frac{au^{r-\frac{1}{2}}\phi \left( a {u}^{\frac{1}{2}}\right) -bu^{r-\frac{1}{2}}\phi \left( b {u}^{\frac{1}{2}}\right) }{\Phi \left( b {u}^{\frac{1}{2}}\right) -\Phi \left( a {u}^{\frac{1}{2}}\right) }\right] \nonumber \\&\quad \times f(u|X \in {\mathcal A})du\\&=\tau (a,b) \text {E}_U\left\{ U^{r-1}\left[ \Phi \left( b {U}^{\frac{1}{2}}\right) - \Phi \left( a {U}^{\frac{1}{2}}\right) \right] \right. \\&\left. \quad {} + U^{r-\frac{1}{2}} \left[ a\phi \left( a {U}^{\frac{1}{2}}\right) -b\phi \left( b {U}^{\frac{1}{2}}\right) \right] \right\} . \end{aligned}$$

$\square $

When the distribution of U is available, this proposition gives closed form expressions for the expected values $\text {E}\left[ U^{r}X^{s}|X\in (a,b)\right] $, where $s=0,1,2$ and $r \ge 1.$

Now we compute the quantities $\text {E}_{\phi }\left( r,h\right) $ and $\text {E}_{\Phi }\left( r,h\right) $ for some elements of the SMN family. They are useful for implementing the ECME algorithm. For the sake of completeness, a detailed proof of these results is sketched in Appendix B.

Pearson type VII distribution: in this case we consider $U\sim Gamma(\nu /2,\delta /2)$, with $\nu >0~\text {and}~\delta >0$, where Gamma(a, b) denotes the Gamma distribution with mean a / b. The density of the random variable X, defined in (1), takes the form
$$\begin{aligned} f_{PVII}(x|\nu ,\delta )=\frac{1}{B\left( \nu /2,1/2\right) \sqrt{\delta }}\left( 1+\frac{x^2}{\delta }\right) ^{-\frac{\nu +1}{2}}, \end{aligned}$$
where $\delta >0$ and $\nu >0$ are shape parameters and B(a, b) represents the beta function. We use the notation $X\sim PVII(0,1;\nu ,\delta )$. In this case, we have that
$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\delta );\\ \text {E}_{\phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) \sqrt{2\pi }}\left( \frac{\delta }{2}\right) ^{\frac{\nu }{2}}\left( \frac{h^2 +\delta }{2}\right) ^{-\frac{\left( \nu +2r\right) }{2}}, \end{aligned}$$
where $\Gamma \left( a\right) $ is the gamma function and $F_{PVII}(\cdot )$ is the cdf of the Pearson type VII distribution. When $\delta =\nu $ we have the Student-t distribution with $\nu $ degrees of freedom. Also, we have the Cauchy distribution when $\delta =\nu =1$.
Slash distribution: here the distribution of the scale factor U is $Beta(\nu ,1)$, with $\nu >0$. The density of the random variable X, defined in (1), is given by
$$\begin{aligned} f_{sl}(x|\nu )=\nu \int ^1_0u^{\nu -1}\phi (x {u}^{\frac{1}{2}})du. \end{aligned}$$
We use the notation $X\sim SL(0,1;\nu )$. In this case, we have that
$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\left( \frac{\nu }{\nu +r}\right) F_{SL}(h|\nu +r);\\ \text {E}_{\phi }\left( r,h\right)&=\frac{\nu }{\sqrt{2\pi }}\left( \frac{h^2}{2}\right) ^{-\left( \nu +r\right) }\Gamma \left( \nu +r,\frac{h^2}{2}\right) \!, \end{aligned}$$
where $\Gamma \left( a,b\right) =\int ^{b}_{0}e^{-t}t^{a-1}dt$ is the incomplete gamma function, see Lemma 6 in Genç (2012), and $F_{SL}(\cdot )$ is the cdf of the slash distribution.
Contaminated normal distribution: here U is a discrete random variable taking one of two states 1 or $\gamma $. In this case the probability function of U is given by
$$\begin{aligned} U= \left\{ \begin{array}{ll} \gamma &{} \text{ with } \text{ probability }~~\xi ;\\ 1 &{} \text{ with } \text{ probability } 1- \xi , \end{array} \right. \end{aligned}$$
It follows immediately that the density of the random variable X, defined in (1), is given by
$$\begin{aligned} f_{CN}(x|\xi ,\gamma )= & {} \xi \phi (x|0,\gamma ^{-\frac{1}{2}})+(1-\xi )\phi (x). \end{aligned}$$
So, we have that
$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\gamma ^r F_{CN}(h|\xi ,\gamma ) + \left( 1-\gamma ^r\right) \left( 1-\xi \right) \Phi \left( h\right) ;\\ \text {E}_{\phi }\left( r,h\right)&=\xi \gamma ^{r}\phi \left( h\sqrt{\gamma }\right) +\left( 1-\xi \right) \phi \left( h\right) , \end{aligned}$$
where $F_{CN}(\cdot )$ is the cdf of the contaminated normal distribution.

As a direct consequence of Proposition 1, in Appendix A we present an important corollary, which is useful for implementing the ECME algorithm.

3 The SMN censored linear regression model

3.1 The model

Consider first a linear regression model where the responses are observed with errors which are independent and identically distributed according to some SMN distribution. To be more precise, let us write

$$\begin{aligned} Y_i=\mathbf {x}^{\top }_i{\varvec{\beta }}+\varepsilon _i,\;\; \varepsilon _i\mathop {\sim }\limits ^{\mathrm{iid}}\text {SMN}(0, \sigma ^2,{\varvec{\nu }}), \;\; i=1,\ldots ,n, \end{aligned}$$

(12)

where the $Y_i$ are responses, ${\varvec{\beta }}=(\beta _1,\ldots ,\beta _p)^{\top }$ is a vector of regression parameters and $\mathbf {x}^{\top }_i=(x_{i1},\ldots ,x_{ip})$ is a vector such that $x_{ij}$ is the value of the j-th explanatory variable for the subject i. By Definition 1, we have that $Y_i\sim \text {SMN}(\mathbf {x}^{\top }_i{\varvec{\beta }},\sigma ^2,{\varvec{\nu }})$, for $i=1,\ldots ,n$. We call it the SMN regression (SMN-R) model.

We are interested in the case where left-censored observations can occur. That is, the observations are of the form

$$\begin{aligned} Y_{\text {obs}_i}= \left\{ \begin{array}{lll} \kappa _{i} &{} \text{ if } &{} Y_i \le \kappa _{i};\\ Y_i &{} \text{ if } &{} Y_i > \kappa _{i}, \end{array} \right. \end{aligned}$$

(13)

$i=1,\ldots ,n$, for some threshold point $\kappa _{i}$. This is called the SMN-CR model. For convenience, we have chosen to work with the left censored case, but the results are easily extensible to other censoring types. If we make $\kappa _i=0$ and assume that $\epsilon _i \sim \text {N}(0,\sigma ^2)$, which corresponds to $U_i=1$ in Definition 1, $i=1,\ldots ,n$, we obtain the Tobit censored response model studied by Barros et al. (2010). In addition, if $U_i\sim Gamma(\nu /2,\nu /2)$ we obtain the Student-t censored regression model developed by Massuia et al. (2014).

It is important to emphasize the difference between censored and truncated data. Citing Lee and Scott (2012), data are said to be censored when the exact values of measurements are not reported. For example, the needle of a scale that does not provide a reading over 200 kg will show 200 kg for all the objects that weigh more than the limit. Data are said to be truncated when the number of measurements outside a certain range is not reported.

Let ${\varvec{\theta }}=({\varvec{\beta }}^{\top },\sigma ^2,{\varvec{\nu }})^{\top }$ be the vector with all parameters in the SMN-CR model. Supposing that there are (possibly) m censored values of the characteristic of interest, we can partition the observed sample $\mathbf {y}_\text {obs}$ in two subsamples of m censored and $n-m$ uncensored values, such that $\mathbf {y}_\text {obs}=\{\kappa _1,\ldots ,\kappa _m,y_{m+1},\ldots ,y_n\}$. Then, the log-likelihood function is given by

$$\begin{aligned} \ell ({\varvec{\theta }}|\mathbf {y}_\text {obs})&=\sum _{i=1}^{m} \log \left[ {F}_{SMN}\left( \frac{\kappa _i-\mathbf {x}^{\top }_i{\varvec{\beta }}}{\sigma }\right) \right] + \sum ^{n}_{i=m+1} \log \left[ f_{SMN}(y_i|\mathbf {x}^{\top }_i{\varvec{\beta }},\sigma ^2,{\varvec{\nu }})\right] . \end{aligned}$$

(14)

To estimate the parameters of the SMN-CR model, an alternative is to maximize this log-likelihood function directly, a procedure that can be quite cumbersome. Alternatively, the standard algorithm in this case is the so-called EM algorithm of Dempster et al. (1977) or some extension like the ECM Meng and Rubin (1993) or the ECME algorithms Liu and Rubin (1994). Our choice is to use the ECME algorithm, a classical, reliable, widespread tool to obtain maximum likelihood estimates.

3.2 Parameter estimation via an EM-type algorithm for the SMN-CR model

In this section we develop an EM-type algorithm for maximum likelihood estimation of the parameters in the SMN-CR model. In order to do this, we need a representation of the model in terms of missing data. First, note that using Definition 1, we have the following hierarchical representation:

$$\begin{aligned} Y_i|U_i=u_i\sim \text {N}\left( \mathbf {x}^{\top }_i{\varvec{\beta }},u_{i}^{-1}\sigma ^2\right) \!; \quad U_i\sim H(\cdot |{\varvec{\nu }}). \end{aligned}$$

(15)

If the observation i is censored, we can consider $y_i$ as a realization of the latent unobservable variable $Y_i \sim \text {SMN}(\mathbf {x}_i^{\top } {\varvec{\beta }}, \sigma ^2,{\varvec{\nu }})$, $i=1,\ldots ,m$. The key to the development of our EM-type algorithm is to consider the complete-data $\mathbf {z}=\{\mathbf {y}_\text {obs}, y_1,\ldots ,y_m,u_1,\ldots ,u_n\}$, that is, we treat the problem as if the missing data $\mathbf {y}_{L}=\{y_1,\ldots ,y_m\}$ and $\mathbf {u}=\{u_1,\ldots ,u_n\}$ were in fact observed. Then, using representation (15), we obtain the complete-data log-likelihood, given by

$$\begin{aligned} \ell _{c}({\varvec{\theta }}|\mathbf {z})&= -\frac{n}{2} \log (2 \pi ) -\frac{n}{2} \log (\sigma ^2) +\frac{1}{2} \sum _{i=1}^{n}\log (u_i) \nonumber \\&\quad \, - \frac{1}{2\sigma ^2}\sum _{i=1}^n {u_i}(y_i-\mathbf {x}^{\top }_i{\varvec{\beta }})^2 + \sum _{i=1}^{n} \log \left( f(u_i|{\varvec{\nu }})\right) \!, \end{aligned}$$

(16)

where $f(\cdot |{\varvec{\nu }})$ is the density of the random variable U.

In what follows the superscript (k) indicates the estimate of the related parameter at stage k of the algorithm. In the E-step of the algorithm, we must obtain the so-called Q-function,

$$\begin{aligned} Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big ) =\text {E}_{{{\varvec{\theta }}^{(k)}}}\left[ \ell _{c}\left( {\varvec{\theta }}|\mathbf {Z}\right) |\mathbf {y}_\text {obs}\right] \!, \end{aligned}$$

where $\text {E}_{{\varvec{\theta }}^{(k)}}$ means that the expectation is being affected using ${\varvec{\theta }}^{(k)}$ for ${\varvec{\theta }}$. Observe that the expression of the Q-function is completely determined by the knowledge of the following expectations

$$\begin{aligned} \mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) = \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^s|y_{\text {obs}_i}],\,\,\, s=0,1,2, \end{aligned}$$

as well as

$$\begin{aligned} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}]~~\text {and}~~\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

Thus, dropping unimportant constants, the Q-function can be written in a synthetic form as

$$\begin{aligned} Q\big ({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}\big )&= -\frac{n}{2}\log (\sigma ^2) -\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left[ \mathcal{E}_{2 i}\big ({\varvec{\theta }}^{(k)}\big ) -2 \mathcal{E}_{1 i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}+\mathcal{E}_{0 i}\big ({\varvec{\theta }}^{(k)}\big ) \big (\mathbf {x}^{\top }_i{\varvec{\beta }}\big )^2\right] \nonumber \\&\,\,\,\,\, {} +\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

(17)

At each step, the conditional expectations $\mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big )$ can be easily derived from the results given in Proposition 1. Thus, for an uncensored observation i, we have that $Y_{\text {obs}_i}=Y_i \sim \text {SMN}(\mathbf {x}_i^{\top } {\varvec{\beta }},\sigma ^2,{\varvec{\nu }})$ and, therefore,

$$\begin{aligned} \mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big )= y_i^s \text {E}_{{\varvec{\theta }}^{(k)}}[U_i |y_i], \end{aligned}$$

(18)

where $\text {E}_{{\varvec{\theta }}^{(k)}}[U_i |y_i]$ can be obtained using results in Osorio et al. (2007). Thus, for example,

If $Y_i \sim PVII(\mathbf {x}_i^{\top } {\varvec{\beta }},\sigma ^2,\nu ,\delta )$, we have
$$\begin{aligned} \text {E}_{{\varvec{\theta }}^{(k)}}[U_i |y_i]=\frac{\left( \nu +1\right) }{\delta +d^2\big ({\varvec{\theta }}^{(k)},y_i\big )}; \end{aligned}$$
If $Y_i \sim SL(\mathbf {x}_i^{\top } {\varvec{\beta }},\sigma ^2,\nu )$, we have
$$\begin{aligned} \text {E}_{{\varvec{\theta }}^{(k)}}[U_i |y_i] = \frac{\Gamma \left( \nu +1.5,d^2\big ({\varvec{\theta }}^{(k)},y_i\big )/2\right) }{\Gamma \left( \nu +0.5,d^2\big ({\varvec{\theta }}^{(k)},y_i\big )/2\right) }; \end{aligned}$$
If $Y_i \sim CN(\mathbf {x}_i^{\top } {\varvec{\beta }},\sigma ^2,\nu ,\gamma )$, we have
$$\begin{aligned} \text {E}_{{\varvec{\theta }}^{(k)}}[U_i |y_i] = \frac{1-\nu +\nu \gamma ^{1.5}e^{0.5\left( 1-\gamma \right) d^2\big ({\varvec{\theta }}^{(k)},y_i\big )}}{1-\nu +\nu \gamma ^{0.5}e^{0.5\left( 1-\gamma \right) d^2\big ({\varvec{\theta }}^{(k)},y_i\big )}}, \end{aligned}$$

where $d\big ({\varvec{\theta }}^{(k)},y_i\big )=\left( y_i-\mathbf {x}_i^{\top }{\varvec{\beta }}^{(k)}\right) /\sigma ^{(k)}$.

For a censored observation i, we have $Y_i \le \kappa _i$, so that

$$\begin{aligned} \mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) = \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^s|Y_i \le \kappa _i], \end{aligned}$$

(19)

which can be obtained for the different distributions using the results given in Proposition 1, along with the results given in Eqs. (6) and (7) with $r=1$.

When the M-step turns out to be analytically intractable, it can be replaced with a sequence of conditional maximization (CM) steps. The resulting procedure in known as ECM algorithmMeng and Rubin (1993). The ECME algorithmLiu and Rubin (1994), a faster extension of EM and ECM algorithm, is obtained by maximizing the constrained Q-function with some CM-steps that maximize the corresponding constrained actual marginal likelihood function, called CML-steps. Therefore, our EM-type algorithm (ECME) for the SMN-CR models can be summarized in the following way (see Appendix C for details):

E-step: Given ${\varvec{\theta }}={\varvec{\theta }}^{(k)}$, for $i=1,\ldots ,n$;

If observation i is uncensored then, for $s=0,1,2$, compute $\mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) $ given in (18);
If observation i is censored then, for $s=0,1,2$, compute $\mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) $ in (19).

CM-step: Update ${{\varvec{\theta }}}^{(k)}$ by maximizing $Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )$ over ${\varvec{\theta }}$, which leads to the following expressions,

$$\begin{aligned}&\displaystyle {{\varvec{\beta }}}^{(k+1)}=\left( \sum ^n_{i=1}\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\mathbf {x}_i\mathbf {x}^{\top }_i\right) ^ {-1}\sum ^n_{i=1}\mathbf {x}_i\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big );&\end{aligned}$$

(20)

$$\begin{aligned}&\displaystyle {\sigma ^2}^{(k+1)}=\frac{1}{n}\sum ^n_{i=1}\left[ \mathcal{E}_{2i}\big ({\varvec{\theta }}^{(k)}\big )-2\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)} +\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\big (\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}\big )^2\right] .&\nonumber \\ \end{aligned}$$

(21)

CML-step: Update $\nu ^{(k)}$ by maximizing the actual marginal log-likelihood function, obtaining

$$\begin{aligned} {\varvec{\nu }}^{(k+1)}&=\text {argmax}_{\nu }\left\{ \sum _{i=1}^{m} \log \left[ {F}_{SMN}\left( \frac{\kappa _i-\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}}{\sigma ^{(k+1)}}\right) \right] \right. \nonumber \\&\,\,\,\, {} \left. + \sum ^{n}_{i=m+1} \log \left[ f_{SMN}(y_i|\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)},{\sigma ^2}^{(k+1)},{\varvec{\nu }})\right] \right\} . \end{aligned}$$

(22)

This process is iterated until some distance involving two successive evaluations of the actual log-likelihood $\ell ({\varvec{\theta }}|\mathbf {y}_{obs})$, like

$$\begin{aligned}&||\ell \big ({{\varvec{\theta }}}^{(k+1)}|\mathbf {y}_{obs}\big )-\ell \big ({{\varvec{\theta }}}^{(k)}|\mathbf {y}_{obs}\big )|| \quad \text{ or } \\&||\ell \big ({{\varvec{\theta }}}^{(k+1)}|\mathbf {y}_{obs}\big )/\ell \big ({{\varvec{\theta }}}^{(k)}|\mathbf {y}_{obs}\big )-1||, \end{aligned}$$

is small enough. We have adopted this strategy to update the estimate of ${\varvec{\nu }}$, by direct maximization of the marginal log-likelihood, circumventing the computation of $\text {E}_{{\varvec{\theta }}^{(k)}}[\log (U_i)|y_{\text {obs}_i}]$ and $\text {E}_{{\varvec{\theta }}^{(k)}}[\log (f(U_i |{\varvec{\nu }}))|y_{\text {obs}_i}]$.

4 Approximated standard errors for the fixed effects

Standard errors of the ML estimates can be approximated by the inverse of the observed information matrix, but there is generally no closed form, see Meilijson (1989) and Lin (2009). Considering ${\varvec{\theta }}=\left( {\varvec{\beta }},\sigma ^2,{\varvec{\nu }}\right) $, the empirical information matrix is defined as

$$\begin{aligned} \mathbf {I}_{e}\left( {\varvec{\theta }}|\mathbf {y}_{obs}\right)= & {} \sum _{i=1}^{n}\mathbf {v}\left( \mathbf {y}_{{obs}_i}|{\varvec{\theta }}\right) \mathbf {v}^{\top }\left( \mathbf {y}_{{obs}_i}|{\varvec{\theta }}\right) \\&-\frac{1}{n}\mathbf {V}\left( \mathbf {y}_{obs}|{\varvec{\theta }}\right) \mathbf {V}^{\top }\left( \mathbf {y}_{obs}|{\varvec{\theta }}\right) \!, \end{aligned}$$

where $\mathbf {V}^{\top }\left( \mathbf {y}_{{obs}}|{\varvec{\theta }}\right) =\sum _{i=1}^{n}\mathbf {v}\left( \mathbf {y}_{{obs}_i}|{\varvec{\theta }}\right) $. It is noted from the result of Louis (1982) that, the individual score can be determined as

$$\begin{aligned} \mathbf {v}\left( \mathbf {y}_{{obs}_i}|{\varvec{\theta }}\right) = \frac{\partial \ell ({\varvec{\theta }}|\mathbf {y}_{\text {obs}_i})}{\partial {\varvec{\theta }}}= \text {E}\left[ \frac{\partial \ell _{c}({\varvec{\theta }}|\mathbf {z}_i)}{\partial {\varvec{\theta }}}|{\mathbf {y}_{{obs}_i},{\varvec{\theta }}}\right] . \end{aligned}$$

(23)

Thus, substituting the ML estimates of ${\varvec{\theta }}$ in (23), the empirical information matrix $\mathbf {I}_{e}\left( {\varvec{\theta }}|\mathbf {y}_{obs}\right) $ is reduced to

$$\begin{aligned} \mathbf {I}_{e}\left( \widehat{{\varvec{\theta }}}|\mathbf {y}_{obs}\right) =\sum _{i=1}^{n}\widehat{\mathbf {v}}_i\widehat{\mathbf {v}}^{\top }_i, \end{aligned}$$

(24)

where $\widehat{\mathbf {v}}_i=\left( \widehat{\mathbf {v}}_{{\varvec{\beta }}i},\widehat{\mathbf {v}}_{\sigma ^2 i},\widehat{\mathbf {v}}_{{\varvec{\nu }}i}\right) $ is an individual score vector and

$$\begin{aligned} \widehat{\mathbf {v}}_{{\varvec{\beta }}i}= & {} \text {E}\left[ \frac{\partial \ell _{c}({\varvec{\theta }}|\mathbf {z}_i)}{\partial {\varvec{\beta }}}|{\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}}\right] = \frac{1}{\sigma ^2}\left( \mathbf {x}_i\mathcal{E}_{1 i}(\widehat{{\varvec{\theta }}})-\mathcal{E}_{0 i}(\widehat{{\varvec{\theta }}})\mathbf {x}_i\mathbf {x}^{\top }_i\widehat{{\varvec{\beta }}}\right) \!,\nonumber \\ \widehat{\mathbf {v}}_{\sigma ^2 i}= & {} \text {E}\left[ \frac{\partial \ell _{c}({\varvec{\theta }}|\mathbf {z}_i)}{\partial \sigma ^2}|{\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}}\right] \nonumber \\= & {} -\frac{1}{2\widehat{\sigma }^2}+ \frac{1}{2\widehat{\sigma }^4}\left( \mathcal{E}_{2 i}(\widehat{{\varvec{\theta }}})-2\mathcal{E}_{1 i}(\widehat{{\varvec{\theta }}})\mathbf {x}_i^{\top }\widehat{{\varvec{\beta }}} + \mathcal{E}_{0 i}(\widehat{{\varvec{\theta }}})(\mathbf {x}^{\top }_i\widehat{{\varvec{\beta }}})^2\right) \!, \text {and}~~\nonumber \\ \widehat{\mathbf {v}}_{{\varvec{\nu }}i}= & {} \text {E}\left[ \frac{\partial \ell _{c}({\varvec{\theta }}|\mathbf {z}_i)}{\partial {\varvec{\nu }}}|{\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}}\right] = \text {E}\left[ \frac{\partial \log \left( f(U_i|{\varvec{\nu }})\right) }{\partial {\varvec{\nu }}}|\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}\right] , \end{aligned}$$

(25)

where $\ell _{c}({\varvec{\theta }}|\mathbf {z}_i )$ is the log-likelihood formed from the single complete observation $\mathbf {z}_i=(y_{\text {obs}_i},y_{i},u_i)^{\top }$ and $\mathcal{E}_{si}({\varvec{\theta }}^{(k)})=\text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^s|y_{\text {obs}_i}]$. It is important notice that the values of Eq. (25) depend of the distribution of U. Thus for example:

For the Student-t distribution: We consider $U\sim Gamma(\nu /2,\delta /2)$, with $\nu >0$, then
$$\begin{aligned} \widehat{\mathbf {v}}_{\nu i}= & {} -\psi \left( \frac{\widehat{\nu }}{2}\right) +\frac{1}{2}\left( \log \left( \frac{\widehat{\nu }}{2}\right) +1\right) \\&+\frac{1}{2}\left( \text {E}\left[ \log \left( U_i\right) |\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}\right] -\mathcal{E}_{0 i}\big (\widehat{{\varvec{\theta }}}\big )\right) , \end{aligned}$$
where $\psi \left( x\right) $ represents the digamma function of x.
For the Slash distribution: We consider $U \sim Beta(\nu ,1)$ with positive shape parameter $\nu $, then
$$\begin{aligned} \widehat{\mathbf {v}}_{\nu i}= & {} \frac{1}{\widehat{\nu }}+ \text {E}\left[ \log \left( U_i\right) |\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}\right] . \end{aligned}$$

It is important to stress that the SE of $\nu $ depends heavily on the calculation of $\text {E}\left[ \log \left( U_i\right) |\mathbf {y}_{{obs}_i},\widehat{{\varvec{\theta }}}\right] $, which relies on computationally intensive Monte Carlo integrations. In our analysis, we focus solely on comparing the SE of $\beta $ and $\sigma ^2$.

5 Simulation studies

5.1 Robustness of the EM estimates (simulation study 1)

The goal of this section is to compare the performance of the estimates for some censored regression models in the presence of outliers on the response variable. We consider the cases normal, Student-t, contaminated normal and slash, and denote them by N-CR, T-CR, CN-CR and SL-CR, respectively. The computational procedures were implemented using the "R" software R Core Team (2015).

We performed a simulation study based on the N-CR model. Specifically, we considered model (12) with $\mathbf {x}_i^{\top }=(1,x_i)$ and $\varepsilon _i\sim \text {N}(0, \sigma ^2)$, $i=1,\ldots ,n$. We generated 1000 artificial samples of size $n=300$, considering ${\varvec{\beta }}^{\top }=\left( \beta _1,\beta _2\right) =\left( 1,4\right) ,\sigma ^2=2$ and fixing the left censoring level at $p=8,~20~\text {and}~35\,\%$ (that is, 8, 20 and $35\,\%$ of the observations in each data set were left censored, respectively). We generated independently the values $x_i$, for $i=1,\ldots ,n$, from a uniform distribution on the interval (2, 20). These values were fixed throughout the simulations.

To assess how much the EM estimates are influenced by the presence of outliers, we replaced the observation $y_{150}$ by $y_{150}(\vartheta )=y_{150}-\vartheta $, with $\vartheta =1,2,\ldots ,10$. Let $\widehat{\beta }_{i}(\vartheta )$ and $\widehat{\beta }_{i}$ be the EM estimates of $\beta _i$ with and without contamination, respectively, $i=1,2$. We are particularly interested in the relative changes

$$\begin{aligned} RC\big (\widehat{\beta }_{i}(\vartheta )\big )=|\big (\widehat{\beta }_{i}(\vartheta )-\widehat{\beta }_{i}\big )/\widehat{\beta }_{i}|\!. \end{aligned}$$

We define the relative changes for $\sigma ^2$ analogously.

For each replication we obtained the parameter estimates with and without outliers, under the following models: N-CR, T-CR and SL-CR, both with $\nu =3$, and CN-CR with ${\varvec{\nu }}^{\top }= (\xi ,\gamma )=(0.3,0.3)$. Table 1 and Fig. 1 depict the average values of the relative changes across all samples and different censoring levels. In the N-CR case, we observe that influence increases dramatically when $\vartheta $ increases. However, for the SMN-CR models with heavy tails, as the T-CR and the SL-CR, these measures vary little, which indicates that they are more robust than the N-CR model in the presence of discrepant observations. For the CN-CR model with censoring level $p=20\,\%$ and $35\,\%$ we can observe that the relative change increases as $\vartheta $ increases, specially for the parameter $\beta _1$.

Table 1 Simulation study 1

Full size table

5.2 Asymptotic properties (simulation study 2)

We also conducted a simulation study to evaluate the finite-sample performance of the parameter estimates. We generated artificial samples from the SMN-CR model (12), with $\mathbf {x}_i^{\top }=(1,x_{i})$, $i=1,\ldots ,n$.

We considered the censoring levels $p=10$, 25 and $45\,\%$. The sample sizes were fixed at $n=50$, 100, 150, 200, 300, 400, 500, 700 and 800. The true values of the regression parameters were taken as $\beta _1=1.5,~\beta _2=4$ and $\sigma ^2=0.5$. As considered in Labra et al. (2012), the variable $x_i$ ranges from 0.1 to 20 and these values were maintained throughout the experiment. For each combination of parameters, sample sizes and censoring levels, we generated 1000 samples from the SMN-CR model, under four different situations: N-CR, T-CR $(\nu =3)$, SL-CR $(\nu =4)$ and CN-CR $\left( {\varvec{\nu }}^{\top }=(0.5,0.5)\right) $.

In order to analyze the performance of the estimates obtained using our proposed EM-type algorithm, we computed the bias and the mean squared error (MSE) for each combination of sample size, level of censoring and parameter value. For $\beta _i$, they are given, respectively, by

$$\begin{aligned} \text {Bias}\left( \beta _i\right)&=\frac{1}{1000}\sum ^{1000}_{j=1}\left( \widehat{\beta }^{(j)}_i-\beta _i\right) \!; \\ \text {MSE}\left( \beta _i\right)&= \frac{1}{1000}\sum ^{1000}_{j=1}\left( \widehat{\beta }^{(j)}_i-\beta _i\right) ^2, \end{aligned}$$

where $\widehat{\beta }^{(j)}_i$ is the estimate of $\beta _i$ for the j-th sample. We define bias and MSE for $\sigma ^2$ in the same manner. The result considering $p=10\,\%$ is shown in Fig. 2. We can see a pattern of convergence to zero of the bias and MSE when n increases. As a general rule, we can say that bias and MSE tend to approach to zero when the sample size increases indicating that the estimates based on the proposed EM-type algorithm do provide good asymptotic properties. This same pattern of convergence to zero is repeated considering different levels of censoring p (see Appendix D for details).

5.3 Consistency of the estimates of the standard errors for the MLE’s of the parameters (simulation study 3)

Now we show, via simulation study, that the method suggested in Sect. 4 to approximate the SE of the MLE of the regression parameters has good asymptotic properties. We fixed a SMN-CR model (N-CR and T-CR or SL-CR with $\nu =4$ respectively) and a censoring level (5, 10, 20, 30 or 50 %). For each one of these fifteen combinations of model and censoring level, we generated 1000 samples of size $n=100$ with $\beta _1=2$, $\beta _2=1$ and $\sigma ^2=1$. For each sample, we obtained the MLE’s of ${\varvec{\theta }}=(\beta _1,\beta _2,\sigma ^2)$, the estimates of their SE using the technique proposed in Sect. 4 and an approximate 95 % confidence interval assuming asymptotic normality. Table 2 presents the sample standard errors of $\widehat{\theta }$, i.e., the value

$$\begin{aligned} \text {MC SE}= \frac{1}{999}\left[ \sum _{i=1}^{1000}\left( \widehat{\theta }_i\right) ^2- \frac{1}{1000}\left( \sum _{i=1}^{1000}\widehat{\theta }_i\right) ^2\right] \!, \end{aligned}$$

The results from this table show a reasonable MC coverage for both ${\varvec{\beta }}$ and $\sigma ^2$, although the values for $\sigma ^2$ tend to be lower the nominal level $(95\,\%)$. Taking into account the moderate sample size (n=100), we consider these results quite satisfactory.

Table 2 Simulation study 3

Full size table

6 Application

In this section, we provide an application of the results derived in the previous sections using the data described by Mroz (1987). The data set consists of 753 married white women with ages between 30 and 60 years old in 1975, with 428 women that worked at some point during that year. The response variable is the wage rate, which represents a measure of the wage of the housewife known as the average hourly earnings. It is important to stress that if the wage rates are set equal to zero, these wives did not work in 1975. Therefore, these observations are considered left censored at zero. Four predictor variables were considered: the wife’s age, years of schooling, the number of children younger than six years old in the household and the number of children between six and nineteen years old. These data were analyzed by Arellano-Valle et al. (2012) using a truncated Student-t regression model. We analyzed it with the aim of providing additional inferences by using the SMN distributions in the context of censored models. We fitted a regression model with an intercept parameter $\beta _1$ and applied the EM-type algorithm for censored data explained in Sect. 3.2, considering again the N-CR, T-CR, SL-CR and CN-CR models for comparative purposes.

Table 3 shows the parameter estimates, together with their corresponding SE. Table 4 presents some model selection criteria, together with the values of the log-likelihood. The AIC Akaike (1974), BIC Schwarz (1978) and EDC Bai et al. (1989) criteria indicate that the three models with longer tails than the N-CR model seem to produce more accurate estimates. The SE of the T-CR, SL-CR and CN-CR models are smaller than that of the N-CR model.

Table 3 Real data

Full size table

Table 4 Real data

Full size table

In order to identify atypical observations and/or model misspecification, we analyzed the transformation of the martingale residual, $r_{MT_i}$, proposed by Barros et al. (2010). These residuals are defined by

$$\begin{aligned} r_{MT_i}=\text {sign}(r_{M_i})\sqrt{-2\left[ r_{M_i}+\delta _i\log \left( \delta _i-r_{M_i}\right) \right] }, \end{aligned}$$

$i=1,\ldots ,n$, where ${r_{M_i}=\delta _i + \log S(y_i,\widehat{{\varvec{\theta }}}) }$ is the martingale residual proposed by Ortega et al. (2003)—see more details in Therneau et al. (1990), with $\delta _i=0,1$ indicating whether the i-th observation is censored or not, respectively, $\text {sign}(r_{M_i})$ denoting the sign of $r_{M_i}$ and $S\big (y_i,\widehat{{\varvec{\theta }}}\big )=P_{\widehat{{\varvec{\theta }}}}(Y_i > y_i)$ representing the survival function evaluated at ${y}_i$, supposing that it is being affected using the EM estimate $\widehat{{\varvec{\theta }}}$ for ${\varvec{\theta }}$.

The plots of $r_{MT_i}$ with generated confidence envelopes are presented in Fig. 3. From this figure, we can see clearly that the SMN-CR models with heavy tails fit better the data than the N-CR model, since, in that cases, there are fewer observations which lie outside the envelopes.

The robustness of the three models with longer tails than the N-CR model can be assessed by considering the influence of a single outlying observation on the EM estimate of ${\varvec{\theta }}$. In particular, we can assess how much the EM estimate of ${\varvec{\theta }}$ is influenced by a change of $\nabla $ units in a single observation $y_{i}$. Replacing $y_{i}$ by $y_{i}(\nabla )=y_{i}+\nabla $, we define $\widehat{\beta }_{i}(\nabla )$ as the EM estimate of $\beta _i$ after contamination, $i=1,\ldots ,5$, and analyze the behavior of the relative changes, as we did in Sect. 5.1. In this study we contaminated the observations $y_{750}$ (censored) and $y_7$ (uncensored), considering $\nabla \in \{0,1,\ldots ,10\}$.

Figure 4 displays the results of the relatives changes of the estimates for different values of $\nabla $. We omitted the plot concerning $\beta _2$ because the relative changes patterns are not so distinguishable in this case. As expected, the estimates from the models with longer tails than the N-CR model are less affected by variations on $\nabla $, no matter if the observation is censored or not. Thus, it is clear that the SMN-CR models with heavy tails are more robust, providing more accurate estimates when the data have departures from normality.

7 Conclusions

We have proposed a robust approach to linear regression models with censored observations based on SMN distributions, called SMN-CR models. This offers a high degree of flexibility, allowing us to deal properly with censored data in the presence of outliers. A novel ECME algorithm to obtain approximated maximum likelihood estimates is developed using formulas for the moments of the truncated SMN distribution, leading to closed-form expressions for the E-step. We applied our methodology to real data set (freely downloadable from "R") as well as to simulated data, in order to illustrate how the procedures can be used to evaluate model assumptions, identify outliers, and obtain robust parameter estimates. From these results, it is encouraging that the use of SMN-CR models with heavy tails offer a better fitting, a better protection against outliers and more precise inferences than the N-CR model.

Although the SMN-CR models considered here have shown great flexibility to model symmetric data, its robustness against outliers can be seriously affected by the presence of skewness. Recently, Lachos et al. (2010) proposed a remedy to accommodate skewness and heavy-tailedness simultaneously, using SMSN distributions. We conjecture that our methodology can be used under CR models, and should yield satisfactory results at the expense of additional complexity in implementation. An in-depth investigation of such extensions is beyond the scope of the present paper, but it is an interesting topic for further research. Finally, the proposed EM-type algorithm has been coded and implemented in the "R" package "SMNCensReg" Garay et al. (2013), which is available for download at CRAN repository. A great advantage of this package is that all the censoring possibilities are taken into account: left, right and interval.

References

Akaike H (1974) A new look at the statistical model identification. Autom Control IEEE Trans 19:716–723
Article MathSciNet MATH Google Scholar
Arellano-Valle R, Castro L, González-Farías G, Muñoz-Gajardo K (2012) Student-t censored regression model: properties and inference. Stat Methods Appl 21:453–473
Article MathSciNet MATH Google Scholar
Arellano-Valle R. B (1994) Distribuições elípticas: propriedades, inferência e aplicações a modelos de regressão. Ph.D. thesis, Instituto de Matemática e Estatística, Universidade de São Paulo, in portuguese
Bai ZD, Krishnaiah PR, Zhao LC (1989) On rates of convergence of efficient detection criteria in signal processing with white noise. Inform Theory IEEE Trans 35:380–388
Article MathSciNet MATH Google Scholar
Barros M, Galea M, González M, Leiva V (2010) Influence diagnostics in the Tobit censored response model. Stat Methods Appl 19:716–723
Article MathSciNet MATH Google Scholar
Dempster A, Laird N, Rubin D (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39:1–38
MathSciNet MATH Google Scholar
Fang KT, Zhang YT (1990) Generalized multivariate analysis. Springer, Berlin
MATH Google Scholar
Garay A. M, Lachos V, Massuia M. B (2013) SMNCensReg: fitting univariate censored regression model under the scale mixture of normal distributions. R package version 2.2. http://CRAN.R-project.org/package=SMNCensReg
Genç AI (2012) Moments of truncated normal/independent distributions. Stat Pap 54:741–764
Article MathSciNet MATH Google Scholar
Greene W (2012) Econometric analysis. Prentice Hall, New York
Google Scholar
Ibacache-Pulgar G, Paula G (2011) Local influence for Student-t partially linear models. Comput Stat Data Anal 55:1462–1478
Article MathSciNet MATH Google Scholar
Kim HJ (2008) Moments of truncated Student- distribution. J Korean Stat Soc 37:81–87
Article MathSciNet MATH Google Scholar
Labra FV, Garay AM, Lachos VH, Ortega EMM (2012) Estimation and diagnostics for heteroscedastic nonlinear regression models based on scale mixtures of skew-normal distributions. J Stat Plan Inference 142:2149–2165
Article MathSciNet MATH Google Scholar
Lachos VH, Ghosh P, Arellano-Valle RB (2010) Likelihood based inference for skew-normal independent linear mixed models. Stat Sin 20:303–322
MathSciNet MATH Google Scholar
Lange KL, Little R, Taylor J (1989) Robust statistical modeling using t distribution. J Am Stat Assoc 84:881–896
MathSciNet Google Scholar
Lee G, Scott C (2012) EM algorithms for multivariate Gaussian mixture models with truncated and censored data. Comput Stat Data Anal 56:2816–2829
Article MathSciNet MATH Google Scholar
Liu C, Rubin DB (1994) The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 80:267–278
MathSciNet MATH Google Scholar
Lin TI (2010) Robust mixture modeling using multivariate skew tădistributions. Stat Comput 20:343–356
Article MathSciNet Google Scholar
Louis TA (1982) Finding the observed information matrix when using the EM algorithm. J R Stat Soc 44:226–233
MathSciNet MATH Google Scholar
Massuia MB, Cabral CRB, Matos LA, Lachos VH (2014) Influence diagnostics for Student-t censored linear regression models. Statistics 85:1–21. doi:10.1080/02331888.2014.958489
MATH Google Scholar
Meilijson I (1989) A fast improvement to the EM algorithm to its own terms. J R Stat Soc Ser B 51:127–138
MathSciNet MATH Google Scholar
Meng XL, Rubin BD (1993) Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 80:267–278
Article MathSciNet MATH Google Scholar
Meza C, Osorio F, la Cruz RD (2012) Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Stat Comput 22:121–139
Article MathSciNet MATH Google Scholar
Mroz TA (1987) The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions. Econometrica 55:765–799
Article Google Scholar
Ortega EMM, Bolfarine H, Paula GA (2003) Influence diagnostics in generalized log-gamma regression models. Comput Stat Data Anal 42:165–186
Article MathSciNet MATH Google Scholar
Osorio F, Paula GA, Galea M (2007) Assessment of local influence in elliptical linear models with longitudinal structure. Comput Stat Data Anal 51:4354–4368
Article MathSciNet MATH Google Scholar
R Core Team (2015) R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, http://www.R-project.org
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Article MathSciNet MATH Google Scholar
Therneau TM, Grambsch PM, Fleming RT (1990) Martingale-based residuals for survival models. Biometrika 77(1):147–160
Article MathSciNet MATH Google Scholar
Villegas C, Paula G, Cysneiros F, Galea M (2012) Influence diagnostics in generalized symmetric linear models. Comput Stat Data Anal 59:161–170
Article MathSciNet Google Scholar
Wei CG, Tanner MA (1990) Posterior computations for censored regression data. J Am Stat Assoc 85:829–839
Article Google Scholar
Wu L (2010) Mixed effects models for complex data. Chapman & Hall/CRC, Boca Raton
MATH Google Scholar

Download references

Acknowledgments

We thank the editor, associate editor, and two referees whose constructive comments led to an improved presentation of the paper. The research of Víctor H. Lachos was supported by Grant 305054/2011-2 from Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq-Brazil) and by Grant 2014/02938-9 from Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP-Brazil). Celso R. B. Cabral was supported by CNPq (via the Universal and CT-Amazônia projects) and CAPES (via Project PROCAD 2007). The research of Aldo M. Garay is supported by Grant 161119/2012-3 from CNPq and by Grant 2013/21468-0 from FAPESP-Brazil and Heleno Bolfarine was supported by CNPq.

Author information

Authors and Affiliations

Departamento de Estatística, Universidade Estadual de Campinas, Rua Sérgio Buarque de Holanda, 651 – Cidade Universitária Zeferino Vaz Campinas, São Paulo, SP, CEP 13083-859, Brazil
Aldo M. Garay & Victor H. Lachos
Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo, SP, Brazil
Heleno Bolfarine
Departamento de Estatística, Universidade Federal do Amazonas, Manaus, AM, Brazil
Celso R. B. Cabral

Authors

Aldo M. Garay
View author publications
You can also search for this author in PubMed Google Scholar
Victor H. Lachos
View author publications
You can also search for this author in PubMed Google Scholar
Heleno Bolfarine
View author publications
You can also search for this author in PubMed Google Scholar
Celso R. B. Cabral
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Aldo M. Garay.

Appendices

Appendix 1: Lemmas and corollary

The following Lemmas, provided by Kim (2008) and Genç (2012), are useful for evaluating some integrals used in this paper as well as for the implementation of the proposed EM-type algorithm.

Lemma 1

If $Z\sim \text {TN}_{(a,b)}\left( 0,1\right) $, then

$$\begin{aligned} \left( k+1\right) E\left[ Z^k\right] -E\left[ Z^{k+2}\right] =\frac{\left( b\right) ^{k+1}\phi \left( b\right) -\left( a\right) ^{k+1}\phi \left( a\right) }{\Phi \left( b\right) -\Phi \left( a\right) }, \end{aligned}$$

for $k=-1,0,1,2,\ldots $

Proof: See Lemma 2.3 in Kim (2008).

Lemma 2

Let U be a positive random variable. Then ${F}_{SMN}\left( a\right) =E_U\left[ \Phi \left( aU^{\frac{1}{2}}\right) \right] ,$ where ${F}_{SMN}(\cdot )$ denotes the cdf of a standard SMN random variable, that is, when $\mu =0$ and $\sigma ^2=1$.

Proof: See Lemma 3 in Genç (2012).

The following Corollary is a direct consequence of Proposition 1 given in Sect. 2.

Corollary 1

Let $Y \sim \text {SMN}(\mu ,\sigma ^2, {\varvec{\nu }})$ with scale factor U and $\mathcal{A}=(\text {a},\text {b})$. Then, for $r \ge 1$,

$$\begin{aligned} \text {E}\left[ U^{r}|Y\in \mathcal{A}\right]&=\text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] ;\\ \text {E}\left[ U^{r}Y|Y\in \mathcal{A}\right]&=\mu \text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] + \sigma \text {E}\left[ U^{r}X|X\in \mathcal{A}^*\right] ;\\ \text {E}\left[ U^{r}Y^{2}|Y\in \mathcal{A}\right]&=\mu ^2\text {E}\left[ U^{r}|X\in \mathcal{A}^*\right] +2\mu \sigma \text {E}\left[ U^{r}X|X\in \mathcal{A}^*\right] \\&\quad +\sigma ^2\text {E}\left[ U^{r}X^{2}|X\in \mathcal{A}^*\right] , \end{aligned}$$

where $X \sim \text {SMN }(0,1,{\varvec{\nu }})$ and $\mathcal{A}^*=\left( a^*,b^*\right) $, with $a^*=\left( a-\mu \right) /\sigma $ and $b^*=\left( b-\mu \right) /\sigma $.

Appendix 2: Derivations of quantities $\text {E}_{\phi }\left( r,h\right) $ and $\text {E}_{\Phi }\left( r,h\right) ~$for SMN distributions

In this Appendix, we calculate the expressions for the expected values $\text {E}_{\phi }\left( r,h\right) $ and $\text {E}_{\Phi }\left( r,h\right) ,$ for $r,h\ge 0$, given in Proposition 1.

1.1 Pearson type VII distribution (and the Student-t distribution)

In this case $U\sim Gamma(\nu /2,\delta /2)$, with $\nu >0~\text {and}~\delta >0.$ To facilitate the notation, let us make $\alpha _1=(\nu +2r)/2$ and $\alpha _2=(h^2+\delta )/2$. Then,

$$\begin{aligned} \text { E}_{\phi }\left( r,h\right)&=\int ^{\infty }_{0}\frac{\delta ^{\frac{\nu }{2}} u^{\frac{\nu }{2}-1} u^r}{\sqrt{2\pi } \Gamma \left( \frac{\nu }{2}\right) 2^{\frac{\nu }{2}} }\exp \left\{ -\frac{u(h^2+\delta ) }{2}\right\} du \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) \delta ^{\frac{\nu }{2}}\left( \frac{h^2+\delta }{2}\right) ^{-\frac{\nu +2r}{2}}}{ \sqrt{2\pi }\Gamma \left( \frac{\nu }{2}\right) 2^\frac{\nu }{2}} \int ^{\infty }_{0}\frac{\alpha _2^{\alpha _1}u'^{\left\{ \alpha _1-1\right\} }}{\Gamma \left( \alpha _1\right) }\exp \left\{ -\alpha _2{u'}\right\} du' \nonumber \\&= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \sqrt{2\pi }\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{\nu /2}\left( \frac{h^2+\delta }{2}\right) ^{-\frac{\nu +2r}{2}}, \end{aligned}$$

(26)

where the integrand in (26) is the pdf of a random variable $U'$ with distribution $Gamma\left( \alpha _1,\alpha _2\right) $.

$$\begin{aligned} {\text {E}}_{\Phi }\left( r,h\right)&=\int ^{\infty }_{0} \frac{u^{\frac{2r+\nu }{2}-1} \Phi \left( h u^\frac{1}{2}\right) \delta ^\frac{\nu }{2} }{2^\frac{\nu }{2} \Gamma \left( \frac{\nu }{2} \right) } \exp \left\{ -\frac{u\delta }{2}\right\} du \nonumber \\&\quad {}= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r} \int ^{\infty }_{0}\left( \frac{\delta }{2}\right) ^{\alpha _1}\frac{\Phi \left( h u'^{\{ \frac{1}{2} \} }\right) {u'}^{\left\{ {\alpha _1-1}\right\} }}{\Gamma \left( \alpha _1\right) }\exp \left\{ -\frac{u'\delta }{2}\right\} du' \nonumber \\&\quad {}= \frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}E_{U'}\left[ \Phi \left( hU'^{\{ \frac{1}{2} \} }\right) \right] \nonumber \\&\quad {}=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{ \Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\delta }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\delta ), \end{aligned}$$

(27)

where in (27) the expectation is computed with respect to $U' \sim Gamma\left( \alpha _1,\delta /2\right) $ and $F_{PVII}(\cdot )$ represents the cdf of the Pearson type VII distribution. Then, the result follows from Lemma 2. When $\delta =\nu $, i.e., the Student-t distribution, we have that $\text { E}_{\phi }\left( r,h\right) $ and $\text { E}_{\Phi }\left( r,h\right) $ are given by

$$\begin{aligned} \text { E}_{\phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) \sqrt{2\pi }}\left( \frac{\nu }{2}\right) ^{\frac{\nu }{2}}\left( \frac{h^2+\nu }{2}\right) ^{-\frac{\left( \nu +2r\right) }{2}};\\ \text {E}_{\Phi }\left( r,h\right)&=\frac{\Gamma \left( \frac{\nu +2r}{2}\right) }{\Gamma \left( \frac{\nu }{2}\right) }\left( \frac{\nu }{2}\right) ^{-r}F_{PVII}(h|\nu +2r,\nu ). \end{aligned}$$

1.2 Slash distribution

In this case $U \sim Beta(\nu ,1)$, with positive shape parameter $\nu $, and

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=\int ^{1}_{0}u^r\frac{1}{\sqrt{2\pi }}\exp \left\{ -\frac{h^2}{2}u\right\} \nu u^{\nu -1}du = \frac{\nu }{\sqrt{2\pi }}\int ^{1}_{0}u^{\nu +r-1}\exp \left\{ -\frac{h^2}{2}u\right\} du, \nonumber \\&= \frac{\nu }{\sqrt{2\pi }}\left( \frac{h^2}{2}\right) ^{-\left( \nu +r\right) }\gamma ^{*}\left( \nu +r,\frac{h^2}{2}\right) \!, \end{aligned}$$

(28)

thus, considering $\gamma ^{*}\left( a,x\right) =\int ^{x}_{0}e^{-t}t^{a-1}dt$, we obtain Eq. (28).

$$\begin{aligned} \text {E}_{\Phi }\left( r,h\right)&=\int ^{1}_{0}u^r\Phi \left( h u^\frac{1}{2}\right) \nu u^{\nu -1}du \nonumber \\&= \frac{\nu }{\nu +r}\int ^{1}_{0}\Phi \left( h u'^{\{ \frac{1}{2} \} }\right) {u'}^{\{\nu +r-1\}}\left( \nu +r\right) du' \end{aligned}$$

(29)

$$\begin{aligned}&= \frac{\nu }{\nu +r}F_{SL}(h|\nu +r), \end{aligned}$$

(30)

where the integrand in (29) is the expectation of the random variable $\Phi (h U'^{\{ \frac{1}{2} \} })$, with ${U'}\sim ~\text {Beta}(\nu +r,1)$. Using Lemma 2, we obtain Eq. (30), where $F_{SL}(\cdot )$ is the cdf of the slash distribution.

1.3 Contaminated normal distribution

$$\begin{aligned} \text {E}_{\phi }\left( r,h\right)&=u^r\phi \left( h u^\frac{1}{2}\right) \left[ \nu \mathbb {I}_{\{\gamma \}}(u)+(1-\nu ) \mathbb {I}_{\{1\}}(u)\right] \\&= \nu \gamma ^r\phi \left( h{\gamma }^\frac{1}{2}\right) +(1-\nu )\phi \left( h{\gamma }^\frac{1}{2}\right) \!;\\ \text {E}_{\Phi }\left( r,h\right)&=u^r\Phi \left( h u^\frac{1}{2}\right) \left[ \nu \mathbb {I}_{\{\gamma \}}(u)+(1-\nu ) \mathbb {I}_{\{1\}}(u)\right] \\&= \nu \gamma ^r\Phi \left( h{\gamma }^\frac{1}{2}\right) +(1-\nu )\Phi \left( h \right) = \gamma ^r\left[ \nu \Phi \left( h u^\frac{1}{2}\right) + \left( 1-\nu \right) \Phi \left( h\right) \right] \\&\quad +\left( 1-\nu \right) \left( 1-\gamma ^r\right) \Phi \left( h\right) \\&= \gamma ^rF_{CN}(h|\nu ,\gamma )+\left( 1-\nu \right) \left( 1-\gamma ^r\right) \Phi \left( h\right) \!, \end{aligned}$$

where $F_{CN}(\cdot )$ is the cdf of the contaminated normal distribution.

Appendix C. Details of the EM-type algorithm

In this Appendix, we derive the EM algorithm Eqs. (20)–(22). Let ${\varvec{\theta }}=({\varvec{\beta }}^{\top }, \sigma ^2, {\varvec{\nu }})$ be the vector with all parameters in the SMN-CR model and consider the notation given in Sect. 3.2. Denoting the complete-data likelihood by $L(\cdot |\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})$ and pdf’s in general by $f(\cdot )$, we have that

$$\begin{aligned} L({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})&=f(\mathbf {y}_{\text {obs}}, \mathbf {y}_{L},\mathbf {u})=f(\mathbf {y}_{\text {obs}},\mathbf {y}_{L}|\mathbf {u})f(\mathbf {u})\\&= f(\mathbf {y}|\mathbf {u})f(\mathbf {u}) = \prod ^n_{i=1}f(y_i|u_i)f(u_i|{\varvec{\nu }}). \end{aligned}$$

Dropping unimportant constants, the complete-data log-likelihood function is given by

$$\begin{aligned} \ell _c({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u})&= \log (L({\varvec{\theta }}|\mathbf {y}_\text {obs}, \mathbf {y}_{L},\mathbf {u}))\\&\quad {}= \frac{n}{2} \log (\sigma ^2) +\frac{1}{2} \sum _{i=1}^{n}\log \left( u_i\right) - \frac{1}{2\sigma ^2}\sum _{i=1}^n u_i(y_i-\mathbf {x}^{\top }_i{\varvec{\beta }})^2 \\&\qquad + \sum _{i=1}^{n} \log \left( f(u_i|{\varvec{\nu }})\right) . \end{aligned}$$

The Q-function at the E-step of the algorithm is given by

$$\begin{aligned} Q({\varvec{\theta }}|{\varvec{\theta }}^{(k)}) =\text {E}_{{{\varvec{\theta }}^{(k)}}}\left[ \ell _{c}\left( {\varvec{\theta }}|\mathbf {Y}_\text {obs},\mathbf {Y}_L,\mathbf {U}\right) |\mathbf {y}_\text {obs}\right] , \end{aligned}$$

so we have

$$\begin{aligned} Q({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)})&= -\frac{n}{2}\log \left( \sigma ^2\right) -\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left\{ \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^2|y_{\text {obs}_i}] \right. \\&\quad {} \left. -\,2 \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i|y_{\text {obs}_i}] \mathbf {x}^{\top }_i{\varvec{\beta }}+\text {E}_{{\varvec{\theta }}^{(k)}}[U_i|y_{\text {obs}_i}] (\mathbf {x}^{\top }_i{\varvec{\beta }})^2\right\} \\&\quad {} +\,\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

The expectations $\mathcal{E}_{si}\big ({\varvec{\theta }}^{(k)}\big ) = \text {E}_{{\varvec{\theta }}^{(k)}}[U_i Y_i^s|y_{\text {obs}_i}],\,\,\, s=0,1,2$, used in the E-step of the algorithm, are computed considering the two possible cases: (i) when the observation i is uncensored and (ii) otherwise. In the former case we solve the problem using results obtained by Osorio et al. (2007). In the later case we use Proposition 1. Then, we have

$$\begin{aligned} Q({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}) =&-\frac{n}{2}\log (\sigma ^2)-\frac{1}{2\sigma ^2} \sum _{i=1}^{n} \left[ \mathcal{E}_{2 i}\big ({\varvec{\theta }}^{(k)}\big ) -2 \mathcal{E}_{1 i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}+\mathcal{E}_{0 i}\big ({\varvec{\theta }}^{(k)}\big ) \big (\mathbf {x}^{\top }_i{\varvec{\beta }}\big )^2\right] \\&+\frac{1}{2} \sum _{i=1}^{n}\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}] + \sum _{i=1}^{n} \text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]. \end{aligned}$$

In the CM-step, we take the derivatives of $Q\big ({\varvec{\theta }}|{{\varvec{\theta }}}^{(k)}\big )$ with respect to ${\varvec{\beta }}$ and $\sigma ^2$, i.e.,

The solution of $\displaystyle \frac{\partial Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )}{\partial {\varvec{\beta }}} = 0$ is

$$\begin{aligned} {{\varvec{\beta }}}^{(k+1)}=\left( \sum ^n_{i=1}\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\mathbf {x}_i\mathbf {x}^{\top }_i\right) ^ {-1}\sum ^n_{i=1}\mathbf {x}_i\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ). \end{aligned}$$

The solution of $\displaystyle \frac{\partial Q\big ({\varvec{\theta }}|{\varvec{\theta }}^{(k)}\big )}{\partial \sigma ^2} = 0$ is

$$\begin{aligned} {\sigma ^2}^{(k+1)}&=\frac{1}{n}\sum ^n_{i=1}\left[ \mathcal{E}_{2i}\big ({\varvec{\theta }}^{(k)}\big )-2\mathcal{E}_{1i}\big ({\varvec{\theta }}^{(k)}\big ) \mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)} +\mathcal{E}_{0i}\big ({\varvec{\theta }}^{(k)}\big )\big (\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}\big )^2\right] . \end{aligned}$$

For the CML-step, we estimate ${\varvec{\nu }}$ by maximizing the marginal log-likelihood, circumventing the (in general) complicated task of computing $\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( U_i\right) |y_{\text {obs}_i}]$ and $\text {E}_{{\varvec{\theta }}^{(k)}}[\log \left( f(U_i |{\varvec{\nu }})\right) |y_{\text {obs}_i}]$, i.e.,

$$\begin{aligned} {\varvec{\nu }}^{(k+1)}&=\text {argmax}_{{\varvec{\nu }}}\left\{ \sum _{i=1}^{m} \log \left[ {F}_{SMN}\left( \frac{\kappa _i-\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)}}{\sigma ^{(k+1)}}\right) \right] \right. \\&\quad {} + \left. \sum ^{n}_{i=m+1} \log \left[ f_{SMN}\big (y_i|\mathbf {x}^{\top }_i{\varvec{\beta }}^{(k+1)},{\sigma ^2}^{(k+1)},{\varvec{\nu }}\big )\right] \right\} . \end{aligned}$$

Appendix D. Complementary results of the simulation studies: asymptotic properties

Figures 5 and 6 depict the average bias and the average MSE of $\widehat{\beta }_1$, $\widehat{\beta }_2$ and $\widehat{\sigma ^2}$ for the levels of censoring $p=25\,\%$ and $p=45\,\%$, respectively.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Garay, A.M., Lachos, V.H., Bolfarine, H. et al. Linear censored regression models with scale mixtures of normal distributions. Stat Papers 58, 247–278 (2017). https://doi.org/10.1007/s00362-015-0696-9

Download citation

Received: 08 March 2014
Revised: 21 January 2015
Published: 11 June 2015
Issue Date: March 2017
DOI: https://doi.org/10.1007/s00362-015-0696-9

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Linear censored regression models with scale mixtures of normal distributions

Abstract

Similar content being viewed by others

Flexible regression modeling for censored data based on mixtures of student-t distributions

Finite mixture of regression models for censored data based on scale mixtures of normal distributions

Finite mixture modeling of censored and missing data using the multivariate skew-normal distribution

1 Introduction

2 Preliminaries

Definition 1

Definition 2

Proposition 1

Proof

3 The SMN censored linear regression model

3.1 The model

3.2 Parameter estimation via an EM-type algorithm for the SMN-CR model

4 Approximated standard errors for the fixed effects

5 Simulation studies

5.1 Robustness of the EM estimates (simulation study 1)

5.2 Asymptotic properties (simulation study 2)

5.3 Consistency of the estimates of the standard errors for the MLE’s of the parameters (simulation study 3)

6 Application

7 Conclusions

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix 1: Lemmas and corollary

Lemma 1

Lemma 2

Corollary 1

Appendix 2: Derivations of quantities \(\text {E}_{\phi }\left( r,h\right) \) and \(\text {E}_{\Phi }\left( r,h\right) ~\)for SMN distributions

1.1 Pearson type VII distribution (and the Student-t distribution)

1.2 Slash distribution

1.3 Contaminated normal distribution

Appendix C. Details of the EM-type algorithm

Appendix D. Complementary results of the simulation studies: asymptotic properties

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation