The estimation of duration models has been the subject of significant research in econometrics since the late 1970s. Cox (1972) proposed the use of proportional hazard models in biostatistics, and they were soon adopted for use in economics. Since Lancaster (1979), economists have recognized that it is important to account for unobserved heterogeneity in models for duration data. Failure to account for unobserved heterogeneity causes the estimated hazard rate to decrease more with the duration than the hazard rate of a randomly selected member of the population. Moreover, the estimated proportional effect of explanatory variables on the population hazard rate is smaller in absolute value than that on the hazard rate of the average population member and decreases with the duration. To account for unobserved heterogeneity, Lancaster proposed a parametric mixed proportional hazard (MPH) model, a partial generalization of Cox’s proportional hazard model, that specifies the hazard rate as the product of a regression function that captures the effect of observed explanatory variables, a baseline hazard that captures variation in the hazard over the spell, and a random variable that accounts for the omitted heterogeneity. In particular, Lancaster (1979) introduced the mixed proportional hazard model in which the hazard is a function of a regressor X, unobserved heterogeneity v, and a function of time λ(t),

$$ \theta \left(t \mid X, v\right) = v\, e^{X\beta_0}\, \lambda(t). \tag{1} $$

The function λ(t) is often referred to as the baseline hazard and, in Lancaster’s specification, v|X has a gamma distribution. The popularity of the mixed proportional hazard model is partly due to the fact that it nests two alternative explanations for the hazard θ(t|X) to be decreasing with time. In particular, estimating the mixed proportional hazard model gives the relative importance of the heterogeneity, v, and genuine duration dependence, λ(t) (see Lancaster 1990, and Van den Berg 2001, for overviews). Lancaster (1979) uses functional form assumptions on λ(t), which were not required by the Cox model, and distributional assumptions on v to identify the model. Examples by Lancaster and Nickell (1980) and Heckman and Singer (1984), however, show the sensitivity of the estimates to these functional form and distributional assumptions. Thus, Lancaster’s MPH model is fully parametric, and from the outset questions were raised about the role of functional form and parametric assumptions in the distinction between unobserved heterogeneity and duration dependence. (Heckman 1991 gives an overview of attempts to make this distinction in duration and dynamic panel data models.) This question was resolved by Elbers and Ridder (1982), who showed that the MPH model is semi-parametrically identified if there is minimal variation in the regression function. A single indicator variable in the regression function suffices to recover the regression function, the baseline hazard, and the distribution of the unobserved component, provided that this distribution does not depend on the explanatory variables. Semi-parametric identification means that semi-parametric estimation is feasible, and a number of semi-parametric estimators for the MPH model have been proposed that progressively relaxed the parametric restrictions.
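The two explanations for a declining observed hazard can be separated in a small simulation. The following is a minimal sketch under assumptions chosen only for illustration: a constant baseline hazard λ(t) = 1, a single indicator regressor, gamma heterogeneity with mean one and variance `sigma2`, and hypothetical function names. Every individual hazard is constant over the spell, so any decline in the observed hazard is due to heterogeneity alone.

```python
import numpy as np

def simulate_mph(n, beta, sigma2, rng):
    """Draw durations from theta(t|X,v) = v * exp(X*beta) * lambda(t) with lambda(t) = 1."""
    x = rng.integers(0, 2, size=n)                           # single indicator regressor
    v = rng.gamma(shape=1.0 / sigma2, scale=sigma2, size=n)  # mean 1, variance sigma2
    z = rng.exponential(1.0, size=n)                         # unit exponential draw
    t = z / (v * np.exp(x * beta))                           # solves v*exp(x*beta)*t = z
    return t, x

def empirical_hazard(t, edges):
    """Exits divided by time at risk within each interval [edges[j], edges[j+1])."""
    exits = np.histogram(t, bins=edges)[0]
    lo, hi = edges[:-1], edges[1:]
    exposure = np.clip(t[:, None], lo, hi) - lo              # time each spell spends in each interval
    return exits / exposure.sum(axis=0)

rng = np.random.default_rng(0)
t, x = simulate_mph(200_000, beta=1.0, sigma2=1.0, rng=rng)
edges = np.array([0.0, 0.5, 1.0, 2.0, 4.0])
haz0 = empirical_hazard(t[x == 0], edges)   # observed hazard for X = 0
haz1 = empirical_hazard(t[x == 1], edges)   # observed hazard for X = 1
print("observed hazard, X=0:", haz0.round(3))
print("hazard ratio X=1 / X=0:", (haz1 / haz0).round(3))
```

Under these assumptions the observed hazard for regressor value x is exp(xβ)/(1 + σ² exp(xβ) t), so it falls with t even though no individual hazard does, and the observed hazard ratio starts near exp(β) and shrinks toward one as the duration grows.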

Nielsen et al. (1992) showed that the partial likelihood estimator of Cox (1972) can be generalized to the MPH model with gamma-distributed unobserved heterogeneity. Their estimator is semi-parametric because it uses parametric specifications of the regression function and the distribution of the unobserved heterogeneity while leaving the baseline hazard unspecified. The estimator requires numerical integration of the order of the sample size, as originally discussed by Han and Hausman (1990), which further limits its usefulness and makes it impractical for most situations in econometrics. Heckman and Singer (1984) considered the non-parametric maximum likelihood estimator of the MPH model with a parametric baseline hazard and regression function. Using results of Kiefer and Wolfowitz (1956), they approximate the unobserved heterogeneity with a discrete mixture. The rate of convergence and the asymptotic distribution of this estimator are not known. As a result, estimators that use a discrete mixture with an increasing number of support points cannot be used to test hypotheses. Another estimator that does not require the specification of the unobserved heterogeneity distribution was suggested by Honoré (1990). This estimator assumes a Weibull baseline hazard and uses only very short durations to estimate the Weibull parameter.
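The discrete-mixture idea can be made concrete by writing down the likelihood for a fixed number of support points. The following is a minimal sketch, not the Heckman and Singer (1984) procedure itself: it assumes a Weibull baseline hazard λ(t) = αt^(α−1), a scalar regressor, and a censoring indicator `d` (1 for a completed spell, 0 for a right-censored one). The function name and its arguments are hypothetical, and the number of support points is held fixed rather than determined by the data.

```python
import numpy as np

def mixture_loglik(t, d, x, beta, alpha, points, probs):
    """Log-likelihood of an MPH model whose heterogeneity v takes the value
    points[k] with probability probs[k]; Weibull baseline (an assumption)."""
    t, d, x = (np.asarray(a) for a in (t, d, x))
    points = np.asarray(points, dtype=float)
    probs = np.asarray(probs, dtype=float)
    phi = np.exp(x * beta)                          # regression function
    lam = alpha * t ** (alpha - 1.0)                # baseline hazard
    Lam = t ** alpha                                # integrated baseline hazard
    haz = points[None, :] * (phi * lam)[:, None]    # hazard at each support point, shape (n, K)
    cum = points[None, :] * (phi * Lam)[:, None]    # integrated hazard, shape (n, K)
    contrib = haz ** d[:, None] * np.exp(-cum)      # density if d = 1, survivor if d = 0
    return np.sum(np.log(contrib @ probs))          # mix over support points, then sum logs

# Usage on made-up data: two support points with probabilities 0.3 and 0.7.
t = np.array([0.5, 1.0, 2.0])
d = np.array([1, 0, 1])
x = np.array([0.0, 1.0, 1.0])
print(mixture_loglik(t, d, x, beta=0.3, alpha=1.2, points=[0.5, 1.5], probs=[0.3, 0.7]))
```

In an estimation routine this function would be maximized over `beta`, `alpha`, `points` and `probs`, with support points added until the likelihood stops improving. That data-dependent choice of the number of points is the feature whose asymptotic distribution is unknown.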

Han and Hausman (1990) and Meyer (1990) propose an estimator that assumes that the baseline hazard is piecewise-constant, to permit flexibility, and that the heterogeneity has a gamma distribution. Both papers find that the hazard rate, conditional on heterogeneity, is non-monotonic so that the Weibull model cannot hold. Hausman and Woutersen (2005) present simulations and a theoretical result that show that using a nonparametric estimator of the baseline hazard with gamma heterogeneity yields inconsistent estimates for all parameters and functions if the true mixing distribution is not a gamma, which limits the usefulness of the Han–Hausman–Meyer approach. Thus, Hausman and Woutersen (2005) find it important to specify a model that does not require a parametric specification of the unobserved heterogeneity.
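The combination of a piecewise-constant baseline hazard and gamma heterogeneity has a closed-form survivor function, which is part of its appeal. The following is a minimal sketch in continuous time, assuming gamma heterogeneity with mean one and variance `sigma2` and a scalar regressor; the function names, knots and hazard levels are hypothetical, and the code is not the grouped-data likelihood used in the papers cited above.

```python
import numpy as np

def integrated_piecewise_hazard(t, knots, levels):
    """Lambda(t) for a baseline hazard equal to levels[j] on [knots[j], knots[j+1]);
    the last piece extends to infinity. knots[0] is assumed to be 0."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    lo = np.asarray(knots, dtype=float)
    hi = np.append(lo[1:], np.inf)
    time_in_piece = np.clip(t[:, None], lo, hi) - lo   # time spent in each piece before t
    return time_in_piece @ np.asarray(levels, dtype=float)

def gamma_mixed_survivor(t, x, beta, knots, levels, sigma2):
    """P(T > t | x) after integrating out gamma heterogeneity (mean 1, variance sigma2)."""
    Lam = integrated_piecewise_hazard(t, knots, levels)
    return (1.0 + sigma2 * np.exp(x * beta) * Lam) ** (-1.0 / sigma2)

# Usage with made-up values: a non-monotonic baseline hazard on three pieces.
knots, levels = [0.0, 1.0, 2.0], [1.0, 2.0, 0.5]
print(integrated_piecewise_hazard([1.5, 3.0], knots, levels))
print(gamma_mixed_survivor([1.5, 3.0], x=0.0, beta=0.7, knots=knots, levels=levels, sigma2=1.0))
```

As `sigma2` approaches zero the survivor function approaches exp{−exp(xβ)Λ(t)}, the model without heterogeneity. The closed form depends on the gamma assumption, which is exactly the restriction that the inconsistency result above concerns.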

Horowitz (1999) was the first to propose an estimator that estimates both the baseline hazard and the distribution of the unobserved heterogeneity nonparametrically. His estimator is an adaptation of the semi-parametric estimator for a transformation model that he introduced in Horowitz (1996). In particular, if the regressors are constant over the duration, then the MPH model has a transformation model representation with the logarithm of the integrated baseline hazard as the dependent variable and a random error that is equal to the logarithm of a standard exponential random variable minus the logarithm of a positive random variable. In the transformation model the regression coefficients are identified only up to scale. As shown by Ridder (1990), the scale parameter is identified in the MPH model if the unobserved heterogeneity has a finite mean. Horowitz (1999) suggests an estimator of the scale parameter that is similar to Honoré’s (1990) estimator of the Weibull parameter and is consistent if the finite mean assumption holds, so that his approach allows estimation of the regression coefficients themselves (not just up to scale). However, the Horowitz approach permits estimation of the regression coefficients only at a slow rate of convergence: it is not $N^{-1/2}$ consistent, where N is the sample size. The reason for the slower than $N^{-1/2}$ convergence is that the information matrix of the MPH model is singular under Horowitz’s assumptions (see Hahn 1994; Ishwaran 1996a). In particular, Horowitz (1999) assumes that the first three moments of the heterogeneity distribution exist, and Ishwaran (1996b) shows that the fastest possible rate of convergence is $N^{-2/5}$ for that case; Horowitz’s (1999) estimator converges arbitrarily close to that rate. In other words, the slow rate of convergence is implied by the assumptions and is not a peculiarity of the estimator.
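The transformation-model representation follows from (1) in a few lines. The derivation below is a sketch that assumes X is constant over the spell. It writes Λ(t) for the integrated baseline hazard, the integral of λ(s) from 0 to t, and Z for a unit exponential random variable; this notation is introduced here only for the derivation.

$$
\begin{aligned}
P\left(T > t \mid X, v\right) &= \exp\left\{-v\, e^{X\beta_0}\, \Lambda(t)\right\}, \\
Z \equiv v\, e^{X\beta_0}\, \Lambda(T) &\sim \text{Exp}(1) \quad \text{independently of } (X, v), \\
\ln \Lambda(T) &= -X\beta_0 + \ln Z - \ln v .
\end{aligned}
$$

The second line holds because P(Z > z | X, v) = exp(−z) for every value of X and v. If the transformation and the error distribution are both left unrestricted, multiplying the last line by any c > 0 gives an observationally equivalent model with transformation c ln Λ, coefficients cβ₀ and error c(ln Z − ln v). This is why the coefficients are identified only up to scale until a further restriction, such as a finite mean of v, is imposed.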

Subsequent research has focused on strengthening the assumptions of the MPH model so that $N^{-1/2}$ convergence is possible. Ridder and Woutersen (2003) derive an $N^{-1/2}$ consistent estimator for the MPH model by assuming that the baseline hazard rate is constant over a small interval, λ(t) = λ for 0 ≤ t ≤ ε for any ε > 0, while allowing for a nonparametric baseline hazard function for t > ε. For parametric baseline hazards, Ridder and Woutersen (2003) assume that $\lim_{t \downarrow 0} \lambda(t) = \lambda$ for 0 < λ < ∞ and derive another $N^{-1/2}$ consistent estimator. Hausman and Woutersen (2005) derive an estimator for the mixed proportional hazard model (with heterogeneity) that allows for a nonparametric baseline hazard and uses time-varying regressors. No parametric specification of the heterogeneity distribution or nonparametric estimation of the heterogeneity distribution is necessary. Intuitively, Hausman and Woutersen (2005) condition out the heterogeneity distribution, which makes it unnecessary to estimate it. Thus, they eliminate the problems that arise with the Lancaster (1979) approach to MPH models. In this model the baseline hazard rate is nonparametric, and the estimator of the integrated baseline hazard rate converges at the regular rate, $N^{-1/2}$, where N is the sample size. This convergence rate is the same rate as for a duration model without heterogeneity. The regressor parameters also converge at the regular rate. A nice feature of the estimator is that it allows the durations to be measured on a finite set of points. Such discrete measurement of durations is important in economics; for example, unemployment is often measured in weeks. In the case of discrete duration measurements, the estimator of the integrated baseline hazard converges only at this set of points, as would be expected.

It may be argued that the bias in the estimates of the regression coefficients is small if the estimates of the MPH model indicate that there is no significant unobserved heterogeneity. The problem with this argument is that estimates of the heterogeneity distribution are usually not very accurate. Given the results in Horowitz (1999), this finding should not come as a surprise. The simulation results in Baker and Melino (2000) show that it is empirically difficult to find evidence of unobserved heterogeneity, in particular if one chooses a flexible parametric representation of the baseline hazard. However, Han and Hausman (1990) and applications of their approach have found significant heterogeneity using a flexible approach to the baseline hazard. Bijwaard and Ridder (2002) find that the bias in the regression parameters is largely independent of the specification of the baseline hazard. Hence, failure to find significant unobserved heterogeneity should not lead to the conclusion that the bias due to correlation of the regressors and the unobservables that affect the hazard is small.

Because it is empirically difficult to recover the distribution of the unobserved heterogeneity, estimators that rely on estimation of this distribution may be unreliable. Therefore, it may be advisable to avoid estimating the unobserved heterogeneity distribution and the remainder of the MPH model simultaneously. Nevertheless, after estimating the baseline hazard and regression function, one can usually identify the mixing distribution. In particular, Horowitz (1999) uses the following equation to estimate the mixing distribution,

$$ \ln \left\{\Lambda (T)\right\}+ X\beta -\ln (Z)=-\ln (v) $$

where Λ(T) and β can be estimated and the unobserved Z has an exponential distribution with mean one. Thus, Horowitz (1999) solves a deconvolution problem and the speed of convergence depends on the assumptions on the distribution of v.
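The structure of this deconvolution problem can be seen in a simulation in which Λ and β are treated as known. The following is a minimal sketch, not Horowitz’s (1999) estimator: it recovers only the mean and variance of ln(v), using the facts that ln(Z) has mean equal to minus Euler’s constant and variance π²/6 for a unit exponential Z that is independent of v. The Weibull integrated hazard, the lognormal mixing distribution and all parameter values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta, alpha = 400_000, 0.5, 1.5

x = rng.normal(size=n)                               # regressor, constant over the spell
v = rng.lognormal(mean=0.0, sigma=0.6, size=n)       # hypothetical mixing distribution
z = rng.exponential(1.0, size=n)                     # unobserved unit exponential
t = (z / (v * np.exp(x * beta))) ** (1.0 / alpha)    # solves v * exp(x*beta) * t**alpha = z

# Observable index ln Lambda(T) + X*beta, with Lambda(t) = t**alpha treated as known.
w = alpha * np.log(t) + x * beta                     # equals ln(Z) - ln(v)

# Moment-based deconvolution: subtract the known moments of ln(Z).
mean_ln_v = -(w.mean() + np.euler_gamma)             # E[w] = -gamma - E[ln v]
var_ln_v = w.var() - np.pi ** 2 / 6                  # Var[w] = pi^2/6 + Var[ln v]

print("recovered mean and variance of ln(v):", round(mean_ln_v, 3), round(var_ln_v, 3))
print("sample mean and variance of ln(v):   ", round(np.log(v).mean(), 3), round(np.log(v).var(), 3))
```

Recovering the whole distribution of v rather than two moments requires inverting the convolution with the distribution of ln(Z), and the precision of that step is what ties the speed of convergence to the assumptions on v.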

A hazard model is a natural framework for time-varying regressors if a flow or a transition probability depends on a regressor that changes with time, since a hazard model avoids the curse of dimensionality that would arise from interacting the regressors at each point in time with one another. A non-constructive identification proof for the duration model with time-varying regressors can be produced using techniques similar to Honoré (1993b), and Honoré (1993a) gives such a proof. (A non-constructive identification proof is an identification proof that does not suggest an estimator.) In particular, Honoré (1993a) does not assume that the mean of the heterogeneity distribution is finite (nor does Honoré 1993a assume a tail condition as in Heckman and Singer 1984). Ridder and Woutersen (2003) argue that it is precisely the finite mean assumption that makes the identification of Elbers and Ridder (1982) ‘weak’ in the sense that the model of Elbers and Ridder (1982) cannot be estimated at rate $N^{-1/2}$. As in Honoré (1993a), Hausman and Woutersen (2005) do not need the finite mean assumption, which gives an intuitive explanation of why Hausman and Woutersen (2005) can estimate the model at rate $N^{-1/2}$.