1 Introduction

The usual goals in the analysis of survival data include: (a) describing the distributional shape of the time variable; (b) comparing the survival experiences of different groups in a population; and (c) modeling the relationship between explanatory variables and survival time—as measured by time to the event of interest or the rate at which the event occurs.

Two classes of models are common in the literature for investigating effects of explanatory variables on survival. In the Cox proportional hazards (PH) model, the explanatory variables act multiplicatively on a baseline hazard so that their effect is to increase or decrease the hazard relative to that of the baseline group. A second class of models, known as accelerated failure-time models, specifies that the covariates act multiplicatively on the time to event itself so that their effect is to accelerate or decelerate the time to event relative to the event time for the baseline group. According to Wei (1992), the accelerated failure-time model has an intuitive physical interpretation and would be a useful alternative to the Cox PH model in survival analysis.

It has been documented that covariate effects on survival time are not robust to the choice of the baseline distribution; see, for instance, Addison and Portugal (1987), Bergström and Edin (1992), Bergström et al. (1994), and Ghilagaber (2005). It is, therefore, of paramount importance to correctly specify the baseline distribution if results from analysis of survival data are to be utilized optimally. A number of distributions for survival data are available in the literature, scattered across disciplines and application areas. Some previous works have attempted to unify these scattered models by embedding a number of competing models under the umbrella of a general parametric framework, as in Butler and McDonald (1986) and Peng et al. (1998). This enables the use of ordinary parametric inference for assessment of each competing model relative to a more comprehensive one. Among others, Ghilagaber (2005) shows that five parametric duration models (exponential, Weibull, gamma, log normal, and reciprocal Weibull) may be treated as special cases of a more general extended generalized gamma (EGG) model by constraining the shape and/or scale parameters of the EGG model to some fixed constants.

In this chapter, we extend the EGG model further and enlarge the family of flexible distributions to include 13 special cases. This is achieved by including distributions that not only constrain the shape and scale parameters to specified constants but also impose some relationships between them. The new set of special cases includes the Rayleigh and inverse Rayleigh distributions as well as the ammag and inverse ammag distributions as described in Cox et al. (2007). Further, a half-normal distribution can be obtained as a special case of the ammag distribution.

A Bayesian approach is used to fit the EGG model and its 13 special cases to data on time to entry into first marriage among Eritrean men and women. Each special case model is then tested relative to a more general model using the log predictive density score (LPDS); see Li et al. (2010). Compared to the classical likelihood inference approach, the Bayesian approach provides three main advantages. First, we sample from a posterior density using Markov Chain Monte Carlo (MCMC), and hence, we can make exact inference for any sample size in parametric survival models of varying complexity. Second, we do not need to worry about being trapped in a local maximum since our algorithm can explore the whole parameter space supported by the data. Third, it is straightforward to investigate the performance of the joint posterior density, whereas in a frequentist paradigm, we need to run simulations with pre-specified true parameter values when evaluating the performance of maximum likelihood estimates.

In Sect. 2, we introduce the accelerated failure-time models and demonstrate how a number of common distributions can be brought under the umbrella of the EGG model. Bayesian density estimation of the EGG model and its MCMC implementation are described in Sect. 3. In Sect. 4, we illustrate the models of Sect. 2 and the methods of Sect. 3 using real-life data from the 2010 Eritrean Population and Health Survey. Section 5 concludes the chapter with a summary and some remarks. A full list of the distributions used in this chapter, a proof of a lemma, and the R code used in the illustrative example are provided in the Appendices.

2 Parametric Models for Survival Data

2.1 Background

Survival data contain information on durations until event or censoring \((t_1, t_2, \ldots, t_n)\) together with a censoring indicator as well as background variables or covariates \((z_1, z_2, \ldots, z_p)\) that are often socio-demographic characteristics of individuals or organizations. The distribution of survival time, T, may be described by any of three equivalent functions: the survival function, \(S(t)=P\left (T>t\right )\), the density function, f(t), or the hazard (intensity) function, h(t) = f(t)∕S(t), where the last two functions require absolute continuity.

These functions can vary not only over time, but also among individuals within a population. Thus, one objective in the analysis of survival data is to draw inferences about the influence of covariates on these functions. One popular model is the Cox proportional hazards model presented in Cox (1972) where a p-dimensional vector of covariates z affects the hazard function in a multiplicative manner according to

$$\displaystyle \begin{aligned} h(t|\mathbf{z})=h_{0}(t)\exp\left(\mathbf{z}'\boldsymbol{\beta}\right), {} \end{aligned} $$
(1)

where \(h_0(t)\) is an unspecified baseline hazard function of time and \(\boldsymbol{\beta} \in \mathbb{R}^p\) is an unknown vector of parameters representing the effect of the covariates z. The factor \(\exp (\mathbf {z}'\boldsymbol {\beta })\) describes the intensity (hazard) for an individual with vector z relative to that of a standard individual (with z = 0).

2.2 Accelerated Failure-Time Models

A second class of models, the accelerated failure-time model, specifies the covariates to act multiplicatively on the event time itself rather than on the hazard function.

If \(T_0\) is the random time to event associated with an individual in the baseline group (z = 0), then the accelerated failure-time model specifies that for an individual with a non-zero vector of covariates z, the event time is given by

$$\displaystyle \begin{aligned} T=T_0\exp(\mathbf{z}'\boldsymbol{\beta}) {} \end{aligned} $$
(2)

or equivalently

$$\displaystyle \begin{aligned} \ln(T)=\mathbf{z}'\boldsymbol{\beta}+\ln(T_0), {} \end{aligned} $$
(3)

where, as before, T is the event time, z is a vector of covariates, and β is a vector of regression parameters. Since covariates alter, by a scale factor, the rate at which an individual traverses the time axis, Eq. (2) is referred to as the accelerated failure-time model. Thus, in accelerated failure-time models, the effect of the explanatory variables is to accelerate or decelerate time to event relative to \(T_0\).

The model in (3) is a linear model with \(\ln (T_0)\) playing the role of an error term with an underlying baseline distribution. Usually, a scale parameter δ is allowed in the model to give

$$\displaystyle \begin{aligned} \ln(T)=\mathbf{z}'\boldsymbol{\beta}+\delta\ln(T_0)= \mathbf{z}'\boldsymbol{\beta}+\delta \epsilon, {} \end{aligned} $$
(4)

where a more conventional notation 𝜖 is used for the error term.

From (4), we note that \(T=e^{\mathbf {z}'\boldsymbol {\beta }}T_{0}^{\delta }\). Thus, the survival function of T may be written in terms of that of T 0:

$$\displaystyle \begin{aligned} S(t)=P(T>t)=P(e^{\mathbf{z}'\boldsymbol{\beta}}T_{0}^{\delta}>t)=P(T_{0}^{\delta }>te^{ -\mathbf{z}'\boldsymbol{\beta}})=S_{0}(te^{-\mathbf{z}'\boldsymbol{\beta}}), {} \end{aligned} $$
(5)

where \(S_0(\cdot)\) is the survival function of the baseline time with scale parameter δ, \(T_{0}^{\delta }\), and \(e^{-\mathbf {z}'\boldsymbol {\beta }}\) is the accelerating/decelerating factor. In other words, the probability for an individual with covariate vector z surviving beyond time t is the same as the probability for an individual in the baseline group (z = 0) surviving beyond time \(te^{-\mathbf {z}'\boldsymbol {\beta }}\). A positive coefficient β shifts the time \(te^{-\mathbf {z}'\boldsymbol {\beta }}\) to the left of t, while a negative β shifts the time \(te^{ -\mathbf {z}'\boldsymbol {\beta }}\) to the right of t if all components of z > 0. Accordingly, the density and hazard functions can also be written in terms of the baseline density and hazard:

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(t)& =&\displaystyle e^{-\mathbf{z}'\boldsymbol{\beta}}f_{0}(te^{-\mathbf{z}'\boldsymbol{\beta}})\\ h(t)& =&\displaystyle e^{-\mathbf{z}'\boldsymbol{\beta}}h_{0}(te^{-\mathbf{z}'\boldsymbol{\beta}}). \end{array} \end{aligned} $$

The distribution of \(T_0\) in (4) may be selected from positive-valued distributions such as the Weibull or log-normal that, in turn, yield extreme-value and normal distributions for the error term 𝜖. Below, we demonstrate how the list may be expanded by assembling various models under the same umbrella.
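The relations in Eq. (5) and the density and hazard expressions that follow it can be checked numerically. The sketch below is only an illustration under stated assumptions: the baseline \(S_0\) is taken to be a unit-scale Weibull with a hypothetical `shape` parameter, and the covariate and coefficient values are invented for the example.

```python
import math

def weibull_s0(t, shape):
    """Assumed baseline survival S0(t) = exp(-t**shape) (unit-scale Weibull)."""
    return math.exp(-(t ** shape))

def weibull_f0(t, shape):
    """Baseline density f0(t) = -dS0/dt."""
    return shape * t ** (shape - 1.0) * math.exp(-(t ** shape))

def linear_predictor(z, beta):
    """Inner product z'beta."""
    return sum(zj * bj for zj, bj in zip(z, beta))

def aft_survival(t, z, beta, shape):
    """S(t | z) = S0(t * exp(-z'beta)), as in Eq. (5)."""
    return weibull_s0(t * math.exp(-linear_predictor(z, beta)), shape)

def aft_density(t, z, beta, shape):
    """f(t | z) = exp(-z'beta) * f0(t * exp(-z'beta))."""
    a = math.exp(-linear_predictor(z, beta))
    return a * weibull_f0(t * a, shape)

def aft_hazard(t, z, beta, shape):
    """h(t | z) = f(t | z) / S(t | z)."""
    return aft_density(t, z, beta, shape) / aft_survival(t, z, beta, shape)

# Hypothetical values, chosen only for illustration.
shape, z, beta = 2.0, [1.0], [0.5]
baseline_median = math.log(2.0) ** (1.0 / shape)
# A positive coefficient stretches the time scale by exp(z'beta).
covariate_median = baseline_median * math.exp(linear_predictor(z, beta))
print(aft_survival(covariate_median, z, beta, shape))
```

The printed survival probability should be approximately 0.5, which reflects the fact that the median event time of the covariate group is the baseline median multiplied by \(e^{\mathbf{z}'\boldsymbol{\beta}}\).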

2.3 The Extended Generalized Gamma (EGG) Model

Stacy (1962) introduced the generalized gamma model that is useful in embedding competing models into a single parametric framework. This model is the distribution of T such that \(\ln (T)=\mu +\delta \epsilon \), where μ ∈ R, δ > 0, and the random error term 𝜖 has the density

$$\displaystyle \begin{aligned} f(k,\epsilon)=\frac{1}{\Gamma (k)}\exp \left[k\epsilon -\exp (\epsilon) \right] , k >0, {} \end{aligned}$$

where k is an additional shape parameter. Prentice (1974) showed that a reparameterization of the form \(q=k^{-\frac {1}{2}}\) leads, in the limit q → 0, to a standard normal distribution for the error term 𝜖 (and hence a log-normal distribution for T), making q = 0 an interior point of the parameter space. The final model with parameters μ, q ∈ R and δ > 0 can be written as \(\ln (T)=\mu +\delta \epsilon \), where the error density function f(q, 𝜖) is given by

$$\displaystyle \begin{aligned} f(q,\epsilon )=\left\{ \begin{array}{ll} \frac{\left| q\right| }{\Gamma (q^{-2})}(q^{-2})^{q^{-2}}\exp \left\{ q^{-2} \left[ q\epsilon -\exp (q\epsilon )\right] \right\}, & q\neq 0 \\ \\ \frac{1}{\sqrt{2\pi }}\exp (-\frac{\epsilon ^{2}}{2}), & q=0. \end{array} \right. {} \end{aligned} $$
(6)

The distribution of T when the error term has the density given in Eq. (6) is known as the extended generalized gamma (EGG) distribution; see, for instance, Ghilagaber (2005) and Ghilagaber et al. (2014).

As can be seen from the lower part of (6), the EGG model reduces to the standard normal distribution for 𝜖 when the shape parameter q is equal to zero. Accordingly, T will have a log-normal distribution. When the shape parameter q = 1, (6) reduces to

$$\displaystyle \begin{aligned} f(1,\epsilon)=\exp \left[\epsilon -\exp (\epsilon )\right], {} \end{aligned}$$

which is the standard (type 1) extreme-value distribution. As \(\ln \left ( T\right ) \) is a linear function of 𝜖, it has the same (extreme-value) type of distribution as 𝜖. Hence, \(T=\exp (\mathbf {z}'\boldsymbol {\beta } +\delta \epsilon )\) as defined in Eq. (4) will have a Weibull distribution. If q = 1 and δ = 1, then T has the exponential distribution as a special case of the Weibull distribution. The case of q = −1 corresponds to the extreme maximum-value distribution for \(\ln \left ( T\right )\). This, in turn, corresponds to the reciprocal Weibull distribution for T.
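The special cases of Eq. (6) discussed above can be verified numerically. The following is a minimal sketch of the error density; the function names are hypothetical and the implementation works on the log scale only to avoid overflow for small |q|.

```python
import math

def egg_error_logpdf(eps, q):
    """Log of the error density f(q, eps) in Eq. (6)."""
    if q == 0.0:
        # Lower branch of Eq. (6): standard normal density.
        return -0.5 * eps * eps - 0.5 * math.log(2.0 * math.pi)
    k = q ** -2
    return (math.log(abs(q)) - math.lgamma(k) + k * math.log(k)
            + k * (q * eps - math.exp(q * eps)))

def egg_error_pdf(eps, q):
    """Error density f(q, eps)."""
    return math.exp(egg_error_logpdf(eps, q))

eps = 0.3
# q = 1 should reproduce the standard extreme-value density exp(eps - exp(eps)).
print(egg_error_pdf(eps, 1.0), math.exp(eps - math.exp(eps)))
# A small positive q should be close to the standard normal density (q = 0).
print(egg_error_pdf(eps, 0.01), egg_error_pdf(eps, 0.0))
```

Each printed pair should agree closely, the first essentially exactly and the second up to a small approximation error that vanishes as q approaches zero.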

The case of δ = 1 and q > 0 is also of interest. Farewell and Prentice (1977) argue that this gives the ordinary gamma distribution for T. Others (Bergström and Edin, 1992; Bergström et al., 1994, 1997) argue that this did not hold in their empirical illustrations. Consequently, we shall relax this special case to δ = 1 and q ∈ R and label it the “gamma” distribution in our illustrative example. Below, we further extend the above family of distributions by imposing some relationships between the scale and shape parameters.

2.4 Further Extensions of the EGG Model

We begin with a baseline distribution for time to event, \(T_{0}\thicksim EGG(0,1,q)\), and label it the standard generalized gamma distribution, with density and survival functions given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} f_{EGG(0,1,q)}(t_0)& =&\displaystyle \frac{\left\vert q\right\vert}{t_0\Gamma(q^{-2})} (q^{-2}t_0^q)^{q^{-2}}exp(-q^{-2}t_0^q),\\ \\ S_{EGG(0,1,q)}(t_0)& =&\displaystyle \left\{ \begin{array}{ll} \\ 1-\Phi(\ln t_0), & q=0 \\ \\ 1-\gamma (q^{-2},t_0^{q}{q}^{-2})/\Gamma (q^{-2}), & q>0\\ \\ \gamma (q^{-2},t_0^{q}{q}^{-2})/\Gamma (q^{-2}), & q<0, \end{array} \right. \end{array} \end{aligned} $$

where Φ(⋅) is the cumulative distribution function of the standard normal distribution. By transformation, \(T=e^{\mu }T_{0}^{\delta }\thicksim EGG(\mu ,\delta ,q)\), and T is said to have the extended generalized gamma distribution with location parameter μ ∈ R, scale parameter δ > 0, and an additional index shape parameter q ∈ R. We denote this by \( T\thicksim EGG(\mu ,\delta ,q)\), with density

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} f_{EGG}(t)=\frac{\left\vert q\right\vert }{t\delta \Gamma (q^{-2})}\left[{q}^{-2}({e}^{-\mu}t)^{\frac{q}{\delta}}\right] ^{q^{-2}}exp\left[-q^{-2}(e^{-\mu }t)^{\frac{q}{\delta}}\right]\qquad \\ =\left\{ \begin{array}{ll} \frac{1}{\delta t\sqrt{2\pi}}e^{-\frac{(\ln t-\mu )^{2}}{2\delta ^{2}}} & q=0 \\ \\ \frac{q}{\delta}{t}^{\frac{q}{\delta }-1}\frac{1}{\Gamma (q^{-2})} \left[{q}^{-2}({e}^{-\mu })^{\frac{q}{\delta}}\right] ^{q^{-2}}(t^{\frac{q}{\delta }})^{q^{-2}-1}exp\left[-q^{-2}(e^{-\mu })^{ \frac{q}{\delta}}t^{\frac{q}{\delta }}\right] & q>0 \\ \\ -\frac{q}{\delta }{t}^{\frac{q}{\delta }-1}\frac{1}{\Gamma (q^{-2})} \left[{q}^{-2}({e}^{-\mu })^{\frac{q}{\delta}}\right] ^{q^{-2}}(t^{\frac{q}{\delta }})^{q^{-2}-1}exp\left[-q^{-2}(e^{-\mu})^{\frac{ q}{\delta }}t^{\frac{q}{\delta}}\right] & q<0. \end{array} \right. \end{array} \end{aligned} $$
(7)

The component

$$\displaystyle \begin{aligned} \frac{1}{\Gamma (q^{-2})}\left[{q}^{-2}({e}^{-\mu })^{\frac{q}{ \delta}}\right]^{q^{-2}}(t^{\frac{q}{\delta}})^{q^{-2}-1}exp\left[ -q^{-2}(e^{-\mu })^{\frac{q}{\delta}}t^{\frac{q}{\delta}}\right] \end{aligned}$$

in the above equation is the density of the gamma distribution for \( t^{\frac {q}{ \delta }}\) with a shape parameter q −2 and a rate parameter \({q}^{-2}({e}^{-\mu })^{\frac {q}{\delta }}.\) The next lemma gives the rth moment and the first four central moments of the EGG density. The following definitions of skewness and excess kurtosis are used:

$$\displaystyle \begin{aligned} \begin{array}{rcl} S(T)& =&\displaystyle \frac{E\left[T-E(T)\right]^3}{V(T)^{3/2}},\\ K(T)& =&\displaystyle \frac{E\left[T-E(T)\right]^4}{V(T)^2}-3, \end{array} \end{aligned} $$

where V (T) is the variance.

Lemma 1

If \(T\thicksim EGG(\mu ,\delta ,q)\) , then

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle E(T^r)=\left\{ \begin{array}{cc} \frac{\Gamma\left(q^{-2}+r\frac{\delta}{q}\right)}{\left(q^{\frac{-2\delta}{q} }e^{-\mu}\right)^r\Gamma(q^{-2})}, &\displaystyle \;\;\mathit{\text{if }}\; r\delta/q > -q^{-2}, \\ \infty & \mathit{\text{otherwise}}. \end{array} \right. \\ & &\displaystyle E(T)=\frac{\Gamma(q^{-2}+\frac{\delta}{q})}{q^{\frac{-2\delta}{q} }e^{-\mu}\Gamma(q^{-2})},\\\\ & &\displaystyle V(T)=\frac{\Gamma(q^{-2}+\frac{2\delta}{q})\Gamma(q^{-2})-\Gamma^2(q^{-2}+ \frac{\delta}{q})}{\Gamma^2(q^{-2})(q^{\frac{-2\delta}{q}}e^{-\mu})^2},\\ & &\displaystyle E\left[T-E(T)\right]^3 =\frac{1}{\Gamma^3(q^{-2})(q^{\frac{-2\delta}{q}}e^{-\mu})^3}\Big[2\Gamma^3\left(q^{-2}+\frac{\delta}{q}\right)-3\Gamma\left(q^{-2}+\frac{2\delta}{q}\right)\Gamma\left(q^{-2}+\frac{\delta}{q}\right)\Gamma(q^{-2})\\ & &\displaystyle \quad + \Gamma\left(q^{-2}+\frac{3\delta}{q}\right)\Gamma^2(q^{-2})\Big],\\ \\ & &\displaystyle E\left[T-E(T)\right]^4 =\frac{1}{\Gamma^4(q^{-2})(q^{\frac{-2\delta}{q}}e^{-\mu})^4}\Big[-3\Gamma^4\left(q^{-2}+\frac{\delta}{q}\right)+6\Gamma\left(q^{-2}+\frac{2\delta}{q}\right)\Gamma^2\left(q^{-2}+\frac{\delta}{q}\right)\Gamma(q^{-2})\\ & &\displaystyle \quad -4\Gamma\left(q^{-2}+\frac{3\delta}{q}\right)\Gamma\left(q^{-2}+\frac{\delta}{q}\right)\Gamma^2(q^{-2})+\Gamma\left(q^{-2}+\frac{4\delta}{q}\right)\Gamma^3(q^{-2})\Big]. \end{array} \end{aligned} $$

A simplified proof of Lemma 1 is provided in Appendix 2.

From Lemma 1, we note that S(T) and K(T) are functions of q and δ∕q only, implying that both q and δ∕q are shape parameters.
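As a numerical check of Lemma 1, the sketch below evaluates the raw moments on the log scale and derives the variance, skewness, and excess kurtosis from them using the standard raw-to-central moment identities. The function names are hypothetical, and the q = 0 branch uses the well-known log-normal moment formula, which is an addition not covered by the lemma itself.

```python
import math

def egg_raw_moment(r, mu, delta, q):
    """E(T^r) for T ~ EGG(mu, delta, q), following Lemma 1 for q != 0."""
    if q == 0.0:
        # Log-normal moments (assumption: standard result, not part of Lemma 1).
        return math.exp(r * mu + 0.5 * (r * delta) ** 2)
    k = q ** -2
    a = k + r * delta / q
    if a <= 0.0:
        return math.inf  # the moment does not exist
    # log of 1 / (q^(-2*delta/q) * exp(-mu)), written with |q| so it is defined for q < 0
    log_scale = mu + (2.0 * delta / q) * math.log(abs(q))
    return math.exp(math.lgamma(a) - math.lgamma(k) + r * log_scale)

def egg_shape_summaries(mu, delta, q):
    """Mean, variance, skewness, and excess kurtosis (assumes the first four moments exist)."""
    m1, m2, m3, m4 = (egg_raw_moment(r, mu, delta, q) for r in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m2 * m1 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m3 * m1 + 6.0 * m2 * m1 ** 2 - 3.0 * m1 ** 4
    return m1, var, mu3 / var ** 1.5, mu4 / var ** 2 - 3.0

# Exponential special case (mu = 0, delta = 1, q = 1).
print(egg_shape_summaries(0.0, 1.0, 1.0))
# Changing the location mu rescales T but leaves skewness and kurtosis unchanged.
print(egg_shape_summaries(3.0, 1.0, 1.0)[2:])
```

For the exponential case the four summaries should be close to 1, 1, 2, and 6, the familiar values for a unit exponential, and the second line should repeat the last two of these.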

The survival function of T is then given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} S_{EGG}(t)=\left\{ \begin{array}{ll} 1-\Phi \left(\frac{\ln t-\mu}{\delta}\right) &\; q=0 \\ \\ 1-\gamma \left[q^{-2},t^{\frac{q}{\delta }}{q}^{-2}({e} ^{-\mu})^{ \frac{q}{\delta }}\right]/\Gamma (q^{-2}) &\; q>0 \\ \\ \gamma \left[q^{-2},t^{\frac{q}{\delta }}{q}^{-2}({e} ^{-\mu})^{ \frac{q}{\delta}}\right]/\Gamma (q^{-2}) &\; q<0, \end{array} \right. \end{array} \end{aligned} $$
(8)

where \(\gamma \left [q^{-2},t^{\frac {q}{\delta }}{q}^{-2}( {e}^{-\mu })^{\frac {q}{\delta }}\right ]/\Gamma (q^{-2})\) is the corresponding cumulative distribution function of the gamma distribution for \(t^{\frac {q}{ \delta }}\) when q > 0 and \(\gamma \left [ q^{-2},t^{\frac {q}{\delta }}{q} ^{-2}({e}^{-\mu })^{\frac {q}{ \delta }}\right ]\) is a lower incomplete gamma function with the form of \( \gamma (s,r)=\int \nolimits _{0}^{r}x^{s-1}e^{-x}dx\) described in Abramowitz and Stegun (1964).

The EGG model redefined in Eqs. (7) and (8) is a rich and versatile model containing many special cases based on different combinations of q and δ.

Apart from those mentioned in the previous subsection, the list may be extended to include the inverse exponential (q = −δ = −1), standard gamma (q = δ), inverse gamma (when q = −δ), ammag (q = 1∕δ), inverse ammag (q = −1∕δ), Rayleigh (q = 1 and δ = 1∕2), inverse Rayleigh (q = −1 and δ = 1∕2), and half-normal (\(q=\sqrt { 2}\) and \(\delta =\sqrt {2}/2\)).
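Several of the special cases listed above have closed-form survival functions, which makes Eq. (8) easy to check. The sketch below is an illustration only: `reg_lower_gamma` is a hypothetical helper that evaluates the regularized lower incomplete gamma function by its power series, which is adequate for moderate arguments but is not a production-quality routine.

```python
import math

def reg_lower_gamma(s, x, tol=1e-15, max_terms=10000):
    """gamma(s, x) / Gamma(s) via the series x^s e^(-x) * sum_n x^n / (s (s+1) ... (s+n))."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / s
    total = term
    n = 0
    while term > tol * total and n < max_terms:
        n += 1
        term *= x / (s + n)
        total += term
    log_prefactor = s * math.log(x) - x - math.lgamma(s)
    return min(1.0, total * math.exp(log_prefactor))

def egg_survival(t, mu, delta, q):
    """S_EGG(t) as in Eq. (8)."""
    if q == 0.0:
        zed = (math.log(t) - mu) / delta
        return 0.5 * math.erfc(zed / math.sqrt(2.0))  # equals 1 - Phi(zed)
    k = q ** -2
    x = k * (t * math.exp(-mu)) ** (q / delta)
    p = reg_lower_gamma(k, x)
    return 1.0 - p if q > 0 else p

t = 1.3
# Exponential (q = 1, delta = 1): S(t) = exp(-t) when mu = 0.
print(egg_survival(t, 0.0, 1.0, 1.0), math.exp(-t))
# Rayleigh (q = 1, delta = 1/2): S(t) = exp(-t^2) when mu = 0.
print(egg_survival(t, 0.0, 0.5, 1.0), math.exp(-t * t))
# Inverse exponential (q = -1, delta = 1): S(t) = 1 - exp(-1/t) when mu = 0.
print(egg_survival(t, 0.0, 1.0, -1.0), 1.0 - math.exp(-1.0 / t))
# Half-normal (q = sqrt(2), delta = sqrt(2)/2): S(t) = erfc(t / sqrt(2)) when mu = 0.
print(egg_survival(t, 0.0, math.sqrt(2.0) / 2.0, math.sqrt(2.0)), math.erfc(t / math.sqrt(2.0)))
```

Each printed pair should agree to many decimal places; with μ = 0 the Rayleigh and half-normal cases correspond to particular scale values of those families.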

EGG nests more special cases such as Maxwell–Boltzmann, but we have not included this in the present chapter since our focus is on the distribution of survival time T. Further, the equivalent distributions of some special cases are excluded. For instance, the inverse gamma model is equivalent to the Levy model in some special cases: inverse gamma\((q^{-2}, q^{-2}e^{\mu})\) ↔ Levy(0, c) when \(q^{-2} = 1/2\) and \(q^{-2}e^{\mu} = 2c\). The standard gamma model is also equivalent to a chi-squared model in some situations: standard gamma\((q^{-2}, q^{-2}e^{\mu})\) ↔ \(\chi ^2_{(v)}\) when \(q^{-2} = v/2\) and \(q^{-2}e^{\mu} = 1/2\).

To sum up, the EGG model comprises at least 13 special cases whose relationships are depicted in Fig. 1. Each special case model can be assessed relative to a more comprehensive one using appropriate procedures for comparing nested models. A summary of the density functions, f(t), and survival functions, S(t), for the 13 special cases is provided in Appendix 1. The corresponding hazard functions can be obtained as \(h_{EGG}(t) = f_{EGG}(t)/S_{EGG}(t)\). The hazards in the EGG models can take various forms: increasing, decreasing, bathtub, or arc-shaped (Cox et al., 2007).

Fig. 1 Relationships among the EGG family of distributions

When we adapt the generalized gamma distribution to accelerated failure-time models, the location parameter μ can be expressed as a linear predictor based on p covariates, \( \mu_i =\beta _{0}+\sum \limits _{j=1}^{p}z_{ij}\beta _{j}\) for individual i = 1, …, n, which is what makes the EGG usable as an accelerated failure-time model.

The distribution of \(\epsilon =\ln (T_{0})\) is given in Eq. (6). When q = 0, 𝜖 has a standard normal distribution; when q ≠ 0, the density can be rearranged to give

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} f(\epsilon;q) & =&\displaystyle \left\vert q\right\vert e^{q\epsilon }\frac{ (q^{-2})^{q^{-2}}}{\Gamma (q^{-2})}(e^{q\epsilon })^{q^{-2}-1}exp\left[-q^{-2}exp(q\epsilon)\right] \end{array} \end{aligned} $$
(9)

with the corresponding survival functions

$$\displaystyle \begin{aligned} S(\epsilon, q)=\left\{ \begin{array}{ll} 1-\Phi (\epsilon) &\; q=0 \\ \\ 1-\gamma \left[q^{-2},\exp (q\epsilon )q^{-2}\right]/\Gamma (q^{-2}) &\; q>0 \\ \\ \gamma \left[q^{-2},\exp (q\epsilon )q^{-2}\right]/\Gamma (q^{-2}) &\; q<0. \end{array} \right. \end{aligned} $$
(10)

Based on the density of 𝜖, Fig. 2 shows the shape of the density function, f(𝜖), for some selected values of q. Here, we have a special case of \(\ln (T)=\mathbf {z}'\boldsymbol {\beta } +\delta \epsilon =\mu +\delta \epsilon \) in which μ = 0 and δ = 1. We note that the densities are positively skewed for q < 0 and negatively skewed for q > 0, with both the absolute skewness and kurtosis monotone increasing in |q|, in accordance with Prentice (1974).

Fig. 2 Five distributions of \(\ln (T)\) for μ = 0, δ = 1, and some values of q

3 Bayesian Inference in the Extended Generalized Gamma Model

Bayesian inference for a three-parameter EGG model and a four-parameter generalized gamma distribution (EGG model with one extra location parameter) is discussed in Tsionas (2001) and Van Noortwijk (2001) for situations where there is no censoring. Inference becomes more complicated in the presence of censored observations because, for instance, it is difficult to find a conjugate prior or to derive full conditional posteriors.

Heleno and Alberto (1986) used a Bayesian approach for the EGG model with censored data based on the Jeffreys multi-parameter prior. Ramos et al. (2017) showed that both the Jeffreys prior and the reference priors give improper posteriors for the EGG model, and then proposed the overall reference prior of Berger et al. (2015), which yields a proper posterior. In this section, we present Bayesian inference in the EGG model that allows for any type of censoring mechanism.

3.1 Prior and Posterior

In a Bayesian framework, any prior information about the parameters of interest is combined with the data (likelihood) to derive a posterior distribution.

In our present case, we use normal priors with mean 0 and large variance \(\sigma _1^{2}\) for each effect parameter \(\beta_j\) (j = 0, …, p). We also assume a vague prior, a gamma distribution with hyperparameters a and b, for the scale parameter δ. For the index shape parameter q, a normal prior with mean 0 and large variance \(\sigma _2^{2}\) is assumed. These independent priors can be summarized as follows:

$$\displaystyle \begin{aligned} \begin{array}{rcl} & &\displaystyle \beta_j \sim N(0,\sigma_1^{2}),\; j=0,\ldots,p \\ & &\displaystyle \delta \sim Gamma(a,b) \\ & &\displaystyle q \sim N(0,\sigma_2^{2}). \end{array} \end{aligned} $$

We can use any prior that reflects our prior knowledge (if any) of the unknown parameters. In our illustration in Sect. 4, we will use \(\sigma_1 = \sigma_2 = 1000\) and hyperparameters a = b = 1. The rationale behind this is to let the likelihood dominate the posterior so that the inferences drawn are driven by the data.

Denoting data with \(\mathcal {D}\), the joint posterior distribution is then given by

$$\displaystyle \begin{aligned} \begin{array}{rcl} f(\boldsymbol{\beta}, \delta, q|\mathcal{D})& \propto&\displaystyle L(\boldsymbol{\beta}, \delta, q;\mathcal{D})f(\boldsymbol{\beta}, \delta, q)\\ & =&\displaystyle L(\boldsymbol{\beta}, \delta, q;\mathcal{D})\prod_{j=0}^{p}f(\beta_j) f(\delta)f(q), \end{array} \end{aligned} $$

where \(L(\boldsymbol {\beta }, \delta , q;\mathcal {D})\) is the likelihood function, and f(⋅) is the prior density function of \(\beta_j\), δ, and q with known hyperparameters. The above posterior can be generalized to other types of likelihood functions based on other censoring mechanisms (than the standard right censoring assumed in our present case). With right censored data, the likelihood function becomes

$$\displaystyle \begin{aligned} \begin{array}{rcl} L(\boldsymbol{\beta}, \delta, q;\mathcal{D})=\prod_{i=1}^{n}(\delta^{-1}f(\epsilon _{i},q))^{d_{i}}S(\epsilon _{i},q)^{1-d_{i}}, \end{array} \end{aligned} $$

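where \(\epsilon_i = (\ln t_i - \mathbf{z}_i'\boldsymbol{\beta})/\delta\) is the standardized residual implied by Eq. (4), \(d_i\) is the censoring indicator (equal to 1 if the event is observed and 0 if the observation is right censored), and \(f(\epsilon_i, q)\) and \(S(\epsilon_i, q)\) are given by Eqs. (9) and (10), respectively.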
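The structure of the right-censored likelihood can be sketched in a few lines. The example below is an illustration under stated assumptions, not the chapter's implementation: the function takes the log error density and log error survival function as arguments, and only the two closed-form cases q = 1 (extreme-value error) and q = 0 (normal error) are supplied. All data values are invented.

```python
import math

def censored_loglik(times, events, Z, beta, delta, log_f, log_S):
    """Log of prod_i [f(eps_i)/delta]^d_i * S(eps_i)^(1 - d_i) with eps_i = (ln t_i - z_i'beta)/delta."""
    total = 0.0
    for t, d, z in zip(times, events, Z):
        eps = (math.log(t) - sum(zj * bj for zj, bj in zip(z, beta))) / delta
        if d == 1:
            total += log_f(eps) - math.log(delta)  # observed event
        else:
            total += log_S(eps)                    # right-censored observation
    return total

# q = 1: extreme-value error, so T is Weibull.
def ev_log_f(eps):
    return eps - math.exp(eps)

def ev_log_S(eps):
    return -math.exp(eps)

# q = 0: standard normal error, so T is log-normal (log_S is not tail-safe for very large eps).
def normal_log_f(eps):
    return -0.5 * eps * eps - 0.5 * math.log(2.0 * math.pi)

def normal_log_S(eps):
    return math.log(0.5 * math.erfc(eps / math.sqrt(2.0)))

# Hypothetical toy data: an intercept plus one binary covariate.
times = [2.0, 3.5, 1.2, 5.0]
events = [1, 0, 1, 1]
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
beta, delta = [1.0, 0.3], 0.8
print(censored_loglik(times, events, Z, beta, delta, ev_log_f, ev_log_S))
print(censored_loglik(times, events, Z, beta, delta, normal_log_f, normal_log_S))
```

For general q, the same function could be used with the log of Eq. (9) and the log of Eq. (10) passed in as `log_f` and `log_S`.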

Since there is no explicit analytical form for the posterior distribution, sampling is performed using numerical methods based on Markov Chain Monte Carlo (MCMC).

3.2 MCMC: Random Walk Metropolis–Hastings Algorithm with Block Sampling

We sample all parameters sequentially from the joint posterior distribution using the Metropolis–Hastings algorithm. See Gelman et al. (2004) for more details on the Metropolis–Hastings algorithm and its properties. A random walk Metropolis–Hastings algorithm with block sampling is used, and the sampling procedure for the parameters θ = (β, δ, q) can be summarized as follows:

  (1) Set the initial values for the parameters \(\boldsymbol{\theta}_0 = (\boldsymbol{\beta}_0, \delta_0, q_0)\).

  (2) Construct the proposal distribution \(J(\boldsymbol{\theta}_p|\boldsymbol{\theta}_c) = N(\boldsymbol{\theta}_c, c^2\boldsymbol{\Sigma})\), where \(\boldsymbol{\theta}_p\) is the candidate value, \(\boldsymbol{\theta}_c\) is the current value, c is the scaling constant, and Σ is a known covariance matrix. Here we choose \(\boldsymbol {\Sigma }=-H^{-1}(\hat {\boldsymbol {\theta }})\), where \(H(\hat {\boldsymbol {\theta }})\) is the Hessian matrix evaluated at \(\hat {\boldsymbol { \theta }}\), which is obtained by Newton’s method. Following Gelman et al. (2004), we choose a value of \(c=2.4/\sqrt {k}\), where k is the length of the vector θ.

  (3) Generate \(\boldsymbol{\theta}_p\) from \(J(\boldsymbol{\theta}_p|\boldsymbol{\theta}_c)\) and U from U(0, 1).

  (4) If

    $$\displaystyle \begin{aligned} U<\frac{L(\boldsymbol{\theta}_{p};\mathcal{D})f(\boldsymbol{\theta}_{p})J(\boldsymbol{\theta}_{c}|\boldsymbol{\theta}_{p})}{L(\boldsymbol{\theta}_{c};\mathcal{D})f(\boldsymbol{\theta}_{c})J(\boldsymbol{\theta}_{p}| \boldsymbol{\theta}_{c})}, \end{aligned}$$

    the candidate vector \(\boldsymbol{\theta}_p\) is accepted and we set \(\boldsymbol{\theta}_c = \boldsymbol{\theta}_p\); otherwise, we keep \(\boldsymbol{\theta}_c\). Here the product of the likelihood L and the prior f is proportional to the posterior of Sect. 3.1, and because the normal random walk proposal is symmetric, the two J terms cancel.

  (5) Return to step (2).
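The steps above can be sketched as a short random walk Metropolis sampler. This is a simplified illustration, not the chapter's implementation: it fixes q = 1 (the Weibull case), uses no covariates so that θ = (μ, δ), replaces the Hessian-based covariance with a hypothetical diagonal proposal, and runs on simulated data. The priors follow the text (normal with standard deviation 1000 for μ, Gamma(1, 1) for δ).

```python
import math
import random

def log_posterior(theta, times, events):
    """Log posterior up to a constant for the q = 1 case; theta = (mu, delta)."""
    mu, delta = theta
    if delta <= 0.0:
        return -math.inf  # outside the support of the gamma prior
    ll = 0.0
    for t, d in zip(times, events):
        eps = (math.log(t) - mu) / delta
        if eps > 700.0:   # exp would overflow; the posterior is negligible here
            return -math.inf
        ll += d * (eps - math.log(delta)) - math.exp(eps)
    log_prior = -0.5 * (mu / 1000.0) ** 2 - delta  # N(0, 1000^2) and Gamma(1, 1)
    return ll + log_prior

def random_walk_metropolis(log_post, theta0, step_sd, n_iter, rng):
    """Update all parameters in one block with a symmetric normal proposal."""
    theta, lp = list(theta0), log_post(theta0)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [x + rng.gauss(0.0, s) for x, s in zip(theta, step_sd)]
        lp_prop = log_post(proposal)
        # Symmetric proposal: the J terms cancel in the acceptance ratio.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
            accepted += 1
        draws.append(tuple(theta))
    return draws, accepted / n_iter

# Simulated right-censored Weibull data (hypothetical true values).
rng = random.Random(2024)
true_mu, true_delta, cutoff = 1.0, 0.5, 4.0
times, events = [], []
for _ in range(150):
    t = math.exp(true_mu + true_delta * math.log(rng.expovariate(1.0)))
    times.append(min(t, cutoff))
    events.append(1 if t <= cutoff else 0)

start = (sum(math.log(t) for t in times) / len(times), 1.0)
draws, acc_rate = random_walk_metropolis(
    lambda th: log_posterior(th, times, events), start, (0.08, 0.06), 4000, rng)
kept = draws[1000:]  # discard burn-in
post_mu = sum(d[0] for d in kept) / len(kept)
post_delta = sum(d[1] for d in kept) / len(kept)
print(acc_rate, post_mu, post_delta)
```

The posterior means should land near the values used to simulate the data. Rejecting proposals with δ ≤ 0 is one simple way to respect the positivity constraint; sampling ln δ instead would be an alternative design.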

3.3 Posterior Statistics and Convergence Diagnostics

We summarize the posterior distribution by posterior means and highest posterior density (HPD) intervals. Since MCMC rests on the ergodic theorem (the Markov chain law of large numbers), convergence can be checked using diagnostic plots such as a plot of the cumulative mean against the number of iterations. In addition, inefficiency factors (IF) can be computed as a measure of the efficiency of the MCMC scheme.
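These summaries are straightforward to compute from a vector of draws. The sketch below is illustrative only: it uses an artificial autocorrelated chain in place of real MCMC output, takes the HPD interval as the shortest interval containing the requested share of sorted draws, and estimates the inefficiency factor as one plus twice the sum of the first `max_lag` sample autocorrelations, where the fixed truncation lag is an assumption.

```python
import math
import random

def cumulative_means(draws):
    """Running mean of the chain, suitable for a convergence plot."""
    out, running = [], 0.0
    for i, x in enumerate(draws, start=1):
        running += x
        out.append(running / i)
    return out

def hpd_interval(draws, mass=0.95):
    """Shortest interval containing a fraction `mass` of the draws."""
    s = sorted(draws)
    n = len(s)
    m = max(1, int(math.ceil(mass * n)))
    best = min(range(n - m + 1), key=lambda i: s[i + m - 1] - s[i])
    return s[best], s[best + m - 1]

def inefficiency_factor(draws, max_lag=20):
    """IF = 1 + 2 * sum of autocorrelations up to max_lag (truncation is an assumption)."""
    n = len(draws)
    mean = sum(draws) / n
    c0 = sum((x - mean) ** 2 for x in draws) / n
    total = 0.0
    for lag in range(1, max_lag + 1):
        ck = sum((draws[i] - mean) * (draws[i + lag] - mean) for i in range(n - lag)) / n
        total += ck / c0
    return 1.0 + 2.0 * total

# Artificial chain: AR(1) with coefficient 0.5, standing in for MCMC output.
rng = random.Random(7)
chain, x = [], 0.0
for _ in range(20000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

running = cumulative_means(chain)
low, high = hpd_interval(chain)
ineff = inefficiency_factor(chain)
print(running[-1], (low, high), ineff)
```

For this artificial chain the theoretical inefficiency factor is (1 + 0.5)/(1 − 0.5) = 3, so the estimate should be in that neighbourhood; a value near 1 would indicate nearly independent draws.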

3.4 Bayesian Model Comparisons

A common way of comparing models in the Bayesian framework is the Bayes factor, which is the ratio of the marginal likelihoods of two competing models.

Suppose we have a set of candidate models \(\mathcal {M}_{m}, m=1,\cdots ,M\) and the corresponding model parameters \(\boldsymbol{\theta}_m\). The posterior model probability is then given by

$$\displaystyle \begin{aligned} P(\mathcal{M}_m|Y) \propto P(\mathcal{M}_m)P(Y|\mathcal{M}_m), \end{aligned}$$

where Y represents the data at hand. The posterior odds \(P(\mathcal {M} _m|Y)/P(\mathcal {M}_l|Y)\) can be used to compare two models, and it can be written in terms of the Bayes factor:

$$\displaystyle \begin{aligned} \frac{P(\mathcal{M}_m)}{P(\mathcal{M}_l)}BF_{ml}, \end{aligned}$$

where \(BF_{ml}\) is the Bayes factor between \(\mathcal {M}_m\) and \( \mathcal {M}_l\) with the form

$$\displaystyle \begin{aligned} BF_{ml}=\frac{P(Y|\mathcal{M}_{m})}{P(Y|\mathcal{M}_{l})}=\frac{\int P(Y| \boldsymbol{\theta_{m}},\mathcal{M}_m)P(\boldsymbol{\theta_{m}}|\mathcal{M} _m)d\boldsymbol{\theta_m}}{\int P(Y|\boldsymbol{\theta_{l}},\mathcal{M}_l)P( \boldsymbol{\theta_{l}}|\mathcal{M}_l)d\boldsymbol{\theta_l}}. \end{aligned}$$

The marginal likelihood is the expectation of the likelihood with respect to the prior,

$$\displaystyle \begin{aligned} E_{P(\boldsymbol{\theta_m}|\mathcal{M}_m)}(P(Y|\boldsymbol{\theta_m}, \mathcal{M}_m)). \end{aligned}$$

It is sensitive to the choice of the prior, especially when the prior is not very informative (Villani et al., 2009). For instance, if \(P(\boldsymbol {\theta _m}|\mathcal {M}_m)\) is far from \(P(Y|\boldsymbol {\theta _{m}},\mathcal {M}_m)\), while \(P( \boldsymbol {\theta _{l}}|\mathcal {M}_l)\) is close to \(P(Y|\boldsymbol { \theta _{l}},\mathcal {M}_l)\), it is possible that \(P(Y|\mathcal { M}_{m})\) is less than \(P(Y|\mathcal {M}_{l})\) even though \(\mathcal {M}_l\) is a sub-model of \(\mathcal {M}_m\).

To avoid such sensitivity to the choice of prior, we compare our models in the illustration on the basis of their predictive performance. The data are split randomly into B folds; B − 1 folds are used as training data \( \tilde {y}_{-b}\), while the remaining fold is used as test data \(\tilde {y}_b\). The B-fold cross-validated log predictive density score (LPDS) is then formed as

$$\displaystyle \begin{aligned} B^{-1}\sum_{b=1}^{B}\ln p(\tilde{y}_b|\tilde{y}_{-b},x). \end{aligned}$$

In other words, part of the observations are used to update the flat (non-informative) prior and the sensitivity to the prior can be reduced substantially. According to Villani et al. (2009), the Bayes factor is roughly B times more discriminatory than the LPDS. For selecting models in Sect. 4, the LPDS was computed using B = 5 folds of the data.
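The mechanics of the B-fold LPDS can be sketched with a toy model. The example below is not the EGG computation: it substitutes an uncensored exponential likelihood with a conjugate Gamma(a, b) prior on the rate, so that posterior draws are available directly and the Monte Carlo estimate of each fold's log predictive density can be compared with its closed form. All names and values are hypothetical.

```python
import math
import random

def split_folds(n, n_folds, rng):
    """Randomly partition the indices 0..n-1 into n_folds folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[b::n_folds] for b in range(n_folds)]

def log_mean_exp(values):
    """Numerically stable log of the average of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def log_pred_mc(train, test, a, b, n_draws, rng):
    """ln p(test | train), averaging the test likelihood over posterior draws of the rate."""
    shape, rate = a + len(train), b + sum(train)
    draws = [rng.gammavariate(shape, 1.0 / rate) for _ in range(n_draws)]
    m, s = len(test), sum(test)
    return log_mean_exp([m * math.log(lam) - lam * s for lam in draws])

def log_pred_exact(train, test, a, b):
    """Closed-form ln p(test | train) for the gamma-exponential toy model."""
    shape, rate = a + len(train), b + sum(train)
    m, s = len(test), sum(test)
    return (math.lgamma(shape + m) - math.lgamma(shape)
            + shape * math.log(rate) - (shape + m) * math.log(rate + s))

def lpds(y, folds, log_pred):
    """Average over folds of the log predictive density of the held-out fold."""
    total = 0.0
    for test_idx in folds:
        held_out = set(test_idx)
        train = [v for i, v in enumerate(y) if i not in held_out]
        test = [y[i] for i in test_idx]
        total += log_pred(train, test)
    return total / len(folds)

rng = random.Random(1)
y = [rng.expovariate(1.0) for _ in range(100)]
folds = split_folds(len(y), 5, rng)
lpds_mc = lpds(y, folds, lambda tr, te: log_pred_mc(tr, te, 1.0, 1.0, 2000, rng))
lpds_exact = lpds(y, folds, lambda tr, te: log_pred_exact(tr, te, 1.0, 1.0))
print(lpds_mc, lpds_exact)
```

The two printed values should be close. In a non-conjugate model such as the EGG, the draws inside `log_pred_mc` would instead come from an MCMC run on the training folds, and models with a larger LPDS would be preferred.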

4 Application: Educational and Residential Differences in Marriage Timing Among Eritrean Men and Women

We now illustrate the models and methods described in the previous sections with real-life data on entry into marriage among Eritrean men and women from the country's 2010 Population and Health Survey (EPHS2010).

The main goals with the illustration are to study the distributional shapes of the times to marriage, model the effects of covariates on these event times, and examine if inferences regarding covariate effects are robust to the choice of distributional shape.

The study of marriage timing (age at marriage) is also of substantive interest in its own right because of its strong negative association with women’s health, either directly (Raj, 2010) or indirectly through its negative impact on health care utilization (Godha et al., 2016).

4.1 Data and Variables

The data used for illustration in this chapter come from the 2010 Eritrea Population and Health Survey, EPHS2010 (National-Statistics-Office-Eritrea and Fafo-AIS, 2013). The EPHS2010 was designed as a follow-up to its predecessors—the 1995 and 2002 Demographic and Health Surveys (National-Statistics-Office-Eritrea and Macro-International-Inc., 1997, 2003), and to update the information from the previous surveys as well as provide findings on some new topics of interest.

The EPHS2010 was conducted between January and July 2010 and gathered information from 30224 women aged 15–49 and 5021 men aged 15–59. For the purpose of this chapter, only respondents with known values on marital status at the time of the survey are used in the analyses. This resulted in 10238 usable records for women and all 5021 records for men. Detailed tabulations for the entire survey may be found in the EPHS2010 Final Report (National-Statistics-Office-Eritrea and Fafo-AIS, 2013). Summary statistics for the subset of data used in the present chapter are shown in Table 1.

Table 1 Summary statistics of the data sets used in the illustration

By the survey time (January–July 2010), 7421 of the 10238 women (72 %) and 2569 of the 5021 men (51 %) reported that they had ever been married (this includes those who might have been separated or widowed afterwards). The rest, 2817 women and 2452 men (28 % and 49 %, respectively), reported that they were still single at the time of interview. The distribution of the women across educational level shows that 4186 (41 %) had no education at all, 2055 (20 %) had primary-level education, 1827 (18 %) had middle-level education, 1894 (18 %) had secondary-level education, while the remaining 276 (3 %) had post-secondary education. The corresponding figures for men are 1051 (21 %), 803 (16 %), 1209 (24 %), 1516 (30 %), and 442 (9 %), respectively. Further, 1819 (18 %) of the women respondents were from the capital (Asmara), 2504 (24 %) were from other towns, while the majority, 5915 (58 %), were from rural areas. The corresponding figures for men are 931 (19 %), 1257 (25 %), and 2833 (56 %), respectively.

The columns of percentage married in Table 1 reveal clear differentials across both educational levels and residence for both women and men. For instance, while women with no education constitute 41 % of the entire sample, they constitute 51 % of the marriages (3799 of 7421). Women with post-secondary education, on the other hand, constitute only 1 % of the marriages (96 of 7421). The pattern is similar but less dramatic for men: those with no education constitute 35 % of the marriages, while those with post-secondary education constitute only 8 % of the marriages. Differentials across residence show that women from rural areas constitute 58 % of the sub-sample but 64 % of the marriages. Women from the capital, on the other hand, constitute 18 % of the sub-sample but only 13 % of the marriages. The contribution of men from the capital to the sub-sample is 19 %, while their contribution to the total number of marriages is 15 %. Men from rural areas constitute 56 % of the sub-sample but 65 % of the marriages.
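The comparison of sample shares with marriage shares can be reproduced from the counts quoted above. The following is a minimal sketch for two of the women's education groups; the function and variable names are illustrative and not part of the original analysis.

```python
# Counts quoted in the text for the women's sub-sample (EPHS2010).
TOTAL_WOMEN = 10238
TOTAL_MARRIED_WOMEN = 7421

# group -> (number in sub-sample, number ever married), as given in the text
groups = {
    "no education": (4186, 3799),
    "post-secondary": (276, 96),
}


def shares(n_group, n_married):
    """Return (share of sub-sample, share of marriages) in percent."""
    return (100 * n_group / TOTAL_WOMEN,
            100 * n_married / TOTAL_MARRIED_WOMEN)


for name, (n_group, n_married) in groups.items():
    sample_pct, marriage_pct = shares(n_group, n_married)
    print(f"{name}: {sample_pct:.0f} % of sample, "
          f"{marriage_pct:.0f} % of marriages")
```

A group whose marriage share exceeds its sample share (as for women with no education) is over-represented among the ever married; the reverse holds for women with post-secondary education.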

Plots of survival functions by education and residence for women and men are shown in Figs. 3, 4, 5, 6, and 7. Figures 3 and 4 show plots for women by education and residence, respectively, while Figs. 5 and 6 show the corresponding plots for men. Figure 7 shows gender differences in entry to first marriage among all men and women.

Fig. 3 Survival functions by education: Women

Fig. 4 Survival functions by residence: Women

Fig. 5 Survival functions by education: Men

Fig. 6 Survival functions by residence: Men

Fig. 7 Survival functions by gender

The plots depict what we already noted in Table 1: there are differentials across education and residence, and the educational differences are more pronounced in the women's data than in the men's data. The last figure shows that women enter marriage at faster rates than men.
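The text does not state how the curves in Figs. 3 to 7 were computed. Assuming they are product-limit (Kaplan–Meier) estimates, which is a common choice for such plots, a minimal sketch of the estimator is given below. The toy durations are hypothetical and are not taken from the EPHS2010 data.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function S(t).

    times  : observed durations (time to event or to censoring)
    events : 1 if the event (first marriage) was observed, 0 if censored
    Returns a list of (time, survival) pairs, one per distinct event time.
    """
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    n_exits = Counter(times)      # everyone leaves the risk set at their time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(n_exits):
        d = n_events.get(t, 0)
        if d > 0:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= n_exits[t]
    return curve


# Hypothetical toy data: age at first marriage (event = 1)
# or age at interview for those still single (event = 0).
toy_times = [18, 20, 20, 24, 30]
toy_events = [1, 1, 0, 1, 0]

for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

Applying such an estimator separately within each education or residence group would give one curve per group, which is the kind of comparison the figures display.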

The summary in Table 1 and Figs. 3, 4, 5, 6, and 7 provides a good description of the data at hand, but sound inferences from the sub-sample require deeper analyses and formal statistical tests. Ghilagaber (2018) has analyzed the data sets using frequentist statistical methods ranging from elementary measures of association between an event of interest and background variables to more complex and advanced methods that utilize the data more efficiently. Elsewhere in this book, Munezero and Ghilagaber (2022b) analyze the data sets using a dynamic Bayesian approach in which covariate effects are allowed to vary over time.

In the next sub-section, we present and discuss results from fitting the further extended EGG model of Sect. 2 to the above data sets in the Bayesian framework of Sect. 3.

4.2 Results from Bayesian Analysis of the Data Using the EGG Model

Table 2 contains a summary of our results to which we will return at the end of this section. Results from fitting the extended generalized gamma (EGG) model and its 13 special cases to the data for women, men, and the combined sample are shown in Tables 3, 4, and 5, respectively.

Table 2 Posterior means (and 95 % hpd) of estimated effects in the selected models
Table 3 Posterior means (and 95 % hpd) of covariate effects on log time to event: Women
Table 4 Posterior means (and 95 % hpd) of covariate effects on log time to event: Men
Table 5 Posterior means (and 95 % hpd) of covariate effects on log time to event: combined sample

In Table 3, the results from the unconstrained EGG model show that the scale and shape parameters (which are freely estimated from the data) are δ = 0.246 and q = −0.526, respectively.

These estimates give early indications of the constants to which the scale and shape parameters are close as well as the relationship between them. For instance, the estimated shape parameter (−0.526) is much closer to −1 and 0 than it is to 1. This, in turn, means the reciprocal Weibull distribution (which constrains the shape parameter to −1) and the log-normal distribution (which constrains the shape parameter to 0) are more plausible candidate distributions than the Weibull distribution (which constrains the shape parameter to 1).

With regard to the relationship between the scale and shape parameters, a model that imposes negative equality (δ = −q) seems more plausible than, for instance, a model that imposes a reciprocal or negative reciprocal relationship. This is so because a reciprocal relationship would give a scale parameter of 1∕(−0.526) = −1.90, while a negative reciprocal relationship would yield −(1∕(−0.526)) = 1.90, both of which are far from the freely estimated scale parameter 0.246. This, in turn, excludes models such as ammag and inverse ammag in favor of the inverse gamma model.
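The informal screening described in the last two paragraphs can be written out as a short calculation. The sketch below uses only the women's estimates quoted from Table 3; the dictionary labels are illustrative, and the absolute distance is used here merely as a convenient heuristic, not as a formal test.

```python
# Freely estimated EGG parameters for women, as reported in the text.
delta_hat = 0.246   # scale
q_hat = -0.526      # shape

# Distance from the estimated shape to each fixed constant discussed above.
shape_constants = {"q = -1": -1.0, "q = 0": 0.0, "q = 1": 1.0}
shape_gap = {label: abs(q_hat - c) for label, c in shape_constants.items()}

# Scale implied by each relational constraint, evaluated at the estimated shape.
implied_scale = {
    "delta = -q": -q_hat,
    "delta = 1/q": 1 / q_hat,
    "delta = -1/q": -1 / q_hat,
}
scale_gap = {label: abs(delta_hat - v) for label, v in implied_scale.items()}

for label, gap in shape_gap.items():
    print(f"{label}: |q_hat - constant| = {gap:.3f}")
for label, value in implied_scale.items():
    print(f"{label}: implied scale = {value:.2f}, "
          f"gap to delta_hat = {scale_gap[label]:.2f}")
```

Among the three relational constraints, negative equality gives the implied scale closest to the freely estimated value, which matches the reasoning in the text.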

The closeness of the special-case models to the more general EGG model is also reflected in the log predictive density scores (LPDS) given in the last column for each model. For instance, the LPDS of the EGG model is −4584, while that of the closest model (the inverse gamma) is −4594. On the other hand, the LPDS for ammag and inverse ammag are −5705 and −5184, respectively, which are far from that of the EGG.
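To make the comparison concrete, the sketch below ranks the special cases by their LPDS shortfall relative to the EGG model, using only the four women's values quoted above. It assumes, as is conventional for log predictive scores, that a higher LPDS indicates better predictive performance.

```python
# LPDS values for women quoted in the text (Table 3).
lpds = {
    "EGG": -4584,
    "inverse gamma": -4594,
    "ammag": -5705,
    "inverse ammag": -5184,
}

reference = lpds["EGG"]

# Shortfall of each special case relative to the unconstrained EGG model.
shortfall = {m: reference - v for m, v in lpds.items() if m != "EGG"}

for model, gap in sorted(shortfall.items(), key=lambda kv: kv[1]):
    print(f"{model}: {gap} log units below EGG")
```

The inverse gamma model trails the EGG model by 10 log units, whereas inverse ammag and ammag trail it by 600 and 1121, respectively.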

Another point worth noting is that the estimates of the covariate effects and their associated 95 % hpd are much more alike in models that are close to each other (in terms of estimated scale and/or shape parameters or in terms of LPDS) than in models that are far apart.

Thus, for the women's data, it would make little difference whether we base our conclusions on the estimates from the EGG model or the inverse gamma model, though a formal test would favor the larger EGG model.

The results for men shown in Table 4 can be interpreted similarly. Here, the scale and shape parameters estimated freely from the data in the EGG model are δ = 0.235 and q = −0.199, respectively. Again, the inverse gamma model, which imposes negative equality between the scale and shape parameters (δ = −q), seems to be much more plausible than any other model. In fact, a closer look at the LPDS values shows that it even outperforms the larger EGG model, though the difference in LPDS is marginal.

Hence, for the men's data, we have very strong evidence to base our conclusions on the results from the inverse gamma model, which, of course, are identical to those from the EGG model.

Last, the results for the combined sample are shown in Table 5. Reasoning similar to the above leads to the choice of the EGG model or the inverse gamma model, though a formal test would favor the larger EGG model. That the results for the combined sample reflect those for women is not surprising because women constitute about two-thirds of the combined sample.

The final estimates of covariate effects and their associated 95% hpd from our chosen models for respective data sets are summarized in Table 2.

The results in Table 2 show that there are significant differentials in entry to first marriage across women’s educational level and residence, where lower education and rural residence are associated with higher intensities of marriage. For men, the educational differences are less pronounced as there is no significant difference in the intensities of entry to first marriage between those with no education and those with primary or middle education. The residential differential is, however, still significant. The results for the combined sample follow those of women because, as mentioned before, women constitute the majority of the combined sample.

5 Summary and Concluding Remarks

In this chapter, we presented the extended generalized gamma (EGG) model for survival data with censored observations. Previous works have shown that five known models can be treated as special cases of the EGG model by constraining the scale parameter, shape parameter, or both to some constants. In the present chapter, we extended the EGG model further to include 13 special case models. This was achieved by imposing relationships between the scale and shape parameters in addition to constraining them to some constants.

The issues were illustrated with data on entry into first marriage among Eritrean men and women from the 2010 Eritrea Population and Health Survey (EPHS2010). Inference was fully Bayesian using a random walk Metropolis–Hastings algorithm to sample from the posterior distribution, and we compared the models with each other and relative to the more general EGG model using the log predictive density score (LPDS).
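For readers unfamiliar with the sampler, a generic random walk Metropolis–Hastings step can be sketched as follows. This is not the chapter's implementation: the one-dimensional standard normal target, the step size, and all names are assumptions chosen purely for illustration, whereas the actual posterior involves the EGG likelihood and priors of Sects. 2 and 3.

```python
import math
import random


def random_walk_mh(log_post, start, step, n_draws, seed=0):
    """Random walk Metropolis-Hastings for a one-dimensional target.

    log_post : function returning the log posterior up to a constant
    start    : initial value of the chain
    step     : standard deviation of the Gaussian random walk proposal
    Returns (list of draws, acceptance rate).
    """
    rng = random.Random(seed)
    current = start
    current_lp = log_post(current)
    draws = []
    accepted = 0
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_post(proposal)
        # The proposal is symmetric, so the acceptance probability is
        # min(1, posterior(proposal) / posterior(current)).
        accept_prob = math.exp(min(0.0, proposal_lp - current_lp))
        if rng.random() < accept_prob:
            current, current_lp = proposal, proposal_lp
            accepted += 1
        draws.append(current)
    return draws, accepted / n_draws


def toy_log_post(x):
    """Toy target (assumption): standard normal log density, up to a constant."""
    return -0.5 * x * x


draws, rate = random_walk_mh(toy_log_post, start=0.0, step=1.0, n_draws=5000)
print("posterior mean estimate:", sum(draws) / len(draws))
print("acceptance rate:", rate)
```

In a multi-parameter model the same accept/reject logic applies with a vector-valued proposal; the step size is usually tuned so that the acceptance rate is neither close to 0 nor close to 1.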

The application demonstrates that the further extended family of distributions provides a wide range of alternatives for a baseline distribution in the analysis of survival data with censored observations. For instance, we found that the inverse gamma model, in which the scale parameter is constrained to be the negative of the shape parameter (δ = −q), fits the men's data best and outperforms the EGG model. It also performs well in the women's data and the combined sample, though the evidence is not as strong as in the men's data. This is in accordance with the freely estimated values of the scale and shape parameters in the EGG model.

The empirical results in the final selected models reveal significant differentials in the pace of entry to first marriage across women’s educational levels and residence. As would be expected, lower education and rural residence are associated with higher intensities of marriage. Educational differentials are, however, less pronounced for men as there was no significant difference in the intensities of entry to first marriage between those with no education (the baseline group) and those with primary or middle education. The residential differential was still significant in the men’s data. When we analyzed the combined data, the results followed those of women, mainly because women constitute about two-thirds of the combined sample.

It may be worth noting that the educational level of the individuals refers to what is achieved by the survey time. As such, it is anticipatory in the sense that the reported educational level might have been achieved after the event of interest. However, our aim here is to demonstrate the models and methods empirically, and the anticipatory nature of education does not affect our purpose. Ghilagaber and Koskinen (2009), Ghilagaber and Larsson (2019), and Munezero and Ghilagaber (2022a) study potential biases due to the use of anticipatory covariates and how to account for them.

Our analysis was based on the tacit assumption that the survivor function S(t) tends to 0 as the study period gets longer. This, in turn, means that we have assumed all individuals will experience the event of interest sooner or later. This may not be true for the event in our illustrative example (marriage) as some individuals may never marry for various reasons. Future work may, therefore, consider accounting for such long-term survivors (those who may never experience the event of interest). This can be achieved by using, for instance, a mixture model consisting of a hazard/intensity model for those who experienced the event or may experience it in the future and a logistic model for the probability of being a long-term survivor (never experiencing the event).
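One common way to write such a mixture is S_pop(t) = π + (1 − π) S_u(t), where π is the probability of being a long-term survivor and S_u(t) is the survivor function of those who eventually experience the event. The sketch below illustrates this form under stated assumptions: π comes from a logistic model, and a Weibull survivor function stands in for S_u(t) purely for simplicity (in the setting of this chapter it would be replaced by the EGG family). All names and parameter values are hypothetical.

```python
import math


def logistic(x):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def weibull_survival(t, scale, shape):
    """Stand-in survivor function for those who eventually marry (assumption)."""
    return math.exp(-((t / scale) ** shape))


def population_survival(t, linear_predictor, scale, shape):
    """Mixture survivor function: pi + (1 - pi) * S_u(t).

    linear_predictor : value of the logistic model's linear predictor,
                       so pi = logistic(linear_predictor) is the probability
                       of never experiencing the event
    """
    pi = logistic(linear_predictor)
    return pi + (1.0 - pi) * weibull_survival(t, scale, shape)


# Hypothetical values: about 5 % long-term survivors.
lp = math.log(0.05 / 0.95)
for t in (0.0, 10.0, 20.0, 200.0):
    print(t, round(population_survival(t, lp, scale=20.0, shape=3.0), 4))
```

Unlike the survivor functions used in the main analysis, S_pop(t) levels off at π rather than at 0 as t grows, which is how the never-married fraction enters the model.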