1 Introduction

In survival studies, we often encounter data that occur in clusters or groups arising from some common factor associated with each group. For example, a cluster may be a set of event times for the same individual, or event times belonging to related individuals (parent–child, siblings or twins). Sometimes a cluster may consist of event times of individuals suffering from a particular disease in the same community, or being treated in the same hospital under similar conditions such as exposure to the same environment or treatment. In all of these cases, there is reason to group certain event times together, and this stratification helps to analyse time-to-event data more effectively.

The proportional hazards model introduced by Cox (1972) assumes that the study population is homogeneous given some observed covariates. However, this may not hold in general, and individuals may differ in many respects. For instance, two individuals at the same stage of disease progression, receiving the same treatment and showing no significant difference in the covariates, may nevertheless respond to that treatment in significantly different ways. This can be attributed to the individuals having different frailties. In general, more frail individuals in a population die earlier than those who are less frail. Heterogeneity due to unknown factors can therefore be ascribed to an individual's frailty, which is incorporated into the model as a random effect. It modifies the hazard function for each individual and hence increases the precision with which we study the effect of the known covariates.

In particular, shared frailty models assume that the frailty value is the same for every observation in a cluster and that the frailty levels of different clusters are independent of each other. In other words, event times within a cluster are dependent, while observations from different clusters are independent.

Frailty (named as such by Vaupel et al. 1979) was studied by Greenwood and Yule (1920) for recurrent data and by Clayton (1978) for bivariate data. Shared frailty models have been extensively studied with different choices of frailty distribution, viz. the gamma distribution (Clayton 1978; Vaupel et al. 1979; Klein 1992; Yashin et al. 1995), the log-normal distribution (McGilchrist and Aisbett 1991; Xue and Brookmeyer 1996; Yau and McGilchrist 1997; Ripatti and Palmgren 2000; Vaida and Xu 2000), and the positive stable distribution (Hougaard 1986a, b; Oakes 1994; Hougaard 1995; Lam and Kuk 1997).

In order to obtain efficient estimators for the regression coefficients, and also to identify covariates that significantly affect the event times, it is important to make an appropriate choice of frailty distribution. For this reason, flexible distributions have been used to describe the frailty variable. Some of the flexible frailty distributions studied in the literature are the positive stable distribution (Hougaard 1986a), the generalized gamma distribution (Balakrishnan and Peng 2006; Pengcheng et al. 2013), the compound Poisson distribution (Aalen 1992; Wienke et al. 2003) and the log-skew-normal distribution (Callegaro and Iacobelli 2012).

In this paper, we consider the generalized gamma distribution (GGD) to describe the frailty effect. This distribution includes many sub-models as special cases; four popular ones are the Weibull, exponential, log-normal and gamma distributions. Fitting such a flexible distribution improves the accuracy of the estimates obtained and also saves the time and cost incurred in fitting the survival data to each of these distributions separately in search of the most efficient model. We apply a Markov chain Monte Carlo (MCMC) procedure to obtain Bayesian estimates of the model parameters, and we perform a simulation study to observe the effect of sample size, cluster size and frailty variance on the efficiency of the estimates in terms of their biases and mean squared errors.

Similar methodology has been successfully applied to survival studies previously (Clayton 1991; Zeger and Karim 1991; Dellaportas and Smith 1993; Ducrocq and Casella 1996; Golightly and Wilkinson 2008). Bayesian estimation methods have also found use in the study of random effects models in many other fields. Khoshravesh et al. (2015) apply Bayesian regression to study evapotranspiration in three arid environments. In econometric studies, similar methodology has been applied to longitudinal data by Koop et al. (1997) to make inference about firm-specific inefficiencies and by Allenby and Rossi (1998) to model consumer heterogeneity. Bayesian estimation procedures have also been applied in meta-analysis, wherein multiple studies are combined (Bodnar et al. 2017; Higgins et al. 2009; Pullenayegum 2011; Turner et al. 2015).

Earlier work on the generalized gamma shared frailty model (Balakrishnan and Peng 2006; Pengcheng et al. 2013) makes use of the Newton–Raphson or EM algorithm to obtain maximum likelihood estimates. These methods are computationally demanding, especially when the number of treatment effects is large, and may take a long time to converge if the initial parameter values are far from the true values; indeed, convergence is not guaranteed. MCMC methods such as the Metropolis–Hastings algorithm can be run for shorter durations, and problems such as multiple modes and non-convergence can be diagnosed early. Bayesian estimation also allows us to incorporate any prior information about the treatment effects that may be available from previous studies. Hence, we use an MCMC algorithm to obtain Bayesian estimates of the model parameters.

In Sect. 2, the general shared frailty model is defined. Some basic properties of the GGD and its special cases are presented in Sect. 3. The generalized gamma shared frailty model is proposed in Sect. 4 and an estimation method is formulated in Sect. 5. Simulations are carried out in Sect. 6, followed by a real-life illustration in Sect. 7, where we fit the model to the exercise-time data set of coronary heart patients (Danahy et al. 1977).

2 Shared frailty model with censoring

Consider a case where the continuous random variable T is the lifetime of an individual and the random variable C is the censoring time. Let the random variable Z be the frailty, \(\mathbf X \) = \((X_{1}, X_{2}, \ldots ,X_{r})^{'}\) a vector of r known covariates, and \( \delta \) the censoring indicator.

Suppose there are M clusters, the ith having \( n_{i} \) observations (\(i = 1, 2, \ldots, M\)), and associated with each cluster is an unobserved frailty \( z_{i} \). The clusters can represent hospitals, schools or cities, and the \(n_i\) survival times in the ith cluster can represent times to an event such as death or recovery for patients suffering from a particular disease, completion of a test or passing of a grade for students, or demographic events such as marriage or divorce for an urban population.

The survival data are right censored, so that \(t_{ij} = \min \{T_{ij}, C_{ij}\}\) denotes the observed survival time and

$$\begin{aligned} \delta _{ij} = {\left\{ \begin{array}{ll} 1 \quad \text { if } T_{ij} \le C_{ij} \text {, that is, event is not censored}\\ 0\quad \text { if } T_{ij} > C_{ij} \text {, that is, event is censored.} \end{array}\right. } \end{aligned}$$

denotes the censoring indicator.

The shared frailty model assumes that all the event times in a cluster are conditionally independent given the frailty, and that event times from different clusters are independent. In the presence of censoring and given the frailty \(Z = z_i\), the conditional hazard function for the ith cluster at time \( t_{ij} > 0 \) is written as

$$\begin{aligned} {h(t_{ij}|z_{i},\mathbf {X}_{ij}) = z_{i}h_{0}(t_{ij}) e^{\mathbf {X}_{ij}^\prime \varvec{{\beta }} }} \end{aligned}$$
(1)
for \(j = 1, 2, \ldots, n_i\) and \(i = 1, 2, \ldots, M\), where

  • \(h_{0}(t_{ij})\) is the baseline hazard function;

  • \(\mathbf {X}_{ij}\) is the \(r \times 1\) column vector of known covariates of the jth subject in the ith cluster;

  • \( \varvec{\beta }\) is the \(r \times 1\) column vector of regression coefficients.

The conditional survival function for a given frailty at time \(t_{ij}>0\) is

$$\begin{aligned} S(t_{ij}|z_i,\mathbf {X}_{ij})= & {} e^{-\int _0^{t_{ij}} h(u|z_{i},\mathbf {X}_{ij})du} \nonumber \\= & {} e^{-\int _{0}^{t_{ij}}z_{i}h_{0}(u) e^{\mathbf {X}_{ij}^\prime \varvec{{\beta }} }du} \nonumber \\= & {} e^{-z_{i}e^{\mathbf {X}_{ij}^\prime \varvec{{\beta }}}\int _{0}^{t_{ij}}h_{0}(u)du} \nonumber \\= & {} e^{-z_{i}H_{0}(t_{ij})e^{\mathbf {X}_{ij}^\prime \varvec{{\beta }}}} \end{aligned}$$
(2)

where \(H_{0} (t_{ij})\) is the cumulative baseline hazard function.

3 Generalized gamma distribution

A random variable \(Z>0\) is said to follow the generalized gamma distribution with parameters b, d, and k (GGD(b,d,k)) if its probability density function is given by

$$\begin{aligned} f(z; b,d,k) = \dfrac{d}{\varGamma (k) \; b}\left( \dfrac{z}{b}\right) ^{dk - 1}e^{-(\frac{z}{b})^{d}} \end{aligned}$$
(3)

where \(d>0,k>0\) are the shape parameters and \(b>0\) is the scale parameter.

Although the GGD is complex to handle in its likelihood form, it is quite useful since it includes other distributions as special cases (Khodabin and Ahmadabadi 2010), depending on the values of the parameters. For example, the distribution reduces to the exponential distribution when \(d = k = 1\), the gamma distribution when \(d = 1\), the Weibull distribution when \(k = 1\), and the log-normal distribution as a limiting case when \(k \rightarrow \infty \). Also, by setting \(d = 2\) we obtain another sub-family known as the generalized normal distribution, itself a flexible distribution that includes the half-normal (\(k=1/2,b^{2} = 2\sigma ^{2}\)), Rayleigh (\(k = 1, b^{2} = 2\sigma ^{2}\)), Maxwell–Boltzmann (\(k = 3/2\)), and chi-square (\(k = n/2,\ n = 1, 2, \ldots \)) distributions.
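To make these special cases concrete, the density (3) can be coded directly and checked numerically against R's built-in densities. The following is a minimal sketch in R (the language used for our computations); the helper name dggd is our own, not from any package.

```r
## Density of the GGD(b, d, k) in Eq. (3); dggd is our own helper name
dggd <- function(z, b, d, k) {
  d / (gamma(k) * b) * (z / b)^(d * k - 1) * exp(-(z / b)^d)
}

z <- seq(0.1, 5, by = 0.1)
## d = 1 recovers the gamma distribution with shape k and scale b
all.equal(dggd(z, b = 2, d = 1, k = 3), dgamma(z, shape = 3, scale = 2))
## k = 1 recovers the Weibull distribution with shape d and scale b
all.equal(dggd(z, b = 2, d = 3, k = 1), dweibull(z, shape = 3, scale = 2))
## d = k = 1 recovers the exponential distribution with rate 1/b
all.equal(dggd(z, b = 2, d = 1, k = 1), dexp(z, rate = 0.5))
```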

Fig. 1 Density functions of the GGD for different parametric combinations

Figure 1 shows the shapes of the probability density function of the GGD for different parametric values; the distribution is suitable for a variety of cases. As Hougaard (1995, 2000) shows, the tails of the frailty distribution can determine the type of dependence among correlated observations: a long right (left) tail indicates strong early (late) dependence. As Fig. 1 indicates, the generalized gamma distribution, when chosen to model frailty, can handle varying degrees of early dependence.

The mean and variance of the GGD(b,d,k) are

$$\begin{aligned} E\left( z\right)= & {} \dfrac{b \varGamma \left( k + \frac{1}{d}\right) }{\varGamma \left( k\right) } \\ V\left( z\right)= & {} \dfrac{b^{2} \varGamma (k + \frac{2}{d})}{\varGamma \left( k\right) } - \left( \dfrac{b \varGamma (k + \frac{1}{d})}{\varGamma \left( k\right) }\right) ^{2}. \end{aligned}$$

In order to make the parameters of the model identifiable, we need to set the mean of the frailty distribution to one. Hence,

$$\begin{aligned} E\left( z\right) = 1 \Rightarrow b = \dfrac{\varGamma \left( k\right) }{\varGamma \left( k + \frac{1}{d}\right) } . \end{aligned}$$

The variance of the frailty distribution obtained after replacing the value of b is given by

$$\begin{aligned} V\left( z\right) = \dfrac{\varGamma \left( k\right) \varGamma \left( k + \frac{2}{d}\right) }{{\varGamma \left( k + \frac{1}{d}\right) }^2} - 1. \end{aligned}$$
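A short R sketch of this reparameterization, with helper names of our own choosing, computes the identifiability-constrained scale b and the resulting frailty variance; for the gamma sub-model (\(d = 1\)) the formula reduces to the familiar variance 1/k.

```r
## Scale b that fixes E(z) = 1, and the resulting frailty variance;
## the function names are our own
b_mean_one  <- function(d, k) gamma(k) / gamma(k + 1 / d)
frailty_var <- function(d, k) gamma(k) * gamma(k + 2 / d) / gamma(k + 1 / d)^2 - 1

frailty_var(d = 1, k = 0.5)  # gamma frailty with shape 0.5: variance 1/k = 2
```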

4 The generalized gamma shared frailty model

4.1 Baseline distribution

The Weibull distribution is one of the most widely used distributions in lifetime data analysis. Its versatility comes from the various shapes its hazard function takes for different values of the shape parameter \(\psi \). The hazard rate of the Weibull distribution decreases with time (early-life or infant mortality) when the shape parameter is less than 1, is fairly constant (useful or mid life) when the shape parameter is equal to or close to 1, and increases with time (wear-out or old age) when the shape parameter is greater than 1. These three phases make up the “bathtub curve” that is indicative of human survival rates from birth to death. The probability density function is defined by

$$\begin{aligned} f(t)=\psi \omega t^{\psi - 1} e^{-\omega t^{\psi }}, \end{aligned}$$

where \(\psi >0\) is the shape parameter and \(\omega >0\) is the scale parameter.

The baseline hazard function is given by

$$\begin{aligned} h_{0} (t) = \omega \psi t^{\psi - 1} . \end{aligned}$$
(4)

The cumulative baseline hazard function is given by

$$\begin{aligned} H_{0} (t) = \omega t^{\psi } . \end{aligned}$$
(5)
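For reference, a direct R transcription of (1), (2), (4) and (5) under this Weibull baseline might look as follows; the function names are our own.

```r
## Weibull baseline hazard (4) and cumulative hazard (5)
h0 <- function(t, omega, psi) omega * psi * t^(psi - 1)
H0 <- function(t, omega, psi) omega * t^psi

## Conditional hazard (1) and survival (2) for one subject, given the
## cluster frailty z, covariate vector x and coefficients beta
h_cond <- function(t, z, x, beta, omega, psi)
  z * h0(t, omega, psi) * exp(sum(x * beta))
S_cond <- function(t, z, x, beta, omega, psi)
  exp(-z * H0(t, omega, psi) * exp(sum(x * beta)))
```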

4.2 Likelihood function

The full conditional likelihood function for the generalized gamma frailty model is given by

$$\begin{aligned} L_{c}=\prod _{i=1}^{M}\prod _{j=1}^{n_i}\left[ h(t_{ij}|z_i,\mathbf {X}_{ij})\right] ^{\delta _{ij}} S(t_{ij}|z_i,\mathbf {X}_{ij}) , \end{aligned}$$
(6)

where the conditional hazard function and the conditional survival function are given by (1) and (2), respectively.

The full unconditional likelihood function is obtained by integrating the conditional likelihood over the entire range of the frailty variable Z. Using Eqs. (1) and (2) and writing \( \eta _{ij} = e^{\mathbf {X}_{ij}^\prime \varvec{\beta }}\), we have

$$\begin{aligned} L= & {} \prod _{i=1}^M \left\{ \int _{0}^{\infty } \left( \prod _{j=1}^{n_i} \left[ z_ih_0(t_{ij}) e^{\mathbf {X}_{ij}^\prime \varvec{\beta } }\right] ^{\delta _{ij}} e^{-z_iH_0(t_{ij})e^{\mathbf {X}_{ij}^\prime \varvec{\beta }}}\right) f(z_{i})dz_i\right\} \nonumber \\= & {} \prod _{i=1}^{M}\left\{ \left( \prod _{j=1}^{n_i}{\left[ h_{0}(t_{ij})\eta _{ij} \right] ^{\delta _{ij}}}\right) \int _{0}^{\infty } \left( \prod _{j=1}^{n_i}{ z_i ^{\delta _{ij}} e^{-z_iH_{0}(t_{ij})\eta _{ij}}}\right) f(z_{i})dz_{i} \right\} \nonumber \\= & {} \prod _{i=1}^{M} \left\{ \left( \prod _{j=1}^{n_i}\left[ h_{0}(t_{ij})\eta _{ij} \right] ^{\delta _{ij}} \right) \int _{0}^{\infty } z_i^{\sum _{j=1}^{n_i}{\delta _{ij}}} e^{-z_i\sum _{j=1}^{n_i}H_{0}(t_{ij})\eta _{ij}} f(z_i)dz_i \right\} \nonumber \\= & {} \prod _{i=1}^{M} \left\{ \left( \prod _{j=1}^{n_i}\left[ h_{0}(t_{ij})\eta _{ij} \right] ^{\delta _{ij}} \right) \int _{0}^{\infty } z_i^{D_i} e^{-z_i A_i} f(z_i)dz_i \right\} , \end{aligned}$$
(7)

where \(D_i = \sum \limits _{j=1}^{n_i}{\delta _{ij}} \text { and } A_i = \sum \limits _{j=1}^{n_i}H_{0}(t_{ij})\eta _{ij}\).

Writing the integral in (7) as \(I_i\) and putting \(t_i = z_i A_i\), we have

$$\begin{aligned} I_i= & {} \int _{0}^{\infty }{\left( \frac{t_i}{A_i}\right) }^{D_i} e^{-t_i} f\left( \dfrac{t_i}{A_i}\right) \frac{1}{A_i} dt_i, \nonumber \\= & {} \dfrac{1}{A_i^{D_i+1}}\int _{0}^{\infty }{t_i}^{D_i} e^{-t_i}f\left( \dfrac{t_i}{A_i}\right) dt_i \nonumber \\\approx & {} \dfrac{1}{A_i^{D_i+1}}\sum _n{ w_n {u_n}^{D_i}f\left( \dfrac{u_n}{A_i}\right) }, \end{aligned}$$
(8)

using the Gauss–Laguerre quadrature rule, where \(w_n\) and \(u_n\) are the weights and nodes of the n-point rule, the nodes being the roots of the Laguerre polynomial of degree n.

Hence the full unconditional likelihood function reduces to

$$\begin{aligned} L = \prod _{i=1}^{M}\left\{ \left( \prod _{j=1}^{n_i}\left[ h_{0}(t_{ij})\eta _{ij} \right] ^{\delta _{ij}} \right) I_i \right\} , \end{aligned}$$
(9)

where \(I_i\) is as given in (8).
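The log of the unconditional likelihood (9) can be computed in R along the following lines. This is a sketch under stated assumptions: it reuses dggd, b_mean_one, h0 and H0 from the earlier sketches, and it obtains the Gauss–Laguerre nodes and weights from the statmod package's gauss.quad function (one of several tools that provide them).

```r
library(statmod)  # provides gauss.quad()

## Log of the unconditional likelihood (9); 'cluster' labels the M clusters
loglik <- function(time, status, X, cluster, beta, omega, psi, d, k,
                   n_nodes = 32) {
  gq  <- gauss.quad(n_nodes, kind = "laguerre")  # nodes u_n and weights w_n
  b   <- b_mean_one(d, k)                        # scale fixed so that E(z) = 1
  eta <- exp(drop(X %*% beta))                   # eta_ij = exp(x_ij' beta)
  ll  <- 0
  for (i in unique(cluster)) {
    idx <- cluster == i
    D_i <- sum(status[idx])                          # events in cluster i
    A_i <- sum(H0(time[idx], omega, psi) * eta[idx])
    ## Gauss-Laguerre approximation (8) to the integral I_i
    I_i <- sum(gq$weights * gq$nodes^D_i * dggd(gq$nodes / A_i, b, d, k)) /
           A_i^(D_i + 1)
    ll  <- ll + sum(status[idx] * log(h0(time[idx], omega, psi) * eta[idx])) +
           log(I_i)
  }
  ll
}
```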

5 Estimation procedure

Methods such as maximum likelihood (ML) and expectation–maximization (EM) are often used to fit frailty models. In our case, however, the integral has no closed form, which complicates the estimation procedure. Earlier authors have approximated the integral using Monte Carlo simulations and used the Newton–Raphson algorithm to obtain ML estimates (Balakrishnan and Peng 2006), or used the EM algorithm for estimation with an MCMC Metropolis algorithm to calculate the conditional expectations (Pengcheng et al. 2013). In this paper, the Random Walk Metropolis–Hastings algorithm, one of the MCMC methods, is used to obtain Bayesian estimates of the parameters of the model.

In the Bayesian approach, the parameters of the model \(\varvec{\theta } = (\psi ,\omega ,\beta _1,\ldots ,\beta _r,d,k)\) are treated as random variables. A prior distribution \(p(\varvec{\theta })\) describes the information currently available about the parameter of interest \(\varvec{\theta }\), and data collected from a study independent of this information are then incorporated to draw conclusions about \(\varvec{\theta }\). Ideally the prior distribution should cover all reasonable values that \(\varvec{\theta }\) can take, but it need not play a significant role in the posterior distribution, because the information from the data collected in the study will far outweigh any prior information or guesses about \(\varvec{\theta }\) expressed through the prior probabilities. Hence, we use “reference priors”, which can be described as vague, diffuse, flat, non-informative priors (Gelman et al. 2013). The rationale is to let the data play the dominant role, so that the inferences made are driven by the data.

A gamma prior G(\(\phi ,\phi \)) is used for \(\psi \), \(\omega \), d and k, whereas a normal prior N(0, \(\epsilon ^2\)) is used for the regression coefficients \(\beta _i\). For the simulations, we set the hyperparameters \(\phi = 0.001\) and \(\epsilon ^2 = 1000\), so that the prior variances are very large. Similar prior distributions have been used earlier (Ibrahim et al. 2001; Sahu et al. 1997; Santos and Achcar 2010; Hanagal and Dabade 2013).

Hence, using the full likelihood function \(L(\varvec{\theta }|x)\) with the prior distribution, we obtain the updated posterior distribution of the parameters as

$$\begin{aligned} \pi (\varvec{\theta }|x) \propto L (\varvec{\theta } | x) p(\varvec{\theta }). \end{aligned}$$
(10)

Under the assumption that all the parameters of the model, \(\psi ,\omega ,\beta _1,\ldots ,\beta _r,\) d, and k, are a priori independent, the joint posterior density is

$$\begin{aligned} \pi (\psi ,\omega ,\varvec{\beta },d,k) \propto L(\psi ,\omega ,\varvec{\beta },d,k) g_1(\psi ) g_2(\omega ) g_3(d) g_4(k) \prod _{i=1}^{r}p_i(\beta _i), \end{aligned}$$
(11)

where \(g_1(\cdot ), \ldots , g_4(\cdot )\) are the prior density functions, with known hyperparameters, of \(\psi \), \(\omega \), d and k respectively, and \(p_i(\cdot )\) is the prior density function, with known hyperparameters, of \(\beta _i\). The likelihood function is as given by Eq. (9).
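Combining the likelihood (9) with these priors gives the log of the joint posterior (11) up to an additive constant. The R sketch below reuses loglik from the previous section; it assumes that G(\(\phi ,\phi \)) denotes a gamma density with shape and rate both equal to \(\phi \) (so that the prior variance \(1/\phi \) is large for \(\phi = 0.001\), matching the stated intent), and the packing of the parameters into a named vector theta is our own convention.

```r
## Log joint posterior (11) up to an additive constant, for two covariates;
## theta is a named vector c(psi=, omega=, beta1=, beta2=, d=, k=)
log_post <- function(theta, dat, phi = 0.001, eps2 = 1000) {
  with(as.list(theta), {
    loglik(dat$time, dat$status, as.matrix(dat[, c("X1", "X2")]), dat$cluster,
           beta = c(beta1, beta2), omega = omega, psi = psi, d = d, k = k) +
      dgamma(psi,   shape = phi, rate = phi, log = TRUE) +   # g1
      dgamma(omega, shape = phi, rate = phi, log = TRUE) +   # g2
      dgamma(d,     shape = phi, rate = phi, log = TRUE) +   # g3
      dgamma(k,     shape = phi, rate = phi, log = TRUE) +   # g4
      sum(dnorm(c(beta1, beta2), 0, sqrt(eps2), log = TRUE)) # p1, p2
  })
}
```

Since terms not involving a given parameter cancel in a Metropolis–Hastings acceptance ratio, this single function can also stand in for each of the full conditionals derived below.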

Given the likelihood and priors, we can obtain the full conditional distribution of each parameter by retaining only those terms that involve it. For example, for \(\psi \) we have

$$\begin{aligned} \pi (\psi |\omega ,\varvec{\beta },d,k) \propto L(\psi ,\omega ,\varvec{\beta },d,k) g_1(\psi ). \end{aligned}$$
(12)

Using Eq. (9),

$$\begin{aligned} \pi (\psi |\omega ,\varvec{\beta },d,k) \propto \prod _{i=1}^{M}\left\{ \left( \prod _{j=1}^{n_i}\left[ h_{0}(t_{ij})\eta _{ij} \right] ^{\delta _{ij}} \right) I_i \right\} g_1(\psi ) . \end{aligned}$$

Hence, ignoring the terms that do not involve \(\psi \) we have,

$$\begin{aligned} \pi (\psi |\omega ,\varvec{\beta },d,k)&\propto \prod _{i=1}^{M}\left\{ \left( \prod _{j=1}^{n_i} h_{0}(t_{ij}) ^{\delta _{ij}} \right) I_i \right\} g_1(\psi ). \end{aligned}$$

Similarly,

$$\begin{aligned}&\pi (\omega |\psi ,\varvec{\beta },d,k) \propto \prod _{i=1}^{M}\left\{ \left( \prod _{j=1}^{n_i} h_{0}(t_{ij}) ^{\delta _{ij}} \right) I_i \right\} g_2(\omega ) ,\nonumber \\&\pi (d|\psi ,\omega ,\varvec{\beta },k) \propto \prod _{i=1}^{M} I_i \, g_3(d) ,\nonumber \\&\pi (k|\psi ,\omega ,\varvec{\beta },d) \propto \prod _{i=1}^{M} I_i \, g_4(k) , \nonumber \\&\pi (\beta _l|\psi ,\omega ,\varvec{\beta }_{-l},d,k) \propto \prod _{i=1}^{M}\left\{ \left( \prod _{j=1}^{n_i} \eta _{ij} ^{\delta _{ij}} \right) I_i \right\} p_l(\beta _l) \quad \quad \text {for } l = 1, 2, \ldots , r , \end{aligned}$$
(13)

where \(\varvec{\beta }_{-l} =(\beta _1, \beta _2,\ldots , \beta _{l-1}, \beta _{l+1}, \ldots , \beta _r)\) denotes the vector of regression coefficients excluding \(\beta _l\).

Using the Random Walk Metropolis–Hastings algorithm, a sample is drawn from each of the conditional posterior distributions given in (13). The algorithm was first described by Metropolis and Ulam (1949) and Metropolis et al. (1953) and later generalized by Hastings (1970). It can be summarized as follows:

  1. Start with an initial set of values for the parameters \(\varvec{\theta ^{(0)}} = (\psi ^{(0)},\omega ^{(0)},\varvec{\beta ^{(0)}},d^{(0)},k^{(0)})\).

  2. Set the iteration counter at \(i=1\).

  3. Generate a proposal value \(\theta ^*\) from the normal transition kernel; for example, \(\psi ^{*}\) is generated from \(N(\psi ^{(i-1)},c)\).

  4. Calculate the acceptance ratio; for \(\psi ^*\), say, we have \( \alpha = \min \left\{ 1, \dfrac{\pi (\psi ^*|\omega ,\varvec{\beta },d,k)}{\pi (\psi ^{(i-1)}|\omega ,\varvec{\beta },d,k)} \right\} \).

  5. Generate a random number r from U(0,1).

  6. Accept \(\psi ^*\) as \(\psi ^{(i)}\) if \(r\le \alpha \); otherwise set \(\psi ^{(i)} = \psi ^{(i-1)}\).

  7. Repeat steps 3–6 for the remaining parameters \(\omega \), \(\varvec{\beta }\), d and k, each time updating \(\varvec{\theta }^{(i)}\) with the latest point accepted in the chain. These steps are repeated N times to obtain N samples from \(\pi (\varvec{\theta }) \).

The standard deviation c of the transition kernel is chosen separately for each parameter (after a few trial runs) so that the chain has an acceptance rate of about 25% (Gelman et al. 1996), i.e. roughly 25% of the newly proposed values are accepted and the rest are rejected.
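A minimal R sketch of steps 3–6 for a single positive parameter such as \(\psi \) might look as follows; log_target stands for the logarithm of the relevant full conditional in (13) (for instance a curried version of log_post above), the function name is our own, and for the real-valued coefficients \(\beta _l\) the positivity guard would simply be dropped.

```r
## One random-walk Metropolis-Hastings update (steps 3-6) for a positive
## parameter; log_target is the log full conditional, c the kernel sd
mh_update <- function(current, log_target, c) {
  proposal <- rnorm(1, mean = current, sd = c)  # step 3: normal kernel
  if (proposal <= 0) return(current)            # zero prior mass: reject
  log_alpha <- min(0, log_target(proposal) - log_target(current))  # step 4
  if (log(runif(1)) <= log_alpha) proposal else current            # steps 5-6
}
```

Sweeping mh_update over \(\psi \), \(\omega \), each \(\beta _l\), d and k, with c tuned per parameter, and repeating the sweep N times yields the chains.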

In implementations of this type, it is common to run the chain for an initial period of time (the burn-in period) and to store every \(l\mathrm{th}\) iteration (\(l > 1\)) thereafter until some pre-specified stopping time. This is referred to as thinning the chain, and it reduces dependence among the sample points from the posterior distribution. The remaining iterations are used to obtain the posterior summaries. We use the squared error loss function, for which the Bayes estimator is the posterior mean. Convergence and stationarity of the Markov chain are studied using the Geweke (Geweke 1992) and Heidelberger–Welch tests (Heidelberger and Welch 1983); more information about convergence tests for MCMC algorithms can be found in Cowles and Carlin (1996). We also study the inefficiency factor (proposed by Kim et al. 1998), the ratio of the variance of the sample mean of the correlated draws to the variance of the sample mean of hypothetical i.i.d. draws, which measures the factor by which the number of iterations needs to be increased (Kohn et al. 2001; Chib and Kang 2016).
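In practice these diagnostics are available off the shelf; the sketch below, an assumption on tooling rather than part of the original analysis, uses the coda package, with the inefficiency factor approximated as the chain length divided by the effective sample size.

```r
library(coda)

chain <- mcmc(draws)                  # 'draws': post burn-in samples (placeholder)
geweke.diag(chain)                    # |z| < 1.96 suggests convergence
heidel.diag(chain)                    # stationarity and half-width tests
thinned <- window(chain, thin = 150)  # store every 150th iteration
niter(chain) / effectiveSize(chain)   # approximate inefficiency factor
```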

Our main interest lies in estimating the variance of the frailty distribution; this is done using the estimates of the parameters d and k at each iteration of the chain. The resulting estimates of the frailty variance form another chain, which is likewise subjected to tests of convergence and stationarity.
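Given the chains of d and k (denoted d_draws and k_draws below as placeholder names), the frailty-variance chain follows by applying the frailty_var function from Sect. 3 draw by draw:

```r
## Frailty-variance chain derived from the d and k chains; d_draws and
## k_draws are placeholder names for the post burn-in draws
var_chain <- mcmc(frailty_var(d_draws, k_draws))
geweke.diag(var_chain)   # convergence of the derived chain
heidel.diag(var_chain)   # stationarity of the derived chain
```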

6 Simulation study

The survival times are simulated from the model given in (1). To observe the effect of cluster size, sample size and frailty variance on the performance of the model and the estimation procedure, nine configurations consisting of 25, 50 and 100 clusters, each with cluster sizes 2, 4 and 8, have been considered. Clusters of equal size have been chosen for computational convenience, although the model can also be used for unequal cluster sizes.

The survival times within a cluster are correlated, sharing the same frailty \(z_i\). In set-up I, the \(z_i\)'s follow a log-normal distribution with mean 1 and variance 0.5; set-ups II and III assume a gamma distribution for \(z_i\) with mean 1 and variance 2 and 4, respectively. A Weibull distribution with both shape and scale parameters equal to 2 is chosen as the baseline distribution. Two treatment effects are considered, i.e. the number of covariates is \(r = 2\), with X = \((X_1,X_2)^\prime \) generated from \(N_2(0,I_2\)) and \(\varvec{\beta } = (1.5 , 1.25)^\prime \). For censoring the survival times, the censoring variable is assumed to follow U(0,a), where a is chosen so as to obtain a censoring rate of around 10%.
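As an illustration of set-ups II and III, one simulated dataset can be generated by inverting the conditional survival function (2): with \(U \sim U(0,1)\), \(T = \{-\log U / (z\,\omega \,e^{\mathbf {x}'\varvec{\beta }})\}^{1/\psi }\). The sketch below uses our own function and variable names, and the censoring bound cens_upper is a placeholder to be tuned until roughly 10% of the observations are censored.

```r
## Simulate one dataset: gamma frailty with mean 1 and variance v,
## Weibull baseline with psi = omega = 2, and beta = (1.5, 1.25)'
simulate_data <- function(M, n, v, beta = c(1.5, 1.25),
                          psi = 2, omega = 2, cens_upper = 2) {
  z   <- rgamma(M, shape = 1 / v, scale = v)       # E(z) = 1, V(z) = v
  X   <- matrix(rnorm(2 * M * n), ncol = 2)        # covariates from N2(0, I2)
  eta <- exp(drop(X %*% beta))
  zi  <- rep(z, each = n)                          # shared frailty per cluster
  T   <- (-log(runif(M * n)) / (zi * omega * eta))^(1 / psi)  # invert (2)
  C   <- runif(M * n, 0, cens_upper)               # tune for ~10% censoring
  data.frame(cluster = rep(1:M, each = n), time = pmin(T, C),
             status = as.numeric(T <= C), X)
}

dat <- simulate_data(M = 50, n = 4, v = 2)  # set-up II with 50 clusters
```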

In all, 500 datasets are generated under the different scenarios with M = 100, 50, 25 and n = 2, 4, 8. These datasets are then fitted to the generalized gamma shared frailty model with Weibull baseline distribution using the estimation method described in Sect. 5. The integral (8) is computed using the 32-point Gauss–Laguerre quadrature rule, and the computations are performed using the R statistical software.

The convergence of the method is first monitored graphically using cumulative mean plots. Since the state of the chain during the first few iterations depends on the starting point, these plots are used to decide the number of iterations to be discarded as burn-in.

The trace plots of the chains are studied to determine whether the chains are exploring the parameter space well.

Autocorrelation function (ACF) plots display the autocorrelation in the chain. Since adjacent points in the Markov chain are correlated, the autocorrelation is expected to decrease as the lag increases. After discarding the first few iterations as burn-in, every \(l\mathrm{th}\) value is retained to arrive at a sample of draws from the chain.

Since it was not feasible to inspect the plots individually for all 500 datasets under each scenario, the burn-in and lag were fixed after studying 50 plots under each setting. The Geweke and Heidelberger–Welch (H–W) tests are used to study the convergence and stationarity of the chains. Tables 1 and 2 report the percentage of datasets for which the chains converged (p-value of the Geweke test statistic > 0.05) and achieved stationarity (p-value of the H–W test statistic > 0.05), and Table 3 reports the inefficiency factor for each parameter. These tests help us decide a suitable length for the Markov chain, that is, the number of iterations required. This number varies across models, and although the same model is used for estimation throughout our simulation study, the data sizes and frailty variances differ. Hence, we use a standard length of 45,000 iterations with a burn-in of 10,000 and a lag of 150 for all estimations (though these numbers can be changed for quicker estimation in less complex models).

Table 1 Percentage of the chains converging as per the Geweke Test
Table 2 Percentage of the chains achieving stationarity as per the H–W Test
Table 3 Inefficiency factors for the Markov chains for each parameter

The resulting samples are then used to obtain the posterior summaries. Under the squared error loss function the Bayes estimator is the posterior mean, and the corresponding risk is the mean squared error (MSE). The MSEs, biases and Monte Carlo standard errors (MCSE) are displayed in Table 4.

Table 4 Biases and MSEs obtained from thinned sample and corresponding Monte Carlo Standard Errors for each parameter

On the basis of Tables 1, 2, 3 and 4, it can be concluded that

  1. as the cluster size or the number of clusters increases, the percentage of converging or stationary chains is not affected significantly;

  2. as the frailty variance of the generated datasets increases, the convergence and stationarity of the Markov chains for the baseline shape parameter \(\psi \) and the treatment effects \(\varvec{\beta }\) are not significantly affected; however, for the estimate of the frailty variance, the percentage of chains converging and achieving stationarity falls;

  3. the decrease in inefficiency factors with increasing cluster size indicates that more iterations are required when the cluster size is small; likewise, more iterations are required to estimate the frailty variance with good precision when the frailty variance is high;

  4. since the MCSEs decrease with increasing cluster size, the efficiency of the algorithm in simulating draws from each posterior distribution improves as the cluster size increases;

  5. in general, the bias and mean squared error (MSE) decrease as the number of clusters or the cluster size increases, owing to the larger number of data points in the study; hence, larger studies would give better results;

  6. the method is sensitive to the frailty variance: as the variance of the frailty term increases, the MSEs also increase, indicating that increasing heterogeneity in event times within a cluster can affect the estimate of the frailty variance; the estimates of the covariate effects, however, are unaffected by heterogeneity in the clusters.

Since the cost of data collection is an important factor in deciding the sample size, researchers should consider the trade-off between improving results and reducing data-collection costs. Also, if the model indicates high heterogeneity in the data (high frailty variance), it may be worthwhile to look for covariates that have not been considered in the model or to consider multiple populations with vastly different hazard rates.

7 Real life data study

Danahy et al. (1977) studied the effect of high-dose oral isosorbide dinitrate (ISDN) on heart rate, blood pressure and exercise time until angina pectoris in 21 patients with coronary atherosclerotic heart disease. The patients were studied for 5 days in hospital, where they were given exercise tests lasting until the onset of angina pectoris. The exercise times were recorded for each patient after sublingual nitroglycerin (SLTNG) and sublingual placebo (SLP), at 1, 3 and 5 hours after oral placebo and during its control period (OP1, OP3, OP5 and OPC), and at 1, 3 and 5 hours after oral ISDN and during its control period (OI1, OI3, OI5 and OIC).

The data set contains 10 exercise times for each of the 21 patients, with an observation considered censored if the patient was too exhausted to complete the test. The data can be analysed using a shared frailty model, since the observations within a cluster come from the same patient and are hence correlated. The generalized gamma frailty model described in the previous sections is applied to the exercise data set with the Weibull distribution as baseline.

The nine treatment effects along with the baseline and frailty distribution parameters are to be estimated. The algorithm is first run for fewer iterations (10,000 to 20,000) to set the standard deviations of the transition kernels so as to achieve an acceptance rate of approximately 25% for each parameter. Subsequently, the algorithm is run twice for 120,000 iterations using two distinct sets of initial values.

Fig. 2 Trace plots of Markov chains for treatment effects

The trace plots in Fig. 2 indicate whether the chains are exploring the parameter space well.

Observing the burn-in period of the chains from the cumulative mean (CM) plots given in Fig. 3, the initial 20,000 iterations for each parameter are discarded. The CM plots also show the two chains obtained from the two different starting points; their agreement indicates that the estimates are independent of the initial values supplied to the algorithm. Using distinct starting values also serves as a check for multiple modes of the likelihood function.

Fig. 3 Cumulative mean plots of two chains for the initial 20,000 iterations

The ACF plots shown in Fig. 4 indicate a sufficient reduction in auto-correlation (\({<}0.1\)) at a lag of 150.

Fig. 4 The ACF plots

Fig. 5 The density plots for treatment effects

The density plots of the posterior samples for the treatment effects (Fig. 5) are symmetric. Hence, we take the mean of the thinned sample as the estimate of each treatment effect.

Table 5 Estimated treatment effects when compared with SLP

Table 5 gives the results of the estimation procedure, and its values lead us to the following conclusions.

  1. Treatments SLTNG and ISDN (OI1, OI3, OI5) are beneficial to the patients as they reduce the risk of angina pectoris (\(\widehat{\beta }<0\)).

  2. Use of an oral placebo increases the risk, as the corresponding \(\widehat{\beta }\) is positive.

  3. The treatment effects SLTNG, OI1 and OI3 are significant, as their credible intervals do not contain zero. Hence, there is a significant reduction in the risk of angina pectoris with the use of SLTNG and ISDN, with the greatest benefit at one and three hours after the ISDN treatment.

  4. The credible intervals of the treatment effects OPC and OP5 also do not contain zero, indicating a significant increase in risk with the use of an oral placebo, especially at the time of treatment and five hours after treatment.

  5. The estimate of the shape parameter of the baseline distribution is greater than one, indicating a hazard function that increases with time.

  6. The frailty variance is estimated to be around 9.045, which indicates heterogeneity among the clusters.

8 Conclusions

A Bayesian estimation procedure is applied to the generalized gamma shared frailty model. The Random Walk Metropolis–Hastings algorithm (a Markov chain Monte Carlo method) has been used to estimate the parameters. Each simulated dataset of 100 clusters of size 8 took approximately 20 minutes on a Dell Vostro 360 with an Intel i5-2400S processor and 4 GB of RAM.

In the simulation study, it is observed that the Bayesian method performs well for the generalized gamma frailty distribution with Weibull baseline distribution. However, the model is sensitive to the variance of the frailty distribution: as the heterogeneity in the data increases, the squared error loss also increases. The estimation method is also applied to the exercise data set given by Danahy et al. (1977), and Bayesian estimates of the treatment effects are obtained. Convergence and stationarity of the Markov chains are studied using graphical methods, viz. cumulative mean plots and trace plots, and through diagnostic tests such as the Geweke and Heidelberger–Welch tests. The credible intervals obtained for each treatment effect are used to identify the significant treatment effects.

One of the basic assumptions of this model is that all treatment effects are independent. However, this may not be true for real-life data. One possible way to handle multicollinearity problems is to check for correlation between the covariates prior to the analysis; to this end, the use of factor analysis to arrive at a list of the more important independent covariates may be explored as part of future research. The use of other baseline distributions will also be explored. There is also scope for improving the computation times involved, for which the performance of block updating in the MCMC algorithm will be investigated.