Keywords

1 Introduction

Data arising from survival and reliability analyses often consist of a response variable that measures the duration of time until the occurrence of a specific event and a set of variables (covariates) thought to be associated with the event-time variable. These data arise in a number of applied fields, such as medicine, biology, public health, epidemiology, engineering, economics, and demography, and they have some features that pose difficulties to traditional statistical methods. The first is that the data are generally asymmetrically distributed, while the second feature is that lifetimes are frequently censored (the end-point of interest has not been observed for that individual). Regression models for survival and reliability data have traditionally been based on the proportional hazards model of Cox [3] which is defined through the hazard function \( h\left( t\mid \mathbf x \right) \) of the form

$$\begin{aligned} h\left( t\mid \mathbf x \right) =h_{0} \left( t\right) \mathrm{{exp}} \left( \mathbf x ^{\prime }\varvec{\beta }\right) , \end{aligned}$$

where \(h_{0}\left( t\right) \) is an arbitrary function of time called baseline hazard function, \(\mathbf x ^{\prime }=\left( x_{1},\ldots ,x_{p}\right) \) is a vector of covariates for the individual at time \(t\), and \(\varvec{\beta }^{\prime }=\left( \beta _{1},\ldots ,\beta _{p}\right) \) is a vector of unknown parameters to be estimated. In the case when the baseline hazard function is treated nonparametrically, then this model becomes a semiparametric model. Instead, if we assume that the baseline hazard function is specified up to a few unknown parameters, which is usually accomplished with a specific parametric distribution such as Weibull distribution, we obtain a parametric proportional hazards model.

Some recent research has focused on developing extended regression models that include Cox model as a special case. In this line, we can find the model introduced by Etezadi-Amoli and Ciampi [4] of the form

$$\begin{aligned} h\left( t\mid \varvec{ \alpha },\varvec{\beta },\mathbf x \right) =\exp \left( \mathbf x ^{\prime } \varvec{\alpha }\right) h_{0}\left( t\exp \left( \mathbf x ^{\prime }\varvec{\beta } \right) \right) , \end{aligned}$$

where \(\varvec{\alpha }\) and \(\varvec{\beta }\) are vectors of regression parameters. For \(\varvec{\beta }=\mathbf 0 \), we deduce the Cox model, while for \(\varvec{\alpha }= \varvec{\beta }\) we obtain the accelerated failure time (AFT) model which is also a popular model in the analysis of survival and reliability data. These authors then show that a better fit is obtained with this new model than with the Cox and AFT models in two examples based on artificial and real data. Nevertheless, the main emphasis of their work is on a spline approximation for the baseline hazard function.

A different family of models with smooth background hazard or survival functions have been proposed by Younes and Lachin [10] and Royston and Parmar [9], which includes the proportional hazards and proportional odds models as special cases. The class of these models is based on transformation of the survival function by a link function \(g(\cdot )\) of the form

$$\begin{aligned} g(S\left( t\mid \varvec{\beta },\mathbf x \right) )=g(S_{0}\left( t\right) )+ \mathbf x ^{\prime }\varvec{\beta }, \end{aligned}$$

where \(S_{0}\left( t\right) =S\left( t\mid \mathbf 0 ,\mathbf x \right) \) is the baseline survival function. The former tackled the estimation problem by using B-splines to estimate the baseline hazard function while the latter utilized natural cubic splines to model \(g(S_{0}\left( t\right) ).\) Here, again the main focus of the work was to check the advantage of a smooth modeling of the background hazard or survival functions, respectively.

An alternative model to the Cox model is based on the hazard function

$$\begin{aligned} h\left( t\mid \varvec{\beta },\mathbf x \right) =h_{0}\left( t\right) \frac{\exp \left( \mathbf x ^{\prime }\varvec{\beta }\right) }{1+\exp \left( \mathbf x ^{\prime }\varvec{\beta }\right) }, \end{aligned}$$
(1)

where the covariate effects are modeled on the logarithmic scale rather than on the log odds scale. In spite of the simplicity of this model, it has not been studied much in the literature. Recently, MacKenzie [7, 8] has considered this logit link-based model with a constant baseline survival function and nonproportional hazards and displayed its applicability, which is given by

$$\begin{aligned} h\left( t\mid \alpha , \varvec{\beta },\mathbf x \right) =h\frac{\exp \left( t\alpha + \mathbf x ^{\prime }\varvec{\beta }\right) }{1+\exp \left( t\alpha +\mathbf x ^{\prime }\varvec{\beta }\right) }. \end{aligned}$$

The flexibility shown by MacKenzie’s model gives us an impetus to extend the Cox model in a similar manner. First of all, to assume that the baseline hazard model is a constant is to limit the flexibility the model. In fact, the aim of the previous papers was to estimate in a proper way the baseline hazard function. Secondly, to measure the influence of the unknown parameters on a generalized log-odds scale instead of a log-odds scale. Therefore, this model is a particular case of our models introduced in Sect.  2 without time-dependence. The reason for not considering time-dependence is to start with a very general family of models but then focus on its simplest form. We hope to consider in our future study time-dependence and also to estimate the background hazard with splines.

In this chapter, we not only study the logit link-based model in (1), but also generalize it to a flexible parametric family of proportional hazards model based on a generalization of the logistic distribution (see Balakrishnan [1]) called Type-I generalized logistic model. The formulation of the model and estimation methods for parametric and semiparametric models are then discussed in Sect.  2. Measures of fitting this model are discussed in Sect.  3. Next, an illustrative example is presented in Sect.  4. Finally, some concluding remarks are made in Sect.  5.

2 The Generalized Logit Link Proportional Hazards Model

The logit link-based model in (1) can be generalized by replacing the logistic distribution function in (1) by a generalization of the logistic distribution called Type-I generalized logistic which is given by

$$\begin{aligned} F\left( y\right) =\frac{1}{\left( 1+e^{-y}\right) ^{a}}, \quad -\infty <y<\infty ,\,a>0; \end{aligned}$$

see Balakrishnan [1].

By utilizing this form, we propose a proportional hazards model defined through the hazard function

$$\begin{aligned} h\left( t \mid \varvec{\beta },a,\mathbf x \right) \,=\,h_{0}\left( t \right) K(\varvec{\beta },a,\mathbf x ), \end{aligned}$$
(2)

with

$$\begin{aligned} K(\varvec{\beta },a,\mathbf x )\,=\,\frac{1}{\left\{ 1+\exp \left( - \mathbf x ^{\prime }\varvec{\beta } \right) \right\} ^{a}}, \quad a>0. \end{aligned}$$

For two covariate profiles \(\mathbf x _{i}\) and \(\mathbf x _{j}\), the hazards are proportional and the relative risk does not depend on \(t\) as

$$\begin{aligned} \rho \left( t \mid \varvec{\beta }, a,\mathbf x _{i},\mathbf x _{j}\right)&= \frac{h \left( t \mid \varvec{\beta }, a, \mathbf x _{i}\right) }{h \left( t \mid \varvec{\beta }, a, \mathbf x _{j}\right) }\\&= \left( \frac{ 1+\exp \left( - \mathbf x _{j}^{\prime }\varvec{\beta } \right) }{ 1+\exp \left( -\mathbf x _{i}^{\prime }\varvec{\beta } \right) }\right) ^a. \end{aligned}$$

Note that in the special case when \(a=1\), we deduce the proportional hazards model with a logit link function in (1).

The survival function corresponding to the hazard model in (2) is

$$\begin{aligned} S\left( t\mid \varvec{\beta },a,\mathbf x \right) \,=\,\exp \left\{ -K\left( \varvec{\beta },a, \mathbf x \right) H_{0}\left( t\right) \right\} , \end{aligned}$$
(3)

where \(H_0(t)=\int _{0}^{t}h_0(u)du\) is the baseline cumulative hazard function.

Equation (2) characterizes the generalized logit link proportional hazards model with density given by

$$\begin{aligned} f\left( t\mid \varvec{\beta },a,\mathbf x \right) =\exp \left\{ -H\left( t\mid \varvec{\beta },a,\mathbf x \right) \right\} h\left( t\mid \varvec{\beta },a,\mathbf x \right) , \end{aligned}$$

where \(H(t|\varvec{\beta },a,\mathbf x )\) is the cumulative hazard function.

At this point, we have not made any assumption about the baseline hazard function \(h_0(t)\), so that the model is parametric only for the covariate effect, and consequently the model is semiparametric. Instead, if we assume a parametric form for the function \(h_0(t)\), the model becomes parametric. Now, we will describe the statistical inferential methods for both these cases.

2.1 Parametric Model

We may assume that the baseline hazard function is specified up to a few unknown parameters. This is usually accomplished with a specific parametric distribution such as the Weibull distribution. In this case, we get a Weibull generalized logit link proportional hazards model WGLPH with hazard function

$$\begin{aligned} h\left( t\mid \lambda ,\gamma , \varvec{\beta },a,\mathbf x \right) =\lambda \gamma t^{\gamma -1}K\left( \varvec{\beta },a,\mathbf x \right) \end{aligned}$$
(4)

which is fully parametric in form. This model contains as a special case the generalized logit link exponential proportional hazards model for the case when \(\gamma =1\).

Then, the cumulative hazard is given by

$$\begin{aligned} H\left( t\mid \lambda ,\gamma ,\varvec{\beta },a,\mathbf x \right) =\lambda K\left( \varvec{\beta },a,\mathbf x \right) t^{\gamma } \end{aligned}$$

and Eq. (4) characterizes the WGLPH with density given by

$$\begin{aligned} f\left( t\mid \lambda ,\gamma ,\varvec{\beta },a,\mathbf x \right) =\lambda K\left( \varvec{\beta },a,\mathbf x \right) \gamma t^{\gamma -1}\exp \left\{ -\lambda K\left( \varvec{\beta },a,\mathbf x \right) t^{\gamma }\right\} . \end{aligned}$$

Note that this is a Weibull density function with parameters \(\gamma \) and \(\lambda K\left( \varvec{\beta },a,\mathbf x \right) .\)

When we assume such a fully parametric form for the distribution of survival times, the estimation of the unknown parameters of the model is by full maximum likelihood method. Consider a sample of \(n\) independent individuals with data \(\left( t_{i},\mathbf x _{i},\delta _{i}\right) \), where \(\delta _{i}=1\) for an event and \(0\) otherwise, for \(i=1,\ldots ,n.\) Accordingly, under the assumption that the censoring mechanism is non-informative, the full likelihood for a random sample of \(n\) individuals is given by

$$\begin{aligned} L \left( \varvec{\beta },a\right) = \prod \limits _{i=1}^{n}\left\{ h \left( t_{i}\mid \varvec{\beta },a,\mathbf x _i\right) \right\} ^{\delta _{i}}\left\{ S\left( t_{i}\mid \varvec{\beta },a,\mathbf x _i\right) \right\} . \end{aligned}$$

For the Weibull baseline hazard, the log-likelihood function simply becomes

$$\begin{aligned} l\left( \lambda ,\gamma ,\varvec{\beta },a \right)&=\ln \left( L\left( \lambda ,\gamma , \varvec{\beta },a\right) \right) \nonumber \\&= \sum _{i=1}^{n}\left[ \delta _{i}\ln \left\{ \lambda \gamma t_{i}^{\gamma -1}K\left( \varvec{\beta } ,a,\mathbf x _i\right) \right\} -\lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \right] . \end{aligned}$$
(5)

To obtain the maximum likelihood estimates, the log-likelihood function in (5) must be maximized numerically by using a procedure for constrained optimization. In order to maximize (5), we obtain its first derivatives with respect to all the parameters which are presented in Appendix A.

Observe that in this case, the corresponding survival function can then be estimated as

$$\begin{aligned} S(t|\hat{\lambda },\hat{\gamma },\hat{\varvec{\beta }},\hat{a},\mathbf x )=\exp \left\{ -K\left( \hat{\varvec{\beta }},\hat{a},\mathbf x \right) \hat{\lambda }t^{\hat{\gamma }} \right\} . \end{aligned}$$

2.2 Semiparametric Model

On the other hand, when we assume an unknown functional form for the baseline survival function, the estimation of the unknown parameters of the model is done by maximum partial likelihood method. Consider a sample of \(n\) independent individuals with data \(\left( t_{i},\mathbf x _{i},\delta _{i}\right) \) as before and when the censoring mechanism is non-informative. In this case, the partial likelihood for a random sample of \(n\) individuals can be written as

$$\begin{aligned} L \left( \varvec{\beta },a \right) = \prod \limits _{i=1}^{n} \left[ \frac{K\left( \varvec{\beta },a,\mathbf x _i \right) }{\sum \limits _{l\in R(t_i)}K\left( \varvec{\beta },a,\mathbf x _l \right) } \right] ^{\delta _{i}}, \end{aligned}$$

where \(R(t_i)\) is the risk set at time \(t_i\).

Then, the partial log-likelihood function is given by

$$\begin{aligned} l\left( \varvec{\beta },a \right)&= \ln \left( L\left( \varvec{\beta },a\right) \right) \nonumber \\&=\sum _{i=1}^{n} \delta _{i} \left[ \ln \left( K\left( \varvec{\beta } ,a,\mathbf x _i\right) \right) -\ln \left( \sum \limits _{l\in R(t_i)}K\left( \varvec{\beta },a,\mathbf x _l \right) \right) \right] . \end{aligned}$$
(6)

For the purpose of maximizing (6) and obtaining the partial maximum likelihood estimates, we use numerical methods for carrying out the required constrained optimization. Its first derivatives with respect to all the parameters are presented in Appendix B.

Once we have fitted a generalized likelihood proportional hazards model, it may be of interest to estimate the survival probability. The estimator of the survival function is based on Breslow’s estimator of the baseline cumulative hazard rate, which proceeds as follows:

Let the full likelihood function be

$$\begin{aligned} L \left( \varvec{\beta },a,h_0(t)\right)&= \prod \limits _{i=1}^{n}\left\{ h_{0}\left( t_i \right) K(\varvec{\beta },a,\mathbf x _i) \right\} ^{\delta _{i}}\left\{ \exp \left[ -K\left( \varvec{\beta },a, \mathbf x _i \right) H_{0}\left( t_i\right) \right] \right\} \\&= \prod \limits _{j=1}^{D}\left\{ h_{0} \left( t_j \right) K(\varvec{\beta },a,\mathbf x _j) \right\} \prod \limits _{i=1}^{n} \left\{ \exp \left[ -K\left( \varvec{\beta },a, \mathbf x _i \right) H_{0}\left( t_i\right) \right] \right\} , \end{aligned}$$

where \(j=1,\ldots ,D\) correspond to the times without censoring. We then obtain

$$\begin{aligned} L \left( \varvec{\beta },a,h_0(t)\right) = \prod \limits _{j=1}^{D} h_{0} \left( t_j \right) K(\varvec{\beta },a,\mathbf x _j) \exp \left\{ -\sum \limits _{i=1}^{n} K\left( \varvec{\beta },a, \mathbf x _i \right) H_{0}\left( t_i\right) \right\} . \end{aligned}$$

Defining \(H_0(t)=\sum \limits _{t^*<t}h_0(t^*)\), and supposing that \(\varvec{\beta }\) and \(a\) are fixed, we have

$$\begin{aligned} L \left( h_0(t)\right) = \prod \limits _{j=1}^{D} h_{0} \left( t_j \right) K(\varvec{\beta },a,\mathbf x _j) \exp \left\{ -\sum \limits _{i=1}^{n} K\left( \varvec{\beta },a, \mathbf x _i \right) \left( \sum \limits _{t_i^*<t_i}h_0(t_i^*)\right) \right\} . \end{aligned}$$

Taking \(h_0(t)=0\) when the event is censored, then we get

$$\begin{aligned} L \left( h_0(t)\right)&= \left[ \prod \limits _{j=1}^{D} h_{0} \left( t_j \right) K(\varvec{\beta },a,\mathbf x _j) \right] \exp \left\{ -\sum \limits _{j=1}^{D}h_0(t_j) \sum \limits _{i \in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) \right\} \\&= \prod \limits _{j=1}^{D} h_{0} \left( t_j \right) K(\varvec{\beta },a,\mathbf x _j) \exp \left\{ -h_0(t_j)\sum \limits _{i \in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) \right\} , \end{aligned}$$

so that

$$\begin{aligned} L \left( h_0(t_1),\cdots ,h_0(t_D)\right) \propto \prod \limits _{j=1}^{D} h_{0} \left( t_j \right) \exp \left\{ - h_0(t_j)\sum \limits _{i\in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) \right\} . \end{aligned}$$

For the determination of the maximum likelihood estimate, we take the derivative with respect to \(h_0(t_j)\) which is given by

$$\begin{aligned} \frac{\partial L}{\partial h_0(t_j)}\,&= \,\exp \left\{ - h_0(t_j)\sum \limits _{i\in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) \right\} \\&\quad + h_0(t_j)\exp \left\{ - h_0(t_j)\sum \limits _{i\in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) \right\} \left\{ -\sum \limits _{i\in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) \right\} . \end{aligned}$$

Upon equating this to zero, we obtain the maximum likelihood estimate to be

$$\begin{aligned} \hat{h}_0(t_j)=\frac{1}{\sum \limits _{i\in R(t_j)} K\left( \varvec{\beta },a, \mathbf x _i \right) }, \end{aligned}$$

and consequently

$$\begin{aligned} \hat{H}_0(t)=\sum \limits _{t^*<t}\frac{1}{w(t^*)}, \end{aligned}$$

where \(w(t^*)=\sum \limits _{l\in R(t^*)}K(\varvec{\beta },a,\mathbf x _l)\).

Since

$$\begin{aligned} \hat{S}_0(t)=\exp \left[ - \hat{H}_0(t) \right] \end{aligned}$$

is the estimator of the survival function of an individual with covariate vector \(\mathbf x=0 \), for estimating the survival function of an individual with covariate vector \(\mathbf x=x^* \), we use the estimator

$$\begin{aligned} \hat{S}(t|\hat{\varvec{\beta }},\hat{a},\mathbf x=x^* ) = \left\{ \hat{S}_0(t)\right\} ^{K \left( \hat{\varvec{\beta }},\hat{a}, \mathbf x=x^* \right) } . \end{aligned}$$
(7)

In both cases of parametric and semiparametric setting, the variance of the estimated parameters \(\hat{\varvec{\theta }}=(\hat{\lambda }, \hat{\gamma }, \hat{\varvec{\beta }},\hat{a})\) and \(\hat{\varvec{\theta }}=(\hat{\varvec{\beta }},\hat{a})\) obtained by maximizing Eqs. (5) and (6), respectively, can be estimated as

$$\begin{aligned} var \left( \hat{\varvec{\theta }} \right) =diag \left( I^{-1}\left( \hat{\varvec{\theta }}\right) \right) , \end{aligned}$$

where \(I\) is the observed information matrix.

3 Measures of Fit

After fitting several possible models for a given data, we will need to compare the fit of each model for selecting the best one. When we fit several non-nested models, we may use the Akaike information criterion (AIC) to choose the best one among them. The AIC is defined as

$$\begin{aligned} -2 l \left( \hat{\varvec{\theta }} \right) + 2(\mathrm{{number\, of\, model\, parameters}}). \end{aligned}$$

Essentially, we compare the AIC scores for different models and then select the one with the smallest AIC score.

Another popular criterion for model selection among parametric models is the Bayesian information criterion (BIC). The BIC is given by

$$\begin{aligned} -2 l \left( \hat{\varvec{\theta }} \right) + (\mathrm{{number\, of\, model\, parameters}})\log (\mathrm{{sample\, size}}), \end{aligned}$$

and in the same way as with AIC scores, we select the one with the smallest BIC value.

On the other hand, to describe how well a model fits the observed data, we can do tests of goodness-of-fit for the estimated survival function. Such tests summarize the discrepancy between observed values and the expected values for the survival function under the model. We will use two well-known statistics for this purpose, the first one is the lack of fit sum of squares (SS) given by

$$\begin{aligned} \sum _i(\mathrm{{observed\, value}}_i-\mathrm{{fitted\, value}}_i)^2, \end{aligned}$$

and the second is the Kolmogorov–Smirnov statistic (KS) defined as

$$\begin{aligned} \max _i|\mathrm{{observed\, value}}_i-\mathrm{{fitted\, value}}_i| . \end{aligned}$$

4 Numerical Illustration

We illustrate the use of the family of proposed models by analyzing death times of male laryngeal cancer patients. Kardaun [5] reported data on 90 males diagnosed with cancer of the larynx during the period 1970–1978 at a Dutch hospital. Times recorded were the intervals (in years) between the first treatment and either death or the end of the study. Also recorded were the patient’s age at the time of diagnosis and the stage of the patient’s cancer, wherein the stage is a factor of four levels. The larynx data have been used by Klein and Moeschberger [6] to illustrate some techniques in survival analysis. The larynx data can be obtained from the MKsurv Package of the R software package.

4.1 Fit of a Fully Parametric Model

First, we fit the fully parametric proportional hazards model by means of three specific models, namely, the Weibull proportional hazards model (WPH), the Weibull logit link proportional hazards model (WLPH), and the WGLPH. To get a good fit of these models, we do a grid \(\{0.5,1.0,\ldots ,4.5,5.0\}\times \{0.5,1.0,\ldots ,4.5,5.0\}\) for the initial values of the parameters \(\lambda \) and \(\gamma \) for the required maximization of the three specified models. Moreover, \(a=1\) is used as the initial value for the parameter a of the WGLPH model.

In the fitting of the WGLPH model, we found that the numerical methods, used to determining the maximum, in fact, find local maxima that can be far away from the global maximum. We therefore adopt a profile full log-likelihood method to solve this problem.

Let the profile log-likelihood function be

$$\begin{aligned} p\;l\left( a\right) =\sup _{\lambda ,\gamma ,\varvec{\beta }}l\left( \lambda ,\gamma ,\varvec{\beta },a\right) . \end{aligned}$$

We then calculate this function for \(a \in \left\{ 5.0,6.0,7.0,8.0,9.0\right\} \) and the same grid for initial points of \(\lambda \) and \(\gamma \) as mentioned earlier. Finally, we found the value of \(a=8\) to be the choice of \(a\) that maximized the profile full log-likelihood (WGLPHaF). The parameter estimates (Est) and their standard errors (SE) so obtained for the four fitted models are presented in Table  1.

Now, we compare the empirical survival function with the estimated survival function for each of the four fitted models. The empirical survival function was calculated by Kaplan–Meier estimator, while the survival function of the WLPH, WGLPHaF, and WGLPH models were estimated with the corresponding function (3) of the parametric model. In Fig.  1, we have shown a plot of these estimated survival functions. To assess the goodness-of-fit of these models, we calculated the SS and KS statistics to compare the empirical survival function with the corresponding estimated survival function for all four models (WPH, WLPH, WGLPHaF, and WGLPH). In Table  2, the values of the SS and KS statistics obtained for these four fitted models are presented.

Also, in Table  3, two comparisons of these four models are made based on the (AIC) and the BIC.

Table 1 Fit of the four fully parametric models

All these results show that the WGLPH, with the parameter \(a\) determined by the profile likelihood method, provides overall the best fitting model for the considered data.

Fig. 1
figure 1

Survival functions of the four fitted parametric models and the Kaplan–Meier estimator

Table 2 Goodness-of-fit statistics for the four fitted fully parametric models
Table 3 AIC and BIC values for the four fitted parametric models
Table 4 Fit of the four semiparametric models

4.2 Fit of a Semiparametric Model

Now, we fit the proportional hazards model (PH), the logit link proportional hazards model (LPH), and the generalized logit link proportional hazards model (GLPH) in a semiparametric framework. In order to get the best fit for the GLPH model, we do a grid for the initial value of the parameter \(a\) in \(\{0.5,1.0,1.5,2.0\}\) and we look for the maximum value of the partial log-likelihood function in (6).

Fig. 2
figure 2

Survival functions of the four fitted semiparametric models and the Kaplan–Meier estimator

As in the parametric case, we consider a profile partial log-likelihood method maximizing the function

$$\begin{aligned} p\;l\left( a\right) =\sup _{\varvec{\beta }}l\left( \varvec{\beta },a\right) . \end{aligned}$$

We calculate this function for \(a \in \left\{ 1,2,\ldots ,8,9,10 \right\} \) in GLPH with \(a\) fixed and we get \(a=8\) that maximizes the profile partial log-likelihood function (GLPHaF). In Table  4, we present the parameter estimates (Est) and their standard errors (SE) for the four fitted models. Note that we obtain similar estimation for GLPHaF (\(a=8\) fixed) and GLPH models, but we reduce substantially the SE in GLPHaF, as the parameter \(a\) is fixed in this case. Furthermore, we compare the empirical survival function and the estimated survival function for each of the four models. The empirical survival function was calculated by Kaplan–Meier estimator, the Breslow’s estimator was used to estimate the survival function in Cox PH model, while the survival function of LPH, GLPHaF, and GLPH models were estimated with the corresponding function \(S(t)\) of the semiparametric model in (7). In Fig.  2, we have presented a plot of these estimated survival functions. In Table  5 we present the values of the SS and KS statistics to compare the empirical survival function and the estimated survival function for each of the four fitted models (PH, LPH, GLPHaF and GLPH).

Table 5 Goodness-of-fit statistics for the four fitted semiparametric models
Table 6 AIC and BIC values for the four fitted semiparametric models

In Table  6, the comparisons of the four fitted models are made based on the AIC and the BIC. From all these results, we draw the general conclusion that the generalized logit-link proportional hazards models are good competitors for the Cox model.

5 Final Remarks

The family of proposed models is quite flexible and seems to provide a good competitor for the Cox model. We derive the likelihood function and the partial likelihood function for the parametric and semiparametric models, respectively, for obtaining the parameter estimates and their standard errors. The estimated survival function of a member of the proposed family of models fits the empirical survival function better than the Cox model. Furthermore, this generalized logit-based proportional hazards model is the one with minimum AIC and BIC.

There are still some unresolved issues in this regard. First of all, the asymptotic properties of the parameter estimates have to be established, and associated statistical inferential issues need to be studied in detail. Another problem of interest is to introduce time-dependence in the models. In this case, the proposed models will provide an extension of [7, 8] models in two ways, with one coming from the non-constant hazard function and the other arising from the generalized logit-link function.

6 Appendix A: Derivatives of the Log-Likelihood Function in a Fully Parametric Model

To maximize (5), we obtain its first derivatives with respect to all the parameters as

$$\begin{aligned} \frac{\partial l \left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \lambda }&= \sum _{i=1}^{n}\left[ \frac{\delta _{i}}{ \lambda } - K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \right] ,\\ \frac{\partial l\left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \gamma }&= \sum _{i=1}^{n}\left[ \delta _ i\left( \frac{1}{ \gamma } + \ln (t_i) \right) - \lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \ln (t_i) \right] ,\\ \frac{\partial l\left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \beta _j}&= \sum _{i=1}^{n}\left[ \frac{ax_{ij}\left( \delta _i -\lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \right) }{1+\exp (\mathbf x _i^{\prime }\varvec{\beta })} \right] , \end{aligned}$$

for \(j=1,\ldots ,p\), and

$$\begin{aligned} \frac{\partial l\left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial a} = \sum _{i=1}^{n}\left[ \ln \left( 1+\exp \left( -\mathbf x _i^{\prime } \varvec{\beta } \right) \right) \left( \lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } -\delta _i \right) \right] . \end{aligned}$$

To obtain the corresponding information matrix \(I(\lambda ,\gamma ,\varvec{\beta },a)\), we need the Hessian matrix \(H(\lambda ,\gamma ,\varvec{\beta },a)\) which is the matrix of second derivatives of the log-likelihood function in (5) with respect to its parameters.

We obtain them readily as follows:

$$\begin{aligned} \frac{\partial ^2 l \left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \lambda ^2}&= -\frac{1}{\lambda ^2}\left[ \sum _{i=1}^{n}\delta _{i}\right] , \\ \frac{\partial ^2 l \left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \lambda \partial \gamma }&= -\sum _{i=1}^{n} K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \ln (t_i), \\ \frac{\partial ^2 l \left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \lambda \partial \beta _j}&= -a\sum _{i=1}^{n}\frac{K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } x_{ij}}{1+\exp (\mathbf x _i^{\prime }\varvec{\beta })}, \qquad \text{ with } j=1,\cdots ,p, \\ \frac{\partial ^2 l \left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \lambda \partial a}&= \sum _{i=1}^{n}K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \ln \left( 1+\exp (-\mathbf x _i^{\prime }\varvec{\beta })\right) , \\ \frac{\partial ^2 l\left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \gamma ^2}&= \sum _{i=1}^{n}\left[ -\frac{\delta _ i}{ \gamma ^2} - \lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma }\left( \ln (t_i)\right) ^2 \right] , \\ \frac{\partial ^2 l\left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \gamma \partial \beta _j}&= -a\lambda \sum _{i=1}^{n}\frac{K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \ln (t_i)x_{ij}}{1+\exp (\mathbf x _i^{\prime }\varvec{\beta })}, \qquad \text{ with } j=1,\cdots ,p,\\ \frac{\partial ^2 l\left( \lambda ,\gamma ,\varvec{\beta },a \right) }{\partial \gamma \partial a}&= \lambda \sum _{i=1}^{n}K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma }\ln (t_i) \ln \left( 1+ \exp (-\mathbf x _i^{\prime }\varvec{\beta }) \right) , \end{aligned}$$

for \(j=1,\ldots ,p\) and \(l=j,\cdots ,p,\)

$$\begin{aligned} \frac{\partial ^2 l\left( \lambda ,\gamma ,\varvec{\beta } ,a\right) }{\partial \beta _j \partial \beta _l}&= -a\sum _{i=1}^{n}\\&\quad \;\times \left[ \!\frac{\delta _i\exp (\mathbf x _i^{\prime }\varvec{\beta })x_{ij}x_{il}+ \lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_i^{\gamma }x_{ij}x_{il}\left( a-\exp (\mathbf x _i^{\prime }\varvec{\beta })\right) }{\left( 1+\exp (\mathbf x _i^{\prime }\varvec{\beta })\right) ^2}\!\right] ,\\ \frac{\partial ^2 l\left( \lambda ,\gamma ,\varvec{\beta }, a \right) }{ \partial \beta _j \partial a}&= \sum _{i=1}^{n}\left[ \frac{\delta _i x_{ij}-\lambda K\left( \varvec{\beta },a,\mathbf x _i \right) t_i^{\gamma }x_{ij} \left\{ 1-a\ln \left( 1+\exp (-\mathbf x _i^{\prime }\varvec{\beta }) \right) \right\} }{1+\exp (\mathbf x _i^{\prime }\varvec{\beta })} \right] , \\ \frac{\partial ^2 l\left( \lambda ,\gamma ,\varvec{\beta },a\right) }{\partial a^2}&= -\lambda \sum _{i=1}^{n}K\left( \varvec{\beta },a,\mathbf x _i \right) t_{i}^{\gamma } \left\{ \ln \left( 1+\exp (-\mathbf x _i^{\prime }\varvec{\beta })\right) \right\} ^2.\\ \end{aligned}$$

7 Appendix B: Derivatives of the Partial Log-Likelihood Function in the Semiparametric Model

To maximize (6), we obtain its first derivatives with respect to all the parameters as

$$\begin{aligned} \frac{\partial l\left( \varvec{\beta },a \right) }{\partial \beta _j} = a\sum _{i=1}^{n}\delta _i \left[ \frac{x_{ij}}{1+\exp (\mathbf x _i^{\prime }\varvec{\beta })}-\frac{1}{\sum \limits _{l\in R(t_i)}K\left( \varvec{\beta },a,\mathbf x _l \right) }\sum \limits _{l\in R(t_i)}\frac{x_{lj}K\left( \varvec{\beta },a,\mathbf x _l \right) }{1+\exp (\mathbf x _l^{\prime }\varvec{\beta })}\right] \end{aligned}$$

for \(j=1,\ldots ,p\), and

$$\begin{aligned} \frac{\partial l(\varvec{\beta },a )}{\partial a}&= \sum _{i=1}^{n}\delta _i \left[ - \ln \left( 1+\exp \left( -\mathbf x _i^{\prime } \varvec{\beta } \right) \right) \right. \\&\quad \left. +\frac{1}{\sum \limits _{l\in R(t_i)}K\left( \varvec{\beta },a,\mathbf x _l \right) } \sum \limits _{l\in R(t_i)}K\left( \varvec{\beta },a,\mathbf x _l \right) \ln \left( 1+\exp \left( -\mathbf x _l^{\prime } \varvec{\beta } \right) \right) \right] .\\ \end{aligned}$$

Now, to obtain the information matrix, we need the second derivatives of the partial log-likelihood function (6) for \(j=1,\cdots ,p\) and \(m=j,\cdots ,p,\) which are as follows:

$$\begin{aligned} \frac{\partial ^2 l\left( \varvec{\beta } ,a\right) }{\partial \beta _j \partial \beta _m}&= a \sum _{i=1}^{n} \delta _i \left[ \frac{-\exp (\mathbf x _i^{\prime }\varvec{\beta })x_{ij}x_{im}}{\left( 1+\exp (\mathbf x _i^{\prime }\varvec{\beta })\right) ^2} - \frac{av_m(t_i)v_j(t_i) - w(t_i) y_{jm}(t_i) }{\left( w(t_i) \right) ^2} \right] ,\\ \frac{\partial ^2 l\left( \varvec{\beta }, a \right) }{ \partial \beta _j \partial a}&= \sum _{i=1}^{n}\delta _i \left[ \frac{x_{ij}}{1+\exp (\mathbf x _i^{\prime }\varvec{\beta })}-\frac{v_j(t_i)}{w(t_i)}-a \frac{v_j(t_i)u(t_i)}{\left( w(t_i)\right) ^2}+a\frac{z_j(t_i)}{w(t_i)} \right] , \\ \frac{\partial ^2 l\left( \varvec{\beta },a\right) }{\partial a^2}&= \sum _{i=1}^{n}\delta _i \left[ \frac{-\left( u(t_i)\right) ^2}{\left( w(t_i)\right) ^2} +\frac{ \sum \limits _{l\in R(t_i)} K\left( \varvec{\beta },a,\mathbf x _l \right) \left\{ \ln \left( 1+\exp (-\mathbf x _l^{\prime }\varvec{\beta })\right) \right\} ^2 }{w(t_i) }\right] , \end{aligned}$$

where

$$\begin{aligned} w(t)&= \sum \limits _{l\in R(t)} K\left( \varvec{\beta },a, \mathbf x _l \right) ,\\ v_{j}(t)&= \sum \limits _{l\in R(t)} \frac{K\left( \varvec{\beta },a,\mathbf x _l \right) x_{lj}}{1+\exp (\mathbf x _l^{\prime }\varvec{\beta })},\\ y_{jm}(t)&= \sum \limits _{l\in R(t)} \frac{K\left( \varvec{\beta },a,\mathbf x _l \right) x_{lj}x_{lm}\left( a-\exp (\mathbf x _l^{\prime }\varvec{\beta })\right) }{\left( 1+\exp (\mathbf x _l^{\prime }\varvec{\beta })\right) ^2},\\ u(t)&= \sum \limits _{l\in R(t)} K\left( \varvec{\beta },a,\mathbf x _l \right) \ln \left( 1+\exp (-\mathbf x _l^{\prime }\varvec{\beta })\right) ,\\ z_j(t)&= \sum \limits _{l\in R(t)} \frac{ K\left( \varvec{\beta },a,\mathbf x _l \right) x_{lj}\left\{ \ln \left( 1+\exp (-\mathbf x _l^{\prime }\varvec{\beta }) \right) \right\} }{1+\exp (\mathbf x _l^{\prime }\varvec{\beta })}. \\ \end{aligned}$$