1 Introduction

Analysis of repeated measures categorical data has drawn the interest of many researchers in the last few decades and has become an important and active area of research [7, 9, 14]. Most of the previous work on correlated outcome variables was based on the marginal response probabilities. Generalized estimating equations (GEE), a quasi-likelihood approach based on a population-averaged model, is one of the most widely used methods for analyzing longitudinal data [20, 32]. GEE does not require the classical assumptions of independence and normality, which are too restrictive for many problems [26]. Carey et al. [5] introduced alternating logistic regression (ALR) models based on marginal odds ratios instead of correlations between pairs of binary responses, combining the first-order GEE for the regression coefficients with a new logistic regression equation for estimating the correlation parameter. Due to the lack of a proper specification of the underlying model, marginal models such as GEE or ALR may fail to provide a measure of dependence among binary outcomes, and the anomalies caused by the induced correlation between repeated outcomes are difficult to interpret. The working correlation structure of GEE has been the concern of many studies, mainly focusing on examining existing selection criteria and/or proposing new selection criteria for correlation structures [10, 12, 23, 24, 27, 30]. Several studies, see, for example, Darlington and Farewell [6] and Guerra et al. [11], tried to address this problem by modifying the marginal-model-based approaches using Markov-based transition probabilities.

A good number of researchers, for example, Muenz and Rubinstein [22], Zeger et al. [33] and Azzalini [1], explored Markov models for binary longitudinal data. Islam and Chowdhury [18] and Islam et al. [13, 16, 17] carried out a series of studies using Markov-based conditional models and joint models based on marginal-conditional approaches for repeated binary data. The conditional regressive logistic models of Bonney [3, 4] were generalized by Islam et al. [13] to include both the binary outcomes at previous times and the covariates in the conditional models.

Longitudinal data offer the advantage of visualizing the change in individual responses with respect to time. GEE and GEE-based models, being constructed to describe the population-averaged or marginal distribution of repeated measurements, may be appropriate for descriptive observational studies but should be used carefully in causal experiments [21]. Moreover, GEE and other marginal models may not provide a measure of dependence among binary outcomes due to the lack of a proper specification of the underlying model. Conditional models alone are also not adequate for modeling longitudinal data. In this study, we propose two joint models using marginal-conditional approaches for longitudinal data. These models can be used as alternatives to GEE-based models where marginal models are not appropriate. Starting with an extension of the Markov-based model proposed by Darlington and Farewell [6], we then propose a more generalized form that takes the correlation structure into account in an appropriate manner. Finally, we propose the use of a regressive model-based joint model for the case of more than three repeated outcomes in a longitudinal study. The proposed joint models and their inference procedures are simple. Nevertheless, the proposed models account for the covariate dependence of the conditional probabilities (of occurrence of events) in the second or later follow-ups given the earlier responses of the same subject. One can use the proposed models for any number of follow-ups, equal or unequal, without making the underlying model complex. Furthermore, the estimation and test procedures, both for specific parameters of interest and for the overall model, are simple for practical use on any longitudinal data. Through a simulation study, we compared the two proposed joint models (based on a marginal-conditional approach) with GEE and ALR, which are based on marginal models.
Finally, we illustrated the selected methods using Health and Retirement Study data [29].

2 Models for Analyzing Repeated Binary Data

Let \(Y_{ij}\) be a Bernoulli outcome variable for subject i at the jth occasion, \(i=1, 2, ..., N\) and \(j =1, 2, ..., n_i\). Then the outcome vector for subject i can be defined as \({\varvec{Y}}_i=\left( Y_{i1}\;Y_{i2}\;...\;Y_{in_i}\right) '\) with mean vector \( {\varvec{\mu }}_i =E({\varvec{Y}}_i) =\left( \mu _{i1}\; \mu _{i2}\;...\;\mu _{in_i}\right) ' =\left( p_{i1}\;p_{i2}\;...\;p_{in_i}\right) '\). Also let \({\varvec{X}}_{ij}\) be the \(1 \times (p+1)\) vector of covariates for subject i at the jth occasion.

Let us consider the simplest case of two repeated outcomes on each individual. The vector of responses can be defined as \({\varvec{Y}}_i=\left( Y_{i1}\;Y_{i2} \right)' \). For binary outcome variables \(Y_{i1}\) and \(Y_{i2}\) of the ith individual, the marginal probability that \(Y_{ij}\) takes the value 1 (an event is observed) can be expressed as

$$\begin{aligned} p_{ij}&= \hbox {Pr}(Y_{ij} = 1| {{\varvec{x}}_{ij}})\nonumber \\&= \frac{e^{{\varvec{x}}_{ij} {\varvec{\beta }}_j}}{1+e^{{\varvec{x}}_{ij} {\varvec{\beta }}_j}}\; i=1, 2, \ldots , N; j=1, 2, \end{aligned}$$
(1)

where \({\varvec{\beta }}_j=(\beta _{j0}, ..., \beta _{jp})'\) is a \((p+1) \times 1\) vector of parameters of the marginal model of \(Y_{ij}\). Consequently, the marginal probability of not observing an event can be expressed as \(1-p_{ij} = 1-\hbox {Pr}(Y_{ij} = 1| {{\varvec{x}}_{ij}}) = {\frac{1}{1+{{e}^{{\varvec{x}}_{ij} {\varvec{\beta }}_j}}}}\).
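As a numerical illustration of the marginal model in Eq. (1), the probability can be evaluated directly from the linear predictor. The following minimal Python sketch is ours, not part of any package; it assumes the first element of the covariate vector is the constant 1 carrying the intercept:

```python
import math

def marginal_prob(x, beta):
    """Marginal probability Pr(Y_ij = 1 | x) under the logistic model (Eq. 1).

    x and beta are equal-length sequences; x[0] = 1 carries the intercept.
    """
    eta = sum(xk * bk for xk, bk in zip(x, beta))
    return math.exp(eta) / (1.0 + math.exp(eta))

# A zero linear predictor makes the event and non-event equally likely.
p = marginal_prob([1.0, 0.0], [0.0, 1.2])  # intercept 0, covariate absent
```

The complementary probability \(1-p_{ij}\) follows as `1.0 - marginal_prob(x, beta)`, matching the closed form above.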

If \(Y_{i2}\) depends on \(Y_{i1}\), then for each possible value of \(y_{i1}\), we get one conditional model for \(Y_{i2}\). As we assumed the \(Y_{ij}\) to be binary random variables, \(Y_{i1}\) can take the values 0 and 1. When \(Y_{i1}=0\), the conditional probability of \(Y_{i2}=1\) can be defined as

$$\begin{aligned} p^*_{i2}&= \hbox {Pr}(Y_{i2} = 1| y_{i1}=0, {\varvec{x}}_{i2})\nonumber \\&= \frac{{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{01}}}}{1+{e^{ {\varvec{x}}_{i2}{\varvec{\beta }}_{01}}}}; \; i =1,2,\ldots ,N, \end{aligned}$$
(2)

where \( {\varvec{\beta }}_{01}\) is the vector of parameters of the conditional model of \(P(Y_{i2}=1|Y_{i1}=0, X_{i2}=x_{i2})\); \(i=1, 2, \ldots N\). Here, the suffix (01) of \({\varvec{\beta }}\) is used to show the transition from \(Y_{i1}=0\) to \(Y_{i2}=1\).

Similarly, when \(Y_{i1}=1\), the conditional probability of \(Y_{i2}=1\) can be defined as

$$\begin{aligned} p^*_{i2}= & {} \hbox {Pr}(Y_{i2} = 1| y_{i1}=1, {\varvec{x}}_{i2}) \nonumber \\= & {} \frac{{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{11}}}}{1+{e^{ {\varvec{x}}_{i2}{\varvec{\beta }}_{11}}}}; \; i =1,2,\ldots ,N, \end{aligned}$$
(3)

where \( {\varvec{\beta }}_{11}\) is the vector of parameters of the conditional model of \(P(Y_{i2}=1|Y_{i1}=1, X_{i2}=x_{i2})\); \(i=1, 2, \ldots N\). The suffix (11) of \({\varvec{\beta }}\) is used to show the transition from \(Y_{i1}=1\) to \(Y_{i2}=1\).

For \(i=1,2, \ldots ,N\), the joint probabilities can be expressed as the product of marginal and conditional probabilities,

$$\begin{aligned} P(Y_{i1}, Y_{i2}) =P(Y_{i2}=1|Y_{i1}=y_{i1}, {\varvec{x}}_{i2})P(Y_{i1}=y_{i1}| {\varvec{x}}_{i1}). \end{aligned}$$
(4)
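The factorization in Eq. (4) can be checked numerically: the four joint probabilities over \((y_{i1}, y_{i2})\) must sum to one. A minimal Python sketch, with illustrative function names of our own and arbitrary parameter values:

```python
import math

def logistic(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))

def joint_prob(y1, y2, x1, x2, b1, b01, b11):
    """Joint probability P(Y_i1=y1, Y_i2=y2) as marginal times conditional (Eq. 4).

    b1 parameterizes the marginal model of Y_i1; b01 and b11 parameterize the
    conditional models of Y_i2 given Y_i1 = 0 and Y_i1 = 1, respectively.
    """
    p1 = logistic(sum(a * b for a, b in zip(x1, b1)))   # Pr(Y_i1 = 1 | x1)
    bc = b11 if y1 == 1 else b01
    p2 = logistic(sum(a * b for a, b in zip(x2, bc)))   # Pr(Y_i2 = 1 | y1, x2)
    return (p1 if y1 == 1 else 1 - p1) * (p2 if y2 == 1 else 1 - p2)
```

Summing `joint_prob` over the four outcome pairs returns 1 for any parameter values, confirming that Eqs. (1)-(3) define a proper bivariate distribution.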

The repeated measures data are naturally correlated and the major challenge of the methods for analyzing repeated measures categorical data is to model the probable correlations among the repeated observations on the same subject.

2.1 Marginal Models

Following the quasi-likelihood approach [31], with a mean model, \(\mu _{ij}\), and variance structure, \(V_{ij}\), the GEE [20, 32] for \({\varvec{\beta }}\), where \({\varvec{\beta }}\) denotes the parameters of the marginal model, can be expressed as:

$$\begin{aligned} {} U({\varvec{\beta }})=\sum \limits _{i=1}^{N}U_i({\varvec{\beta }}) = \sum \limits _{i=1}^{N}{{\varvec{D}}_i'{\varvec{V}}_i^{-1}({\varvec{Y}}_i-{\varvec{\mu }}_i)}=0, \end{aligned}$$
(5)

where \({\varvec{D}}_i=\frac{\partial {\varvec{\mu }}_i}{\partial {\varvec{\beta }}}\) and \({\varvec{V}}_i\) is a working or approximate covariance matrix of \({\varvec{Y}}_i\) that allows the time dependence to be specified in different ways. The GEE approach uses an induced correlation matrix to define the correlation among the repeated responses. The commonly used correlation structures are independence, autoregressive, exchangeable and unstructured.
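Under the independence working structure with the canonical logit link, \({\varvec{V}}_i\) is diagonal with entries \(\mu _{ij}(1-\mu _{ij})\), and the estimating function in Eq. (5) reduces to the ordinary logistic score \(\sum _i {\varvec{X}}_i'({\varvec{y}}_i-{\varvec{\mu }}_i)\). A minimal Python sketch of evaluating this estimating function (function names are illustrative, not from any package):

```python
import math

def gee_independence_score(subjects, beta):
    """Evaluate U(beta) of Eq. (5) under the independence working structure
    and logit link, where it reduces to sum_i X_i' (y_i - mu_i).

    subjects: list of (X_i, y_i) pairs, with X_i a list of covariate rows
    (one row per occasion) and y_i the corresponding binary outcomes.
    """
    k = len(beta)
    u = [0.0] * k
    for X, y in subjects:
        for row, yj in zip(X, y):
            eta = sum(r * b for r, b in zip(row, beta))
            mu = math.exp(eta) / (1.0 + math.exp(eta))
            for idx in range(k):
                u[idx] += row[idx] * (yj - mu)  # covariate-weighted residual
    return u
```

Solving \(U({\varvec{\beta }})=0\) (e.g. by Newton-Raphson) gives the GEE estimate for this working structure; other structures change only \({\varvec{V}}_i\).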

The alternating logistic regression (ALR) procedure proposed by Carey et al. [5] combines the first-order GEE for \({\varvec{\beta }}\) with new logistic regression equations for estimating the correlation parameter. ALR simultaneously regresses the response on the explanatory variables and models the association among responses in terms of pairwise odds ratios. The ALR estimate of \((\alpha , {\varvec{\beta }})\), where \(\alpha \) is the pairwise log odds ratio and \({\varvec{\beta }}\) is the vector of regression coefficients, is the simultaneous solution of the following unbiased estimating equations:

$$\begin{aligned} U_{{\varvec{\beta }}}&= \sum _{i=1}^{N}\left( \frac{\partial {\varvec{\mu }}_i}{\partial {\varvec{\beta }}}\right) ' {\varvec{V}}^{-1}_i({\varvec{Y_i}}-{\varvec{\mu }}_i)=0, \end{aligned}$$
(6)
$$\begin{aligned} U_{\alpha }&= \sum _{i=1}^{N}\left( \frac{\partial {\varvec{\zeta }}_i}{\partial \alpha }\right) ' {\varvec{S}}^{-1}_i({\varvec{Y_i}}-{\varvec{\zeta }}_i)=0, \end{aligned}$$
(7)
(7)

where \(\zeta _{ijk}=E(Y_{ij}|Y_{ik}=y_{ik})\) and \({\varvec{S}}_i\) is the \(\binom{n_i}{2} \times \binom{n_i}{2}\) diagonal matrix with elements \(\zeta _{ijk}(1-\zeta _{ijk})\). Equations (6) and (7) are solved simultaneously for \({\varvec{\beta }}\) and \(\alpha \).

2.2 Dependence in Bivariate Binary Outcomes

Consider binary outcomes \(Y_{i1}\) and \(Y_{i2}\) for ith individual. If \(Y_{i1}\) and \(Y_{i2}\) are not independent, then the conditional probability of \(Y_{i2}\) given \(Y_{i1}\) can be expressed as [6, 25]

$$\begin{aligned} \begin{aligned}&P(Y_{i2}=1|Y_{i1}, {\varvec{X}}_{i2}={\varvec{x}}_{i2})\\&=P(Y_{i2}=1|{\varvec{X}}_{i2}={\varvec{x}}_{i2})+\rho _i\left( Y_{i1}-P(Y_{i1}|{\varvec{X}}_{i1}={\varvec{x}}_{i1})\right) \\ \end{aligned} \end{aligned}$$
(8)

where \(\rho _i\) is the correlation between \(Y_{i1}\) and \(Y_{i2}\). For \(Y_{i1}=0\), Eq. (8) can be expressed as,

$$\begin{aligned} \begin{aligned}&P(Y_{i2} =1|Y_{i1}=0, {\varvec{X}}_{i2}={\varvec{x}}_{i2})\\&=P(Y_{i2}=1|{\varvec{x}}_{i2})+\rho _i \left( 0-P(Y_{i1}|{\varvec{x }}_{i1})\right) \\&\mathrm {or}, \quad \\&\frac{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{01}}}{1+e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{01}}} =\frac{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{2}}}{1+e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{2}}} - \rho _i .\frac{e^{{\varvec{x}}_{i1}{\varvec{\beta }}_{1}}}{1+e^{{\varvec{x}}_{i1}{\varvec{\beta }}_{1}}}\\ \end{aligned} \end{aligned}$$
(9)

and for \(Y_{i1} =1\), Eq. (8) can be expressed as:

$$\begin{aligned} \begin{aligned}&P(Y_{i2} =1|Y_{i1}=1, {\varvec{x}}_{i2})\\&=P(Y_{i2}=1|{\varvec{x}}_{i2})+\rho _i\left( 1-P(Y_{i1}|{\varvec{x}}_{i1})\right) \\&or, \quad \\&\frac{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{11}}}{1+e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{11}}}=\frac{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{2}}}{1+e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{2}}} + \rho _i \left( 1- \frac{e^{{\varvec{x}}_{i1}{\varvec{\beta }}_{1}}}{1+e^{{\varvec{x}}_{i1}{\varvec{\beta }}_{1}}}\right) \end{aligned} \end{aligned}$$
(10)

Clearly \(\rho _i\) is a function of \({\varvec{\beta }}_1\), \({\varvec{\beta }}_2\) and \({\varvec{\beta }}_{2.1}\), where \({\varvec{\beta }}_1\) and \({\varvec{\beta }}_2\) are the parameters of the marginal models (Eq. 1, \(j=1,2\)) and \({\varvec{\beta }}_{2.1} = {\varvec{\beta }}_{01}\) or \({\varvec{\beta }}_{11}\) are the vectors of parameters of the two conditional models (Eqs. 2 and 3). When \(Y_{i1}\) and \(Y_{i2}\) are not correlated, then \(\rho _i=0\) and

$$\begin{aligned} \begin{aligned}&P(Y_{i2} =1|Y_{i1}, {\varvec{X}}_{i2}={\varvec{x}}_{i2}) \\&= P(Y_{i2}=1|{\varvec{X}}_{i2}={\varvec{x}}_{i2})\\&=\frac{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_2}}{1+e^{{\varvec{x}}_{i2}{\varvec{\beta }}_2}}. \end{aligned} \end{aligned}$$
(11)

Theoretically, the observed correlation between two repeated outcome variables, \(Y_{i1}\) and \(Y_{i2}\), can be shown as:

$$\begin{aligned} \rho _i =\frac{\hbox {cov}(Y_{i1}, Y_{i2})}{\sqrt{V(Y_{i1})}\sqrt{V(Y_{i2})}} =\frac{E(Y_{i1} Y_{i2})-E(Y_{i1})E(Y_{i2})}{\sqrt{\mu _{i1}(1-\mu _{i1})} \sqrt{\mu _{i2} (1-\mu _{i2})}} \end{aligned}$$
(12)

where

$$\begin{aligned} \begin{aligned} E(Y_{i1}Y_{i2})&=\sum \limits _{y_{i1}, y_{i2}=0}^{1}y_{i1}y_{i2} P(Y_{i1}=y_{i1}, Y_{i2}=y_{i2})\\&=P(Y_{i1}=1, Y_{i2}=1)\\&=P(Y_{i2}=1| Y_{i1}=1)P(Y_{i1}=1)\\&=\frac{e^{x_{i2}{\varvec{\beta }}_{11}}}{1+ e^{x_{i2}{\varvec{\beta }}_{11}}}.\frac{e^{x_{i1}{\varvec{\beta }}_{1}}}{1+ e^{x_{i1}{\varvec{\beta }}_{1}}} \end{aligned} \end{aligned}$$
(13)

If \({\varvec{X}}_{ij}\) is time invariant then the correlation between \(Y_{i1}\) and \(Y_{i2}\), can be shown as:

$$\begin{aligned} \rho _i= e^{\frac{1}{2} {{\varvec{x}}_i({\varvec{\beta }}_1-{\varvec{\beta }}_2)}} \frac{ e^{ {{\varvec{x}}_i {\varvec{\beta }}_{11}}} - e^{ {{\varvec{x}}_i {\varvec{\beta }}_2}} }{(1+e^{ {{\varvec{x}}_i {\varvec{\beta }}_{11}}})}. \end{aligned}$$
(14)
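Equation (14) can be verified numerically against the moment definition in Eqs. (12) and (13). The following Python sketch (helper names are ours) evaluates both expressions from the three linear predictors \({\varvec{x}}_i{\varvec{\beta }}_1\), \({\varvec{x}}_i{\varvec{\beta }}_2\) and \({\varvec{x}}_i{\varvec{\beta }}_{11}\) for a time-invariant covariate vector:

```python
import math

def logistic(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))

def rho_closed_form(eta1, eta2, eta11):
    """Eq. (14): correlation from the linear predictors x*beta_1, x*beta_2, x*beta_11."""
    return (math.exp(0.5 * (eta1 - eta2))
            * (math.exp(eta11) - math.exp(eta2)) / (1.0 + math.exp(eta11)))

def rho_from_moments(eta1, eta2, eta11):
    """Eq. (12): the same correlation via cov/sd, with E(Y_i1 Y_i2) from Eq. (13)."""
    mu1, mu2, p11 = logistic(eta1), logistic(eta2), logistic(eta11)
    cov = p11 * mu1 - mu1 * mu2
    return cov / math.sqrt(mu1 * (1 - mu1) * mu2 * (1 - mu2))
```

Both functions agree for any predictor values, and both return zero exactly when \({\varvec{\beta }}_{11}={\varvec{\beta }}_2\), in line with the remark following Eq. (14).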

Equation (14) shows that the correlation between \(Y_{i1}\) and \(Y_{i2}\) is equal to zero when \({\varvec{\beta }}_{11}={\varvec{\beta }}_2\). However, this condition alone does not establish the absence of association between \(Y_{i1}\) and \(Y_{i2}\). Equations (9) and (10) show that for the independence of \(Y_{i1}\) and \(Y_{i2}\), it is necessary that \({\varvec{\beta }}_{01}\) and \({\varvec{\beta }}_{11}\) are both equal to \({\varvec{\beta }}_2\). If \({\varvec{\beta }}_{01}\ne {\varvec{\beta }}_{11}\), then \(Y_{i1}\) and \(Y_{i2}\) are associated. Islam et al. [17] showed that the dependence in bivariate Bernoulli outcome variables can be tested by testing the equality of the conditional models. \(Y_{i1}\) and \(Y_{i2}\) are independent if \(P(Y_{i2}=y_{i2}|Y_{i1}=s_1, {\varvec{X}}_{i2}={\varvec{x}}_{i2})=P(Y_{i2}=y_{i2}|Y_{i1}=0, {\varvec{X}}_{i2}={\varvec{x}}_{i2})=P(Y_{i2}=y_{i2}|Y_{i1}=1, {\varvec{X}}_{i2}={\varvec{x}}_{i2})=P(Y_{i2}=y_{i2}|{\varvec{X}}_{i2}={\varvec{x}}_{i2})\), i.e. \({\varvec{\beta }}_{2.s_1}={\varvec{\beta }}_{01}={\varvec{\beta }}_{11}={\varvec{\beta }}_2\). It should also be noted that even if \(Y_{i1}\), \(Y_{i2}\), ..., \(Y_{ij}\) are independent, i.e., \( {{\varvec{\beta }}_{j.12...{j-1}}}= {{\varvec{\beta }}_j}\), this does not necessarily mean that the distributions of the \(Y_{ij}\)'s are identical. The distributions of \(Y_{i1}\), \(Y_{i2}\), ..., \(Y_{ij}\) are identical only if \( {{\varvec{\beta }}_1}= {{\varvec{\beta }}_2}= ...= {{\varvec{\beta }}_j}= {{\varvec{\beta }}}\).

GEE is a method for a marginal or population-averaged model, and it assumes \( {\varvec{\beta }}_1= {\varvec{\beta }}_{2.1} = ... = {\varvec{\beta }}_{n_i.12...{n_i-1}} = {\varvec{\beta }}\), although it induces a (nuisance) correlation structure. ALR is also a marginal model-based approach, and hence the association among repeated responses cannot be addressed in a true sense in ALR. Clearly, when analyzing longitudinal data with correlated response variables, or with response variables from independent but non-identical populations at different time points, fitting marginal-conditional models for the \(Y_{ij}\)'s is a more appropriate choice: marginal models fail to utilize the major advantage of longitudinal data, namely observing the change in the outcome variable with respect to time.

It might be noted here that Darlington and Farewell [6] proposed a transition probability model based on the transition probability \(P(Y_{i2}=1|Y_{i1}=1, {\varvec{x_i}})=\frac{e^{{\varvec{x}}_i{\varvec{\beta }}_{11}}}{1+e^{{\varvec{x}}_i{\varvec{\beta }}_{11}}}\) and the marginal probability \(P(Y_{i2}=1|{\varvec{x_i}})=\frac{e^{{\varvec{x}}_i{\varvec{\beta }}}}{1+e^{{\varvec{x}}_i{\varvec{\beta }}}}\), where \({\varvec{\beta }}\) is the vector of parameters of the marginal model \(P(Y_{ij}=1|{\varvec{x}}_i)\). Essentially, Darlington and Farewell [6] addressed the correlation only partially, as they did not consider the transition probability \(P(Y_{i2}=1|Y_{i1}=0, {\varvec{x_i}})\) in their model.

3 Proposed Models

In this study, we propose two joint models based on a marginal-conditional approach for repeated binary outcomes. We start from the model considered by Darlington and Farewell [6] with the working likelihood function:

$$\begin{aligned} L({\varvec{\beta }}, {\varvec{\beta }}_{11})=\prod \limits _{i=1}^{N}{p_{i}}^{y_{i1}}{(1-p_{i})}^{1-y_{i1}}\prod \limits _{j=2}^{n_i}{p^*_{ij}}^{y_{ij}}{(1-p^*_{ij})}^{1-y_{ij}} \end{aligned}$$
(15)

where \(p_i=\hbox {Pr}(Y_{ij}=1|{\varvec{X}}_i={\varvec{x}}_i)={\frac{{{e}^{ {\varvec{\beta }}'{\varvec{x}}_i}}}{1+{{e}^{{\varvec{\beta }}'{\varvec{x}}_i}}}}\) and \(p^*_{ij}=\hbox {Pr}(Y_{ij}=1|Y_{ij-1},{\varvec{X}}_i={\varvec{x}}_i)=E(Y_{ij}|Y_{ij-1}, {\varvec{X}}_i={\varvec{x}}_i)=p_i+\rho _i(Y_{ij-1}-p_{i}) \) and \(\rho _i= {\frac{{{e}^{{{\varvec{\beta }}_{11}}'{\varvec{x}}_i}}-{{e}^{{\varvec{\beta }}'{\varvec{x}}_i}}}{1+{{e}^{ {{\varvec{\beta }}_{11}}'{\varvec{x}}_i}}}}\), with \(\mathrm {max}(-\frac{p_i}{1-p_i}, -\frac{1-p_i}{p_i})<\rho _i<1\) because the likelihood must be maximized at \(0<p_i<1\) and \(0<p^*_{ij}<1\). A limitation of this model is that it does not consider the transition probability from \(Y_{ij-1}=0\) to \(Y_{ij}=1\) and takes \(p^*_{ij}=\hbox {Pr}(Y_{ij}=1|Y_{ij-1}=1,{\varvec{X}}_i)\). Although the transition probability \(P(Y_{ij}=1|Y_{ij-1}=0)\) was not considered in determining the correlation, the transition from \(Y_{ij-1}=0\) was considered when defining the range of \(\rho _i\), which contradicts the definition of \(\rho _i\). A straightforward and simple way to improve the model of Darlington and Farewell [6], by including both transition probabilities, \(P(Y_{ij}=1|Y_{ij-1}=0)\) and \(P(Y_{ij}=1|Y_{ij-1}=1)\), in the working likelihood function, is discussed in the following subsection.

3.1 Proposed Model 1

For any order of Markov chain with covariate dependence, a model based on marginal and conditional models can be used. Consider the simplest case of two repeated measures on each individual. If \(Y_{i2}\) depends on \(Y_{i1}\), then for each possible value of \(y_{i1}\), we get one conditional model for \(Y_{i2}\). As we assumed the \(Y_{ij}\) to be binary random variables, \(Y_{i1}\) can take the values 0 and 1. When \(Y_{i1}=0\), the conditional probability of \(Y_{i2}=1\) can be defined as

$$\begin{aligned} p^*_{i2}= & {} \hbox {Pr}(Y_{i2} = 1| y_{i1}=0, {\varvec{x}}_{i2})\nonumber \\= & {} \frac{{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{01}}}}{1+{e^{ {\varvec{x}}_{i2}{\varvec{\beta }}_{01}}}}; \; i =1,2,\ldots ,N, \end{aligned}$$
(16)

where \( {\varvec{\beta }}_{01}\) is the vector of parameters of the conditional model of \(P(Y_{i2}=1|Y_{i1}=0, X_{i2}=x_{i2})\); \(i=1, 2, \ldots N\). Here the suffix (01) of \({\varvec{\beta }}\) is used to show the transition from \(Y_{i1}=0\) to \(Y_{i2}=1\).

Similarly, when \(Y_{i1}=1\), the conditional probability of \(Y_{i2}=1\) can be defined as

$$\begin{aligned} p^*_{i2}= & {} \hbox {Pr}(Y_{i2} = 1| y_{i1}=1, {\varvec{x}}_{i2}) \nonumber \\= & {} \frac{{e^{{\varvec{x}}_{i2}{\varvec{\beta }}_{11}}}}{1+{e^{ {\varvec{x}}_{i2}{\varvec{\beta }}_{11}}}}; \; i =1,2,\ldots ,N, \end{aligned}$$
(17)

where \( {\varvec{\beta }}_{11}\) is the vector of parameters of the conditional model of \(P(Y_{i2}=1|Y_{i1}=1, X_{i2}=x_{i2})\); \(i=1, 2, \ldots N\). The suffix (11) of \({\varvec{\beta }}\) is used to show the transition from \(Y_{i1}=1\) to \(Y_{i2}=1\).

The joint probabilities can be expressed as, for \(i=1,2, \ldots ,N\),

$$\begin{aligned} P(Y_{i1}, Y_{i2}) =P(Y_{i2}=1|Y_{i1}=y_{i1}, {\varvec{x}}_{i2})P(Y_{i1}=y_{i1}| {\varvec{x}}_{i1}). \end{aligned}$$
(18)

In general, the joint mass function for \(n_i\) outcome variables, \(Y_{i1}, Y_{i2}, ..., Y_{in_i}\), for subject i at follow-ups \(1, 2, ..., n_i\), respectively, in the presence of covariates \({\varvec{X}}_{ij} = (1, X_{ij1}, X_{ij2}, ..., X_{ijp})\), can be expressed as the product of the conditional and marginal probability mass functions for given values of the covariates as follows:

$$\begin{aligned} \begin{aligned}&\hbox {Pr} (Y_{i1}=y_{i1}, ... ,Y_{in_i}=y_{in_i}|{\varvec{X_i}}={\varvec{x_i}})\\&= \hbox {Pr}(Y_{i1}=y_{i1}|{\varvec{X_i}}={\varvec{x_i}})\prod \limits _{j=2}^{n_i} \hbox {Pr} (Y_{ij}|Y_{i1}=y_{i1}, ..., Y_{ij-1}=y_{ij-1}, {\varvec{X_i}}={\varvec{x_i}}). \end{aligned} \end{aligned}$$
(19)

In general, let us consider \(n_i\) possibly correlated outcome variables \((Y_{i1}, Y_{i2},..., Y_{in_i})\) on each of N individuals. Let \({\varvec{\theta }}=(\theta _1, \theta _{2.1}, ..., \theta _{n_i.1, 2, ..., n_i-1}) \) be the vector of unknown parameters where \(\theta _j=g(\mu _{ij})=X_{ij} {\varvec{\beta }}_j\), \( \theta _{j.12...j-1}=g(\mu _{ij.12...j-1})={\varvec{X}}_{ij} {\varvec{\beta }}_{j.12...j-1}\) and g is an appropriate link function. The joint probability mass function of \(Y_{i1}, ..., Y_{in_i}\) can be expressed as:

$$\begin{aligned} \begin{aligned}&P(Y_{i1}=y_{i1}, \cdots , Y_{in_i}=y_{in_i}) \\&= P(Y_{i1}=y_{i1}| {{\varvec{x}}_{i1}}).P(Y_{i2}=y_{i2}| {{\varvec{x}}_{i2}}, y_{i1}) \\&\cdots P(Y_{in_i}=y_{in_i}| {{\varvec{x}}_{in_i}}, y_{i1}, \cdots , y_{in_i-1}). \end{aligned} \end{aligned}$$
(20)

The likelihood function can be expressed as:

$$\begin{aligned} L({\varvec{\beta }})= & {} \prod \limits _{i=1}^{N}f(y_{i1}| {{\varvec{x}}_{i1}}, {\varvec{\beta }}_1)f(y_{i2.1}| {{\varvec{x}}_{i2}}, {\varvec{\beta }}_{2.1}) \nonumber \\&\cdots \, f(y_{n_i.1, 2, ..., n_i-1}| {{\varvec{x}}_{in_i}}, {\varvec{\beta }}_{n_i.1, 2, ..., n_i-1}), \end{aligned}$$
(21)

where \(f(y_{i1}| {\varvec{x}}_{i1}, {\varvec{\beta }}_1)\) is the marginal distribution of \(y_{i1}\) for given \({\varvec{X}}_{i1}={\varvec{x}}_{i1}\) and the conditional probabilities of \(y_{ij}\), given \(Y_{i1} = y_{i1}, ..., Y_{ij-1}=y_{ij-1}\), and \({\varvec{X}}_{ij}={{\varvec{x}}_{ij}}\) are \(f(y_{ij.12...j-1})=f(y_{ij}| {\varvec{x}}_{ij}, y_{i1},..., y_{ij-1}, \beta _{j.1, 2, ..., j-1})\), \(j =2, 3, ..., n_i\). Let \(l_{ij}\) be the contribution of the ijth term to the log likelihood function. Differentiating the log-likelihood, \(l=\sum _{i=1}^{N}\sum _{j=1}^{n_i}{l_{ij}}\), with respect to the corresponding parameters and equating to zero, the estimating equations are:

$$\begin{aligned} \frac{\partial l}{\partial \beta _k} =\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{n_i}\frac{\partial l_{ij}}{\partial \theta _j}\frac{\partial \theta _{j}}{\partial \mu _{ij}}. \frac{\partial \mu _{ij}}{\partial \beta _k}=0. \end{aligned}$$
(22)

The estimates of \( {{\varvec{\beta }}}\) can be obtained by maximum likelihood method.

The variance of the estimates, \(V(\hat{{\varvec{\beta }}})\), is obtained from the inverse of the information matrix I, where I is a \((2^{n_i}-1)(p+1)\times (2^{n_i}-1)(p+1)\) matrix with \(kk'\)th elements \(-\frac{\partial ^2 l}{\partial \beta _{k}\partial \beta _{k}'}; k, k' = 0, 1, ..., p.\)

For example, consider possibly correlated Bernoulli outcome variables \(Y_{i1}, ..., Y_{in_i}\), with probability of success \(p_{i1}, p^*_{i2}, ..., p^*_{in_i}\), where \(p^*_{ij}\) denotes the conditional probability, \(P(Y_{ij}=1|y_{i1}, ..., y_{ij-1}, {\varvec{x}}_{ij})\), \(j=2, ..., n_i\). Then \(f(y_{i1}| {\varvec{x}}_{i1}, {\varvec{\beta }}_1)=p^{y_{i1}}_{i1}(1-{p_{i1}})^{1-y_{i1}}\) and \(f(y_{ij.12...j-1}| {\varvec{x}}_{ij}, {\varvec{\beta }}_{j.12...j-1})={p^*}^{y_{ij}}_{ij}(1-{{p^*}_{ij}})^{1-y_{ij}}\), \(j=2, ..., n_i\). The likelihood function can be expressed as

$$\begin{aligned} L=\prod _{i=1}^{N}{e^{\,y_{i1}\ln {\frac{p_{i1}}{1-p_{i1}}}+\ln (1-p_{i1})+\sum \limits _{j=2}^{n_i}\left[ y_{ij}\ln {\frac{p^*_{ij}}{1-p^*_{ij}}}+\ln (1-p^*_{ij})\right] }}. \end{aligned}$$
(23)

Similar representation was shown previously by Islam and Chowdhury [14]. Differentiating the log-likelihood with respect to the respective parameters and equating to zero, the score equations for \({\varvec{\beta }}\) are obtained as:

$$\begin{aligned} \begin{aligned} \frac{\partial l}{\partial \beta _k}=&\sum _{i=1}^{N}X_{i1k}(y_{i1}-p_{i1})\\&+\sum _{i=1}^{N}\sum _{j=2}^{n_i}X_{ijk}(y_{ij}-p^*_{ij}), k=0, 1, ..., p. \end{aligned} \end{aligned}$$
(24)
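The score in Eq. (24) is a sum of covariate-weighted residuals over all (subject, occasion) pairs, with the marginal probability \(p_{i1}\) used at the first occasion and the conditional probabilities \(p^*_{ij}\) thereafter. A small Python sketch of evaluating it at given fitted probabilities (the flattened-row representation and function name are ours):

```python
def logistic_score(rows):
    """Score vector of Eq. (24).

    Each row is (x, y, p): the covariate vector for one (subject, occasion)
    pair, the binary outcome, and the fitted success probability (the
    marginal p_i1 at occasion 1, the conditional p*_ij afterwards).
    """
    k = len(rows[0][0])
    u = [0.0] * k
    for x, y, p in rows:
        for idx in range(k):
            u[idx] += x[idx] * (y - p)  # residual weighted by covariate
    return u
```

At the maximum likelihood solution the residuals balance and every component of the score is zero, which is the condition the Newton-type iterations solve for.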

The information matrix \(I_{\beta }\) is a \((2^{n_i}-1)(p+1)\times (2^{n_i}-1)(p+1)\) matrix with elements \(-\frac{\partial ^2 l}{\partial \beta _k \partial \beta _k'}\), \(k, k'=0, 1, ..., p\).

3.2 Test of Hypothesis

To test the significance of the overall model, the null and alternative hypotheses can be expressed as \(H_0: {\varvec{\beta }} = {\varvec{\beta _0}} \) vs \(H_1: {\varvec{\beta }} \ne {\varvec{\beta _0}} \), where \( {{\varvec{\beta }}} = ({\varvec{\beta }}_1, {\varvec{\beta }}_{2.1}, ..., {\varvec{\beta }}_{n_i.1, 2,...n_i-1})\) and \({\varvec{\beta }}_0\) is the value of \({\varvec{\beta }}\) under the null hypothesis of no covariate effect. The test statistic \(\Lambda = -2[\text {ln} L( {{\varvec{\beta }}_0})-\text {ln} L({\varvec{\beta }})]\) has a chi-square distribution under \(H_0\) with \((2^{n_i}-1)p\) d.f. Here \(\text {ln}L({\varvec{\beta }})\) is the log likelihood of the full model and \(\text {ln}L({\varvec{\beta _0}})\) is the log likelihood of the reduced model with no covariate effects, i.e. the value of \(\text {ln}L({\varvec{\beta }})\) under \(H_0\). For testing \(H_0: \beta _k =0\) vs \(H_1: \beta _k \ne 0\), the test statistic is \(z=\frac{{\hat{\beta }}_k}{se({\hat{\beta }}_k)}\), which follows N(0, 1) under \(H_0\). The major limitation of the proposed joint model based on Markov transition probabilities in Eq. (21) is the rapid increase in the number of parameters with an increasing number of follow-ups. With \(n_i\) follow-ups, the number of parameters to be estimated is as large as \((2^{n_i}-1)(p+1)\), where p is the number of covariates. To overcome this limitation of the proposed model 1, in the following section, we propose a second joint model as an alternative.
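The quantities in these tests are straightforward to compute once the two log likelihoods are available; a small Python sketch with helper names of our own:

```python
def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood-ratio statistic Lambda = -2[ln L(beta_0) - ln L(beta_hat)]."""
    return -2.0 * (loglik_reduced - loglik_full)

def lrt_df(n_follow, p):
    """Degrees of freedom (2^{n_i} - 1) p of the overall test for the joint model."""
    return (2 ** n_follow - 1) * p

def wald_z(beta_hat, se):
    """Wald statistic for H0: beta_k = 0, referred to N(0, 1)."""
    return beta_hat / se
```

For two follow-ups and one covariate, for example, the overall test has \((2^2-1)\times 1 = 3\) degrees of freedom, one slope per marginal or conditional component.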

3.3 Proposed Model 2

If there are more than three follow-ups in a longitudinal study, the number of parameters of the joint model becomes as large as \((2^4-1)(p+1)\) for 4 follow-ups, where p is the number of covariates. In this section, an alternative to the GEE approach is developed based on the regressive models [3] in order to analyze repeated measures data. The generalized form of the regressive model was proposed by Islam et al. [16]. Following the notation of Islam et al. [16], let us define \( {{\varvec{\lambda }}'_{j-1}} = {({\varvec{\beta }}', {\varvec{\gamma }}'_{j-1}, {\varvec{\rho }}'_{j-1}, {\varvec{\eta }}'_{j-1} )}\) and \( {\varvec{W}}_{j-1}'=({\varvec{X}}'_{ij}, {\varvec{Y}}_{j-1}', {\varvec{\nu }}'_{j-1}, {\varvec{Z}}'_{j-1})\), where \({\varvec{X}}_{ij}=(1, X_{ij1}, X_{ij2}, ..., X_{ijp})\), \({\varvec{Y}}_{j-1}=(Y_{i1}, ..., Y_{ij-1})\), \({\varvec{\nu }}_{j-1}=(\nu _{12}, \nu _{123}, ..., \nu _{12...j-1})'=(y_{i1} y_{i2}, y_{i1} y_{i2} y_{i3},..., y_{i1} y_{i2} ... y_{ij-1})'\) are the interaction terms among the \(Y_{ij}\)'s, \(j=1, ..., n_i\), and \({\varvec{Z}}_{j-1}=(z_{11}, ..., z_{1p}, ..., z_{j-1p})'=(x_{i1} y_{i1}, ..., x_{ip}y_{i1}, ..., x_{i1} y_{ij-1}, ..., x_{ip} y_{ij-1})'\) are the interaction terms between \({\varvec{X}}_{ij}\) and \({\varvec{Y}}_i\). \({\varvec{\beta }}'=(\beta _0, \beta _1, ..., \beta _p)\) are the coefficients of \({\varvec{X}}_{ij}\); \({\varvec{\gamma }}'_{j-1}= (\gamma _1, ..., \gamma _{j-1})\) are the parameters corresponding to \(Y_{i1}, ..., Y_{ij-1}\); \( {\varvec{\rho }}'_{j-1} = (\rho _{12}, \rho _{123}, ..., \rho _{12...j-1})\) are the coefficients of the interaction terms among the \(Y_{ij}\)'s; and \({\varvec{\eta }}'_{j-1}=(\eta _{11}, ..., \eta _{j-1p})\) are the parameters corresponding to \({\varvec{Z}}_{j-1}\). The regressive model for the jth follow-up is defined as:

$$\begin{aligned} P(Y_{ij}=s|w_{j-1})=\frac{e^{ {\varvec{\lambda }}'_{j-1} {\varvec{w}}_{j-1}s}}{1+e^{ {\varvec{\lambda }}'_{j-1}{\varvec{w}}_{j-1}}}, s= 0, 1, \; j=2, ..., n_i. \end{aligned}$$
(25)

The likelihood function can be expressed as:

$$\begin{aligned} L = \prod \limits _{i=1}^{N}f(y_{i1}|{\varvec{x}}_{i1}, {\varvec{\lambda }}_{1})f(y_{i2}|{\varvec{x}}_i, {\varvec{\lambda }}_{2})...f(y_{in_i}|{\varvec{x}}_i, {\varvec{\lambda }}_{n_i}). \end{aligned}$$
(26)

The score equations can be obtained by differentiating the log likelihood, \(l=\mathrm {log} L\), with respect to the respective parameters. The information can be obtained as \(-\frac{\partial ^2 l}{\partial {\varvec{\lambda }} \partial {\varvec{\lambda }}'}\).
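Assembling the regressor vector \({\varvec{w}}_{j-1}\) for a given follow-up is mechanical: covariates, previous outcomes, their cumulative products \({\varvec{\nu }}_{j-1}\), and the outcome-by-covariate interactions \({\varvec{Z}}_{j-1}\). The following Python sketch builds it for one subject; the function name is ours, and it assumes the cumulative-product form of \({\varvec{\nu }}_{j-1}\) defined above:

```python
def regressive_design(x, y_prev):
    """Build the regressor vector w_{j-1} of the regressive model (Sect. 3.3).

    x: covariate vector with x[0] = 1 (the intercept);
    y_prev: previous outcomes (y_1, ..., y_{j-1}).
    Returns [x, y_prev, nu, z] concatenated, where nu holds the cumulative
    products y1*y2, y1*y2*y3, ... and z the products x_k * y_m.
    """
    nu, prod = [], 1
    for yk in y_prev:
        prod *= yk
        nu.append(prod)
    nu = nu[1:]  # interactions start at y1*y2
    z = [xk * yk for yk in y_prev for xk in x[1:]]
    return list(x) + list(y_prev) + nu + z
```

For \(p=1\) covariates and \(j-1=3\) previous outcomes the vector has \((p+1)+(j-1)+(j-2)+(j-1)p = 10\) entries, which is how the parameter count of this model stays linear rather than exponential in the number of follow-ups.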

3.4 Test of Hypothesis

To test for the dependence of the jth outcome on earlier outcomes and other related terms, the null hypothesis can be stated as \(H_0:{\varvec{\lambda }}^*_{j-1}=0\) against \(H_1:{\varvec{\lambda }}^*_{j-1}\ne 0\), where \({\varvec{\lambda }}^*_{j-1} = ({\varvec{\gamma }}_{j-1}, {\varvec{\rho }}_{j-1}, {\varvec{\eta }}_{j-1})'\). The total number of parameters to be tested is \((2^{j-1}-1)\) for \({\varvec{\gamma }}_{j-1}\) and \({\varvec{\rho }}_{j-1}\) together, and \((j-1)\times p\) for \({\varvec{\eta }}_{j-1}\). The test statistic is a likelihood ratio and follows a chi-square distribution with \((2^{j-1}-1)+(j-1)\times p\) degrees of freedom [13]. Under independence, the model in Eq. (25) reduces to

$$\begin{aligned} P(Y_{ij}=s| {{\varvec{x}} _{ij}}, {{\varvec{Y}}_{ij-1}}, {{\varvec{\nu }}_{j-1}}, {{\varvec{Z}}_{j-1}})=\frac{e^{{\varvec{\beta }}'{\varvec{x_{ij}}}s}}{1+e^{{\varvec{\beta }}' {\varvec{x_{ij}}}}}, s= 0, 1. \end{aligned}$$
(27)

If the outcomes are independent, one can simply fit the reduced model by maximum likelihood. If the outcomes are associated, the full model given in Eq. (25) is suggested.

4 Simulation Study

A simulation study was carried out to compare the properties of the estimates of the regression coefficients of the models discussed in the earlier sections. The repeated measures can be associated in a variety of ways, and in this study, the cases considered are: (i) the \(Y_{ij}\)'s are identically and independently distributed, (ii) the \(Y_{ij}\)'s are identically distributed but associated, and (iii) the \(Y_{ij}\)'s are independent but their distributions are not identical.

4.1 Simulation Design

For simplicity, we restrict the simulation study for the marginal-conditional model to two follow-ups, \(Y_{i1}\) and \(Y_{i2}\), on the ith subject and only one explanatory variable, \(X_{i1}\), for each of the N individuals, where \(X_{i1}\) is fixed and time invariant. We assumed that \(Y_{i1}\) and \(Y_{i2}\) are two binary random variables with \(Y_{i1} \sim B(1, p_{i1})\) and \(Y_{i2 } \sim B(1, p_{i2})\). The corresponding generalized linear models, with logit link \(g\), are \(g(\mu _{i1})={\varvec{\beta }}'_1 {\varvec{X}}_i\) and \(g(\mu _{i2})={\varvec{\beta }}'_2 {\varvec{X}}_i\), i.e. \(\mu _{i1}=\frac{e^{{\varvec{\beta }}'_1 {\varvec{X}}_i }}{1+e^{{\varvec{\beta }}'_1 {\varvec{X}}_i }} \) and \(\mu _{i2}=\frac{e^{{\varvec{\beta }}'_2 {\varvec{X}}_i}}{1+e^{{\varvec{\beta }}'_2 {\varvec{X}}_i }} \).

The simulation proceeded in the following steps: a time-invariant explanatory variable \(X_i\) was generated first from a Bernoulli distribution with probability of success 0.5. Then \(p_{i1}\), the probability of success of \(Y_{i1}\), was calculated using the equation \(P(Y_{i1}=1|{\varvec{x}}_i) = \frac{e^{{\varvec{\beta }}'_1 {\varvec{x}}_i}}{1 + e^{{\varvec{\beta }}'_1 {\varvec{x}}_i}}\) for selected values of \({\varvec{\beta }}_1=(\beta _{10}, \beta _{11})'\), where \({\varvec{X}}_i=(1, X_i)\), \(\beta _{10}\) is the intercept term and \(\beta _{11}\) is the coefficient of \(X_i\). N values, \(a_i\), were generated from the uniform distribution on (0, 1), and then \(Y_{i1}\) was generated such that \(Y_{i1}=1\) if \(a_i < P(Y_{i1}=1|x_i) \) and 0 otherwise. To generate data on \(Y_{i2}\), first, the probability of success at time point 2, \(p_{i2}\), was calculated as \(P(Y_{i2}=1|{\varvec{X}}_i={\varvec{x}}_{i}) = \frac{e^{{\varvec{\beta }}'_2 {\varvec{x}}_i}}{1 + e^{{\varvec{\beta }}'_2 {\varvec{x}}_i}}\). Here \({\varvec{\beta }}_2=(\beta _{20}+\gamma _1y_{i1}, \beta _{21})\), where \(\beta _{20}+\gamma _1y_{i1}\) is the intercept term, \(\beta _{21}\) is the coefficient of \(X_{i}\) and \(\gamma _1\) is the coefficient of \(Y_{i1}\). As for \(Y_{i1}\), N values, \(b_i\), were generated from the uniform distribution on (0, 1), and then \(Y_{i2}\) was generated such that \(Y_{i2}=1\) if \(b_i < P(Y_{i2}=1|X_i=x_i) \) and 0 otherwise.
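The data-generating steps above can be sketched in a few lines of Python; the function and parameter names are ours, and the uniform-threshold step is implemented directly with the standard library generator:

```python
import math
import random

def logistic(eta):
    return math.exp(eta) / (1.0 + math.exp(eta))

def simulate(n, b10, b11, b20, b21, gamma1, seed=0):
    """Generate (x, y1, y2) triples following the design of Sect. 4.1.

    b10, b11 parameterize the marginal model of Y_i1; the intercept of the
    second model is shifted by gamma1 * y1, which induces the association
    between the two follow-ups when gamma1 != 0.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0        # time-invariant Bernoulli(0.5)
        p1 = logistic(b10 + b11 * x)
        y1 = 1 if rng.random() < p1 else 0        # uniform-threshold draw a_i
        p2 = logistic(b20 + gamma1 * y1 + b21 * x)
        y2 = 1 if rng.random() < p2 else 0        # uniform-threshold draw b_i
        data.append((x, y1, y2))
    return data
```

Setting \({\varvec{\beta }}_1={\varvec{\beta }}_2\) with \(\gamma _1=0\), \(\gamma _1\ne 0\), or \({\varvec{\beta }}_1\ne {\varvec{\beta }}_2\) with \(\gamma _1=0\) reproduces, respectively, cases (i), (ii) and (iii) of the design.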

To illustrate the regressive model, four repeated outcomes were generated in a similar way: \(Y_{i1}\) with \({{\varvec{\beta }}_1}=(\beta _0, \beta _1)'\), \(Y_{i2}\) with \({{\varvec{\beta }}_2}=(\beta _0, \beta _1, \gamma _1)'\), \(Y_{i3}\) with \( {{\varvec{\beta }}_3}=(\beta _0, \beta _1, \gamma _1, \gamma _2)'\) and \(Y_{i4}\) with \( {{\varvec{\beta }}_4}=(\beta _0, \beta _1, \gamma _1, \gamma _2, \gamma _3)'\), respectively.
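Under the same conventions, the four-wave regressive scheme can be sketched as follows. This is an illustrative reading in which \(\gamma _k\) multiplies \(Y_{ik}\), the outcome at wave k, consistent with the role of \(\gamma _1\) above; the function name is ours:

```python
import math
import random

def simulate_regressive(n, beta0, beta1, gammas, seed=0):
    """Generate T = len(gammas)+1 binary outcomes per subject under a
    regressive logistic model: the linear predictor at wave j adds
    gammas[k] * y_{i,k+1} for every outcome already observed."""
    rng = random.Random(seed)
    subjects = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0      # time-invariant covariate
        ys = []
        for _wave in range(len(gammas) + 1):
            eta = beta0 + beta1 * x
            for k, y_prev in enumerate(ys):     # gamma_1*y1 + gamma_2*y2 + ...
                eta += gammas[k] * y_prev
            p = 1.0 / (1.0 + math.exp(-eta))
            ys.append(1 if rng.random() < p else 0)
        subjects.append((x, ys))
    return subjects
```

When all \(\gamma \)'s are zero the four waves are independent and identically distributed, which is one of the scenarios reported in Table 3.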

Three scenarios were considered: \(Y_{ij}\)’s are independently and identically distributed when \({\varvec{\beta }}_1={\varvec{\beta }}_2\) and \(\gamma _1=0\); the distributions of \(Y_{ij}\)’s are identical (\({\varvec{\beta }}_1={\varvec{\beta }}_2\)) but the outcomes are associated (\(\gamma _1 \ne 0 \)); and \(Y_{ij}\)’s are independent (\(\gamma _1=0\)) but not identically distributed (\({\varvec{\beta }}_1 \ne {\varvec{\beta }}_2\)). GEE under different correlation structures (independent, exchangeable and autoregressive), ALR under an exchangeable correlation and the joint models were fitted. The bias, the standard error of the estimates and the coverage probability of the 95\(\%\) confidence interval were computed over a range of scenarios for large samples and varying association among the repeated responses.
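As a sketch of how bias, standard error and coverage probability are computed for one such scenario, the following fits the marginal logistic model for \(Y_{i1}\) by Newton–Raphson over repeated samples. This is our own minimal replication machinery for illustration, not the GEE/ALR fits reported in the tables:

```python
import math
import random

def fit_logit(xs, ys, iters=12):
    """Newton-Raphson MLE for logit P(Y=1|x) = b0 + b1*x, returning
    estimates and standard errors from the inverse Fisher information."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = 0.0              # score vector
        h00 = h01 = h11 = 0.0      # Fisher information entries
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det      # Newton step: H^{-1} g
        b1 += (h00 * g1 - h01 * g0) / det
    se0 = math.sqrt(h11 / det)
    se1 = math.sqrt(h00 / det)
    return (b0, b1), (se0, se1)

def mc_summary(n_rep=200, n=500, b_true=(0.5, 0.2), seed=0):
    """Monte Carlo bias of the slope and coverage of its 95% Wald CI."""
    rng = random.Random(seed)
    hits, bias_sum = 0, 0.0
    for _ in range(n_rep):
        xs = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
        ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(b_true[0] + b_true[1] * x)))
              else 0 for x in xs]
        (e0, e1), (s0, s1) = fit_logit(xs, ys)
        bias_sum += e1 - b_true[1]
        if e1 - 1.96 * s1 <= b_true[1] <= e1 + 1.96 * s1:
            hits += 1
    return bias_sum / n_rep, hits / n_rep
```

For a correctly specified model the coverage probability returned by `mc_summary` should hover near the nominal 0.95; departures from that level are what Tables 1–3 use to flag misspecification.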

4.2 Simulation Results

The findings of the simulation study (estimates, bias, standard error and coverage probability) are summarized in Tables 1, 2 and 3. In all the following tables, GEE(In), GEE(Ex) and GEE(AR) stand for GEE models under independent, exchangeable and autoregressive correlations, respectively. ALR(Ex) denotes the ALR model under an exchangeable correlation. The parameters of the joint model are \(\beta _{10}\), \(\beta _{11}\), \(\beta _{010}\), \(\beta _{011}\), \(\beta _{110}\) and \(\beta _{111}\). Here, \(\beta _{10}\) and \(\beta _{11}\), respectively, denote the intercept and the regression coefficient of the marginal model \(P(Y_{i1}=1|{\varvec{X}}_i={\varvec{x}}_i)\); \(\beta _{010}\) and \(\beta _{011}\), respectively, denote the intercept and the regression coefficient of the conditional model \(P(Y_{i2}=1|Y_{i1}=0, {\varvec{X}}_i={\varvec{x}}_i)\); and \(\beta _{110}\) and \(\beta _{111}\), respectively, denote the intercept and the regression coefficient of the conditional model \(P(Y_{i2}=1|Y_{i1}=1, {\varvec{X}}_i={\varvec{x}}_i)\). GEE and ALR, being approaches based on marginal models, estimate the parameters of such models as an average of the parameters of the two populations from which \(Y_{i1}\) and \(Y_{i2}\) were generated. To distinguish the parameters of GEE and ALR from those of the joint model, we use the notation \({\varvec{\beta }}^*=(\beta ^*_0, \beta ^*_1)'\) for the parameters of GEE and ALR in the following tables.

Table 1 Parameters (Par), estimates (Est.), bias, standard error (SE) and coverage probability (CP) of estimates for independent \((\gamma _1=0.0)\) and correlated outcomes \((\gamma _1= 1.0)\) with identical distributions of \(Y_{i1}\) and \(Y_{i2}\) (\({\varvec{\beta }}_1=(\beta _{10}, \beta _{11})=(0.5, 0.2)\), \({\varvec{\beta }}_2=(\beta _{20}, \beta _{21})=(0.5, 0.2)\))
Table 2 Estimates (Est.), bias, standard error (SE) and coverage probability (CP) of estimates for independent outcomes with non-identical distributions (\({\varvec{\beta }}_1=(\beta _{10}, \beta _{11})=(0.5, 0.2)\), \({\varvec{\beta }}_2=(\beta _{20}+\gamma _1 y_{i1}, \beta _{21})=(0.2, 0.7)\), \(\gamma _1=0.0\))
Table 3 Parameters (Par), estimates (Est.), bias, standard error (SE) and coverage probability (CP) of estimates of different models for independent and associated distributions (\(\beta _{10}=\beta _{20}=\beta _{30}=\beta _{40}=\beta ^*_0=0.2\), \(\beta _{11}=\beta _{21}=\beta _{31}=\beta _{41}=\beta ^*_1=0.7\), \((\gamma _1, \gamma _2, \gamma _3)=(0,0,0)\) and (1, 1, 1))

In Table 1, \(P(Y_{i1}=1|{\varvec{X}}_i={\varvec{x}}_i) = \frac{e^{\beta _{10}+\beta _{11} x_{i11}}}{1+e^{\beta _{10}+\beta _{11} x_{i11}}} = \frac{e^{0.5 + 0.2 x_{i11}}}{1+e^{0.5 + 0.2 x_{i11}}}\) and \(P(Y_{i2}=1|Y_{i1}=y_{i1}, {\varvec{X}}_i={\varvec{x}}_i)=\frac{e^{\beta _{20}+ \beta _{21} x_{i21} +\gamma _1 y_{i1}}}{1+e^{\beta _{20}+ \beta _{21} x_{i21} +\gamma _1 y_{i1}}}\), where \(\beta _{20}\) and \(\beta _{21}\), respectively, denote the intercept and the regression coefficient of the marginal model \(P(Y_{i2}=1|{\varvec{X}}_i={\varvec{x}}_i)\). If \(\gamma _1=0\), the true values of the parameters to be estimated for the joint model are \(\beta _{10}=0.5\), \(\beta _{11}=0.2\), \(\beta _{010}=\beta _{20}+\gamma _1 \times (y_{i1}=0) =0.5+0\times 0 = 0.5 =\beta _{20}\), \(\beta _{011} = \beta _{21}= 0.2\), \(\beta _{110}=\beta _{20}+\gamma _1 \times (y_{i1}=1) =0.5+0\times 1 = 0.5\) and \(\beta _{111}=\beta _{21}=0.2\). When \(\gamma _1 =1\), the true values are \(\beta _{10}=0.5\), \(\beta _{11}=0.2\), \(\beta _{010}=\beta _{20}+\gamma _1 \times (y_{i1}=0) =0.5+1\times 0 = 0.5\), \(\beta _{011} = \beta _{21}= 0.2\), \(\beta _{110}=\beta _{20}+\gamma _1 \times (y_{i1}=1) =0.5+1\times 1 = 1.5\) and \(\beta _{111}=\beta _{21}=0.2\).
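The arithmetic above can be packaged in a small helper (illustrative only; the dictionary keys mirror the joint-model notation):

```python
def joint_params(beta20, beta21, gamma1):
    """True conditional-model parameters implied by the generating model
    P(Y_i2 = 1 | Y_i1, x) = logit^{-1}((beta20 + gamma1*Y_i1) + beta21*x)."""
    return {
        "beta010": beta20 + gamma1 * 0,   # intercept given Y_i1 = 0
        "beta011": beta21,                # slope given Y_i1 = 0
        "beta110": beta20 + gamma1 * 1,   # intercept given Y_i1 = 1
        "beta111": beta21,                # slope given Y_i1 = 1
    }
```

For example, `joint_params(0.5, 0.2, 1.0)` reproduces the \(\gamma _1=1\) case: \(\beta _{110}=1.5\) while the other parameters stay at their marginal values.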

Table 1 shows that the bias and the standard error of the estimates of the proposed Model 1 (an extension of Darlington and Farewell [6]), the proposed Model 2, GEE and ALR are competitive for longitudinal data when the repeated measures are independent (\(\gamma _1=0.0\)). The inadequacy of GEE and ALR in portraying the relationship between X and Y becomes visible in the presence of a dependence relationship between \(Y_{i1}\) and \(Y_{i2}\), as shown in Table 1, where the data are generated from two associated populations (\(\gamma _1=1.0\)). The proposed Model 1, as an extension of Darlington and Farewell [6], does not improve much on the performance of the marginal parameter estimates in terms of bias and standard error.

The proposed joint model (Model 2) gives better estimates in this case. The inadequacy of GEE and ALR in portraying the relationship between X and Y is also observed in Table 2, where the data are generated from two independent but non-identical populations. The estimates of the GEE parameters do not portray the actual relationship between the covariates and the response variable because of the variation in that relationship at different time points. The actual bias from population 1 (from which \(Y_{i1}\) were generated) and population 2 (from which \(Y_{i2}\) were generated) is shown in two columns of Table 2. Clearly, even if the repeated measures are not associated, when the data come from two different populations, GEE and ALR are not adequate to capture the relationship between the covariates and the response variable. The proposed Model 2 is suggested in such cases.

When there are more than three repeated measurements on the same subject, the covariate-dependent Markov chain-based joint models need to estimate too many parameters, and a general form of the regressive model approach [13] is suggested as an alternative to the GEE-based approaches. The results of the simulation study (Table 3) show that when the outcomes are independent and identically distributed, the estimates of the parameters of the regressive model produce results similar to GEE or ALR in terms of bias and coverage probability. The regressive model performs better than GEE or ALR when the repeated responses are associated.
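The growth in the number of parameters can be illustrated with a rough count. Assuming every conditional distribution given the full binary history gets its own intercept and slope, so that there are \(2^{t-1}\) conditional models at wave t, each with p + 1 parameters (this bookkeeping is our illustration, not a formula from the text), the joint model grows exponentially in the number of follow-ups while the regressive model adds only one \(\gamma \) per extra wave:

```python
def joint_param_count(T, p=1):
    """Parameters in the marginal conditional joint model with T waves and
    p covariates: 2**(t-1) conditional models at wave t, (p+1) each."""
    return sum(2 ** (t - 1) * (p + 1) for t in range(1, T + 1))

def regressive_param_count(T, p=1):
    """Parameters in the regressive model: intercept, p slopes and
    T-1 serial-dependence coefficients gamma_1, ..., gamma_{T-1}."""
    return (p + 1) + (T - 1)
```

Under this count, T = 2 with one covariate gives six parameters, matching \(\beta _{10}\), \(\beta _{11}\), \(\beta _{010}\), \(\beta _{011}\), \(\beta _{110}\) and \(\beta _{111}\) above, while T = 4 gives 30 for the joint model against only 5 for the regressive model.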

GEE and ALR performed well only when the repeated measures come from identical populations and are not associated. The simulation study also finds essentially no difference in the estimates of GEE under different correlation structures (Tables 1, 2, 3). Also, ALR does not show any noticeable difference from the GEE estimates in most cases. The proposed Model 2 (for three or fewer repeated outcomes) and the proposed Model 3 (for more than three repeated outcomes) produce better estimates in terms of bias and coverage probability than GEE or ALR when the responses are associated or the responses at different time points have different distributions.

5 Application to HRS Data

The first three waves of the longitudinal data from the Health and Retirement Study (HRS) conducted by the University of Michigan [29] were used for comparison of the selected methods. The study started in 1992 on American individuals over the age of 50 years and their spouses, and the subjects are observed every two years. In wave 1, the sample size was 9760, which was reduced to 9750 after dropping 10 cases with missing values of the outcome variable in the first round. Finally, the number of individuals was 8657, who reported that they were not hospitalized at wave 1. The panel data from the waves of 1992, 1994 and 1996 have been used in this study. An elderly population may suffer from repeated spells of depression which may change over time [8, 15] and result in other health problems and chronic illness [19]. The literature on depression among the elderly has helped fill many gaps in our understanding of the factors associated with depression and also the outcome of depression [2], but understanding depression and its associated factors more explicitly remains important. In many studies on clinical and non-clinical populations, the \(\text {CESD}\) (Center for Epidemiologic Studies Depression) scale is employed to measure depressive symptoms [28]. The dependent variable for this study is depression status (no depression (\(\text {CESD score} = 0)\), depression (\(\text {CESD score}>0))\). The independent variables are gender (male = 1), marital status (married/partnered = 1), education, ethnicity: Black (Black \(=\) 1), ethnicity: White (White \(=\) 1), drinking habit (drink = 1) and the number of health conditions. In Tables 4 and 5, Mstat stands for marital status, White stands for White ethnicity, Black stands for Black ethnicity, Drink means drinking habit and No. of Cond. is the number of health conditions.
The GEE models under the independence and exchangeable correlation assumptions produce the same results and find that marital status, education, White ethnicity and the number of health conditions had a significant influence on depression among the study population. ALR under an exchangeable correlation additionally finds drinking habit to be a significant factor for depression. The GEE model under the autoregressive correlation assumption also shows that marital status, education, White ethnicity and the number of health conditions were significantly associated with depression status; gender was not significant in any of the GEE-based models.

Table 4 Estimates of parameters of GEE and ALR on HRS data
Table 5 Estimates of parameters of the proposed Model 1 for HRS data

The joint model shows that the effects of the covariates on depression status were different at different follow-ups. At the baseline, marital status, education, White ethnicity and the number of health conditions had a significant effect on depression: married people were less depressed than their single counterparts, education lessened the risk of depression, White people were less depressed, and the number of physical conditions increased the risk of depression.

In the first follow-up, the covariate effects on depression status differed depending on the baseline CESD score (\(Y_1\)). If the respondent was not depressed at the baseline, gender, marital status, education and White ethnicity had a significant influence on the dependent variable: male, married, educated and White respondents were at less risk of being depressed. For those who were depressed at the baseline, gender had no significant effect, while being married and being educated lessened the risk of being depressed.

In the second follow-up, the effects of the covariates were notably different depending on the depression status of the respondent in the previous follow-ups. The depression status of respondents who were not depressed in either the baseline or the first follow-up was significantly associated with marital status, education and drinking habit. The depression status of respondents who were not depressed in the baseline but were depressed in the first follow-up was significantly associated with education. Education had a significant effect on depression status in the second follow-up for those who were depressed in the baseline but not in the first follow-up. Respondents’ depression status was significantly associated with marital status and education for those who were depressed in both the baseline and the first follow-up. These findings confirm our assertion that the extensive use of GEE-based models may result in failure to specify the covariate effects adequately for longitudinal data. The results demonstrate that a joint model based on the marginal conditional approach explains the covariate effects more meaningfully.

6 Conclusion

The majority of longitudinal models, for example GEE and ALR, are based on marginal approaches with an induced correlation among repeated outcomes on the same subject and lack a proper specification of the dependence in binary or multivariate repeated outcomes. Naturally, these models may fail to provide efficient estimation of the parameters of the model considered. Against this backdrop, this study proposed two joint models based on marginal conditional approaches as alternatives to GEE and related models based on marginal approaches.

The joint models (proposed Models 1 and 2) take care of the correlation among the repeated measures in a built-in manner and can be extended to any order of dependence without complicating the theory. First, the proposed Model 1 is an extension of Darlington and Farewell [6] that states the likelihood for models based on the first-order Markovian assumption more explicitly. The second model (proposed Model 2) is a further generalization of the proposed Model 1, based on marginal and conditional models for any order of a Markov chain with covariate dependence. Although the estimates of the parameters of the proposed Models 1 and 2 have smaller bias and greater coverage probability than those of GEE or ALR, their use is restricted by an overwhelming increase in the number of models and parameters to be estimated when there are more than three observations on a single subject. To overcome this limitation, we suggested the regressive model (proposed Model 3) when a subject is observed more than three times. The biggest advantage of the proposed Model 3 is its minimal number of parameters for any order of the underlying Markov chain. Furthermore, in terms of bias and coverage probability, the proposed Model 3 appears to be as good as the other alternatives, such as the proposed Models 1 and 2. Hence, for practical reasons, the proposed Model 3 can be used to analyze longitudinal data effectively and conveniently for more than three follow-ups. In addition to the simulation study, the applications of the selected models to the HRS data [29] show that the proposed Model 2 is a better-specified model in a simpler setup than GEE, ALR, Darlington and Farewell's [6] method or the proposed Model 1. Moreover, in the case of more than three repeated outcomes, the proposed Model 3 is not only the most convenient model but also performs better than GEE or ALR.
Both theoretical and applied users should find the proposed models useful.