
1 Introduction

Consider a longitudinal binary data setup where y it is the Bernoulli response for the i-th (i = 1, ⋯ , K) individual at the t-th time point (t = 1, ⋯ , T) and \(x_{it} = {(x_{it1},\cdots \,,x_{itu},\cdots \,,x_{itp})}^{\prime }\) is the associated p-dimensional covariate vector. When the longitudinal data are complete (that is, there are no missing responses from any of the individuals in the study), an estimating approach such as generalized quasi-likelihood (GQL) can be used to obtain an estimate of the regression parameter vector, β, that is both consistent and efficient, provided that the correlation structure associated with the repeated binary responses is known (see Sutradhar 2003). In order to describe the longitudinal correlation in the data, it seems reasonable to assume deterioration in the association between observations on the same individuals that are further apart in time. Thus, to achieve this, we let ρ be a longitudinal correlation parameter and consider a conditional linear binary dynamic (CLBD) model proposed by Zeger et al. (1985) (see also Qaqish 2003), which is given by

$$\displaystyle\begin{array}{rcl} & & \qquad \qquad \qquad \qquad \qquad \qquad P(Y _{i1} = 1) =\mu _{i1},\mbox{ and} \\ & & P(Y _{it} = 1\mid y_{i,t-1}) =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1}) =\lambda _{i,t\mid t-1}(\beta,\rho ) =\lambda _{it},\mbox{ for }t = 2,\cdots \,,T{}\end{array}$$
(1)

with \(\mu _{it} = exp(x_{it}^{\prime} \beta )/[1 + exp(x_{it}^{\prime} \beta )]\) for t = 1, ⋯ , T. According to model (1), the marginal means and variances of y it are

$$\displaystyle{ E(Y _{it}) =\mu _{it} }$$
(2)

and

$$\displaystyle{ V ar(Y _{it}) =\sigma _{i,tt} =\mu _{it}(1 -\mu _{it}), }$$
(3)

while the correlations between Y it and Y i, t + l for \(l = 1,\cdots \,,T - 1\), \(t = 1,\cdots \,,T - l\) are given by

$$\displaystyle{ \mathit{corr}(Y _{it},Y _{i,t+l}) {=\rho }^{l}{\left [ \frac{\sigma _{i,tt}} {\sigma _{i,t+l,t+l}}\right ]}^{1/2}. }$$
(4)

The means, variances, and covariances defined by (2) through (4) are nonstationary, since they are all functions of time-dependent covariates {x it }. However, if the σ i, tt are not extremely different, the correlations given by (4) assume a behavior that is analogous to an autoregressive process of order one, AR(1). Under the present model, the correlation parameter ρ must satisfy the range restriction

$$\displaystyle{ max\left [- \frac{\mu _{it}} {1 -\mu _{i,t-1}},-\frac{1 -\mu _{it}} {\mu _{i,t-1}} \right ] \leq \rho \leq min\left [ \frac{1 -\mu _{it}} {1 -\mu _{i,t-1}}, \frac{\mu _{it}} {\mu _{i,t-1}}\right ]. }$$
(5)
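To make the dynamic model concrete, the following sketch simulates one individual's responses from the CLBD model (1) and checks the marginal mean property (2) by Monte Carlo. The function name, covariate values, and parameter values are our own illustrative assumptions, not taken from the paper.

```python
import numpy as np

def simulate_clbd(x, beta, rho, rng):
    """Simulate one individual's binary series from the CLBD model (1).

    x    : (T, p) covariate matrix
    beta : (p,) regression parameter vector
    rho  : longitudinal correlation parameter (must satisfy the range (5))
    """
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))            # marginal means mu_it
    T = len(mu)
    y = np.empty(T, dtype=int)
    y[0] = rng.binomial(1, mu[0])                     # P(Y_i1 = 1) = mu_i1
    for t in range(1, T):
        lam = mu[t] + rho * (y[t - 1] - mu[t - 1])    # conditional mean lambda_it in (1)
        y[t] = rng.binomial(1, lam)
    return y, mu

# small Monte Carlo check of E(Y_it) = mu_it in (2), with illustrative values
rng = np.random.default_rng(1)
x = np.column_stack([np.ones(4), np.linspace(-0.5, 0.5, 4)])   # T = 4, p = 2 (assumed design)
beta, rho = np.array([0.5, 0.5]), 0.5
sims = np.array([simulate_clbd(x, beta, rho, rng)[0] for _ in range(10000)])
print(sims.mean(axis=0))                   # should be close to mu_i1, ..., mu_i4
print(simulate_clbd(x, beta, rho, rng)[1])
```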

Suppose that we let μ i and Σ i (ρ) represent the mean vector and the covariance matrix of the complete data vector Y i , where \(\mu _{i} = {(\mu _{i1},\cdots \,,\mu _{it},\cdots \,,\mu _{iT})}^{\prime }\) and \(\Sigma _{i}(\rho ) = A_{i}^{1/2}C_{i}(\rho )A_{i}^{1/2}\). Here, C i (ρ) is the T ×T correlation matrix based on (4), and \(A_{i} = diag(\sigma _{i,11},\cdots \,,\sigma _{i,tt},\cdots \,,\sigma _{i,TT})\). An estimator for β that is both consistent and highly efficient can be obtained by solving the GQL estimating equation

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime }} {\partial \beta } {[\Sigma _{i}(\rho )]}^{-1}(y_{ i} -\mu _{i}) = 0, }$$
(6)

(Sutradhar 2003).
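As an illustration of how (6) might be solved numerically, the sketch below performs Fisher-scoring-type iterations, building Σ i (ρ) from (3) and (4) so that cov(Y iu , Y it ) = ρ t − u σ i, uu for u ≤ t. The iteration scheme and function name are our own assumptions rather than the authors' implementation; a fixed point of the update is a root of (6).

```python
import numpy as np

def gql_estimate(X, Y, rho, beta0, n_iter=25):
    """Solve the complete-data GQL equation (6) by Fisher-scoring-type iteration.

    X : (K, T, p) covariates, Y : (K, T) binary responses, rho assumed known.
    """
    beta = np.asarray(beta0, dtype=float)
    K, T, p = X.shape
    for _ in range(n_iter):
        score, info = np.zeros(p), np.zeros((p, p))
        for i in range(K):
            mu = 1.0 / (1.0 + np.exp(-(X[i] @ beta)))
            sig = mu * (1.0 - mu)
            # Sigma_i(rho): by (3)-(4), cov(Y_iu, Y_it) = rho^{t-u} sigma_i,uu for u <= t
            Sigma = np.zeros((T, T))
            for u in range(T):
                for t in range(u, T):
                    Sigma[u, t] = Sigma[t, u] = rho ** (t - u) * sig[u]
            D = X[i] * sig[:, None]                   # d mu_i / d beta'  (T x p)
            Sinv = np.linalg.inv(Sigma)
            score += D.T @ Sinv @ (Y[i] - mu)
            info += D.T @ Sinv @ D
        beta = beta + np.linalg.solve(info, score)
    return beta
```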

In practice, it is typically the case that some of the responses associated with each of a number of individuals in the study may be missing. To acknowledge this phenomenon during the data collection process, we introduce an indicator variable R it that takes on a value of one if Y it is observed, and zero otherwise. For purposes of our investigation here, we adopt the not-so-unreasonable assumption that all individuals provide a response at the first time point, so that R i1 = 1 for all i = 1, ⋯ , K. We also assume monotonic missingness, so that the R it satisfy the inequality \(R_{i1} \geq R_{i2} \geq \cdots \geq R_{it} \geq \cdots \geq R_{iT}\). Thus, if responses are no longer observed for the i-th individual after the j-th time point, for this individual we would have available y it for \(t = 1,\cdots \,,T_{i} = j\).

Regarding the missing data mechanism, at this time we distinguish between responses that are missing completely at random, MCAR, and those that are missing at random, MAR (see Fitzmaurice et al. 1996; Paik 1997; Rubin 1976). When the responses are MCAR, the indicator variable R it reflecting the presence or absence of Y it does not depend on the previous responses \(Y _{i1},\cdots \,,Y _{i,t-1}\). In this instance, if we define \(R_{i} = diag(R_{i1},\cdots \,,R_{iT})\) and incorporate this matrix into the estimating equation given by (6) to yield

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime }} {\partial \beta } {[\Sigma _{i}(\rho )]}^{-1}R_{ i}(y_{i} -\mu _{i}) = 0, }$$
(7)

it is still possible to obtain an unbiased estimator for β that will be consistent and efficient. Note that Σ i (ρ) is a T ×T matrix with appropriate variance and covariance entries in the first T i rows and T i columns and zeroes in the last T − T i rows and columns. On the other hand, when the missing data mechanism for the responses is assumed to be MAR (implying that R it does depend on the previous responses \(Y _{i1},\cdots \,,Y _{i,t-1}\)), it can be shown that \(E[R_{it}(Y _{it} -\mu _{it})]\neq 0\). In this situation, the estimator for β based on (7) will be biased and inconsistent. Upon realizing this to be the case, many studies have attempted to correct for this problem by using a modified inverse probability-weighted distance function

$$\displaystyle{ w_{it}^{-1}\left \{H_{ i,t-1}(y);\alpha \right \}\left [R_{it}(Y _{it} -\mu _{it})\right ], }$$
(8)

where \(H_{i,t-1}(y) \equiv H_{i,t-1} = (Y _{i1},\cdots \,,Y _{i,t-1})\), so that the expectation of (8) is zero. Following Robins et al. (1995), for data that are MAR, we can write the probability weight \(w_{it}\left \{H_{i,t-1}(y);\alpha \right \} = w_{it}\) as a function of past responses as follows. Specifically, imagine that the probability that the i-th individual responds at the j-th time point depends on the past lag q responses, where q ≤ j − 1. Letting \(g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha )\) represent this probability, we can write \(g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha ) = P(R_{ij} = 1\mid R_{i1} = 1,\cdots \,,R_{i,j-1} = 1;y_{i,j-1},\cdots \,,y_{i,j-q})\), which can be modeled as

$$\displaystyle{ g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha ) = \frac{exp(1 +\sum _{ l=1}^{q}\alpha _{l}y_{i,j-l})} {1 + exp(1 +\sum _{ l=1}^{q}\alpha _{l}y_{i,j-l})}, }$$
(9)

where α l is a parameter that reflects the dependence of R ij on y i, j − l for all l = 1, ⋯ , q. Robins et al. (1995) set

$$\displaystyle\begin{array}{rcl} w_{it}& =& P(R_{it} = 1,R_{i,t-1} = 1,\cdots \,,R_{i1} = 1\mid H_{i,t-1}) \\ & =& P(R_{it} = 1\mid R_{i,t-1} = \cdots = R_{i1} = 1;H_{i,t-1}) \times \\ & & P(R_{i,t-1} = 1\mid R_{i,t-2} = \cdots = R_{i1} = 1;H_{i,t-2}) \times \\ & &\cdots \times P(R_{i2} = 1\mid R_{i1} = 1;H_{i1})P(R_{i1} = 1) \\ & =& \prod _{j=1}^{t}g_{ ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha ). {}\end{array}$$
(10)
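The weights in (9) and (10) are straightforward to compute recursively. Below is a minimal sketch assuming the lag q = 1 version of (9) and treating the factor for j = 1 as P(R i1 = 1) = 1, in line with the assumption that every individual responds at the first time point; the function names are our own.

```python
import numpy as np

def g_prob(y_prev, alpha):
    """Response probability (9) with lag q = 1: P(R_ij = 1 | past response y_prev)."""
    eta = 1.0 + alpha * y_prev
    return np.exp(eta) / (1.0 + np.exp(eta))

def cumulative_weights(y, alpha):
    """Weights w_it = prod_{j=1}^{t} g_ij as in (10); the j = 1 factor is
    P(R_i1 = 1) = 1 because every individual responds at the first time point."""
    w = np.ones(len(y))
    for t in range(1, len(y)):            # 0-based index t corresponds to time point t + 1
        w[t] = w[t - 1] * g_prob(y[t - 1], alpha)
    return w
```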

Since monotonic missingness is assumed,

$$\displaystyle{ E\left [R_{it}Y _{it}\mid H_{i,t-1}\right ] = P\left [R_{i1} = 1,R_{i2} = 1,\cdots \,,R_{it} = 1;Y _{it} = 1\mid H_{i,t-1}\right ], }$$
(11)

or, alternatively

$$\displaystyle\begin{array}{rcl} E\left [R_{it}Y _{it}\mid H_{i,t-1}\right ]& =& P(R_{i1} = 1)P\left [R_{i2} = 1\mid R_{i1} = 1;H_{i1}\right ]\cdots \\ & & P\left [R_{it} = 1\mid R_{i1} = 1,R_{i2} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}\right ] \\ & & P\left [Y _{it} = 1\mid H_{i,t-1}\right ]. {}\end{array}$$
(12)

Using model (1) and \(g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha )\) given in (9), (12) becomes

$$\displaystyle\begin{array}{rcl} E\left [R_{it}Y _{it}\mid H_{i,t-1}\right ]& =& \prod _{j=1}^{t}g_{ ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha )\lambda _{it} \\ & =& w_{it}\lambda _{it}, {}\end{array}$$
(13)

which implies that

$$\displaystyle{ E\left [\frac{R_{it}Y _{it}} {w_{it}} \mid H_{i,t-1}\right ] =\lambda _{it}, }$$
(14)

thus giving

$$\displaystyle{ E_{H_{i,t-1}}E\left [\frac{R_{it}Y _{it}} {w_{it}} \mid H_{i,t-1}\right ] = E_{H_{i,t-1}}[\lambda _{it}] =\mu _{it}. }$$
(15)

Similarly

$$\displaystyle{ E_{H_{i,t-1}}E\left [\frac{R_{it}\mu _{it}} {w_{it}} \mid H_{i,t-1}\right ] =\mu _{it}, }$$
(16)

suggesting that combining (15) and (16) yields

$$\displaystyle{ E_{H_{i,t-1}}E\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \mid H_{i,t-1}\right ] = 0. }$$
(17)

This unconditional unbiasedness property of the weighted distance or estimating function \(\left [\frac{R_{it}(Y _{it}-\mu _{it})} {w_{it}} \right ]\) motivated many researchers to write a weighted generalized estimating equation (WGEE) and solve it for the β involved in those μ it . The WGEE, first developed by Robins et al. (1995), is reproduced in brief in Sect. 2.1. Note that to construct the WGEE, Robins et al. (1995) suggested the specification of a user-selected covariance matrix of \(\{(Y _{it} -\mu _{it}),t = 1,\ldots,T_{i}\}\) by pretending that the data were complete. Recently, Sutradhar and Mallick (2010) found that this widely used WGEE approach produces highly biased regression estimates, indicating a breakdown in consistency. In this paper, specifically in Sect. 3, we carry out an extensive simulation study considering various degrees of missingness and examine further the inconsistency problem encountered by the WGEE approach.

In Sect. 2.2, we consider a simpler version of the fully standardized GQL (FSGQL) approach discussed by Sutradhar (2013, Sect. 3.2.4) by constructing the weight matrix, that is, the unconditional covariance matrix of \(\{\left [\frac{R_{it}(Y _{it}-\mu _{it})} {w_{it}} \right ],t = 1,\ldots,T_{i}\}\), using longitudinal independence (i.e., ρ = 0). We will refer to this as the FSGQL(I) approach. In the simulation study in Sect. 3, we examine the relative performance of this FSGQL(I) approach with the existing WGEE approach as well as the WGEE(I) (independence assumption-based WGEE) approach.

Further note that if the correlation model for the complete data were known through λ it in (14), one could exploit the conditional distance function \(\left [\frac{R_{it}(Y _{it}-\lambda _{it})} {w_{it}} \mid H_{i,t-1}\right ]\) to construct a conditional-weighted GQL (CWGQL) estimating equation and solve such an equation to obtain consistent regression estimates. We discuss this approach in Sect. 2.3 and include it in the simulation study in Sect. 3 to examine its performance as compared to the aforementioned approaches.

2 Estimation

2.1 WGEE Approach

Robins et al. (1995, Eq. (10), p. 109) used the result in (17) to propose the WGEE

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime }} {\partial \beta }{ \left [V _{i}{(\alpha }^{{\ast}})\right ]}^{-1}\Delta _{ i}(y_{i} -\mu _{i}) = 0, }$$
(18)

(see also Paik 1997, Eq. (1), p. 1321) where \(\Delta _{i} = diag(\delta _{i1},\delta _{i2},\cdots \,,\delta _{iT})\) with \(\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\alpha \}= R_{it}/w_{it}\). The quantity \(V _{i}{(\alpha }^{{\ast}})\) is a working covariance matrix of Y i (see Liang and Zeger 1986) that is used in an effort to increase the efficiency of the estimates. Of note is the fact while Robins et al. (1995) suggested a WGEE, they did not account for the missingness in the data when specifying \(V _{i}{(\alpha }^{{\ast}})\); they simply based their working covariance matrix on the complete data formulae. For this reason, this WGEE approach may be referred to as a partially standardized GEE (PSGEE) approach. See the previous article by Sutradhar (2013) in this chapter for details on the use of PSGEE. Note that a user-selected covariance matrix based on complete data that ignores the missing mechanism leads the WGEE to be unstable, in particular, when the proportion of missing data is high, causing breakdown in estimation, i.e., breakdown in consistency. However, this inconsistency issue has not been adequately addressed in the literature including the studies by Robins et al. (1995), Paik (1997), Rotnitzky et al. (1998) and Birmingham et al. (2003). One of the main reasons is that none of the studies used any stochastic correlation structure in conjunction with the missing mechanism to model the binary data in the incomplete longitudinal setup.

In this paper, in order to investigate the effect on the estimates of the regression parameter vector, we propose to replace the working covariance matrix \(V _{i}{(\alpha }^{{\ast}})\) in (18) with a proper unconditional covariance matrix that accommodates the missingness in the data. The proposed approach is presented in the next section.

2.2 FSGQL Approach

The unconditional unbiasedness property in (17), that is,

$$\displaystyle{E_{H_{i,t-1}}E\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \mid H_{i,t-1}\right ] = E_{H_{i,t-1}}E\left [\delta _{it}(Y _{it} -\mu _{it})\mid H_{i,t-1}\right ] = 0}$$

motivates one to develop an FSGQL estimating equation for β, which requires the computation of the unconditional variance of \(\delta _{it}(Y _{it} -\mu _{it})\). Thus, for all t = 1, …, T i , we now compute the unconditional covariance matrix, namely

$$\displaystyle{\mbox{ cov}[\Delta _{i}(y_{i} -\mu _{i})] = \Sigma _{i}^{{\ast}}(\beta,\rho,\alpha ),\;\mbox{ (say)},}$$

by using the formula

$$\displaystyle\begin{array}{rcl} \Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )& & = E_{ H_{i}(y)}[\mbox{ cov}\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\vert H_{i}(y)\right \}] {}\\ & & \quad + \mbox{ cov}_{H_{i}(y)}[E\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\vert H_{i}(y)\right \}], {}\\ \end{array}$$

where H i (y) denotes the history of responses. For computational details under any specified correlation model, we refer to the previous article by Sutradhar (2013, Sect. 3.2.4). For the binary AR(1) model in (1), the elements of the T i ×T i unconditional covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )\) are given by

$$\displaystyle\begin{array}{rcl} & & \mbox{ cov}[\delta _{iu}(y_{iu} -\mu _{iu}),\delta _{it}(y_{it} -\mu _{it})] \\ & \equiv & \left \{\begin{array}{l} \sigma _{i,11}^{{\ast}} =\mu _{i1}[1 -\mu _{i1}] \\ \sigma _{i,tt}^{{\ast}} = E_{H_{i}(y)}[w_{it}^{-1}\{\mu _{it}(1 -\mu _{it}) +\rho (1 - 2\mu _{it})(y_{i,t-1} -\mu _{i,t-1})\}],\;(\mbox{ for}\;t = 2,\ldots,T_{i}) \\ \sigma _{i,ut}^{{\ast}} =\rho \rho ^{t-1-u}\mu _{iu}(1 -\mu _{iu}),\;(\mbox{ for}\;u = 1 < t) \\ \sigma _{i,ut}^{{\ast}} =\rho ^{2}\rho ^{t-u}\mu _{i(u-1)}(1 -\mu _{i(u-1)}),\;(\mbox{ for}\;1 < u < t). \end{array} \right.{}\end{array}$$
(19)

Note that the formulas in (19) under the present AR(1) binary model may be verified directly. For example, we compute the t-th diagonal element of the \(\Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )\) matrix as follows. Since \(\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\alpha \}= R_{it}/w_{it}\), we can write

$$\displaystyle\begin{array}{rcl} V ar\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \right ]& =& V ar_{H_{i,t-1}}E\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \mid H_{i,t-1}\right ] \\ & & +E_{H_{i,t-1}}V ar\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \mid H_{i,t-1}\right ],{}\end{array}$$
(20)

where

$$\displaystyle\begin{array}{rcl} V ar_{H_{i,t-1}}E\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \mid H_{i,t-1}\right ]& =& V ar_{H_{i,t-1}}\left [ \frac{1} {w_{it}}w_{it}(\lambda _{it} -\mu _{it})\right ] \\ & =& E_{H_{i,t-1}}\left [{(\lambda _{it} -\mu _{it})}^{2}\right ] {}\end{array}$$
(21)

since \({\left [E_{H_{i,t-1}}(\lambda _{it} -\mu _{it})\right ]}^{2} = 0\), and

$$\displaystyle\begin{array}{rcl} & & E_{H_{i,t-1}}V ar\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \mid H_{i,t-1}\right ] \\ & =& E_{H_{i,t-1}}\left [ \frac{1} {w_{it}}(1 - w_{it})\left \{\lambda _{it}(1 -\lambda _{it}) + {(\lambda _{it} -\mu _{it})}^{2}\right \} +\lambda _{it}(1 -\lambda _{it})\right ] \\ & =& E_{H_{i,t-1}}\left [ \frac{1} {w_{it}}\lambda _{it}(1 -\lambda _{it}) + \frac{1} {w_{it}}{(\lambda _{it} -\mu _{it})}^{2} - {(\lambda _{it} -\mu _{it})}^{2}\right ]. {}\end{array}$$
(22)

Substituting (21) and (22) into (20) gives

$$\displaystyle\begin{array}{rcl} V ar\left [\frac{R_{it}(Y _{it} -\mu _{it})} {w_{it}} \right ]& =& \mu _{it}(1 -\mu _{it})E_{H_{i,t-1}}\left ( \frac{1} {w_{it}}\right ) +\rho (1 - 2\mu _{it}) \times \\ & &\left [E_{H_{i,t-1}}\left (\frac{Y _{i,t-1}} {w_{it}} \right ) -\mu _{i,t-1}E_{H_{i,t-1}}\left ( \frac{1} {w_{it}}\right )\right ]{}\end{array}$$
(23)

since \(\lambda _{it} =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1})\) by (1). The conditional expectations given the response history, H i, t − 1, in (23) are evaluated as follows:

$$\displaystyle{ E_{H_{i,t-1}}\left ( \frac{1} {w_{it}}\right ) =\sum _{y_{i1},y_{i2},\cdots \,,y_{i,t-1}} \frac{1} {w_{it}}\mu _{i1}^{y_{i1} }{(1 -\mu _{i1})}^{1-y_{i1} }\prod _{j=2}^{t-1}{(\lambda _{ ij})}^{y_{ij} }{(1 -\lambda _{ij})}^{1-y_{ij} } }$$
(24)

and

$$\displaystyle{ E_{H_{i,t-1}}\left (\frac{Y _{i,t-1}} {w_{it}} \right ) =\sum _{y_{i1},y_{i2},\cdots \,,y_{i,t-1}}\left (\frac{y_{i,t-1}} {w_{it}} \right )\mu _{i1}^{y_{i1} }{(1-\mu _{i1})}^{1-y_{i1} }\prod _{j=2}^{t-1}{(\lambda _{ ij})}^{y_{ij} }{(1-\lambda _{ij})}^{1-y_{ij} } }$$
(25)
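Because the history is binary, the expectations in (24) and (25) can be evaluated exactly by enumerating the 2 t − 1 possible histories. A sketch, assuming the lag-1 missingness model (9) and using our own function name and indexing:

```python
import itertools
import math

def cond_expectations(mu, rho, alpha, t):
    """Evaluate (24) and (25) for time point t (1-based, t >= 2):
    E[1/w_it] and E[Y_{i,t-1}/w_it], summing over all binary histories
    (y_{i1}, ..., y_{i,t-1}); mu[k] holds the marginal mean mu_{i,k+1}."""
    e_inv_w = e_y_over_w = 0.0
    for hist in itertools.product([0, 1], repeat=t - 1):
        # probability of the history under model (1)
        prob = mu[0] ** hist[0] * (1 - mu[0]) ** (1 - hist[0])
        for j in range(2, t):                                # lambda_ij, j = 2, ..., t-1
            lam = mu[j - 1] + rho * (hist[j - 2] - mu[j - 2])
            prob *= lam ** hist[j - 1] * (1 - lam) ** (1 - hist[j - 1])
        # w_it = prod_{j=2}^{t} g_ij(y_{i,j-1}; alpha), eq. (9) with q = 1
        w = 1.0
        for j in range(2, t + 1):
            eta = 1.0 + alpha * hist[j - 2]
            w *= math.exp(eta) / (1.0 + math.exp(eta))
        e_inv_w += prob / w
        e_y_over_w += hist[-1] * prob / w
    return e_inv_w, e_y_over_w
```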

2.2.1 FSGQL(I) Approach

Note that in a complete longitudinal setup, one may obtain consistent regression estimates even if longitudinal correlations are ignored in developing the estimating equation, but such estimates may not be efficient (Sutradhar 2011, Chap. 7). By the same token, to obtain consistent regression estimates in the incomplete longitudinal setup, we may still use the independence assumption (i.e., use ρ = 0), but the missing mechanism must be accommodated when formulating the covariance matrix for the construction of the estimating equation. Thus, for simplicity, we now consider a specialized version of the FSGQL approach, namely the FSGQL(I) approach, where the GQL estimating equation is developed by using the independence assumption (ρ = 0)-based covariance matrix. More specifically, under this approach, the covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho = 0,\alpha )\) has the form

$$\displaystyle\begin{array}{rcl} \Sigma _{i}^{{\ast}}(\beta,\rho = 0,\alpha )& & \\ & \equiv & \left \{\begin{array}{l} \sigma _{i,11}^{{\ast}} =\mu _{i1}[1 -\mu _{i1}] \\ \sigma _{i,tt}^{{\ast}} = E_{H_{i}(y)}[w_{it}^{-1}\{\mu _{it}(1 -\mu _{it})\}],\;(\mbox{ for}\;t = 2,\ldots,T_{i}) \\ \sigma _{i,ut}^{{\ast}} = 0,\;(\mbox{ for}\;u\neq t), \end{array} \right.{}\end{array}$$
(26)

where \(E_{H_{i}(y)}[w_{it}^{-1}]\) is computed by (24).

Now by replacing the “working” covariance matrix \(V _{i}{(\alpha }^{{\ast}})\) in the WGEE given in (18) with \(\Sigma _{i}^{{\ast}}(\beta,\rho = 0,\alpha )\), one may obtain the FSGQL(I) estimate for β by solving the estimating equation

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime}} {\partial \beta }{\left [\Sigma _{i}^{{\ast}}(\beta,\rho = 0,\alpha )\right ]}^{-1}\Delta _{ i}(y_{i} -\mu _{i}) = 0. }$$
(27)
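A sketch of one update step for (27) is given below. It reuses cumulative_weights() (after (10)) and cond_expectations() (after (25)), and evaluates E[w it  − 1] with the history distribution also taken at ρ = 0, which is one reading of (26); this is our own illustration rather than the authors' implementation.

```python
import numpy as np

def fsgql_i_step(X, Y, R, alpha, beta):
    """One Newton-type update for the FSGQL(I) equation (27).

    X : (K, T, p) covariates, Y : (K, T) responses (values after dropout ignored),
    R : (K, T) monotone 0/1 observation indicators, alpha : missingness parameter.
    """
    K, T, p = X.shape
    score, info = np.zeros(p), np.zeros((p, p))
    for i in range(K):
        Ti = int(R[i].sum())                          # observed up to time T_i (monotone)
        mu = 1.0 / (1.0 + np.exp(-(X[i, :Ti] @ beta)))
        sig = mu * (1.0 - mu)
        # diagonal of Sigma_i^*(beta, rho = 0, alpha) from (26)
        diag = np.empty(Ti)
        diag[0] = sig[0]
        for t in range(2, Ti + 1):
            e_inv_w, _ = cond_expectations(mu, 0.0, alpha, t)
            diag[t - 1] = e_inv_w * sig[t - 1]
        w = cumulative_weights(Y[i, :Ti], alpha)      # w_it from (10)
        delta = R[i, :Ti] / w                         # delta_it = R_it / w_it
        D = X[i, :Ti] * sig[:, None]                  # d mu_i / d beta'
        Sinv = np.diag(1.0 / diag)
        score += D.T @ Sinv @ (delta * (Y[i, :Ti] - mu))
        info += D.T @ Sinv @ D
    return beta + np.linalg.solve(info, score)
```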

2.3 CWGQL Approach

Note that instead of using the distance function with unconditional zero mean, one may prefer to exploit a distance function whose mean is zero conditionally. This is possible only when the expectation of the binary response conditional on the past history is known. In this case, by replacing μ it with λ it in (17), one may construct a distance function that has mean zero conditional on the past history, that is,

$$\displaystyle{ E\left [\frac{R_{it}(Y _{it} -\lambda _{it}(\beta,\rho ))} {w_{it}} \mid H_{i,t-1}\right ] = 0, }$$
(28)

where, for the binary AR(1) model, for example, the conditional mean has the form

$$\displaystyle{ \lambda _{it}(\beta,\rho ) =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1}), }$$
(29)

for t = 2, …, T.

Suppose that

$$\displaystyle{\lambda _{i}(\beta,\rho ) = [\lambda _{i1}(\beta ),\lambda _{i2}(H_{i,1}(y),\beta,\rho ),\ldots,\lambda _{iT_{i}}(H_{i,T_{i}-1}(y),\beta,\rho )]^{\prime} }$$

with \(\lambda _{i1}(\beta ) =\mu _{i1}(\beta )\). To develop a GQL-type estimating equation in the conditional approach, one minimizes the distance function

$$\displaystyle{ \sum _{i=1}^{K}[\{\Delta _{ i}(y_{i} -\lambda _{i}(\beta,\rho ))\}^{\prime} \{\mbox{ cov}(\Delta _{i}(y_{i} -\lambda _{i}(\beta,\rho )))\vert H_{i}(y)\}^{-1}\{\Delta _{ i}(y_{i} -\lambda _{i}(\beta,\rho ))\}] }$$
(30)

with respect to β, the parameter of interest. Given the history, let the conditional covariance matrix \(\{\mbox{ cov}(\Delta _{i}(y_{i} -\lambda _{i}(\beta,\rho )))\vert H_{i}(y)\}\) be denoted by Σ ich (β, ρ). Then assuming that β and ρ in Σ ich (β, ρ) are known, minimizing the quadratic distance function (30) with respect to β is equivalent to solving the equation

$$\displaystyle\begin{array}{rcl} & & \sum _{i=1}^{K}\frac{\partial [E\{\Delta _{i}\lambda _{i}(\beta,\rho )\vert H_{i}(y)\}^{\prime} ]} {\partial \beta } \Sigma _{ich}^{-1}(\beta,\rho )\{\Delta _{ i}(y_{i} -\lambda _{i}(\beta,\rho ))\} \\ & =& \sum _{i=1}^{K}\frac{\partial \lambda ^{\prime} _{i}(\beta,\rho )} {\partial \beta } \Sigma _{ich}^{-1}(\beta,\rho )\{\Delta _{ i}(y_{i} -\lambda _{i}(\beta,\rho ))\} = 0. {}\end{array}$$
(31)

Computational Formula for Σ ich (β, ρ)

For convenience, we first write

$$\displaystyle\begin{array}{rcl} \Delta _{i}& =& W_{i}^{-1}R_{ i},\;\mbox{ with} {}\\ W_{i}& =& \mbox{ diag}[w_{i1},w_{i2},\ldots,w_{iT_{i}}],\;\mbox{ and}\;R_{i} = \mbox{ diag}[R_{i1},\ldots,R_{iT_{i}}]. {}\\ \end{array}$$

It then follows that

$$\displaystyle\begin{array}{rcl} \Sigma _{ich}(\beta,\rho )& =& \mbox{ cov}[\{\Delta _{i}(y_{i} -\lambda _{i}(\beta,\rho ))\}\vert H_{i}(y)] \\ & =& W_{i}^{-1}\mbox{ cov}[\{R_{ i}(y_{i} -\lambda _{i}(\beta,\rho ))\}\vert H_{i}(y)]W_{i}^{-1}.{}\end{array}$$
(32)

Now to compute the covariance matrix in (32), we write

$$\displaystyle{R_{i}(y_{i} -\lambda _{i}(\beta,\rho )) = [R_{i1}(y_{i1} -\lambda _{i1}),\ldots,R_{it}(y_{it} -\lambda _{it}),\ldots,R_{iT_{i}}(y_{iT_{i}} -\lambda _{iT_{i}})]^{\prime}.}$$

It then follows that for u < t, for example,

$$\displaystyle{ \mbox{ cov}[\{R_{iu}(y_{iu} -\lambda _{iu}),R_{it}(y_{it} -\lambda _{it})\}\vert H_{i,t-1}(y)] = 0, }$$
(33)

and for t = 1, …, T i ,

$$\displaystyle\begin{array}{rcl} & & \mbox{ var}[R_{it}(y_{it} -\lambda _{it})\vert H_{i,t-1}(y)] = \mbox{ var}[R_{it}\vert H_{i,t-1}(y)]\mbox{ var}[y_{it}\vert H_{i,t-1}(y)] \\ & +& {E}^{2}[R_{ it}\vert H_{i,t-1}(y)]\mbox{ var}[y_{it}\vert H_{i,t-1}(y)] + \mbox{ var}[R_{it}\vert H_{i,t-1}(y)]{E}^{2}[(y_{ it} -\lambda _{it})\vert H_{i,t-1}(y)] \\ & =& w_{it}(1 - w_{it})\sigma _{ic,tt} + w_{it}^{2}\sigma _{ ic,tt} \\ & =& w_{it}\sigma _{ic,tt}, {}\end{array}$$
(34)

where σ ic, tt is the conditional variance of y it given the history. For example, in the binary case, \(\sigma _{ic,tt} =\lambda _{it}(1 -\lambda _{it})\).

2.3.1 CWGQL Estimating Equation

Now by substituting (34) and (33) into (32), one obtains

$$\displaystyle{ \Sigma _{ich}(\beta,\rho ) = W_{i}^{-1}W_{ i}\Sigma _{ic}W_{i}^{-1} = W_{ i}^{-1}\Sigma _{ ic}, }$$
(35)

where \(\Sigma _{ic} = \mbox{ diag}[\sigma _{ic,11},\ldots,\sigma _{ic,T_{i}T_{i}}]\). Consequently, when this formula for Σ ich from (35) is applied to the conditional GQL (CGQL) estimating equation in (31), one obtains

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \lambda ^{\prime} _{i}(\beta,\rho )} {\partial \beta } \Sigma _{ic}^{-1}(\beta,\rho )W_{ i}\{\Delta _{i}(y_{i} -\lambda _{i}(\beta,\rho ))\} = 0, }$$
(36)

which is unaffected by the MAR missingness mechanism. This is not surprising, as conditional on the history, R it and y it are independent. However, this fully conditional approach requires the modeling of the conditional means of the responses, which is equivalent to modeling the correlation structure.
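To show how (36) could be iterated in practice, here is a sketch of one Newton-type step. Since W i Δ i  = R i , only the observed time points contribute, and the derivative of λ it in (29) with respect to β is μ it (1 − μ it )x it  − ρ μ i, t − 1(1 − μ i, t − 1)x i, t − 1. The function name and iteration scheme are our own assumptions.

```python
import numpy as np

def cwgql_step(X, Y, R, rho, beta):
    """One Newton-type update for the CWGQL equation (36); because
    W_i Delta_i = R_i, only observed time points (monotone dropout) enter."""
    K, T, p = X.shape
    score, info = np.zeros(p), np.zeros((p, p))
    for i in range(K):
        Ti = int(R[i].sum())
        mu = 1.0 / (1.0 + np.exp(-(X[i, :Ti] @ beta)))
        lam = mu.copy()
        lam[1:] = mu[1:] + rho * (Y[i, :Ti - 1] - mu[:-1])      # conditional means (29)
        sig_c = lam * (1.0 - lam)                               # Sigma_ic diagonal
        # d lambda_it / d beta per (29): mu_it(1-mu_it)x_it - rho mu_{i,t-1}(1-mu_{i,t-1})x_{i,t-1}
        D = X[i, :Ti] * (mu * (1 - mu))[:, None]
        D[1:] -= rho * (mu[:-1] * (1 - mu[:-1]))[:, None] * X[i, :Ti - 1]
        Sinv = np.diag(1.0 / sig_c)
        score += D.T @ Sinv @ (Y[i, :Ti] - lam)
        info += D.T @ Sinv @ D
    return beta + np.linalg.solve(info, score)
```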

2.3.2 Conditional Likelihood Estimation

In fact, when conditional inference is used, one can obtain likelihood estimates for β and ρ by maximizing the exact likelihood function, exploiting the fact that Y it and R it are independent given the history. This is easier for the analysis of longitudinal binary data than for the analysis of longitudinal count data subject to MAR.

Since the R it ’s satisfy the monotonic restriction given in Sect. 1, and because R it and Y it are independent conditional on the history under the MAR mechanism, the likelihood function for the ith individual may be expressed as

$$\displaystyle\begin{array}{rcl} L_{i}(\beta,\rho,\alpha )& =& f_{i1}(y_{i1})f_{i2\vert 1}\{(y_{i2},r_{i2} = 1)\vert r_{i1} = 1,y_{i1}\}\ldots \\ & \times & f_{iT_{i}\vert T_{i}-1}\{(y_{iT_{i}},r_{iT_{i}} = 1)\vert r_{i1} = 1,r_{i2} = 1,\ldots,r_{i(T_{i}-1)} = 1,H_{i,T_{i}-1}(y)\} \\ & =& \mu _{i1}^{y_{i1} }{[1 -\mu _{i1}]}^{1-y_{i1} }\Pi _{t=2}^{T_{i} }[\{g_{it}\}\{\lambda _{it}^{y_{it} }{(1 -\lambda _{it})}^{1-y_{it} }\}], {}\end{array}$$
(37)

where, by (9),

$$\displaystyle{g_{it}(\alpha ) = P[(R_{it} = 1)\vert r_{i1} = 1,\ldots,r_{i,t-1} = 1,H_{i,t-1}(y)] = \frac{\exp (1 +\alpha y_{i,t-1})} {1 +\exp (1 +\alpha y_{i,t-1})}.}$$
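A sketch of the per-individual log-likelihood implied by (37), under the lag-1 dropout model, is given below; summing it over individuals and maximizing the sum numerically with a generic optimizer would yield the conditional likelihood estimates of β and ρ. The function name is our own.

```python
import numpy as np

def log_lik_i(x, y, beta, rho, alpha):
    """Log of L_i(beta, rho, alpha) in (37) for one individual with observed
    responses y = (y_1, ..., y_{T_i}) and covariates x of shape (T_i, p)."""
    mu = 1.0 / (1.0 + np.exp(-(x @ beta)))
    ll = y[0] * np.log(mu[0]) + (1 - y[0]) * np.log(1 - mu[0])
    for t in range(1, len(y)):                       # t = 2, ..., T_i in paper notation
        lam = mu[t] + rho * (y[t - 1] - mu[t - 1])   # conditional mean from (1)
        eta = 1.0 + alpha * y[t - 1]
        g = np.exp(eta) / (1.0 + np.exp(eta))        # g_it(alpha) given below (37)
        ll += np.log(g) + y[t] * np.log(lam) + (1 - y[t]) * np.log(1 - lam)
    return ll
```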

3 Simulation Study

3.1 Comparison Between WGEE (AR(1)), WGEE(I) and FSGQL(I) Approaches: Multinomial Distribution Based Joint Generation of R and y

In this section, we describe and report the results of a simulation study that centers on a comparison of the WGEE approach of Robins et al. (1995) for estimating the regression parameter vector with the proposed FSGQL approach. Recall that the WGEE in (18) was constructed by using a “working” covariance matrix \(V _{i}{(\alpha }^{{\ast}}) = A_{i}^{\frac{1} {2} }R_{i}^{{\ast}}{(\alpha }^{{\ast}})A_{i}^{\frac{1} {2} }\) of the response vector y i . Note that this weight matrix was chosen ignoring the missing mechanism. Furthermore, there is no guideline for choosing the “working” correlation matrix \(R_{i}^{{\ast}}{(\alpha }^{{\ast}})\). In the simulation study, we will consider a non-stationary longitudinal binary AR(1) model with true correlation structure C i (ρ) given by (4) for the responses subject to MAR. To examine the performance of the WGEE approach (18), we choose the best possible stationary AR(1) correlation form, namely,

$$\displaystyle{R_{i}^{{\ast}}{(\alpha }^{{\ast}}) = (r_{ ut}^{{\ast}}{(\alpha }^{{\ast}})) = {({\alpha }^{{\ast}}}^{\vert t-u\vert }),}$$

as compared to using MA(1)- and EQC-based “working” correlation matrices. We will refer to this WGEE as the WGEE(AR(1)). Also, we will consider the simplest version of the WGEE approach, namely WGEE(I), which is obtained under the independence assumption by using α  ∗  = 0 in the “working” correlation matrix \(R_{i}^{{\ast}}{(\alpha }^{{\ast}})\). These two versions of the WGEE approach will be compared with the FSGQL(I) approach in (27), which was constructed by accommodating the missing mechanism but using the longitudinal independence assumption, i.e., ρ = 0 or \(C_{i}(\rho ) = I_{T_{i}}\). For simplicity, in the present simulation study, we do not consider the FSGQL approach based on the true complete covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )\) in (19).

3.1.1 Joint Generation of (R and y) Incomplete Binary Data: Multinomial Distribution Based

In order to generate an incomplete longitudinal binary data set subject to MAR, we follow the approach of Sutradhar and Mallick (2010). Specifically, the procedure initially assumes that every individual provides a response at time t = 1. Thus, since R i1 = 1 for all i = 1, ⋯ , K, a binary response y i1 is generated with marginal probability μ i1. Subsequently, y it is only observed for the i-th individual (i = 1, ⋯ , K) at time t (t = 2, ⋯ , T) when R it  = 1 conditional on having observed the previous t − 1 responses for that individual; in other words, conditional on \(R_{i1} = 1,\cdots \,,R_{i,t-1} = 1\). Therefore, at time t (t = 2, ⋯ , T), both Y it and R it are random variables conditional on the observed history up to time t − 1, and, as such, one of the following three events occurs:

$$\displaystyle\begin{array}{rcl} & & E_{1}: \left [R_{it} = 1,Y _{it} = 1\mid R_{i1} = \cdots = R_{i,t-1} = 1,H_{i,t-1}(y)\right ], {}\\ & & E_{2}: \left [R_{it} = 1,Y _{it} = 0\mid R_{i1} = \cdots = R_{i,t-1} = 1,H_{i,t-1}(y)\right ], {}\\ & & \mbox{ or }E_{3}: \left [R_{it} = 0\mid R_{i1} = \cdots = R_{i,t-1} = 1,H_{i,t-1}(y)\right ]\mbox{, which implies that $y_{it}$} {}\\ \end{array}$$

is not observed.

Let z its  = 1 for any s = 1, 2, 3 indicate that E s has occurred. Then, for l ≠ s, z itl  = 0, and it must be the case that \(\sum _{s=1}^{3}z_{its} = 1\). Let \(p_{its} = P(z_{its} = 1)\) for s = 1, 2, 3. If we set q = 1 in (9), and use the resulting equation in conjunction with model (1), the p its may be expressed as

$$\displaystyle\begin{array}{rcl} & & p_{it1} = P(z_{it1} = 1) = P\left [R_{it} = 1,Y _{it} = 1\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ], {}\\ & & p_{it2} = P(z_{it2} = 1) = P\left [R_{it} = 1,Y _{it} = 0\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ], {}\\ \end{array}$$

and

$$\displaystyle{p_{it3} = P(z_{it3} = 1) = P\left [R_{it} = 0\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ],}$$

which can be written as

$$\displaystyle\begin{array}{rcl} p_{it1}& =& P\left [R_{it} = 1\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ]P\left [Y _{it} = 1\mid H_{i,t-1}(y)\right ] \\ & =& g_{it}(y_{i,t-1};\alpha )\lambda _{it}, {}\end{array}$$
(38)
$$\displaystyle\begin{array}{rcl} p_{it2}& =& P\left [R_{it} = 1\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ]P\left [Y _{it} = 0\mid H_{i,t-1}(y)\right ] \\ & =& g_{it}(y_{i,t-1};\alpha )(1 -\lambda _{it}), {}\end{array}$$
(39)

and

$$\displaystyle{ p_{it3} = P\left [R_{it} = 0\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ] = 1 - g_{it}(y_{i,t-1};\alpha ), }$$
(40)

where

$$\displaystyle{ g_{it}(y_{i,t-1};\alpha ) = exp(1 +\alpha y_{i,t-1})/[1 + exp(1 +\alpha y_{i,t-1})]. }$$
(41)

Thus, Sutradhar and Mallick (2010) summarize the data generation routine for the i-th individual, i = 1, ⋯ , K, as follows (a code sketch of the routine is given after the list):

  1.

    Generate y i1 from a Bernoulli distribution with parameter μ i1.

  2.

    For any t > 1, the values of z its for s = 1, 2, 3 are realized according to the multinomial probability distribution

    $$\displaystyle{P(z_{it1},z_{it2},z_{it3}) = \frac{1!} {z_{it1}!z_{it2}!z_{it3}!}p_{it1}^{z_{it1} }p_{it2}^{z_{it2} }p_{it3}^{z_{it3} }}$$

    with \(\sum _{s=1}^{3}z_{its} = 1\). For z its  = 1, allocate the response y it following E s .

  3.

    If z it3  = 1, stop generating y it for this individual; otherwise repeat step (2) for t ≤ T.
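A sketch of this routine in code is given below; the convention of marking unobserved responses with − 1 is our own display assumption, as are the function names.

```python
import numpy as np

def generate_incomplete_data(X, beta, rho, alpha, rng):
    """Generate (R, y) jointly via the multinomial scheme (38)-(41).

    X : (K, T, p) covariates. Unobserved responses are marked -1 (illustrative)."""
    K, T, _ = X.shape
    y = -np.ones((K, T), dtype=int)
    R = np.zeros((K, T), dtype=int)
    for i in range(K):
        mu = 1.0 / (1.0 + np.exp(-(X[i] @ beta)))
        R[i, 0] = 1
        y[i, 0] = rng.binomial(1, mu[0])             # step 1
        for t in range(1, T):
            lam = mu[t] + rho * (y[i, t - 1] - mu[t - 1])
            eta = 1.0 + alpha * y[i, t - 1]
            g = np.exp(eta) / (1.0 + np.exp(eta))    # g_it from (41)
            probs = [g * lam, g * (1 - lam), 1 - g]  # p_it1, p_it2, p_it3 from (38)-(40)
            s = rng.choice(3, p=probs)               # step 2: multinomial draw
            if s == 2:                               # E_3 occurred: dropout, stop (step 3)
                break
            R[i, t] = 1
            y[i, t] = 1 if s == 0 else 0             # allocate y_it following E_1 or E_2
    return y, R
```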

3.1.2 Comparison Under Various Designs

Regarding the simulation study, for each of four designs, we set K = 100 and T = 4 and performed 1,000 replications. We considered three different values of longitudinal correlation parameter, setting ρ = 0.2, 0.5, and 0.8 in turn. In order to investigate the effect of the degree of missingness on the estimates of the regression parameter vector, for \(\Delta _{i} = diag(\delta _{i1},\delta _{i2},\cdots \,,\delta _{iT})\) in (18) with \(\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\alpha \}= R_{it}/w_{it}\), we set \(w_{it} =\prod _{ j=1}^{t}g_{ij}(y_{i,j-1};\alpha )\) according to (10) with q = 1. We then studied two levels for α, namely α = 1, and \(\alpha = -3\) (We assume throughout the simulation study that both ρ and α are known; hence, we do not concern ourselves with estimating these quantities). Note that, according to (41), \(P[R_{it} = 0\mid y_{i,t-1} = 1] = 0.12\) and \(P[R_{it} = 0\mid y_{i,t-1} = 0] = 0.27\) for α = 1, while \(P[R_{it} = 0\mid y_{i,t-1} = 1] = 0.88\) and \(P[R_{it} = 0\mid y_{i,t-1} = 0] = 0.27\) for \(\alpha = -3\). Thus, when \(\alpha = -3\), the rate of missingness is extremely high, as expected.

Initially, we compared the WGEE(I) and FSGQL(I) approaches using a stationary design that essentially contained no covariates. For this design, we simply had a single β 1 = 0. 5, while the associated x it1 = 1 for all i = 1, ⋯ , 100 and t = 1, ⋯ , 4. Table 1 presents the means and standard errors of the WGEE and FSGQL(I) estimates over the 1,000 replications for each of the six combinations of ρ and α. The number of replications that converged is also reported. When the degree of missingness is not overly severe (α = 1), there is little difference in the WGEE(I) and FSGQL(I) estimates. Both approaches produce essentially unbiased estimates, and all replications converge. However, when the degree of missingness is more pronounced \((\alpha = -3)\), the WGEE(I) estimates are significantly biased. In addition, regardless of the value of ρ, more than half of the replications did not converge. On the other hand, the FSGQL(I) estimates are still unbiased, and all replications continue to converge. We investigated the WGEE approach further by considering an AR(1) type “working” correlation structure instead of an independence assumption-based “working” correlation matrix. This WGEE approach is referred to as the WGEE(AR(1)) approach. Specifically, we set \(V _{i}{(\alpha }^{{\ast}}) = A_{i}^{1/2}R_{i}^{{\ast}}{(\alpha }^{{\ast}})A_{i}^{1/2}\), where \(R_{i}^{{\ast}}{(\alpha }^{{\ast}})\) is a T ×T correlation matrix with \(corr(Y _{it},Y _{i,t+l}) {={ \alpha }^{{\ast}}}^{l}\) and \(A_{i} = diag(\sigma _{i,11},\cdots \,,\sigma _{i,tt},\cdots \,,\sigma _{i,T_{i}T_{i}},0,\cdots \,,0)\) with \(\sigma _{i,tt} =\mu _{it}(1 -\mu _{it})\). To avoid estimation of α  ∗  we have used α  ∗  = ρ. The results obtained for each of the six combinations of ρ and α (the missing dependence parameter) are also presented in Table 1. For \(\alpha = -3\), the WGEE(AR(1)) estimates based on an AR(1) type structure are significantly better than those based on independence, and the number of replications that converged is also notably higher. Nonetheless, the independent FSGQL(I) estimates are still noticeably better than either of the WGEE estimates. Also of note is the fact that the WGEE(AR(1)) estimates based on the AR(1) structure for α = 1 are outperformed by their independent covariance structure counterparts.

Table 1 (Based on data using joint generation approach) Simulated means (SM) and standard errors (SSE) based on 1,000 simulations for β = 0. 5 and selected values of ρ and α, selected design

We also considered a stationary design consisting of one covariate with associated parameter β 1 = 0. 5. Specifically, for all t = 1, ⋯ , 4, we set \(x_{it1} = -1\) for \(i = 1,\cdots \,,K/4\), x it1 = 0 for \(i = (K/4) + 1,\cdots \,,3K/4\), and x it1 = 1 for \(i = (3K/4) + 1,\cdots \,,K\). The simulation results associated with this design are presented in Table 1 for each combination of ρ and α. When α = 1, the performance of WGEE(AR(1)), WGEE(I) and FSGQL(I) are very similar. It is also important to note that when \(\alpha = -3\), despite the fact that the average estimates for the regression parameters are better for WGEE under relatively higher longitudinal correlations of ρ = 0.5 and 0.8, the estimated standard errors are significantly smaller for the proposed FSGQL(I) technique. In addition, WGEE experiences convergence problems on a significant number of simulation replications; when an independent covariance structure is assumed, convergence rates ranged between 40% and 60%, approximately, and were only slightly better when an AR(1) structure was specified.

Two designs consisting of two covariates with associated regression parameters \(\beta _{1} =\beta _{2} = 0.5\) were also studied; one consisted of two stationary covariates, the other of nonstationary ones. For the design consisting of two stationary covariates, we set x it1 = 1 for all i = 1, ⋯ , 100 and t = 1, ⋯ , 4 as in the design with no covariate, and x it2 according to the values specified for the single covariate design described above. The two covariates in the nonstationary design were set as follows:

$$\displaystyle{x_{it1} = \left \{\begin{array}{ll} \frac{1} {2}, &\mbox{ for $i = 1,\cdots \,, \frac{K} {4} $; $t = 1,2$} \\ 0, &\mbox{ for $i = 1,\cdots \,, \frac{K} {4} $; $t = 3,4$} \\ -\frac{1} {2},&\mbox{ for $i = \frac{K} {4} + 1,\cdots \,, \frac{3K} {4} $; $t = 1$} \\ 0, &\mbox{ for $i = \frac{K} {4} + 1,\cdots \,, \frac{3K} {4} $; $t = 2,3$} \\ \frac{1} {2}, &\mbox{ for $i = \frac{K} {4} + 1,\cdots \,, \frac{3K} {4} $; $t = 4$} \\ \frac{t} {2T}, &\mbox{ for $i = \frac{3K} {4} + 1,\cdots \,,K$; $t = 1,\cdots \,,4$} \end{array} \right.}$$

and

$$\displaystyle{x_{it2} = \left \{\begin{array}{ll} \frac{t-2.5} {2T},&\mbox{ for $i = 1,\cdots \,, \frac{K} {2} $; $t = 1,\cdots \,,4$} \\ 0, &\mbox{ for $i = \frac{K} {2} + 1,\cdots \,,K$; $t = 1,2$} \\ \frac{1} {2}, &\mbox{ for $i = \frac{K} {2} + 1,\cdots \,,K$; $t = 3,4$} \end{array} \right.}$$

The results for both the stationary and non-stationary two-covariate designs are presented in Table 2. For both designs, when α = 1, the performance of WGEE(I) under an independent covariance structure and FSGQL(I) is very similar. The estimates obtained using WGEE(AR(1)) with an AR(1) structure appear to be biased. When \(\alpha = -3\), and there is a significantly higher degree of missingness, the estimates obtained under WGEE are biased regardless of the assumed covariance structure and the level of longitudinal correlation; this is particularly the case in the nonstationary design. Also of note is the fact that the convergence rates under the WGEE approach are very poor, with the majority under 5% for the nonstationary design.

Table 2 (Based on data using joint generation approach) Simulated means (SM) and standard errors (SSE) based on 1,000 simulations for \(\beta _{1} =\beta _{2} = 0.5\) and selected values of ρ and α, selected design (D) with p = 2 stationary (S) and non-stationary (NS) covariates

3.2 Comparison of WGEE(AR(1)), WGEE(I), and FSGQL(I) Approaches: Generating R and y conditionally

Because R it and y it are independent conditional on the history H i, t − 1(y), instead of generating them by using the multinomial distribution discussed in Sect. 3.1.1, one may generate them by using a conditional approach as follows (a code sketch is given after the list):

  1.

    Generate y i1 from bin(μ i1) for all i, i = 1, …, K.

  2.

    For the i-th individual, generate R i2 from bin(g i2), where g it is given by (9) for t = 2, …, T.

  3.

    If R i2 = 0, set R ij  = 0 and stop generating y ij (j = 2, ⋯ , T).

  4.

    If R i2 = 1, generate y i2 from bin(λ i2), where λ it is the mean of Y it conditional on y i, t − 1 for t = 2, …, T, as given by (1).

  5.

    Repeat from step 2 for j = 3, ⋯ , T.
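A sketch of this conditional generation scheme, with the same illustrative conventions (unobserved responses marked − 1, function names our own):

```python
import numpy as np

def generate_conditionally(X, beta, rho, alpha, rng):
    """Generate (R, y) with the conditional scheme of Sect. 3.2:
    draw R_it from bin(g_it) first, then y_it from bin(lambda_it) if observed."""
    K, T, _ = X.shape
    y = -np.ones((K, T), dtype=int)                  # -1 marks unobserved (illustrative)
    R = np.zeros((K, T), dtype=int)
    for i in range(K):
        mu = 1.0 / (1.0 + np.exp(-(X[i] @ beta)))
        R[i, 0], y[i, 0] = 1, rng.binomial(1, mu[0])           # step 1
        for t in range(1, T):
            eta = 1.0 + alpha * y[i, t - 1]
            g = np.exp(eta) / (1.0 + np.exp(eta))              # g_it from (9), q = 1
            if rng.binomial(1, g) == 0:                        # steps 2-3: dropout
                break
            R[i, t] = 1
            lam = mu[t] + rho * (y[i, t - 1] - mu[t - 1])      # step 4: conditional mean (1)
            y[i, t] = rng.binomial(1, lam)                     # step 5: move to next t
    return y, R
```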

The estimates for the same designs are obtained as in Sect. 3.1.2, and the simulation results are reported in Tables 3 and 4. The results are similar to those of Tables 1 and 2, except that the WGEE approaches appear to encounter more convergence problems, especially when the proportion of missing values is large.

Table 3 (Based on data using conditional approach) Simulated means (SM) and standard errors (SSE) based on 1,000 simulations for β = 0. 5 and selected values of ρ and α, selected design
Table 4 (Based on data using conditional approach) Simulated means (SM) and standard errors (SSE) based on 1,000 simulations for \(\beta _{1} =\beta _{2} = 0.5\) and selected values of ρ and α, selected design (D) with p = 2 stationary (S) and non-stationary (NS) covariates

3.3 Performance of CWGQL Approach: Multinomial Distribution-Based Joint Generation of R and y

As opposed to the marginal approach where the unconditional mean function μ it (β) is modeled, in the longitudinal setup it is more appropriate to model the conditional regression (mean) function. When complete longitudinal binary data follow an AR(1)-type correlation model, as pointed out in (1), the conditional regression function may be modeled as

$$\displaystyle{\lambda _{it}(\beta,\rho ) =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1}),\mbox{ for}\;t = 2,\ldots,T.}$$

Furthermore, as pointed out in (28), because in the incomplete longitudinal setup with MAR mechanism one finds

$$\displaystyle{E\left [\frac{R_{it}(Y _{it} -\lambda _{it}(\beta,\rho ))} {w_{it}} \mid H_{i,t-1}\right ] = 0,}$$

the regression parameter β in μ it (β) modeled through λ it (β, ρ) can be estimated by solving the CWGQL estimating equation (36). For the same design parameters used in Sect. 3.1.2 for Tables 1 and 2, by generating incomplete data using the multinomial distribution discussed in Sect. 3.1.1, we have obtained the CWGQL estimates for β under the same scenarios as those considered for the results shown in Tables 1 and 2. The CWGQL estimates along with their standard errors are reported in Tables 5 and 6.

Table 5 (Based on data using joint generation approach) Simulated means (SM) and standard errors (SSE) for CWGQL approach with β = 0. 5 and selected values of ρ and α, based on 1,000 simulations
Table 6 (Based on data using joint generation approach) Simulated means (SM) and standard errors (SSE) for CWGQL approach with \(\beta _{1} =\beta _{2} = 0.5\) and selected values of ρ and α, based on 1,000 simulations

In order to examine the relative performance of the CWGQL approach with those of WGEE(AR(1)), WGEE(I), and FSGQL(I), it is sufficient to compare the CWGQL approach with the FSGQL(I) approach only. This is because it was found from the results in Tables 1 and 2 that the WGEE approaches may encounter serious convergence problems (showing consistency breakdown) and also may produce highly biased estimates, whereas the FSGQL(I) approach, in general, does not encounter such convergence problems and produces almost unbiased estimates even if a large proportion of values are missing. The CWGQL approach appears to produce slightly more efficient estimates than the FSGQL(I) approach. For example, when α = 1 (moderate missingness) and ρ = 0. 5, in the no-covariate case, the FSGQL(I) approach (Table 1) produces an average estimate of 0. 504 for β = 0. 5 with standard error 0. 169, whereas the CWGQL approach (Table 5) produces an average estimate of 0. 501 with standard error 0. 158. Similarly, when ρ = 0. 8 and \(\alpha = -3\) (high missingness), in the stationary one-covariate case, FSGQL(I) produces an estimate with standard error 0. 325 as compared to 0. 291 for CWGQL. Similar results are found for the stationary two-covariate case. Also, in these stationary cases, the CWGQL approach does not encounter any convergence problems even if the proportion of missing values is high. In the non-stationary cases, however, the CWGQL approach encounters some convergence problems when the proportion of missing values is high, but the problem is less serious than for the WGEE and WGEE(I) approaches.

4 Conclusion and Discussion

It was found that the existing WGEE (Robins et al. 1995) and WGEE(I) approaches in general encounter convergence problems when the proportion of missing values is high, and the WGEE approach may produce highly biased estimates even when the proportion of missing values is moderate or low. These results agree with the recent study reported by Sutradhar and Mallick (2010). The WGEE(I) approach, however, produces almost unbiased, and hence consistent, estimates when the proportion of missing values is moderate or low, although it can be inefficient. The proposed FSGQL(I) approach does not appear to encounter any serious convergence problems even when the proportion of missing values is high and the covariates are non-stationary. Also, it produces unbiased estimates similar to the WGEE(I) approach but with smaller standard errors, showing that FSGQL(I) is, as expected, more efficient than the WGEE(I) approach. Thus, even with a high proportion of missing values, one may reliably use the proposed FSGQL(I) approach for regression estimation whether the covariates are stationary or time dependent. The general FSGQL approach is expected to increase the efficiency as compared to the FSGQL(I) approach when correlations are large, but this will be studied in the future.

We have also reported some results on the performance of a conditional estimating equation approach, namely the CWGQL estimating equation approach. This approach was found to produce more efficient regression estimates than the FSGQL(I) approach. As compared to the FSGQL(I) approach, however, it encounters convergence problems when covariates are time dependent and the proportion of missing values is high, although it experiences fewer convergence problems than the WGEE(I) approach.