Keywords

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

1 Introduction

Missing data analysis in the independent setup with multivariate responses has a long history. For example, for an early work, we refer to Lord (1995) who considered a set of incomplete trivariate normal responses collected from K independent individuals. But all components of all three variables were not available from the K individuals. To estimate the mean parameters consistently, instead of dropping out the individuals with incomplete information, Lord (1995) has utilized the available information and constructed unbalanced (bivariate and trivariate) probability functions for individuals toward writing a likelihood function for the desired inference. Note that this technique for consistent estimation of the parameters and other similar inferences by using incomplete data have been used by many researchers over the last six decades. See, for example, Mehta and Gurland (1973), Morrison (1973), Naik (1975), Little (1988), and Krishnamoorthy and Pannala (1999), among others.

In the independent setup, techniques of imputation and multiple imputation (Rubin 1976; Rubin and Schenker 1986; Meng 1994) have also been widely used. Some authors such as Paik (1997) used this imputation technique in repeated measure (longitudinal) setup. The imputation at a given time point is done mainly by averaging over the responses of other individuals at that time who has the same covariates history as that of the individual concerned. Once the missing values are estimated, they are used as data with necessary adjustments to construct complete data based estimating equations for the desired parameters.

In a univariate longitudinal response setup, when T repeated measures are taken they become correlated and hence they jointly follow a T-dimensional multivariate distribution. However, unlike in the Gaussian setup for linear data, the multivariate distributions for repeated binary and count data become complex or impractical. However if a portion of individuals do not provide responses for all T time points, then adopting likelihood approach by blending missing mechanism and correlation structure of the repeated data would naturally become extremely complicated or impossible. As a remedy, either imputation or estimating equation approaches became popular which, however, work well if the missing data occur following the simplest MCAR mechanism. When the missing data occur following the MAR mechanism, writing a proper estimating equation by accommodating both longitudinal correlations and missing mechanism becomes difficult. Robins et al. (1995) proposed an inverse probability weights based generalized estimating equations (WGEE) approach as an extension of the GEE approach proposed by Liang and Zeger (1986) to the incomplete setup. Remark that as demonstrated by Sutradhar and Das (1999) and Sutradhar (2010), for example, the GEE approach can produce less efficient regression estimates than the well-known simpler moments or quasi-likelihood (QL) estimates, in the complete data setup. Thus, to be realistic, there is no reason how WGEE approach can be more efficient in the incomplete longitudinal setup as compared to simpler moments and QL estimates. In fact in the incomplete longitudinal setup, the WGEE approach constructed based on working correlations as opposed to the use of MAR based correlation matrix may yield biased and hence inconsistent regression estimates (Sutradhar and Mallick 2010). Further remark that this inconsistency issue was, however, not adequately addressed in the literature including the studies by Robins et al. (1995), Paik (1997), Rotnitzky et al. (1998), and Birmingham et al. (2003). One of the main reasons is this that none of the studies used any stochastic correlation structure in conjunction with the missing mechanism to model the longitudinal count and binary data in the incomplete longitudinal setup. Details on this inconsistency problem are given in Sect. 3, whereas in Sect. 2 we provide a detailed discussion on missing data analysis in independent setup.

Without realizing the aforementioned inconsistency problems that can be caused because of the use of working correlations in the estimating equations under the MAR based longitudinal setup, some authors such as Wang (1999) and Rotnitzky et al. (1998) used similar estimating equations approach in non-ignorable missing mechanism-based incomplete longitudinal setup. Some authors such as Troxel et al. (1998) (see also Troxel et al. 1997) and Ibrahim et al. (2001) (see also Ibrahim et al. 1999) have used random effects based generalized linear mixed model to accommodate the longitudinal correlations and certain binary logistic models to generate the non-ignorable mechanism based response indicator variables. In general expectation-maximization (EM) techniques are used to estimate the likelihood based parameters. These approaches appear to encounter similar difficulties as the existing MAR based approaches in generating first the response indicator and then the responses so that underlying longitudinal correlation structure is satisfied. Thus the inference validity of these approaches is not yet established. This problem becomes more complicated when longitudinal correlations are not generated through random effects and writing a likelihood such as for repeated count data becomes impossible. For clarity, in this paper we discuss in detail the successes and challenges with the inferences for MAR based incomplete longitudinal models only. The non-ignorable missing data based longitudinal analysis will therefore be beyond the scope of the paper.

2 Missing Data Analysis in Independent Setup

Missing data analysis in the independent setup with multivariate responses has a long history. For example, for an early work, we refer to Lord (1995) who considered a set of incomplete trivariate normal responses collected from K independent individuals. To be specific, suppose that \(y = (y_{1},y_{2},y_{3})^{\prime} \) represents a trivariate response, but all components of y were not available from K individuals. Suppose that y 3 was recorded from all K individuals, and either y 1 or y 2 was recorded for all individuals, but not both. For j = 1, , 3, let K j denote the number of individuals having the response y j . It then follows that

$$\displaystyle{K_{1} + K_{2} = K,\;K3 = K.}$$

Further suppose that the K 1 individuals for whom y 1 is recorded will be denoted collectively as group 1 (G 1); and the K 2 individuals with y 2 will be denoted as group 2 (G 2). Now because y 1 and y 2 are correlated, it is obvious that the data for G 2 contain some information relevant for estimating the parameters of variable y 1, and that the data for G 1 contain some information relevant for estimating the parameters of y 2. The problem is to use the available data as efficiently as possible for estimating the parameters concerned. Denote the distribution of \(y = [y_{1},\;y_{2},\;y_{3}]^{\prime} \) as

$$\displaystyle{y \sim N(\mu,\Sigma ),}$$

with \(\mu = [\mu _{1},\mu _{2},\mu _{3}]^{\prime} \) and

$$\displaystyle{\Sigma = \left (\begin{array}{lll} \sigma _{11} & \rho _{12}{[\sigma _{11}\sigma _{22}]}^{\frac{1} {2} } & \rho _{13}{[\sigma _{11}\sigma _{33}]}^{\frac{1} {2} } \\ & \sigma _{22} & \rho _{23}{[\sigma _{22}\sigma _{33}]}^{\frac{1} {2} } \\ & & \sigma _{33}\\ \end{array} \right ).}$$

Note that in this setup, there are no data available to estimate ρ 12. For the likelihood estimation of all the other parameters, define

$$\displaystyle\begin{array}{rcl} \bar{y}_{1}^{{\ast}}& =& \frac{1} {K_{1}}\sum _{i=1}^{K_{1} }y_{1i},\;\bar{y}_{2}^{{\ast}} = \frac{1} {K_{2}}\sum _{i=1}^{K_{2} }y_{2i},\;\bar{y}_{3} = \frac{1} {K}\sum _{i=1}^{K}y_{ 3i},\;\bar{y}_{3}^{{\ast}} = \frac{1} {K_{1}}\sum _{i=1}^{K_{1} }y_{3i},\;\bar{y}_{3}^{{\ast}{\ast}} = \frac{1} {K_{2}}\sum _{i=1}^{K_{2} }y_{3i} \\ s_{11}^{{\ast}}& =& \frac{1} {K_{1}}\sum _{i=1}^{K_{1} }{[y_{1i} -\bar{ y}_{1}^{{\ast}}]}^{2},\;s_{ 22}^{{\ast}} = \frac{1} {K_{2}}\sum _{i=1}^{K_{2} }{[y_{2i} -\bar{ y}_{2}^{{\ast}}]}^{2},\;s_{ 33} = \frac{1} {K}\sum _{i=1}^{K}{[y_{ 3i} -\bar{ y}_{3}]}^{2}, \\ s_{33}^{{\ast}}& =& \frac{1} {K_{1}}\sum _{i=1}^{K_{1} }{[y_{3i} -\bar{ y}_{3}^{{\ast}}]}^{2},\;s_{ 33}^{{\ast}{\ast}} = \frac{1} {K_{2}}\sum _{i=1}^{K_{2} }{[y_{3i} -\bar{ y}_{3}^{{\ast}{\ast}}]}^{2} \\ r_{13}& =& \frac{1} {K_{1}}\sum _{i=1}^{K_{1} }[(y_{1i} -\bar{ y}_{1}^{{\ast}})(y_{ 3i} -\bar{ y}_{3}^{{\ast}})]/{[s_{ 11}^{{\ast}}s_{ 33}^{{\ast}}]}^{\frac{1} {2} }, \\ r_{23}& =& \frac{1} {K_{2}}\sum _{i=1}^{K_{2} }[(y_{2i} -\bar{ y}_{2}^{{\ast}})(y_{ 3i} -\bar{ y}_{3}^{{\ast}{\ast}})]/{[s_{ 22}^{{\ast}}s_{ 33}^{{\ast}{\ast}}]}^{\frac{1} {2} }. {}\end{array}$$
(1)

The maximum likelihood estimators for the means are then given by

$$\displaystyle{ \hat{\mu }_{1} =\bar{ y}_{1}^{{\ast}}- b_{ 13}[\bar{y}_{3}^{{\ast}}-\bar{ y}_{ 3}],\;\hat{\mu }_{2} =\bar{ y}_{2}^{{\ast}}- b_{ 23}[\bar{y}_{3}^{{\ast}{\ast}}-\bar{ y}_{ 3}],\;\mbox{ and}\;\hat{\mu }_{3} =\bar{ y}_{3}, }$$
(2)

where

$$\displaystyle{b_{13} = r_{13}\frac{s_{11}^{{\ast}}} {s_{33}^{{\ast}}},\;\mbox{ and}\;b_{23} = r_{23} \frac{s_{22}^{{\ast}}} {s_{33}^{{\ast}{\ast}}}.}$$

These estimators in (2) are unbiased and consistent for \(\mu _{1},\;\mu _{2},\;\mbox{ and}\;\mu _{3}\), respectively. The remaining parameters may also be estimated similarly (Lord 1995).

Note that the aforementioned technique for consistent estimation of the parameters and for other similar inferences by using incomplete data has been subsequently used by many researchers over the last six decades. See, for example, Mehta and Gurland (1973), Morrison (1973), Naik (1975), Little (1988), and Krishnamoorthy and Pannala (1999), among others. This idea of making inferences about the underlying model parameters such that the missing data (assuming a small proportion of missing) may not to any major extent negatively influence the inferences has also been extended to the analysis of incomplete repeated measure data. For example, one may refer to Little (1995), Robins et al. (1995), and Paik (1997), as some of the early studies. This inference procedure for incomplete longitudinal data is discussed in detail in the next section.

In the independent setup, techniques of imputation and multiple imputation (Rubin 1976; Rubin and Schenker 1986; Meng 1994) have also been widely used. Later on some authors also used this imputation technique in repeated measure (longitudinal) setup. For example, here we illustrate an imputation formula from Paik (1997) in repeated measure setup. The imputation at a given time point is done mainly by averaging over the responses of other individuals at that time who has the same covariates history as that of the individual concerned. Once the missing values are estimated, they are used as data with necessary adjustments to construct complete data based estimating equations for the desired parameters.

In a univariate longitudinal response setup, when T repeated measures are taken they become correlated and hence they jointly follow a T-dimensional multivariate distribution. Now suppose that T i responses are observed for the ith (i = 1, , K) individual. So, one requires to impute T − T i missing values which may be done following Paik (1997), for example. Interestingly, a unified recursive relation can be developed as follows to obtain the imputed value \(\tilde{y}_{i,T_{i}+k_{i}}\) at time point \(T_{i} + k_{i}\) for all \(k_{i} = 1,\ldots,T - T_{i}\). For this, first define

$$\displaystyle{ \tilde{y}_{j,T_{i}+k_{i}}^{(0)} = y_{ j,T_{i}+k_{i}} }$$
(3)

for the jth individual where ji, j = 1, , K. Also, let \(D_{iT_{i}}\) denote the covariate history up to time point T i for the ith individual, and

$$\displaystyle{D_{i,T_{i}+k_{i}}^{{\ast}} = (x_{ i,T_{i}+1},\ldots,x_{i,T_{i}+k_{i}})}$$

is the covariate information for the ith individual from time T i  + 1 up to \(T_{i} + k_{i}\) for \(k_{i} = 1,\ldots,T - T_{i}\). Further let, \(r_{jw} = 1,\mbox{ or},0\), for example, indicates the response status of the jth individual at wth time. One may then obtain \(\tilde{y}_{i,T_{i}+k_{i}}\) by computing \(\tilde{y}_{i,T_{i}+k_{i}}^{(k_{i})}\), that is, \(\tilde{y}_{i,T_{i}+k_{i}} \equiv \tilde{ y}_{i,T_{i}+k_{i}}^{(k_{i})}\), where

$$\displaystyle\begin{array}{rcl} & & \tilde{y}_{i,T_{i}+k_{i}}^{(k_{i})} = \left [\sum _{ j=1}^{K}\tilde{y}_{ j,T_{i}+k_{i}}^{(0)}\Pi _{ u=1}^{k_{i} }r_{j,T_{i}+u}I(D_{jT_{i}} = D_{iT_{i}},D_{j,T_{i}+k_{i}}^{{\ast}} = D_{ i,T_{i}+k_{i}}^{{\ast}})\right. \\ & +& \sum _{m_{i}=1}^{k_{i}-1}\sum _{ j=1}^{K}\tilde{y}_{ j,T_{i}+k_{i}}^{(m_{i})}\Pi _{ u=k_{i}-(m_{i}-1)}^{k_{i} }(1 - r_{j,T_{i}+u}) \\ & & \times \left.\Pi _{u=1}^{k_{i}-m_{i} }r_{j,T_{i}+u}I(D_{jT_{i}} = D_{iT_{i}},D_{j,T_{i}+k_{i}}^{{\ast}} = D_{ i,T_{i}+k_{i}}^{{\ast}})\right ] \\ & \times &{ \left [\sum _{j=1}^{K}r_{ j,T_{i}+1}I(D_{jT_{i}} = D_{iT_{i}},D_{j,T_{i}+k_{i}}^{{\ast}} = D_{ i,T_{i}+k_{i}}^{{\ast}})\right ]}^{-1}. {}\end{array}$$
(4)

Note that \(\tilde{y}_{i.T_{i}+k_{i}} \equiv \tilde{ y}_{i,T_{i}+k_{i}}^{(k_{i})}\) is an unbiased estimate of \(\mu _{i,T_{i}+k_{i}}\) as the individuals used to impute the missing value of the ith subject has the same covariate history up to time point \(T_{i} + k_{i}\), unlike the covariate history up to time point T i (Paik 1997).

3 Missing Data Models in Longitudinal Setup

Let Y it be the potential response from the ith (i = 1, , K) individual at time point t which may or may not be observed, and \(x_{it} = (x_{it1},\ldots,x_{itp})^{\prime} \) be the corresponding p-dimensional covariate vector which is assumed to be available for all times t = 1, , T. In this setup, K is large (K) and T is small such as 3 or 4. Suppose that \(\beta = (\beta _{1},\ldots,\beta _{p})^{\prime} \) denote the effect of x it on y it . Irrespective of the situation whether Y it is observed or not, it is appropriate in the longitudinal setup to assume that the repeated responses follow a correlation model with known functional forms for the mean and the variance, but the correlation structure may be unknown. Recall that in the independent setup, Lord (1995) considered multivariate responses having a correlation structure and incompleteness arose because of missing information on some response variables, whereas in the present longitudinal setup, repeated responses from an individual form a multivariate response with a suitable mean, variance, and correlation structures, but it remains a possibility that one individual may not provide responses for the whole duration of the study. As indicated in the last section, suppose that for the ith (i = 1, , K) individual T i responses (1 < T i  ≤ T) are collected. Also suppose that the remaining T − T i potential responses are missing and the non-missing responses occur in a monotonic pattern.

As far as the mean, variance, and correlation structure of the potential responses are concerned, it is convenient to define them for the complete data. Let \({y_{i}}^{c} = {(y_{i1},\cdots \,,y_{it},\cdots \,,y_{iT})}^{{\prime}}\) and \({X_{i}}^{c} = {(x_{i1},\cdots \,,x_{it},\cdots \,,x_{iT})}^{^{\prime} }\) denote the T ×1 complete outcome vector and T ×p covariate matrix, respectively, for the i-th (i = 1, ⋯ , K) individual over T successive points in time. Also, let

$$\displaystyle{ E({Y _{i}}^{c}\vert {x_{ i}}^{c}) ={ \mu _{ i}}^{c}(\beta ) = {(\mu _{ i1}(\beta ),\cdots \,,\mu _{it}(\beta ),\cdots \,,\mu _{iT}(\beta ))}^{^{\prime} } }$$
(5)

where \(\mu _{it}(\beta ) = {h}^{-1}(\eta _{it})\;\;\mbox{ with}\;\;\eta _{it} = x_{it}^{^{\prime} }\beta\), h being a suitable link function. For example, for linear models, a linear link function is used so that \(\mu _{it}(\beta ) = x^{\prime} _{it}\beta;\) whereas for the binary data a logistic link function is commonly used so that \(\mu _{it}(\beta ) =\exp (\eta _{it})/[1 +\exp (\eta _{it})]\), and for count data a log linear link function is used so that \(\mu _{it}(\beta ) =\exp (\eta _{it})\). Further let

$$\displaystyle{ \Sigma _{i}^{c}(\beta,\rho ) ={ A_{ i}^{c}}^{\frac{1} {2} }(\beta )\tilde{C}_{i}(\rho,x_{i}^{c}){A_{i}^{c}}^{\frac{1} {2} }(\beta ) }$$
(6)

be the true covariance matrix of \(y_{i}^{c}\), where \(A_{i}^{c}(\beta ) = \mbox{ diag}[\sigma _{i11}(\beta ),\cdots \,,\sigma _{itt}(\beta ),\cdots \,,\sigma _{iTT}(\beta )]\) with \(\sigma _{itt}(\beta ) = var(Y _{it})\), and \(\tilde{C}_{i}(\rho,x_{i}^{c})\) is the correlation matrix for the ith individual with ρ as a suitable vector of correlation parameters, for example, \(\rho \equiv (\rho _{1},\ldots,\rho _{\ell},\ldots,\rho _{T-1})^{\prime} \), where ρ is known to be the lag auto-correlation. Note that when covariates are time dependent, the true correlation matrix is free from time-dependent covariates in linear longitudinal setup, but it depends on the time-dependent covariates through \(X_{i}^{c}\) in the discrete longitudinal setup (Sutradhar 2010). In the stationary case, that is, when covariates are time independent, we will denote the correlation matrix by \(\tilde{C}(\rho )\) in the complete longitudinal setup, and similar to Sutradhar (2010, 2011), this matrix satisfies the auto-correlation structure given by

$$\displaystyle{ \tilde{C}(\rho ) = \left [\begin{array}{ccccc} 1 & \rho _{1} & \rho _{2} & \cdots &\rho _{T-1} \\ \rho _{1} & 1 & \rho _{1} & \cdots &\rho _{T-2}\\ \vdots & \vdots & \vdots & & \vdots \\ \rho _{T-1} & \rho _{T-2} & \rho _{T-3} & \cdots & 1\\ \end{array} \right ], }$$
(7)

where for = 1, , T, ρ is known to be the th lag auto-correlation. Note that when this correlation structure (7) will be used in the incomplete longitudinal setup, it would be denoted by \(\tilde{C_{i}}(\rho )\) as it will be constructed for T i available responses.

As far as the missing mechanism is concerned, it is customary to assume that a longitudinal response may be missing completely at random (MCAR), or missing at random (MAR), or the missing can be non-ignorable. Under the MCAR mechanism, the missing-ness does not depend on any present, past, or future responses. Under the MAR mechanism, the missing-ness depends only on the past responses but not on the present or future responses, whereas under the non-ignorable mechanism the missing-ness depends on the past, present, and future possible responses. In notation, let R it be a response indicator variable at time t (t = 1, ⋯ , T) for the i-th (i = 1, ⋯ , K) individual, so that

$$\displaystyle{ R_{it} = \left \{\begin{array}{ll} 1,&\mbox{ if}\ Y _{it}\mbox{ is observed} \\ 0,&\mbox{ otherwise.} \end{array} \right. }$$
(8)

Note that all individuals provide the responses at the first time point t = 1. Thus, we set R i1 = 1 with \(P(R_{i1} = 1) = 1.0\) for all i = 1, ⋯ , K. Further we assume that the response indicators satisfy the monotonic relationship

$$\displaystyle{ R_{i1} \geq R_{i2} \geq \cdots \geq R_{it} \geq \cdots \geq R_{iT}. }$$
(9)

Next suppose that r it denote the observed value for R it . For \(t = 2,\ldots,T,\) one may then describe the aforementioned three missing mechanisms as

$$\displaystyle\begin{array}{rcl} \mbox{ MCAR Model}&: & Pr(R_{it} = 1\mid {y_{i}}^{c},x_{ i},r_{i,t-1} = 1) = Pr(R_{it} = 1\mid r_{i,t-1} = 1) {}\\ \mbox{ MAR Model}&: & Pr(R_{it} = 1\mid {y_{i}}^{c},x_{ i},r_{i,t-1} = 1) {}\\ & =& Pr(R_{it} = 1\mid y_{i1},\cdots \,,y_{i,t-1},x_{i},r_{i,t-1} = 1) {}\\ \mbox{ Non-ignorable Model}&: & Pr(R_{it} = 1\mid {y_{i}}^{c},x_{ i},r_{i,t-1} = 1) {}\\ & =& Pr(R_{it} = 1\mid y_{i1},\cdots \,,y_{i,t-1},y_{it},\ldots,y_{iT},x_{i},r_{i,t-1} = 1) {}\\ \end{array}$$

(Little and Rubin 1987; Laird 1988; Fitzmaurice et al. 1996). Furthermore, it follows under the monotonic missing pattern (9) that \(Pr(R_{it} = 1\vert y_{i}^{c},x_{i},r_{i,t-1} = 0) = 0\) irrespective of the missing mechanism. Note that the inferences based on the non-ignorable missing mechanism may be quite complicated, and we do not include this complicated mechanism in the current paper.

3.1 Inferences When Longitudinal Responses Are Subject to MCAR

When the longitudinal responses are MCAR, R it does not depend on the past, present, or future responses. In such a situation, R it and Y it are independent, implying that

$$\displaystyle{ E[R_{it}(Y _{it} -\mu _{it}(\beta ))] = E[R_{it}]E[Y _{it} -\mu _{it}(\beta )] = 0, }$$
(10)

because \(E[Y _{it} -\mu _{it}(\beta )] = 0\). It is then clear that the inference for β involved in μ it (β) is not affected by the MCAR mechanism. Thus, one may estimate the regression effects β consistently and efficiently by solving the GQL estimating equation

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \mu _{i}^{^{\prime} }(\beta )} {\partial \beta } \Sigma _{i}^{-1}(\beta,\hat{\rho })(y_{ i} -\mu _{i}(\beta )) = 0, }$$
(11)

where for T i -dimensional observed response vector \(y_{i} = (y_{i1},\ldots,y_{iT_{i}})^{\prime} \),

$$\displaystyle\begin{array}{rcl} \mu _{i}(\beta ) = E[Y _{i}]& =& {(\mu _{i1}(\beta ),\cdots \,,\mu _{it}(\beta ),\cdots \,,\mu _{iT_{i}}(\beta ))}^{^{\prime} } {}\\ \Sigma _{i}(\beta,\hat{\rho })& =& A_{i}^{1/2}(\beta )\tilde{C}_{ i}(\rho,x_{i})A_{i}^{1/2}(\beta ), {}\\ \end{array}$$

with \(A_{i}(\beta ) = \mbox{ diag}(\sigma _{i,11}(\beta ),\cdots \,,\sigma _{i,tt}(\beta ),\cdots \,,\sigma _{i,T_{i}T_{i}}(\beta ))\), where \(\sigma _{i,tt}(\beta ) = \mbox{ var}[Y _{it}]\). Note that the incomplete data based estimating equation (11) can be written in terms of pretended complete data. To be specific, by using the available responses \(y_{i} = (y_{i1},\ldots,y_{iT_{i}})^{\prime} \) corresponding to the known response indicators

$$\displaystyle{R_{i}^{c} = r_{ i}^{c} = \left [\begin{array}{cc} I_{T_{i}} & 0 \\ 0 &0, \end{array} \right ]}$$

one may write the GQL estimating equation (11) under the MCAR mechanism as

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial {\mu _{i}^{c}}^{^{\prime} }(\beta )} {\partial \beta }{ \left [\{I - r_{i}^{c}\} + r_{ i}^{c}\Sigma _{ i}^{c}(\beta,\hat{\rho }){r_{ i}^{c}}^{^{\prime} }\right ]}^{-1}r_{ i}^{c}\left (y_{ i}^{c} -\mu _{ i}^{c}(\beta )\right ) = 0, }$$
(12)

where \(y_{i}^{c} = {(y_{i}^{^{\prime} },y_{im}^{^{\prime} })}^{^{\prime} }\) with y im representing the T − T i dimensional missing responses which are unobserved but for the computational purpose in the present approach one can use it as a zero vector, for convenience, without any loss of generality. Let \(\hat{\beta }_{GQL,MCAR}\) denote the solution of (11) or (12). This estimator is asymptotically unbiased and hence consistent for β.

Note that the computation of \(\tilde{C}_{i}(\hat{\rho },x_{i})\) matrix in (11) in general, i.e., when covariates are time dependent, depends on the specific correlation structure (Sutradhar 2010). In stationary cases as well as in linear longitudinal model setup, one may, however, compute the stationary correlation matrix \(\tilde{C}_{i}(\hat{\rho })\), by first computing a larger \(\tilde{C}(\hat{\rho })\) matrix for \(\ell= 1,\ldots,T - 1\), and then using the desired part of this large matrix for t = 1, , T i . Turning back to the computation for the larger matrix with dimension \(T = \mbox{ max}_{1\leq i\leq K}T_{i}\) for T i  ≥ 2, we exploit the observed response indicator r it given by

$$\displaystyle\begin{array}{rcl} r_{it}& =& \left \{\begin{array}{ll} 1&\mbox{ if}\;t \leq T_{i} \\ 0&\mbox{ if}\;T_{i} < t \leq T. \end{array} \right.{}\\ \end{array}$$

for all t = 1, , T. For known β and σ i t t , the th lag correlation estimate \(\hat{\rho }_{\ell}\) for the larger \(\tilde{C}(\hat{\rho })\) matrix may be computed as

$$\displaystyle{ \hat{\rho }_{\ell} = \frac{\sum _{i=1}^{K}\sum _{t=1}^{T-\ell}r_{it}r_{i,t+\ell}[\left (\frac{y_{it}-x^{\prime} _{it}\beta } {\sigma _{itt}} \right )\left (\frac{y_{i,t+\ell}-x^{\prime} _{it,t+\ell}\beta } {\sigma _{i,t+\ell,t+\ell}} \right )]/\sum _{i=1}^{K}\sum _{t=1}^{T-\ell}r_{it}r_{i,t+\ell}} {\sum _{i=1}^{K}\sum _{t=1}^{T}r_{it}{[\frac{y_{it}-x^{\prime} _{it}\beta } {\sigma _{itt}} ]}^{2}/\sum _{i=1}^{K}r_{it}}, }$$
(13)

(cf. Sneddon and Sutradhar 2004, eqn. (16)) for \(\ell= 1,\ldots,T - 1\). Note that as this estimator contains \(\hat{\beta }_{GQL,MCAR}\), both (11) and (13) have to be computed iteratively until convergence.

Further note that in the existing GEE approach, instead of (11), one solves the estimating equation

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \mu _{i}^{^{\prime} }(\beta )} {\partial \beta } V _{i}^{-1}(\beta,\hat{\alpha })(y_{ i} -\mu _{i}(\beta )) = 0, }$$
(14)

[Liang and Zeger 1986] where \(V _{i}(\beta,\hat{\alpha }) = A_{i}^{1/2}(\beta )Q_{i}(\alpha )A_{i}^{1/2}(\beta )\), with Q i (α) as the \(T_{i} \times T_{i}\) “working” correlation matrix of y i . It is, however, known that this GEE approach may sometimes encounter consistency breakdown (Crowder 1995) because of the difficulty in estimating the “working” correlation or covariance structure, leading to the failure of estimation of β or the non-convergence of β estimator to β. Furthermore, even if GEE β estimate becomes consistent, it may produce inefficient estimate than simpler independence assumption based moment or quasi-likelihood (QL) estimate (Sutradhar and Das 1999; Sutradhar 2011). Thus, one should be clear from these points that the GEE approach even if corrected for missing mechanism may encounter similar consistency and inefficiency in estimating the regression parameters.

We also remark that even though the non-response probability is not affected by the past history under the MCAR mechanism, the respective efficiency of GQL and GEE estimators will decrease if T i is very small as compared to the attempted complete duration T, that is, if T − T i is large. As far as the value of T i is concerned, it depends on the probability, P[R it = 1] which in general decreases due to the monotonic condition (9). This is because under this monotonic property (9) and following MCAR mechanism, one writes

$$\displaystyle\begin{array}{rcl} Pr[R_{it} = 1]& \equiv & Pr[R_{i1} = 1,R_{i2} = 1,\ldots,R_{it} = 1] \\ & =& \Pi _{j=2}^{t}P[R_{ ij} = 1], {}\end{array}$$
(15)

which gets smaller as t gets larger, implying that T i can be small as compared to T if P[R i j = 1] is far away down from 1 such as \(P[R_{ij} = 1] = 0.90\), say.

3.2 Inferences When Longitudinal Responses Are Subject to MAR

Unlike in the MCAR case, R it and y it are not independent under the MAR mechanism. That is

$$\displaystyle{ E[R_{it}(Y _{it} -\mu _{it}(\beta ))]\neq 0\;\;\mbox{ under MAR}. }$$
(16)

This is because

$$\displaystyle\begin{array}{rcl} & & E\left [R_{it}(Y _{it} -\mu _{it}(\beta ))\mid H_{i,t-1}(y)\right ] \\ & =& E_{Y _{it}}E\left [R_{it}(Y _{it} -\mu _{it}(\beta ))\mid Y _{it},H_{i,t-1}(y)\right ] \\ & =& E_{Y _{it}}\left [\{(Y _{it} -\mu _{it}(\beta ))\vert H_{i,t-1}(y)\}E\{R_{it}\vert Y _{it},H_{i,t-1}(y)\}\right ] \\ & =& E_{Y _{it}}\left [\{(Y _{it} -\mu _{it}(\beta ))\vert H_{i,t-1}(y)\}E\{R_{it}\vert H_{i,t-1}(y)\}\right ] {}\end{array}$$
(17)

as R it does not depend on Y it by the definition of MAR.

Next due to the monotonic property (9) of the response indicators

$$\displaystyle\begin{array}{rcl} & & E\left [R_{it}\mid H_{i,t-1}(y)\right ] \\ & =& P\left [R_{i1} = 1,R_{i2} = 1,\cdots \,,R_{i,t-1} = 1,R_{it} = 1\vert H_{i,t-1}(y)\right ] \\ & =& P(R_{i1} = 1)P\left [R_{i2} = 1\mid R_{i1} = 1;H_{i1}(y)\right ]\cdots \\ & \times & P\left [R_{it} = 1\mid R_{i1} = 1,\cdots \,,R_{i,t-1} = 1;H_{i,t-1}(y)\right ] \\ & =& \prod _{j=1}^{t}g_{ ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\gamma ) \\ & =& w_{it}\{H_{i,t-1}(y);\gamma \}, {}\end{array}$$
(18)

and

$$\displaystyle{ E_{Y _{it}}\left [(Y _{it} -\mu _{it}(\beta ))\vert H_{i,t-1}(y)\right ] = (\lambda _{it}(H_{i,t-1}(y),\beta,\rho ) -\mu _{it}(\beta )), }$$
(19)

where \(\lambda _{it}(H_{i,t-1}(y),\beta,\rho )\) is the conditional mean of Y it . In (18), one may, for example, use g ij (γ) as

$$\displaystyle\begin{array}{rcl} g_{ij}(\gamma )& =& Pr[(R_{ij} = 1)\vert R_{i1} = 1,\ldots,R_{i,j-1} = 1,H_{i,j-1}(y)] \\ & =& \frac{\exp (1 +\gamma y_{i,j-1})} {1 +\exp (1 +\gamma y_{i,j-1})}. {}\end{array}$$
(20)

Now because both \(w_{it}\{H_{i,t-1}(y);\gamma \}\) and \(\lambda _{it}(H_{i,t-1}(y),\beta,\rho )\) are functions of the past history of responses H i, t − 1(y), and because

$$\displaystyle{ E_{H_{i,t-1}(y)}[\lambda _{it}(H_{i,t-1}(y),\beta,\rho ) -\mu _{it}(\beta )] = 0, }$$
(21)

it then follows from (17), by (18) and (19), that

$$\displaystyle{ E[R_{it}(Y _{it} -\mu _{it}(\beta ))] = E_{H_{i,t-1}(y)}E\left [R_{it}(Y _{it} -\mu _{it}(\beta ))\mid H_{i,t-1}(y)\right ]\neq 0, }$$
(22)

unless \(w_{it}\{H_{i,t-1}(y);\gamma \}\) is a constant free of H i, t − 1(y), which is, however, impossible under MAR missing mechanism as opposed to the MCAR mechanism. Thus, \(E[R_{it}\{Y _{it} -\mu _{it}(\beta )\}]\neq 0\).

3.2.1 Existing Partially Standardized GEE Estimation for Longitudinal Data Subject to MAR

Note, however, that

$$\displaystyle\begin{array}{rcl} & & E\left \{ \frac{R_{it}} {w_{it}\{H_{i,t-1}(y);\gamma \}}(Y _{it} -\mu _{it}(\beta ))\right \} \\ & =& E_{H_{i,t-1}(y)}\left [\left (\lambda _{it}(H_{i,t-1}(y),\beta,\rho ) -\mu _{it}(\beta )\right )\right ] = 0.{}\end{array}$$
(23)

Now suppose that

$$\displaystyle{\Delta _{i} = \mbox{ diag}[\delta _{i1},\delta _{i2},\cdots \,,\delta _{iT_{i}}]\;\mbox{ with}\;\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\gamma \}}$$

implying that \(E[\Delta _{i}\vert H_{i}(y)] = I_{T_{i}}\), and where H i (y) is used to denote appropriate past history showing that the response indicators are generated based on observed responses only.

By observing the unconditional expectation property from (23), in the spirit of GEE [Liang and Zeger 1986], Robins et al. (1995, eqn. (10), p. 109) proposed a conditional inverse weights based PSGEE for the estimation of β which has the form

$$\displaystyle\begin{array}{rcl} & & \sum _{i=1}^{K}\frac{\partial E_{H_{i}(y)}E[\{\Delta _{i}\mu _{i}(\beta )\}^{\prime} \vert H_{i}(y)]} {\partial \beta } V _{i}^{-1}(\hat{\alpha })\{\Delta _{ i}(y_{i} -\mu _{i}(\beta ))\vert H_{i}(y)\} \\ & & =\sum _{ i=1}^{K}\frac{\partial \{\mu _{i}(\beta )\}^{\prime} } {\partial \beta } V _{i}^{-1}(\hat{\alpha })\{\Delta _{ i}(y_{i} -\mu _{i}(\beta ))\} = 0, {}\end{array}$$
(24)

(see also Paik 1997, eqn. (1), p. 1321). Note that we refer to the GEE in (23) as a partly or partially standardized GEE (PSGEE) because \(V _{i}(\alpha ) =\hat{ \mbox{ cov}}(Y _{i})\) used in this GEE is a partial weight matrix which ignores the missing mechanism, whereas \(\mbox{ cov}[\Delta _{i}(y_{i} -\mu _{i}(\beta ))]\) would be a full weight matrix.

Note that over the last decade many researchers have used this PSWGEE approach for studying various aspects of longitudinal data subject to non-response. See, for example, the studies by Rotnitzky et al. (1998), Preisser et al. (2002), and Birmingham et al. (2003), among others. However, even if the MAR mechanism is accommodated to develop an unbiased estimating function \(\Delta _{i}(y_{i} -\mu _{i}(\beta ))\) (for 0) to construct the fully standardized GEE (FSGEE), the consistency of the estimator of β may break down (see Crowder 1995 for complete longitudinal models) because of the use of “working” covariance matrix V i (α), whereas the true covariance matrix for y i is given by \(\mbox{ cov}[Y _{i}] = \Sigma _{i}(\rho )\). This can happen for those cases where α is not estimable. To be more clear, V i (α) is simply a “working” covariance matrix of y i , whereas a proper estimating equation must use the correct variance (or its consistent estimate) matrix of \(\{\Delta _{i}(y_{i} -\mu _{i}(\beta ))\}\).

To understand the roles of both missing mechanism and longitudinal correlation structure in constructing a proper estimating equation, we now provide following three estimating equations for β. The difficulties and/or advantages encountered by these equations are also indicated.

3.2.2 Partially Standardized GQL (PSGQL) Estimation for Longitudinal Data Subject to MAR

When V i (α) matrix in (24) is replaced with the true \(T_{i} \times T_{i}\) covariance matrix of the available responses, that is, \(\Sigma _{i}(\rho ) = \mbox{ cov}[Y _{i}]\), one obtains the PSGQL estimating equation given by

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \{\mu _{i}(\beta )\}^{\prime} } {\partial \beta } \Sigma _{i}^{-1}(\hat{\rho })\{\Delta _{ i}(y_{i} -\mu _{i}(\beta ))\} = 0, }$$
(25)

which also may produce biased and hence inconsistent estimate. This is because Σ i (ρ) may still be very different than the covariance matrix of the actual variable \(\{\Delta _{i}(y_{i} -\mu _{i}(\beta ))\}\). Thus, if the proportion of missing values is more, one may not get convergent solution to the estimating equation (25) and the consistency for β would break down (Crowder 1995). The convergence problems encountered by (24) would naturally be more severe as even in the complete data case V i (α) may not be estimable.

3.2.3 Partially Standardized Conditional GQL (PSCGQL) Estimation for Longitudinal Data Subject to MAR

Suppose that one uses conditional (on history) variance

$$\displaystyle{ \mbox{ cov}\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\vert H_{i}(y) = \Sigma _{ich}^{{\ast}}(H_{ i}(y),\beta,\rho,\gamma ), }$$
(26)

to construct the estimating equation. Then following (25), one may write the PSCGQL estimating equation given by

$$\displaystyle{ \sum _{i=1}^{K}\frac{\partial \{\mu _{i}(\beta )\}^{\prime} } {\partial \beta }{ \Sigma _{ich}^{{\ast}}}^{-1}(H_{ i}(y),\beta,\rho,\gamma )\{\Delta _{i}(y_{i} -\mu _{i}(\beta ))\} = 0 }$$
(27)

It is, however, seen that

$$\displaystyle\begin{array}{rcl} & &{ \Sigma _{ich}^{{\ast}}}^{-1}(H_{ i}(y),\beta,\rho,\gamma )[\Delta _{i}(Y _{i} -\mu _{i}(\beta ))] \\ & \rightarrow &{ \Sigma _{ich}^{{\ast}}}^{-1}(H_{ i}(y),\beta,\rho,\gamma )E[\{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\}\vert H_{i}(y)] \\ & =&{ \Sigma _{ich}^{{\ast}}}^{-1}(H_{ i}(y),\beta,\rho,\gamma )[\lambda _{i}(H_{i}(y)) -\mu _{i}(\beta )] {}\end{array}$$
(28)

But,

$$\displaystyle\begin{array}{rcl} & & E_{H_{i}(y)}\left [\frac{\partial \{\mu _{i}(\beta )\}^{\prime} } {\partial \beta }{ \Sigma _{ich}^{{\ast}}}^{-1}(H_{ i}(y),\beta,\rho,\gamma )\right. \\ & \times & \left.[\lambda _{i}(H_{i}(y)) -\mu _{i}(\beta )]\right ]\neq 0, {}\end{array}$$
(29)

even though

$$\displaystyle{E_{H_{i}(y)}[\lambda _{i}(H_{i}(y)) -\mu _{i}(\beta )] = 0.}$$

Thus, the PSCGQL estimating equation (27) is not an unbiased equation for 0, and may produce bias estimate.

3.2.3 Computational formula for \(\Sigma _{ich}^{{\ast}}(\beta,\rho,\gamma )\)

For convenience, we first write

$$\displaystyle\begin{array}{rcl} \Delta _{i}& =& W_{i}^{-1}R_{ i},\;\mbox{ with} {}\\ W_{i}& =& \mbox{ diag}[w_{i1},w_{i2},\ldots,w_{iT_{i}}],\;\mbox{ and}\;R_{i} = \mbox{ diag}[R_{i1},\ldots,R_{iT_{i}}]. {}\\ \end{array}$$

It then follows that

$$\displaystyle\begin{array}{rcl} \Sigma _{ich}^{{\ast}}(\beta,\rho )& =& \mbox{ cov}[\{\Delta _{ i}\{(y_{i} -\mu _{i}(\beta ))\}\}\vert H_{i}(y)] \\ & =& W_{i}^{-1}\mbox{ cov}[\{R_{ i}(y_{i} -\mu _{i}(\beta ))\}\vert H_{i}(y)]W_{i}^{-1}.{}\end{array}$$
(30)

Now to compute the covariance matrix in the middle term in the right-hand side of (30), we first re-express \(R_{i}(y_{i} -\mu _{i}(\beta ))\) as

$$\displaystyle{R_{i}(y_{i} -\mu _{i}(\beta )) = [R_{i1}(y_{i1} -\mu _{i1}),\ldots,R_{it}(y_{it} -\mu _{it}),\ldots,R_{iT_{i}}(y_{iT_{i}} -\mu _{iT_{i}})]^{\prime},}$$

and compute the variances for its components as

$$\displaystyle{ \mbox{ var}[\{R_{i1}(y_{i1} -\mu _{i1})\}\vert y_{i1}] = 0, }$$
(31)

because R i1 = 1 always and y i1 is random. In the Poisson case \(\sigma _{i,11} =\mu _{i1}\) and in the binary case \(\sigma _{i,11} =\mu _{i1}(1 -\mu _{i1})\), with appropriate formula for μ i1 in a given case. Next for t = 2, , T i ,

$$\displaystyle\begin{array}{rcl} & & \mbox{ var}[R_{it}(y_{it} -\mu _{it})\vert H_{i,t-1}(y)] = \mbox{ var}[R_{it}\vert H_{i,t-1}(y)]\mbox{ var}[y_{it}\vert H_{i,t-1}(y)] \\ & +& {E}^{2}[R_{ it}\vert H_{i,t-1}(y)]\mbox{ var}[(y_{it})\vert H_{i,t-1}(y)] + \mbox{ var}[R_{it}\vert H_{i,t-1}(y)]{E}^{2}[(y_{ it} -\mu _{it})\vert H_{i,t-1}(y)] \\ & =& w_{it}(1 - w_{it})\sigma _{ic,tt} + w_{it}^{2}\sigma _{ ic,tt} + w_{it}(1 - w_{it})\{\lambda _{it} -\mu {_{it}\}}^{2} \\ & =& w_{it}[\sigma _{ic,tt} + {(\lambda _{it} -\mu _{it})}^{2}] - w_{ it}^{2}{(\lambda _{ it} -\mu _{it})}^{2}, {}\end{array}$$
(32)

where, given the history, λ it and σ ic, tt are the conditional mean and variance of y it , respectively.

Furthermore, all pairwise covariances conditional on the history H i, t − 1(y) may be computed as follows. For u < t,

$$\displaystyle\begin{array}{rcl} & & \mbox{ cov}[\{R_{iu}(y_{iu} -\mu _{iu}),R_{it}(y_{it} -\mu _{it})\}\vert H_{i,t-1}(y)] \\ & =& E[\{R_{iu}R_{it}(y_{iu} -\mu _{iu})(y_{it} -\mu _{it})\}\vert H_{i,t-1}(y)] - E[\{R_{iu}(y_{iu} -\mu _{iu})\}\vert H_{i,t-1}(y)] \\ & \times & E[\{R_{it}(y_{it} -\mu _{it})\}\vert H_{i,t-1}(y)] \\ & =& (y_{iu} -\mu _{iu})E[\{R_{it}(y_{it} -\mu _{it})\}\vert H_{i,t-1}(y)] - [w_{iu}(y_{iu} -\mu _{iu})][w_{it}(\lambda _{it} -\mu _{it})] \\ & =& [(y_{iu} -\mu _{iu})(1 - w_{iu})][w_{it}(\lambda _{it} -\mu _{it})] {}\end{array}$$
(33)

3.2.4 A Fully Standardized GQL (FSGQL) Approach

All three estimating equations, namely PSGEE (24), PSGQL (25), and PSCGQL (27) may produce bias estimates, PSGEE being the worst. The reasons for the poor performance of PSGEE are two fold. This is because it completely ignores the missing mechanism and uses a working correlation matrix to accommodate the longitudinal nature of the available data. As opposed to the PSGEE approach, PSGQL approach uses the true correlation structure under a class of auto-correlations but similar to the PSGEE approach it also ignores the missing mechanism. As far as the PSCGQL approach it uses a correct conditional covariance matrix which accommodates both missing mechanism and correlation structure. However, the resulting estimating equation may not unbiased for zero as the history of the responses involved in covariance matrix make a weighted distance function which is not unbiased.

To remedy the aforementioned problems, it is therefore important to use the correct covariance matrix or its consistent estimate to construct the weight matrix by accommodating both missing mechanism and longitudinal correlations of the repeated data. For this to happen, because the distance function is unconditionally unbiased for zero, i.e.,

$$\displaystyle{E_{H_{i}(y)}E[\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\vert H_{i}(y)] = 0,}$$

one must use the unconditional covariance matrix of \(\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\) to compute the incomplete longitudinal weight matrix, for the construction of a desired unbiased estimating equation. Let \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) denote this unconditional covariance matrix which is computed by using the formula

$$\displaystyle\begin{array}{rcl} \Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )& =& \mbox{ cov}\left \{\Delta _{ i}(Y _{i} -\mu _{i}(\beta ))\right \} = E_{H_{i}(y)}[\mbox{ cov}\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\vert H_{i}(y)] \\ & +& \mbox{ cov}_{H_{i}(y)}[E\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\vert H_{i}(y)]. {}\end{array}$$
(34)

In the spirit of Sutradhar (2003), we propose the FSGQL estimating equation for β given by

$$\displaystyle\begin{array}{rcl} & & \sum _{i=1}^{K}\frac{\partial E_{H_{i}(y)}E[\{\Delta _{i}\mu _{i}(\beta )\}^{\prime} \vert H_{i}(y)]} {\partial \beta } {[\mbox{ cov}\{\Delta _{i}(y_{i} -\mu _{i})\}]}^{-1}\{\Delta _{ i}(y_{i} -\mu _{i}(\beta ))\} \\ & =& \sum _{i=1}^{K}\frac{\partial \mu ^{\prime} _{i}} {\partial \beta } {[\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )]}^{-1}\{\Delta _{ i}(y_{i} -\mu _{i}(\beta ))\} = 0, {}\end{array}$$
(35)

where \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) is yet to be computed. This estimating equation is solved iteratively by using

$$\displaystyle\begin{array}{rcl} \hat{\beta }_{FSGQL}(m + 1)& =& \hat{\beta }_{FSGQL}(m) + \left [\sum _{i=1}^{K}\frac{\partial \mu ^{\prime} _{i}(\beta )} {\partial \beta } {[\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )]}^{-1}\frac{\partial \mu _{i}(\beta )} {\partial \beta ^{\prime} } \right ]_{m}^{-1} \\ & & \times \left [\sum _{i=1}^{K}\frac{\partial \mu _{i}^{^{\prime} }} {\partial \beta } {[\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )]}^{-1}\Delta _{ i}(y_{i} -\mu _{i}(\beta ))\right ]_{m} {}\end{array}$$
(36)
3.2.4 Computation of \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma ) = \mbox{ cov}[\Delta _{i}(y_{i} -\mu _{i})]\)

Rewrite (34) as

$$\displaystyle\begin{array}{rcl} \Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )& =& E_{ H_{i}(y)}[\mbox{ cov}\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\vert H_{i}(y)] \\ & +& \mbox{ cov}_{H_{i}(y)}[E\left \{\Delta _{i}(Y _{i} -\mu _{i}(\beta ))\right \}\vert H_{i}(y)] \\ & =& E_{H_{i}(y)}[\Sigma _{ich}^{{\ast}}(\beta,\rho )] + \mbox{ cov}_{ H_{i}(y)}[E_{ich}(\beta,\rho )],{}\end{array}$$
(37)

where \(\Sigma _{ich}^{{\ast}}(\beta,\rho )\) is constructed by (30) by using the formulas from (31) to (33), and E i c h (β, ρ) has the form \(E_{ich}(\beta,\rho ) = [(y_{i1} -\mu _{i1}),(\lambda _{i2} -\mu _{i2}),\ldots,(\lambda _{iT_{i}} -\mu _{iT_{i}})]^{\prime} \).

It then follows that the components of the \(T_{i} \times T_{i}\) unconditional covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) are given by

$$\displaystyle\begin{array}{rcl} & & \mbox{ cov}[\delta _{iu}(y_{iu} -\mu _{iu}),\delta _{it}(y_{it} -\mu _{it})] {} \\ & =& \left \{\begin{array}{l} \mbox{ var}_{y_{i1}}(y_{i1} -\mu _{i1}) =\sigma _{i11}\qquad \mbox{ for u=t=1} \\ E_{H_{i}(y)}[w_{it}^{-1}\{\sigma _{ic,tt} + {(\lambda _{it} -\mu _{it})}^{2}\} - {(\lambda _{it} -\mu _{it})}^{2}] + E_{H_{i}(y)}{(\lambda _{it} -\mu _{it})}^{2},\qquad \mbox{ for u=t=2,}\ldots \\ E_{H_{i}(y)}[(y_{i1} -\mu _{i1})(\lambda _{it} -\mu _{it})]\qquad \mbox{ for u=1,t=2,}\ldots \\ E_{H_{i}(y)}[(w_{iu}^{-1} - 1)\{(y_{iu} -\mu _{iu})(\lambda _{it} -\mu _{it})\}] \\ + E_{H_{i}(y)}[(\lambda _{iu} -\mu _{iu})(\lambda _{it} -\mu _{it})],\qquad \mbox{ for u=2,}\ldots;\;u < t \end{array} \right.\\ \end{array}$$
(38)
3.2.4.1 (a). Example of \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) under linear longitudinal models with T = 2

Note that \(R_{i1} = r_{i1} = 1\) always. But R i2 can be 1 or 0 and under MAR, its probability depends on y i1. Consider

$$\displaystyle{Pr[R_{i1} = 1] = g_{i1} = w_{i1} = 1.0}$$
$$\displaystyle{ P[R_{i2} = 1\vert r_{i1} = 1,y_{i1}] = g_{i2}(\gamma ) = \frac{\exp \{1 +\gamma y_{i1}\}} {1 +\exp \{ 1 +\gamma y_{i1}\}} }$$
(39)

by (20), yielding

$$\displaystyle\begin{array}{rcl} w_{i2}& =& E[R_{i2}\vert H_{i1(y)}] = P[R_{i1} = 1,R_{i2} = 1\vert H_{i1}(y)] {}\\ & =& P[R_{i1} = 1]P[R_{i2} = 1\vert H_{i1}(y)] = g_{i1}g_{i2}(y_{i1}), {}\\ \end{array}$$

(see also (18)). With regard to the longitudinal model for potential responses \(y_{i1},y_{i2}\), along with their non-stationary (time dependent covariates), consider the model as:

$$\displaystyle{ y_{it} \sim (x^{\prime} _{it}\beta, \frac{{\sigma }^{2}} {1 {-\rho }^{2}}),\;\mbox{ corr}(Y _{it},Y _{i,t+\ell}) {=\rho }^{\ell}. }$$
(40)

Assuming normal distribution, one may write

$$\displaystyle\begin{array}{rcl} E[Y _{it}\vert H_{i,t-1}]& =& x^{\prime} _{it}\beta + [\mbox{ cov}(y_{it},y_{i,t-\ell})]{[\mbox{ var}(y_{i,t-\ell})]}^{-1}(y_{ i,t-\ell}- x^{\prime} _{i,t-\ell}\beta ) \\ & =& x^{\prime} _{it}\beta + [ \frac{{\sigma {}^{2}\rho }^{\ell}} {1 {-\rho }^{2}}] = x^{\prime} _{it}\beta {+\rho }^{\ell}[y_{i,t-\ell}- x^{\prime} _{i,t-\ell}\beta ]. {}\end{array}$$
(41)

When the response y it depends on its immediate history, the conditional mean has the formula

$$\displaystyle{E[Y _{it}\vert y_{i,t-1}] =\lambda _{it} = x^{\prime} _{it}\beta +\rho (y_{i,t-1} - x^{\prime} _{i,t-1}\beta ),}$$

implying that the unconditional mean is given by \(\mu _{it} = E[Y _{it}] = x^{\prime} _{it}\beta\), which is the same as the mean in (40), as expected.

Now following (38), we provide the elements of the 2 ×2 matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) as

$$\displaystyle\begin{array}{rcl} \sigma _{i11}^{{\ast}}& =& \frac{{\sigma }^{2}} {1 {-\rho }^{2}} \\ \sigma _{i12}^{{\ast}}& =& \sigma _{ i21}^{{\ast}} =\rho \mbox{ var}[Y _{ i1} - x_{i11}\beta ] =\rho \frac{{\sigma }^{2}} {1 {-\rho }^{2}} \\ \sigma _{i22}^{{\ast}}& =& E_{ y_{i1}}[w_{i2}^{-1}\{\mbox{ var}(Y _{ i2}\vert y_{i1}) + {(\lambda _{i2} -\mu _{i2})}^{2}\}] \\ & =& E_{y_{i1}}[\{1 +\frac{1} {\exp (1 +\gamma y_{i1})}\}\{{\sigma }^{2} {+\rho }^{2}{(y_{ i1} - x_{i11}\beta )}^{2}\}] \\ & =& \frac{{\sigma }^{2}} {1 -{\rho }^{2}} +{\sigma }^{2}E_{ 1} {+\rho }^{2}E_{ 2}, {}\end{array}$$
(42)

where

$$\displaystyle\begin{array}{rcl} E_{1} =\int [ \frac{1} {\exp (1 +\gamma y_{i1})}]g_{N}(y_{i1})dy_{i1},\;\mbox{ and}& & {}\\ E_{2} =\int [\frac{\{y_{i1} - x{_{i11}\beta \}}^{2}} {\exp (1 +\gamma y_{i1})} ]g_{N}(y_{i1})dy_{i1},& & {}\\ \end{array}$$

\(g_{N}(y_{i1})\) being the normal (say) density of y i1. Thus, \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) has the formula

$$\displaystyle{ \Sigma _{i}^{{\ast}}(\beta,\rho,\gamma ) = \frac{{\sigma }^{2}} {1 {-\rho }^{2}}\left [\begin{array}{cc} 1& \rho \\ \rho &\{1 + (1 {-\rho }^{2})E_{1} {+ \frac{{\rho }^{2}(1{-\rho }^{2})} {\sigma }^{2}} E_{2}\} \end{array} \right ], }$$
(43)

Note that in the complete longitudinal case w i2 would be 1 and \(\sigma _{i22}^{{\ast}}\) would reduce to \(\frac{{\sigma }^{2}} {1{-\rho }^{2}}\), leading to

$$\displaystyle{ \Sigma _{i}^{{\ast}}(\beta,\rho,\gamma ) = \Sigma _{ i}(\beta,\rho ) = \frac{{\sigma }^{2}} {1 {-\rho }^{2}}\left [\begin{array}{cc} 1& \rho \\ \rho &1 \end{array} \right ], }$$
(44)

which is free from β in this linear model case, and the PSGEE (24) uses a “working” version of (44), namely

$$\displaystyle{ V _{i}(\alpha ) = \frac{{\sigma }^{2}} {1 {-\rho }^{2}}\left [\begin{array}{cc} 1& \alpha \\ \alpha &1 \end{array} \right ], }$$
(45)

whereas the FSGQL estimating equation (35) would use \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) from (43). This shows the effect of missing mechanism in the construction of the weight matrix for the estimating equation.

3.2.4.2 (b). Example of \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) under binary longitudinal AR(1) model with T = 2

Consider a binary AR(1) model with

$$\displaystyle{ \lambda _{it} = E[Y _{it}\vert y_{i,t-1}] =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1}),\;t = 2,\ldots,T, }$$
(46)

where \(\mu _{it} = \frac{\exp (x^{\prime} _{it}\beta )} {1+\exp (x^{\prime} _{it}\beta )}\), for all \(t = 1,\ldots,T\).

Now considering y i1 as fixed, by using (31)–(33) we first compute the history-dependent conditional covariance matrix \(\Sigma _{ich}(\beta,\rho ) = \mbox{ cov}[\{\Delta _{i}(y_{i} -\mu _{i})\}\vert H_{i}(y)]\) as:

$$\displaystyle\begin{array}{rcl} & & \mbox{ var}[\delta _{i1}(y_{i1} -\mu _{i1})] = 0 \\ & & \mbox{ var}[\{\delta _{i2}(y_{i2} -\mu _{i2})\}\vert y_{i1}] = \frac{1} {w_{i2}}[\lambda _{i2}(1 -\lambda _{i2}) {+\rho }^{2}{(y_{ i1} -\mu _{i1})}^{2}] {-\rho }^{2}{(y_{ i1} -\mu _{i1})}^{2} \\ & & \mbox{ cov}[\{\delta _{i1}(y_{i1} -\mu _{i1}),\delta _{i2}(y_{i2} -\mu _{i2})\}\vert y_{i1}] = 0, {}\end{array}$$
(47)

yielding

$$\displaystyle\begin{array}{rcl} & & E_{H_{i}(y)}[\Sigma _{ich}(\beta,\rho )] \\ & =& \left \{\begin{array}{l} E_{y_{i1}}[\sigma _{ich,11}] = E_{y_{i1}}[0] = 0 \\ E_{y_{i1}}[\sigma _{ich,22}] = E_{y_{i1}}[ \frac{1} {w_{i2}} [\lambda _{i2}(1 -\lambda _{i2}) {+\rho }^{2}{(y_{i1} -\mu _{i1})}^{2}] {-\rho }^{2}{(y_{i1} -\mu _{i1})}^{2}] \\ E_{y_{i1}}[\sigma _{ich,12}] = E_{y_{i1}}[0] = 0 \end{array} \right.{}\end{array}$$
(48)

Next because

$$\displaystyle{E_{ich}(\beta,\rho ) = E\{\Delta _{i}(y_{i} -\mu _{i})\}\vert H_{i}(y)] = [(y_{i1} -\mu _{i1}),(\lambda _{i2} -\mu _{i2})]^{\prime},}$$

one obtains

$$\displaystyle\begin{array}{rcl} & & \mbox{ cov}_{H_{i}(y)}[E_{ich}(\beta,\rho )] \\ & =& \left \{\begin{array}{l} \mbox{ var}_{y_{i1}}[y_{i1} -\mu _{i1}] =\mu _{i1}[1 -\mu _{i1}] \\ \mbox{ cov}_{y_{i1}}[(y_{i1} -\mu _{i1}),(\lambda _{i2} -\mu _{i2})] =\rho \mbox{ var}_{y_{i1}}[y_{i1} -\mu _{i1}] =\rho \mu _{i1}[1 -\mu _{i1}] \\ \mbox{ var}_{y_{i1}}[\lambda _{i2} -\mu _{i2}] = \mbox{ var}_{y_{i1}}[\rho (y_{i1} -\mu _{i1})] {=\rho }^{2}[\mu _{i1}(1 -\mu _{i1})]. \end{array} \right.{}\end{array}$$
(49)

By combining (48) and (49), it follows from (38) that the 2 ×2 unconditional covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) has the form

$$\displaystyle\begin{array}{rcl} \mbox{ var}[\delta _{i1}(y_{i1} -\mu _{i1})]& =& \mu _{i1}[1 -\mu _{i1}] \\ \mbox{ var}[\delta _{i2}(y_{i2} -\mu _{i2})]& =& E_{y_{i1}}[ \frac{1} {w_{i2}}\{\lambda _{i2}(1 -\lambda _{i2}) {+\rho }^{2}{(y_{ i1} -\mu _{i1})}^{2}\}] \\ & =& [\mu _{i2}(1 -\mu _{i2})]E[w_{i2}^{-1}] \\ & & +\rho (1 - 2\mu _{i2})E[w_{i2}^{-1}(y_{ i1} -\mu _{i1})] \\ & =& [\mu _{i2}(1 -\mu _{i2})]E_{1y} +\rho (1 - 2\mu _{i2})[E_{2y} -\mu _{i1}E_{1y}]{}\end{array}$$
(50)
$$\displaystyle\begin{array}{rcl} \mbox{ cov}[\delta _{i1}(y_{i1} -\mu _{i1}),\delta _{i2}(y_{i2} -\mu _{i2})]& =& \rho [\mu _{i1}\{1 -\mu _{i1}\}],{}\end{array}$$
(51)

where

$$\displaystyle\begin{array}{rcl} E_{1y} = E[w_{i2}^{-1}]& =& \{1 +\exp (-1) +\mu _{ i1}\exp (-1)(\exp (-\gamma ) - 1)\} {}\\ E_{2y} = E[ \frac{y_{i1}} {w_{i2}}]& =& \mu _{i1}\{1 +\exp (-\gamma - 1)\}. {}\\ \end{array}$$
3.2.4.2.1 General formula for \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) under the binary AR(1) model

In general, it follows from (38) that the elements of the \(T_{i} \times T_{i}\) unconditional covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\) under AR(1) binary model are given by

$$\displaystyle\begin{array}{rcl} & & \mbox{ cov}[\delta _{iu}(y_{iu} -\mu _{iu}),\delta _{it}(y_{it} -\mu _{it})] \\ & \equiv & \left \{\begin{array}{l} \sigma _{i,11}^{{\ast}} =\mu _{i1}[1 -\mu _{i1}] \\ \sigma _{i,tt}^{{\ast}} = E_{H_{i}(y)}[w_{it}^{-1}\{\mu _{it}(1 -\mu _{it}) +\rho (1 - 2\mu _{it})(y_{i,t-1} -\mu _{i,t-1})\},(\mbox{ for}\;t = 2,\ldots,T_{i}) \\ \sigma {\ast}_{i,ut} {=\rho \rho }^{t-1-u}\mu _{iu}(1 -\mu _{iu}),\;(\mbox{ for}\;u = 1 < t) \\ \sigma _{i,ut}^{{\ast}} {=\rho { }^{2}\rho }^{t-u}\mu _{i(u-1)}(1 -\mu _{i(u-1)}),\;(\mbox{ for}\;1 < u < t). \end{array} \right.{}\end{array}$$
(52)

3.3 An Empirical Illustration

First, to illustrate the performance of the existing PSGEE (24) approach, we refer to some of the simulation results reported by Sutradhar and Mallick (2010). It was shown that this approach may produce highly biased and hence inconsistent regression estimates. In fact these authors also demonstrated that PSGEE(I) (independence assumption based) approach produces less biased estimates than any “working” correlation structures based PSGEE approaches. For example, we consider here their simulation design chosen as

3.3 Simulation Design

K = 100, T = 4, p = 2, q = 1, γ = 4, ρ = 0. 4, 0. 8, \(\beta _{1} =\beta _{2} = 0\) along with two time-dependent covariates:

$$\displaystyle{x_{it1} = \left \{\begin{array}{ll} \frac{1} {2} & \mbox{ for $i = 1,\cdots \,, \frac{K} {4} $; $t = 1,2$} \\ 0 &\mbox{ for $i = 1,\cdots \,, \frac{K} {4} $; $t = 3,4$} \\ -\frac{1} {2} & \mbox{ for $i = \frac{K} {4} + 1,\cdots \,, \frac{3K} {4} $; $t = 1$} \\ 0 &\mbox{ for $i = \frac{K} {4} + 1,\cdots \,, \frac{3K} {4} $; $t = 2,3$} \\ \frac{1} {2} & \mbox{ for $i = \frac{K} {4} + 1,\cdots \,, \frac{3K} {4} $; $t = 4$} \\ \frac{t} {2T} &\mbox{ for $i = \frac{3K} {4} + 1,\cdots \,,K$; $t = 1,\cdots \,,4$} \end{array} \right.}$$

and

$$\displaystyle{x_{it2} = \left \{\begin{array}{ll} \frac{t-2.5} {2T} &\mbox{ for $i = 1,\cdots \,, \frac{K} {2} $; $t = 1,\cdots \,,4$} \\ 0 &\mbox{ for $i = \frac{K} {2} + 1,\cdots \,,K$; $t = 1,2$} \\ \frac{1} {2} & \mbox{ for $i = \frac{K} {2} + 1,\cdots \,,K$; $t = 3,4$} \end{array} \right.}$$

Details on the MAR based incomplete binary data generation, one may be referred to Sutradhar and Mallick (2010, Sect. 2.1). Based on 1,000 simulations, the PSGEE estimates obtained from (24) and PSGEE (I) obtained from (24) by using zero correlation are displayed in Table 1.

Table 1 Simulated means (SMs), simulated standard errors (SSEs), and simulated mean squared errors (SMSEs) for “working” correlations based PSGEE (24) estimates, when the incomplete longitudinal responses were generated based on MAR mechanism (20) with γ = 4. 0 and a longitudinal AR(1) correlation structure with correlation index parameter ρ; \(\beta _{1} =\beta _{2} = 0;\) based on 1,000 simulations

These results show that the PSGEE estimates for \(\beta _{1} = 0\) and β 2 = 0 are highly biased. For example, when ρ = 0. 8, the estimates of β 1 and β 2 are − 0. 213 and − 0. 553, respectively. These estimates are inconsistent and unacceptable. Note that these biases are caused by the wrong correlation matrix used to construct the PSGEE (24), whereas this PSGEE provides almost unbiased estimates when data are treated to be independent even if truly they are not so. However the standard errors of the PSGEE(I) estimates appear to be large and hence it may provide inefficient estimates. In fact when the proportion of missing values is large, the PSGEE(I) will also encounter estimation breakdown or it will produce biased estimates. This is verified by a simulation study reported by Mallick et al. (2013). The reason for this inconsistency encountered by PSGEE and PSGEE(I) is the failure of accommodating MAR mechanism in the covariance matrix used as the longitudinal weights.

As a remedy to this inconsistency, we have developed a FSGQL (35) estimating equation by accommodating both MAR mechanism and longitudinal correlation structure in constructing the weight matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\gamma )\). This FSGQL equation would provide consistent and efficient regression estimates. For simplicity, Mallick et al. (2013) have demonstrated through a simulation study that FSGQL(I) approach by using ρ = 0 in \(\Sigma _{i}^{{\ast}}(\beta,\rho = 0,\gamma )\) produces almost unbiased estimates with small variances. This provides a guidance that ignoring missing mechanism in constructing the weight matrix would provide detrimental results, whereas ignoring longitudinal correlations does not appear to cause any significant loss.