Abstract
It is well known that in the complete longitudinal setup, the so-called working correlation-based generalized estimating equations (GEE) approach may yield less efficient regression estimates as compared to the independence assumption-based method of moments and quasi-likelihood (QL) estimates. In the incomplete longitudinal setup, there exist some studies indicating that the use of the same “working” correlation-based GEE approach may provide inconsistent regression estimates especially when the longitudinal responses are at risk of being missing at random (MAR). In this paper, we revisit this inconsistency issue under a longitudinal binary model and empirically examine the relative performance of the existing weighted (by inverse probability weights for the missing indicator) GEE (WGEE), a fully standardized GQL (FSGQL) and conditional GQL (CGQL) approaches. In the comparative study, we consider both stationary and non-stationary covariates, as well as various degrees of missingness and longitudinal correlation in the data.
Keywords
- Generalized estimating equations
- Generalized quasi-likelihood
- Logistic regression
- Missing at random
- Monotonic missingness
1 Introduction
Consider a longitudinal binary data setup where \(y_{it}\) is the Bernoulli response for the i-th (i = 1, ⋯ , K) individual at the t-th time point (t = 1, ⋯ , T) and \(x_{it} = {(x_{it1},\cdots \,,x_{itu},\cdots \,,x_{itp})}^{\prime }\) is the associated p-dimensional covariate vector. When the longitudinal data are complete (that is, there are no missing responses from any of the individuals in the study), an estimating approach such as generalized quasi-likelihood (GQL) can be used to obtain an estimate of the regression parameter vector, β, that is both consistent and efficient, provided that the correlation structure associated with the repeated binary responses is known (see Sutradhar 2003). To describe the longitudinal correlation in the data, it seems reasonable to assume that the association between observations on the same individual weakens as they become further apart in time. To this end, we let ρ be a longitudinal correlation parameter and consider the conditional linear binary dynamic (CLBD) model proposed by Zeger et al. (1985) (see also Qaqish 2003), which is given by
$$\displaystyle{P(Y_{i1} = 1) =\mu _{i1},\quad P(Y_{it} = 1\mid y_{i,t-1}) =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1}) =\lambda _{it},\quad t = 2,\cdots \,,T,\qquad (1)}$$
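with \(\mu _{it} =\exp (x_{it}^{\prime} \beta )/[1 +\exp (x_{it}^{\prime} \beta )]\) for t = 1, ⋯ , T. According to model (1), the marginal means and variances of \(Y_{it}\) are
$$\displaystyle{E(Y_{it}) =\mu _{it}\qquad (2)}$$
and
$$\displaystyle{var(Y_{it}) =\sigma _{i,tt} =\mu _{it}(1 -\mu _{it}),\qquad (3)}$$
while the correlations between \(Y_{it}\) and \(Y_{i,t+l}\) for \(l = 1,\cdots \,,T - 1\), \(t = 1,\cdots \,,T - l\) are given by
$$\displaystyle{corr(Y_{it},Y_{i,t+l}) {=\rho }^{l}\sqrt{\sigma _{i,tt}/\sigma _{i,t+l,t+l}}.\qquad (4)}$$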
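The means, variances, and covariances defined by (2) through (4) are nonstationary, since they are all functions of the time-dependent covariates \(\{x_{it}\}\). However, if the \(\sigma_{i,tt}\) are not extremely different, the correlations given by (4) behave like those of an autoregressive process of order one, AR(1). Under the present model, the correlation parameter ρ must satisfy, for t = 2, ⋯ , T, the range restriction
$$\displaystyle{\max \left [-\frac{\mu _{it}} {1 -\mu _{i,t-1}},\,-\frac{1 -\mu _{it}} {\mu _{i,t-1}} \right ] \leq \rho \leq \min \left [\frac{1 -\mu _{it}} {1 -\mu _{i,t-1}},\, \frac{\mu _{it}} {\mu _{i,t-1}}\right ].\qquad (5)}$$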
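The restriction on ρ arises because the conditional probability in model (1), \(\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1})\), must lie in [0, 1] for both \(y_{i,t-1} = 0\) and \(y_{i,t-1} = 1\). The following Python sketch computes the admissible interval for one individual and the correlation matrix implied by the model. The function names `rho_bounds` and `clbd_correlation` are illustrative and do not come from the paper, and the lag-l correlation used is the form \(\rho^{l}\sqrt{\sigma_{i,tt}/\sigma_{i,t+l,t+l}}\) that follows from model (1).

```python
import math


def rho_bounds(mu):
    """Admissible interval for rho given one individual's marginal means mu[0..T-1].

    Each bound keeps mu_t + rho * (y_{t-1} - mu_{t-1}) inside [0, 1]
    for y_{t-1} in {0, 1}. The resulting interval always lies within
    [-1, 1], so starting from (-1, 1) does not change the answer.
    """
    lo, hi = -1.0, 1.0
    for t in range(1, len(mu)):
        m, m_prev = mu[t], mu[t - 1]
        lo = max(lo, -m / (1.0 - m_prev), -(1.0 - m) / m_prev)
        hi = min(hi, (1.0 - m) / (1.0 - m_prev), m / m_prev)
    return lo, hi


def clbd_correlation(mu, rho):
    """T x T correlation matrix with entries rho^l * sqrt(var_t / var_{t+l})."""
    T = len(mu)
    var = [m * (1.0 - m) for m in mu]
    C = [[1.0] * T for _ in range(T)]
    for t in range(T):
        for s in range(t + 1, T):
            C[t][s] = C[s][t] = rho ** (s - t) * math.sqrt(var[t] / var[s])
    return C


print(rho_bounds([0.6, 0.7]))
print(clbd_correlation([0.5, 0.5, 0.5], 0.5))
```

For an individual with means (0.6, 0.7), the interval is approximately [−0.5, 0.75], so a value such as ρ = 0.8 would not be admissible for that individual. With equal variances the matrix reduces to the stationary AR(1) pattern.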
Suppose that we let μ i and Σ i (ρ) represent the mean vector and the covariance matrix of the complete data vector Y i , where \(\mu _{i} = {(\mu _{i1},\cdots \,,\mu _{it},\cdots \,,\mu _{iT})}^{\prime }\) and \(\Sigma _{i}(\rho ) = A_{i}^{1/2}C_{i}(\rho )A_{i}^{1/2}\). Here, C i (ρ) is the T ×T correlation matrix based on (4), and \(A_{i} = diag(\sigma _{i,11},\cdots \), \(\sigma _{i,tt},\cdots \,,\sigma _{i,TT})\). An estimator for β that is both consistent and highly efficient can be obtained by solving the GQL estimating equation
(Sutradhar 2003).
In practice, it is typically the case that some of the responses associated with each of a number of individuals in the study may be missing. To acknowledge this phenomenon during the data collection process, we introduce an indicator variable R it , that takes on a value of one if Y it is observed, and zero otherwise. For purposes of our investigation here, we adopt the not-so-unreasonable assumption that all individuals provide a response at the first time point, so that R i1 = 1 for all i = 1, ⋯ ,K. We also assume monotonic missingness, suggesting that the R it satisfy the inequality \(R_{i1} \geq R_{i2} \geq \cdots \geq R_{it} \geq \cdots \geq R_{iT}\). Thus, if responses are no longer observed for the i-th individual after the j-th time point, for this individual we would have available y it for \(t = 1,\cdots \,,T_{i} = j\).
Regarding the missing data mechanism, at this time we distinguish between responses that are missing completely at random, MCAR, and those that are missing at random, MAR (see Fitzmaurice et al. 1996; Paik 1997; Rubin 1976). When the responses are MCAR, the indicator variable R it reflecting the presence or absence of Y it does not depend on the previous responses \(Y _{i1},\cdots \,,Y _{i,t-1}\). In this instance, if we define \(R_{i} = diag(R_{i1},\cdots \,,R_{iT})\) and incorporate this matrix into the estimating equation given by (6) to yield
it is still possible to obtain an unbiased estimator for β that will be consistent and efficient. Note that Σ i (ρ) is a T ×T matrix with appropriate variance and covariance entries in the first T i rows and T i columns and zeroes in the last T − T i rows and columns. On the other hand, when the missing data mechanism for the responses is assumed to be MAR (implying that R it does depend on the previous responses \(Y _{i1},\cdots \,,Y _{i,t-1}\)), it can be shown that \(E[R_{it}(Y _{it} -\mu _{it})]\neq 0\). In this situation, the estimator for β based on (7) will be biased and inconsistent. Upon realizing this to be the case, many studies have attempted to correct for this problem by using a modified inverse probability-weighted distance function
where \(H_{i,t-1}(y) \equiv H_{i,t-1} = (Y _{i1},\cdots \,,Y _{i,t-1})\), so that the expectation of (8) is zero. Following Robins et al. (1995), for data that are MAR, we can write the probability weight \(w_{it}\left \{H_{i,t-1}(y);\alpha \right \} = w_{it}\) as a function of past responses as follows. Specifically, imagine that the probability that the i-th individual responds at the j-th time point depends on the past lag q responses, where q ≤ j − 1. Letting \(g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha )\) represent this probability, we can write \(g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha ) = P(R_{ij} = 1\mid R_{i1} = 1,\cdots \,,R_{i,j-1} = 1;y_{i,j-1},\cdots \,,y_{i,j-q})\), which can be modeled as
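$$\displaystyle{g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha ) = \frac{\exp (1 +\sum _{l=1}^{q}\alpha _{l}y_{i,j-l})} {1 +\exp (1 +\sum _{l=1}^{q}\alpha _{l}y_{i,j-l})},\qquad (9)}$$
where \(\alpha_{l}\) is a parameter that reflects the dependence of \(R_{ij}\) on \(y_{i,j-l}\) for all l = 1, ⋯ , q. Robins et al. (1995) set
$$\displaystyle{w_{it} =\prod _{j=1}^{t}g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha ).\qquad (10)}$$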
Since monotonic missingness is assumed
or, alternatively
Using model (1) and \(g_{ij}(y_{i,j-1},\cdots \,,y_{i,j-q};\alpha )\) given in (9), (12) becomes
which implies that
thus giving
Similarly
suggesting that combining (15) and (16) yields
This unconditional unbiasedness property of the weighted distance or estimating function \(\left [\frac{R_{it}(Y _{it}-\mu _{it})} {w_{it}} \right ]\) motivated many researchers to write a weighted generalized estimating equation (WGEE) and solve it for the β involved in the \(\mu_{it}\). The WGEE, first developed by Robins et al. (1995), is reproduced in brief in Sect. 2.1. Note that to construct the WGEE, Robins et al. (1995) suggested specifying a user-selected covariance matrix of \(\{(Y _{it} -\mu _{it}),t = 1,\ldots,T_{i}\}\) as though the data were complete. Recently, Sutradhar and Mallick (2010) found that this widely used WGEE approach produces highly biased regression estimates, indicating a breakdown in consistency. In this paper, specifically in Sect. 3, we carry out an extensive simulation study considering various degrees of missingness and examine further the inconsistency problem encountered by the WGEE approach.
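The contrast between the unweighted distance \(R_{it}(Y_{it} -\mu_{it})\) and its inverse probability-weighted version can be checked by exact enumeration for small T. The Python sketch below does this under stated assumptions: responses follow model (1), dropout is monotone with lag q = 1, and the response probability is logistic with intercept 1 (an assumption chosen to match the missingness rates quoted in Sect. 3.1.2). All function names are illustrative.

```python
import itertools
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def path_prob(y, mu, rho):
    """Probability of a complete response path y under model (1)."""
    p = mu[0] if y[0] else 1.0 - mu[0]
    for t in range(1, len(y)):
        lam = mu[t] + rho * (y[t - 1] - mu[t - 1])  # conditional mean
        p *= lam if y[t] else 1.0 - lam
    return p


def weight(y, t, alpha):
    """w_it for 0-based t: product of the response probabilities up to t.

    Assumed lag-1 logistic form with intercept 1. The first factor is 1
    because every individual responds at the first time point.
    """
    w = 1.0
    for j in range(1, t + 1):
        w *= expit(1.0 + alpha * y[j - 1])
    return w


def expected_distances(mu, rho, alpha):
    """Exact E[R_t (Y_t - mu_t)] and E[R_t (Y_t - mu_t) / w_t] for each t."""
    T = len(mu)
    unweighted = [0.0] * T
    weighted = [0.0] * T
    for y in itertools.product((0, 1), repeat=T):
        p = path_prob(y, mu, rho)
        for t in range(T):
            w = weight(y, t, alpha)  # P(R_t = 1 | path) under monotone MAR
            unweighted[t] += p * w * (y[t] - mu[t])
            weighted[t] += p * w * (y[t] - mu[t]) / w
    return unweighted, weighted


mu = [expit(0.5)] * 3
unweighted, weighted = expected_distances(mu, rho=0.5, alpha=-3.0)
print(unweighted)
print(weighted)
```

In this configuration the weighted expectations vanish up to rounding error at every time point, whereas the unweighted expectations are nonzero for t ≥ 2, which is the source of the bias in the unweighted estimating equation.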
In Sect. 2.2, we consider a simpler version of a fully standardized GQL (FSGQL) approach discussed by Sutradhar (2013, Sect. 3.2.4) by constructing the weight matrix, that is, the unconditional covariance matrix of \(\{\left [\frac{R_{it}(Y _{it}-\mu _{it})} {w_{it}} \right ],t = 1,\ldots,T_{i}\}\), using longitudinal independence (i.e., ρ = 0). We will refer to this as the FSGQL(I) approach. In the simulation study in Sect. 3, we examine the performance of this FSGQL(I) approach relative to the existing WGEE as well as the WGEE(I) (independence assumption-based WGEE) approach.
Further note that if the correlation model for the complete data were known through λ it in (14), one could exploit the conditional distance function \(\left [\frac{R_{it}(Y _{it}-\lambda _{it})} {w_{it}} \mid H_{i,t-1}\right ]\) to construct a conditional-weighted GQL (CWGQL) estimating equation and solve such an equation to obtain consistent regression estimates. We discuss this approach in Sect. 2.3 and include it in the simulation study in Sect. 3 to examine its performance as compared to the aforementioned approaches.
2 Estimation
2.1 WGEE Approach
Robins et al. (1995, Eq. (10), p. 109) used the result in (17) to propose the WGEE
(see also Paik 1997, Eq. (1), p. 1321) where \(\Delta _{i} = diag(\delta _{i1},\delta _{i2},\cdots \,,\delta _{iT})\) with \(\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\alpha \}= R_{it}/w_{it}\). The quantity \(V _{i}{(\alpha }^{{\ast}})\) is a working covariance matrix of \(Y_{i}\) (see Liang and Zeger 1986) that is used in an effort to increase the efficiency of the estimates. Of note is the fact that, while Robins et al. (1995) suggested a WGEE, they did not account for the missingness in the data when specifying \(V _{i}{(\alpha }^{{\ast}})\); they simply based their working covariance matrix on the complete data formulae. For this reason, this WGEE approach may be referred to as a partially standardized GEE (PSGEE) approach. See the previous article by Sutradhar (2013) in this volume for details on the use of PSGEE. Note that a user-selected covariance matrix that is based on complete data and ignores the missing mechanism makes the WGEE unstable, particularly when the proportion of missing data is high, causing a breakdown in estimation, i.e., a breakdown in consistency. However, this inconsistency issue has not been adequately addressed in the literature, including the studies by Robins et al. (1995), Paik (1997), Rotnitzky et al. (1998) and Birmingham et al. (2003). One of the main reasons is that none of these studies used a stochastic correlation structure in conjunction with the missing mechanism to model the binary data in the incomplete longitudinal setup.
In this paper, in order to investigate the effect on the estimates of the regression parameter vector, we propose to replace the working covariance matrix \(V _{i}{(\alpha }^{{\ast}})\) in (18) with a proper unconditional covariance matrix that accommodates the missingness in the data. The proposed approach is presented in the next section.
2.2 FSGQL Approach
The unconditional unbiasedness property in (17), that is,
motivates one to develop a FSGQL estimating equation for β, which requires the computation of the unconditional variance of \(\delta _{it}(Y _{it} -\mu _{it})\). Thus, for all t = 1, …, T i , we now compute the unconditional covariance matrix, namely
by using the formula
where H i (y) denotes the history of responses. For computational details under any specified correlation model, we refer to the previous article by Sutradhar (2013, Sect. 3.2.4). For the binary AR(1) model in (1), the elements of the T i ×T i unconditional covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )\) are given by
Note that the formulas in (19) under the present AR(1) binary model may be verified directly. For example, we compute the t-th diagonal element of the \(\Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )\) matrix as follows. Since \(\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\alpha \}= R_{it}/w_{it}\), we can write
where
since \({\left [E_{H_{i,t-1}}(\lambda _{it} -\mu _{it})\right ]}^{2} = 0\), and
Substituting (21) and (22) into (20) gives
since \(\lambda _{it} =\mu _{it} +\rho (y_{i,t-1} -\mu _{i,t-1})\) by (1). The conditional expectations given the response history, H i, t − 1, in (23) are evaluated as follows:
and
2.2.1 FSGQL(I) Approach
Note that in a complete longitudinal setup, one may obtain consistent regression estimates even if longitudinal correlations are ignored in developing the estimating equation but such estimates may not be efficient (Sutradhar 2011, Chap. 7). By this token, to obtain consistent regression estimates in the incomplete longitudinal setup, we may still use the independence assumption (i.e., use ρ = 0) but the missing mechanism must be accommodated to formulate the covariance matrix for the construction of the estimating equation. Thus, for simplicity, we now consider a specialized version of the FSGQL approach, namely FSGQL(I) approach, where the GQL estimating equation is developed by using the independence assumption (ρ = 0)-based covariance matrix. More specifically, under this approach, the covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho = 0,\alpha )\) has the form
where \(E_{H_{i}(y)}[w_{it}^{-1}]\) is computed by (24).
Now by replacing the “working” covariance matrix \(V _{i}{(\alpha }^{{\ast}})\) in the WGEE given in (18) with \(\Sigma _{i}^{{\ast}}(\beta,\rho = 0,\alpha )\), one may obtain the FSGQL(I) estimate for β by solving the estimating equation
2.3 CWGQL Approach
Note that instead of using the distance function with unconditional zero mean, one may like to exploit the distance function with zero mean conditionally. This is possible only when the expectation of the binary response conditional on the past history is known. In this case, by replacing μ it with λ it in (17), one may construct the distance function which has mean zero conditional on the past history, that is,
where for binary AR(1) model, for example, the conditional mean has the form
for t = 2, …, T.
Suppose that
with \(\lambda _{i1}(\beta ) =\mu _{i1}(\beta )\). To develop a GQL-type estimating equation in the conditional approach, one minimizes the distance function
with respect to β, the parameter of interest. Given the history, let the conditional covariance matrix \(\{\mbox{ cov}(\Delta _{i}(y_{i} -\lambda _{i}(\beta,\rho )))\vert H_{i}(y)\}\) be denoted by Σ ich (β, ρ). Then assuming that β and ρ in Σ ich (β, ρ) are known, minimizing the quadratic distance function (30) with respect to β is equivalent to solving the equation
2.3 Computational formula for Σ ich (β, ρ, γ)
For convenience, we first write
It then follows that
Now to compute the covariance matrix in (32), we write
It then follows that for u < t, for example,
and for t = 1, …, T i ,
where σ ic, tt is the conditional variance of y it given the history. For example, in the binary case, \(\sigma _{ic,tt} =\lambda _{it}(1 -\lambda _{it})\).
2.3.1 CWGQL Estimating Equation
Now by substituting (34) and (33) into (32), one obtains
where \(\Sigma _{ic} = \mbox{ diag}[\sigma _{ic,11},\ldots,\sigma _{ic,T_{i}T_{i}}]\). Consequently, when this formula for Σ ich from (35) is applied to the conditional GQL (CGQL) estimating equation in (31), one obtains
which is unaffected by the MAR missingness mechanism. This is not surprising because, conditional on the history, \(R_{it}\) and \(y_{it}\) are independent. However, this fully conditional approach requires the modeling of the conditional means of the responses, which is equivalent to modeling the correlation structure.
2.3.2 Conditional Likelihood Estimation
In fact when conditional inference is used, one can obtain likelihood estimates for β and ρ by maximizing the exact likelihood function under the condition that Y it and R it are independent given the history. This is easier for the analysis of longitudinal binary data as compared to the longitudinal analysis for count data subject to MAR.
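As a rough illustration of the conditional approach, the sketch below evaluates the part of one individual's log-likelihood that involves β and ρ, assuming that the likelihood factors into a response part and a missingness part, so that the response-probability terms, which depend only on α and the observed history, can be set aside. The function name `observed_loglik` is hypothetical, and the conditional mean is the one given by model (1).

```python
import math


def observed_loglik(y_obs, mu, rho):
    """Log-likelihood of the observed responses y_obs[0..T_i-1].

    mu holds the marginal means (at least as long as y_obs); only the
    Bernoulli terms for the observed time points contribute.
    """
    p_first = mu[0] if y_obs[0] else 1.0 - mu[0]
    loglik = math.log(p_first)
    for t in range(1, len(y_obs)):
        lam = mu[t] + rho * (y_obs[t - 1] - mu[t - 1])  # conditional mean
        loglik += math.log(lam if y_obs[t] else 1.0 - lam)
    return loglik


# An individual observed at the first two of three time points.
print(observed_loglik([1, 0], [0.5, 0.5, 0.5], 0.5))
```

Summing this quantity over individuals and maximizing over β (through the \(\mu_{it}\)) and ρ, subject to the range restriction on ρ, would give conditional likelihood estimates under these assumptions.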
Since the R it ’s satisfy the monotonic restriction given in Sect. 1, and because R it and Y it are independent conditional on the history under the MAR mechanism, the likelihood function for the ith individual may be expressed as
where, by (9),
3 Simulation Study
3.1 Comparison Between WGEE (AR(1)), WGEE(I) and FSGQL(I) Approaches: Multinomial Distribution Based Joint Generation of R and y
In this section, we describe and report the results of a simulation study that centers on a comparison of the WGEE approach of Robins et al. (1995) for estimating the regression parameter vector with the proposed FSGQL approach. Recall that the WGEE in (18) was constructed by using a “working” covariance matrix \(V _{i}{(\alpha }^{{\ast}}) = A_{i}^{\frac{1} {2} }R_{i}^{{\ast}}{(\alpha }^{{\ast}})A_{i}^{\frac{1} {2} }\), of the response vector y i . Note that this weight matrix was chosen ignoring the missing mechanism. Furthermore, there is no guideline to choose the “working” correlation matrix \(R_{i}^{{\ast}}{(\alpha }^{{\ast}})\). In the simulation study, we will consider a non-stationary longitudinal binary AR(1) model with true correlation structure C i (ρ) given by (4), for the responses subject to MAR. To examine the performance of the WGEE approach (18), we choose the best possible stationary AR(1) correlation form, namely,
as compared to using MA(1) and EQC-based “working” correlation matrices. We will refer to this WGEE as the WGEE(AR(1)). Also we will consider the simplest version of the WGEE approach, namely WGEE(I), which is obtained based on the independence assumption by using α ∗ = 0 in the “working” correlation matrix \(R_{i}^{{\ast}}{(\alpha }^{{\ast}})\). These two versions of the WGEE approach will be compared with the FSGQL(I) approach in (27) which was constructed by accommodating missing mechanism but by using longitudinal independence assumption, i.e., ρ = 0 or \(C_{i}(\rho ) = I_{T_{i}}\). For simplicity, in the present simulation study, we do not consider the true complete covariance matrix \(\Sigma _{i}^{{\ast}}(\beta,\rho,\alpha )\) -based FSGQL approach in (19).
3.1.1 Joint Generation of (R and y) Incomplete Binary Data: Multinomial Distribution Based
In order to generate an incomplete longitudinal binary data set subject to MAR, we follow the approach of Sutradhar and Mallick (2010). Specifically, the procedure initially assumes that every individual provides a response at time t = 1. Thus, since R i1 = 1 for all i = 1, ⋯ , K, a binary response y i1 is generated with marginal probability μ i1. Subsequently, y it is only observed for the i-th individual (i = 1, ⋯ , K) at time t (t = 2, ⋯ , T) when R it = 1 conditional on having observed the previous t − 1 responses for that individual; in other words, conditional on \(R_{i1} = 1,\cdots \,,R_{i,t-1} = 1\). Therefore, at time t (t = 2, ⋯ , T), both Y it and R it are random variables conditional on the observed history up to time t − 1, and, as such, one of the following three events occurs:
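\(E_{1}: [R_{it} = 1, Y_{it} = 1]\), \(E_{2}: [R_{it} = 1, Y_{it} = 0]\), or \(E_{3}: [R_{it} = 0]\), in which case \(y_{it}\) is not observed.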
Let z its = 1 for any s = 1, 2, 3 indicate that E s has occurred. Then, for l≠s, z itl = 0, and it must be the case that \(\sum _{s=1}^{3}z_{its} = 1\). Let \(p_{its} = P(z_{its} = 1)\) for s = 1, 2, 3. If we set q = 1 in (9), and use the resulting equation in conjunction with model (1), the p its may be expressed as
and
which can be written as
and
where
Thus, Sutradhar and Mallick (2010) summarize the data generation routine for the i-th individual, i = 1, ⋯ , K, as follows:
1. Generate \(y_{i1}\) from a Bernoulli distribution with parameter \(\mu_{i1}\).
2. For any t > 1, the values of \(z_{its}\) for s = 1, 2, 3 are realized according to the multinomial probability distribution
$$\displaystyle{P(z_{it1},z_{it2},z_{it3}) = \frac{1!} {z_{it1}!z_{it2}!z_{it3}!}p_{it1}^{z_{it1} }p_{it2}^{z_{it2} }p_{it3}^{z_{it3} }}$$
with \(\sum _{s=1}^{3}z_{its} = 1\). For \(z_{its} = 1\), allocate the response \(y_{it}\) following \(E_{s}\).
3. If \(z_{it3} = 1\) (the response is not observed), stop generating \(y_{it}\) for this individual; otherwise repeat step 2 for the next time point, up to t = T.
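The routine above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: it assumes that the three cell probabilities are \(g_{it}\lambda_{it}\) (observed, response 1), \(g_{it}(1 -\lambda_{it})\) (observed, response 0) and \(1 - g_{it}\) (not observed), with a lag-1 logistic response probability whose intercept is 1. The name `generate_individual` is hypothetical.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def generate_individual(mu, rho, alpha, rng):
    """Return the observed responses y_i1, ..., y_iT_i for one individual."""
    # Step 1: everyone responds at the first time point.
    y = [1 if rng.random() < mu[0] else 0]
    for t in range(1, len(mu)):
        g = expit(1.0 + alpha * y[-1])           # assumed response probability
        lam = mu[t] + rho * (y[-1] - mu[t - 1])  # conditional mean, model (1)
        p1, p2 = g * lam, g * (1.0 - lam)        # third cell has mass 1 - g
        # Step 2: a single multinomial trial over the three events.
        u = rng.random()
        if u < p1:
            y.append(1)
        elif u < p1 + p2:
            y.append(0)
        else:
            break                                # Step 3: monotone dropout
    return y


rng = random.Random(2013)
sample = [generate_individual([0.6] * 4, 0.5, 1.0, rng) for _ in range(5)]
print(sample)
```

Each returned list has length \(T_{i}\) between 1 and T, so the dropout time can be read directly from the length.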
3.1.2 Comparison Under Various Designs
Regarding the simulation study, for each of four designs, we set K = 100 and T = 4 and performed 1,000 replications. We considered three different values of longitudinal correlation parameter, setting ρ = 0.2, 0.5, and 0.8 in turn. In order to investigate the effect of the degree of missingness on the estimates of the regression parameter vector, for \(\Delta _{i} = diag(\delta _{i1},\delta _{i2},\cdots \,,\delta _{iT})\) in (18) with \(\delta _{it} = R_{it}/w_{it}\{H_{i,t-1}(y);\alpha \}= R_{it}/w_{it}\), we set \(w_{it} =\prod _{ j=1}^{t}g_{ij}(y_{i,j-1};\alpha )\) according to (10) with q = 1. We then studied two levels for α, namely α = 1, and \(\alpha = -3\) (We assume throughout the simulation study that both ρ and α are known; hence, we do not concern ourselves with estimating these quantities). Note that, according to (41), \(P[R_{it} = 0\mid y_{i,t-1} = 1] = 0.12\) and \(P[R_{it} = 0\mid y_{i,t-1} = 0] = 0.27\) for α = 1, while \(P[R_{it} = 0\mid y_{i,t-1} = 1] = 0.88\) and \(P[R_{it} = 0\mid y_{i,t-1} = 0] = 0.27\) for \(\alpha = -3\). Thus, when \(\alpha = -3\), the rate of missingness is extremely high, as expected.
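The missingness rates quoted above can be reproduced with a lag-1 logistic response probability whose intercept is 1. That intercept is an assumption inferred from the quoted rates, and `response_prob` is an illustrative name rather than notation from the paper.

```python
import math


def response_prob(y_prev, alpha):
    """P(R_it = 1 | y_{i,t-1}) under an assumed logistic model with intercept 1."""
    eta = 1.0 + alpha * y_prev
    return math.exp(eta) / (1.0 + math.exp(eta))


for alpha in (1, -3):
    for y_prev in (1, 0):
        p_missing = 1.0 - response_prob(y_prev, alpha)
        print(f"alpha={alpha:>2}, y_prev={y_prev}: P(R=0) = {p_missing:.2f}")
```

When the previous response is 0 the parameter α drops out, which is why the rate 0.27 is the same for both values of α.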
Initially, we compared the WGEE(I) and FSGQL(I) approaches using a stationary design that essentially contained no covariates. For this design, we simply had a single \(\beta_{1} = 0.5\), while the associated \(x_{it1} = 1\) for all i = 1, ⋯ , 100 and t = 1, ⋯ , 4. Table 1 presents the means and standard errors of the WGEE and FSGQL(I) estimates over the 1,000 replications for each of the six combinations of ρ and α. The number of replications that converged is also reported. When the degree of missingness is not overly severe (α = 1), there is little difference in the WGEE(I) and FSGQL(I) estimates. Both approaches produce essentially unbiased estimates, and all replications converge. However, when the degree of missingness is more pronounced \((\alpha = -3)\), the WGEE(I) estimates are significantly biased. In addition, regardless of the value of ρ, more than half of the replications did not converge. On the other hand, the FSGQL(I) estimates are still unbiased, and all replications continue to converge. We investigated the WGEE approach further by considering an AR(1) type “working” correlation structure instead of an independence assumption-based “working” correlation matrix. This WGEE approach is referred to as the WGEE(AR(1)) approach. Specifically, we set \(V _{i}{(\alpha }^{{\ast}}) = A_{i}^{1/2}R_{i}^{{\ast}}{(\alpha }^{{\ast}})A_{i}^{1/2}\), where \(R_{i}^{{\ast}}{(\alpha }^{{\ast}})\) is a T ×T correlation matrix with \(corr(Y _{it},Y _{i,t+l}) = {({\alpha }^{{\ast}})}^{l}\) and \(A_{i} = diag(\sigma _{i,11},\cdots \,,\sigma _{i,tt},\cdots \,,\sigma _{i,T_{i}T_{i}},0,\cdots \,,0)\) with \(\sigma _{i,tt} =\mu _{it}(1 -\mu _{it})\). To avoid estimation of \({\alpha }^{{\ast}}\), we have used \({\alpha }^{{\ast}} =\rho\). The results obtained for each of the six combinations of ρ and α (the missing dependence parameter) are also presented in Table 1.
For \(\alpha = -3\), the WGEE(AR(1)) estimates based on an AR(1) type structure are significantly better than those based on independence, and the number of replications that converged is also notably higher. Nonetheless, the independent FSGQL(I) estimates are still noticeably better than either of the WGEE estimates. Also of note is the fact that the WGEE(AR(1)) estimates based on the AR(1) structure for α = 1 are outperformed by their independent covariance structure counterparts.
We also considered a stationary design consisting of one covariate with associated parameter \(\beta_{1} = 0.5\). Specifically, for all t = 1, ⋯ , 4, we set \(x_{it1} = -1\) for \(i = 1,\cdots \,,K/4\), \(x_{it1} = 0\) for \(i = (K/4) + 1,\cdots \,,3K/4\), and \(x_{it1} = 1\) for \(i = (3K/4) + 1,\cdots \,,K\). The simulation results associated with this design are presented in Table 1 for each combination of ρ and α. When α = 1, WGEE(AR(1)), WGEE(I) and FSGQL(I) perform very similarly. It is also important to note that when \(\alpha = -3\), although the average estimates for the regression parameters are better for WGEE under the relatively higher longitudinal correlations of ρ = 0.5 and 0.8, the estimated standard errors are significantly smaller for the proposed FSGQL(I) technique. In addition, WGEE experiences convergence problems in a significant number of simulation replications; when an independent covariance structure is assumed, convergence rates ranged between approximately 40% and 60%, and were only slightly better when an AR(1) structure was specified.
Two designs consisting of two covariates with associated regression parameters \(\beta _{1} =\beta _{2} = 0.5\) were also studied; one consisted of two stationary covariates, the other of nonstationary ones. For the design consisting of two stationary covariates, we set x it1 = 1 for all i = 1, ⋯ , 100 and t = 1, ⋯ , 4 as in the design with no covariate, and x it2 according to the values specified for the single covariate design described above. The two covariates in the nonstationary design were set as follows:
and
The results for both the stationary and non-stationary two-covariate designs are presented in Table 2. For both designs, when α = 1, the performance of WGEE(I) under an independent covariance structure and FSGQL(I) is very similar. The estimates obtained using WGEE(AR(1)) with an AR(1) structure appear to be biased. When \(\alpha = -3\), and there is a significantly higher degree of missingness, the estimates obtained under WGEE are biased regardless of the assumed covariance structure and the level of longitudinal correlation; this is particularly the case in the nonstationary design. Also of note is the fact that the convergence rates under the WGEE approach are very poor, with the majority under 5% for the nonstationary design.
3.2 Comparison of WGEE(AR(1)), WGEE(I), and FSGQL(I) Approaches: Generating R and y conditionally
Because R it and y it are independent conditional on the history H i, t − 1(y), instead of generating them by using a multinomial distribution discussed in Sect. 3.1.1, one may generate them by using a conditional approach as follows:
1. Generate \(y_{i1}\) from bin(\(\mu_{i1}\)) for all i, i = 1, …, K.
2. For the i-th individual, generate \(R_{i2}\) from bin(\(g_{i2}\)), where \(g_{it}\) is given by (9) for t = 2, …, T.
3. If \(R_{i2} = 0\), set \(R_{ij} = 0\) and stop generating \(y_{ij}\) (j = 2, ⋯ , T).
4. If \(R_{i2} = 1\), generate \(y_{i2}\) from bin(\(\lambda_{i2}\)), where \(\lambda_{it}\) is the mean of \(Y_{it}\) conditional on \(y_{i,t-1}\) for t = 2, …, T, as given by (1).
5. Repeat from step 2 for j = 3, ⋯ , T.
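The five steps above can be sketched as follows. Unlike a single multinomial trial per time point, this version draws the response indicator first and the response second. The lag-1 logistic response probability with intercept 1 is an assumption, and `generate_conditionally` is a hypothetical name.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def generate_conditionally(mu, rho, alpha, rng):
    """Return (r, y): indicators R_it and responses y_it (None when missing)."""
    T = len(mu)
    r = [0] * T
    y = [None] * T
    r[0] = 1                                     # R_i1 = 1 for everyone
    y[0] = 1 if rng.random() < mu[0] else 0      # step 1
    for t in range(1, T):
        g = expit(1.0 + alpha * y[t - 1])        # assumed response probability
        if rng.random() >= g:                    # steps 2-3: R_it = 0, stop
            break
        r[t] = 1
        lam = mu[t] + rho * (y[t - 1] - mu[t - 1])
        y[t] = 1 if rng.random() < lam else 0    # step 4
    return r, y


rng = random.Random(2013)
print(generate_conditionally([0.6] * 4, 0.5, -3.0, rng))
```

Returning the full-length indicator vector makes the monotone pattern \(R_{i1} \geq R_{i2} \geq \cdots \geq R_{iT}\) explicit, which is convenient when forming \(\delta_{it} = R_{it}/w_{it}\).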
The estimates for the same designs are obtained as in Sect. 3.1.2, and the simulation results are reported in Tables 3 and 4. The results are similar to those of Tables 1 and 2, except that the WGEE approaches appear to encounter more convergence problems, especially when the proportion of missing values is large.
3.3 Performance of CWGQL Approach: Multinomial Distribution-Based Joint Generation of R and y
As opposed to the marginal approach where the unconditional mean function μ it (β) is modeled, in the longitudinal setup it is more appropriate to model the conditional regression (mean) function. When complete longitudinal binary data follow an AR(1)-type correlation model, as pointed out in (1), the conditional regression function may be modeled as
Furthermore, as pointed out in (28), because in the incomplete longitudinal setup with MAR mechanism one finds
the regression parameter β in \(\mu_{it}(\beta)\), modeled through \(\lambda_{it}(\beta,\rho)\), can be estimated by solving the CWGQL estimating equation (36). For the same design parameters used in Sect. 3.1.2 for Tables 1 and 2, and generating incomplete data using the multinomial distribution discussed in Sect. 3.1.1, we have obtained the CWGQL estimates for β under the same scenarios as those shown in Tables 1 and 2. The CWGQL estimates along with their standard errors are reported in Tables 5 and 6.
In order to examine the relative performance of the CWGQL approach with those of WGEE(AR(1)), WGEE(I), and FSGQL(I), it is sufficient to compare the CWGQL approach with the FSGQL(I) approach only. This is because it was found from the results in Tables 1 and 2 that the WGEE approaches may encounter serious convergence problems (showing consistency breakdown) and may also produce highly biased estimates, whereas the FSGQL(I) approach, in general, does not encounter such convergence problems and produces almost unbiased estimates even if a large proportion of values are missing. The CWGQL approach appears to produce slightly more efficient estimates than the FSGQL(I) approach. For example, when α = 1 (moderate missingness) and ρ = 0.5, in the no-covariate case, the FSGQL(I) approach (Table 1) produces an average estimate of β = 0.5 of 0.504 with standard error 0.169, whereas the CWGQL approach (Table 5) produces an estimate of 0.501 with standard error 0.158. Similarly, when ρ = 0.8 and \(\alpha = -3\) (high missingness), in the stationary one-covariate case, FSGQL(I) produces an estimate with standard error 0.325 as compared to 0.291 for CWGQL. Similar results are found for the stationary two-covariate case. Also, in these stationary cases, the CWGQL approach does not encounter any convergence problems even if the proportion of missing values is high. In the non-stationary cases, however, the CWGQL approach encounters some convergence problems when the proportion of missing values is high, but the problem is less serious than for the WGEE and WGEE(I) approaches.
4 Conclusion and Discussion
It was found that the existing WGEE (Robins et al. 1995) and WGEE(I) approaches in general encounter convergence problems when the proportion of missing values is high, and that the WGEE approach may produce highly biased estimates even when the proportion of missing values is moderate or low. These results agree with the recent study reported by Sutradhar and Mallick (2010). The WGEE(I) approach, by contrast, produces almost unbiased, and hence consistent, estimates when the proportion of missing values is moderate or low, although it can be inefficient. The proposed FSGQL(I) approach does not appear to encounter any serious convergence problems even when the proportion of missing values is high and the covariates are non-stationary. It also produces unbiased estimates similar to those of the WGEE(I) approach but with smaller standard errors, showing that FSGQL(I) is, as expected, more efficient than the WGEE(I) approach. Thus, even with a high proportion of missing values, one may reliably use the proposed FSGQL(I) approach for regression estimation whether the covariates are stationary or time dependent. The general FSGQL approach is expected to be more efficient than the FSGQL(I) approach when correlations are large, but this will be studied in the future.
We have also reported some results on the performance of a conditional estimating equation approach, namely the CWGQL estimating equation approach. This approach was found to produce more efficient regression estimates than the FSGQL(I) approach. Unlike the FSGQL(I) approach, however, it encounters convergence problems when covariates are time dependent and the proportion of missing values is high, although it experiences fewer convergence problems than the WGEE(I) approach.
References
Birmingham, J., Rotnitzky, A., Fitzmaurice, G.M.: Pattern-mixture and selection models for analysing longitudinal data with monotone missing patterns. J. R. Stat. Soc. B 65, 275–297 (2003)
Fitzmaurice, G.M., Laird, N.M., Zahner, G.E.P.: Multivariate logistic models for incomplete binary responses. J. Am. Stat. Assoc. 91, 99–108 (1996)
Liang, K.-Y., Zeger, S.L.: Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22 (1986)
Paik, M.C.: The generalized estimating equation approach when data are not missing completely at random. J. Am. Stat. Assoc. 92, 1320–1329 (1997)
Qaqish, B.F.: A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90, 455–463 (2003)
Robins, J.M., Rotnitzky, A., Zhao, L.P.: Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. J. Am. Stat. Assoc. 90, 106–121 (1995)
Rotnitzky, A., Robins, J.M., Scharfstein, D.O.: Semi-parametric regression for repeated outcomes with nonignorable nonresponse. J. Am. Stat. Assoc. 93, 1321–1339 (1998)
Rubin, D.B.: Inference and missing data. Biometrika 63, 581–592 (1976)
Sutradhar, B.C.: An overview on regression models for discrete longitudinal responses. Stat. Sci. 18, 377–393 (2003)
Sutradhar, B.C.: Dynamic Mixed Models for Familial Longitudinal Data. Springer, New York (2011)
Sutradhar, B.C.: Inference progress in missing data analysis from independent to longitudinal setup. In: Sutradhar, B.C. (ed.) ISS-2012 Proceedings volume on longitudinal data analysis subject to measurement errors, missing values, and/or outliers, Springer Lecture Notes Series, pp. 101–123 (2013)
Sutradhar, B.C., Mallick, T.S.: Modified weights based generalized quasilikelihood inferences in incomplete longitudinal binary models. Can. J. Stat. 38, 217–231 (2010)
Zeger, S.L., Liang, K.-Y., Self, S.G.: The analysis of binary longitudinal data with time-independent covariates. Biometrika 72, 31–38 (1985)
Acknowledgements
The authors would like to thank the audience of the symposium for their comments and suggestions. The authors would also like to thank a referee for valuable comments on the original submission. The research program of Patrick J. Farrell is supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC).
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Mallick, T.S., Farrell, P.J., Sutradhar, B.C. (2013). Consistent Estimation in Incomplete Longitudinal Binary Models. In: Sutradhar, B. (eds) ISS-2012 Proceedings Volume On Longitudinal Data Analysis Subject to Measurement Errors, Missing Values, and/or Outliers. Lecture Notes in Statistics(), vol 211. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6871-4_6
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6870-7
Online ISBN: 978-1-4614-6871-4