Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges

Sutradhar, Brajendra C.

doi:10.1007/s13571-021-00260-3

Published: 09 July 2021

Volume 84, pages 259–302, (2022)

Sankhya B

Brajendra C. Sutradhar¹

Abstract

In a cross-sectional cluster setup, the binary responses from the individuals in a cluster become correlated as they share a common cluster effect, whereas longitudinal responses from an individual, which themselves form a cluster, become correlated as the present and past responses are likely to maintain a suitable dynamic relationship. In both cluster and longitudinal setups, the marginal means may or may not be specified as a function of the regression effects/parameters only. In a cluster setup, this depends on the distributional assumption for the random cluster effects, and in a longitudinal setup it depends on the form (linear or non-linear) of the dynamic relationship used to construct a conditional model. However, over the last four decades, many studies arbitrarily pre-specified the marginal means as a function of the regression effects only under both cluster and longitudinal setups, and accommodated correlations using arbitrarily selected ‘working’ correlation structures. This paper makes a thorough review of these decades-long binary correlation models for consistent and efficient estimation of the regression effects. Both the progress and the drawbacks of these works are presented, showing clearly how inconsistency can arise if a pre-specified marginal fixed effects model is used when in fact such a model does not exist. This is because some of the conditional random effects models in a cluster setup produce mixed effects models for the marginal means, and conditional non-linear dynamic models in a longitudinal setup produce history based marginal recursive/dynamic models. As practitioners in both cluster and longitudinal setups deal with large data sets, it is demonstrated for their benefit how one can use the GQL (generalized quasi-likelihood) estimation approach in both setups. Furthermore, there exist many studies using the Bayesian approach where, unlike the aforementioned parametric correlation structure based inferences, marginal mixed effects models have been used for inferences for correlated binary data without specifying their correlation structures, under both cluster and longitudinal setups. We also provide a brief review of this alternative approach.

1 Introduction

In a cross-sectional cluster setup, the responses from the individuals in a given cluster are correlated as these responses share a common random cluster effect, whereas in a longitudinal setup the repeated responses collected from an individual form a cluster, and these clustered responses from the same individual become correlated as they are likely to follow a dynamic relationship. Thus the correlation structures under cross-sectional clusters and longitudinal clusters are expected to be different. In both setups, the primary objective is to examine the regression effects (of the associated covariates) after accommodating the respective correlation structure.

To facilitate the discussion on the cluster regression models in a cross-sectional setup, suppose that there are I independent clusters and that, for a cluster i (i = 1,…,I), n_i denotes its size. Let y_ij denote a binary response from the j-th (j = 1,…,n_i) individual of the i-th cluster. Further, let x_ij be a p-dimensional fixed covariate vector, and $\boldsymbol {\beta }=(\beta _{1},\ldots ,\beta _{u},\ldots ,\beta _{p})^{\prime }$ be the regression effect of x_ij on y_ij, for all i = 1,…,I; j = 1,…,n_i. Notice that in this cluster setup, there is likely to be a cluster effect on the responses belonging to the same cluster. Let γ_i denote the random cluster effect of the i-th cluster, which is shared by the responses belonging to this cluster. Thus, on top of β, there is an influence of γ_i on the responses ({y_ij, j = 1,…,n_i}) belonging to the i-th cluster. This additional influence, along with the influence of x_ij, is reflected in the binary response y_ij through a cluster-specific conditional mean model given by

$$ \begin{array}{@{}rcl@{}} E[Y_{ij}|\gamma_{i}]&=&Pr[Y_{ij}=1|\boldsymbol{x}_{ij},\gamma_{i}] \\ &=& p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i})=\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})/ [1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})], \end{array} $$

(1)

where it is customarily assumed that $\gamma _{i} {\stackrel {iid}{\sim }} (0,\sigma ^{2}_{\gamma }).$ Notice that Eq. 1 is a marginal model for y_ij conditional on γ_i. Hence this model may be referred to as the marginal-conditional (MC) binary logistic model. The model in Eq. 1 is also known as the random effects model, where $\sigma ^{2}_{\gamma }$ plays multiple roles. More specifically, depending on the distributional assumption for γ_i or $p^{*}_{ij}(\gamma _{i}),$ (1) the unconditional mean, that is, E[Y_ij] = Pr[Y_ij = 1|x_ij], may or may not be a function of $\sigma ^{2}_{\gamma };$ (2) similarly var[Y_ij] may or may not exhibit overdispersion (McCullagh and Nelder, 1989); but (3) because γ_i is the common cluster effect shared by all responses {y_ij and y_ik, for k ≠ j; j,k = 1,…,n_i}, y_ij and y_ik are correlated, and this within cluster correlation must be a function of $\sigma ^{2}_{\gamma }.$ For the third reason (3), $\sigma ^{2}_{\gamma }$ may preferably be referred to as the cluster correlation parameter.
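Role (3) of $\sigma ^{2}_{\gamma }$ can be made concrete numerically. The following is a minimal sketch, assuming purely for illustration that γ_i is normal with mean 0 and standard deviation `sigma_gamma`; the function name `pair_moments` and the use of Gauss-Hermite quadrature are choices made here, not part of the paper. It computes the marginal means of two responses in the same cluster and their correlation induced by the shared γ_i in Eq. 1.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def pair_moments(eta_j, eta_k, sigma_gamma, n_nodes=60):
    """Marginal means and correlation of (Y_ij, Y_ik) under model (1).

    eta_j, eta_k : linear predictors x_ij' beta and x_ik' beta
    sigma_gamma  : standard deviation of gamma_i (normality is an assumption)
    """
    # Gauss-Hermite rule for the weight exp(-z^2 / 2), rescaled to N(0, 1).
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)

    p_j = expit(eta_j + sigma_gamma * nodes)
    p_k = expit(eta_k + sigma_gamma * nodes)

    mu_j = float(np.sum(weights * p_j))
    mu_k = float(np.sum(weights * p_k))
    # Given gamma_i the responses are independent, so
    # E[Y_ij Y_ik] = E[p*_ij p*_ik].
    joint = float(np.sum(weights * p_j * p_k))

    corr = (joint - mu_j * mu_k) / np.sqrt(
        mu_j * (1.0 - mu_j) * mu_k * (1.0 - mu_k)
    )
    return mu_j, mu_k, float(corr)


if __name__ == "__main__":
    for s in (0.0, 0.5, 1.0, 2.0):
        print(s, pair_moments(0.5, -0.3, s))
```

With `sigma_gamma = 0` the correlation vanishes and the means reduce to the plain logistic form; as `sigma_gamma` grows, the correlation increases and the means move toward 1/2.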

As far as applications of the model (1) are concerned, there are many practical situations where one needs to analyze cluster specific binary data following (1). For example, in a chronic obstructive pulmonary disease (COPD) study (Liang et al. 1992; Ekholm et al. 1995; Sutradhar and Mukerjee, 2005), y_ij denotes the impaired pulmonary function (IPF) status (yes or no), and x_ij is the vector of covariates such as gender, race, age, and smoking status, for the j-th sibling of the i-th COPD patient. In this problem it is likely that the IPF status of the n_i siblings is influenced by an unobservable random effect (γ_i) due to the i-th COPD patient. This common random effect makes the binary responses of any two siblings of the same patient correlated. It is of scientific interest to find the effects of the covariates on the binary responses (i.e., β in the mean function, E[Y_ij] = Pr[Y_ij = 1|x_ij]), after taking the within cluster correlations into account. Thus, it is desired to derive the formula for P[Y_ij = 1|x_ij] = E[Y_ij|x_ij] from Eq. 1 under a suitable distribution for γ_i, or using a nonparametric density. In this paper, we will confine our discussion to a parametric setup.

With regard to constructing a marginal (fixed or mixed) model for E[Y_ij] = Pr[Y_ij = 1|x_ij], we remark that because γ_i in Eq. 1 may be treated as an additive random covariate in the linear predictor $\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i},$ it would be highly reasonable to assume that γ_i follows a normal (N) distribution, specifically $\gamma _{i} {\stackrel {iid}{\sim }} N(0,\sigma ^{2}_{\gamma })$ (Breslow and Clayton, 1993; Lee and Nelder, 1996; Sutradhar, 2004). This normality assumption produces a marginal mixed effects (MME) model. For convenience of further discussion in the following sections, we name this model cluster model A (CM-A). Some studies such as Wang and Louis (2003) assume a so-called “bridge” distribution for γ_i, which provides a marginal fixed effects (MFE) model. We name this model CM-B-1. Some other studies such as Prentice (1986) (see also Haseman and Kuper, 1979) assumed a beta-binary distribution, which also produces a MFE model. We name this model CM-B-2.

There exists another group of studies (Zeger et al. (1988, Section 3.1), Neuhaus et al. (1991, Eqn. (4)), and Chen et al. (2011, Sections 2.1, 3.1)) which, without justifying how the cluster effect contributes to the modeling of the mean, variance and correlations, assume a subject specific (SS) arbitrary marginal fixed effects (AMFE) model for this mean function, and further assume that a ‘working’ correlation structure of the user’s choice can be used for the estimation of the marginal fixed effects parameter. Thus, in this approach both means and correlations have ‘working’ models. This is a naive approach, and it is bound to produce invalid (for example, inconsistent) regression and correlation estimates in many practical situations where the true means and correlations are generated by normal random cluster effects γ_i. We name this naive/working model CM-C. A brief review is given in Section 2 of the advantages and drawbacks of all four cluster models (CM-A, CM-B-1, CM-B-2, CM-C) and their respective inferences.

We now consider clustered binary data in a longitudinal setup, where a cluster is formed by repeated responses from the same individual. For convenience, we consider I independent individuals, whereas in the cross-sectional cluster setup the same I was used to represent the total number of independent clusters. To form the i-th (i = 1,…,I) cluster for individual i with repeated responses, we assume that these responses are recorded over a small period of time T, such as T = 4 or 5 weeks/months/years. Hence we denote the binary response recorded at time t (t = 1,…,T) from the i-th individual by y_it, whereas in the cross-sectional cluster setup y_ij, j = 1,…,n_i, was used to represent the binary response from the j-th member of the i-th cluster. Next we denote by x_it a time dependent covariate vector corresponding to y_it. Here it is natural to expect that these repeated responses {y_it, t = 1,…,T} will be correlated, most likely through a dynamic relationship similar to that of time series data.

Similar to the cross-sectional cluster setup, it is of primary interest in this longitudinal setup to find the effect of x_it on the binary response y_it. This is equivalent to computing the effect of x_it on E[Y_it|x_it] = Pr[Y_it = 1|x_it]. Note however that, unlike in the cross-sectional setup, in some situations it may be of interest to find the effects of the past history on the current response y_it. This is equivalent to computing the effect of the covariate history

$$ H_{i,t}(\cdot)\equiv [\boldsymbol{x}_{i1},\ldots,\boldsymbol{x}_{i,t-1},\boldsymbol{x}_{i,t}] $$

on y_it, i.e., to computing E[Y_it|H_it] = Pr[Y_it = 1|H_it]. Suppose that the effect of x_it on y_it is measured by β, which is similar to, but distinct from, its counterpart in the cross-sectional case, where it represents the effect of x_ij on y_ij, j being the j-th individual in the cluster. On top of this difference, a major difference between the models in the two setups (cross-sectional and longitudinal clusters) arises because of the different nature of the binary responses under their respective clusters. More specifically, in the longitudinal setup, the correlation between y_it and y_i,t−1, for t = 2,…,T, arises because these responses are likely to be directly related through a dynamic dependence relationship, whereas in the cross-sectional setup, y_ij and y_ik, for j ≠ k; j,k = 1,…,n_i, are correlated as they share a common random cluster effect γ_i. For this reason, a large volume of existing studies (Laird and Ware, 1982; Stiratelli et al. 1984; Neuhaus, 2002; Parzen et al. 2011), where longitudinal binary responses are analyzed using random/mixed effects models, fail to accommodate longitudinal correlations adequately. In particular, these random effects based models have limited or no value for addressing the dynamic dependence among repeated responses.

As far as the marginal models for time specific binary means are concerned, there exist many situations where a marginal fixed effects (MFE) model involving only regression parameters (β) can be used for the marginal means. This is similar to the cross-sectional setup, but the correlation models are quite different under the cross-sectional and longitudinal setups. For MFE based longitudinal models, one may refer to specific dynamic models such as the multivariate binary density (MBD) based model of Bahadur (1961) (see also Cox, 1972), the observation driven dynamic (ODD) model of Kanter (1975), and the linear dynamic conditional probability (LDCP) model of Zeger et al. (1985) (see Sutradhar (2011, Section 7.2) for details). All these models yield the marginal mean function, i.e., the formula for the unconditional means (E[Y_it|x_it] = Pr[Y_it = 1|x_it]) for all t = 1,…,T, in terms of the β parameter only. For our discussion involving a MFE model, we consider, for example, the AR(1) (auto-regressive order 1) type linear dynamic model from Zeger et al. (1985), given by

$$ \begin{array}{@{}rcl@{}} &&{}Pr[Y_{i1}=1|\boldsymbol{x}_{i1}]=\tilde{p}_{i1}(\boldsymbol{\beta}) \\ &&{}Pr[Y_{it}=1|y_{i,t-1},\boldsymbol{x}_{it},\boldsymbol{x}_{i,t-1}]=\tilde{p}_{it}(\boldsymbol{\beta})+ \rho(y_{i,t-1}-\tilde{p}_{i,t-1}), t=2,\ldots,T, \end{array} $$

(2)

with $\tilde {p}_{it}(\boldsymbol {\beta })=\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta })/[1+\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta })],$ yielding the marginal means and variances as functions of β only, that is, they are free of the dynamic dependence parameter ρ. We refer to this MFE model as longitudinal model 1 (LM(1)), and express it as

$$ \begin{array}{@{}rcl@{}} && \text{LM(1): A marginal fixed effects (MFE) model} \\ &&E[Y_{it}]=Pr[Y_{it}=1]=\tilde{p}_{it}(\boldsymbol{\beta})=\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})/ [1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})] \\ && \text{var}[Y_{it}]=\tilde{p}_{it}(\boldsymbol{\beta}) (1-\tilde{p}_{it}(\boldsymbol{\beta})). \end{array} $$

(3)
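That the marginal means under the conditional model (2) are free of ρ can be checked by brute force. The sketch below enumerates all 2^T binary paths for a single individual and sums the path probabilities; the helper names `ldcp_joint` and `moments`, and the numerical values used for the linear predictors and ρ, are illustrative assumptions rather than anything taken from the paper.

```python
import itertools

import numpy as np


def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def ldcp_joint(p_tilde, rho):
    """Joint probability of every binary path (y_1, ..., y_T) under model (2)."""
    T = len(p_tilde)
    joint = {}
    for path in itertools.product((0, 1), repeat=T):
        prob = p_tilde[0] if path[0] == 1 else 1.0 - p_tilde[0]
        for t in range(1, T):
            # Linear dynamic conditional probability of Y_it = 1.
            cond = p_tilde[t] + rho * (path[t - 1] - p_tilde[t - 1])
            if not 0.0 <= cond <= 1.0:
                raise ValueError("rho is outside the range allowed by the means")
            prob *= cond if path[t] == 1 else 1.0 - cond
        joint[path] = prob
    return joint


def moments(joint, T):
    """Marginal means and covariance matrix implied by the joint distribution."""
    means = np.zeros(T)
    second = np.zeros((T, T))
    for path, prob in joint.items():
        y = np.array(path, dtype=float)
        means += prob * y
        second += prob * np.outer(y, y)
    return means, second - np.outer(means, means)


if __name__ == "__main__":
    p = expit(np.array([-0.5, 0.0, 0.3, 0.6]))  # hypothetical x_it' beta values
    means, cov = moments(ldcp_joint(p, rho=0.3), T=4)
    print(means - p)        # differences from p_tilde, zero up to rounding
    print(np.diag(cov))     # variances
```

The same enumeration also shows that ρ enters only through the covariances: the lag 1 covariance equals ρ times the variance of the earlier response.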

There are, however, many other situations where the MFE models are not appropriate for the marginal means of the longitudinal binary data. This mostly happens when y_it depends on the history H_it, rather than on x_it alone. In this case, the marginal means will be functions of both β and the dynamic dependence parameter ρ. In the cross-sectional cluster setup, a MME (marginal mixed effects) model, CM-A, with means involving β and $\sigma ^{2}_{\gamma },$ was used to represent the marginal means, but in the present longitudinal setup it is more appropriate to refer to the model as the MD (marginal dynamic) or MR (marginal recursive) model. More specifically, this MD/MR model for marginal means is derived from a non-linear conditional dynamic logit model (Sutradhar and Farrell, 2007) given by

$$ \begin{array}{@{}rcl@{}} &&{}Pr[Y_{i1}=1|\boldsymbol{x}_{i1}]=\tilde{p}_{i1}(\boldsymbol{\beta}) =\exp(\boldsymbol{x}^{\prime}_{i1}\boldsymbol{\beta})/[1+\exp(\boldsymbol{x}^{\prime}_{i1}\boldsymbol{\beta})] \\ &&{}Pr[Y_{it}=1|y_{i,t-1},\boldsymbol{x}_{it}]=\frac{\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+\rho y_{i,t-1})}{1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+\rho y_{i,t-1})}, t=2,\ldots,T, \end{array} $$

(4)

(see also Loredo-Osti and Sutradhar, 2012; Fokianos and Kedem, 2003 in a time series setup) yielding the marginal dynamic/recursive (MD/MR) model:

$$ \begin{array}{@{}rcl@{}} \text{LM(2)} &:& \text{marginal dynamic/recursive (MD/MR) model} \\ \mu_{i1}(\boldsymbol{\beta}) &=& E[Y_{i1}|\boldsymbol{x}_{i1}]=Pr[Y_{i1}=1|\boldsymbol{x}_{i1}]=\tilde{p}_{i1}(\boldsymbol{\beta}) \\ \mu_{it}(\boldsymbol{\beta},\rho) &=& E[Y_{it}|H_{it}]=\tilde{p}_{it}(\boldsymbol{\beta}) +\mu_{i,t-1}(\cdot)({\tilde{\tilde{p}}}_{it}(\boldsymbol{\beta},\rho)- \tilde{p}_{it}(\boldsymbol{\beta})), \quad t=2,\ldots,T, \end{array} $$

(5)

[Sutradhar and Farrell (2007), Sutradhar (2011, Section 7.7.2)] where ${\tilde {\tilde {p}}}_{it}(\boldsymbol {\beta },\rho ) =\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta }+\rho )/[1+\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta }+\rho )].$
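The recursion in Eq. 5 follows from conditioning on y_i,t−1 in Eq. 4: the response at time t has success probability $\tilde{p}_{it}$ when y_i,t−1 = 0 and ${\tilde{\tilde{p}}}_{it}$ when y_i,t−1 = 1. A minimal sketch, with function names and numerical inputs chosen here for illustration only, computes the marginal means by the recursion and cross-checks them against direct enumeration of all binary paths under Eq. 4.

```python
import itertools

import numpy as np


def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def recursive_means(eta, rho):
    """Marginal means mu_it from the recursion in Eq. (5).

    eta : array of linear predictors x_it' beta, t = 1, ..., T
    rho : dynamic dependence parameter of the conditional logit model (4)
    """
    eta = np.asarray(eta, dtype=float)
    p_lag0 = expit(eta)        # Pr[Y_it = 1 | y_{i,t-1} = 0]
    p_lag1 = expit(eta + rho)  # Pr[Y_it = 1 | y_{i,t-1} = 1]
    mu = np.empty(len(eta))
    mu[0] = p_lag0[0]
    for t in range(1, len(eta)):
        mu[t] = p_lag0[t] + mu[t - 1] * (p_lag1[t] - p_lag0[t])
    return mu


def enumerated_means(eta, rho):
    """Same means obtained by summing over all 2^T paths under model (4)."""
    eta = np.asarray(eta, dtype=float)
    T = len(eta)
    mu = np.zeros(T)
    for path in itertools.product((0, 1), repeat=T):
        prob = 1.0
        for t in range(T):
            lag = path[t - 1] if t > 0 else 0  # no lagged response at t = 1
            p1 = expit(eta[t] + rho * lag)
            prob *= p1 if path[t] == 1 else 1.0 - p1
        mu += prob * np.array(path, dtype=float)
    return mu


if __name__ == "__main__":
    eta = [-0.5, 0.0, 0.3, 0.6]  # hypothetical x_it' beta values
    print(recursive_means(eta, rho=1.2))
    print(enumerated_means(eta, rho=1.2))
```

Setting `rho = 0` recovers the logistic means of LM(1), while a nonzero `rho` makes every mean after the first depend on ρ as well as β.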

For the sake of completeness, we also include another MFE model (on top of LM(1)) where, similar to CM-C in the cross-sectional cluster setup, an arbitrary MFE (AMFE) model is used for the marginal means in terms of β, and longitudinal correlations are not modeled but substituted by certain ‘working’ correlations (Liang and Zeger, 1986) for the inference about β. We name this AMFE based model LM(3). We briefly review the models LM(1), LM(2), and LM(3) in Section 3, along with the available approaches for estimating their parameters. The advantages and drawbacks of these models and estimation approaches are also discussed.

Furthermore, because CM-A, as opposed to CM-B-1, CM-B-2 and CM-C in the cross-sectional cluster setup, shows under the normality assumption for the random cluster effect (γ_i) that the marginal means contain both β and $\sigma ^{2}_{\gamma }$ parameters, we consider this general model further in Section 4 and demonstrate how to construct a GQL (generalized quasi-likelihood) approach, which is computationally simpler than the maximum likelihood (ML) approach, for the estimation of β for known $\sigma ^{2}_{\gamma }.$ When $\sigma ^{2}_{\gamma }$ is unknown we provide a consistent MM (method of moments) approach for its estimation. Asymptotic properties such as consistency of these GQL and MM estimators are also given in the same section. As far as the estimation of β and ρ under the general longitudinal model LM(2) is concerned, one may refer to Sutradhar and Farrell (2007) for their GQL and MM based estimation. In Section 5, we provide a brief review of the use of the GLMM (generalized linear mixed model) in a Bayesian framework for inferences for correlated binary data in both cluster and longitudinal setups. Apart from computational complexity, it is outlined that because random effects based models, in general, do not produce time lag dependent correlations, choosing the necessary proper prior distributions under the longitudinal setup may be problematic. To tackle this problem to some extent, a few studies use dynamic models for longitudinal binary data in a Bayesian framework. This approach is discussed in brief as well. The paper concludes in Section 6.

2 Existing Marginal Models and Estimation for Cross-Sectional Clustered Binary Data

2.1 CM-A: Population Average (PA) Based Marginal Mixed Effects (MME) Models

Refer to the marginal conditional model (1), which is written by adding the random cluster effect γ_i to the linear predictor $\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }$ used in the binary logistic probability function. This model is also known as the random effects model for binary data. Under the assumption that γ_i has a suitable probability distribution with probability density $g_{D}(\gamma _{i}|0,\sigma ^{2}_{\gamma }),$ one may then write the likelihood function as

$$ \begin{array}{@{}rcl@{}} &&L(\boldsymbol{\beta},\sigma^{2}_{\gamma})={\Pi}^{I}_{i=1}Pr((y_{i1},\ldots,y_{ij},\ldots,y_{in_{i}}) |\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\Pi}^{I}_{i=1}{\int}_{\gamma_{i}}{\Pi}^{n_{i}}_{j=1}Pr(y_{ij}|\gamma_{i})g_{D}(\gamma_{i})d\gamma_{i} ={\Pi}^{I}_{i=1}{\int}_{\gamma_{i}}\\ &&{\Pi}^{n_{i}}_{j=1}[p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i})]^{y_{ij}} [1-p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i})]^{1-y_{ij}}g_{D}(\gamma_{i})d\gamma_{i} \\ &=&{\Pi}^{I}_{i=1}\int\frac{\exp\{{\sum}^{n_{i}}_{j=1}y_{ij}(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta} +\gamma_{i})\}}{{\Pi}^{n_{i}}_{j=1}\{1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})\}} g_{D}(\gamma_{i})d\gamma_{i}, \end{array} $$

(6)

which, along with some of its modifications such as penalized quasi-likelihood and hierarchical likelihood, has been exploited by many researchers over the last four decades under various forms of g_D(⋅), mainly for the estimation of β and $\sigma ^{2}_{\gamma }.$ Among the various forms of g_D(⋅), the normality assumption based g_N(γ_i) is widely used. See, for example, Stiratelli et al. (1984, Eqn. (3.1)), Breslow and Clayton (1993), Lee and Nelder (1996), Sutradhar and Mukerjee (2005). Some authors have used a specialized “bridge” (b) distribution with density, say g_b(⋅) (Wang and Louis (2003, Eqns. (4.1)-(4.2)), Parzen et al. (2011)), which, unlike the normal distribution (g_N(⋅)), yields a marginal fixed effects model for the marginal means. But this “bridge” distribution appears to be restrictive and too technical for practical use. When g_D(⋅) ≡ g_N(⋅), one obtains a MME (marginal mixed effects) based mean model given by

$$ \begin{array}{@{}rcl@{}} &&E[Y_{ij}]=Pr[Y_{ij}=1|\boldsymbol{x}_{ij}], \quad j=1,\ldots,n_{i} \\ &=&{\int}_{\gamma_{i}}p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i})g_{N}(\gamma_{i})d\gamma_{i} =\int \left[\frac{\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})} {[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})]}\right]dG_{N}(\gamma_{i},\sigma^{2}_{\gamma}) \\ &=&\mu_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \text{ (say), for all } j=1,\ldots,n_{i} \end{array} $$

(7)

$$ \begin{array}{@{}rcl@{}} & \neq& \frac{\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})}{[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})]}=p_{ij}(\boldsymbol{\beta}). \end{array} $$

(8)

Notice that the g_N(⋅) based likelihood estimates of β and $\sigma ^{2}_{\gamma }$ (Sutradhar and Mukerjee, 2005), obtained by maximizing the likelihood function (6), can be used in the marginal mean $\mu _{ij}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ to interpret the effects of x_ij on the binary mean response E[Y_ij] = Pr[Y_ij = 1|x_ij]. However, some studies attempt to estimate β in p_ij(β) and interpret the effects of x_ij on the mean response. Clearly this would yield an incorrect or inconsistent estimate under normal random cluster effects, as β in Eq. 7 cannot be estimated without estimating $\sigma ^{2}_{\gamma }$ at least consistently. This is also evident from Zeger et al. (1988), who show that under normality the population average (PA) based β, that is, β^PA in $\mu _{ij}({\boldsymbol {\beta }}^{PA},\sigma ^{2}_{\gamma })$ in Eq. 8, has an approximate relationship with the subject specific (SS) β, i.e., β^SS in $p_{ij}({\boldsymbol {\beta }}^{SS}),$ as

$$ \begin{array}{@{}rcl@{}} {\boldsymbol{\beta}}^{PA} \approx {\boldsymbol{\beta}}^{SS}/[\sqrt{1+\left( \frac{16}{15}\right)^{2} \frac{3}{\pi^{2}}\sigma^{2}_{\gamma}}]. \end{array} $$

(9)

Thus, the desired β^PA cannot be estimated without estimating $\sigma ^{2}_{\gamma }$ under the present cluster setup.
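The size of the effect described by Eq. 9 can be gauged numerically. The sketch below reads Eq. 9 in the sense of Zeger et al. (1988): the integral in Eq. 7, evaluated at a conditional linear predictor `eta`, is approximately a logistic function of `eta` divided by the square-root factor in Eq. 9. The Gauss-Hermite evaluation of Eq. 7 and the function names are assumptions made here for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def marginal_mean_normal(eta, sigma2, n_nodes=80):
    """Eq. (7) by Gauss-Hermite quadrature, for gamma_i ~ N(0, sigma2)."""
    nodes, weights = hermegauss(n_nodes)       # weight function exp(-z^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # rescale to the N(0, 1) density
    return float(np.sum(weights * expit(eta + np.sqrt(sigma2) * nodes)))


def attenuated_mean(eta, sigma2):
    """Logistic mean with the linear predictor scaled as in Eq. (9)."""
    c2 = (16.0 / 15.0) ** 2 * 3.0 / np.pi ** 2
    return float(expit(eta / np.sqrt(1.0 + c2 * sigma2)))


if __name__ == "__main__":
    for sigma2 in (0.0, 1.0, 4.0):
        for eta in (-2.0, 0.0, 2.0):
            print(sigma2, eta,
                  marginal_mean_normal(eta, sigma2),
                  attenuated_mean(eta, sigma2),
                  float(expit(eta)))
```

Comparing the three printed columns shows how far the unattenuated logistic form $p_{ij}(\boldsymbol{\beta})$ drifts from the mixed effects mean as $\sigma ^{2}_{\gamma }$ grows, which is the source of the inconsistency discussed above.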

However, there remain two relatively complex issues in this β = β^PA estimation. First, $\mu _{ij}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ is an implicit function, hence it is not easy to interpret the role of β in this mean function in the presence of an estimate of $\sigma ^{2}_{\gamma }.$ Second, the likelihood estimation of β and $\sigma ^{2}_{\gamma }$ is complex. As a remedy, following a binomial approximation (BA) to the normal distribution of γ_i (Sutradhar (2011, Chapter 5, Eqn. (4.24))), one may compute this mean function μ_ij(⋅) as follows and interpret it as a function of β for given $\sigma ^{2}_{\gamma }.$ Use $\gamma ^{*}_{i}=\gamma _{i}/\sigma _{\gamma }$ in Eq. 7 and express $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})$ as $p^{*}_{ij}(\boldsymbol {x}_{ij};\boldsymbol {\beta },\sigma _{\gamma } \gamma ^{*}_{i}).$ Further consider v_i as a binomial variable with parameters V and 1/2, i.e., $v_{i} \sim \text {binomial} (V,1/2).$ Next using

$$ \gamma^{*}_{i}=\frac{v_{i}-V(1/2)}{\sqrt{V(1/2)(1/2)}}\equiv h(v_{i}) $$

(10)

we may express the MME based mean function in Eq. 7 as

$$ \begin{array}{@{}rcl@{}} \mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) ={\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(11)

where V is assumed to be relatively large, such as V = 10. Note that the formula in Eq. 11 for the computation of the BA-based individual specific mean function is explicit, whereas the mean function was implicitly defined in Eq. 7. Thus one may substitute the likelihood estimates of β and $\sigma ^{2}_{\gamma },$ obtained by exploiting the g_N(γ_i)-based likelihood function (6) (Sutradhar and Mukerjee, 2005), into the mean formula in Eq. 11, and easily examine/interpret the effects of the individual covariate x_ij on the binary mean function Pr[Y_ij = 1|x_ij] = E[Y_ij|x_ij]. We remark however that, as the likelihood estimation is relatively complex, in Section 4 we demonstrate how one can develop a GQL approach (which produces consistent and highly efficient estimates, the ML estimate being optimal) for the estimation of the main regression parameters, and a MM approach for consistent estimation of $\sigma ^{2}_{\gamma }.$ These GQL and MM approaches exploit moments of the clustered binary data up to order 2, containing all squared and pairwise (from 2 individuals in the cluster) products of the binary responses. The following section provides a brief discussion of some other existing estimation approaches (other than ML and GQL), along with their limitations.
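The explicit sum in Eq. 11 is straightforward to evaluate. The following sketch implements Eqs. 10 and 11 directly and, as a sanity check that is not part of the paper, compares the result with a Gauss-Hermite evaluation of the integral in Eq. 7; the function names and the test values of the linear predictor are illustrative assumptions.

```python
import math

import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def mean_binomial_approx(eta, sigma_gamma, V=10):
    """Eq. (11): binomial approximation to the MME mean function.

    eta         : linear predictor x_ij' beta
    sigma_gamma : standard deviation of the normal cluster effect
    """
    v = np.arange(V + 1)
    h = (v - V * 0.5) / math.sqrt(V * 0.25)                     # Eq. (10)
    w = np.array([math.comb(V, int(k)) for k in v]) * 0.5 ** V  # binomial(V, 1/2)
    return float(np.sum(w * expit(eta + sigma_gamma * h)))


def mean_gauss_hermite(eta, sigma_gamma, n_nodes=80):
    """Eq. (7) by Gauss-Hermite quadrature, used here only as a reference."""
    nodes, weights = hermegauss(n_nodes)
    weights = weights / np.sqrt(2.0 * np.pi)
    return float(np.sum(weights * expit(eta + sigma_gamma * nodes)))


if __name__ == "__main__":
    for eta in (-1.5, 0.0, 0.8, 2.0):
        print(eta,
              mean_binomial_approx(eta, 1.0),
              mean_gauss_hermite(eta, 1.0))
```

Because Eq. 11 is a finite weighted sum of logistic terms, the contribution of each covariate to the mean can be read off by varying `eta` for a fixed `sigma_gamma`.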

2.1.1 Some Highly Competing Estimation Approaches in the Cross-Sectional Cluster Setup and their Drawbacks

A BLUP (Best Linear Unbiased Prediction) Approach

Under normality, i.e., when g_D(γ_i) ≡ g_N(γ_i) in Eq. 6, many authors such as Stiratelli et al. (1984, Eqn. (3.1)), Schall (1991), Karim and Zeger (1992), Breslow and Clayton (1993), McGilchrist (1994), Kuk (1995), Lin and Breslow (1996), and Lee and Nelder (1996) have used a BLUP analogue estimation approach, where cluster/familial random effects are treated as fixed effects [Henderson (1963)] and the regression and variance components of the mixed model (6) are estimated based on the so-called estimates of the random effects. Because γ_i has to be estimated using the data from the i-th cluster only, this BLUP procedure may in general yield a biased estimate of γ_i, especially when the i-th cluster size is small, which may subsequently produce biased regression and variance estimates, the variance estimates being more adversely affected than the regression estimates. In order to remove biases in the estimates, Kuk (1995) and Lin and Breslow (1996), among others, provided certain asymptotic bias corrections for both the regression and the variance component estimates. But Breslow and Lin (1995, p. 90) have shown that the bias corrections appear to improve the asymptotic performance of the uncorrected quantities only when the true variance component is small, more specifically, less than or equal to 0.25, whereas in practice the variance component can be much larger. We further remark that the above BLUP analogue approaches essentially use a likelihood technique for the present non-linear binary regression analysis. For example, Breslow and Clayton (1993) specifically use a PQL (penalized quasi-likelihood) approach, and similarly Lee and Nelder (1996) use a HL (hierarchical likelihood) approach. These two approaches are similar, because in the first step, both PQL and HL approaches estimate the regression parameters and the random effects.
The difference between the two approaches is that the PQL approach estimates them by maximizing a penalized quasi-likelihood function, whereas the HL approach maximizes a hierarchical likelihood function. In the second step, in estimating the variance of the random effects, the PQL approach maximizes a profile quasi-likelihood function, whereas the HL approach maximizes an adjusted profile hierarchical likelihood function. Thus both approaches encounter biases in the estimates in a similar way.

Another major drawback of the above mentioned BLUP oriented likelihood approaches is that no attempt is made to compute the marginal means from the respective likelihood function, whereas this computation of the marginal means is essential to interpret the effects of the covariates x_ij on the marginal means $Pr[Y_{ij}=1|\boldsymbol {x}_{ij}]=E[Y_{ij}|\boldsymbol {x}_{ij}]=\mu _{ij} (\boldsymbol {x}_{ij};\boldsymbol {\beta },\sigma ^{2}_{\gamma }).$

2.2 CM-B-1: Subject Specific (SS) Marginal Fixed Effects (MFE) Model Based on “bridge” Random Cluster Effects

In some situations depending on the assumption about the distribution of the random effects (γ_i), the PA-based mixed model may yield a fixed effects model for the marginal means. More specifically, by using a slightly different (than $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})$ in Eq. 7) marginal-conditional probability given by

$$ \begin{array}{@{}rcl@{}} &&Pr[Y_{ij}=1|\boldsymbol{x}_{ij},\gamma_{i}]=p^{**}_{ij}(\boldsymbol{\beta},\phi(\sigma^{2}_{\gamma}),\gamma_{i}) \\ &=&\exp(\{\phi(\sigma^{2}_{\gamma})\}^{-1}\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})/ [1+\exp(\{\phi(\sigma^{2}_{\gamma})\}^{-1}\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})],\\ 0&<&\phi(\sigma^{2}_{\gamma})<1, \end{array} $$

(12)

Wang and Louis (2003, Eqn. (4.2)) have shown that

$$ \begin{array}{@{}rcl@{}} &&Pr[Y_{ij}=1|\boldsymbol{x}_{ij}]={\int}^{\infty}_{-\infty}p^{**}_{ij} (\boldsymbol{\beta},\gamma_{i})g_{D}(\gamma_{i})d\gamma_{i} \\ &=&p_{ij}(\boldsymbol{\beta}) =\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})/[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})], \end{array} $$

(13)

a MFE (marginal fixed effects) based model involving only β parameters, when

$$ g_{D}(\gamma_{i}) \Rightarrow g_{b}(\gamma_{i}), $$

g_b(γ_i) being the so-called “bridge” density of the form

$$ \begin{array}{@{}rcl@{}} &&{}g_{b}(\gamma_{i})=\frac{1}{2\pi}\frac{\sin(\phi \pi)}{\cosh(\phi \gamma_{i})+\cos(\phi \pi)}; \quad 0<\phi(\sigma^{2}_{\gamma})<1, \quad -\infty <\gamma_{i} < \infty, \end{array} $$

(14)

where ϕ is related to $\sigma ^{2}_{\gamma }$ through the relationship, $\sigma ^{2}_{\gamma }=\pi ^{2}(\phi ^{-2}-1)/3.$
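The three properties claimed for the bridge density, namely that Eq. 14 integrates to one, has variance $\pi ^{2}(\phi ^{-2}-1)/3,$ and turns the conditional probability in Eq. 12 into the logistic marginal of Eq. 13, can be checked by numerical integration. The sketch below uses a plain trapezoidal rule on a wide grid; the grid width, the number of points, and the function names are assumptions made here, and the check is illustrative rather than a proof.

```python
import numpy as np


def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))


def bridge_density(gamma, phi):
    """Bridge density g_b of Eq. (14), for 0 < phi < 1."""
    return np.sin(phi * np.pi) / (
        2.0 * np.pi * (np.cosh(phi * gamma) + np.cos(phi * np.pi))
    )


def trapezoid(values, grid):
    """Composite trapezoidal rule on a one-dimensional grid."""
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(grid)))


def bridge_checks(eta, phi, half_width=80.0, n_points=160001):
    """Return (total mass, variance, marginal mean) under the bridge density.

    eta : marginal linear predictor x_ij' beta, so that the conditional
          predictor in Eq. (12) is eta / phi + gamma_i
    """
    grid = np.linspace(-half_width, half_width, n_points)
    dens = bridge_density(grid, phi)
    total = trapezoid(dens, grid)
    variance = trapezoid(grid ** 2 * dens, grid)
    marginal = trapezoid(expit(eta / phi + grid) * dens, grid)
    return total, variance, marginal


if __name__ == "__main__":
    phi, eta = 0.7, 0.9  # hypothetical values
    total, variance, marginal = bridge_checks(eta, phi)
    print(total)
    print(variance, np.pi ** 2 * (phi ** -2 - 1.0) / 3.0)
    print(marginal, float(expit(eta)))
```

The grid must be wide because the density has exponential rather than Gaussian tails; for values of ϕ close to zero a larger `half_width` would be needed.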

We remark that the MFE model for the marginal means given in Eq. 13 is simpler than the MME model (7) to interpret the effects of the covariates x_ij on the SS binary means E[Y_ij|x_ij] as it is $\sigma ^{2}_{\gamma }$ free. Also the likelihood estimate of β obtained by exploiting the likelihood function

$$ \begin{array}{@{}rcl@{}} L(\boldsymbol{\beta},\phi/\sigma^{2}_{\gamma}) ={\Pi}^{I}_{i=1}\int\frac{\exp\{{\sum}^{n_{i}}_{j=1}y_{ij}(\phi^{-1}\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta} +\gamma_{i})\}}{{\Pi}^{n_{i}}_{j=1}\{1+\exp(\phi^{-1}\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\gamma_{i})\}} g_{b}(\gamma_{i})d\gamma_{i}, \end{array} $$

(15)

can be used for β in the MFE model (13). This is because the MFE model (13) can be obtained from the joint probability function used in the likelihood function (15). That is β^PA ≡β^SS.

However, some of the major drawbacks of this “bridge” random effects based fixed model are:

(i) Notice that γ_i involved in the linear mixed predictor in the conditional probability function in Eq. 12 may be treated as a random covariate, whereas x_ij’s are known to be fixed covariates. As far as its distributional properties are concerned, even though the bridge distribution (14) (which has a complex trigonometrical ratio form) technically yields the marginal fixed effects model, the suitability of this distributional assumption, as opposed to the normality assumption (e.g., Breslow and Clayton, 1993; Lee and Nelder, 1996 in GLMM setup) in practical contexts, is not discussed adequately in the literature.

(ii) Even though the MFE model (13) does not contain $\phi /\sigma ^{2}_{\gamma },$ this parameter has to be estimated anyway, as it is contained in the likelihood function and in any possible correlation structure. Moreover, likelihood estimation using the likelihood function (15) would be much more complex than using the normal cluster effects based likelihood function.

(iii) The β parameter in the MFE model (13) could be estimated using a QL (quasi-likelihood) approach rather than the complicated likelihood approach, provided one could compute the pair-wise correlations among the clustered binary responses. The computation of these correlations, however, appears to be complicated under this “bridge” cluster effects assumption.

2.3 CM-B-2: SS Marginal Fixed Effects (MFE) Model Based on Beta-Binary Random Clustered Probability Function

Similar to the CM-B-1 model (Wang and Louis, 2003, 2004), there exist some early studies (Prentice, 1986; Haseman and Kuper, 1979) in the context of clustered binary regression analysis in a longitudinal setup, where, in order to obtain a MFE model for binary means, an extended assumption about the distribution of a function of γ_i, specifically of $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i}),$ was used. For a bounded scale parameter τ, as a function of $\sigma ^{2}_{\gamma },$ say $\tau (\sigma ^{2}_{\gamma })$ satisfying $0<\tau (\sigma ^{2}_{\gamma })<1,$ suppose that $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})$ in Eq. 1 (see also Eq. 7) follows a beta distribution ($\tilde {g}_{B}$) of the first kind with parameters $(\{\tau (\sigma ^{2}_{\gamma })\}^{-1}-1)p_{ij}(\boldsymbol {\beta })$ and $(\{\tau (\sigma ^{2}_{\gamma })\}^{-1}-1)q_{ij}(\boldsymbol {\beta }),$ with q_ij(β) = 1 − p_ij(β). More specifically,

$$ \begin{array}{@{}rcl@{}} &&{}{\text{Distributional assumption for the random logistic function }p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i})} : \\ &&{}{\text{ A beta-distribution of first kind }} \\ &&{}\tilde{g}_{B}(p^{*}_{ij};\tau,p_{ij}(\boldsymbol{\beta}))=\frac{{p^{*}_{ij}}^{(\tau^{-1}-1)p_{ij}-1} (1-p^{*}_{ij})^{(\tau^{-1}-1)q_{ij}-1}}{B((\tau^{-1}-1)p_{ij},(\tau^{-1}-1)q_{ij})}; 0\le p^{*}_{ij} \le 1, \end{array} $$

(16)

which yields the marginal probability

$$ \begin{array}{@{}rcl@{}} &&Pr[Y_{ij}=1|\boldsymbol{x}_{ij}]={{\int}^{1}_{0}}p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i}) \tilde{g}_{B}(p^{*}_{ij})dp^{*}_{ij} \\ & &=p_{ij}(\boldsymbol{\beta}) =\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})/[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})], \end{array} $$

(17)

as in Eq. 13 under CM-B-1 model, which is the same as the marginal probability in Eq. 8.
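The marginalization in Eq. 17 can be illustrated by simulation. The Python sketch below draws the random probability $p^{*}_{ij}$ from the beta distribution (16), draws a binary response given $p^{*}_{ij}$, and compares the empirical mean of the responses with $p_{ij}(\boldsymbol{\beta})$ for several values of τ. The linear predictor value 0.4, the τ values, the seed, and the function name are hypothetical choices for this sketch.

```python
import numpy as np

rng = np.random.default_rng(2021)

def simulate_beta_binary_mean(p, tau, n_draws=200_000):
    """Monte Carlo estimate of Pr[Y = 1] when p* ~ Beta(a*p, a*(1-p)), a = 1/tau - 1."""
    a = 1.0 / tau - 1.0
    # Random success probability following the beta density of Eq. (16)
    p_star = rng.beta(a * p, a * (1.0 - p), size=n_draws)
    # Binary response generated conditionally on p*
    y = rng.random(n_draws) < p_star
    return y.mean()

x_beta = 0.4                              # hypothetical value of x_ij' beta
p = 1.0 / (1.0 + np.exp(-x_beta))         # p_ij(beta), the logistic mean of Eq. (17)

# Empirical marginal means for a few scale parameters tau in (0, 1)
estimates = {tau: simulate_beta_binary_mean(p, tau) for tau in (0.1, 0.5, 0.9)}
```

Up to Monte Carlo error, each entry of `estimates` should be close to `p` whatever the value of τ, which is the sense in which the marginal mean is a fixed effects model free of the scale parameter.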

Consequently, under this mixed model approach, one may examine the effects of x_ij on the marginal response means E[Y_ij|x_ij] = Pr[Y_ij = 1|x_ij] by estimating the β parameter involved in the simpler MFE model (17), i.e., in $p_{ij}(\boldsymbol {\beta }) =\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta })/[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta })].$ This estimation can be achieved by maximizing the likelihood function

$$ \begin{array}{@{}rcl@{}} L(\boldsymbol{\beta},\tau)&=&{\Pi}^{I}_{i=1}{\Pi}^{n_{i}}_{j=1}{{\int}^{1}_{0}} \frac{{p^{*}_{ij}}^{\{(\tau^{-1}-1)p_{ij}+y_{ij}\}-1} (1 - p^{*}_{ij})^{\{(\tau^{-1}-1)q_{ij}+1-y_{ij}\}-1}}{B((\tau^{-1}-1)p_{ij},(\tau^{-1}-1)q_{ij})} dp^{*}_{ij} \\ &=&{\Pi}^{I}_{i=1}{\Pi}^{n_{i}}_{j=1}\frac{\Gamma\{(\tau^{-1}-1)p_{ij}+y_{ij}\}\, \Gamma\{(\tau^{-1}-1)q_{ij}+1-y_{ij}\}} {\Gamma\{(\tau^{-1} - 1)+1\}\,B((\tau^{-1} - 1)p_{ij},(\tau^{-1} - 1)q_{ij})}, \end{array} $$

(18)

with respect to β and τ.

However, some of the major drawbacks of this beta-binary approach based marginal fixed model are:

(i) The likelihood estimation by exploiting the likelihood function (18) is complex. See, for example, Sutradhar and Das (1997), for an approximate QL approach estimation in a similar setup.

(ii) The assumption that the whole conditional probability $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})=\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i}) /[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})]$ in Eq. 1 follows a beta distribution, rather than assuming a distribution (such as the normal) for γ_i, appears to be too restrictive a device for obtaining a marginal fixed model, and hence it may be impractical.

(iii) The pair-wise correlations among the clustered responses are not well understood, as they may not be easy to compute. This is because such a computation first requires the correlations between $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i}) =\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})/[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})]$ and $p^{*}_{ik}(\boldsymbol {\beta },\gamma _{i})=\exp (\boldsymbol {x}^{\prime }_{ik}\boldsymbol {\beta }+\gamma _{i}) /[1+\exp (\boldsymbol {x}^{\prime }_{ik}\boldsymbol {\beta }+\gamma _{i})]$ for j≠k; j,k = 1,…,n_i, which cannot be obtained without making further joint (say, bivariate) distributional assumptions for $p^{*}_{ij}$ and $p^{*}_{ik}.$

(iv) Even though the ML approach may give an estimate of τ, estimating the cluster variance $\sigma ^{2}_{\gamma }$ is not possible without knowing the specific relationship between τ and $\sigma ^{2}_{\gamma },$ because $\tau (\sigma ^{2}_{\gamma })$ is currently only an implicit function.

2.4 CM-C: A SS Arbitrary Marginal Fixed Effects (AMFE) Model

Some authors, for example Zeger et al. (1988, Section 3.1) in an early study, considered a clustered binary data analysis and suggested using the MFE model for the marginal means, given by

$$ \begin{array}{@{}rcl@{}} &&E[Y_{ij}] = Pr[Y_{ij} = 1|\boldsymbol{x}_{ij}] = p_{ij}(\boldsymbol{\beta}) = \exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})/ [1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta})], \end{array} $$

(19)

for all j = 1,…,n_i. Because the clustered responses are correlated, for inferences about β these authors suggested the use of a ‘working’ correlation structure based GEE (generalized estimating equations) approach discussed by Liang and Zeger (1986). It is clear that neither the means nor the correlations were modeled under this approach. Hence, the MFE model in Eq. 19 is purely an arbitrary fixed effects model. Notice that even though the model in Eq. 19 appears to be the same as the marginal models in Eqs. 13 and 17, it is an assumed model whose connection with the marginal-conditional model (1) is not shown, whereas the models in Eqs. 13 and 17 were derived from Eq. 1 under certain distributional (“bridge” and beta-binary) assumptions for the cluster effects γ_i. Furthermore, as shown by Eq. 7 (see also Eq. 11), the marginal model (19) cannot be derived from Eq. 1 under the normality assumption for γ_i, and in such normal based cases the MFE model (19) would produce biased and hence inconsistent regression estimates because $\sigma ^{2}_{\gamma }$ is ignored in the marginal mean function. By the same token, we remark that the marginal model (19) suggested by Zeger et al. (1988, Section 3.1) gives the wrong impression that it can be used for any clustered correlated binary data. This impression is also found in a more recent paper by Chen et al. (2011, Section 2.1), where the marginal model (19) was used under a clustered correlated ‘response process’ without misclassification, and was generalized for a possible ‘misclassification process’. The difference between the two approaches is that Zeger et al. (1988, Section 3.1) suggested a ‘working’ correlation structure to construct their GEE, whereas Chen et al. (2011, Section 2.1) suggested a ‘working’ odds ratio based ‘working’ covariance (or bivariate probability) structure to develop the GEE for β estimation.
However, in a longitudinal setup for binary data, it is well known that these GEE approaches may produce inefficient estimates as compared to the so-called independence assumption based simpler MM and QL approaches (Sutradhar and Das, 1999; Sutradhar, 2011, Section 7.3.6; Sutradhar and Zheng, 2018; Sutradhar, 2014, Section 4.2), which is a serious inference drawback.

To get a feel for the possible adverse performance of the odds ratio based GEE approach in the present cross-sectional cluster setup, we consider the most likely practical case with normal random cluster effects $(\gamma _{i} \sim N(0,\sigma ^{2}_{\gamma }))$ as discussed in Section 2.1, and compute the odds ratio to examine whether its logarithm can be expressed in a linear form as suggested in Chen et al. (2011, Section 2.1). Because the odds ratio for y_ij and y_ik (j≠k; j,k = 1,…,n_i) has the formula

$$ \begin{array}{@{}rcl@{}} \psi_{ijk}&=&\frac{Pr(Y_{ij}=1,Y_{ik}=1)Pr(Y_{ij}=0,Y_{ik}=0)} {Pr(Y_{ij}=1,Y_{ik}=0)Pr(Y_{ij}=0,Y_{ik}=1)}, \end{array} $$

(20)

we compute these joint probabilities involved in Eq. 20, by exploiting the independence of y_ij and y_ik conditional on γ_i in Eq. 1 and then taking population average over the normal distribution of γ_i. More specifically, for $\gamma ^{*}_{i}=\frac {v_{i}-V(1/2)}{\sqrt {V(1/2)(1/2)}}\equiv h(v_{i})$ as in Eq. 10, following Eq. 11, we write

$$ \begin{array}{@{}rcl@{}} \lambda^{(1,1)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})&=&Pr(Y_{ij}=1,Y_{ik}=1) \\ &=&{\sum}^{V}_{v_{i}=0}\frac{\exp((\boldsymbol{x}^{\prime}_{ij}+\boldsymbol{x}^{\prime}_{ik})\boldsymbol{\beta}+2\sigma_{\gamma} h(v_{i}))} {[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))][1+\exp(\boldsymbol{x}^{\prime}_{ik}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))]} \begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(21)

$$ \begin{array}{@{}rcl@{}} \lambda^{(0,0)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})&=&Pr(Y_{ij}=0,Y_{ik}=0) \\ &=&{\sum}^{V}_{v_{i}=0}\frac{1} {[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))][1+\exp(\boldsymbol{x}^{\prime}_{ik}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))]} \begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(22)

$$ \begin{array}{@{}rcl@{}} \lambda^{(1,0)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})&=&Pr(Y_{ij}=1,Y_{ik}=0) \\ &=&{\sum}^{V}_{v_{i}=0}\frac{\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))} {[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))][1+\exp(\boldsymbol{x}^{\prime}_{ik}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))]} \begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(23)

$$ \begin{array}{@{}rcl@{}} \lambda^{(0,1)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})&=&Pr(Y_{ij}=0,Y_{ik}=1) \\ &=&{\sum}^{V}_{v_{i}=0}\frac{\exp(\boldsymbol{x}^{\prime}_{ik}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))} {[1+\exp(\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))][1+\exp(\boldsymbol{x}^{\prime}_{ik}\boldsymbol{\beta}+\sigma_{\gamma} h(v_{i}))]} \begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(24)

yielding the odds ratio as

$$ \begin{array}{@{}rcl@{}} \psi_{ijk}&=&\frac{\lambda^{(1,1)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \lambda^{(0,0)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})} {\lambda^{(1,0)}_{ijk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})\lambda^{(0,1)}_{ijk} (\boldsymbol{\beta},\sigma^{2}_{\gamma})}. \end{array} $$

(25)
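The joint probabilities and the odds ratio in Eq. 25 are straightforward to evaluate numerically. The following Python sketch uses the binomial approximation with standardized points h(v_i) to compute the four joint probabilities and their odds ratio for given linear predictors. The number of binomial trials V = 20, the function names, and the argument names `eta_j`, `eta_k` (standing for $\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}$ and $\boldsymbol{x}^{\prime}_{ik}\boldsymbol{\beta}$) are assumptions of this sketch.

```python
import numpy as np
from math import comb

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def joint_probabilities(eta_j, eta_k, sigma_gamma, V=20):
    """Joint probabilities (lambda11, lambda00, lambda10, lambda01) of (Y_ij, Y_ik).

    The normal cluster effect is approximated by sigma_gamma * h(v), where v is
    binomial(V, 1/2) and h(v) is its standardized value.
    """
    v = np.arange(V + 1)
    weights = np.array([comb(V, int(vi)) for vi in v], dtype=float) * 0.5**V
    h = (v - V / 2.0) / np.sqrt(V / 4.0)          # standardized binomial points
    pj = expit(eta_j + sigma_gamma * h)           # Pr(Y_ij = 1 | gamma_i)
    pk = expit(eta_k + sigma_gamma * h)           # Pr(Y_ik = 1 | gamma_i)
    lam11 = float(np.sum(weights * pj * pk))
    lam00 = float(np.sum(weights * (1.0 - pj) * (1.0 - pk)))
    lam10 = float(np.sum(weights * pj * (1.0 - pk)))
    lam01 = float(np.sum(weights * (1.0 - pj) * pk))
    return lam11, lam00, lam10, lam01

def odds_ratio(eta_j, eta_k, sigma_gamma, V=20):
    """Odds ratio psi_ijk = lambda11 * lambda00 / (lambda10 * lambda01)."""
    lam11, lam00, lam10, lam01 = joint_probabilities(eta_j, eta_k, sigma_gamma, V)
    return lam11 * lam00 / (lam10 * lam01)
```

Evaluating `odds_ratio` over a range of `eta_j`, `eta_k` and `sigma_gamma` values is one way to inspect how the odds ratio implied by the normal random effects model depends on both the regression effects and the cluster variance.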

In Chen et al. (2011), these joint probabilities are treated as unknown. They express $\lambda ^{(1,1)}_{ijk}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ as a function of ψ_ijk, p_ij(β) and p_ik(β), and then estimate this joint probability by using an estimate of ψ_ijk. For this estimation, they use a ‘working’ log linear model, namely

$$ \psi_{ijk}=\exp(\boldsymbol{u}^{\prime}_{ijk}\boldsymbol{\alpha}), $$

where u_ijk is a set of suitable covariates and α is a set of new regression parameters. Notice, however, that under the normal random effects model the odds ratio is known and computed by Eq. 25, and it is quite different from what is modeled using a log linear relationship. Thus, this example demonstrates that the ‘working’ odds ratio approach, which fits a log linear model for odds ratio estimation, may yield an inconsistent estimate for the joint probability, restricting its use for GEE construction.

3 Existing Marginal Models and Estimation for Longitudinal Clustered Binary Data

As opposed to cross-sectional clustered data collection, in a longitudinal setup a cluster is formed with repeated responses over a period of time T, from an individual i, for all i = 1,…,I. As explained in Section 1, specifically in Eqs. 2 and 4, the correlations among repeated responses arise through certain dynamic relationships between past and present responses of the same individual. We refer to Sutradhar (2010, Section 2.2) and Sutradhar and Zheng (2018), for example, for some low order non-stationary (time dependent covariates based) correlation such as AR(1) (auto-regressive order 1), MA(1) (moving average order 1), and exchangeable/equi-correlation structures for repeated binary data. Similar but ‘working’ correlation structures for stationary/non-stationary repeated binary data are also found in Liang and Zeger (1986), Zeger et al. (1985), and Lin and Carroll (2001), for example.

As far as the marginal models for the binary means at a given time t are concerned, similar to the cross-sectional clustered binary models, these models can be (1) MFE model (LM(1)) such as in Eq. 1.4 obtained from a linear dynamic conditional model, or (2) MD/MR model (LM(2)) such as in Eq. 5 obtained from a non-linear dynamic conditional logits model, or (3) AMFE (arbitrary marginal fixed effects) model (LM(3)), where correlations are thought not to play any role in mean specification. For convenience, we refer to Zeger et al. (1985), Sutradhar (2010, 2011), and Sutradhar and Zheng (2018), for LM(1) type MFE model; Fokianos and Kedem (2003), Sutradhar and Farrell (2007), and Sutradhar (2011, Section 7.7.2), for LM(2) type MD/MR (marginal dynamic/recursive) model; and Liang and Zeger (1986), and Lin and Carroll (2001), for the AMFE model.

We further remark that some studies (e.g., Laird and Ware, 1982; Stiratelli et al., 1984; Parzen et al., 2011) have used random effects models that are similar to the cross-sectional clustered models discussed in Section 2. However, these models can accommodate only EQC/exchangeable type correlations, and hence they have limited or no value for longitudinal data, where correlations arise through time series type dynamic models. As discussed in Section 2, these models also have limitations for the specification of marginal means for cross-sectional clustered binary data. Thus, we do not discuss these models any further.

3.1 LM(1): Time Specific (TS) Marginal Fixed Effects Model

Recall from Section 1 that the linear dynamic conditional probability (Pr[Y_it = 1|y_i,t− 1]) model (2) relates y_i,t− 1 to y_it, for t = 2,…,T, through an AR(1) type relationship. This model produces a MFE model for the binary means at time t as in Eq. 1.4, specifically it yields

$$ E[Y_{it}]=Pr[Y_{it}=1]=\tilde{p}_{it}=\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}) /[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})], $$

where β explains the effects of the fixed covariates x_it on y_it, specifically on E[Y_it]. Notice that the conditional model (2) also produces the lag (t − u) correlations between two responses y_iu and y_it, for u < t, say, as

$$ \begin{array}{@{}rcl@{}} \text{corr}(Y_{iu},Y_{it})&=& \rho^{t-u}[\frac{\sigma_{iuu}}{\sigma_{itt}}]^{\frac{1}{2}}, \end{array} $$

(26)

[Sutradhar (2011, Eqn. (7.73))], where σ_itt is the variance of y_it, for all t = 1,…,T, and is given in Eq. 1.4 as $\sigma _{itt}=\tilde {p}_{it}(1-\tilde {p}_{it}).$

As far as the estimation of β is concerned, if one is willing to ignore the correlation structure (26) (which is equivalent to using ρ = 0 in Eq. 26), then one may solve (1) the MM (method of moments) estimating equation
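For concreteness, the correlation structure (26) can be turned into a T × T covariance matrix for a single individual. In the Python sketch below, the covariance between y_iu and y_it (u < t) is obtained as the correlation in Eq. 26 multiplied by $(\sigma_{iuu}\sigma_{itt})^{1/2}$, which simplifies to $\rho^{t-u}\sigma_{iuu}$. The function name and its arguments are hypothetical.

```python
import numpy as np

def ar1_type_covariance(p_tilde, rho):
    """T x T covariance matrix implied by the lag correlations of Eq. (26).

    p_tilde : length-T array of marginal means for one individual.
    rho     : dynamic dependence (correlation index) parameter.
    """
    p_tilde = np.asarray(p_tilde, dtype=float)
    var = p_tilde * (1.0 - p_tilde)                  # sigma_itt
    t = np.arange(len(p_tilde))
    lag = np.abs(t[:, None] - t[None, :])            # |t - u|
    earlier = np.minimum(t[:, None], t[None, :])     # index of the earlier response
    # corr * sqrt(sigma_uu * sigma_tt) = rho^(t-u) * sigma_uu for u <= t
    return rho**lag * var[earlier]
```

For example, `ar1_type_covariance([0.3, 0.5, 0.6, 0.4], 0.4)` returns a symmetric matrix whose diagonal holds the binary variances $\tilde{p}_{it}(1-\tilde{p}_{it})$.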

$$ \begin{array}{@{}rcl@{}} {\sum}^{I}_{i=1}{\sum}^{T}_{t=1}\frac{\partial \tilde{p}_{it}}{\partial \boldsymbol{\beta}}(y_{it}-\tilde{p}_{it})=0, \end{array} $$

(27)

or (2) a QL (quasi-likelihood) estimating equation

$$ \begin{array}{@{}rcl@{}} {\sum}^{I}_{i=1}{\sum}^{T}_{t=1}\frac{\partial \tilde{p}_{it}}{\partial \boldsymbol{\beta}}\sigma^{-1}_{itt}(y_{it}-\tilde{p}_{it})=0, \end{array} $$

(28)

[Wedderburn (1974)] to obtain MM or QL estimate for β. Note that both MM (27) and QL (28) estimating equations are unbiased as $E[Y_{it}]=\tilde {p}_{it}(\boldsymbol {\beta })$ yielding $E[Y_{it}-\tilde {p}_{it}(\boldsymbol {\beta })]=0.$ Consequently, as $I \rightarrow \infty ,$ MM and QL estimators will be consistent under some mild regularity conditions. But these estimators will be inefficient as compared to other moments based estimators obtained by accommodating the underlying correlation structure (26).
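The two estimating functions in Eqs. 27 and 28 can be written compactly once the responses are stacked over i and t. The Python sketch below evaluates both for a logistic mean, for which $\partial \tilde{p}_{it}/\partial \boldsymbol{\beta}=\tilde{p}_{it}(1-\tilde{p}_{it})\boldsymbol{x}_{it}$, so that the QL function reduces to the familiar form $\sum \boldsymbol{x}_{it}(y_{it}-\tilde{p}_{it})$. The function name and the stacked array layout are assumptions made for illustration.

```python
import numpy as np

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def mm_and_ql_scores(beta, X, y):
    """Left-hand sides of the MM (27) and QL (28) estimating equations.

    X : (n, p) covariate rows x_it stacked over individuals and time points.
    y : (n,) binary responses in the same order.
    """
    p_tilde = expit(X @ beta)
    var = p_tilde * (1.0 - p_tilde)          # sigma_itt
    D = var[:, None] * X                     # rows: d p_tilde_it / d beta
    resid = y - p_tilde
    mm = D.T @ resid                         # Eq. (27): derivative-weighted residuals
    ql = D.T @ (resid / var)                 # Eq. (28): additionally weighted by 1/sigma_itt
    return mm, ql
```

Both functions have expectation zero at the true β whenever $E[Y_{it}]=\tilde{p}_{it}(\boldsymbol{\beta})$, which is the unbiasedness property used in the text.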

Let Σ_i(β,ρ) = (σ_iut(⋅)) denote the T × T covariance matrix constructed based on the correlation structure from Eq. 26. One may then obtain a highly efficient estimate of β by solving the GQL estimating equation

$$ \begin{array}{@{}rcl@{}} {\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{\Sigma}}^{-1}_{i}(\boldsymbol{\beta},\rho)(\boldsymbol{y}_{i}-\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))=0, \end{array} $$

(29)

[Sutradhar (2003, Section 3)] where

$$ \boldsymbol{y}_{i}=(y_{i1},\ldots,y_{it},\ldots,y_{iT})^{\prime}, \tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta})= (\tilde{p}_{i1},\ldots,\tilde{p}_{it},\ldots, \tilde{p}_{iT})^{\prime}. $$
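A minimal sketch of how the GQL estimating equation (29) might be solved numerically is given below. It assumes, for illustration only, that ρ is known, that the covariance matrix is built from Eq. 26 as $\text{cov}(Y_{iu},Y_{it})=\rho^{t-u}\sigma_{iuu}$ for u ≤ t, and that a Gauss-Newton type iteration is used; the function name `gql_fit` and the array layout are hypothetical and are not taken from the cited references.

```python
import numpy as np

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def gql_fit(X, y, rho, n_iter=25, tol=1e-8):
    """Solve the GQL estimating equation (29) for beta with rho treated as known.

    X : (I, T, p) array of covariates x_it; y : (I, T) array of binary responses.
    """
    I, T, p = X.shape
    beta = np.zeros(p)
    t = np.arange(T)
    lag = np.abs(t[:, None] - t[None, :])
    earlier = np.minimum(t[:, None], t[None, :])
    for _ in range(n_iter):
        p_tilde = expit(X @ beta)                        # (I, T) marginal means
        var = p_tilde * (1.0 - p_tilde)                  # sigma_itt
        D = var[..., None] * X                           # (I, T, p): d p_i / d beta'
        Sigma = rho**lag * var[:, earlier]               # (I, T, T) covariance per individual
        Sinv_D = np.linalg.solve(Sigma, D)               # Sigma_i^{-1} D_i
        Sinv_r = np.linalg.solve(Sigma, (y - p_tilde)[..., None])[..., 0]
        lhs = np.einsum('itp,itq->pq', D, Sinv_D)        # sum_i D_i' Sigma_i^{-1} D_i
        rhs = np.einsum('itp,it->p', D, Sinv_r)          # sum_i D_i' Sigma_i^{-1} (y_i - p_i)
        step = np.linalg.solve(lhs, rhs)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Setting `rho=0` makes every covariance matrix diagonal, so the same routine then solves the QL equation (28). In practice ρ would be replaced by an estimate and the two estimation steps iterated.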

Alternatively, one may obtain an optimal estimate of β by solving a likelihood estimating equation for $\boldsymbol {\theta }=(\boldsymbol {\beta }^{\prime },\rho )^{\prime }$ given by

$$ \begin{array}{@{}rcl@{}} &&\frac{\partial \log L(\boldsymbol{\beta},\rho)}{\partial \boldsymbol{\theta}}=0, \end{array} $$

(30)

where

$$ L(\boldsymbol{\beta},\rho)={\Pi}^{I}_{i=1}[f(y_{i1}){\Pi}^{T}_{t=2}f(y_{it}|y_{i,t-1})], $$

(31)

with

$$ f(y_{i1})={\tilde{p}_{i1}}^{y_{i1}}[1-\tilde{p}_{i1}]^{1-y_{i1}} $$

as the binary density at t = 1, and conditional density of the form

$$ f(y_{it}|y_{i,t-1})=[\lambda^{*}_{it}(\boldsymbol{\beta},\rho|y_{i,t-1})]^{y_{it}} [1-\lambda^{*}_{it}(\boldsymbol{\beta},\rho|y_{i,t-1})]^{1-y_{it}}, $$

(32)

for t = 2,…,T, with $\lambda ^{*}_{it}(\boldsymbol {\beta },\rho |y_{i,t-1})=P[y_{it}=1|y_{i,t-1}]$ as the conditional probability as given in Eq. 2.

Note that because likelihood estimation is more complex than GQL estimation, the GQL approach is practically useful, all the more so as it provides more efficient estimates than the MM and QL approaches. We further remark that under this MFE model (LM(1)), where correlations are specified by Eq. 26, the so-called GEE approach (Liang and Zeger, 1986) becomes redundant, because no ‘working’ correlation structure is needed when the true correlation structure is known.

3.2 LM(2): Time Specific (TS) Marginal Dynamic/Recursive (MD/MR) Model

Many existing studies such as Liang and Zeger (1986), Zeger et al. (1988, Section 3.1), Lipsitz et al. (1991), and Yi and Cook (2002), among others, have specified the marginal binary means as a function of regression parameters only, specifically as

$$ E[Y_{it}]=Pr[Y_{it}=1]=\tilde{p}_{it}=\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}) /[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})], $$

(33)

which is similar to Eq. 19 in a cross-sectional cluster setup, and estimated β using the ‘working’ correlations based GEE approach. Thus, it is clear that these and other follow-up works modeled neither the marginal means nor the correlation structure of the underlying longitudinal binary responses. Of these two specifications for repeated binary data, i.e., specifying the marginal means by Eq. 33 and specifying a ‘working’ correlation matrix, the former can seriously affect the validity of the regression estimates when the marginal means for correlated binary data cannot be specified as a function of regression parameters only. One such important situation is indicated by Eq. 5 in Section 1, where the marginal means for the longitudinal binary data involve both regression (β) and correlation (ρ) parameters. More specifically, the dynamic logit model (4) yields

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\!\!\!\!\mu_{i1}(\boldsymbol{\beta})\!\!\!&=&\!\!\!E[Y_{i1}|\boldsymbol{x}_{i1}]=\tilde{p}_{i1}(\boldsymbol{\beta}) =\exp(\boldsymbol{x}^{\prime}_{i1}\boldsymbol{\beta})/[1+\exp(\boldsymbol{x}^{\prime}_{i1}\boldsymbol{\beta})] \\ \!\!\!\!\!\!\!\!\!\!\!\!\mu_{i2}(\boldsymbol{\beta},\rho)\!\!\!&=&\!\!\!E[Y_{i2}|H_{i2}]=\frac{\exp(\boldsymbol{x}^{\prime}_{i2}\boldsymbol{\beta})} {[1+\exp(\boldsymbol{x}^{\prime}_{i2}\boldsymbol{\beta})]} \\ &&+\mu_{i1}(\boldsymbol{\beta})\left( \frac{\exp(\boldsymbol{x}^{\prime}_{i2}\boldsymbol{\beta}+\rho)} {[1+\exp(\boldsymbol{x}^{\prime}_{i2}\boldsymbol{\beta}+\rho)]} -\frac{\exp(\boldsymbol{x}^{\prime}_{i2}\boldsymbol{\beta})}{[1+\exp(\boldsymbol{x}^{\prime}_{i2}\boldsymbol{\beta})]}\right) \\ \!\!\!\!\!\!\!\!\!\!\!\!\mu_{i3}(\boldsymbol{\beta},\rho)\!\!\!&=&\!\!\!E[Y_{i3}|H_{i3}]=\frac{\exp(\boldsymbol{x}^{\prime}_{i3}\boldsymbol{\beta})} {[1+\exp(\boldsymbol{x}^{\prime}_{i3}\boldsymbol{\beta})]} \\ &&\!\!\!+\mu_{i2}(\boldsymbol{\beta},\rho)\left( \frac{\exp(\boldsymbol{x}^{\prime}_{i3}\boldsymbol{\beta}+\rho)} {[1+\exp(\boldsymbol{x}^{\prime}_{i3}\boldsymbol{\beta}+\rho)]} -\frac{\exp(\boldsymbol{x}^{\prime}_{i3}\boldsymbol{\beta})}{[1+\exp(\boldsymbol{x}^{\prime}_{i3}\boldsymbol{\beta})]}\right), \end{array} $$

(34)

and so on. Clearly, these means show a recursive relationship. These marginal means take the so-called fixed effects model form, i.e., $\mu _{it}(\cdot )=\tilde {p}_{it}(\boldsymbol {\beta })=\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta }) /[1+\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta })], \text { for all } t=1,\ldots ,T,$ only when ρ = 0. Otherwise, μ_i2(⋅) (the marginal mean at time point t = 2) is $\tilde {p}_{i2}(\boldsymbol {\beta })$ plus an increment or decrement due to ρ weighted by the previous mean μ_i1, and so on. It is then clear that one can no longer estimate the regression effects β by using the so-called GEE approach (Liang and Zeger, 1986). This is because the binary means (5) under this BDL model are not free of the correlation parameter, whereas the GEE approach is developed for the estimation of fixed effects based marginal means containing only the regression parameters β, the correlations being nuisance parameters. In summary, estimating β under the mean model (33) when in fact the mean model in Eq. 5 is true would produce inconsistent regression estimates, which is a serious inference issue.

For the regression analysis of the BDL (binary dynamic logit) model (4), which produces the marginal recursive (MR) means as in Eq. 5 involving both β and ρ, Sutradhar and Farrell (2007) (see also Amemiya (1985, p. 422) in a time series setup) have developed a GQL (generalized quasi-likelihood) estimation approach which exploits the true correlation structure of the data. For u < t, the formula for the lag (t − u) auto-correlation between y_iu and y_it is given by
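The recursion in Eq. 34 is simple to evaluate. The following Python sketch computes the marginal means μ_it for one individual from the linear predictors $\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}$ and the dynamic dependence parameter ρ; the function name and the argument `eta` (the vector of linear predictors) are hypothetical names used only here.

```python
import numpy as np

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def bdl_marginal_means(eta, rho):
    """Recursive marginal means mu_i1, ..., mu_iT of Eq. (34).

    eta : length-T array of linear predictors x_it' beta for one individual.
    rho : dynamic dependence parameter.
    """
    eta = np.asarray(eta, dtype=float)
    p_tilde = expit(eta)               # logistic mean ignoring the past response
    p_tilde2 = expit(eta + rho)        # conditional mean when the previous response is 1
    mu = np.empty_like(eta)
    mu[0] = p_tilde[0]
    for t in range(1, len(eta)):
        # mu_t = p_tilde_t + mu_{t-1} * (p_tilde2_t - p_tilde_t)
        mu[t] = p_tilde[t] + mu[t - 1] * (p_tilde2[t] - p_tilde[t])
    return mu
```

With `rho=0` the function returns the fixed effects means $\tilde{p}_{it}(\boldsymbol{\beta})$ exactly; for any other value the means from the second time point onward depart from them.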

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\text{corr}(Y_{iu},Y_{it})\!\!\!&=&\!\!\!\tilde{\rho}_{t-u}(\boldsymbol{\beta},\rho) \\ \!\!\!&=&\!\!\!\sqrt{\frac{\mu_{iu}(\cdot)(1-\mu_{iu}(\cdot))} {\mu_{it}(\cdot)(1-\mu_{it}(\cdot))}}{\Pi}^{t}_{v=u+1} ({\tilde{\tilde{p}}}_{iv}(\boldsymbol{\beta},\rho)-\tilde{p}_{iv}(\boldsymbol{\beta})), \end{array} $$

(35)

[Sutradhar and Farrell (2007)] where

$$ \begin{array}{@{}rcl@{}} \tilde{p}_{it}(\boldsymbol{\beta})&=&\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}) /[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})], \text{ for all } t=1,\ldots,T, \\ {\tilde{\tilde{p}}}_{it}(\boldsymbol{\beta},\rho)&=&\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+\rho) /[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+\rho)], \text{ for all } t=2,\ldots,T, \\ \mu_{i1}(\boldsymbol{\beta})&=&\tilde{p}_{i1}(\boldsymbol{\beta}), \end{array} $$

and μ_it(β,ρ) for t = 2,…,T, have the recursive/dynamic formula as in Eq. 5. Subsequently, one may construct the T × T true covariance matrix of the response vector $\boldsymbol {y}_{i}=(y_{i1},\ldots ,y_{it},\ldots ,y_{iT})^{\prime },$ of the i-th individual, as

$$ \begin{array}{@{}rcl@{}} \tilde{\boldsymbol{\Sigma}}_{i}(\boldsymbol{\beta},\rho)&=&\text{cov}[\boldsymbol{Y}_{i}]={\boldsymbol{A}}^{\frac{1}{2}}_{i} (\boldsymbol{\beta},\rho)\tilde{\boldsymbol{\rho}}_{M}(\boldsymbol{\beta},\rho) {\boldsymbol{A}}^{\frac{1}{2}}_{i}(\boldsymbol{\beta},\rho), \end{array} $$

(36)

where

$$ \begin{array}{@{}rcl@{}} \boldsymbol{A}_{i}(\boldsymbol{\beta},\rho)&=& \text{diag}[\mu_{i1}(\cdot)(1-\mu_{i1}(\cdot)),\ldots,\mu_{iT}(\cdot)(1-\mu_{iT}(\cdot))] \\ \tilde{\boldsymbol{\rho}}_{M}(\boldsymbol{\beta},\rho)&=&\begin{pmatrix}1 & \tilde{\rho}_{1} & \tilde{\rho}_{2} & {\ldots} &\tilde{\rho}_{\ell} & {\ldots} & \tilde{\rho}_{T-2} & \tilde{\rho}_{T-1} \\ \cdot & 1 & \tilde{\rho}_{1} & {\ldots} & \tilde{\rho}_{\ell -1}& {\ldots} & \tilde{\rho}_{T-3} & \tilde{\rho}_{T-2} \\ {\vdots} & {\vdots} & {\vdots} & {\ldots} & {\vdots} & {\ldots} & {\vdots} & {\vdots} \\ \cdot &\cdot & \cdot & {\ldots} & \cdot & {\ldots} & 1 & \tilde{\rho}_{1} \\ \cdot &\cdot & \cdot & {\ldots} & \cdot & {\ldots} & \cdot & 1 \end{pmatrix}. \end{array} $$

(37)
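To make the construction in Eqs. 35 to 37 concrete, the Python sketch below builds the mean vector, the correlation matrix and the covariance matrix for one individual. It fills each (u, t) entry of the correlation matrix directly from Eq. 35, so the entries may depend on u and t individually through the time dependent covariates. The function name and the argument `eta` (the vector of $\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}$ values) are hypothetical.

```python
import numpy as np

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def bdl_mean_and_covariance(eta, rho):
    """Mean vector, correlation matrix and covariance matrix under the BDL model.

    eta : length-T array of linear predictors x_it' beta for one individual.
    rho : dynamic dependence parameter.
    """
    eta = np.asarray(eta, dtype=float)
    T = len(eta)
    p_tilde = expit(eta)
    diff = expit(eta + rho) - p_tilde          # factors appearing in the product of Eq. (35)
    mu = np.empty(T)
    mu[0] = p_tilde[0]
    for t in range(1, T):
        mu[t] = p_tilde[t] + mu[t - 1] * diff[t]     # recursive marginal means
    var = mu * (1.0 - mu)                      # diagonal elements of A_i
    corr = np.eye(T)
    for u in range(T):
        prod = 1.0
        for t in range(u + 1, T):
            prod *= diff[t]                    # running product over v = u+1, ..., t
            corr[u, t] = corr[t, u] = np.sqrt(var[u] / var[t]) * prod
    A_half = np.diag(np.sqrt(var))
    Sigma = A_half @ corr @ A_half             # Eq. (36)
    return mu, corr, Sigma
```

The returned `mu` and `Sigma` are the two ingredients needed by a GQL estimating equation for β under this model.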

One may then exploit the mean vector μ_i(β,ρ) = E[Y_i] = (μ_i1(β),μ_i2(β,ρ), $\ldots ,\mu _{iT}(\boldsymbol {\beta },\rho ))^{\prime }$ and the above covariance matrix $\tilde {\boldsymbol {\Sigma }}_{i}(\boldsymbol {\beta },\rho )$ from Eq. 36, for the GQL estimation of the main regression parameter β. The dynamic dependence or correlation index parameter ρ can be estimated by using the method of moments (MM). See Section 5, for specific GQL and MM estimating equations for these parameters. In the same section, it is shown that as $I \rightarrow \infty ,$ the GQL estimator of β and the MM estimator of ρ are consistent under some mild regularity conditions. The asymptotic normality of the GQL estimator of the main parameter β is also given for convenience of the construction of confidence intervals, when needed.

3.3 LM(3): A TS (Time Specific) Arbitrary Marginal Fixed Effects (AMFE) Model for Longitudinal Binary Data

This model is similar to the AMFE model (19) under the cross-sectional cluster setup. More specifically, the AMFE model under the longitudinal setup is written as in Eq. 33, i.e.,

$$ E[Y_{it}]=Pr[Y_{it}=1]=\tilde{p}_{it}(\boldsymbol{\beta}) =\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})/[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta})], $$

without writing any correlation structures for its derivation, as the underlying structure is assumed to be unknown. Some possible correlation models that might yield the above marginal mean model (as in Eq. 33) are (a) the AR(1) type model given in Eq. 2, (b) the MA(1) (moving average order 1) model, and (c) the EQC (equi-correlation)/exchangeable model. We refer to Sutradhar (2011, Sections 7.4.1 to 7.4.3) for a detailed discussion of these three basic longitudinal models, all yielding the same marginal fixed effects based mean model.

Notice that to obtain a consistent estimate of β involved in the above marginal mean $\tilde {p}_{it}(\boldsymbol {\beta }),$ one may solve the MM or QL estimating equations shown in Eqs. 27 and 28, as they are unbiased estimating equations. These MM and QL estimating equations are free of correlations, and hence the regression estimates obtained from them are bound to be less efficient than those from moment based equations that accommodate the true correlations. However, for the cases where true correlation models are unknown, Liang and Zeger (1986) proposed a ‘working’ correlation matrix based GEE (generalized estimating equations) approach for efficient estimation of β. More specifically, for efficient β estimation, they define a ‘working’ correlation matrix as R_i(α), α being a set of working correlation index parameters, and solve the GEE given by

$$ \begin{array}{@{}rcl@{}} &&{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha)(\boldsymbol{y}_{i}-\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))=0, \end{array} $$

(38)

where $\tilde {\boldsymbol {p}}_{i}(\boldsymbol {\beta })=(\tilde {p}_{i1}(\boldsymbol {\beta }),\ldots , \tilde {p}_{it}(\boldsymbol {\beta }),\ldots ,\tilde {p}_{iT}(\boldsymbol {\beta }))^{\prime },$ and $\boldsymbol {V}_{i}(\boldsymbol {\beta },\alpha )={\tilde {\boldsymbol {A}}}^{\frac {1}{2}}_{i}(\boldsymbol {\beta }) \boldsymbol {R}_{i}(\boldsymbol {\beta },\alpha ) {\tilde {\boldsymbol {A}}}^{\frac {1}{2}}_{i}(\boldsymbol {\beta }),$ with $\tilde {\boldsymbol {A}}_{i}(\boldsymbol {\beta })=\text {diag}[\tilde {p}_{i1}(1-\tilde {p}_{i1}),\ldots ,\tilde {p}_{it}(1-\tilde {p}_{it}),\ldots , \tilde {p}_{iT}(1-\tilde {p}_{iT})].$ Because this GEE approach was ambitiously aimed at dealing with any type of correlated binary data, it was used by hundreds of researchers over two decades or so, until it was discovered that this approach may in fact yield less efficient estimates than an independence assumption based estimating equation approach, such as the QL approach in Eq. 28 (which may also be referred to as the independence based GEE, GEE(I)); see Sutradhar and Das (1999) and Sutradhar (2011, Section 7.3.6), and also Sutradhar and Zheng (2018) under a semi-parametric setup.

Further note that, as pointed out in the last section, one cannot use the marginal fixed effects (MFE) based GEE approach for certain longitudinal binary data where correlation parameters enter the formulas for the binary marginal means. More specifically, if GEE is used in such cases, it will produce inconsistent regression estimates. For example, suppose that the longitudinal responses follow the BDL (binary dynamic logit) model (4) yielding the marginal mean models as in Eq. 5. Under this BDL model, the response vector $\boldsymbol {y}_{i}=(y_{i1},\ldots ,y_{it},\ldots ,y_{iT})^{\prime }$ has the mean $\boldsymbol {\mu }_{i}(\boldsymbol {\beta },\rho )=E[\boldsymbol {Y}_{i}]=(\mu _{i1} (\boldsymbol {\beta }),\mu _{i2}(\boldsymbol {\beta },\rho ),\ldots ,\mu _{iT}(\boldsymbol {\beta },\rho ))^{\prime }$ (see Eq. 5) and the covariance matrix Σ_i(β,ρ) as in Eq. 36, i.e.,

$$ \boldsymbol{Y}_{i} \sim(\boldsymbol{\mu}_{i}(\boldsymbol{\beta},\rho),\boldsymbol{\Sigma}_{i}(\boldsymbol{\beta},\rho)), $$

(39)

where the marginal means are functions of both β and ρ, whereas the GEE approach will specify the mean vector as $\tilde {\boldsymbol {p}}_{i}(\boldsymbol {\beta })=(\tilde {p}_{i1}(\boldsymbol {\beta }),\ldots , \tilde {p}_{it}(\boldsymbol {\beta }),\ldots ,\tilde {p}_{iT}(\boldsymbol {\beta }))^{\prime },$ with $\tilde {p}_{it}(\boldsymbol {\beta })=\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta })/[1+\exp (\boldsymbol {x}^{\prime }_{it}\boldsymbol {\beta })].$
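The consequence of this mean misspecification can be previewed numerically in the simplest case of an independence ‘working’ correlation, for which the GEE estimating function with a logistic mean reduces to $\sum_{i}\boldsymbol{X}^{\prime}_{i}(\boldsymbol{y}_{i}-\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))$. The Python sketch below evaluates the expectation of this function at the true β when the responses have the recursive BDL means; the independence working structure, the function names, and the array layout are assumptions of this sketch.

```python
import numpy as np

def expit(x):
    """Logistic function exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def bdl_means(eta, rho):
    """Recursive BDL marginal means for all individuals; eta has shape (I, T)."""
    p_tilde = expit(eta)
    p_tilde2 = expit(eta + rho)
    mu = np.empty_like(eta)
    mu[:, 0] = p_tilde[:, 0]
    for t in range(1, eta.shape[1]):
        mu[:, t] = p_tilde[:, t] + mu[:, t - 1] * (p_tilde2[:, t] - p_tilde[:, t])
    return mu

def expected_independence_score(X, beta, rho):
    """Average expected value of sum_i X_i' (Y_i - p_tilde_i(beta)) under the BDL model.

    X : (I, T, p) covariates; beta : true regression parameter; rho : true dependence.
    """
    eta = X @ beta                               # (I, T) linear predictors
    mu = bdl_means(eta, rho)                     # true means E[Y_it]
    p_tilde = expit(eta)                         # means assumed by the fixed effects model
    return np.einsum('itp,it->p', X, mu - p_tilde) / X.shape[0]
```

The function returns a zero vector when `rho=0`; for a nonzero ρ the expected estimating function at the true β is no longer zero, which is the source of the inconsistency examined next.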

Now to examine the convergence of $\hat {\boldsymbol {\beta }}_{GEE}$ obtained from Eq. 38 when it is known that y_i has the true mean vector and covariance matrix as in Eq. 39, we first write the iterative equation to obtain $\hat {\boldsymbol {\beta }}_{GEE}$, as follows:

$$ \begin{array}{@{}rcl@{}} &&\hat{\boldsymbol{\beta}}_{GEE}(r+1)=\hat{\boldsymbol{\beta}}_{GEE}(r) \\ &+&\left[\left\{{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\hat{\alpha})\frac{\partial \tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta})}{\partial {\boldsymbol{\beta}}^{\prime}}\right\}^{-1}\right.\\&&\left. {\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\hat{\alpha})(\boldsymbol{y}_{i} -\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))\right]_{\boldsymbol{\beta}=\hat{\boldsymbol{\beta}}_{GEE}(r)}. \end{array} $$

(40)

Notice that because the mean vector and covariance matrix of y_i are functions of both β and ρ, and because $\hat {\alpha }$ is usually a moment estimator, it follows that $\hat {\alpha }$ will converge to a quantity, say α₀, which must be a function of ρ. That is,

$$ \hat{\alpha} \rightarrow \alpha_{0}(\rho) $$

(41)

[Crowder (1995), Sutradhar and Das (1999)]. Thus, one may approximate the limiting (as $I \rightarrow \infty $) difference between $\hat {\boldsymbol {\beta }}_{GEE}$ and the true parameter β as

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!\lim_{I \rightarrow \infty}[\hat{\boldsymbol{\beta}}_{GEE}-\boldsymbol{\beta}] \\ &\approx &\!\!\!\!\lim_{I \rightarrow \infty}\left[\left\{{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha_{0}(\rho))\frac{\partial \tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta})}{\partial {\boldsymbol{\beta}}^{\prime}}\right\}^{-1} \right.\\&&\left.{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha_{0}(\rho))(\boldsymbol{y}_{i} -\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))\right] \\ &\rightarrow &\!\!\!\! E_{\boldsymbol{y}}\left[\left\{{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha_{0}(\rho))\frac{\partial \tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta})}{\partial {\boldsymbol{\beta}}^{\prime}}\right\}^{-1} \right.\\&&\left.{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha_{0}(\rho))(\boldsymbol{y}_{i} -\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))\right] \\ &=&\!\!\!\!\left[\left\{{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha_{0}(\rho))\frac{\partial \tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta})}{\partial {\boldsymbol{\beta}}^{\prime}}\right\}^{-1} \right.\\&&\left.{\sum}^{I}_{i=1}\frac{\partial \tilde{\boldsymbol{p}}^{\prime}_{i}(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}{\boldsymbol{V}}^{-1}_{i}(\boldsymbol{\beta},\alpha_{0}(\rho))(\boldsymbol{\mu}_{i}(\boldsymbol{\beta},\rho) -\tilde{\boldsymbol{p}}_{i}(\boldsymbol{\beta}))\right] \ne 0, \end{array} $$

(42)

because, under the BDL model (4), E[Y_i] = μ_i(β,ρ) as in Eq. 34 (see also Eq. 39), which is quite different from $\tilde {\boldsymbol {p}}_{i}(\boldsymbol {\beta }).$ Thus $\hat {\boldsymbol {\beta }}_{GEE}$ obtained from Eq. 38 is asymptotically biased and cannot converge to β unless ρ = 0, which is unlikely to happen in the longitudinal setup.
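The gap between the two mean specifications can be illustrated numerically. The following is a minimal sketch, assuming that model (4) has the lag-1 dynamic logit form P(y_it = 1 | y_i,t-1) = exp(x'_it β + ρ y_i,t-1)/[1 + exp(x'_it β + ρ y_i,t-1)] with μ_i1 = p̃_i1; this form is an assumption here, because the model is stated earlier in the paper and is not reproduced in this section. The function name `bdl_marginal_means` and the linear predictor values are hypothetical.

```python
import numpy as np

def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def bdl_marginal_means(eta, rho):
    """Marginal means mu_it under the ASSUMED lag-1 dynamic logit model.

    eta[t] holds the linear predictor x_it' beta for t = 1, ..., T.
    """
    eta = np.asarray(eta, dtype=float)
    mu = np.empty_like(eta)
    mu[0] = expit(eta[0])                    # first mean depends on beta only
    for t in range(1, len(eta)):
        p1 = expit(eta[t] + rho)             # P(y_it = 1 | y_i,t-1 = 1)
        p0 = expit(eta[t])                   # P(y_it = 1 | y_i,t-1 = 0)
        mu[t] = p0 + mu[t - 1] * (p1 - p0)   # law of total expectation
    return mu

# Hypothetical linear predictors for one individual with T = 4 occasions.
eta = np.array([-0.5, 0.0, 0.5, 1.0])
p_tilde = expit(eta)                          # mean vector specified by GEE
mu_dep = bdl_marginal_means(eta, rho=1.0)     # true means when rho = 1
mu_ind = bdl_marginal_means(eta, rho=0.0)     # true means when rho = 0
gap = mu_dep - p_tilde                        # the term mu_i - p_tilde_i in Eq. 42
print(np.round(gap, 4))
```

Under this assumed form, the gap vanishes for every t only when ρ = 0, which is the situation in which the last expression in Eq. 42 can be zero.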

4 Further Estimation and Asymptotic Properties in Cross-sectional Cluster Setup

4.1 GQL and MM Estimation

Recall from Section 2 that, except for the MME based general cluster model A (CM-A), the remaining MFE based cluster models were developed either under restrictive assumptions about the distribution of the random effects, such as the “bridge” distribution leading to the fixed effects model CM-B-1 and the beta-binary distribution leading to the fixed effects model CM-B-2, or using ‘working’ specifications for both means and correlations, leading to the AMFE model (CM-C). As discussed in detail in the same section, these latter three MFE based models have limited practical use; in particular, the AMFE model (CM-C) cannot be trusted at all, as it does not justify how a fixed effects based marginal mean model can be derived from the conditional random effects model (1). For these reasons, we concentrate in this section only on the estimation of the parameters of the MME based CM-A model.

More specifically, we turn back to the CM-A model described in Section 2.1. The model parameters β and $\sigma ^{2}_{\gamma }$ are involved in the marginal mean function $\mu _{ij}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ in Eq. 7, which has its BA (binomial approximation) based computational version given by Eq. 11. For the estimation of these parameters, as discussed in Section 2.1.1, there exist several likelihood based approaches (exact likelihood, PQL (penalized quasi-likelihood), HL (hierarchical likelihood)), but they are either computationally involved or they produce biased and hence mean squared error inconsistent estimates, especially for large values of $\sigma ^{2}_{\gamma }.$ In this section, following the GQL approach of Sutradhar (2004) developed under the GLMM (generalized linear mixed model) setup, we simplify the binomial approximation (to standard normal random effects) based GQL estimating equation for β, and the MM estimating equation for $\sigma ^{2}_{\gamma }.$ Furthermore, because in practice one (especially the statistical agencies) deals with a large number of clusters, each containing a large number of individuals, so that ${\sum }^{I}_{i=1}n_{i} \rightarrow \infty ,$ in Section 4.2 we verify, for the benefit of these practitioners, that the GQL estimator of β and the MM estimator of $\sigma ^{2}_{\gamma }$ are consistent. In Section 4.2.2, we show that the GQL estimator of the main regression parameters β has an asymptotically normal distribution, providing an opportunity for confidence interval construction when needed.

4.1.1 GQL Estimation of β

Once the mean function is specified, one requires the true covariance/correlation structure to construct the desired GQL estimating equation (Sutradhar (2003, Section 3.1)) for the parameter of interest. Under the conditional cluster model (1) with normal random cluster effects (γ_i), the mean function of the i-th cluster response vector $\boldsymbol {y}_{i}=(y_{i1},\ldots ,y_{ij},\ldots ,y_{in_{i}})^{\prime }$ is computed as in Eq. 7. More specifically, by using the BA (binomial approximation) to the standard normal cluster effect $\gamma ^{*}_{i}$ as in Eq. 10, we write the BA based mean function as

$$ \begin{array}{@{}rcl@{}} &&E[\boldsymbol{Y}_{i}]={\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&(\mu^{BA}_{i1}(\boldsymbol{\beta},\sigma^{2}_{\gamma}),\ldots, \mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma}),\ldots,\mu^{BA}_{in_{i}}(\boldsymbol{\beta},\sigma^{2}_{\gamma}))^{\prime}, \end{array} $$

(43)

where by Eq. 11,

$$ \mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) ={\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, $$

for all j = 1,…,n_i, V being a large number such as V = 10 or 15. It immediately follows that

$$ \text{var}(Y_{ij})=\sigma^{BA}_{i,jj}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) =\mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) (1-\mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma})). $$

(44)
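The BA based mean and variance formulas above reduce to a finite weighted sum, which can be sketched as follows. This is a minimal illustration, assuming that h(v_i) in Eq. 10 is the standardized binomial(V, 1/2) count, h(v) = (v − V/2)/(V/4)^{1/2}; Eq. 10 is not reproduced in this section, so this form is an assumption. The helper names `ba_nodes` and `mu_ba` and the design matrix are hypothetical.

```python
import numpy as np
from math import comb

def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def ba_nodes(V=10):
    """Support points h(v) and binomial(V, 1/2) weights for v = 0, ..., V.

    ASSUMPTION: h(v) is the standardized binomial count (Eq. 10 is not shown here).
    """
    h = (np.arange(V + 1) - V / 2.0) / np.sqrt(V / 4.0)
    w = np.array([comb(V, k) for k in range(V + 1)], dtype=float) * 0.5 ** V
    return h, w

def mu_ba(X, beta, sigma, V=10):
    """BA based marginal means for one cluster with n_i x p design matrix X."""
    h, w = ba_nodes(V)
    eta = X @ beta                                    # x_ij' beta, length n_i
    P = expit(eta[:, None] + sigma * h[None, :])      # p*_ij at each support point
    return P @ w                                      # weighted sum over v

# Hypothetical cluster of three individuals with an intercept and one covariate.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, -1.0]])
beta = np.array([0.0, 0.8])
mu = mu_ba(X, beta, sigma=1.0)
var = mu * (1.0 - mu)                                 # Eq. 44
print(np.round(mu, 4), np.round(var, 4))
```

With σ_γ = 0 the sum collapses to the ordinary logistic mean, and for σ_γ > 0 the means are pulled toward 1/2, which is why the marginal mean cannot be written as a function of β alone.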

We now turn to the computation of the n_i × n_i covariance matrix of y_i. For two responses y_ij and y_ik, j≠k;j,k = 1,…,n_i, by Eq. 1, we first write

$$ \begin{array}{@{}rcl@{}} &&\lambda_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})= E[Y_{ij}Y_{ik}]=E_{\gamma_{i}}E[Y_{ij}Y_{ik}|\gamma_{i}] \\ &=&E_{\gamma_{i}}[E(Y_{ij}|\gamma_{i})E(Y_{ik}|\gamma_{i})] =\int p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i}) p^{*}_{ik}(\boldsymbol{\beta},\gamma_{i})g_{N}(\gamma_{i})d\gamma_{i}, \end{array} $$

(45)

where $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})=\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i}) /[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})],$ and $g_{N}(\gamma _{i}) \equiv [\gamma _{i} \sim N(0,\sigma ^{2}_{\gamma })].$ Notice that the normal integration (45) of a complex function in γ_i, can be computed as in Eq. 11 using the BA. More specifically, for $\gamma ^{*}_{i}=\gamma _{i}/\sigma _{\gamma } \equiv h(v_{i})$ as in Eq. 10, the integration in Eq. 45 is approximated as

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!=&\!\!\!\!\!\!\!\!\!\!\!\!{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(46)

yielding the covariance between y_ij and y_ik as

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\text{cov}[Y_{ij},Y_{ik}]\!\!\!\!&=&\!\!\!\!E[Y_{ij}Y_{ik}]-E[Y_{ij}]E[Y_{ik}] \\ \!\!\!\!&=&\!\!\!\!\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})-\mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \mu^{BA}_{ik}(\boldsymbol{\beta},\sigma^{2}_{\gamma})=\sigma^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), \end{array} $$

(47)

where the formula for $\mu ^{BA}_{ij}(\boldsymbol {\beta },\sigma ^{2}_{\gamma }),$ for example, is given by Eq. 43 (see also Eq. 11). Subsequently, combining Eqs. 44 and 47 we obtain the n_i × n_i covariance matrix of y_i, as

$$ \begin{array}{@{}rcl@{}} \text{cov}[\boldsymbol{Y}_{i}]&=&{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) =(\sigma^{BA}_{i,jk}): n_{i} \times n_{i}. \end{array} $$

(48)
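Equations 44 to 48 can be assembled into a single routine. The sketch below is illustrative only and again assumes the standardized binomial form for h(v_i) (Eq. 10 is not shown in this section); the function name `ba_mean_cov` and the numerical inputs are hypothetical.

```python
import numpy as np
from math import comb

def expit(z):
    """Logistic function exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + np.exp(-z))

def ba_nodes(V=10):
    """ASSUMED form of Eq. 10: standardized binomial(V, 1/2) support and weights."""
    h = (np.arange(V + 1) - V / 2.0) / np.sqrt(V / 4.0)
    w = np.array([comb(V, k) for k in range(V + 1)], dtype=float) * 0.5 ** V
    return h, w

def ba_mean_cov(X, beta, sigma, V=10):
    """Return mu^BA_i (Eq. 43), lambda^BA_i,jk (Eq. 46) and Sigma^BA_i (Eq. 48)."""
    h, w = ba_nodes(V)
    P = expit((X @ beta)[:, None] + sigma * h[None, :])   # n_i x (V + 1)
    mu = P @ w
    lam = (P * w) @ P.T                # lam[j, k] = sum_v w_v p*_ij(v) p*_ik(v)
    Sigma = lam - np.outer(mu, mu)     # off-diagonal covariances, Eq. 47
    # The diagonal of lam is not E[Y_ij^2]; binary variances come from Eq. 44.
    np.fill_diagonal(Sigma, mu * (1.0 - mu))
    return mu, lam, Sigma

# Hypothetical cluster of three individuals.
X = np.array([[1.0, 0.2], [1.0, -0.4], [1.0, 1.0]])
beta = np.array([0.3, -0.5])
mu, lam, Sigma = ba_mean_cov(X, beta, sigma=1.0)
print(np.round(Sigma, 4))
```

Because all members of a cluster share the same support point h(v_i), the off-diagonal entries vanish only when σ_γ = 0.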

Because $E[\boldsymbol {Y}_{i}]={\boldsymbol {\mu }}^{BA}_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ and $\text {cov}[\boldsymbol {Y}_{i}]={\boldsymbol {\Sigma }}^{BA}_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ can be computed by Eqs. 43 and 48, respectively, following Sutradhar (2003, Section 3.1), for given $\sigma ^{2}_{\gamma },$ one may then construct the desired GQL estimating equation for β as

$$ \begin{array}{@{}rcl@{}} &&{\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}))=0. \end{array} $$

(49)

Let the solution of this GQL estimating Eq. 49 be denoted by $\hat {\boldsymbol {\beta }}_{GQL}.$ For practical benefit, the asymptotic properties of this estimator must be studied. The consistency of this estimator is examined in Section 4.2.1, along with its asymptotic normality property in Section 4.2.2. To solve the estimating Eq. 49, it remains to compute the matrix derivative involved in the equation, which we derive as follows.

Computation of the Derivative $\frac {\partial [{\boldsymbol {\mu }}^{BA}_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })]^{\prime }}{\partial \boldsymbol {\beta }}$

For this matrix computation it is sufficient to compute the derivative vector, $\frac {\partial \mu ^{BA}_{ij}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })}{\partial \boldsymbol {\beta }},$ which, following Eq. 43, can be derived as

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\!\!\!\!\!\!&&\frac{\partial \mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \boldsymbol{\beta}} = {\sum}^{V}_{v_{i}=0}\frac{\partial p^{*}_{ij}(\boldsymbol{x}_{ij};\!\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))}{\partial \boldsymbol{\beta}}\!\begin{pmatrix}V \\ v_{i}\end{pmatrix}\!(1/2)^{v_{i}}(1/2)^{V-v_{i}} \end{array} $$

(50)

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\!\!\!\!\!\!&=&{\sum}^{V}_{v_{i}=0}\boldsymbol{x}_{ij} p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))q^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}\\\!\!\!\!\!\!\!\!\!\!\!\!\!\!&&(1/2)^{v_{i}}(1/2)^{V-v_{i}}: p \times 1, \end{array} $$

where $q^{*}_{ij}(\boldsymbol {x}_{ij};\boldsymbol {\beta },\sigma _{\gamma } h(v_{i}))=1-p^{*}_{ij}(\boldsymbol {x}_{ij};\boldsymbol {\beta },\sigma _{\gamma } h(v_{i}))=[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\sigma _{\gamma } h(v_{i}))]^{-1}.$
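With the derivative in Eq. 50 available, the GQL estimating Eq. 49 can be solved iteratively for a given value of σ²_γ. The following is a minimal sketch using a Gauss-Newton type update of the same form as Eq. 40 (the paper does not prescribe a particular solver here, so the update rule is our choice). It assumes the standardized binomial form of h(v_i), and the names `ba_moments` and `gql_beta` are hypothetical. The example uses pseudo-responses set equal to the BA means, not real binary data, so that the true β solves Eq. 49 exactly.

```python
import numpy as np
from math import comb

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def ba_nodes(V=10):
    """ASSUMED form of Eq. 10: standardized binomial(V, 1/2) support and weights."""
    h = (np.arange(V + 1) - V / 2.0) / np.sqrt(V / 4.0)
    w = np.array([comb(V, k) for k in range(V + 1)], dtype=float) * 0.5 ** V
    return h, w

def ba_moments(X, beta, sigma, V=10):
    """Return mu^BA_i (Eq. 43), Sigma^BA_i (Eq. 48) and D_i = d mu_i / d beta' (Eq. 50)."""
    h, w = ba_nodes(V)
    P = expit((X @ beta)[:, None] + sigma * h[None, :])
    mu = P @ w
    Sigma = (P * w) @ P.T - np.outer(mu, mu)
    np.fill_diagonal(Sigma, mu * (1.0 - mu))
    D = ((P * (1.0 - P)) @ w)[:, None] * X     # row j is x_ij' times sum_v w p* q*
    return mu, Sigma, D

def gql_beta(clusters, beta0, sigma, V=10, tol=1e-10, max_iter=50):
    """Solve Eq. 49 for beta with sigma held fixed; clusters is a list of (X_i, y_i)."""
    beta = np.array(beta0, dtype=float)
    for _ in range(max_iter):
        A = np.zeros((beta.size, beta.size))
        u = np.zeros(beta.size)
        for X, y in clusters:
            mu, Sigma, D = ba_moments(X, beta, sigma, V)
            SinvD = np.linalg.solve(Sigma, D)   # Sigma^{-1} D without explicit inverse
            A += D.T @ SinvD                    # sum of D' Sigma^{-1} D
            u += SinvD.T @ (y - mu)             # left hand side of Eq. 49
        step = np.linalg.solve(A, u)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Hypothetical designs: 6 clusters of size 4, intercept plus one covariate.
rng = np.random.default_rng(0)
beta_true, sigma = np.array([0.3, -0.5]), 1.0
designs = [np.column_stack([np.ones(4), rng.uniform(-1, 1, 4)]) for _ in range(6)]
# Pseudo-responses equal to the BA means, so beta_true is an exact root of Eq. 49.
clusters = [(X, ba_moments(X, beta_true, sigma)[0]) for X in designs]
beta_hat = gql_beta(clusters, beta0=np.zeros(2), sigma=sigma)
print(np.round(beta_hat, 6))
```

In practice the pairs (X_i, y_i) would hold observed binary responses, and σ_γ would be replaced by its current estimate.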

4.1.2 MM Estimation of $\sigma ^{2}_{\gamma }$

Notice that the GQL estimating Eq. 49 for β was developed for known $\sigma ^{2}_{\gamma },$ which is, however, unknown in practice. Similar to Sutradhar (2004) (see also Jiang, 1998), in this section we estimate this parameter by exploiting second order binary responses, whereas β was estimated using the first order responses. However, because $\sigma ^{2}_{\gamma }$ is a parameter of secondary interest, as opposed to the GQL approach for $\sigma ^{2}_{\gamma }$ estimation by Sutradhar (2004), for simplicity, we use the well known method of moments (MM). It is shown in Section 4.2.3 that this simpler MM estimation produces a consistent $\sigma ^{2}_{\gamma }$ estimate, similar to the consistency property of the GQL regression estimator $\hat {\boldsymbol {\beta }}_{GQL}.$ This MM estimator of $\sigma ^{2}_{\gamma }$ is expected to be less efficient than its GQL estimator, but this loss of efficiency is not a concern because $\sigma ^{2}_{\gamma }$ is a parameter of secondary interest.

Let the second order response vectors under the present clustered binary setup, be denoted by

$$ \begin{array}{@{}rcl@{}} \boldsymbol{g}_{i}&=&(y^{2}_{i1},\ldots,y^{2}_{ij},\ldots,y^{2}_{in_{i}})^{\prime} \equiv(y_{i1},\ldots,y_{ij},\ldots,y_{in_{i}})^{\prime}=\boldsymbol{y}_{i}:n_{i} \times 1 \\ \boldsymbol{q}_{i} &=& (y_{i1}y_{i2},\ldots,y_{ij}y_{ik},\ldots, y_{i(n_{i}-1)}y_{in_{i}})^{\prime}: j<k: \frac{n_{i}(n_{i}-1)}{2} \times 1 \\ &\equiv& (q_{i,12},\ldots,q_{i,jk},\ldots,q_{i,(n_{i}-1)n_{i}})^{\prime}, \end{array} $$

(51)

containing all possible squared and pair-wise responses. Because, g_i = y_i, clearly $E[\boldsymbol {G}_{i}]={\boldsymbol {\mu }}^{BA}_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ as in Eq. 43. Next, by Eqs. 45 and 46, we write

$$ \begin{array}{@{}rcl@{}} E[\boldsymbol{Q}_{i}]&=&(\lambda^{BA}_{i,12}(\boldsymbol{\beta},\sigma^{2}_{\gamma}),\ldots, \lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), \ldots,\lambda^{BA}_{i,(n_{i}-1)n_{i}}(\boldsymbol{\beta},\sigma^{2}_{\gamma}))^{\prime} \\ &=&\boldsymbol{\lambda}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), \text{(say)}, \end{array} $$

(52)

where the BA based formula for $E[Y_{ij}Y_{ik}]=\lambda _{i,jk}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ is given in Eq. 46. Because both $\boldsymbol {\mu }^{BA}_{i}(\cdot )$ and $\boldsymbol {\lambda }^{BA}_{i}(\cdot )$ contain $\sigma ^{2}_{\gamma }$ on top of β, we may construct a MM estimating equation for $\sigma ^{2}_{\gamma },$ as

$$ \begin{array}{@{}rcl@{}} && {\sum}^{I}_{i=1}\left[\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})]^{\prime} }{\partial \sigma^{2}_{\gamma}} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta}, \sigma^{2}_{\gamma}) ) \right. \\ &+& \left. \frac{\partial [{\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{\prime}}{\partial \sigma^{2}_{\gamma}} (\boldsymbol{q}_{i}-{\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}))\right]=0, \end{array} $$

(53)

where

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\frac{\partial {\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}} = (\frac {\partial \mu^{BA}_{i1}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}},\ldots, \frac{\partial \mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}},\ldots,\frac{\partial \mu^{BA}_{in_{i}}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}})^{\prime}, \end{array} $$

(54)

with

$$ \begin{array}{@{}rcl@{}} &&\frac{\partial \mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}} \end{array} $$

(55)

$$ \begin{array}{@{}rcl@{}} &\!\!\!\!\!=&\!\!\!\!{\sum}^{V}_{v_{i}=0}\frac{1}{2\sigma_{\gamma}}h(v_{i}) p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))q^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

and

$$ \begin{array}{@{}rcl@{}} \frac{\partial {\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}}&=& (\frac{\partial \lambda^{BA}_{i,12}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}},\ldots,\frac{\partial \lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}}, \ldots,\\&&\frac{\partial \lambda^{BA}_{i,(n_{i}-1)n_{i}}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}})^{\prime}, \end{array} $$

(56)

where, it follows from Eq. 46 that

$$ \begin{array}{@{}rcl@{}} &&\!\!\!\!\!\!\!\!\!\!\!\!\frac{\partial \lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})}{\partial \sigma^{2}_{\gamma}}={\sum}^{V}_{v_{i}=0}\frac{1}{2\sigma_{\gamma}}h(v_{i}) p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ &\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\!\times &\!\!\!\!\!\!\!\!\!\!\!\! [q^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) +q^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))]\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(57)

for all j < k;j,k = 1,…,n_i.

Let $\hat {\sigma }^{2}_{\gamma , MM}$ denote the solution of the MM estimating Eq. 53. In Section 4.2.3 we show that this MM estimator for $\sigma ^{2}_{\gamma }$ is a consistent estimator under some mild regularity conditions.
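The MM estimating Eq. 53 and its derivative ingredients in Eqs. 55 and 57 can be sketched in code as follows. This is an illustration under stated assumptions: h(v_i) is taken to be the standardized binomial count (Eq. 10 is not shown in this section), and the root is sought by a scalar Gauss-Newton update of our own choosing, whose denominator is the sum of squared derivatives. The names `mm_pieces` and `mm_sigma2` are hypothetical, and the example uses pseudo-moments generated at a known σ²_γ, not real data.

```python
import numpy as np
from math import comb

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def ba_nodes(V=10):
    """ASSUMED form of Eq. 10: standardized binomial(V, 1/2) support and weights."""
    h = (np.arange(V + 1) - V / 2.0) / np.sqrt(V / 4.0)
    w = np.array([comb(V, k) for k in range(V + 1)], dtype=float) * 0.5 ** V
    return h, w

def mm_pieces(X, beta, s2, V=10):
    """Return mu^BA_i, lambda^BA_i (pairs j < k) and their derivatives in s2 = sigma_gamma^2 > 0."""
    h, w = ba_nodes(V)
    sig = np.sqrt(s2)
    P = expit((X @ beta)[:, None] + sig * h[None, :])
    Q = 1.0 - P
    j, k = np.triu_indices(X.shape[0], k=1)        # all pairs with j < k, as in Eq. 51
    hw = h * w / (2.0 * sig)                       # h(v) times weight, divided by 2 sigma
    mu = P @ w                                     # Eq. 43
    lam = (P[j] * P[k]) @ w                        # Eq. 46
    dmu = (P * Q) @ hw                             # Eq. 55
    dlam = (P[j] * P[k] * (Q[j] + Q[k])) @ hw      # Eq. 57
    return mu, lam, dmu, dlam

def mm_sigma2(clusters, beta, s2_init, V=10, tol=1e-10, max_iter=100):
    """Solve Eq. 53 for s2; clusters is a list of (X_i, y_i, q_i), q_i the products y_ij y_ik."""
    s2 = float(s2_init)
    for _ in range(max_iter):
        s1 = s2y = 0.0
        for X, y, q in clusters:
            mu, lam, dmu, dlam = mm_pieces(X, beta, s2, V)
            s2y += dmu @ (y - mu) + dlam @ (q - lam)   # left hand side of Eq. 53
            s1 += dmu @ dmu + dlam @ dlam              # sum of squared derivatives
        step = s2y / s1
        s2 += step
        if abs(step) < tol:
            break
    return s2

# Hypothetical designs; pseudo-moments are generated at s2_true so it is an exact root.
rng = np.random.default_rng(1)
beta, s2_true = np.array([0.3, -0.5]), 1.0
designs = [np.column_stack([np.ones(4), rng.uniform(-1, 1, 4)]) for _ in range(6)]
clusters = []
for X in designs:
    mu_i, lam_i, _, _ = mm_pieces(X, beta, s2_true)
    clusters.append((X, mu_i, lam_i))
s2_hat = mm_sigma2(clusters, beta, s2_init=0.6)
print(round(s2_hat, 6))
```

For observed binary data, q_i would be formed as `y[j] * y[k]` with `j, k = np.triu_indices(n_i, k=1)`, and β would be replaced by its current GQL estimate.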

4.2 Consistency and Asymptotic Normality

4.2.1 Consistency of $\hat {\boldsymbol {\beta }}_{GQL}$ Obtained from Eq. 49

We first apply a first order Taylor series expansion to the GQL estimating function on the left hand side of Eq. 49 and obtain

$$ \begin{array}{@{}rcl@{}} &&\hat{\boldsymbol{\beta}}_{GQL}-\boldsymbol{\beta} \simeq -\left[{\sum}^{I}_{i=1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}}\right]^{-1} \\ &\times &\left[{\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}))\right] \\&&+o_{p}(1/\sqrt{N}), \end{array} $$

(58)

where $N={\sum }^{I}_{i=1}n_{i}.$ Let G_N be an N-dependent finite and bounded quantity that increases as N gets larger. Notice that the p × p matrix in the first term on the right hand side of Eq. 58 is free from the responses {y}. Suppose that this p × p matrix satisfies the regularity condition

$$ \begin{array}{@{}rcl@{}} \frac{1}{{\sum}^{I}_{i=1}n_{i}}{\sum}^{I}_{i=1} |\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}}| \le G_{N}, \end{array} $$

(59)

implying that

$$ \begin{array}{@{}rcl@{}} {\sum}^{I}_{i=1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}}\equiv O(NG_{N}). \end{array} $$

(60)

Next the second term in the right hand side of Eq. 58 converges to zero as

$$ \begin{array}{@{}rcl@{}} &&{}\quad{\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})) \\ &&{} \rightarrow E\left[{\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}))\right]=0, \end{array} $$

(61)

in the order of

$$ \begin{array}{@{}rcl@{}} &&{}\quad\left[|\text{cov}\left\{ {\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})) \right \}|\right]^{\frac{1}{2}} \end{array} $$

(62)

$$ \begin{array}{@{}rcl@{}} &&{}=|\left[ {\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}} \right]^{\frac{1}{2}}|\\&&{}\simeq O_{p}(\sqrt{NG_{N}}), \text{ by Eq. 60}. \end{array} $$

Hence, applying Eqs. 60 and 62 to Eq. 58, we obtain

$$ \begin{array}{@{}rcl@{}} [\hat{\boldsymbol{\beta}}_{GQL}-\boldsymbol{\beta} ] &=& O(N^{-1}G^{-1}_{N})O_{p}(\sqrt{NG_{N}})+o_{p}(1/\sqrt{N}) \\ &=&O_{p}((1/\sqrt{N})G^{-\frac{1}{2}}_{N})+o_{p}(1/\sqrt{N}) \equiv o_{p}(1/\sqrt{N}), \end{array} $$

(63)

because G_N is a finite and bounded quantity. It then follows that

$$ \begin{array}{@{}rcl@{}} &&\lim_{N \rightarrow \infty} [\hat{\boldsymbol{\beta}}_{GQL}-\boldsymbol{\beta} ] \rightarrow 0. \end{array} $$

(64)

Thus, $\hat {\boldsymbol {\beta }}_{GQL}$ obtained from Eq. 49 is consistent for β.

Note that as $\hat {\boldsymbol {\beta }}_{GQL}$ is asymptotically unbiased for β, it follows from Eq. 58 that its asymptotic covariance matrix is given by

$$ \begin{array}{@{}rcl@{}} &&{}\lim_{I \rightarrow \infty}\text{cov}(\hat{\boldsymbol{\beta}}_{GQL}) \\&&{}=\left[ {\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}}\right]^{-1}, \end{array} $$

(65)

which can be estimated by replacing β and $\sigma ^{2}_{\gamma }$ with $\hat {\boldsymbol {\beta }}_{GQL}$ and $\hat {\sigma }^{2}_{\gamma ,MM},$ respectively, provided $\hat {\sigma }^{2}_{\gamma ,MM}$ is a consistent estimator of $\sigma ^{2}_{\gamma }.$ This latter consistency property is examined in Section 4.2.3. Further note that the aforementioned estimate for $\text {cov}(\hat {\boldsymbol {\beta }}_{GQL})$ becomes more useful when confidence interval construction for the β parameter is needed. However, for such a confidence interval construction one needs to examine the asymptotic distribution of $\hat {\boldsymbol {\beta }}_{GQL},$ which we do in the following section.
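The plug-in estimate of the covariance matrix in Eq. 65 can be computed with a few lines of code. The sketch below is illustrative: it assumes the standardized binomial form of h(v_i) (Eq. 10 is not shown in this section), the names `gql_cov` and `wald_intervals` are hypothetical, the parameter values stand in for estimates and are not results from the paper, and the Wald-type intervals presuppose the asymptotic normality of the GQL estimator.

```python
import numpy as np
from math import comb

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def ba_nodes(V=10):
    """ASSUMED form of Eq. 10: standardized binomial(V, 1/2) support and weights."""
    h = (np.arange(V + 1) - V / 2.0) / np.sqrt(V / 4.0)
    w = np.array([comb(V, k) for k in range(V + 1)], dtype=float) * 0.5 ** V
    return h, w

def ba_moments(X, beta, sigma, V=10):
    """Return mu^BA_i, Sigma^BA_i and D_i = d mu_i / d beta' for one cluster."""
    h, w = ba_nodes(V)
    P = expit((X @ beta)[:, None] + sigma * h[None, :])
    mu = P @ w
    Sigma = (P * w) @ P.T - np.outer(mu, mu)
    np.fill_diagonal(Sigma, mu * (1.0 - mu))
    D = ((P * (1.0 - P)) @ w)[:, None] * X
    return mu, Sigma, D

def gql_cov(designs, beta, sigma, V=10):
    """Inverse of the sum over clusters of D_i' Sigma_i^{-1} D_i, as in Eq. 65."""
    A = np.zeros((len(beta), len(beta)))
    for X in designs:
        _, Sigma, D = ba_moments(X, beta, sigma, V)
        A += D.T @ np.linalg.solve(Sigma, D)
    return np.linalg.inv(A)

def wald_intervals(beta_hat, cov, z=1.959964):
    """Approximate 95% intervals beta_hat +/- z * standard error."""
    se = np.sqrt(np.diag(cov))
    return np.column_stack([beta_hat - z * se, beta_hat + z * se])

# Hypothetical designs and plug-in values standing in for the estimates.
rng = np.random.default_rng(2)
designs = [np.column_stack([np.ones(5), rng.uniform(-1, 1, 5)]) for _ in range(20)]
beta_hat, sigma_hat = np.array([0.3, -0.5]), 1.0
cov = gql_cov(designs, beta_hat, sigma_hat)
ci = wald_intervals(beta_hat, cov)
print(np.round(ci, 3))
```

Since Eq. 65 is the inverse of a sum over clusters, doubling the number of clusters with the same designs halves the covariance matrix.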

4.2.2 Asymptotic Normality of $\hat {\boldsymbol {\beta }}_{GQL}$

We outline the derivation of the asymptotic distribution as follows. Notice from Eq. 58 that, for large I, the estimator $\hat {\boldsymbol {\beta }}_{GQL}$ satisfies the approximation

$$ \begin{array}{@{}rcl@{}} &&{}\quad\hat{\boldsymbol{\beta}}_{GQL}-\boldsymbol{\beta} \simeq -\left[\frac{1}{I} {\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}} \right]^{-1} \\ &&{}\times \left[\frac{1}{I} {\sum}^{I}_{i=1}\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}))\right], \end{array} $$

(66)

which we re-express as

$$ \begin{array}{@{}rcl@{}} \hat{\boldsymbol{\beta}}_{GQL}-\boldsymbol{\beta} \simeq -\left[{\sum}^{I}_{i=1}\frac{\partial \boldsymbol{f}_{i}(\boldsymbol{\beta}|\sigma^{2}_{\gamma}, \boldsymbol{y}_{i})}{\partial \boldsymbol{\beta}^{\prime}}\right]^{-1} \left[{\sum}^{I}_{i=1}\boldsymbol{f}_{i}(\boldsymbol{\beta}|\sigma^{2}_{\gamma}, \boldsymbol{y}_{i})\right]. \end{array} $$

(67)

Let

$$ \begin{array}{@{}rcl@{}} \bar{\boldsymbol{f}}_{I}(\boldsymbol{\beta}|\sigma^{2}_{\gamma})=\frac{1}{I}{\sum}^{I}_{i=1} \boldsymbol{f}_{i}(\boldsymbol{\beta}|\sigma^{2}_{\gamma},\boldsymbol{y}_{i}), \end{array} $$

(68)

where f_i’s are clearly independent because y₁,…,y_i,…,y_I are independent vectors from I independent clusters. But, they are not identically distributed because of the fact that

$$ \{\boldsymbol{Y}_{i}: n_{i} \times 1\} \sim ({\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}),{\boldsymbol{\Sigma}}^{BA}_{i} (\boldsymbol{\beta},\sigma^{2}_{\gamma})), $$

(69)

by Eqs. 43 and 48, i.e., the means, variances and covariances are cluster dependent, i.e., they vary from cluster to cluster. Notice from Eqs. 66–68 that $\bar {\boldsymbol {f}}_{I}(\boldsymbol {\beta }|\sigma ^{2}_{\gamma },\boldsymbol {y}_{i})$ in Eq. 68 has the mean vector and covariance matrix as given by

$$ \begin{array}{@{}rcl@{}} &&E[\bar{\boldsymbol{f}}_{I}(\boldsymbol{\beta})]=0 , \text{and} \\ && \text{cov}[\bar{\boldsymbol{f}}_{I}(\boldsymbol{\beta})] =\frac{1}{I^{2}}{\sum}^{I}_{i=1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]^{\prime}}{\partial \boldsymbol{\beta}}[{\boldsymbol{\Sigma}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2})]}{\partial \boldsymbol{\beta}^{\prime}} \\ &=&\frac{1}{I^{2}} {\boldsymbol{V}}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), \text{(say)}. \end{array} $$

(70)

Next we assume that the multivariate version of Lindeberg’s condition holds, that is,

$$ \begin{array}{@{}rcl@{}} {\lim}_{I \rightarrow \infty}{\boldsymbol{V}^{*}}^{-1}_{I} {\sum}^{I}_{i=1}{\sum}_{\{\boldsymbol{f}^{\prime}_{i}{\boldsymbol{V}^{*}}^{-1}_{I}\boldsymbol{f}_{i}\}> \epsilon}\boldsymbol{f}_{i}\boldsymbol{f}^{\prime}_{i}p^{\dagger}(\boldsymbol{f}_{i})=0 \end{array} $$

(71)

for all 𝜖 > 0, p^†(⋅) being the probability distribution of f_i. Then the Lindeberg-Feller central limit theorem [Amemiya (1985, Theorem 3.3.6), McDonald (2005, Theorem 2.2)] implies the following convergence in distribution $(\rightarrow _{d}):$

$$ \begin{array}{@{}rcl@{}} &&\boldsymbol{Z}_{I}=I[{\boldsymbol{V}}^{*}_{I}]^{-\frac{1}{2}}\bar{\boldsymbol{f}}_{I}(\boldsymbol{\beta})\rightarrow_{d} N_{p}(0,I_{p}). \end{array} $$

(72)

Here I_p is the p × p identity matrix.

By using the notations from Eq. 68 it follows from Eqs. 67 and 72 that

$$ \begin{array}{@{}rcl@{}} &&\hat{\boldsymbol{\beta}}_{GQL}-\boldsymbol{\beta} \simeq -\left[{\sum}^{I}_{i=1}\frac{\partial \boldsymbol{f}_{i}(\boldsymbol{\beta})} {\partial \boldsymbol{\beta}^{\prime}}\right]^{-1} \left[{\sum}^{I}_{i=1}\boldsymbol{f}_{i}\right] \\ &=&[\boldsymbol{V}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1} [\boldsymbol{V}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{\frac{1}{2}} [\boldsymbol{V}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-\frac{1}{2}}I\bar{\boldsymbol{f}}_{I}(\boldsymbol{\beta}) \\ &=&[\boldsymbol{V}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-\frac{1}{2}}\boldsymbol{Z}_{I}. \end{array} $$

(73)

Clearly, by Eq. 72, the quantity in Eq. 73 converges in distribution, as

$$ \begin{array}{@{}rcl@{}} &&[\boldsymbol{V}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-\frac{1}{2}}\boldsymbol{Z}_{I} \rightarrow_{d} N_{p}(0, [\boldsymbol{V}^{*}_{I}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{-1}). \end{array} $$

(74)

Notice that this normal covariance matrix $[\boldsymbol {V}^{*}_{I}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })]^{-1}$ is the same as the limiting covariance matrix in Eq. 65, as expected.

4.2.3 Consistency of $\hat {\sigma }^{2}_{\gamma , MM}$ Obtained from Eq. 53

Consistency property of $\hat {\sigma }^{2}_{\gamma ,MM}$ can be established in a similar way as that of $\hat {\boldsymbol {\beta }}_{GQL}$ discussed in Section 4.2.1. For convenience we, however, highlight the main steps below. Because $\hat {\sigma }^{2}_{\gamma , MM}$ is the solution of the MM estimating Eq. 53, a first order Taylor series expansion of the estimating function in the left hand side of Eq. 53 about $\sigma ^{2}_{\gamma }$ provides

$$ \begin{array}{@{}rcl@{}} &&[\hat{\sigma}^{2}_{\gamma,MM}-\sigma^{2}_{\gamma}] \\ &\simeq & - \left[{\sum}^{I}_{i=1}\left\{\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})]^{\prime} }{\partial \sigma^{2}_{\gamma}} \frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})] }{\partial \sigma^{2}_{\gamma}} + \frac{\partial [{\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{\prime}}{\partial \sigma^{2}_{\gamma}} \frac{\partial [{\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}}\right\}\right]^{-1} \\ &\times & \left[{\sum}^{I}_{i=1}\left\{\frac{\partial [{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})]^{\prime} }{\partial \sigma^{2}_{\gamma}} (\boldsymbol{y}_{i}-{\boldsymbol{\mu}}^{BA}_{i}(\boldsymbol{\beta}, \sigma^{2}_{\gamma}) )+ \frac{\partial [{\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]^{\prime}}{\partial \sigma^{2}_{\gamma}} (\boldsymbol{q}_{i}-{\boldsymbol{\lambda}}^{BA}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}))\right\}\right] \\ &+&o_{p}(1/\sqrt{N}), \end{array} $$

(75)

where $N={\sum }^{I}_{i=1}n_{i}.$ For convenience of further calculations, we re-express the equation in (75) as

$$ \begin{array}{@{}rcl@{}} &&\hat{\sigma}^{2}_{\gamma,MM}-\sigma^{2}_{\gamma} \simeq -S^{-1}_{1}S_{2,y}+o_{p}(1/\sqrt{N}). \end{array} $$

(76)

Suppose that H_N is an N-dependent increasing but finite and bounded quantity, and that S₁ in Eq. 76 satisfies the following regularity condition:

$$ \frac{1}{{\sum}^{I}_{i=1}n_{i}}S_{1} \le H_{N}, $$

(77)

implying that

$$ \begin{array}{@{}rcl@{}} &&S_{1} \approx O(NH_{N}). \end{array} $$

(78)

Notice that E_Y[S_2,y] = 0. It then follows that $S_{2,y} \rightarrow _{p} E_{Y}[S_{2,y}]=0,$ at a rate of the order of $[\text {var}(S_{2,y})]^{\frac {1}{2}}.$ To compute this variance formula, it is convenient to re-express S_2,y, by using Eq. 75, as

$$ \begin{array}{@{}rcl@{}} &&S_{2,y}={\sum}^{I}_{i=1}\left\{{\sum}^{n_{i}}_{j=1}\frac{\partial [\mu^{BA}_{ij}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})] }{\partial \sigma^{2}_{\gamma}} (y_{ij}-\mu^{BA}_{ij}(\boldsymbol{\beta}, \sigma^{2}_{\gamma}) ) \right. \\ &+& \left. {\sum}^{n_{i}}_{j <k} \frac{\partial [\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}} (y_{ij}y_{ik}-\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma}))\right\}, \end{array} $$

(79)

and obtain its variance as

$$ \begin{array}{@{}rcl@{}} &&\text{var}(S_{2,y}) \\ &=&{\sum}^{I}_{i=1}\left[\left\{{\sum}^{n_{i}}_{j=1}\left( \frac{\partial [\mu^{BA}_{ij}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})] }{\partial \sigma^{2}_{\gamma}}\right)^{2}\text{var}(Y_{ij})\right. \right. \\&&+2{\sum}^{n_{i}}_{j <k} \left( \frac{\partial [\mu^{BA}_{ij}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})] }{\partial \sigma^{2}_{\gamma}}\frac{\partial [\mu^{BA}_{ik}(\boldsymbol{\beta}, \sigma^{2}_{\gamma})] }{\partial \sigma^{2}_{\gamma}}\right) \\ &\times & \left. \text{cov}\left( Y_{ij},Y_{ik}\right) \right\} +\left\{{\sum}^{n_{i}}_{j<k}\left( \frac{\partial [\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}}\right)^{2}\text{var}(Y_{ij}Y_{ik}) \right. \\ &+& \left. {\sum}^{n_{i}}_{j<k}{\sum}^{n_{i}}_{\ell<m}\left( \frac{\partial [\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}}\frac{\partial [\lambda^{BA}_{i,\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}}\right) \text{cov}(Y_{ij}Y_{ik},Y_{i\ell}Y_{im}) \right\} \\ &+&\left. \left\{{\sum}^{n_{i}}_{j=1}{\sum}^{n_{i}}_{k< \ell}\left( \frac{\partial [\mu^{BA}_{ij}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}}\frac{\partial [\lambda^{BA}_{i,k \ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma})]}{\partial \sigma^{2}_{\gamma}}\right) \text{cov}(Y_{ij},Y_{ik}Y_{i\ell}) \right\} \right] \\ &=&{\sum}^{I}_{i=1}{\Omega}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), \text{(say)}, \end{array} $$

(80)

where $\text{var}(Y_{ij})=\sigma^{BA}_{i,jj}(\boldsymbol{\beta},\sigma^{2}_{\gamma})$ and $\text{cov}(Y_{ij},Y_{ik})=\sigma^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})$ are given by Eqs. 44 and 47, respectively. The computational formulas for the remaining third and fourth order moments, i.e., for $\text{var}(Y_{ij}Y_{ik})=\omega^{BA}_{i,jjkk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})$, $\text{cov}(Y_{ij}Y_{ik},Y_{i\ell}Y_{im})=\omega^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma})$ and $\text{cov}(Y_{ij},Y_{ik}Y_{i\ell})=\phi^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma})$, are relatively lengthy and, for convenience, are given in the Appendix.

Suppose that for an N-dependent finite and bounded quantity K_N, $\text{var}(S_{2,y})$ satisfies the regularity condition

$$ \begin{array}{@{}rcl@{}} &&\frac{1}{{\sum}^{I}_{i=1}n_{i}}{\sum}^{I}_{i=1}{\Omega}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \le K_{N}, \end{array} $$

(81)

implying that

$$ \begin{array}{@{}rcl@{}} &&\left[\text{var}(S_{2,y})\right]=O_{p}(\sqrt{NK_{N}}). \end{array} $$

(82)

Now by applying Eqs. 77 and 82 to 76, one obtains

$$ \begin{array}{@{}rcl@{}} \hat{\sigma}^{2}_{\gamma,MM}-\sigma^{2}_{\gamma} &\simeq & O(N^{-1}H^{-1}_{N})O_{p}(\sqrt{NK_{N}})+o_{p}(1/\sqrt{N}) \\ &=&O_{p}(N^{-\frac{1}{2}}\frac{\sqrt{K_{N}}}{H_{N}})+o_{p}(1/\sqrt{N})=o_{p}(1/\sqrt{N}), \end{array} $$

(83)

because both H_N and K_N are finite and bounded. Hence,

$$ {\lim}_{N \rightarrow \infty}[\hat{\sigma}^{2}_{\gamma,MM}-\sigma^{2}_{\gamma}] \rightarrow_{p} 0, $$

(84)

justifying that $\hat {\sigma }^{2}_{\gamma ,MM}$ is consistent for $\sigma ^{2}_{\gamma },$ and this can be used in the GQL (49) while estimating β.

We remark that because by Eq. 84, $\hat {\sigma }^{2}_{\gamma ,MM}$ is asymptotically unbiased for $\sigma ^{2}_{\gamma },$ one may then compute the asymptotic variance of $\hat {\sigma }^{2}_{\gamma ,MM}$ by exploiting Eqs. 75 and 76. More specifically,

$$ \begin{array}{@{}rcl@{}} \text{var}[\hat{\sigma}^{2}_{\gamma,MM}]&=&S^{-1}(\boldsymbol{\beta},\sigma^{2}_{\gamma})\text{var}[S_{2,y}] S^{-1}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&S^{-1}(\boldsymbol{\beta},\sigma^{2}_{\gamma}){\sum}^{I}_{i=1}{\Omega}_{i}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) S^{-1}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), \end{array} $$

(85)

by Eq. 80, which can be estimated by replacing β with $\hat {\boldsymbol {\beta }}_{GQL},$ and $\sigma ^{2}_{\gamma }$ with $\hat {\sigma }^{2}_{\gamma ,MM}.$

5 On the Bayesian Approach for Correlated Binary Data

We continue discussing mixed effects models (1) for cross-sectional cluster binary data and time dynamic fixed effects models (4) for longitudinal cluster data. However, as opposed to regression analysis based on parametric correlation structures, here we focus on some of the existing alternative studies using the Bayesian approach. Without specifying the correlation structures, these studies use multilevel conditional models to estimate the main parameters, such as the individual level covariate effects (β), and cluster specific parameters, such as the cluster variation $\sigma^{2}_{\gamma}$ in Eq. 1 under the cross-sectional cluster model (e.g., McCulloch, 1997) or the dynamic dependence parameter (ρ) in Eq. 4 under the longitudinal cluster model (e.g., Chib and Jeliazkov, 2006). We remark that because mixed effects models are also used by some authors, such as Stiratelli et al. (1984) and Zeger et al. (1988), for binary longitudinal data, we also include these models under the longitudinal setup on top of the dynamic fixed effects models.

5.1 Monte Carlo Based Likelihood Estimation for Cluster Binary Data

We keep focusing on the clustered binary model, but instead of a common random cluster effect γ_i, we consider a more general situation using γ_ij as the cluster specific individual random effect for the j-th individual in the i-th cluster. Let ${z}^{*}_{ij}$ denote a cluster-specific scalar covariate associated with the random effect γ_ij. Thus, ${z}^{*}_{ij}=1$ and γ_ij = γ_i for all j = 1,…,n_i would refer to the basic cluster-specific mean model (1). Similar to Stiratelli et al. (1984, Eqn. (2.2)), Zeger et al. (1988, Eqn. (2.1)), and Daniels and Gatsonis (1999, Eqns. (1)-(2)), we may write the logit link for this general case as

$$ \begin{array}{@{}rcl@{}} &&\text{logit}(p^{*}_{ij}(\boldsymbol{\beta},{\gamma}_{ij}))\equiv \ell(p^{*}_{ij})=\boldsymbol{x}^{\prime}_{ij}\boldsymbol{\beta}+z^{*}_{ij}{\gamma}_{ij}, \end{array} $$

(86)

for j = 1,…,n_i; i = 1,…,I. Notice that Daniels and Gatsonis (1999) have expressed the logit link as $\ell(p^{*}_{ij})={\boldsymbol{x}^{*}}^{\prime}_{ij}\boldsymbol{\alpha}_{i}.$ For our discussion it is convenient to use the notation in Eq. 86. Write $\boldsymbol{\gamma}_{i}=(\gamma_{i1},\ldots,\gamma_{ij},\ldots,\gamma_{in_{i}})^{\prime}.$ Similar to the normality assumption in Eq. 7, and also in the aforementioned studies, one may assume that

$$ \begin{array}{@{}rcl@{}} &&\boldsymbol{\gamma}_{i} \sim N(0, \boldsymbol{D}_{i}), \end{array} $$

(87)

where D_i is the n_i × n_i covariance matrix.
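As an illustration of how data arise under Eqs. 86 and 87, the following minimal Python sketch draws one cluster of binary responses. The function name `simulate_cluster`, the parameter values and the use of NumPy are assumptions made here for illustration only; they do not come from the cited studies.

```python
import numpy as np

def simulate_cluster(x, beta, z, D, rng):
    """Draw one cluster of binary responses from the logit model (86)-(87).

    x : (n_i, p) covariate matrix, beta : (p,) regression effects,
    z : (n_i,) scalar covariates z*_ij, D : (n_i, n_i) covariance of gamma_i.
    """
    n_i = x.shape[0]
    gamma = rng.multivariate_normal(np.zeros(n_i), D)  # gamma_i ~ N(0, D_i)
    eta = x @ beta + z * gamma                          # linear predictor in (86)
    p = 1.0 / (1.0 + np.exp(-eta))                      # inverse logit
    y = rng.binomial(1, p)                              # conditionally independent binaries
    return y, gamma

rng = np.random.default_rng(0)
n_i, sigma2 = 4, 0.8                                    # hypothetical values
x = np.column_stack([np.ones(n_i), rng.normal(size=n_i)])
beta = np.array([-0.5, 1.0])
# Special case of model (1): z*_ij = 1 and gamma_ij = gamma_i for all j,
# i.e. D_i = sigma^2 * (matrix of ones), a singular but valid covariance.
D = sigma2 * np.ones((n_i, n_i))
y, gamma = simulate_cluster(x, beta, np.ones(n_i), D, rng)
```

With the all-ones choice of D_i, every component of the drawn vector equals the same value, which reproduces the common cluster effect γ_i of model (1).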

Recall from Eq. 6 that a closed-form likelihood function cannot be obtained because of the integration over the distribution of the random effect γ_i. To handle this integration problem, numerical algorithms have been developed in which γ_i is treated as missing data and is drawn from the conditional distribution of γ_i|y by using the so-called Metropolis algorithm (Gelfand and Carlin, 1993), which does not require specification of the unconditional density of the binary data y. More specifically, the Metropolis algorithm is used to simulate the random effects, and the so-called expectation-maximization (EM) or Newton-Raphson (NR) technique is used to maximize the Monte Carlo (MC) based approximate likelihood function for the estimation of the regression effects β. We may refer to McCulloch (1997), for example, for these MCEM and MCNR approaches. Some authors such as Daniels and Gatsonis (1999), instead of normality in Eq. 87, have assumed a more general symmetric multivariate t distribution for γ_i given by
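The idea of drawing γ_i from its conditional distribution given the data can be sketched as follows for the scalar random intercept model (1). This is a generic random-walk Metropolis sampler written for illustration, not the specific algorithm of the cited papers; the function names, the proposal step size and the toy data are all assumptions.

```python
import numpy as np

def log_target(gamma, y, eta_fixed, sigma2):
    """Unnormalized log density of gamma_i given y_i under model (1)."""
    eta = eta_fixed + gamma
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # Bernoulli-logit log-likelihood
    logprior = -0.5 * gamma**2 / sigma2                # N(0, sigma2) up to a constant
    return loglik + logprior

def metropolis_gamma(y, eta_fixed, sigma2, n_draws=2000, step=1.0, seed=0):
    """Random-walk Metropolis draws of gamma_i | y_i (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    gamma = 0.0
    cur = log_target(gamma, y, eta_fixed, sigma2)
    draws = np.empty(n_draws)
    for s in range(n_draws):
        prop = gamma + step * rng.normal()             # symmetric proposal
        new = log_target(prop, y, eta_fixed, sigma2)
        if np.log(rng.uniform()) < new - cur:          # accept/reject step
            gamma, cur = prop, new
        draws[s] = gamma
    return draws

y = np.array([1, 1, 0, 1, 1])          # toy responses for one cluster
eta_fixed = np.zeros(5)                # hypothetical values of x_ij' beta
draws = metropolis_gamma(y, eta_fixed, sigma2=1.0)
```

Because the normalizing constant cancels in the acceptance ratio, the sampler needs only the conditional likelihood and the random effect density, never the unconditional density of y. In an MCEM or MCNR scheme, draws of this kind would be used to approximate the expectations needed to update β.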

$$ \begin{array}{@{}rcl@{}} &&\boldsymbol{\gamma}_{i} \sim t_{\nu}(\boldsymbol{G}_{i}\gamma,\boldsymbol{D}_{i}), \end{array} $$

(88)

where G_i is a cluster level covariates dependent known matrix of dimension n_i × q, and γ is a q-dimensional vector of suitable parameters, and ν is the unknown degrees of freedom parameter. Next, using suitable proper prior distributions for γ and D_i, Daniels and Gatsonis (1999, Section 2.2) used the Markov Chain Monte Carlo (MCMC) approach for the desired model fitting.

However, as expected, the above Monte Carlo based likelihood inference approach is computationally expensive. Moreover, the selection of proper prior distributions is a challenge in this approach. For example, under the normality assumption (87) for γ_i, it is reasonable to take the so-called Wishart distribution as the prior for $\boldsymbol{D}^{-1}_{i}$, but this may not be a proper prior distribution when γ_i follows the multivariate t distribution as in Eq. 88. This is because, as derived by Sutradhar and Ali (1989), for example, the Wishart distribution under the multivariate t-model is different from the usual normality based Wishart distribution. More specifically, it also depends on the degrees of freedom of the t-distribution.

Turning back to the parametric inferences discussed in Section 4, when the normality assumption in Eq. 87 holds, using the so-called Binomial approximation (BA), one may easily construct the correlation structure as explained in Section 4.1.1 and apply the generalized quasi-likelihood estimation approach (49) to obtain consistent and highly efficient estimate for β, and method of moments (MM) estimation approach (53) to obtain consistent estimates for the variance parameters involved in D_i. Alternatively, as also pointed out by Daniels and Gatsonis (1999, Section 1), one may use the other parametric approaches such as the PQL (penalized quasi-likelihood) approach of Breslow and Clayton (1993) or hierarchical likelihood (HQL) approach of Lee and Nelder (1996), which are simpler than the MCMC based Bayesian approach.

5.2 Monte Carlo Based Likelihood Estimation for Longitudinal Binary Data

With regard to the analysis of longitudinal binary data in the Bayesian setup, most of the existing studies used the same random effects based logit link model (86) with some modifications as follows. Using the notation from Section 3, we re-write the logit link model for longitudinal data as

$$ \begin{array}{@{}rcl@{}} &&\text{logit}(\tilde{p}_{it}(\boldsymbol{\beta},{\gamma}_{i}))\equiv \ell(\tilde{p}_{it})=\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+\tilde{z}_{it}{\gamma}_{i}, \end{array} $$

(89)

for the binary response y_it, recorded at time t (t = 1,…,T), for the i-th individual. Here both x_it and $\tilde {\boldsymbol {z}}_{it}$ are time dependent covariates. Under this model, the binary responses y_iu at time u, and y_it at time t, become correlated under the assumption that i-th individual’s random effect remains the same over time.

We remark that in a longitudinal setup, irrespective of whether the responses are linear, count or binary, it is expected in practice that the correlation between two responses decreases as the time lag increases. Two dynamic models considered in the literature for binary responses, namely the AR(1) type dynamic model (2) and the binary dynamic logit (BDL) model (4), satisfy this lag dependent decaying correlation property. Specifically, the AR(1) model (2) produces the correlations $\text{corr}(Y_{iu},Y_{it})= \rho^{t-u}[\frac{\sigma_{iuu}}{\sigma_{itt}}]^{\frac{1}{2}}$ as in Eq. 26, which (a) become smaller as |t − u| increases, and (b) contain time varying covariates through σ_itt. Similarly, the BDL model (4) produces the correlations

$$ \text{corr}(Y_{iu},Y_{it})=\sqrt{\frac{\mu_{iu}(\cdot)(1-\mu_{iu}(\cdot))} {\mu_{it}(\cdot)(1-\mu_{it}(\cdot))}}{\Pi}^{t}_{v=u+1} ({\tilde{\tilde{p}}}_{iv}(\boldsymbol{\beta},\rho)-\tilde{p}_{iv}(\boldsymbol{\beta})) $$

as in Eq. 35, which decay as |t − u| increases. This is because

$$ 0<({\tilde{\tilde{p}}}_{iv}(\boldsymbol{\beta},\rho)-\tilde{p}_{iv}(\boldsymbol{\beta}))<1, $$

for all v = (u + 1),…,t. Also it contains time varying covariates.
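The decay of the BDL correlations with the lag can be checked numerically with the short sketch below. It assumes a first-order model in which P(Y_t = 1 | y_{t−1}) is the inverse logit of x′β + ρ y_{t−1}, an initial mean equal to the inverse logit of x′β at t = 1, and a positive ρ; the names `p0`, `p1` and `bdl_lag_correlations` are introduced here for illustration and are not the paper's notation.

```python
import numpy as np

def expit(a):
    return 1.0 / (1.0 + np.exp(-a))

def bdl_lag_correlations(xb, rho):
    """Means and correlations corr(Y_1, Y_t), t = 2..T, for a binary dynamic logit chain.

    xb : (T,) values of x_it' beta; rho : dynamic dependence parameter.
    Assumes P(Y_t = 1 | y_{t-1}) = expit(xb_t + rho * y_{t-1}) and
    P(Y_1 = 1) = expit(xb_1).
    """
    T = len(xb)
    p0 = expit(xb)           # conditional probability given y_{t-1} = 0
    p1 = expit(xb + rho)     # conditional probability given y_{t-1} = 1
    mu = np.empty(T)
    mu[0] = p0[0]
    for t in range(1, T):    # recursive marginal means
        mu[t] = p0[t] + mu[t - 1] * (p1[t] - p0[t])
    var = mu * (1.0 - mu)
    d = p1 - p0              # factors entering the product over v = u+1..t
    corr = np.array([np.sqrt(var[0] / var[t]) * np.prod(d[1:t + 1])
                     for t in range(1, T)])
    return mu, corr

mu, corr = bdl_lag_correlations(xb=np.zeros(6), rho=1.0)
```

Each additional lag multiplies the correlation by one more factor lying strictly between 0 and 1 when ρ > 0, so in this example `corr` decreases with the lag.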

Note that the random effects model (89), for example, does not satisfy decaying correlation property (a) mentioned above. It, however, satisfies (b) indicating that correlations contain time varying covariates. This is because under the assumption that $\gamma _{i} {\stackrel {iid}{\sim }} N(0,\tilde {\sigma }^{2}_{\gamma }),$ we can compute

$$ \begin{array}{@{}rcl@{}} E[Y_{it}]&=&E_{\gamma_{i}}\left[E(Y_{it}|\gamma_{i})\right] =\tilde{\mu}_{it}(\boldsymbol{x}_{it},z^{*}_{it},\boldsymbol{\beta},\tilde{\sigma}^{2}_{\gamma}) \\ &=&\int \left[\frac{\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+z^{*}_{it}\gamma_{i})} {[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+z^{*}_{it}\gamma_{i})]}\right]dG_{N}(\gamma_{i},\tilde{\sigma}^{2}_{\gamma}), \end{array} $$

(90)

and

$$ \begin{array}{@{}rcl@{}} E[Y_{iu}Y_{it}]&=&E_{\gamma_{i}}\left[E(Y_{iu}|\gamma_{i})E(Y_{it}|\gamma_{i})\right] =\tilde{\lambda}_{iut}(\boldsymbol{x}_{iu},\boldsymbol{x}_{it},z^{*}_{iu},z^{*}_{it},\boldsymbol{\beta}, \tilde{\sigma}^{2}_{\gamma}) \\ &=& \int \left[\left\{\frac{\exp(\boldsymbol{x}^{\prime}_{iu}\boldsymbol{\beta}+z^{*}_{iu}\gamma_{i})} {[1+\exp(\boldsymbol{x}^{\prime}_{iu}\boldsymbol{\beta}+z^{*}_{iu}\gamma_{i})]}\right\} \left\{\frac{\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+z^{*}_{it}\gamma_{i})} {[1+\exp(\boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+z^{*}_{it}\gamma_{i})]}\right\}\right] dG_{N}(\gamma_{i},\tilde{\sigma}^{2}_{\gamma}), \end{array} $$

(91)

yielding the (t − u) lag correlation as

$$ \begin{array}{@{}rcl@{}} \text{corr}[Y_{iu},Y_{it}]&=&\frac{ \tilde{\lambda}_{iut}(\boldsymbol{x}_{iu},\boldsymbol{x}_{it},z^{*}_{iu},z^{*}_{it},\boldsymbol{\beta}, \tilde{\sigma}^{2}_{\gamma})}{[\tilde{\mu}_{iu}(\cdot)(1-\tilde{\mu}_{iu}(\cdot)) \tilde{\mu}_{it}(\cdot)(1-\tilde{\mu}_{it}(\cdot))]^{\frac{1}{2}}} \\ & - &\frac{[\tilde{\mu}_{iu}(\cdot) \tilde{\mu}_{it}(\cdot)]^{\frac{1}{2}}} {[(1-\tilde{\mu}_{iu}(\cdot)) (1-\tilde{\mu}_{it}(\cdot))]^{\frac{1}{2}}}, \end{array} $$

(92)

which contains time varying covariates but provides equi-correlations for any (u,t) when these covariates are the same over time. Thus, it does not show any decay in the correlations as the lag |t − u| increases.
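The lag-free nature of this correlation can be seen numerically. The sketch below evaluates the marginal mean and the product moment by Gauss-Hermite quadrature for the special case of a unit random effect covariate; the quadrature rule, the function name `random_intercept_corr` and the input values are assumptions chosen for illustration.

```python
import numpy as np

def random_intercept_corr(xb_u, xb_t, sigma2, n_nodes=40):
    """corr(Y_iu, Y_it) under model (89) with unit z and gamma_i ~ N(0, sigma2)."""
    # Gauss-Hermite rule for the weight exp(-a^2), rescaled to N(0, sigma2).
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    gamma = np.sqrt(2.0 * sigma2) * nodes
    w = weights / np.sqrt(np.pi)
    p_u = 1.0 / (1.0 + np.exp(-(xb_u + gamma)))
    p_t = 1.0 / (1.0 + np.exp(-(xb_t + gamma)))
    mu_u, mu_t = np.sum(w * p_u), np.sum(w * p_t)   # marginal means as in (90)
    lam = np.sum(w * p_u * p_t)                     # product moment E[Y_iu Y_it]
    return (lam - mu_u * mu_t) / np.sqrt(mu_u * (1 - mu_u) * mu_t * (1 - mu_t))

c = random_intercept_corr(0.0, 0.0, sigma2=1.0)
```

The function takes the two linear predictors but never the lag itself: when the covariates are constant over time, every pair (u,t) yields the same value `c`, which is the equi-correlation structure described above.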

Recognizing that serial correlations play an important role for longitudinal binary data, some authors in the past, such as Stiratelli et al. (1984, Eqns. (2.2), (3.1)–(3.2)), have used, in place of Eq. 89, a logit link random effects model that is similar to but different from Eq. 86. More specifically,

$$ \begin{array}{@{}rcl@{}} &&\text{logit}(\tilde{p}_{it}(\boldsymbol{\beta},{\gamma}_{it}))\equiv \ell(\tilde{p}_{it})=\boldsymbol{x}^{\prime}_{it}(y_{i,t-1},\ldots,y_{i1})\boldsymbol{\beta}+\tilde{z}_{it}{\gamma}_{it}, \end{array} $$

(93)

where the covariate vector x_it includes the past binary responses as given covariates, and $\tilde{\boldsymbol{\gamma}}_{i}=(\gamma_{i1},\ldots,\gamma_{it},\ldots,\gamma_{iT})^{\prime}$ denotes the time varying random effects of the i-th individual over the time period T. As far as the distribution of $\tilde{\boldsymbol{\gamma}}_{i}$ is concerned, the authors have considered

$$ \begin{array}{@{}rcl@{}} \tilde{\boldsymbol{\gamma}}_{i} \sim N(0,\tilde{D}), \end{array} $$

(94)

similar to Eq. 87, and estimated β and $\tilde {D}: T \times T,$ using the so-called empirical Bayes estimation approach. Notice that the dimension of β in Eqs. 93 and 89 is different. This is because β in Eq. 93 also contains the regression effects/parameters of the past binary responses.

As compared to the logit link model (93), Chib and Jeliazkov (2006, Eqns. (1),(6)) have used a more general semi-parametric dynamic mixed model for longitudinal binary data, constructed based on a latent linear semi-parametric dynamic mixed model. This allows one either to use logit or probit links. More specifically, suppose that $y^{*}_{it}$ is an unobservable continuous variable satisfying a linear semi-parametric dynamic mixed model, as

$$ \begin{array}{@{}rcl@{}} g^{*}_{it}=E[Y^{*}_{it}|\cdot] &=& \boldsymbol{x}^{\prime}_{it}\boldsymbol{\beta}+\tilde{z}_{it}\gamma_{it}+\phi_{1} y_{i,t-1}+\ldots+\phi_{m} y_{i,t-m}\\&&+g(s_{it})+\epsilon_{it}, \end{array} $$

(95)

where y_i,t−j is the binary response that occurred in the past at time (t − j) with its regression effect ϕ_j (j = 1,…,m), g(s_it) is a smooth non-parametric function in s_it covariates, and 𝜖_it is the model error. Next, suppose that the binary response y_it is determined based on the relationship

$$ \begin{array}{@{}rcl@{}} y_{it}&=&\left\{\begin{array}{ll} 1 & \text{if } y^{*}_{it} > 0 \\ 0 & \text{otherwise.} \end{array} \right. \end{array} $$

(96)

Note that if $y^{*}_{it}$ follows a logistic (L) distribution (e.g., Johnson and Kotz (1970)) with mean $g^{*}_{it}$ as in Eq. 95, and variance $\frac {\pi ^{2}}{3},$ then by using the condition in Eq. 96, one can compute the binary probability as

$$ \begin{array}{@{}rcl@{}} Pr[Y_{it}=1|g^{*}_{it}]&=& {\int}^{g^{*}_{it}}_{-\infty}f_{L}(y^{*}_{it})dy^{*}_{it}=\frac{\exp(g^{*}_{it})}{1+ \exp(g^{*}_{it})}=\pi^{**}_{it}(\cdot), \end{array} $$

(97)

which has the same form as in Eq. 4, with a difference in the formula for $g^{*}_{it}$. Specifically, in Eq. 4 $g^{*}_{it}$ has the dynamic form, whereas $g^{*}_{it}$ in Eq. 95, considered by Chib and Jeliazkov (2006), has the dynamic mixed model form, which is a logit link function $(\text{logit}(\pi^{**}_{it})=g^{*}_{it})$ for clustered longitudinal data (see Sutradhar (2011, Chapter 11) for similar familial/cluster longitudinal binary data). For linear data, $g^{*}_{it}$ itself is the linear link function, which has been studied by some authors such as Das et al. (2013) in a Bayesian setup. For more discussion on binary dynamic mixed models similar to that of Chib and Jeliazkov (2006), we refer to Sutradhar et al. (2010), for example, in a parametric setup, and Congdon (2014, Section 7.1.1, p. 287), among others, in a Bayesian framework.
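The link between the latent threshold rule (96) and the logistic probability (97) can be checked by simulation. In the minimal sketch below, the value of `g`, the sample size and the seed are hypothetical, and the latent response is generated as `g` plus a standard logistic error, which has variance π²/3.

```python
import numpy as np

rng = np.random.default_rng(1)
g = 0.7                                           # hypothetical value of g*_it
# Latent response: mean g, standard logistic error (variance pi^2 / 3).
y_star = g + rng.logistic(loc=0.0, scale=1.0, size=200_000)
y = (y_star > 0).astype(int)                      # threshold rule (96)
empirical = y.mean()                              # Monte Carlo estimate of Pr(Y = 1)
theoretical = np.exp(g) / (1.0 + np.exp(g))       # logistic probability (97)
```

The two quantities agree up to Monte Carlo error. Replacing the logistic error by a standard normal error would give the probit link in the same way.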

6 Concluding Remarks

This review paper clarifies at least two main misconceptions around the analysis of correlated binary data collected under a cluster in both cross-sectional cluster and longitudinal cluster data setups. First, many authors over the past forty years used random effect models to model the correlations for longitudinal binary data. This approach is either misleading or too restrictive. This is because, similar to time series modeling, the longitudinal correlations are best modeled through suitable dynamic models relating repeated responses from the same individual. More clearly, a common individual random effect among longitudinal responses is unable to address the time effects on the binary responses; rather, it generates an equi-correlation type structure among the repeated responses, which is too restrictive.

Second, in both cross-sectional and longitudinal cluster setups, many studies have pre-specified the marginal means as a function of the regression parameters only, which may lead to inconsistent regression estimates when a mixed effects model is the true model for the marginal means. To clarify this issue in the cross-sectional cluster setup, we have considered three important situations where fixed effects based marginal means may or may not be appropriate. (A) In the first approach, random cluster effects are assumed to follow a normal distribution, and a likelihood function is constructed by averaging (referred to as population average (PA)) the conditional likelihood function (which is a product of independent binary distributions conditional on the cluster effects) over the normal random effects; the likelihood estimates of the regression and cluster variance parameters are then obtained and interpreted. Under this approach, the binary response means were shown to have a marginal mixed effects (MME) model. Thus any fixed effects based marginal mean specification in such cases is bound to produce inconsistent regression estimates, which is a serious inference issue. (B) In the second approach, certain suitable distributions for the random cluster effects were technically developed so that they provide a marginal fixed effects (MFE) model for the binary means involving only the regression parameters (referred to as the subject specific (SS) regression effects), which may be estimated and interpreted by using the likelihood estimates computed in the same way as in (A). However, these distributional assumptions, such as the so-called “bridge” or beta-binary distributions for the random effects or their functions, are too narrow or restrictive for practical use. (C) In the third approach, no assumption is made about the distribution of the random cluster effects; instead, an arbitrary MFE (AMFE) model was used for the means involving only regression parameters. Also, in this approach no attempts were made to develop any correlation structure or likelihood function; instead, a ‘working’ correlation structure based GEE (generalized estimating equations) approach was used for estimating the SS regression parameters. This approach is misleading because under (A) one never gets a fixed mean model, and under (B) only a limited number of technically restrictive assumptions for the distribution of the random effects lead to a marginal fixed effects based mean model. In summary, because the normal random effects based cluster model (A) is quite practical, we have given details for the estimation of the mean model (involving both β and $\sigma^{2}_{\gamma}$) using the so-called GQL approach. Because the asymptotic properties of such estimators (consistency and normality) are not otherwise available, they are discussed in detail.

The paper has also studied the longitudinal clustered models for binary data, where repeated responses from an individual are collected over a short period. Longitudinal correlations arise due to a dynamic relationship among the present and past binary responses, and they are different from clustered correlations. Similar to the cluster setup, it is shown that in many situations an MFE model cannot be used to study the regression effects. For example, an alternative MD/MR (marginal dynamic/recursive) model does not produce a fixed effects based mean model. The existing GEE approach is not useful in such a situation because the recursive means contain both regression and correlation parameters, whereas GEE is based on fixed effects based marginal means.

Furthermore, there also exist some studies dealing with clustered and/or panel/longitudinal binary data in a Bayesian setup. These studies are based on generalized linear mixed models with certain suitable link functions to reflect the correlations of the clustered and/or longitudinal data. However, the inferences are not based on any correlation structures; rather, they exploit the conditional likelihood using Monte Carlo techniques. We have highlighted some of these important studies in this paper.

References

Amemiya, T. (1985). Advanced Econometrics. Harvard University Press, Cambridge.
Bahadur, R.R. (1961). A representation of the joint distribution of responses to n dichotomous items, 6, Solomon, H. (ed.), p. 158–168.
Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models. Journal of American Statistical Association 88, 9–25.
Breslow, N.E. and Lin, X. (1995). Bias correction in generalized linear models with a single component of dispersion. Biometrika 82, 81–92.
Chen, Z., Yi, G.Y. and Wu, C. (2011). Marginal methods for correlated binary data with misclassified responses. Biometrika 98, 647–662.
Chib, S. and Jeliazkov, I. (2006). Inference in semiparametric dynamic models for binary longitudinal data. Journal of American Statistical Association 101, 685–700.
Crowder, M. (1995). On the use of a working correlation matrix in using generalized linear models for repeated measures. Biometrika 82, 407–410.
Congdon, P. (2014). Applied Bayesian Modelling. Wiley, New York.
Cox, D.R. (1972). The analysis of multivariate binary data. Appl. Stat. 21, 113–120.
Daniels, M.J. and Gatsonis, C. (1999). Hierarchical generalized linear models in the analysis of variations in health care utilization. Journal of American Statistical Association 94, 29–42.
Das, K., Li, R., Sengupta, S. and Wu, R. (2013). A Bayesian semiparametric model for bivariate sparse longitudinal data. Stat. Med. 32, 3899–3910.
Ekholm, A., Smith, P.W.F. and McDonald, J.W. (1995). Marginal regression analysis of a multivariate binary response. Biometrika 82, 847–854.
Fokianos, K. and Kedem, B. (2003). Regression theory for categorical time series. Stat. Sci. 18, 357–376.
Gelfand, A.E. and Carlin, B.P. (1993). Maximum likelihood estimation for constrained or missing data problems. Canadian Journal of Statistics 21, 303–311.
Haseman, J.K. and Kuper, J.K. (1979). Analysis of dichotomous response data from certain toxicological experiments. Biometrics 35, 281–294.
Henderson, C.R. (1963). Selection index and expected genetic advance. National Academy of Sciences, p. 141–63.
Jiang, J. (1998). Consistent estimators in generalized linear mixed models. Journal of American Statistical Association 93, 720–729.
Johnson, N. L. and Kotz, S. (1970). Continuous Multivariate Distributions-2. Wiley, New York.
Kanter, M. (1975). Auto-regression for discrete processes mod 2. Journal of Applied Probability 12, 371–375.
Karim, M.R. and Zeger, S.L. (1992). Generalized linear models with random effects: Salamander mating revisited. Biometrics 48, 631–644.
Kuk, A.Y.C. (1995). Asymptotically unbiased estimation in generalized linear models with random effects. J. R. Statist. Soc. B 58, 619–678.
Laird, N.M. and Ware, J.H. (1982). Random effects models for longitudinal data. Biometrics 38, 963–974.
Lee, Y. and Nelder, J. (1996). Hierarchical generalized linear models. Journal of Royal Statistical Society, B 58, 619–678.
Liang, K.Y. and Zeger, S.L. (1986). Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22.
Liang, K.-Y., Zeger, S.L. and Qaqish, B. (1992). Multivariate regression analysis for categorical data. J. Roy. Statist. Soc. Ser. B 54, 3–40.
Lin, X. and Breslow, N.E. (1996). Bias correction in generalized linear mixed models with multiple components of dispersion. J. Am. Statist. Assoc. 91, 1007–1016.
Lin, X. and Carroll, R.J. (2001). Semiparametric regression for cluster data using generalized estimating equations. J. Am. Statist. Assoc. 96, 1045–1056.
Lipsitz, S.R., Laird, N.M. and Harrington, D.P. (1991). Generalized estimating equations for correlated binary data: using the odds ratio as a measure of association. Biometrika 78, 153–160.
Loredo-Osti, J.C. and Sutradhar, B.C. (2012). Estimation of regression and dynamic dependence parameters for non-stationary multinomial time series. J. Time Ser. Anal. 33, 458–467.
McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models. Chapman and Hall, London.
McCulloch, C.E. (1997). Maximum likelihood algorithms for generalized linear mixed models. Journal of American Statistical Association 92, 162–170.
McDonald, D.R. (2005). The local limit theorem: a historical perspective. Journal of Iranian Statistical Society 4, 73–86.
McGilchrist, C.A. (1994). Estimation in generalised linear mixed models. J. R. Statist. Soc. B 56, 61–69.
Neuhaus, J.M. (2002). Analysis of clustered and longitudinal binary data subject to response misclassification. Biometrics 58, 675–683.
Neuhaus, J.M., Kalbfleisch, J.D. and Hauck, W.W. (1991). A comparison of cluster-specific and population-averaged approaches for analyzing correlated binary data. Int. Stat. Rev. 59, 25–35.
Parzen, M. et al. (2011). A generalized linear mixed model for longitudinal binary data with a marginal logit link function. The Annals of Applied Statistics 5, 449–467.
Prentice, R.L. (1986). Binary regression using an extended Beta-binomial distribution. Journal of American Statistical Association 81, 321–327.
Schall, R. (1991). Estimation in generalized linear models with random effects. Biometrika 78, 719–727.
Stiratelli, R., Laird, N. and Ware, J.H. (1984). Random effects model for serial observations with binary response. Biometrics 40, 961–971.
Sutradhar, B.C. (2003). An overview on regression models for discrete longitudinal responses. Stat. Sci. 18, 377–393.
Sutradhar, B.C. (2004). On exact quasi-likelihood inference in generalized linear mixed models. Sankhya B: The Indian Journal of Statistics 66, 261–289.
Sutradhar, B.C. (2010). Inferences in generalized linear longitudinal mixed models, Vol. 38.
Sutradhar, B.C. (2011). Dynamic Mixed Models for Familial Longitudinal Data. Springer, New York.
Sutradhar, B.C. (2014). Longitudinal Categorical Data Analysis. Springer, New York.
Sutradhar, B.C. and Ali, M.M. (1989). A generalization of the Wishart distribution for the elliptical model and its moments for the multivariate t model. J. Multivar. Anal. 29, 155–162.
Sutradhar, B.C. and Das, K. (1997). Generalized linear models for beta correlated binary longitudinal data. Communications in Statistics - Theory and Methods 26, 617–635.
Sutradhar, B.C. and Das, K. (1999). On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 86, 459–465.
Sutradhar, B.C., Bari, W. and Das, K. (2010). On probit versus logit dynamic mixed models for binary panel data. J. Stat. Comput. Simul. 80, 421–441.
Sutradhar, B.C. and Farrell, P.J. (2007). On optimal lag 1 dependence estimation for dynamic binary models with application to asthma data. Sankhya B 69, 448–467.
Sutradhar, B.C. and Mukerjee, R. (2005). On likelihood inference in binary mixed model with an application to COPD data. Computational Statistics and Data Analysis 48, 345–361.
Sutradhar, B.C. and Zheng, N. (2018). Inferences in binary dynamic fixed models in a semiparametric setup. Sankhya B 80, 263–291.
Wang, Z. and Louis, T.A. (2003). Matching conditional and marginal shapes in binary random intercept models using a bridge distribution function. Biometrika 90, 765–775.
Wang, Z. and Louis, T.A. (2004). Marginalized binary mixed-effects models with covariate-dependent random effects and likelihood inference. Biometrics 60, 884–891.
Wedderburn, R. (1974). Quasilikelihood functions, generalized linear models and the Gauss-Newton method. Biometrika 61, 439–447.
Yi, G.Y. and Cook, R.J. (2002). Marginal methods for incomplete longitudinal data arising in clusters. Journal of American Statistical Association 97, 1071–1080.
Zeger, S.L., Liang, K.Y. and Albert, P.S. (1988). Models for longitudinal data: a generalized estimating equations approach. Biometrics 44, 1049–1060.
Zeger, S.L., Liang, K.Y. and Self, S.G. (1985). The analysis of binary longitudinal data with time independent covariates. Biometrika 72, 31–38.

Acknowledgments

The author would like to thank a referee and the Associate Editor for their valuable comments and suggestions that led to the improvement of the paper.

Author information

Authors and Affiliations

Memorial University, St. John’s, NL, Canada
Brajendra C. Sutradhar

Corresponding author

Correspondence to Brajendra C. Sutradhar.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Higher Order Moments (up to order 4) for Clustered Binary Responses

To compute ${\Omega }_{i}(\boldsymbol {\beta },\sigma ^{2}_{\gamma })$ in Eq. 80 on top of var(Y_ij) and cov(Y_ij,Y_ik), we need the formulas for certain specific third and fourth order moments as follows.

Computation of $\text{var}(Y_{ij}Y_{ik})$

Because the responses are binary, $Y^{2}_{ij}=Y_{ij}$ and $Y^{2}_{ik}=Y_{ik}$, so this variance is computed as

$$ \begin{array}{@{}rcl@{}} \text{var}(Y_{ij}Y_{ik})&=&E[Y^{2}_{ij}Y^{2}_{ik}]-[E(Y_{ij}Y_{ik})]^{2}\\ &=&E(Y_{ij}Y_{ik})-[E(Y_{ij}Y_{ik})]^{2}=\lambda^{BA}_{i,jk}[1-\lambda^{BA}_{i,jk}] =\omega^{BA}_{i,jjkk}, \end{array} $$

(98)

where $\lambda ^{BA}_{i,jk}$ is computed by Eq. 46.
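As a rough illustration of Eq. 98, the sketch below approximates $\lambda^{BA}_{i,jk}$ by a binomial-weighted sum of the same form as Eq. 101 and then forms $\lambda^{BA}_{i,jk}[1-\lambda^{BA}_{i,jk}]$. Eq. 46 is not reproduced in this appendix, so the standardized binomial node $h(v)=(v-V/2)/\sqrt{V/4}$ is an assumption here, and the function names, linear predictors, $\sigma_{\gamma}$ and $V$ are hypothetical values chosen only for illustration.

```python
from math import comb, exp, sqrt

def p_star(eta, gamma):
    """Conditional logistic probability for linear predictor eta = x'beta
    and cluster effect gamma."""
    return 1.0 / (1.0 + exp(-(eta + gamma)))

def binomial_moment(etas, sigma, V=10):
    """Approximate E[prod_j p*(eta_j, gamma)] for gamma ~ N(0, sigma^2)
    by a sum over v ~ binomial(V, 1/2)."""
    total = 0.0
    for v in range(V + 1):
        # ASSUMPTION: h(v) is the standardized binomial value
        h = (v - V / 2) / sqrt(V / 4)
        weight = comb(V, v) * 0.5 ** V
        prod = 1.0
        for eta in etas:
            prod *= p_star(eta, sigma * h)
        total += weight * prod
    return total

# Hypothetical linear predictors for members j and k of one cluster
lam_jk = binomial_moment([0.3, -0.5], sigma=0.8)
omega_jjkk = lam_jk * (1.0 - lam_jk)  # var(Y_ij Y_ik) as in Eq. 98
```

Since $Y_{ij}Y_{ik}$ is itself a binary variable, the resulting variance can never exceed 1/4, which gives a quick sanity check on any implementation.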

Computation of $\text {cov}(Y_{ij},Y_{ik}Y_{i\ell })=\phi _{i,jk \ell } (\boldsymbol {\beta },\sigma ^{2}_{\gamma })$

Because

$$ \begin{array}{@{}rcl@{}} \text{cov}(Y_{ij},Y_{ik}Y_{i\ell})&=&E[Y_{ij}Y_{ik}Y_{i\ell}]-\mu^{BA}_{ij}\lambda^{BA}_{i,k \ell}, \end{array} $$

(99)

we need the formula for the third order moments, namely

$$ \begin{array}{@{}rcl@{}} E[Y_{ij}Y_{ik}Y_{i\ell}]&=&E_{\gamma_{i}}E[\{Y_{ij}Y_{ik}Y_{i\ell}\}|\gamma_{i}] \\ &=&E_{\gamma_{i}}\left[E(Y_{ij}|\gamma_{i})E(Y_{ik}|\gamma_{i})E(Y_{i\ell}|\gamma_{i})\right] \\ &=&\int p^{*}_{ij}(\boldsymbol{\beta},\gamma_{i}) p^{*}_{ik}(\boldsymbol{\beta},\gamma_{i})p^{*}_{i\ell}(\boldsymbol{\beta},\gamma_{i})g_{N}(\gamma_{i})d\gamma_{i}, \end{array} $$

(100)

where, for example, $p^{*}_{ij}(\boldsymbol {\beta },\gamma _{i})=\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta } +\gamma _{i})/[1+\exp (\boldsymbol {x}^{\prime }_{ij}\boldsymbol {\beta }+\gamma _{i})],$ and $g_{N}(\gamma _{i}) \equiv [\gamma _{i} \sim N(0,\sigma ^{2}_{\gamma })].$ As in Eq. 46, the normal integral in Eq. 100 may be approximated by the binomial sum

$$ \begin{array}{@{}rcl@{}} &&\lambda^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ & \times &p^{*}_{i\ell}(\boldsymbol{x}_{i\ell};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}, \end{array} $$

(101)

yielding

$$ \begin{array}{@{}rcl@{}} &&\phi^{BA}_{i,jk \ell} (\boldsymbol{\beta},\sigma^{2}_{\gamma})=\lambda^{BA}_{i,jk\ell}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) -\mu^{BA}_{ij}\lambda^{BA}_{i,k \ell}. \end{array} $$

(102)
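Eqs. 99 to 102 can be sketched numerically as follows. This is only an illustrative sketch: the node $h(v)=(v-V/2)/\sqrt{V/4}$ is assumed to be the standardized binomial value used with Eq. 46 (which is not shown in this appendix), and the helper names, linear predictors, $\sigma_{\gamma}$ and $V$ are hypothetical.

```python
from math import comb, exp, sqrt

def p_star(eta, gamma):
    """Conditional logistic probability given cluster effect gamma."""
    return 1.0 / (1.0 + exp(-(eta + gamma)))

def binomial_moment(etas, sigma, V=10):
    """Binomial(V, 1/2) sum approximating E[prod_j p*(eta_j, gamma)]."""
    total = 0.0
    for v in range(V + 1):
        h = (v - V / 2) / sqrt(V / 4)  # ASSUMED standardized binomial node
        prod = 1.0
        for eta in etas:
            prod *= p_star(eta, sigma * h)
        total += comb(V, v) * 0.5 ** V * prod
    return total

def phi(eta_j, eta_k, eta_l, sigma, V=10):
    """cov(Y_ij, Y_ik Y_il) = lambda_jkl - mu_j * lambda_kl (Eq. 102)."""
    lam_jkl = binomial_moment([eta_j, eta_k, eta_l], sigma, V)  # Eq. 101
    mu_j = binomial_moment([eta_j], sigma, V)
    lam_kl = binomial_moment([eta_k, eta_l], sigma, V)
    return lam_jkl - mu_j * lam_kl

# Hypothetical linear predictors x'beta for three cluster members
phi_dependent = phi(0.3, -0.5, 1.0, sigma=0.8)
phi_independent = phi(0.3, -0.5, 1.0, sigma=0.0)
```

With $\sigma_{\gamma}=0$ the cluster members are independent and the covariance vanishes, while for $\sigma_{\gamma}>0$ the shared random effect makes it positive, because each conditional probability increases with $\gamma_{i}$.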

Computation of $\text {cov}(Y_{ij}Y_{ik},Y_{i\ell }Y_{im})=\omega _{i,jk\ell m} (\boldsymbol {\beta },\sigma ^{2}_{\gamma })$

By calculations similar to those in Eq. 101, one obtains

$$ \omega^{BA}_{i,jk\ell m} (\boldsymbol{\beta},\sigma^{2}_{\gamma})=\lambda^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) -\lambda^{BA}_{i,jk}(\boldsymbol{\beta},\sigma^{2}_{\gamma})\lambda^{BA}_{i,\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}), $$

(103)

where

$$ \begin{array}{@{}rcl@{}} &&\lambda^{BA}_{i,jk\ell m}(\boldsymbol{\beta},\sigma^{2}_{\gamma}) \\ &=&{\sum}^{V}_{v_{i}=0}p^{*}_{ij}(\boldsymbol{x}_{ij};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{ik}(\boldsymbol{x}_{ik};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))p^{*}_{i\ell}(\boldsymbol{x}_{i\ell};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i})) \\ & \times &p^{*}_{im}(\boldsymbol{x}_{im};\boldsymbol{\beta},\sigma_{\gamma} h(v_{i}))\begin{pmatrix}V \\ v_{i}\end{pmatrix}(1/2)^{v_{i}}(1/2)^{V-v_{i}}. \end{array} $$

(104)
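A minimal sketch of Eqs. 103 and 104 is given below, again under the assumption that $h(v)$ is the standardized binomial value $(v-V/2)/\sqrt{V/4}$; the function names and numerical inputs are hypothetical. Eq. 104 is written for distinct $j,k,\ell,m$. When indices repeat, the binary identity $Y^{2}_{ij}=Y_{ij}$ reduces the product, which the sketch handles by taking the set of distinct indices, so that the case $(j,k)=(\ell,m)$ recovers Eq. 98.

```python
from math import comb, exp, sqrt

def p_star(eta, gamma):
    """Conditional logistic probability given cluster effect gamma."""
    return 1.0 / (1.0 + exp(-(eta + gamma)))

def lam(indices, eta, sigma, V=10):
    """Binomial-sum approximation to E[prod of Y_ij over the given indices]."""
    distinct = set(indices)  # Y^2 = Y for binary responses
    total = 0.0
    for v in range(V + 1):
        gamma = sigma * (v - V / 2) / sqrt(V / 4)  # ASSUMED node sigma * h(v)
        prod = 1.0
        for j in distinct:
            prod *= p_star(eta[j], gamma)
        total += comb(V, v) * 0.5 ** V * prod
    return total

def omega(j, k, l, m, eta, sigma, V=10):
    """cov(Y_ij Y_ik, Y_il Y_im) as in Eq. 103."""
    return (lam((j, k, l, m), eta, sigma, V)
            - lam((j, k), eta, sigma, V) * lam((l, m), eta, sigma, V))

eta = [0.3, -0.5, 1.0, 0.2]   # hypothetical x'beta for four cluster members
sigma = 0.8                   # hypothetical sigma_gamma
w_distinct = omega(0, 1, 2, 3, eta, sigma)
w_same = omega(0, 1, 0, 1, eta, sigma)   # reduces to var(Y_i0 Y_i1)
lam_01 = lam((0, 1), eta, sigma)
```

Looping `omega` over all pairs of index pairs would fill the fourth order block of ${\Omega}_{i}$; the cost of each entry grows only linearly in $V$.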

About this article

Cite this article

Sutradhar, B.C. Fixed versus Mixed Effects Based Marginal Models for Clustered Correlated Binary Data: an Overview on Advances and Challenges. Sankhya B 84, 259–302 (2022). https://doi.org/10.1007/s13571-021-00260-3

Received: 31 December 2020
Accepted: 05 June 2021
Published: 09 July 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s13571-021-00260-3
