Abstract
In practice one may not always have smooth data. When the bulk of the data are smooth but the complete data set apparently contains a few contaminated observations or outliers, one encounters difficulties in choosing an inference technique, because the traditional inference techniques developed for smooth data analysis may no longer provide unbiased and consistent estimates for the desired parameters, such as regression parameters in the linear or generalized linear models (GLMs) setup. In this paper, we first briefly review some of the widely used bias corrected techniques in the linear model setup. As opposed to the linear models for normal or other continuous exponential family based variables, robust inference for discrete data in the GLMs setup, such as for count and binary data, is not adequately discussed in the literature. The advantages and drawbacks of an existing outlier resistant Mallow’s type quasi-likelihood (MQL) estimation approach in the GLMs setup are reviewed in brief. We then discuss a recently proposed fully standardized MQL (FSMQL) approach that provides almost unbiased estimates, ensuring better consistency performance. One encounters further challenges when the data in the GLMs setup are collected repeatedly over a period of time, mainly because the FSMQL type estimation approaches must then be modified to accommodate the correlation structure of the repeated data. A recently proposed robust generalized QL (RGQL) approach is reviewed for this purpose.
Keywords
- Consistency
- Fully standardized Mallow’s type quasi-likelihood estimators
- GQL estimation
- Longitudinal models for repeated data
- Mallow’s type quasi-likelihood estimators
- Outliers
- Quasi-likelihood estimators
- Regression models for count and binary data
- Robust GQL approach
1 Introduction
In a regression setup, the responses, whether linear, count, or binary, are generated as a function of certain suitable covariates. If the bulk of the responses appear to be close to the mean function of the responses, with a few remaining responses appearing at a significant distance from the mean function, then these latter few responses are considered to be potential outliers. In general these outliers occur because the corresponding covariates are contaminated in some way, and they are referred to as mean shifted outliers. In some situations, a response may be considered an outlier because of its inflated variance as compared to the bulk of the responses. It is of main interest to understand the regression model appropriate for the bulk of the good responses. But the use of the few outlying responses may distort the inference for the bulk of the responses. There are at least two ways this inference problem has been tackled in the literature.
First, it is attempted to detect the outliers and exclude them from the overall inference. For some justifications of this, one may be referred to Hampel et al. (1986, Sect. 1.4), among others. For the purpose, many researchers have discussed the so-called maximum studentized residual (MSR) and maximum normed residual (MNR) tests for detection of outliers in a linear regression setup for independent data. For example, one may refer to the work of Srikantan (1961), Stefansky (1971, 1972), Tietjen et al. (1973), Prescott (1975), Lund (1975), Bailey (1977), Johnson and Prescott (1975), Ellenberg (1973, 1976), Cook and Prescott (1981), Doornbos (1981), and Beckman and Cook (1983, Sect. 4), among others. The powers of these two statistics in detecting outliers may also be affected by the ways the parameters of the regression models are estimated. For a discussion on this, see, for example, a relatively recent work by Sutradhar et al. (2007). In the second approach, a robust weighted distance function is constructed such that the suspected outliers get smaller weights. Next the distance function is minimized for the estimation of the regression effects. Some of the existing widely used robust procedures are: Minimax estimation, M-estimation, L-estimation, and R-estimation. For details on these procedures, see, for example, Hampel et al. (1986), Rousseeuw and Leroy (1987), and Huber (2004), and the references therein.
In the independent setup, some authors such as Cantoni and Ronchetti (2001), among others, have suggested a Mallow’s type quasi-likelihood (MQL) robust estimation approach to obtain a consistent estimate for the regression effects involved in the model. For the MQL construction, they used Huber’s robust function but did not standardize the estimating function by the inverse of the variance of that function. Recently, Bari and Sutradhar (2010a) have improved this estimating equation and introduced a fully standardized MQL (FSMQL) estimating equation that provides regression estimates with smaller bias. In this paper, we review these MQL and FSMQL approaches for the estimation of the regression effects involved in generalized linear models (GLMs), for example for binary and count data.
Also, there have been some studies using QL or generalized estimating equations (GEE) approaches for robust regression estimation in the longitudinal setup. For example, Preisser and Qaqish (1999) have used a resistant GEE (REGEE) approach, which was improved by Cantoni (2004) (see also Sinha 2006 for a random effects approach) by using a semi-standardized MQL (SSMQL; see also Bari and Sutradhar 2010b) approach. In the second part of the paper, we review these approaches including the robust GQL (RGQL) approach discussed by Bari and Sutradhar (2010b) and point out their advantages and drawbacks. Both count and binary longitudinal models are considered.
2 Robust Inference in Regression Models in Independent Setup
2.1 Inference for Linear Models
There exists a vast literature on robust inference in linear models for independent data in the presence of one or more outliers. See, for example, Rousseeuw and Leroy (1987), Huber (2004, Chap. 7), and a relatively recent paper by Sutradhar et al. (2007). These studies mainly deal with outliers in normal responses. For simplicity, consider the simple linear regression model
$$\displaystyle{y = X\beta +\epsilon,}$$ (1)
where \(y = {(y_{1},\ldots,y_{i},\ldots,y_{K})}^{\prime}\) is a K ×1 response vector, X is a known design matrix of order K ×p, β is a p ×1 vector of unknown parameters, and ε is a K ×1 error variable distributed as \(\epsilon \sim N(0{,\sigma}^{2}I_{K})\), I K being the K ×K identity matrix. Usually, each observation in a realization (y, X) contributes to the evaluation of the regression coefficient β. The contribution of one observation, however, may be discordant to the point of unduly influencing the value of a regression parameter. Such an observation is said to be an outlier. To see how an outlier can perturb the linear model (1), two types of outliers are generally considered. They are (a) mean shifted outliers, also referred to as the additive outliers, and (b) variance inflated outliers, also referred to as the innovative or multiplicative outliers.
To construct an additive outlier model, one can perturb the linear model (1) and write
$$\displaystyle{y = X\beta +\tilde{\epsilon},}$$ (2)
where \(\tilde{\epsilon}= {(\tilde{\epsilon}_{1},\ldots,\tilde{\epsilon}_{i},\ldots,\tilde{\epsilon}_{K})}^{\prime}\) is related to ε in (1) as
$$\displaystyle{\tilde{\epsilon}_{k} = \left \{\begin{array}{ll} \epsilon _{k} &\mbox{for}\;k\neq i, \\ \epsilon _{i} +\delta _{1}&\mbox{for}\;k = i,\\ \end{array} \right.}$$ (3)
where for | δ 1 | > 0, \(y_{i} = x^{\prime}_{i}\beta +\tilde{\epsilon} _{i}\) is certainly a discordant observation when compared to the other K − 1 observations. It is clear from (1) and (3) that the mean of the ith error is shifted from 0 to δ 1, while the error variances remain unchanged.
To construct a variance inflated outlier model, one can perturb the model (1) as
$$\displaystyle{y = X\beta {+\epsilon}^{{\ast}},}$$ (4)
where \({\epsilon}^{{\ast}} = {(\epsilon _{1}^{{\ast}},\ldots,\epsilon _{i}^{{\ast}},\ldots,\epsilon _{K}^{{\ast}})}^{\prime}\) is related to ε in (1) as
$$\displaystyle{\epsilon _{k}^{{\ast}} = \left \{\begin{array}{ll} \epsilon _{k} &\mbox{for}\;k\neq i, \\ \epsilon _{i}/\sqrt{\omega}&\mbox{for}\;k = i,\\ \end{array} \right.}$$ (5)
where for ω → 0, the ith observation y i will have large variance, leading this observation to be an outlier. It is clear from (1) and (5) that \(var(\epsilon _{i}^{{\ast}}) {=\sigma}^{2}/\omega\), whereas \(var(\epsilon _{k}^{{\ast}}) {=\sigma}^{2}\) for all k≠i.
Thus, under model (2), the bulk (K − 1) of the error variables follow the N(0, σ 2) distribution and one follows N(δ 1, σ 2). This is equivalent to saying that the \(\tilde{\epsilon}_{i}\) in model (2) are independent, identically distributed with the common underlying distribution
$$\displaystyle{F(x) = \left (1 - \frac{1} {K}\right )\Phi \left (\frac{x} {\sigma} \right ) + \frac{1} {K}\Phi \left (\frac{x -\delta _{1}} {\sigma} \right )}$$
(Huber 2004, Example 1.1) where Φ( ⋅) is the standard normal cumulative. Similarly, one may say that the ε i ∗ under model (4) are independent, identically distributed with common underlying distribution
$$\displaystyle{{F}^{{\ast}}(x) = \left (1 - \frac{1} {K}\right )\Phi \left (\frac{x} {\sigma} \right ) + \frac{1} {K}\Phi \left (\frac{\sqrt{\omega}\,x} {\sigma} \right ).}$$
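As a quick numerical illustration of the two contamination mechanisms above, the following sketch generates one mean shifted and one variance inflated error vector; all concrete values (K, δ 1, ω, and the outlier position) are illustrative assumptions, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(7)
K, sigma, i_out = 10, 1.0, 3       # illustrative sample size, scale, outlier index
delta1, omega = 5.0, 0.01          # illustrative shift and variance-inflation factor

eps = rng.normal(0.0, sigma, K)    # the K "good" errors, iid N(0, sigma^2)

# Mean shifted (additive) outlier model: one error follows N(delta1, sigma^2)
eps_tilde = eps.copy()
eps_tilde[i_out] += delta1

# Variance inflated outlier model: one error has variance sigma^2 / omega
eps_star = eps.copy()
eps_star[i_out] /= np.sqrt(omega)
```

Only the i_out-th component differs from the uncontaminated errors in each case, mirroring models (2)–(3) and (4)–(5).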
2.1.1 Robust Estimation of Regression Effects
It is understandable that the ordinary least square (LS) estimator
$$\displaystyle{\hat{\beta}_{ls} = {({X}^{\prime}X)}^{-1}{X}^{\prime}y}$$ (6)
is biased for β under model (2)–(3) and will be unbiased but inefficient under model (4)–(5). There exist various robust approaches for the consistent estimation of β irrespective of whether the underlying model is (2)–(3) or (4)–(5). Here we briefly describe two of these approaches.
2.1.1.1 Huber’s Robust Weights Based Iterative Re-weighted Least Square Approach
This estimate is obtained via an iterative re-weighted least squares (RWLS) method (Street et al. 1988). For the p components of β, in this approach one solves the robust weights based estimating equation
$$\displaystyle{\sum _{j=1}^{K}\xi _{j}(y_{j} - x_{j}^{\prime}\beta )x_{ju} = 0,\;u = 1,\ldots,p,}$$ (7)
where x ju is the uth component of the x j vector, and
$$\displaystyle{\xi _{j} =\psi (r_{j})/r_{j},}$$
with ψ(r j ) as the Huber’s bounded function of r j given by
$$\displaystyle{\psi (r_{j}) = \left \{\begin{array}{ll} r_{j} &\mbox{for}\;\vert r_{j}\vert \leq c, \\ c\;\mbox{sign}(r_{j})&\mbox{for}\;\vert r_{j}\vert> c,\\ \end{array} \right.}$$ (8)
where \(r_{j} = (y_{j} - x_{j}^{\prime}\beta _{r(0)}^{{\ast}})/\tilde{s}\) for j = 1, …, K, with \(\beta _{r(0)}^{{\ast}}\) as an initial robust estimate of β which may be obtained by minimizing the L 1 distance \(\sum _{j=1}^{K}\vert y_{j} - x_{j}^{\prime}\beta \vert\), and \(\tilde{s}\) as a robust estimate of σ given by
$$\displaystyle{\tilde{s} = \mbox{median}_{j}\vert y_{j} - x_{j}^{\prime}\beta _{r(0)}^{{\ast}}\vert /0.6745.}$$
Note that if r j = 0, one uses ξ j = 1. The solution to (7) may then be obtained as
$$\displaystyle{\beta _{r(1)}^{{\ast}} = {({X}^{\prime}\Omega X)}^{-1}{X}^{\prime}\Omega y,}$$ (9)
where \(\Omega = \mbox{diag}[\xi _{1},\ldots,\xi _{K}]\). This \(\beta _{r(1)}^{{\ast}}\) replaces \(\beta _{r(0)}^{{\ast}}\) and provides us with a new start and new weights for an improved estimate of β to be obtained by (9). This cycle of iterations continues until convergence. Let the final solution be denoted by \(\hat{\beta}_{r(1)}\).
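The RWLS cycle described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes an ordinary LS start instead of the L 1 fit, and uses the median-absolute-residual scale estimate.

```python
import numpy as np

def huber_psi(r, c=1.345):
    # Huber's bounded function: r for |r| <= c, c*sign(r) otherwise
    return np.clip(r, -c, c)

def rwls_huber(X, y, c=1.345, tol=1e-8, max_iter=100):
    # Iteratively re-weighted LS with Huber weights xi_j = psi(r_j)/r_j
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # LS start (assumption)
    for _ in range(max_iter):
        resid = y - X @ beta
        s = np.median(np.abs(resid)) / 0.6745     # robust scale estimate
        r = resid / s
        xi = np.ones_like(r)
        nz = r != 0
        xi[nz] = huber_psi(r[nz], c) / r[nz]      # xi_j = 1 when r_j = 0
        # Solve (X' Omega X) beta = X' Omega y with Omega = diag(xi)
        beta_new = np.linalg.solve(X.T @ (xi[:, None] * X), X.T @ (xi * y))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

With a single large outlier, the bounded ψ function caps the outlier's weight, so the converged fit stays close to the fit for the bulk of the data.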
2.1.1.2 An Alternative Weights Based Iterative RWLS Approach
Rousseeuw and Leroy (1987, Chap. 5) suggest a least median of squares (LMS) approach where the scale parameter used to compute the residual is estimated using robust weights different from the Huber weights used in the last section. In fact, one can use the iterative least squares approach discussed in the last section by replacing Huber’s weights with these new weights suggested by Rousseeuw and Leroy (1987, p. 202). See, for example, Sutradhar et al. (2007) for a comparison between RWLS approaches using Huber’s and Rousseeuw and Leroy weights. To be specific, the Rousseeuw and Leroy robust weights are defined as
$$\displaystyle{w_{j} = \left \{\begin{array}{ll} 1&\mbox{for}\;\vert d_{j(\beta _{r(0)}^{{\ast}})}/\tilde{s}_{0}\vert \leq 2.5, \\ 0&\mbox{otherwise},\\ \end{array} \right.}$$ (10)
where \(d_{j(\beta _{r(0)}^{{\ast}})} = y_{j} - x_{j}^{\prime}\beta _{r(0)}^{{\ast}}\) and \(\tilde{s}_{0}\) is given by
$$\displaystyle{\tilde{s}_{0} = 1.4826\left (1 + \frac{5} {K - p}\right )\sqrt{\mbox{median} _{j } \,d_{j(\beta _{r(0)}^{{\ast}} ) }^{2}}.}$$
These robust weights in (10) are then used to compute an \(\tilde{\Omega}\) matrix as
$$\displaystyle{\tilde{\Omega} = \mbox{diag}[w_{1},\ldots,w_{K}],}$$
which is then used to obtain a first step improved robust estimate for β as
$$\displaystyle{\beta _{r(2)}^{{\ast}} = {({X}^{\prime}\tilde{\Omega}X)}^{-1}{X}^{\prime}\tilde{\Omega}y.}$$ (11)
The cycle of iterations continues until convergence. Let this final RWLS estimate be denoted by \(\hat{\beta}_{r(2)}\).
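This alternative RWLS cycle may be sketched as follows; again illustrative only, with an ordinary LS start replacing the LMS fit, and the preliminary scale constant 1.4826(1 + 5∕(K − p)) treated as an assumption following Rousseeuw and Leroy's description.

```python
import numpy as np

def rl_weights(d, K, p):
    # Hard-rejection weights: w_j = 1 if |d_j / s0| <= 2.5, else 0
    s0 = 1.4826 * (1.0 + 5.0 / (K - p)) * np.sqrt(np.median(d ** 2))
    return (np.abs(d / s0) <= 2.5).astype(float)

def rwls_rl(X, y, n_iter=25):
    # RWLS with Rousseeuw-Leroy type weights, starting from ordinary LS (assumption)
    K, p = X.shape
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    for _ in range(n_iter):
        d = y - X @ beta
        w = rl_weights(d, K, p)
        # Solve (X' Omega~ X) beta = X' Omega~ y with Omega~ = diag(w)
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    return beta
```

In contrast to the Huber weights, these 0/1 weights reject a suspected outlier entirely rather than down-weighting it smoothly.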
2.1.2 Robust Estimation of Variance Component
Note that in the linear model setup, the LS estimate of σ 2 is obtained by computing the residual sum of squares based on the least square estimate of β. That is, \(\hat{\sigma}_{ls}^{2} =\sum _{j=1}^{K}{(y_{j} - x_{j}^{\prime}\hat{\beta}_{ls})}^{2}/(K - p)\). Under the linear model in the presence of outliers, one may obtain the LS estimate of σ 2 simply by replacing \(\hat{\beta}_{ls}\) with \(\hat{\beta}_{r(1)}\) or \(\hat{\beta}_{r(2)}\) obtained in the last section. Thus the LS estimator for σ 2 has the formula
$$\displaystyle{\hat{\sigma}_{r(1),ls}^{2} =\sum _{j=1}^{K}{(y_{j} - x_{j}^{\prime}\hat{\beta}_{r(1)})}^{2}/(K - p),}$$ (12)
or
$$\displaystyle{\hat{\sigma}_{r(2),ls}^{2} =\sum _{j=1}^{K}{(y_{j} - x_{j}^{\prime}\hat{\beta}_{r(2)})}^{2}/(K - p).}$$ (13)
2.1.2.1 Huber’s Robust Weights Based Iterative RWLS Estimator for σ 2
Following Street et al. (1988), one obtains this estimator as
$$\displaystyle{\hat{\sigma}_{r(1)}^{2} = \frac{\sum _{j=1}^{K}\xi _{j}{(y_{j} - x_{j}^{\prime}\hat{\beta}_{r(1)})}^{2}} {\mbox{trace}(\Omega ) - p},}$$ (14)
where ξ j (j = 1, …, K) is the jth robust weight to protect the estimate against possible outliers, and \(\Omega = \mbox{diag}(\xi _{1},\ldots,\xi _{j},\ldots,\xi _{K})\). To be specific, ξ j is defined as \(\xi _{j} =\psi (r_{j})/r_{j}\) with \(r_{j} = (y_{j} - x_{j}^{\prime}\hat{\beta}_{ls})/{s}^{{\ast}}\), where
$$\displaystyle{{s}^{{\ast}} = \mbox{median}_{j}\vert y_{j} - x_{j}^{\prime}\hat{\beta}_{ls}\vert /0.6745.}$$
Note that the ψ function involved in ξ j in (14) is the same Huber’s robust function used in (8).
2.1.2.2 Rousseeuw and Leroy Weights Based Robust Estimator for σ 2
This robust estimator is computed following Rousseeuw and Leroy (1987, p. 202, Eq. (1.5)). More specifically, in this approach, robust weights are defined as
$$\displaystyle{w_{j} = \left \{\begin{array}{ll} 1&\mbox{for}\;\vert d_{j(\hat{\beta}_{ls})}/s_{0}\vert \leq 2.5, \\ 0&\mbox{otherwise},\\ \end{array} \right.}$$
where \(d_{j(\hat{\beta}_{ls})} = y_{j} - x_{j}^{\prime}\hat{\beta}_{ls}\) and s 0 is given by
$$\displaystyle{s_{0} = 1.4826\left (1 + \frac{5} {K - p}\right )\sqrt{\mbox{median} _{j } \,d_{j(\hat{\beta}_{ls } ) }^{2}}.}$$
Next, these weights are exploited to compute the estimator, say \(\hat{\sigma}_{r(2)}^{2}\), as
$$\displaystyle{\hat{\sigma}_{r(2)}^{2} = \frac{\sum _{j=1}^{K}w_{j}d_{j(\hat{\beta}_{ls})}^{2}} {\sum _{j=1}^{K}w_{j} - p}.}$$ (15)
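A sketch of this weighted variance estimator follows; the rejection rule and scale constant mirror the weights above and should be read as assumptions about the exact constants.

```python
import numpy as np

def sigma2_rl(X, y, beta_hat):
    # Weighted residual variance: sum(w_j d_j^2) / (sum(w_j) - p), with
    # hard-rejection weights based on a preliminary robust scale s0
    K, p = X.shape
    d = y - X @ beta_hat
    s0 = 1.4826 * (1.0 + 5.0 / (K - p)) * np.sqrt(np.median(d ** 2))
    w = (np.abs(d / s0) <= 2.5).astype(float)
    return float(np.sum(w * d ** 2) / (np.sum(w) - p))
```

Because rejected residuals drop out of both the numerator and the effective degrees of freedom, a single inflated observation has essentially no effect on the estimate.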
2.1.3 Finite Sample Performance of the Robust Estimators: An Illustration
Sutradhar et al. (2007) conducted a simulation study to examine the performance of the robust methods as compared to the LS method in estimating the parameters in a linear model when the data contain a few variance inflated outliers. Here, we refer to some of the results of this study, for example. Consider a linear model with p = 2 covariates so that \(\beta = {(\beta _{1},\beta _{2})}^{\prime}\). For the associated K ×2 design matrix X, consider the design configuration used in their study.
With regard to the sample size, consider \(K(\equiv n) = 6,8,10,\ \mbox{and}\ 20\) to examine the effect of small as well as moderately large samples on the estimation. Furthermore, select two locations for the possible outlier, namely locations at i = 2 and 3 for K = 6; i = 2 and 4 for K = 8; i = 2 and 6 for K = 10; and i = 2 and 11 for K = 20. Also, without any loss of generality, choose σ 2 = 1, β 1 = 1, and β 2 = 0.5. For variance inflation, eight values of ω i , namely ω i = 0.001, 0.005, 0.01, 0.05, 0.10, 0.25, 0.50, and 1.0, were considered. Note that ω i = 1.0 represents the case where the data do not contain any outliers, whereas a small value of ω i indicates that y i is generated with a large variance, implying that y i can be an influential outlier. The data were simulated 10,000 times. Under each simulation, the LS estimates of β and σ 2 were obtained, which are denoted by \(\hat{\beta}_{ls} = {(\hat{\beta}_{ls,1},\hat{\beta}_{ls,2})}^{\prime}\) and \(\hat{\sigma}_{ls}^{2}\), respectively. As far as the robust estimation of β and σ 2 is concerned, these parameters were estimated by using two robust approaches. More specifically, \(\hat{\beta}_{r(1)} = {(\hat{\beta}_{r(1),1},\hat{\beta}_{r(1),2})}^{\prime}\) is obtained by using (9), \(\hat{\beta}_{r(2)} = {(\hat{\beta}_{r(2),1},\hat{\beta}_{r(2),2})}^{\prime}\) is obtained by using (11), and similarly \(\tilde{\sigma}_{r(1)}^{2}\) and \(\tilde{\sigma}_{r(2)}^{2}\) are obtained from (14) and (15), respectively. The mean squared errors (MSEs) of these estimators based on 10,000 simulations are displayed in Figs. 1–3, for the estimates of β 1, β 2, and σ 2, respectively.
In summary, the results of this simulation study indicate that in the presence of a variance inflated outlier, the second robust approach performs worse as compared to the first robust and LS methods in estimating β 1 and β 2. In estimating σ 2, the LS method performs very poorly when compared with the robust methods.
2.2 Robust Estimation in GLM Setup For Independent Discrete Data
As opposed to the linear models in normal or other continuous exponential family based variables, the robust inference for discrete data in the GLMs setup, such as for count and binary data, is, however, not adequately discussed in the literature. For i = 1, …, K, let y i be a discrete response, such as count or binary, collected from the ith individual, and \(x_{i} = (x_{i1},\ldots,x_{iu},\ldots,x_{ip})^{\prime}\) be the corresponding p-dimensional observed covariate vector. Note that when the data contain a single outlier, any of the K responses \(y_{1},\ldots,y_{i},\ldots,y_{K}\) can be that outlier. Now, in the spirit of the mean shifted linear outlier model (2)–(3), suppose that y j , for some j between 1 and K, is the outlier because the covariate vector for the jth individual, namely x j , is contaminated. Note that if \(\tilde{x}_{i} = {(\tilde{x}_{i1},\ldots,\tilde{x}_{iu},\ldots,\tilde{x}_{ip})}^{\prime}\) denotes the p-dimensional uncontaminated covariate vector corresponding to y i for all i = 1, …, K, then for a positive vector δ, the observed covariates {x i } may be related to the uncontaminated covariates \(\{\tilde{x}_{i}\}\) as
$$\displaystyle{x_{i} = \left \{\begin{array}{ll} \tilde{x}_{i}+\delta &\mbox{for}\;i = j, \\ \tilde{x}_{i} &\mbox{for}\;i\neq j.\\ \end{array} \right.}$$ (16)
It is of primary interest to estimate \(\beta = {(\beta _{1},\ldots,\beta _{u},\ldots,\beta _{p})}^{\prime}\), the effects of the uncontaminated covariates \(\tilde{x}_{i}\) on the response y i . But, as not all the \(\tilde{x}_{i}\)’s are observed, one cannot use them to estimate β; instead, the observed contaminated x i ’s are used, which causes bias and hence inconsistency in the estimators.
2.2.1 Understanding Outliers in Count and Binary Data
2.2.1.1 K Count Observations with a Single Outlier
First assume that in the absence of outliers, \(y_{1},\ldots,y_{i},\ldots,y_{K}\) are generated following the Poisson density \(P(Y _{i} = y_{i}) = [\exp (-\mu _{i})\mu _{i}^{y_{i}}]/y_{i}!\), with \(\mu _{i} =\exp (\tilde{x}_{i}^{\prime}\beta )\), where \(\tilde{x}_{i} = {(\tilde{x}_{i1},\tilde{x}_{i2})}^{\prime}\). Suppose that the values of these two covariates arise from
$$\displaystyle{\tilde{x}_{i1}\stackrel{iid} \sim N(-1.0,0.25)\;\mbox{and}\;\tilde{x}_{i2}\stackrel{iid} \sim N(-1.0,0.5),}$$
respectively, for all i = 1, …, K. Suppose that j is the index for the outlying observation, taking a value between 1 and K.
Now, to consider y j as an outlying value, that is, to have a data set of size K with one outlier, one may then shift the values of \(\tilde{x}_{j1}\) and \(\tilde{x}_{j2}\) as
$$\displaystyle{x_{j1} =\tilde{x}_{j1}+\delta \;\mbox{and}\;x_{j2} =\tilde{x}_{j2}+\delta,}$$
respectively, but retain \(x_{i1} =\tilde{x}_{i1}\;\mbox{and}\;x_{i2} =\tilde{x}_{i2}\), for all i≠j. As far as the shifting is concerned, suppose that δ = 2.0. Thus, y 1, …, y K refer to a sample of K count observations with y j as the single outlier.
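The generation scheme just described may be sketched as follows. The shift δ = 2.0 follows the text; the value of β and the exact covariate distributions used for the Poisson case are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K, j, delta = 60, 10, 2.0                  # sample size, outlier index, shift
beta = np.array([1.0, 0.5])                # illustrative regression effects

x1 = rng.normal(-1.0, np.sqrt(0.25), K)    # uncontaminated covariates
x2 = rng.normal(-1.0, np.sqrt(0.5), K)
x1[j] += delta                             # mean shift the jth covariates only
x2[j] += delta

mu = np.exp(beta[0] * x1 + beta[1] * x2)   # Poisson means under observed covariates
y = rng.poisson(mu)                        # y[j] is the single (potential) outlier
```

Because both shifted covariates enter the exponential mean with positive coefficients, y[j] tends to sit far above the bulk of the counts.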
2.2.1.2 K Binary Observations with a Single Outlier
Note that the existing literature (Copas 1988, p. 226; Carroll and Pederson 1993; Sinha 2004) does not provide a clear definition for outliers in binary data. Cantoni and Ronchetti (2001) have suggested a practically useful MQL robust inference technique for independent data subject to outliers in the GLM setup. Their definitions of outliers, however, are appropriate only for the Poisson and binomial cases, because the concordant counts (the bulk of the observations of similar nature) in the Poisson case and the concordant success numbers in the binomial case can be exploited in a similar way to recognize any possible outliers in the respective data sets. Thus, even though binary is a special case of the binomial setup, Cantoni and Ronchetti’s (2001) robust inference development does not appear to be appropriate for binary data. In view of these difficulties with regard to robust inference in the binary case, Bari and Sutradhar (2010a) have provided a new definition of outliers for binary data. More specifically, they dealt with one and two sided outliers in binary data. For convenience these definitions are summarized as follows.
- One sided outlier:
For
$$\displaystyle{Pr[Y _{i} = 1] = E[Y _{i}] =\mu _{i} = \frac{exp(x_{i}^{\prime}\beta )} {1 + exp(x_{i}^{\prime}\beta )},}$$
and
$$\displaystyle{p_{sb} = max\{\mu _{i}\},\;p_{lb} = min\{\mu _{i}\},}$$
suppose that the bulk (K − 1) of the binary observations occur with small probabilities such that
$$\displaystyle{Pr[Y _{i} = 1] = \left \{\begin{array}{ll} \leq p_{sb}&\mbox{for}\;i\neq j,i = 1,\ldots,K, \\> p_{sb}&\mbox{for}\;i = j,\\ \end{array} \right.}$$ (17)
or, with large probabilities such that
$$\displaystyle{Pr[Y _{i} = 1] = \left \{\begin{array}{ll} \geq p_{lb}&\mbox{for}\;i\neq j,i = 1,\ldots,K, \\ <p_{lb}&\mbox{for}\;i = j.\\ \end{array} \right.}$$ (18)
Here the binary y j , whether 1 or 0, satisfying (17) is referred to as an upper sided outlier, and satisfying (18) is referred to as a lower sided outlier, whereas the remaining K − 1 responses, denoted by y i for i≠j, constitute a group of “concordant” observations.
- Two sided outlier:
It may happen in practice that probabilities for the bulk of the observations lie in the range \(p_{sb} \leq P(Y _{i} = 1) \leq p_{lb}\), leading to a situation where one may encounter a two sided outlier. To be specific, y j = 0 or 1 will be an outlier if either P(Y j = 1) > p lb or P(Y j = 1) < p sb .
- Generation of K binary observations with an outlier:
We now illustrate the generation of K binary observations including one outlier. For the purpose one may first generate K binary responses \(y_{1},\ldots,y_{i},\ldots,y_{K}\) assuming that they do not contain any outliers. To be specific, generate these K “good” responses following the binary logistic model \(P(Y _{i} = 1) = [\exp (\tilde{x}_{i}^{\prime}\beta )]/[1 +\exp (\tilde{x}_{i}^{\prime}\beta )]\), with two covariates so that \(\tilde{x}_{i} = {(\tilde{x}_{i1},\tilde{x}_{i2})}^{\prime}\) and \(\beta = {(\beta _{1},\beta _{2})}^{\prime}\). As far as the covariate values are concerned, similar to the Poisson case, consider two covariates \(\tilde{x}_{i1}\) and \(\tilde{x}_{i2}\) as
$$\displaystyle{\tilde{x}_{i1}\stackrel{iid} \sim N(-1.0,0.25)\;\mbox{and}\;\tilde{x}_{i2}\stackrel{iid} \sim N(-1.0,0.5),}$$
respectively, for i = 1, …, K.
Next, to create an outlier y j , where j can take any value between 1 and K, change the corresponding covariate values \(\tilde{x}_{j1}\) and \(\tilde{x}_{j2}\) as
$$\displaystyle{x_{j1} =\tilde{x}_{j1} +\delta _{1}\;\mbox{and}\;x_{j2} =\tilde{x}_{j2} +\delta _{2},}$$
respectively. Note that for large positive δ 1 and δ 2, these modified covariates will be increased in magnitude, yielding a larger probability for y j = 1. One may then treat y j as an outlier. For convenience, suppose that one uses δ 1 = 3.0 and δ 2 = 4.0. As far as the remaining covariates are concerned, they are kept unchanged. That is, for i≠j (i = 1, …, K), consider \(x_{i1} =\tilde{x}_{i1}\;\mbox{and}\;x_{i2} =\tilde{x}_{i2}\).
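This binary generation scheme may be sketched as follows. The covariate distributions and shifts δ 1 = 3.0, δ 2 = 4.0 follow the text; the value of β is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
K, j = 60, 10
d1, d2 = 3.0, 4.0                          # shifts delta_1 and delta_2 from the text
beta = np.array([1.0, 0.5])                # illustrative regression effects

x1 = rng.normal(-1.0, np.sqrt(0.25), K)    # tilde x_i1 ~ N(-1.0, 0.25)
x2 = rng.normal(-1.0, np.sqrt(0.5), K)     # tilde x_i2 ~ N(-1.0, 0.5)
x1[j] += d1                                # contaminate the jth covariates only
x2[j] += d2

eta = beta[0] * x1 + beta[1] * x2
p = 1.0 / (1.0 + np.exp(-eta))             # logistic probabilities
y = rng.binomial(1, p)                     # y[j] is the potential outlier
```

The bulk of the probabilities sit well below one half here, while p[j] is pushed close to one, which is exactly the upper sided outlier situation of (17).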
2.2.2 Naive and Existing Robust QL Estimation Approaches
2.2.2.1 Naive QL (NQL) Estimation of β
Had there been no outliers, one could have obtained the consistent estimate of β by solving the well-known QL (quasi-likelihood) estimating equation
$$\displaystyle{\sum _{i=1}^{K}\frac{\partial \tilde{\mu}_{i}} {\partial \beta} {V}^{-1}(\tilde{\mu}_{i})(y_{i} -\tilde{\mu}_{i}) = 0}$$ (19)
(see Wedderburn 1974; McCullagh and Nelder 1989; Heyde 1997) where, for example, \(\tilde{\mu}_{i} = E[Y _{i}] =\exp (\tilde{x}^{\prime}_{i}\beta )\;\mbox{and}\;V (\tilde{\mu}_{i}) = var[Y _{i}] =\tilde{\mu} _{i}\) for Poisson count data; and \(\tilde{\mu}_{i} = E[Y _{i}] =\exp (\tilde{x}^{\prime}_{i}\beta )/[1 +\exp (\tilde{x}^{\prime}_{i}\beta )]\;\mbox{and}\;V (\tilde{\mu}_{i}) = var[Y _{i}] =\tilde{\mu} _{i}(1 -\tilde{\mu}_{i})\) for binary data. But, as the uncontaminated \(\tilde{x}_{i}\)’s are unobserved, it is not possible to use (19) for the estimation of β. Now suppose that following (19) but by using the observed covariates {x i }, one writes the naive quasi-likelihood (NQL) estimating equation for β given by
$$\displaystyle{\sum _{i=1}^{K}\frac{\partial \mu _{i}} {\partial \beta} {V}^{-1}(\mu _{i})(y_{i} -\mu _{i}) = 0,}$$ (20)
where, for example, \(\mu _{i} =\exp (x^{\prime}_{i}\beta )\;\mbox{and}\;V (\mu _{i}) =\mu _{i}\) for Poisson count data; and \(\mu _{i} =\exp (x^{\prime}_{i}\beta )/[1 +\exp (x^{\prime}_{i}\beta )]\;\mbox{and}\;V (\mu _{i}) =\mu _{i}(1 -\mu _{i})\) for binary data. Since β is the effect of \(\tilde{x}_{i}\) on y i for all i = 1, …, K, it then follows that the quasi-likelihood estimator obtained from (20) will be biased and hence inconsistent for β.
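For the Poisson case with log link, the QL estimating equation reduces to X′(y − μ) = 0, which the following hedged sketch solves by Fisher scoring; applied with contaminated covariates it plays the role of the NQL equation, while with the true covariates it gives the consistent QL estimator.

```python
import numpy as np

def nql_poisson(X, y, n_iter=50, tol=1e-10):
    # Solves sum_i (dmu_i/dbeta) V^{-1}(mu_i) (y_i - mu_i) = 0 for Poisson
    # data with log link, i.e. X'(y - mu) = 0, by Fisher scoring.
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (mu[:, None] * X)     # Fisher information X' diag(mu) X
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

The bias under contamination comes entirely from feeding the wrong covariate matrix into this equation, not from the solver itself.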
2.2.2.2 Partly Standardized Mallows Type QL (PSMQL) Estimation of β
As a remedy to the inconsistency of the quasi-likelihood estimator obtained from (20), Cantoni and Ronchetti (2001) (see also references therein), among others, have suggested a Mallow’s type quasi-likelihood (MQL) robust estimation approach to obtain a consistent estimate for the regression effects β. For the purpose, for \(r_{i} = \frac{y_{i}-\mu _{i}} {\sqrt{V (\mu _{i} )}}\), they first define the Huber robust function as
$$\displaystyle{\psi _{c}(r_{i}) = \left \{\begin{array}{ll} r_{i} &\mbox{for}\;\vert r_{i}\vert \leq c, \\ c\;\mbox{sign}(r_{i})&\mbox{for}\;\vert r_{i}\vert> c,\\ \end{array} \right.}$$ (21)
where c is referred to as the so-called tuning constant. This robust function is then used to construct the MQL estimating equation given by
$$\displaystyle{ \frac{1} {K}\sum _{i=1}^{K}w(x_{i})\frac{\partial \mu _{i}} {\partial \beta} {V}^{-\frac{1} {2}}(\mu _{i})\psi _{c}(r_{i}) - a(\beta ) = 0,}$$ (22)
where \(a(\beta ) = \frac{1} {K}\sum _{i=1}^{K}w(x_{i})\frac{\partial \mu _{i}} {\partial \beta} {V}^{-\frac{1} {2}}(\mu _{i})E[\psi _{c}(r_{i})]\), with \(\mu _{i} = E(Y _{i})\), \(V (\mu _{i}) = var(Y _{i})\), and w(x i ) = 1 for the binomial data as in Huber’s linear regression case, but \(w(x_{i}) = \sqrt{(1 - h_{i} )}\) for the Poisson data, where h i is the ith diagonal element of the hat matrix \(H = X{({X}^{\prime}X)}^{-1}{X}^{\prime}\), with \(X = {(x_{1},\ldots,x_{i},\ldots,x_{K})}^{\prime}\) being the K ×p covariate matrix.
Note that in order to minimize the robust distance function \(\psi _{c}(r_{i})\), the MQL estimating equation (22) was constructed by using the variance \(V (\mu _{i}) = var(Y _{i})\) as a weight function and \(\frac{\partial \mu _{i}} {\partial \beta}\) as a gradient function, whereas a proper estimating equation should use \(var(\psi _{c}(r_{i}))\) and \(\frac{\partial \psi _{c}(r_{i})} {\partial \beta}\) as the weight and gradient functions, respectively. One may therefore refer to the estimating equation (22) as a partly standardized MQL (PSMQL) estimating equation. This PSMQL estimating equation (22) provides regression estimates with smaller bias than the traditional maximum likelihood or NQL estimating equation (20). But, as discussed in Bari and Sutradhar (2010a), this improvement does not appear to be significant enough to recommend the use of the PSMQL estimation approach. Moreover, this PSMQL approach is not suitable for inferences in binary regression models.
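The centering term a(β) in (22) requires E[ψ c (r i )]. Cantoni and Ronchetti give closed-form expressions for the Poisson case; as a hedged numerical stand-in for those formulas, the expectation can be computed by direct summation over the Poisson distribution:

```python
import numpy as np
from math import lgamma

def psi_c(r, c=1.345):
    # Huber's bounded function
    return np.clip(r, -c, c)

def e_psi_poisson(mu, c=1.345):
    # E[psi_c(r)] with r = (Y - mu)/sqrt(mu), Y ~ Poisson(mu),
    # computed by summation over a truncated support (numerical stand-in)
    kmax = int(mu + 10.0 * np.sqrt(mu) + 20.0)
    k = np.arange(kmax + 1)
    logpmf = k * np.log(mu) - mu - np.array([lgamma(v + 1.0) for v in k])
    pmf = np.exp(logpmf)
    r = (k - mu) / np.sqrt(mu)
    return float(np.sum(pmf * psi_c(r, c)))
```

Since the Poisson distribution is skewed, E[ψ c (r i )] is generally nonzero, which is precisely why the correction term a(β) is needed to keep the estimating equation unbiased.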
2.2.2.3 FSMQL Estimation of β
As an improvement over the PSMQL estimation, Bari and Sutradhar (2010a) have proposed an FSMQL estimation approach where the estimate of the regression effects β is obtained by solving the FSMQL estimating equation
$$\displaystyle{\sum _{i=1}^{K}w(x_{i})E\left [\frac{\partial \psi _{c}(r_{i})} {\partial \beta} \right ]{\{var(\psi _{c}(r_{i}))\}}^{-1}\left [\psi _{c}(r_{i}) - \frac{1} {K}\sum _{i=1}^{K}E\{\psi _{c}(r_{i})\}\right ] = 0.}$$ (23)
Note that this FSMQL estimating equation (23) is constructed by replacing the “working” variance and gradient functions V (μ i ) and \(\frac{\partial \mu _{i}} {\partial \beta}\) in (22), with the true variance and gradient functions \(var(\psi _{c}(r_{i}))\) and \(\frac{\partial \psi _{c}(r_{i})} {\partial \beta}\), respectively. Also, \(w(x_{i}) = \sqrt{(1 - h_{i} )}\) is used in both binary and Poisson cases. Furthermore, the specific formulas for the true weight function \(var(\psi _{c}(r_{i}))\) and the gradient function \(\frac{\partial \psi _{c}(r_{i})} {\partial \beta}\) for the count and binary cases are available from Bari and Sutradhar (2010a, Sects. 2.1 and 2.2).
Bari and Sutradhar (2010a) also considered another version of the FSMQL estimating equation (23), which was developed by using the deviance \(\psi _{c}(r_{i}) - E(\psi _{c}(r_{i}))\) instead of \(\psi _{c}(r_{i}) - \frac{1} {K}\sum _{i=1}^{K}E(\psi _{c}(r_{i}))\). This alternative FSMQL estimating equation has the form
$$\displaystyle{\sum _{i=1}^{K}w(x_{i})E\left [\frac{\partial \psi _{c}(r_{i})} {\partial \beta} \right ]{\{var(\psi _{c}(r_{i}))\}}^{-1}\left [\psi _{c}(r_{i}) - E\{\psi _{c}(r_{i})\}\right ] = 0.}$$ (24)
For convenience, one may refer to (23) and (24) as the FSMQL1 and FSMQL2 estimating equations, respectively.
2.2.2.3.1 Robust Function and Properties for Count Data
For the count data, consider the Huber robust function ψ c (r i ) as in (21). The expectation and variance of this function are available from Cantoni and Ronchetti (2001, Appendix A, p. 1028). The gradient of the robust function and its expectation may then be computed as given in Bari and Sutradhar (2010a, Appendix).
2.2.2.3.2 Robust Function and Properties for Binary Data
2.2.2.3.3 (a) Robust function in the presence of one sided outlier
Suppose that the bulk of the binary observations occur with small probabilities. In this case, the robust function ψ c (r i ) (i = 1, …, K) may be defined as
$$\displaystyle{\psi _{c}(r_{i}) = \left \{\begin{array}{ll} \frac{y_{i}-\mu _{i}} {\sqrt{V (\mu _{i} )}} &\mbox{for}\;i\neq j, \\ \frac{y_{j}-\mu _{j}^{(c_{1})}} {\sqrt{{V}^{(c_{1})}(\mu _{j}^{(c_{1})})}}&\mbox{for}\;i = j,\\ \end{array} \right.}$$ (27)
where \(\mu _{i} = \frac{exp(x_{i}^{\prime}\beta )} {1+exp(x_{i}^{\prime}\beta )}\), \(V (\mu _{i}) =\mu _{i}(1 -\mu _{i})\) for all i = 1, …, K, and \(p_{sb} = max\{\mu _{i}\}\), i≠j, is a bound for all K − 1 small probabilities.
Note that as opposed to the case given in (27), if the bulk of the binary observations occur with large probabilities, then the robust function \(\psi _{c}(r_{i})\) (i = 1, …, K) is defined as
$$\displaystyle{\psi _{c}(r_{i}) = \left \{\begin{array}{ll} \frac{y_{i}-\mu _{i}} {\sqrt{V (\mu _{i} )}} &\mbox{for}\;i\neq j, \\ \frac{y_{j}-\mu _{j}^{(c_{2})}} {\sqrt{{V}^{(c_{2})}(\mu _{j}^{(c_{2})})}}&\mbox{for}\;i = j,\\ \end{array} \right.}$$ (28)
where \(p_{lb} = min\{\mu _{i}\}\), i≠j, is a bound for all K − 1 large probabilities.
2.2.2.3.4 (b) Robust function in the presence of two sided outlier
In this case, the robust function \(\psi _{c}(r_{i})\) (i = 1, …, K) may be defined as
$$\displaystyle{\psi _{c}(r_{i}) = \left \{\begin{array}{ll} \frac{y_{i}-\mu _{i}} {\sqrt{V (\mu _{i} )}} &\mbox{for}\;i\neq j, \\ \frac{y_{j}-\mu _{j}^{(c_{1})}} {\sqrt{{V}^{(c_{1})}(\mu _{j}^{(c_{1})})}}&\mbox{for}\;i = j\;\mbox{with}\;P(Y _{j} = 1)> p_{lb}, \\ \frac{y_{j}-\mu _{j}^{(c_{2})}} {\sqrt{{V}^{(c_{2})}(\mu _{j}^{(c_{2})})}}&\mbox{for}\;i = j\;\mbox{with}\;P(Y _{j} = 1) <p_{sb},\\ \end{array} \right.}$$ (29)
where \(\mu _{j}^{(c_{1})}\) and \({V}^{(c_{1})}(\mu _{j}^{(c_{1})})\) are defined as in (27), whereas \(\mu _{j}^{(c_{2})}\) and \({V}^{(c_{2})}(\mu _{j}^{(c_{2})})\) are defined as in (28).
2.2.2.3.5 (b(i)) Basic properties of the robust function ψ c (r i ): Binary case
It is convenient to state these properties for the two sided outlier case; the results for the one sided outlier may be obtained as a special case. The expectation, variance, and gradient of the robust function \(\psi _{c}(r_{i})\) [defined in (29)] in the presence of a two sided outlier are available from Bari and Sutradhar (2010a, Appendix). These moments involve P 1, P 2, and P 3, the probabilities for a binary observation to satisfy the conditions \(P(Y _{i} = 1)> p_{lb}\), \(p_{sb} \leq P(Y _{i} = 1) \leq p_{lb}\), and \(P(Y _{i} = 1) <p_{sb}\), respectively. In practice, the probabilities P 1, P 2, and P 3 may be computed from the data by using the corresponding sample proportions.
To illustrate the finite sample based relative performance of the competing robust approaches, namely the PSMQL (22), FSMQL1 (23), and FSMQL2 (24) approaches, we refer to some of the simulation results from Bari and Sutradhar (2010a). In the presence of a single outlier, the count and binary data were generated as in Sect. 2.2.1. With K = 60 observations including an outlier, the relative bias (RB) of each estimator, for example for β k (k = 1, …, p), was computed based on 1,000 simulations. The results are shown in Table 1.
The results in the table show that the fully standardized robust procedures FSMQL1 and FSMQL2 both perform much better in estimating β than the existing PSMQL robust approach.
3 Robust Inference in Longitudinal Setup
3.1 Existing GEE Approaches for Robust Inferences
Let \(\mu _{i}(x_{i}) = E(Y _{i}) = {(\mu _{i1},\ldots,\mu _{it},\ldots,\mu _{iT})}^{\prime}\) denote the mean vector, and \(\Sigma _{i}(x_{i},\rho ): T \times T\) be the true covariance matrix of the response vector y i , where x i represents all true covariates, i.e., \(x_{i} \equiv (x_{i1},\ldots,x_{it},\ldots,x_{iT})\). For convenience, the covariance matrix \(\Sigma _{i}(x_{i},\rho )\) is often expressed as \(\Sigma _{i}(x_{i},\rho ) = A_{i}^{\frac{1} {2}}C_{i}(\rho )A_{i}^{\frac{1} {2}}\), where \(A_{i} = \mbox{diag}[\sigma _{i11},\ldots,\sigma _{itt},\ldots,\sigma _{iTT}]\) and C i (ρ) is the correlation matrix for the repeated binary or count data. Note that if the longitudinal data do not contain any outliers, then one may obtain a consistent and highly efficient estimate of β by solving the GQL estimating equation
$$\displaystyle{\sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime}(x_{i})} {\partial \beta} \Sigma _{i}^{-1}(x_{i},\hat{\rho})(y_{i} -\mu _{i}(x_{i})) = 0}$$
(see Sutradhar 2003) where \(\hat{\rho}\) is a suitable consistent estimate, for example, a moment estimate, of ρ.
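A hedged sketch of solving this GQL equation for repeated count data follows, assuming a log-linear mean, A i = diag(μ it ) (Poisson-type variance), and an AR(1) form for C i (ρ); none of these specific choices are prescribed by the text beyond the general Σ i = A i ½ C i (ρ) A i ½ structure.

```python
import numpy as np

def gql_poisson_ar1(X, Y, rho, n_iter=50, tol=1e-8):
    # X: (K, T, p) covariates, Y: (K, T) counts; solves
    # sum_i D_i' Sigma_i^{-1} (y_i - mu_i) = 0, Sigma_i = A_i^{1/2} C(rho) A_i^{1/2}
    K, T, p = X.shape
    C = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))  # AR(1) correlations
    beta = np.zeros(p)
    for _ in range(n_iter):
        g = np.zeros(p)
        H = np.zeros((p, p))
        for i in range(K):
            mu = np.exp(X[i] @ beta)               # T means for subject i
            A_half = np.diag(np.sqrt(mu))
            Sigma_inv = np.linalg.inv(A_half @ C @ A_half)
            D = mu[:, None] * X[i]                 # gradient d mu_i / d beta'
            g += D.T @ Sigma_inv @ (Y[i] - mu)
            H += D.T @ Sigma_inv @ D               # Fisher-type weight matrix
        step = np.linalg.solve(H, g)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

With ρ = 0 the correlation matrix reduces to the identity and the equation collapses to the independence QL equation; in practice ρ would be replaced by a moment estimate as noted in the text.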
Note that in practice it may, however, happen that a small percentage, such as 1%, of the longitudinal observations are suspected to be outliers. Suppose that m of the KT responses are referred to as the outliers when their corresponding covariates are shifted by an amount δ, δ being a real valued vector. For convenience, we denote the new set of covariates as
$$\displaystyle{\tilde{x}_{it} = \left \{\begin{array}{ll} x_{it}+\delta &\mbox{for the}\;m\;\mbox{outlying responses}, \\ x_{it} &\mbox{otherwise},\\ \end{array} \right.}$$
and use these observed covariates \(\tilde{x}_{it}\) for the estimation of β. It is, therefore, clear that since β is the effect of the true covariate x it on y it , the solution of the observed covariates \(\tilde{x}_{i}\) based naive GQL (NGQL) estimating equation
$$\displaystyle{\sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime}(\tilde{x}_{i})} {\partial \beta} \Sigma _{i}^{-1}(\tilde{x}_{i},\hat{\rho})(y_{i} -\mu _{i}(\tilde{x}_{i})) = 0}$$ (36)
will produce a biased and hence inconsistent estimate for β. To overcome this inconsistency problem, Preisser and Qaqish (1999), among others, have proposed to solve a resistant generalized estimating equation (REGEE) given by
$$\displaystyle{\sum _{i=1}^{K}\frac{\partial \mu _{i}^{\prime}(\tilde{x}_{i})} {\partial \beta} V _{i}^{-1}(\tilde{x}_{i},\alpha )(\psi _{i}^{{\ast}}- c_{i}) = 0,}$$ (37)
where \(\psi _{i}^{{\ast}}\) is a down-weighting function, \(c_{i} = E(\psi _{i}^{{\ast}})\), and \(V _{i}(\tilde{x}_{i},\alpha )\) is a “working” covariance matrix (Liang and Zeger 1986). Note that the REGEE in (37) does not appear to be a properly weighted estimating equation. This is because, first, \(V _{i}(\tilde{x}_{i},\alpha )\) is only a substitute for the \(\Sigma _{i}(\tilde{x}_{i},\rho )\) matrix, whereas in the presence of outliers one needs to use \(\Omega _{i}^{{\ast}} = var(\psi _{i}^{{\ast}})\) in order to obtain efficient regression estimates. Second, the REGEE (37) uses \(\frac{\partial \mu _{i}^{\prime}(\tilde{x}_{i})} {\partial \beta}\) as the gradient function, whereas the consistency of the estimates may depend on the proper gradient function constructed by taking the derivative of the \(\psi _{i}^{{\ast}}- c_{i}\) function with respect to β.
Cantoni (2004) has provided an improvement over the REGEE by introducing the proper gradient function into the estimating equation. To be specific, as compared to Preisser and Qaqish (1999) (see also Eq. (36)), Cantoni (2004) constructed an improved resistant generalized estimating equation (IREGEE) given by
$$\sum _{i=1}^{K}E\left[\frac{\partial (\psi _{i}^{{\ast}}-c_{i})} {{\partial \beta}^{\prime}}\right]^{\prime}V _{i}^{-1}(\tilde{x}_{i},\alpha )\,(\psi _{i}^{{\ast}} - c_{i}) = 0 \qquad (38)$$
where \(E\left [\frac{\partial (\psi _{i}^{{\ast}}-c_{i})} {{\partial \beta}^{\prime}} \right ]\) is a proper gradient of the robust function \(\psi _{i}^{{\ast}}- c_{i}\), with \(\psi _{i}^{{\ast}} = {[\psi _{c}(r_{i1}),\ldots,\psi _{c}(r_{iT})]}^{\prime}\) and \(c_{i} = E(\psi _{i}^{{\ast}})\).
Note that the estimating equation (38) still uses a “working” covariance matrix \(V _{i}(\tilde{x}_{i},\alpha )\), whereas an efficient estimating equation (Sutradhar and Das 1999) should use the proper covariance matrix of the robust function, namely \(\Omega _{i}^{{\ast}} = var(\psi _{i}^{{\ast}})\). Further, similar to Cantoni (2004), Sinha (2006) has attempted to develop certain robust inferences to deal with outliers in longitudinal data. But Sinha (2006) has modeled the longitudinal correlations through random effects, which therefore addresses a different problem from the present longitudinal one.
Recently, Bari and Sutradhar (2010b) have proposed an auto-correlation class based robust GQL (RGQL) approach for inference in binary and count panel data models in the presence of outliers. This RGQL approach produces consistent and highly efficient regression estimates, and it is a generalization of the FSMQL approach for independent data to the longitudinal setup. The RGQL approach is summarized in the next section.
3.2 RGQL Approach for Robust Inferences in Longitudinal Setup
Note that when the covariates are stationary, that is, time independent, one may develop a general auto-correlation class based robust GQL estimation approach. Bari and Sutradhar (2010b) have, however, considered non-stationary covariates and exploited the most likely AR(1) type correlation structures for both count and binary data. These correlation structures are discussed in detail in Sutradhar (2010); see also Sutradhar (2011). For convenience, we summarize these correlation structures as follows.
Recall that \(x_{it} = {(x_{it1},\ldots,x_{itu},\ldots,x_{itp})}^{\prime}\) is the p ×1 vector of covariates corresponding to y it when the data do not contain any outliers, and that β denotes the effect of the covariate x it on y it . The AR(1) correlation models for the repeated responses \(y_{i1},\ldots,y_{it},\ldots,y_{iT}\), based on the uncontaminated covariates \(x_{i1},\ldots,x_{it},\ldots,x_{iT}\), for binary and count data are given below.
AR(1) Model for Repeated Binary Data
For \(\mu _{it} = \frac{exp(x_{it}^{\prime}\beta )} {1+exp(x_{it}^{\prime}\beta )}\), for all t = 1, …, T, the AR(1) model for the binary data may be written as
$$P[y_{i1} = 1] =\mu _{i1},\qquad P[y_{it} = 1\,\vert\,y_{i,t-1}] =\mu _{it} +\rho \,(y_{i,t-1} -\mu _{i,t-1}),\quad t = 2,\ldots,T, \qquad (39)$$
(Zeger et al. 1985; Qaqish 2003) where ρ is a correlation index parameter. The binary AR(1) model (39) has the auto-correlation structure given by
$$corr(Y _{iu},Y _{it}) =\rho ^{\,t-u}\left[\frac{\sigma _{iuu}} {\sigma _{itt}}\right]^{\frac{1} {2}},\quad u < t, \qquad (40)$$
where \(\sigma _{iuu} =\mu _{iu}(1 -\mu _{iu})\), for example, is the variance of y iu . Note that the ρ parameter in (39)–(40) must satisfy the range restriction
$$\max _{t}\left[-\frac{\mu _{it}} {1 -\mu _{i,t-1}},\,-\frac{1 -\mu _{it}} {\mu _{i,t-1}}\right] \leq \rho \leq \min _{t}\left[\frac{1 -\mu _{it}} {1 -\mu _{i,t-1}},\,\frac{\mu _{it}} {\mu _{i,t-1}}\right] \qquad (41)$$
so that the conditional probabilities in (39) remain in [0, 1].
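The binary AR(1) model above is easy to check by simulation. The sketch below uses stationary illustrative values of μ and ρ (chosen to satisfy the range restriction (41)); under stationarity the lag 1 correlation should be ρ and the lag 2 correlation ρ².

```python
import numpy as np

# Monte Carlo check of the binary AR(1) model (39): the conditional mean is
# mu + rho*(y_{t-1} - mu), so with constant mu the lag-k correlation is rho^k.
rng = np.random.default_rng(7)
K, T = 100_000, 4
mu, rho = 0.4, 0.3                                  # illustrative, inside range (41)

y = np.empty((K, T))
y[:, 0] = rng.binomial(1, mu, size=K)               # P[y_i1 = 1] = mu
for t in range(1, T):
    p_cond = mu + rho * (y[:, t - 1] - mu)          # conditional probability from (39)
    y[:, t] = rng.binomial(1, p_cond)

lag1 = np.corrcoef(y[:, 0], y[:, 1])[0, 1]          # ~ rho
lag2 = np.corrcoef(y[:, 0], y[:, 2])[0, 1]          # ~ rho^2
```

The marginal mean stays at μ over time, since E[y_it] = μ + ρ(E[y_i,t−1] − μ) = μ by induction.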
AR(1) Model for Repeated Count Data
As opposed to the binary AR(1) model (39), the AR(1) model for the count data is defined as
$$y_{it} =\rho \ast y_{i,t-1} + d_{it},\quad t = 2,\ldots,T, \qquad (42)$$
(see McKenzie 1988; Sutradhar 2003), where \(y_{i,t-1} \sim Poisson(\mu _{i,t-1})\) and \(d_{it} \sim Poisson(\mu _{it} -\rho \mu _{i,t-1})\), with \(\mu _{it} = E(Y _{it}) =\exp (x^{\prime}_{it}\beta )\). In (42), d it and \(y_{i,t-1}\) are assumed to be independent. Also, for given count y i, t − 1,
$$\rho \ast y_{i,t-1} =\sum _{j=1}^{y_{i,t-1}}b_{j}(\rho ),$$
where b j (ρ) stands for a binary variable with \(P[b_{j}(\rho ) = 1] =\rho\) and \(P[b_{j}(\rho ) = 0] = 1-\rho\). The AR(1) model (42) for count data has the auto-correlation structure given by
$$corr(Y _{iu},Y _{it}) =\rho ^{\,t-u}\left[\frac{\mu _{iu}} {\mu _{it}}\right]^{\frac{1} {2}},\quad u < t, \qquad (43)$$
with ρ satisfying the range restriction
$$0 <\rho <\min \left[1,\ \min _{t}\frac{\mu _{it}} {\mu _{i,t-1}}\right] \qquad (44)$$
so that the Poisson mean \(\mu _{it} -\rho \mu _{i,t-1}\) of d it remains non-negative.
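The binomial-thinning construction in (42) can likewise be verified by simulation. The sketch below uses a stationary illustrative mean μ and a ρ inside the range restriction; the thinning ρ ∗ y_{i,t−1} is generated directly as a binomial draw, which is equivalent to summing the b_j(ρ) variables.

```python
import numpy as np

# Monte Carlo check of the Poisson AR(1) model (42):
# y_t = (binomial thinning of y_{t-1} with prob rho) + d_t,
# d_t ~ Poisson(mu - rho*mu) under a constant illustrative mean mu.
rng = np.random.default_rng(11)
K, T = 100_000, 4
mu, rho = 2.0, 0.5                                  # illustrative values

y = np.empty((K, T), dtype=np.int64)
y[:, 0] = rng.poisson(mu, size=K)                   # y_i1 ~ Poisson(mu)
for t in range(1, T):
    thinned = rng.binomial(y[:, t - 1], rho)        # rho * y_{t-1} = sum of b_j(rho)
    d = rng.poisson(mu - rho * mu, size=K)          # innovation d_t
    y[:, t] = thinned + d

lag1 = np.corrcoef(y[:, 1], y[:, 2])[0, 1]          # ~ rho under stationarity
mean_T = y[:, T - 1].mean()                         # ~ mu (marginal mean preserved)
```

The thinning step preserves the Poisson marginal, so the process has mean μ at every time point while exhibiting the AR(1) decay of correlations in (43).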
3.2.1 RGQL Estimating Equation
For \(\xi _{i} = {[\psi _{c}(r_{i1}),\ldots,\psi _{c}(r_{it}),\ldots,\psi _{c}(r_{iT})]}^{\prime}\), its expectation λ i is available from Cantoni and Ronchetti (2001) for the count data, and from Sect. 2.2.2 for the binary case. Recall from (38) that based on the “working” covariance of the responses (Liang and Zeger 1986), Cantoni (2004) has suggested an IREGEE approach for estimating β in the presence of outliers. One may obtain a consistent β estimate by solving a slightly different equation than (38), given by
$$\sum _{i=1}^{K}E\left[\frac{\partial (\xi _{i} -\lambda _{i})^{\prime}} {\partial \beta}\right]V _{i}^{-1}(\alpha )\,W_{i}\,(\xi _{i} -\lambda _{i}) = 0 \qquad (45)$$
where \(W_{i} = diag[w_{i1},\ldots,w_{it},\ldots,w_{iT}]\) is the T ×T covariate dependent diagonal weight matrix, so that covariates corresponding to an outlying response yield less weight for the corresponding robust function. To be specific, the t-th diagonal element of the W i matrix is computed as \(w_{it} = \sqrt{1 - h_{itt}}\), h itt being the t-th diagonal element of the hat matrix \(H_{i} = \tilde{X}_{i}{(\tilde{X}_{i}^{\prime}\tilde{X}_{i})}^{-1}\tilde{X}_{i}^{\prime}\) with \(\tilde{X}_{i} = {[\tilde{x}_{i1},\ldots,\tilde{x}_{it},\ldots,\tilde{x}_{iT}]}^{\prime}\). See, for example, Cantoni and Ronchetti (2001). Also in (45), \(V _{i}(\alpha ) = A_{i}^{\frac{1} {2}}R(\alpha )A_{i}^{\frac{1} {2}}\) is a “working” covariance matrix of y i , with R(α) as the associated “working” correlation matrix. Note that there is a twofold problem with this estimating equation. First, to increase efficiency, it would have been appropriate to use \(\mbox{cov}(\xi _{i}) = cov[\psi _{c}(r_{i1}),\ldots,\psi _{c}(r_{it}),\ldots,\psi _{c}(r_{iT})]\) as the weight matrix, rather than even the true covariance matrix \(\Sigma _{i}(\alpha ) = \mbox{cov}(Y _{i})\). Second, Cantoni (2004) did not even use Σ i , but rather a “working” covariance matrix \(V _{i}(\alpha )\).
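The leverage-based weights \(w_{it} = \sqrt{1 - h_{itt}}\) are straightforward to compute. The sketch below builds the hat matrix for one subject from an illustrative T × p covariate matrix; the dimensions and values are assumptions for demonstration only.

```python
import numpy as np

# Sketch: covariate-dependent weights w_it = sqrt(1 - h_itt), where h_itt is
# the t-th diagonal element of the hat matrix H_i built from the observed
# covariate matrix X_tilde (T x p) of subject i.
rng = np.random.default_rng(3)
T, p = 6, 2
X_tilde = rng.normal(size=(T, p))                   # illustrative covariates

H = X_tilde @ np.linalg.inv(X_tilde.T @ X_tilde) @ X_tilde.T  # hat matrix H_i
h = np.diag(H)                                      # leverages h_itt
w = np.sqrt(1.0 - h)                                # diagonal of W_i
```

Because the leverages sum to p and lie in [0, 1), time points with outlying (high-leverage) covariates automatically receive smaller weights w_it.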
To overcome this inefficiency problem encountered by Cantoni’s approach, Bari and Sutradhar (2010b) have suggested a robust function based GQL (RGQL) estimating equation for β as
$$\sum _{i=1}^{K}\frac{\partial \lambda _{i}^{\prime}} {\partial \beta}\,\Omega _{i}^{-1}\,W_{i}\,(\xi _{i} -\lambda _{i}) = 0 \qquad (46)$$
where
$$\Omega _{i} = \mbox{cov}(\xi _{i}) = (\omega _{iut}),\quad u,t = 1,\ldots,T, \qquad (47)$$
with
$$\omega _{iut} = E\left[\psi _{c}(r_{iu})\psi _{c}(r_{it})\right] - E\left[\psi _{c}(r_{iu})\right]E\left[\psi _{c}(r_{it})\right], \qquad (48)$$
where, as mentioned above, the formulas for \(E[\psi _{c}(r_{it})]\) are available for both count and binary data.
3.2.1.1 Computation of Ω i for the Binary Data
Note that the computation of the product moment \(E\left [\psi _{c}(r_{iu})\psi _{c}(r_{it})\right ]\) in (48) is manageable for the binary case, but it is extremely difficult for the count data. For example, suppose that y it , t = 1, …, T, used in the robust functions \(\psi _{c}(r_{it})\), follow an AR(1) type correlation structure given by (40), where \(\mu _{it} = \frac{exp(x_{it}^{\prime}\beta )} {1+exp(x_{it}^{\prime}\beta )}\) and ρ is a correlation index parameter. Next, suppose that the binary data contain two sided outliers. One may then follow (29), enumerate all nine combinations for the product term \(\psi _{c}(r_{iu})\psi _{c}(r_{it})\), take the expectations of these nine terms, and collect them into the formulas in (49) for u < t. We may then easily compute ω iut by using (49) and (48).
Further note that for the one sided outlier case, \(E\left [\psi _{c}(r_{iu})\psi _{c}(r_{it})\right ]\) can also be obtained from (49). For the one sided down-weighting function \(\psi _{c}(r_{it})\) given in (28), one computes the expectation of \(\psi _{c}(r_{iu})\psi _{c}(r_{it})\) from (49) by changing the limits, that is, by replacing p lb with 0. Similarly, the product moment based on the down-weighting function \(\psi _{c}(r_{it})\) given in (27) is obtained from (49) by replacing p sb with 1.
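The down-weighting functions ψ_c used throughout are Huber-type truncations of standardized residuals. The exact one- and two-sided variants (27)–(29) involve model-specific bounds; the symmetric Huber version below is a generic illustration with the customary tuning constant c = 1.345, used here only as an assumed stand-in.

```python
import numpy as np

# Generic Huber-type down-weighting function on standardized residuals:
# psi_c(r) = r for |r| <= c, and sign(r) * c otherwise.
def psi_c(r, c=1.345):
    r = np.asarray(r, dtype=float)
    return np.clip(r, -c, c)        # truncate large residuals at +/- c

r = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])
out = psi_c(r)                      # only |r| > c entries are truncated
```

Moderate residuals pass through unchanged, while extreme residuals (likely outliers) contribute at most ±c to the estimating equation, which is the source of the robustness.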
Under the AR(1) binary correlation structure (40), the outlier based moment estimation formula for ρ, derived from (49), is given by
Alternatively, for any lag 1 dependent binary or count data [irrespective of the correlation structure, such as AR(1) or MA(1)] with possible outliers, the lag 1 correlation index parameter ρ may be estimated as
$$\hat{\rho } = \frac{\sum _{i=1}^{K}\sum _{t=1}^{T-1}\{\psi _{c}(r_{it})w_{it} -\bar{\xi }_{t,w}\}\{\psi _{c}(r_{i,t+1})w_{i,t+1} -\bar{\xi }_{t+1,w}\}/\{K(T - 1)\}} {\sum _{i=1}^{K}\sum _{t=1}^{T}\{\psi _{c}(r_{it})w_{it} -\bar{\xi }_{t,w}\}^{2}/(KT)} \qquad (51)$$
where \(\bar{\xi}_{t,w} = \frac{1} {K}\sum _{i=1}^{K}\psi _{c}(r_{it})w_{it}\).
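A pooled lag 1 moment estimator of this kind is easy to implement. In the sketch below the weighted robust function values are simulated directly as Gaussian AR(1) series with a known lag 1 correlation (an assumption made purely to have a known truth to recover), and the ratio of pooled lag 1 cross-products to pooled variances recovers ρ.

```python
import numpy as np

# Sketch of a pooled lag-1 moment estimator for rho.  The K x T array xi
# stands in for the weighted robust values psi_c(r_it) * w_it; here it is
# simulated as a Gaussian AR(1) with true lag-1 correlation 0.5.
rng = np.random.default_rng(5)
K, T, rho_true = 20_000, 5, 0.5

xi = np.empty((K, T))
xi[:, 0] = rng.normal(size=K)
for t in range(1, T):
    xi[:, t] = rho_true * xi[:, t - 1] + np.sqrt(1 - rho_true**2) * rng.normal(size=K)

xi_bar = xi.mean(axis=0)                            # time-specific means
dev = xi - xi_bar                                   # centered values
num = (dev[:, :-1] * dev[:, 1:]).sum() / (K * (T - 1))   # pooled lag-1 cross-product
den = (dev**2).sum() / (K * T)                           # pooled variance
rho_hat = num / den                                 # lag-1 correlation estimate
```

With K large the estimator lands very close to the true lag 1 correlation, regardless of whether the dependence is AR(1) or MA(1) at lag 1.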
3.2.1.2 Computation of Ω i for Count Data
Note that as opposed to the binary case, the construction of the Ω i matrix is difficult for the count data case. One may, however, alternatively compute this Ω i matrix by using the general formula
$$\Omega _{i} = A_{i\xi }^{\frac{1} {2}}\,C_{i\xi }\,A_{i\xi }^{\frac{1} {2}}, \qquad (52)$$
where \(A_{i\xi} = \mbox{diag}\left [var(\psi _{c}(r_{i1})),\ldots,var(\psi _{c}(r_{it})),\ldots,var(\psi _{c}(r_{iT}))\right ]\) and \(C_{i\xi} = (c_{i\xi,ut})\), with \(c_{i\xi,ut} = corr[\psi _{c}(r_{iu}),\psi _{c}(r_{it})]\) for u, t = 1, …, T. For (52), the formulas for \(var[\psi _{c}(r_{it})]\) for the binary data are given in Sect. 2.2.2, and for the count data they are available from Cantoni and Ronchetti (2001, Appendix). As far as the computation of the C iξ matrix is concerned, one may approximate this matrix by a constant matrix \(C_{\xi}^{{\ast}}\), say, by pretending that the covariates are stationary even though they are non-stationary (i.e., time dependent). Under this assumption, the (u, t)-th component of the constant matrix \(C_{\xi}^{{\ast}}\) may be computed as the pooled sample correlation
$$c_{\xi,ut}^{{\ast}} = \frac{\sum _{i=1}^{K}\{\psi _{c}(r_{iu}) -\bar{\xi }_{u}\}\{\psi _{c}(r_{it}) -\bar{\xi }_{t}\}} {\left[\sum _{i=1}^{K}\{\psi _{c}(r_{iu}) -\bar{\xi }_{u}\}^{2}\sum _{i=1}^{K}\{\psi _{c}(r_{it}) -\bar{\xi }_{t}\}^{2}\right]^{\frac{1} {2}}} \qquad (53)$$
with \(\bar{\xi}_{t} = \frac{1} {K}\sum _{i=1}^{K}\psi _{c}(r_{it})\), for all t = 1, …, T.
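Assembling Ω_i from the pieces of (52) is a simple matrix computation. In the sketch below the per-time variances of ψ_c(r_it) and the pooled correlation matrix C* are illustrative stand-ins (an AR(1)-type correlation pattern is assumed for C*); only the sandwich construction itself is the point.

```python
import numpy as np

# Sketch of Omega_i = A^{1/2} C A^{1/2} from the general formula (52).
var_psi = np.array([0.8, 0.9, 0.7, 1.1])            # illustrative var(psi_c(r_it))
rho = 0.4
T = len(var_psi)

# Illustrative pooled correlation matrix C* with (u,t) entry rho^{|t-u|}
C_star = rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))

A_half = np.diag(np.sqrt(var_psi))                  # A_{i xi}^{1/2}
Omega = A_half @ C_star @ A_half                    # Omega_i per (52)
```

By construction Ω_i is symmetric, carries the given variances on its diagonal, and inherits positive definiteness from C*, so it can be inverted safely in the RGQL estimating equation.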
Note that the REGEE approach encounters convergence problems and produces regression estimates with much larger relative biases than the RGQL approach. See, for example, the finite sample relative performance of the RGQL and REGEE approaches shown through the intensive simulation studies reported in Bari and Sutradhar (2010b).
References
Bailey, B.: Tables of the Bonferroni t-test. J. Am. Stat. Assoc. 72, 469–478 (1977)
Bari, W., Sutradhar, B.C.: On bias reduction in robust inference for generalized linear models. Scand. J. Stat. 37, 109–125 (2010a)
Bari, W., Sutradhar, B.C.: Robust inferences in longitudinal models for binary and count panel data in the presence of outliers. Sankhya B 72, 11–37 (2010b)
Beckman, R.J., Cook, R.D.: Outlier....s. Technometrics 25, 119–163 (1983)
Cantoni, E.: A robust approach to longitudinal data analysis. Can. J. Stat. 32, 169–180 (2004)
Cantoni, E., Ronchetti, E.: Robust inference for generalized linear models. J. Am. Stat. Assoc. 96, 1022–1030 (2001)
Carroll, R.J., Pederson, S.: On robustness in the logistic regression model. J. R. Stat. Soc. B 55, 693–706 (1993)
Cook, R.D., Prescott, P.: On the accuracy of Bonferroni significance levels for detecting outliers in linear models. Technometrics 23, 59–63 (1981)
Copas, J.B.: Binary regression models for contaminated data (with discussion). J. R. Stat. Soc. B 50, 225–265 (1988)
Doornbos, R.: Testing for a single outlier in a linear model. Biometrics 37, 705–711 (1981)
Ellenberg, J.H.: The joint distribution of the standardized least squares residuals from a general linear regression. J. Am. Stat. Assoc. 68, 941–943 (1973)
Ellenberg, J.H.: Testing for a single outlier from a general linear model. Biometrics 32, 637–645 (1976)
Hampel, F.R., Rousseeuw, P.J., Ronchetti, E.M., Stahel, W.A.: Robust Statistics: The Approach Based on Influence Functions. Wiley, New York (1986)
Heyde, C.C.: Quasi-Likelihood and Its Applications. Springer, New York (1997)
Huber, P.J.: Robust Statistics. Wiley, New York (2004)
Johnson, B.A., Prescott, P.: Critical values of a test to detect outliers in factorial experiments. Appl. Stat. 24, 56–59 (1975)
Liang, K.-Y., Zeger, S.L.: Longitudinal data analysis using generalized linear models. Biometrika 73, 13–22 (1986)
Lund, R.E.: Tables for an approximate test for outliers in linear regression. Technometrics 17, 473–476 (1975)
McCullagh, P., Nelder, J.A.: Generalized Linear Models, 2nd edn. Chapman and Hall, London (1989)
McKenzie, E.: Some ARMA models for dependent sequences of Poisson counts. Adv. Appl. Probab. 20, 822–835 (1988)
Preisser, J.S., Qaqish, B.F.: Robust regression for clustered data with applications to binary regression. Biometrics 55, 574–579 (1999)
Prescott, P.: An approximate test for outliers in linear models. Technometrics 17, 129–132 (1975)
Qaqish, B.F.: A family of multivariate binary distributions for simulating correlated binary variables with specified marginal means and correlations. Biometrika 90, 455–463 (2003)
Rousseeuw, P.J., Leroy, A.M.: Robust Regression and Outlier Detection. Wiley, New York (1987)
Sinha, S.K.: Robust analysis of generalized linear mixed models. J. Am. Stat. Assoc. 99, 451–460 (2004)
Sinha, S.K.: Robust inference in generalized linear model for longitudinal data. Can. J. Stat. 34, 1–18 (2006)
Srikantan, K.S.: Testing for a single outlier in a regression model. Sankhya A 23, 251–260 (1961)
Stefansky, W.: Rejecting outliers by maximum normed residual. Ann. Math. Stat. 42, 35–45 (1971)
Stefansky, W.: Rejecting outliers in factorial designs. Technometrics 14, 469–479 (1972)
Street, J.O., Carroll, R.J., Ruppert, D.: A note on computing robust regression estimates via iteratively reweighted least squares. Am. Stat. 42, 152–154 (1988)
Sutradhar, B.C.: An overview on regression models for discrete longitudinal responses. Stat. Sci. 18, 377–393 (2003)
Sutradhar, B.C.: Inferences in generalized linear longitudinal mixed models. Can. J. Stat. 38, 174–196 (2010)
Sutradhar, B.C.: Dynamic Mixed Models for Familial Longitudinal Data. Springer, New York (2011)
Sutradhar, B.C., Chu, D.P.T., Bari, W.: Estimation effects on powers of two simple test statistics in identifying an outlier in linear models. J. Stat. Comput. Simul. 77, 305–328 (2007)
Sutradhar, B.C., Das, K.: On the efficiency of regression estimators in generalized linear models for longitudinal data. Biometrika 86, 459–465 (1999)
Tietjen, G.L., Moore, R.H., Beckman, R.J.: Testing for a single outlier in simple linear regression. Technometrics 15, 717–721 (1973)
Wedderburn, R.W.M.: Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika 61, 439–447 (1974)
Zeger, S.L., Liang, K.Y., Self, S.G.: The analysis of binary longitudinal data with time independent covariates. Biometrika 72, 31–38 (1985)
Acknowledgments
The author fondly acknowledges the stimulating discussion by the audience of the symposium and wishes to thank them for their comments and suggestions.
© 2013 Springer Science+Business Media New York
Sutradhar, B.C. (2013). Robust Inference Progress from Independent to Longitudinal Setup. In: Sutradhar, B. (eds) ISS-2012 Proceedings Volume On Longitudinal Data Analysis Subject to Measurement Errors, Missing Values, and/or Outliers. Lecture Notes in Statistics(), vol 211. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6871-4_9