Abstract
The marginalized zero-inflated Poisson (MZIP) regression model quantifies the effects of explanatory variables on the overall mean of the mixture population. In practice, covariates are often only partially observed. We first study the maximum likelihood estimator when all variables are observed. Then, assuming that the selection probability depends on mixed covariates (continuous, discrete and categorical), we propose a semiparametric inverse-probability weighted (SIPW) method for estimating the parameters of the MZIP model with covariates missing at random (MAR). The asymptotic properties (consistency and asymptotic normality) of the proposed estimators are established under certain regularity conditions. The performance of the proposed estimators is evaluated through numerical studies and compared with that of the semiparametric inverse-probability weighted kernel-based (SIPWK) estimator. Finally, we apply our methodology to a dataset on health care demand in the United States.
1 INTRODUCTION
Although Poisson (or binomial) models are the most widely used tools for modeling count data, count data with an excess of zeros are increasingly common in fields such as economics, biomedical studies, criminology, insurance, sociology and political science. When the number of observed zeros exceeds the number predicted by standard count distributions, zero-inflated (ZI) regression models offer an alternative for modeling such data. For more information on ZI regression models, see Lambert [1], Diallo et al. [4, 7], Kouakou et al. [27] and Ali et al. [28]. The ZIP distribution proposed by Lambert has gained popularity, and ZIP regression models have been used successfully in a variety of important applications; see, for example, Dietz et al. [2], Yau et al. [3] and Cheung et al. [5].
However, the ZIP regression model has two sets of regression parameters: one for the probability of being an excess zero and one for the Poisson mean. These parameters have latent-class interpretations. The latent classes are often thought of as a not-at-risk group and an at-risk group, which reflects a difference in susceptibility between the two subpopulations. Because interpretations for the entire population are often desired, Long et al. [16] introduced the marginalized zero-inflated Poisson (MZIP) regression model.
In practice, data are most often partially observed. In this context, the basic method is the complete-case method, which removes every individual with at least one missing value. This method is simple to implement. However, when the proportion of individuals with missing data exceeds 5\(\%\), it can give poor results. Two alternatives to the complete-case method are the Monte Carlo EM (MCEM) algorithm and multiple imputation (MI). Both are efficient but computationally demanding. Finally, the commonly used inverse-probability weighting (IPW) method requires a correctly specified model for the selection probability; see, for example, Diallo et al. [7] and Benecha et al. [18]. To circumvent these modeling difficulties with a method that avoids heavy numerical computation, Lukusa et al. [8, 9] proposed weighted semiparametric estimators. These estimators are suitable when the selection probabilities are expressed in terms of covariates of a single type. However, little work has addressed estimation of the MZIP model with missing data, and this work aims to fill this gap. We propose a semiparametric approach in which the selection probability, a function of continuous, discrete and categorical covariates, is estimated nonparametrically. The approach discretizes the continuous covariates with Jenks' method to obtain categorical covariates.
The rest of the paper is organized as follows. In Section 2, we present the MZIP regression model and its maximum likelihood estimator. In Section 3, we present the SIPWK and SIPW estimation methods for the MZIP model when the covariates are missing at random (MAR), and we establish the consistency and asymptotic normality of the SIPW estimator. The performance of these estimators is evaluated in Section 4. As an illustration, we apply these methods to real data in Section 5. A discussion and some perspectives are presented in Section 6. The technical proofs are given in the Appendices.
2 MARGINALIZED ZIP MODELS
The ZIP distribution is used to model the count variable of interest \(Y_{i}\), \(i=1,\ldots,n\). With probability \(1-\psi_{i}\), \(Y_{i}\) is drawn from a Poisson distribution with mean \(\mu_{i}\). With probability \(\psi_{i}\), it is an excess (structural) zero. In dental caries research, for example, the marginal mean caries count \(\nu_{i}\) is often of more interest than the mean caries count \(\mu_{i}\) of the latent susceptible group; see Preisser et al. [17].
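Because interpretations for the entire population are often desired, the marginal mean \(\nu_{i}=\mathbb{E}(Y_{i})=(1-\psi_{i})\mu_{i}\) can be modeled directly to give overall exposure effect estimates. Substituting \(\mu_{i}=\nu_{i}/(1-\psi_{i})\), the MZIP distribution is
\[
\mathbb{P}(Y_{i}=y)=\begin{cases}\psi_{i}+(1-\psi_{i})\exp\left(-\dfrac{\nu_{i}}{1-\psi_{i}}\right), & y=0,\\[2mm] (1-\psi_{i})\exp\left(-\dfrac{\nu_{i}}{1-\psi_{i}}\right)\dfrac{1}{y!}\left(\dfrac{\nu_{i}}{1-\psi_{i}}\right)^{y}, & y=1,2,\ldots\end{cases}\tag{2.1}
\]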
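The MZIP representation can be checked numerically. The following minimal Python sketch evaluates \(\mathbb{P}(Y_{i}=y)\), with the latent Poisson mean \(\mu_{i}=\nu_{i}/(1-\psi_{i})\). It confirms that the probabilities sum to one and that the mean equals the marginal mean \(\nu_{i}\), not \(\mu_{i}\). The function name `mzip_pmf` and the parameter values are illustrative and are not taken from the paper.

```python
import math

def mzip_pmf(y: int, nu: float, psi: float) -> float:
    """P(Y = y) for an MZIP variable with marginal mean nu and
    excess-zero probability psi (0 <= psi < 1)."""
    mu = nu / (1.0 - psi)  # Poisson mean of the latent at-risk class
    # Poisson(mu) probability, computed on the log scale for stability
    pois = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    if y == 0:
        return psi + (1.0 - psi) * pois
    return (1.0 - psi) * pois

nu, psi = 2.0, 0.4
probs = [mzip_pmf(y, nu, psi) for y in range(200)]  # tail beyond 199 is negligible
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
print(round(total, 10), round(mean, 10))  # prints: 1.0 2.0
```

The mean equals \((1-\psi_{i})\mu_{i}=\nu_{i}\). This identity lets MZIP place the log link directly on \(\nu_{i}\).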
In the MZIP model, Long et al. [16] link the regression parameters directly to the marginal mean \(\nu_{i}\), while another set of parameters models the probability of being an excess zero (i.e., \(\psi_{i}\)). The parameters \(\nu_{i}\) and \(\psi_{i}\) of the MZIP model are modeled by
\[
\textrm{logit}(\psi_{i})=\mathbf{Z}_{i}^{T}\gamma\quad\textrm{and}\quad\log(\nu_{i})=\mathbf{X}_{i}^{T}\alpha,\tag{2.2}
\]
where \(\gamma=(\gamma_{1},\gamma_{2},\ldots,\gamma_{q})^{T}\) is a \((q\times 1)\) vector of regression parameters for \(\psi_{i}\), with the same interpretation as in the ZIP model. \(\alpha=(\alpha_{1},\alpha_{2},\ldots,\alpha_{p})^{T}\) is a \((p\times 1)\) vector of regression parameters for \(\nu_{i}\), interpreted as log-incidence density ratios (IDR) for the entire population. \(\mathbf{X}_{i}\) \((p\times 1)\) and \(\mathbf{Z}_{i}\) \((q\times 1)\) are the covariate vectors of the \(i\)th individual. Let \(\theta=(\gamma^{T},\alpha^{T})^{T}\). Suppose that we observe \(n\) independent copies \((Y_{1},\mathbf{X}_{1},\mathbf{Z}_{1}),(Y_{2},\mathbf{X}_{2},\mathbf{Z}_{2}),\ldots,(Y_{n},\mathbf{X}_{n},\mathbf{Z}_{n})\) of \((Y,\mathbf{X},\mathbf{Z})\). Then the log-likelihood of \(\theta\) is
\[
l_{n}(\theta)=\sum_{i=1}^{n}l_{i}(\theta),\qquad l_{i}(\theta)=J_{i}\log\left\{\psi_{i}+(1-\psi_{i})e^{-\mu_{i}}\right\}+(1-J_{i})\left\{\log(1-\psi_{i})-\mu_{i}+Y_{i}\log\mu_{i}-\log(Y_{i}!)\right\},
\]
where \(J_{i}=1_{\{Y_{i}=0\}}\). The maximum likelihood estimator \(\hat{\theta}_{F,n}=(\hat{\gamma}^{T}_{n},\hat{\alpha}^{T}_{n})^{T}\) of \(\theta\) is the solution of the equation \(U_{F,n}(\theta)=0\), with
where
and
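The score of the MZIP model can be checked against the log-likelihood numerically. The sketch below implements the log-likelihood under the logit link for \(\psi_{i}\) and the log link for \(\nu_{i}\). It also implements the analytic score obtained by differentiating \(l_{i}(\theta)\) with respect to \(\gamma\) and \(\alpha\), and compares that score with central finite differences on simulated data. The helper names (`mzip_parts`, `loglik`, `score`), the sample size and the parameter values are hypothetical choices for illustration. The function returns the unscaled sum \(\sum_{i}\dot{l}_{i}(\theta)\). Multiplying it by \(n^{-1/2}\) gives the normalization used for \(U_{F,n}\) in the Appendix.

```python
import numpy as np

def mzip_parts(theta, X, Z):
    """Return (psi, nu, mu) under logit(psi) = Z gamma and log(nu) = X alpha."""
    q = Z.shape[1]
    psi = 1.0 / (1.0 + np.exp(-(Z @ theta[:q])))
    nu = np.exp(X @ theta[q:])
    return psi, nu, nu / (1.0 - psi)

def loglik(theta, y, X, Z):
    """MZIP log-likelihood, dropping the constant -sum(log(Y_i!))."""
    psi, nu, mu = mzip_parts(theta, X, Z)
    p0 = psi + (1.0 - psi) * np.exp(-mu)
    ll = np.where(y == 0, np.log(p0), np.log1p(-psi) - mu + y * np.log(mu))
    return ll.sum()

def score(theta, y, X, Z):
    """Unscaled score: (sum_i Z_i B_i, sum_i X_i A_i), ordered as (gamma, alpha)."""
    psi, nu, mu = mzip_parts(theta, X, Z)
    J = (y == 0).astype(float)
    em = np.exp(-mu)
    p0 = psi + (1.0 - psi) * em
    A = (1 - J) * (y - mu) - J * nu * em / p0
    B = (1 - J) * psi * (y - 1 - mu) + J * psi * ((1 - psi) * (1 - em) - nu * em) / p0
    return np.concatenate([Z.T @ B, X.T @ A])

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = np.column_stack([np.ones(n), rng.normal(size=n)])
theta = np.array([-0.5, 0.3, 0.8, -0.2])  # hypothetical (gamma_1, gamma_2, alpha_1, alpha_2)
psi, nu, mu = mzip_parts(theta, X, Z)
y = np.where(rng.random(n) < psi, 0, rng.poisson(mu))  # excess zero w.p. psi

# Central finite differences of the log-likelihood versus the analytic score
eps = 1e-6
num = np.array([(loglik(theta + eps * e, y, X, Z) - loglik(theta - eps * e, y, X, Z)) / (2 * eps)
                for e in np.eye(len(theta))])
err = np.max(np.abs(num - score(theta, y, X, Z)))
print(err)
```

A general-purpose optimizer, such as the maxLik package used in Section 4, can maximize `loglik` with `score` as the gradient.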
3 ESTIMATING PARAMETERS WITH MISSING COVARIATES
Let \(\mathbf{X}\) and \(\mathbf{Z}\) be covariate vectors that may contain missing values, while \(Y\) is always observed. Let \(\Delta_{i}\) be an indicator equal to \(1\) when \(\{\mathbf{Z}_{i},\mathbf{X}_{i}\}\) is completely observed and \(0\) otherwise; see Rubin [12] for details on missing-data mechanisms. We consider mixed covariates (continuous, discrete and categorical). Let \(\mathbf{V}=(Y,\mathbf{S}^{D},\mathbf{S}^{C})^{T}\). Here \(\mathbf{S}^{D}=(\mathbf{X}^{D(\textrm{obs}),T},\mathbf{Z}^{D(\textrm{obs}),T})\) denotes the vector of discrete variables that are always observed on each individual, and \(\mathbf{S}^{C}=(\mathbf{X}^{C(\textrm{obs}),T},\mathbf{Z}^{C(\textrm{obs}),T})\) denotes the vector of continuous variables that are always observed on each individual. \(\{\mathbf{X}^{(\textrm{miss}),T},\mathbf{Z}^{(\textrm{miss}),T}\}\) are the missing components of \(\{\mathbf{X},\mathbf{Z}\}\). Under the MAR mechanism, the selection probability is defined by
\[
\pi(\mathbf{V}_{i})=\mathbb{P}(\Delta_{i}=1\mid Y_{i},\mathbf{X}_{i},\mathbf{Z}_{i})=\mathbb{P}(\Delta_{i}=1\mid\mathbf{V}_{i}).
\]
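A small simulation sketch in Python makes the MAR mechanism concrete. The indicator \(\Delta_{i}\) is drawn from a probability that depends only on \(Y_{i}\) and on an always-observed covariate, never on the covariate that goes missing. The logistic form of \(\pi\), its coefficients and the function name `mar_missingness` are hypothetical choices for illustration, not the selection model used in the paper.

```python
import numpy as np

def mar_missingness(y, s_obs, rng, coef=(1.5, -0.3, 0.5)):
    """Hypothetical MAR selection model:
    pi(y, s) = P(Delta = 1 | Y = y, S = s) = logistic(c0 + c1*min(y, 5) + c2*s).
    It uses only quantities that are always observed."""
    c0, c1, c2 = coef
    pi = 1.0 / (1.0 + np.exp(-(c0 + c1 * np.minimum(y, 5) + c2 * s_obs)))
    delta = (rng.random(len(y)) < pi).astype(int)  # 1 = complete case
    return delta, pi

rng = np.random.default_rng(1)
n = 10_000
s_obs = rng.binomial(1, 0.5, n)   # always-observed binary covariate
y = rng.poisson(2.0, n)           # always-observed count response
delta, pi = mar_missingness(y, s_obs, rng)
afmd = 1 - delta.mean()           # empirical fraction of incomplete individuals
print(round(afmd, 3), round(1 - pi.mean(), 3))
```

Because \(\pi\) depends on \(Y\), complete cases are not a random subsample. Inverse weighting by \(\pi\) corrects for this.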
3.1 Kernel-Based Weighting Estimator of a MZIP Model
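Let \(\mathbf{D}=(\mathbf{X}^{(\textrm{obs}),T},\mathbf{Z}^{(\textrm{obs}),T})\) and let \(d\in\{d_{1},d_{2},\ldots,d_{m}\}\) denote the distinct values of \(\mathbf{D}\). We consider a Nadaraya–Watson (N-W) [22, 24] type estimator \(\hat{\pi}(y,d)\) of \(\pi(y,d)\), defined by
\[
\hat{\pi}(y,d)=\frac{\sum_{i=1}^{n}\Delta_{i}\,1_{\{Y_{i}=y\}}K_{h}(\mathbf{D}_{i}-d)}{\sum_{i=1}^{n}1_{\{Y_{i}=y\}}K_{h}(\mathbf{D}_{i}-d)},
\]
where \(K_{h}\) is a kernel function and \(h\) is a bandwidth satisfying the conditions stated in Wang and Wang [23]. The resulting semiparametric kernel-assisted weighting (SIPWK) estimator \(\hat{\theta}_{n}^{wsk}\) of \(\theta\) in models (2.1) and (2.2) is the solution of the equation
\[
\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\Delta_{i}}{\hat{\pi}(Y_{i},\mathbf{D}_{i})}\dot{l}_{i}(\theta)=0.\tag{3.1}
\]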
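The kernel-weighted estimate of the selection probability can be sketched as follows. The sketch uses a product kernel, as described for the simulation study: a Dirac (exact-match) kernel on \(Y\) and on the discrete components of \(\mathbf{D}\), and a Gaussian kernel with bandwidth `h` on the continuous components. The function name, the pairwise \(O(n^{2})\) implementation and the bandwidth are illustrative choices, not the authors' code.

```python
import numpy as np

def nw_selection_prob(y, d_disc, d_cont, delta, h):
    """Nadaraya-Watson estimate of pi(Y_i, D_i) for every individual i.

    y      : (n,) counts, matched exactly (Dirac kernel)
    d_disc : (n, kd) discrete covariates, matched exactly (Dirac kernel)
    d_cont : (n, kc) continuous covariates, Gaussian kernel with bandwidth h
    delta  : (n,) complete-case indicators
    """
    same = (y[:, None] == y[None, :])
    same &= np.all(d_disc[:, None, :] == d_disc[None, :, :], axis=2)
    u = (d_cont[:, None, :] - d_cont[None, :, :]) / h
    k = np.exp(-0.5 * np.sum(u ** 2, axis=2))  # normalizing constant cancels
    w = same * k                               # w[i, j] = kernel weight of j at point i
    return (w @ delta) / w.sum(axis=1)         # diagonal weight 1 keeps denominators > 0

rng = np.random.default_rng(3)
n = 300
y = rng.poisson(1.0, n)
d_disc = rng.binomial(1, 0.5, (n, 1))
d_cont = rng.normal(size=(n, 1))
delta = rng.binomial(1, 0.7, n).astype(float)
pi_hat = nw_selection_prob(y, d_disc, d_cont, delta, h=0.5)
print(pi_hat[:5])
```

With a very small bandwidth, each estimate collapses to the individual's own \(\Delta_{i}\). This is one reason the bandwidth conditions matter.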
In the following subsection, we present another weighted semiparametric estimator of the MZIP regression model.
3.2 Semiparametric IPW (SIPW) Estimator of a MZIP Model
Recall that \(\mathbf{S}^{C}=(\mathbf{X}^{C(\textrm{obs}),T},\mathbf{Z}^{C(\textrm{obs}),T})\) is the set of observed continuous covariates. Following Jenks' method [26], we discretize these covariates, and we choose the number of classes with Sturges' rule [25]. Jenks' method is based on a similarity principle: it places the class boundaries so that the within-class variance is minimized. This yields new categorical covariates \(\mathbf{S}^{\prime,D}\).
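The discretization step can be illustrated with a minimal Python sketch. The sketch assumes that Jenks' natural breaks are computed exactly by the Fisher–Jenks dynamic program, which minimizes the total within-class sum of squared deviations. It also assumes that the number of classes follows Sturges' rule \(k=\lceil 1+\log_{2}n\rceil\). The helper names are hypothetical, and the \(O(kn^{2})\) loop is meant for illustration rather than speed.

```python
import math
import numpy as np

def sturges_classes(n):
    """Sturges' rule for the number of classes."""
    return math.ceil(1 + math.log2(n))

def jenks_breaks(x, k):
    """Lower bounds of the k classes of the exact 1-D partition of x that
    minimizes the total within-class sum of squared deviations."""
    v = np.sort(np.asarray(x, dtype=float))
    n = len(v)
    c1 = np.concatenate([[0.0], np.cumsum(v)])
    c2 = np.concatenate([[0.0], np.cumsum(v ** 2)])

    def ssd(i, j):  # sum of squared deviations of v[i:j]
        s, s2, m = c1[j] - c1[i], c2[j] - c2[i], j - i
        return s2 - s * s / m

    cost = np.full((k + 1, n + 1), np.inf)
    arg = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):          # first j sorted values split into c classes
            for i in range(c - 1, j):      # the last class is v[i:j]
                val = cost[c - 1, i] + ssd(i, j)
                if val < cost[c, j]:
                    cost[c, j], arg[c, j] = val, i
    bounds, j = [], n
    for c in range(k, 0, -1):              # backtrack the class start indices
        i = arg[c, j]
        bounds.append(v[i])
        j = i
    return sorted(bounds)

x = np.array([1.0, 1.1, 0.9, 5.0, 5.2, 4.8, 10.0, 10.1, 9.9])
b = jenks_breaks(x, 3)
labels = np.searchsorted(b, x, side="right") - 1  # new categorical covariate
print(b, labels.tolist(), sturges_classes(100))
```

The resulting class labels play the role of the components of \(\mathbf{S}^{\prime,D}\).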
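Let \(s_{1}^{D},s_{2}^{D},\ldots,s_{m}^{D}\) denote the distinct values of the \(\mathbf{S}_{i}^{D}\)s and \(s_{1}^{\prime,D},s_{2}^{\prime,D},\ldots,s_{m}^{\prime,D}\) the distinct values of the \(\mathbf{S}_{i}^{\prime,D}\)s. The nonparametric estimator of \(\pi(y,s^{D},s^{\prime,D})\) is
\[
\hat{\pi}(y,s^{D},s^{\prime,D})=\frac{\sum_{i=1}^{n}\Delta_{i}\,1_{\{Y_{i}=y,\,\mathbf{S}_{i}^{D}=s^{D},\,\mathbf{S}_{i}^{\prime,D}=s^{\prime,D}\}}}{\sum_{i=1}^{n}1_{\{Y_{i}=y,\,\mathbf{S}_{i}^{D}=s^{D},\,\mathbf{S}_{i}^{\prime,D}=s^{\prime,D}\}}},
\]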
where \(y=0,1,2,\ldots\), \(s^{D}\in\{s_{1}^{D},s_{2}^{D},\ldots,s_{m}^{D}\}\) and \(s^{\prime,D}\in\{s_{1}^{\prime,D},s_{2}^{\prime,D},\ldots,s_{m}^{\prime,D}\}\).
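All the variables that enter \(\hat{\pi}\) are now categorical. The estimator therefore reduces to the proportion of complete cases within each cell, where a cell is defined by a distinct value of \((Y,\mathbf{S}^{D},\mathbf{S}^{\prime,D})\). A minimal sketch follows; the function name is hypothetical.

```python
import numpy as np
from collections import defaultdict

def cell_selection_prob(y, s_disc, delta):
    """pi_hat for each individual: (# complete cases in the individual's cell)
    / (# individuals in that cell), with cells given by the distinct values of
    (Y, S^D, S'^D)."""
    keys = [tuple(row) for row in np.column_stack([y, s_disc])]
    n_cell = defaultdict(int)
    n_complete = defaultdict(int)
    for key, d in zip(keys, delta):
        n_cell[key] += 1
        n_complete[key] += int(d)
    return np.array([n_complete[k] / n_cell[k] for k in keys])

# Toy example: cells (0, 0), (0, 1) and (1, 0)
y = np.array([0, 0, 0, 1, 1])
s_disc = np.array([[0], [0], [1], [0], [0]])
delta = np.array([1, 0, 1, 1, 1])
pi_toy = cell_selection_prob(y, s_disc, delta)
print(pi_toy)
```

Condition (H2) below guarantees that each cell has a positive selection probability. Empty-of-complete-case cells then become rare as \(n\) grows.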
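Thus, in this context, the SIPW estimator \(\hat{\theta}_{n}^{ws}\) of \(\theta\) in models (2.1) and (2.2) is the solution of the equation
\[
U_{ws,n}(\theta,\hat{\pi})=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\Delta_{i}}{\hat{\pi}(Y_{i},\mathbf{S}_{i}^{D},\mathbf{S}_{i}^{\prime,D})}\dot{l}_{i}(\theta)=0.\tag{3.2}
\]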
We study the asymptotic properties of \(\hat{\theta}_{n}^{F}\) and \(\hat{\theta}_{n}^{ws}\) in the following section.
3.3 Asymptotic Results
To establish the asymptotic properties of \(\hat{\theta}^{F}_{n}\) and \(\hat{\theta}_{n}^{ws}\), we assume the following regularity conditions.
-
\(\mathbf{H1.}\) The true parameter value \(\theta_{0}:=(\gamma_{0}^{T},\alpha_{0}^{T})^{T}\) lies in the interior of some known compact set of \(\mathbb{R}^{q}\times\mathbb{R}^{p}\).
-
\(\mathbf{H2.}\) Let \(\textrm{supp}(\mathbf{S}^{D})\) and \(\textrm{supp}(\mathbf{S}^{\prime,D})\) denote the supports of \(\mathbf{S}^{D}\) and \(\mathbf{S}^{\prime,D}\), respectively, and assume that they do not depend on \(\theta\). Furthermore, for any \(y=0,1,\ldots\), \(s^{D}\in\textrm{supp}(\mathbf{S}^{D})\) and \(s^{\prime,D}\in\textrm{supp}(\mathbf{S}^{\prime,D})\), the selection probability satisfies \(\pi(y,s^{D},s^{\prime,D})>0\).
-
\(\mathbf{H3.}\) \(\mathbb{E}\left[\frac{\dot{l_{i}}(\theta)\dot{l_{i}}(\theta)^{T}}{\pi(\mathbf{V}_{i})}\right]\) is finite and positive definite in a neighborhood of the true \(\theta\).
-
\(\mathbf{H4.}\) In a neighborhood of the true \(\theta\), the first and second derivatives of \(U_{F,n}(\theta)\) with respect to \(\theta\) exist almost surely and are uniformly bounded above by a function of \((Y,\mathbf{X},\mathbf{Z})\) whose expectation exists.
-
\(\mathbf{H5.}\) The first derivatives of \(U_{w,n}(\theta,\pi)\) with respect to \(\theta\) exist almost surely in a neighborhood of \(\theta_{0}\). Additionally, in such a neighborhood, these first derivatives are uniformly bounded above by a function of \((Y,\mathbf{X},\mathbf{Z})\), whose expectations exist.
The asymptotic properties of \(\hat{\theta}^{F}_{n}\) and \(\hat{\theta}^{ws}_{n}\) are stated in Theorems 1 and 2, respectively. Theorem 1 is proved in Appendix A and Theorem 2 in Appendix B.
Before studying the asymptotic properties of the estimators, we define
\[
\Sigma_{n}(\theta)=-\frac{1}{n}\sum_{i=1}^{n}\frac{\partial^{2}l_{i}(\theta)}{\partial\theta\partial\theta^{T}}.
\]
Because each component of \(\Sigma_{n}(\theta)\) is a mean of independent and identically distributed random variables, we have \(\mathbb{E}\left[\Sigma_{n}(\theta)\right]=\mathbb{E}\left[-\frac{\partial^{2}l_{1}(\theta)}{\partial\theta\partial\theta^{T}}\right]=\Sigma(\theta).\)
Theorem 1. Assume that conditions (H1), (H2), and (H4) hold. Then \(\hat{\theta}^{F}_{n}\) converges in probability to \(\theta_{0}\), as \(n\rightarrow\infty\) and \(\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})\) has an asymptotic normal distribution with mean zero and covariance matrix \(\Delta_{F}\), with \(\Delta_{F}:=\Sigma(\theta_{0})^{-1}Q_{F}(\theta_{0})[\Sigma(\theta_{0})^{-1}]^{T}\), where \(Q_{F}(\theta)=\mathbb{E}\left[\dot{l_{1}}(\theta)\dot{l_{1}}(\theta)^{T}\right]\).
By the information matrix equality, the Fisher information matrix equals the variance of the score, so \(\Sigma(\theta_{0})=Q_{F}(\theta_{0})\). Hence \(\Delta_{F}=\Sigma(\theta_{0})^{-1}\).
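The identity \(\Sigma(\theta_{0})=Q_{F}(\theta_{0})\) can be checked numerically. The sketch below uses an intercept-only MZIP model (\(\mathbf{Z}_{i}=\mathbf{X}_{i}=1\), so \(\theta=(\gamma,\alpha)\)) and sums over the support of \(Y\) instead of simulating. It approximates the Hessian by central differences of the analytic score. The simplification to a single observation type and the parameter values are assumptions made for illustration.

```python
import math
import numpy as np

def parts(theta):
    gamma, alpha = theta
    psi = 1.0 / (1.0 + math.exp(-gamma))
    nu = math.exp(alpha)
    return psi, nu, nu / (1.0 - psi)

def pmf(theta, y):
    psi, nu, mu = parts(theta)
    pois = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    return psi + (1 - psi) * pois if y == 0 else (1 - psi) * pois

def score(theta, y):
    """Score (B, A) of one observation with Z = X = 1."""
    psi, nu, mu = parts(theta)
    if y > 0:
        return np.array([psi * (y - 1 - mu), y - mu])
    em = math.exp(-mu)
    p0 = psi + (1 - psi) * em
    return np.array([psi * ((1 - psi) * (1 - em) - nu * em) / p0, -nu * em / p0])

theta0 = np.array([-0.3, 0.7])
eps = 1e-5
Q = np.zeros((2, 2))      # E[score score^T]
Sigma = np.zeros((2, 2))  # E[-Hessian]
for y in range(150):      # the remaining tail mass is negligible here
    p = pmf(theta0, y)
    s = score(theta0, y)
    H = np.column_stack([(score(theta0 + eps * e, y) - score(theta0 - eps * e, y)) / (2 * eps)
                         for e in np.eye(2)])
    Q += p * np.outer(s, s)
    Sigma -= p * H
print(np.round(Q, 6))
print(np.round(Sigma, 6))
```

The two matrices agree up to finite-difference error, which is why \(\Delta_{F}\) reduces to \(\Sigma(\theta_{0})^{-1}\).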
Theorem 2. Assume that conditions (H1)–(H5) hold. Then \(\hat{\theta}^{ws}_{n}\) converges in probability to \(\theta_{0}\) as \(n\rightarrow\infty\), and \(\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})\) has an asymptotic normal distribution with mean zero and covariance matrix \(\Delta_{ws}:=\Sigma(\theta_{0})^{-1}\{\Omega_{3}(\theta_{0},\pi)-\left[\Omega_{4}(\theta_{0},\pi)-\Omega_{5}(\theta_{0},\pi)\right]\}[\Sigma(\theta_{0})^{-1}]^{T}\), where \(\Omega_{3}(\theta_{0},\pi)=\mathbb{E}\left[\frac{\dot{l}_{i}(\theta_{0})\dot{l}_{i}(\theta_{0})^{T}}{\pi(Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})}\right]\), \(\Omega_{4}(\theta_{0},\pi)=\mathbb{E}\left[\frac{\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}}{\pi(Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})}\right]\), \(\Omega_{5}(\theta_{0},\pi)=\mathbb{E}\left[\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]\), and \(\dot{l}_{i}^{*}(\theta_{0})=\mathbb{E}\left[\dot{l}_{i}(\theta_{0})|Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i}\right]\).
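A plug-in version of \(\Delta_{ws}\) can be sketched as follows. The estimators are our assumptions, not the authors' prescriptions. \(\dot{l}^{*}_{i}\) is estimated by the complete-case average of the scores within the cell of \((Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})\). \(\Omega_{3}\) is estimated by weighting outer products with \(\Delta_{i}/\hat{\pi}_{i}^{2}\), since \(\mathbb{E}[\Delta_{i}\pi_{i}^{-2}\dot{l}_{i}\dot{l}_{i}^{T}]=\mathbb{E}[\dot{l}_{i}\dot{l}_{i}^{T}/\pi_{i}]\). An estimate of \(\Sigma(\theta_{0})\) is supplied by the user. The function name is hypothetical.

```python
import numpy as np

def sipw_sandwich(S, delta, pi_hat, cell, sigma):
    """Plug-in estimate of Delta_ws = Sigma^{-1} (Omega3 - Omega4 + Omega5) Sigma^{-T}.

    S      : (n, d) individual scores at theta_hat; rows with delta = 0 are ignored
    delta  : (n,) complete-case indicators
    pi_hat : (n,) estimated selection probabilities (assumed > 0, as in H2)
    cell   : (n,) labels of the cells (Y_i, S_i^D, S_i'^D), each with >= 1 complete case
    sigma  : (d, d) estimate of Sigma(theta_0)
    """
    n, d = S.shape
    delta = np.asarray(delta, dtype=float)
    S = np.where(delta[:, None] == 1, S, 0.0)
    _, inv = np.unique(cell, return_inverse=True)
    m = inv.max() + 1
    sums = np.zeros((m, d))
    np.add.at(sums, inv, S)                                 # complete-case score sums per cell
    counts = np.bincount(inv, weights=delta, minlength=m)   # complete cases per cell
    l_star = (sums / counts[:, None])[inv]                  # estimate of E[l_i | V_i]
    omega3 = (S * (delta / pi_hat ** 2)[:, None]).T @ S / n
    omega4 = (l_star / pi_hat[:, None]).T @ l_star / n
    omega5 = l_star.T @ l_star / n
    sig_inv = np.linalg.inv(sigma)
    return sig_inv @ (omega3 - omega4 + omega5) @ sig_inv.T

# Sanity check: with no missing data, Omega4 = Omega5 and only Omega3 remains.
rng = np.random.default_rng(5)
S_demo = rng.normal(size=(50, 3))
V = sipw_sandwich(S_demo, np.ones(50), np.ones(50), rng.integers(0, 4, 50), np.eye(3))
print(np.round(V, 3))
```

An approximate covariance matrix of \(\hat{\theta}^{ws}_{n}\) is then \(\hat{\Delta}_{ws}/n\).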
4 SIMULATION STUDY
In this section, we study the performance of the following estimators under various conditions:
-
\(\hat{\theta}^{F}_{n}\), the maximum likelihood estimator, obtained by solving the equation \(U_{F,n}(\theta)=0\), where \(U_{F,n}(\theta)\) is defined in (2.3);
-
\(\hat{\theta}^{wsk}_{n}\), the SIPWK estimator, obtained by solving Eq. (3.1);
-
\(\hat{\theta}^{ws}_{n}\), the SIPW estimator, obtained by solving Eq. (3.2).
In this numerical study, we consider samples of size \(n=500\), \(1000\) and \(2000\), generated from the MZIP model
\[
\textrm{logit}(\psi_{i})=\gamma_{1}Z_{i1}+\gamma_{2}Z_{i2}+\gamma_{3}Z_{i3}+\gamma_{4}Z_{i4},\qquad\log(\nu_{i})=\alpha_{1}X_{i1}+\alpha_{2}X_{i2}+\alpha_{3}X_{i3},
\]
where \(X_{i1}=Z_{i1}=1\) and \(Z_{i2}=X_{i2}\). \(Z_{i2}\), \(Z_{i3}\), \(Z_{i4}\) and \(X_{i3}\) follow, respectively, the Gaussian distribution \(N(0,1.7)\), the Poisson distribution \(P(0.5)\), the exponential distribution \(E(1)\) and the binomial distribution \(B(1,0.5)\). The regression parameter is \(\alpha=(1.2,0.2,-0.7)^{T}\), and \(\gamma\) is chosen as follows:
-
case 1: \(\gamma=(-1,0.4,0.3,0.45)^{T}\) ,
-
case 2: \(\gamma=(-1,0.62,0.3,0.8)^{T}\).
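The data-generating design above can be sketched in Python. We assume that the second parameter of \(N(0,1.7)\) is a variance, that \(P(0.5)\) and \(E(1)\) denote a Poisson distribution with mean 0.5 and an exponential distribution with rate 1, and that an excess zero occurs with probability \(\psi_{i}\). The missing-data mechanism for \(Z_{i4}\) is not specified in the text and is omitted. The function name is hypothetical.

```python
import numpy as np

def simulate_mzip(n, gamma, alpha, rng):
    """One sample from the simulation design (assumption: 1.7 is a variance)."""
    x2 = rng.normal(0.0, np.sqrt(1.7), n)  # Z_i2 = X_i2
    z3 = rng.poisson(0.5, n)
    z4 = rng.exponential(1.0, n)           # covariate later made partially missing
    x3 = rng.binomial(1, 0.5, n)
    one = np.ones(n)
    Z = np.column_stack([one, x2, z3, z4])
    X = np.column_stack([one, x2, x3])
    psi = 1.0 / (1.0 + np.exp(-(Z @ gamma)))  # logit link for the excess-zero probability
    nu = np.exp(X @ alpha)                    # log link for the marginal mean
    mu = nu / (1.0 - psi)                     # Poisson mean of the at-risk class
    y = np.where(rng.random(n) < psi, 0, rng.poisson(mu))
    return y, X, Z, psi

rng = np.random.default_rng(7)
gamma_case1 = np.array([-1.0, 0.4, 0.3, 0.45])
alpha = np.array([1.2, 0.2, -0.7])
y, X, Z, psi = simulate_mzip(100_000, gamma_case1, alpha, rng)
print(round(psi.mean(), 3), round((y == 0).mean(), 3))
```

Under this reading of the design, the average excess-zero probability in case 1 is close to 0.41.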
In case 1 (respectively, case 2), the average percentage of zero inflation in the simulated samples is \(41\%\) (respectively, \(65\%\)). Missing data are generated in the variable \(Z_{i4}\), with an average fraction of missing data (AFMD) of \(15\%\) and \(30\%\). For the kernel-based weighting estimator of the MZIP model, we used a multiplicative kernel (the Dirac kernel for discrete variables and the Gaussian kernel for the continuous variable). For each configuration (sample size, proportion of zero inflation and proportion of missing data), we simulate \(N=1000\) samples and compute \(\hat{\theta}_{n}^{ws}\) and \(\hat{\theta}_{n}^{wsk}\). The simulations were performed in R 3.5.2, and Eqs. (2.3) and (3.1) were solved with the maxLik package (see Henningsen et al. [19]). For each estimator \(\hat{\gamma}_{j,n}\) \((j=1,\ldots,4)\) and \(\hat{\alpha}_{k,n}\) \((k=1,\ldots,3)\), we report the bias, the standard deviation (SD) and the root mean square error (RMSE). For comparison, we also report the results that would be obtained with no missing covariates. In that case, the MLE (FD estimator) is obtained by solving the score equation (2.3).

Table 1 presents the results for \(n=500\), Table 2 for \(n=1000\) and Table 3 for \(n=2000\). In each table, the top (respectively, bottom) panel corresponds to 41\(\%\) (respectively, 65\(\%\)) zero inflation, with average missing-data proportions of 15 and 30\(\%\). Tables 1–3 show that both methods perform well, as their results are close to those of the reference case (FD). The results also show that the bias and RMSE of the proposed method are generally smaller than those of the SIPWK method.
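The summary measures in Tables 1–3 can be computed from the \(N\) Monte Carlo replications as in this sketch. It assumes that SD is the standard deviation of the replicated estimates with divisor \(N\), so that \(\textrm{RMSE}^{2}=\textrm{bias}^{2}+\textrm{SD}^{2}\), and that RMSE is the square root of the mean squared error. The paper does not state which divisor is used. The function name is hypothetical.

```python
import numpy as np

def mc_summary(estimates, truth):
    """Bias, SD and RMSE per parameter; rows of `estimates` are replications."""
    est = np.asarray(estimates, dtype=float)
    truth = np.asarray(truth, dtype=float)
    bias = est.mean(axis=0) - truth
    sd = est.std(axis=0)                               # divisor N (assumption)
    rmse = np.sqrt(((est - truth) ** 2).mean(axis=0))
    return bias, sd, rmse

bias, sd, rmse = mc_summary([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
print(bias, sd, rmse)

# The decomposition RMSE^2 = bias^2 + SD^2 holds with divisor N
rng = np.random.default_rng(11)
b2, s2, r2 = mc_summary(rng.normal(0.1, 0.5, (1000, 3)), np.zeros(3))
```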
Let us now examine the performance of the proposed estimator. Tables 1–3 show that its bias, standard deviation and RMSE decrease as the sample size increases and as the proportion of individuals with missing covariates decreases. Furthermore, the bias remains reasonable even with 30\(\%\) missing data. As expected, the estimator \(\hat{\theta}^{F}_{n}\) performs better than \(\hat{\theta}^{wsk}_{n}\) and \(\hat{\theta}^{ws}_{n}\), but the FD estimator is available only in the absence of missing data.
5 APPLICATION
In this section, we apply the MZIP model to the NMES1988 data from the National Medical Expenditure Survey (NMES) conducted in 1987–1988. We analyze the variable ofnp (number of consultations with a non-physician health professional in a practice) with the MZIP model. The proportion of zeros in this variable is 0.6818, which suggests zero inflation. For each individual \(i\,(i=1,\ldots,n=4406)\) in the sample, let \(Y_{i}\) denote the number of consultations with a non-physician health professional in a practice.
-
\(\psi_{i}\) is the probability that patient \(i\) systematically refrains from consulting a non-physician health professional.
-
\(\nu_{i}\) is the mean number of consultations with a non-physician health professional for patient \(i\).
To model the marginal mean and zero-inflation parameters \(\nu_{i}\) and \(\psi_{i}\) defined in (2.2), where \(Z_{i}\) and \(X_{i}\) are the set of covariates, we proceeded as follows. First, we fitted an MZIP regression model incorporating all the covariates available in (2.2), i.e., taking \(X_{i}=Z_{i}\) for each \(i\). Next, Wald tests were used to select the relevant covariates in the sub-models (2.2). Through this procedure, we identify three significant predictors included in \(\nu_{i}\) (chronic, gender, school) and six significant predictors included in \(\psi_{i}\) (chronic, medicaid, age, income, gender, school). The significant covariates are gender (1 for female, 0 for male), age (in years, divided by 10), school (number of years of education), income (in 10 000 dollars), chronic diseases (cancer, arthritis, diabetes…), and medicaid (a binary variable indicating whether the individual is covered by medicaid or not). The covariate age (in years, divided by 10) was discretized before applying the proposed method. We therefore model \(\psi_{i}\) and \(\nu_{i}\) as follows:
We simulated moderate (\(15\%\)) and high (\(30\%\)) proportions of missing data in the ‘‘income’’ variable. Among the covariates, income is the most likely to be missing in practice, because it is sensitive and confidential information. Respondents are often reluctant to disclose their income, which can lead to higher rates of missing data for this variable. According to Mishra et al. [29], the rate of missing data for the ‘‘income’’ variable in the National Health and Nutrition Examination Survey is often high, reaching or exceeding 15\(\%\). Tables 4 and 5 compare the estimates obtained with no missing data (FD) with those obtained with 15\(\%\) and 30\(\%\) missing data, respectively. The proposed method appears robust: as the percentage of missing data increases, the covariates remain significant and the coefficients keep the same signs as in the reference case (FD). Medicaid status and gender are identified as the most influential factors in the decision never to consult a non-physician health care professional. Medicaid recipients are more likely to forgo visits to a non-physician health care professional. One possible explanation is that Medicaid is health insurance for low-income individuals, so patients covered by Medicaid may limit their consultations to those that are strictly necessary.
The probability of never consulting a non-physician health care professional decreases with chronic, income, school and age. It decreases with the level of education, because better-informed patients may tend to diversify their use of care. It also decreases as health status worsens, in part because patients whose health is worsening tend to favor visits to health professionals. Finally, it decreases with income, because patients with higher incomes prefer to visit a health care professional.
The number of chronic illnesses and the level of education are the variables that most influence the mean number of consultations with non-physician health care professionals, because patients with chronic conditions and patients with higher levels of education consult more regularly.
6 CONCLUSIONS
In this article, we have proposed a method for estimating the parameters of the MZIP model with MAR covariates, and we have compared the performance of this estimator with that of the kernel-assisted weighted estimator. The numerical results show that both the proposed estimator \(\hat{\theta}^{ws}_{n}\) and the estimator \(\hat{\theta}^{wsk}_{n}\) perform well. The simulation results also suggest that the proposed method is more efficient than the kernel-assisted weighting method. We used the proposed SIPW estimator to analyze the NMES1988 data on U.S. health care demand, and the results of this analysis support the robustness of the proposed SIPW estimator.
In this paper, we assume that the data are MAR. In many practical situations, however, the missing-data pattern is not monotone. Adapting this approach to non-monotone missing data in MZIP regression deserves further research.
REFERENCES
D. Lambert, ‘‘Zero-inflated Poisson regression with an application to defects in manufacturing,’’ Technometrics 34, 1–14 (1992).
E. Dietz and D. Böhning, ‘‘On estimation of the Poisson parameter in zero-modified Poisson models,’’ Comput. Stat. Data Anal. 34, 441–459 (2000).
K. K. W. Yau and A. H. Lee, ‘‘Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme,’’ Stat. Med. 20, 2907–2920 (2001).
A. O. Diallo, A. Diop, and J.-F. Dupuy, ‘‘Asymptotic properties of the maximum likelihood estimator in zero-inflated binomial regression,’’ Commun. Stat. Theory Methods 46 (20), 9930–9948 (2017).
Y. B. Cheung, ‘‘Zero-inflated models for regression analysis of count data: a study of growth and development,’’ Stat. Med. 21, 1461–1469 (2002).
M. Reilly and M. S. Pepe, ‘‘A mean score method for missing and auxiliary covariates data in regression methods,’’ Biometrika 82, 299–314 (1995).
A. Diallo, A. Diop, and J.-F. Dupuy, ‘‘Estimation in zero-inflated binomial regression with missing covariates,’’ Statistics 53 (4), 839–865 (2019).
T. M. Lukusa, S.-M. Lee, and C.-S. Li, ‘‘Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates,’’ Metrika 79 (4), 457–483 (2016).
T. M. Lukusa and F. K. Hing Phoa, ‘‘A note on the weighting-type estimations of the zero-inflated Poisson regression model with missing data in covariates,’’ Journal Pre-proof. (2019).
D. G. Horvitz and D. J. Thompson, ‘‘A generalization of sampling without replacement from a finite universe,’’ J. Am. Stat. Assoc. 47, 663–685 (1952).
S. H. Hsieh, S. M. Lee, and P. S. Shen, ‘‘Logistic regression analysis of randomized response data with missing covariates,’’ J. Stat. Plan. Inference 140, 927–940 (2010).
D. B. Rubin, ‘‘Inference and missing data,’’ Biometrika 63 (3), 581–592 (1976).
R. V. Foutz, ‘‘On the unique consistent solution to the likelihood equations,’’ J. Am. Stat. Assoc. 72, 147–148 (1977).
D. Böhning, E. Dietz, P. Schlattmann, L. Mendonca, and U. Kirchner, ‘‘The zero-inflated Poisson model and the decayed, missing, and filled teeth index in dental epidemiology,’’ J. R. Stat. Soc. Ser. A 162, 195–209 (1999).
S. H. Hsieh, S. M. Lee, and P. S. Shen, ‘‘Semiparametric analysis of randomized response data with missing covariates in logistic regression,’’ Comput. Stat. Data Anal. 53, 2673–2692 (2009).
D. Long, J. S. Preisser, A. H. Herring, and C. E. Golin, ‘‘A marginalized zero-inflated Poisson regression model with overall exposure effects,’’ Stat. Med. 33, 5151–5165 (2014).
J. S. Preisser, J. W. Stamm, D. L. Long, and M. E. Kincade, ‘‘Review and recommendations for zero-inflated count regression modeling of dental caries indices in epidemiological studies,’’ Caries Res. 46 (4), 413–423 (2012).
K. H. Benecha, J. S. Preisser, and K. Das, ‘‘Marginal zero-inflated models with missing covariates,’’ Biom. J. (2018).
A. Henningsen and O. Toomet, ‘‘maxLik: A package for maximum likelihood estimation in R,’’ Computational Statistics 26 (3), 443–458 (2011).
D. Wang and S. X. Chen, ‘‘Empirical likelihood for estimating equations with missing values,’’ Ann. Stat. 37, 490–517 (2009).
M. Reilly and M. S. Pepe, ‘‘A mean score method for missing and auxiliary covariates data in regression methods,’’ Biometrika 82, 299–314 (1995).
E. A. Nadaraya, ‘‘On estimating regression,’’ Theory Probab. Appl. 9, 141–142 (1964).
S. Wang and C. Y. Wang, ‘‘A note on kernel assisted estimators in missing covariate regression,’’ Stat. Probabil. Lett. 55, 439–449 (2001).
G. S. Watson, ‘‘Smooth regression analysis,’’ Sankhya, Series A 26, 359–372 (1964).
H. A. Sturges, ‘‘The choice of a class interval,’’ J. Am. Stat. Assoc. 21 (153), 65–66 (1926).
G. F. Jenks, Optimal Data Classification for Choropleth Maps (Lawrence, Kansas, 1977).
K. J. G. Kouakou, O. Hili, and J. F. Dupuy, ‘‘Estimation in the zero-inflated bivariate Poisson model, with an application to health-care utilization data,’’ Afrika Statistika 16 (2), 2767–2788 (2021).
E. Ali, M. L. Diop, and A. Diop, ‘‘Statistical inference in a zero-inflated Bell regression model,’’ Math. Methods Stat. 31, 91–104 (2022).
S. Mishra, C. L. Ogden, and M. Dimeler, ‘‘Dietary supplement use in the United States: National Health and Nutrition Examination Survey, 2017–March 2020,’’ National Health Statistics Reports (2023).
ACKNOWLEDGMENTS
The authors are grateful to the referees and the editor for their comments and suggestions, which led to significant improvements of earlier versions of this article.
Ethics declarations
The authors declare that they have no conflicts of interest.
Appendix A
PROOFS OF ASYMPTOTIC RESULTS
6.1 Proof of Theorem 1
We prove consistency of \(\hat{\theta}_{n}^{F}\) by checking the conditions of the inverse function theorem of Foutz [13]. These conditions are proved in a series of technical lemmas.
Lemma 1. As \(n\rightarrow\infty\), \(n^{-1/2}U_{F,n}(\theta_{0})\) converges in probability to 0.
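Proof of Lemma 1. Write \(n^{-1/2}U_{F,n}(\theta_{0})=n^{-1}\sum_{i=1}^{n}\dot{l_{i}}(\theta_{0})\), where, for every \(i=1,\ldots,n\),
\[
\dot{l_{i}}(\theta_{0})=\begin{pmatrix}\mathbf{Z}_{i}B_{i}(\theta_{0})\\ \mathbf{X}_{i}A_{i}(\theta_{0})\end{pmatrix}.
\]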
For \(i=1,\ldots,n\) and \(l=1,\ldots,q\);
We have
Now, we have
\(\mathbb{E}(J_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})=\mathbb{P}(Y_{i}=0|\mathbf{X}_{i},\mathbf{Z}_{i}),\) \(\mathbb{E}(Y_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})=\nu_{i}\) and \(\mathbb{E}(1-J_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})=\mathbb{P}(Y_{i}>0|\mathbf{X}_{i},\mathbf{Z}_{i}).\) It follows that \(\mathbb{E}[Z_{il}B_{i}(\theta_{0})]=0\).
Using similar arguments, we prove that, for every \(i=1,\ldots,n\) and \(j=1,\ldots,p\), \(\mathbb{E}[X_{ij}A_{i}(\theta_{0})]=0\).
Now, for every \(i=1,\ldots,n\) and \(l=1,\ldots,q\), we have
By \(\mathbf{H3}\), we have \(\mathbb{E}\left(Z_{il}^{2}B_{i}^{2}(\theta_{0})\right)<\infty\).
Using similar arguments, we prove \(\textrm{var}\left(X_{ij}A_{i}(\theta_{0})\right)<\infty\) for every \(i=1,\ldots,n\) and \(j=1,\ldots,p\).
Thus, by the weak law of large numbers, \(n^{-1/2}U_{F,n}(\theta_{0})\) converges in probability to \(0\), which concludes the proof.
Lemma 2. As \(n\rightarrow\infty\), \(n^{-1/2}\frac{\partial U_{F,n}(\theta)}{\partial\theta^{T}}\) converges in probability to a fixed function \(-\Sigma(\theta)\), uniformly in an open neighbourhood of \(\theta_{0}\).
Proof of Lemma 2: Let \(\tilde{U}_{F,n}(\theta):=n^{-1/2}\frac{\partial U_{F,n}(\theta)}{\partial\theta^{T}}\), and \(\nu_{\theta_{0}}\) be an open neighbourhood of \(\theta_{0}\). Let \(\theta\in\nu_{\theta_{0}}\).
By the weak law of large numbers and \(\mathbf{H3}\), \(\tilde{U}_{F,n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\partial^{2}l_{i}(\theta)}{\partial\theta\partial\theta^{T}}\right\}\) converges in probability to the matrix \(-\Sigma(\theta)\) as \(n\rightarrow\infty\), where \(\Sigma(\theta)=\mathbb{E}\left[-\frac{\partial^{2}l_{1}(\theta)}{\partial\theta\partial\theta^{T}}\right]\).
By condition \(\mathbf{H4}\), the convergence of \(\tilde{U}_{F,n}(\theta)\) to \(-\Sigma(\theta)\) is uniform on \(\nu_{\theta_{0}}\).
The conditions of the inverse function theorem of Foutz [13] are therefore satisfied, so \(\hat{\theta}_{n}^{F}\) converges in probability to \(\theta_{0}\).
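Now, we prove that \(\hat{\theta}_{n}^{F}\) is asymptotically Gaussian. A Taylor expansion of \(U_{F,n}(\hat{\theta}_{n}^{F})=0\) around \(\theta_{0}\) yields
\[
0=U_{F,n}(\theta_{0})+\left[n^{-1/2}\frac{\partial U_{F,n}(\theta^{*})}{\partial\theta^{T}}\right]\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0}),
\]
where \(\theta^{*}\) lies (componentwise) between \(\hat{\theta}_{n}^{F}\) and \(\theta_{0}\). Hence
\[
\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})=-\left[n^{-1/2}\frac{\partial U_{F,n}(\theta^{*})}{\partial\theta^{T}}\right]^{-1}U_{F,n}(\theta_{0}).
\]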
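Moreover, \(U_{F,n}(\theta_{0})\) is a normalized sum of i.i.d. centered vectors (see the proof of Lemma 1), with \(\textrm{var}(U_{F,n}(\theta_{0}))=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left(\dot{l_{i}}(\theta_{0})\dot{l_{i}}(\theta_{0})^{T}\right)=Q_{F}(\theta_{0})\). By the central limit theorem, it therefore converges in distribution to a centered Gaussian vector with covariance matrix \(Q_{F}(\theta_{0})\).
Finally, since \(\theta^{*}\) converges in probability to \(\theta_{0}\), Lemma 2 and Slutsky's theorem imply that \(\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})\) converges in distribution to the Gaussian vector with mean zero and variance \(\Delta_{F}\), where \(\Delta_{F}\) is defined in Theorem 1.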
Appendix B
6.2 Proof of Theorem 2
We prove consistency of \(\hat{\theta}_{n}^{ws}\) by checking the conditions of the inverse function theorem of Foutz [13]. These conditions are proved in a series of technical lemmas.
Lemma 3. As \(n\rightarrow\infty\), \(n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})\) converges in probability to \(0\).
Proof of Lemma 3. We decompose \(n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})\) as
\[
n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})=\left[n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})-n^{-1/2}U_{ws,n}(\theta_{0},\pi)\right]+n^{-1/2}U_{ws,n}(\theta_{0},\pi).\tag{6.1}
\]
Consider the first term of this decomposition.
Let \(\mathbf{S}^{\prime}_{i}=(\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})\) and let \(G_{n}(\theta_{0},\pi)=n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})-n^{-1/2}U_{ws,n}(\theta_{0},\pi)\). We have
where \(o_{p}^{*}(a_{n})\) denotes a matrix whose components are uniformly \(o_{p}(a_{n})\). By the weak law of large numbers, we have
converges in probability to \(0\) as \(n\rightarrow\infty\).
Using condition \(\mathbf{H3}\), we show that \(\dot{l_{i}}(\theta_{0})\) is finite a.s. Finally, by Slutsky's theorem,
converges in probability to \(0\) as \(n\rightarrow\infty\).
Next, consider the term \(n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))\) in decomposition (6.1).
We show that \(n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))\) converges in probability to \(0\) as \(n\rightarrow\infty\).
For every \(i=1,\ldots,n\), we have
For \(i=1,\ldots,n\) and \(l=1,\ldots,q\);
Two cases should be considered, namely: (i) \(Z_{il}\) is a component of \(Z^{\textrm{obs}}\) and (ii) \(Z_{il}\) is a component of \(Z^{\textrm{miss}}\). In case (i), we have
Given \(\mathbf{V}_{i}=(Y_{i},\mathbf{S}^{\prime}_{i})\), \(Z_{il}B_{i}(\theta_{0})\) depends on the data only through \((\mathbf{X}^{\textrm{miss}},\mathbf{Z}^{\textrm{miss}})\). Thus, by the MAR assumption, \(B_{i}(\theta_{0})\) and \(\Delta_{i}\) are conditionally independent given \(\mathbf{V}_{i}\).
In case (ii),
Given \(\mathbf{V}_{i}\), \(Z_{il}B_{i}(\theta_{0})\) is a function of \((\mathbf{X}^{\textrm{miss}},\mathbf{Z}^{\textrm{miss}})\) only. Thus, by the MAR assumption, \(Z_{il}B_{i}(\theta_{0})\) and \(\Delta_{i}\) are conditionally independent given \(\mathbf{V}_{i}\).
It follows that \(\mathbb{E}[\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})]=0\).
Using similar arguments, we prove that \(\mathbb{E}[\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}X_{ij}A_{i}(\theta_{0})]=0\).
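The MAR step above rests on an identity. Suppose \(\Delta_{i}\) is independent of the missing covariates given \(\mathbf{V}_{i}\), and \(\mathbb{P}(\Delta_{i}=1\mid\mathbf{V}_{i})=\pi(\mathbf{V}_{i})\). Then \(\mathbb{E}[\Delta_{i}g/\pi(\mathbf{V}_{i})]=\mathbb{E}[g]\) for any full-data term \(g\). With \(g=Z_{il}B_{i}(\theta_{0})\), which has mean zero, this gives the result above.

Below is a minimal Monte Carlo sketch of the identity. The toy design is our assumption, not the MZIP setting of the paper: a binary always-observed pair \(\mathbf{V}=(Y,S)\), a Gaussian covariate \(Z\), and a known \(\pi(\mathbf{V})\). The function name `mar_ipw_demo` is hypothetical.

```python
import numpy as np


def mar_ipw_demo(n=200_000, seed=0):
    """Toy check that 1/pi weighting of complete cases recovers a full-data mean under MAR."""
    rng = np.random.default_rng(seed)
    # Always-observed variables V = (Y, S): two binary variables (toy assumption).
    y = rng.integers(0, 2, size=n)
    s = rng.integers(0, 2, size=n)
    # Covariate Z that may be missing; E[Z] = 0.5 + E[Y] - E[S] = 0.5.
    z = 0.5 + y - s + rng.normal(size=n)
    # Selection probability depends on V only (MAR), never on Z itself.
    pi = np.where(y == 1, 0.8, 0.4) * np.where(s == 1, 0.9, 0.6)
    delta = rng.random(n) < pi
    full_mean = z.mean()              # infeasible benchmark that uses every Z
    naive_cc = z[delta].mean()        # complete cases, unweighted
    ipw = np.mean(delta / pi * z)     # complete cases weighted by 1/pi
    return full_mean, naive_cc, ipw


if __name__ == "__main__":
    print(mar_ipw_demo())
```

In this design, the weighted complete-case mean should sit close to \(\mathbb{E}[Z]=0.5\). The unweighted complete-case mean is biased, toward about 0.57, because selection depends on variables that are correlated with \(Z\).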
Now, for every \(i=1,\ldots,n\) and \(l=1,\ldots,q\), we have
By \(\mathbf{H3}\), we have \(\mathbb{E}\left(\frac{\Delta_{i}}{\pi_{i}^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}^{2}B_{i}^{2}(\theta_{0})\right)<\infty\).
Using similar arguments, we prove
Thus, by the weak law of large numbers, \(n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))\) converges in probability to \(0\) as \(n\rightarrow\infty\).
Finally, \(n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})\) converges in probability to \(0\), which concludes the proof.
Lemma 4. As \(n\rightarrow\infty\), \(n^{-1/2}\frac{\partial U_{ws,n}(\theta,\hat{\pi})}{\partial\theta^{T}}\) converges in probability to a fixed matrix-valued function \(-\Sigma(\theta)\), uniformly in a neighbourhood of \(\theta_{0}\).
Proof of Lemma 4. For a selection probability \(\pi\), let \(\bar{U}_{ws,n}(\theta,\pi):=n^{-1/2}\frac{\partial U_{ws,n}(\theta,\pi)}{\partial\theta^{T}}\), and let \(\ddot{l}_{i}(\theta)=\frac{\partial^{2}l_{i}(\theta)}{\partial\theta\partial\theta^{T}}\). We have
Using arguments similar to those in the proof of Lemma 3, we find that \(\bar{U}_{ws,n}(\theta,\hat{\pi})-\bar{U}_{ws,n}(\theta,\pi)\) converges in probability to \(0\). By the weak law of large numbers and \(\mathbf{H3}\),
converges in probability to the matrix \(-\Sigma(\theta)\) as \(n\rightarrow\infty\).
Condition \(\mathbf{H5}\) ensures that the convergence of \(\bar{U}_{ws,n}(\theta,\hat{\pi})\) to \(-\Sigma(\theta)\) is uniform in a neighbourhood of \(\theta_{0}\).
The conditions of the inverse function theorem of Foutz [13] are therefore satisfied, and \(\hat{\theta}_{n}^{ws}\) converges in probability to \(\theta_{0}\).
Now, we prove that \(\hat{\theta}_{n}^{ws}\) is asymptotically Gaussian.
A Taylor expansion of \(U_{ws,n}(\hat{\theta}^{ws}_{n},\hat{\pi})\) about \((\theta_{0},\hat{\pi})\) yields
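To make the estimator concrete, here is a minimal Python sketch of an SIPW-type fit for the MZIP model. Several assumptions are ours rather than the paper's:

- the parametrization \(\log\nu=X\alpha\) for the marginal mean and \(\operatorname{logit}\psi=Z\gamma\), with at-risk Poisson mean \(\mu=\nu/(1-\psi)\), following our understanding of Long et al. [16];
- a single continuous covariate in \(X\) that is missing at random;
- \(\hat{\pi}\) estimated by cell frequencies over the discrete, always-observed pair \((Y,z_{1})\), which stands in for the paper's semiparametric estimator for mixed covariates.

All function names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln


def mzip_loglik(theta, y, X, Z):
    """Per-observation MZIP log-likelihood (assumed parametrization):
    log(nu) = X @ alpha, logit(psi) = Z @ gamma, mu = nu / (1 - psi)."""
    p = X.shape[1]
    alpha, gamma = theta[:p], theta[p:]
    eta = Z @ gamma                              # logit of the excess-zero probability
    log1p_e = np.logaddexp(0.0, eta)             # log(1 + e^eta) = -log(1 - psi)
    log_mu = X @ alpha + log1p_e                 # log(nu / (1 - psi))
    mu = np.exp(log_mu)
    ll_zero = np.logaddexp(eta, -mu) - log1p_e   # log(psi + (1 - psi) e^{-mu})
    ll_pos = -log1p_e + y * log_mu - mu - gammaln(y + 1)
    return np.where(y == 0, ll_zero, ll_pos)


def cell_frequency_pi(key, delta):
    """Hypothetical estimate of P(delta = 1 | V) for discrete V encoded as an
    integer key: the fraction of complete cases within each cell."""
    _, cell = np.unique(key, return_inverse=True)
    n_cell = np.bincount(cell)
    n_obs = np.bincount(cell, weights=delta)
    return (n_obs / n_cell)[cell]


def fit_sipw(y, X, Z, delta, pi_hat):
    """Maximize sum_i (delta_i / pi_hat_i) * l_i(theta) over the complete cases."""
    cc = delta.astype(bool)                      # rows whose covariates are observed
    w = 1.0 / pi_hat[cc]                         # inverse-probability weights (pi_hat > 0 here)
    yc, Xc, Zc = y[cc], X[cc], Z[cc]

    def neg_objective(theta):
        return -np.sum(w * mzip_loglik(theta, yc, Xc, Zc)) / len(y)

    start = np.zeros(X.shape[1] + Z.shape[1])
    return minimize(neg_objective, start, method="BFGS").x


def simulate(n, rng, alpha=(0.5, 0.4), gamma=(-0.5, 1.0)):
    """Toy MZIP data in which x1 is missing at random given the observed (Y, z1)."""
    x1 = rng.normal(size=n)
    z1 = rng.integers(0, 2, size=n)
    X = np.column_stack([np.ones(n), x1])
    Z = np.column_stack([np.ones(n), z1])
    psi = 1.0 / (1.0 + np.exp(-(Z @ np.asarray(gamma))))
    mu = np.exp(X @ np.asarray(alpha)) / (1.0 - psi)
    y = np.where(rng.random(n) < psi, 0, rng.poisson(mu))
    pi = 1.0 / (1.0 + np.exp(-(0.3 + 0.8 * (y > 0) - 0.7 * z1)))
    delta = (rng.random(n) < pi).astype(int)
    X[delta == 0, 1] = np.nan                    # x1 unobserved for incomplete cases
    return y, X, Z, delta


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y, X, Z, delta = simulate(50_000, rng)
    pi_hat = cell_frequency_pi(2 * y + Z[:, 1].astype(int), delta)
    print(fit_sipw(y, X, Z, delta, pi_hat))      # compare with (0.5, 0.4, -0.5, 1.0)
```

Cell frequencies are workable here only because \((Y,z_{1})\) is discrete. With continuous components in \(\mathbf{S}^{\prime}\), a smoothed estimate of \(\pi\), such as a kernel estimate, would replace `cell_frequency_pi`.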
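The structure of the argument can be sketched as follows. Assume that \(\hat{\theta}^{ws}_{n}\) solves \(U_{ws,n}(\theta,\hat{\pi})=0\), write \(\bar{U}_{ws,n}\) as in the proof of Lemma 4, and let \(\theta^{*}\) denote an intermediate point between \(\hat{\theta}^{ws}_{n}\) and \(\theta_{0}\), taken row by row:

\[
\begin{aligned}
0&=U_{ws,n}(\hat{\theta}^{ws}_{n},\hat{\pi})=U_{ws,n}(\theta_{0},\hat{\pi})+\bar{U}_{ws,n}(\theta^{*},\hat{\pi})\,\sqrt{n}\,(\hat{\theta}^{ws}_{n}-\theta_{0}),\\
\sqrt{n}\,(\hat{\theta}^{ws}_{n}-\theta_{0})&=-\bar{U}_{ws,n}^{-1}(\theta^{*},\hat{\pi})\,U_{ws,n}(\theta_{0},\hat{\pi})\\
&=\Sigma^{-1}(\theta_{0})\,U_{ws,n}(\theta_{0},\hat{\pi})+\left[-\bar{U}_{ws,n}^{-1}(\theta^{*},\hat{\pi})-\Sigma^{-1}(\theta_{0})\right]U_{ws,n}(\theta_{0},\hat{\pi}).
\end{aligned}
\]

The first term determines the limit. The bracket tends to zero in probability by Lemma 4 and the consistency of \(\hat{\theta}^{ws}_{n}\), and uniform convergence lets \(\theta^{*}\) be replaced by \(\theta_{0}\).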
therefore
thus
A direct calculation gives
Let \(H(\theta_{0},\pi)=U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\)
where \(o_{p}(a_{n})\) denotes a column vector whose components are uniformly \(o_{p}(a_{n})\).
Let
To show that \(\mathbf{Q}_{1n}=O_{p}(1/\sqrt{n})\), it suffices, by Chebyshev’s inequality, to show that \(\mathbb{E}(\mathbf{Q}_{1n})=O(1/\sqrt{n})\) and \(\textrm{Var}(\mathbf{Q}_{1n})=O^{*}(1/n)\). Here \(O^{*}(a_{n})\) and \(O(a_{n})\) denote, respectively, a matrix and a column vector whose components are uniformly \(O(a_{n})\). It can first be shown that
and then
Thus, we have
We have
and
Therefore, \(\mathbf{Q}_{1n}=O_{p}(\frac{1}{\sqrt{n}})\). Moreover, \(\mathbf{Q}_{2n}\) can be expressed as follows:
where \(\Phi_{k}=\frac{\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})}{\pi(Y_{k},\mathbf{S}^{\prime}_{k})}\) and
We have \(\mathbb{E}\left[\Psi_{ik}(\theta_{0})|Y_{i}=Y_{k},\mathbf{S}^{\prime}_{i}=\mathbf{S}^{\prime}_{k}\right]=0\) and, hence,
Let \(\Psi_{iks}(\theta_{0})\) be the \(s\)th element of \(\Psi_{ik}(\theta_{0})\). Then, by the Cauchy–Schwarz inequality,
Because for each element of \(\Psi_{ik}(\theta_{0})\)
we can prove that \(\mathbb{E}\left[\mid\Phi_{k}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{ik}(\theta_{0})\right)\mid\right]<\infty\).
By the weak law of large numbers, \(\frac{1}{n}\sum_{k=1}^{n}\left\{\Phi_{k}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{ik}(\theta_{0})\right]\right\}=o_{p}(1)\). Hence, \(\mathbf{Q}_{2n}\) can be expressed as \(\mathbf{Q}_{2n}=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\left[\frac{\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})}{\pi(Y_{k},\mathbf{S}^{\prime}_{k})}\right]\dot{l}_{k}^{*}(\theta_{0})+o_{p}(1)\).
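The term built from \(\Phi_{k}=(\Delta_{k}-\pi_{k})/\pi_{k}\) is what separates the estimated-\(\pi\) estimator from its known-\(\pi\) counterpart. In the final step it contributes the correction \(\Omega_{4}-\Omega_{5}\) to the variance \(\Omega_{3}\).

Below is a minimal simulation sketch of why such a correction matters, in the simplest possible case. It estimates a mean \(\mathbb{E}[X]\) rather than MZIP parameters, and uses a binary always-observed \(Y\) with \(\pi\) estimated by cell frequencies. Both are toy assumptions, and the function names are hypothetical.

```python
import numpy as np


def one_replicate(rng, n=500):
    """One toy data set: binary Y always observed, X missing at random given Y."""
    y = rng.integers(0, 2, size=n)
    x = 1.0 + 2.0 * y + rng.normal(size=n)       # E[X] = 2
    pi_true = np.where(y == 1, 0.8, 0.3)         # selection depends on Y only
    delta = rng.random(n) < pi_true
    pi_hat = np.empty(n)                         # observed fraction in each Y-cell
    for cell in (0, 1):
        in_cell = y == cell
        pi_hat[in_cell] = delta[in_cell].mean()
    # IPW estimates of E[X]; only complete cases (delta True) contribute.
    est_known = np.sum(x[delta] / pi_true[delta]) / n
    est_estimated = np.sum(x[delta] / pi_hat[delta]) / n
    return est_known, est_estimated


def compare(reps=2000, seed=0):
    """Monte Carlo means and variances of the known-pi and estimated-pi estimators."""
    rng = np.random.default_rng(seed)
    draws = np.array([one_replicate(rng) for _ in range(reps)])
    return draws.mean(axis=0), draws.var(axis=0, ddof=1)


if __name__ == "__main__":
    print(compare())
```

A direct calculation for this toy design gives variances of about \(5.58/n\) with known \(\pi\) and about \(3.29/n\) with estimated \(\pi\). Both estimators are centred at \(\mathbb{E}[X]=2\), so replacing \(\pi\) by its estimate here makes the estimator noticeably less variable.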
Let \(\Sigma=\textrm{Cov}\left[U_{ws,n}(\theta_{0},\pi),U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\right]\); we then have
where the notation \(o^{*}(a_{n})\) denotes a matrix whose components are uniformly \(o(a_{n})\). Finally,
Thus, by the central limit theorem, \(U_{ws,n}(\theta_{0},\hat{\pi})\) converges in distribution to a Gaussian vector with mean zero and covariance matrix \(\Omega_{3}(\theta_{0},\pi)-\left[\Omega_{4}(\theta_{0},\pi)-\Omega_{5}(\theta_{0},\pi)\right]\). Because \(-\bar{U}_{ws,n}^{-1}(\theta_{0},\hat{\pi})-\Sigma^{-1}(\theta_{0})\) converges in probability to \(0\), Slutsky’s theorem implies that \(\left[-\bar{U}_{ws,n}^{-1}(\theta_{0},\hat{\pi})-\Sigma^{-1}(\theta_{0})\right]U_{ws,n}(\theta_{0},\hat{\pi})\) converges in probability to \(0\).
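Finally, by Lemma 4 and Slutsky’s theorem, \(\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})\) converges in distribution to a Gaussian vector with mean zero and covariance matrix
\[
\Sigma^{-1}(\theta_{0})\left[\Omega_{3}(\theta_{0},\pi)-\Omega_{4}(\theta_{0},\pi)+\Omega_{5}(\theta_{0},\pi)\right]\Sigma^{-1}(\theta_{0}).
\]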
About this article
Cite this article
Amani, K.M., Hili, O. & Kouakou, K.J. Statistical Inference in Marginalized Zero-inflated Poisson Regression Models with Missing Data in Covariates. Math. Meth. Stat. 32, 241–259 (2023). https://doi.org/10.3103/S1066530723040038