Statistical Inference in Marginalized Zero-inflated Poisson Regression Models with Missing Data in Covariates

Amani, Kouakou Mathias; Hili, Ouagnina; Kouakou, Konan Jean Geoffroy

doi:10.3103/S1066530723040038


Published: 23 December 2023

Volume 32, pages 241–259, (2023)

Mathematical Methods of Statistics


Abstract

The marginalized zero-inflated Poisson (MZIP) regression model quantifies the effects of an explanatory variable in the mixture population. In practice, however, the variables are usually only partially observed. We therefore first study the maximum likelihood estimator when all variables are observed. Then, assuming that the probability of selection is modeled using mixed covariates (continuous, discrete and categorical), we propose a semiparametric inverse-probability weighted (SIPW) method for estimating the parameters of the MZIP model with covariates missing at random (MAR). The asymptotic properties (consistency, asymptotic normality) of the proposed estimators are established under certain regularity conditions. The performance of the proposed estimators is evaluated through numerical studies, and the results of the SIPW method are compared with those obtained by the semiparametric inverse-probability weighted kernel-based (SIPWK) estimator. Finally, we apply our methodology to a dataset on health care demand in the United States.


1 INTRODUCTION

Although Poisson (or binomial) models are the most widely used tools for modeling count data, count data with zero inflation are increasingly common in fields such as economics, biomedical studies, criminology, insurance, sociology and political science. When the number of observed zeros exceeds that predicted by standard count distributions, zero-inflated (ZI) regression models offer an alternative for modeling such data. For more information on ZI regression models, see Lambert [1], Diallo et al. [4, 7], Kouakou et al. [27], and Ali et al. [28]. The zero-inflated Poisson (ZIP) distribution proposed by Lambert has gained popularity, and ZIP regression models have been used successfully in a variety of important applications; see, for example, Dietz and Böhning [2], Yau and Lee [3], and Cheung [5].

However, the ZIP distribution has two sets of regression parameters, one for the probability of being an excess zero and the other for the Poisson mean. These parameters have latent-class interpretations: the latent classes are often thought of as a not-at-risk group and an at-risk group, indicating a difference in susceptibility between the two populations. Because parameter interpretations for the entire population are often desired, Long et al. [16] introduced the marginalized zero-inflated Poisson (MZIP) regression model.

In practice, data are most often only partially observed. The basic approach in this context is the complete-case method, which removes every individual with at least one missing value. This method is simple to implement. However, when the proportion of individuals with missing data exceeds 5$\%$, it gives poor results. Two alternatives to the complete-case method are the Monte Carlo EM algorithm (MCEM) and multiple imputation (MI). The MCEM and MI methods are efficient but carry quite a high computational load. Finally, the inverse-probability weighting (IPW) method, which is often used, requires finding the right model for the selection probability; see, for example, Diallo et al. [7] and Benecha et al. [18]. To circumvent these modeling difficulties while proposing a non-numerical method, Lukusa et al. [8, 9] proposed weighted semiparametric estimators that are suitable when the selection probabilities are expressed in terms of covariates of the same nature. However, there is little work on the estimation of the MZIP model in the context of missing data. This work aims to fill this gap. In this article, we propose a semiparametric approach in which the selection probability, a function of continuous, discrete and categorical covariates, is estimated nonparametrically. This approach consists in discretizing the continuous covariates using Jenks' method to obtain categorical covariates.

The rest of the paper is organized as follows. In Section 2, we present the MZIP regression model and its maximum likelihood estimator. In Section 3, we present the SIPWK and SIPW estimation methods for the MZIP model when the covariates are missing at random (MAR), and we establish the consistency and asymptotic normality of the SIPW estimators. The performance of the presented estimators is evaluated in Section 4. As an illustration, we apply these methods to real data in Section 5. A discussion and some perspectives are presented in Section 6. The technical proofs are reported in an Appendix.

2 MARGINALIZED ZIP MODELS

The ZIP distribution is used to model the count variable of interest, namely $Y_{i}$, $i=1,\ldots,n$. With probability $1-\psi_{i}$, $Y_{i}$ is drawn from a Poisson distribution with mean $\mu_{i}$; with probability $\psi_{i}$, it is set to zero. In dental caries research, for example, the marginal mean caries count $\nu_{i}=(1-\psi_{i})\mu_{i}$ is often of more interest than the mean caries count $\mu_{i}$ of a susceptible latent group of individuals; see Preisser et al. [17].

Because entire population parameter interpretations are desired, the marginal mean $\nu_{i}$ can be modeled directly to give overall exposure effect estimates. Given that $\mu_{i}=\nu_{i}/(1-\psi_{i})$ the representation of the MZIP distribution is

$$\mathbb{P}(Y_{i}=k)=\begin{cases}\psi_{i}+(1-\psi_{i})\exp(-\nu_{i}/(1-\psi_{i})),\quad k=0\\ \displaystyle(1-\psi_{i})\frac{\exp(-\nu_{i}/(1-\psi_{i}))[\nu_{i}/(1-\psi_{i})]^{k}}{k!},\quad k>0.\end{cases}$$

(2.1)
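As an illustration, the distribution (2.1) can be evaluated numerically. The following is a minimal Python sketch and not the authors' code (the paper's computations use R); the function name `mzip_pmf` and the values $\nu=1.5$, $\psi=0.4$ are assumptions chosen for illustration. It checks two properties implied by (2.1): the probabilities sum to one, and the mean of the distribution is the marginal mean $\nu$ rather than the latent-class mean $\mu$.

```python
import math

def mzip_pmf(k, nu, psi):
    """P(Y = k) under the MZIP distribution (2.1), with marginal mean nu
    and excess-zero probability psi (hypothetical helper name)."""
    if not (0.0 <= psi < 1.0) or nu <= 0.0:
        raise ValueError("need 0 <= psi < 1 and nu > 0")
    mu = nu / (1.0 - psi)  # Poisson mean of the susceptible latent class
    if k == 0:
        return psi + (1.0 - psi) * math.exp(-mu)
    # Work on the log scale to avoid overflow in mu**k / k!
    log_pois = -mu + k * math.log(mu) - math.lgamma(k + 1)
    return (1.0 - psi) * math.exp(log_pois)

# Illustrative values (assumptions, not taken from the paper)
nu, psi = 1.5, 0.4
probs = [mzip_pmf(k, nu, psi) for k in range(200)]
total = sum(probs)                               # should be close to 1
mean = sum(k * p for k, p in enumerate(probs))   # should be close to nu
```

Truncating the support at 200 is harmless here because the Poisson tail beyond that point is negligible for these values.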

In the MZIP model, Long et al. [16] link the regression parameters directly to the marginal mean $\nu_{i}$, while employing another set of parameters to model the probability of being an excess zero (i.e., $\psi_{i}$). The parameters $\nu_{i}$ and $\psi_{i}$ of the MZIP model are modeled by

$$\textrm{logit}(\psi_{i})=\mathbf{Z}_{i}^{T}\gamma\quad\textrm{and}\quad\textrm{log}(\nu_{i})=\mathbf{X}_{i}^{T}\alpha,$$

(2.2)

where $\gamma=(\gamma_{1},\gamma_{2},\ldots,\gamma_{q})^{T}$ is a $(q\times 1)$ vector of regression parameters with the same interpretation as in the ZIP model, $\alpha=(\alpha_{1},\alpha_{2},\ldots,\alpha_{p})^{T}$ is a $(p\times 1)$ vector of regression parameters for $\nu_{i}$, interpreted as log-incidence density ratios (IDR) for the entire sample population, and $\mathbf{X}_{i}$ (of dimension $p\times 1$) and $\mathbf{Z}_{i}$ (of dimension $q\times 1$) denote the vectors of covariates for the $i$th individual. Let $\theta=(\gamma^{T},\alpha^{T})^{T}$. Suppose that we observe a sample of $n$ independent copies $(Y_{1},\mathbf{X}_{1},\mathbf{Z}_{1})$, $(Y_{2},\mathbf{X}_{2},\mathbf{Z}_{2}),\ldots,(Y_{n},\mathbf{X}_{n},\mathbf{Z}_{n})$ of $(Y,\mathbf{X},\mathbf{Z})$. Then the log-likelihood of $\theta$ is

$$l_{n}(\theta)=\sum_{i=1}^{n}-\textrm{log}(1+e^{\mathbf{Z}^{T}_{i}\gamma})+J_{i}\textrm{log}\left(e^{\mathbf{Z}^{T}_{i}\gamma}+e^{-(1+\textrm{exp}(\mathbf{Z}^{T}_{i}\gamma))\textrm{exp}(\mathbf{X}^{T}_{i}\alpha)}\right)$$

$${}+\sum_{i=1}^{n}(1-J_{i})\left(-(1+e^{\mathbf{Z}^{T}_{i}\gamma})e^{\mathbf{X}^{T}_{i}\alpha}+Y_{i}\textrm{log}(1+e^{\mathbf{Z}^{T}_{i}\gamma})+\mathbf{X}^{T}_{i}\alpha Y_{i}-\textrm{log}(Y_{i}!)\right),$$

where $J_{i}=1_{\{Y_{i}=0\}}$. The maximum likelihood estimator $\hat{\theta}^{F}_{n}=(\hat{\gamma}^{T}_{n},\hat{\alpha}^{T}_{n})^{T}$ of $\theta$ is the solution of the equation $U_{F,n}(\theta)=0$, with

$$U_{F,n}(\theta)=\frac{1}{\sqrt{n}}\frac{\partial l_{n}(\theta)}{\partial\theta}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\partial l_{i}(\theta)}{\partial\theta}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\dot{l_{i}}(\theta)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(\mathbf{Z}_{i}B_{i}(\theta),\mathbf{X}_{i}A_{i}(\theta))^{T},$$

(2.3)

where

$$A_{i}(\theta)=(Y_{i}-e^{\mathbf{X}^{T}_{i}\alpha}(1+e^{\mathbf{Z}^{T}_{i}\gamma}))(1-J_{i})-\frac{e^{\mathbf{X}^{T}_{i}\alpha}(1+e^{\mathbf{Z}^{T}_{i}\gamma})J_{i}}{e^{\mathbf{Z}^{T}_{i}\gamma+h_{i}(\theta)}+1},$$

$$B_{i}(\theta)=\frac{J_{i}e^{\mathbf{Z}^{T}_{i}\gamma}\left(e^{h_{i}(\theta)}-e^{\mathbf{X}^{T}_{i}\alpha}\right)}{e^{\mathbf{Z}^{T}_{i}\gamma+h_{i}(\theta)}+1}+\frac{e^{\mathbf{Z}^{T}_{i}\gamma}(Y_{i}-1)}{1+e^{\mathbf{Z}^{T}_{i}\gamma}}-(1-J_{i})e^{\mathbf{X}^{T}_{i}\alpha+\mathbf{Z}^{T}_{i}\gamma},$$

and

$$h_{i}(\theta)=(1+\textrm{exp}(\mathbf{Z}^{T}_{i}\gamma))\textrm{exp}(\mathbf{X}^{T}_{i}\alpha).$$
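The score components $A_{i}(\theta)$ and $B_{i}(\theta)$ can be checked against the log-likelihood numerically. The sketch below is an illustration in Python under stated assumptions, not the authors' implementation: the function names, the covariate values and the parameter values are all hypothetical. It codes one observation's log-likelihood contribution and its score $(\mathbf{Z}_{i}B_{i}(\theta),\mathbf{X}_{i}A_{i}(\theta))$, then compares the score with a central finite-difference gradient for one zero count and one positive count.

```python
import math

def _linear(v, coef):
    return sum(a * b for a, b in zip(v, coef))

def loglik_i(y, x, z, gamma, alpha):
    """Log-likelihood contribution l_i(theta) of one observation."""
    zg, xa = _linear(z, gamma), _linear(x, alpha)
    h = (1.0 + math.exp(zg)) * math.exp(xa)      # h_i(theta)
    log1pe = math.log1p(math.exp(zg))            # log(1 + exp(Z'gamma))
    if y == 0:                                   # J_i = 1
        return -log1pe + math.log(math.exp(zg) + math.exp(-h))
    return -log1pe - h + y * log1pe + xa * y - math.lgamma(y + 1)

def score_i(y, x, z, gamma, alpha):
    """Score (Z_i B_i, X_i A_i), gamma block first as in theta."""
    zg, xa = _linear(z, gamma), _linear(x, alpha)
    ezg, exa = math.exp(zg), math.exp(xa)
    h = (1.0 + ezg) * exa
    j = 1.0 if y == 0 else 0.0
    denom = math.exp(zg + h) + 1.0
    a_i = (y - h) * (1.0 - j) - h * j / denom
    b_i = (j * ezg * (math.exp(h) - exa) / denom
           + ezg * (y - 1.0) / (1.0 + ezg)
           - (1.0 - j) * exa * ezg)
    return [zk * b_i for zk in z] + [xk * a_i for xk in x]

def numeric_gradient(y, x, z, gamma, alpha, eps=1e-6):
    """Central finite differences of loglik_i with respect to theta."""
    theta, q = list(gamma) + list(alpha), len(gamma)
    grad = []
    for k in range(len(theta)):
        up, dn = theta[:], theta[:]
        up[k] += eps
        dn[k] -= eps
        grad.append((loglik_i(y, x, z, up[:q], up[q:])
                     - loglik_i(y, x, z, dn[:q], dn[q:])) / (2 * eps))
    return grad

# Hypothetical covariates and parameters, for illustration only
z, x = [1.0, 0.5], [1.0, -0.3]
gamma, alpha = [-0.5, 0.4], [0.3, 0.2]
max_gap = max(
    abs(a - b)
    for y in (0, 3)
    for a, b in zip(score_i(y, x, z, gamma, alpha),
                    numeric_gradient(y, x, z, gamma, alpha))
)
```

A small `max_gap` indicates that the closed-form expressions agree with the derivative of the log-likelihood at the chosen point. Note that `math.exp(h)` can overflow for large $h_{i}(\theta)$; a production implementation would need a more careful evaluation.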

3 ESTIMATING PARAMETERS WITH MISSING COVARIATES

Let $\mathbf{X}$ and $\mathbf{Z}$ be the covariate vectors, which may contain missing data, and let $Y$ be always observed. Let $\Delta_{i}$ be an indicator variable equal to $1$ when $\{\mathbf{Z}_{i},\mathbf{X}_{i}\}$ is completely observed and $0$ otherwise; see Rubin [12] for details. We consider mixed covariates (continuous, discrete, and categorical). Let $\mathbf{V}=(Y,\mathbf{S}^{D},\mathbf{S}^{C})^{T}$, where $\mathbf{S}^{D}=(\mathbf{X}^{D(\textrm{obs}),T},\mathbf{Z}^{D(\textrm{obs}),T})$ denotes the vector of discrete variables that are always observed on each individual, $\mathbf{S}^{C}=(\mathbf{X}^{C(\textrm{obs}),T},\mathbf{Z}^{C(\textrm{obs}),T})$ denotes the vector of continuous variables that are always observed on each individual, and $\{\mathbf{X}^{(\textrm{miss}),T},\mathbf{Z}^{(\textrm{miss}),T}\}$ denotes the missing components of $\{\mathbf{X},\mathbf{Z}\}$. Under the MAR mechanism, define the selection probability

$$\pi(\mathbf{V}_{i})=\mathbb{P}(\Delta_{i}=1|Y_{i},\mathbf{X}_{i},\mathbf{Z}_{i})=\mathbb{P}(\Delta_{i}=1|\mathbf{V}_{i}).$$

3.1 Kernel-Based Weighting Estimator of a MZIP Model

Let $\mathbf{D}=(\mathbf{X}^{(\textrm{obs}),T},\mathbf{Z}^{(\textrm{obs}),T})$ and let $d\in\{d_{1},d_{2},\ldots,d_{m}\}$ range over the distinct values of $\mathbf{D}$. We consider a Nadaraya–Watson (N-W) [22, 24] type estimator $\hat{\pi}(y,d)$ of $\pi(y,d)$, defined by

$$\hat{\pi}(y,d)=\frac{\sum_{k=1}^{n}\Delta_{k}K_{h}(Y_{k}=y,\mathbf{D}_{k}-d)}{\sum_{i=1}^{n}K_{h}(Y_{i}=y,\mathbf{D}_{i}-d)},$$

where $K_{h}$ is a kernel function and $h$ is a bandwidth satisfying some conditions stated in Wang and Wang [23]. The resulting semiparametric kernel-assisted weighting (SIPWK) estimator $\hat{\theta}_{n}^{wsk}$ of $\theta$ in models (2.1) and (2.2) is the solution of the equation

$$U_{w,n}(\theta,\hat{\pi})=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\Delta_{i}}{\hat{\pi}(Y_{i},\mathbf{D}_{i})}\dot{l_{i}}(\theta)=0.$$

(3.1)
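To make the N-W type estimator concrete, the following Python sketch evaluates $\hat{\pi}(y,d)$ with a product kernel: an indicator (Dirac) kernel on $Y$ and on the discrete components of $\mathbf{D}$, and a Gaussian kernel on a single continuous component. This mirrors the kernel choice reported in Section 4, but the data layout, the function name `nw_selection_prob` and the bandwidth value are assumptions made for illustration; the bandwidth conditions of [23] are not checked.

```python
import math

def nw_selection_prob(y, d_disc, d_cont, data, h):
    """Nadaraya-Watson type estimate of P(Delta = 1 | Y = y, D = d).

    data: iterable of (delta_k, y_k, disc_k, cont_k), where disc_k is a tuple
          of always-observed discrete covariates and cont_k is one
          always-observed continuous covariate (hypothetical layout).
    h:    bandwidth of the Gaussian kernel (assumed given).
    """
    num = den = 0.0
    for delta_k, y_k, disc_k, cont_k in data:
        if y_k != y or disc_k != d_disc:
            continue  # Dirac kernel: zero weight outside the cell
        # Gaussian kernel; its normalizing constant cancels in the ratio
        w = math.exp(-0.5 * ((cont_k - d_cont) / h) ** 2)
        num += delta_k * w
        den += w
    return num / den if den > 0.0 else float("nan")

# Toy data (assumption): two individuals at the evaluation point,
# one far away in the continuous covariate, one with a different Y.
toy = [
    (1, 0, ("a",), 0.0),
    (0, 0, ("a",), 0.0),
    (1, 0, ("a",), 5.0),
    (1, 1, ("a",), 0.0),
]
p_hat = nw_selection_prob(0, ("a",), 0.0, toy, h=0.5)
```

In this toy example the distant observation receives almost no weight, so the estimate is essentially the observed fraction among the two individuals located at the evaluation point.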

In the following section, we present another weighted semiparametric estimation of a MZIP regression model.

3.2 Semiparametric IPW (SIPW) Estimator of a MZIP Model

We recall that $\mathbf{S}^{C}=(\mathbf{X}^{C(\textrm{obs}),T},\mathbf{Z}^{C(\textrm{obs}),T})$ is the set of observed continuous covariates. Inspired by Jenks' method [26], we discretize this set, choosing the optimal number of classes by Sturges' method [25]. Jenks' method is based on the similarity principle: it minimizes the intraclass variance. This yields new categorical covariates $\mathbf{S}^{\prime,D}$.
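The discretization step can be sketched as follows. The paper does not specify an implementation, so this Python example is only one possible reading: it takes the number of classes as $1+\lceil\log_{2}n\rceil$ (a common rounding of the rule in [25], assumed here) and finds class boundaries by an exact dynamic program that minimizes the total within-class sum of squared deviations, which is the criterion described above. All function names are hypothetical.

```python
import bisect
import math

def sturges_classes(n):
    """Number of classes, assuming the rounding 1 + ceil(log2(n))."""
    return 1 + math.ceil(math.log2(n))

def jenks_upper_bounds(values, k):
    """Upper bound of each of k classes minimizing the within-class
    sum of squared deviations (exact dynamic program, O(k n^2))."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= len(values)")
    s1, s2 = [0.0], [0.0]                  # prefix sums of x and x^2
    for v in xs:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):                         # squared deviations of xs[i:j]
        s = s1[j] - s1[i]
        return (s2[j] - s2[i]) - s * s / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):      # last class is xs[i:j]
                cand = cost[c - 1][i] + ssd(i, j)
                if cand < cost[c][j]:
                    cost[c][j], back[c][j] = cand, i
    bounds, j = [], n
    for c in range(k, 0, -1):              # walk back through the splits
        bounds.append(xs[j - 1])
        j = back[c][j]
    return bounds[::-1]

def classify(v, bounds):
    """Index of the class whose upper bound is the first one >= v."""
    return min(bisect.bisect_left(bounds, v), len(bounds) - 1)

# Toy continuous covariate with three well-separated groups (assumption)
values = [1, 2, 3, 10, 11, 12, 50, 51, 52]
bounds = jenks_upper_bounds(values, 3)
labels = [classify(v, bounds) for v in values]
```

The resulting `labels` play the role of the categorical covariate $\mathbf{S}^{\prime,D}$. The quadratic cost of the dynamic program is acceptable for a sketch but may be slow for large samples.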

Let $s_{1}^{D},s_{2}^{D},\ldots,s_{m}^{D}$ denote the distinct values of the $\mathbf{S}_{i}^{D}$s, $s_{1}^{\prime,D},s_{2}^{\prime,D},\ldots,s_{m}^{\prime,D}$ denote the distinct values of the $\mathbf{S}^{\prime,D}$s. The nonparametric estimator of $\pi(y,s^{D},s^{\prime,D})$ is given by the following expression:

$$\hat{\pi}(y,s^{D},s^{\prime,D})=\frac{\sum_{k=1}^{n}\Delta_{k}I(Y_{k}=y,\mathbf{S}^{D}_{k}=s^{D},\mathbf{S}^{\prime,D}_{k}=s^{\prime,D})}{\sum_{i=1}^{n}I(Y_{i}=y,\mathbf{S}^{D}_{i}=s^{D},\mathbf{S}^{\prime,D}_{i}=s^{\prime,D})},$$

where $y=0,1,2,\ldots$, $s^{D}\in\{s_{1}^{D},s_{2}^{D},\ldots,s_{m}^{D}\}$ and $s^{\prime,D}\in\{s_{1}^{\prime,D},s_{2}^{\prime,D},\ldots,s_{m}^{\prime,D}\}$.

Thus, in this context, the SIPW estimator $\hat{\theta}_{n}^{ws}$ of $\theta$ in models (2.1) and (2.2) is the solution of the equation

$$\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\frac{\Delta_{i}}{\hat{\pi}(Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})}\dot{l_{i}}(\theta)=0.$$

(3.2)
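The nonparametric selection probability used in (3.2) is a cell frequency: within each cell defined by $(y,s^{D},s^{\prime,D})$, it is the fraction of individuals whose covariates are completely observed. The Python sketch below illustrates this and the resulting weights $\Delta_{i}/\hat{\pi}(Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})$; the function names and the toy cells are assumptions, not part of the paper.

```python
from collections import defaultdict

def cell_selection_probs(cells, deltas):
    """Cell-wise estimate of pi: observed count / total count per cell.

    cells:  one hashable tuple (y, s_D, s_D_prime) per individual
    deltas: 1 if the individual's covariates are fully observed, else 0
    """
    n_all, n_obs = defaultdict(int), defaultdict(int)
    for cell, delta in zip(cells, deltas):
        n_all[cell] += 1
        n_obs[cell] += delta
    return {cell: n_obs[cell] / n_all[cell] for cell in n_all}

def ipw_weights(cells, deltas):
    """Weights Delta_i / pi_hat(cell_i); incomplete cases get weight 0."""
    pi_hat = cell_selection_probs(cells, deltas)
    # If delta == 1 the cell contains an observed case, so pi_hat > 0.
    return [delta / pi_hat[cell] if delta else 0.0
            for cell, delta in zip(cells, deltas)]

# Toy example (assumption): four individuals in one cell, two in another
cells = [(0, "a", 1)] * 4 + [(1, "a", 1)] * 2
deltas = [1, 1, 0, 0, 1, 1]
pi_hat = cell_selection_probs(cells, deltas)
weights = ipw_weights(cells, deltas)
```

Each complete case is weighted up to represent the incomplete cases in its own cell, so the weights within a cell sum to the cell size whenever the cell contains at least one complete case. Solving (3.2) then amounts to finding the root of the sum of the individual scores $\dot{l_{i}}(\theta)$ multiplied by these weights.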

We study the asymptotic properties of $\hat{\theta}_{n}^{F}$ and $\hat{\theta}_{n}^{ws}$ in the following section.

3.3 Asymptotic Results

To establish the asymptotic properties of $\hat{\theta}^{F}_{n}$ and $\hat{\theta}_{n}^{ws}$, we impose the following regularity conditions.

$\mathbf{H1.}$ The true parameter value $\theta_{0}:=(\gamma_{0}^{T},\alpha_{0}^{T})^{T}$ lies in the interior of some known compact set of $\mathbb{R}^{q}\times\mathbb{R}^{p}$.
$\mathbf{H2.}$ Let $\textrm{supp}(\mathbf{S}^{D})$ denote the support of $\mathbf{S}^{D}$ and $\textrm{supp}(\mathbf{S}^{\prime,D})$ denote the support of $\mathbf{S}^{\prime,D}$. Assume that $\textrm{supp}(\mathbf{S}^{D})$ and $\textrm{supp}(\mathbf{S}^{\prime,D})$ do not depend on $\theta$. Furthermore, for any $y=0,1,\ldots$, for $s^{D}\in\textrm{supp}(\mathbf{S}^{D})$ and for $s^{\prime,D}\in\textrm{supp}(\mathbf{S}^{\prime,D})$, the selection probability $\pi(y,s^{D},s^{\prime,D})>0$.
$\mathbf{H3.}$ $\mathbb{E}\left[\frac{\dot{l_{i}}(\theta)\dot{l_{i}}(\theta)^{T}}{\pi(\mathbf{V}_{i})}\right]$ is finite and positive definite in a neighborhood of the true $\theta$.
$\mathbf{H4.}$ In a neighborhood of the true $\theta$, the first and second derivatives of $U_{F,n}(\theta)$ with respect to $\theta$ exist almost surely and are uniformly bounded above by a function of $(Y,\mathbf{X},\mathbf{Z})$, whose expectations exist.
$\mathbf{H5.}$ The first derivatives of $U_{w,n}(\theta,\pi)$ with respect to $\theta$ exist almost surely in a neighborhood of $\theta_{0}$. Additionally, in such a neighborhood, these first derivatives are uniformly bounded above by a function of $(Y,\mathbf{X},\mathbf{Z})$, whose expectations exist.

The asymptotic properties of $\hat{\theta}^{F}_{n}$ and $\hat{\theta}^{ws}_{n}$ are stated in Theorems 1 and 2, respectively. Detailed proofs of Theorem 1 and Theorem 2 are given in Appendix A and Appendix B, respectively.

Before studying the asymptotic properties of the estimators, we define

$$\Sigma_{n}(\theta)=-n^{-1/2}\frac{\partial U_{F,n}}{\partial\theta^{T}}=-\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\partial^{2}l_{i}(\theta)}{\partial\theta\partial\theta^{T}}\right\}\;\ \text{and}\;\ Q_{F}(\theta_{0})=\mathbb{E}\left[\dot{l_{1}}(\theta_{0})\dot{l_{1}}(\theta_{0})^{T}\right].$$

Because each component of $\Sigma_{n}(\theta)$ is a mean of independent and identically distributed random variables, we have $\mathbb{E}\left[\Sigma_{n}(\theta)\right]=\mathbb{E}\left[-\frac{\partial^{2}l_{1}(\theta)}{\partial\theta\partial\theta^{T}}\right]=\Sigma(\theta).$

Theorem 1. Assume that conditions (H1), (H2), and (H4) hold. Then $\hat{\theta}^{F}_{n}$ converges in probability to $\theta_{0}$, as $n\rightarrow\infty$ and $\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})$ has an asymptotic normal distribution with mean zero and covariance matrix $\Delta_{F}$, with $\Delta_{F}:=\Sigma(\theta_{0})^{-1}Q_{F}(\theta_{0})[\Sigma(\theta_{0})^{-1}]^{T}$, where $Q_{F}(\theta)=\mathbb{E}\left[\dot{l_{1}}(\theta)\dot{l_{1}}(\theta)^{T}\right]$.

Since the Fisher information matrix equals the variance of the score function, we have $\Sigma(\theta_{0})=Q_{F}(\theta_{0})$, and hence $\Delta_{F}=\Sigma(\theta_{0})^{-1}$.

Theorem 2. Assume that conditions (H1), (H2), and (H4) hold. Then $\hat{\theta}^{ws}_{n}$ converges in probability to $\theta_{0}$, as $n\rightarrow\infty$ and $\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})$ has an asymptotic normal distribution with mean zero and covariance matrix $\Delta_{ws}$, with $\Delta_{ws}:=\Sigma(\theta_{0})^{-1}\{\Omega_{3}(\theta_{0},\pi)-\left[\Omega_{4}(\theta_{0},\pi)-\Omega_{5}(\theta_{0},\pi)\right]\}[\Sigma(\theta_{0})^{-1}]^{T}$, where $\Omega_{3}(\theta_{0},\pi)=\mathbb{E}\left[\frac{\dot{l}_{i}(\theta_{0})\dot{l}_{i}(\theta_{0})^{T}}{\pi(Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})}\right]$, $\Omega_{4}(\theta_{0},\pi)=\mathbb{E}\left[\frac{\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}}{\pi(Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})}\right]$, $\Omega_{5}(\theta_{0},\pi)=\mathbb{E}\left[\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]$, and $\dot{l}_{i}^{*}(\theta_{0})=\mathbb{E}\left[\dot{l}_{i}(\theta_{0})|Y_{i},\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i}\right]$.

4 SIMULATION STUDY

In this section, we study the performances under various conditions of the following estimators:

$\hat{\theta}^{F}_{n}$ the maximum likelihood estimator obtained by solving the equation $U_{F,n}(\theta)=0$, where $U_{F,n}(\theta)$ is defined in (2.3).
$\hat{\theta}^{wsk}_{n}$ the SIPWK estimator obtained by solving Eq. (3.1).
$\hat{\theta}^{ws}_{n}$ the SIPW estimator obtained by solving Eq. (3.2).

In this numerical study, we consider samples of size $n=500$, $1000$, and $2000$ generated from the following MZIP model:

$$\textrm{logit}(\psi_{i})=\gamma_{1}Z_{i1}+\gamma_{2}Z_{i2}+\gamma_{3}Z_{i3}+\gamma_{4}Z_{i4},$$

$$\textrm{log}(\nu_{i})=\alpha_{1}X_{i1}+\alpha_{2}X_{i2}+\alpha_{3}X_{i3},$$

(4.1)

where $X_{i1}=Z_{i1}=1$, $Z_{i2}=X_{i2}$, and $Z_{i2}$, $Z_{i3}$, $Z_{i4}$, $X_{i3}$ follow, respectively, the Gaussian distribution $N(0,1.7)$, Poisson distribution $P(0.5)$, exponential distribution $E(1)$, and binomial distribution $B(1,0.5)$. The regression parameter $\alpha$ is chosen as $\alpha=(1.2,0.2,-0.7)^{T}$. The regression parameter $\gamma$ is chosen as follows:

case 1: $\gamma=(-1,0.4,0.3,0.45)^{T}$ ,
case 2: $\gamma=(-1,0.62,0.3,0.8)^{T}$.
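The data-generating design (4.1) can be sketched in a few lines. The Python code below is an illustration only (the paper's simulations use R) and rests on assumptions that are flagged in the comments: $1.7$ in $N(0,1.7)$ is read as a variance, $E(1)$ as an exponential distribution with rate $1$, and the helper names are hypothetical. It generates complete data; no missingness is imposed on $Z_{i4}$ here.

```python
import math
import random

GAMMA_CASE1 = (-1.0, 0.4, 0.3, 0.45)   # case 1 of the paper
ALPHA = (1.2, 0.2, -0.7)

def rpois(mu, rng):
    """Poisson draw by Knuth's multiplication method, applied in chunks
    of mean at most 30 (sums of independent Poisson draws are Poisson)."""
    total = 0
    while mu > 0.0:
        chunk = min(mu, 30.0)
        limit, prod, k = math.exp(-chunk), rng.random(), 0
        while prod > limit:
            prod *= rng.random()
            k += 1
        total += k
        mu -= chunk
    return total

def simulate_mzip(n, gamma, alpha, seed=0):
    """Return a list of (y, x, z) drawn from model (4.1)."""
    rng = random.Random(seed)
    sd_x2 = math.sqrt(1.7)   # ASSUMPTION: 1.7 is the variance of N(0, 1.7)
    rows = []
    for _ in range(n):
        x2 = rng.gauss(0.0, sd_x2)
        z3 = rpois(0.5, rng)
        z4 = rng.expovariate(1.0)   # ASSUMPTION: E(1) has rate 1
        x3 = 1 if rng.random() < 0.5 else 0
        z = (1.0, x2, z3, z4)       # Z_{i2} = X_{i2}
        x = (1.0, x2, x3)
        zg = sum(a * b for a, b in zip(z, gamma))
        xa = sum(a * b for a, b in zip(x, alpha))
        psi = 1.0 / (1.0 + math.exp(-zg))           # logit link
        mu = math.exp(xa) * (1.0 + math.exp(zg))    # mu = nu / (1 - psi)
        y = 0 if rng.random() < psi else rpois(mu, rng)
        rows.append((y, x, z))
    return rows

sample = simulate_mzip(2000, GAMMA_CASE1, ALPHA, seed=1)
zero_share = sum(1 for y, _, _ in sample if y == 0) / len(sample)
```

Here `zero_share` counts all zeros, that is, excess zeros together with zeros produced by the Poisson component, so it is expected to be somewhat larger than the average zero-inflation probability.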

In case 1 (respectively case 2), the average percentage of zero inflation in this simulation is $41\%$ (respectively $65\%$). We assume that the variable $Z_{i4}$ is subject to missingness. The average fraction of missing data (AFMD) in the simulated samples is equal to $15\%$ or $30\%$. For the kernel-based weighting estimator of the MZIP model, we used a multiplicative kernel (the Dirac discrete kernel for discrete variables and the Gaussian kernel for the continuous variable). Finally, for each configuration (sample size, proportions of zero inflation and missing data), we simulate $N=1000$ samples and calculate $\hat{\theta}_{n}^{ws}$ and $\hat{\theta}_{n}^{wsk}$. We use the statistical software R 3.5.2 to perform our simulations and the maxLik package (see Henningsen and Toomet [19]) to solve Eqs. (2.3) and (3.1). We obtain the bias, the standard deviation (SD) and the root mean square error (RMSE) for each estimator $\hat{\gamma}_{j,n}$ $(j=1,\ldots,4)$ and $\hat{\alpha}_{k,n}$ $(k=1,\ldots,3)$. For comparison purposes, we also provide the results that would be obtained if there were no missing covariates. In this case, the MLE is obtained by solving the score equation (2.3) (FD estimator).

Tables 1, 2, and 3 present the results for $n=500$, $n=1000$, and $n=2000$, respectively, with 41$\%$ (top) and 65$\%$ (bottom) zero inflation and an average of 15$\%$ and 30$\%$ missing data. Tables 1–3 show that both methods perform well, as the results obtained with both methods are close to the base case. The results also show that the bias and RMSE of the proposed method are generally better than those of the SIPWK method.

Let us now examine the performance of the proposed estimator. The results in Tables 1–3 show that the bias, standard deviation, and RMSE decrease as the sample size increases and the proportion of individuals with missing covariates decreases. Furthermore, the bias remains reasonable even with 30$\%$ missing data. The estimator $\hat{\theta}^{F}_{n}$ is obviously better than $\hat{\theta}^{ws}_{n}$ and $\hat{\theta}^{wsk}_{n}$, but FD is only possible in the absence of missing data.
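For reference, the three summary measures reported in the tables can be computed from the Monte Carlo replicates of a single coefficient as in the Python sketch below. The function name and the toy numbers are assumptions, and the paper does not state whether the SD uses the divisor $N-1$ or $N$; the sketch assumes $N-1$.

```python
import math

def summarize(estimates, truth):
    """Bias, standard deviation and RMSE of Monte Carlo estimates
    of one coefficient whose true value is `truth`."""
    n = len(estimates)
    mean = sum(estimates) / n
    bias = mean - truth
    # ASSUMPTION: sample SD with divisor n - 1
    sd = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    return bias, sd, rmse

# Toy replicates (not results from the paper)
bias, sd, rmse = summarize([1.0, 1.2, 0.8, 1.0], truth=0.9)
```

With these conventions, $\textrm{RMSE}^{2}=\textrm{bias}^{2}+\frac{N-1}{N}\textrm{SD}^{2}$, which is why the RMSE summarizes both the systematic error and the variability of an estimator.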

Table 1 Simulation results for $n=500$, zero inflation: 41$\%$ (top) and 65$\%$ (bottom)


Table 2 Simulation results for $n=1000$, zero inflation: 41$\%$ (top) and 65$\%$ (bottom)


Table 3 Simulation results for $n=2000$, zero inflation: 41$\%$ (top) and 65$\%$ (bottom)


5 APPLICATION

In this section, we describe an application of the MZIP model to the NMES1988 data obtained from the National Medical Expenditure Survey (NMES) conducted in 1987–1988. We analyze the variable ofnp (number of consultations with a non-physician health professional in a practice) with the MZIP model. The proportion of zeros in the observations of this variable is equal to 0.6818. This very high proportion suggests zero inflation. For each individual $i$ $(i=1,\ldots,n$, with $n=4406)$ of the sample, let $Y_{i}$ denote the number of consultations with a non-physician health professional in a practice.

$\psi_{i}$ represents the probability that patient $i$ systematically forgoes consulting a non-physician health professional.
$\nu_{i}$ represents the average number of consultations with a non-physician health professional for patient $i$.

To model the marginal mean and zero-inflation parameters $\nu_{i}$ and $\psi_{i}$ defined in (2.2), where $Z_{i}$ and $X_{i}$ are the sets of covariates, we proceeded as follows. First, we fitted an MZIP regression model incorporating all the available covariates in (2.2), i.e., taking $X_{i}=Z_{i}$ for each $i$. Next, Wald tests were used to select the relevant covariates in the sub-models (2.2). Through this procedure, we identified three significant predictors for $\nu_{i}$ (chronic, gender, school) and six significant predictors for $\psi_{i}$ (chronic, medicaid, age, income, gender, school). The significant covariates are gender (1 for female, 0 for male), age (in years, divided by 10), school (number of years of education), income (in 10 000 dollars), chronic (chronic diseases: cancer, arthritis, diabetes…), and medicaid (a binary variable indicating whether the individual is covered by Medicaid or not). The covariate age was discretized before applying the proposed method. We therefore model $\psi_{i}$ and $\nu_{i}$ as follows:

$$\textrm{logit}(\psi_{i})=\gamma_{1}\textrm{inter}+\gamma_{2}\textrm{chronic}+\gamma_{3}\textrm{medicaid}+\gamma_{4}\textrm{age}+\gamma_{5}\textrm{income}+\gamma_{6}\textrm{gender}+\gamma_{7}\textrm{school},$$

(5.1)

$$\textrm{log}(\nu_{i})=\alpha_{1}\textrm{inter}+\alpha_{2}\textrm{chronic}+\alpha_{3}\textrm{gender}+\alpha_{4}\textrm{school}.$$

(5.2)

We simulated $15\%$ (moderate) and $30\%$ (high) proportions of missing data in the “income” variable. Indeed, among the covariates, “income” is the most likely to have missing data, as it is sensitive and confidential information. Respondents are often reluctant to disclose their income, which can lead to higher rates of missing data for this variable. According to Mishra et al. [29] (National Health and Nutrition Examination Survey), the rate of missing data in the “income” variable is often high, reaching or exceeding 15$\%$. Table 4 shows the estimation results for the case with no missing data (FD) and with 15$\%$ missing data, and Table 5 shows those for FD and 30$\%$ missing data. The proposed method appears robust: when the percentage of missing data increases, the covariates remain significant and the coefficients keep the same signs as in the reference case (FD). Medicaid status and gender are identified as the most influential factors in the decision to never consult a non-physician health care professional. Medicaid recipients are more likely to forgo consulting a non-physician health care professional during an office visit. One explanation is that patients covered by Medicaid may limit their consultations to those that are necessary, i.e., not see a doctor, given that Medicaid is health insurance for the less well-off.

Table 4 Analysis of health care data with 15$\%$ missing data


Table 5 Analysis of health care data with 30$\%$ missing data


The probability of never consulting a non-physician health care professional decreases with chronic, income, school, and age. It decreases with the level of education because better-informed patients may tend to diversify their use of care. It decreases as health status worsens (in part because patients with worsening health status tend to favor visits to health professionals). It decreases with income because patients with higher incomes prefer to visit a health care professional.

The number of chronic illnesses and the level of education are the variables that most influence the average number of consultations with non-physician health care professionals, because patients with chronic conditions and those with higher levels of education visit regularly.

6 CONCLUSIONS

In this article, we have proposed a method for estimating the parameters of the MZIP model with MAR covariates. We compared the performance of this estimator with that of the kernel-assisted weighted estimator. The numerical results indicate that both the proposed estimator $\hat{\theta}^{ws}_{n}$ and the estimator $\hat{\theta}^{wsk}_{n}$ perform well. However, the simulation results suggest that the proposed method is more efficient than the kernel-assisted weighting method. The proposed SIPW estimator was used to analyze the NMES1988 U.S. public health economics data. The results of this analysis confirm the robustness of the proposed SIPW estimator.

In this paper, we assume that our data are MAR. However, in many practical situations the missing-data pattern is not monotone. Adapting this approach to non-monotone missing data in MZIP regression deserves further research.

REFERENCES

1. D. Lambert, “Zero-inflated Poisson regression with an application to defects in manufacturing,” Technometrics 34, 1–14 (1992).
2. E. Dietz and D. Böhning, “On estimation of the Poisson parameter in zero-modified Poisson models,” Comput. Stat. Data Anal. 4, 441–459 (2000).
3. K. K. W. Yau and A. H. Lee, “Zero-inflated Poisson regression with random effects to evaluate an occupational injury prevention programme,” Stat. Med. 20, 2907–2920 (2001).
4. A. O. Diallo, A. Diop, and J.-F. Dupuy, “Asymptotic properties of the maximum likelihood estimator in zero-inflated binomial regression,” Commun. Stat. Theory Methods 46 (20), 9930–9948 (2017).
5. Y. B. Cheung, “Zero-inflated models for regression analysis of count data: a study of growth and development,” Stat. Med. 21, 1461–1469 (2002).
6. M. Reilly and M. S. Pepe, “A mean score method for missing and auxiliary covariates data in regression methods,” Biometrika 82, 299–314 (1995).
7. A. Diallo, A. Diop, and J.-F. Dupuy, “Estimation in zero-inflated binomial regression with missing covariates,” Statistics 53 (4), 839–865 (2019).
8. T. M. Lukusa, S.-M. Lee, and C.-S. Li, “Semiparametric estimation of a zero-inflated Poisson regression model with missing covariates,” Metrika 79 (4), 457–483 (2016).
9. T. M. Lukusa and F. K. Hing Phoa, “A note on the weighting-type estimations of the zero-inflated Poisson regression model with missing data in covariates,” Journal Pre-proof (2019).
10. D. G. Horvitz and D. J. Thompson, “A generalization of sampling without replacement from a finite universe,” Current Res. Biostat. 47, 663–685 (1952).
11. S. H. Hsieh, S. M. Lee, and P. S. Shen, “Logistic regression analysis of randomized response data with missing covariates,” J. Stat. Plan. Inference 140, 927–940 (2010).
12. D. B. Rubin, “Inference and missing data,” Biometrika 63 (3), 581–592 (1976).
13. R. V. Foutz, “On the unique consistent solution to the likelihood equations,” J. Am. Stat. Assoc. 72, 147–148 (1977).
14. D. Böhning, E. Dietz, P. Schlattmann, L. Mendonca, and U. Kirchner, “The zero-inflated Poisson model and the decayed, missing, and filled teeth index in dental epidemiology,” J. R. Stat. Soc. Ser. A 162, 195–209 (1999).
15. S. H. Hsieh, S. M. Lee, and P. S. Shen, “Semiparametric analysis of randomized response data with missing covariates in logistic regression,” Comput. Stat. Data Anal. 53, 2673–2692 (2009).
16. D. Long, J. S. Preisser, A. H. Herring, and C. E. Golin, “A marginalized zero-inflated Poisson regression model with overall exposure effects,” Stat. Med. 33, 5151–5165 (2014).
17. J. S. Preisser, J. W. Stamm, D. L. Long, and M. E. Kincade, “Review and recommendations for zero-inflated count regression modeling of dental caries indices in epidemiological studies,” Caries Research 46 (4), 413–423 (2012).
18. K. H. Benecha, J. S. Preisser, and K. Das, “Marginal Zero-inflated models with missing covariates,” Biometric Journal (2018).
19. A. Henningsen and O. Toomet, “maxLik: A package for maximum likelihood estimation in R,” Computational Statistics 26 (3), 443–458 (2011).
20. D. Wang and S. X. Chen, “Empirical likelihood for estimating equations with missing values,” Ann. Stat. 37, 490–517 (2000).
21. M. Reilly and M. S. Pepe, “A mean score method for missing and auxiliary covariates data in regression methods,” Biometrika 82, 299–314 (1995).
22. E. A. Nadaraya, “On estimating regression,” Theory of Probability and Its Applications 9, 141–142 (1964).
23. S. Wang and C. Y. Wang, “A note on kernel assisted estimators in missing covariate regression,” Stat. Probabil. Lett. 55, 439–449 (2001).
24. G. S. Watson, “Smooth regression analysis,” Sankhya, Series A 26, 359–372 (1964).
25. H. A. Sturges, “The Choice of a Class Interval,” Journal of the American Statistical Association 21, 153 (1926).
26. G. F. Jenks, Optimal Data Classification for Choropleth Maps (Lawrence Kansas, 1977).
27. K. J. G. Kouakou, O. Hili, and J. F. Dupuy, “Estimation in the Zero-Inflated Bivariate Poisson model, with an application to health-care utilization data,” Africa Statistica 16 (2), 2767–2788 (2021).
28. E. Ali, M. L. Diop, and A. Diop, “Statistical inference in a Zero-Inflated Bell regression model,” Mathematical Methods of Statistics 31, 91–104 (2022).
29. S. Mishra, C. L. Ogden, and M. Dimeler, Dietary Supplement Use in the United States: National Health and Nutrition Examination Survey, National Health Statistics Reports, 2017–March 2020 (2023).


ACKNOWLEDGMENTS

The authors are grateful to the referees and the editor for their comments and suggestions, which led to significant improvements over earlier versions of this article.

Author information

Authors and Affiliations

UMRI-Mathématiques et Nouvelles Technologies de l’Information, Institut National Polytechnique Félix Houphouët-Boigny (INP-HB) de Yamoussoukro, Yamoussoukro, Côte d’Ivoire
Kouakou Mathias Amani
UMRI-Mathématiques et Nouvelles Technologies de l’Information, Institut National Polytechnique Félix Houphouët-Boigny (INP-HB) de Yamoussoukro, Yamoussoukro, Côte d’Ivoire
Ouagnina Hili
UMRI-Mathématiques et Nouvelles Technologies de l’Information, Institut National Polytechnique Félix Houphouët-Boigny (INP-HB) de Yamoussoukro, Yamoussoukro, Côte d’Ivoire
Konan Jean Geoffroy Kouakou


Corresponding authors

Correspondence to Kouakou Mathias Amani, Ouagnina Hili or Konan Jean Geoffroy Kouakou.

Ethics declarations

The authors declare that they have no conflicts of interest.

Appendix A

PROOFS OF ASYMPTOTIC RESULTS

6.1 Proof of Theorem 1

We prove consistency of $\hat{\theta}_{n}^{F}$ by checking the conditions of the inverse function theorem of Foutz [13]. These conditions are proved in a series of technical lemmas.

Lemma 1. As $n\rightarrow\infty$, $n^{-1/2}U_{F,n}(\theta_{0})$ converges in probability to 0.

Proof of Lemma 1. The vector $n^{-1/2}U_{F,n}(\theta_{0})$ can be written componentwise as

$$n^{-1/2}U_{F,n}(\theta_{0})=\begin{pmatrix}\displaystyle\frac{1}{n}\sum_{i=1}^{n}Z_{i1}B_{i}(\theta_{0})\\ \vdots\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}Z_{iq}B_{i}(\theta_{0})\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}X_{i1}A_{i}(\theta_{0})\\ \vdots\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}X_{ip}A_{i}(\theta_{0})\end{pmatrix}.$$

For $i=1,\ldots,n$ and $l=1,\ldots,q$, we have

$$\mathbb{E}\left[Z_{il}B_{i}(\theta_{0})\right]=\mathbb{E}\left[\mathbb{E}\left[Z_{il}B_{i}(\theta_{0})|\mathbf{X}_{i},\mathbf{Z}_{i}\right]\right]=\mathbb{E}\left[Z_{il}\mathbb{E}\left[B_{i}(\theta_{0})|\mathbf{X}_{i},\mathbf{Z}_{i}\right]\right].$$

We have

$$\mathbb{E}[B_{i}(\theta_{0})|\mathbf{X}_{i},\mathbf{Z}_{i}]=\frac{\mathbb{E}(J_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})e^{\mathbf{Z}^{T}_{i}\gamma_{0}}\left(e^{h_{i}(\theta_{0})}-e^{\mathbf{X}^{T}_{i}\alpha_{0}}\right)}{e^{\mathbf{Z}^{T}_{i}\gamma_{0}+h_{i}(\theta_{0})}+1}+\frac{e^{\mathbf{Z}^{T}_{i}\gamma_{0}}(\mathbb{E}(Y_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})-1)}{1+e^{\mathbf{Z}^{T}_{i}\gamma_{0}}}$$

$${}-\left[1-\mathbb{E}(J_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})\right]e^{\mathbf{X}^{T}_{i}\alpha_{0}+\mathbf{Z}^{T}_{i}\gamma_{0}}.$$

Now, we have

$\mathbb{E}(J_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})=\mathbb{P}(Y_{i}=0|\mathbf{X}_{i},\mathbf{Z}_{i}),$ $\mathbb{E}(Y_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})=\nu_{i}$ and $\mathbb{E}(1-J_{i}|\mathbf{X}_{i},\mathbf{Z}_{i})=\mathbb{P}(Y_{i}>0|\mathbf{X}_{i},\mathbf{Z}_{i}).$ It follows that $\mathbb{E}[Z_{il}B_{i}(\theta_{0})]=0$.

Using similar arguments, we prove that, for every $i=1,\ldots,n$ and $j=1,\ldots,p$, $\mathbb{E}[X_{ij}A_{i}(\theta_{0})]=0$.

Now, for every $i=1,\ldots,n$ and $l=1,\ldots,q$, we have

$$\textrm{var}\left(Z_{il}B_{i}(\theta_{0})\right)\leq\mathbb{E}\left(Z_{il}^{2}B_{i}^{2}(\theta_{0})\right).$$

By $\mathbf{H3}$, we have $\mathbb{E}\left(Z_{il}^{2}B_{i}^{2}(\theta_{0})\right)<\infty$.

Using similar arguments, we prove $\textrm{var}\left(X_{ij}A_{i}(\theta_{0})\right)<\infty$ for every $i=1,\ldots,n$ and $j=1,\ldots,p$.

Thus, by the weak law of large numbers, $n^{-1/2}U_{F,n}(\theta_{0})$ converges in probability to $0$, which concludes the proof.
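As an informal numerical illustration of Lemma 1 (not part of the proof), the sketch below simulates data from an MZIP model and checks that the averaged score is close to zero at the true parameter and clearly away from zero elsewhere. It assumes the usual MZIP parametrization, $\textrm{logit}(\psi_{i})=\mathbf{Z}_{i}^{T}\gamma$ and $\log(\nu_{i})=\mathbf{X}_{i}^{T}\alpha$ with Poisson mean $\nu_{i}/(1-\psi_{i})$. The parameter values, the sample size and the function names `mzip_mean_loglik` and `numerical_score` are illustrative assumptions, and the score is approximated by finite differences rather than computed from the closed forms $A_{i}$ and $B_{i}$.

```python
import numpy as np

def mzip_mean_loglik(theta, y, x, z):
    """Average MZIP log-likelihood; the log(y!) term is dropped (free of theta)."""
    q = z.shape[1]
    gamma, alpha = theta[:q], theta[q:]
    psi = 1.0 / (1.0 + np.exp(-(z @ gamma)))   # zero-inflation probability
    nu = np.exp(x @ alpha)                     # marginal mean E(Y | X, Z)
    mu = nu / (1.0 - psi)                      # Poisson mean of the count component
    ll_zero = np.log(psi + (1.0 - psi) * np.exp(-mu))
    ll_pos = np.log1p(-psi) - mu + y * np.log(mu)
    return np.where(y == 0, ll_zero, ll_pos).mean()

def numerical_score(theta, y, x, z, eps=1e-5):
    """Central finite-difference gradient of the average log-likelihood."""
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (mzip_mean_loglik(theta + step, y, x, z)
                   - mzip_mean_loglik(theta - step, y, x, z)) / (2.0 * eps)
    return grad

# Simulated design and (hypothetical) true parameters.
rng = np.random.default_rng(0)
n = 20000
x = np.column_stack([np.ones(n), rng.normal(size=n)])
z = np.column_stack([np.ones(n), rng.normal(size=n)])
gamma0 = np.array([-0.5, 0.3])
alpha0 = np.array([0.5, -0.2])
theta0 = np.concatenate([gamma0, alpha0])

# Draw Y: a structural zero with probability psi, otherwise a Poisson count.
psi0 = 1.0 / (1.0 + np.exp(-(z @ gamma0)))
mu0 = np.exp(x @ alpha0) / (1.0 - psi0)
structural_zero = rng.random(n) < psi0
y = np.where(structural_zero, 0, rng.poisson(mu0))

score_at_truth = numerical_score(theta0, y, x, z)
score_shifted = numerical_score(theta0 + 0.5, y, x, z)
print("score at theta0:      ", score_at_truth)
print("score at theta0 + 0.5:", score_shifted)
```

The averaged score at $\theta_{0}$ should fluctuate around zero at the scale $n^{-1/2}$, whereas at the shifted parameter it stays bounded away from zero.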

Lemma 2. As $n\rightarrow\infty$, $n^{-1/2}\frac{\partial U_{F,n}(\theta)}{\partial\theta^{T}}$ converges in probability to a fixed function $-\Sigma(\theta)$, uniformly in an open neighbourhood of $\theta_{0}$.

Proof of Lemma 2. Let $\tilde{U}_{F,n}(\theta):=n^{-1/2}\frac{\partial U_{F,n}(\theta)}{\partial\theta^{T}}$, and let $\nu_{\theta_{0}}$ be an open neighbourhood of $\theta_{0}$. Let $\theta\in\nu_{\theta_{0}}$.

By the weak law of large numbers and $\mathbf{H3}$, $\tilde{U}_{F,n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\partial^{2}l_{i}(\theta)}{\partial\theta\partial\theta^{T}}\right\}$ converges in probability to the matrix $-\Sigma(\theta)$ as $n\rightarrow\infty$, where $\Sigma(\theta)=\mathbb{E}\left[-\frac{\partial^{2}l_{1}(\theta)}{\partial\theta\partial\theta^{T}}\right]$.

By condition $\mathbf{H4}$, we prove that the convergence of $\tilde{U}_{F,n}(\theta)$ to $-\Sigma(\theta)$ is uniform on $\nu_{\theta_{0}}$.

The conditions of the inverse function theorem of Foutz [13] are therefore verified, and $\hat{\theta}_{n}^{F}$ converges in probability to $\theta_{0}$.

Now, we prove that $\hat{\theta}_{n}^{F}$ is asymptotically Gaussian. A Taylor expansion of $U_{F,n}(\hat{\theta}_{n}^{F})$ at $\theta_{0}$ yields

$$0=U_{F,n}(\theta_{0})+\tilde{U}_{F,n}(\theta_{0})\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})+o_{p}(1).$$

By direct calculation, $\textrm{var}(U_{F,n}(\theta_{0}))=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left(\dot{l_{i}}(\theta_{0})\dot{l_{i}}(\theta_{0})^{T}\right)=Q_{F}(\theta_{0})$.

Finally, by Lemma 2 and Slutsky’s theorem, $\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})$ converges in distribution to the Gaussian vector of mean zero and variance $\Delta_{F}$, where $\Delta_{F}$ is defined in Theorem 1.
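For readers who want the intermediate step spelled out, the Taylor expansion can be solved for $\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})$ as sketched below. This only restates the argument above, under the assumption that $\tilde{U}_{F,n}(\theta_{0})$ is invertible for $n$ large enough. The last display is the generic sandwich form for such estimators; its identification with $\Delta_{F}$ relies on the definition given in Theorem 1, which is not reproduced in this appendix.

$$\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})=-\left[\tilde{U}_{F,n}(\theta_{0})\right]^{-1}U_{F,n}(\theta_{0})+o_{p}(1),$$

$$\tilde{U}_{F,n}(\theta_{0})\xrightarrow{\;p\;}-\Sigma(\theta_{0}),\qquad U_{F,n}(\theta_{0})\xrightarrow{\;d\;}\mathcal{N}\left(0,Q_{F}(\theta_{0})\right),$$

$$\sqrt{n}(\hat{\theta}_{n}^{F}-\theta_{0})\xrightarrow{\;d\;}\mathcal{N}\left(0,\Sigma(\theta_{0})^{-1}Q_{F}(\theta_{0})[\Sigma(\theta_{0})^{-1}]^{T}\right).$$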

Appendix B

6.2 Proof of Theorem 2

We prove consistency of $\hat{\theta}_{n}^{ws}$ by checking the conditions of the inverse function theorem of Foutz [13]. These conditions are proved in a series of technical lemmas.

Lemma 3. As $n\rightarrow\infty$, $n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})$ converges in probability to $0$.

Proof of Lemma 3. We decompose $n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})$ as

$$n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})=(n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})-n^{-1/2}U_{ws,n}(\theta_{0},\pi))+n^{-1/2}U_{ws,n}(\theta_{0},\pi).\tag{6.1}$$

Consider the first term of this decomposition.

Let $\mathbf{S}^{\prime}_{i}=(\mathbf{S}^{D}_{i},\mathbf{S}^{\prime,D}_{i})$ and $G_{n}(\theta_{0},\pi)=n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi})-n^{-1/2}U_{ws,n}(\theta_{0},\pi)$. We have

$$G_{n}(\theta_{0},\pi)=\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left(\frac{1}{\hat{\pi}(Y_{i},\mathbf{S}^{\prime}_{i})}-\frac{1}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right)\dot{l_{i}}(\theta_{0}),$$

$${}=-\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left[\frac{\hat{\pi}(Y_{i},\mathbf{S}^{\prime}_{i})-\pi(Y_{i},\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}+O_{P}\left((\hat{\pi}(Y_{i},\mathbf{S}^{\prime}_{i})-\pi(Y_{i},\mathbf{S}^{\prime}_{i}))^{2}\right)\right]\dot{l_{i}}(\theta_{0}),$$

$${}=-\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left[\frac{\frac{\sum_{k=1}^{n}\Delta_{k}I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\sum_{k=1}^{n}I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})+o^{*}_{p}\left(\frac{1}{\sqrt{n}}\right),$$

$${}=-\frac{1}{n}\sum_{i=1}^{n}\Delta_{i}\left[\frac{\frac{1}{n}\sum_{k=1}^{n}\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}+O_{p}\left(\frac{1}{n}\right)\right]$$

$${}\times\dot{l_{i}}(\theta_{0})+o^{*}_{p}\left(\frac{1}{\sqrt{n}}\right),$$

$${}=-\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})$$

$${}-\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})+o_{p}^{*}\left(\frac{1}{\sqrt{n}}\right),$$

where $o_{p}^{*}(a_{n})$ denotes a matrix whose components are uniformly $o_{p}(a_{n})$. By the weak law of large numbers,

$$\frac{1}{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]$$

converges in probability to $0$ as $n\rightarrow\infty$.

Using condition $\mathbf{H3}$, we prove that $\dot{l_{i}}(\theta_{0})$ is finite a.s. Finally, by Slutsky’s theorem,

$$G_{n}(\theta_{0},\pi)=-\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]$$

$${}\times\dot{l_{i}}(\theta_{0})-\frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})$$

converges in probability to $0$ as $n\rightarrow\infty$.

Next, consider the term $n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))$ in decomposition (6.1).

We show that $n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))$ converges in probability to $0$ as $n\rightarrow\infty$.

We have

$$n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))=\begin{pmatrix}\displaystyle\frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{i1}B_{i}(\theta_{0})\\ \vdots\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{iq}B_{i}(\theta_{0})\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}X_{i1}A_{i}(\theta_{0})\\ \vdots\\ \displaystyle\frac{1}{n}\sum_{i=1}^{n}\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}X_{ip}A_{i}(\theta_{0})\end{pmatrix}.$$

For $i=1,\ldots,n$ and $l=1,\ldots,q$, we have

$$\mathbb{E}\left[\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})\right]=\mathbb{E}\left[\mathbb{E}\left[\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})|Y_{i},\mathbf{S}^{\prime}_{i}\right]\right].$$

Two cases should be considered, namely: (i) $Z_{il}$ is a component of $Z^{\textrm{obs}}$ and (ii) $Z_{il}$ is a component of $Z^{\textrm{miss}}$. In case (i), we have

$$\mathbb{E}\left[\mathbb{E}\left[\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})|Y_{i},\mathbf{S}^{\prime}_{i}\right]\right]=\mathbb{E}\left[\frac{1}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}\mathbb{E}[\Delta_{i}B_{i}(\theta_{0})|Y_{i},\mathbf{S}^{\prime}_{i}]\right].$$

Given $\mathbf{V}_{i}=(Y_{i},\mathbf{S}^{\prime}_{i})$, $B_{i}(\theta_{0})$ is a function of $(\mathbf{X}^{\textrm{miss}},\mathbf{Z}^{\textrm{miss}})$ only. Thus, by the MAR assumption, $B_{i}(\theta_{0})$ and $\Delta_{i}$ are independent given $\mathbf{V}_{i}$, so that

$$\mathbb{E}\left[\frac{1}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}\mathbb{E}[\Delta_{i}B_{i}(\theta_{0})|\mathbf{V}_{i}]\right]=\mathbb{E}\left[\frac{1}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}\mathbb{E}[\Delta_{i}|\mathbf{V}_{i}]\mathbb{E}[B_{i}(\theta_{0})|\mathbf{V}_{i}]\right],$$

$${}=\mathbb{E}\left[Z_{il}\mathbb{E}[B_{i}(\theta_{0})|\mathbf{V}_{i}]\right],$$

$${}=\mathbb{E}\left[Z_{il}B_{i}(\theta_{0})\right],$$

$${}=0.$$

In case (ii),

$$\mathbb{E}\left[\mathbb{E}\left[\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})|\mathbf{V}_{i}\right]\right]=\mathbb{E}\left[\frac{1}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}\mathbb{E}[\Delta_{i}Z_{il}B_{i}(\theta_{0})|\mathbf{V}_{i}]\right].$$

Given $\mathbf{V}_{i}$, $Z_{il}B_{i}(\theta_{0})$ is a function of $(\mathbf{X}^{\textrm{miss}},\mathbf{Z}^{\textrm{miss}})$ only. Thus, by the MAR assumption, $Z_{il}B_{i}(\theta_{0})$ and $\Delta_{i}$ are independent given $\mathbf{V}_{i}$, so that

$$\mathbb{E}\left[\frac{1}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}\mathbb{E}[\Delta_{i}Z_{il}B_{i}(\theta_{0})|\mathbf{V}_{i}]\right]=\mathbb{E}\left[\frac{1}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}\mathbb{E}[\Delta_{i}|\mathbf{V}_{i}]\mathbb{E}[Z_{il}B_{i}(\theta_{0})|\mathbf{V}_{i}]\right],$$

$${}=\mathbb{E}[Z_{il}B_{i}(\theta_{0})],$$

$${}=0.$$

It follows that $\mathbb{E}[\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})]=0$.

Using similar arguments, we prove that $\mathbb{E}[\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}X_{ij}A_{i}(\theta_{0})]=0$.

Now, for every $i=1,\ldots,n$ and $l=1,\ldots,q$, we have

$$\textrm{var}\left(\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}B_{i}(\theta_{0})\right)\leq\mathbb{E}\left(\frac{\Delta_{i}}{\pi_{i}^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}^{2}B_{i}^{2}(\theta_{0})\right).$$

By $\mathbf{H3}$, we have $\mathbb{E}\left(\frac{\Delta_{i}}{\pi_{i}^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}Z_{il}^{2}B_{i}^{2}(\theta_{0})\right)<\infty$.

Using similar arguments, we prove that, for every $i=1,\ldots,n$ and $j=1,\ldots,p$,

$$\textrm{var}\left(\frac{\Delta_{i}}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}X_{ij}A_{i}(\theta_{0})\right)<\infty.$$

Thus, by the weak law of large numbers, $n^{-1/2}U_{ws,n}(\theta_{0},\pi(Y_{i},\mathbf{S}^{\prime}_{i}))$ converges in probability to $0$ as $n\rightarrow\infty$.

Finally, $n^{-1/2}U_{ws,n}(\theta_{0},\hat{\pi}(Y_{i},\mathbf{S}^{\prime}_{i}))$ converges in probability to $0$, which concludes the proof.
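The role of the cell-frequency estimator $\hat{\pi}$ in the proof above can be illustrated with a small simulation. The sketch below is only a toy version, not the authors' implementation: a single discrete variable `cell` stands in for $(Y_{i},\mathbf{S}^{\prime}_{i})$, the selection probabilities and the quantity `g` (a stand-in for one component of the score) are hypothetical, and $\hat{\pi}$ is computed as the proportion of complete cases within each cell.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50000

# Discrete stand-in for (Y_i, S'_i) and a selection probability that depends on it.
cell = rng.integers(0, 3, size=n)
pi_true = np.array([0.3, 0.6, 0.9])[cell]
delta = (rng.random(n) < pi_true).astype(float)   # 1 = complete case

# Quantity that is only usable when delta == 1; its mean depends on the cell.
g = cell + rng.normal(size=n)

# Nonparametric estimator: share of complete cases within each cell.
observed_per_cell = np.bincount(cell, weights=delta, minlength=3)
total_per_cell = np.bincount(cell, minlength=3)
pi_hat = observed_per_cell / total_per_cell

full_mean = g.mean()                               # infeasible benchmark
complete_case_mean = g[delta == 1].mean()          # ignores the selection mechanism
ipw_mean = np.mean(delta / pi_hat[cell] * g)       # inverse-probability weighted

print("pi_hat:", pi_hat)
print("full:", full_mean, "complete-case:", complete_case_mean, "IPW:", ipw_mean)
```

In this toy setting the weighted average tracks the full-data average, while the complete-case average is pulled towards the cells that are observed more often.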

Lemma 4. As $n\rightarrow\infty$, $n^{-1/2}\frac{\partial U_{ws,n}(\theta,\hat{\pi})}{\partial\theta^{T}}$ converges in probability to a fixed function $-\Sigma(\theta)$, uniformly in a neighbourhood of $\theta_{0}$.

Proof of Lemma 4. Let $\bar{U}_{ws,n}(\theta,\hat{\pi}):=n^{-1/2}\frac{\partial U_{ws,n}(\theta,\hat{\pi})}{\partial\theta^{T}}$ and $\ddot{l_{i}}(\theta)=\frac{\partial^{2}l_{i}(\theta)}{\partial\theta\partial\theta^{T}}$. We have

$$\bar{U}_{ws,n}(\theta,\hat{\pi})=\left[\bar{U}_{ws,n}(\theta,\hat{\pi})-\bar{U}_{ws,n}(\theta,\pi)\right]+\bar{U}_{ws,n}(\theta,\pi).$$

Using arguments similar to those in the proof of Lemma 3, $\bar{U}_{ws,n}(\theta,\hat{\pi})-\bar{U}_{ws,n}(\theta,\pi)$ converges in probability to $0$. By the weak law of large numbers and $\mathbf{H3}$,

$$\bar{U}_{ws,n}(\theta,\pi)=\frac{1}{n}\sum_{i=1}^{n}\left\{\frac{\Delta_{i}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\ddot{l_{i}}(\theta)\right\}$$

converges in probability to the matrix $-\Sigma(\theta)$ as $n\rightarrow\infty$.

By $\mathbf{H5}$, we prove that the convergence of $\bar{U}_{ws,n}(\theta,\hat{\pi})$ to $-\Sigma(\theta)$ is uniform.

The conditions of the inverse function theorem of Foutz [13] are therefore verified, and $\hat{\theta}_{n}^{ws}$ converges in probability to $\theta_{0}$.

Now, we prove that $\hat{\theta}_{n}^{ws}$ is asymptotically Gaussian.

A Taylor expansion of $U_{ws,n}(\hat{\theta}^{ws}_{n},\hat{\pi})$ at $\theta_{0}$ yields

$$0=U_{ws,n}(\hat{\theta}^{ws}_{n},\hat{\pi})=U_{ws,n}(\theta_{0},\hat{\pi})+\bar{U}_{ws,n}(\theta_{0},\hat{\pi})\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})+o_{p}(1),$$

therefore

$$\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})=-\left[\bar{U}_{ws,n}(\theta_{0},\hat{\pi})\right]^{-1}U_{ws,n}(\theta_{0},\hat{\pi})+o_{p}(1),$$

thus

$$\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})=\Sigma^{-1}(\theta_{0})U_{ws,n}(\theta_{0},\hat{\pi})+\left[-\bar{U}_{ws,n}^{-1}(\theta_{0},\hat{\pi})-\Sigma^{-1}(\theta_{0})\right]U_{ws,n}(\theta_{0},\hat{\pi})+o_{p}(1).$$

By calculations,

$$\textrm{Var}\left[U_{ws,n}(\theta_{0},\hat{\pi})\right]=\textrm{Var}\left\{U_{ws,n}(\theta_{0},\pi)+\left[U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\right]\right\},$$

$${}=\textrm{Var}\left[U_{ws,n}(\theta_{0},\pi)\right]+\textrm{Var}\left[U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\right]$$

$${}+2\textrm{Cov}\left[U_{ws,n}(\theta_{0},\pi),U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\right],$$

$$\textrm{Var}\left[U_{ws,n}(\theta_{0},\pi)\right]=\mathbb{E}\left[\frac{\dot{l}_{i}(\theta_{0},\pi)\dot{l}_{i}(\theta_{0},\pi)^{T}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right]=\Omega_{3}(\theta_{0},\pi).$$

Let $H(\theta_{0},\pi)=U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)$. Then

$$H(\theta_{0},\pi)=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Delta_{i}\left(\frac{1}{\hat{\pi}(Y_{i},\mathbf{S}^{\prime}_{i})}-\frac{1}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right)\dot{l_{i}}(\theta_{0}),$$

$${}=-\frac{1}{\sqrt{n^{3}}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})$$

$${}-\frac{1}{\sqrt{n^{3}}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})+o_{p}(1),$$

$${}=-\mathbf{Q}_{1n}-\mathbf{Q}_{2n}+o_{p}(1),$$

where $o_{p}(a_{n})$ denotes a column vector whose components are uniformly $o_{p}(a_{n})$, and $\mathbf{Q}_{1n}$ and $\mathbf{Q}_{2n}$ are given by

$$\mathbf{Q}_{1n}=\frac{1}{\sqrt{n^{3}}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0}).$$

$$\mathbf{Q}_{2n}=\frac{1}{\sqrt{n^{3}}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0}).$$

Let

$$P_{ik}=\left[\frac{\left[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0}).$$

To show that $\mathbf{Q}_{1n}=O_{p}(1/\sqrt{n})$, it suffices to show that $\mathbb{E}(\mathbf{Q}_{1n})=O(1/\sqrt{n})$ and $\textrm{Var}(\mathbf{Q}_{1n})=O^{*}(1/n)$, where $O^{*}(a_{n})$ and $O(a_{n})$ denote a matrix and a column vector, respectively, whose components are uniformly $O(a_{n})$. First, it can be shown that

$$\mathbb{E}\left[P_{ik}\right]=\mathbb{E}\left[\mathbb{E}(P_{ik}|Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})\right]=\begin{cases}0\quad\text{if}\quad i\neq k\\ \mathbb{E}\left[\frac{\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\dot{l_{i}}(\theta_{0})}{\pi_{i}(Y_{i},\mathbf{S}^{\prime}_{i})}\right]\quad\text{if}\quad i=k,\end{cases}$$

and then

$$\mathbb{E}\left[\frac{\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\dot{l_{i}}(\theta_{0})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right]=\mathbb{E}\left\{\mathbb{E}\left[\frac{\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\dot{l_{i}}(\theta_{0})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}|Y_{i},\mathbf{S}^{\prime}_{i}\right]\right\},$$

$${}=\mathbb{E}\left[\frac{\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\dot{l}_{i}^{*}(\theta_{0})\right].$$

Thus, we have

$$\mathbb{E}(\mathbf{Q}_{1n})=\frac{1}{n^{3/2}}\sum_{k=1}^{n}\sum_{i=1}^{n}\mathbb{E}\left[P_{ik}\right]=\frac{1}{n^{3/2}}\sum_{i=1}^{n}\mathbb{E}\left[\frac{\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\dot{l}_{i}^{*}(\theta_{0})\right]=O(\frac{1}{\sqrt{n}}).$$

We have

$$\textrm{Cov}(P_{ij},P_{kl})=\mathbb{E}\bigg{\{}\left[\frac{\left[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]\left[\Delta_{j}-\pi(Y_{j},\mathbf{S}^{\prime}_{j})\right]I(Y_{j}=Y_{i},\mathbf{S}^{\prime}_{j}=\mathbf{S}^{\prime}_{i})}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0})$$

$${}\times\left[\frac{\left[\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})\right]\left[\Delta_{l}-\pi(Y_{l},\mathbf{S}^{\prime}_{l})\right]I(Y_{l}=Y_{k},\mathbf{S}^{\prime}_{l}=\mathbf{S}^{\prime}_{k})}{\pi^{2}(Y_{k},\mathbf{S}^{\prime}_{k})P(Y=Y_{k},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{k})}\right]\dot{l_{k}}(\theta_{0})^{T}\bigg{\}},$$

$$\textrm{Cov}(P_{ij},P_{lk})=\begin{cases}0\quad\text{if}\quad k\neq i,j\quad\text{and}\quad l\neq i,j\\ 0\quad\text{if}\quad k\neq i,j\quad\text{and}\quad l=i\quad\text{or}\quad j\\ \mathbb{E}\left[\frac{\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]^{2}\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}\right]\quad\text{if}\quad i=l\quad\text{and}\quad j=k\end{cases}$$

and

$$\textrm{Var}(\mathbf{Q}_{1n})=\frac{1}{n^{3}}\bigg{\{}\sum_{i,j}^{n}\textrm{Var}(P_{ij})+\sum_{i,j}^{n}\bigg{[}\sum_{l=j,k\neq j}\textrm{Cov}(P_{ij},P_{lk})$$

$${}+\sum_{k=j,l\neq j}\textrm{Cov}(P_{ij},P_{lk})+\sum_{k\neq j,l\neq j}\textrm{Cov}(P_{ij},P_{lk})\bigg{]}\bigg{\}},$$

$${}=\frac{1}{n^{3}}\bigg{\{}n\mathbb{E}\bigg{\{}\frac{\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}}{\pi^{4}(Y_{i},\mathbf{S}^{\prime}_{i})}\bigg{[}\pi(Y_{i},\mathbf{S}^{\prime}_{i})-4\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})$$

$${}+6\pi^{3}(Y_{i},\mathbf{S}^{\prime}_{i})-3\pi^{4}(Y_{i},\mathbf{S}^{\prime}_{i})\bigg{]}\bigg{\}}$$

$${}+n(n-1)\mathbb{E}\left\{\frac{\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}}{\pi^{4}(Y_{i},\mathbf{S}^{\prime}_{i})}\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]^{2}\right\}$$

$${}+n(n-1)\mathbb{E}\left\{\frac{\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}}{\pi^{4}(Y_{i},\mathbf{S}^{\prime}_{i})}\left[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]^{2}\right\}\bigg{\}},$$

$${}=O^{*}(\frac{1}{n^{2}})+O^{*}(\frac{1}{n})+O^{*}(\frac{1}{n})$$

$${}=O^{*}(\frac{1}{n}).$$
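The bracketed polynomial in $\pi$ in the leading term of this variance is what one obtains for the fourth central moment of a Bernoulli($\pi$) indicator, namely $\pi-4\pi^{2}+6\pi^{3}-3\pi^{4}$. This reading is an interpretation of the display rather than a statement made in the proof; the short check below verifies the identity exactly with rational arithmetic. The function names are illustrative.

```python
from fractions import Fraction

def fourth_central_moment(p):
    """E[(D - p)^4] for D ~ Bernoulli(p), from the two-point distribution."""
    return p * (1 - p) ** 4 + (1 - p) * (0 - p) ** 4

def polynomial_form(p):
    """Expanded form p - 4p^2 + 6p^3 - 3p^4."""
    return p - 4 * p**2 + 6 * p**3 - 3 * p**4

# Compare both expressions exactly on a grid of probabilities.
probabilities = [Fraction(k, 10) for k in range(1, 10)]
moments_match = all(
    fourth_central_moment(p) == polynomial_form(p) for p in probabilities
)
print(moments_match)
```

Since both sides are polynomials of degree four in $\pi$ that agree at more than four points, they coincide identically.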

Therefore, $\mathbf{Q}_{1n}=O_{p}(\frac{1}{\sqrt{n}})$. Next, $\mathbf{Q}_{2n}$ can be expressed as follows:

$$\mathbf{Q}_{2n}=\frac{1}{\sqrt{n^{3}}}\sum_{i=1}^{n}\sum_{k=1}^{n}\left[\frac{\left[\Delta_{k}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})\right]I(Y_{k}=Y_{i},\mathbf{S}^{\prime}_{k}=\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})P(Y=Y_{i},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{i})}\right]\dot{l_{i}}(\theta_{0}),$$

$${}=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\left[\frac{\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})}{\pi(Y_{k},\mathbf{S}^{\prime}_{k})}\right]\left[\dot{l}_{k}^{*}(\theta_{0})+\frac{\frac{1}{n}\sum_{i=1}^{n}I(Y_{i}=Y_{k},\mathbf{S}^{\prime}_{i}=\mathbf{S}^{\prime}_{k})\dot{l_{i}}(\theta_{0})}{P(Y=Y_{k},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{k})}-\dot{l}_{k}^{*}(\theta_{0})\right],$$

$${}=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\left[\frac{\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})}{\pi(Y_{k},\mathbf{S}^{\prime}_{k})}\right]\dot{l}_{k}^{*}(\theta_{0})+\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\Phi_{k}\left[\frac{1}{n}\sum_{i=1}^{n}\Psi_{ik}(\theta_{0})\right],$$

where $\Phi_{k}=\frac{\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})}{\pi(Y_{k},\mathbf{S}^{\prime}_{k})}$ and

$$\Psi_{ik}(\theta_{0})=\frac{I(Y_{i}=Y_{k},\mathbf{S}^{\prime}_{i}=\mathbf{S}^{\prime}_{k})\dot{l}_{i}(\theta_{0})-P(Y=Y_{k},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{k})\dot{l}_{k}^{*}(\theta_{0})}{P(Y=Y_{k},\mathbf{S}^{\prime}=\mathbf{S}^{\prime}_{k})}.$$

We have $\mathbb{E}\left[\Psi_{ik}(\theta_{0})|Y_{i}=Y_{k},\mathbf{S}^{\prime}_{i}=\mathbf{S}^{\prime}_{k}\right]=0$ and, hence,

$$\mathbb{E}\left[\Phi_{k}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{ik}(\theta_{0})\right)\right]=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\mathbb{E}\left[\mathbb{E}\left(\Phi_{k}\Psi_{ik}(\theta_{0})|Y_{i}=Y_{k},\mathbf{S}^{\prime}_{i}=\mathbf{S}^{\prime}_{k}\right)\right],$$

$${}=0.$$

Let $\Psi_{iks}(\theta_{0})$ be the $s$th element of $\Psi_{ik}(\theta_{0})$. Then, by Cauchy–Schwarz’s inequality,

$$\mathbb{E}\left[\left|\Phi_{k}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{iks}(\theta_{0})\right)\right|\right]\leq\left[\mathbb{E}\left(\Phi_{k}^{2}\right)\right]^{1/2}\left\{\mathbb{E}\left[\frac{1}{n}\left(\sum_{i=1}^{n}\Psi_{iks}(\theta_{0})\right)^{2}\right]\right\}^{1/2}.$$

Because for each element of $\Psi_{ik}(\theta_{0})$

$$\mathbb{E}\left[\frac{1}{n}\left(\sum_{i=1}^{n}\Psi_{iks}(\theta_{0})\right)^{2}\right]=\frac{1}{n}\left[\mathbb{E}\left(\sum_{i=1}^{n}\Psi_{iks}^{2}(\theta_{0})\right)+\sum_{i=1}^{n}\sum_{j=1,j\neq i}^{n}\mathbb{E}(\Psi_{iks}(\theta_{0})\Psi_{jks}(\theta_{0}))\right],$$

$${}=\mathbb{E}(\Psi_{iks}^{2}(\theta_{0}))<\infty,$$

we obtain $\mathbb{E}\left[\left|\Phi_{k}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{iks}(\theta_{0})\right)\right|\right]<\infty$.

By the weak law of large numbers, $\frac{1}{n}\sum_{k=1}^{n}\left\{\Phi_{k}\left[\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\Psi_{ik}(\theta_{0})\right]\right\}=o_{p}(1)$. Hence, $\mathbf{Q}_{2n}$ can be expressed as $\mathbf{Q}_{2n}=\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\left[\frac{\Delta_{k}-\pi(Y_{k},\mathbf{S}^{\prime}_{k})}{\pi(Y_{k},\mathbf{S}^{\prime}_{k})}\right]\dot{l}_{k}^{*}(\theta_{0})+o_{p}(1)$. Consequently,

$$\textrm{Var}\left[U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\right]=\mathbb{E}\left\{\mathbb{E}\left[\frac{[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})]^{2}\left[\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}|Y_{i},\mathbf{S}^{\prime}_{i}\right]\right\}$$

$${}+o^{*}(1),$$

$${}=\mathbb{E}\left[\frac{[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})]\left[\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right]+o^{*}(1)$$

and, letting $\Sigma=\textrm{Cov}\left[U_{ws,n}(\theta_{0},\pi),U_{ws,n}(\theta_{0},\hat{\pi})-U_{ws,n}(\theta_{0},\pi)\right]$, we have

$$\Sigma=-\mathbb{E}\left[\frac{\Delta_{i}[\Delta_{i}-\pi(Y_{i},\mathbf{S}^{\prime}_{i})]}{\pi^{2}(Y_{i},\mathbf{S}^{\prime}_{i})}\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]+o^{*}(1),$$

$${}=-\mathbb{E}\left[\frac{1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]+o^{*}(1),$$

where $o^{*}(a_{n})$ denotes a matrix whose components are uniformly $o(a_{n})$. Finally,

$$\textrm{Var}\left[U_{ws,n}(\theta_{0},\hat{\pi})\right]=\mathbb{E}\left[\frac{\dot{l}_{i}(\theta_{0},\pi)\dot{l}_{i}(\theta_{0},\pi)^{T}}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right]+\mathbb{E}\left[\frac{[1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})]\left[\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\right]$$

$${}-2\mathbb{E}\left[\frac{1-\pi(Y_{i},\mathbf{S}^{\prime}_{i})}{\pi(Y_{i},\mathbf{S}^{\prime}_{i})}\dot{l}_{i}^{*}(\theta_{0})\dot{l}_{i}^{*}(\theta_{0})^{T}\right]+o^{*}(1),$$

$${}=\Omega_{3}(\theta_{0},\pi)-\left[\Omega_{4}(\theta_{0},\pi)-\Omega_{5}(\theta_{0},\pi)\right]+o^{*}(1).$$

Thus, by the central limit theorem, $U_{ws,n}(\theta_{0},\hat{\pi})$ converges in distribution to the Gaussian vector of mean zero and variance $\Omega_{3}(\theta_{0},\pi)-\left[\Omega_{4}(\theta_{0},\pi)-\Omega_{5}(\theta_{0},\pi)\right]$. Because $\left[-\bar{U}_{ws,n}^{-1}(\theta_{0},\hat{\pi})-\Sigma^{-1}(\theta_{0})\right]$ converges in probability to $0$, Slutsky’s theorem implies that $\left[-\bar{U}_{ws,n}^{-1}(\theta_{0},\hat{\pi})-\Sigma^{-1}(\theta_{0})\right]U_{ws,n}(\theta_{0},\hat{\pi})$ converges in probability to $0$.

Finally, by Lemma 4 and Slutsky’s theorem, $\sqrt{n}(\hat{\theta}^{ws}_{n}-\theta_{0})$ converges in distribution to the Gaussian vector of mean zero and variance

$$\Delta_{ws}:=\Sigma(\theta_{0})^{-1}\{\Omega_{3}(\theta_{0},\pi)-\left[\Omega_{4}(\theta_{0},\pi)-\Omega_{5}(\theta_{0},\pi)\right]\}[\Sigma(\theta_{0})^{-1}]^{T}.$$
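In practice, a variance of this sandwich form is evaluated by plugging in estimates of $\Sigma(\theta_{0})$ and of the middle matrix $\Omega_{3}-[\Omega_{4}-\Omega_{5}]$. The sketch below shows only the linear-algebra step, assuming such plug-in matrices are already available; the helper name `sandwich_variance` and the numerical values are hypothetical and are not taken from the paper.

```python
import numpy as np

def sandwich_variance(sigma, middle):
    """Return sigma^{-1} @ middle @ (sigma^{-1})^T using linear solves."""
    left = np.linalg.solve(sigma, middle)       # sigma^{-1} @ middle
    return np.linalg.solve(sigma, left.T).T     # left @ (sigma^{-1})^T

# Hypothetical plug-in estimates for a two-parameter model.
sigma_hat = np.array([[2.0, 0.5],
                      [0.5, 1.0]])
middle_hat = np.array([[1.0, 0.2],
                       [0.2, 0.5]])
n_obs = 500

delta_ws_hat = sandwich_variance(sigma_hat, middle_hat)
# Standard errors of theta-hat follow from the sqrt(n) scaling of the limit law.
standard_errors = np.sqrt(np.diag(delta_ws_hat) / n_obs)
print(delta_ws_hat)
print(standard_errors)
```

Solving linear systems instead of inverting `sigma_hat` explicitly is a common numerical choice; for well-conditioned matrices both routes give the same result up to rounding.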

About this article

Cite this article

Amani, K.M., Hili, O. & Kouakou, K.J. Statistical Inference in Marginalized Zero-inflated Poisson Regression Models with Missing Data in Covariates. Math. Meth. Stat. 32, 241–259 (2023). https://doi.org/10.3103/S1066530723040038


Received: 17 April 2023
Revised: 01 July 2023
Accepted: 23 July 2023
Published: 23 December 2023
Issue Date: December 2023
DOI: https://doi.org/10.3103/S1066530723040038
