1 Introduction

Count data collected in healthcare utilization studies exhibit remarkable features, including excess zeroes from the non-users of healthcare facilities, overdispersion, and multi-modality due to between-subject heterogeneity (Cameron and Trivedi 2005, 2013). The conventional two-part count models, such as zero-inflated Poisson and negative binomial models and hurdle Poisson and negative binomial models, have long been used to accommodate these features when analyzing healthcare utilization data in health economics and health services research. Recently, several marginalized two-part count models were proposed in the literature. These marginalized models were largely promoted because they can allegedly provide “direct” marginal inference, whereas their non-marginalized counterparts cannot. This article examines whether the marginalized two-part models are in fact superior to non-marginalized two-part models for count data with excess zeroes because of the claimed advantage of direct marginal inference.

Investigational studies in healthcare utilization in health economics and health services research often set up their primary outcome as the number of usages of healthcare facilities, such as visits to primary care doctors or emergency departments and days of hospitalization after surgeries. A consequence of this is that the outcomes are count observations expressed numerically as non-negative integers. Excessive zeroes occur when the studies involve participants who do not use any healthcare facilities during the study period. To account for excess zeroes in count data, Lambert (1992) first introduced the zero-inflated Poisson (ZIP) models as a two-part mixture model that combined a regular Poisson model with a latent binary distribution that governs the probability of generating structural zeroes and generating the Poisson counts. Since then, the ZIP models have been one of the most popular models for count data with excess zeroes (Winkelmann 2008) and have been extended to multivariate settings (Li et al. 1999) and models with random effects (Hall 2000). Recently, Long et al. (2014) modified the ZIP models and developed the marginalized ZIP (MZIP) models by specifying linear predictors for the overall mean of the count variable rather than using a linear predictor for the mean of the Poisson component in the ZIP models. The MZIP models were claimed to be able to provide overall marginal effect inference while accommodating the mechanism of mixture of a random population and a degenerate component and also avoiding the misuse of conditional mean as the population mean.

A natural extension of the ZIP models is the zero-inflated negative binomial (ZINB) models proposed by Greene (1994), in which the Poisson model in the ZIP models is replaced by a negative binomial model and the component for structural zeroes remains. When equality of mean and variance fails even after structural zeroes are split, the ZINB models would be a better choice than the ZIP models. Ridout et al. (2001) indicated a serious bias of parameter estimates by the ZIP modelling if the nonzero counts are overdispersed in relation to the ZIP models, and they provided a score test for testing the ZIP models against the ZINB models. As a parallel proposal with Long et al. (2014), Preisser et al. (2016) introduced the marginalized ZINB (MZINB) models and justified the MZINB models by comparing parameter estimates with the ZIP and MZIP models from fitting simulated MZINB data. The rationale behind the MZINB models is identical to that of MZIP models in terms of seeking instant marginal inference. The difference is that the MZINB models specified the negative binomial distribution, rather than the Poisson distribution, to account for additional overdispersion.

Closely related to these zero-inflated models are the hurdle models that were originally proposed by Cragg (1971) and formally presented by Mullahy (1986). The hurdle models are two-part models that combine a binary model for whether the count falls below or above the hurdle with a truncated count model above the hurdle. Hurdle-at-zero models are the most common hurdle models, among which the hurdle Poisson (HP) model (Mullahy 1986) and the hurdle negative binomial (HNB) model developed by Pohlmeier and Ulrich (1995) are the top choices in empirical analysis. Because of the complete separation of zero counts from the population of positive counts, hurdle models can accommodate count data with either zero-inflation or zero-deflation and either underdispersion or overdispersion based on the underlying count distributions. Although Kassahun et al. (2014) and Tabb et al. (2016) explored the marginalized hurdle models for panel count data, no research in the literature includes formal discussion of marginalized hurdle models for cross-sectional count data with excess zeroes.

The primary objective of this article is to correct the misleading claim in the earlier literature that marginalized two-part models are superior to their non-marginalized counterparts in characterizing count data with excess zeroes. This article thoroughly defines and derives the (average) marginal and incremental effects of a covariate with respect to the overall marginal mean of count outcomes with excess zeroes in the context of four non-marginalized two-part models (the ZIP, ZINB, HP, and HNB models) and four marginalized two-part models (the MZIP, MZINB, marginalized hurdle Poisson, and marginalized hurdle negative binomial models). Among these models, the marginalized hurdle Poisson (MHP) and marginalized hurdle negative binomial (MHNB) models are formally proposed here for the first time for cross-sectional data. Estimators and variance estimators are developed for the (average) marginal and incremental effect in each of the models. The derived effects and their estimators demonstrate that both types of models, either non-marginalized or marginalized, can provide marginal inference on the overall marginal mean of count outcomes with excess zeroes. The marginalized models have simplified marginal and incremental effects, but there is no extra computational burden in estimating the effects by using the conventional models. Instead of promoting the use of marginalized two-part models, we emphasize that the two types of models should be taken as parallel competitors and that unjustified faith in either type of model will result in model misspecification bias in statistical inference, including the inference of marginal means. Comprehensive numerical studies were conducted and are reported in this article to illustrate the consequences for statistical analysis when the marginalized two-part models are misused for data that are generated from the non-marginalized two-part models and vice versa.
Substantial biases were observed in statistical inference in the numerical studies when either type of model was mistakenly replaced by its counterpart. We propose a solution to the possible misuse of either type of model, which is to conduct rigorous model comparison and selection by using the information reflected in the observed data. Simulation studies were conducted and are reported to investigate three model comparison and selection criteria: the effect-specific mean square error criterion (Dow and Norton 2003), the information criteria, and Vuong’s closeness test (Vuong 1989). The studies verify that the information criteria can best select among the two types of models regardless of sample size. Although the performance of the mean square error criterion is acceptable, Vuong’s closeness test is not an ideal tool for distinguishing the non-marginalized and marginalized two-part models.

This article is organized as follows: Sect. 2 introduces the definitions of marginal and incremental effects and their average effects; Sect. 3 reviews the two zero-inflated models (i.e. ZIP and ZINB models) and their marginalized peers (i.e. MZIP and MZINB models), including their effect estimation; Sect. 4 discusses the two hurdle models (i.e. HP and HNB models) and the marginalized hurdle models (i.e. MHP and MHNB models) with effect estimation; Sect. 5 derives the variance estimators of marginal effects, incremental effects, and their average effects in these two-part models; Sect. 6 provides a thorough discussion on the question of superiority of marginalized two-part models over non-marginalized two-part models; Sect. 7 presents our simulation studies for comparison between ZIP and MZIP and between HNB and MHNB; Sect. 8 discusses model selection via effect estimates and shows further comparisons between paired models based on our simulation results; Sect. 9 reports an empirical analysis of German Socioeconomic Panel data using these models.

2 Marginal effects and average marginal effects

Let y be a count response variable (dependent variable) that takes the value of either a positive integer or zero. Let \(x = (x_1,x_2,\ldots ,x_J)\) be a vector of J covariates (independent variables). Denote by \(\mu (x) = \text{ E }(y|x)\) the expected value of y given x. The marginal effect (Greene 2002), or partial effect, of the jth covariate \(x_j\) on the expected overall outcome is then defined as

$$\begin{aligned} \eta _j(x) = \frac{\partial \mu (x)}{\partial x_j} = \frac{\partial \text{ E }(y|x)}{\partial x_j}, \end{aligned}$$
(1)

where \(j = 1,2,\ldots ,J\). The marginal effect allows us to quantify the marginal change in the expected overall outcome when covariate \(x_{j}\) changes by a small amount while holding other covariates \(x_{(-j)} = (x_{1},\ldots , x_{j-1},x_{j + 1},\ldots ,x_{J})'\) constant. The marginal effect is a function of both unknown parameters and covariates, and is evaluated at a particular combination of covariate values, say \(x = x^{(0)}\), with the parameter estimates. Another quantity of interest in health economics and health services research is the average marginal effect. Note that the marginal effect (1) represents the effect for the subpopulation that satisfies \(x = x^{(0)}\). This subpopulation may be a small or even negligible portion of the entire population. When the study objective is to assess the marginal effect on the outcomes in the entire population, the expected value of the marginal effect over the population distribution of all covariates is then of primary interest. This is quantified by the average marginal effect, which is defined as

$$\begin{aligned} \text{ E }\{\eta _j (x)\} = \text{ E }\left\{ \frac{\partial \mu (x)}{\partial x_j} \right\} , \end{aligned}$$

in which the expectation is taken with respect to \(x = (x_1,x_2,\ldots ,x_J).\)

When \(x_{j}\) is a categorical covariate that represents multiple levels or experimental groups, the quantity of interest is usually the incremental effect (Greene 2002; Basu and Rathouz 2005). The incremental effect is defined as

$$\begin{aligned} \pi _j(x) = \mu (x_j = l_2,x_{(-j)})-\mu (x_j = l_1,x_{(-j)}), \end{aligned}$$

in which \(l_1\) and \(l_2\) are two levels of covariate \(x_j\). The incremental effect measures the difference in the expected overall outcome at the two levels of \(x_{j}\) while holding other covariates \(x_{(-j)}\) constant. When \(x_j\) is a binary covariate that takes the value 1 or 0, the incremental effect from level 0 to level 1 is

$$\begin{aligned} \pi _j(x) = \mu (x_j = 1,x_{(-j)})-\mu (x_j = 0,x_{(-j)}). \end{aligned}$$

The average incremental effect is defined as

$$\begin{aligned} \text{ E }\{\pi _j (x)\}=\text{ E }\{\mu (x_j = l_2,x_{(-j)})-\mu (x_j = l_1,x_{(-j)})\}, \end{aligned}$$

in which the expectation is taken with respect to \(x_{(-j)} = (x_1,x_2,\ldots ,x_{j-1},x_{j + 1},\ldots ,x_J)\).
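The definitions in this section can be illustrated numerically. The following is a minimal sketch, assuming a hypothetical log-linear mean function `mu_example` and a small made-up covariate sample; none of these names or values come from the article. The marginal effect is approximated by a central finite difference, the incremental effect by a difference of means at two levels, and the average effects by sample averages in place of the population expectations.

```python
import math

def marginal_effect(mu, x, j, h=1e-6):
    """Central finite-difference approximation to the partial derivative of mu(x) in x_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (mu(up) - mu(down)) / (2.0 * h)

def incremental_effect(mu, x, j, l1, l2):
    """mu(x_j = l2, x_(-j)) - mu(x_j = l1, x_(-j)), other covariates held fixed."""
    hi, lo = list(x), list(x)
    hi[j], lo[j] = l2, l1
    return mu(hi) - mu(lo)

def sample_average(effect, sample):
    """Average an effect over observed covariate vectors (empirical expectation)."""
    return sum(effect(x) for x in sample) / len(sample)

# Hypothetical log-linear mean, used only for illustration.
def mu_example(x):
    return math.exp(0.5 + 0.3 * x[0] - 0.2 * x[1])

# Made-up sample: x[0] is continuous, x[1] is binary.
sample = [[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [-1.0, 0.0]]
ame = sample_average(lambda x: marginal_effect(mu_example, x, 0), sample)
aie = sample_average(lambda x: incremental_effect(mu_example, x, 1, 0.0, 1.0), sample)
```

For this particular mean function the marginal effect of the first covariate is 0.3 times the mean at each point, which gives a simple check on the numerical derivative.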

3 Estimation of marginal effects: zero-inflated models and marginalized zero-inflated models

3.1 Zero-inflated Poisson and negative binomial models

The zero-inflated Poisson (ZIP) model (Lambert 1992) for the count data with excess zeroes is a mixture of constant zeroes and a standard Poisson model. For the ith outcome \(y_i\), \(i=1,2,\ldots ,n\), the ZIP model is given by

$$\begin{aligned} y_{i}=\left\{ \begin{array}{ll}\,\,0 & \quad \text{ if }\,\,\, c_i =1;\\ \,\, y_{i}^*&\quad \text{ if }\,\,\, c_i =0, \end{array}\right. \end{aligned}$$

in which \(c_{i}\) is a Bernoulli variable with mean \(\psi _{i}=P(c_i=1)\) and \(y_{i}^*\sim \text{ Poisson }(\mu _i)\) with a probability mass function (pmf) \(g(y_{i}^*| \mu _i) =\displaystyle e^{-\mu _i}\mu _i^{y_{i}^*}/y_{i}^*!\). The marginal pmf of \(y_i\) in the ZIP model is

$$\begin{aligned} f(y_{i})=\left\{ \begin{array}{ll} \psi _{i} + (1-\psi _{i})e^{-\mu _i}, & \quad \text {for}\,\,\,y_{i} = 0,\\ ( 1-\psi _{i})e^{-\mu _i}\mu _i^{y_{i}}/y_{i} !, & \quad \text {for}\,\,\, y_{i} = 1,2,3,\ldots , \end{array}\right. \end{aligned}$$

with \(\text{ E }(y_{i})=\mu _i (1-\psi _i)\) and \(\text{ var }(y_{i})=\mu _i (1-\psi _i)(1 + \mu _i\psi _i)\). In contrast to the standard Poisson model, the overdispersion in the ZIP model is measured by \({\mathrm{var}}(y_i)/\text{ E }(y_i)=1+\psi _i \mu _i\). To further characterize the dependence of \(y_i\) on the covariates, Lambert (1992) constructed a ZIP model as

$$\begin{aligned} \ln (\mu _i) = {x'_i}{\beta } \qquad \text{ and }\qquad {\mathrm {logit}}(\psi _i)=\ln \left( \frac{\psi _{i}}{1-\psi _{i}}\right) = {z'_i}{\gamma }, \end{aligned}$$
(2)

in which \({x}_{i} = (x_{i0}\equiv 1,x_{i1},x_{i2},\ldots ,x_{iJ_1})'\) and \(z_i = (z_{i0}\equiv 1,z_{i1}, z_{i2},\ldots ,z_{iJ_2})'\) are two vectors of covariates that may or may not overlap with each other, \(\beta = (\beta _0,\beta _1,\ldots ,\beta _{J_1})'\) is the vector of regression coefficients for the Poisson process and \(\gamma = (\gamma _0,\gamma _1,\ldots ,\gamma _{J_2})'\) is the vector of regression coefficients for the excess zeroes, and \(\beta _0\) and \(\gamma _0\) are regression intercepts. Let \(\theta = (\beta ',\gamma ')'\) denote the vector that contains all unknown parameters in the ZIP model, then the log-likelihood function of the model is

$$\begin{aligned} \ell (\theta |y,x,z)&= \sum \limits _{i=1,\ldots ,n; \,y_i = 0} \ln (e^{{z}_{i}^{\prime }{\gamma }} + e^{-e^{{x}_{i}^{\prime }{\beta }}}) + \sum \limits _{i=1,\ldots ,n; \,y_i>0} (y_i {{x}_{i}^{\prime }{\beta }}-e^{{x}_{i}^{\prime }{\beta }}- \ln y_i!)\nonumber \\&\quad -\, \sum \limits _{i = 1}^n \ln (1 + e^{{z}_{i}^{\prime }{\gamma }}). \end{aligned}$$
(3)

In (3), the observed data are represented by the collection of \(y=(y_1,y_2,\ldots ,y_n)'\), \(x=(x_1',x_2',\ldots ,x_n')'\), and \(z=(z_1',z_2',\ldots ,z_n')'\). Estimates of the unknown parameters in the ZIP model can be obtained by maximizing (3) using numerical optimization methods. Lambert (1992) also derived the joint probability density function of \(y_i\) and \(c_i\) and the Expectation-Maximization (EM) algorithm for maximizing the complete log-likelihood function.
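As a concrete reading of the ZIP pmf and of the log-likelihood (3), the sketch below codes both directly. It is an illustration under stated assumptions rather than the authors' implementation: the helper names (`dot`, `zip_pmf`, `zip_loglik`) and the toy data are hypothetical, and each design row is assumed to start with the intercept term 1.

```python
import math

def dot(coef, row):
    """Linear predictor: inner product of a coefficient vector and a design row."""
    return sum(c * v for c, v in zip(coef, row))

def zip_pmf(y, mu, psi):
    """Marginal ZIP pmf: a structural zero with probability psi, else Poisson(mu)."""
    poisson = math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))
    return psi + (1.0 - psi) * poisson if y == 0 else (1.0 - psi) * poisson

def zip_loglik(beta, gamma, y, X, Z):
    """Log-likelihood (3) with ln(mu_i) = x_i'beta and logit(psi_i) = z_i'gamma."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        xb, zg = dot(beta, xi), dot(gamma, zi)
        total -= math.log1p(math.exp(zg))  # -ln(1 + e^{z'gamma}), summed over all i
        if yi == 0:
            total += math.log(math.exp(zg) + math.exp(-math.exp(xb)))
        else:
            total += yi * xb - math.exp(xb) - math.lgamma(yi + 1)
    return total

# Toy data (hypothetical values).
y = [0, 0, 3, 1, 0, 5]
X = [[1.0, 0.2], [1.0, -1.0], [1.0, 1.5], [1.0, 0.0], [1.0, 0.7], [1.0, 2.0]]
Z = X
beta, gamma = [0.4, 0.5], [-0.3, -0.8]
ll = zip_loglik(beta, gamma, y, X, Z)
```

In practice a function of this kind would be passed to a numerical optimizer; summing `math.log(zip_pmf(...))` over the observations should reproduce the same value.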

An extension of the ZIP model is the zero-inflated negative binomial (ZINB) model that assumes the count data with excess zeroes are observed from a mixture of constant zeroes and a standard negative binomial model. For the ith outcome \(y_i\), the ZINB model is given by

$$\begin{aligned} y_{i}=\left\{ \begin{array}{ll}\,\,0 &\quad \text{ if } \,\,c_i =1;\\ \,\, y_{i}^*&\quad \text{ if }\,\,c_i =0, \end{array}\right. \end{aligned}$$

in which \(c_{i}\) is still a Bernoulli variable with mean \(\psi _{i}=P(c_i=1)\) but \(y_{i}^*\) follows a negative binomial distribution \(\text{ NegBin }(\mu _i,\alpha )\) with a pmf

$$\begin{aligned} g(y_i| \mu _i,\alpha ) = \frac{\Gamma (y_i + \alpha )}{\Gamma (\alpha )\Gamma (y_i + 1)} \left( \frac{\alpha }{\alpha + \mu _i}\right) ^{\alpha } \left( \frac{\mu _i}{\alpha + \mu _i}\right) ^{y_i}, \quad y_i = 0,1,2,\ldots . \end{aligned}$$

The marginal pmf of \(y_i\) in the ZINB model is

$$\begin{aligned} f(y_i) = {\left\{ \begin{array}{ll}\displaystyle \psi _i + (1-\psi _i) \left( \frac{\alpha }{\alpha + \mu _i}\right) ^\alpha , & \quad \text{ for }\,\, y_i = 0,\\ \displaystyle (1-\psi _i)\frac{\Gamma (y_i + \alpha )}{\Gamma (\alpha )\Gamma (y_i + 1)} \left( \frac{\alpha }{\alpha + \mu _i}\right) ^{\alpha } \left( \frac{\mu _i}{\alpha + \mu _i}\right) ^{y_i},&\quad \text{ for }\,\,y_i = 1,2,\ldots , \end{array}\right. } \end{aligned}$$
(4)

with \(\text{ E }(y_i) = \mu _i(1-\psi _i)\) and \(\text{ var }(y_i) =\displaystyle \mu _i(1-\psi _i)\left( 1 + \mu _i/\alpha + \mu _i\psi _i\right)\). The negative binomial model for \(y_{i}^*\) in the count component of the ZINB model represents the overdispersed \(y_{i}^*\) in that \(\text{ var }(y^*_i)/\text{ E }(y^*_i)= 1 + \mu _i/\alpha\). The overdispersion in the ZINB model as a whole is measured by \(\text{ var }(y_i)/\text{ E }(y_i) = 1 + \mu _i(1 + \alpha \psi _i)/\alpha\). To describe the dependence of \(y_i\) on the covariates, the ZINB model specifies that

$$\begin{aligned} \ln (\mu _i) = {x'_i}{\beta } \qquad \text{ and }\qquad {\mathrm {logit}}(\psi _i)= {z'_i}{\gamma }, \end{aligned}$$

as in (2). Let \(\theta = (\beta ',\gamma ',\alpha )'\) denote the vector that contains all unknown parameters in the ZINB model, then the log-likelihood function of the model is

$$\begin{aligned} \ell (\theta | x,z, y)&= - \sum \limits _{i = 1}^n \ln (1 + e^{{z}_{i}^{\prime }{\gamma }}) + \sum \limits _{i=1,\ldots ,n; \,y_i = 0} \ln \left\{ e^{{z}_{i}^{\prime }{\gamma }} + \Big (\frac{\alpha }{\alpha +e^{{x}_{i}^{\prime }{\beta }}}\Big )^{{ \alpha }} \right\} \\&\quad +\, \sum \limits _{i=1,\ldots ,n; \,y_i>0}\Bigg \{ \sum _{j = 0}^{y_i-1} \ln \left( j + {\alpha }\right) -\left( {\alpha } + y_i\right) \ln \left( \alpha + e^{{x}_{i}^{\prime }{\beta }}\right) \\&\qquad \qquad \qquad \qquad \quad + \alpha \ln \alpha + y_i {{x}_{i}^{\prime }{\beta }}- \ln y_i! \Bigg \}. \end{aligned}$$
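The ZINB log-likelihood above can be checked against the marginal pmf (4) term by term. The sketch below is a hedged illustration: the function names and the toy data are assumptions made for this example, and the negative binomial is parameterized as in the text, with mean \(\mu_i\) and dispersion parameter \(\alpha\).

```python
import math

def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))

def negbin_logpmf(y, mu, alpha):
    """Log pmf of NegBin(mu, alpha): mean mu, variance mu + mu^2 / alpha."""
    return (math.lgamma(y + alpha) - math.lgamma(alpha) - math.lgamma(y + 1)
            + alpha * math.log(alpha / (alpha + mu))
            + y * math.log(mu / (alpha + mu)))

def zinb_logpmf(y, mu, psi, alpha):
    """Log of the marginal ZINB pmf (4)."""
    if y == 0:
        return math.log(psi + (1.0 - psi) * math.exp(negbin_logpmf(0, mu, alpha)))
    return math.log(1.0 - psi) + negbin_logpmf(y, mu, alpha)

def zinb_loglik(beta, gamma, alpha, y, X, Z):
    """ZINB log-likelihood written in the same form as the display above."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        mu, ezg = math.exp(dot(beta, xi)), math.exp(dot(gamma, zi))
        total -= math.log1p(ezg)
        if yi == 0:
            total += math.log(ezg + (alpha / (alpha + mu)) ** alpha)
        else:
            total += (sum(math.log(j + alpha) for j in range(yi))
                      - (alpha + yi) * math.log(alpha + mu)
                      + alpha * math.log(alpha) + yi * math.log(mu)
                      - math.lgamma(yi + 1))
    return total

# Toy data (hypothetical values).
y = [0, 2, 0, 7, 1]
X = [[1.0, 0.3], [1.0, 1.1], [1.0, -0.4], [1.0, 2.0], [1.0, 0.0]]
Z = X
beta, gamma, alpha = [0.2, 0.6], [-0.5, -0.7], 1.5
ll = zinb_loglik(beta, gamma, alpha, y, X, Z)
```

The sum over \(j\) in the positive-count term is the expanded form of \(\ln \Gamma (y_i+\alpha ) - \ln \Gamma (\alpha )\), which is why the two functions agree.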

The marginal and incremental effects of the ZIP and ZINB models can be derived according to the definitions given in Sect. 2 and the modelling framework of the ZIP model. The marginal expectation of the response \(y_i\) has the same expression in both the ZIP and the ZINB models:

$$\begin{aligned} \text{ E }(y_i|x_i,z_i) = \mu _i (1-\psi _i) = \frac{e^{x_i'\beta }}{1 + e^{{z'_i}\gamma }}. \end{aligned}$$

Here, to derive the marginal and incremental effects of the two models, we consider the scenario in which the first \(J_0\) covariates in \(x_i\) are duplicated in \(z_i\); that is, we assume that covariate \(x_{ij}\) in the vector \(x_i\) and covariate \(z_{ij}\) in the vector \(z_i\) are identical covariates for \(j=1,2,\ldots , J_0\) with \(J_0\le J_1\) and \(J_0\le J_2\). Then, the marginal effect \(\eta _j(x_i,z_i,\theta )\) of covariate \(x_{ij}\), or \(z_{ij}\), with respect to the overall mean of response \(y_i\) in the ZIP and ZINB models is

$$\begin{aligned} \eta _j(x_i,z_i,\theta ) = \frac{\partial \text{ E }(y_i|x_i,z_i) }{\partial x_{ij}} = \frac{e^{x_i'\beta }}{(1 + e^{{z'_i}\gamma })^2}\{\beta _j+e^{{z'_i}\gamma }(\beta _j-\gamma _j)\}. \end{aligned}$$
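The closed-form marginal effect above can be verified against a numerical derivative of the overall mean \(e^{x_i'\beta }/(1+e^{z_i'\gamma })\). The sketch below assumes hypothetical coefficient values and a covariate \(j\) that appears in both \(x_i\) and \(z_i\), so both entries are perturbed together; the function names are illustrative only.

```python
import math

def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))

def zi_mean(x, z, beta, gamma):
    """Overall mean exp(x'beta) / (1 + exp(z'gamma)), shared by ZIP and ZINB."""
    return math.exp(dot(beta, x)) / (1.0 + math.exp(dot(gamma, z)))

def zi_marginal_effect(x, z, beta, gamma, j):
    """Closed-form marginal effect of the shared covariate x_j = z_j."""
    exb, ezg = math.exp(dot(beta, x)), math.exp(dot(gamma, z))
    return exb / (1.0 + ezg) ** 2 * (beta[j] + ezg * (beta[j] - gamma[j]))

def zi_marginal_effect_fd(x, z, beta, gamma, j, h=1e-6):
    """Central difference; x_j and z_j move together because they are one covariate."""
    xu, zu, xd, zd = list(x), list(z), list(x), list(z)
    xu[j] += h
    zu[j] += h
    xd[j] -= h
    zd[j] -= h
    return (zi_mean(xu, zu, beta, gamma) - zi_mean(xd, zd, beta, gamma)) / (2.0 * h)

# Hypothetical values; index 0 is the intercept term.
x = [1.0, 0.5, -1.2]
z = [1.0, 0.5, -1.2]
beta, gamma = [0.4, 0.5, -0.1], [-0.3, -0.8, 0.6]
closed = zi_marginal_effect(x, z, beta, gamma, 1)
numeric = zi_marginal_effect_fd(x, z, beta, gamma, 1)
```

Note that the sign of the effect depends on \(\beta _j + e^{z_i'\gamma }(\beta _j-\gamma _j)\), so it need not match the sign of \(\beta _j\) alone.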

When \(x_{ij}\) is a categorical covariate, the incremental effect \(\pi _j(x_{i(-j)},z_{i(-j)},\theta )\) from level \(l_1\) to level \(l_2\) in \(x_{ij}\) with respect to \(y_i\) is

$$\begin{aligned} \pi _j(x_{i(-j)},z_{i(-j)},\theta ) = ~&\text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}=l_2, \theta ) - \text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}= l_1,\theta ) \\ =~&\frac{e^{x_{i(-j)}'\beta _{(-j)} + l_2\beta _j } }{1 + e^{z_{i(-j)}'\gamma _{(-j)} + l_2\gamma _j }} - \frac{e^{x_{i(-j)}'\beta _{(-j)} + l_1\beta _j } }{1 + e^{z_{i(-j)}'\gamma _{(-j)} + l_1\gamma _j }}. \end{aligned}$$

The estimates of the marginal and incremental effects \({\hat{\eta }}_j(x_i,z_i,{\hat{\theta }})\) and \({\hat{\pi }}_j(x_{i(-j)},z_{i(-j)},{\hat{\theta }})\) are obtained by substituting the unknown parameters \(\theta\) in \(\eta _j(x_i,z_i,\theta )\) and \(\pi _j(x_{i(-j)},z_{i(-j)},\theta )\) with their maximum likelihood estimates \({\hat{\theta }}\). The average marginal and average incremental effects of covariate \(x_{ij}\) with respect to the overall mean of response \(y_i\) in the ZIP and ZINB models are

$$\begin{aligned} {\bar{\eta }}_j(\theta )&= \text{ E }_{x,z}(\eta _{j} (x,z, \theta ) ) = \int \eta _{j} (x,z, \theta )\, d F_{x,z}(x,z)\\&= \int \frac{e^{x'\beta }}{(1 + e^{{z'}\gamma })^2}\{\beta _j+e^{{z'}\gamma }(\beta _j-\gamma _j)\}\, d F_{x,z}(x,z) \end{aligned}$$

and

$$\begin{aligned} {\bar{\pi }}_{j} (\theta )&= \text{ E }_{x_{(-j)},z_{(-j)}}(\pi _{j} (x_{(-j)},z_{(-j)}, \theta ) ) = \int \pi _{j} (x_{(-j)},z_{(-j)}, \theta )\ d F_{x_{(-j)},z_{(-j)}}(x_{(-j)},z_{(-j)}),\\&=\int \left( \frac{e^{x_{(-j)}'\beta _{(-j)} + l_2\beta _j } }{1 + e^{z_{(-j)}'\gamma _{(-j)} + l_2\gamma _j }} - \frac{e^{x_{(-j)}'\beta _{(-j)} + l_1\beta _j } }{1 + e^{z_{(-j)}'\gamma _{(-j)} + l_1\gamma _j }}\right) \, d F_{x_{(-j)},z_{(-j)}}(x_{(-j)},z_{(-j)}), \end{aligned}$$

in which \(F_{x,z}(x,z)\) and \(F_{x_{(-j)},z_{(-j)}} (x_{(-j)}, z_{(-j)})\) are the joint cumulative distribution functions of \((x,z)\) and \((x_{(-j)},z_{(-j)})\), respectively. Estimators of the average marginal and average incremental effects are given by averaging the estimated marginal and incremental effects that are evaluated at the observed data:

$$\begin{aligned} \hat{\bar{\eta }}_j({{\hat{\theta }}}) = \frac{1}{n}\sum \limits _{i = 1}^n\hat{{\eta }}_j(x_i,z_i,{\hat{\theta }})\qquad \text{ and }\qquad \hat{\bar{\pi }}_j({\hat{\theta }}) = \frac{1}{n}\sum \limits _{i = 1}^n\hat{{\pi }}_j(x_{i(-j)},z_{i(-j)},{\hat{\theta }}). \end{aligned}$$
(5)
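The plug-in estimators in (5) amount to evaluating the unit-level effect at each observed covariate row and averaging. A minimal sketch for the average incremental effect of a binary covariate follows; the "fitted" coefficients `beta_hat` and `gamma_hat` and the design rows are made-up values for illustration, not estimates from the article.

```python
import math

def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))

def zi_mean(x, z, beta, gamma):
    """Overall mean exp(x'beta) / (1 + exp(z'gamma)) for the ZIP and ZINB models."""
    return math.exp(dot(beta, x)) / (1.0 + math.exp(dot(gamma, z)))

def zi_incremental_effect(x, z, beta, gamma, j, l1, l2):
    """Change in the overall mean when the shared covariate j moves from l1 to l2."""
    x2, z2, x1, z1 = list(x), list(z), list(x), list(z)
    x2[j] = z2[j] = l2
    x1[j] = z1[j] = l1
    return zi_mean(x2, z2, beta, gamma) - zi_mean(x1, z1, beta, gamma)

def average_incremental_effect(X, Z, beta, gamma, j, l1, l2):
    """Estimator (5): mean of the unit-level incremental effects over the sample."""
    effects = [zi_incremental_effect(x, z, beta, gamma, j, l1, l2)
               for x, z in zip(X, Z)]
    return sum(effects) / len(effects)

# Hypothetical fitted coefficients; columns are intercept, binary group, scaled age.
beta_hat, gamma_hat = [0.4, 0.5, -0.1], [-0.3, -0.8, 0.6]
X = [[1.0, 0.0, -1.0], [1.0, 1.0, 0.2], [1.0, 1.0, 1.3], [1.0, 0.0, 0.4]]
Z = X
aie = average_incremental_effect(X, Z, beta_hat, gamma_hat, 1, 0.0, 1.0)
```

Each unit's observed level of the binary covariate is overwritten by \(l_1\) and \(l_2\) in turn, so only the remaining covariates \(x_{i(-j)}\) and \(z_{i(-j)}\) enter the average.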

3.2 Marginalized zero-inflated Poisson and negative binomial models

The formulas of the (average) marginal and (average) incremental effects in the ZIP and ZINB models are complex, as shown in Sect. 3.1. In particular, neither the ZIP nor the ZINB model provides direct marginal inference on the overall mean of the response, because these models do not connect a linear predictor of covariates directly to the overall marginal mean. Long et al. (2014) and Preisser et al. (2016) proposed marginalized versions of the ZIP and ZINB models, named the marginalized zero-inflated Poisson (MZIP) model and the marginalized zero-inflated negative binomial (MZINB) model, respectively. The MZIP model (Long et al. 2014) still assumes that the zero-inflated count outcome \(y_i=0\) when \(c_i =1\) and \(y_i=y_{i}^*\) when \(c_i =0\), in which the binary variable \(c_{i}\sim \text{ Bernoulli }(\psi _{i})\) and \(y_{i}^*\sim \text{ Poisson }(\mu _i)\) with a pmf \(g(y_{i}^*| \mu _i) =\displaystyle e^{-\mu _i}\mu _i^{y_{i}^*}/y_{i}^*!\). However, instead of specifying a linear model for the log of \(\mu _i\) as in (2), the MZIP model assumes that the overall mean of the outcome is directly associated with a linear predictor of covariates:

$$\begin{aligned} \ln (v_i)= {x'_i}{\beta }\qquad \text{ and }\qquad {\mathrm {logit}}(\psi _i)= {z'_i}{\gamma }, \end{aligned}$$
(6)

in which \(v_i=\text{ E }(y_i)\). With \(\theta = (\beta ',\gamma ')'\), the log-likelihood function of the MZIP model is

$$\begin{aligned} \ell (\theta |y,x,z) &=-\sum \limits _{i = 1}^n \ln (1 + e^{{z'_i}\gamma }) + \sum \limits _{i=1,\ldots ,n; \,y_i = 0} \ln \Big \{e^{{z'_i}\gamma } + e^{-e^{{x'_i}\beta }(1 + e^{{z'_i}\gamma })}\Big \}\\&\quad +\, \sum \limits _{i=1,\ldots ,n; \,y_i>0} \left\{ y_i {x'_i}\beta + y_i \ln (1 + e^{{z'_i}\gamma }) -e^{{x'_i}\beta }(1 + e^{{z'_i}\gamma }) - \ln y_i! \right\} . \end{aligned}$$
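The MZIP log-likelihood above follows from the ZIP pmf once the Poisson-component mean is rewritten as \(\mu _i = v_i/(1-\psi _i) = e^{x_i'\beta }(1+e^{z_i'\gamma })\). The sketch below makes that substitution explicit; the function names and toy data are assumptions introduced for this illustration.

```python
import math

def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))

def zip_logpmf(y, mu, psi):
    """Log of the marginal ZIP pmf with Poisson mean mu and zero-inflation psi."""
    log_pois = -mu + y * math.log(mu) - math.lgamma(y + 1)
    if y == 0:
        return math.log(psi + (1.0 - psi) * math.exp(log_pois))
    return math.log(1.0 - psi) + log_pois

def mzip_loglik(beta, gamma, y, X, Z):
    """MZIP log-likelihood: x'beta models ln E(y_i), not the Poisson-component mean."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        xb, ezg = dot(beta, xi), math.exp(dot(gamma, zi))
        mu = math.exp(xb) * (1.0 + ezg)  # implied Poisson mean v_i / (1 - psi_i)
        total -= math.log1p(ezg)
        if yi == 0:
            total += math.log(ezg + math.exp(-mu))
        else:
            total += yi * xb + yi * math.log1p(ezg) - mu - math.lgamma(yi + 1)
    return total

# Toy data (hypothetical values).
y = [0, 1, 0, 4, 2]
X = [[1.0, 0.3], [1.0, 1.1], [1.0, -0.4], [1.0, 2.0], [1.0, 0.0]]
Z = X
beta, gamma = [0.1, 0.4], [-0.5, -0.7]
ll = mzip_loglik(beta, gamma, y, X, Z)
```

The same data distribution is being fitted as in the ZIP model; only the parameterization of the mean differs, which is why the ZIP pmf evaluated at the implied \(\mu _i\) reproduces the same likelihood value.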

The MZINB model (Preisser et al. 2016) assumes that the zero-inflated count outcome \(y_i=0\) when \(c_i =1\) and \(y_i=y_{i}^*\) when \(c_i =0\), in which \(c_{i}\sim \text{ Bernoulli }(\psi _{i})\) and \(y_{i}^*\sim \text{ NegBin }(\mu _i,\alpha )\) with a pmf described in (4). To get a direct marginal interpretation on the overall mean of the response, as in the MZIP model, the same two regression equations are constructed in the MZINB model:

$$\begin{aligned} \ln (v_i)= {x'_i}{\beta }\qquad \text{ and }\qquad {\mathrm {logit}}(\psi _i)= {z'_i}{\gamma }, \end{aligned}$$

in which \(v_i=\text{ E }(y_i)\). With \(\theta = (\beta ',\gamma ', \alpha )'\), the log-likelihood function of the MZINB model is

$$\begin{aligned} \ell (\theta |y, x, z) &=-\sum \limits _{i = 1}^n \ln (1 + e^{{z'_i}\gamma }) + \sum \limits _{i=1,\ldots ,n; \,y_i = 0} \ln \left[ e^{{z'_i}\gamma } + \Big \{\frac{\alpha }{\alpha + e^{{x'_i}\beta }(1 + e^{{z'_i}\gamma })}\Big \}^{{ \alpha }} \right] \\&\quad + \, \sum \limits _{i=1,\ldots ,n; \,y_i>0} \left[ \sum \limits _{j = 0}^{y_i-1}\ln (\alpha + j) - (\alpha + y_i) \ln \{\alpha + e^{{x'_i}\beta }(1 + e^{{z'_i}\gamma })\} \right] \\&\quad + \, \sum \limits _{i=1,\ldots ,n;y_i>0} \left\{ \alpha \ln \alpha + y_i {x'_i}\beta + y_i \ln (1 + e^{{z'_i}\gamma }) - \ln y_i!\right\} . \end{aligned}$$

The specification of the MZIP and the MZINB models leads to

$$\begin{aligned} \text{ E }(y_i|x_i,z_i) = e^{{x'_i}\beta }. \end{aligned}$$
(7)

This concise representation of the overall mean response yields simplified formulas for the marginal and incremental effects in the MZIP and the MZINB models. It can be derived that, for these models, the marginal and incremental effects of covariate \(x_{ij}\), or \(z_{ij}\), with respect to the overall mean of response \(y_i\) are

$$\begin{aligned} \eta _j(x_i,z_i,\theta ) = ~&\frac{\partial \text{ E }(y_i|x_i,z_i) }{\partial x_{ij}} = \beta _j e^{{x'_i}\beta } \end{aligned}$$
(8)

and

$$\begin{aligned} \pi _j(x_{i(-j)},z_{i(-j)},\theta ) = ~&\text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}=l_2, \theta ) - \text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}= l_1,\theta )\nonumber \\ = ~&e^{x'_{i(-j)}\beta _{(-j)} + l_2\beta _j}-e^{x'_{i(-j)}\beta _{(-j)} + l_1\beta _j}, \end{aligned}$$
(9)

respectively. The estimates of the marginal and incremental effects are

$$\begin{aligned} {\hat{\eta }}_j(x_i,z_i,{\hat{\theta }}) = {\hat{\beta }}_j e^{{x'_i}\hat{\beta }}\qquad \text{ and }\qquad {\hat{\pi }}_j(x_{i(-j)},z_{i(-j)},{\hat{\theta }}) = e^{x'_{i(-j)}{\hat{\beta }}_{(-j)} + l_2{\hat{\beta }}_j}-e^{x'_{i(-j)}{\hat{\beta }}_{(-j)} + l_1{\hat{\beta }}_j}. \end{aligned}$$
(10)

The average marginal effect and average incremental effect of covariate \(x_{ij}\) with respect to the overall mean of response \(y_i\) in the MZIP and the MZINB models are

$$\begin{aligned} \bar{\eta }_j(\theta )&= \text{ E }_{x,z}(\eta _{j} (x,z, \theta ) ) = \int \beta _j e^{x'\beta } d F_{x,z}(x,z) \end{aligned}$$
(11)

and

$$\begin{aligned} {\bar{\pi }}_{j} (\theta )&= \text{ E }_{x_{(-j)},z_{(-j)}}(\pi _{j} (x_{(-j)},z_{(-j)}, \theta ) ) \nonumber \\&= \int \left( e^{x'_{(-j)}\beta _{(-j)} + l_2\beta _j}-e^{x'_{(-j)}\beta _{(-j)} + l_1\beta _j}\right) \, d F_{x_{(-j)},z_{(-j)}} (x_{(-j)}, z_{(-j)}), \end{aligned}$$
(12)

respectively. Estimators of these average marginal and average incremental effects are again given by averaging the estimated marginal and incremental effects evaluated at the observed data:

$$\begin{aligned} \hat{\bar{\eta }}_j({\hat{\theta }}) = \frac{1}{n}\sum \limits _{i = 1}^n\hat{{\eta }}_j(x_i,z_i,{\hat{\theta }})\qquad \text{ and }\qquad \hat{\bar{\pi }}_j({\hat{\theta }}) = \frac{1}{n}\sum \limits _{i = 1}^n\hat{{\pi }}_j(x_{i(-j)},z_{i(-j)},{\hat{\theta }}). \end{aligned}$$
(13)
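Because the overall mean in the MZIP and MZINB models is \(e^{x_i'\beta }\), the effects in (8) to (13) involve only \(\beta\). The following sketch computes the plug-in average effects for made-up coefficient values; `beta_hat`, the design rows, and the function names are hypothetical.

```python
import math

def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))

def mz_marginal_effect(x, beta, j):
    """Marginal effect (8): beta_j * exp(x'beta)."""
    return beta[j] * math.exp(dot(beta, x))

def mz_incremental_effect(x, beta, j, l1, l2):
    """Incremental effect (9): difference of exp(x'beta) at x_j = l2 and x_j = l1."""
    x2, x1 = list(x), list(x)
    x2[j], x1[j] = l2, l1
    return math.exp(dot(beta, x2)) - math.exp(dot(beta, x1))

def average(values):
    return sum(values) / len(values)

# Hypothetical fitted coefficients; columns are intercept, binary group, scaled age.
beta_hat = [0.1, 0.4, -0.2]
X = [[1.0, 0.0, -1.0], [1.0, 1.0, 0.2], [1.0, 1.0, 1.3], [1.0, 0.0, 0.4]]
ame_age = average([mz_marginal_effect(x, beta_hat, 2) for x in X])
aie_group = average([mz_incremental_effect(x, beta_hat, 1, 0.0, 1.0) for x in X])
```

For a binary covariate the incremental effect equals \((e^{\beta _j}-1)\) times the overall mean at level 0, so \(e^{\beta _j}\) can be read as a ratio of overall means in these models.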

4 Estimation of marginal effects: hurdle models and marginalized hurdle models

4.1 Hurdle Poisson and negative binomial models

Hurdle models (Mullahy 1986) characterize the statistical processes that generate observations below the hurdle and above the hurdle. Hurdle models are two-component models, in which one component is a dichotomous model for a latent binary variable indicating outcomes below or above the hurdle and the other component is, when the hurdle at zero is crossed, a truncated model for outcomes above the hurdle. In the hurdle models, a Bernoulli binary variable \(c_i\) with a mean of \(\psi _i\) is combined with a zero-truncated count variable \(y_i^*\) with a zero-truncated pmf

$$\begin{aligned} {\tilde{g}}(y_i^*)=\displaystyle \frac{g(y_i^*)}{1-g(0)}, \quad y_i^*=1,2,3,\ldots , \end{aligned}$$
(14)

yielding the outcome \(y_i\) through the mechanism

$$\begin{aligned} y_{i}=\left\{ \begin{array}{ll}\,\,0 & \quad \text{ if } \,\,c_i =1;\\ \,\, y_{i}^*&\quad \text{ if }\,\, c_i =0, \end{array}\right. \end{aligned}$$

in which \(g(y_{i}^*)\) is a pmf before zero truncation, with support over the nonnegative integers including zero. The marginal pmf of \(y_i\) in the hurdle model is

$$\begin{aligned} f(y_i) = \left\{ \begin{array}{ll} \psi _i, & \quad \text {for}\,\, y_i = 0,\\ \displaystyle \frac{1-\psi _i}{1-g(0)}g(y_i), & \quad \text {for}\,\, y_i = 1,2,3,\ldots . \end{array}\right. \end{aligned}$$

The mean and variance of \(y_i\) in the hurdle models are

$$\begin{aligned} \text{ E }(y_i)&= \frac{1-\psi _i}{1-g(0)}\mu _i,\\ \text{ var }(y_i)&= \frac{1-\psi _i}{1-g(0) }\sigma _i^2 + \frac{(1-\psi _i)(\psi _i-g(0))}{(1-g(0))^2} \mu _i^2, \end{aligned}$$

in which \(\mu _i\) and \(\sigma _i^2\) are the mean and variance, respectively, of the pmf \(g(y_{i}^*)\). In the hurdle models, the zero observations are below the hurdle and the positive counts are assumed to be produced from the zero-truncated count model when above the hurdle. Because the zero and positive count data are completely separated by the two parts of the models, the hurdle models can be used to fit both zero-inflated count data and zero-deflated count data. The zero inflation or deflation is determined by the magnitude of \(1-\psi _i\) and \(1-g(0)\) or, equivalently, the magnitude of \(\psi _i\) and g(0). The overdispersion in the hurdle models is measured by \(\displaystyle \frac{\text{ var }(y_i)}{\text{ E }(y_i)} = \displaystyle \frac{\sigma _i^2}{\mu _i} + \frac{\psi _i-g(0)}{1-g(0)} \mu _i\).
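The mean, variance, and dispersion formulas above can be checked by summing the hurdle pmf directly. The sketch below does this for a Poisson base distribution, for which \(\sigma _i^2=\mu _i\) and \(g(0)=e^{-\mu _i}\); the parameter values and function names are illustrative assumptions.

```python
import math

def poisson_pmf(y, mu):
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def hurdle_poisson_pmf(y, psi, mu):
    """Hurdle pmf: psi at zero, rescaled zero-truncated Poisson(mu) above zero."""
    g0 = poisson_pmf(0, mu)
    return psi if y == 0 else (1.0 - psi) / (1.0 - g0) * poisson_pmf(y, mu)

def hurdle_moments(psi, mu, sigma2, g0):
    """Closed-form mean and variance of a hurdle model with base mean mu, variance sigma2."""
    mean = (1.0 - psi) / (1.0 - g0) * mu
    var = ((1.0 - psi) / (1.0 - g0) * sigma2
           + (1.0 - psi) * (psi - g0) / (1.0 - g0) ** 2 * mu ** 2)
    return mean, var

def numeric_moments(psi, mu, upper=100):
    """Mean and variance by direct summation of the pmf over 0..upper-1."""
    m1 = sum(k * hurdle_poisson_pmf(k, psi, mu) for k in range(upper))
    m2 = sum(k * k * hurdle_poisson_pmf(k, psi, mu) for k in range(upper))
    return m1, m2 - m1 ** 2

mu = 2.0
g0 = math.exp(-mu)
inflated = hurdle_moments(0.5, mu, mu, g0)    # psi > g(0): zero inflation
deflated = hurdle_moments(0.05, mu, mu, g0)   # psi < g(0): zero deflation
```

With \(\psi _i < g(0)\) the second term of the dispersion index is negative, so a hurdle Poisson model can represent underdispersed counts, in line with the remark that hurdle models handle both zero inflation and zero deflation.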

Conventional hurdle models include the hurdle Poisson (HP) model and the hurdle negative binomial (HNB) model. The HP model is constructed by specifying \(g(y_{i}^*)\), the pmf before zero truncation in (14), to be the pmf of \(\text{ Poisson }(\mu _i)\). As such, the marginal pmf of \(y_i\) in the HP model is

$$\begin{aligned} f(y_i) = \left\{ \begin{array}{ll} \psi _i, & \quad \text {for} \,\,y_i = 0,\\ \displaystyle \frac{1-\psi _i}{1-e^{-\mu _i}} \cdot \frac{e^{-\mu _i}\mu _i^{y_i} }{y_i!}, & \quad \text {for}\,\, y_i = 1,2,3,\ldots . \end{array}\right. \end{aligned}$$

The HNB model is constructed by specifying \(g(y_{i}^*)\) to be the pmf of \(\text{ NegBin }(\mu _i,\alpha )\). The marginal pmf of \(y_i\) in the HNB model is

$$\begin{aligned} f(y_i) = \left\{ \begin{array}{ll} \psi _i, & \quad \text {for}\,\, y_i = 0,\\ \displaystyle \frac{1-\psi _i}{ 1-\{\alpha /(\alpha + \mu _i)\}^{\alpha }} \cdot \frac{\Gamma (y_i + \alpha )}{\Gamma (\alpha )\Gamma (y_i + 1)}\cdot \left( \frac{\alpha }{\alpha + \mu _i}\right) ^{\alpha } \left( \frac{\mu _i}{\alpha + \mu _i} \right) ^{y_i}, & \quad \text {for}\,\, y_i = 1,2,3,\ldots . \end{array}\right. \end{aligned}$$

To characterize the dependence of \(y_i\) on the covariates, the HP and HNB models set up two regression models as in (2):

$$\begin{aligned} \ln (\mu _i) = {x'_i}{\beta } \qquad \text{ and }\qquad {\mathrm {logit}}(\psi _{i}) = {z'_i}{\gamma }. \end{aligned}$$

Denote the parameter vector in the HP and HNB models by \(\theta = (\beta ',\gamma ')'\) and \(\theta = (\beta ',\gamma ',\alpha )'\), respectively, then the log-likelihood function of the HP model is

$$\begin{aligned} \ell (\theta |y, x,z)&= \sum \limits _{i=1,\ldots ,n;\, y_i = 0} {{z}_{i}^\prime {\gamma }} -\sum \limits _{i = 1}^n \ln (1 + e^{{{z}_{i}^\prime {\gamma }}}) \\&\quad + \,\sum \limits _{i=1,\ldots ,n;\, y_i>0} \left\{ y_i {{x'_i}{\beta }} -\ln (e^{e^{{x'_i}{\beta }}}-1) - \ln y_i!\right\} , \end{aligned}$$

and the log-likelihood function of the HNB model is

$$\begin{aligned} \ell (\theta |y,x,z)& = \sum \limits _{i=1,\ldots ,n;\,y_i = 0} {{z}_{i}^\prime {\gamma }} -\sum \limits _{i = 1}^n \ln (1 + e^{{{z}_{i}^\prime {\gamma }}}) \\&\quad + \,\sum \limits _{i=1,\ldots ,n;\,y_i>0} \left\{ \sum \limits _{j = 0}^{y_i-1} \ln \left( {\alpha } + j\right) -\ln y_i! + \alpha \ln \alpha + y_i {{x'_i}{\beta }}\right\} \\&\quad -\,\sum \limits _{i=1,\ldots ,n;\,y_i>0} \left[ \ln \left\{ 1-\left( \frac{\alpha }{\alpha + e^{{x'_i}{\beta }}}\right) ^\alpha \right\} + (\alpha + y_i)\ln (\alpha + e^{{x'_i}{\beta }}) \right] . \end{aligned}$$
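Both hurdle log-likelihoods above can be coded in a few lines. The sketch below is an illustration under assumptions (hypothetical function names and toy data, design rows starting with the intercept term 1) and mirrors the two displays term by term.

```python
import math

def dot(coef, row):
    return sum(c * v for c, v in zip(coef, row))

def hp_loglik(beta, gamma, y, X, Z):
    """Hurdle Poisson log-likelihood."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        xb, zg = dot(beta, xi), dot(gamma, zi)
        total -= math.log1p(math.exp(zg))
        if yi == 0:
            total += zg
        else:
            # ln(e^{mu} - 1) with mu = e^{x'beta}, computed via expm1
            total += yi * xb - math.log(math.expm1(math.exp(xb))) - math.lgamma(yi + 1)
    return total

def hnb_loglik(beta, gamma, alpha, y, X, Z):
    """Hurdle negative binomial log-likelihood."""
    total = 0.0
    for yi, xi, zi in zip(y, X, Z):
        xb, zg = dot(beta, xi), dot(gamma, zi)
        mu = math.exp(xb)
        total -= math.log1p(math.exp(zg))
        if yi == 0:
            total += zg
        else:
            total += (sum(math.log(alpha + j) for j in range(yi)) - math.lgamma(yi + 1)
                      + alpha * math.log(alpha) + yi * xb
                      - math.log(1.0 - (alpha / (alpha + mu)) ** alpha)
                      - (alpha + yi) * math.log(alpha + mu))
    return total

# Toy data (hypothetical values).
y = [0, 2, 0, 6, 1]
X = [[1.0, 0.3], [1.0, 1.1], [1.0, -0.4], [1.0, 2.0], [1.0, 0.0]]
Z = X
beta, gamma, alpha = [0.2, 0.6], [-0.5, -0.7], 1.5
hp_ll = hp_loglik(beta, gamma, y, X, Z)
hnb_ll = hnb_loglik(beta, gamma, alpha, y, X, Z)
```

The binary part and the truncated count part share no parameters, so each log-likelihood separates into a logistic term in \(\gamma\) and a truncated-count term in \(\beta\) (and \(\alpha\)).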

The marginal and incremental effects of the HP and HNB models are considerably complex. The marginal expectation of the response \(y_i\) in the HP model is

$$\begin{aligned} \text{ E }(y_i|x_i,z_i)= \frac{e^{x_i'\beta }}{(1 + e^{z_i'\gamma })(1-e^{-e^{x_i'\beta }})}. \end{aligned}$$

It can be derived that the marginal effect \(\eta _j(x_i,z_i,\theta )\) of covariate \(x_{ij}\), or \(z_{ij}\), with respect to the overall mean of response \(y_i\) in the HP model is

$$\begin{aligned} \eta _j(x_i,z_i,\theta )& = \frac{e^{x_i'\beta +e^{x_i'\beta }}}{(1 + e^{{z'_i}\gamma })^2(e^{e^{x_i'\beta }}-1)^2}\\&\quad \cdot \left[ \left( e^{e^{x_i'\beta }}-1\right) \left\{ \beta _j+(\beta _j-\gamma _j)e^{z_i'\gamma }\right\} -\beta _j e^{x_i'\beta }(1+e^{z_i'\gamma }) \right] . \end{aligned}$$
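The closed-form marginal effect in the HP model can be checked numerically against a finite-difference derivative of the marginal expectation. The sketch below assumes that covariate \(j\) occupies the same position in \(x_i\) and \(z_i\); the function names and the parameter values are hypothetical.

```python
import math

def _dot(coefs, row):
    return sum(c * v for c, v in zip(coefs, row))

def hp_mean(x, z, beta, gamma):
    """E(y | x, z) in the hurdle Poisson model."""
    mu = math.exp(_dot(beta, x))
    return mu / ((1.0 + math.exp(_dot(gamma, z))) * (-math.expm1(-mu)))

def hp_marginal_effect(x, z, beta, gamma, j):
    """Analytic eta_j; covariate j sits at index j of both x and z."""
    mu = math.exp(_dot(beta, x))
    ezg = math.exp(_dot(gamma, z))
    em1 = math.expm1(mu)                        # e^{mu} - 1
    front = mu * math.exp(mu) / ((1.0 + ezg) ** 2 * em1 ** 2)
    inner = (em1 * (beta[j] + (beta[j] - gamma[j]) * ezg)
             - beta[j] * mu * (1.0 + ezg))
    return front * inner

def numeric_effect(x, z, beta, gamma, j, h=1e-5):
    """Central finite difference of hp_mean in the shared covariate j."""
    up_x, up_z, dn_x, dn_z = list(x), list(z), list(x), list(z)
    up_x[j] += h
    up_z[j] += h
    dn_x[j] -= h
    dn_z[j] -= h
    diff = hp_mean(up_x, up_z, beta, gamma) - hp_mean(dn_x, dn_z, beta, gamma)
    return diff / (2.0 * h)

# Hypothetical covariate row (intercept, continuous, binary) and parameters.
x = z = [1.0, 0.4, 1.0]
beta, gamma = [1.0, 0.3, -0.2], [0.5, 1.0, -1.0]
analytic = hp_marginal_effect(x, z, beta, gamma, 1)
numeric = numeric_effect(x, z, beta, gamma, 1)
print(analytic, numeric)
```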

When \(x_{ij}\) is a categorical covariate, the incremental effect \(\pi _j(x_{i(-j)},z_{i(-j)},\theta )\) from level \(l_1\) to level \(l_2\) in \(x_{ij}\) with respect to \(y_i\) is

$$\begin{aligned}&\pi _j(x_{i(-j)},z_{i(-j)},\theta ) \\&\quad = ~\text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}=l_2, \theta ) - \text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}= l_1,\theta )\\&\quad =~ \frac{e^{x_{i(-j)}'\beta _{(-j)} + l_2\beta _j +e^{x_{i(-j)}'\beta _{(-j)} + l_2\beta _j }} }{\{1 + e^{z_{i(-j)}'\gamma _{(-j)} + l_2\gamma _j }\}\{ e^{e^{x_{i(-j)}'\beta _{(-j)} + l_2\beta _j }} -1\}}\\&\qquad -\, \frac{e^{x_{i(-j)}'\beta _{(-j)} + l_1\beta _j +e^{x_{i(-j)}'\beta _{(-j)} + l_1\beta _j }} }{\{1 + e^{z_{i(-j)}'\gamma _{(-j)} + l_1\gamma _j }\}\{ e^{e^{x_{i(-j)}'\beta _{(-j)} + l_1\beta _j }} -1\} } . \end{aligned}$$

The average marginal and average incremental effects of covariate \(x_{ij}\) with respect to the overall mean of response \(y_i\) in the HP model are consequently

$$\begin{aligned} \bar{\eta }_j(\theta )&= \int \frac{e^{x'\beta +e^{x'\beta }}}{(1 + e^{{z'}\gamma })^2(e^{e^{x'\beta }}-1)^2} \cdot \left[ \left( e^{e^{x'\beta }}-1\right) \left\{ \beta _j+(\beta _j-\gamma _j)e^{z'\gamma }\right\} \right. \\&\left. \quad -\,\beta _j e^{x'\beta }(1+e^{z'\gamma }) \right] \, d F_{x,z}(x,z) \end{aligned}$$

and

$$\begin{aligned} {\bar{\pi }}_{j} (\theta )&=\int \left[ \frac{e^{x_{(-j)}'\beta _{(-j)} + l_2\beta _j +e^{x_{(-j)}'\beta _{(-j)} + l_2\beta _j }} }{\{1 + e^{z_{(-j)}'\gamma _{(-j)} + l_2\gamma _j }\}\{ e^{e^{x_{(-j)}'\beta _{(-j)} + l_2\beta _j }} -1\} } \right. \\&\left. \quad -\, \frac{e^{x_{(-j)}'\beta _{(-j)} + l_1\beta _j +e^{x_{(-j)}'\beta _{(-j)} + l_1\beta _j }} }{\{1 + e^{z_{(-j)}'\gamma _{(-j)} + l_1\gamma _j }\}\{ e^{e^{x_{(-j)}'\beta _{(-j)} + l_1\beta _j }} -1\} } \right] \\&\quad \cdot d F_{x_{(-j)},z_{(-j)}}(x_{(-j)},z_{(-j)}). \end{aligned}$$

The marginal expectation of the response \(y_i\) in the HNB model is

$$\begin{aligned} \text{ E }(y_i|x_i,z_i)=\displaystyle \frac{e^{x_i'\beta }}{(1 + e^{z_i'\gamma })\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_i'\beta }}\right) ^\alpha \right\} }. \end{aligned}$$

Then, the (average) marginal and (average) incremental effects in the HNB model are

$$\begin{aligned} \eta _j(x_i,z_i,\theta )&= \frac{\partial \text{ E }(y_i|x_i,z_i) }{\partial x_{ij}} = \displaystyle \frac{e^{x_i'\beta } }{(1 + e^{z_i'\gamma })^2\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_i'\beta }}\right) ^\alpha \right\} ^2}\\&\quad \cdot \left[ \left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_i'\beta }}\right) ^\alpha \right\} \{\beta _j + (\beta _j-\gamma _j) e^{z_i'\gamma } \} \right. \\&\left. \quad - \, \beta _j e^{x_i'\beta } \left( \displaystyle \frac{\alpha }{\alpha + e^{x_i'\beta }}\right) ^{\alpha +1} (1 + e^{z_i'\gamma }) \right] ,\\ \pi _j(x_{i(-j)},z_{i(-j)},\theta ) &=\text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}=l_2, \theta ) - \text{ E }(y_i|x_{i(-j)}, z_{i(-j)}, x_{ij}= l_1,\theta ) \\&= \frac{e^{x_{i(-j)}'\beta _{(-j)} + l_2\beta _j } }{(1 + e^{z_{i(-j)}'\gamma _{(-j)} + l_2\gamma _j })\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_{i(-j)}'\beta _{(-j)} + l_2\beta _j }}\right) ^\alpha \right\} }\\&\quad -\,\frac{e^{x_{i(-j)}'\beta _{(-j)} + l_1\beta _j } }{(1 + e^{z_{i(-j)}'\gamma _{(-j)} + l_1\gamma _j })\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_{i(-j)}'\beta _{(-j)} + l_1\beta _j }}\right) ^\alpha \right\} },\\ \bar{\eta }_j(\theta ) &=\int \frac{e^{x'\beta } }{(1 + e^{z'\gamma })^2\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x'\beta }}\right) ^\alpha \right\} ^2}\\&\quad \cdot \left[ \left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x'\beta }}\right) ^\alpha \right\} \{\beta _j + (\beta _j-\gamma _j) e^{z'\gamma } \} \right. \\&\left. \quad -\, \beta _j e^{x'\beta } \left( \displaystyle \frac{\alpha }{\alpha + e^{x'\beta }}\right) ^{\alpha +1} (1 + e^{z'\gamma }) \right] \, d F_{x,z}(x,z), \end{aligned}$$

and

$$\begin{aligned} {\bar{\pi }}_{j} (\theta )&=\int \left[ \frac{e^{x_{(-j)}'\beta _{(-j)} + l_2\beta _j } }{(1 + e^{z_{(-j)}'\gamma _{(-j)} + l_2\gamma _j })\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_{(-j)}'\beta _{(-j)} + l_2\beta _j }}\right) ^\alpha \right\} } \right. \\&\left. \quad -\,\frac{e^{x_{(-j)}'\beta _{(-j)} + l_1\beta _j } }{(1 + e^{z_{(-j)}'\gamma _{(-j)} + l_1\gamma _j })\left\{ 1-\left( \displaystyle \frac{\alpha }{\alpha + e^{x_{(-j)}'\beta _{(-j)} + l_1\beta _j }}\right) ^\alpha \right\} } \right] \, d F_{x_{(-j)},z_{(-j)}}(x_{(-j)},z_{(-j)}), \end{aligned}$$

respectively. The estimates of the marginal and incremental effects in the two models can be obtained by substituting the unknown parameters in the effects with their maximum likelihood estimates. Estimators of the average marginal and average incremental effects are obtained by averaging the estimated marginal and incremental effects that are evaluated at the observed data.
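The averaging step for the HNB model can be sketched as follows: the marginal effect is evaluated at each observed covariate row and averaged, and the incremental effect is obtained by switching a binary covariate between its two levels in every row. The sketch assumes the covariate of interest enters both linear predictors at the same index; all names and values are hypothetical, and in practice the parameters would be replaced by their maximum likelihood estimates.

```python
import math

def _dot(coefs, row):
    return sum(c * v for c, v in zip(coefs, row))

def hnb_mean(x, z, beta, gamma, alpha):
    """E(y | x, z) in the hurdle negative binomial model."""
    mu = math.exp(_dot(beta, x))
    trunc = 1.0 - (alpha / (alpha + mu)) ** alpha
    return mu / ((1.0 + math.exp(_dot(gamma, z))) * trunc)

def hnb_marginal_effect(x, z, beta, gamma, alpha, j):
    """Analytic marginal effect of the shared covariate at index j."""
    mu = math.exp(_dot(beta, x))
    ezg = math.exp(_dot(gamma, z))
    ratio = alpha / (alpha + mu)
    trunc = 1.0 - ratio ** alpha
    inner = (trunc * (beta[j] + (beta[j] - gamma[j]) * ezg)
             - beta[j] * mu * ratio ** (alpha + 1.0) * (1.0 + ezg))
    return mu * inner / ((1.0 + ezg) ** 2 * trunc ** 2)

def average_effects(X, Z, beta, gamma, alpha, j_cont, j_bin):
    """Average marginal effect of j_cont and average incremental effect of j_bin."""
    n = len(X)
    ame = sum(hnb_marginal_effect(x, z, beta, gamma, alpha, j_cont)
              for x, z in zip(X, Z)) / n
    aie = 0.0
    for x, z in zip(X, Z):
        x1, z1, x0, z0 = list(x), list(z), list(x), list(z)
        x1[j_bin] = z1[j_bin] = 1.0             # level l2 = 1
        x0[j_bin] = z0[j_bin] = 0.0             # level l1 = 0
        aie += hnb_mean(x1, z1, beta, gamma, alpha) - hnb_mean(x0, z0, beta, gamma, alpha)
    return ame, aie / n

# Hypothetical rows: intercept, continuous covariate, binary covariate.
X = Z = [[1.0, 0.4, 1.0], [1.0, -0.7, 0.0], [1.0, 1.2, 1.0]]
beta, gamma, alpha = [1.0, 0.3, -0.2], [0.5, 1.0, -1.0], 1.5
ame, aie = average_effects(X, Z, beta, gamma, alpha, 1, 2)
print(ame, aie)
```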

4.2 Marginalized hurdle Poisson and negative binomial models

It is straightforward to construct marginalized hurdle Poisson and negative binomial models for cross-sectional count data with excess zeroes. However, such models have not been formally reported in the literature, although Tabb et al. (2016) proposed marginalized random-effects hurdle Poisson and negative binomial models for panel count data. The marginalized hurdle models assume, as in the hurdle models in Sect. 4.1, that the zero-inflated count outcome \(y_i=0\) when \(c_i =1\) and \(y_i=y_{i}^*\) when \(c_i =0\), in which the binary variable \(c_{i}\sim \text{ Bernoulli }(\psi _{i})\) and \(y_{i}^*\) follows a zero-truncated distribution with a pmf \({\tilde{g}}(y_i^*)=\displaystyle \frac{g(y_i^*)}{1-g(0)}\), \(y_i^*=1,2,3,\ldots\). To achieve the goal of making direct inference on the overall mean of the outcome \(y_i\), the marginalized hurdle models specify as in (6) that

$$\begin{aligned} \ln (v_i)= {x'_i}{\beta }\qquad \text{ and }\qquad {\mathrm {logit}}(\psi _i)= {z'_i}{\gamma }, \end{aligned}$$

in which \(v_i=\text{ E }(y_i)\). The marginalized hurdle Poisson (MHP) model can be constructed by assigning \(g(y_{i}^*)\), the pmf before zero truncation, to be the pmf of \(\text{ Poisson }(\mu _i)\), and the marginalized hurdle negative binomial (MHNB) model is constructed by assigning \(g(y_{i}^*)\), the pmf before zero truncation, to be the pmf of \(\text{ NegBin }(\mu _i,\alpha )\).

It can be derived that, for the MHP model, the log-likelihood function is

$$\begin{aligned} \ell (\theta ,\mu |y,x,z)&= \sum \limits _{i=1,\ldots ,n; \,y_i = 0}\big \{{z'_i}\gamma -\ln (1 + e^{{z'_i}\gamma })\big \}\nonumber \\&\quad +\, \sum \limits _{i=1,\ldots ,n; \,y_i>0}\left\{ {x'_i}\beta -\ln y_i! + (y_i-1)\ln \mu _i -\mu _i\right\} , \end{aligned}$$
(15)
$$\begin{aligned} \text{ subject } \text{ to }\qquad e^{{x'_i}\beta }&=\displaystyle \frac{\mu _i}{(1 + e^{{z'_i}\gamma })(1-e^{-\mu _i})},\qquad i=1,2,\ldots ,n, \end{aligned}$$
(16)

in which \(\theta = (\beta ',\gamma ')'\) and \(\mu =(\mu _1,\mu _2,\ldots ,\mu _n)\). For the MHNB model, the log-likelihood function is

$$\begin{aligned} \ell (\theta ,\mu |y,x,z)&=\sum \limits _{i=1,\ldots ,n; \,y_i = 0}\big \{{z'_i}\gamma - \ln (1 + e^{{z'_i}\gamma })\big \} \nonumber \\&\quad+ \sum \limits _{i=1,\ldots ,n; \,y_i>0}\left\{ \sum \limits _{j = 0} ^{y_i-1}\ln (j + \alpha ) -\ln y_i! + {x'_i}\beta \right\} \nonumber \\&\quad+ \sum \limits _{i=1,\ldots ,n; \,y_i>0}\left\{ (y_i-1)\ln \mu _i + \alpha \ln \alpha - (\alpha + y_i)\ln (\alpha + \mu _i)\right\} , \end{aligned}$$
(17)
$$\begin{aligned} \text{ subject } \text{ to }\qquad e^{{x'_i}\beta } &=\displaystyle \frac{\mu _i}{(1 + e^{{z'_i}\gamma })[1-\{\alpha /(\alpha + \mu _i)\}^\alpha ]},\qquad i=1,2,\ldots ,n, \end{aligned}$$
(18)

in which \(\theta = (\beta ',\gamma ',\alpha )'\) and \(\mu =(\mu _1,\mu _2,\ldots ,\mu _n)\). The maximum likelihood estimates in the MHP and MHNB models are obtained by numerically solving \({\hat{\theta }}=\arg \max _\theta \ell (\theta )\) in (15) and (17) subject to the constraints (16) and (18), respectively.
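One way to handle the constraint (16) numerically is to solve it for \(\mu _i\) at each trial value of \((\beta ,\gamma )\). The sketch below uses bisection, which is an assumption of this illustration rather than the procedure used in the article; the function names are hypothetical. Because the mean of a zero-truncated Poisson distribution always exceeds 1, a solution with \(\mu _i>0\) exists only when \(e^{x_i'\beta }(1+e^{z_i'\gamma })>1\).

```python
import math

def truncated_poisson_mean(mu):
    """Mean of a zero-truncated Poisson(mu): mu / (1 - e^{-mu})."""
    return mu / (-math.expm1(-mu))

def solve_mu(xb, zg, iterations=200):
    """Solve e^{xb} = mu / ((1 + e^{zg}) (1 - e^{-mu})) for mu by bisection."""
    target = math.exp(xb) * (1.0 + math.exp(zg))   # v_i / (1 - psi_i)
    if target <= 1.0:
        raise ValueError("constraint has no solution with mu > 0")
    # The truncated mean lies strictly between mu and mu + 1, so the root
    # lies between target - 1 and target.
    lo, hi = max(target - 1.0, 1e-12), target
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if truncated_poisson_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical linear predictor values for one observation.
mu_i = solve_mu(1.0, 0.5)
print(mu_i, truncated_poisson_mean(mu_i) / (1.0 + math.exp(0.5)), math.exp(1.0))
```

The solved \(\mu _i\) values can then be plugged into (15); an analogous root search applies to (18) with the zero-truncated negative binomial mean.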

The specification of the MHP and MHNB models leads to \(\text{ E }(y_i|x_i,z_i) = e^{{x'_i}\beta }\), which is identical to the expression in (7) for the MZIP and MZINB models. Therefore, the (average) marginal and (average) incremental effects for the MHP and MHNB models, and their estimates, are given by (8)–(13).

5 Variance estimation of marginal effects

Asymptotic variances of the estimated marginal effects and average marginal effects can be derived using the delta method and Taylor series approximations. Note that the parameters \(\theta\) in the models summarized in Sects. 3 and 4 are estimated by maximizing their log-likelihood functions. Under regularity conditions, \({\hat{\theta }} \mathop {\longrightarrow }\limits ^{P} \theta\) as \(n\rightarrow \infty\) and

$$\begin{aligned} {\hat{\theta }} \mathop {\longrightarrow }^{D} N(\theta ,[I_n(\theta )]^{-1}), \end{aligned}$$

where \(I_n(\theta ) = \displaystyle - \text{ E }\left( \frac{\partial ^2\ell (\theta )}{\partial \theta ^2}\right)\) is the Fisher information matrix and \([I_n(\theta )]^{-1} = (1/n)[I_1(\theta )]^{-1} \mathop {\longrightarrow }\limits ^{P} 0\) as \(n\rightarrow \infty\). The observed Fisher information matrix is \(I_n({\hat{\theta }}) = \displaystyle - \frac{\partial ^2\ell (\theta )}{\partial \theta ^2}\Bigg |_{\theta = {\hat{\theta }}}\). By the delta method, variances of the estimated marginal and incremental effects \({\hat{\eta }}_j (x_i,z_i,{\hat{\theta }})\) and \({\hat{\pi }}_j (x_{i(-j)},z_{i(-j)},{\hat{\theta }})\), as continuously differentiable functions of the parameters, can be estimated by

$$\begin{aligned} {\widehat{\text{ var }}}_{\hat{\eta }_j} (x_i,z_i, \hat{\theta }) = \left( \nabla _\theta {\hat{\eta }}_j (x_i,z_i,{\hat{\theta }})\right) ^\prime \big [I_n({\hat{\theta }})\big ]^{-1} \left( \nabla _\theta {\hat{\eta }_j} (x_i,z_i,{\hat{\theta }})\right) \end{aligned}$$
(19)

and

$$\begin{aligned} {\widehat{\text{ var }}}_{\hat{\pi }_j} (x_{i(-j)},z_{i(-j)}, {\hat{\theta }}) = \left( \nabla _\theta {\hat{\pi }}_j (x_{i(-j)},z_{i(-j)},{\hat{\theta }})\right) ^\prime \big [I_n({\hat{\theta }})\big ]^{-1} \left( \nabla _\theta {\hat{\pi }}_j (x_{i(-j)},z_{i(-j)},{\hat{\theta }})\right) , \end{aligned}$$
(20)

respectively.
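The delta-method variance in (19) and (20) is a quadratic form in the gradient of the effect and the inverse information matrix. A minimal sketch follows, in which the gradient is approximated by central differences rather than derived analytically; the effect function, the parameter estimates, and the covariance matrix are hypothetical placeholders.

```python
import math

def numeric_gradient(f, theta, h=1e-6):
    """Central-difference gradient of a scalar function f at theta."""
    grad = []
    for k in range(len(theta)):
        up, dn = list(theta), list(theta)
        up[k] += h
        dn[k] -= h
        grad.append((f(up) - f(dn)) / (2.0 * h))
    return grad

def delta_method_variance(grad, cov):
    """Quadratic form grad' cov grad, with cov playing the role of [I_n]^{-1}."""
    p = len(grad)
    return sum(grad[a] * cov[a][b] * grad[b] for a in range(p) for b in range(p))

# Hypothetical example: effect beta_1 * exp(beta_0 + beta_1 * x) at x = 0.4.
x_value = 0.4

def effect(theta):
    return theta[1] * math.exp(theta[0] + theta[1] * x_value)

theta_hat = [1.0, 0.3]                       # placeholder estimates
cov_hat = [[0.04, -0.01], [-0.01, 0.09]]     # placeholder inverse information
var_eta = delta_method_variance(numeric_gradient(effect, theta_hat), cov_hat)
print(var_eta, math.sqrt(var_eta))
```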

To derive the variance estimator of the average marginal effect \(\hat{\bar{\eta }}_j({\hat{\theta }})\), the multivariate Taylor’s theorem is applied to \(\hat{\bar{\eta }}_j({\hat{\theta }})\) with respect to \({\hat{\theta }}\) at the true value \(\theta\):

$$\begin{aligned} \hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) = \hat{{\bar{\eta }}}_{j} (\theta ) + \left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) '({\hat{\theta }}-\theta ) + h_1({\tilde{\theta }}) ({\hat{\theta }}-\theta ), \end{aligned}$$

where \({\tilde{\theta }}\) is some value between \(\theta\) and \({\hat{\theta }}\), and \(\lim \limits _{{\hat{\theta }}\rightarrow \theta } h_1({\tilde{\theta }}) = 0\) in probability. Thus,

$$\begin{aligned} \text{ var }(\hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) )&= \text{ var }( \hat{{\bar{\eta }}}_{j} (\theta )) + \text{ var }\left( \left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) ' ({\hat{\theta }}-\theta ) \right) \nonumber \\&\quad +\, 2\,\text{ cov }\left( \hat{{\bar{\eta }}}_{j} (\theta ) ,\left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) ' ({\hat{\theta }}-\theta )\right) \nonumber \\&\quad +\,\text{ var }\left( h_1({\tilde{\theta }}) ({\hat{\theta }}-\theta )\right) + 2\,\text{ cov }\left( \hat{{\bar{\eta }}}_{j} (\theta ),h_1({\tilde{\theta }}) ({\hat{\theta }}-\theta ) \right) \nonumber \\&\quad +\,2\,\text{ cov }\left( \left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) ' ({\hat{\theta }}-\theta ), h_1({\tilde{\theta }}) ({\hat{\theta }}-\theta )\right) . \end{aligned}$$
(21)

The first term on the right-hand side of (21) is estimated by \({{\widehat{\text{ var }}}}(\hat{{\bar{\eta }}}_{j} (\theta )) =\displaystyle \frac{1}{n(n-1)} \sum \limits _{i = 1}^n \Big ({\hat{\eta }}_{j}(x_i,z_i,{\hat{\theta }}) -\hat{{\bar{\eta }}}_{j} ({\hat{\theta }})\Big )^2\). For the second term, the delta method gives

$$\begin{aligned} \left( \left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) '({\hat{\theta }}-\theta ) \right) \mathop {\longrightarrow }\limits ^{D} N\bigg (0,\left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) ' [I_n(\theta )]^{-1}\left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) \bigg ), \end{aligned}$$
(22)

which implies \(\text{ E }\big (\left( \nabla _\theta \hat{{\bar{\eta }}}_{j} (\theta ) \right) '({\hat{\theta }}-\theta ) \big ) \longrightarrow 0,~ \text{ as } n\rightarrow \infty\). Therefore, the second term on the right-hand side of (21) is estimated by \(\left( \nabla _\theta \hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) \right) '[I_n({\hat{\theta }})]^{-1} \left( \nabla _\theta \hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) \right) ,\) in which \(\displaystyle \nabla _\theta \hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) = \frac{1}{n}\sum \limits _{i = 1}^n \nabla _\theta \hat{\eta }_{j} (x_i,z_i,{\hat{\theta }})\). In addition, the consistency of \({\hat{\theta }}\), the normality in (22), the fact that \(\lim \limits _{{\hat{\theta }}\rightarrow \theta } h_1({\tilde{\theta }}) = 0\) as \(n\rightarrow \infty\), and Slutsky’s theorem together indicate that the remaining four terms in (21) approach 0 in probability as \(n\rightarrow \infty\). Therefore, the estimator of \(\text{ var }(\hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) )\) is

$$\begin{aligned} {\widehat{\text{ var }}}(\hat{{\bar{\eta }}}_{j} ({\hat{\theta }}) )&= \frac{1}{n(n-1)} \sum \limits _{i = 1}^n \left( \hat{\eta }_{j}( x_i,z_i,{\hat{\theta }}) - \hat{{\bar{\eta }}}_{j}({\hat{\theta }})\right) ^2\nonumber \\&\quad + \,\bigg (\frac{1}{n}\sum \limits _{i = 1}^n\nabla _\theta \hat{\eta }_{j} ( x_i,z_i,{\hat{\theta }}) \bigg )^\prime [I_n({\hat{\theta }})]^{-1}\bigg (\frac{1}{n}\sum \limits _{i = 1}^n\nabla _\theta \hat{\eta }_{j} ( x_i,z_i,{\hat{\theta }}) \bigg ). \end{aligned}$$
(23)

It can be derived similarly that the estimator of \(\text{ var }(\hat{{\bar{\pi }}}_{j} ({\hat{\theta }}) )\) is

$$\begin{aligned} {\widehat{\text{ var }}}(\hat{{\bar{\pi }}}_{j} ({\hat{\theta }}) )&= \frac{1}{n(n-1)} \sum \limits _{i = 1}^n \left( \hat{\pi }_{j} (x_{i(-j)},z_{i(-j)},{\hat{\theta }}) - \hat{{\bar{\pi }}}_{j}({\hat{\theta }}) \right) ^2\nonumber \\&\quad +\, \bigg (\frac{1}{n}\sum \limits _{i = 1}^n\nabla _\theta \hat{\pi }_{j} (x_{i(-j)},z_{i(-j)},{\hat{\theta }}) \bigg )^\prime \left[ I_n({\hat{\theta }})\right] ^{-1}\nonumber \\&\quad \times \,\bigg (\frac{1}{n}\sum \limits _{i = 1}^n\nabla _\theta \hat{\pi }_{j} (x_{i(-j)},z_{i(-j)},{\hat{\theta }})\bigg ). \end{aligned}$$
(24)
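The two-component structure of (23) and (24) can be sketched as a single routine: a sampling term from the spread of the individual effects across observations, plus a delta-method term built from the averaged gradient. The inputs below are made-up numbers used only to show the arithmetic; the function name is hypothetical.

```python
def average_effect_variance(effects, gradients, cov):
    """Variance estimate of an average effect, following the form of (23)/(24).

    effects   : estimated effect for each observation
    gradients : gradient of each observation's effect with respect to theta
    cov       : inverse observed information matrix
    """
    n = len(effects)
    mean_effect = sum(effects) / n
    # First term: variation of the individual effects around their average.
    sampling = sum((e - mean_effect) ** 2 for e in effects) / (n * (n - 1))
    # Second term: delta method applied to the averaged gradient.
    p = len(cov)
    mean_grad = [sum(g[k] for g in gradients) / n for k in range(p)]
    delta = sum(mean_grad[a] * cov[a][b] * mean_grad[b]
                for a in range(p) for b in range(p))
    return sampling + delta

# Made-up inputs for three observations and two parameters.
effects = [1.0, 2.0, 3.0]
gradients = [[1.0, 0.0], [2.0, 1.0], [3.0, 2.0]]
cov = [[1.0, 0.0], [0.0, 1.0]]
print(average_effect_variance(effects, gradients, cov))
```

With these inputs the sampling term is \(2/6\) and the delta term is \(2^2+1^2=5\).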

Variance estimators (19), (20), (23), and (24) involve gradients of the marginal and incremental effects \(\nabla _\theta \eta _j (x_i,z_i,\theta )\) and \(\nabla _\theta \pi _j (x_{i(-j)},z_{i(-j)},\theta )\) that need to be derived specifically for each of the models in Sects. 3 and 4. For the marginalized models, i.e. the MZIP, MZINB, MHP, and MHNB models, the gradients of their marginal effects and incremental effects are

$$\begin{aligned} \nabla _\theta \eta _j (x_i,z_i,\theta )&= \beta _j e^{{x'_i}\beta } \sum \limits _{m = 0}^{J_1} x_{im}u_{(m + 1)} + e^{{x'_i}\beta } u_{(j + 1)},\\ \nabla _\theta \pi _j (x_{i(-j)},z_{i(-j)},\theta )&= \Big [e^{x'_{i(-j)}\beta _{(-j)} + l_2\beta _j} - e^{x'_{i(-j)}\beta _{(-j)} + l_1\beta _j}\Big ] \sum \limits _{m = 0,\ne j}^{J_1} x_{im}u_{(m + 1)} \\&\quad +\, \Big [l_2e^{x'_{i(-j)}\beta _{(-j)} + l_2\beta _j} - l_1 e^{x'_{i(-j)}\beta _{(-j)} + l_1\beta _j}\Big ]u_{(j + 1)}, \end{aligned}$$

where \(u_{(m)}\) is a unit vector with 1 in the mth component and 0 in others. The length of \(u_{(m)}\) is \((J_1+J_2+2)\) for the MZIP and MHP models and is \((J_1+J_2+3)\) for the MZINB and MHNB models. The gradients of the marginal effects and incremental effects for the non-marginalized models, i.e. the ZIP, ZINB, HP, and HNB models, are considerably complex and are reported in “Appendix”.
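For the marginalized models, the gradient of the marginal effect \(\eta _j=\beta _j e^{x_i'\beta }\) is simple enough to code directly. The sketch below stores \(\theta\) as the concatenation of \(\beta\) and \(\gamma\), uses zero-based indexing (so index \(j\) corresponds to \(u_{(j+1)}\)), and compares the result with a finite-difference gradient; the names and parameter values are hypothetical.

```python
import math

def marginalized_effect(theta, x, j, n_beta):
    """eta_j = beta_j * exp(x'beta); theta holds beta first, then gamma."""
    beta = theta[:n_beta]
    return beta[j] * math.exp(sum(b * v for b, v in zip(beta, x)))

def marginalized_effect_gradient(theta, x, j, n_beta):
    """Analytic gradient of eta_j with respect to theta."""
    beta = theta[:n_beta]
    exb = math.exp(sum(b * v for b, v in zip(beta, x)))
    grad = [0.0] * len(theta)                # gamma components stay zero
    for m in range(n_beta):
        grad[m] = beta[j] * exb * x[m]       # beta_j e^{x'beta} x_im
    grad[j] += exb                           # extra term on the j-th component
    return grad

# Hypothetical theta = (beta_0, beta_1, beta_2, gamma_0, gamma_1, gamma_2).
theta = [1.0, 0.3, -0.2, 0.5, 1.0, -1.0]
x = [1.0, 0.4, 1.0]                          # x_i0 = 1 is the intercept
grad = marginalized_effect_gradient(theta, x, 1, 3)
print(grad)
```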

6 Superiority of marginalized two-part models over non-marginalized two-part models: true or false?

Several previous articles (Long et al. 2014; Preisser et al. 2016) promoted the use of marginalized two-part models over the traditional non-marginalized two-part models for count data with excess zeroes. It has been argued that the marginalized two-part models can provide “direct” marginal inference, which gives the impression that these models are superior to the non-marginalized two-part models. Is this really true?

Marginal inference and interpretation The discussion in Sects. 3 and 4 reveals that both types of models, either marginalized or non-marginalized, can provide marginal inference through marginal effects on the overall mean of count outcomes with excess zeroes. The difference is that the estimators and variance estimators of the marginal effects derived from the non-marginalized models are somewhat more complex than those derived from the marginalized models. For example, the marginal effect of a covariate \(x_{ij}\) with respect to the overall marginal mean derived from the ZIP and ZINB models is \(\displaystyle \frac{\partial \text{ E }(y_i|x_i,z_i) }{\partial x_{ij}}=\displaystyle \frac{e^{x_i'\beta }}{(1 + e^{{z'_i}\gamma })^2}\{\beta _j+e^{{z'_i}\gamma }(\beta _j-\gamma _j)\}\), whereas this marginal effect is \(\displaystyle \frac{\partial \text{ E }(y_i|x_i,z_i) }{\partial x_{ij}}=\beta _j e^{x_i'\beta }\) in the MZIP and MZINB models. However, our numerical studies in Sect. 7 show that the numerical implementations of the marginal effect estimators for the two types of models are both convenient and computationally fast.

Furthermore, the argument that the marginalized two-part models can provide direct marginal inference is not precise. In health economics and health services research, the quantity of interest is the marginal effect on the overall mean of a response variable, not on a transformation of the overall mean. Only when a linear predictor is directly connected to the expectation of responses (e.g., \(\text{ E }(y_i|x_i)=x_i'\beta\)) can a direct marginal inference be made through the regression coefficients and the marginal effects \(\displaystyle \frac{\partial \text{ E }(y_i|x_i) }{\partial x_{ij}}=\beta _j\). For the marginalized two-part models, the direct marginal inference can be made only on the logarithmic scale of the overall mean of the response, \(\displaystyle \frac{\partial \log \text{ E }(y_i|x_i,z_i) }{\partial x_{ij}}=\beta _j\), but not on the original scale.

Model misspecification and model selection The discussion in Sects. 3 and 4 reveals that the marginalized two-part models possess a linear representation in the logarithmic scale of the overall mean of outcomes but have a non-linear representation in the logarithmic scale of the mean of positive outcomes (in the marginalized hurdle models) or positive outcomes with some zeroes (in the marginalized zero-inflated models). In contrast, the non-marginalized two-part models possess a linear representation in the logarithmic scale of the mean of positive outcomes or positive outcomes with some zeroes but have a non-linear representation on the other side. Therefore, the marginalized and non-marginalized two-part models are two parallel competitors in modelling count data with excess zeroes, and neither type of model is superior to the other. It would be problematic to assume by default that the linear representation should be imposed on one particular side. Model misspecification is always an issue when the assumed model is not true or is not close to the truth. In Sect. 7, we report the simulation studies that we conducted to show the consequences of model misspecification when fitting a marginalized model to the data generated from its non-marginalized counterpart and vice versa. Because of the bias that may be induced by model misspecification, it is recommended that data analysts follow formal model selection procedures to choose between the marginalized and conventional two-part models when making inference with the models. In Sect. 8, three model selection criteria are investigated to examine the performance of each of them in this particular setting.

7 Model misspecification: theories and numerical studies

In this section, we report the results gathered from the simulation studies that were conducted to investigate the impact of model misspecification on marginal effects estimation in the conventional and marginalized two-part models for zero-inflated count data. The investigation on model misspecification was restricted to two scenarios: (1) the underlying true model that generates the simulated data is a conventional two-part model, but the corresponding marginalized two-part model is fit to the data, and (2) the underlying true model is a marginalized two-part model, but the corresponding conventional two-part model is fit.

7.1 Theories on model misspecification

For either a marginalized or a non-marginalized two-part model, consider the zero-inflated response variable y with its true probability density function g(y). Let \(\{f(y;\theta ), \theta \in \Theta \}\) be a parametric family of probability density functions that may be misspecified for y. White (1982) showed that, under suitable regularity conditions, there exists a \(\theta ^*\in \Theta\) such that the quasi-maximum likelihood estimator \(\hat{\theta }^{(n)} = \displaystyle \mathop {\hbox {argmax}}\limits _{\theta \in \Theta }\frac{1}{n}\sum \limits _{i=1}^{n} \log f(y_i; \theta )\) almost surely converges to \(\theta ^*\), in which \(\theta ^*\) minimizes the Kullback–Leibler distance between g(y) and \(f(y;\theta )\):

$$\begin{aligned} I(g(y):f(y;\theta ))=\text{ E }_g\left\{ \displaystyle \log \frac{g(y)}{f(y;\theta )}\right\} . \end{aligned}$$

In addition, asymptotic normality holds for \(\hat{\theta }^{(n)}\) as

$$\begin{aligned} \sqrt{n}(\hat{\theta }^{(n)}-\theta ^*) \mathop {\longrightarrow }\limits ^{D} N(0,V(\theta ^*)) \end{aligned}$$

and \(V_n(\hat{\theta }^{(n)}) \mathop {\longrightarrow }\limits ^{\mathrm{a.s.}} V(\theta ^*)\), where \(V_n(\hat{\theta }^{(n)})=A_n^{-1}(\hat{\theta }^{(n)})\,B_n(\hat{\theta }^{(n)})A_n^{-1}(\hat{\theta }^{(n)}),~ V(\theta ^*) = A^{-1}(\theta ^*) B(\theta ^*) A^{-1}(\theta ^*),\) and

$$\begin{aligned} A_n(\theta ) = \left\{ \frac{1}{n}\sum \limits _{i=1}^{n}\frac{\partial ^2\log f(y_i;\theta )}{\partial \theta _k\partial \theta _l}\right\} ,\quad&B_n(\theta ) = \left\{ \frac{1}{n}\sum \limits _{i=1}^{n}\frac{\partial \log f(y_i;\theta )}{\partial \theta _k}\frac{\partial \log f(y_i;\theta )}{\partial \theta _l} \right\} ,\\ A(\theta ) =E\left( \left\{ \frac{\partial ^2\log f(y;\theta ) }{\partial \theta _k\partial \theta _l} \right\} \right) ,\quad&B(\theta ) = E\left( \left\{ \frac{\partial \log f(y;\theta )}{\partial \theta _k}\frac{\partial \log f(y;\theta )}{\partial \theta _l} \right\} \right) . \end{aligned}$$

If either a marginalized or a non-marginalized two-part model is correctly specified (i.e., its corresponding counterpart is misspecified), there exists a \(\theta ^{(0)}\in \Theta\) such that \(f(y;\theta ^{(0)})=g(y)\). In that case the quasi-maximum likelihood estimator becomes the maximum likelihood estimator, \(\theta ^*=\theta ^{(0)}\), and the inverse of the Fisher information matrix serves as the asymptotic covariance matrix estimator. When the model is misspecified, the standard errors for \(\hat{\theta }^{(n)}\) should be obtained from the sandwich estimator \(V_n(\hat{\theta }^{(n)})\).
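The difference between the model-based and sandwich variances is easiest to see in a one-parameter case. The sketch below fits a plain Poisson model with mean \(\lambda\) to overdispersed counts, a deliberately simple stand-in for the two-part models chosen for this illustration only. Here the quasi-maximum likelihood estimate is the sample mean, the model-based variance of \(\sqrt{n}(\hat{\lambda }-\lambda ^*)\) is \(-A_n^{-1}\), and the sandwich form is \(A_n^{-1}B_nA_n^{-1}\). The function name and data are hypothetical.

```python
def poisson_sandwich(y):
    """Return (lambda_hat, sandwich variance, model-based variance)."""
    n = len(y)
    lam = sum(y) / n                                  # quasi-MLE of the mean
    # Average Hessian of log f(y; lam) = y ln(lam) - lam - ln(y!).
    a_n = sum(-yi / lam ** 2 for yi in y) / n
    # Average squared score, (y / lam - 1)^2.
    b_n = sum((yi / lam - 1.0) ** 2 for yi in y) / n
    sandwich = b_n / (a_n * a_n)                      # A^{-1} B A^{-1}, scalar case
    model_based = -1.0 / a_n                          # valid only if Poisson holds
    return lam, sandwich, model_based

# Made-up overdispersed counts with many zeroes.
counts = [0, 0, 0, 1, 2, 9]
lam_hat, v_sandwich, v_model = poisson_sandwich(counts)
print(lam_hat, v_sandwich, v_model)
```

In this scalar case the sandwich variance reduces to the sample variance of the counts (with divisor \(n\)), whereas the model-based variance equals the sample mean, so the two disagree whenever the data are overdispersed.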

7.2 True ZIP models versus misspecified MZIP models

In the first simulation study, 500 data sets with three sample sizes \(n=100\), 500, and 1000 were generated from the ZIP model, in which the linear predictors were specified as \(\ln (\mu _i)={\beta _0+x_{i1}\beta _1+x_{i2}\beta _2}\) and \({\mathrm{logit}}(\psi _{i})={\gamma _0+z_{i1}\gamma _1+z_{i2}\gamma _2}\) with one continuous covariate \(x_{i1}=z_{i1}\sim N(0,1)\) and one binary covariate \(x_{i2}=z_{i2}\sim {\mathrm{Bernoulli}}(0.5)\). When generating the simulated data sets, five combinations of \(\beta =(\beta _0,\beta _1,\beta _2)'\) (see Table 1 for details of the combinations) were considered such that the average values of the \(\mu _i\)’s range from approximately 4 to 15. We fixed \(\gamma =(\gamma _0, \gamma _1, \gamma _2)=(0.5,1,-\,1)\) to keep the average value of the \(\psi _i\)’s (i.e., the average percentage of structural zeroes) around the intermediate value of \(50\%\). Both the ZIP and MZIP models were then fit to each of the simulated data sets, and the true average marginal effect \(\bar{\eta }_1(\theta )\) of \(x_1\) and the true average incremental effect \(\bar{\pi }_2(\theta )\) of \(x_2\) given by the two models, as well as their estimates \(\hat{\bar{\eta }}_1({\hat{\theta }})\) and \(\hat{\bar{\pi }}_2({\hat{\theta }})\), were calculated.
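The data-generating step of this design can be sketched as follows. The value of \(\gamma\) is taken from the text, whereas the \(\beta\) vector below is a placeholder, since the actual combinations are listed in Table 1; the function names and the random seed are likewise hypothetical.

```python
import math
import random

def rpois(mu, rng):
    """Poisson(mu) draw: count unit-rate exponential arrivals in [0, mu]."""
    k, t = 0, rng.expovariate(1.0)
    while t < mu:
        k += 1
        t += rng.expovariate(1.0)
    return k

def simulate_zip(n, beta, gamma, rng):
    """Draw n observations (y, x1, x2) from the ZIP model with x = z."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)                     # continuous covariate
        x2 = 1.0 if rng.random() < 0.5 else 0.0      # binary covariate
        mu = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2)
        zg = gamma[0] + gamma[1] * x1 + gamma[2] * x2
        psi = 1.0 / (1.0 + math.exp(-zg))            # P(structural zero)
        y = 0 if rng.random() < psi else rpois(mu, rng)
        data.append((y, x1, x2))
    return data

rng = random.Random(2024)
beta_placeholder = [1.0, 0.3, 0.2]                   # NOT the Table 1 values
gamma_fixed = [0.5, 1.0, -1.0]                       # as stated in the text
sample = simulate_zip(2000, beta_placeholder, gamma_fixed, rng)
zero_share = sum(1 for row in sample if row[0] == 0) / len(sample)
print(zero_share)
```

The observed share of zeroes combines the structural zeroes, about half of the sample under this \(\gamma\), with the zeroes produced by the Poisson component.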

Table 1 True values, estimates, standard deviations (SD), standard errors (SE), and biases of the average marginal effects and average incremental effects for the true ZIP models and misspecified MZIP models
Table 2 True values, estimates, standard deviations (SD), standard errors (SE), and biases of the average marginal effects and average incremental effects for the true MZIP models and misspecified ZIP models

Table 1 reports the true values of the average marginal and incremental effects, the mean and standard deviation of the effect estimates, and the mean of the SEs given by the ZIP and MZIP models from fitting the 500 simulated data sets at the three sample sizes. The results in Table 1 demonstrate that the estimated average marginal and incremental effects obtained from the underlying true model, the ZIP model, are unbiased across all combinations of \(\beta =(\beta _0,\beta _1,\beta _2)'\) and sample sizes. The finite-sample bias of the estimates of the average marginal and incremental effects given by the misspecified MZIP model is larger than that given by the true model, though the bias usually does not exceed twice the standard error. The simulation results in Table 1 also show that, for both models, the average SE of the average marginal and incremental effects is close to the corresponding standard deviation. In addition, the average SE evidently shrinks as the sample size increases from 100 to 500 to 1000. This evidence supports the conclusion that the variance estimation procedure, which we derived in Sect. 5 based upon the asymptotic properties of marginal and incremental effects, is valid for finite samples.

7.3 True MZIP models versus misspecified ZIP models

The investigational plan of the remaining three simulation studies is comparable to the first study in Sect. 7.2. In the second simulation study, 500 data sets were produced with sample sizes \(n=100\), 500, and 1000 from the MZIP model by specifying the linear predictors as \(\ln (v_i)={\beta _0+x_{i1}\beta _1+x_{i2}\beta _2}\) and \({\mathrm{logit}}(\psi _{i})={\gamma _0+z_{i1}\gamma _1+z_{i2}\gamma _2}\) with \(x_{i1}=z_{i1}\sim N(0,1)\) and \(x_{i2}=z_{i2}\sim {\mathrm{Bernoulli}}(0.5)\). Note that \(\ln (v_i)={\beta _0+x_{i1}\beta _1+x_{i2}\beta _2}\) is the linear predictor for marginal expectation of the counts including zeroes, instead of positive counts. As such, the expectation of positive counts satisfies \(\mu _i=v_i/(1-\psi _i) =e^{{\beta _0+x_{i1}\beta _1+x_{i2}\beta _2}} (1+e^{{\gamma _0+z_{i1}\gamma _1+z_{i2}\gamma _2}})\). Five combinations of \(\beta =(\beta _0,\beta _1,\beta _2)'\) (see Table 2) were used for data generation, and the averages of the resulting \(\mu _i\)’s range from approximately 3 to 11. The parameter \(\gamma =(\gamma _0, \gamma _1, \gamma _2)=(0.5,1,-\,1)\) is also fixed yielding an average of \(\psi _i\)’s being around the intermediate value of \(50\%\). The ZIP and MZIP models were subsequently fit to each of the simulated data sets, and the true average marginal effects \(\bar{\eta }_1(\theta )\) of \(x_1\) and true average incremental effects \(\bar{\pi }_2(\theta )\) of \(x_2\), as well as their estimates \(\hat{\bar{\eta }}_1({\hat{\theta }})\) and \(\hat{\bar{\pi }}_2({\hat{\theta }})\), were computed.
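The conversion from the marginal mean \(v_i\) to the Poisson-component mean \(\mu _i\) used when generating MZIP data can be sketched as below. The \(\beta\) values are placeholders (the actual combinations are in Table 2), and the function name is hypothetical.

```python
import math

def mzip_components(x1, x2, beta, gamma):
    """Return (v, psi, mu) for one observation under the MZIP specification."""
    v = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2)     # overall mean E(y)
    zg = gamma[0] + gamma[1] * x1 + gamma[2] * x2
    psi = 1.0 / (1.0 + math.exp(-zg))                        # P(structural zero)
    mu = v * (1.0 + math.exp(zg))                            # v / (1 - psi)
    return v, psi, mu

beta_placeholder = [1.0, 0.3, 0.2]        # NOT the Table 2 values
gamma_fixed = [0.5, 1.0, -1.0]            # as stated in the text
v, psi, mu = mzip_components(0.4, 1.0, beta_placeholder, gamma_fixed)
print(v, psi, mu, (1.0 - psi) * mu)
```

Counts are then drawn exactly as in a ZIP model with parameters \(\psi _i\) and \(\mu _i\), so that \((1-\psi _i)\mu _i\) recovers the specified marginal mean \(v_i\).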

Table 2 reports the true values of the average marginal and incremental effects, the mean and standard deviation of their estimates, and the mean of the SEs given by the ZIP and MZIP models from fitting the simulated data sets. In Table 2, the estimated average marginal and incremental effects obtained from the underlying true model, the MZIP model, are still unbiased as expected. Across all combinations of \(\beta =(\beta _0,\beta _1,\beta _2)'\) and the three sample sizes, the misspecified ZIP model yields effect estimates with larger bias. For both models, the average SE of the average marginal and incremental effects is close to the corresponding standard deviation and decreases as the sample size increases from 100 to 500 to 1000, which supports the validity of the variance estimation procedure in Sect. 5 for finite samples.

7.4 True HNB models versus misspecified MHNB models

In the third simulation study, 500 data sets with sample sizes \(n=100\), 500, and 1000 were simulated from the HNB model with \(\ln (\mu _i)={\beta _0+x_{i1}\beta _1+x_{i2}\beta _2}\) and \({\mathrm{logit}}(\psi _{i})={\gamma _0+z_{i1}\gamma _1+z_{i2}\gamma _2}\), in which \(x_{i1}=z_{i1}\sim N(0,1)\) and \(x_{i2}=z_{i2}\sim {\mathrm{Bernoulli}}(0.5)\). Five combinations of \(\beta =(\beta _0,\beta _1,\beta _2)'\) (see Table 3) were examined, such that the averages of the \(\mu _i\)’s vary from approximately 3 to 11. The fixed parameter \(\gamma =(\gamma _0, \gamma _1, \gamma _2)=(0.5,1,-1)\) yields an average of the \(\psi _i\)’s of around \(50\%\). The scale parameter was fixed at \(\alpha =1.5\) for all data sets. After data generation, the HNB and MHNB models were fit to each of the simulated data sets. We calculated for each model the true average marginal effect \(\bar{\eta }_1(\theta )\) of \(x_1\) and the true average incremental effect \(\bar{\pi }_2(\theta )\) of \(x_2\), as well as their estimates \(\hat{\bar{\eta }}_1({\hat{\theta }})\) and \(\hat{\bar{\pi }}_2({\hat{\theta }})\). The simulation results are reported in Table 3. Evidently, effect estimates from the true HNB model have smaller bias than those from the misspecified MHNB model.

Table 3 True values, estimates, standard deviations (SD), standard errors (SE), and biases of the average marginal effects and average incremental effects for the true HNB models and misspecified MHNB models

7.5 True MHNB models versus misspecified HNB models

In the fourth simulation study, we produced 500 data sets with sample sizes \(n=100\), 500, and 1000 from the MHNB model using the linear predictors \(\ln (v_i)={\beta _0+x_{i1}\beta _1+x_{i2}\beta _2}\) and \({\mathrm{logit}}(\psi _{i})={\gamma _0+z_{i1}\gamma _1+z_{i2}\gamma _2}\) with \(x_{i1}=z_{i1}\sim N(0,1)\) and \(x_{i2}=z_{i2}\sim {\mathrm{Bernoulli}}(0.5)\). We again considered five combinations of \(\beta =(\beta _0,\beta _1,\beta _2)'\) (see Table 4), yielding averages of the \(\mu _i\)’s that vary from approximately 4 to 13. As in the third simulation study, \(\gamma =(\gamma _0, \gamma _1, \gamma _2)=(0.5,1,-1)\) and \(\alpha =1.5\) were fixed when generating the simulated data. Then, the MHNB and HNB models were fit to each of the simulated data sets. Table 4 presents the results for the average marginal and incremental effects in terms of the true value, the mean and standard deviation of their estimates, and the mean of the SEs given by the true and misspecified models. The behavior of the effect estimates and of the SEs of the average marginal and incremental effects is the same as in the previous simulation studies.

Table 4 True values, estimates, standard deviations (SD), standard errors (SE), and biases of the average marginal effects and average incremental effects for the true MHNB models and misspecified HNB models

The conclusion from the four back-to-back simulation studies is straightforward. No matter which type of model, marginalized or conventional, is fit to the data, the estimates of marginal effects will be biased whenever the model is misspecified. The marginalized two-part models offer no advantage in reducing this type of bias when estimating the marginal effects of a covariate with respect to the expected outcomes.

7.6 Robustness

The results from the above numerical studies are consistent with the theory presented in Sect. 7.1, in that the misspecified models have larger bias than the true models under maximum likelihood estimation. Cross-comparison of the estimation biases produced by the misspecified ZIP and MZIP models reveals that the misspecified ZIP models induce smaller biases than the misspecified MZIP models (see the results on biases in Tables 1 and 2). This indicates that the maximum likelihood estimators of the MZIP models are less robust to model misspecification than those of the ZIP models, and they would compare even less favorably with Poisson models (Staub and Winkelmann 2013). The results on biases show no significant difference in robustness to model misspecification between the maximum likelihood estimators of the HNB and MHNB models (see the results on biases in Tables 3 and 4).

8 Model selection via marginal effects

When the primary interest of data analysis lies in estimating the (average) marginal or incremental effect of a covariate with respect to the expected outcomes, the empirical mean square error (MSE) criterion (Dow and Norton 2003; Madden 2008) can be used to select the best model among the candidate models for estimating the effects. Suppose the goal of data analysis is to precisely estimate an incremental effect \(\pi _j(x_{i(-j)},z_{i(-j)},\theta )\) or an average incremental effect \(\bar{\pi }_j(\theta )\) associated with a change in \(x_j\). The MSE of an effect estimator \(\hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\) or \(\hat{\bar{\pi }}_j({\hat{\theta }})\) is equal to the variance of the estimator plus the square of its bias:

$$\begin{aligned} \begin{array}{rl} {\mathrm{MSE}}\left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\right] & = \text{ var }\left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\right] + \text {Bias}^2\left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\right] \\ & = \text{ var }\left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\right] \\ &\quad +\, \left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })-{\pi }_j(x_{i(-j)},z_{i(-j)},{\theta })\right] ^2 \end{array} \end{aligned}$$
(25)

and

$$\begin{aligned} \begin{array}{rl} {\mathrm{MSE}}\left[ \hat{\bar{\pi }}_j(\hat{\theta })\right] & = \text{ var }\left[ \hat{\bar{\pi }}_j(\hat{\theta })\right] + \text {Bias}^2\left[ \hat{\bar{\pi }}_j(\hat{\theta })\right] \\ & = \text{ var }\left[ \hat{\bar{\pi }}_j(\hat{\theta })\right] + \left[ \hat{\bar{\pi }}_j(\hat{\theta })-\bar{\pi }_j(\theta )\right] ^2. \end{array} \end{aligned}$$
(26)

The MSE criterion selects the candidate model with the minimum MSE as the best model to estimate the corresponding marginal or incremental effect. Because in practice the true effects \(\pi _j(x_{i(-j)},z_{i(-j)},\theta )\) and \(\bar{\pi }_j(\theta )\) are unknown in (25) and (26), the empirical MSEs (EMSEs)

$$\begin{aligned} \begin{array}{rl} {\mathrm{EMSE}}\left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\right] & = \text{ var }\left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })\right] \\ &\quad +\, \left[ \hat{\pi }_j(x_{i(-j)},z_{i(-j)},\hat{\theta })-\hat{\pi }_j^{c}(x_{i(-j)},z_{i(-j)},\hat{\theta }^{c})\right] ^2 \end{array} \end{aligned}$$
(27)

and

$$\begin{aligned} \begin{array}{rl} {\mathrm{EMSE}}\left[ \hat{\bar{\pi }}_j(\hat{\theta })\right]&= \text{ var }\left[ \hat{\bar{\pi }}_j(\hat{\theta })\right] + \left[ \hat{\bar{\pi }}_j(\hat{\theta })-\hat{\bar{\pi }}_j^c(\hat{\theta }^c)\right] ^2 \end{array} \end{aligned}$$
(28)

are used instead for model selection. That is, the unknown true effect in (25) and (26) is replaced in (27) and (28) by the estimated effect \(\hat{\pi }_j^{c}(x_{i(-j)},z_{i(-j)},\hat{\theta }^{c})\) or \(\hat{\bar{\pi }}_j^c(\hat{\theta }^c)\) from a pre-specified model. Dow and Norton (2003) illustrated the use of the MSE criterion, through a Monte Carlo example, for selecting between sample selection models and two-part models for corner solutions in semicontinuous data. This MSE criterion was referred to as “an empirical MSE test” by Dow and Norton (2003). The competitors of the MSE criterion include information criteria, such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC), and Vuong’s closeness test. Vuong’s closeness test (Vuong 1989), or Vuong’s test, is a likelihood-ratio-based test for examining whether two non-nested models are equally close to the true data generating process. The test statistic is \(V=\sqrt{n}({\bar{m}}/s_m)\), in which \({\bar{m}}=\displaystyle \frac{1}{n}\sum _{i=1}^n m_i\), \(s_m^2=\displaystyle \frac{1}{n}\sum _{i=1}^n (m_i-{\bar{m}})^2\), and \(m_i=\ell _{i,0}(\theta )-\ell _{i,1}(\theta )\) is the contribution of observation i to the log-likelihood \(\ell _{0}(\theta )\) under the null hypothesis model minus its contribution to the log-likelihood \(\ell _{1}(\theta )\) under the alternative model. The statistic asymptotically follows a standard normal distribution under the hypothesis that the two models are equivalent, and a standard Z-test is then applied.
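As an illustration of how the statistic \(V\) can be computed, the following Python sketch takes the per-observation log-likelihood contributions of the two fitted models as inputs. It is a minimal sketch under stated assumptions: the contributions are assumed to be evaluated at each model’s own maximum likelihood estimate, \(s_m\) is taken to be the standard deviation of the \(m_i\) (the usual form of the statistic), and no correction for the number of parameters is applied. The function name `vuong_test` is hypothetical.

```python
import math
import numpy as np

def vuong_test(ll0, ll1):
    """Vuong statistic from per-observation log-likelihoods of two non-nested models.

    ll0, ll1: arrays of log-likelihood contributions under the null-hypothesis
    model and the alternative model, one entry per observation.
    """
    m = np.asarray(ll0, dtype=float) - np.asarray(ll1, dtype=float)
    n = m.size
    m_bar = m.mean()
    s_m = math.sqrt(np.mean((m - m_bar) ** 2))        # standard deviation of the m_i
    v = math.sqrt(n) * m_bar / s_m
    p_two_sided = math.erfc(abs(v) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|v|))
    return v, p_two_sided
```

A large positive \(V\) favors the model supplied as `ll0`, a large negative \(V\) favors the model supplied as `ll1`, and swapping the two arguments only flips the sign of \(V\).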

To examine the performance of the above model selection criteria, further numerical studies were conducted following the simulation studies in Sect. 7. In each of the simulation studies outlined in Sect. 7, we collected the observed likelihood values and calculated the AIC and BIC values of the pair of true and misspecified models that were fit to each of the 500 simulated data sets. The best model was then selected based upon the AIC and BIC criteria for each pair of models. Note that AIC and BIC lead to identical selections throughout the four simulation studies, because the true and misspecified models under investigation had the same number of unknown parameters, so the penalty terms cancel in each comparison. With the collected observed likelihood values, Vuong’s test with a significance level of \(5\%\) was also conducted for each pair of true and misspecified models, with the misspecified model in the null hypothesis. The EMSEs of the average marginal and incremental effects in each pair of models were calculated, in which the effect estimates from the true model were used to calculate the bias term. Tables 5, 6, 7 and 8 report the rates of selecting the true model over the misspecified model given by the AIC, BIC, and EMSE criteria among the 500 simulated data sets. The tables also report the rejection rates given by Vuong’s test for each combination of parameters and sample size. Among these criteria, the selection rates of AIC and BIC are the highest in most cases. The rates of selecting the true model given by AIC and BIC are mostly over 90% when the sample size is \(n=1000\) and more than 80% for \(n=500\). Even for \(n=100\), these rates are usually higher than 60%. The selection rates of the EMSE criterion show no clear increasing trend as the sample size grows. These rates mostly range from about 50 to 90%, but can be as low as \(22\%\) when the sample size is small. In general, these rates do not exhibit a reliable pattern but are acceptable. Vuong’s test does not perform well in terms of rejection rates, especially for the sample sizes \(n=100\) and \(n=500\), although the rates do increase with the sample size. The simulation studies show that the information criteria are reliable in distinguishing between the true and misspecified two-part models for count data with excess zeroes, and that Vuong’s test cannot differentiate the models if the sample size is not large enough. The MSE criterion might be acceptable as an effect-specific model selection approach; however, it should be noted that in practice its performance can be dramatically influenced by the assumption about which model is consistent and can therefore be used to calculate the bias term.
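The agreement between AIC and BIC in these comparisons follows from a short calculation. As a sketch, write \(\hat{\ell }_0\) and \(\hat{\ell }_1\) for the maximized log-likelihoods of the two models in a pair and \(q\) for their common number of unknown parameters (these symbols are introduced here for illustration only), and use the standard definitions \({\mathrm{AIC}}=-2\hat{\ell }+2q\) and \({\mathrm{BIC}}=-2\hat{\ell }+q\ln (n)\):

$$\begin{aligned} \begin{array}{rl} {\mathrm{AIC}}_0-{\mathrm{AIC}}_1 & = \left[ -2\hat{\ell }_0+2q\right] -\left[ -2\hat{\ell }_1+2q\right] = -2\left( \hat{\ell }_0-\hat{\ell }_1\right) , \\ {\mathrm{BIC}}_0-{\mathrm{BIC}}_1 & = \left[ -2\hat{\ell }_0+q\ln (n)\right] -\left[ -2\hat{\ell }_1+q\ln (n)\right] = -2\left( \hat{\ell }_0-\hat{\ell }_1\right) . \end{array} \end{aligned}$$

Both differences reduce to the same multiple of the log-likelihood difference, so either criterion simply selects the model with the larger maximized log-likelihood.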

Table 5 Rates of selecting the true ZIP model over the misspecified MZIP model given by AIC/BIC and EMSE and rejection rates given by Vuong’s test
Table 6 Rates of selecting the true MZIP model over the misspecified ZIP model given by AIC/BIC and EMSE and rejection rates given by Vuong’s test
Table 7 Rates of selecting the true HNB model over the misspecified MHNB model given by AIC/BIC and EMSE and rejection rates given by Vuong’s test
Table 8 Rates of selecting the true MHNB model over the misspecified HNB model given by AIC/BIC and EMSE and rejection rates given by Vuong’s test

9 Application

The German Socioeconomic Panel (GSOEP) data (1984–1995) (Riphahn et al. 2003) are used for empirical analysis with the four conventional two-part models and their corresponding marginalized models discussed in Sects. 3 and 4. The data were collected through annual face-to-face or computer-assisted personal interviews with household members aged 16 or over living in Germany, to obtain comprehensive information for measuring stability and change in living conditions (Frick 2006).

The pooled subsample of the GSOEP data (1984–1994) includes 7293 German citizens aged 26 through 65 (Riphahn et al. 2003). After removing missing values, the subsample only includes the years 1984–1988, 1991, and 1994, with 14,243 male observations and 13,083 female observations. The dependent variable is the number of doctor visits in the 3 months immediately before the survey; \(37.09\%\) of the observations are zero, and the mean across the whole sample is 3.18 with a standard deviation of 5.69. One key independent variable is the public insurance indicator, which divides people into the group mandatorily covered by public insurance and the group insured voluntarily, with proportions of 88.57 versus 11.43%. Among those covered by public insurance, about 2.12% purchased add-on insurance, which amounts to \(1.88\%\) of the whole data. The add-on insurance indicator is another key covariate. Age and the degree of health satisfaction (measured on an integer scale from 0, bad, to 10, good) are the only two continuous covariates. All other independent variables, including gender and year, are converted to dummy variables.

9.1 Statistical modelling

In our statistical modelling, all independent variables are considered in both parts, that is,

$$\begin{aligned} x_i=z_i=\{&\text {female, age, health, public insurance, add-on insurance,} \\&\text {year1985, year1986, year1987, year1988, year1991, year1994}\}, \end{aligned}$$

and the linear predictors \(x_i'\beta\) and \(z_i'\gamma\) and the link functions are specified as in Sects. 3 and 4. However, the estimates of \(\beta\) and \(\gamma\) have different interpretations across the models. The models involving negative binomial distributions contain an extra scale parameter \(\alpha\). All models have explicit log-likelihood functions except for the MHP and MHNB models, whose log-likelihood functions (15) and (17) are subject to the nonlinear constraints (16) and (18). We used the Newton-Raphson method to solve these constraints for \(\mu _i\) at every iteration of maximizing the log-likelihood functions.
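The inner Newton-Raphson step can be illustrated for the MHP case with a short Python sketch. Constraint (16) is not reproduced in this section, so the sketch assumes it has the form \(\nu _i=(1-\psi _i)\,\mu _i/(1-e^{-\mu _i})\), in which \(\nu _i\) denotes the overall mean and \(\psi _i\) the probability of a zero count. Both the form of the constraint and the function name `solve_mu` are assumptions made for illustration.

```python
import math

def solve_mu(nu, psi, tol=1e-12, max_iter=100):
    """Solve nu = (1 - psi) * mu / (1 - exp(-mu)) for mu > 0 by Newton-Raphson.

    Assumed constraint form (hypothetical stand-in for constraint (16)).
    """
    target = nu / (1.0 - psi)   # required mean of the zero-truncated Poisson part
    if target <= 1.0:
        raise ValueError("need nu / (1 - psi) > 1 for a positive solution")
    mu = target                 # start to the right of the root, where g(mu) > 0
    for _ in range(max_iter):
        e = math.exp(-mu)
        g = mu / (1.0 - e) - target               # constraint residual
        dg = (1.0 - e - mu * e) / (1.0 - e) ** 2  # derivative of mu / (1 - exp(-mu))
        step = g / dg
        mu -= step
        if abs(step) < tol * max(1.0, mu):
            return mu
    raise RuntimeError("Newton-Raphson did not converge")
```

Because \(\mu /(1-e^{-\mu })\) is increasing and convex in \(\mu\), starting from `target` keeps the iterates on the right of the root and moving monotonically toward it. In a full fit, this solve would be repeated for every observation at every update of \((\beta ,\gamma )\).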

After fitting the two-part models to the data, we collected AIC, BIC, \(\hat{\bar{\pi }}_{\mathrm{public}}\), \(\hat{\bar{\pi }}_{\text {add-on}}\), and the EMSE values of the two effect estimates, and conducted Vuong’s test for each pair of models. To compute the EMSE values of \(\hat{\bar{\pi }}_{\mathrm{public}}\) and \(\hat{\bar{\pi }}_{\text {add-on}}\), a pre-specified model must be selected to stand in for the true average incremental effect in (28); however, the true model for real data is unknown, which raises the question of whether the chosen model’s parameter estimates are consistent. Hence, for the purpose of comparison, the conventional model and the corresponding marginalized model in each pair are each selected in turn as the pre-specified model, and their estimated effects are used in (28).
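The EMSE calculation in (28) for one pair of models can be sketched in Python as follows. The sketch assumes that the variance term is estimated by the squared SE of the effect estimate; the function names `emse` and `emse_pair` are hypothetical, and the labels `EMSE_C` and `EMSE_M` follow the notation used in Sect. 9.2 for the conventional and marginalized reference models.

```python
def emse(estimate, se, reference_estimate):
    """Empirical MSE as in (28): squared SE plus squared gap to the reference estimate."""
    return se ** 2 + (estimate - reference_estimate) ** 2

def emse_pair(est_c, se_c, est_m, se_m):
    """EMSE of both models in a pair, using each model in turn as the reference.

    est_c, se_c: effect estimate and SE from the conventional model.
    est_m, se_m: effect estimate and SE from the marginalized model.
    """
    return {
        "conventional": {
            "EMSE_C": emse(est_c, se_c, est_c),  # gap term is zero by construction
            "EMSE_M": emse(est_c, se_c, est_m),
        },
        "marginalized": {
            "EMSE_C": emse(est_m, se_m, est_c),
            "EMSE_M": emse(est_m, se_m, est_m),  # gap term is zero by construction
        },
    }
```

When a model serves as its own reference, its EMSE reduces to the squared SE, so the comparison within a pair is driven by the SEs and by the gap between the two effect estimates.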

9.2 Empirical results

Table 9 Parameter estimates (SEs) given by the non-marginalized and marginalized two-part models from fitting to the GSOEP data

Table 9 presents the results from fitting the two-part models. All models provide positive and significant estimates \(\hat{\beta }_{\text {public}}\), varying from 0.136 to 0.208, indicating that participants covered by public insurance see doctors more frequently than the private insurance cohort on a regular basis (for the zero-inflated models), on need (for the hurdle models), or across the whole population (for the marginalized models). The MZINB model gives a non-significant negative estimate \(\hat{\gamma }_{\text {public}}\) (\(-\,0.178\)), while the other models give significant negative estimates. This suggests that, under the MZINB model, the public and private insurance cohorts differ little in the probability of no regular doctor visits, whereas under the other models the public insurance cohort is substantially more likely to see doctors regularly.

The coefficient estimates \(\hat{\beta }_{\text {add-on}}\) and \(\hat{\gamma }_{\text {add-on}}\) for add-on insurance are all negative but vary more across models than those for public insurance. The estimates \(\hat{\beta }_{\text {add-on}}\) are significant for the ZIP, HP and HNB models, with larger magnitudes than the non-significant estimates for the MZIP, ZINB, MZINB, MHP and MHNB models. In terms of \(\hat{\gamma }_{\text {add-on}}\), the MZIP, HP, MHP, and HNB models give significant estimates with magnitudes ranging from 0.180 to 0.308, whereas the ZIP, ZINB, MZINB, and MHNB models give non-significant estimates with magnitudes varying from 0.073 to 0.724.

Table 10 Average incremental effect estimates (SEs, p values), the AIC, BIC, and EMSE values, and the Vuong’s Z-statistics (p values) given by the non-marginalized and marginalized two-part models from fitting to the GSOEP data

Table 10 compares the results from these models. All models provide a positive incremental effect estimate \(\hat{\bar{\pi }}_{\text {public}}\) and a negative estimate \(\hat{\bar{\pi }}_{\text {add-on}}\). The comparison of AIC and BIC supports the HNB and MHNB models, which have smaller AIC and BIC values, with the HNB model having the smallest. Based on Vuong’s test, the two models are significantly different in modelling the GSOEP data, indicating that the HNB model is the best one among these models for the GSOEP subsample. For the EMSEs of the incremental effect estimates of public insurance and add-on insurance, we use the notations \(\hbox {EMSE}_{\text {C}}\) and \(\hbox {EMSE}_{\text {M}}\) for the EMSE values calculated with the conventional and the marginalized model in each pair as the pre-specified model, respectively. The results show that the HNB model has smaller \(\hbox {EMSE}_{\text {C}}\) and \(\hbox {EMSE}_{\text {M}}\) values of \(\hat{\bar{\pi }}_{\text {public}}\) than the MHNB model; however, its \(\hbox {EMSE}_{\text {C}}\) and \(\hbox {EMSE}_{\text {M}}\) values of \(\hat{\bar{\pi }}_{\text {add-on}}\) are larger than those of the MHNB model due to the large SE of \(\hat{\bar{\pi }}_{\text {add-on}}\). Regarding the incremental effect estimates \(\hat{\bar{\pi }}_{\text {public}}\) and \(\hat{\bar{\pi }}_{\text {add-on}}\), the results show that neither effect is significant for overall healthcare utilization, measured by the number of physician visits, under either the HNB or the MHNB model, whereas the related parameter estimates tell a different story. This runs counter to the initial motivation for proposing marginalized two-part models.

10 Conclusion

This article reviews four two-part models for cross-sectional count data with abundant zeroes (the ZIP, ZINB, HP, and HNB models) and two marginalized models (the MZIP and MZINB models) and proposes two other models (the MHP and MHNB models). We argue that the ability to marginalize two-part models cannot be taken as a reason to choose marginalized models over non-marginalized models to fit such data. Instead, an appropriate model selection procedure should be followed to find the best model. In this article, we derive estimates and variance estimates of the (average) marginal effects and (average) incremental effects of covariates with respect to the overall mean outcomes for these two-part models. In the simulation studies, the average effect estimates given by the true models are unbiased, whereas those given by the misspecified models exhibit irregular bias. Two pairs of non-marginalized and marginalized models are compared by using three model selection criteria in the simulation studies. The results confirm the reliability of the AIC and BIC criteria. In summary, although marginalized two-part models can help in estimating overall marginal effects of covariates on the transformed expectation of count outcomes, this advantage should not be over-emphasized. Otherwise, model misspecification may lead to inaccurate statistical inference. When the two-part models include a large number of covariates, penalized maximum likelihood methods such as the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), or minimax concave penalty (MCP) are recommended for variable selection. It has been shown that these methods provide comparable estimation but are more robust than traditional stepwise variable selection (Wang et al. 2015).