1 INTRODUCTION

The Conway–Maxwell–Poisson (CMP) distribution was introduced in [9] as an extension of the common Poisson model that allows under- and overdispersion in count data to be described. Over the last two decades, its flexibility and usefulness have been pointed out, and by now there is a variety of articles in the literature dealing with CMP distributions and their properties, including, among others, distribution theory, inference, regression models, time series, tree-based models, Bayesian methods, multivariate extensions, and applications; see, for instance, [1, 2, 4–8, 10–12, 16–22].

The counting density of the CMP distribution \(P_{\boldsymbol{\vartheta}}\) is given by

$$f_{\boldsymbol{\vartheta}}(x)=C(\boldsymbol{\vartheta})\frac{\lambda^{x}}{x!^{\nu}},\quad x\in{\mathbb{N}}_{0}=\{0,1,2,\dots\}$$
(1)

with normalizing constant

$$C(\boldsymbol{\vartheta})=\left(\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!^{\nu}}\right)^{-1}$$
(2)

for \(\boldsymbol{\vartheta}=(\lambda,\nu)\in\Theta\) with parameter set

$$\Theta=(0,\infty)^{2}\cup[(0,1)\times\{0\}].$$

Geometric, Poisson, and Bernoulli distributions are contained in the model: they arise by setting \(\nu=0\), by setting \(\nu=1\), and by letting \(\nu\) tend to infinity, respectively. A CMP distribution has variance not smaller than its mean for \(\nu<1\) (overdispersion) and variance not larger than its mean for \(\nu>1\) (underdispersion); see, e.g., [13, Sect. 4.2; 10, Subsect. 2.3.1]. In the case \(\nu=1\), i.e., for Poisson distributions, mean and variance coincide, of course.
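To make the definitions concrete, the density (1) and the normalizing constant (2) can be evaluated numerically by truncating the infinite series. The following Python sketch does so (the truncation bound \(K\) is an illustrative choice, not part of the model) and also illustrates the dispersion behavior just described.

```python
import math

def cmp_pmf(lam, nu, K=500):
    """Truncated evaluation of the CMP density (1) on {0, ..., K}; K is an illustrative bound."""
    # unnormalized weights lambda^k / (k!)^nu on the log scale, stabilized by the maximum
    logw = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(K + 1)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)                      # proportional to 1 / C(theta), cf. formula (2)
    return [wk / total for wk in w]

def mean_var(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

# nu < 1: variance exceeds the mean; nu = 1: Poisson; nu > 1: variance below the mean
for lam, nu in [(2.0, 0.5), (2.0, 1.0), (2.0, 2.0)]:
    m, v = mean_var(cmp_pmf(lam, nu))
    print(f"lambda={lam}, nu={nu}: mean={m:.3f}, variance={v:.3f}")
```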

In this article, two characterization results related to maximum likelihood (ML) estimation of the CMP parameter \(\boldsymbol{\vartheta}\) are provided. In a previous article by the authors [5], a sufficient condition for the non-existence of the ML estimate was derived, namely that the range of the observations is less than two. Here, we show that this condition is also necessary (Section 2). Moreover, in the case of existence, a simple necessary and sufficient condition for the ML estimate to be a solution of the likelihood equation is derived (Section 3). If this condition is not met, the ML estimate lies on the boundary of \(\Theta\). Finally, a simulation study is performed to assess the accuracy of the ML estimator given the range of the observations (Section 4).

2 A CHARACTERIZATION RESULT ON THE EXISTENCE OF THE ML ESTIMATE

Let \(X_{1},\dots,X_{n}\) be independent and identically distributed random variables with distribution \(P_{\boldsymbol{\vartheta}}\) and counting density \(f_{\boldsymbol{\vartheta}}\) given by formula (1), and let \(x_{1},\dots,x_{n}\in{\mathbb{N}}_{0}\) be realizations of \(X_{1},\dots,X_{n}\). Moreover, let \(x_{(1)}\leq\ldots\leq x_{(n)}\) denote the realizations in ascending order. Furthermore, we introduce the statistic

$$\mathbf{T}(x)=(T_{1}(x),T_{2}(x))=(x,-\ln(x!)),\quad x\in{\mathbb{N}}_{0},$$

and denote the convex support of \(\mathbf{T}\), i.e., the closed convex hull of the support of \(\mathbf{T}\), by \(M\). According to Lemma 2.3 in [5], we have the representation

$$M=\{(y+\alpha,z)\in{\mathbb{R}}^{2}:y\in{\mathbb{N}}_{0},\ \alpha\in[0,1),\ z\leq\tilde{T}_{2}(y,\alpha)\},$$

where

$$\tilde{T}_{2}(y,\alpha)=\alpha T_{2}(y+1)+(1-\alpha)T_{2}(y),\quad y\in{\mathbb{N}}_{0},\quad\alpha\in[0,1).$$

Theorem 2.1. An ML estimate of \(\boldsymbol{\vartheta}\) based on \(x_{1},\dots,x_{n}\) exists if and only if \(x_{(n)}-x_{(1)}\geq 2\), i.e., if the range of all observations is at least two. In case of existence, the ML estimate is uniquely determined.

Proof. As shown in [5, 15], the set \(\{P_{\boldsymbol{\vartheta}}^{(n)}:\boldsymbol{\vartheta}\in\Theta\}\) of \(n\)-fold product measures of \(P_{\boldsymbol{\vartheta}}\), \(\boldsymbol{\vartheta}\in\Theta\), forms a full exponential family with canonical parameter \(\boldsymbol{\zeta}=(\ln(\lambda),\nu)\) and minimal sufficient statistic \(\mathbf{T}^{(n)}(\mathbf{x})=\sum_{i=1}^{n}\mathbf{T}(x_{i})\) for \(\mathbf{x}=(x_{1},\dots,x_{n})\in{\mathbb{N}}_{0}^{n}\). Applying Theorem 9.13 in [3, p. 151] to the corresponding exponential family consisting of the distributions of \(\mathbf{T}^{(n)}\), an ML estimate of \(\boldsymbol{\zeta}\) (and then also of \(\boldsymbol{\vartheta}\)) based on \(\mathbf{x}\in{\mathbb{N}}_{0}^{n}\) exists and is then unique if and only if \(\mathbf{T}^{(n)}(\mathbf{x})\) lies in the interior of the convex support of \(\mathbf{T}^{(n)}\) or, equivalently, if \(\sum_{i=1}^{n}\mathbf{T}(x_{i})/n\) lies in the interior of \(M\). The latter condition, in turn, is equivalent to \(x_{(n)}-x_{(1)}\geq 2\) by [5, Lemma 3.2]. \(\Box\)

Theorem 2.1 improves upon a former result in [5, Theorem 3.3 and Corollary 3.5] stating that \(x_{(n)}-x_{(1)}\geq 2\) is a necessary condition for the existence of the ML estimate.

In case of existence, the unique ML estimate \(\hat{\boldsymbol{\vartheta}}=(\hat{\lambda},\hat{\nu})\), say, of \(\boldsymbol{\vartheta}\) is either the unique solution of the likelihood equation

$$\boldsymbol{\pi}(\boldsymbol{\vartheta})=\frac{1}{n}\sum_{i=1}^{n}\mathbf{T}(x_{i})$$
(3)

with mapping \(\boldsymbol{\pi}=(\pi_{1},\pi_{2}):\textrm{int}(\Theta)\rightarrow\textrm{int}(M)\) defined by

$$\boldsymbol{\pi}(\boldsymbol{\vartheta})=(\pi_{1}(\boldsymbol{\vartheta}),\pi_{2}(\boldsymbol{\vartheta}))=C(\boldsymbol{\vartheta})\left(\sum_{k=0}^{\infty}\frac{k\lambda^{k}}{k!^{\nu}},-\sum_{k=0}^{\infty}\frac{\ln(k!)\lambda^{k}}{k!^{\nu}}\right),$$
$$\boldsymbol{\vartheta}=(\lambda,\nu)\in(0,\infty)^{2}$$
(4)

or lies on the boundary of \(\Theta\) and hence corresponds to a geometric distribution; see [5] for details. In the latter case, we necessarily have \(\hat{\boldsymbol{\vartheta}}=(\overline{x}_{n}/(1+\overline{x}_{n}),0)\) with \(\overline{x}_{n}=\sum_{i=1}^{n}x_{i}/n\), since \(\overline{x}_{n}/(1+\overline{x}_{n})\) is the unique ML estimate of \(\lambda\) in the subfamily of geometric distributions. The image of \(\boldsymbol{\pi}\) has a complicated analytic form, but its graphical representation may be useful to decide between the two cases; see [5].
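In practice, the likelihood equation (3) has to be solved numerically. A minimal sketch is given below; it truncates the series in (4) at an illustrative bound \(K\), reparametrizes in logarithms so that the iterates stay in \(\textrm{int}(\Theta)\), and uses SciPy's generic root finder fsolve. The starting value is an illustrative choice, and the sketch is only meaningful when the solution indeed lies in \(\textrm{int}(\Theta)\); a criterion for this case is derived in Section 3.

```python
import math
import numpy as np
from scipy.optimize import fsolve

K = 500                                                       # illustrative truncation bound
lgk = np.array([math.lgamma(k + 1) for k in range(K + 1)])    # ln(k!) for k = 0, ..., K

def pi_map(lam, nu):
    """Truncated evaluation of pi(theta) = (pi_1(theta), pi_2(theta)) from formula (4)."""
    logw = np.arange(K + 1) * math.log(lam) - nu * lgk
    w = np.exp(logw - logw.max())
    w /= w.sum()                                  # CMP probabilities f_theta(0), ..., f_theta(K)
    return np.array([np.sum(np.arange(K + 1) * w), -np.sum(lgk * w)])

def solve_likelihood_equation(x, start=(1.0, 1.0)):
    """Solve equation (3); only meaningful when the ML estimate lies in int(Theta)."""
    x = np.asarray(x, dtype=float)
    target = np.array([x.mean(), -np.mean([math.lgamma(xi + 1) for xi in x])])
    # reparametrize as z = (ln(lambda), ln(nu)) so that the iterates stay in (0, infinity)^2
    def eq(z):
        return pi_map(math.exp(z[0]), math.exp(z[1])) - target
    z_hat = fsolve(eq, np.log(np.asarray(start)))
    return float(np.exp(z_hat[0])), float(np.exp(z_hat[1]))
```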

3 CHARACTERIZING THE EXISTENCE OF A SOLUTION OF THE LIKELIHOOD EQUATION

In the following, we derive an analytic criterion to decide whether, in case of existence, the ML estimate lies in the interior \(\textrm{int}(\Theta)=(0,\infty)^{2}\) of \(\Theta\), and can thus be obtained as a solution of the likelihood equation, or lies on the boundary \(\{(\lambda,0):\lambda\in(0,1)\}\) of \(\Theta\). For this, note that the mapping \(\boldsymbol{\pi}:\textrm{int}(\Theta)\rightarrow\boldsymbol{\pi}(\textrm{int}(\Theta))\) is bijective and continuously differentiable; see [5, Section 2]. The mapping \(\boldsymbol{\pi}\) therefore possesses a continuous inverse function \(\boldsymbol{\pi}^{-1}:\boldsymbol{\pi}(\textrm{int}(\Theta))\rightarrow\textrm{int}(\Theta)\). In particular, \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) is open as the pre-image of \(\textrm{int}(\Theta)\) under \(\boldsymbol{\pi}^{-1}\). For our purposes, it will be convenient to extend the domain of \(\boldsymbol{\pi}\) from \(\textrm{int}(\Theta)\) to \(\Theta\) according to formula (4).

First, some preliminary results related to the behavior of \(\boldsymbol{\pi}\) on the boundary of \(\Theta\) are stated within several lemmas.

Lemma 3.1. Let \(\boldsymbol{\vartheta}\in\Theta\) be a boundary point of \(\Theta\) and \(\boldsymbol{\vartheta}_{j}\in\textrm{int}(\Theta)\), \(j\in{\mathbb{N}}\), be a sequence in the interior of \(\Theta\) with \(\boldsymbol{\vartheta}_{j}\rightarrow\boldsymbol{\vartheta}\) for \(j\rightarrow\infty\). Then, \(C(\boldsymbol{\vartheta}_{j})\rightarrow C(\boldsymbol{\vartheta})\) and \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\rightarrow\boldsymbol{\pi}(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\).

Proof. Let \(\boldsymbol{\vartheta}\in\Theta\) be some boundary point of \(\Theta\), i.e., \(\boldsymbol{\vartheta}=(\lambda,0)\) for some \(\lambda\in(0,1)\), and let \(\boldsymbol{\vartheta}_{j}=(\lambda_{j},\nu_{j})\in\textrm{int}(\Theta)\) with \(\boldsymbol{\vartheta}_{j}\rightarrow(\lambda,0)\) for \(j\rightarrow\infty\). Moreover, let \(\varepsilon>0\) be such that \(\lambda+2\varepsilon<1\). For \(j\in{\mathbb{N}}\), let \(g_{j}(k)=\lambda_{j}^{k}/k!^{\nu_{j}}\), \(k\in{\mathbb{N}}_{0}\). Furthermore, let \(\mu\) denote the counting measure on \({\mathbb{N}}_{0}\). Then, there exists \(j_{0}\in{\mathbb{N}}\) with \(g_{j}(k)\leq g(k)=(\lambda+\varepsilon)^{k}\), \(k\in{\mathbb{N}}_{0}\), for all \(j\geq j_{0}\), where \(g\) is \(\mu\)-integrable. Hence, the dominated convergence theorem yields

$$\sum_{k=0}^{\infty}\frac{\lambda_{j}^{k}}{k!^{\nu_{j}}}=\int g_{j}d\mu\rightarrow\int\limits_{{\mathbb{N}}_{0}}\lambda^{k}d\mu(k)=\frac{1}{1-\lambda}$$

and thus \(C(\boldsymbol{\vartheta}_{j})\rightarrow C(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\); see formula (2).

Next, we define for \(j\in{\mathbb{N}}\) the functions \(h_{j}(k)=kg_{j}(k)\) and \(\tilde{h}_{j}(k)=\ln(k!)g_{j}(k)\) for \(k\in{\mathbb{N}}_{0}\). Then, for \(j\geq j_{0}\), we have that \(h_{j}(k)\leq h(k)=k(\lambda+\varepsilon)^{k}\) and \(\tilde{h}_{j}(k)\leq\tilde{h}(k)=\ln(k!)(\lambda+\varepsilon)^{k}\) for \(k\in{\mathbb{N}}_{0}\). Obviously, \(h\) is \(\mu\)-integrable by the ratio test for series. To see that \(\tilde{h}\) is \(\mu\)-integrable, note that by l’Hospital’s rule

$$\lim_{k\rightarrow\infty}\frac{\ln(k+1)}{\ln(k!)}=\lim_{x\rightarrow\infty}\frac{\ln(x+1)}{\ln(\Gamma(x+1))}=\lim_{x\rightarrow\infty}\frac{1}{(x+1)\psi(x+1)}=0,$$

where \(\psi\) denotes the digamma function satisfying \(\psi(x)\rightarrow\infty\) for \(x\rightarrow\infty\). Hence, there exists \(k_{0}\in{\mathbb{N}}\) such that \(\ln(k+1)/\ln(k!)<\varepsilon/(\lambda+\varepsilon)\) for \(k\geq k_{0}\). It follows that

$$\frac{\ln((k+1)!)(\lambda+\varepsilon)^{k+1}}{\ln(k!)(\lambda+\varepsilon)^{k}}=(\lambda+\varepsilon)\left(1+\frac{\ln(k+1)}{\ln(k!)}\right)<\lambda+2\varepsilon<1$$

for \(k\geq k_{0}\), and \(\tilde{h}\) is then \(\mu\)-integrable by the ratio test for series; cf. the proof of Lemma 2.2 in [5]. Applying the dominated convergence theorem, we finally obtain

$$\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})=C(\boldsymbol{\vartheta}_{j})\left(\int h_{j}d\mu,-\int\tilde{h}_{j}d\mu\right)$$
$${}\rightarrow C(\boldsymbol{\vartheta})\left(\int\limits_{\mathbb{N}_{0}}k\lambda^{k}d\mu(k),-\int\limits_{\mathbb{N}_{0}}\ln(k!)\lambda^{k}d\mu(k)\right)=\boldsymbol{\pi}(\boldsymbol{\vartheta})$$

for \(j\rightarrow\infty\). \(\Box\)

Lemma 3.2 states a characterization of a boundary point \(\boldsymbol{\vartheta}\) of \(\Theta\) in terms of \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\).

Lemma 3.2. Let \(\boldsymbol{\vartheta}\in\Theta\). Then \(\boldsymbol{\vartheta}\) is a boundary point of \(\Theta\) if and only if \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) is a boundary point of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\).

Proof. First, let \(\boldsymbol{\vartheta}\in\Theta\) be a boundary point of \(\Theta\). Then there exists a sequence \(\boldsymbol{\vartheta}_{j}\in\textrm{int}(\Theta)\), \(j\in{\mathbb{N}}\), with \(\boldsymbol{\vartheta}_{j}\rightarrow\boldsymbol{\vartheta}\) for \(j\rightarrow\infty\). By Lemma 3.1, we have \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\rightarrow\boldsymbol{\pi}(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\), such that every neighborhood of \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) must contain the points \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\in\boldsymbol{\pi}(\textrm{int}(\Theta))\), \(j\geq j_{0}\), for some \(j_{0}\in{\mathbb{N}}\).

Next, suppose that \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\in\boldsymbol{\pi}(\textrm{int}(\Theta))\). Then there exists some \(\tilde{\boldsymbol{\vartheta}}\in\textrm{int}(\Theta)\) with \(\boldsymbol{\pi}(\boldsymbol{\vartheta})=\boldsymbol{\pi}(\tilde{\boldsymbol{\vartheta}})\). Since \(\boldsymbol{\vartheta}_{j}\in\textrm{int}(\Theta)\) for all \(j\in{\mathbb{N}}\) and \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\rightarrow\boldsymbol{\pi}(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\) by Lemma 3.1, it follows that

$$\boldsymbol{\vartheta}_{j}-\tilde{\boldsymbol{\vartheta}}=\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j}))-\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\tilde{\boldsymbol{\vartheta}}))=\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j}))-\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\boldsymbol{\vartheta}))\rightarrow 0$$

for \(j\rightarrow\infty\) by using that \(\boldsymbol{\pi}^{-1}\) is continuous. This leads to the contradiction \(\boldsymbol{\vartheta}=\tilde{\boldsymbol{\vartheta}}\in\textrm{int}(\Theta)\). Hence, we have \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\notin\boldsymbol{\pi}(\textrm{int}(\Theta))\), and \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) is a boundary point of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\).

On the other hand, for \(\boldsymbol{\vartheta}\in\textrm{int}(\Theta)\), it is evident that \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) lies in the interior of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\), since \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) is open. \(\Box\)

In what follows, let the mapping \(d:(0,\infty)\rightarrow(-\infty,0)\) be defined by

$$d(z)=-\sum_{j=2}^{\infty}\ln(j)\left(\frac{z}{z+1}\right)^{j},\quad z>0,$$
(5)

and let \(D=\{(z,d(z)):z>0\}\) denote the graph of \(d\).
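Since \(z/(z+1)\in(0,1)\), the series in (5) converges geometrically, so \(d\) can be evaluated by simple truncation. A short sketch (with an illustrative truncation bound) reads as follows.

```python
import math

def d(z, K=5000):
    """Truncated evaluation of d(z) from formula (5); K is an illustrative truncation bound."""
    r = z / (z + 1.0)                 # ratio in (0, 1), so the terms decay geometrically
    return -sum(math.log(j) * r ** j for j in range(2, K + 1))

print(round(d(1.0), 3))               # -0.508, the value appearing in Example 3.6 below
```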

Lemma 3.3. \(D\) is a subset of the boundary of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) and \(D\cap\boldsymbol{\pi}(\textrm{int}(\Theta))=\emptyset\).

Proof. Applying Lemma 3.2 and using formula (4), the boundary of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) contains the points

$$(1-\lambda)\left(\sum_{k=0}^{\infty}k\lambda^{k},-\sum_{k=0}^{\infty}\ln(k!)\lambda^{k}\right),\quad\lambda\in(0,1).$$
(6)

Since for \(\lambda\in(0,1)\)

$$\sum_{k=0}^{\infty}\ln(k!)\lambda^{k}=\sum_{k=2}^{\infty}\sum_{j=2}^{k}\ln(j)\lambda^{k}=\sum_{j=2}^{\infty}\ln(j)\sum_{k=j}^{\infty}\lambda^{k}$$
$${}=\frac{1}{1-\lambda}\sum_{j=2}^{\infty}\ln(j)\lambda^{j},$$
(7)

formula (6) can be rewritten as

$$\left(\frac{\lambda}{1-\lambda},-\sum_{j=2}^{\infty}\ln(j)\lambda^{j}\right),\quad\lambda\in(0,1),$$

which, by setting \(z=\lambda/(1-\lambda)\), can be reparametrized as \((z,d(z))\), \(z>0\), and the first assertion is shown. The second assertion is obvious from the fact that \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) is open and thus cannot contain any of its boundary points. \(\Box\)

Lemma 3.3 enables us to formulate a condition that every point in \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) necessarily fulfils.

Lemma 3.4. For \((t_{1},t_{2})\in\boldsymbol{\pi}(\textrm{int}(\Theta))\), it holds that \(t_{2}>d(t_{1})\) with mapping \(d\) given by formula (5).

Proof. Let \(\boldsymbol{\vartheta}=(1,1)\in\textrm{int}(\Theta)\). Then, according to formula (4) with \(C(\boldsymbol{\vartheta})=1/e\) and formula (5),

$$\boldsymbol{\pi}(\boldsymbol{\vartheta})=\left(1,-\sum_{k=2}^{\infty}a_{k}\right)\quad\text{and}\quad d(1)=-\sum_{k=2}^{\infty}b_{k},$$

where

$$a_{k}=\frac{\ln(k!)}{ek!}\quad\text{and}\quad b_{k}=\frac{\ln(k)}{2^{k}}\quad\text{for}\quad k\geq 2.$$

To establish that \(d(1)<\pi_{2}(\boldsymbol{\vartheta})\), we show by induction that \(a_{k}<b_{k}\) for all \(k\geq 2\).

Obviously, \(a_{2}=\ln(2)/(2e)<\ln(2)/4=b_{2}\) and \(a_{3}=\ln(6)/(6e)\approx 0.110<0.137\approx\ln(3)/8=b_{3}\). Next, let \(a_{k}<b_{k}\) for some \(k\geq 3\). Then, by using that \((k+1)!>2^{k+1}\),

$$a_{k+1}=\frac{\ln((k+1)!)}{e(k+1)!}=\frac{1}{k+1}\frac{\ln(k!)}{ek!}+\frac{\ln(k+1)}{e(k+1)!}$$
$${}<\frac{1}{k+1}\frac{\ln(k)}{2^{k}}+\frac{\ln(k+1)}{e(k+1)!}<\frac{1}{k+1}\frac{\ln(k+1)}{2^{k}}+\frac{\ln(k+1)}{e2^{k+1}}$$
$${}=\frac{\ln(k+1)}{2^{k}}\left(\frac{1}{k+1}+\frac{1}{2e}\right)<\frac{\ln(k+1)}{2^{k+1}}=b_{k+1}.$$

Now, \(\boldsymbol{\pi}(\textrm{int}(\Theta))\subset\textrm{int}(M)\subset(0,\infty)\times(-\infty,0)\) is connected by [14, p. 668] and satisfies \(D\cap\boldsymbol{\pi}(\textrm{int}(\Theta))=\emptyset\) by Lemma 3.3; hence, \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) lies entirely on one side of the graph \(D\) of the continuous function \(d\). Since \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) with \(\boldsymbol{\vartheta}=(1,1)\) lies strictly above \(D\), the assertion follows. \(\Box\)

We arrive at the main result of this section stating a simple and useful characterization.

Theorem 3.5. Let \(x_{(n)}-x_{(1)}\geq 2\) and \(d\) be given by formula \((5)\). Moreover, let

$$\overline{x}_{n}=\frac{1}{n}\sum_{k=1}^{n}x_{k}\quad\text{and}\quad\overline{t}_{n}=-\frac{1}{n}\sum_{k=1}^{n}\ln(x_{k}!).$$
(i) If \(\overline{t}_{n}\leq d(\overline{x}_{n})\), then the ML estimate of \(\boldsymbol{\vartheta}\) lies on the boundary of \(\Theta\) and is given by \(\hat{\boldsymbol{\vartheta}}=(\overline{x}_{n}/(\overline{x}_{n}+1),0)\).

(ii) If \(\overline{t}_{n}>d(\overline{x}_{n})\), then the ML estimate of \(\boldsymbol{\vartheta}\) lies in the interior of \(\Theta\) and is the unique solution of the likelihood equation (3).
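A minimal sketch of the resulting decision rule, combining Theorem 2.1 and Theorem 3.5, is given below; the function \(d\) is again evaluated by truncation as in the sketch after formula (5), and all numerical choices are illustrative.

```python
import math

def d(z, K=5000):
    """Truncated evaluation of d(z) from formula (5), as in the sketch above."""
    r = z / (z + 1.0)
    return -sum(math.log(j) * r ** j for j in range(2, K + 1))

def ml_case(x):
    """Classify ML estimation for a sample x of counts according to Theorems 2.1 and 3.5."""
    if max(x) - min(x) < 2:                               # Theorem 2.1: no ML estimate
        return "does not exist", None
    x_bar = sum(x) / len(x)
    t_bar = -sum(math.lgamma(xi + 1) for xi in x) / len(x)
    if t_bar <= d(x_bar):                                 # Theorem 3.5(i): boundary (geometric) case
        return "boundary", (x_bar / (x_bar + 1.0), 0.0)
    return "interior", None                               # Theorem 3.5(ii): solve equation (3)
```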

Proof. By Theorem 2.1, existence of the ML estimate \(\hat{\boldsymbol{\vartheta}}\) of \(\boldsymbol{\vartheta}\) based on \(x_{1},\dots,x_{n}\) is guaranteed. If \(\overline{t}_{n}\leq d(\overline{x}_{n})\), it follows from Lemma 3.4 that \((\overline{x}_{n},\overline{t}_{n})\notin\boldsymbol{\pi}(\textrm{int}(\Theta))\), so that a solution of the likelihood equation (3) with respect to \(\boldsymbol{\vartheta}\in(0,\infty)^{2}\) does not exist. Hence, \(\hat{\boldsymbol{\vartheta}}\) necessarily lies on the boundary of \(\Theta\) and must then be given by \(\hat{\boldsymbol{\vartheta}}=(\overline{x}_{n}/(\overline{x}_{n}+1),0)\).

Now, let \(\overline{t}_{n}>d(\overline{x}_{n})\). Suppose that the ML estimate \(\hat{\boldsymbol{\vartheta}}\) lies on the boundary of \(\Theta\). Then, it necessarily holds that \(\hat{\boldsymbol{\vartheta}}=(\hat{\lambda},0)\) with \(\hat{\lambda}=\overline{x}_{n}/(\overline{x}_{n}+1)\). Note that the log-likelihood function based on \(\mathbf{x}=(x_{1},\dots,x_{n})\) is given by

$$\ell_{n}(\boldsymbol{\vartheta};\boldsymbol{x})=\ln\left(\prod_{i=1}^{n}f_{\boldsymbol{\vartheta}}(x_{i})\right)$$
$${}=n\left[\ln(C(\boldsymbol{\vartheta}))+\overline{x}_{n}\ln(\lambda)+\nu\overline{t}_{n}\right],\quad\boldsymbol{\vartheta}=(\lambda,\nu)\in\Theta.$$

Let \(\hat{\boldsymbol{\vartheta}}_{\nu}=(\hat{\lambda},\nu)\) and \(q(\nu)=\ell_{n}(\hat{\boldsymbol{\vartheta}}_{\nu};\mathbf{x})/n\) for \(\nu\geq 0\). Since \(\ln(C(\boldsymbol{\vartheta}))=-\ln(\sum_{k=0}^{\infty}\lambda^{k}/k!^{\nu})\) is differentiable on \((0,\infty)^{2}\) and by using formula (4), it follows that

$$q^{\prime}(\nu)=\frac{\sum_{k=0}^{\infty}\ln(k!)\hat{\lambda}^{k}/k!^{\nu}}{\sum_{k=0}^{\infty}\hat{\lambda}^{k}/k!^{\nu}}+\overline{t}_{n}=\overline{t}_{n}-\pi_{2}(\hat{\boldsymbol{\vartheta}}_{\nu}),\quad\nu>0.$$

According to Lemma 3.1, we have \(\boldsymbol{\pi}(\hat{\boldsymbol{\vartheta}}_{\nu})\rightarrow\boldsymbol{\pi}(\hat{\boldsymbol{\vartheta}})\) for \(\nu\searrow 0\) and, hence,

$$\lim_{\nu\searrow 0}q^{\prime}(\nu)=\overline{t}_{n}-\pi_{2}(\hat{\boldsymbol{\vartheta}})=\overline{t}_{n}-\frac{C(\hat{\boldsymbol{\vartheta}})d(\overline{x}_{n})}{1-\hat{\lambda}}=\overline{t}_{n}-d(\overline{x}_{n})>0$$
(8)

by using formula (4), formula (7) with \(\lambda=\hat{\lambda}\), and formula (5) with \(z=\overline{x}_{n}\). Since \(C(\hat{\boldsymbol{\vartheta}}_{\nu})\rightarrow C(\hat{\boldsymbol{\vartheta}})\) by Lemma 3.1, we also have \(\ell_{n}(\hat{\boldsymbol{\vartheta}}_{\nu};\mathbf{x})\rightarrow\ell_{n}(\hat{\boldsymbol{\vartheta}};\mathbf{x})\) for \(\nu\searrow 0\), and formula (8) then implies the existence of some small \(\nu>0\) with \(\ell_{n}(\hat{\boldsymbol{\vartheta}}_{\nu};\mathbf{x})>\ell_{n}(\hat{\boldsymbol{\vartheta}};\mathbf{x})\), which contradicts \(\hat{\boldsymbol{\vartheta}}\) being the ML estimate. \(\Box\)

The usefulness of Theorems 2.1 and 3.5 is demonstrated by means of an example.

Example 3.6. We consider three data sets of size \(n=7\), namely

$$\mathbf{x}^{(1)}=(2,0,1,2,0,2,0),$$
$$\mathbf{x}^{(2)}=(0,1,0,5,0,1,0),$$
$$\text{and}\quad\mathbf{x}^{(3)}=(2,1,1,1,2,1,2).$$

By Theorem 2.1, the ML estimates of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(1)}\) and on \(\mathbf{x}^{(2)}\) exist and are unique, since the observed ranges are 2 and 5, respectively, while the ML estimate of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(3)}\) does not exist, since the observed range is only 1. The arithmetic mean of the observations equals 1 for both \(\mathbf{x}^{(1)}\) and \(\mathbf{x}^{(2)}\), and we have

$$d(1)=-\sum_{j=2}^{\infty}\ln(j)\left(\frac{1}{2}\right)^{j}\approx-0.508.$$

Since

$$\overline{t}_{7}^{(1)}=-\frac{1}{7}\sum_{i=1}^{7}\ln(x_{i}^{(1)}!)=-\frac{3\ln(2)}{7}\approx-0.297>d(1)$$

and

$$\overline{t}_{7}^{(2)}=-\frac{1}{7}\sum_{i=1}^{7}\ln(x_{i}^{(2)}!)=-\frac{\ln(120)}{7}\approx-0.684\leq d(1),$$

applying Theorem 3.5 yields that the ML estimate of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(1)}\) lies in the interior of \(\Theta\) and is the unique solution of the likelihood equation, while the ML estimate of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(2)}\) lies on the boundary of \(\Theta\) and is given by \(\hat{\boldsymbol{\vartheta}}=(1/2,0)\).
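Continuing the sketch given after Theorem 3.5 (function ml_case), the three data sets of this example can be checked programmatically; the expected classifications are stated in the comment.

```python
x1 = [2, 0, 1, 2, 0, 2, 0]
x2 = [0, 1, 0, 5, 0, 1, 0]
x3 = [2, 1, 1, 1, 2, 1, 2]

for name, x in [("x(1)", x1), ("x(2)", x2), ("x(3)", x3)]:
    print(name, ml_case(x))
# expected: x(1) -> ('interior', None), x(2) -> ('boundary', (0.5, 0.0)),
#           x(3) -> ('does not exist', None)
```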

The conditions \(\overline{t}_{n}\leq d(\overline{x}_{n})\) and \(\overline{t}_{n}>d(\overline{x}_{n})\) in Theorem 3.5 are equivalent to \(\overline{t}_{n}\leq\tilde{d}(\overline{x}_{n}/(\overline{x}_{n}+1))\) and \(\overline{t}_{n}>\tilde{d}(\overline{x}_{n}/(\overline{x}_{n}+1))\), respectively, with mapping

$$\tilde{d}(\lambda)=-\sum_{j=2}^{\infty}\ln(j)\lambda^{j},\quad\lambda\in(0,1),$$
(9)

the graph of which is depicted in Fig. 1 to ease the comparison of the values \(\overline{t}_{n}\) and \(\tilde{d}(\overline{x}_{n}/(\overline{x}_{n}+1))\) for a given data set.

Fig. 1. Graph of the function \(\tilde{d}\) defined by formula (9).

4 SIMULATION STUDY

We perform a simulation study to investigate the accuracy of the ML estimates depending on the range of the underlying observations. Let \(\boldsymbol{\vartheta}_{1}=(\lambda_{1},\nu_{1})=(7.24,0.8)\) and \(\boldsymbol{\vartheta}_{2}=(\lambda_{2},\nu_{2})=(293\,162.5,5)\) be two parameter vectors corresponding to an overdispersed CMP distribution \(P_{\boldsymbol{\vartheta}_{1}}\) and an underdispersed CMP distribution \(P_{\boldsymbol{\vartheta}_{2}}\). Here, for \(i=1,2\), the parameter \(\lambda_{i}\) is determined from \(\nu_{i}\) as \((12+(\nu_{i}-1)/(2\nu_{i}))^{\nu_{i}}\) to ensure that the mean of \(P_{\boldsymbol{\vartheta}_{i}}\) is approximately equal to 12; see, e.g., [22, formula (7)].

For each parameter vector, we generate \(m=100\,000\) realizations of a sample of size \(n=20\) and compute the corresponding ML estimates, all of which exist by Theorem 2.1, since the observed range of every sample turns out to be greater than 1 (for the parameter vectors considered, the probability of non-existence of the ML estimate is very small). The ML estimates thus obtained are then grouped with respect to the range of the underlying observations and can be considered realizations of the ML estimator conditioned on the range \(X_{(20)}-X_{(1)}\). For every such group, consisting of the ML estimates \((\hat{\lambda}_{i}^{(j)},\hat{\nu}_{i}^{(j)})\), \(1\leq j\leq k\), say, we separately calculate the (empirical) relative absolute bias

$$\text{RAB}(\hat{\lambda}_{i})=\frac{1}{k}\sum_{j=1}^{k}\left|\frac{\hat{\lambda}_{i}^{(j)}}{\lambda_{i}}-1\right|$$

and the scaled (empirical) root-mean-square error

$$\text{SRMSE}(\hat{\lambda}_{i})=\sqrt{\frac{1}{k}\sum_{j=1}^{k}\left(\frac{\hat{\lambda}_{i}^{(j)}}{\lambda_{i}}-1\right)^{2}},$$

as well as the respective quantities for the dispersion parameter \(\nu_{i}\), \(i=1,2\). The results are shown in Tables 1 and 2 along with the relative frequency of every group. All accuracy measures turn out to be monotone (decreasing or increasing) as functions of the range of the observations. In the overdispersed case, the ML estimates of \(\lambda_{1}\) and \(\nu_{1}\) are most inaccurate when the range is small, whereas a large range appears to be less problematic. The same applies to the ML estimate of \(\lambda_{2}\) in the underdispersed case. The precision of the ML estimate of \(\nu_{2}\), in turn, appears not to be affected by a small range of observations. Having shown in Theorem 2.1 that an ML estimate does not exist for a range of 0 or 1, the simulation study additionally suggests that, for small ranges greater than 1, ML estimation may produce highly inaccurate values. Although the probability of non-existence of the ML estimate will typically be small in applications, observing a small range greater than 1 might not be that unlikely, in which case one should be critical of the ML estimate.

Table 1 Empirical relative absolute bias (RAB) and scaled root-mean-square error (SRMSE) of the ML estimators given the range of observations, where \(m=100\,000\) samples of size \(n=20\) are generated from \(P_{\boldsymbol{\vartheta}_{1}}\)
Table 2 Empirical relative absolute bias (RAB) and scaled root-mean-square error (SRMSE) of the ML estimators given the range of observations, where \(m=100\,000\) samples of size \(n=20\) are generated from \(P_{\boldsymbol{\vartheta}_{2}}\)
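For illustration, a condensed sketch of the simulation design is given below; it is not the code underlying Tables 1 and 2. The truncation bound \(K\), the reduced number of replications, the random seed, and the Nelder–Mead optimizer are illustrative choices, and the boundary case of Theorem 3.5(i) is ignored, as it practically does not occur for the parameter vectors considered.

```python
import math
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K = 400                                                       # illustrative truncation bound
lgk = np.array([math.lgamma(k + 1) for k in range(K + 1)])    # ln(k!) for k = 0, ..., K

def cmp_pmf(lam, nu):
    """Truncated CMP probabilities f_theta(0), ..., f_theta(K); cf. formulas (1) and (2)."""
    logw = np.arange(K + 1) * math.log(lam) - nu * lgk
    w = np.exp(logw - logw.max())
    return w / w.sum()

def ml_estimate(x):
    """Numerical ML estimate over int(Theta); boundary cases are neglected in this sketch."""
    x = np.asarray(x, dtype=float)
    sum_logfact = sum(math.lgamma(xi + 1) for xi in x)
    def negloglik(z):                                         # z = (ln(lambda), ln(nu))
        lam, nu = np.exp(z)
        logw = np.arange(K + 1) * math.log(lam) - nu * lgk
        log_series = logw.max() + math.log(np.exp(logw - logw.max()).sum())
        return -(x.sum() * math.log(lam) - nu * sum_logfact - len(x) * log_series)
    res = minimize(negloglik, x0=[math.log(x.mean()), 0.0], method="Nelder-Mead")
    return np.exp(res.x)                                      # (lambda_hat, nu_hat)

def simulate(lam, nu, m=500, n=20):
    """Group the ML estimates by the sample range, as in Tables 1 and 2 (m reduced here)."""
    pmf, support = cmp_pmf(lam, nu), np.arange(K + 1)
    groups = {}
    for _ in range(m):
        x = rng.choice(support, size=n, p=pmf)
        if x.max() - x.min() < 2:                             # no ML estimate (Theorem 2.1)
            continue
        groups.setdefault(int(x.max() - x.min()), []).append(ml_estimate(x))
    return groups

def rab_srmse(estimates, true_value):
    rel = np.asarray(estimates) / true_value - 1.0
    return np.mean(np.abs(rel)), np.sqrt(np.mean(rel ** 2))

# overdispersed case theta_1 = (7.24, 0.8); for theta_2 = (293162.5, 5) the starting
# value of the optimizer should be adapted accordingly
for r, ests in sorted(simulate(7.24, 0.8).items()):
    lam_hat, nu_hat = np.array(ests).T
    print(r, len(ests), rab_srmse(lam_hat, 7.24), rab_srmse(nu_hat, 0.8))
```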