1 INTRODUCTION

The Conway–Maxwell–Poisson (CMP) distribution was introduced in [9] as an extension of the common Poisson model that allows under- and overdispersion in count data to be described. Over the last two decades, its flexibility and usefulness have been pointed out, and by now there is a variety of articles in the literature dealing with CMP distributions and their properties, including, among others, distribution theory, inference, regression models, time series, tree-based models, Bayesian methods, multivariate extensions, and applications; see, for instance, [1, 2, 4–8, 10–12, 16–22].

The counting density of the CMP distribution \(P_{\boldsymbol{\vartheta}}\) is given by

$$f_{\boldsymbol{\vartheta}}(x)=C(\boldsymbol{\vartheta})\frac{\lambda^{x}}{x!^{\nu}},\quad x\in{\mathbb{N}}_{0}=\{0,1,2,\dots\}$$
(1)

with normalizing constant

$$C(\boldsymbol{\vartheta})=\left(\sum_{k=0}^{\infty}\frac{\lambda^{k}}{k!^{\nu}}\right)^{-1}$$
(2)

for \(\boldsymbol{\vartheta}=(\lambda,\nu)\in\Theta\) with parameter set

$$\Theta=(0,\infty)^{2}\cup[(0,1)\times\{0\}].$$

Geometric, Poisson, and Bernoulli distributions are contained in the model: they arise by setting \(\nu=0\), by setting \(\nu=1\), and by letting \(\nu\) tend to infinity, respectively. A CMP distribution has variance not smaller than its mean for \(\nu<1\) (overdispersion) and variance not larger than its mean for \(\nu>1\) (underdispersion); see, e.g., [13, Sect. 4.2; 10, Subsect. 2.3.1]. In the case \(\nu=1\), i.e., for Poisson distributions, mean and variance coincide, of course.
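To make the definitions concrete, the density (1) and the normalizing constant (2) can be evaluated numerically by truncating the infinite series. The following Python sketch does so (the truncation bound \(K\) is an illustrative choice, not part of the model) and also illustrates the dispersion behavior just described.

```python
import math

def cmp_pmf(lam, nu, K=500):
    """Truncated evaluation of the CMP density (1) on {0, ..., K}; K is an illustrative bound."""
    # unnormalized weights lambda^k / (k!)^nu on the log scale, stabilized by the maximum
    logw = [k * math.log(lam) - nu * math.lgamma(k + 1) for k in range(K + 1)]
    m = max(logw)
    w = [math.exp(lw - m) for lw in logw]
    total = sum(w)                      # proportional to 1 / C(theta), cf. formula (2)
    return [wk / total for wk in w]

def mean_var(pmf):
    mean = sum(k * p for k, p in enumerate(pmf))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(pmf))
    return mean, var

# nu < 1: variance exceeds the mean; nu = 1: Poisson; nu > 1: variance below the mean
for lam, nu in [(2.0, 0.5), (2.0, 1.0), (2.0, 2.0)]:
    m, v = mean_var(cmp_pmf(lam, nu))
    print(f"lambda={lam}, nu={nu}: mean={m:.3f}, variance={v:.3f}")
```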

In this article, two characterization results related to maximum likelihood (ML) estimation of the CMP parameter \(\boldsymbol{\vartheta}\) are provided. In a previous article by the authors [5], a sufficient condition for the non-existence of the ML estimate was derived, namely that the range of the observations is less than two. Here, we show that this condition is also necessary (Section 2). Moreover, in the case of existence, a simple necessary and sufficient condition for the ML estimate to be a solution of the likelihood equation is derived (Section 3). If this condition is not met, the ML estimate lies on the boundary of \(\Theta\). Finally, a simulation study is performed to assess the accuracy of the ML estimator given the range of the observations (Section 4).

2 A CHARACTERIZATION RESULT ON THE EXISTENCE OF THE ML ESTIMATE

Let \(X_{1},\dots,X_{n}\) be independent and identically distributed random variables with distribution \(P_{\boldsymbol{\vartheta}}\) and counting density \(f_{\boldsymbol{\vartheta}}\) given by formula (1), and let \(x_{1},\dots,x_{n}\in{\mathbb{N}}_{0}\) be realizations of \(X_{1},\dots,X_{n}\). Moreover, let \(x_{(1)}\leq\ldots\leq x_{(n)}\) denote the realizations in ascending order. Furthermore, we introduce the statistic

$$\mathbf{T}(x)=(T_{1}(x),T_{2}(x))=(x,-\ln(x!)),\quad x\in{\mathbb{N}}_{0},$$

and denote the convex support of \(\mathbf{T}\), i.e., the closed convex hull of the support of \(\mathbf{T}\), by \(M\). According to Lemma 2.3 in [5], we have the representation

$$M=\{(y+\alpha,z)\in{\mathbb{R}}^{2}:y\in{\mathbb{N}}_{0},\ \alpha\in[0,1),\ z\leq\tilde{T}_{2}(y,\alpha)\},$$

where

$$\tilde{T}_{2}(y,\alpha)=\alpha T_{2}(y+1)+(1-\alpha)T_{2}(y),\quad y\in{\mathbb{N}}_{0},\quad\alpha\in[0,1).$$

Theorem 2.1. An ML estimate of \(\boldsymbol{\vartheta}\) based on \(x_{1},\dots,x_{n}\) exists if and only if \(x_{(n)}-x_{(1)}\geq 2\), i.e., if the range of all observations is at least two. In case of existence, the ML estimate is uniquely determined.

Proof. As shown in [5, 15], the set \(\{P_{\boldsymbol{\vartheta}}^{(n)}:\boldsymbol{\vartheta}\in\Theta\}\) of \(n\)-fold product measures of \(P_{\boldsymbol{\vartheta}}\), \(\boldsymbol{\vartheta}\in\Theta\), forms a full exponential family with canonical parameter \(\boldsymbol{\zeta}=(\ln(\lambda),\nu)\) and minimal sufficient statistic \(\mathbf{T}^{(n)}(\mathbf{x})=\sum_{i=1}^{n}\mathbf{T}(x_{i})\) for \(\mathbf{x}=(x_{1},\dots,x_{n})\in{\mathbb{N}}_{0}^{n}\). Applying Theorem 9.13 in [3, p. 151] to the corresponding exponential family consisting of the distributions of \(\mathbf{T}^{(n)}\), an ML estimate of \(\boldsymbol{\zeta}\) (and then also of \(\boldsymbol{\vartheta}\)) based on \(\mathbf{x}\in{\mathbb{N}}_{0}^{n}\) exists and is then unique if and only if \(\mathbf{T}^{(n)}(\mathbf{x})\) lies in the interior of the convex support of \(\mathbf{T}^{(n)}\) or, equivalently, if \(\sum_{i=1}^{n}\mathbf{T}(x_{i})/n\) lies in the interior of \(M\). The latter condition, in turn, is equivalent to \(x_{(n)}-x_{(1)}\geq 2\) by [5, Lemma 3.2]. \(\Box\)

Theorem 2.1 improves upon a former result in [5, Theorem 3.3 and Corollary 3.5] stating that \(x_{(n)}-x_{(1)}\geq 2\) is a necessary condition for the existence of the ML estimate.

In case of existence, the unique ML estimate \(\hat{\boldsymbol{\vartheta}}=(\hat{\lambda},\hat{\nu})\), say, of \(\boldsymbol{\vartheta}\) is either the unique solution of the likelihood equation

$$\boldsymbol{\pi}(\boldsymbol{\vartheta})=\frac{1}{n}\sum_{i=1}^{n}\mathbf{T}(x_{i})$$
(3)

with mapping \(\boldsymbol{\pi}=(\pi_{1},\pi_{2}):\textrm{int}(\Theta)\rightarrow\textrm{int}(M)\) defined by

$$\boldsymbol{\pi}(\boldsymbol{\vartheta})=(\pi_{1}(\boldsymbol{\vartheta}),\pi_{2}(\boldsymbol{\vartheta}))=C(\boldsymbol{\vartheta})\left(\sum_{k=0}^{\infty}\frac{k\lambda^{k}}{k!^{\nu}},-\sum_{k=0}^{\infty}\frac{\ln(k!)\lambda^{k}}{k!^{\nu}}\right),$$
$$\boldsymbol{\vartheta}=(\lambda,\nu)\in(0,\infty)^{2}$$
(4)

or lies on the boundary of \(\Theta\) and hence corresponds to a geometric distribution; see [5] for details. In the latter case, we necessarily have \(\hat{\boldsymbol{\vartheta}}=(\overline{x}_{n}/(1+\overline{x}_{n}),0)\) with \(\overline{x}_{n}=\sum_{i=1}^{n}x_{i}/n\), since \(\overline{x}_{n}/(1+\overline{x}_{n})\) is the unique ML estimate of \(\lambda\) in the subfamily of geometric distributions. The image of \(\boldsymbol{\pi}\) has a complicated analytic form, but its graphical representation may be useful to decide between the two cases; see [5].
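In practice, the likelihood equation (3) has to be solved numerically. A minimal sketch is given below; it truncates the series in (4) at an illustrative bound \(K\), reparametrizes in logarithms so that the iterates stay in \(\textrm{int}(\Theta)\), and uses SciPy's generic root finder fsolve. The starting value is an illustrative choice, and the sketch is only meaningful when the solution indeed lies in \(\textrm{int}(\Theta)\); a criterion for this case is derived in Section 3.

```python
import math
import numpy as np
from scipy.optimize import fsolve

K = 500                                                       # illustrative truncation bound
lgk = np.array([math.lgamma(k + 1) for k in range(K + 1)])    # ln(k!) for k = 0, ..., K

def pi_map(lam, nu):
    """Truncated evaluation of pi(theta) = (pi_1(theta), pi_2(theta)) from formula (4)."""
    logw = np.arange(K + 1) * math.log(lam) - nu * lgk
    w = np.exp(logw - logw.max())
    w /= w.sum()                                  # CMP probabilities f_theta(0), ..., f_theta(K)
    return np.array([np.sum(np.arange(K + 1) * w), -np.sum(lgk * w)])

def solve_likelihood_equation(x, start=(1.0, 1.0)):
    """Solve equation (3); only meaningful when the ML estimate lies in int(Theta)."""
    x = np.asarray(x, dtype=float)
    target = np.array([x.mean(), -np.mean([math.lgamma(xi + 1) for xi in x])])
    # reparametrize as z = (ln(lambda), ln(nu)) so that the iterates stay in (0, infinity)^2
    def eq(z):
        return pi_map(math.exp(z[0]), math.exp(z[1])) - target
    z_hat = fsolve(eq, np.log(np.asarray(start)))
    return float(np.exp(z_hat[0])), float(np.exp(z_hat[1]))
```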

3 CHARACTERIZING THE EXISTENCE OF A SOLUTION OF THE LIKELIHOOD EQUATION

In the following, we derive an analytic criterion to decide whether, in case of existence, the ML estimate lies in the interior \(\textrm{int}(\Theta)=(0,\infty)^{2}\) of \(\Theta\), and can thus be obtained as a solution of the likelihood equation, or lies on the boundary \(\{(\lambda,0):\lambda\in(0,1)\}\) of \(\Theta\). For this, note that the mapping \(\boldsymbol{\pi}:\textrm{int}(\Theta)\rightarrow\boldsymbol{\pi}(\textrm{int}(\Theta))\) is bijective and continuously differentiable; see [5, Section 2]. The mapping \(\boldsymbol{\pi}\) therefore possesses a continuous inverse function \(\boldsymbol{\pi}^{-1}:\boldsymbol{\pi}(\textrm{int}(\Theta))\rightarrow\textrm{int}(\Theta)\). In particular, \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) is open as the pre-image of \(\textrm{int}(\Theta)\) under \(\boldsymbol{\pi}^{-1}\). For our purposes, it will be convenient to extend the domain of \(\boldsymbol{\pi}\) from \(\textrm{int}(\Theta)\) to \(\Theta\) according to formula (4).

First, some preliminary results related to the behavior of \(\boldsymbol{\pi}\) on the boundary of \(\Theta\) are stated within several lemmas.

Lemma 3.1. Let \(\boldsymbol{\vartheta}\in\Theta\) be a boundary point of \(\Theta\) and \(\boldsymbol{\vartheta}_{j}\in\textrm{int}(\Theta)\), \(j\in{\mathbb{N}}\), be a sequence in the interior of \(\Theta\) with \(\boldsymbol{\vartheta}_{j}\rightarrow\boldsymbol{\vartheta}\) for \(j\rightarrow\infty\). Then, \(C(\boldsymbol{\vartheta}_{j})\rightarrow C(\boldsymbol{\vartheta})\) and \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\rightarrow\boldsymbol{\pi}(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\).

Proof. Let \(\boldsymbol{\vartheta}\in\Theta\) be some boundary point of \(\Theta\), i.e., \(\boldsymbol{\vartheta}=(\lambda,0)\) for some \(\lambda\in(0,1)\), and let \(\boldsymbol{\vartheta}_{j}=(\lambda_{j},\nu_{j})\in\textrm{int}(\Theta)\) with \(\boldsymbol{\vartheta}_{j}\rightarrow(\lambda,0)\) for \(j\rightarrow\infty\). Moreover, let \(\varepsilon>0\) be such that \(\lambda+2\varepsilon<1\). For \(j\in{\mathbb{N}}\), let \(g_{j}(k)=\lambda_{j}^{k}/k!^{\nu_{j}}\), \(k\in{\mathbb{N}}_{0}\). Furthermore, let \(\mu\) denote the counting measure on \({\mathbb{N}}_{0}\). Then, there exists \(j_{0}\in{\mathbb{N}}\) with \(g_{j}(k)\leq g(k)=(\lambda+\varepsilon)^{k}\), \(k\in{\mathbb{N}}_{0}\), for all \(j\geq j_{0}\), where \(g\) is \(\mu\)-integrable. Hence, the dominated convergence theorem yields

$$\sum_{k=0}^{\infty}\frac{\lambda_{j}^{k}}{k!^{\nu_{j}}}=\int g_{j}d\mu\rightarrow\int\limits_{{\mathbb{N}}_{0}}\lambda^{k}d\mu(k)=\frac{1}{1-\lambda}$$

and thus \(C(\boldsymbol{\vartheta}_{j})\rightarrow C(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\); see formula (2).

Next, we define for \(j\in{\mathbb{N}}\) the functions \(h_{j}(k)=kg_{j}(k)\) and \(\tilde{h}_{j}(k)=\ln(k!)g_{j}(k)\) for \(k\in{\mathbb{N}}_{0}\). Then, for \(j\geq j_{0}\), we have that \(h_{j}(k)\leq h(k)=k(\lambda+\varepsilon)^{k}\) and \(\tilde{h}_{j}(k)\leq\tilde{h}(k)=\ln(k!)(\lambda+\varepsilon)^{k}\) for \(k\in{\mathbb{N}}_{0}\). Obviously, \(h\) is \(\mu\)-integrable by the ratio test for series. To see that \(\tilde{h}\) is \(\mu\)-integrable, note that by l’Hospital’s rule

$$\lim_{k\rightarrow\infty}\frac{\ln(k+1)}{\ln(k!)}=\lim_{x\rightarrow\infty}\frac{\ln(x+1)}{\ln(\Gamma(x+1))}=\lim_{x\rightarrow\infty}\frac{1}{(x+1)\psi(x+1)}=0,$$

where \(\psi\) denotes the digamma function satisfying \(\psi(x)\rightarrow\infty\) for \(x\rightarrow\infty\). Hence, there exists \(k_{0}\in{\mathbb{N}}\) such that \(\ln(k+1)/\ln(k!)<\varepsilon/(\lambda+\varepsilon)\) for \(k\geq k_{0}\). It follows that

$$\frac{\ln((k+1)!)(\lambda+\varepsilon)^{k+1}}{\ln(k!)(\lambda+\varepsilon)^{k}}=(\lambda+\varepsilon)\left(1+\frac{\ln(k+1)}{\ln(k!)}\right)<\lambda+2\varepsilon<1$$

for \(k\geq k_{0}\), and \(\tilde{h}\) is then \(\mu\)-integrable by the ratio test for series; cf. the proof of Lemma 2.2 in [5]. Applying the dominated convergence theorem, we finally obtain

$$\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})=C(\boldsymbol{\vartheta}_{j})\left(\int h_{j}d\mu,-\int\tilde{h}_{j}d\mu\right)$$
$${}\rightarrow C(\boldsymbol{\vartheta})\left(\int\limits_{\mathbb{N}_{0}}k\lambda^{k}d\mu(k),-\int\limits_{\mathbb{N}_{0}}\ln(k!)\lambda^{k}d\mu(k)\right)=\boldsymbol{\pi}(\boldsymbol{\vartheta})$$

for \(j\rightarrow\infty\). \(\Box\)

Lemma 3.2 states a characterization of a boundary point \(\boldsymbol{\vartheta}\) of \(\Theta\) in terms of \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\).

Lemma 3.2. Let \(\boldsymbol{\vartheta}\in\Theta\). Then \(\boldsymbol{\vartheta}\) is a boundary point of \(\Theta\) if and only if \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) is a boundary point of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\).

Proof. First, let \(\boldsymbol{\vartheta}\in\Theta\) be a boundary point of \(\Theta\). Then there exists a sequence \(\boldsymbol{\vartheta}_{j}\in\textrm{int}(\Theta)\), \(j\in{\mathbb{N}}\), with \(\boldsymbol{\vartheta}_{j}\rightarrow\boldsymbol{\vartheta}\) for \(j\rightarrow\infty\). By Lemma 3.1, we have \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\rightarrow\boldsymbol{\pi}(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\), such that every neighborhood of \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) must contain the points \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\in\boldsymbol{\pi}(\textrm{int}(\Theta))\), \(j\geq j_{0}\), for some \(j_{0}\in{\mathbb{N}}\).

Next, suppose that \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\in\boldsymbol{\pi}(\textrm{int}(\Theta))\). Then there exists some \(\tilde{\boldsymbol{\vartheta}}\in\textrm{int}(\Theta)\) with \(\boldsymbol{\pi}(\boldsymbol{\vartheta})=\boldsymbol{\pi}(\tilde{\boldsymbol{\vartheta}})\). Since \(\boldsymbol{\vartheta}_{j}\in\textrm{int}(\Theta)\) for all \(j\in{\mathbb{N}}\) and \(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j})\rightarrow\boldsymbol{\pi}(\boldsymbol{\vartheta})\) for \(j\rightarrow\infty\) by Lemma 3.1, it follows that

$$\boldsymbol{\vartheta}_{j}-\tilde{\boldsymbol{\vartheta}}=\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j}))-\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\tilde{\boldsymbol{\vartheta}}))=\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\boldsymbol{\vartheta}_{j}))-\boldsymbol{\pi}^{-1}(\boldsymbol{\pi}(\boldsymbol{\vartheta}))\rightarrow 0$$

for \(j\rightarrow\infty\) by using that \(\boldsymbol{\pi}^{-1}\) is continuous. This leads to the contradiction \(\boldsymbol{\vartheta}=\tilde{\boldsymbol{\vartheta}}\in\textrm{int}(\Theta)\). Hence, we have \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\notin\boldsymbol{\pi}(\textrm{int}(\Theta))\), and \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) is a boundary point of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\).

On the other hand, for \(\boldsymbol{\vartheta}\in\textrm{int}(\Theta)\), it is evident that \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) lies in the interior of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\), since \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) is open. \(\Box\)

In what follows, let the mapping \(d:(0,\infty)\rightarrow(-\infty,0)\) be defined by

$$d(z)=-\sum_{j=2}^{\infty}\ln(j)\left(\frac{z}{z+1}\right)^{j},\quad z>0,$$
(5)

and let \(D=\{(z,d(z)):z>0\}\) denote the graph of \(d\).
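Since \(z/(z+1)\in(0,1)\), the series in (5) converges geometrically, so \(d\) can be evaluated by simple truncation. A short sketch (with an illustrative truncation bound) reads as follows.

```python
import math

def d(z, K=5000):
    """Truncated evaluation of d(z) from formula (5); K is an illustrative truncation bound."""
    r = z / (z + 1.0)                 # ratio in (0, 1), so the terms decay geometrically
    return -sum(math.log(j) * r ** j for j in range(2, K + 1))

print(round(d(1.0), 3))               # -0.508, the value appearing in Example 3.6 below
```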

Lemma 3.3. \(D\) is a subset of the boundary of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) and \(D\cap\boldsymbol{\pi}(\textrm{int}(\Theta))=\emptyset\).

Proof. Applying Lemma 3.2 and using formula (4), the boundary of \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) contains the points

$$(1-\lambda)\left(\sum_{k=0}^{\infty}k\lambda^{k},-\sum_{k=0}^{\infty}\ln(k!)\lambda^{k}\right),\quad\lambda\in(0,1).$$
(6)

Since for \(\lambda\in(0,1)\)

$$\sum_{k=0}^{\infty}\ln(k!)\lambda^{k}=\sum_{k=2}^{\infty}\sum_{j=2}^{k}\ln(j)\lambda^{k}=\sum_{j=2}^{\infty}\ln(j)\sum_{k=j}^{\infty}\lambda^{k}$$
$${}=\frac{1}{1-\lambda}\sum_{j=2}^{\infty}\ln(j)\lambda^{j},$$
(7)

formula (6) can be rewritten as

$$\left(\frac{\lambda}{1-\lambda},-\sum_{j=2}^{\infty}\ln(j)\lambda^{j}\right),\quad\lambda\in(0,1),$$

which, by setting \(z=\lambda/(1-\lambda)\), can be reparametrized as \((z,d(z))\), \(z>0\), and the first assertion is shown. The second assertion is obvious from the fact that \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) is open and thus cannot contain any of its boundary points. \(\Box\)

Lemma 3.3 enables us to formulate a condition that every point in \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) necessarily fulfils.

Lemma 3.4. For \((t_{1},t_{2})\in\boldsymbol{\pi}(\textrm{int}(\Theta))\), it holds that \(t_{2}>d(t_{1})\) with mapping \(d\) given by formula (5).

Proof. Let \(\boldsymbol{\vartheta}=(1,1)\in\textrm{int}(\Theta)\). Then, according to formula (4) with \(C(\boldsymbol{\vartheta})=1/e\) and formula (5),

$$\boldsymbol{\pi}(\boldsymbol{\vartheta})=\left(1,-\sum_{k=2}^{\infty}a_{k}\right)\quad\text{and}\quad d(1)=-\sum_{k=2}^{\infty}b_{k},$$

where

$$a_{k}=\frac{\ln(k!)}{ek!}\quad\text{and}\quad b_{k}=\frac{\ln(k)}{2^{k}}\quad\text{for}\quad k\geq 2.$$

To establish that \(d(1)<\pi_{2}(\boldsymbol{\vartheta})\), we show by induction that \(a_{k}<b_{k}\) for all \(k\geq 2\).

Obviously, \(a_{2}=\ln(2)/(2e)<\ln(2)/4=b_{2}\) and \(a_{3}=\ln(6)/(6e)\approx 0.110<0.137\approx\ln(3)/8=b_{3}\). Next, let \(a_{k}<b_{k}\) for some \(k\geq 3\). Then, by using that \((k+1)!>2^{k+1}\),

$$a_{k+1}=\frac{\ln((k+1)!)}{e(k+1)!}=\frac{1}{k+1}\frac{\ln(k!)}{ek!}+\frac{\ln(k+1)}{e(k+1)!}$$
$${}<\frac{1}{k+1}\frac{\ln(k)}{2^{k}}+\frac{\ln(k+1)}{e(k+1)!}<\frac{1}{k+1}\frac{\ln(k+1)}{2^{k}}+\frac{\ln(k+1)}{e2^{k+1}}$$
$${}=\frac{\ln(k+1)}{2^{k}}\left(\frac{1}{k+1}+\frac{1}{2e}\right)<\frac{\ln(k+1)}{2^{k+1}}=b_{k+1}.$$

Now, \(\boldsymbol{\pi}(\textrm{int}(\Theta))\subset\textrm{int}(M)\subset(0,\infty)\times(-\infty,0)\) is connected by [14, p. 668] and satisfies \(D\cap\boldsymbol{\pi}(\textrm{int}(\Theta))=\emptyset\) by Lemma 3.3; hence, \(\boldsymbol{\pi}(\textrm{int}(\Theta))\) lies entirely on one side of the graph \(D\) of the continuous function \(d\). Since \(\boldsymbol{\pi}(\boldsymbol{\vartheta})\) with \(\boldsymbol{\vartheta}=(1,1)\) lies strictly above \(D\), the assertion follows. \(\Box\)

We arrive at the main result of this section stating a simple and useful characterization.

Theorem 3.5. Let \(x_{(n)}-x_{(1)}\geq 2\) and \(d\) be given by formula \((5)\). Moreover, let

$$\overline{x}_{n}=\frac{1}{n}\sum_{k=1}^{n}x_{k}\quad\text{and}\quad\overline{t}_{n}=-\frac{1}{n}\sum_{k=1}^{n}\ln(x_{k}!).$$
(i) If \(\overline{t}_{n}\leq d(\overline{x}_{n})\), then the ML estimate of \(\boldsymbol{\vartheta}\) lies on the boundary of \(\Theta\) and is given by \(\hat{\boldsymbol{\vartheta}}=(\overline{x}_{n}/(\overline{x}_{n}+1),0)\).

(ii) If \(\overline{t}_{n}>d(\overline{x}_{n})\), then the ML estimate of \(\boldsymbol{\vartheta}\) lies in the interior of \(\Theta\) and is the unique solution of the likelihood equation (3).
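A minimal sketch of the resulting decision rule, combining Theorem 2.1 and Theorem 3.5, is given below; the function \(d\) is again evaluated by truncation as in the sketch after formula (5), and all numerical choices are illustrative.

```python
import math

def d(z, K=5000):
    """Truncated evaluation of d(z) from formula (5), as in the sketch above."""
    r = z / (z + 1.0)
    return -sum(math.log(j) * r ** j for j in range(2, K + 1))

def ml_case(x):
    """Classify ML estimation for a sample x of counts according to Theorems 2.1 and 3.5."""
    if max(x) - min(x) < 2:                               # Theorem 2.1: no ML estimate
        return "does not exist", None
    x_bar = sum(x) / len(x)
    t_bar = -sum(math.lgamma(xi + 1) for xi in x) / len(x)
    if t_bar <= d(x_bar):                                 # Theorem 3.5(i): boundary (geometric) case
        return "boundary", (x_bar / (x_bar + 1.0), 0.0)
    return "interior", None                               # Theorem 3.5(ii): solve equation (3)
```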

Proof. By Theorem 2.1, existence of the ML estimate \(\hat{\boldsymbol{\vartheta}}\) of \(\boldsymbol{\vartheta}\) based on \(x_{1},\dots,x_{n}\) is guaranteed. If \(\overline{t}_{n}\leq d(\overline{x}_{n})\), it follows from Lemma 3.4 that \((\overline{x}_{n},\overline{t}_{n})\notin\boldsymbol{\pi}(\textrm{int}(\Theta))\), so that a solution of the likelihood equation (3) with respect to \(\boldsymbol{\vartheta}\in(0,\infty)^{2}\) does not exist. Hence, \(\hat{\boldsymbol{\vartheta}}\) necessarily lies on the boundary of \(\Theta\) and must then be given by \(\hat{\boldsymbol{\vartheta}}=(\overline{x}_{n}/(\overline{x}_{n}+1),0)\).

Now, let \(\overline{t}_{n}>d(\overline{x}_{n})\). Suppose that the ML estimate \(\hat{\boldsymbol{\vartheta}}\) lies on the boundary of \(\Theta\). Then, it necessarily holds that \(\hat{\boldsymbol{\vartheta}}=(\hat{\lambda},0)\) with \(\hat{\lambda}=\overline{x}_{n}/(\overline{x}_{n}+1)\). Note that the log-likelihood function based on \(\mathbf{x}=(x_{1},\dots,x_{n})\) is given by

$$\ell_{n}(\boldsymbol{\vartheta};\boldsymbol{x})=\ln\left(\prod_{i=1}^{n}f_{\boldsymbol{\vartheta}}(x_{i})\right)$$
$${}=n\left[\ln(C(\boldsymbol{\vartheta}))+\overline{x}_{n}\ln(\lambda)+\nu\overline{t}_{n}\right],\quad\boldsymbol{\vartheta}=(\lambda,\nu)\in\Theta.$$

Let \(\hat{\boldsymbol{\vartheta}}_{\nu}=(\hat{\lambda},\nu)\) and \(q(\nu)=\ell_{n}(\hat{\boldsymbol{\vartheta}}_{\nu};\mathbf{x})/n\) for \(\nu\geq 0\). Since \(\ln(C(\boldsymbol{\vartheta}))=-\ln(\sum_{k=0}^{\infty}\lambda^{k}/k!^{\nu})\) is differentiable on \((0,\infty)^{2}\) and by using formula (4), it follows that

$$q^{\prime}(\nu)=\frac{\sum_{k=0}^{\infty}\ln(k!)\hat{\lambda}^{k}/k!^{\nu}}{\sum_{k=0}^{\infty}\hat{\lambda}^{k}/k!^{\nu}}+\overline{t}_{n}=\overline{t}_{n}-\pi_{2}(\hat{\boldsymbol{\vartheta}}_{\nu}),\quad\nu>0.$$

According to Lemma 3.1, we have \(\boldsymbol{\pi}(\hat{\boldsymbol{\vartheta}}_{\nu})\rightarrow\boldsymbol{\pi}(\hat{\boldsymbol{\vartheta}})\) for \(\nu\searrow 0\) and, hence,

$$\lim_{\nu\searrow 0}q^{\prime}(\nu)=\overline{t}_{n}-\pi_{2}(\hat{\boldsymbol{\vartheta}})=\overline{t}_{n}-\frac{C(\hat{\boldsymbol{\vartheta}})d(\overline{x}_{n})}{1-\hat{\lambda}}=\overline{t}_{n}-d(\overline{x}_{n})>0$$
(8)

by using formula (4), formula (7) with \(\lambda=\hat{\lambda}\), and formula (5) with \(z=\overline{x}_{n}\). Since \(C(\hat{\boldsymbol{\vartheta}}_{\nu})\rightarrow C(\hat{\boldsymbol{\vartheta}})\) by Lemma 3.1, we also have \(\ell_{n}(\hat{\boldsymbol{\vartheta}}_{\nu};\mathbf{x})\rightarrow\ell_{n}(\hat{\boldsymbol{\vartheta}};\mathbf{x})\) for \(\nu\searrow 0\), and formula (8) then implies the existence of some small \(\nu>0\) with \(\ell_{n}(\hat{\boldsymbol{\vartheta}}_{\nu};\mathbf{x})>\ell_{n}(\hat{\boldsymbol{\vartheta}};\mathbf{x})\), which contradicts \(\hat{\boldsymbol{\vartheta}}\) being the ML estimate. \(\Box\)

The usefulness of Theorems 2.1 and 3.5 is demonstrated by means of an example.

Example 3.6. We consider three data sets of size \(n=7\), namely

$$\mathbf{x}^{(1)}=(2,0,1,2,0,2,0),$$
$$\mathbf{x}^{(2)}=(0,1,0,5,0,1,0),$$
$$\text{and}\quad\mathbf{x}^{(3)}=(2,1,1,1,2,1,2).$$

By Theorem 2.1, the ML estimates of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(1)}\) and on \(\mathbf{x}^{(2)}\) exist and are unique, since the observed ranges are 2 and 5, respectively, while the ML estimate of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(3)}\) does not exist, since the observed range is only 1. The arithmetic mean of the observations equals 1 for both \(\mathbf{x}^{(1)}\) and \(\mathbf{x}^{(2)}\), and we have

$$d(1)=-\sum_{j=2}^{\infty}\ln(j)\left(\frac{1}{2}\right)^{j}\approx-0.508.$$

Since

$$\overline{t}_{7}^{(1)}=-\frac{1}{7}\sum_{i=1}^{7}\ln(x_{i}^{(1)}!)=-\frac{3\ln(2)}{7}\approx-0.297>d(1)$$

and

$$\overline{t}_{7}^{(2)}=-\frac{1}{7}\sum_{i=1}^{7}\ln(x_{i}^{(2)}!)=-\frac{\ln(120)}{7}\approx-0.684\leq d(1),$$

applying Theorem 3.5 yields that the ML estimate of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(1)}\) lies in the interior of \(\Theta\) and is the unique solution of the likelihood equation, while the ML estimate of \(\boldsymbol{\vartheta}\) based on \(\mathbf{x}^{(2)}\) lies on the boundary of \(\Theta\) and is given by \(\hat{\boldsymbol{\vartheta}}=(1/2,0)\).
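Continuing the sketch given after Theorem 3.5 (function ml_case), the three data sets of this example can be checked programmatically; the expected classifications are stated in the comment.

```python
x1 = [2, 0, 1, 2, 0, 2, 0]
x2 = [0, 1, 0, 5, 0, 1, 0]
x3 = [2, 1, 1, 1, 2, 1, 2]

for name, x in [("x(1)", x1), ("x(2)", x2), ("x(3)", x3)]:
    print(name, ml_case(x))
# expected: x(1) -> ('interior', None), x(2) -> ('boundary', (0.5, 0.0)),
#           x(3) -> ('does not exist', None)
```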

The conditions \(\overline{t}_{n}\leq d(\overline{x}_{n})\) and \(\overline{t}_{n}>d(\overline{x}_{n})\) in Theorem 3.5 are equivalent to \(\overline{t}_{n}\leq\tilde{d}(\overline{x}_{n}/(\overline{x}_{n}+1))\) and \(\overline{t}_{n}>\tilde{d}(\overline{x}_{n}/(\overline{x}_{n}+1))\), respectively, with mapping

$$\tilde{d}(\lambda)=-\sum_{j=2}^{\infty}\ln(j)\lambda^{j},\quad\lambda\in(0,1),$$
(9)

the graph of which is depicted in Fig. 1 to ease the comparison of the values \(\overline{t}_{n}\) and \(\tilde{d}(\overline{x}_{n}/(\overline{x}_{n}+1))\) for a given data set.

Fig. 1. Graph of the function \(\tilde{d}\) defined by formula (9).

4 SIMULATION STUDY

We perform a simulation study to investigate the accuracy of the ML estimates depending on the range of the underlying observations. Let \(\boldsymbol{\vartheta}_{1}=(\lambda_{1},\nu_{1})=(7.24,0.8)\) and \(\boldsymbol{\vartheta}_{2}=(\lambda_{2},\nu_{2})=(293\,162.5,5)\) be two parameter vectors corresponding to an overdispersed CMP distribution \(P_{\boldsymbol{\vartheta}_{1}}\) and an underdispersed CMP distribution \(P_{\boldsymbol{\vartheta}_{2}}\). Here, for \(i=1,2\), the parameter \(\lambda_{i}\) is determined from \(\nu_{i}\) as \((12+(\nu_{i}-1)/(2\nu_{i}))^{\nu_{i}}\) to ensure that the mean of \(P_{\boldsymbol{\vartheta}_{i}}\) is approximately equal to 12; see, e.g., [22, formula (7)].

For each parameter vector, we generate \(m=100\,000\) realizations of a sample of size \(n=20\) and compute the corresponding ML estimates, all of which exist by Theorem 2.1, since the observed range of every sample turns out to be greater than 1 (for the parameter vectors considered, the probability of non-existence of the ML estimate is very small). The ML estimates thus obtained are then grouped with respect to the range of the underlying observations and can be considered realizations of the ML estimator conditioned on the range \(X_{(20)}-X_{(1)}\). For every such group, consisting of the ML estimates \((\hat{\lambda}_{i}^{(j)},\hat{\nu}_{i}^{(j)})\), \(1\leq j\leq k\), say, we separately calculate the (empirical) relative absolute bias

$$\text{RAB}(\hat{\lambda}_{i})=\frac{1}{k}\sum_{j=1}^{k}\left|\frac{\hat{\lambda}_{i}^{(j)}}{\lambda_{i}}-1\right|$$

and the scaled (empirical) root-mean-square error

$$\text{SRMSE}(\hat{\lambda}_{i})=\sqrt{\frac{1}{k}\sum_{j=1}^{k}\left(\frac{\hat{\lambda}_{i}^{(j)}}{\lambda_{i}}-1\right)^{2}},$$

as well as the respective quantities for the dispersion parameter \(\nu_{i}\), \(i=1,2\). The results are shown in Tables 1 and 2 along with the relative frequency of every group. All accuracy measures turn out to be monotone (decreasing or increasing) as functions of the range of the observations. In the overdispersed case, the ML estimates of \(\lambda_{1}\) and \(\nu_{1}\) are most inaccurate when the range is small, whereas a large range appears to be less problematic. The same applies to the ML estimate of \(\lambda_{2}\) in the underdispersed case. The precision of the ML estimate of \(\nu_{2}\), in turn, appears not to be affected by a small range of observations. Having shown in Theorem 2.1 that an ML estimate does not exist for a range of 0 or 1, the simulation study additionally suggests that, for small ranges greater than 1, ML estimation may produce highly inaccurate values. Although the probability of non-existence of the ML estimate will typically be small in applications, observing a small range greater than 1 might not be that unlikely, in which case one should be critical of the ML estimate.

Table 1 Empirical relative absolute bias (RAB) and scaled root-mean-square error (SRMSE) of the ML estimators given the range of observations, where \(m=100\,000\) samples of size \(n=20\) are generated from \(P_{\boldsymbol{\vartheta}_{1}}\)
Table 2 Empirical relative absolute bias (RAB) and scaled root-mean-square error (SRMSE) of the ML estimators given the range of observations, where \(m=100\,000\) samples of size \(n=20\) are generated from \(P_{\boldsymbol{\vartheta}_{2}}\)
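For illustration, a condensed sketch of the simulation design is given below; it is not the code underlying Tables 1 and 2. The truncation bound \(K\), the reduced number of replications, the random seed, and the Nelder–Mead optimizer are illustrative choices, and the boundary case of Theorem 3.5(i) is ignored, as it practically does not occur for the parameter vectors considered.

```python
import math
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
K = 400                                                       # illustrative truncation bound
lgk = np.array([math.lgamma(k + 1) for k in range(K + 1)])    # ln(k!) for k = 0, ..., K

def cmp_pmf(lam, nu):
    """Truncated CMP probabilities f_theta(0), ..., f_theta(K); cf. formulas (1) and (2)."""
    logw = np.arange(K + 1) * math.log(lam) - nu * lgk
    w = np.exp(logw - logw.max())
    return w / w.sum()

def ml_estimate(x):
    """Numerical ML estimate over int(Theta); boundary cases are neglected in this sketch."""
    x = np.asarray(x, dtype=float)
    sum_logfact = sum(math.lgamma(xi + 1) for xi in x)
    def negloglik(z):                                         # z = (ln(lambda), ln(nu))
        lam, nu = np.exp(z)
        logw = np.arange(K + 1) * math.log(lam) - nu * lgk
        log_series = logw.max() + math.log(np.exp(logw - logw.max()).sum())
        return -(x.sum() * math.log(lam) - nu * sum_logfact - len(x) * log_series)
    res = minimize(negloglik, x0=[math.log(x.mean()), 0.0], method="Nelder-Mead")
    return np.exp(res.x)                                      # (lambda_hat, nu_hat)

def simulate(lam, nu, m=500, n=20):
    """Group the ML estimates by the sample range, as in Tables 1 and 2 (m reduced here)."""
    pmf, support = cmp_pmf(lam, nu), np.arange(K + 1)
    groups = {}
    for _ in range(m):
        x = rng.choice(support, size=n, p=pmf)
        if x.max() - x.min() < 2:                             # no ML estimate (Theorem 2.1)
            continue
        groups.setdefault(int(x.max() - x.min()), []).append(ml_estimate(x))
    return groups

def rab_srmse(estimates, true_value):
    rel = np.asarray(estimates) / true_value - 1.0
    return np.mean(np.abs(rel)), np.sqrt(np.mean(rel ** 2))

# overdispersed case theta_1 = (7.24, 0.8); for theta_2 = (293162.5, 5) the starting
# value of the optimizer should be adapted accordingly
for r, ests in sorted(simulate(7.24, 0.8).items()):
    lam_hat, nu_hat = np.array(ests).T
    print(r, len(ests), rab_srmse(lam_hat, 7.24), rab_srmse(nu_hat, 0.8))
```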