1 Introduction

The family of generalized Poisson (GP) distributions [1] has been used for more than 40 years to model count data that may be overdispersed or underdispersed. Its interesting theoretical properties include a Poisson mixture interpretation [2] and a heavier tail than the negative binomial distribution [2, 3]. Various chance mechanisms have been found to generate the GP distribution [4]. Numerous applications are given in [5]. In bioinformatics, the GP distribution has been used for modeling RNA-Seq count data [6,7,8,9,10].

The negative binomial (NB) distribution has a long history of use in the analysis of biological count data [11,12,13]. In bioinformatics, it is the primary model for RNA-Seq count data and forms the basis of statistical tests of differential gene expression [14,15,16,17].

Low et al. [10] pointed out that the observed gene counts in RNA-Seq experiments are the consequence of stochastic variation acting on true gene counts. They proposed to view the true gene count as a count parameter k. Under the GP model, the expectation of the observed gene counts given k is equal to the product of k and a hyperparameter b. Thus, the modeling focus shifts to finding the posterior distribution of k, assuming a specific prior distribution. Since posterior distributions often have complicated forms that make further analysis difficult and computationally expensive, finding appropriate approximations to them is important to improve their applied value in statistics.

We first introduce the GP model. Let X be a random variable following a \(\text{ GP }(\lambda _1, \lambda _2)\) distribution. Its probability mass function (pmf) is given by

$$\begin{aligned} P(X=x \vert \lambda _1,\lambda _2) = \frac{ \lambda _1(\lambda _1+x\lambda _2)^{x-1} e^{-(\lambda _1+x\lambda _2)} }{x!}, \end{aligned}$$
(1)

where \(x=0,1,2,\ldots \), \(\lambda _1 > 0\) and \(\max \{-1, -\frac{\lambda _1}{4}\}< \lambda _2 < 1\). Positive values of \(\lambda _2\) correspond to overdispersion, negative values to underdispersion, and \(\lambda _2 = 0\) reduces Eq. (1) to the Poisson distribution with mean \(\lambda _1\). Consider the following parametrisation: \(\lambda _2 = 1- \sqrt{m}\), \(\lambda _1 = kb\sqrt{m}\), where \(k=0,1,2,\ldots \) is the count parameter, \(0< b < 1\) is a hyperparameter, and \(m > 0\) is the mean–variance ratio. Under this parametrisation, the mean and the variance of the GP model are given by \({\text {E}}(X \vert k) = \lambda _1 / (1-\lambda _2) = k b,\) and \(\text{ Var }(X\vert k) = {\text {E}}(X\vert k) / (1-\lambda _2)^2 = k b/m\), respectively.
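As a quick numerical sanity check of this parametrisation, the following sketch evaluates Eq. (1) on the log scale and confirms \({\text {E}}(X\vert k)=kb\) and \(\text{ Var }(X\vert k)=kb/m\). The values \(k=30\), \(b=0.5\), \(m=0.64\) are purely illustrative and not taken from the paper:

```python
import math

def gp_pmf(x, lam1, lam2):
    """Generalized Poisson pmf of Eq. (1), evaluated on the log scale."""
    t = lam1 + x * lam2
    if t <= 0:
        return 0.0
    return math.exp(math.log(lam1) + (x - 1) * math.log(t) - t - math.lgamma(x + 1))

# Parametrisation from the text: lam2 = 1 - sqrt(m), lam1 = k*b*sqrt(m).
# Illustrative values (not from the paper); m < 1 gives 0 < lam2 < 1.
k, b, m = 30, 0.5, 0.64
lam2 = 1.0 - math.sqrt(m)            # 0.2
lam1 = k * b * math.sqrt(m)          # 12.0

probs = [gp_pmf(x, lam1, lam2) for x in range(600)]   # tail beyond 600 is negligible
total = sum(probs)
mean = sum(x * p for x, p in enumerate(probs))
var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
```

Here the sums recover \(kb = 15\) and \(kb/m = 23.4375\) up to truncation error.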

Now, consider a random variable Y that has an NB distribution with parameters r (the target number of failures) and p (the probability of success), so that Y counts the number of successes observed before the rth failure. Its pmf is given by

$$\begin{aligned} P(Y=y \vert r,p)=\frac{\Gamma {(y+r)}}{\Gamma {(y+1)}\Gamma {(r)}}p^y(1-p)^r, \end{aligned}$$
(2)

for \(y=0,1,2,\ldots \). The mean and the variance of Y are given by \(\text {E}(Y \vert k)={pr}/(1-p) = kb\), and \(\text {Var}(Y \vert k)=pr/(1-p)^2 = {kb}/{m}\), respectively, where \(k = 0, 1, 2, \ldots \), \(0<b<1\), \(0<m<1\), and m is the mean–variance ratio. Thus, \(r={kbm}/(1-m)=k\tau \), where \(\tau = bm/(1-m)\).
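A matching check for the NB parametrisation (same illustrative k, b, m as chosen above for the GP model) confirms that the two models share the first two conditional moments. Here \(p = 1-m\), which follows from \(\text {Var}(Y\vert k)/\text {E}(Y\vert k) = 1/(1-p) = 1/m\):

```python
import math

def nb_pmf(y, r, p):
    """Negative binomial pmf of Eq. (2), evaluated on the log scale."""
    return math.exp(math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                    + y * math.log(p) + r * math.log(1.0 - p))

# Illustrative values (not from the paper); p = 1 - m so that Var/E = 1/m.
k, b, m = 30, 0.5, 0.64
tau = b * m / (1.0 - m)
r, p = k * tau, 1.0 - m

probs = [nb_pmf(y, r, p) for y in range(400)]   # tail beyond 400 is negligible
total = sum(probs)
mean = sum(y * q for y, q in enumerate(probs))
var = sum(y * y * q for y, q in enumerate(probs)) - mean ** 2
```

As with the GP check, the sums recover \(kb = 15\) and \(kb/m = 23.4375\).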

We are interested in the posterior distribution of k conditioned on observations from these two models, using an improper uniform prior \(P(k)=1, k=0,1,2,\ldots \). The posterior distribution of k under a GP model (Eq. (1)) is given by

$$\begin{aligned} \begin{aligned} P(k\vert X=x)&= \frac{k(bk\sqrt{m}+x(1-\sqrt{m}))^{x-1}e^{-bk\sqrt{m}}}{\sum _{j=x}^\infty j(bj\sqrt{m}+x(1-\sqrt{m}))^{x-1}e^{-bj\sqrt{m}}} \\&= \frac{k(k+g(x))^{x-1}e^{-bk\sqrt{m}}}{\sum _{j=x}^\infty j(j+g(x))^{x-1}e^{-b\sqrt{m}j}}, \end{aligned} \end{aligned}$$
(3)

for \(k \ge x\), where \(0<m<\min \bigl \{{(1-b)^{-2}} ,4 \bigr \} \) and \(g(x) = \{(1-\sqrt{m}) /(b\sqrt{m})\}x\). Then, the posterior distribution of k under an NB model (Eq. (2)) is given by

$$\begin{aligned} \begin{aligned} P(k \vert Y=y)&=\frac{\frac{\Gamma (y+k\tau )}{\Gamma (y+1)\Gamma (k\tau )}(1-m)^{y}m^{k\tau }}{\sum _{j= y}^{\infty }\frac{\Gamma (y+j\tau )}{\Gamma (y+1)\Gamma (j\tau )}(1-m)^{y}m^{j\tau }} \\&= \frac{\frac{\Gamma (y+k\tau )}{\Gamma (k\tau )}m^{k\tau }}{\sum _{j= y}^{\infty }\frac{\Gamma (y+j\tau )}{\Gamma (j\tau )}m^{j\tau }}, \end{aligned} \end{aligned}$$
(4)

for \(k \ge y\).

The posterior distributions (Eq. (3) and (4)) are proper even though an improper uniform prior distribution is used. The aim of this paper is to find their exact distributions, and where this is not possible, approximating distributions that are mathematically tractable. By doing so, their mean and variance can be determined directly from the theoretical properties of the approximating distribution.

2 Results

We first show that when the observed count under the GP or the NB model is 0 or 1, the posterior distribution of k is geometric or extended NB, respectively.

Theorem 1

The posterior distribution of k is (i) geometric with mean \(e^{-b\sqrt{m}}(1-e^{-b\sqrt{m}})^{-1}\) and variance \({e^{-b\sqrt{m}}(1-e^{-b\sqrt{m}})^{-2}}\) for \(k \ge 0\), when \(x=0\) for the GP model; and with mean \({m^\tau }{(1-m^\tau )^{-1}}\) and variance \({m^\tau }{(1-m^\tau )^{-2}}\) for \(k \ge 0\), when \(y=0\) for the NB model; (ii) extended NB with mean \({\bigl (1 + e^{-b\sqrt{m}}\bigr )}{\bigl (1-e^{-b\sqrt{m}}\bigr )^{-1}}\) and variance \({2e^{-b\sqrt{m}}}{\bigl (1-e^{-b\sqrt{m}}\bigr )^{-2}}\) for \(k \ge 1\), when \(x=1\) for the GP model; and with mean \({(1+m^\tau )}{(1-m^\tau )^{-1}}\) and variance \({2m^\tau }{(1-m^\tau )^{-2}}\) for \(k \ge 1\), when \(y=1\) for the NB model.

Proof

We only prove the result for the GP model with \(x=0,1\); the proof for the NB model with \(y=0,1\) is similar, with \(m^{\tau }\) in place of \(e^{-b\sqrt{m}}\).

  (i)

    When \(x=0\):

    $$\begin{aligned} \begin{aligned} P(k \vert X=0)&=\frac{e^{-bk\sqrt{m}}}{\sum _{j = 0}^{\infty } e^{-bj\sqrt{m}}} = \bigl (e^{-b\sqrt{m}}\bigr )^k \bigl (1-e^{-b\sqrt{m}}\bigr ), \end{aligned} \end{aligned}$$

    where \(k = 0,1,\ldots \). Hence, the posterior distribution of k is geometric with success probability \(p=1-e^{-b\sqrt{m}}\). The mean and the variance follow from standard results.

  (ii)

    When \(x=1\):

    $$\begin{aligned} P(k \vert X=1) =\frac{k e^{-bk\sqrt{m}}}{\sum _{j= 1}^{\infty }j e^{-b j\sqrt{m}}} = \left( {\begin{array}{c}k\\ 1\end{array}}\right) \bigl (1-e^{-b\sqrt{m}}\bigr )^2 \bigl (e^{-b\sqrt{m}}\bigr )^{k-1}, \end{aligned}$$

    where \(k=1,2,\ldots \). Therefore, the posterior distribution of k is extended NB with parameters \(p=e^{-b\sqrt{m}}\) and \(r=2\). The mean and the variance are

    $$\begin{aligned} \text {E}(k \vert X=1)&= \sum _{k=1}^{\infty }k P(k \vert X=1) = \frac{1+e^{-b\sqrt{m}}}{1-e^{-b\sqrt{m}}},\\ \text {Var}(k \vert X=1)&= \sum _{k=1}^{\infty }k^2 P(k\vert X=1) - \bigl [\text {E}(k \vert X=1) \bigr ]^2 =\frac{2e^{-b\sqrt{m}}}{\bigl (1-e^{-b\sqrt{m}}\bigr )^2}, \end{aligned}$$

respectively. \(\square \)

Corollary 1

The cumulative distribution function (cdf) of the posterior distribution of k is

  (i)

    \(F_{k \vert X=0}(k)= 1-e^{-b\sqrt{m}(k+1)}\) for \(k \ge 0\);

  (ii)

    \(F_{k \vert X=1}(k)=1-\bigl [k(1-e^{-b\sqrt{m}})+1\bigr ] e^{-b\sqrt{m}k}\) for \(k\ge 1\);

  (iii)

    \(F_{k \vert Y=0}(k)= 1-m^{\tau {(k+1)}}\) for \(k \ge 0\);

  (iv)

    \(F_{k \vert Y=1}(k)=1-\bigl [k(1-m^\tau )+1\bigr ]m^{\tau k}\) for \(k\ge 1\).
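These closed-form cdfs must agree with partial sums of the corresponding Theorem 1 pmfs, which gives an exact numerical consistency check. A minimal sketch for the GP cases (i) and (ii), with illustrative b and m; the NB cases follow by replacing \(e^{-b\sqrt{m}}\) with \(m^\tau \):

```python
import math

b, m = 0.5, 0.64                          # illustrative values
q = math.exp(-b * math.sqrt(m))           # common ratio e^{-b*sqrt(m)}

pmf0 = lambda k: (1.0 - q) * q ** k                    # geometric, x = 0
pmf1 = lambda k: k * (1.0 - q) ** 2 * q ** (k - 1)     # extended NB (r = 2), x = 1
cdf0 = lambda k: 1.0 - q ** (k + 1)                    # Corollary 1 (i)
cdf1 = lambda k: 1.0 - (k * (1.0 - q) + 1.0) * q ** k  # Corollary 1 (ii)

dev0 = dev1 = s0 = s1 = 0.0
for k in range(0, 40):
    s0 += pmf0(k)
    dev0 = max(dev0, abs(s0 - cdf0(k)))   # should vanish up to rounding
for k in range(1, 40):
    s1 += pmf1(k)
    dev1 = max(dev1, abs(s1 - cdf1(k)))
```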

We now show that the extended Hurwitz–Lerch zeta distribution [18] is an appropriate approximation for the posterior distribution of k under the GP model with \(x\ge 2\).

Theorem 2

The posterior distribution of k given \(X=x\), where X has a GP distribution with mean kb and variance kb/m, can be approximated by an extended Hurwitz–Lerch zeta distribution with mean \(x/b + 1/(b\sqrt{m})\) and variance \((x+1)/(b^2 m)\) for \(k \ge l\), where \(x \ge 2\) is the conditioning value and \(l \ge x\).

Proof

First, we note that the denominator in Eq. (3) can be written as

$$\begin{aligned} \sum _{j=x}^\infty (j+g(x))^x e^{-b\sqrt{m}j} - g(x)\sum _{j=x}^\infty (j+g(x))^{x-1}e^{-b\sqrt{m}j}. \end{aligned}$$

The Lerch transcendent \(\Phi (u,s,v)\) (see [19]) is given by

$$\begin{aligned} \Phi (u,s,v) = \sum _{k=0}^\infty \frac{u^k}{(v+k)^s}, \end{aligned}$$

where u is complex and \(\vert u\vert < 1\), \(v \ne 0, -1, -2, \ldots \), and \(s \ne 1, 2, \ldots \). Representing the denominator using the Lerch transcendent, we get

$$\begin{aligned} e^{-b\sqrt{m}x}\Phi (e^{-b\sqrt{m}},-x,wx) - (w-1)xe^{-b\sqrt{m}x}\Phi (e^{-b\sqrt{m}},-(x-1),wx), \end{aligned}$$
(5)

where \(w = 1+(1-\sqrt{m})/(b\sqrt{m})\).

The following identity (Eq.1.11(11) in [19]) relates the Lerch transcendent to the Bernoulli polynomials for \(s=-h\):

$$\begin{aligned} \Phi (u,-h,v) = \frac{h!}{u^v}\left( \log \frac{1}{u} \right) ^{-(h+1)} - \frac{1}{u^v}\sum _{r=0}^\infty \frac{B_{h+r+1}(v)(\log u)^r}{r!(h+r+1)}, \end{aligned}$$
(6)

where \(\vert {\log u}\vert < 2\pi \), \(v \ne 0,-1,-2,\ldots \), \(h \ne -1,-2,\ldots \), and \(B_n(v)\) is the nth Bernoulli polynomial with argument v. The Bernoulli polynomial is defined as

$$\begin{aligned} B_n(v) = \sum _{j=0}^n \left( {\begin{array}{c}n\\ j\end{array}}\right) B_{n-j}(0)v^j, \end{aligned}$$

where \(B_{i}(0)\) is the ith Bernoulli number.

If we substitute \(u=e^{-b\sqrt{m}}\), \(h=x\), \(v=wx\) into Eq. (6), and then multiply both sides by \(e^{-bw\sqrt{m}x}\), we obtain

$$\begin{aligned} e^{-bw\sqrt{m}x}\Phi (e^{-b\sqrt{m}},-x,wx) = \frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}} - \sum _{r=0}^\infty \frac{B_{x+r+1}(wx)(-b\sqrt{m})^r}{r!(x+r+1)}. \end{aligned}$$
(7)

We can approximate Eq. (7) as

$$\begin{aligned} e^{-bw\sqrt{m}x}\Phi (e^{-b\sqrt{m}},-x,wx) \approx \frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}}, \end{aligned}$$
(8)

provided that

$$\begin{aligned} \Biggl \vert \sum _{r=0}^\infty \frac{B_{x+r+1}(wx)(-b\sqrt{m})^r}{r!(x+r+1)}\Biggr \vert = o\left( \frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}}\right) , \end{aligned}$$

as \(x \rightarrow \infty \), o being the little o notation. To show this, we use an identity involving the Bernoulli polynomials and the sum of xth powers (Eq. 1.13(10) in [19]):

$$\begin{aligned} \frac{B_{x+1}(z)-B_{x+1}(0)}{x+1}= \sum _{t=0}^{z-1}t^x, \end{aligned}$$

for \(x=2,3,\ldots \) and \(z \in {\mathbb {Z}}^+\). For sufficiently large z, \(B_{x+1}(z)\) is positive and dominates \(B_{x+1}(0)\). Suppose wx is a positive integer that is sufficiently large, then

$$\begin{aligned} \frac{B_{x+r+1}(wx)}{x+r+1} \approx \sum _{t=0}^{wx-1} t^{x+r}, \end{aligned}$$

if \(B_{x+r+1}(0) > 0\) and for some b and m such that \(B_{x+r+1}(wx) \gg B_{x+r+1}(0)\); if \(B_{x+r+1}(0) \le 0\), we have

$$\begin{aligned} 0 < \frac{B_{x+r+1}(wx)}{x+r+1} \le \sum _{t=0}^{wx-1} t^{x+r}. \end{aligned}$$

Thus,

$$\begin{aligned} \begin{aligned}&\frac{(b\sqrt{m})^{x+1}}{\Gamma (x+1)} \Biggl \vert \sum _{r=0}^\infty \frac{B_{x+r+1}(wx)(-b\sqrt{m})^r}{r!(x+r+1)} \Biggr \vert \\<\,&\frac{1}{x!} (b\sqrt{m})^{x+1} \sum _{r=0}^{\infty }\left[ \frac{(b\sqrt{m})^r}{r!} \Bigl \vert \frac{B_{x+r+1}(wx)}{x+r+1} \Bigr \vert \right] \\ \lessapprox&\frac{1}{x!} (b\sqrt{m})^{x+1} \sum _{r=0}^{\infty }\left[ \frac{(b\sqrt{m})^r}{r!} \sum _{t=0}^{wx-1} t^{x+r} \right] \\ <\,&\frac{1}{x!} (b\sqrt{m})^{x+1} \sum _{r=0}^{\infty }\left[ \frac{(b\sqrt{m})^r}{r!} \int _{0}^{wx} t^{x+r}dt \right] \\ =\,&\frac{1}{x!} (b\sqrt{m})^{x+1} \sum _{r=0}^{\infty }\left[ \frac{(b\sqrt{m})^r}{r!} \frac{(wx)^{x+r+1}}{x+1}\right] \\ =\,&\frac{(bw\sqrt{m}x)^{x+1}}{(x+1)!} e^{bw\sqrt{m}x}. \end{aligned} \end{aligned}$$

Applying Stirling’s approximation for x!, the right-hand side simplifies to \(U = (c/\sqrt{2\pi x})(ce^{c+1})^x\), where \(c = bw\sqrt{m}\). For \(0 < ce^{c+1} \le 1\), U converges to 0 as \(x \rightarrow \infty \). Therefore, \(0 < c \le W(e^{-1}) \approx 0.2785\), where \(W(\cdot )\) is the Lambert W function. Thus, for \( \min \{ [1-W(e^{-1})]^{2}(1-b)^{-2}, 4\} \le m < \min \{ (1-b)^{-2}, 4\}\), Eq. (8) should give a reasonably good approximation.
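The quality of Eq. (8) can be probed directly by truncating the defining series of the Lerch transcendent, which converges geometrically for \(\vert u \vert < 1\). In the sketch below, \(b=0.5\) and \(m=2.25\) are illustrative choices (not from the paper) satisfying the condition just derived, since they give \(c = bw\sqrt{m} = 0.25 \le W(e^{-1})\):

```python
import math

def lerch_phi(u, s, v, K=2000):
    """Truncated Lerch transcendent Phi(u,s,v) = sum_{k>=0} u^k/(v+k)^s, |u| < 1."""
    return sum(u ** k / (v + k) ** s for k in range(K))

b, m = 0.5, 2.25                               # illustrative; c = b*w*sqrt(m) = 0.25
sqm = math.sqrt(m)
w = 1.0 + (1.0 - sqm) / (b * sqm)              # w = 1/3 here
u = math.exp(-b * sqm)

# Ratio of the two sides of Eq. (8); it should be close to 1.
ratios = []
for x in (5, 10, 15):
    lhs = math.exp(-b * w * sqm * x) * lerch_phi(u, -x, w * x)
    rhs = math.gamma(x + 1) / (b * sqm) ** (x + 1)
    ratios.append(lhs / rhs)
```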

Subsequently, we can approximate Eq. (5) using Eq. (8):

$$\begin{aligned}&e^{-b\sqrt{m}x}\Phi (e^{-b\sqrt{m}},-x,wx) - (w-1)xe^{-b\sqrt{m}x}\Phi (e^{-b\sqrt{m}},-(x-1),wx) \nonumber \\&\approx \, e^{-b\sqrt{m}x} e^{bw\sqrt{m}x} \frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}} -(w-1)x e^{-b\sqrt{m}x} e^{bw\sqrt{m}x} \frac{\Gamma (x)}{(b\sqrt{m})^{x}}\nonumber \\&\quad =\, e^{(1-\sqrt{m})x} \frac{\sqrt{m} \ \Gamma (x+1)}{(b\sqrt{m})^{x+1}}\nonumber \\&\quad \approx \, e^{(1-\sqrt{m})x} \sqrt{m} e^{-bw\sqrt{m}x} \Phi (e^{-b\sqrt{m}}, -x, wx)\nonumber \\&\quad =\, \sqrt{m} e^{-b\sqrt{m}x} \Phi (e^{-b\sqrt{m}}, -x, wx). \end{aligned}$$
(9)

Substituting Eq. (9) for the denominator in Eq. (3), we obtain

$$\begin{aligned} \begin{aligned} P(k\vert X=x)&\approx \frac{k(k+g(x))^{x-1} e^{-b\sqrt{m}k}}{\sqrt{m}\Phi (e^{-b\sqrt{m}}, -x, wx) e^{-b\sqrt{m}x}}\\&= \underbrace{\frac{(e^{-b\sqrt{m}})^{k-x}}{\Phi (e^{-b\sqrt{m}}, -x, wx)(k-x+wx)^{-x}}}_{\text {A}} \times \underbrace{\frac{k}{\sqrt{m}(k+g(x))}}_{\text {B}}. \end{aligned} \end{aligned}$$
(10)

To see that A is the pmf of an extended Hurwitz–Lerch zeta distribution with extended parameter space [20], we start with the pmf of the Hurwitz–Lerch zeta distribution:

$$\begin{aligned} \begin{aligned} q_{k}&=\frac{1}{\theta \Phi (\theta ,s+1,a+1)}\frac{\theta ^{k}}{(k+a)^{s+1}}, \end{aligned} \end{aligned}$$
(11)

for \(k=1,2,\ldots \), where \(a>-1\) and \(s \in {\mathbb {R}}\) if \(0<\theta <1\) or \(s>0\) if \(\theta = 1\). Shifting the support to \(k=x,x+1,\ldots \), Eq. (11) becomes

$$\begin{aligned} \begin{aligned} q_k&= \frac{\theta ^{k-x}}{\Phi (\theta , s+1,a+1)(k-x+1+a)^{s+1}}, \end{aligned} \end{aligned}$$
(12)

for \(k=x,x+1,\ldots \), where \(x\ge 2\). Taking \(\theta = e^{-b\sqrt{m}}\), \(s+1=-x\), \(a+1=wx\) and substituting them into Eq. (12) leads to A in Eq. (10).

For some combinations of b and m, there exists an \(l \ge x\) such that \(\sqrt{m}(k + g(x)) = k + o(k)\) for all \(k \ge l\). In this case, B of Eq. (10) is approximately 1, and thus

$$\begin{aligned} \begin{aligned} P(k\vert X=x)&\approx \frac{(e^{-b\sqrt{m}})^{k-x}}{\Phi (e^{-b\sqrt{m}}, -x, wx)(k-x+wx)^{-x}}, \end{aligned} \end{aligned}$$
(13)

which is the pmf of the extended Hurwitz–Lerch zeta distribution with parameters \(\theta = e^{-b\sqrt{m}}\), \(s+1=-x\), \(a+1=wx\).

To derive the mean and the variance of the extended Hurwitz–Lerch zeta distribution (Eq. (13)), we first note that the latter is a special case of the modified power series distribution [21], which has pmf

$$\begin{aligned} P(Z=z)=\frac{A(z)[g(\theta )]^z}{f(\theta )},\quad z \in {\mathbb {B}}, \end{aligned}$$

where \({\mathbb {B}} \subset {\mathbb {Z}}^+\), \(A(z)>0\), and \(f(\theta )\), \(g(\theta )\) are positive, finite and differentiable functions of \(\theta \). In this case, we have

$$\begin{aligned} g(\theta )=\theta ,\quad f(\theta )=\theta ^{x}\Phi (\theta ,s+1,a+1),\quad A(z)=\frac{1}{(z-x+1+a)^{s+1}}. \end{aligned}$$

The expectation of Z is

$$\begin{aligned} {\text {E}}(Z) =\frac{g(\theta )f^{'}(\theta )}{f(\theta )g^{'}(\theta )} =\frac{\theta }{\theta ^{x}\Phi (\theta ,s+1,a+1)} \frac{\partial }{\partial \theta }\theta ^{x}\Phi (\theta ,s+1,a+1), \end{aligned}$$
(14)

and the variance is

$$\begin{aligned} \text {Var}(Z) = \frac{g(\theta )}{g^{'}(\theta )}\frac{\partial }{\partial {\theta }} {\text {E}}(Z). \end{aligned}$$
(15)

Note that

$$\begin{aligned} \frac{\partial }{\partial \theta }\Phi (\theta ,s,a)=\frac{1}{\theta }\Phi (\theta ,s-1,a)-\frac{a}{\theta }\Phi (\theta ,s,a). \end{aligned}$$
(16)

Therefore,

$$\begin{aligned} \begin{aligned} \frac{\partial }{\partial \theta }\theta ^{x}\Phi (\theta ,s\!+\!1,a\!+\!1)&= \theta ^{x-1}\bigl [(x-a-1)\Phi (\theta ,s\!+\!1,a\!+\!1) + \Phi (\theta ,s,a+1) \bigr ]. \end{aligned}\nonumber \\ \end{aligned}$$
(17)

Let \(\theta = e^{-b\sqrt{m}}\), \(s+1=-x\) and \(a+1=wx\). Substituting Eq. (17) into Eq. (14) yields

$$\begin{aligned} {\text {E}}(k \vert X=x) \approx \frac{\Phi (e^{-b\sqrt{m}}, -x-1, wx)}{\Phi (e^{-b\sqrt{m}}, -x, wx)} - \frac{(1-\sqrt{m})x}{b\sqrt{m}}. \end{aligned}$$
(18)

Then, substituting Eq. (18) into Eq. (15) and using Eq. (16) yields

$$\begin{aligned} \text {Var}(k \vert X=x) \approx \frac{\Phi (e^{-b\sqrt{m}}, -x-2, wx)}{\Phi (e^{-b\sqrt{m}}, -x, wx)} - \left[ \frac{\Phi (e^{-b\sqrt{m}}, -x-1, wx)}{\Phi (e^{-b\sqrt{m}}, -x, wx)} \right] ^2. \end{aligned}$$
(19)

Eq. (18) can be further approximated by applying Eq. (8):

$$\begin{aligned} {\text {E}}(k \vert X=x) \approx \frac{\frac{\Gamma (x+2)}{(b\sqrt{m})^{x+2}}}{\frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}}} - \frac{1-\sqrt{m}}{b\sqrt{m}}x = \frac{x+1}{b\sqrt{m}} - \frac{(1-\sqrt{m})x}{b\sqrt{m}} = \frac{x}{b} + \frac{1}{b\sqrt{m}}. \end{aligned}$$

Similarly, Eq. (19) can be further approximated as

$$\begin{aligned} \text {Var}(k \vert X=x) \approx \frac{\frac{\Gamma (x+3)}{(b\sqrt{m})^{x+3}}}{\frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}}} - \left[ \frac{\frac{\Gamma (x+2)}{(b\sqrt{m})^{x+2}}}{\frac{\Gamma (x+1)}{(b\sqrt{m})^{x+1}}} \right] ^2 = \frac{(x+2)(x+1)}{(b\sqrt{m})^2} - \frac{(x+1)^2}{(b\sqrt{m})^2} = \frac{x+1}{b^2 m}. \end{aligned}$$

\(\square \)

For the NB model with \(y \ge 2\), we again find that the posterior distribution of k is approximately given by the extended Hurwitz–Lerch zeta distribution.

Theorem 3

Let Y have an NB distribution with pmf given by Eq. (2). The posterior distribution of k given \(Y=y\) approximately follows the extended Hurwitz–Lerch zeta distribution with mean \(-(y+1)/(\tau \log m) - (y-1)/(2\tau )\) and variance \((y+1)/(\tau \log m)^2\), for \(k \ge y\), where \(y \ge 2\).

Proof

For non-negative real \(\alpha \), \(\beta \) such that \(\alpha \ne \beta \), Laforgia & Natalini [22] give the following approximation for the quotient of gamma functions:

$$\begin{aligned} \frac{\Gamma (y+\alpha )}{\Gamma (y+\beta )} \approx \frac{1}{(y+c)^{\beta - \alpha }}, \end{aligned}$$
(20)

when \(y \rightarrow \infty \), with \(c=(\alpha +\beta -1)/2\). Applying Eq. (20) in Eq. (4) yields

$$\begin{aligned} \begin{aligned} P(k \vert Y=y)&\approx \frac{\bigl (k\tau +\frac{y-1}{2}\bigr )^y m^{k\tau }}{\sum _{j= y}^{\infty } \bigl (j\tau +\frac{y-1}{2}\bigr )^y m^{j\tau }} \\&= \frac{\bigl (k +\frac{y-1}{2\tau }\bigr )^y m^{k\tau }}{\sum _{j= y}^{\infty } \bigl (j+\frac{y-1}{2\tau }\bigr )^y m^{j\tau }}, \end{aligned} \end{aligned}$$
(21)

as \(k\tau , j\tau \rightarrow \infty \), for \(k \ge y\). Then, representing the denominator of Eq. (21) using the Lerch transcendent, we obtain

$$\begin{aligned} \sum _{j= y}^{\infty } \left( j+\frac{y-1}{2\tau }\right) ^y m^{j\tau } = m^{\tau y} \Phi \left( {m^\tau , -y, \left( \frac{1}{2\tau }+1\right) y - \frac{1}{2\tau }} \right) . \end{aligned}$$
(22)

Hence, Eq. (21) can be expressed as

$$\begin{aligned} P(k \vert Y=y) \approx \frac{\bigl (k +\frac{y-1}{2\tau }\bigr )^y m^{\tau (k-y)}}{\Phi \bigl ({m^\tau , -y, (\frac{1}{2\tau }+1)y - \frac{1}{2\tau }} \bigr )}, \end{aligned}$$
(23)

where \(k=y,y+1,\ldots \). Eq. (23) is just Eq. (12) with \(\theta = m^\tau \), \(s+1=-y\), \(a+1=\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\). Therefore, we conclude that the pmf of the extended Hurwitz–Lerch zeta distribution approximates the posterior distribution of k under NB model for \(y \ge 2\).
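The gamma-quotient approximation of Eq. (20), applied here with \(\alpha = y\), \(\beta = 0\) and large variable \(z = k\tau \) so that \(\Gamma (z+y)/\Gamma (z) \approx (z+(y-1)/2)^y\), is easy to check numerically; the values below are illustrative:

```python
import math

y = 5                                            # illustrative conditioning value

def quotient_exact(z, y):
    """Gamma quotient via log-gamma to avoid overflow."""
    return math.exp(math.lgamma(z + y) - math.lgamma(z))

def quotient_approx(z, y):
    return (z + (y - 1) / 2.0) ** y              # Eq. (20) with alpha = y, beta = 0

# Relative error shrinks as z = k*tau grows, as required in Eq. (21).
errs = [abs(quotient_exact(z, y) / quotient_approx(z, y) - 1.0)
        for z in (30.0, 100.0, 300.0)]
```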

By the same approach used in Theorem 2 to derive the mean and the variance of the posterior distribution of k, we obtain

$$\begin{aligned} {\text {E}}(k\vert Y=y) \approx \frac{\Phi \bigl (m^\tau ,-y-1,\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\bigr )}{\Phi \bigl (m^\tau ,-y,\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\bigr )} - \frac{y-1}{2\tau }, \end{aligned}$$
(24)
$$\begin{aligned} \begin{aligned} {\text {Var}}(k \vert Y=y) \approx&\frac{\Phi \bigl (m^\tau ,-y-2,\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\bigr )}{\Phi \bigl (m^\tau ,-y,\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\bigr )}\\ {}&-\left[ \frac{\Phi \bigl (m^\tau ,-y-1,\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\bigr )}{\Phi \bigl (m^\tau ,-y,\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\bigr )} \right] ^2, \end{aligned} \end{aligned}$$
(25)

for \(k \ge y\), where \(y \ge 2\). By similar argument leading to Eq. (8), we obtain

$$\begin{aligned} m^{\tau {g(y)}} \Phi \bigl (m^\tau , -y, g(y)\bigr )\approx \frac{\Gamma (y+1)}{(-\tau \log m)^{y+1}}, \end{aligned}$$
(26)

where \(g(y)=(\frac{1}{2\tau } + 1)y - \frac{1}{2\tau }\). Using Eq. (26), Eq. (24) can be approximated as

$$\begin{aligned} {\text {E}}(k\vert Y=y) \approx \frac{\frac{\Gamma (y+2)}{(-\tau \log m)^{y+2}}}{\frac{\Gamma (y+1)}{(-\tau \log m)^{y+1}}} - \frac{y-1}{2\tau } = -\frac{y+1}{\tau \log m} - \frac{y-1}{2\tau }. \end{aligned}$$

Similarly, we can use Eq. (26) to approximate Eq. (25) as

$$\begin{aligned} \begin{aligned} {\text {Var}}(k\vert Y=y)&\approx \frac{\frac{\Gamma (y+3)}{(-\tau \log m)^{y+3}}}{\frac{\Gamma (y+1)}{(-\tau \log m)^{y+1}}} - \Bigl (\frac{y+1}{-\tau \log m} \Bigr )^2 \\&= \frac{(y+2)(y+1)}{ (\tau \log m )^2} - \frac{(y+1)^2}{(\tau \log m)^2 } \\&= \frac{y+1}{(\tau \log m)^2}. \end{aligned} \end{aligned}$$

\(\square \)

The results of Theorem 2 and Theorem 3 lead us to the following corollary.

Table 1 Kullback–Leibler divergence for Eq. (3) versus Eq. (13)
Table 2 Kullback–Leibler divergence for Eq. (4) versus Eq. (23)
Table 3 Posterior mean of k when X has a GP distribution. For the array in each cell, the first value is the exact posterior mean computed using Eq. (3) (first 10,000 terms); the second and the third give the deviation from the first value using Eq. (18) and Theorem 2, respectively
Table 4 Posterior standard deviation of k when X has a GP distribution. For the array in each cell, the first value is the exact posterior standard deviation computed using Eq. (3) (first 10,000 terms); the second and the third give the deviation from the first value using Eq. (19) and Theorem 2, respectively. NA indicates an instance of failure to compute the Lerch transcendent using the VGAM package for \(X=50\) and \(m=b=0.8\)
Table 5 Posterior mean of k when Y has an NB distribution. For the array in each cell, the first value is the exact posterior mean computed using Eq. (4) (first 10,000 terms); the second and the third give the deviation from the first value using Eq. (24) and Theorem 3, respectively
Table 6 Posterior standard deviation of k when Y has an NB distribution. For the array in each cell, the first value is the exact posterior standard deviation computed using Eq. (4) (first 10,000 terms); the second and the third give the deviation from the first value using Eq. (25) and Theorem 3, respectively. NA indicates an instance of failure to compute the Lerch transcendent using the VGAM package for \(Y=50\) and \(m=b=0.8\)

Corollary 2

The cdf of the posterior distribution of k is approximately

  (i)

    \( F_{k \vert X=x}(k) \approx 1 - e^{-b\sqrt{m}(k-x+1)}\Phi (e^{-b\sqrt{m}}, -x, k+wx-x+1)/\Phi (e^{-b\sqrt{m}},-x,wx), \) for \(k \ge x\), where \(x \ge 2\), for the GP model;

  (ii)

    \( F_{k \vert Y=y}(k) \approx 1 - m^{\tau (k-y+1)}\Phi (m^\tau , -y, k+ \frac{y-1}{2\tau } +1)/\Phi (m^\tau ,-y,\frac{y-1}{2\tau }+y ), \) for \(k \ge y\), where \(y \ge 2\), for the NB model.
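These expressions are the cdfs of the approximating extended Hurwitz–Lerch zeta pmfs, so as a consistency check, part (i) must agree with partial sums of Eq. (13) up to series truncation error. A sketch with illustrative x, b, m:

```python
import math

def lerch_phi(u, s, v, K=1000):
    """Truncated Lerch transcendent, valid for |u| < 1."""
    return sum(u ** k / (v + k) ** s for k in range(K))

x, b, m = 5, 0.8, 0.8                      # illustrative values
theta = math.exp(-b * math.sqrt(m))
w = 1.0 + (1.0 - math.sqrt(m)) / (b * math.sqrt(m))

norm = lerch_phi(theta, -x, w * x)
pmf = lambda k: theta ** (k - x) * (k - x + w * x) ** x / norm   # Eq. (13)
cdf = lambda k: (1.0 - theta ** (k - x + 1)                      # Corollary 2 (i)
                 * lerch_phi(theta, -x, k + w * x - x + 1) / norm)

dev = running = 0.0
for k in range(x, 31):
    running += pmf(k)
    dev = max(dev, abs(running - cdf(k)))
```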

3 Computational Validation

Table 1 shows how well the extended Hurwitz–Lerch zeta distribution approximates the posterior distribution of k under the GP model for different combinations of b and m. For a fixed b, the approximation is best for m in the neighborhood of 1. For a fixed m, the approximation improves as b becomes closer to 1. Finally, for fixed m and b, the approximation improves as x increases.

Table 2 shows that for a given y, the larger the values of m and b, the better the extended Hurwitz–Lerch zeta distribution approximates the posterior distribution of k under the NB model. For fixed m and b, the approximation deteriorates as y increases up to 50. However, the Kullback–Leibler divergence remains well below 0.02 when \(0.6 \le m < 1\) for the b values considered.

Tables 3 and 4 show the results of approximating the mean and the standard deviation of the posterior distribution of k given \(X=x\) has a GP distribution. Similar results for the posterior distribution of k given \(Y=y\) has an NB distribution are given in Tables 5 and 6. In general, the approximations have relative error that stays within \(10\%\) of the true value when \(m \ge 0.6\), for the b, X and Y values considered. Approximations that use the Lerch transcendent (e.g., Eq. (18), Eq. (19), Eq. (24) and Eq. (25)) have smaller relative error than the simpler expressions in Theorem 2 and Theorem 3. However, for large values of X and Y, both approximations generally have similar relative error for \(m \ge 0.6\). Since the Lerch transcendent cannot be evaluated for some combinations of b and m at large X, Y values, the simpler expressions in these two theorems suffice there.
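Entries like those in Tables 1 and 2 can be reproduced by computing the Kullback–Leibler divergence between the truncated exact posterior and its extended Hurwitz–Lerch zeta approximation over a common support. One GP cell is sketched below; the values \(x=10\), \(b=m=0.8\) are illustrative, not a specific table entry:

```python
import math

def log_normalize(logf):
    """Normalize a vector of log-weights; returns log-probabilities."""
    M = max(logf)
    s = sum(math.exp(v - M) for v in logf)
    return [v - M - math.log(s) for v in logf]

x, b, m = 10, 0.8, 0.8                       # one illustrative (b, m, x) cell
c = b * math.sqrt(m)
g = (1.0 - math.sqrt(m)) / c * x
w = 1.0 + (1.0 - math.sqrt(m)) / c
ks = range(x, x + 3000)

# Log of the unnormalized exact posterior, Eq. (3), and of the extended
# Hurwitz-Lerch zeta pmf, Eq. (13), over the same truncated support.
log_exact = [math.log(k) + (x - 1) * math.log(k + g) - c * k for k in ks]
log_approx = [-c * (k - x) + x * math.log(k - x + w * x) for k in ks]

lp = log_normalize(log_exact)
lq = log_normalize(log_approx)
kl = sum(math.exp(a) * (a - t) for a, t in zip(lp, lq))   # KL(exact || approx)
```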

4 Concluding Remarks

In this paper, we have clarified several theoretical properties of the posterior distribution of a count parameter k arising in the GP and the NB models. For conditioning values of 0 and 1 from these two models, the posterior distribution of k is found to be geometric and extended NB, respectively. For conditioning values of 2 or more, the posterior distribution of k under either a GP or an NB model is well approximated by the extended Hurwitz–Lerch zeta distribution. To our knowledge, this is the first demonstrated connection between the Hurwitz–Lerch zeta distribution and a Bayesian posterior distribution. The present results open up the possibility of using the posterior mean to correct observed gene counts in RNA-Seq data analysis.