Abstract
Consider the generalized Poisson and the negative binomial models with mean parameter equal to kb, where \(k \ge 0\) is a count parameter and \(0< b < 1\) is a hyperparameter. We show that conditioning on counts from either model and assuming a uniform prior for k leads to the following Bayesian posterior distributions: (i) geometric for a conditioning value of 0; (ii) extended negative binomial for a conditioning value of 1; (iii) approximately extended Hurwitz–Lerch zeta for a conditioning value of 2 or more. The Kullback–Leibler divergence, which measures the quality of the approximating distributions, is given for some combinations of b and the mean–variance ratio.
1 Introduction
The family of generalized Poisson (GP) distribution [1] has been used for more than 40 years to model count data that may be overdispersed or underdispersed. Some of its interesting theoretical properties include a Poisson mixture interpretation [2], and a heavier tail compared to the negative binomial distribution [2, 3]. Various chance mechanisms have been found to generate the GP distribution [4]. Numerous applications are given in [5]. In bioinformatics, the GP distribution has been used for modeling RNA-Seq count data [6,7,8,9,10].
The negative binomial (NB) distribution has a long history of use in the analysis of biological count data [11,12,13]. In bioinformatics, it is the primary model for modelling RNA-Seq count data and forms the basis of statistical tests of differential gene expression [14,15,16,17].
Low et al. [10] pointed out that the observed gene counts in RNA-Seq experiments are the consequence of stochastic variation acting on true gene counts. They proposed to view the true gene count as a count parameter k. Under the GP model, the expectation of the observed gene counts given k is equal to the product of k and a hyperparameter b. Thus, the modeling focus shifts to finding the posterior distribution of k, assuming a specific prior distribution. Since posterior distributions often have complicated forms that make further analysis difficult and computationally expensive, finding appropriate approximations to them is important to improve their applied value in statistics.
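We first introduce the GP model. Let X be a random variable following a \(\text{ GP }(\lambda _1, \lambda _2)\) distribution. Its probability mass function (pmf) is given by
$$\begin{aligned} P(X=x) = \frac{\lambda _1 (\lambda _1 + \lambda _2 x)^{x-1} e^{-(\lambda _1 + \lambda _2 x)}}{x!}, \qquad (1) \end{aligned}$$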
where \(x=0,1,2,\ldots \), \(\lambda _1 > 0\) and \(\max \{-1, -\frac{\lambda _1}{4}\}< \lambda _2 < 1\). Positive values of \(\lambda _2\) correspond to overdispersion, negative values to underdispersion, and \(\lambda _2 = 0\) reduces Eq. (1) to the Poisson distribution with mean \(\lambda _1\). Consider the following parametrisation: \(\lambda _2 = 1- \sqrt{m}\), \(\lambda _1 = kb\sqrt{m}\), where \(k=0,1,2,\ldots \) is the count parameter, \(0< b < 1\) is a hyperparameter, and \(m > 0\) is the mean–variance ratio. Under this parametrisation, the mean and the variance of the GP model are given by \({\text {E}}(X \vert k) = \lambda _1 / (1-\lambda _2) = k b,\) and \(\text{ Var }(X\vert k) = {\text {E}}(X\vert k) / (1-\lambda _2)^2 = k b/m\), respectively. Thus \(m < 1\) (that is, \(\lambda _2 > 0\)) gives a variance larger than the mean.
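The parametrisation can be checked numerically. The following is a minimal Python sketch (not the authors' code, which was written in R), assuming the standard Consul–Jain form of the GP pmf, \(\lambda _1(\lambda _1+\lambda _2 x)^{x-1}e^{-(\lambda _1+\lambda _2 x)}/x!\), and restricting to \(0 < m \le 1\) so that \(\lambda _2 \ge 0\) and the support is unbounded. The function names and the truncation point `x_max` are illustrative choices.

```python
import math

def gp_log_pmf(x, lam1, lam2):
    """Log of the GP(lam1, lam2) pmf at x (Consul-Jain form, assumed here)."""
    base = lam1 + lam2 * x
    return math.log(lam1) + (x - 1) * math.log(base) - base - math.lgamma(x + 1)

def gp_mean_var(k, b, m, x_max=2000):
    """Total mass, mean and variance of X | k, summing the pmf over x = 0..x_max-1."""
    lam2 = 1.0 - math.sqrt(m)        # sketch assumes 0 < m <= 1, so lam2 >= 0
    lam1 = k * b * math.sqrt(m)
    probs = [math.exp(gp_log_pmf(x, lam1, lam2)) for x in range(x_max)]
    total = sum(probs)
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum(x * x * p for x, p in enumerate(probs)) - mean ** 2
    return total, mean, var

# Example: k = 20, b = 0.5, m = 0.5, so kb = 10 and kb/m = 20.
total, mean, var = gp_mean_var(k=20, b=0.5, m=0.5)
print(total, mean, var)
```

The printed mean and variance can be compared with kb and kb/m for the chosen values.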
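Now, consider a random variable Y that has an NB distribution with parameters r (the number of failures until y successes) and p (the probability of success). Its pmf is given by
$$\begin{aligned} P(Y=y) = \frac{\Gamma (y+r)}{y!\,\Gamma (r)}\, p^{y} (1-p)^{r}, \qquad (2) \end{aligned}$$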
for \(y=0,1,2,\ldots \). The mean and the variance of Y are given by \(\text {E}(Y \vert k)={pr}/(1-p) = kb\), and \(\text {Var}(Y \vert k)=pr/(1-p)^2 = {kb}/{m}\), respectively, where \(k = 0, 1, 2, \ldots \), \(0<b<1\), \(0<m<1\), and m is the mean-variance ratio. Thus, \(r={kbm}/(1-m)=k\tau \), where \(\tau = bm/(1-m)\).
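As a quick check of this parametrisation, the sketch below (Python, illustrative only) assumes the NB pmf has the form \(\Gamma (y+r)/\{y!\,\Gamma (r)\}\,p^y(1-p)^r\), which is the form consistent with the stated mean \(pr/(1-p)\). Dividing the mean by the variance gives \(1-p=m\), so the code sets \(p = 1-m\) and \(r = k\tau \). The truncation point `y_max` is an assumption.

```python
import math

def nb_log_pmf(y, r, p):
    """Log NB pmf: Gamma(y + r) / (y! Gamma(r)) * p^y * (1 - p)^r (assumed form)."""
    return (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
            + y * math.log(p) + r * math.log(1.0 - p))

def nb_mean_var(k, b, m, y_max=5000):
    """Total mass, mean and variance of Y | k, summing the pmf over y = 0..y_max-1."""
    tau = b * m / (1.0 - m)
    r, p = k * tau, 1.0 - m          # since mean / variance = 1 - p = m
    probs = [math.exp(nb_log_pmf(y, r, p)) for y in range(y_max)]
    total = sum(probs)
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum(y * y * q for y, q in enumerate(probs)) - mean ** 2
    return total, mean, var

# Example: k = 20, b = 0.5, m = 0.5, so tau = 0.5, r = 10, kb = 10 and kb/m = 20.
nb_total, nb_mean, nb_var = nb_mean_var(k=20, b=0.5, m=0.5)
print(nb_total, nb_mean, nb_var)
```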
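We are interested in the posterior distribution of k conditioned on observations from these two models, using an improper uniform prior \(P(k)=1, k=0,1,2,\ldots \). The posterior distribution of k under a GP model (Eq. (1)) is given by
$$\begin{aligned} P(k \vert X=x) = \frac{k\,\bigl (k+g(x)\bigr )^{x-1} e^{-bk\sqrt{m}}}{\sum _{j=x}^{\infty } j\,\bigl (j+g(x)\bigr )^{x-1} e^{-bj\sqrt{m}}}, \qquad (3) \end{aligned}$$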
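for \(k \ge x\), where \(0<m<\min \bigl \{{(1-b)^{-2}} ,4 \bigr \} \) and \(g(x) = \{(1-\sqrt{m}) /(b\sqrt{m})\}x\). Then, the posterior distribution of k under an NB model (Eq. (2)) is given by
$$\begin{aligned} P(k \vert Y=y) = \frac{\dfrac{\Gamma (k\tau +y)}{\Gamma (k\tau )}\, m^{k\tau }}{\sum _{j=y}^{\infty } \dfrac{\Gamma (j\tau +y)}{\Gamma (j\tau )}\, m^{j\tau }}, \qquad (4) \end{aligned}$$
for \(k \ge y\).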
The posterior distributions (Eq. (3) and (4)) are proper even though an improper uniform prior distribution is used. The aim of this paper is to find their exact distributions, and where this is not possible, approximating distributions that are mathematically tractable. By doing so, their mean and variance can be determined directly from the theoretical properties of the approximating distribution.
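Both posteriors can be evaluated numerically by truncating the normalising sum. The sketch below is a Python illustration under two assumptions: that the unnormalised GP posterior weight is \(k(k+g(x))^{x-1}e^{-bk\sqrt{m}}\) and the NB weight is \(\{\Gamma (k\tau +y)/\Gamma (k\tau )\}m^{k\tau }\) (both follow from substituting the parametrisations into the pmfs and dropping factors free of k), and that the tail beyond `k_max` is negligible. Function names are hypothetical.

```python
import math

def normalise(ks, logw):
    """Turn log-weights into a dict {k: probability}, guarding against overflow."""
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return {k: wi / total for k, wi in zip(ks, w)}

def gp_posterior(x, b, m, k_max=5000):
    """P(k | X = x) on k = x..k_max under the flat prior (assumed weight form)."""
    beta = b * math.sqrt(m)
    g = (1.0 - math.sqrt(m)) * x / beta
    ks = range(x, k_max + 1)
    if x == 0:
        logw = [-beta * k for k in ks]      # weight reduces to exp(-b k sqrt(m))
    else:
        logw = [math.log(k) + (x - 1) * math.log(k + g) - beta * k for k in ks]
    return normalise(ks, logw)

def nb_posterior(y, b, m, k_max=5000):
    """P(k | Y = y) on k = y..k_max under the flat prior (assumed weight form)."""
    tau = b * m / (1.0 - m)
    ks = range(y, k_max + 1)
    if y == 0:
        logw = [k * tau * math.log(m) for k in ks]   # weight reduces to m^(k tau)
    else:
        logw = [math.lgamma(k * tau + y) - math.lgamma(k * tau)
                + k * tau * math.log(m) for k in ks]
    return normalise(ks, logw)

post = gp_posterior(x=3, b=0.5, m=0.8)
print(sum(k * p for k, p in post.items()))   # posterior mean of k given X = 3
```

Because the weights decay geometrically in k, the truncated sums converge, which is consistent with the posteriors being proper.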
2 Results
We first show that when the observed count from the GP or the NB model is 0 or 1, the posterior distribution of k is geometric or extended NB, respectively.
Theorem 1
The posterior distribution of k is (i) geometric with mean \(e^{-b\sqrt{m}}(1-e^{-b\sqrt{m}})^{-1}\) and variance \({e^{-b\sqrt{m}}(1-e^{-b\sqrt{m}})^{-2}}\) for \(k \ge 0\), when \(x=0\) for the GP model; and with mean \({m^\tau }{(1-m^\tau )^{-1}}\) and variance \({m^\tau }{(1-m^\tau )^{-2}}\) for \(k \ge 0\), when \(y=0\) for the NB model; (ii) extended NB with mean \({\bigl (1 + e^{-b\sqrt{m}}\bigr )}{\bigl (1-e^{-b\sqrt{m}}\bigr )^{-1}}\) and variance \({2e^{-b\sqrt{m}}}{\bigl (1-e^{-b\sqrt{m}}\bigr )^{-2}}\) for \(k \ge 1\), when \(x=1\) for the GP model; and with mean \({(1+m^\tau )}{(1-m^\tau )^{-1}}\) and variance \({2m^\tau }{(1-m^\tau )^{-2}}\) for \(k \ge 1\), when \(y=1\) for the NB model.
Proof
We only show the proof for the GP model with \(x=0,1\), since the proof for the NB model with \(y=0,1\) is identical after replacing \(e^{-b\sqrt{m}}\) with \(m^{\tau }\).
(i) When \(x=0\):
$$\begin{aligned} P(k \vert X=0)&=\frac{e^{-bk\sqrt{m}}}{\sum _{j = 0}^{\infty } e^{-bj\sqrt{m}}} = \bigl (e^{-b\sqrt{m}}\bigr )^k \bigl (1-e^{-b\sqrt{m}}\bigr ), \end{aligned}$$
where \(k = 0,1,\ldots \). Hence, the posterior distribution of k is geometric with success probability \(p=1-e^{-b\sqrt{m}}\). The mean and the variance follow from standard results.
(ii) When \(x=1\):
$$\begin{aligned} P(k \vert X=1) =\frac{k e^{-bk\sqrt{m}}}{\sum _{j= 1}^{\infty }j e^{-b j\sqrt{m}}} = \binom{k}{1} \bigl (1-e^{-b\sqrt{m}}\bigr )^2 \bigl (e^{-b\sqrt{m}}\bigr )^{k-1}, \end{aligned}$$
where \(k=1,2,\ldots \). Therefore, the posterior distribution of k is extended NB with parameters \(p=e^{-b\sqrt{m}}\) and \(r=2\). The mean and the variance are
$$\begin{aligned} \text {E}(k \vert X=1)&= \sum _{k=1}^{\infty }k P(k \vert X=1) = \frac{1+e^{-b\sqrt{m}}}{1-e^{-b\sqrt{m}}},\\ \text {Var}(k \vert X=1)&= \sum _{k=1}^{\infty }k^2 P(k\vert X=1) - \bigl [\text {E}(k \vert X=1) \bigr ]^2 =\frac{2e^{-b\sqrt{m}}}{\bigl (1-e^{-b\sqrt{m}}\bigr )^2}, \end{aligned}$$
respectively. \(\square \)
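The closed forms in Theorem 1 can be confirmed by direct summation. In the Python sketch below, \(q\) stands for \(e^{-b\sqrt{m}}\) (or \(m^\tau \) in the NB case), and the step for \(x=1\) uses \(\sum _{j\ge 1} jq^j = q/(1-q)^2\). The values of b and m and the truncation at 2000 terms are illustrative assumptions.

```python
import math

def moments(pmf, support):
    """Mean and variance of a pmf by summation over a (truncated) support."""
    mean = sum(k * pmf(k) for k in support)
    var = sum(k * k * pmf(k) for k in support) - mean ** 2
    return mean, var

b, m = 0.5, 0.8
q = math.exp(-b * math.sqrt(m))   # GP case; replace by m ** tau for the NB case

def geometric_pmf(k):
    """Posterior for a conditioning value of 0, k = 0, 1, ..."""
    return (1 - q) * q ** k

def extended_nb_pmf(k):
    """Posterior for a conditioning value of 1, k = 1, 2, ..."""
    return k * (1 - q) ** 2 * q ** (k - 1)

mean0, var0 = moments(geometric_pmf, range(0, 2000))
mean1, var1 = moments(extended_nb_pmf, range(1, 2000))

print(mean0, q / (1 - q))
print(var0, q / (1 - q) ** 2)
print(mean1, (1 + q) / (1 - q))
print(var1, 2 * q / (1 - q) ** 2)
```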
Corollary 1
The cumulative distribution function (cdf) of the posterior distribution of k is
(i) \(F_{k \vert X=0}(k)= 1-e^{-b\sqrt{m}(k+1)}\) for \(k \ge 0\);
(ii) \(F_{k \vert X=1}(k)=1-\bigl [k(1-e^{-b\sqrt{m}})+1\bigr ] e^{-b\sqrt{m}k}\) for \(k\ge 1\);
(iii) \(F_{k \vert Y=0}(k)= 1-m^{\tau {(k+1)}}\) for \(k \ge 0\);
(iv) \(F_{k \vert Y=1}(k)=1-\bigl [k(1-m^\tau )+1\bigr ]m^{\tau k}\) for \(k\ge 1\).
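As a sanity check on Corollary 1, the Python sketch below compares the closed-form cdfs for the GP case with running sums of the pmfs from Theorem 1. The values of b and m are arbitrary illustrative choices. The NB case is the same computation with \(m^\tau \) in place of \(e^{-b\sqrt{m}}\).

```python
import math

b, m = 0.5, 0.8
q = math.exp(-b * math.sqrt(m))

def cdf_x0(k):
    """Closed-form cdf for a conditioning value of 0."""
    return 1 - q ** (k + 1)

def cdf_x1(k):
    """Closed-form cdf for a conditioning value of 1."""
    return 1 - (k * (1 - q) + 1) * q ** k

running0 = 0.0
running1 = 0.0
max_gap = 0.0
for k in range(0, 200):
    running0 += (1 - q) * q ** k                       # geometric pmf
    max_gap = max(max_gap, abs(running0 - cdf_x0(k)))
    if k >= 1:
        running1 += k * (1 - q) ** 2 * q ** (k - 1)    # extended NB pmf
        max_gap = max(max_gap, abs(running1 - cdf_x1(k)))

print(max_gap)   # largest discrepancy between running sums and closed forms
```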
We now show that the extended Hurwitz–Lerch zeta distribution [18] is an appropriate approximation for the posterior distribution of k under the GP model with \(x\ge 2\).
Theorem 2
The posterior distribution of k given \(X=x\), where X has a GP distribution with mean kb and variance kb/m, can be approximated by an extended Hurwitz–Lerch zeta distribution with mean \(x/b + 1/(b\sqrt{m})\) and variance \((x+1)/(b^2 m)\) for \(k \ge l\), where \(x \ge 2\) is the conditioning value and \(l \ge x\) is some threshold.
Proof
First, we note that the denominator in Eq. (3) can be written as
The Lerch transcendent \(\Phi (u,s,v)\) (see [19]) is given by
$$\begin{aligned} \Phi (u,s,v) = \sum _{n=0}^{\infty } \frac{u^n}{(n+v)^s}, \end{aligned}$$
where u is complex and \(\vert u\vert < 1\), \(v \ne 0, -1, -2, \ldots \), and \(s \ne 1, 2, \ldots \). Representing the denominator using the Lerch transcendent, we get
where \(w = 1+(1-\sqrt{m})/(b\sqrt{m})\).
The following identity (Eq. 1.11(11) in [19]) relates the Lerch transcendent to the Bernoulli polynomials for \(s=-h\):
where \(\vert {\log u}\vert < 2\pi \), \(v \ne 0,-1,-2,\ldots \), \(h \ne -1,-2,\ldots \), and \(B_n(v)\) is the nth Bernoulli polynomial with argument v. The Bernoulli polynomial is defined as
$$\begin{aligned} B_n(v) = \sum _{i=0}^{n} \binom{n}{i} B_{i}(0)\, v^{n-i}, \end{aligned}$$
where \(B_{i}(0)\) is the ith Bernoulli number.
If we substitute \(u=e^{-b\sqrt{m}}\), \(h=x\), \(v=wx\) into Eq. (6), and then multiply both sides by \(e^{-bw\sqrt{m}x}\), we obtain
We can approximate Eq. (7) as
provided that
as \(x \rightarrow \infty \), o being the little o notation. To show this, we use an identity involving the Bernoulli polynomials and the sum of xth powers (Eq. 1.13(10) in [19]):
for \(x=2,3,\ldots \) and \(z \in {\mathbb {Z}}^+\). For sufficiently large z, \(B_{x+1}(z)\) is positive and dominates \(B_{x+1}(0)\). Suppose wx is a positive integer that is sufficiently large, then
if \(B_{x+r+1}(0) > 0\) and for some b and m such that \(B_{x+r+1}(wx) \gg B_{x+r+1}(0)\); if \(B_{x+r+1}(0) \le 0\), we have
Thus,
Applying Stirling’s approximation for x!, the right-hand side simplifies to \(U = (c/\sqrt{2\pi x})(ce^{c+1})^x\), where \(c = bw\sqrt{m} = 1-(1-b)\sqrt{m}\). For \(0 < ce^{c+1} \le 1\), U converges to 0 as \(x \rightarrow \infty \). Since \(ce^{c+1} \le 1\) is the same as \(ce^{c} \le e^{-1}\), this condition is equivalent to \(0 < c \le W(e^{-1}) \approx 0.2785\), where \(W(\cdot )\) is the Lambert W function. Thus, for \( \min \{ [1-W(e^{-1})]^{2}(1-b)^{-2}, 4\} \le m < \min \{ (1-b)^{-2}, 4\}\), Eq. (8) should give a reasonably good approximation.
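The bound on m can be computed for any b. The Python sketch below evaluates \(W(e^{-1})\) with a hand-written Newton iteration (an illustrative substitute for a library routine) and returns the two endpoints of the stated interval, using \(c = 1-(1-b)\sqrt{m}\). The function names are hypothetical.

```python
import math

def lambert_w0(z, tol=1e-14, max_iter=100):
    """Principal branch W(z) for z >= 0 via Newton's method on w * exp(w) = z."""
    w = math.log1p(z)                      # starting guess to the right of the root
    for _ in range(max_iter):
        step = (w * math.exp(w) - z) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

def m_range(b):
    """Endpoints (lower, upper) of the interval of m given after Eq. (8)."""
    w = lambert_w0(math.exp(-1.0))
    lower = min((1.0 - w) ** 2 / (1.0 - b) ** 2, 4.0)
    upper = min(1.0 / (1.0 - b) ** 2, 4.0)
    return lower, upper

w_val = lambert_w0(math.exp(-1.0))
lower_02, upper_02 = m_range(0.2)
print(w_val)
print(lower_02, upper_02)
```

For larger b both endpoints are capped at 4, in which case the interval is empty and this sufficient condition gives no guidance.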
Subsequently, we can approximate Eq. (5) using Eq. (8):
Substituting Eq. (9) for the denominator in Eq. (3), we obtain
To see that A is the pmf of an extended Hurwitz–Lerch zeta distribution with extended parameter space [20], we start with the pmf of the Hurwitz–Lerch zeta distribution:
for \(k=1,2,\ldots \), where \(a>-1\) and \(s \in {\mathbb {R}}\) if \(0<\theta <1\) or \(s>0\) if \(\theta = 1\). Shifting the support to \(k=x,x+1,\ldots \), Eq. (11) becomes
for \(k=x,x+1,\ldots \), where \(x\ge 2\). Taking \(\theta = e^{-b\sqrt{m}}\), \(s+1=-x\), \(a+1=wx\) and substituting them into Eq. (12) leads to A in Eq. (10).
For some combinations of b and m, there exists an \(l \ge x\) such that \(k \ge l\) results in \(\sqrt{m}(k + g(x)) = k + o(k)\). In this case, B of Eq. (10) becomes approximately 1, thus
which is the pmf of the extended Hurwitz–Lerch zeta distribution with parameters \(\theta = e^{-b\sqrt{m}}\), \(s+1=-x\), \(a+1=wx\).
To derive the mean and the variance of the extended Hurwitz–Lerch distribution (Eq. (13)), we first note that the latter is a special case of the modified power series distribution [21], which has pmf
where \({\mathbb {B}} \subset {\mathbb {Z}}^+\), \(A(z)>0\), and \(f(\theta )\), \(g(\theta )\) are positive, finite and differentiable functions of \(\theta \). In this case, we have
The expectation of Z is
and the variance is
Note that
Therefore,
Let \(\theta = e^{-b\sqrt{m}}\), \(s+1=-x\) and \(a+1=wx\). Substituting Eq. (17) into Eq. (14) yields
Then, substituting Eq. (18) into Eq. (15) and using Eq. (16) yields
Eq. (18) can be further approximated by applying Eq. (8):
Similarly, Eq. (19) can be further approximated as
\(\square \)
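To illustrate Theorem 2, the following Python sketch compares the simple mean and variance formulas with moments computed from the posterior itself. It assumes the unnormalised posterior weight \(k(k+g(x))^{x-1}e^{-bk\sqrt{m}}\) for \(k \ge x\) and a truncation at `k_max`. The parameter values are arbitrary, and the function names are hypothetical.

```python
import math

def exact_gp_posterior(x, b, m, k_max=5000):
    """P(k | X = x) for x >= 1 on k = x..k_max (assumed weight form, truncated)."""
    beta = b * math.sqrt(m)
    g = (1.0 - math.sqrt(m)) * x / beta
    ks = list(range(x, k_max + 1))
    logw = [math.log(k) + (x - 1) * math.log(k + g) - beta * k for k in ks]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return ks, [wi / total for wi in w]

def theorem2_moments(x, b, m):
    """Approximate posterior mean and variance stated in Theorem 2."""
    return x / b + 1.0 / (b * math.sqrt(m)), (x + 1.0) / (b * b * m)

x, b, m = 20, 0.5, 0.8
ks, probs = exact_gp_posterior(x, b, m)
exact_mean = sum(k * p for k, p in zip(ks, probs))
exact_var = sum((k - exact_mean) ** 2 * p for k, p in zip(ks, probs))
approx_mean, approx_var = theorem2_moments(x, b, m)

rel_err_mean = abs(approx_mean - exact_mean) / exact_mean
rel_err_sd = abs(math.sqrt(approx_var) - math.sqrt(exact_var)) / math.sqrt(exact_var)
print(exact_mean, approx_mean, rel_err_mean)
print(exact_var, approx_var, rel_err_sd)
```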
For the NB model with \(y \ge 2\), we again find that the posterior distribution of k is approximately given by the extended Hurwitz–Lerch zeta distribution.
Theorem 3
Let Y have an NB distribution with pmf given by Eq. (2). The posterior distribution of k given \(Y=y\) approximately follows the extended Hurwitz–Lerch zeta distribution with mean \(-(y+1)/(\tau \log m) - (y-1)/(2\tau )\) and variance \((y+1)/(\tau \log m)^2\), for \(k \ge y\), where \(y \ge 2\).
Proof
For non-negative real \(\alpha \), \(\beta \) such that \(\alpha \ne \beta \), Laforgia & Natalini [22] give the following approximation for the quotient of gamma functions:
when \(y \rightarrow \infty \), with \(c=(\alpha +\beta -1)/2\). Applying Eq. (20) in Eq. (4) yields
as \(k\tau , j\tau \rightarrow \infty \), for \(k \ge y\). Then, representing the denominator of Eq. (21) using the Lerch transcendent, we obtain
Hence, Eq. (21) can be expressed as
where \(k=y,y+1,\ldots \). Eq. (23) is just Eq. (12) with \(\theta = m^\tau \), \(s+1=-y\), \(a+1=\bigl (\frac{1}{2\tau } +1 \bigr )y - \frac{1}{2\tau }\). Therefore, we conclude that the pmf of the extended Hurwitz–Lerch zeta distribution approximates the posterior distribution of k under the NB model for \(y \ge 2\).
By the same approach used in Theorem 2 to derive the mean and the variance of the posterior distribution of k, we obtain
for \(k \ge y\), where \(y \ge 2\). By an argument similar to the one leading to Eq. (8), we obtain
where \(g(y)=(\frac{1}{2\tau } + 1)y - \frac{1}{2\tau }\). Using Eq. (26), Eq. (24) can be approximated as
Similarly, we can use Eq. (26) to approximate Eq. (25) as
\(\square \)
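The NB counterpart of the moment approximation can be examined in the same way. The Python sketch below assumes the unnormalised posterior weight \(\{\Gamma (k\tau +y)/\Gamma (k\tau )\}m^{k\tau }\) for \(k \ge y\), truncated at `k_max`, and compares the resulting moments with the formulas stated in Theorem 3. Parameter values and function names are illustrative.

```python
import math

def exact_nb_posterior(y, b, m, k_max=5000):
    """P(k | Y = y) for y >= 1 on k = y..k_max (assumed weight form, truncated)."""
    tau = b * m / (1.0 - m)
    ks = list(range(y, k_max + 1))
    logw = [math.lgamma(k * tau + y) - math.lgamma(k * tau)
            + k * tau * math.log(m) for k in ks]
    top = max(logw)
    w = [math.exp(v - top) for v in logw]
    total = sum(w)
    return ks, [wi / total for wi in w]

def theorem3_moments(y, b, m):
    """Approximate posterior mean and variance stated in Theorem 3."""
    tau = b * m / (1.0 - m)
    t = tau * math.log(m)                 # negative, because 0 < m < 1
    mean = -(y + 1.0) / t - (y - 1.0) / (2.0 * tau)
    var = (y + 1.0) / t ** 2
    return mean, var

y, b, m = 20, 0.5, 0.8
ks_nb, probs_nb = exact_nb_posterior(y, b, m)
nb_exact_mean = sum(k * p for k, p in zip(ks_nb, probs_nb))
nb_exact_var = sum((k - nb_exact_mean) ** 2 * p for k, p in zip(ks_nb, probs_nb))
nb_approx_mean, nb_approx_var = theorem3_moments(y, b, m)

nb_rel_err_mean = abs(nb_approx_mean - nb_exact_mean) / nb_exact_mean
nb_rel_err_sd = (abs(math.sqrt(nb_approx_var) - math.sqrt(nb_exact_var))
                 / math.sqrt(nb_exact_var))
print(nb_exact_mean, nb_approx_mean, nb_rel_err_mean)
print(nb_exact_var, nb_approx_var, nb_rel_err_sd)
```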
The results of Theorem 2 and Theorem 3 lead us to the following corollary.
Corollary 2
The cdf of the posterior distribution of k is approximately
(i) \( F_{k \vert X=x}(k) \approx 1 - e^{-b\sqrt{m}(k-x+1)}\Phi (e^{-b\sqrt{m}}, -x, k+wx-x+1)/\Phi (e^{-b\sqrt{m}},-x,wx), \) for \(k \ge x\), where \(x \ge 2\), for the GP model;
(ii) \( F_{k \vert Y=y}(k) \approx 1 - m^{\tau (k-y+1)}\Phi (m^\tau , -y, k+ \frac{y-1}{2\tau } +1)/\Phi (m^\tau ,-y,\frac{y-1}{2\tau }+y ), \) for \(k \ge y\), where \(y \ge 2\), for the NB model.
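The cdf in Corollary 2(i) can be evaluated with a truncated series for the Lerch transcendent. The Python sketch below assumes the series definition \(\Phi (u,s,v)=\sum _{n\ge 0}u^n/(n+v)^s\) with \(0<u<1\) and \(v>0\), and takes the approximating pmf to be \(\theta ^{k-x}(k-x+wx)^x/\Phi (\theta ,-x,wx)\). It then checks that the cdf formula agrees with a running sum of that pmf. The number of series terms is an assumption.

```python
import math

def lerch_phi(theta, s, v, n_terms=5000):
    """Truncated series for Phi(theta, s, v), assuming 0 < theta < 1 and v > 0."""
    log_theta = math.log(theta)
    return sum(math.exp(n * log_theta - s * math.log(n + v)) for n in range(n_terms))

def gp_params(x, b, m):
    """theta and w of the approximating distribution for the GP model."""
    theta = math.exp(-b * math.sqrt(m))
    w = 1.0 + (1.0 - math.sqrt(m)) / (b * math.sqrt(m))
    return theta, w

def gp_approx_pmf(k, x, b, m):
    """Extended Hurwitz-Lerch zeta pmf on k = x, x+1, ... (assumed form)."""
    theta, w = gp_params(x, b, m)
    return theta ** (k - x) * (k - x + w * x) ** x / lerch_phi(theta, -x, w * x)

def gp_approx_cdf(k, x, b, m):
    """Corollary 2(i): approximate cdf of k given X = x."""
    theta, w = gp_params(x, b, m)
    tail = theta ** (k - x + 1) * lerch_phi(theta, -x, k + w * x - x + 1)
    return 1.0 - tail / lerch_phi(theta, -x, w * x)

x, b, m = 5, 0.5, 0.8
running = sum(gp_approx_pmf(k, x, b, m) for k in range(x, 31))
print(running, gp_approx_cdf(30, x, b, m))
```

For large x the individual series terms can exceed floating-point range, so a production implementation would need more care than this sketch takes.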
3 Computational Validation
Table 1 shows how well the extended Hurwitz–Lerch zeta approximates the posterior distribution of k under GP for different combinations of b and m. For a fixed b, the approximation is best for m in the neighborhood of 1. For a fixed m, the approximation improves as b becomes closer to 1. Finally, for fixed m and b, the approximation improves as x increases.
Table 2 shows that for a given y, the larger the values of m and b, the better the extended Hurwitz–Lerch zeta distribution approximates the posterior distribution of k under the NB model. For fixed m and b, the approximation deteriorates as y increases up to 50. However, the Kullback–Leibler divergence remains well below 0.02 when \(0.6 \le m < 1\) for the b values considered.
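A comparison of this kind can be sketched in Python as follows. The code is illustrative and is not the R code used for Tables 1 and 2. It assumes the posterior weights \(k(k+g(x))^{x-1}e^{-bk\sqrt{m}}\) (GP) and \(\{\Gamma (k\tau +y)/\Gamma (k\tau )\}m^{k\tau }\) (NB), the approximating weights \((k+g(x))^{x}e^{-bk\sqrt{m}}\) and \(\{k+(y-1)/(2\tau )\}^{y}m^{k\tau }\), a common truncated support, and the divergence taken from the exact posterior to the approximation. The direction used in the tables is not stated in the text, so that choice is an assumption.

```python
import math

def log_normalise(logw):
    """Convert log-weights to log-probabilities on the truncated support."""
    top = max(logw)
    log_z = top + math.log(sum(math.exp(v - top) for v in logw))
    return [v - log_z for v in logw]

def kl_divergence(log_p, log_q):
    """KL(p || q) = sum p * (log p - log q), computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def kl_gp(x, b, m, k_max=5000):
    """KL from the exact GP posterior to its approximation, for x >= 1."""
    beta = b * math.sqrt(m)
    g = (1.0 - math.sqrt(m)) * x / beta
    ks = range(x, k_max + 1)
    exact = log_normalise([math.log(k) + (x - 1) * math.log(k + g) - beta * k for k in ks])
    approx = log_normalise([x * math.log(k + g) - beta * k for k in ks])
    return kl_divergence(exact, approx)

def kl_nb(y, b, m, k_max=5000):
    """KL from the exact NB posterior to its approximation, for y >= 1."""
    tau = b * m / (1.0 - m)
    shift = (y - 1.0) / (2.0 * tau)
    ks = range(y, k_max + 1)
    exact = log_normalise([math.lgamma(k * tau + y) - math.lgamma(k * tau)
                           + k * tau * math.log(m) for k in ks])
    approx = log_normalise([y * math.log(k + shift) + k * tau * math.log(m) for k in ks])
    return kl_divergence(exact, approx)

kl_gp_m1 = kl_gp(x=2, b=0.5, m=1.0)    # g(x) = 0 here, so the two weights coincide
kl_gp_m08 = kl_gp(x=2, b=0.5, m=0.8)
kl_nb_m08 = kl_nb(y=2, b=0.5, m=0.8)
print(kl_gp_m1, kl_gp_m08, kl_nb_m08)
```

At \(m=1\) the GP weights and the approximating weights are identical because \(g(x)=0\), which is consistent with the approximation being best for m near 1.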
Tables 3 and 4 show the results of approximating the mean and the standard deviation of the posterior distribution of k given \(X=x\) under the GP model. Similar results for the posterior distribution of k given \(Y=y\) under the NB model are given in Tables 5 and 6. In general, the approximations have relative error within \(10\%\) of the true value when \(m \ge 0.6\), for the values of b, x and y considered. Approximations that use the Lerch transcendent (Eq. (18), Eq. (19), Eq. (24) and Eq. (25)) have smaller relative error than the simpler equations in Theorem 2 and Theorem 3. However, for large values of x and y, both sets of approximations generally have similar relative error for \(m \ge 0.6\). Since the Lerch transcendent cannot be evaluated for some combinations of b and m when x or y is large, the simpler equations in these two theorems suffice in such cases.
4 Concluding Remarks
In this paper, we have clarified several theoretical properties of the posterior distribution of a count parameter k arising in the GP and the NB models. Specifically, for conditioning values of 0 and 1 from these two models, the posterior distribution of k is geometric and extended NB, respectively. For conditioning values of 2 or more, the posterior distribution of k under either a GP or an NB model is approximated by the extended Hurwitz–Lerch zeta distribution. To our knowledge, this is the first instance where a connection between the Hurwitz–Lerch zeta distribution and a Bayesian posterior distribution is demonstrated. The present results open up the possibility of using the posterior mean to correct observed gene counts in RNA-Seq data analysis.
Data Availability
None
Code Availability
R package used: VGAM [23]. R code for reproducing the analyses is available at https://github.com/Divo-Lee/R-Code/blob/main/Post_dist.R
References
[1] Consul, P.C., Jain, G.C.: A generalization of the Poisson distribution. Technometrics 15(4), 791–799 (1973)
[2] Joe, H., Zhu, R.: Generalized Poisson distribution: the property of mixture of Poisson and comparison with negative binomial distribution. Biom. J. 47(2), 219–229 (2005)
[3] Nikoloulopoulos, A.K., Karlis, D.: On modeling count data: a comparison of some well-known discrete distributions. J. Stat. Comput. Simul. 78(3), 437–457 (2008)
[4] Shoukri, M., Consul, P.: Some chance mechanisms generating the generalized Poisson probability models. In: Biostatistics, pp. 259–268. Springer, Dordrecht (1987)
[5] Consul, P.C.: Generalized Poisson Distribution: Properties and Applications. Marcel Dekker, New York (1989)
[6] Srivastava, S., Chen, L.: A two-parameter generalized Poisson model to improve the analysis of RNA-seq data. Nucleic Acids Res. 38(17), 170 (2010)
[7] Li, W., Jiang, T.: Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads. Bioinformatics 28(22), 2914–2921 (2012)
[8] Zhang, J., Kuo, C.-C.J., Chen, L.: WemIQ: an accurate and robust isoform quantification method for RNA-seq data. Bioinformatics 31(6), 878–885 (2015)
[9] Wang, Z., Wang, J., Wu, C., Deng, M.: Estimation of isoform expression in RNA-seq data using a hierarchical Bayesian model. J. Bioinform. Comput. Biol. 13(06), 1542001 (2015)
[10] Low, J.Z.-B., Khang, T.F., Tammi, M.T.: CORNAS: coverage-dependent RNA-Seq analysis of gene expression data without biological replicates. BMC Bioinformatics 17, 575 (2017)
[11] Anscombe, F.: The statistical analysis of insect counts based on the negative binomial distribution. Biometrics 5(2), 165–173 (1949)
[12] Bliss, C.I., Fisher, R.A.: Fitting the negative binomial distribution to biological data. Biometrics 9(2), 176–200 (1953)
[13] White, G.C., Bennetts, R.E.: Analysis of frequency count data using the negative binomial distribution. Ecology 77(8), 2549–2557 (1996)
[14] Anders, S., Huber, W.: Differential expression analysis for sequence count data. Genome Biol. 11, 106 (2010)
[15] Hardcastle, T.J., Kelly, K.A.: baySeq: empirical Bayesian methods for identifying differential expression in sequence count data. BMC Bioinformatics 11(1), 422 (2010)
[16] Wu, H., Wang, C., Wu, Z.: A new shrinkage estimator for dispersion improves differential expression detection in RNA-seq data. Biostatistics 14(2), 232–243 (2013)
[17] Love, M.I., Huber, W., Anders, S.: Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15(12), 550 (2014)
[18] Gupta, P.L., Gupta, R.C., Ong, S.H., Srivastava, H.: A class of Hurwitz-Lerch Zeta distributions and their applications in reliability. Appl. Math. Comput. 196(2), 521–531 (2008)
[19] Bateman, H.: Higher Transcendental Functions, vol. 1. McGraw-Hill Book Company, New York (1953)
[20] Liew, K.W., Ong, S.H., Toh, K.K.: The Poisson-stopped Hurwitz-Lerch zeta distribution. Communications in Statistics - Theory and Methods 51(16), 5638–5652 (2022)
[21] Gupta, R.C.: Modified power series distribution and some of its applications. Sankhya, Series B 36(3), 288–298 (1974)
[22] Laforgia, A., Natalini, P.: On the asymptotic expansion of a ratio of gamma functions. J. Math. Anal. Appl. 389(2), 833–837 (2012)
[23] Yee, T.W.: VGAM: Vector Generalized Linear and Additive Models. (2022). R package version 1.1-7. https://CRAN.R-project.org/package=VGAM
Acknowledgements
We thank two anonymous reviewers for their constructive comments which helped improve the clarity of the present work.
Funding
None
Author information
Contributions
TFK was involved in the conceptualisation; all authors contributed to the methodology, formal analysis, writing—original draft preparation and writing—review and editing.
Ethics declarations
Conflict of interest:
None
Ethics approval:
Not applicable
Consent to participate:
Not applicable
Consent for publication:
Both authors agree to publish
Additional information
Communicated by Rosihan M. Ali.
About this article
Cite this article
Li, H., Khang, T.F. Some Approximation Results for Bayesian Posteriors that Involve the Hurwitz–Lerch Zeta Distribution. Bull. Malays. Math. Sci. Soc. 46, 72 (2023). https://doi.org/10.1007/s40840-023-01463-9
DOI: https://doi.org/10.1007/s40840-023-01463-9
Keywords
- Approximation
- Generalized Poisson distribution
- Hurwitz–Lerch zeta distribution
- Negative binomial distribution
- Posterior distribution