1 Introduction

The binomial is one of the most popular discrete probability distributions and has received great attention in the literature. Statistical intervals for one- and two-sample problems involving binomial models are widely available. In comparison with the binomial, inferential results on negative binomial distributions are very limited. In binomial sampling, a fixed number of sampling units are drawn from an infinite population and the number of units with an attribute (success) of interest is counted, whereas in negative binomial sampling (also known as inverse sampling) units are drawn from an infinite population until a prespecified number of successes is observed. Thus, in binomial sampling the sample size is fixed and the number of successes is a random variable, whereas in negative binomial sampling the number of successes is fixed while the sample size or the number of failures is a random variable. Negative binomial sampling is commonly used in situations where one encounters events that occur with small probability. Haldane (1945) has noted that in epidemiological studies on rare diseases, a negative binomial sampling design may be used to ensure that a reasonable number of cases are observed. Tian et al. (2009) have noted the applications of negative binomial distributions in biological and medical studies. Kikuchi (1987) has used negative binomial distributions in a case-control study involving a rare exposure of maternal congenital heart disease (see Example 1), and Madden et al. (1996) have noted the applications in botanical studies of plant diseases. In mail-in survey sampling, non-responses are quite common, and the initial sample size needs to be determined in order to get a specified number of final responses. Recently, Young (2014) has shown an application of negative binomial tolerance intervals (TIs) in such survey sampling; see Example 2.

The problem of finding statistical intervals for negative binomial distributions has received little attention. George and Elston (1993) have considered the problem of finding confidence intervals (CIs) for proportions based on inverse sampling until the occurrence of the first event. Lui (1995) has noted that the confidence interval given in Clemans (1959), which was calculated on the basis of the first event, may be too wide for general utility. Tian et al. (2009) have proposed some approximate CIs for the success probability p of a negative binomial distribution. The comparison studies by these authors have indicated that the available exact CI is too conservative, and so they have proposed some approximate CIs that are less conservative. Even though the exact CIs for binomial, Poisson and negative binomial distributions are optimal among strictly nested intervals (Thulin and Zwanzig 2017), it is well known that the exact CIs are often too conservative and unnecessarily wide. Alternative simple closed-form approximate CIs based on the score method and Bayesian methods have been proposed for binomial and Poisson distributions. However, such CIs are not available for negative binomial distributions.

Another statistical interval that is commonly used in applications is the prediction interval (PI). The prediction problem that we will address concerns two independent negative binomial distributions with the same “success probability” p, but possibly different target numbers of successes. Given that n independent Bernoulli trials are needed to observe r successes, we would like to predict the number of trials required in another negative binomial sampling to observe s successes with confidence \(1-\alpha \). In particular, we would like to find a prediction interval \([L(X; r, s, \alpha ), \ U(X; r, s, \alpha )]\) so that

$$\begin{aligned} P_{X,Y}\left( L(X; r, s, \alpha ) \le Y \le U(X; r, s, \alpha )\right) \ge 1-\alpha . \end{aligned}$$
(1)

In the above, the random variable X represents the number of failures before the rth success (so that \(n=X+r\)) and it has a negative binomial distribution with success probability p and the number of successes r, say, NBin(r, p), and Y has a NBin(s, p) distribution independently of X. Note that by adding s to the PI for Y, we find the PI for the number of trials needed to observe s successes in a future negative binomial sampling. Although many authors (see Knüsel 1994; Dunsmore 1976; Krishnamoorthy and Peng 2011) have addressed the problem of finding PIs for binomial and Poisson distributions, to the best of our knowledge, no PI is available for a negative binomial distribution.

The problem of constructing TIs for a discrete distribution has received some attention in the literature. TIs for a discrete distribution are used to assess the magnitude of discrete quality characteristics of a product, for example, the number of defective components in a system. Methods for finding TIs for the binomial and Poisson models are proposed in Hahn and Chandra (1981), Hahn and Meeker (1991), and Krishnamoorthy et al. (2011). These authors have provided exact and some approximate methods of obtaining TIs. Wang and Tsung (2009) provided an example where it is desired to find a TI for a binomial distribution to assess the number of defective chips in a wafer. Young (2014) has proposed some TIs for negative binomial distributions showing applications to survey sampling (see Example 2).

In this article, we address the following problems: (i) construction of CIs for the success probability, (ii) finding PIs for \(Y \sim \mathrm{NBin}(s, p)\) based on \(X \sim \mathrm{NBin}(r,p)\), and (iii) construction of equal-tailed tolerance intervals. Since we propose the fiducial approach for problems (i) and (ii), we first describe fiducial distributions for the success probability p in negative binomial sampling. In Sect. 3, we propose fiducial and score CIs for p and compare them with an available large sample CI in terms of coverage probabilities and precision. In Sect. 4, we propose a few CIs for the expected number of trials required to observe a fixed number of successes in a future negative binomial experiment. In Sect. 5, we describe an exact PI, and propose a fiducial PI, a highest probability mass (HPM) prediction interval and a PI based on a joint sampling approach. All these PIs are evaluated with respect to coverage probabilities and expected widths. The problem of constructing equal-tailed TIs is addressed in Sect. 6. In Sect. 7, two examples with real data are used to illustrate the methods, and some concluding remarks are given in Sect. 8.

2 Fiducial distribution for p

The negative binomial probability mass function (PMF) is given by

$$\begin{aligned} P(X = x|r,p) = {{r+x-1}\atopwithdelims ()x} p^r(1-p)^x, \quad x=0,1,2,\ldots \end{aligned}$$
(2)

where the random variable X represents the number of failures until the occurrence of the rth success in a sequence of independent Bernoulli trials each with success probability p. Let us denote the negative binomial distribution by NBin(r, p).

A fiducial distribution for a parameter can be obtained by inverting a hypothesis test as suggested by Fisher (1935) or by deducing from a random number generating method (Hannig 2009). As both methods produce similar fiducial distributions, we shall follow Hannig’s approach. To identify the data generating mechanism in a negative binomial distribution, we note that \(x^*\) is a pseudo random number from the NBin(r, p) distribution if

$$\begin{aligned} P(X \le x^*-1|r,p) < U \le P(X\le x^*|r,p), \end{aligned}$$

where U is a uniform(0,1) random variable (e.g., see Casella and Berger 2001, p. 249).
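The generating rule above can be sketched in a few lines of Python. This is only an illustration of the inverse-CDF mechanism, assuming an integer r; the function names `nbinom_pmf` and `nbinom_draw`, the guard `max_x` and the seed are hypothetical choices made here, not part of the paper.

```python
import math
import random

def nbinom_pmf(x, r, p):
    """P(X = x | r, p) as in (2): x failures before the r-th success (integer r)."""
    return math.comb(r + x - 1, x) * p**r * (1 - p)**x

def nbinom_draw(r, p, u, max_x=100_000):
    """Return x* with P(X <= x*-1) < u <= P(X <= x*) for a uniform(0,1) value u.

    max_x is a safety cap (an assumption of this sketch) in case rounding keeps
    the accumulated CDF just below a value of u extremely close to 1.
    """
    x = 0
    cdf = nbinom_pmf(0, r, p)
    while cdf < u and x < max_x:
        x += 1
        cdf += nbinom_pmf(x, r, p)
    return x

# Illustrative use: 2000 draws from NBin(5, 0.3) with a fixed seed.
rng = random.Random(1)
sample = [nbinom_draw(5, 0.3, rng.random()) for _ in range(2000)]
sample_mean = sum(sample) / len(sample)  # should be near r(1-p)/p
```

The same inequality, read as a statement about p for a fixed observed x, is what defines the fiducial distribution in the next step.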

Let x be an observed value of \(X \sim \mathrm{NBin}(r,p)\). For a given x, the fiducial distribution of p is implicitly determined by

$$\begin{aligned} P(X \le x-1|r,p) < U \le P(X\le x|r,p), \end{aligned}$$
(3)

where U has a uniform(0, 1) distribution. Let \(B_{a,b}\) denote the beta random variable with shape parameters a and b. Using the result (see Patil 1960) that \( P(X \le x|r,p) = P(B_{r,x+1} \le p), \) where \(X \sim \) NBin(r, p), in (3), we see that the fiducial distribution of p is implicitly determined by

$$\begin{aligned} P(B_{r,x} \le p) < U \le P(B_{r,x+1}\le p), \end{aligned}$$
(4)

or equivalently,

$$\begin{aligned} B_{r,x+1;U} \le p < B_{r,x;U}, \end{aligned}$$
(5)

where \(B_{a,b;q}\) denotes the qth quantile of a beta(a, b) distribution. Notice that if \(u_1,\ldots ,u_N\) are random numbers from the uniform(0, 1) distribution, then \(B_{a,b;u_1},\ldots ,B_{a,b;u_N}\) are random numbers from the beta(a, b) distribution. Thus, a fiducial distribution of p lies between the beta\((r,x+1)\) and beta(r, x) distributions.

For a given (x, r), random samples \(\widehat{p}_{u_1},\ldots ,\widehat{p}_{u_N}\) from the fiducial distribution of p are determined by

$$\begin{aligned} B_{r,x+1;u_i} < \widehat{p}_{u_i} \le B_{r,x; u_i}, \quad i=1,\ldots ,N. \end{aligned}$$

As in the binomial case (see Krishnamoorthy and Lee 2010), a random quantity that is “stochastically between” \(B_{r,x+1}\) and \(B_{r,x}\) can be used as a single fiducial variable for p. A simple choice is

$$\begin{aligned} B_{r, x+.5}, \end{aligned}$$
(6)

which stochastically lies between \(B_{r,x+1}\) and \(B_{r,x}\). That is, for any given (r, x),

$$\begin{aligned} B_{r,x+1; U} \le B_{r,x+.5;U} \le B_{r,x;U}\quad \ \ \text{ for } \text{ all } U \in (0,1). \end{aligned}$$
(7)

3 Confidence intervals

3.1 Fiducial confidence interval

For a given confidence coefficient \(1-\alpha \), the lower and upper \(\alpha /2\) quantiles of \(B_{r,x+.5}\) form a \(1-\alpha \) generalized fiducial CI for p. That is, the fiducial CI is given by

$$\begin{aligned} \left( B_{r,x+.5; \alpha /2}, \ B_{r,x+.5;1-\alpha /2}\right) . \end{aligned}$$
(8)

3.2 Exact confidence interval

It follows from (5) that, for a given x,

$$\begin{aligned} (p_L,p_U)= \left( B_{r,x+1; \alpha /2}, \ B_{r,x;1-\alpha /2}\right) \end{aligned}$$
(9)

is also a \(1-\alpha \) fiducial CI for p. Note that the above interval is an observed value of the random interval

$$\begin{aligned} \left( B_{r,X+1; \alpha /2}, \ B_{r,X;1-\alpha /2}\right) , \end{aligned}$$
(10)

where \(X \sim \) NBin(r, p). This random CI is obtained by using the “pivoting a CDF” approach and so it is exact in the frequentist sense; see Theorem 9.2.14 of Casella and Berger (2001) and the paper by Lui (1995). In particular, the endpoints of the exact CI \((p_L, \ p_U)\) are the solutions of

$$\begin{aligned} F_X(x|r,p_L) = \frac{\alpha }{2} \quad \mathrm{and} \quad {\bar{F}}_X(x|r,p_U) = \frac{\alpha }{2}, \end{aligned}$$
(11)

where x is an observed value of \(X \sim \) NBin(r, p), \(F_X(x|r,p)=P(X\le x|r,p)\) and \({\bar{F}}_X(x|r,p)=P(X \ge x|r,p).\) The solutions of the above equations are the endpoints of the CI (9), which can be verified using the distributional results that \( P(X \le x|r,p) = P(B_{r,x+1} \le p) \) and \(P(X \ge x|r,p)=P(B_{r,x} \ge p)\).
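As a rough numerical check of the equations in (11), the endpoints can also be found by bisection on the negative binomial CDF, without any beta quantile routine. The sketch below assumes an integer r; the helper names `nbinom_cdf`, `solve_increasing` and `exact_ci` are hypothetical, and the convention \(p_U = 1\) when \(x=0\) is an assumption made here because \(P(X \ge 0) = 1\) for every p.

```python
import math

def nbinom_cdf(x, r, p):
    """F_X(x | r, p) = P(X <= x); the empty sum gives 0 for x < 0."""
    return sum(math.comb(r + i - 1, i) * p**r * (1 - p)**i for i in range(x + 1))

def solve_increasing(g, target, tol=1e-12):
    """Bisection for g(p) = target on (0, 1), assuming g is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, r, alpha=0.05):
    """Endpoints (p_L, p_U) solving the two equations in (11)."""
    # Lower endpoint: F_X(x | r, p_L) = alpha/2 (the CDF increases with p).
    p_lo = solve_increasing(lambda p: nbinom_cdf(x, r, p), alpha / 2)
    # Upper endpoint: P(X >= x | r, p_U) = alpha/2, i.e. F_X(x-1 | r, p_U) = 1 - alpha/2.
    if x == 0:
        p_hi = 1.0  # assumed convention: the event X >= 0 is certain
    else:
        p_hi = solve_increasing(lambda p: nbinom_cdf(x - 1, r, p), 1 - alpha / 2)
    return p_lo, p_hi

p_lo, p_hi = exact_ci(10, 5)  # illustrative data: x = 10 failures, r = 5 successes
```

By the beta identities quoted above, these roots should agree with the beta quantiles in (9) up to the bisection tolerance.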

Remark 1

Tian et al. (2009) have found an approximation, say, \({\widehat{F}}_X(x|r,p)\), to the distribution function \(F_X(x|r,p)\) using the saddle point approximation, and then determined \(p_L\) and \(p_U\) as solutions of \({{\widehat{F}}}_X(x|r,p_L) = \frac{\alpha }{2}\) and \(\widehat{{\bar{F}}}_X(x|r,p_U) = \frac{\alpha }{2}\), respectively. This approximate CI is not in closed-form and can be obtained only numerically. However, the solutions of the exact method determined by equations in (11) are in closed-form given in (10), and simple to compute. So we will not consider this saddle point approximate CI for further studies.

3.3 Score confidence interval

Let \(\eta = (1-p)/p\). For \(X \sim \) NBin(r, p), \(E(X) = r\eta \). The score CI for p can be obtained from the one for \(\eta \). Noting that \(\widehat{\eta }=X/r\) is an unbiased estimate of \(\eta \) and \({\mathrm{Var}}(\widehat{\eta }) = \eta /(rp)\), we consider the quantity \( {\sqrt{r}(\widehat{\eta }-\eta )}/{ \sqrt{\eta /p}} \), which is asymptotically normally distributed (Wald 1943). Replacing p with the maximum likelihood estimate (MLE) \(\widehat{p} = r/(r+X)\), we find a CI for \(\eta \) on the basis of the result that

$$\begin{aligned} Z_\eta = \frac{\sqrt{r}(\widehat{\eta }-\eta )}{\sqrt{\eta /\widehat{p}}} \sim N(0,1), \ \ \mathrm{asymptotically.} \end{aligned}$$
(12)

Let \(z_{\alpha /2}\) denote the upper \(100\alpha /2\) percentile of the standard normal distribution. Solving the equation \(Z^2_\eta = z^2_{\alpha /2}\) for \(\eta \), we find a \(1-\alpha \) CI for \(\eta \) as

$$\begin{aligned} (L, U) =\widehat{\eta }+ \frac{z^2_{\alpha /2}}{2r\widehat{p}} \pm \frac{z_{\alpha /2}}{r} \sqrt{\frac{z^2_{\alpha /2}}{4\widehat{p}^2}+\frac{X}{\widehat{p}}}. \end{aligned}$$
(13)

A CI for p, deduced from the above CI, is given by \((1/(1+U), 1/(1+L))\).
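The closed form in (13) and the induced interval for p translate directly into code. The following is a minimal sketch using only the Python standard library; the function name `score_ci` is hypothetical, and the truncation of the p-interval to [0, 1] follows the convention stated for all CIs in this section.

```python
from math import sqrt
from statistics import NormalDist

def score_ci(x, r, alpha=0.05):
    """Score CI (13) for eta = (1-p)/p and the deduced CI for p."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    p_hat = r / (r + x)                       # MLE of p
    eta_hat = x / r                           # unbiased estimate of eta
    center = eta_hat + z**2 / (2 * r * p_hat)
    half = (z / r) * sqrt(z**2 / (4 * p_hat**2) + x / p_hat)
    eta_lo, eta_hi = center - half, center + half
    # eta decreases in p, so the endpoints swap roles; truncate to [0, 1].
    p_lo = max(0.0, 1 / (1 + eta_hi))
    p_hi = min(1.0, 1 / (1 + eta_lo))
    return (eta_lo, eta_hi), (p_lo, p_hi)

eta_ci, p_ci = score_ci(10, 5)  # illustrative data: x = 10, r = 5
```

Both endpoints for \(\eta \) are roots of \(Z^2_\eta = z^2_{\alpha /2}\), which gives a quick way to check an implementation.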

3.4 Large sample confidence interval

The large sample CI, proposed in Tian et al. (2009), is based on the asymptotic normality of the MLE of p. The MLE of p is \(\widehat{p} = r/(r+x)\) with variance \(\mathrm{Var}(\widehat{p}) = p^2(1-p)/r\). These results lead to the large sample CI as

$$\begin{aligned} \widehat{p} \pm z_{\alpha /2}\sqrt{\frac{\widehat{p}^2(1-\widehat{p})}{r}}. \end{aligned}$$
(14)

The left endpoints of all CIs are truncated at 0 if they are negative, and the right endpoints are truncated at 1 if they are greater than 1.

3.5 Coverage probabilities and expected widths of confidence intervals

To judge the coverage probabilities and precisions of the exact, fiducial, score and large sample CIs, we computed the coverage probabilities as follows. For a given x of \(X \sim \) NBin(r, p), let \((L(x;r,\alpha ), U(x; r,\alpha ))\) be a \(1-\alpha \) CI for p. Then the coverage probability of the CI can be computed using the negative binomial probabilities as

$$\begin{aligned} \sum _{x=0}^\infty {r+x-1\atopwithdelims ()x}p^r(1-p)^xI[L(x;r,\alpha )\le p \le U(x; r,\alpha )], \end{aligned}$$
(15)

where I[x] is the indicator function. Expected width of the CI can be computed using the above expression with the indicator function replaced by the width \([U(x;r,\alpha ) - L(x; r,\alpha )]\).
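The sum in (15) can be evaluated numerically by truncating it once almost all of the probability mass has been accumulated. The sketch below does this for the large sample CI (14); the names `large_sample_ci` and `coverage_and_width`, the tail tolerance `tail` and the cap `max_x` are assumptions of this illustration, and any function returning an interval for a given (x, r, alpha) can be passed in the same way.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_ci(x, r, alpha=0.05):
    """Interval (14), truncated to [0, 1]."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_hat = r / (r + x)
    half = z * sqrt(p_hat**2 * (1 - p_hat) / r)
    return max(0.0, p_hat - half), min(1.0, p_hat + half)

def coverage_and_width(ci, r, p, alpha=0.05, tail=1e-10, max_x=100_000):
    """Coverage probability (15) and expected width of a CI procedure.

    The infinite sum is truncated once the accumulated NBin(r, p) mass
    exceeds 1 - tail (or after max_x terms).
    """
    cover = width = mass = 0.0
    x = 0
    pmf = p**r                              # P(X = 0)
    while mass < 1 - tail and x < max_x:
        lo, hi = ci(x, r, alpha)
        if lo <= p <= hi:
            cover += pmf
        width += pmf * (hi - lo)
        mass += pmf
        pmf *= (r + x) / (x + 1) * (1 - p)  # P(X = x+1) from P(X = x)
        x += 1
    return cover, width

cover, width = coverage_and_width(large_sample_ci, 5, 0.5)
```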

We computed the coverage probabilities and expected widths of the (i) exact CI, (ii) fiducial CI, (iii) score CI and (iv) large sample CI for \(r=5, 10, 20, 40, 80\) and 120, and plotted them in Fig. 1. Examination of the plots clearly indicates that the large sample CIs are too liberal, having coverage probabilities below 0.80 in many cases; they are liberal even for large values of r. The exact CI is too conservative even for large values of r, or equivalently, for a large expected number of trials. The over-coverage of the exact CI increases with increasing p. The fiducial and score CIs are also somewhat liberal for very small values of r and large values of p; see the plot for \(r=5\). For \(r\ge 10\), the score and fiducial CIs perform very similarly, except for a few cases. For instance, the fiducial CI appears to be more liberal than the score CI for large p, \(r=10\) and 20.

Fig. 1 Coverage probabilities and expected widths of 95% confidence intervals for p

In order to compare the coverage probabilities and expected widths simultaneously, we plotted the expected widths on the right panel in Fig. 1. We first note from Fig. 1 that the expected widths are in agreement with the coverage probabilities of the CIs. The coverage probabilities of the large sample CI are much lower than the nominal level 0.95 and, as a result, these CIs are narrower than the other CIs. The exact CIs are too conservative and so they are wider than the others in all cases. Between the fiducial and score CIs, the former appears to be slightly wider than the latter for p around 0.5; see the plots for \(r=5,10\) and 20. In general, the fiducial and score CIs perform very similarly in terms of coverage probabilities and expected widths, except that the score CI has an edge over the fiducial CI in a few cases. Overall, we see that the score CI followed by the fiducial CI are satisfactory in controlling the coverage probabilities close to the nominal level and maintaining the precision.

4 Confidence intervals for the expected number of trials

Let \(X \sim \mathrm{NBin}(r, p)\). On the basis of (X, r), we would like to estimate the expected number of trials required to observe s successes in a future negative binomial experiment with the same success probability. That is, we would like to find a CI for \(E(s+Y)\), where \(Y\sim \mathrm{NBin}(s, p)\). Since \(E(Y) = s(1-p)/p=s\eta \) and s is known, it is enough to find a CI for \(\eta \). As \(\eta \) is a decreasing function of p, any CI for p can be used to find a CI for the expected number of trials to observe s successes. Let \((p_L,p_U)\) be a \(1-\alpha \) CI for p. Then \(\left( (1-p_U)/p_U, \ (1-p_L)/p_L\right) \) is a \(1-\alpha \) CI for \(\eta \). By using the exact CI for p, we can find the exact CI for \(\eta \). Note that the score CI for \(\eta \) is already defined in (13).
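The mapping from a CI for p to CIs for \(\eta \) and for the expected number of trials \(E(s+Y) = s(1+\eta ) = s/p\) is a one-line transformation. The sketch below uses a hypothetical function name `mean_trials_ci` and purely illustrative endpoint values; it assumes \(p_L > 0\).

```python
def mean_trials_ci(p_lo, p_hi, s):
    """Map a CI (p_lo, p_hi) for p to CIs for eta and for E(s + Y).

    eta = (1-p)/p is decreasing in p, so the upper limit for p gives the
    lower limit for eta and vice versa. Requires p_lo > 0.
    """
    eta_ci = ((1 - p_hi) / p_hi, (1 - p_lo) / p_lo)
    trials_ci = (s * (1 + eta_ci[0]), s * (1 + eta_ci[1]))
    return eta_ci, trials_ci

# Illustrative (not data-based) endpoints: p in (0.2, 0.5) and s = 10
# give eta in about (1, 4) and expected trials in about (20, 50).
eta_ci, trials_ci = mean_trials_ci(0.2, 0.5, 10)
```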

The coverage probabilities of these CIs for \(\eta \) should be similar to those for the CIs for p. In Table 1, we report the coverage probabilities and expected widths of confidence intervals for the mean number of trials required to observe s successes for some values of r, s and confidence level 0.95. We observe from Table 1 that the coverage probabilities of the score CIs are very close to or greater than the nominal level for all the cases. Furthermore, the score CIs are shorter than the corresponding exact CIs for all values of r and s reported in Table 1.

Table 1 Coverage probabilities and expected widths (in parentheses) of 95% CIs for the mean number of trials needed to observe s successes

Remark 2

A CI for the mean of a NBin(r, p) distribution can also be obtained by parameterizing \( \theta = r\) and \(\mu = r(1-p)/p\). In this formulation, the PMF can be written as

$$\begin{aligned} P(X = k) = \frac{\Gamma (\theta +k)}{\Gamma (k+1)\Gamma (\theta )}\left( \frac{\mu }{\mu +\theta }\right) ^k \left( \frac{\theta }{\mu +\theta }\right) ^\theta , \quad k=0,1,2,\ldots . \end{aligned}$$

where \(\theta \) is real positive, and both \(\theta \) and \(\mu \) are unknown. Shilane et al. (2010) have addressed the problem of finding CIs for the mean \(\mu \) based on a sample \(X_1,\ldots ,X_n\) from a distribution with the above PMF and using the above parameterized model (Hilbe 2011). They have provided some CIs based on the result that the sample mean \({{\bar{X}}}\) has a gamma distribution asymptotically. Since \(\sum _{i=1}^nX_i\) has the NB(nr, p) distribution, the asymptotic approaches in Shilane et al. can be used to find a CI for the mean \(r(1-p)/p\). However, to apply our methods, the value of r should be known, and so they are not applicable to find a CI for \(\mu \) in this parameterized model. For large n, as indicated by the CIs based on simulated data given in Table 2, the gamma asymptotic CI and our CIs are in agreement, but they are appreciably different for small to moderate sample sizes.

Table 2 95% confidence intervals based on simulated data from NB(r, p)

5 Prediction intervals for the number of trials

Some of the PIs that we consider below are based on the hypergeometric distribution and for easy reference, we describe the probability mass function (PMF) of the hypergeometric distribution with \(n=\) sample size, \(a=\) number of items with an attribute of interest and \(b=\) the number of items without the attribute, as

$$\begin{aligned} h(x;a, b, n)=P(X = x|a, b, n) = \frac{{a\atopwithdelims ()x}{b\atopwithdelims ()n-x}}{{a+b \atopwithdelims ()n}}, \quad L_x \le x \le U_x, \end{aligned}$$
(16)

where \(L_x = \max \{0, n-b\}\) and \(U_x = \min \{n, a\}.\) Let us denote the PMF of the hypergeometric distribution by h(x; a, b, n) and the cumulative distribution function (CDF) by H(x; a, b, n).

5.1 Exact prediction interval

Let \(X \sim \) NBin(r, p) independently of \(Y \sim \) NBin(s, p). For a given (X, r), we would like to predict Y or, equivalently, the number of trials \(Y+s\) required to have s successes in a future negative binomial sampling with the same success probability p. The conditional PMF can be expressed as

$$\begin{aligned} P(X=x|X+Y=f)= & {} \frac{{x+r-1 \atopwithdelims ()x}{s+f-x-1\atopwithdelims ()f-x}}{{f+r+s-1\atopwithdelims ()f}}\nonumber \\= & {} \frac{s}{f+s-x}h(x; x+r-1, s+f-x, f), \end{aligned}$$
(17)

for \(\max \{0,x-s\} \le x \le \min \{f,x+r-1\}.\) In the above equation, h(x; a, b, n) is the hypergeometric PMF in (16). To find an exact PI for the number of trials required to have s successes, we shall use the approach given in Thatcher (1964) for the binomial case. Following Thatcher’s approach, we find the lower prediction limit for Y as the smallest integer L for which

$$\begin{aligned} P(X \ge x|X+L=f) ={s} \sum _{i = x}^{x+L}\frac{h(i;i+r-1,s+x+L-i,x+L)}{f+s-i} > \alpha . \end{aligned}$$

The upper prediction limit for Y is the largest integer U for which

$$\begin{aligned} P(X\le x|X+U=f) =s\sum _{i=0}^{x}\frac{ h(i;i+r-1,s+x+U-i,x+U)}{f+s-i}>\alpha . \end{aligned}$$

The interval [L, U] is the \((1-2\alpha )\) exact PI for Y.
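The two searches above can be sketched in Python by working directly with the first form of the conditional PMF in (17), taking the conditioning total to be \(f = x+L\) (or \(x+U\)). This is an illustrative reading of the procedure, not the authors' implementation: the function names, the cap `max_y`, and the assumption that \(P(X \le x|X+Y=x+U)\) decreases in U (so the upward search can stop at the first failure) are choices made for this sketch.

```python
from math import comb

def cond_pmf(i, f, r, s):
    """P(X = i | X + Y = f) from the first line of (17), for 0 <= i <= f."""
    return comb(i + r - 1, i) * comb(s + f - i - 1, f - i) / comb(f + r + s - 1, f)

def upper_tail(x, y, r, s):
    """P(X >= x | X + Y = x + y)."""
    f = x + y
    return sum(cond_pmf(i, f, r, s) for i in range(x, f + 1))

def lower_tail(x, y, r, s):
    """P(X <= x | X + Y = x + y)."""
    f = x + y
    return sum(cond_pmf(i, f, r, s) for i in range(0, x + 1))

def exact_pi(x, r, s, alpha=0.025, max_y=100_000):
    """Prediction limits [L, U] for Y; the nominal level is 1 - 2*alpha."""
    # Smallest L with P(X >= x | X + Y = x + L) > alpha.
    lower = 0
    while lower < max_y and upper_tail(x, lower, r, s) <= alpha:
        lower += 1
    # Largest U with P(X <= x | X + Y = x + U) > alpha, searching upward
    # under the assumed monotonicity in U.
    upper = 0
    while upper < max_y and lower_tail(x, upper + 1, r, s) > alpha:
        upper += 1
    return lower, upper

L, U = exact_pi(10, 5, 5)  # illustrative data: x = 10, r = 5, s = 5
```

Adding s to both limits gives the corresponding limits for the number of trials.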

5.2 Fiducial prediction interval

To find a fiducial PI, we shall use the general approach of Wang et al. (2012). For a given (r, x, s), the fiducial PI is based on the predictive distribution which is described by

$$\begin{aligned} \widetilde{Y}|W \sim \mathrm{NBin}(s,W) \quad \mathrm{and} \quad W \sim \mathrm{beta}(r, x+.5), \end{aligned}$$

where beta(a, b) denotes the beta distribution with shape parameters a and b. The probability mass function (PMF) of \(\widetilde{Y}\) can be obtained as

$$\begin{aligned} P(\widetilde{Y}=y)= & {} E_WP(\widetilde{Y}=y|W) \nonumber \\= & {} {s+y-1 \atopwithdelims ()y}\frac{1}{\mathrm{beta}(r,x+.5)}\int _0^1 w^{s+r-1}(1-w)^{y+x+.5-1}dw\nonumber \\= & {} {s+y-1 \atopwithdelims ()y}\frac{\mathrm{beta}(r+s,y+x+.5)}{\mathrm{beta}(r,x+.5)}. \end{aligned}$$
(18)

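The above PMF is called the beta-negative binomial; see Sect. 6.2.3 of Johnson et al. (2005). In the ratio appearing in (18), beta(a, b) stands for the beta function, as the integral in the second line shows.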

5.2.1 Equal-tailed prediction interval

For a given (r, x, s), the lower 100\(\alpha /2\) and the upper 100\(\alpha /2\) percentiles of \(\widetilde{Y}\) form a \(1-\alpha \) fiducial PI for Y. This PI \([\widetilde{L}, \widetilde{U}]\) can be computed as follows. The left endpoint is the smallest integer \(\widetilde{L}\) so that \(\sum _{y=0}^{\widetilde{L}} P(\widetilde{Y} = y) > \alpha /2\) and the right endpoint is the largest integer \(\widetilde{U}\) so that \(\sum _{y=\widetilde{U}}^\infty P(\widetilde{Y} = y) > \alpha /2.\)
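The PMF (18) and the two endpoint rules can be sketched as follows. Log-gamma functions are used to evaluate the binomial coefficient and the beta-function ratio stably; the names `log_beta`, `bnb_pmf`, `equal_tailed_pi` and the cap `max_y` are hypothetical and not taken from the paper.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def bnb_pmf(y, r, x, s):
    """Beta-negative binomial PMF (18): P(Y~ = y) for given (r, x, s)."""
    log_binom = lgamma(s + y) - lgamma(y + 1) - lgamma(s)  # log C(s+y-1, y)
    return exp(log_binom + log_beta(r + s, y + x + 0.5) - log_beta(r, x + 0.5))

def equal_tailed_pi(r, x, s, alpha=0.05, max_y=1_000_000):
    """Equal-tailed fiducial PI for Y."""
    # Left endpoint: smallest L with P(Y~ <= L) > alpha/2.
    lo = 0
    cdf = bnb_pmf(0, r, x, s)
    while cdf <= alpha / 2 and lo < max_y:
        lo += 1
        cdf += bnb_pmf(lo, r, x, s)
    # Right endpoint: largest U with P(Y~ >= U) > alpha/2,
    # equivalently P(Y~ <= U - 1) < 1 - alpha/2.
    hi = 0
    below = 0.0  # running value of P(Y~ <= hi - 1)
    while hi < max_y and below + bnb_pmf(hi, r, x, s) < 1 - alpha / 2:
        below += bnb_pmf(hi, r, x, s)
        hi += 1
    return lo, hi

lo, hi = equal_tailed_pi(10, 20, 10)  # illustrative data: r = 10, x = 20, s = 10
```

Adding s to both endpoints gives the interval for the number of future trials.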

5.2.2 Highest posterior mass prediction interval

The highest posterior mass (HPM) fiducial PIs are constructed by collecting integers with large probability masses according to the predicting distribution. The HPM-PIs are expected to be shorter than equal-tailed PIs. To compute the HPM-PI, let \(y_m\) denote the mode of the predicting distribution. The HPM-PI can be obtained by first adding \(y_m\) to the predicting set S and then adding the integers in decreasing order of their probability mass until \(P(\widetilde{Y} \in \mathsf{S}) \ge 1 - \alpha \).

To find the mode of the distribution of \(\widetilde{Y}\) defined in (18), it can be easily verified that

$$\begin{aligned} \frac{P(\widetilde{Y}=y+1)}{P(\widetilde{Y} = y)} = \frac{(s+y)(y+x+.5)}{(y+1)(r+s+x+y+.5)} >1 \end{aligned}$$

if

$$\begin{aligned} y \le \left\lceil \frac{s(x-1)-(r+x-.5(s-1))}{r+1}\right\rceil = y_m, \end{aligned}$$

where \(\left\lceil x\right\rceil \) is the ceiling function. Thus, the PMF \(P(\widetilde{Y}= y)\) is an increasing function for \(y \le y_m\) and decreasing for \(y > y_m\). Therefore, \(y_m\) is the mode. For R code to compute the HPM-PI, see the appendix.
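For readers without access to the appendix, the construction can be sketched in Python as below. This is not the authors' R code: the names `bnb_pmf`, `bnb_mode` and `hpm_pi` are hypothetical, the mode is set to 0 when the expression for \(y_m\) is negative (an assumption of this sketch), and the set is grown one neighbouring integer at a time, which matches "decreasing order of probability mass" because the PMF is unimodal.

```python
from math import ceil, exp, lgamma

def bnb_pmf(y, r, x, s):
    """Beta-negative binomial PMF (18), evaluated through log-gamma functions."""
    log_binom = lgamma(s + y) - lgamma(y + 1) - lgamma(s)
    log_num = lgamma(r + s) + lgamma(y + x + 0.5) - lgamma(r + s + y + x + 0.5)
    log_den = lgamma(r) + lgamma(x + 0.5) - lgamma(r + x + 0.5)
    return exp(log_binom + log_num - log_den)

def bnb_mode(r, x, s):
    """Mode y_m from the ratio argument; floored at 0 (assumed convention)."""
    t = (s * (x - 1) - (r + x - 0.5 * (s - 1))) / (r + 1)
    return max(0, ceil(t))

def hpm_pi(r, x, s, alpha=0.05):
    """Grow an interval around the mode until it holds mass >= 1 - alpha."""
    lo = hi = bnb_mode(r, x, s)
    mass = bnb_pmf(lo, r, x, s)
    while mass < 1 - alpha:
        left = bnb_pmf(lo - 1, r, x, s) if lo > 0 else -1.0
        right = bnb_pmf(hi + 1, r, x, s)
        if left >= right:       # next-largest mass is on the left
            lo -= 1
            mass += left
        else:                   # next-largest mass is on the right
            hi += 1
            mass += right
    return lo, hi

lo, hi = hpm_pi(10, 20, 10)  # illustrative data: r = 10, x = 20, s = 10
```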

5.3 Joint sampling approach

We now propose a closed-form approximate PI based on the “joint sampling approach” which is similar to the one used to find a confidence interval in a calibration problem (e.g., see Brown 1982, Sect. 1.2). This approach was also used to find an approximate PI for binomial distributions (Krishnamoorthy and Peng 2011). To describe this approach, we first note that \(X \sim \) NBin(r, p) independently of \(Y \sim \) NBin(s, p) and the sum \(X+Y\) also has the NBin\((r+s,p)\) distribution. Let \(\eta =(1-p)/p\). Then \(E\left( \frac{X+Y}{r+s}\right) = \eta \). Let \(\widehat{\eta }_{xy} =\frac{X+Y}{r+s}.\) Consider the quantity

$$\begin{aligned} \frac{rY-sX}{\sqrt{{\mathrm{Var}}(rY-sX)}} = \frac{rY-sX}{\sqrt{rs(r+s)\eta /p}}. \end{aligned}$$

Since \(E(rY-sX) =0\), by the Wald result, the above quantity follows the standard normal distribution asymptotically. Replacing \(\eta \) with \(\widehat{\eta }_{xy}\) and p with the MLE \(\widehat{p} = r/(r+X)\) in the above expression, we see that

$$\begin{aligned} C_{X} \frac{(sX-rY)^2}{X+Y}\sim Z^2, \quad \mathrm{asymptotically}, \end{aligned}$$

where \(Z \sim N(0,1)\) and \(C_{X} = 1/[s(r+X)].\) Let \(z_{\alpha /2}\) denote the upper \(100\alpha /2\) percentile of the standard normal distribution. Then solving the equation

$$\begin{aligned} C_{X} \frac{(sX-rY)^2}{X+Y} = z^2_{\alpha /2} \end{aligned}$$

for Y, we find the roots as

$$\begin{aligned} (L, U) = s\left( \frac{X}{r} + \frac{z^2_{\alpha /2}}{2r^2}(r+X)\right) \mp \frac{z_{\alpha /2}}{r}\sqrt{s(r+X)}\sqrt{s\left( \frac{X}{r} + \frac{z^2_{\alpha /2}}{4r^2}(r+X)\right) +X}. \end{aligned}$$

The \(1-\alpha \) PI for the number of trials required to have s successes in a future negative binomial experiment with success probability p is given by

$$\begin{aligned} \left[ \lceil L+s \rceil , \lfloor U+s\rfloor \right] , \end{aligned}$$
(19)

where \(\lceil x\rceil \) and \(\lfloor x \rfloor \) are the ceiling and floor functions, respectively.
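Because the JS-PI is in closed form, it takes only a few lines to compute. The sketch below follows the expressions for (L, U) and (19) term by term; the function name `js_pi` is hypothetical, and no truncation of the lower limit is applied beyond what (19) states.

```python
from math import ceil, floor, sqrt
from statistics import NormalDist

def js_pi(x, r, s, alpha=0.05):
    """Joint-sampling PI: roots (L, U) for Y and the interval (19) for trials."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point of N(0, 1)
    center = s * (x / r + z**2 * (r + x) / (2 * r**2))
    half = (z / r) * sqrt(s * (r + x)) * sqrt(
        s * (x / r + z**2 * (r + x) / (4 * r**2)) + x
    )
    low, up = center - half, center + half
    # Interval (19) for the number of trials needed to see s successes.
    return (low, up), (ceil(low + s), floor(up + s))

roots, trials_pi = js_pi(20, 10, 10)  # illustrative data: x = 20, r = 10, s = 10
```

A simple check is that both roots satisfy \(C_X (sX-rY)^2/(X+Y) = z^2_{\alpha /2}\) with \(C_X = 1/[s(r+X)]\).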

5.4 Coverage probabilities and expected widths of prediction intervals

For a given \((r,s,p,\alpha )\), the exact coverage probability of a PI \([L(x,r,s,\alpha ), \ U(x,r,s,\alpha )]\) can be evaluated using the expression

$$\begin{aligned} \sum _{x=0}^\infty \sum _{y=0}^\infty {r+x-1 \atopwithdelims ()x}{s+y-1 \atopwithdelims ()y}p^{r+s}(1-p)^{x+y} I[L(x,r,s,\alpha )\le y \le U(x,r,s,\alpha )], \end{aligned}$$
(20)

where I[x] is the indicator function. The coverage probabilities of a good PI should be close to the nominal level. The above expression with the indicator function replaced by \(U(x,r,s,\alpha )-L(x,r,s,\alpha )\) can be used to compute the expected width.

We evaluated the coverage probabilities of the exact, equal-tailed, HPM prediction intervals and the PI based on the joint sampling approach (JS-PI) for some values of (r, s) at the confidence level 0.95. These coverage probabilities are plotted in Fig. 2. We first observe from these plots that the exact PIs are too conservative for all values of (r, s). The HPM PI is also conservative but less conservative than the exact one. The equal-tailed PI is liberal for some values in the parameter space; see the plots for \((r,s)=(10,10)\) and (10, 20). The JS-PI is also slightly liberal for the values of p near zero, but it maintains the coverage probability very close to the nominal level for most of the cases.

As the magnitudes of the coverage probability and the expected width of a PI are quite different, joint plots of such values are less informative. Instead, we tabulated the coverage probabilities and corresponding expected widths of all PIs for some values of p, r and s in Table 3. We first observe that the exact PI is too conservative and so it is wider than the others for all the cases reported in Table 3. Between the equal-tailed PI and the HPM prediction interval, the latter has better coverage probabilities than the former for all the cases. So the HPM-PI is preferable to the equal-tailed PI. The PI by the joint sampling approach (JS-PI) is shorter than the other PIs, and it also has coverage probabilities close to the nominal level for all the cases. We also note that between the HPM PI and the JS-PI, the latter is simpler to compute.

Fig. 2 Coverage probabilities of 95% prediction intervals for NB(s, p) distributions

Table 3 Coverage probabilities and expected widths of 95% prediction intervals

6 Tolerance intervals

Let \(X \sim \) NBin(r, p). On the basis of (X, r), we would like to find an equal-tailed tolerance interval (TI) for a NBin(s, p) distribution. Let \(\kappa _{q}(p;s)\) denote the qth quantile of a NBin(s, p) distribution. A \(\gamma \) content and \(1-\alpha \) coverage equal-tailed TI or simply a \((\gamma ,1-\alpha )\) equal-tailed TI [L(X, r, s), U(X, r, s)] for a NBin(s, p) distribution is constructed so that it includes the interval \(\left[ \kappa _{\frac{1-\gamma }{2}}(p;s), \kappa _{\frac{1+\gamma }{2}}(p;s)\right] \) with confidence \(1-\alpha \); see Chap. 1 of Krishnamoorthy and Mathew (2009). That is,

$$\begin{aligned} P\left\{ L(X,r,s) \le \kappa _{\frac{1-\gamma }{2}} \ \mathrm{and} \ \kappa _{\frac{1+\gamma }{2}} \le U(X,r,s) \right\} = 1-\alpha . \end{aligned}$$
(21)

Note that 100\(\gamma \) percent of the NBin(s, p) distribution falls in the interval \(\left[ \kappa _{\frac{1-\gamma }{2}}(p;s), \kappa _{\frac{1+\gamma }{2}}(p;s)\right] \). So a \(1-\alpha \) CI for this interval will include at least 100\(\gamma \) percent of the distribution with confidence \(1-\alpha .\)

6.1 Tolerance intervals based on different confidence intervals

A CI for \(\left[ \kappa _{\frac{1-\gamma }{2}}(p;s), \ \kappa _{\frac{1+\gamma }{2}}(p;s)\right] \) can be found using the confidence limits for p based on \(X \sim \) NBin(r, p). Since the negative binomial distribution is stochastically decreasing in p, the quantile \(\kappa _q(p;s)\) is a decreasing function of p. Using this fact, we see that

$$\begin{aligned} \kappa _{\frac{1-\gamma }{2}}(p_U;s) \le \kappa _{\frac{1-\gamma }{2}}(p;s) \quad \mathrm{and} \quad \kappa _{\frac{1+\gamma }{2}}(p;s) \le \kappa _{\frac{1+\gamma }{2}}(p_L;s) \ \text{ with } \text{ probability } 1-\alpha \text{, } \end{aligned}$$

where \((p_L, p_U)\) is a \(1-\alpha \) CI for p based on \(X \sim \) NBin(r, p). In other words,

$$\begin{aligned} \left[ \kappa _{\frac{1-\gamma }{2}}(p_U;s), \kappa _{\frac{1+\gamma }{2}}(p_L;s)\right] \end{aligned}$$
(22)

is a \((\gamma , 1-\alpha )\) equal-tailed TI for the NBin(s, p) distribution.

Remark 3

The approach of finding a TI given in Young (2011) is essentially the same as the above approach. Mathew and Young (2013) have also proposed a TI on the basis of fiducial approach which is equivalent to the above TI based on the fiducial CI for p. We follow the above approach as it is simple and is easy to implement in R as shown below.

The properties of the TIs defined above are similar to those of the CIs. For example, if \((p_L,p_U)\) is an exact CI for p based on \(X \sim \) NBin(r, p), then the TI defined above is an exact TI in the sense that the coverage probabilities are always greater than or equal to \(1-\alpha \) for all p. We also note that these TIs are easy to compute using the quantile function available in software packages. For example, after finding the CI \((p_L,p_U)\), the R function qnbinom(q, s, p) can be used to find the quantiles. Specifically, the \((\gamma , 1-\alpha )\) TI can be computed as

$$\begin{aligned}{}[\texttt {qnbinom}((1-\gamma )/2,s, p_U), \ \ \texttt {qnbinom}((1+\gamma )/2,s, p_L)]. \end{aligned}$$
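Outside R, the same computation only needs a negative binomial quantile function. The Python sketch below implements one by accumulating the PMF, taking the quantile as the smallest k with \(P(Y \le k) \ge q\), which is how the qnbinom documentation describes its return value. The names `nbinom_quantile` and `tolerance_interval`, the cap `max_k`, and the CI endpoints used in the example are hypothetical illustrations.

```python
def nbinom_quantile(q, s, p, max_k=10_000_000):
    """Smallest k with P(Y <= k) >= q for Y ~ NBin(s, p) (failures before the s-th success)."""
    k = 0
    pmf = p**s          # P(Y = 0)
    cdf = pmf
    while cdf < q and k < max_k:
        pmf *= (s + k) / (k + 1) * (1 - p)   # P(Y = k+1) from P(Y = k)
        k += 1
        cdf += pmf
    return k

def tolerance_interval(p_lo, p_hi, s, gamma=0.90):
    """Equal-tailed TI (22) built from a CI (p_lo, p_hi) for p."""
    # Quantiles decrease in p: the lower quantile uses p_hi, the upper uses p_lo.
    return (nbinom_quantile((1 - gamma) / 2, s, p_hi),
            nbinom_quantile((1 + gamma) / 2, s, p_lo))

# Illustrative (not data-based) CI endpoints for p.
ti = tolerance_interval(0.2, 0.4, 10, gamma=0.90)
```

Whether the resulting TI is exact, score-based or large-sample depends only on which CI for p supplies `p_lo` and `p_hi`.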

Remark 4

Cai and Wang (2009) have proposed first-order and second-order probability matching tolerance intervals for discrete distributions in exponential families, which include binomial, Poisson and negative binomial distributions. The two-sided TI proposed in Cai and Wang’s paper is determined so that

$$\begin{aligned} P_{\varvec{X}} \left\{ F(U(\varvec{X}))-F(L(\varvec{X})) \ge \gamma \right\} = 1-\alpha , \end{aligned}$$

where F(x) denotes the CDF of a NBin(r, p) distribution. It is important to note that Cai and Wang (2009) have defined the random variable X as the number of successes until the rth failure, which is different from our commonly used definition. This two-sided TI is expected to be shorter than an equal-tailed TI, because the latter is constructed to include the lower and upper \(100(1+\gamma )/2\) percentiles. That is, an equal-tailed \((\gamma ,1-\alpha )\) TI not only includes at least 100\(\gamma \)% of the population, but also includes the lower and upper \(100(1+\gamma )/2\) percentiles, whereas a two-sided TI is constructed just to include at least 100\(\gamma \)% of the population. Our coverage studies indicated that the Cai and Wang TIs are satisfactory for large r and \(.05 \le p \le .95\). They are not satisfactory for values of p near the boundaries even when r is large. For example, for \(r=50\), the coverage probabilities of (.90, .95) two-sided TIs for a NBin(50, p) distribution at \(p=.01,.02,.03,.04\) and .05 are .303, .550, .707, .848 and .902, respectively; at \(p=.97,.98\) and .99, they are .909, .866 and .781, respectively. So these two-sided TIs should be used with caution, and further research is needed to improve these TIs.

6.2 Coverage probabilities and expected widths of tolerance intervals

To judge the coverage probabilities and expected widths of the TIs, we computed the exact coverage probability of a TI \(\left[ \kappa _{\frac{1-\gamma }{2}}(p_U;s), \kappa _{\frac{1+\gamma }{2}}(p_L;s)\right] \) using the following expression:

$$\begin{aligned}&\sum _{x=0}^\infty {r+x-1 \atopwithdelims ()x}p^r(1-p)^xI\left[ \kappa _{\frac{1-\gamma }{2}}(p_U;s) \le \kappa _{\frac{1-\gamma }{2}}(p;s) \ \mathrm{and}\right. \nonumber \\&\quad \left. \kappa _{\frac{1+\gamma }{2}}(p;s) \le \kappa _{\frac{1+\gamma }{2}}(p_L;s)\right] , \end{aligned}$$
(23)

where I[x] is the indicator function.
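The following Python sketch shows one way expression (23) might be evaluated numerically. It is only an illustration under stated assumptions: the exact CI is taken to be the usual tail-inversion interval (solved here by bisection rather than by beta quantiles), the infinite sum is truncated once the remaining NBin(r, p) probability mass is below a small tolerance, and the helper names (`nbinom_cdf`, `nbinom_quantile`, `exact_ci`, `ti_coverage`) are hypothetical and not taken from the paper.

```python
def nbinom_cdf(x, size, p):
    """P(X <= x), where X is the number of failures before the size-th success."""
    if x < 0:
        return 0.0
    term = p ** size                      # P(X = 0)
    total = term
    for k in range(x):
        term *= (size + k) / (k + 1) * (1.0 - p)   # P(X = k + 1) from P(X = k)
        total += term
    return min(total, 1.0)


def nbinom_quantile(prob, size, p):
    """Smallest x with P(X <= x) >= prob (plain recursion; small sizes only)."""
    term = p ** size
    cdf, x = term, 0
    while cdf < prob:
        term *= (size + x) / (x + 1) * (1.0 - p)
        cdf += term
        x += 1
    return x


def solve_p(target, x, r, tol=1e-10):
    """Bisection for p with P(X <= x | r, p) = target; the CDF increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nbinom_cdf(x, r, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_ci(r, x, conf):
    """Assumed tail-inversion exact CI for p based on x failures and r successes."""
    alpha = 1.0 - conf
    p_low = solve_p(alpha / 2.0, x, r)
    p_up = 1.0 if x == 0 else solve_p(1.0 - alpha / 2.0, x - 1, r)
    return p_low, p_up


def ti_coverage(r, s, p, gamma, conf, tail=1e-10):
    """Truncated version of (23): coverage of the equal-tailed TI at a given p."""
    q_lo, q_hi = (1.0 - gamma) / 2.0, (1.0 + gamma) / 2.0
    k_lo_true = nbinom_quantile(q_lo, s, p)     # true lower quantile of NBin(s, p)
    k_hi_true = nbinom_quantile(q_hi, s, p)     # true upper quantile of NBin(s, p)
    covered, mass, x = 0.0, 0.0, 0
    pmf = p ** r                                # P(X = 0) under NBin(r, p)
    while 1.0 - mass > tail:
        p_low, p_up = exact_ci(r, x, conf)
        if (nbinom_quantile(q_lo, s, p_up) <= k_lo_true
                and k_hi_true <= nbinom_quantile(q_hi, s, p_low)):
            covered += pmf
        mass += pmf
        pmf *= (r + x) / (x + 1) * (1.0 - p)
        x += 1
    return covered


cov = ti_coverage(r=20, s=10, p=0.3, gamma=0.90, conf=0.95)
print(f"coverage of the (.90, .95) exact TI at p = 0.3: {cov:.4f}")
```

Swapping `exact_ci` for another CI routine would give the corresponding coverage of the score or large sample TI; those CIs are not reproduced here.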

We evaluated coverage probabilities of the following equal-tailed TIs: (i) the interval in (22) with the exact CI (10) for p (referred to as the exact TI), (ii) the interval in (22) with the score CI deduced from (13) for p (referred to as the score TI), and (iii) the interval in (22) with the large sample CI (14) for p (referred to as the large sample TI). The coverage probabilities of (.90, .95) TIs were computed for \((r,s) = (20,10), (20,30), (30,10), (30,20), (50,10)\) and (50,40). These coverage probabilities are plotted in Fig. 3. We observe from these plots that the large sample TIs are, in general, too liberal, with coverage probabilities much smaller than the nominal level .95. Both the score TI and the exact TI are conservative, although in some cases the score TIs are less conservative than the exact TIs. For example, see the cases \((r,s) = (20,10), (20,30), (30,20)\) and (50,40).

We further compared the TIs with respect to expected widths. We tabulated some summary statistics of the expected widths of the exact and score TIs in Table 4. The summary statistics are based on expected widths for \(p = .001(.001).999.\) We considered only these two TIs because the third one, the large sample TI, is too liberal. The summary statistics of the expected widths of these two TIs clearly indicate the score TI is narrower than the exact TI for all the cases. This comparison result was anticipated because the exact TIs are more conservative than the score TIs.

Fig. 3 Coverage probabilities of (.90, .95) tolerance intervals for NBin(s, p) distributions

Table 4 Summary statistics of expected widths of (.90, .95) tolerance intervals for NBin(s, p) based on \(X\sim \) NBin(r, p)

7 Examples

Example 1

A study was conducted to find the association between maternal congenital heart disease and low birthweights of infants (Kikuchi 1987). A negative binomial sampling plan was used to recruit pregnant mothers until \(r = 5\) mothers with congenital heart disease were observed. The 5th mother with congenital heart disease was observed at the 146th selection. Thus, the number of normal mothers is \(x=141\) and the number of mothers with congenital heart disease is \(r=5\). Tian et al. (2009) have used these results to illustrate different interval estimation methods for the proportion p of mothers with congenital heart disease. Young (2014) has used the data to construct a two-sided tolerance interval for future samples of pregnant mothers.

We computed the 95% CIs for the proportion p of the mothers with the congenital heart disease using different methods described earlier, and presented them in Table 5. Note that the uniformly minimum variance unbiased estimate for p is \((r-1)/(r+x-1)= 4/145=.0276\), which indicates that the population proportion is likely to be small. For such cases, all interval estimation methods are satisfactory and similar in terms of coverage probabilities and expected widths; see Fig. 1. So, for this example, all the methods produced CIs that are not much different.

Table 5 95% confidence intervals for p and 95% prediction intervals

Suppose it is desired to estimate the expected number of pregnant mothers to be examined in order to find 15 mothers with congenital heart disease. The 95% CI based on the exact CI for p is (202, 1323). That is, on average, 202 to 1323 mothers are to be examined in order to find 15 mothers with congenital heart disease. The 95% CI based on the score method is [178, 1004], which is much shorter than the exact CI.
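As a rough check on the exact interval quoted above, the sketch below computes a tail-inversion exact CI for p by bisection and carries its limits through the NBin(s, p) mean \(s(1-p)/p\). Two points are assumptions rather than statements from the paper: that the exact CI is the usual tail-inversion interval, and that the quoted limits refer to the mean \(s(1-p)/p\) (the expected number of mothers without the disease); the second is only inferred from the fact that this mapping gives values close to (202, 1323). The function names are hypothetical.

```python
def nbinom_cdf(x, size, p):
    """P(X <= x), where X is the number of failures before the size-th success."""
    if x < 0:
        return 0.0
    term = p ** size                      # P(X = 0)
    total = term
    for k in range(x):
        term *= (size + k) / (k + 1) * (1.0 - p)   # P(X = k + 1) from P(X = k)
        total += term
    return min(total, 1.0)


def solve_p(target, x, r, tol=1e-12):
    """Bisection for p with P(X <= x | r, p) = target; the CDF increases with p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if nbinom_cdf(x, r, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def exact_ci(r, x, conf):
    """Assumed tail-inversion exact CI for p based on x failures and r successes."""
    alpha = 1.0 - conf
    p_low = solve_p(alpha / 2.0, x, r)
    p_up = 1.0 if x == 0 else solve_p(1.0 - alpha / 2.0, x - 1, r)
    return p_low, p_up


def mean_failures_ci(s, p_low, p_up):
    """Map CI limits for p to limits for the NBin(s, p) mean s(1 - p)/p."""
    # The mean decreases in p, so the upper limit for p gives the lower limit here.
    return s * (1.0 - p_up) / p_up, s * (1.0 - p_low) / p_low


r, x, s = 5, 141, 15
p_low95, p_up95 = exact_ci(r, x, 0.95)
mean_lo, mean_hi = mean_failures_ci(s, p_low95, p_up95)
print(f"95% exact CI for p: ({p_low95:.4f}, {p_up95:.4f})")
print(f"implied CI for s(1-p)/p: ({mean_lo:.0f}, {mean_hi:.0f})")
```

Dividing s by the CI limits instead, which would give the expected total number of mothers examined, produces a noticeably different interval, which is why the mean of the number of failures is used in this sketch.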

To illustrate the construction of the PIs, we computed 95% PIs for the number of pregnant mothers to be examined in order to find \(s=15\) pregnant mothers with congenital heart disease. PIs based on various methods are given in Table 5. The PIs for the number of pregnant mothers to be examined to observe 15 mothers with congenital heart disease are quite different. Among all the PIs, the score PI [140, 1073] is the shortest; it means that 140 to 1073 pregnant mothers need to be examined to find 15 pregnant mothers with congenital heart disease.

Suppose it is desired to find an interval that, with 90% confidence, would contain the number of healthy pregnant mothers in 90% of all future samples if a similar study were to be performed; that is, a study designed to capture a target number of \(s=5\) mothers with congenital heart disease. Using \((r,x)=(5,141)\), we computed the 90% exact CI (9) as (.0136, .0620). On the basis of this exact CI, the (.90, .90) TI was computed using the R function as

$$\begin{aligned}{}[\texttt {qnbinom}(.05,5,.0620), \texttt {qnbinom}(.95,5,.0136)] = [28, 666]. \end{aligned}$$

Similarly, using the score CI (13), the (.90, .90) TI was computed as [25, 539]. The (.90, .90) TIs for the total sample sizes under a negative binomial sampling scheme to observe \(s=5\) mothers with congenital heart disease can be obtained by adding five to these TIs.
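For readers without R, the quantile step behind the exact TI above can be sketched in plain Python. The function `nbinom_quantile` below is a hypothetical stand-in for `qnbinom`, written as a simple recursion over the negative binomial probabilities; it is intended for small s only, since `p ** size` underflows for very large sizes. Because the CI limits are entered as the rounded values .0136 and .0620, the result may differ by a unit from the reported [28, 666].

```python
def nbinom_quantile(prob, size, p):
    """Smallest x with P(X <= x) >= prob, X = failures before the size-th success."""
    if not (0.0 < prob < 1.0 and 0.0 < p <= 1.0):
        raise ValueError("need 0 < prob < 1 and 0 < p <= 1")
    term = p ** size                      # P(X = 0)
    if term == 0.0:
        raise ArithmeticError("p ** size underflowed; use a log-scale method")
    cdf = term
    x = 0
    while cdf < prob:
        # P(X = x + 1) = P(X = x) * (size + x) / (x + 1) * (1 - p)
        term *= (size + x) / (x + 1) * (1.0 - p)
        cdf += term
        x += 1
    return x


def tolerance_interval(gamma, s, p_low, p_up):
    """Equal-tailed TI: lower quantile at p_up, upper quantile at p_low."""
    lower = nbinom_quantile((1.0 - gamma) / 2.0, s, p_up)
    upper = nbinom_quantile((1.0 + gamma) / 2.0, s, p_low)
    return lower, upper


s = 5
ti_failures = tolerance_interval(0.90, s, p_low=0.0136, p_up=0.0620)
# Adding s converts a TI for the number of failures into a TI for the total sample size.
ti_total = (ti_failures[0] + s, ti_failures[1] + s)
print("TI for healthy mothers:", ti_failures)
print("TI for total sample size:", ti_total)
```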

Example 2

The American Community Survey (ACS) collects data to help local officials, community leaders and businesses to understand the changes taking place in their communities and regions. The data are collected annually from all counties and county equivalents for various purposes. Data were first collected by self-enumeration via mailback. The completed ACS captures roughly 65% of housing units (HUs) and 75% of group quarters (GQs) originally listed in the sample. For more details on data collection, see Young (2014) who used the data to demonstrate the construction of tolerance intervals for a negative binomial distribution. The following Table 6, taken from Young (2014), shows the initial sample sizes and final interview sizes for HUs and GQs during the years 2006–2010.

Table 6 The American Community Survey data from 2006 to 2010

It is quite common in mail-in surveys that the final sample size is considerably smaller than the initial sample size. Young (2014) has used the average initial sample size as the total number of negative binomial trials required to obtain the average final interview size. The averages are given in the last row of Table 6. For example, Young has used \(r= 1,957,479\) and \(x = 2,928,714-1,957,479 = 971,235\) to construct a TI for HUs. Because negative binomial distributions have the additive property, we can instead use the total initial sample size 14,643,569 as \(r+x\) and the total final interview size 9,787,393 as r.

7.1 Prediction intervals

For a future survey with a target size of 2,000,000, we would like to predict the initial sample size for HUs with 90% confidence. In our present notation, \(s=2,000,000\), \(r+x=14,643,569\) and \(r=9,787,393\), and we would like to find a 90% PI for \(s+Y\), where \(Y \sim \) NBin(s, p). The PIs based on all four methods are reported in Table 7. For GQs, we would like to predict the initial sample size needed to obtain 200,000 final interviews. In this case, \(r+x=964,045\), \(r=728,740\) and \(s=200,000\). Using these data, we computed 90% PIs for the initial sample size and reported them in Table 7. All four methods produced similar PIs for both cases. To interpret the PI, we note that the exact PI for GQs is \([264,036, \ 265,122]\). This means that an initial sample size of 264,036 to 265,122 is needed to obtain the final target size of 200,000 responses. Other PIs can be interpreted similarly.

Table 7 90% prediction intervals based on the ACS data

7.2 Tolerance intervals

Based on the averages of the initial sample and final interview sizes, Young (2014) has estimated a (.99, .95) TI intended to contain 99% of all initial sample sizes required when a target final sample size of 2,000,000 is sought for HUs. Using the large sample CI, the (.99, .95) TI for the HUs was computed as [2,986,789, 2,997,894]. We computed the same TI using the total initial sample size and the total final interview size. On the basis of totals, the TIs based on the exact CI, the score CI and the large sample CI are the same, namely [2,988,119, 2,996,556]. This means that, with 95% confidence, at least 99% of the initial sample sizes needed to produce a target response size of 2,000,000 fall in this interval. Young (2014) has also estimated a (.99, .95) TI for the initial sample sizes required when a target final sample size of 200,000 is sought for GQs. Using the large sample CI based on average sizes, Young has computed the (.99, .95) TI as [263,163, 266,011]. On the basis of the total sizes, we find that the TIs based on the exact, score and large sample CIs are the same, namely [263,530, 265,636]. That is, with confidence 0.95, at least 99% of the initial sample sizes needed to produce 200,000 final responses lie between 263,530 and 265,636.

Finally, we note that the TIs based on the total sizes are shorter than the corresponding TIs based on the average sizes for both HUs and GQs. Furthermore, all methods produced the same TI because the sample sizes are very large.
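The comparison of widths can be reproduced with simple arithmetic on the figures quoted in this example. The sketch below also computes a plug-in point estimate of the required initial sample size, \(s/\hat{p}\) with \(\hat{p}=r/(r+x)\) taken from the pooled totals; this plug-in estimate is an illustrative assumption, not a quantity reported in the paper, and the dictionary and function names are hypothetical.

```python
# Pooled 2006-2010 totals and target sizes quoted in the text.
acs_totals = {
    "HU": {"initial": 14_643_569, "final": 9_787_393, "target": 2_000_000},
    "GQ": {"initial": 964_045, "final": 728_740, "target": 200_000},
}

# (.99, .95) tolerance intervals quoted in the text.
reported_ti = {
    "HU": {"totals": (2_988_119, 2_996_556), "averages": (2_986_789, 2_997_894)},
    "GQ": {"totals": (263_530, 265_636), "averages": (263_163, 266_011)},
}


def summarize(unit):
    """Plug-in estimate of the required initial size and the two TI widths."""
    data = acs_totals[unit]
    p_hat = data["final"] / data["initial"]          # pooled response proportion
    needed = data["target"] / p_hat                  # plug-in estimate s / p_hat
    widths = {basis: hi - lo for basis, (lo, hi) in reported_ti[unit].items()}
    return p_hat, needed, widths


summary = {unit: summarize(unit) for unit in acs_totals}
for unit, (p_hat, needed, widths) in summary.items():
    print(f"{unit}: p_hat = {p_hat:.4f}, plug-in initial size = {needed:,.0f}, "
          f"TI width (totals) = {widths['totals']:,}, "
          f"TI width (averages) = {widths['averages']:,}")
```

In both cases the plug-in estimate falls inside the reported TIs, and the TI based on totals is the narrower of the two.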

8 Concluding remarks

The classical exact methods for discrete distributions are known to be too conservative, producing confidence intervals and prediction intervals that are unnecessarily wide or tests that are less powerful. For the binomial and Poisson distributions, Agresti and Coull (1998), Brown et al. (2001) and many other authors have recommended alternative approximate approaches for constructing confidence intervals and prediction intervals with satisfactory coverage probabilities and good precision. In this article, we have provided similar approximate methods for constructing CIs, PIs and TIs for negative binomial distributions. Furthermore, we showed that the approximate CIs and PIs have good coverage properties with expected widths narrower than those of the exact CIs and PIs. In terms of simplicity and accuracy, the PI based on the joint sampling approach, the score CI and the TI based on the score CI are preferable to others. These statistical intervals are not only easy to compute, but also safe to use in practical applications.