1 Introduction

A confidence distribution (CD) refers to a sample-dependent distribution function that can represent confidence intervals (regions) of all levels for a parameter of interest [74, 90]. Instead of the usual point estimator or confidence interval, a CD is a distribution estimator of a parameter of interest with a purely frequentist interpretation. The development of the CD can be traced back to, for example, [16, 28, 47, 66]. However, its associated inference schemes and applications did not receive much attention until the recent surge of interest in research on CDs and their applications [25, 46, 52, 53, 72,73,74, 77, 78, 82, 90, 91, 94]. All of these developments of CDs, along with a modern definition and interpretation, provide a powerful inferential tool for statistical inference.

One of the main contributions of the CD is its application to fusion learning [12, 15, 40, 51,52,53, 72, 75, 77, 81, 91, 92]. Combining CDs from independent studies naturally preserves more information from the individual studies than the traditional approach of combining only point estimators. A unified framework of combining CDs for fusion learning generally includes three steps: (1) using a CD to summarize relevant information or obtain an inference result from each study, (2) combining information from different sources or studies by combining these CDs, and (3) making inference via the combined CD. This approach has sound theoretical support and has been applied to many practical situations with much success.

On a different note, the fiducial distribution may be considered as one special type of CD, which provides a systematic way to obtain a CD. The origin of fiducial inference can be traced back to R.A. Fisher [28] who introduced the concept of a fiducial distribution for one parameter and proposed the use of this fiducial distribution to avoid the problems related to the choice of a prior distribution. Since the mid-2000s, there has been a renewed interest in modifications of fiducial inference [2, 7, 8, 22, 24, 30, 31, 33,34,35,36,37, 41, 56, 57, 59,60,61,62,63, 68, 73, 74, 79, 85, 87, 90, 93, 96].

We briefly overview these modern approaches which extend Fisher’s original fiducial argument. We then focus on a recent development termed generalized fiducial inference and its applications [14, 17, 37, 41, 42, 44, 49, 50, 65, 86, 88, 89] that greatly expand the applicability of fiducial ideas. We demonstrate this recipe on several examples of varying complexity. The statistical procedures derived by the generalized fiducial inference often have very good performance from both theoretical and numerical points of view.

2 Confidence Distribution

2.1 The Concept of CD

This section mainly focuses on the concept of the CD. The CD can be viewed as a distribution estimator, which can be utilized for constructing statistical procedures such as point estimates, confidence intervals, and hypothesis tests. The basic notion of CDs is related to the fiducial distribution of [28]; however, it is a purely frequentist concept. Some have suggested viewing the CD as a frequentist analog of the Bayesian posterior distribution [e.g., 73, 74]. More broadly, if the credible intervals or regions obtained from a Bayesian posterior match frequentist intervals or regions (either exactly or asymptotically), then the Bayesian posterior can be viewed as a CD, and thus the Bayesian approach is also a way to obtain a CD [90].

Suppose X1, X2, …, Xn are independent and identically distributed and \(\mathcal X\) is the sample space corresponding to the dataset (X1, X2, …, Xn). Let θ be a scalar parameter of interest and Θ be the parameter space. The following formal definitions of CD and asymptotic CD are proposed in [72, 77].

Definition 29.2.1 (CD and Asymptotic CD)

A function Hn(⋅) = Hn(x, ⋅) on \(\mathcal X \times \Theta \rightarrow [0,1]\) is called a CD for a parameter, if (1) for each given \(x \in \mathcal X\), Hn(⋅) is a (continuous) cumulative distribution function on Θ and (2) at the true parameter value θ = θ0, Hn(θ0) ≡ Hn(x, θ0), as a function of the sample x, follows the uniform distribution U(0, 1). In addition, the function Hn(⋅) is called an asymptotic CD if condition (2) is replaced by (2’) at the true parameter θ = θ0, \(H_n(\theta _0) \overset {d}{\to } U(0,1)\) as \(n \rightarrow \infty \).

From a nontechnical point of view, a CD is a function of both the parameter and the sample which satisfies two conditions. The first condition basically states that for any fixed sample, a CD is a distribution function on the parameter space. The second condition essentially requires that the corresponding inference derived by a CD has desired frequentist properties. Section 29.2.2 will further discuss how to use the second condition to extract information from a CD to make inference.

Birnbaum [9] introduced the concept of confidence curve as “an omnibus technique for estimation and testing statistical hypotheses,” which was independent of the development of CD. From a CD Hn(θ), the confidence curve can be written as

$$\displaystyle \begin{aligned}CV_n(\theta)=2\min\{ H_n(\theta),1-H_n(\theta)\}. \end{aligned}$$

Indeed, the confidence curve is an alternative expression of a CD, and it is a very useful graphical tool for visualizing CDs. On a plot of CVn(θ) versus θ, a horizontal line drawn at any significance level α, 0 < α < 1, intersects the confidence curve at two points, and these two points correspond to a 1 − α level, equal-tailed, two-sided confidence interval for θ. In addition, the maximum of a confidence curve is attained at the median of the CD, which is the recommended point estimator.

We present below five illustrative examples of CDs. For more examples, we refer to [74, 77, 90].

Example 29.2.1

Suppose the data Xi ∼ N(μ, 1), i = 1, …, n, with unknown μ. Let \(\bar x_n\) denote the sample mean. Then \(N(\bar x_n, 1/n )\) is a CD for μ, and it can be represented in the following three forms: (i) confidence distribution (cumulative distribution form), \(H_n(\mu ) = \Phi (\sqrt {n} (\mu - \bar x_n) )\); (ii) confidence density (density form), \(h_n(\mu ) = \frac {1}{\sqrt {2\pi /n}} \exp \{ -\frac {n}{2}(\mu - \bar x_n)^2 \}\); and (iii) confidence curve, \(CV_n(\mu )=2\min \{ \Phi (\sqrt {n}(\mu -\bar x_n) ) ,1-\Phi (\sqrt {n}(\mu -\bar x_n) ) \}\). See Fig. 29.1 for an illustration. The data are generated from N(0.3, 1) with sample size 100.

Fig. 29.1 Confidence distribution presented in Example 29.2.1 in the forms of density function, cumulative distribution function, and confidence curve
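The three forms are straightforward to compute numerically. Below is a minimal Python sketch (the simulation settings mirror the example; the helper names are our own), which also checks condition (2) of Definition 29.2.1 by Monte Carlo:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(0.3, 1.0, size=100)          # data from N(0.3, 1), n = 100
n, xbar = len(x), x.mean()

# (i) confidence distribution: H_n(mu) = Phi(sqrt(n) * (mu - xbar))
H = lambda mu: norm.cdf(np.sqrt(n) * (mu - xbar))
# (ii) confidence density: the N(xbar, 1/n) density
h = lambda mu: norm.pdf(mu, loc=xbar, scale=1 / np.sqrt(n))
# (iii) confidence curve: CV_n(mu) = 2 min{H_n(mu), 1 - H_n(mu)}
CV = lambda mu: 2 * np.minimum(H(mu), 1 - H(mu))

# Condition (2) of Definition 29.2.1: at the true value mu0 = 0.3,
# H_n(mu0), as a function of the sample, is U(0, 1); check by simulation.
u = [norm.cdf(np.sqrt(n) * (0.3 - rng.normal(0.3, 1, n).mean()))
     for _ in range(2000)]
print(np.quantile(u, [0.25, 0.5, 0.75]))    # approximately 0.25, 0.5, 0.75
```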

Example 29.2.2 ( [77])

Suppose the data Xi ∼ N(μ, σ2), i = 1, …, n, with both μ and σ unknown. A CD for μ is \(H_n (\mu ) = F_{t_{n-1}}(\frac {\sqrt {n} (\mu - \bar x_n )}{s_n})\), where sn is the sample standard deviation and \(F_{t_{n-1}}(\cdot )\) is the cumulative distribution function of the Student t distribution with n − 1 degrees of freedom. A CD for σ2 is \(H_n(\sigma ^2) = 1- F_{\chi _{n-1}^2} (\frac {(n-1) s^2_n}{\sigma ^2})\), where \(F_{\chi _{n-1}^2}(\cdot )\) is the cumulative distribution function of the \(\chi ^2_{n-1}\)-distribution.

Example 29.2.3 ( [77])

Let \(\widehat \theta \) be a consistent estimator of θ. In the bootstrap, the distribution of \(\widehat \theta - \theta \) is estimated by the bootstrap distribution of \(\widehat \theta ^* - \widehat \theta \), where \(\widehat \theta ^*\) is the estimator of θ computed on a bootstrap sample [26]. An asymptotic CD for θ is given by \(H_n(\theta ) = 1 -\Pr (\widehat \theta ^*- \widehat \theta \leq \widehat \theta -\theta )= \Pr (\widehat \theta ^*\geq 2\widehat \theta -\theta ) \). In addition, when the limiting distribution of the normalized \(\widehat \theta \) is symmetric, the raw bootstrap distribution \(H_n(\theta ) = 1 -\Pr (\widehat \theta - \widehat \theta ^* \leq \widehat \theta -\theta ) = \Pr (\widehat \theta ^* \leq \theta )\) is also an asymptotic CD.
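As a hedged sketch of this construction, the snippet below builds the bootstrap CD for a population mean on simulated data (the estimator, resampling size, and grid are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=80)      # sample; theta = E[X] = 2
theta_hat = x.mean()

# bootstrap replicates of theta_hat
B = 5000
boot = np.array([rng.choice(x, size=x.size, replace=True).mean()
                 for _ in range(B)])

# asymptotic CD: H_n(theta) = Pr*(theta_hat* >= 2 theta_hat - theta)
grid = np.linspace(theta_hat - 1.5, theta_hat + 1.5, 201)
H = np.array([np.mean(boot >= 2 * theta_hat - t) for t in grid])

# equal-tailed 95% confidence interval by inverting the CD
lo, hi = grid[np.searchsorted(H, 0.025)], grid[np.searchsorted(H, 0.975)]
print(theta_hat, (lo, hi))
```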

Example 29.2.4

Suppose we are interested in the location parameter θ of a continuous distribution, based on a sample Y1, …, Yn. When the distribution F is symmetric, i.e., F(θ − y) = 1 − F(θ + y), θ is the median. The Wilcoxon signed-rank test of H0 : θ ≤ t versus H1 : θ > t is based on the sum of the signed ranks of Yi − t, i.e., the test statistic \(W=\sum _{i=1}^n Z_iR_i\), where Ri is the rank of |Yi − t| and Zi = 1 if Yi − t > 0 and Zi = −1 otherwise. Denote by p(t) the p-value associated with this test. As t varies over (−∞, ∞), the p-value p(t) is referred to as a p-value function. We can prove that the p-value function p(t) is an asymptotic CD [90]. Figure 29.2 provides illustrations of the asymptotic CD density, the asymptotic CD function p(t), and the asymptotic CV \(2 \min \{p(t), 1-p(t)\}\) for two sample sizes. The data are generated from N(0, 1) with sample sizes n = 10 and 100, respectively.

Fig. 29.2 Confidence distributions presented in Example 29.2.4 in the forms of density function, cumulative distribution function, and confidence curve. The top row is for sample size n = 10 and the bottom row is for n = 100
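The p-value function can be traced out numerically by shifting the data and recomputing the one-sided signed-rank p-value at each t. A sketch (the grid and the use of scipy.stats.wilcoxon are our choices):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(2)
y = rng.normal(0.0, 1.0, size=100)           # data from N(0, 1)

def p_value_function(t):
    # one-sided signed-rank p-value for H0: theta <= t vs H1: theta > t
    return wilcoxon(y - t, alternative='greater').pvalue

grid = np.linspace(-0.6, 0.6, 241)
p = np.array([p_value_function(t) for t in grid])   # asymptotic CD in t
cv = 2 * np.minimum(p, 1 - p)                       # confidence curve

# 95% equal-tailed confidence interval for the median: invert p(t)
print(grid[np.searchsorted(p, 0.025)], grid[np.searchsorted(p, 0.975)])
```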

Example 29.2.5 ( [78])

Suppose that there is an independent and identically distributed sample of size n from a semi-parametric model involving multiple parameters. Let ln(θ) be the log profile likelihood function and \(\mathcal J_n(\theta ) = -\ddot l_n(\theta )\) be the observed Fisher information for a scalar parameter of interest θ. Under certain mild assumptions, Theorem 4.1 of [78] proves that, for any given θ, the normalized profile likelihood

$$\displaystyle \begin{aligned} G_n(\theta) = \frac{\int_{-\infty}^{\theta} e^{l_n(u)}\,du}{\int_{\Theta} e^{l_n(u)}\,du} = H_n(\theta) + o_p(1), \quad \mbox{where } H_n(\theta) = \Phi\big( \{\mathcal J_n(\widehat\theta_n)\}^{1/2}(\theta - \widehat\theta_n) \big) \end{aligned}$$

and \(\widehat\theta_n\) is the maximizer of ln(θ). Because at the true parameter value θ = θ0, Hn(θ0) converges to U(0, 1) as \(n \rightarrow \infty \), it follows that Gn(θ0) converges to U(0, 1). Thus, Gn(θ) is an asymptotic CD. From this observation, we see that CD-based inference may subsume likelihood inference on some occasions.

If the sample X is from a discrete distribution, we can typically invoke large-sample theory to obtain an asymptotic CD that ensures the asymptotic frequentist coverage property when the sample size is large. However, when the sample size is limited, we sometimes may want to examine the difference between the “distribution estimator” and the U(0, 1) distribution to get a sense of under- and over-coverage. To extend the concept of CD to cover discrete distributions with finite sample sizes, we introduce below the notions of lower and upper CDs. The lower and upper CDs provide inference statements that are associated with under- and over-coverage at every significance level.

Definition 29.2.2 (Upper and Lower CDs)

A function \(H_n^+(\cdot )=H_n^+({x},\cdot )\) on \(\mathcal X \times \Theta \rightarrow [0,1]\) is said to be an upper CD for a parameter, if (i) for each given \(x \in \mathcal X\), \(H_n^+(\cdot )\) is a monotonic increasing function on Θ with values ranging within (0, 1) and (ii) at the true parameter value θ = θ0, \(H_n^+(\theta _0) \equiv H_n^+({x},\theta _0)\), as a function of the sample x, is stochastically less than or equal to a uniformly distributed random variable U ∼ U(0, 1), i.e.,

$$\displaystyle \begin{aligned} \Pr\left(H_n^+\big({ X},\theta_0\big) \leq t \right)\geq t. \end{aligned} $$
(29.1)

Correspondingly, a lower CD \(H_n^-(\cdot )=H_n^-(x,\cdot )\) for parameter θ can be defined but with (29.1) replaced by \( \Pr \left (H_n^-({X},\theta _0) \leq t \right ) \leq t\) for all t ∈ (0, 1).

More generally, we also refer to \(H_n^+(\cdot )\) and \(H_n^-(\cdot )\) as the upper and lower CD, respectively, even when the monotonic condition (i) is removed. Note that, due to the stochastic dominance inequalities in the definition, we have, for any α ∈ (0, 1),

$$\displaystyle \begin{aligned} \begin{array}{ll} &\Pr\left(\theta_0 \in \left\{\theta: H_n^+\big({ X}, \theta \big) \leq \alpha \right\} \right)\geq \alpha\, \mbox{and}\\ &\Pr\left(\theta_0 \in \left\{\theta: H_n^-({X},\theta) \leq \alpha \right\}\right) \leq \alpha. \end{array} \end{aligned}$$

Thus, a level-(1 − α) confidence interval (or set) \(\{\theta : H_n^+({X},\theta )\leq 1-\alpha \}\) or \(\{\theta : H_n^-({X},\theta )\geq \alpha \}\) has a guaranteed coverage rate of (1 − α)100%, regardless of whether the monotonic condition in (i) holds. Once the monotonic condition in (i) is removed, \(H_n^+(\cdot )\) and \(H_n^-(\cdot )\) may not be distribution functions, and the “nestedness property” of confidence intervals/sets may also be lost. Here, the “nestedness property” refers to the requirement that a level-(1 − α) confidence set C1−α be contained in the corresponding level-(1 − α′) confidence set C\(_{1 - \alpha ^{\prime }}\) whenever 1 − α < 1 − α′.

To conclude this section, we present an example of lower and upper CDs.

Example 29.2.6 ( [40])

Suppose the sample X is from Binomial (n, p0) with observed value x. Let \(H_n(p,x) = \Pr (X > x) = \sum _{x <k\leq n} {n \choose k} p^k(1-p)^{n-k}\). We can show that P(Hn(p0, X) ≤ t) ≥ t and P(Hn(p0, X − 1) ≤ t) ≤ t. Thus, \(H_n^+(p, x) = H_n(p,x)\) and \(H_n^-(p,x) = H_n(p,x-1)\) are upper and lower CDs for the success rate p0, respectively. The half-corrected CD [25, 37, 72] is

$$\displaystyle \begin{aligned} \frac{H_n^-(p,x)+H_n^+(p,x)}{2} &= \sum_{x<k\leq n} {n \choose k} p^k(1-p)^{n-k}\\ &\qquad +\frac{1}{2} {n \choose x} p^{x}(1-p)^{n-x}. \end{aligned} $$
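A sketch of Example 29.2.6 in Python (the observed values n = 20, x = 7 are ours); the upper, lower, and half-corrected CDs differ only in how the observed cell Pr(X = x) is treated, and the guaranteed-coverage sets of Definition 29.2.2 reproduce Clopper-Pearson-type endpoints:

```python
import numpy as np
from scipy.stats import binom

n, x = 20, 7                  # x successes observed out of n trials

def H_upper(p):               # H_n^+(p, x) = Pr(X > x)
    return binom.sf(x, n, p)

def H_lower(p):               # H_n^-(p, x) = Pr(X > x - 1) = Pr(X >= x)
    return binom.sf(x - 1, n, p)

def H_half(p):                # half-corrected CD: Pr(X > x) + Pr(X = x)/2
    return binom.sf(x, n, p) + 0.5 * binom.pmf(x, n, p)

# conservative 95% interval via grid inversion of the lower and upper CDs
grid = np.linspace(1e-6, 1 - 1e-6, 100001)
lo = grid[H_lower(grid) >= 0.025].min()
hi = grid[H_upper(grid) <= 0.975].max()
print((lo, hi))               # Clopper-Pearson-type confidence limits
```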

2.2 CD-Based Inference

Analogous to the Bayesian posterior, a CD contains a wealth of information for constructing any type of frequentist inference. We illustrate three aspects of making inference based on a given CD. Figure 29.3 from [90] provides a graphical illustration of the point estimation, confidence interval, and hypothesis testing. More specifically:

Fig. 29.3 A graphical illustration of CD-based inference [90]

Point Estimation

The natural choices of point estimators of the parameter θ given a CD Hn(⋅) include (i) the median \(\widetilde \theta _n=H_n^{-1}(1/2)\), (ii) the mean \(\bar \theta _n=\int _{\theta \in \Theta } \theta \,d H_n(\theta )\), and (iii) the mode \(\widehat \theta _n =\arg \max _{\theta \in \Theta }h_n(\theta )\), where hn(θ) = dHn(θ)∕dθ is the confidence density function. Under some moderate conditions, these three point estimators are consistent [77, 90, 91].

To further understand these three types of estimators: the median \(\widetilde \theta _n\) is a median-unbiased estimator, with \(\Pr _{\theta _0}(\widetilde \theta _n \leq \theta _0)=\Pr _{\theta _0}(1/2 \leq H_n(\theta _0))=1/2\); the mean \(\bar \theta _n\) can be viewed as a frequentist analog of the Bayesian estimator under the squared loss function; and the mode \(\widehat \theta _n\) matches the maximum likelihood estimator if the confidence density comes from a normalized likelihood function [90].

Confidence Interval

As discussed in Sect. 29.2.1, a horizontal line drawn at significance level α on a confidence curve intersects the curve at two points, and these two points correspond to a 1 − α level, equal-tailed, two-sided confidence interval for θ, i.e., \((H^{-1}_n(\alpha /2), H^{-1}_n(1-\alpha /2))\). Furthermore, \((-\infty , H^{-1}_n(1-\alpha )]\) and \([H^{-1}_n(\alpha ),\infty )\) are one-sided 1 − α level confidence intervals for the parameter θ.
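For a concrete illustration, the sketch below extracts the median estimator and the one- and two-sided intervals from the Student-t CD of Example 29.2.2 (for this symmetric CD, the median, mean, and mode all equal \(\bar x_n\); the data are our simulated example):

```python
import numpy as np
from scipy.stats import t as student_t

rng = np.random.default_rng(3)
x = rng.normal(2.0, 3.0, size=30)
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

# CD for mu and its quantile function (inverse CD)
H     = lambda mu: student_t.cdf(np.sqrt(n) * (mu - xbar) / s, df=n - 1)
H_inv = lambda q: xbar + s / np.sqrt(n) * student_t.ppf(q, df=n - 1)

median = H_inv(0.5)                           # point estimator
ci_two = (H_inv(0.025), H_inv(0.975))         # equal-tailed 95% CI
ci_up  = (-np.inf, H_inv(0.95))               # one-sided 95% intervals
ci_low = (H_inv(0.05), np.inf)
print(median, ci_two)
```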

Hypothesis Testing

From a CD, one can obtain p-values for various hypothesis testing problems. The natural approach is to measure the support that Hn(⋅) lends to a null hypothesis [29]. Xie and Singh [90] summarized hypothesis-testing inference from a CD in the following theorem.

Theorem 29.2.1

(i) For the one-sided test K0 : θ ∈ C versus K1 : θ ∈ Cc, where Cc denotes the complement of C and C is an interval of the type Cl = (−∞, b] or Cu = [b, ∞), we have \(\sup _{\theta \in C}\Pr _\theta (p(C) \leq \alpha ) = \alpha \), and p(C) = Hn(C) is the corresponding p-value of the test. (ii) For the singleton test K0 : θ = b versus K1 : θ ≠ b, we have \(\Pr _{\theta =b}(2\min \{p(C_l),p(C_u)\}\leq \alpha )=\alpha \), and \(2 \min \{p(C_l),p(C_u)\}=2\min \{H_n(b),1-H_n(b)\}\) is the p-value of the corresponding test.

Example 29.2.7 ( [90])

Consider Example 29.2.2 again. A CD for μ is \(H_n(\mu) = F_{t_{n-1}}(\frac {\sqrt {n} (\mu - \bar x_n )}{s_n})\). For a one-sided test K0 : μ ≤ b versus K1 : μ > b, its support on the null set C = (−∞, b] is

$$\displaystyle \begin{aligned} p(C) = p((-\infty,b] ) =H_n (b) = F_{t_{n-1}}(\sqrt{n} (b- \bar x_n) /s_n ). \end{aligned} $$

This is the same p-value as that of the one-sided t-test. For a two-sided test K0 : μ = b versus K1 : μ ≠ b, the null set is C = {b}. We measure the supports of the two alternative sets, \(p(C^c_{l}) \) and \(p(C^c_{u})\). The rejection region is defined as \(\{x : 2\max \{p(C^c_l),p(C^c_u)\}\geq 2-\alpha \}\), i.e.,

$$\displaystyle \begin{aligned} &\{x : 2\min\{p(C_l),p(C_u)\}\leq \alpha \} = \{x : 2\min\{H_n(b), \\ & 1- H_n(b)\}\leq \alpha \}. {} \end{aligned} $$
(29.2)

Under K0 with μ = b, \(2 \min \{ p(C_l),p(C_u) \} = 2 \min \{ H_n(b), 1- H_n(b) \} \sim U(0, 1)\) by the definition of a CD. Thus,

$$\displaystyle \begin{aligned} &\text{Pr}_{\mu=b} (2\min\{ p(C_l),p(C_u) \} \leq \alpha ) = \text{Pr}_{\mu=b} (2\min\{ H_n(b),\\ &1- H_n(b) \} \leq \alpha ) = \alpha \end{aligned} $$

and the rejection region (29.2) corresponds to a level-α test. Again, the p-value \(2\min \{p(C_l),p(C_u)\}\) is the standard p-value from a two-sided t-test.
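These identities are easy to verify numerically. A sketch comparing the CD-based p-values with scipy's one-sample t-test (the data and the null value b = 0 are our illustrative choices):

```python
import numpy as np
from scipy.stats import t as student_t, ttest_1samp

rng = np.random.default_rng(4)
x = rng.normal(0.2, 1.0, size=25)
n, xbar, s, b = len(x), x.mean(), x.std(ddof=1), 0.0

Hb = student_t.cdf(np.sqrt(n) * (b - xbar) / s, df=n - 1)   # H_n(b)

p_one = Hb                          # K0: mu <= b versus K1: mu > b
p_two = 2 * min(Hb, 1 - Hb)         # K0: mu = b versus K1: mu != b

print(p_one, ttest_1samp(x, b, alternative='greater').pvalue)
print(p_two, ttest_1samp(x, b).pvalue)   # both pairs agree
```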

2.3 Combination of CDs for Fusion Learning

One of the important applications of CD developments is fusion learning, which synthesizes information from disparate sources and has deep implications for meta-analysis [12, 15, 40, 51,52,53, 72, 75, 77, 81, 91, 92]. Fusion learning aims to combine inference results obtained from different data sources to achieve a more efficient overall inference. CD-based fusion learning applies even when the inference results are derived from different tests or different paradigms, i.e., Bayesian, fiducial, and frequentist (BFF).

The combination of CDs can be considered a unified framework for fusion learning. Suppose there are k independent studies dedicated to estimating a common parameter of interest θ, and suppose we have a CD \(H^i(\cdot)\) for θ based on the sample xi of the i-th study. Singh et al. [77] proposed a general recipe for combining these k independent CDs:

$$\displaystyle \begin{aligned} H^{c}(\theta) \equiv G_c \{ g_c(H^1(\theta), \ldots, H^k(\theta) ) \}, {} \end{aligned} $$
(29.3)

where gc is a given continuous function on [0, 1]k that is nondecreasing in each coordinate, the function Gc is determined by the monotonic function gc with \(G_c(t) = \Pr (g_c(U_1, \ldots ,U_k)\leq t)\), and U1, …, Uk are independent U(0, 1) random variables. The function Hc(⋅) contains information from all k samples and is referred to as a combined CD for the parameter θ. Furthermore, the CD obtained by Eq. (29.3) does not require any information regarding how the input CDs are obtained.

A special class of the general combining framework (29.3) plays a prominent role in unifying many modern meta-analysis approaches. The choice of the function gc for this special class is

$$\displaystyle \begin{aligned} g_c(u_1,\ldots, u_k)= w_1 F^{-1}(u_1) + \cdots + w_k F^{-1}(u_k), {} \end{aligned} $$
(29.4)

where F(⋅) is a given cumulative distribution function and wi ≥ 0 with at least one wi ≠ 0 are generic weights for the combination rule. Generally, there are two types of weights: fixed weights to improve the efficiency of combination and adaptive weights based on data.
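To make (29.3) and (29.4) concrete, the sketch below combines k = 4 normal-mean CDs with F = Φ; with these choices, \(g_c(U_1,\ldots,U_k) \sim N(0, \sum_i w_i^2)\), so \(G_c(t)=\Phi (t/\sqrt{\sum_i w_i^2})\). The study summaries and the fixed weights wi = √ni are our illustrative choices (equal weights give the classical normal/Stouffer combination [55]; see the gmeta package for production implementations):

```python
import numpy as np
from scipy.stats import norm

# k studies, each reporting (xbar_i, known sigma_i, n_i); values illustrative
xbar  = np.array([0.42, 0.55, 0.31, 0.50])
sigma = np.array([1.0, 1.2, 0.9, 1.1])
nobs  = np.array([50, 80, 40, 100])

# individual CDs: H^i(theta) = Phi(sqrt(n_i) (theta - xbar_i) / sigma_i)
def H_individual(theta):
    return norm.cdf(np.sqrt(nobs) * (theta - xbar) / sigma)

w = np.sqrt(nobs)                      # one fixed-weight choice

def H_combined(theta):
    # (29.4): g_c(u) = sum_i w_i Phi^{-1}(u_i); (29.3): H_c = G_c(g_c(...))
    g = np.sum(w * norm.ppf(H_individual(theta)))
    return norm.cdf(g / np.sqrt(np.sum(w ** 2)))

# invert the combined CD numerically for a 95% interval
grid = np.linspace(0.0, 1.0, 20001)
Hc = np.array([H_combined(t) for t in grid])
print(grid[np.searchsorted(Hc, 0.025)], grid[np.searchsorted(Hc, 0.975)])
```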

As shown in [91], it is remarkable that, by choosing different gc functions, all the classic approaches of combining p-values, including the Fisher, Normal (Stouffer), Min (Tippett), Max, and Sum methods [55], can be obtained through the CD combination framework, and so can all five model-based meta-analysis estimators described in [67]: the maximum likelihood estimator and the Bayesian estimator under the fixed-effects model, and the method of moments estimator, the restricted maximum likelihood estimator, and the Bayesian estimator with a normal prior under the random-effects model. Furthermore, it was shown in [94] that the Mantel-Haenszel and Peto methods, as well as Tian et al.'s method of combining confidence intervals [81], for the meta-analysis of 2 × 2 tables can also be obtained through the CD combination framework. The R package “gmeta” developed by [95] implements the CD combining framework for fusion learning, including the classical p-value combination methods from [55], meta-analysis estimators under both fixed-effects and random-effects models, and many other approaches.

Fusion learning under the framework of combining CDs provides an extensive and powerful tool for synthesizing information from diverse data sources. This approach has sound theoretical support and has been applied to many practical situations including robust fusion learning [91], exact fusion learning for discrete data [52, 81], fusion learning for heterogeneous studies [53], nonparametric fusion learning [15, 51], the split-conquer-combine approach [12], individualized fusion learning (i-fusion) [75], etc. We refer to [13] for more detailed discussions.

2.4 Multivariate CDs

A simultaneous CD for a vector parameter can sometimes be difficult to define [72]; in particular, it is difficult in some non-Gaussian settings to define a multivariate CD in the exact sense so that its marginal distributions are CDs for the corresponding individual parameters. Consider the Behrens-Fisher problem of testing for the equality of the means of two normal distributions when the variances are unknown and possibly unequal. A joint CD of the two population means (μ1, μ2) has a joint density of the form

$$\displaystyle \begin{aligned} f_1\left(\frac{\mu_1 -\bar x_1}{s_1/\sqrt{n_1}}\right)f_2\left(\frac{\mu_2-\bar x_2}{s_2/\sqrt{n_2}}\right)/\left(s_1s_2\sqrt{n_1n_2}\right), \end{aligned}$$

where fi is the density function of the Student t-distribution with ni − 1 degrees of freedom, i = 1, 2. The marginal distribution of μ1 − μ2 is only an asymptotic CD, not a CD in the exact sense.

The good news in the multidimensional case is that, under asymptotic settings or wherever bootstrap theory applies, one can still work with multivariate CDs [90]. When no analytic confidence curve for the parameter vector θ of interest is available, the product method of [4] can be used if confidence curves are available for each component of the vector [72]. Additionally, if we only consider center-outward confidence regions instead of all Borel sets in the p-dimensional parameter space, the central CDs considered in [78] and the confidence net considered in [71] offer coherent notions of multivariate CDs in the exact sense [90].

There are many approaches to obtaining CDs. One way is to normalize a likelihood function with respect to its parameter so that the area underneath the curve is one; the normalized likelihood function is then typically a density function. For instance, under some mild conditions, Fraser and McDunnough [32] show that this normalized likelihood function is asymptotically a normal density and thus the density of an asymptotic CD. Other routes, such as bootstrap distributions and p-value functions, also often provide valid CDs. Finally, CDs and fiducial distributions have always been linked since their inception. Fiducial inference provides another systematic way to obtain CDs, and we discuss it further in the next section.

3 Fiducial Inference

A CD can, in some sense, be viewed as “the Neymanian interpretation of Fisher’s fiducial distributions” [74]. From the definitions of the CD and the fiducial distribution, we may consider the fiducial distribution to be one special type of CD, although the CD treats the problem of obtaining an inferentially meaningful distribution on the parameter space from a purely frequentist point of view [90]. Nevertheless, fiducial inference provides a systematic way to obtain a CD, and its development provides a rich literature for CD inference. We briefly review fiducial inference and its recent developments in this section.

3.1 Fiducial Inference

R.A. Fisher introduced the idea of fiducial probability and fiducial inference [28] as a potential replacement of the Bayesian posterior distribution. Although he discussed fiducial inference in several subsequent papers, there appears to be no rigorous definition of a fiducial distribution for a vector parameter. The basic idea of the fiducial argument is switching the role of data and parameters to introduce the distribution on the parameter space. This obtained distribution then summarizes our knowledge about the unknown parameter. Since the mid-2000s, there has been a renewed interest in modern modifications of fiducial inference. The common approaches for these modifications rely on a definition of inferentially meaningful probability statements about subsets of the parameter space without introducing any prior information.

These modern approaches include generalized fiducial inference [37, 41], Dempster-Shafer theory [22, 24], and inferential models [56, 61]. Objective Bayesian inference, which aims at finding nonsubjective model-based priors, can also be seen as addressing the same question. Examples of recent breakthroughs related to reference prior and model selection are [2, 7, 8]. Another related approach is based on higher-order likelihood expansions and implied data-dependent priors [30, 31, 33,34,35,36]. There are many more references that interested readers can find in [41].

3.2 Generalized Fiducial Distribution

Generalized fiducial inference, motivated by [83, 84], has been at the forefront of the modern fiducial revival. It defines a data-dependent measure on the parameter space by using an inverse of a deterministic data generating equation, without the use of Bayes' theorem.

Motivated by Fisher’s fiducial argument, generalized fiducial inference begins with expressing the relationship between the data Y  and the parameters θ as

$$\displaystyle \begin{aligned} Y = G(U,\theta), \end{aligned} $$
(29.5)

where G(⋅, ⋅) is a deterministic function termed the data generating equation, and U is the random component of this data generating equation, whose distribution is completely known and free of the parameters.

The data Y are generated by drawing a random variable U and plugging it into the data generating equation (29.5). For example, a single observation from the N(μ, 1) distribution can be written as Y = μ + U, where θ = μ and U is a N(0, 1) random variable.

Fisher’s original fiducial argument addresses only the simple case where the data generating equation (29.5) can be inverted, so that the inverse Qy(u) = θ exists for any observed y and arbitrary u. One can define the fiducial distribution for θ as the distribution of \(Q_y(U^\star)\), where \(U^\star\) is an independent copy of U. Equivalently, a sample from the fiducial distribution of θ can be obtained by first generating \(U^\star _i\) and then setting \(\theta _i^\star =Q_y(U^\star _i)\), i = 1, …, n. Point estimates and confidence intervals for θ can be obtained from this sample. In the N(μ, 1) example, Qy(u) = y − u, and the fiducial distribution is therefore the distribution of \(y - U^\star \sim N(y, 1)\).
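The N(μ, 1) case makes the inversion recipe transparent; a minimal sketch (sample size and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
y = 1.7                                   # one observation from N(mu, 1)

# fiducial sample: theta* = Q_y(U*) = y - U*, with U* independent copies of U
u_star = rng.normal(0.0, 1.0, size=10000)
theta_star = y - u_star                   # distributed as N(y, 1)

# 95% fiducial (confidence) interval from sample quantiles
print(np.quantile(theta_star, [0.025, 0.975]))   # approx. y -/+ 1.96
```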

When no θ satisfies Eq. (29.5) exactly, Hannig [37] proposed to use the distribution of U conditional on the event {u : y = G(u, θ), for some θ}. Hannig et al. [41] generalized this approach and proposed an attractive definition of the generalized fiducial distribution (GFD) through a weak limit.

Definition 29.3.1

A probability measure on the parameter space Θ is called a GFD if it can be obtained as the weak limit

$$\displaystyle \begin{aligned} \lim_{\epsilon\to 0}\left[\left. \underset{\theta^\star}{\arg\min} \big\|y-G(U^\star,\theta^\star)\big\| \ \right|\ \min_{\theta^\star}\big\|y-G(U^\star,\theta^\star)\big\|\leq \epsilon \right], \end{aligned} $$
(29.6)

where \(U^\star\) is an independent copy of U.

Hannig et al. [41] pointed out a close relationship between GFD and approximate Bayesian computation (ABC) [3]. In an idealized ABC, one first generates an observation θ∗ from the prior, then generates a new sample y∗ = G(U∗, θ∗) using the data generating equation, and compares the generated data with the observed data y. If the observed and generated datasets are close, i.e., ∥y − y∗∥≤ 𝜖, the generated θ∗ is accepted; otherwise it is rejected and the procedure is repeated. For GFD, on the other hand, one first generates U∗, finds a best-fitting θ∗ minimizing ∥y − G(U∗, θ)∥, computes y∗ = G(U∗, θ∗), and again accepts θ∗ if ∥y − y∗∥≤ 𝜖, rejecting it otherwise. In either approach, an artificial dataset y∗ = G(U∗, θ∗) is generated and compared to the observed data y. The main difference is that the Bayes posterior simulates the parameter θ∗ from the prior, while the GFD uses the best-fitting parameter.

Fiducial distributions often have good frequentist properties, and the corresponding fiducial confidence intervals often give asymptotically correct coverage [37, 41]. In addition, a fiducial distribution is a data-dependent measure on the parameter space and thereby often a CD. Xie and Singh [90] described the relation between the concepts of CD and fiducial distribution using an analogy in point estimation: a CD is analogous to a consistent estimator, and a fiducial distribution is analogous to a maximum likelihood estimator. In the context of point estimation, a consistent estimator does not have to be a maximum likelihood estimator, but under some regularity conditions, the maximum likelihood estimator provides a standard procedure for obtaining a consistent estimator. In the context of distribution estimators, a CD does not have to be a fiducial distribution; however, under suitable conditions, a fiducial distribution often has good frequentist properties and is thus a CD.

3.3 A User-Friendly Formula for GFD

While the definition (29.6) of the GFD is conceptually and mathematically appealing, it is not clear how to compute the limit in most practical situations. The following theorem, proposed in [41], provides a computational tool.

Theorem 29.3.1

Under certain assumptions, the limiting distribution in (29.6) has a density

$$\displaystyle \begin{aligned} r(\theta|y)=\frac{f(y,\theta) J(y,\theta)}{\int_\Theta f(y,\theta^{\prime}) J(y,\theta^{\prime})\,d\theta^{\prime}}, \end{aligned} $$
(29.7)

where f(y, θ) is the likelihood and the function

$$\displaystyle \begin{aligned} J(y,\theta)=D\left(\left.\frac{d}{d\theta} G(u,\theta)\right|{}_{u=G^{-1}(y,\theta)}\right). \end{aligned} $$
(29.8)

If (i) n = p, then \(D(A)=|\det A|\). Otherwise, the function D(A) depends on the norm used: (ii) the l∞ norm gives \(D(A)=\sum _{\mathbf{i}=(i_1,\ldots ,i_p)}\left | {\det (A)}_{\mathbf{i}} \right |\), where the sum is over all p-tuples of row indices i = (i1 < ⋯ < ip) and (A)i is the p × p submatrix of A formed by the rows i; (iii) under an additional assumption stated in [41], the l2 norm gives \(D(A)=(\det A^\top A)^{1/2}\).
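The two versions of D(A) are easy to compare numerically for a small n × p matrix (a toy example of ours). By the Cauchy-Binet formula, det(AᵀA) = Σi det(Ai)², so (iii) is the l2 counterpart of the absolute sum in (ii):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
n, p = 5, 2
A = rng.normal(size=(n, p))

# (ii) sum of |det| over all p x p row-submatrices of A
D_linf = sum(abs(np.linalg.det(A[list(idx), :]))
             for idx in combinations(range(n), p))

# (iii) (det A^T A)^(1/2), the l2 norm of the same vector of minors
D_l2 = np.sqrt(np.linalg.det(A.T @ A))

print(D_linf, D_l2)   # D_linf >= D_l2 (l1 vs l2 norm of the minors)
```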

Hannig et al. [41] recommended using (ii) for practitioners. A nice property of the GFD is that it is invariant under smooth re-parameterizations. This property follows directly from (29.6), since for an appropriate selection of minimizers and any one-to-one function θ = ϕ(η),

$$\displaystyle \begin{aligned} \underset{\eta^\star}{\arg\min} \big\|y-G(U^\star,\phi(\eta^\star))\big\| = \phi^{-1}\Big(\underset{\theta^\star}{\arg\min} \big\|y-G(U^\star,\theta^\star)\big\|\Big). \end{aligned}$$

Note that GFD could change with transformations of the data generating equation. Assume that the observed dataset has been transformed by a one-to-one smooth transformation Z = T(Y ). By the chain rule, the GFD based on this new data generating equation and observed data z = T(y) is the density (29.7) with the Jacobian function

$$\displaystyle \begin{aligned} J_T(z,\theta)= D\left(\left.\frac{d}{dy} T(y) \cdot \frac{d}{d\theta} G(u,\theta)\right|{}_{u=G^{-1}(y,\theta)}\right), \end{aligned} $$
(29.9)

where for simplicity we write y instead of T−1(z).

3.4 Examples of GFD

In this section, we consider two examples: linear regression and the uniform distribution. In the first case, the GFD is the same as the Bayes posterior with respect to the independence Jeffreys prior, while in the second case, the GFD is not a Bayes posterior with respect to any prior (that is not data dependent).

Linear Regression [41]

We consider the generalized fiducial approach to the regression problem. We express linear regression via the data generating equation

$$\displaystyle \begin{aligned} Y = G(U, \theta)=X \beta +\sigma U, \end{aligned}$$

where Y is the vector of responses, X is the design matrix, θ = (β, σ) are the unknown parameters, and U is a random vector with known density f(u) independent of θ and X. Note that \(\frac {d}{d\theta } G(U,\theta ) = (X,U)\) and U = (y − Xβ)∕σ; the Jacobian (29.8) under the l∞ norm simplifies to

$$\displaystyle \begin{aligned} J_\infty(y,\theta)= \sigma^{-1} \sum_{\substack{{\boldsymbol{i}}=(i_1,\ldots,i_p)\\ 1\leq i_1<\cdots<i_p\leq n}}\left|\det\left(X, y\right)_{\boldsymbol{i}}\right|, \end{aligned}$$

and the density of GFD is

$$\displaystyle \begin{aligned} r(\beta,\sigma | y)\propto \sigma^{-n-1} f((y-X\beta)/\sigma). \end{aligned}$$

The fiducial solution is the same as the Bayesian solution using the independence Jeffreys prior [5]. Furthermore, a simple calculation shows that the Jacobian under the l2 norm differs from J∞(y, θ) only by a multiplicative constant, so the GFD remains unchanged.
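For normal errors, r(β, σ | y) ∝ σ^{−n−1} exp{−∥y − Xβ∥²/(2σ²)}, which can be sampled in closed form exactly like the Jeffreys-prior posterior: RSS/σ² ∼ χ²_{n−p} and β | σ ∼ N(β̂, σ²(XᵀX)^{−1}). A hedged sketch (the design and true parameters are our simulated choices):

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -0.5]) + 0.8 * rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
rss = np.sum((y - X @ beta_hat) ** 2)

# GFD sampling: RSS/sigma^2 ~ chi^2_{n-p};
# beta | sigma ~ N(beta_hat, sigma^2 (X'X)^{-1})
M = 5000
sigma2 = rss / rng.chisquare(n - p, size=M)
L = np.linalg.cholesky(XtX_inv)
beta = beta_hat + np.sqrt(sigma2)[:, None] * (rng.normal(size=(M, p)) @ L.T)

# fiducial 95% intervals (match the classical t intervals here)
print(np.quantile(beta, [0.025, 0.975], axis=0))
```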

GFD in Irregular Models [41]

We consider the irregular model U(a(θ) − b(θ), a(θ) + b(θ)). The reference prior for this model, derived in Theorem 8 of [7], has a complicated form. To apply the GFD approach, we first express the observed data through the following data generating equation:

$$\displaystyle \begin{aligned} Y_i=a(\theta)+b(\theta) U_i,\quad U_i\ \overset{i.i.d.}{\sim} \ U(-1,1). \end{aligned}$$

By simple algebra,

$$\displaystyle \begin{aligned} \frac{d}{d\theta} G(u,\theta) = a^{\prime}(\theta)+b^{\prime}(\theta)U \; \text{with} \; U=b^{-1}(\theta)(Y-a(\theta)). \end{aligned} $$

If a′(θ) > |b′(θ)|, (29.8) simplifies to

$$\displaystyle \begin{aligned} J_1(y,\theta)=n[a^{\prime}(\theta)-a(\theta)\{\log b(\theta)\}^{\prime}+\bar y_n\{\log b(\theta)\}^{\prime}], \end{aligned}$$

and the GFD is

$$\displaystyle \begin{aligned} r_1(\theta|y)\propto \frac{a^{\prime}(\theta)-a(\theta)\{\log b(\theta)\}^{\prime}+\bar y_n\{\log b(\theta)\}^{\prime}}{b(\theta)^n}I_{\{a(\theta)-b(\theta)<y_{(1)}\ \&\ a(\theta)+b(\theta)>y_{(n)}\}}.\end{aligned} $$

Consider an alternative fiducial solution, which constructs the GFD based on the minimal sufficient and ancillary statistics Z = {h1(Y(1)), h2(Y(n)), (Y − Y(1))∕(Y(n) − Y(1))}, where Y(1), Y(n) are order statistics, \( h_1^{-1}(\theta )=EY_{(1)}=a(\theta )-b(\theta )(n-1)/(n+1) \mbox{ and } h_2^{-1}(\theta )=EY_{(n)}=a(\theta )+b(\theta )(n-1)/(n+1).\) By a simple calculation,

$$\displaystyle \begin{aligned} J_2(y,\theta)&=(w_1+w_2)\left[a^{\prime}(\theta)-a(\theta)\{\log b(\theta)\}^{\prime}+\frac{w_1 y_{(1)}+w_2 y_{(n)}}{w_1+w_2}\{\log b(\theta)\}^{\prime}\right],\\ r_2(\theta|y)& \propto \frac{I_{\{a(\theta)-b(\theta)<y_{(1)}\ \&\ a(\theta)+b(\theta)>y_{(n)}\}}} {\left[(w_1+w_2)[a^{\prime}(\theta)-a(\theta)\{\log b(\theta)\}^{\prime}]+ (w_1 y_{(1)}+w_2 y_{(n)})\{\log b(\theta)\}^{\prime}\right]^{-1} b(\theta)^n}, \end{aligned} $$

where \(w_1=h_1^{\prime }(y_{(1)})\) and \(w_2=h_2^{\prime }(y_{(n)})\).

Hannig et al. [41] performed extensive simulation studies for the particular case U(θ, θ2), comparing the GFD to Bayesian posteriors with the reference prior \(\pi (\theta )=\frac {(2\theta -1)}{\theta (\theta -1)}e^{\psi \left (\frac {2\theta }{2\theta -1}\right )}\) [7] and the flat prior π(θ) = 1. The simple GFD, the alternative GFD, and the reference-prior Bayes posterior maintain nominal coverage for all parameter settings. However, the flat-prior Bayes posterior does not have satisfactory coverage, with the worst departures from nominal coverage for small sample sizes and large parameter θ.

Nonparametric Fiducial Inference with Right-Censored Data [17]

Let the failure times Xi (i = 1, …, n) follow the true distribution function F0 and the censoring times Ci (i = 1, …, n) have distribution function R0. We treat the situation where the failure and censoring times are independent and both distributions are unknown. Suppose we observe the right-censored data {yi, δi} (i = 1, …, n), where yi = xi ∧ ci is the minimum of xi and ci, and δi = I{xi ≤ ci} is the censoring indicator.

Consider the following data generating equation:

$$\displaystyle \begin{aligned} & Y_i=F^{-1}(U_i)\wedge R^{-1}(V_i),\quad \Delta_i=I\{F^{-1}(U_i)\leq R^{-1}(V_i)\}\\ & (i = 1,\ldots n), \end{aligned} $$

where Ui, Vi are independent and identically distributed U(0, 1).

For a failure event δi = 1, we have full information about failure time xi, i.e., xi = yi, and partial information about censoring time ci, i.e., ci ≥ yi. Thus,

$$\displaystyle \begin{aligned} F^{-1}(u_i)=y_i \Longleftrightarrow F(y_i)\geq u_i, F(y_i-\epsilon)< u_i ~\text{for any}~ \epsilon>0. \end{aligned}$$

For a censored event δi = 0, we only know partial information about xi, i.e., xi > yi, and full information on ci, i.e., ci = yi. Similarly,

$$\displaystyle \begin{aligned} F^{-1}(u_i)> y_i &\Longleftrightarrow F(y_i)< u_i,\\ R^{-1}(v_i)=y_i &\Longleftrightarrow R(y_i)\geq v_i, R(y_i-\epsilon)< v_i ~\text{for any}~ \epsilon>0. \end{aligned} $$

The complete inverse map of the data generating equation is

$$\displaystyle \begin{aligned} Q^{F,R}(y,\delta,u,v)=\bigcap_i Q^{F,R}_{\delta_i}(y_i,u_i,v_i)= Q^{F}(y,\delta,u)\times Q^{R}(y,\delta,v), \end{aligned} $$
(29.10)

where

$$\displaystyle \begin{aligned} Q^F(y,\delta,u)=\left\{F: \begin{cases} F(y_i)\geq u_i, F(y_i-\epsilon)< u_i ~\text{for any}~ \epsilon>0 & \mbox{for all }i\mbox{ such that }\delta_i=1\\ F(y_j)< u_j & \mbox{for all }j\mbox{ such that }\delta_j=0 \end{cases} \right\} ,\end{aligned} $$
(29.11)

and QR(y, δ, v) is analogous.

Let (U∗, V∗) be an independent copy of (U, V). Because the inverse (29.10) separates into a Cartesian product, and because U∗ and V∗ are independent, the marginal fiducial distribution for the failure distribution function F is the conditional distribution

$$\displaystyle \begin{aligned} Q^F(y,\delta,U^*) \mid \{Q^F(y,\delta,U^*) \neq \emptyset\}. \end{aligned}$$

Figure 29.4 from [17] demonstrates the survival-function representation of QF(y, δ, u), as defined in Eq. (29.11), for one dataset with n = 8 observations of X following Weibull(20, 10) censored by Z following Exp(20). Each panel corresponds to a different value of u, where each u is a realization of U∗. Any survival function lying between the upper red and the lower black fiducial survival functions corresponds to an element of the closure of QF(y, δ, u). For the technical details of sampling, we refer to Algorithm 1 in [17]. The corresponding fiducial-based confidence intervals proposed in [17] maintain coverage in situations where asymptotic methods often have substantial coverage problems. Furthermore, as also shown in [17], the average length of their log-interpolation fiducial confidence intervals is often shorter than that of confidence intervals for competing methods that maintain coverage. As pointed out by [80], it would also be interesting to consider other choices of fiducial samples, such as monotonic spline interpolation.

Fig. 29.4 Two realizations of fiducial curves for a sample of size 8 from Weibull(20, 10) censored by Exp(20) [17]. Here fiducial curves refer to Monte Carlo samples \(S^L_i\), \(S^U_i\), and \(S^I_i\) (i = 1, 2) from the GFD. The red and black curves are corresponding realizations of the upper and lower fiducial survival functions. The green curve is the log-linear interpolation type of survival function. The circle points denote failure observations. The triangle points denote censored observations. The dashed blue curve is the true survival function of Weibull(20, 10)

GFDs for Discrete Distributions [41]

Let Y be a random variable with distribution function F(y|θ). Assume there is \(\mathcal Y\) so that \(P_\theta (Y\in \mathcal Y)=1\) for all θ, and for each fixed \(y\in \mathcal Y\), the distribution function is either a nonincreasing function of θ, spanning the whole interval (0, 1), or a constant equal to 1; the left limit F(y−|θ) is also either a nonincreasing function of θ spanning the whole interval (0, 1) or a constant equal to 0.

Define \(F^-(a|\theta )=\inf \{y: F(y|\theta )\geq a\}\). It is well known [11] that if U ∼ U(0,1), then Y = F−(U|θ) has the correct distribution, and we use this association as a data generating equation. It follows that both \(Q^+_y(u)=\sup \{\theta : F(y|\theta )=u\}\) and \(Q^-_y(u)=\inf \{\theta : F(y_-|\theta )=u\}\) exist and satisfy \(F(y|Q^+_y(u))=u\) and \(F(y_-|Q^-_y(u))=u\). Consequently,

$$\displaystyle \begin{aligned} P(Q^+_y(U)\leq t) & = 1-F(y|t)\;\;\mbox{and} \\ P(Q^-_y(U)\leq t) & = 1-F(y_-|t). \end{aligned} $$

Note that for all u ∈ (0, 1), the function F−(u|θ) is monotone in θ, and the closure of the inverse image \(\bar {Q}_y(u)\) is the interval with endpoints \(Q^-_y(u)\) and \(Q^+_y(u)\). The half-corrected GFD has distribution function

$$\displaystyle \begin{aligned} R(\theta| y)=1-\frac{F(y|\theta)+F(y_-|\theta)}2. \end{aligned}$$

If either of the distribution functions is constant, we interpret it as a point mass at the appropriate boundary of the parameter space. An analogous argument shows that if the distribution function and its left limit were nondecreasing in θ, the half-corrected GFD would have distribution function

$$\displaystyle \begin{aligned} R(\theta| y)=\frac{F(y|\theta)+F(y_-|\theta)}2. \end{aligned}$$

Hannig et al. [41] provide a list of the half-corrected GFDs for three well-known discrete distributions; a numerical check of the binomial entry appears after the list. Here, Beta(0, n + 1) and Beta(x + 1, 0) denote the degenerate distributions at 0 and 1, respectively, and Γ(0, 1) denotes the degenerate distribution at 0:

  • X ∼ Binomial(m, p) with m known. GFD is the mixture of Beta(x + 1, m − x) and Beta(x, m − x + 1) distributions [37].

  • X ∼ Poisson(λ). GFD is the mixture of Γ(x + 1, 1) and Γ(x, 1) distributions [22].

  • X ∼ Negative Binomial(r, p) with r known. GFD is the mixture of Beta(r, x − r + 1) and Beta(r, x − r) distributions [38].
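The binomial entry of this list can be checked directly against the half-corrected CD of Example 29.2.6; a sketch (the observed values are ours):

```python
import numpy as np
from scipy.stats import beta, binom

m, x = 20, 7                        # observed x from Binomial(m, p)
p = np.linspace(0.001, 0.999, 999)

# half-corrected GFD distribution function: Pr(X > x) + Pr(X = x)/2
R_half = binom.sf(x, m, p) + 0.5 * binom.pmf(x, m, p)

# equal-weight mixture of Beta(x+1, m-x) and Beta(x, m-x+1)
R_mix = 0.5 * (beta.cdf(p, x + 1, m - x) + beta.cdf(p, x, m - x + 1))

print(np.max(np.abs(R_half - R_mix)))   # ~ 0 up to floating-point error
```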

Model Selection via GFD [41]

Hannig and Lee [39] introduced model selection into the generalized fiducial inference paradigm in the context of wavelet regression. Two important ingredients are needed for fiducial model selection: (1) include the choice of model as one of the parameters; (2) include penalization in the data generating equation.

Consider a finite collection of models \(\mathcal M\). The data generating equation is

$$\displaystyle \begin{aligned} Y=G(M, \theta_M,U),\qquad M\in\mathcal M,\ \theta_M\in\Theta_M, \end{aligned} $$
(29.12)

where Y is the observation, M is the model considered, θM collects the parameters associated with model M, and U is a random vector with fully known distribution, independent of any parameters. Hannig and Lee [39] proposed a novel way of adding a penalty into fiducial model selection. In particular, for each model M, they proposed to augment the data generating equation (29.12) by

$$\displaystyle \begin{aligned} 0=P_k,\quad k=1,\ldots,\min(|M|,n), \end{aligned} $$
(29.13)

where the Pk are independent and identically distributed continuous random variables, independent of U, with fP(0) = q, where fP is their common density and q is a constant determined by the penalty. Hannig and Lee [39] recommended using q = n−1∕2 as the default penalty. Note that the number of additional equations is the same as the number of unknown parameters in the model. As we never actually observe the outcomes of the extra data generating equations, we set their observed values to pk = 0.

For the augmented data generating equation, we have the following theorem from [41]. The quantity r(M|y) can be used for inference in the usual way. For example, the fiducial factor, i.e., the ratio r(M1|y)∕r(M2|y), can be used in the same way as a Bayes factor, as discussed in [6] in the context of Bayesian model selection.

Theorem 29.3.2 ( [41])

Suppose |M|≤ n and certain assumptions hold; the marginal generalized fiducial probability of model M is

$$\displaystyle \begin{aligned} r(M|y)=\frac{q^{|M|} \int_{\Theta_M} f_M(y,\theta_M) J_M(y,\theta_M)\,d\theta_M}{\sum_{M^{\prime}\in\mathcal M}q^{|M^{\prime}|}\int_{\Theta_{M^{\prime}}} f_{M^{\prime}}(y,\theta_{M^{\prime}}) J_{M^{\prime}}(y,\theta_{M^{\prime}})\,d\theta_{M^{\prime}}}, {} \end{aligned} $$
(29.14)

where fM(y, θM) is the likelihood and JM(y, θM) is the Jacobian function computed using (29.9) for each fixed model M.

For more details on the use of fiducial model selection, see [39] and [43].

4 Applications and Numerical Examples

4.1 CD-Based Inference

Two-Parameter Exponential Distribution

Inference procedures based on the two-parameter exponential model, Exp(μ, σ), are extensively used in several areas of statistical practice, including survival and reliability analysis. The probability density function and cumulative distribution function of a random variable X ∼ Exp(μ, σ) are given, respectively, by

$$\displaystyle \begin{aligned} f(x)&=\frac{1}{\sigma}\exp\Bigg\{-\frac{x-\mu}{\sigma}\Bigg\}, \quad x>\mu,\\ F(x)&=\begin{cases} 1-\exp\Bigg\{-\frac{x-\mu}{\sigma}\Bigg\} & \text{if}~ x>\mu,\\ 0 & \text{if}~ x\leq \mu, \end{cases} \end{aligned} $$

and survival function (also known as reliability function) is S(x) = 1 − F(x). The inference problem of interest is to obtain confidence intervals (sets) of μ, σ and S(t) at a given t > 0.

Let X(1), …, X(k) be the k (k > 1) smallest observations among X1, …, Xn. Then the maximum likelihood estimators of μ and σ are

$$\displaystyle \begin{aligned} \widehat \mu= X_{(1)},\quad \text{and} \quad \widehat \sigma= \frac{1}{k} \left\{ \sum_{i=1} ^k X_{(i)}+(n-k)X_{(k)}- nX_{(1)} \right\}. \end{aligned} $$

It turns out that \(\widehat \mu \) and \(\widehat \sigma \) are independent and they follow the distributions

$$\displaystyle \begin{aligned} U=2n(\widehat \mu-\mu )/\sigma \sim \chi^2(2), \quad V=2k\widehat \sigma/\sigma \sim \chi^2(2k-2), \end{aligned} $$
(29.15)

respectively. Here χ2(m) denotes the chi-square distribution with m degrees of freedom. We provide below a simple CD-based method to answer the inference problem of interest.

From Eq. (29.15), we have

$$\displaystyle \begin{aligned} \frac{(k-1)\,n(\widehat \mu-\mu)}{k \widehat \sigma} = \frac{U/2}{V/(2k-2)} \sim F(2, 2k-2), \end{aligned}$$

where F(a, b) is the F-distribution with degrees of freedom a and b. By the pivot-based CD construction method [78, p. 134], a CD for μ is \(H_1(\mu ) = 1 - F_{F(2, 2k-2)}(\frac {(k-1)n(\widehat \mu -\mu )}{k \widehat \sigma })\), where FF(2,2k−2) is the cumulative distribution function of the F(2, 2k − 2)-distribution. Similarly, a CD for σ is \(H_2(\sigma ) = 1 - F_{\chi ^2(2k-2)}(\frac {2k\widehat \sigma }{\sigma })\), where \(F_{\chi ^2(2k-2)}\) is the cumulative distribution function of the χ2(2k − 2)-distribution. Inferential statements regarding μ and σ, including confidence intervals and testing results, can be obtained from these two CDs. Coverage rates and test errors obtained from these two CDs are exact.

We can also consider inference for (μ, σ) jointly. Here, we introduce a simulation-based approach. Let U∗ ∼ χ2(2) and V∗ ∼ χ2(2k − 2) be two independently simulated random variables. Define

$$\displaystyle \begin{aligned} \xi^* = \widehat \mu - \frac{k\widehat \sigma }{n} \frac{U^* }{V^* } \quad \mbox{and} \quad \zeta^* = \frac{2k \widehat \sigma}{V^*}. \end{aligned}$$

Then, \(\xi ^{*} | (\widehat \mu , \widehat \sigma ) \sim H_1(\mu )\) and \(\zeta ^{*} | (\widehat \mu , \widehat \sigma ) \sim H_2(\sigma )\), and they are called CD random variables [90]. Furthermore, the underlying joint distribution of (ξ∗, ζ∗), given \((\widehat \mu , \widehat \sigma )\), is a joint CD function H3(μ, σ) of (μ, σ). If we simulate a large number, say M, of copies of (U∗, V∗), then we obtain M copies of (ξ∗, ζ∗). To make inference statements about (μ, σ), we can treat the M copies \((\xi _1^*, \zeta _1^*), \ldots , (\xi _M^*, \zeta _M^*)\) as if they were M copies of bootstrap estimators in bootstrap inference, or as if they were M draws from the posterior distribution of (μ, σ) in a Bayesian inference.

Additionally, we can use the M copies of CD random variables \((\xi _1^*, \zeta _1^*), \ldots , (\xi _M^*, \zeta _M^*)\) to obtain a pointwise confidence band for S(t), t > 0. For each given t > 0, we compute \(\kappa _j^*(t) = \exp \{ - (t - \xi _j^*)/\zeta _j^*\}\), for j = 1, …, M. Then \([\kappa _{[\alpha M]}^*(t), +\infty )\) and \([\kappa _{[\frac {\alpha }{2} M]}^*(t), \kappa _{[(1 - \frac {\alpha }{2}) M]}^*(t)]\) are one-sided and two-sided level-(1 − α) confidence intervals for S(t), respectively, where \(\kappa _{[qM]}^*(t)\) is the empirical q-th quantile of \(\kappa _1^*(t), \ldots , \kappa _M^*(t)\). Now, by varying t, \([\kappa _{[\alpha M]}^*(t),+\infty )\) forms a level-(1 − α) pointwise lower confidence band, and \([\kappa _{[\frac {\alpha }{2} M]}^*(t), \kappa _{[(1 - \frac {\alpha }{2}) M]}^*(t)]\) forms a level-(1 − α) pointwise confidence band for the survival function S(t).

We can show that this set of exact confidence bands derived from the CD method matches those obtained in [69] using Tsui and Weerahandi's generalized inference approach [83], but the CD approach is simpler and more direct. Roy and Mathew [69] illustrated the 95% lower limit \(\widetilde S(t)\) for time ranging from 150 to 2000 in Figure 1 of [69], using a real data example with 19 observations taken from [45]. The data consist of mileages for military personnel carriers that failed in service. Figure 29.5 is a similar plot of the confidence band, using our CD approach with M = 1000.

Fig. 29.5 Point estimate (solid line) and 95% confidence band (dashed line) of CD-based inference

Data [45]:

162, 200, 271, 320, 393, 508, 539, 629, 706, 777, 884, 1008, 1101, 1182, 1463, 1603, 1984, 2355, 2880
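A sketch of the band computation on these data (complete sample, so k = n = 19; M = 1000 as in Fig. 29.5):

```python
import numpy as np

x = np.array([162, 200, 271, 320, 393, 508, 539, 629, 706, 777, 884, 1008,
              1101, 1182, 1463, 1603, 1984, 2355, 2880.0])
n = k = len(x)                           # complete sample: k = n
mu_hat = x.min()
sigma_hat = (x.sum() - n * mu_hat) / k

rng = np.random.default_rng(8)
M = 1000
U = rng.chisquare(2, size=M)             # U* ~ chi^2(2)
V = rng.chisquare(2 * k - 2, size=M)     # V* ~ chi^2(2k - 2)

xi = mu_hat - (k * sigma_hat / n) * U / V    # CD random variables for mu
zeta = 2 * k * sigma_hat / V                 # CD random variables for sigma

# pointwise 95% lower confidence band for S(t) = exp{-(t - mu)/sigma}
ts = np.linspace(150, 2000, 100)
kappa = np.exp(-(ts[None, :] - xi[:, None]) / zeta[:, None])   # M x 100
lower_band = np.quantile(kappa, 0.05, axis=0)
point_est = np.exp(-(ts - mu_hat) / sigma_hat)
print(lower_band[:3], point_est[:3])
```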

Bivariate Normal Correlation

Suppose we have the following bivariate normal distribution:

$$\displaystyle \begin{aligned} N\left(\begin{pmatrix} \mu_1\\ \mu_2 \end{pmatrix},\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right), \end{aligned}$$

and let ρ denote the correlation coefficient. One could use the asymptotic pivot, Fisher’s Z [27, 78],

$$\displaystyle \begin{aligned} \frac{1}{2}\log\frac{1+r}{1-r}-\frac{1}{2} \log \frac{1+\rho}{1-\rho}, \end{aligned} $$

where r is the sample correlation. The limiting distribution of the above pivot is \(N(0,\frac {1}{n-3})\). Therefore, the asymptotic CD is

$$\displaystyle \begin{aligned} H_n(\rho) = 1- \Phi\left( \sqrt{n-3} \left[\frac{1}{2}\log\frac{1+r}{1-r}-\frac{1}{2} \log \frac{1+\rho}{1-\rho}\right] \right), ~~ -1\leq \rho\leq 1. \end{aligned} $$

Figure 29.6 presents the CD of correlation coefficient ρ for a simulated dataset with n = 50, μ1 = μ2 = 1, σ1 = σ2 = 1, ρ = 0.5.

Fig. 29.6 CD of the correlation coefficient ρ
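A sketch computing this CD on a simulated dataset with the same settings (the seed and inversion details are ours):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

rng = np.random.default_rng(9)
n, rho = 50, 0.5
cov = [[1.0, rho], [rho, 1.0]]
data = multivariate_normal.rvs(mean=[1.0, 1.0], cov=cov, size=n,
                               random_state=rng)
r = np.corrcoef(data[:, 0], data[:, 1])[0, 1]    # sample correlation

z = lambda t: 0.5 * np.log((1 + t) / (1 - t))    # Fisher's Z transform

# asymptotic CD: H_n(rho) = 1 - Phi(sqrt(n - 3) (z(r) - z(rho)))
H = lambda rho0: 1 - norm.cdf(np.sqrt(n - 3) * (z(r) - z(rho0)))

# closed-form 95% CI by inverting the CD: z(rho) = z(r) +/- 1.96/sqrt(n-3)
ci = np.tanh(z(r) + np.array([-1.96, 1.96]) / np.sqrt(n - 3))
print(r, ci)
```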

In addition to the above two examples, there are also recent developments of CDs in causal inference; see [54] for more applications.

4.2 Nonparametric GFD-Based Inference

A fiducial approach for inference on the reliability (survival) function, which involves an infinite-dimensional parameter, was proposed in [17]. The approach does not assume a parametric distribution and is robust to model mis-specification. In [17], the authors considered a clinical trial comparing chemotherapy against chemotherapy combined with radiotherapy in the treatment of locally unresectable gastric cancer, conducted by the Gastrointestinal Tumor Study Group [70]. In this trial, 45 patients were randomized to each of the two groups and followed for several years. The censoring percentage is 13.3% for the combined-therapy group and 4.4% for the chemotherapy group. We are interested in testing whether the two treatment groups have the same survival function.

The Kaplan-Meier curves for these two datasets are presented in Fig. 29.7a. We notice that the two hazards appear to cross, which could pose a problem for some log-rank tests. In this instance, the fiducial approach gives a small p-value of 0.002. The p-values of other types of log-rank tests are reported in [17]. To explain why the proposed fiducial approach works well, the authors plot a sample of the differences of the two fiducial distributions in Fig. 29.7b. If the two datasets were from the same distribution, 0 should be well within the sample curves. However, from Fig. 29.7b, we can see that the majority of curves are very far away from 0 on the interval [0.5, 1]. This gives strong evidence that the group with combined therapy has significantly worse early survival outcomes.

Fig. 29.7 (a) Kaplan-Meier estimators for two treatment groups [17]. (b) Difference of two sample fiducial distributions

In [17], the sup-norm is used in the definition of the curvewise confidence intervals and tests. It may be possible to make the procedure more powerful by using a different (possibly weighted) norm [64]. Similarly, it might also be possible to use a choice of norm motivated by inferential models [18, 58, 61]. Besides the above example, there are also recent developments of nonparametric fiducial inference for interval-censored data and for Efron's empirical Bayes deconvolution; see [19, 20] for more applications.

Data [70]: (* indicates a censored event)

Combination group: 0.05 0.12 0.12 0.13 0.16 0.20 0.20 0.26 0.28 0.30 0.33 0.39 0.46 0.47 0.50 0.51 0.53 0.53 0.54 0.57 0.64 0.64 0.70 0.84 0.86 1.10 1.22 1.27 1.33 1.45 1.48 1.55 1.58 1.59 2.18 2.34 3.74 4.32 5.64 6.61* 6.81* 7.66* 7.68* 8.04* 8.19*

Chemotherapy group: 0.00 0.17 0.29 0.35 0.50 0.59 0.68 0.72 0.82 0.82 0.94 0.97 0.98 0.98 1.04 1.05 1.05 1.06 1.08 1.12 1.26 1.34 1.37 1.43 1.44 1.47 1.54 1.56 1.85 1.85 2.05 2.13 2.15 2.18 2.62 2.65 2.74 3.41 3.48 3.89 4.25 4.64 6.47 7.55* 8.08*

4.3 Combining Information from Multiple CDs

We use the cluster of differentiation 4 (cd-4) count data considered in [23] to demonstrate combining information from CDs. Twenty HIV-positive subjects received an experimental antiviral drug. The cd-4 counts in hundreds were recorded for each subject at baseline and after 1 year of treatment.

We obtained the summary statistics and simulated four independent datasets from the following bivariate normal distribution:

$$\displaystyle \begin{aligned} N\left(\begin{pmatrix} \mu_1\\ \mu_2 \end{pmatrix},\begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}\right), \end{aligned}$$

where μ1 = 3.288, μ2 = 4.093, \(\sigma _1^2=0.657\), \(\sigma _2^2= 1.346\), and ρ = 0.723.

Suppose each study makes its own inference conclusion individually. Each dataset was analyzed by Fisher's Z method [27, 76], the bias-corrected and accelerated (BCa) bootstrap [10, 21, 23], the profile likelihood approach [48], and a Bayesian approach with a uniform prior [1], respectively. One natural question is whether we can combine the inferences from the four independent studies, given that ρ is the same in all studies. The answer is yes. As introduced in Sect. 29.2.3, the combination of CDs is a powerful inferential tool. We fused the studies by combining p-values with the normal (Stouffer) method [55, 95].

The results of the analysis are summarized in Table 29.1. As we can see from the table, the four methods in the different studies provide broadly similar results, and the combined interval is much shorter than any of the four individual intervals. To study the performance of the combination of CDs in this situation, we present a simulation study with 200 replications. Table 29.2 shows the coverage and average length of the 95% CIs. We see that the combined approach not only maintains the desired coverage but also yields CIs whose length is roughly half that of the CIs from the individual studies. This result is as expected: each individual study provides CIs of length of order n−1∕2, and the combined data have total sample size 4n, so we expect CIs of length of order (4n)−1∕2.

Table 29.1 Inference on correlation coefficient: combining independent bivariate normal studies
Table 29.2 Combination of four independent bivariate normal studies via CDs

Data [23]:

Baseline: 2.12, 4.35, 3.39, 2.51, 4.04, 5.10, 3.77, 3.35, 4.10, 3.35, 4.15, 3.56, 3.39, 1.88, 2.56, 2.96, 2.49, 3.03, 2.66, 3.00

One year: 2.47, 4.61, 5.26, 3.02, 6.36, 5.93, 3.93, 4.09, 4.88, 3.81, 4.74, 3.29, 5.55, 2.82, 4.23, 3.23, 2.56, 4.31, 4.37, 2.40