
19.1 Fiducial Probability and Inference

R.A. Fisher introduced the concept of fiducial probability and used it to develop fiducial inference. Counterexamples through the years have led to its near abandonment in statistics.

Example.

Consider a deck of N cards numbered 1, 2, …, N. One card is drawn at random, its number denoted by U. Then

$$\displaystyle{ \mathbb{P}(U = u) = \frac{1} {N}\;\;u = 1,2,\ldots,N }$$
(19.1)

Suppose now that we add an unknown number θ to U. We are not told the observed value of U, \(u_{obs}\), or the value of θ, but we are told the observed value, \(t_{obs} = u_{obs}+\theta\), of the total \(T = U+\theta\). Note that we could see \(t_{obs}\) if and only if one of the following outcomes occurred:

$$\displaystyle{(U = 1,\theta = t_{obs} - 1),(U = 2,\theta = t_{obs} - 2),\ldots,(U = N,\theta = t_{obs} - N)}$$
  1.

    Given the value of \(t_{obs}\) there is a one-to-one correspondence between the values of U and θ. If we knew θ then we could determine the value of \(u_{obs}\).

  2.

    If we do not know the value of θ then observing \(T = t_{obs}\) will tell us nothing about \(u_{obs}\).

  3.

    Thus the state of uncertainty regarding \(u_{obs}\) will be the same after the observation of \(t_{obs}\) as it was before.

Therefore we may assume that (19.1) continues to hold after \(t_{obs}\) is observed, and we can write

$$\displaystyle{\mathbb{P}(\theta = t_{obs} - u) = \mathbb{P}(U = u) = \frac{1} {N}\;\;u = 1,2,\ldots,N}$$

which we call the fiducial probability distribution of θ.
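The fiducial distribution in this discrete example can be enumerated directly. A minimal sketch, with illustrative values of N and \(t_{obs}\) (not taken from the text):

```python
# Fiducial distribution for the card example: each u = 1..N pairs with
# theta = t_obs - u, and each pair carries probability 1/N.
N = 10
t_obs = 17  # illustrative observed total T = U + theta

fiducial = {t_obs - u: 1.0 / N for u in range(1, N + 1)}

# The fiducial probabilities sum to one over the N possible theta values.
assert abs(sum(fiducial.values()) - 1.0) < 1e-12
assert len(fiducial) == N
```

With these values the possible θ are \(t_{obs} - N, \ldots, t_{obs} - 1\), i.e. 7 through 16, each with fiducial probability 1/10.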

Example.

Assume that \(T\stackrel{d}{\sim }\mbox{ N}(\theta,1)\) and define \(U = T-\theta\). Then \(U\stackrel{d}{\sim }\mbox{ N}(0,1)\). If we observe \(T = t_{obs}\) then \(t_{obs}\) arises from a pair of values \((U = u,\theta = t_{obs} - u)\), so that given \(t_{obs}\) there is a one-to-one correspondence between the possible values of U and θ. Again, since θ is unknown, we will learn nothing about which value of U occurred. Thus we may assume that \(U\stackrel{d}{\sim }\mbox{ N}(0,1)\) even after \(T = t_{obs}\) is observed. Thus we can calculate (fiducial) probabilities about θ by transforming them into probability statements about U, i.e.,

$$\displaystyle{\theta \leq y\;\;\Longleftrightarrow\;\;U \geq t_{obs} - y}$$

so that

$$\displaystyle{\mathbb{P}_{\mathcal{F}}(\theta \leq y) = \mathbb{P}(U \geq t_{obs} - y) = \Phi (y - t_{obs})}$$

where \(\Phi (w) =\int _{ -\infty }^{w}e^{-z^{2}/2 }/\sqrt{2\pi }\,dz\). Note that such probability statements are the same as if we treated θ as a random variable with a normal distribution with mean \(t_{obs}\) and variance 1. Thus the fiducial distribution of θ is \(\mbox{ N}(t_{obs},1)\). The fiducial density is the derivative with respect to θ, i.e.,

$$\displaystyle{p_{\mathcal{F}}(\theta;t_{obs}) = \frac{1} {\sqrt{2\pi }}\exp \left \{-\frac{(\theta -t_{obs})^{2}} {2} \right \}}$$
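The identity \(\mathbb{P}_{\mathcal{F}}(\theta \leq y) = \Phi (y - t_{obs})\) is easy to verify numerically. A minimal sketch using only the standard library; the value of \(t_{obs}\) is illustrative:

```python
import math

def Phi(w):
    # standard normal distribution function via the error function
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

t_obs = 3.7  # illustrative observed value

def fiducial_cdf(y, t_obs):
    # P_F(theta <= y) = P(U >= t_obs - y) with U ~ N(0, 1)
    return 1.0 - Phi(t_obs - y)

# The fiducial distribution of theta is N(t_obs, 1): the two sides agree.
for y in (2.0, 3.7, 5.0):
    assert abs(fiducial_cdf(y, t_obs) - Phi(y - t_obs)) < 1e-12
```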

Kalbfleisch lists sufficient conditions for the fiducial argument to apply:

  1.

    A single real-valued parameter.

  2.

    A minimal sufficient statistic T exists for θ.

  3.

    There is a pivot U(T, θ) such that

    (i)

      For each θ, U(t, θ) is a one-to-one function of t

    (ii)

      For each t, U(t, θ) is a one-to-one function of θ

These assumptions, which are compounded when we move to more than one parameter, mean that the scope of fiducial inference is severely limited. In fact, it is largely ignored in most modern treatments of statistics.

Suppose that \(X_{1},X_{2},\ldots,X_{n}\) are iid, each \(\mbox{ N}(\mu,\sigma ^{2})\), where \(\sigma ^{2}\) is known. Then, whatever the value of μ, for every value of α ∈ [0, 1] we can find an upper 100(1 −α) % confidence limit for μ, namely

$$\displaystyle{\overline{X} + z_{1-\alpha } \frac{\sigma } {\sqrt{n}}}$$

which we know has the property that

$$\displaystyle{\mathbb{P}\left \{\mu \leq \overline{X} + z_{1-\alpha } \frac{\sigma } {\sqrt{n}}\right \} = 1-\alpha }$$

considered as a function of the random variable \(\overline{X}\). Fisher noted that the left-hand side has all of the properties of a distribution function defined over the parameter space and thus proceeded to define the fiducial distribution of μ for fixed \(\overline{X} = \overline{x}_{obs}\), and its derivative as the fiducial density of μ given \(\overline{x}_{obs}\). Thus we can speak of the fiducial probability \(\mathbb{P}_{\mathcal{F}}(\mu \in A)\) that μ ∈ A, calculated as

$$\displaystyle{\mathbb{P}_{\mathcal{F}}(\mu \in A) =\int _{A} \frac{\sqrt{n}} {\sqrt{2\pi }\,\sigma }\exp \left \{-\frac{n(\mu -\overline{x}_{obs})^{2}} {2\sigma ^{2}} \right \}d\mu }$$
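Since the fiducial distribution of μ here is \(\mbox{ N}(\overline{x}_{obs},\sigma ^{2}/n)\), fiducial probabilities of intervals can be checked by crude numerical integration against the closed form. A sketch with illustrative values of n, σ, and \(\overline{x}_{obs}\):

```python
import math

def Phi(w):
    # standard normal distribution function
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

n, sigma, xbar = 25, 2.0, 10.0  # illustrative values

def fiducial_density(mu):
    # density of N(xbar, sigma^2 / n) evaluated at mu
    s = sigma / math.sqrt(n)
    return math.exp(-((mu - xbar) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

# P_F(a < mu < b) by the trapezoidal rule versus the closed form
a, b, m = 9.5, 10.5, 2000
h = (b - a) / m
num = 0.5 * (fiducial_density(a) + fiducial_density(b)) * h
num += sum(fiducial_density(a + i * h) for i in range(1, m)) * h

s = sigma / math.sqrt(n)
exact = Phi((b - xbar) / s) - Phi((a - xbar) / s)
assert abs(num - exact) < 1e-6
```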

As another example let \(X_{1},X_{2},\ldots,X_{n}\) be iid, each uniform on \((0,\theta )\). Then \(Y =\max \left \{X_{1},X_{2},\ldots,X_{n}\right \}\) is the minimal sufficient statistic, which has distribution function

$$\displaystyle{F_{Y }(y;\theta ) = \mathbb{P}_{\theta }(Y \leq y) = \left [\mathbb{P}_{\theta }(X \leq y)\right ]^{n} = \left [\frac{y} {\theta } \right ]^{n}}$$

It follows that

$$\displaystyle{\mathbb{P}\left (\frac{Y } {\theta } \leq y\right ) = \mathbb{P}\left (Y \leq y\theta \right ) = y^{n}}$$

for \(0 \leq y \leq 1\), so that \(Y/\theta \) is a pivotal quantity.
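Applying the fiducial argument to the pivot \(U = Y/\theta\) gives, for an observed maximum \(y_{obs}\), the fiducial distribution function \(\mathbb{P}_{\mathcal{F}}(\theta \leq c) = \mathbb{P}(U \geq y_{obs}/c) = 1 - (y_{obs}/c)^{n}\) for \(c \geq y_{obs}\). A seeded Monte Carlo sketch, with illustrative values of n, θ, and \(y_{obs}\):

```python
import random

random.seed(1)
n, theta_true = 5, 3.0

# Check the pivot: U = Y/theta, with Y the max of n uniforms on (0, theta),
# has distribution function u^n on (0, 1).
draws = [max(random.uniform(0, theta_true) for _ in range(n)) / theta_true
         for _ in range(100_000)]
u = 0.8
empirical = sum(d <= u for d in draws) / len(draws)
assert abs(empirical - u ** n) < 0.01  # u^n = 0.32768

# Fiducial distribution of theta given y_obs
y_obs = 2.4
def fiducial_cdf(c):
    return 1.0 - (y_obs / c) ** n if c >= y_obs else 0.0

assert fiducial_cdf(y_obs) == 0.0
assert abs(fiducial_cdf(2 * y_obs) - (1 - 0.5 ** n)) < 1e-12
```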

19.1.1 Good’s Example

Suppose that X is a random variable with density function

$$\displaystyle{f_{X}(x;\theta ) = \frac{\theta ^{2}(x + a)e^{-x\theta }} {a\theta + 1} \;\;\;\mbox{ where $a > 0,\theta > 0,x \geq 0$}}$$

which has distribution function

$$\displaystyle\begin{array}{rcl} F(x;\theta )& =& \int _{0}^{x}f(t;\theta )dt {}\\ & =& \int _{0}^{x}\frac{\theta ^{2}(t + a)e^{-t\theta }} {a\theta + 1} dt {}\\ & =& \frac{\theta ^{2}} {a\theta + 1}\int _{0}^{x}(t + a)e^{-t\theta }dt {}\\ \end{array}$$

Now note that

$$\displaystyle\begin{array}{rcl} \int _{0}^{x}(t + a)e^{-t\theta }dt& =& \int _{ 0}^{x}te^{-t\theta }dt + a\int _{ 0}^{x}e^{-t\theta }dt {}\\ & =& -\frac{te^{-t\theta }} {\theta } \Bigg\vert _{0}^{x} + \frac{1} {\theta } \int _{0}^{x}e^{-t\theta }dt + a\int _{ 0}^{x}e^{-t\theta }dt {}\\ & =& -\frac{xe^{-x\theta }} {\theta } + \frac{1 + a\theta } {\theta } \int _{0}^{x}e^{-t\theta }dt {}\\ & =& -\frac{xe^{-x\theta }} {\theta } + \frac{1 + a\theta } {\theta ^{2}} \left [1 - e^{-x\theta }\right ] {}\\ \end{array}$$

It follows that

$$\displaystyle{F_{X}(x;\theta ) = 1 - e^{-x\theta }\left [1 + \frac{x\theta } {a\theta + 1}\right ]}$$

The fiducial density of θ is the derivative of F X (x; θ) with respect to θ or

$$\displaystyle{\mathcal{F}_{x}(\theta ) = \frac{x\theta e^{-x\theta }} {(a\theta + 1)^{2}}\left [a + (a + x)(1 + a\theta )\right ]}$$

Suppose we now observe, independently of X, another random variable Y with density

$$\displaystyle{f_{Y }(y;\theta ) = \frac{\theta ^{2}(y + b)e^{-y\theta }} {b\theta + 1} \;\;\;\mbox{ where $b > 0,\theta > 0,y \geq 0$ and $b\neq a$}}$$

If we use the fiducial density based on X as a “prior” for θ and combine it with the density for Y, the resulting posterior would be proportional to

$$\displaystyle{\mathcal{P}_{XY }(\theta;x,y) \propto \mathcal{F}_{x}(\theta )f_{Y }(y;\theta )}$$

which is equal to

$$\displaystyle{ \frac{x\theta e^{-x\theta }} {(a\theta + 1)^{2}}\left [a + (a + x)(1 + a\theta )\right ]\frac{\theta ^{2}(y + b)e^{-y\theta }} {b\theta + 1} }$$

If fiducial probabilities behaved like true probabilities it should make no difference whether we observed X or Y first. If we used the fiducial density for θ based on Y and combined it with the density of X, the resulting posterior would be proportional to

$$\displaystyle{\mathcal{P}_{Y X}(\theta;y,x) \propto \mathcal{F}_{y}(\theta )f_{X}(x;\theta )}$$

which is equal to

$$\displaystyle{ \frac{y\theta e^{-y\theta }} {(b\theta + 1)^{2}}\left [b + (b + y)(1 + b\theta )\right ]\frac{\theta ^{2}(x + a)e^{-x\theta }} {a\theta + 1} }$$

The laws of probability require that these two expressions agree up to a normalizing constant not depending on θ. They clearly do not. So fiducial probabilities are not compatible with Bayes’ theorem and hence are not warranted.
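The disagreement is easy to exhibit numerically: if the two orderings agreed as distributions, the ratio of the two expressions would be constant in θ. A sketch with illustrative values of a, b, x, y:

```python
import math

a, b = 1.0, 2.0   # a != b, as the example requires; illustrative values
x, y = 1.5, 0.7   # illustrative observed values

def f(z, c, th):
    # density f(z; theta) with shape constant c (c = a for X, c = b for Y)
    return th ** 2 * (z + c) * math.exp(-z * th) / (c * th + 1)

def fid(z, c, th):
    # fiducial density of theta from a single observation z
    return (z * th * math.exp(-z * th) / (c * th + 1) ** 2
            * (c + (c + z) * (1 + c * th)))

def post_xy(th):  # fiducial from X, then likelihood of Y
    return fid(x, a, th) * f(y, b, th)

def post_yx(th):  # fiducial from Y, then likelihood of X
    return fid(y, b, th) * f(x, a, th)

# The ratio varies with theta, so the two "posteriors" are not proportional.
r1 = post_xy(0.5) / post_yx(0.5)
r2 = post_xy(2.0) / post_yx(2.0)
assert abs(r1 - r2) > 1e-3
```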

19.1.2 Edwards’ Example

Consider two hypotheses \(\theta = +1\) or \(\theta = -1\) and suppose that there are two possible outcomes of a random variable X, \(X = +1\) or \(X = -1\). The probability model for X is

$$\displaystyle{\mathbb{P}_{\theta =+1}(X = x) = \left \{\begin{array}{rl} p&x = +1\\ q &x = -1 \end{array} \right.\;\;;\;\; \mathbb{P}_{\theta =-1}(X = x) = \left \{\begin{array}{rl} q&x = +1\\ p &x = -1 \end{array} \right.}$$

Since when \(\theta = +1\), \(X = +1\) with probability p, and when \(\theta = -1\), \(X = -1\) with probability p, we have that

$$\displaystyle{\mathbb{P}(X\theta = +1) = p}$$

This statement is true in general and when \(X = +1\) is equivalent to the statement

$$\displaystyle{\mathbb{P}(\theta = +1\vert X = +1) = p}$$

which is the fiducial probability statement about θ following from observing that \(X = +1\).
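The claim that \(X\theta = +1\) with probability p whatever the value of θ can be checked by enumerating the model; the value of p below is illustrative:

```python
p = 0.7  # illustrative; any p in (0, 1) works
model = {+1: {+1: p, -1: 1 - p},   # P(X = x | theta = +1)
         -1: {+1: 1 - p, -1: p}}   # P(X = x | theta = -1)

# Under either value of theta, the event X*theta = +1 has probability p.
for theta, dist in model.items():
    prob = sum(px for xval, px in dist.items() if xval * theta == +1)
    assert abs(prob - p) < 1e-12
    assert abs(sum(dist.values()) - 1.0) < 1e-12
```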

Thus, starting with no prior information, the fiducial argument converts the result of the experiment into a definite probability statement about θ.

Suppose now that \(p = q = \frac{1} {2}\). Then

  (i)

    There is no information about θ a priori

  (ii)

    The observation of X is totally uninformative about θ

  (iii)

    The likelihood ratio for comparing \(\theta = +1\) to \(\theta = -1\) is 1, indicating that, based on the observation, we have no preference for \(\theta = +1\) vs \(\theta = -1\)

However we find, using the fiducial argument, that

$$\displaystyle{\mathbb{P}_{\mathcal{F}}(\theta = +1) = \frac{1} {2}\;\;\mbox{ whatever the value of $X$}}$$

Thus we have another example which casts doubt on the veracity of the fiducial argument.

19.2 Confidence Distributions

Recently there has been a great deal of research on confidence distributions which reduce, in many cases, to fiducial distributions, but which are solidly in the frequentist camp.

Definition 19.2.1.

A function \(C_{n}(\theta;\boldsymbol{x}_{n})\::\; \mathcal{X}\times \Theta \mapsto [0,1]\) is called a confidence distribution if

  (i)

    \(C_{n}(\theta;\boldsymbol{x}_{n})\) is a distribution function over \(\Theta \) for each fixed \(\boldsymbol{x}_{n} \in \mathcal{X}\)

  (ii)

    At the true parameter point, \(C_{n}(\theta _{0};\boldsymbol{x}_{n})\), as a function of \(\boldsymbol{x}_{n} \in \mathcal{X}\), has a uniform distribution over [0, 1]

\(C_{n}(\theta;\boldsymbol{x}_{n})\::\; \mathcal{X}\times \Theta \mapsto [0,1]\) is an asymptotic confidence distribution if (ii) is satisfied only asymptotically.

The paper by Xie and Singh [55] provides a useful review of the ideas and illustrates, in a single diagram, the unification of frequentist ideas through the confidence density.

19.2.1 Bootstrap Connections

If \(\widehat{\theta }\) is an estimator of θ let the bootstrap estimator be \(\widehat{\theta }^{{\ast}}\). When the asymptotic distribution of \(\widehat{\theta }\) is symmetric then the sampling distribution of \(\widehat{\theta }-\theta\) is estimated by the bootstrap distribution of \(\widehat{\theta }-\widehat{\theta }^{{\ast}}\). In this case an asymptotic confidence distribution is given by

$$\displaystyle{C_{n}(\theta ) = 1 - \mathbb{P}(\widehat{\theta }-\widehat{\theta }^{{\ast}}\leq \widehat{\theta }-\theta \vert \boldsymbol{x}) = \mathbb{P}(\widehat{\theta }^{{\ast}}\leq \theta \vert \boldsymbol{x})}$$

the bootstrap distribution of \(\widehat{\theta }\).
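A minimal, seeded sketch of this bootstrap confidence distribution for a mean, using only the standard library; the sample and the number of resamples are illustrative:

```python
import random
import statistics

random.seed(2)
x = [random.gauss(10, 3) for _ in range(50)]  # illustrative sample
theta_hat = statistics.mean(x)

# Bootstrap distribution of the resampled mean theta_hat*
B = 4000
boot = sorted(statistics.mean(random.choices(x, k=len(x))) for _ in range(B))

def C_n(theta):
    # bootstrap confidence distribution: P(theta_hat* <= theta | x)
    return sum(b <= theta for b in boot) / B

# C_n is a distribution function in theta, with about half its mass
# below the point estimate.
assert C_n(min(boot) - 1) == 0.0 and C_n(max(boot) + 1) == 1.0
assert 0.4 < C_n(theta_hat) < 0.6
```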

19.2.2 Likelihood Connections

If we normalize a likelihood function \(\mathcal{L}(\theta;\boldsymbol{x})\) so that the area under the normalized likelihood function is 1, i.e., we form

$$\displaystyle{\mathcal{L}^{{\ast}}(\theta;\boldsymbol{x}) = \frac{\mathcal{L}(\theta;\boldsymbol{x})} {\int _{\Theta }\mathcal{L}(\theta;\boldsymbol{x})d\theta }}$$

then, under certain conditions, we obtain an asymptotic normal confidence distribution.

Similarly, under the usual regularity conditions for maximum likelihood, we can use the normalized profile likelihood as an asymptotic confidence distribution for a parameter of interest.

It is also possible to construct approximate likelihoods using confidence distributions. Efron considered a confidence density \(c(\theta;\boldsymbol{x})\) for the parameter of interest. He then doubled the data set by introducing a second data set, considered independent of the first but having exactly the same values \(\boldsymbol{x}\), and constructed the confidence density \(c(\theta;\boldsymbol{x},\boldsymbol{x})\) based on the doubled data set using the same confidence intervals to define the density. The implied likelihood function is then

$$\displaystyle{\mathcal{L}_{imp}(\theta;\boldsymbol{x}) = \frac{c(\theta;\boldsymbol{x},\boldsymbol{x})} {c(\theta;\boldsymbol{x})} }$$
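In the normal-mean case with known σ the construction can be checked in closed form: \(c(\theta;\boldsymbol{x})\) is the \(\mbox{ N}(\overline{x},\sigma ^{2}/n)\) density, the doubled data give \(\mbox{ N}(\overline{x},\sigma ^{2}/(2n))\), and their ratio is proportional to the likelihood. A sketch with illustrative values:

```python
import math

n, sigma, xbar = 20, 1.5, 4.0  # illustrative values

def normal_pdf(t, m, s):
    return math.exp(-((t - m) ** 2) / (2 * s * s)) / (s * math.sqrt(2 * math.pi))

def c_single(th):   # confidence density from x: N(xbar, sigma^2/n)
    return normal_pdf(th, xbar, sigma / math.sqrt(n))

def c_double(th):   # confidence density from doubled data: N(xbar, sigma^2/(2n))
    return normal_pdf(th, xbar, sigma / math.sqrt(2 * n))

def L(th):          # likelihood of theta given xbar, up to a constant
    return math.exp(-n * (th - xbar) ** 2 / (2 * sigma ** 2))

# The implied likelihood c_double/c_single is proportional to L:
ratios = [(c_double(th) / c_single(th)) / L(th) for th in (3.0, 4.0, 5.5)]
assert max(ratios) - min(ratios) < 1e-9 * ratios[0]
```

Here the constant of proportionality works out to \(\sqrt{2}\), which the likelihood normalization absorbs.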

19.2.3 Confidence Curves

Birnbaum introduced the idea of a confidence curve to summarize confidence intervals and levels of tests in one curve. In terms of confidence distributions the confidence curve is given by

$$\displaystyle{CC(\theta ) = 2\min \{C_{n}(\theta;\boldsymbol{x}),1 - C_{n}(\theta;\boldsymbol{x})\}}$$

Thus a confidence distribution is simply a combinant such that for each \(\boldsymbol{x}_{n} \in \mathcal{X}\) it is a distribution function as θ varies over \(\Theta \) and for fixed θ = θ 0 it has a uniform distribution as \(\boldsymbol{x}_{n}\) varies over \(\mathcal{X}\).

Example.

Let

$$\displaystyle{X_{i}\;\;\stackrel{d}{\sim }\;\;\mbox{ N}(\mu,\sigma ^{2})\;\;i = 1,2,\ldots,n}$$

be independent with σ 2 known. Then

$$\displaystyle{C_{n}(\mu;\overline{x}_{obs}) = \Phi \left (\frac{\mu -\overline{x}_{obs}} {\sigma /\sqrt{n}} \right )\;\;\mbox{ where}\;\;\Phi (y) =\int _{ -\infty }^{y}\frac{e^{-z^{2}/2 }} {\sqrt{2\pi }} dz}$$

is a confidence distribution for μ.
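For this normal example the confidence distribution and Birnbaum's confidence curve can be computed directly; the endpoints of the 95 % equal-tailed interval should sit at confidence-curve height 0.05. A sketch with illustrative n, σ, and \(\overline{x}_{obs}\):

```python
import math

n, sigma, xbar = 16, 2.0, 5.0  # illustrative values
s = sigma / math.sqrt(n)       # standard error, here 0.5

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

def C(mu):
    # confidence distribution for mu
    return Phi((mu - xbar) / s)

def CC(mu):
    # Birnbaum's confidence curve: 2 * min(C, 1 - C)
    return 2 * min(C(mu), 1 - C(mu))

assert abs(C(xbar) - 0.5) < 1e-12
z = 1.959963984540054          # 97.5% standard normal quantile
for mu in (xbar - z * s, xbar + z * s):
    assert abs(CC(mu) - 0.05) < 1e-9   # endpoints of the 95% interval
```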

19.3 P-Values Again

Given the widespread importance of genomics, P-values are now used more than ever. It is important to remember that P-values are observed values of a random variable and hence have intrinsic variability.

19.3.1 Sampling Distribution of P-Values

The P-value is defined, for a test statistic T with distribution function \(F_{H_{0}}(t)\) under the null hypothesis, as

$$\displaystyle{\mathbb{P}_{H_{0}}(T \geq t_{obs})\;\;\mbox{ where $t_{obs}$ is the observed value of $T$}}$$

Note that the P-value is given by \(1 - F_{H_{0}}(t_{obs})\) and can be considered as an observed value of the random variable \(Y = 1 - F_{H_{0}}(T)\). The distribution function of Y assuming the null hypothesis is true is

$$\displaystyle\begin{array}{rcl} F_{Y }(y)& =& \mathbb{P}_{H_{0}}(Y \leq y) {}\\ & =& \mathbb{P}_{H_{0}}\left \{1 - F_{H_{0}}(T) \leq y\right \} {}\\ & =& \mathbb{P}_{H_{0}}\left \{F_{H_{0}}(T) \geq 1 - y\right \} {}\\ & =& 1 - \mathbb{P}_{H_{0}}\left \{F_{H_{0}}(T) \leq 1 - y\right \} {}\\ & =& 1 - \mathbb{P}_{H_{0}}\left \{T \leq F_{H_{0}}^{-1}(1 - y)\right \} {}\\ & =& 1 - F_{H_{0}}\left \{F_{H_{0}}^{-1}(1 - y)\right \} {}\\ & =& 1 - (1 - y) {}\\ & =& y {}\\ \end{array}$$

for 0 < y < 1, assuming \(F_{H_{0}}\) is continuous and strictly increasing. That is, the P-value under the null hypothesis has a uniform distribution on (0, 1). This fact has been known for decades, yet P-values are not reported with a standard error as other statistics are.
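The uniformity under the null is easy to see by simulation. A seeded sketch for a one-sided z-test of \(H_{0}\::\:\mu = 0\) with known variance; the sample size and number of replications are illustrative:

```python
import math
import random

random.seed(3)

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

n, reps = 30, 20_000
pvals = []
for _ in range(reps):
    xbar = sum(random.gauss(0, 1) for _ in range(n)) / n
    t = math.sqrt(n) * xbar          # T ~ N(0, 1) under H0
    pvals.append(1 - Phi(t))         # one-sided P-value

# Uniform(0, 1) behaviour: mean about 0.5, and P(p <= 0.05) about 0.05.
assert abs(sum(pvals) / reps - 0.5) < 0.01
assert abs(sum(p <= 0.05 for p in pvals) / reps - 0.05) < 0.01
```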

If the alternative hypothesis H is true, assume that T has distribution function \(G_{H}(t)\). Then

$$\displaystyle\begin{array}{rcl} F_{Y }(y)& =& \mathbb{P}_{H}(Y \leq y) {}\\ & =& \mathbb{P}_{H}\left \{1 - F_{H_{0}}(T) \leq y\right \} {}\\ & =& \mathbb{P}_{H}\left \{F_{H_{0}}(T) \geq 1 - y\right \} {}\\ & =& 1 - \mathbb{P}_{H}\left \{F_{H_{0}}(T) \leq 1 - y\right \} {}\\ & =& 1 - \mathbb{P}_{H}\left \{T \leq F_{H_{0}}^{-1}(1 - y)\right \} {}\\ & =& 1 - G_{H}\left \{F_{H_{0}}^{-1}(1 - y)\right \} {}\\ \end{array}$$

Under suitable regularity conditions the density of Y under the alternative hypothesis is

$$\displaystyle\begin{array}{rcl} f_{Y }(y)& =& \frac{dF_{Y }(y)} {dy} {}\\ & =& \frac{d\left [1 - G_{H}\left \{F_{H_{0}}^{-1}(1 - y)\right \}\right ]} {dy} {}\\ & =& -g_{H}\left \{F_{H_{0}}^{-1}(1 - y)\right \} \frac{1} {-f_{H_{0}}\left \{F_{H_{0}}^{-1}(1 - y)\right \}} {}\\ & =& \frac{g_{H}\left \{F_{H_{0}}^{-1}(1 - y)\right \}} {f_{H_{0}}\left \{F_{H_{0}}^{-1}(1 - y)\right \}} {}\\ \end{array}$$

Note that for the observed value \(t_{obs}\) of T we have \(y = 1 - F_{H_{0}}(t_{obs})\) and hence the density evaluated at \(t_{obs}\) is given by

$$\displaystyle{ \frac{g_{H}\left \{F_{H_{0}}^{-1}(1 - y)\right \}} {f_{H_{0}}\left \{F_{H_{0}}^{-1}(1 - y)\right \}} = \frac{g_{H}\left \{F_{H_{0}}^{-1}\left [F_{H_{0}}(t_{obs})\right ]\right \}} {f_{H_{0}}\left \{F_{H_{0}}^{-1}\left [F_{H_{0}}(t_{obs})\right ]\right \}} = \frac{g_{H}(t_{obs})} {f_{H_{0}}(t_{obs})}}$$

the likelihood ratio!
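This identity can be verified numerically in a simple case: T standard normal under \(H_{0}\) and \(\mbox{ N}(\delta,1)\) under H, where the likelihood ratio at \(t_{obs}\) is \(\exp (\delta t_{obs} -\delta ^{2}/2)\). A sketch differencing \(F_{Y}\) numerically; δ and \(t_{obs}\) are illustrative, and the inverse normal cdf is a crude bisection:

```python
import math

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

def Phi_inv(p, lo=-10.0, hi=10.0):
    # inverse standard normal cdf by bisection (adequate for a sketch)
    for _ in range(200):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

delta = 1.0                     # alternative: T ~ N(delta, 1)

def F_Y(y):
    # distribution of the P-value under the alternative
    return 1 - Phi(Phi_inv(1 - y) - delta)

t_obs = 0.8
y = 1 - Phi(t_obs)              # observed P-value
h = 1e-5
numeric = (F_Y(y + h) - F_Y(y - h)) / (2 * h)   # density of Y at y
lr = math.exp(delta * t_obs - delta ** 2 / 2)   # g_H(t_obs)/f_H0(t_obs)
assert abs(numeric - lr) < 1e-3
```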

P-values have been under heavy fire in the last few years for overstating the importance of effects observed in clinical and other investigations [48]. The results above due to Donahue [12] and others have been used by Boos and Stefanski [5] to explain why P-values overstate the conclusions of studies. The results of Goodman [21] are also relevant. The bottom line appears to be that use of P-values is not the way to present the evidence from studies that rely on statistical analysis to report their conclusions.

19.4 Severe Testing

Severe testing is a concept claimed to be useful for post-data inference. The ideas are best illustrated through an example and I follow the example in Mayo and Spanos [33]. Suppose that \(X_{1},X_{2},\ldots,X_{n}\) are iid each \(\mbox{ N}(\mu,\sigma ^{2})\) where σ = 2, that n = 100, and we choose α = 0.025 for a one-sided test of

$$\displaystyle{H_{0}\::\:\mu \leq 12\;\;\mbox{ vs}\;\;\mu > 12}$$

Since, under H 0, \(\overline{X}_{n}\stackrel{d}{\sim }\mbox{ N}(12,(2/10)^{2})\), we reject using the Neyman-Pearson theory if

$$\displaystyle{d(\boldsymbol{x}_{obs}) = \frac{\sqrt{n}(\overline{x}_{obs} - 12)} {2} \geq 1.96\;\;\Longleftrightarrow\;\;\overline{x}_{obs} \geq 12.4}$$

Suppose now that we observe \(\overline{x}_{obs} = 11.8\). Then we would not reject \(H_{0}\::\:\mu \leq 12\). If a value of μ equal to 12.2 was deemed to be of scientific or substantive importance, we can ask: do we have evidence that μ < 12.2? Mayo suggests calculating the severity with which the claim μ < 12.2 passes the test. In cases where \(H_{0}\) is accepted, the severity with which this claim passes the test, evaluated at μ = 12.2, is defined in this situation as

$$\displaystyle\begin{array}{rcl} \mathbb{P}_{\mu }(\overline{X} > \overline{x}_{obs})& =& \mathbb{P}_{\mu =12.2}\left \{\overline{X} > 11.8\right \} {}\\ & =& \mathbb{P}_{\mu =12.2}\left \{\frac{\sqrt{100}(\overline{X} - 12.2)} {2} > \frac{\sqrt{100}(11.8 - 12.2)} {2} \right \} {}\\ & =& \mathbb{P}\left \{Z > -2\right \} {}\\ & =& 0.977 {}\\ \end{array}$$

Note that the power of the test at 12.2 is

$$\displaystyle\begin{array}{rcl} \mathbb{P}_{\mu =12.2}\left \{\overline{X} > 12.4\right \}& =& \mathbb{P}_{\mu =12.2}\left \{\frac{\sqrt{100}(\overline{X} - 12.2)} {2} > \frac{\sqrt{100}(12.4 - 12.2)} {2} \right \} {}\\ & =& \mathbb{P}\left \{Z > 1\right \} {}\\ & =& 0.159 {}\\ \end{array}$$

Mayo and Spanos define the attained power in this situation as

$$\displaystyle{\mathbb{P}_{\mu }\left \{\overline{X} > \overline{x}_{obs}\right \}}$$

so that the severity with which μ passes the test is simply the attained power at μ when the observed outcome leads to acceptance of the null hypothesis.

Suppose now that the null hypothesis is rejected; in the example suppose that \(\overline{x}_{obs} = 12.6\), so that the null hypothesis \(H_{0}\::\:\mu \leq 12\) is rejected. Again, assuming that μ = 12.2 is of scientific or substantive interest, do we have evidence of a value of μ of scientific interest? Mayo and Spanos suggest calculating the severity of μ > 12.2, defined by

$$\displaystyle\begin{array}{rcl} \mathbb{P}_{\mu }\left \{\overline{X} \leq \overline{x}_{obs}\right \}& =& \mathbb{P}_{\mu =12.2}\left \{\overline{X} \leq \overline{x}_{obs}\right \} {}\\ & =& \mathbb{P}_{\mu =12.2}\left \{\overline{X} \leq 12.6\right \} {}\\ & =& \mathbb{P}_{\mu =12.2}\left \{\frac{\sqrt{100}(\overline{X} - 12.2)} {2} \leq \frac{\sqrt{100}(12.6 - 12.2)} {2} \right \} {}\\ & =& \mathbb{P}\left \{Z \leq 2\right \} {}\\ & =& 0.977 {}\\ \end{array}$$

Note that in the case of a hypothesis which is rejected the severity is simply 1 minus the attained power.
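The three numbers in this example (severity 0.977 in the accept case, power 0.159 at μ = 12.2, and severity 0.977 in the reject case) follow directly from the normal distribution of \(\overline{X}\); a sketch:

```python
import math

def Phi(w):
    return 0.5 * (1.0 + math.erf(w / math.sqrt(2)))

n, sigma, mu1 = 100, 2.0, 12.2
se = sigma / math.sqrt(n)           # standard error, here 0.2

def sev_accept(xbar_obs, mu):
    # severity for "mu < mu1" after acceptance: P_mu(Xbar > xbar_obs)
    return 1 - Phi((xbar_obs - mu) / se)

def sev_reject(xbar_obs, mu):
    # severity for "mu > mu1" after rejection: P_mu(Xbar <= xbar_obs)
    return Phi((xbar_obs - mu) / se)

assert round(sev_accept(11.8, mu1), 3) == 0.977  # accept case, P(Z > -2)
assert round(sev_reject(12.6, mu1), 3) == 0.977  # reject case, P(Z <= 2)
power = 1 - Phi((12.4 - mu1) / se)               # power of the test at mu = 12.2
assert round(power, 3) == 0.159
```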

Severe testing does a nice job of clarifying the issues which occur when a hypothesis is accepted (not rejected) by finding those values of the parameter (here μ) which are plausible (have high severity) given acceptance. Similarly, severe testing addresses the issue of a hypothesis which is rejected by finding those values of μ which are plausible (have high severity) given rejection. Note that severity is a function of the test T, the hypothesis H, and the observed data \(\boldsymbol{x}_{obs}\). Thus it is inherently a random variable, and the standard results on P-values and their distributions apply to severity as well. Also note that conventions need to be established for when severity is judged to be high.

Finally, note that most of the existing examples implicitly require a monotone likelihood ratio, so that members of the exponential family are included, but whether other distributions are covered by the existing theory is unclear.

19.5 Cornfield on Testing and Confidence Intervals

The following quotes by Jerry Cornfield (1966) indicate that the problems with frequentist statistics have been known for a long time.

Cornfield defines the α-postulate as “All hypotheses rejected at the same critical level have equal amounts of evidence against them.”

The following example will be recognized by statisticians with consulting experience as a simplified version of a very common situation. An experimenter, having made n observations in the expectation that they would permit the rejection of a particular hypothesis, at some predetermined significance level, say 0.05, finds that he has not quite attained this critical level. He still believes that the hypothesis is false and asks how many more observations would be required to have reasonable certainty of rejecting the hypothesis if the means observed after n observations are taken as the true values. He also makes it clear that had the original n observations permitted rejection he would simply have published his findings. Under these circumstances it is evident that there is no amount of additional observations, no matter how large, which would permit rejection at the 0.05 level. If the hypothesis being tested is true, there is a 0.05 chance of it having been rejected after the first round of observations. To this chance must be added the probability of rejecting after the second round, given failure to reject after the first, and this increases the total chance of erroneous rejection to above 0.05. In fact as the total number of observations in the second round is indefinitely increased the significance approaches 0.0975 (= 0.05 + 0.95 × 0.05) if the 0.05 criterion is retained. Thus no amount of additional evidence can be collected which would provide evidence against the hypothesis equivalent to rejection at the P = 0.05 level and adherents of the α-postulate would presumably advise him to turn his attention to other scientific fields. The reasonableness of this advice is perhaps questionable (as is the possibility that it would be accepted). In any event it does not seem possible to argue seriously in the face of this example that all hypotheses rejected at the 0.05 level have equal amounts of evidence against them.
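Cornfield's limiting significance level of 0.0975 is easy to check by simulation. A seeded sketch of the limiting two-stage procedure, in which the second round is so much larger than the first that the two test statistics are effectively independent:

```python
import random

random.seed(4)

alpha = 0.05
z_alpha = 1.6449                 # one-sided 5% critical value

reps, rejections = 100_000, 0
for _ in range(reps):
    z1 = random.gauss(0, 1)      # first-round test statistic under H0
    if z1 > z_alpha:             # rejected on the first round
        rejections += 1
    elif random.gauss(0, 1) > z_alpha:   # second, effectively independent round
        rejections += 1

rate = rejections / reps
# Total type I error approaches alpha + (1 - alpha) * alpha = 0.0975
assert abs(rate - (alpha + (1 - alpha) * alpha)) < 0.005
```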

The confidence set yielded by a given body of data is the set of all hypotheses not rejected by the data, so that the relation between hypothesis tests and confidence limits is close. In fact the confidence limit equivalent of the α-postulate is “All statements made with the same confidence coefficient have equal amounts of evidence in their favor.” That this is no more reasonable than the α-postulate is suggested by the very common problem of inference about the ratio of two normal means. The most selective unbiased confidence set for the unknown ratio has the following curious characteristic: for every sample point there exists an α > 0 such that all confidence limits with coefficients ≥ 1 −α run from minus infinity to plus infinity. But to assert that the unknown ratio lies between minus and plus infinity with confidence coefficient of only 1 −α is surely being overcautious. Even worse, the postulate asserts that there is less evidence for such an infinite interval than there is for a finite interval about a normal mean made with coefficient 1 −α′ where α′ < α. The α-postulate cannot therefore be considered any more reasonable for confidence limits than it is for hypothesis testing.

It has been proposed by proponents of confidence limits that this clearly undesirable characteristic of the limits on a ratio be avoided by redefining the sample space so as to exclude all sample points that lead to infinite limits for given α. This is equivalent to saying that if the application of a principle to given evidence leads to an absurdity then the evidence must be discarded. It is reminiscent of the heavy smoker, who, worried about the literature relating smoking to lung cancer, decided to give up reading.