
12.1 Ancillarity

Conditioning arguments are at the center of many disputes regarding the foundations of statistical inference. We present here only some simple arguments and examples.

Definition 12.1.1.

A statistic A is ancillary (for θ) if its distribution does not depend on θ.

The basic idea in conditioning is that, since A provides no information on θ, we should use the conditional distribution, given A, for inference on θ. The idea originated with R.A. Fisher and has been discussed and disputed for decades. In some problems most statisticians condition on A, in other problems they do not.

In most problems the sample size is considered fixed, i.e., ancillary, even though it may be determined by the availability of funds or other considerations unrelated to the problem (θ) of interest. Similarly, in regression-type problems (linear models, generalized linear models, etc.) most statisticians condition on the covariates (design matrix). There seem to be no definite guidelines for when to condition and when not to condition.

Example.

Cox introduced the following example. Consider two measuring devices. Device P produces measurements which are normal with mean θ and variance \(\sigma ^{2}\), and device I produces measurements which are normal with mean θ and variance \(k^{2}\sigma ^{2}\), where k is much larger than 1. Which instrument is used is decided by the flip of a fair coin, so that the precision of the measurement (i.e., which instrument is used) is ancillary.

Thus we would report the value of the measurement and the associated precision, \(\sigma ^{2}\) or \(k^{2}\sigma ^{2}\), depending on the instrument actually used. However, if we do not condition, the true (unconditional) variance of X is

$$\displaystyle\begin{array}{rcl} \mathbb{V}(X)& =& \mathbb{E}\left [\mathbb{V}(X\vert F)\right ] + \mathbb{V}\left [\mathbb{E}(X\vert F)\right ] {}\\ & =& \frac{\sigma ^{2}} {2} + \frac{k^{2}\sigma ^{2}} {2} {}\\ \end{array}$$

where \(F\) indicates which instrument was used; since \(\mathbb{E}(X\vert F) =\theta\) whichever instrument is used, the second term vanishes.

Note that

$$\displaystyle{\sigma ^{2} <\sigma ^{2}\left (\frac{1} {2} + \frac{k^{2}} {2} \right ) < k^{2}\sigma ^{2}}$$

so that the reported standard error will be either too small or too large.
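A short simulation sketch (the numerical values of θ, σ, and k below are hypothetical, chosen only for illustration) confirms the decomposition: the unconditional variance mixes the two conditional variances and matches neither.

```python
import numpy as np

rng = np.random.default_rng(0)

theta, sigma, k = 10.0, 1.0, 10.0    # hypothetical values, not from the text
n_sim = 200_000

# F = 1: precise device P was used; F = 0: imprecise device I
F = rng.integers(0, 2, size=n_sim)
sd = np.where(F == 1, sigma, k * sigma)
X = rng.normal(theta, sd)

print("unconditional variance:", X.var())          # ~ (sigma^2 + k^2*sigma^2) / 2
print("variance given P used:", X[F == 1].var())   # ~ sigma^2
print("variance given I used:", X[F == 0].var())   # ~ k^2 * sigma^2
```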

Example (Valliant, Dorfman and Royall).

There is a population of size 1,000 from which we have selected a random sample of size 100 without replacement. The population mean is estimated by the sample mean, whose variance is estimated by

$$\displaystyle{\mathbb{V}(\overline{Y }_{s}) = \left (1 - \frac{100} {1000}\right ) \frac{s^{2}} {100}\;\;\mbox{ where}\;\;s^{2} = \frac{\sum _{i\in s}(y_{i} -\overline{y}_{s})^{2}} {99} }$$

and s denotes the set of items selected.
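As a minimal computational sketch of this estimator (the population values below are simulated and purely hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

N, n = 1000, 100
population = rng.normal(50, 10, size=N)        # hypothetical population values
sample = rng.choice(population, size=n, replace=False)

s2 = sample.var(ddof=1)                        # sum over s of (y_i - ybar_s)^2 / (n - 1)
var_hat = (1 - n / N) * s2 / n                 # finite-population-corrected variance estimate

print("sample mean:", sample.mean())
print("estimated variance of the sample mean:", var_hat)
```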

Before we drew the sample, we considered doing a complete census of all 1,000 objects, but we also had another study of interest. To decide whether to do the complete census or to take a sample of size 100 and carry out the other study, we flipped a coin. If the result was a head we did the complete census; if it was a tail we took the sample of size 100.

The unconditional variance of the sample mean, averaging over the coin flip, is

$$\displaystyle{\mathbb{V}(\overline{Y }_{s}) = \frac{1} {2}\mathbb{V}(\overline{Y }_{s}\vert n = 100) + \frac{1} {2}\mathbb{V}(\overline{Y }_{s}\vert n = 1000) = \frac{1} {2}\mathbb{V}(\overline{Y }_{s}\vert n = 100)}$$

Using this as an estimate of variability is clearly wrong, yet it is correct from a frequentist point of view. Note that the same variance would be reported if we had actually done the complete census. In that case any confidence interval would be a nondegenerate set of points even though we know the population mean exactly! Clearly there is a need for conditioning in situations like this.
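A simulation sketch of this design (hypothetical population values, invented for illustration) shows the frequentist calculation in action: averaging over the coin flip halves the variance, even though the census branch has no variability at all.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 1000
population = rng.normal(50, 10, size=N)        # hypothetical population values
n_sim = 20_000

means = np.empty(n_sim)
census = rng.random(n_sim) < 0.5               # coin flip: census or sample of 100
for i in range(n_sim):
    size = N if census[i] else 100
    means[i] = rng.choice(population, size=size, replace=False).mean()

print("unconditional variance:          ", means.var())
print("variance given n = 100:          ", means[~census].var())
print("variance given n = 1000 (census):", means[census].var())   # exactly 0
```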

12.2 Problems with Conditioning

Examples in the previous section indicate that we should condition whenever there is an ancillary statistic. Unfortunately this is not always so easy. An excellent review article by Ghosh et al. [19] provides many examples and extensions; in particular, it gives examples in which there is no unique ancillary statistic.

Some authors have suggested that there are really two major types of ancillarity:

  1. Experimental

  2. Mathematical

Experimental ancillaries are those such as sample size, covariates, etc., i.e., situations where most statisticians routinely condition. Mathematical ancillaries are those that arise because of the specific nature of the statistical model.

Example (Continuous uniform).

Let \(X_{1},X_{2},\ldots,X_{n}\) be iid with pdf

$$\displaystyle{ f(x;\theta _{1},\theta _{2}) = \left \{\begin{array}{rl} \frac{1} {\Delta } &\theta _{1} \leq x \leq \theta _{2} \\ 0&\mbox{ elsewhere} \end{array} \right. }$$
(12.1)

where \(\Delta =\theta _{2} -\theta _{1}\).

The joint density is given by

$$\displaystyle{ f(x_{1},x_{2},\ldots,x_{n}\:;\:\theta _{1},\theta _{2}) = \left \{\begin{array}{rl} \frac{1} {\Delta ^{n}} & \mbox{ all $x_{i} \in [\theta _{1},\theta _{2}]$} \\ 0&\mbox{ elsewhere} \end{array} \right. }$$
(12.2)

It follows that the minimum and maximum of \(X_{1},X_{2},\ldots,X_{n}\) are minimal sufficient statistics for \(\theta _{1}\) and \(\theta _{2}\).

The joint distribution of the minimum and maximum from a random sample with distribution function F and density function f is easily shown to be

$$\displaystyle{ f(y_{1},y_{n}) = n(n - 1)[F(y_{n}) - F(y_{1})]^{n-2}f(y_{ 1})f(y_{n}) }$$
(12.3)

where \(Y _{1}\) is the minimum of the \(X_{i}\)'s and \(Y _{n}\) is the maximum.
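One standard way to obtain (12.3): for \(y_{1} \leq y_{n}\), the event \(\{Y _{1} > y_{1},\,Y _{n} \leq y_{n}\}\) says that all n observations fall in \((y_{1},y_{n}]\), so

$$\displaystyle{P(Y _{1} \leq y_{1},Y _{n} \leq y_{n}) = P(Y _{n} \leq y_{n}) - P(Y _{1} > y_{1},Y _{n} \leq y_{n}) = [F(y_{n})]^{n} - [F(y_{n}) - F(y_{1})]^{n}}$$

and differentiating with respect to \(y_{1}\) and \(y_{n}\) gives (12.3).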

For the uniform distribution, we have that

$$\displaystyle{F(y;\theta _{1},\theta _{2}) = \frac{1} {\Delta }\int _{\theta _{1}}^{y}dx = \frac{y -\theta _{1}} {\Delta } }$$

so that the joint pdf of Y 1 and Y n is given by

$$\displaystyle{f(y_{1},y_{n};\theta _{1},\theta _{2}) = \frac{1} {\Delta ^{n}}n(n - 1)(y_{n} - y_{1})^{n-2}\;\;\theta _{ 1} \leq y_{1} \leq y_{n} \leq \theta _{2}}$$

Let \(\theta _{1} =\theta -\rho\) and \(\theta _{2} =\theta +\rho\), then we have that \(\Delta = 2\rho\) and hence the joint density is

$$\displaystyle{f(y_{1},y_{n};\theta ) = \frac{n(n - 1)(y_{n} - y_{1})^{n-2}} {(2\rho )^{n}} \;\;;\;\;\theta -\rho \leq y_{1} \leq y_{n} \leq \theta +\rho }$$

If we assume that ρ is known, then the likelihood function for θ is

$$\displaystyle{\mathcal{L}(\theta ) = 1\;\;;\;\;y_{n}-\rho \leq \theta \leq y_{1}+\rho }$$
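The support of \(\mathcal{L}(\theta )\) comes from rearranging the constraints on the order statistics; since ρ is known, the density is constant in θ wherever it is positive, and

$$\displaystyle{\theta -\rho \leq y_{1}\;\;\mbox{ and}\;\;y_{n} \leq \theta +\rho \quad \Longleftrightarrow \quad y_{n}-\rho \leq \theta \leq y_{1}+\rho }$$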

For the special case where \(\rho = 1/2\) it is easy to show that \([Y _{1},Y _{n}]\) is a \(100\left (1 - \frac{1} {2^{n-1}} \right )\,\%\) confidence interval for θ.
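The coverage calculation is immediate: θ escapes the interval only if all n observations fall on the same side of θ, and with \(\rho = 1/2\) each side has probability \(1/2\), so

$$\displaystyle{P(Y _{1} \leq \theta \leq Y _{n}) = 1 - P(\mbox{ all }X_{i} >\theta ) - P(\mbox{ all }X_{i} <\theta ) = 1 - 2\left (\frac{1} {2}\right )^{n} = 1 - \frac{1} {2^{n-1}}}$$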

Suppose now that

$$\displaystyle{n = 5\quad \mbox{ and}\quad y_{1} = 0.01,\:y_{n} = 0.99}$$

Then the \(100(1 - \frac{1} {16})\,\% = 93.75\,\%\) confidence interval for θ is 0.01 to 0.99. But since

$$\displaystyle{y_{1} \geq \theta -\frac{1} {2}\;\;\mbox{ and}\;\;y_{n} \leq \theta +\frac{1} {2}}$$

hold with certainty, it follows that

$$\displaystyle{\theta \leq 0.01 + 0.5 = 0.51\;\;\mbox{ and}\;\;\theta \geq 0.99 - 0.5 = 0.49}$$

Thus with these observed values of \(y_{1}\) and \(y_{n}\) we are certain that

$$\displaystyle{0.49 \leq \theta \leq 0.51}$$

and yet our 93.75 % confidence interval is

$$\displaystyle{0.01 \leq \theta \leq 0.99}$$

This is silly.

As Cox points out, it is imperative in this example to condition on the ancillary statistic, which is the range \(R = Y _{n} - Y _{1}\).
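A small simulation (hypothetical value of θ; the parameter names are mine) makes the point concrete: the unconditional coverage of \([Y _{1},Y _{n}]\) is indeed \(1 - 1/2^{n-1}\), but once the data are in hand the set of θ values consistent with them, \([y_{n}-\rho,\:y_{1}+\rho ]\), has width \(2\rho - R\), so a sample with a large range pins θ down almost exactly.

```python
import numpy as np

rng = np.random.default_rng(3)

theta, rho, n = 0.5, 0.5, 5          # hypothetical values for illustration
n_sim = 100_000

x = rng.uniform(theta - rho, theta + rho, size=(n_sim, n))
y1, yn = x.min(axis=1), x.max(axis=1)

# Unconditional coverage of [Y1, Yn]: close to 1 - 1/2^(n-1) = 0.9375 for n = 5
coverage = np.mean((y1 <= theta) & (theta <= yn))
print("unconditional coverage of [Y1, Yn]:", coverage)

# Given the data, theta is certain to lie in [yn - rho, y1 + rho],
# an interval of width 2*rho - R that shrinks as the observed range R grows.
R = yn - y1
print("mean width of certainty interval:", np.mean(2 * rho - R))
print("width when R is large (R > 0.9):", np.mean((2 * rho - R)[R > 0.9]))
```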