1 Introduction

While it is widely accepted that for two-sided testing the p value overstates the evidence against the null (Edwards et al. 1963; Dickey 1977; Berger and Sellke 1987), it has been argued that for one-sided testing the Bayesian and frequentist approaches agree in producing a data-based evaluation of the evidence on the null hypothesis (Casella and Berger 1987).

The main argument for the agreement was that the infimum of the posterior probability of the null, as the prior ranges over a reasonable class of prior distributions, is equal to or smaller than the p value. This yields Casella and Berger's assertion that "the p value may be on the boundary or within the range of Bayesian evidence measures." Micheas and Dey (2003) provide a similar argument for location models. However, Berger and Mortera (1999) question whether the infimum is the best evidential summary that Bayesian inference can provide. These authors analyzed the one-sided testing problem using empirical Bayes factors for exponential and normal models and concluded that "our most important conclusion is that using the default tests provides less extreme and arguably better answers in one-sided testing than the p value" (p. 553). Morris (1987), in his discussion of the paper by Casella and Berger (1987), argued that the lower bound is a misleading measure of the evidence about the null. He also asserted that typical prior beliefs should concentrate closer to the dividing line between the hypotheses.

On the other hand, the BIC approximation to the Bayes factor avoids the use of priors, but it does not apply to one-sided testing. The reason is that the dimensions of the null parameter space and of the whole space are the same, and hence BIC does not correct the likelihood ratio statistic. Some modifications of BIC that do apply were given by Dudley and Haughton (1997), Kass and Vaidyanathan (1992) and Mulder and Raftery (2019).

In this paper, we compare, for moderate and large sample sizes, the Bayesian evidence based on the posterior model probability with the frequentist evidence based on the p value. The Bayesian test uses objective reference priors whenever they are proper, or intrinsic priors if the reference priors are improper. The intrinsic priors were introduced by Berger and Pericchi (1996) and further studied by Moreno (1997) and Moreno et al. (1998); they are priors that concentrate probability around the boundary of the hypotheses, as required by Morris. Although intrinsic priors do not necessarily exist for nonnested models, there are some exceptions. For one-sided testing, they are constructed using an auxiliary model (Moreno 2005); for variable selection in regression, or clustering, they are constructed using an encompassing model that converts the nonnested model selection problem into a nested one (Casella et al. 2009, 2014; Moreno et al. 2010, 2015; Wang and Maruyama 2016, among others).

The comparison cannot be based on the (Type I, Type II) error vector, the standard tool for evaluating a test. The reason is that the posterior probability of the null model is typically an increasing function of the p value (Girón et al. 2006), and thus the Bayesian and frequentist (Type I, Type II) vectors are not comparable as neither dominates the other. Instead, we propose to base the comparison on the probability that the Bayesian and frequentist decisions disagree. This probability gives us a measure of the strength of the disagreement. Illustrations of this analysis on common models indicate that the strongest disagreement appears when sampling from models located in the alternative hypothesis that are not at the boundary of the hypotheses.

The comparison for large sample sizes is based on an asymptotic analysis. As far as we know, the asymptotic behavior of the one-sided test has not yet been studied. This may be due to the fact that the frequentist test is by construction inconsistent under the null, although for a wide class of sampling models it is consistent under the alternative. Our contribution to the asymptotics of the Bayesian one-sided test is a general proof of its consistency. The proof also provides the rate of convergence.

The rest of the paper is organized as follows. To facilitate the reading of the paper, Sect. 2 reviews the intrinsic priors for one-sided testing in the absence and presence of nuisance parameters. In Sect. 3, the probability of disagreement is defined and illustrated on some basic families of sampling models. Section 4 contains the proof of the posterior model consistency of the one-sided test and of the multiple hypothesis test \( H_{0}:\theta =\theta _{0}\), \(H_{1}:\theta \le \theta _{0}\), \(H_{2}:\theta \ge \theta _{0}\). Concluding remarks are given in Sect. 5.

2 Priors for one-sided testing

Let X be a random variable with distribution   \(f(x|\theta )\), where \(\theta \) is a real parameter. The one-sided testing problem consists of the null hypothesis \(H_{1}:\theta \le \theta _{0}\) and the alternative \(H_{2}:\theta \ge \theta _{0}\), where \(\theta _{0}\) is a fixed real number. Families of distributions for which this problem is of interest include location and scale families.

In the absence of prior information on \(\theta \), the reference prior \(\pi ^{N}(\theta )\) is typically used, and hence, the one-sided Bayesian test is the model selection problem between the nonnested models

$$\begin{aligned} M_{1}:\left\{ f(x|\theta ),\frac{\pi ^{N}(\theta )}{\int _{-\infty }^{\theta _{0}}\pi ^{N}(\theta )\mathrm{d}\theta }1_{(-\infty ,\theta _{0})}(\theta )\right\} , \end{aligned}$$

and

$$\begin{aligned} M_{2}:\left\{ f(x|\theta ),\frac{\pi ^{N}(\theta )}{\int _{\theta _{0}}^{\infty }\pi ^{N}(\theta )\mathrm{d}\theta }1_{(\theta _{0},\infty )}(\theta )\right\} , \end{aligned}$$

where \(1_{A }(\theta )\) is the indicator function of the set A.

Let \(P(M_{i})\) denote the prior probability of model \(M_{i}\), \(i=1,2\). Then, for a given sample \(\mathbf {x}_{n}=(x_{1},\ldots ,x_{n})\) of X, the posterior probability of \(M_{1}\) is given by

$$\begin{aligned} \Pr (M_{1}|\mathbf {x}_{n})=\left( 1+B_{21}^{N}(\mathbf {x}_{n})\frac{P (M_{2})}{P (M_{1})}\right) ^{-1}, \end{aligned}$$

where \(B_{21}^{N}(\mathbf {x}_{n})\) is the Bayes factor

$$\begin{aligned} B_{21}^{N}(\mathbf {x}_{n})=\frac{\int _{-\infty }^{\theta _{0}}\pi ^{N}(\theta )\mathrm{d}\theta }{\int _{\theta _{0}}^{\infty }\pi ^{N}(\theta )\mathrm{d}\theta } \frac{\int _{\theta _{0}}^{\infty }f(\mathbf {x}_{n}|\theta )\pi ^{N}(\theta )\mathrm{d}\theta }{\int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi ^{N}(\theta )\mathrm{d}\theta }. \end{aligned}$$

This posterior probability measures the posterior uncertainty about model \( M_{1} \). If models \(M_{1}\) and \(M_{2}\) are a priori equally likely, that is, \(P(M_{1})=P(M_{2})\), it seems reasonable to choose model \( M_{1}\) if \(\Pr (M_{1}|\mathbf {x}_{n})>1/2\), and \(M_{2}\) otherwise.
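A minimal computational sketch of this decision rule is given below; the function names are illustrative and not taken from any particular package.

```python
# A minimal sketch of the decision rule above.

def posterior_prob_M1(B21, pM1=0.5, pM2=0.5):
    """Posterior probability of M1 from the Bayes factor B21 = B_{21}(x_n)."""
    return 1.0 / (1.0 + B21 * pM2 / pM1)

def choose_model(B21, pM1=0.5, pM2=0.5):
    """Choose M1 when Pr(M1 | x_n) > 1/2, and M2 otherwise."""
    return "M1" if posterior_prob_M1(B21, pM1, pM2) > 0.5 else "M2"

# Example: a Bayes factor of 3 in favour of M2 with equal model priors
print(posterior_prob_M1(3.0))  # 0.25
print(choose_model(3.0))       # M2
```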

When the prior \(\pi ^{N}(\theta )\) is improper, the Bayes factor \(B_{21}^{N}( \mathbf {x}_{n})\) is not well defined. Fortunately, for nested models the intrinsic methodology solves this difficulty by replacing the reference prior for the parameter of the complex model with the intrinsic prior. It is well known that the resulting Bayes factor enjoys excellent sampling properties (Casella et al. 2009; Moreno and Girón 2005; Moreno et al. 2015).

However, when the models are nonnested, as is the case here, the intrinsic priors are not necessarily unique. In fact, we may consider either the pair of priors

$$\begin{aligned} IP_{1}=\{\pi ^{N}(\theta )1_{(-\infty ,\theta _{0})}(\theta ),\pi _{2}^{I}(\theta )1_{(\theta _{0},\infty )}(\theta )\} \end{aligned}$$

or

$$\begin{aligned} IP_{2}=\{\pi _{1}^{I}(\theta )1_{(-\infty ,\theta _{0})}(\theta ),\pi ^{N}(\theta )1_{(\theta _{0},\infty )}(\theta )\}, \end{aligned}$$

where \(\pi _{1}^{I}(\theta )=\pi ^{N}(\theta )E_{y|\theta }B_{21}^{N}(y)\) and \(\pi _{2}^{I}(\theta )=\pi ^{N}(\theta )E_{y|\theta }B_{12}^{N}(y)\) are intrinsic priors, the expectations being taken over the random training sample y.

We note that if the expectations \(E_{y|\theta }B_{12}^{N}(y)\) and \( E_{y|\theta }B_{21}^{N}(y)\) exist, the Bayes factors for \(IP_{1}\) and \(IP_{2}\) are well defined, although they might differ. However, these expectations do not necessarily exist. Let us illustrate these assertions on exponential models.

Example 1

Let X be a random variable with distribution \(\displaystyle f(x|\theta )= \frac{1}{\theta }\exp (-x/\theta )\), \(\theta >0\), \(x>0\), and the improper Jeffreys’ prior \(\pi ^{J}(\theta )=k/\theta \), where k is an arbitrary positive constant. Suppose we are interested in testing \( H_{1}:0<\theta \le 1\) versus \(H_{2}:1\le \theta <\infty \). The default Bayesian models to be compared are

$$\begin{aligned} M_{1}:\left\{ f(x|\theta ),\pi ^{J}(\theta )=\frac{k_{1}}{\theta } 1_{(0,1)}(\theta )\right\} , \end{aligned}$$

and

$$\begin{aligned} M_{2}:\left\{ f(x|\theta ),\pi ^{J}(\theta )=\frac{k_{2}}{\theta } 1_{(1,\infty )}(\theta )\right\} , \end{aligned}$$

where \(k_{1}\) and \(k_{2}\) are arbitrary positive constants. We note that the intrinsic prior \(\pi _{2}^{I}(\theta )\) does not exist, as

$$\begin{aligned} E_{y|\theta }B_{12}^{J}(y)=\frac{k_{1}}{k_{2}}\frac{1}{\theta } \int _{0}^{\infty }\frac{\exp \{-y(1+1/\theta )\}}{1-\exp \{-y\}}\mathrm{d}y=\infty . \end{aligned}$$

On the other hand, \(\pi _{1}^{I}(\theta )\) exists and is given by

$$\begin{aligned} \pi _{1}^{I}(\theta )=\frac{k}{1-\theta }1_{(0,1)}(\theta ), \end{aligned}$$

but the Bayes factor \(B_{12}^{IP_{2}}(\mathbf {x}_{n})=\infty \) for any sample \(\mathbf {x}_{n}\). Thus, the original methodology for producing intrinsic priors does not necessarily work for one-sided testing.
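The divergence of the expectation above can also be checked numerically: the integrand behaves like \(1/y\) near \(y=0\), so truncating the lower limit of the integral at \(\varepsilon \) and letting \(\varepsilon \rightarrow 0\) makes the value grow without bound, roughly like \(-\log \varepsilon \). A small sketch, assuming scipy is available for quadrature (the positive constants \(k_{1}/k_{2}\) and \(1/\theta \) are omitted since they do not affect the divergence):

```python
import numpy as np
from scipy.integrate import quad

theta = 0.5

def integrand(y):
    # exp(-y(1 + 1/theta)) / (1 - exp(-y)) behaves like 1/y near y = 0
    return np.exp(-y * (1.0 + 1.0 / theta)) / (1.0 - np.exp(-y))

# Truncate the lower limit at eps and let eps -> 0: the value grows
# without bound, roughly like -log(eps).
for eps in [1e-2, 1e-4, 1e-6, 1e-8]:
    value = quad(integrand, eps, 1.0, limit=200)[0] + quad(integrand, 1.0, 50.0)[0]
    print(f"eps = {eps:.0e}   truncated integral = {value:.2f}")
```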

An alternative proposal for defining intrinsic priors for one-sided testing was given in Moreno (2005), and it is formulated in two steps. In the first step, the Bayesian model selection problem between the auxiliary models \(M_{0}:f(x|\theta _{0})\) and \(M_{3}:\{f(x|\theta ),\) \(\pi ^{N}(\theta ),\theta \in \mathbb {R}\}\) is considered. The conditional intrinsic prior \(\pi ^{I}(\theta |\theta _{0})\) for a training sample y of minimal size is given by

$$\begin{aligned} \pi ^{I}(\theta |\theta _{0})=\pi ^{N}(\theta )\mathbb {E}_{y|\theta }\frac{ f(y|\theta _{0})}{\int _{\mathbb {R}}f(y|\theta )\pi ^{N}(\theta )\mathrm{d}\theta }, \theta \in \mathbb {R}. \end{aligned}$$

This is a proper prior on the real line \(\mathbb {R}\). The second step is to take the restrictions of \(\pi ^{I}(\theta |\theta _{0})\) to the parameter spaces \(\Theta _{1}=\{\theta :\theta \le \theta _{0}\}\) and \(\Theta _{2}=\{\theta :\theta \ge \theta _{0}\}\), which yield the proper priors \( \pi _{1}^{I}(\theta |\theta _{0})\) and \(\pi _{2}^{I}(\theta |\theta _{0})\) for one-sided testing.

For a sample \(\mathbf {x}_{n}\) from \(f(x|\theta )\), the Bayes factor for comparing model \(M_{1}:\left\{ f(x|\theta ),\pi _{1}^{I}(\theta |\theta _{0})\right\} \) and \(M_{2}:\{f(x|\theta ),\pi _{2}^{I}(\theta |\theta _{0})\} \) is the well-defined Bayes factor for the intrinsic priors

$$\begin{aligned} B_{21}^{I}(\mathbf {x}_{n})=\frac{\int _{\theta _{0}}^{\infty }f(\mathbf {x} _{n}|\theta )\pi _{2}^{I}(\theta |\theta _{0})\mathrm{d}\theta }{\int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}^{I}(\theta |\theta _{0})\mathrm{d}\theta }. \end{aligned}$$
(1)

The fact that the conditional intrinsic priors \(\pi _{1}^{I}(\theta |\theta _{0})\) and \(\pi _{2}^{I}(\theta |\theta _{0})\) are centered around \(\theta _{0}\) is inherited from \(\pi ^{I}(\theta |\theta _{0})\).
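When the conditional intrinsic priors are available in closed form, the Bayes factor (1) reduces to two one-dimensional integrals that can be evaluated by numerical quadrature. The sketch below is only illustrative; for the usage example it relies on the fact that, for a normal mean with known unit variance, a direct computation with the expectation formula above gives \(\pi ^{I}(\theta |\theta _{0})=N(\theta |\theta _{0},2)\), whose restrictions to each hypothesis are half-normals with doubled height. The data and function names are hypothetical.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def bayes_factor_21(loglik, prior1, prior2, theta0, lower=-np.inf, upper=np.inf):
    """Bayes factor (1): integrated likelihood under the restricted prior on
    (theta0, upper) divided by the one on (lower, theta0).
    loglik(theta) is the log-likelihood of the observed sample."""
    num, _ = quad(lambda t: np.exp(loglik(t)) * prior2(t), theta0, upper)
    den, _ = quad(lambda t: np.exp(loglik(t)) * prior1(t), lower, theta0)
    return num / den

# Illustration: normal mean with known unit variance, intrinsic prior N(theta0, 2)
theta0 = 0.0
x = np.array([0.4, -0.2, 1.1, 0.6, 0.3])                  # hypothetical data
loglik = lambda t: norm.logpdf(x, loc=t, scale=1.0).sum()
prior = lambda t: 2.0 * norm.pdf(t, loc=theta0, scale=np.sqrt(2.0))
print(bayes_factor_21(loglik, prior, prior, theta0))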

2.1 The presence of nuisance parameters

In the presence of nuisance parameters, the above procedure to produce intrinsic priors is adapted as follows. Let \(f(x|\theta ,\xi )\) be the sampling family, where we want to test \(H_{1}:\theta \le \theta _{0}\) versus \(H_{2}:\theta \ge \theta _{0}\), \(\xi \in \varXi \) being a nuisance parameter. Let \(\pi ^{N}(\theta ,\xi )\) be the starting default improper prior for the parameters.

The auxiliary models are now \(M_{0}:f(x|\theta _{0},\xi _{0})\), where \(\xi _{0}\) is an arbitrary but fixed point, and \(M_{3}:\{f(x|\theta ,\xi ),\) \(\pi ^{N}(\theta ,\xi )\}\). Assuming that \(\pi ^{N}(\theta ,\xi )=\pi ^{N}(\theta )\pi ^{N}(\xi )\), the conditional intrinsic prior for \(\theta ,\xi \) for the model comparison between \(M_{0}\) and \(M_{3}\) is given by

$$\begin{aligned} \pi ^{I}(\theta ,\xi |\theta _{0},\xi _{0})=\pi ^{N}(\theta )\pi ^{N}(\xi ) \mathbb {E}_{y|\theta ,\xi }\frac{f(y|\theta _{0},\xi _{0})}{\int _{\varXi } \pi ^{N}(\xi ) \int _{-\infty }^{\infty }f(y|\theta ,\xi )\pi ^{N}(\theta )\mathrm{d}\theta \mathrm{d}\xi }, \end{aligned}$$
(2)

where y is the random training sample of minimal size. This conditional intrinsic prior is by construction a proper prior. Let us denote by \(\pi _{1}^{I}(\theta ,\xi |\theta _{0},\xi _{0})\) and \(\pi _{2}^{I}(\theta ,\xi |\theta _{0},\xi _{0})\) the restrictions of \(\pi ^{I}(\theta ,\xi |\theta _{0},\xi _{0})\) to \(H_{1}\) and \(H_{2}\), respectively. Then, for a sample \( \mathbf {x}_{n}\), the Bayes factor for the one-sided test is obtained as

$$\begin{aligned} B_{21}^{I}(\mathbf {x}_{n})=\frac{\int _{\varXi }\pi ^{N}(\xi _{0})\mathrm{d}\xi _{0}\int _{\varXi } \mathrm{d}\xi \int _{\theta _{0}}^{\infty }f(\mathbf {x}_{n}|\theta ,\xi )\pi _{2}^{I}(\theta ,\xi |\theta _{0},\xi _{0}) \mathrm{d}\theta }{\int _{\varXi }\pi ^{N}(\xi _{0})\mathrm{d}\xi _{0}\int _{\varXi }\mathrm{d}\xi \int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta ,\xi )\pi _{1}^{I}(\theta ,\xi |\theta _{0},\xi _{0})\mathrm{d}\theta }. \end{aligned}$$
(3)

We note that in this expression the arbitrary constant appearing in \(\pi ^{N}(\xi _{0})\) cancels out in the ratio.

2.2 Multiple test

In many applications, the auxiliary model \(M_{0}\) is also of interest. For instance, if \(\theta \) represents the effectiveness of a new treatment we could be interested in testing whether \(\theta \) is equal to \(\theta _{0}\), greater than \(\theta _{0}\) or smaller than \(\theta _{0}\), where \(\theta _{0}\) is the effectiveness of the old treatment. In this case, we have the multiple comparison of the models

$$\begin{aligned} M_{0}:f(x|\theta _{0}),\, M_{1}:\{f(x|\theta ),\pi _{1}^{I}(\theta |\theta _{0})\}, \, M_{2}:\{f(x|\theta ),\pi _{2}^{I}(\theta |\theta _{0})\}. \end{aligned}$$

For the model prior \(\{P(M_{i}), i=0,1,2\}\), the posterior probabilities of the models are given by

$$\begin{aligned} \Pr (M_{i}|\mathbf {x}_{n})=\frac{m_{i}(\mathbf {x}_{n})P(M_{i})}{ \sum _{j=0}^{2}m_{j}(\mathbf {x}_{n})P (M_{j})}, i=0,1,2\ , \end{aligned}$$
(4)

where \(m_{0}(\mathbf {x}_{n})=f(\mathbf {x}_{n}|\theta _{0}),\)

$$\begin{aligned} m_{1}(\mathbf {x}_{n})=\int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}^{I}(\theta |\theta _{0})\mathrm{d}\theta , \end{aligned}$$

and

$$\begin{aligned} m_{2}(\mathbf {x}_{n})=\int _{\theta _{0}}^{\infty }f(\mathbf {x}_{n}|\theta )\pi _{2}^{I}(\theta |\theta _{0})\mathrm{d}\theta . \end{aligned}$$
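The posterior probabilities (4) can be computed directly once the marginals are available; the sketch below obtains \(m_{1}\) and \(m_{2}\) by quadrature. The data are hypothetical, and the exponential intrinsic prior \(2/(1+\theta )^{2}\) derived in Sect. 3.2 is used only as an illustration.

```python
import numpy as np
from scipy.integrate import quad

def multiple_test_posteriors(loglik, theta0, prior1, prior2,
                             prior_probs=(1/3, 1/3, 1/3),
                             lower=-np.inf, upper=np.inf):
    """Posterior probabilities (4) of M0, M1 and M2.  loglik(theta) is the
    log-likelihood of the sample; prior1 and prior2 are the intrinsic priors
    restricted to theta < theta0 and theta > theta0, respectively."""
    m0 = np.exp(loglik(theta0))
    m1, _ = quad(lambda t: np.exp(loglik(t)) * prior1(t), lower, theta0)
    m2, _ = quad(lambda t: np.exp(loglik(t)) * prior2(t), theta0, upper)
    weights = np.array([m0, m1, m2]) * np.array(prior_probs)
    return weights / weights.sum()

# Usage with the exponential model and the intrinsic prior of Sect. 3.2
x = np.array([1.3, 0.7, 2.1, 1.8, 0.9])                   # hypothetical data
loglik = lambda t: -len(x) * np.log(t) - x.sum() / t
prior = lambda t: 2.0 / (1.0 + t) ** 2
print(multiple_test_posteriors(loglik, 1.0, prior, prior, lower=0.0))
```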

Other extensions to multiple hypotheses with equality and inequality constraints on multidimensional parameters have been studied in Mulder (2014), who utilizes an adjusted fractional Bayes factor with priors centered on the boundary of the constrained parameter space.

3 Comparing Bayesian and frequentist one-sided test

The discrepancy between the Neyman fixed Type I error probability, the Fisher p value, and the Jeffreys posterior probability methodologies has been deeply discussed in the literature. Key papers are Efron and Gous (2001) and Berger (2003). In Berger (2003), it is asserted that "for many types of testing, Fisher, Jeffreys and Neyman disagreed as to the basic numbers to be reported and could report considerably different conclusions in actual practice." Here we provide a different measure of the discrepancy between the conclusions from the Fisher p value and from the Bayesian analysis. As already mentioned, the Bayesian analysis uses the standard default priors whenever they are proper, or the intrinsic priors when the default priors are improper.

The frequentist decision of rejecting the null \(M_{1}\), conditional on a given sample \(\mathbf {x}_{n}=(x_{1},\ldots ,x_{n})\), is made for any sample point \(\mathbf {x}_{n}\) in the critical region

$$\begin{aligned} W_{n}^{F}(a_{n})=\{\mathbf {x}_{n}:p(\mathbf {x}_{n})\le a_{n}\}, a_{n}\in (0,1), \end{aligned}$$

where \(p(\mathbf {x}_{n})\) is the p value of \(\mathbf {x}_{n}\) and \(a_{n}\) a specified value, typically \(a_{n}=0.05\). The Bayesian rule rejects \(M_{1}\) for any sample point \(\mathbf {x}_{n}\) in the critical region

$$\begin{aligned} W_{n}^{B}(b_{n})=\{\mathbf {x}_{n}:\Pr (M_{1}|\mathbf {x}_{n})\le b_{n}\}, b_{n}\in (0,1), \end{aligned}$$

where \(b_{n}\) is a specified value, typically \(b_{n}=0.5\). The frequentist Type I and II errors are given by

$$\begin{aligned} \mathbf {E}^{F}(\theta )=(\alpha _{n}^{F},[1-\beta _{n}^{F}(\theta )]1_{(\theta _{0},\infty )}(\theta )), \end{aligned}$$

and the Bayesian by

$$\begin{aligned} \mathbf {E}^{B}(\theta )=(\alpha _{n}^{B},[1-\beta _{n}^{B}(\theta )]1_{(\theta _{0},\infty )}(\theta )), \end{aligned}$$

where \(\alpha _{n}^{C}=\sup _{\theta \le \theta _{0}}\Pr _{\mathbf {X}_{n}|\theta }(W_{n}^{C}(c_{n}))\) and \(\beta _{n}^{C}(\theta )=\Pr _{\mathbf {X}_{n}|\theta }(W_{n}^{C}(c_{n}))\) for \(C=F,B\), with \(c_{n}=a_{n}\) for the frequentist test and \(c_{n}=b_{n}\) for the Bayesian one. The vectors \(\mathbf {E}^{F}(\theta )\) and \( \mathbf {E}^{B}(\theta )\) are not comparable unless one of them dominates the other. The latter situation is not common, as \(\Pr (M_{1}|\mathbf {x}_{n})\) is typically an increasing function of \(p(\mathbf {x}_{n})\).

An alternative way of comparing both testing methods is that of computing the probability of the sample region on which the decision rules disagree. The disagreement region \(D_{n}(a_{n},b_{n})\) is given by

$$\begin{aligned} D_{n}(a_{n},b_{n})=[\bar{W}_{n}^{F}(a_{n})\cap W_{n}^{B}(b_{n})]\cup [W_{n}^{F}(a_{n})\cap \bar{W}_{n}^{B}(b_{n})], \end{aligned}$$

where \(\bar{W}\) denotes the complement of W. When the true model is \( f(x|\theta )\) the probability of \(D_{n}(a_{n},b_{n})\), conditional on \( \theta \),

$$\begin{aligned} d_{n}(\theta )=\Pr {}_{\mathbf {X}_{n}|\theta } (D_{n}(a_{n},b_{n})) , \end{aligned}$$

is our quantity of interest. The larger the probability \(d_{n}(\theta )\), the larger the frequentist and Bayesian disagreement, conditional on \(\theta \) .
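Since \(d_{n}(\theta )\) rarely has a closed form, a simple way to approximate it is by Monte Carlo: simulate samples from \(f(x|\theta )\), apply both decision rules, and record the proportion of disagreements. A generic sketch follows; the sampler and the two rule functions are placeholders to be supplied for each model.

```python
import numpy as np

def disagreement_probability(theta, n, sampler, reject_freq, reject_bayes,
                             n_rep=10_000, seed=0):
    """Monte Carlo estimate of d_n(theta), the probability that the
    frequentist and Bayesian decisions about the null differ.

    sampler(theta, n, rng) -> sample of size n from f(x | theta)
    reject_freq(x)         -> True if the p-value rule rejects the null
    reject_bayes(x)        -> True if the posterior-probability rule rejects it
    """
    rng = np.random.default_rng(seed)
    disagree = 0
    for _ in range(n_rep):
        x = sampler(theta, n, rng)
        disagree += reject_freq(x) != reject_bayes(x)
    return disagree / n_rep
```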

For small or moderate sample sizes, we illustrate how large this disagreement can be on binomial, exponential, and normal models. The first two do not contain nuisance parameters, and the last one contains one nuisance parameter.

We note that when the sample size n goes to infinity, the probability of disagreement converges to the limit of the frequentist Type I error probability when sampling from a null model, and to the limit of the frequentist Type II error when sampling from an alternative model. Indeed, from the consistency results in Sect. 4 it follows that

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr {}_{\mathbf {X}_{n}|\theta }(W_{n}^{B}(0.5))=\left\{ \begin{array}{l} 1, \text { for } \theta >\theta _{0}, [P_{\theta }], \\ 0, \text { for } \theta <\theta _{0}, [P_{\theta }]. \end{array} \right. \end{aligned}$$

Thus, for any \(\theta > \theta _{0}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr {}_{\mathbf {X}_{n}|\theta }(D_{n}(0.05,0.5))=\lim _{n\rightarrow \infty }\Pr {}_{\mathbf {X}_{n}|\theta }( \bar{W}_{n}^{F}(0.05))=1-\lim _{n\rightarrow \infty }\beta _{n}^{F}(\theta ), [P_{\theta }], \end{aligned}$$

and for \(\theta < \theta _{0}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr {}_{\mathbf {X}_{n}|\theta }(D_{n}(0.05,0.5))=\lim _{n\rightarrow \infty }\Pr {}_{\mathbf {X}_{n}|\theta }(W_{n}^{F}(0.05)), [P_{\theta }]. \end{aligned}$$

The symbol \([P_{\theta }]\) means that the limit is in probability when sampling from \(f(x|\theta )\).

3.1 Bernoulli model

Let X be a random variable with distribution \(Ber(x|\theta )=\theta ^{x}(1-\theta )^{1-x}\), \(x=0,1\), \(\theta \in (0,1)\). We want to test \( H_{1}:\theta \le \theta _{0}\) versus \(H_{2}:\theta \ge \theta _{0}\). Since the Jeffreys’ prior for \(\theta \) is the proper Beta distribution \(Be(\theta |1/2,1/2)\), we compare the Bayesian models

$$\begin{aligned} M_{1}:\left\{ Ber(x|\theta ),\pi _{1}^{J}(\theta |\theta _{0})=\frac{\theta ^{-1/2}(1-\theta )^{-1/2}}{I_{\theta _{0}}}1_{(0,\theta _{0})}(\theta )\right\} , \end{aligned}$$

and

$$\begin{aligned} M_{2}:\left\{ Ber(x|\theta ),\pi _{2}^{J}(\theta |\theta _{0})=\frac{\theta ^{-1/2}(1-\theta )^{-1/2}}{\pi -I_{\theta _{0}}}1_{(\theta _{0},1)}(\theta )\right\} , \end{aligned}$$

where \(I_{\theta _{0}}=\int _{0}^{\theta _{0}}\) \(\theta ^{-1/2}(1-\theta )^{-1/2}\mathrm{d}\theta \) is the incomplete beta function. For a sample \(\mathbf {x} _{n}=(x_{1},\ldots ,x_{n})\) from \(Ber(x|\theta )\) and model prior \(P (M_{1})=P (M_{2})=1/2\), the posterior probability of \(M_{1}\) is given by

$$\begin{aligned} \Pr (M_{1}|\mathbf {x}_{n})=\left( 1+\frac{I_{\theta _{0}}}{\pi -I_{\theta _{0}}} \frac{\int _{\theta _{0}}^{1}\theta ^{t_{n}-1/2}(1-\theta )^{n-t_{n}-1/2}\mathrm{d}\theta }{\int _{0}^{\theta _{0}}\theta ^{t_{n}-1/2}(1-\theta )^{n-t_{n}-1/2}\mathrm{d}\theta }\right) ^{-1}, \end{aligned}$$

where \(t_{n}=\sum _{i=1}^{n}x_{i}\). For \(\theta _{0}=1/2\), the critical and discrepancy regions for the conventional values \(a_{n}=\) 0.05, \(b_{n}=0.5\) , and sample sizes \(n=5,10,20,40\) are given in Table 1.

Table 1 Critical and disagreement regions

From Table 1, we note that \(W_{n}^{F}(0.05)\subset W_{n}^{B}(0.5)\), and hence, \( D_{n}\) contains critical points under the Bayesian approach that are not critical under the frequentist.
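For this Bernoulli example, the posterior probability of \(M_{1}\) can be written in terms of the regularized incomplete beta function, which is how the regions above can be reproduced. A sketch, assuming scipy:

```python
import numpy as np
from scipy.special import betainc  # regularized incomplete beta function

def post_prob_M1_bernoulli(t_n, n, theta0=0.5):
    """Posterior probability of M1: theta <= theta0 under the Jeffreys
    Be(1/2, 1/2) prior restricted to each hypothesis; t_n is the number of
    successes in n Bernoulli trials."""
    # prior mass ratio I_{theta0} / (pi - I_{theta0})
    prior_ratio = betainc(0.5, 0.5, theta0) / (1.0 - betainc(0.5, 0.5, theta0))
    # ratio of the integrated likelihoods over H2 and H1
    cdf = betainc(t_n + 0.5, n - t_n + 0.5, theta0)
    return 1.0 / (1.0 + prior_ratio * (1.0 - cdf) / cdf)

# The Bayesian rule rejects H1 when the posterior probability drops below 0.5
for t in range(11):
    print(t, round(post_prob_M1_bernoulli(t, 10), 3))
```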

Figure 1 displays the probability \(d_{n}(\theta )\) for the sample sizes \( n=10,20\) as \(\theta \) ranges over (0, 1). From Fig. 1, it follows that the curve of disagreement probabilities is nonsymmetric around the boundary \(\theta _{0}=0.5\). The points of large disagreement probability are located in the alternative hypothesis. For \(n=10\), the mode of the curve is as large as 0.82, and it is attained at \(\theta =0.66\). Further, for \(\theta \in (0.45,0.84)\) the probability of disagreement is greater than 0.5. The asymmetry arises because in the frequentist decision rule the null hypothesis \(\theta \le 0.5\) plays a different role than the alternative \(\theta \ge 0.5\).

Fig. 1
figure 1

Probabilities of disagreement as a function of \(\theta \) for testing the null \(H_{0}:\theta \le 0.5\) versus \(H_{1}:\theta \ge 0.5\) in Bernoulli model for \(n=10\) (higher curve) and \(n=20\) (lower curve)

3.2 Exponential model

Let X be a random variable with the exponential distribution \(f(x|\theta )=(1/\theta )\exp (-x/\theta )\), \(x>0\), \(\theta >0\), and consider the one-sided testing problem with null \(H_{1}:0<\theta \le 1\) versus the alternative \(H_{2}:1\le \theta <\infty \). The Jeffreys' prior for \(\theta \) is the improper density \(\pi ^{J}(\theta )=k/\theta \), which cannot be used for the Bayesian test. To derive the intrinsic priors, we follow the two steps presented in Sect. 2. First, we find the intrinsic prior \(\pi ^{I}(\theta |\theta _{0}=1)\) arising from the model comparison \( M_{0}:f(x|\theta _{0})\) versus \(M_{3}:\{f(x|\theta ),\pi ^{J}(\theta )\}\), which turns out to be \(\pi ^{I}(\theta |1)=(1+\theta )^{-2}\), \(\theta \ge 0\). The restriction of \(\pi ^{I}(\theta |1)\) to the intervals defined by \(H_{1}\) and \(H_{2}\) yields the Bayesian models

$$\begin{aligned} M_{1}:\left\{ f(x|\theta )=\frac{1}{\theta }\exp \left( -\frac{x}{\theta } \right) , \, \pi _{1}^{I}(\theta |1)=\frac{2}{(1+\theta )^{2}} 1_{(0,1)}(\theta )\right\} , \end{aligned}$$

and

$$\begin{aligned} M_{2}:\left\{ f(x|\theta )=\frac{1}{\theta }\exp \left( -\frac{x}{\theta } \right) , \, \pi _{2}^{I}(\theta |1)=\frac{2}{(1+\theta )^{2}} 1_{(1,\infty )}(\theta )\right\} . \end{aligned}$$

Hence, for a sample \(\mathbf {x}_{n}\) from \(f(x|\theta )\) and the model prior \(P (M_{1})=P (M_{2})=1/2\), the posterior probability of model \(M_{1}\) is given by

$$\begin{aligned} \Pr (M_{1}|n,\bar{x}_{n})=\left( 1+\frac{\int \nolimits _{1}^{\infty }(1+\theta )^{-2 }\theta ^{-n}\exp \{-n \bar{x}_{n}/\theta \}\mathrm{d}\theta }{\int \nolimits _{0}^{1}(1+\theta )^{-2 }\theta ^{-n}\exp \{-n \bar{x}_{n}/\theta \}\mathrm{d}\theta }\right) ^{-1}, \end{aligned}$$

where \(\bar{x}_{n}=\sum _{i=1}^{n}x_{i}/n\).
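The posterior probability \(\Pr (M_{1}|n,\bar{x}_{n})\) involves two one-dimensional integrals that can be evaluated numerically. A sketch, assuming scipy (the example values of n and \(\bar{x}_{n}\) are hypothetical; the log-scale evaluation only avoids underflow for moderate n):

```python
import numpy as np
from scipy.integrate import quad

def post_prob_M1_exponential(n, xbar):
    """Posterior probability of M1: theta <= 1 for exponential data, using
    the restricted intrinsic priors 2/(1+theta)^2 and equal model priors."""
    def integrand(theta):
        # (1+theta)^(-2) * theta^(-n) * exp(-n*xbar/theta), on the log scale
        return np.exp(-2.0 * np.log1p(theta) - n * np.log(theta)
                      - n * xbar / theta)
    num = quad(integrand, 1.0, np.inf)[0]
    den = quad(integrand, 0.0, 1.0)[0]
    return 1.0 / (1.0 + num / den)

# Example: n = 10 observations with sample mean 1.5
print(post_prob_M1_exponential(10, 1.5))
```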

Table 2 presents the critical regions \(W_{n}^{F}(0.05)\) and \(W_{n}^{B}(0.5)\) and the disagreement region \(D_{n}(0.05,0.5)\) for the sample sizes \(n=5\), 10, 20, 40.

Table 2 Critical and disagreement regions

Since \(W_{n}^{F}(0.05)\subset W_{n}^{B}(0.5)\), the set \(D_{n}\) contains points \(\bar{x}_{n}\) for which the null \(H_{1}\) is rejected under the Bayesian analysis but accepted under the frequentist one. Thus, the frequentist test is more conservative than the Bayesian one. Figure 2 displays the probability \(d_{n}(\theta )\) for the sample sizes \(n=5\), 10 as \(\theta \) ranges over (0, 5). This figure shows that the probability of disagreement is nonsymmetric around the boundary \(\theta =1\). The probability of disagreement attains its maximum in the alternative hypothesis \(H_{2}\), a behavior similar to that in the preceding Bernoulli one-sided testing.

Fig. 2
figure 2

Probabilities of disagreement as a function of \(\theta \) for testing the null \(H_{0}:\theta \le 1\) versus \(H_{1}:\) \(\theta \ge 1\) in exponential model for \(n=5\) (thinner curve) and \(n=10\) (thicker)

3.3 Normal model

Let us consider the random variable X with normal distribution \(N(x|\mu ,\) \(\sigma ^{2})\), where \(\mu \) is the parameter of interest and \(\sigma \) is the nuisance parameter, and consider the one-sided testing \(H_{1}:\mu \le 0\) versus \(H_{2}:\mu \ge 0\). The Jeffreys' prior is the improper density \(\pi ^{J}(\mu ,\sigma )=k/\sigma \). Then, we derive the intrinsic priors using the auxiliary models \(M_{0}:N(x|0,\sigma _{0}^{2})\) and \( M_{3}:\{N(x|\mu ,\sigma ^{2}),\) \(\pi ^{J}(\mu ,\sigma )=k/\sigma \}\), where \( \sigma _{0}\) is an arbitrary but fixed value. The intrinsic prior of \((\mu ,\sigma )\), conditional on \((0,\sigma _{0})\), turns out to be the proper density

$$\begin{aligned} \pi ^{I}(\mu ,\sigma |0,\sigma _{0})=N\left( \mu |0,\frac{ \sigma _{0}^{2}+\sigma ^{2}}{2}\right) HC^{+}(\sigma |0,\sigma _{0}), \end{aligned}$$

where \(HC^{+}(\sigma |0,\sigma _{0})\) represents the half Cauchy distribution on the positive part of the real line. The restriction of \(\pi ^{I}(\mu ,\sigma |0,\sigma _{0})\) to the sets \(\{\mu :\mu \le 0\}\) and \( \{\mu :\mu \ge 0\}\) yields the priors

$$\begin{aligned} \pi _{i}^{I}(\mu ,\sigma |0,\sigma _{0})=2 N\left( \mu |0 ,\frac{\sigma _{0}^{2}+\sigma ^{2}}{2}\right) HC^{+}(\sigma |0,\sigma _{0}) 1_{H_{i}}(\mu ), i=1,2. \end{aligned}$$

Integrating \(\sigma _{0}\) with respect to the improper prior \(\pi ^{N}(\sigma _{0})=k/\sigma _{0}\), we have the unconditional intrinsic priors

$$\begin{aligned} \pi _{i}^{I}(\mu ,\sigma |0)=2k 1_{H_{i}}(\mu )\int _{0}^{\infty }N\left( \mu | 0,\frac{\sigma _{0}^{2}+\sigma ^{2}}{2} \right) HC^{+}(\sigma |0,\sigma _{0})\frac{1}{\sigma _{0}}\mathrm{d}\sigma _{0}, \,i=1,2. \end{aligned}$$

This yields the Bayesian model \(M_{i}:\left\{ N(x|\mu ,\sigma ^{2}),\pi _{i}^{I}(\mu ,\sigma |0)\right\} ~\)for \(i=1,2\). For a sample \(\mathbf {x} _{n}=(x_{1},\ldots ,x_{n})\) from the normal distribution \(N(x|\mu ,\sigma ^{2})\) and the model prior \(P(M_{1})=P(M_{2})=0.5\), the posterior probability of \(M_{1}\) turns out to be

$$\begin{aligned} \Pr (M_{1}|n,\bar{x}_{n},s_{n})=\left( 1+\frac{\int _{0}^{\infty } \int _{0}^{\pi /2}g(\mathbf {x}_{n},\varphi ,\mu )\mathrm{d}\varphi d\mu }{\int _{-\infty }^{0} \int _{0}^{\pi /2}g(\mathbf {x}_{n},\varphi ,\mu )\mathrm{d}\varphi \mathrm{d}\mu }\right) ^{-1}, \end{aligned}$$

where

$$\begin{aligned} g(\mathbf {x}_{n},\varphi ,\mu )=(\sin \varphi )^{-n}\left( 2\mu ^{2}+\frac{ ns_{n}^{2}+n(\bar{x}_{n}-\mu )^{2}}{\sin ^{2}\varphi }\right) ^{-(n+1)/2}, \end{aligned}$$

\(\bar{x}_{n}=\sum _{i=1}^{n}x_{i}/n\) and \(s_{n}^{2}=\sum _{i=1}^{n}(x_{i}-\bar{ x}_{n})^{2}/n\). Thus, the posterior probability of \(M_{1}\) depends on the data \((\bar{x}_{n},s_{n}^{2})\) through the statistic \(t_{n-1}=\sqrt{n}\) \( \bar{x}_{n}/s_{n}\). Further, \(\Pr (M_{1}|n,t_{n-1})\) is a decreasing function of \(t_{n-1}\). Since \(\Pr (M_{1}|n,t_{n-1})\) \(=0.5\) if and only if \( t_{n-1}=0\), the Bayesian critical region is given by

$$\begin{aligned} W_{n}^{B}=\{t_{n-1}:t_{n-1}\ge 0\}. \end{aligned}$$

The frequentist critical region \(W_{n}^{F}\) for rejecting the null \(H_{1}\) is defined by the 0.95 quantile of the Student t distribution with \(n-1\) degrees of freedom. Critical and disagreement regions for sample sizes \( n=5,10,20,40,80\) are displayed in Table 3.

Table 3 Critical and disagreement regions

From Table 3, it follows that \(W_{n}^{F}(0.05)\subset W_{n}^{B}(0.5)\), and hence \(D_{n}\) contains points \(\bar{x}_{n}\) that yield the Bayesian rejection of the null but lead to the frequentist acceptance. Thus, the frequentist test is more conservative than the Bayesian one.
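The posterior probability \(\Pr (M_{1}|n,\bar{x}_{n},s_{n})\) given above involves a double integral that can be evaluated with standard two-dimensional quadrature. A sketch, assuming scipy (the summary statistics in the example are hypothetical; the arbitrary constants cancel in the ratio, as noted in Sect. 2.1):

```python
import numpy as np
from scipy.integrate import dblquad

def post_prob_M1_normal(n, xbar, s2):
    """Posterior probability of M1: mu <= 0 for normal data with unknown
    sigma, based on the expression above with equal model priors;
    s2 = sum((x_i - xbar)^2) / n."""
    def g(phi, mu):
        return (np.sin(phi) ** (-n)
                * (2.0 * mu ** 2
                   + (n * s2 + n * (xbar - mu) ** 2) / np.sin(phi) ** 2
                   ) ** (-(n + 1) / 2.0))
    # outer integral over mu, inner integral over phi in (0, pi/2)
    num, _ = dblquad(g, 0.0, np.inf, 0.0, np.pi / 2)    # mu > 0
    den, _ = dblquad(g, -np.inf, 0.0, 0.0, np.pi / 2)   # mu < 0
    return 1.0 / (1.0 + num / den)

# Example with hypothetical summary statistics: n = 10, xbar = 0.3, s2 = 1
print(post_prob_M1_normal(10, 0.3, 1.0))
```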

Figure 3 displays the probabilities of \(D_{n}(\mu ,\sigma )\) for the sample sizes \(n=5,10\) as \(\mu \) ranges over \((-3,4)\) and \(\sigma =1\). These probabilities show that the discrepancy curve is again nonsymmetric around the boundary \(\mu =0\). For \(n=5\), the maximum probability is as large as 0.712 and it is attained at \(\mu =1.06\), a point located in the alternative hypothesis \(H_{2}\). Further, for \(\mu \in (0.08,2.09)\) the probability of disagreement is larger than 0.5.

Fig. 3
figure 3

Probabilities of disagreement as a function of \(\mu \) for testing the null \(H_{0}:\mu \le 0\) versus \(H_{1}:\) \(\mu \ge 0\) in normal models for \(n=5\) (higher curve) and \(n=10\) (lower curve)

4 Asymptotic

Consistency is a key property of any statistical testing procedure as it means that the procedure provides the correct decision without uncertainty when the sample size goes to infinity. An inconsistent procedure should not be used for hypothesis testing.

Definition 1

Given a class of models \(\mathcal {M}=\{M_{i},P (M_{i}),\) \(i=1,\ldots ,p\}\), where \(M_{i}\) \(=\{f_{i}(x|\theta _{i}),\pi _{i}(\theta _{i}),\theta _{i}\in \Theta _{i}\}\) and \(P (M_{i})\) is the prior probability of model \(M_{i}\), the Bayesian model selection is posterior model consistent in \(\mathcal {M}\) if the equations

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr (M_{i}|\mathbf {x}_{n})=\left\{ \begin{array}{l} 1, [P_{\theta _{j}}], \text { for } j=i, \\ 0, [P_{\theta _{j}}], \text { for } j\ne i, \end{array} \right. \end{aligned}$$
(5)

hold for any \(i,j=1,\ldots ,p\). The symbol \([P_{\theta _{j}}]\) reminds us that the limit is in probability when sampling from \(f_{j}(x|\theta _{j})\).

From the expression of \(\Pr (M_{i}|\mathbf {x}_{n})\), it follows that posterior model consistency in \(\mathcal {M}\) holds if and only if for any \( \theta _{j}\in \Theta _{j}\) we have that

$$\begin{aligned} \lim _{n\rightarrow \infty }\sum _{\begin{array}{c} k=1 \\ k\ne i \end{array}}^{p}B_{ki}( \mathbf {x}_{n}) \frac{P (M_{k})}{P (M_{i})}=\left\{ \begin{array}{l} 0, [P_{\theta _{j}}], j=i, \\ \infty , [P_{\theta _{j}}], j\ne i, \end{array} \right. \mathrm { for } \, \, i,j=1,2,\ldots ,p. \end{aligned}$$
(6)

Theorem 1 proves that under mild conditions on the likelihood and priors, posterior model consistency holds for the Bayesian one-sided testing.

Theorem 1

Let us consider the class of models \(\mathcal {M}=\{M_{i},P (M_{i}),\) \( i=1,2\}\), where \(M_{i}=\{f(x|\theta ),\pi _{i}(\theta ),\) \(\theta \in \Theta _{i}\}\) with \(\Theta _{1}=\{\theta :\theta \le \theta _{0}\}\) and \(\Theta _{2}=\{\theta :\theta \ge \theta _{0}\}\), and \(P (M_{i})>0\), \(i=1,2\). Then, if the likelihood \(f(\mathbf {x}_{n}|\theta )=\prod \nolimits _{i=1}^{n}f(x_{i}|\theta )\) and the priors \(\pi _{i}(\theta )\) are continuous functions such that \(f(\mathbf {x}_{n}|\theta _{0})>0\) and \(\pi _{i}(\theta _{0})>0\), the Bayes factor

$$\begin{aligned} B_{21}(\mathbf {x}_{n})=\frac{\int _{\theta _{0}}^{\infty }f(\mathbf {x} _{n}|\theta )\pi _{2}(\theta )\mathrm{d}\theta }{\int _{-\infty }^{\theta _{0}}f( \mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta } \end{aligned}$$

satisfies

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{21}(\mathbf {x}_{n})=\left\{ \begin{array}{l} 0, [P_{\theta }], \text { if } \theta <\theta _{0}, \\ \infty , [P_{\theta }], \text { if } \theta >\theta _{0}. \end{array} \right. \end{aligned}$$

Further, the rate of convergence in probability \([P_{\theta }]\) is \( O(e^{-nA(\theta )})\) if \(\theta <\theta _{0}\), where \(A(\theta )=E_{\theta }\log f(x|\theta )-E_{\theta }\log f(x|\theta _{0})>0\), and \(O(e^{nA(\theta )})\) if \(\theta >\theta _{0}\).

Proof

Let us write the Bayes factor \(B_{21}(\mathbf {x}_{n})\) as

$$\begin{aligned} B_{21}(\mathbf {x}_{n})=B_{01}(\mathbf {x}_{n}) B_{20}(\mathbf {x}_{n}), \end{aligned}$$

where

$$\begin{aligned} B_{01}(\mathbf {x}_{n})=\frac{f(\mathbf {x}_{n}|\theta _{0})}{\int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta } \text { and } B_{20}(\mathbf {x}_{n})= \frac{\int _{\theta _{0}}^{\infty }f(\mathbf {x}_{n}|\theta )\pi _{2}(\theta )\mathrm{d}\theta }{f(\mathbf {x}_{n}|\theta _{0})}. \end{aligned}$$

When sampling from \(f(x|\theta _{1})\) for \(\theta _{1}<\theta _{0}\), we can write

$$\begin{aligned} -2\log B_{01}(\mathbf {x}_{n})=-2\log \frac{f(\mathbf {x}_{n}|\theta _{0})}{f( \mathbf {x}_{n}|\theta _{1})}-2\log \frac{f(\mathbf {x}_{n}|\theta _{1})}{ \int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta }. \end{aligned}$$
(7)

Using the Laplace approximation for the integral \(\int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta \), the limit of the second term on the right-hand side of (7) can be written as

$$\begin{aligned}&-2\lim _{n\rightarrow \infty }\log \frac{f(\mathbf {x}_{n}|\theta _{1})}{ \int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta } \\= & {} -2\lim _{n\rightarrow \infty }\log \left( \frac{f(\mathbf {x}_{n}|\theta _{1})}{f(\mathbf {x}_{n}|\hat{\theta }_{1})}\frac{1}{n^{-1/2}}\right) \\= & {} \lim _{n\rightarrow \infty }\left( -2\log \frac{f(\mathbf {x}_{n}|\theta _{1})}{f(\mathbf {x}_{n}|\hat{\theta }_{1})}\right) -2\lim _{n\rightarrow \infty }\log n^{1/2},[P_{\theta _{1}}], \end{aligned}$$

where \(f(\mathbf {x}_{n}|\hat{\theta }_{1})=\sup _{\theta <\theta _{0}}\) \(f( \mathbf {x}_{n}|\theta )\). Further, the limit in probability \([P_{\theta _{1}}]\) of \(-2\log f(\mathbf {x}_{n}|\theta _{1})/f(\mathbf {x}_{n}|\hat{\theta }_{1})\) is a positive random variable that does not degenerate to a constant (in fact, it has a chi-squared distribution; see Wilks 1963, Chapter 13). Moreover, the first term on the right-hand side of (7) can be written as

$$\begin{aligned} -2\log \frac{f(\mathbf {x}_{n}|\theta _{0})}{f(\mathbf {x}_{n}|\theta _{1})} =2n\left( \frac{1}{n}\sum \limits _{i=1}^{n}\log f(x_{i}|\theta _{1})-\frac{1}{ n}\sum \limits _{i=1}^{n}\log f(x_{i}|\theta _{0})\right) . \end{aligned}$$
(8)

If \(A(\theta _{1})\) is the limit in probability \([P_{\theta _{1}}]\) of the expression inside the parenthesis in (8), we have by the Law of Large Numbers that

$$\begin{aligned} A(\theta _{1})= & {} \lim _{n\rightarrow \infty }\left( \frac{1}{n} \sum \limits _{i=1}^{n}\log f(x_{i}|\theta _{1})-\frac{1}{n} \sum \limits _{i=1}^{n}\log f(x_{i}|\theta _{0})\right) \\= & {} E_{\theta _{1}}\log f(x|\theta _{1})-E_{\theta _{1}}\log f(x|\theta _{0})>0,[P_{\theta _{1}}], \end{aligned}$$

where the inequality \(A(\theta _{1})>0\) follows from Jensen's inequality. Thus,

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( -2\log \frac{f(\mathbf {x}_{n}|\theta _{0}) }{f(\mathbf {x}_{n}|\theta _{1})}\right) =\lim _{n\rightarrow \infty }2nA(\theta _{1})=\infty ,[P_{\theta _{1}}]. \end{aligned}$$
(9)

We note that this convergence holds not only in probability but also almost surely \([P_{\theta _{1}}]\). Therefore,

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{01}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty }n^{1/2}e^{-nA(\theta _{1})}=0,[P_{\theta _{1}}]. \end{aligned}$$
(10)

On the other hand, when sampling from \(f(x|\theta _{1})\) the Bayes factor \( B_{20}(\mathbf {x}_{n})\) is finite; in fact, it is smaller than or equal to 1. Indeed,

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{20}(\mathbf {x}_{n})= & {} \lim _{n\rightarrow \infty }\frac{\int _{\theta _{0}}^{\infty }f(\mathbf {x}_{n}|\theta )\pi _{2}(\theta )\mathrm{d}\theta }{f(\mathbf {x}_{n}|\theta _{0})} \nonumber \\\le & {} \lim _{n\rightarrow \infty }\frac{\sup _{\theta \ge \theta _{0}}f( \mathbf {x}_{n}|\theta )}{f(\mathbf {x}_{n}|\theta _{0})}=1, [P_{\theta _{1}}], \end{aligned}$$
(11)

where the last equality follows from the fact that the MLE \(\hat{\theta }_{n}\) of \(\theta \) in the set \(\{\theta \ge \theta _{0}\}\) converges in probability \([P_{\theta _{1}}]\) to \(\theta _{0}\) . Thus, from (10) and (11) it follows that

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{21}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty }B_{20}(\mathbf {x}_{n})B_{01}(\mathbf {x}_{n})=0,[P_{\theta _{1}}]. \end{aligned}$$
(12)

This proves the first assertion.

Similar arguments to those used for proving (10) yield that when sampling from \(f(x|\theta _{2})\) for \(\theta _{2}>\theta _{0}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{20}(\mathbf {x}_{n})=\infty , [P_{\theta _{2}}], \end{aligned}$$
(13)

and the rate of convergence is \(O(e^{nA(\theta _{2})})\), where \(A(\theta _{2})=E_{\theta _{2}}\log f(x|\theta _{2})-E_{\theta _{2}}\log f(x|\theta _{0})>0\). Further, when sampling from \(f(x|\theta _{2})\), the Bayes factor \( B_{01}(\mathbf {x}_{n})\) is bounded away from zero. Indeed, this result follows from

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{01}(\mathbf {x}_{n})= & {} \lim _{n\rightarrow \infty }\frac{f(\mathbf {x}_{n}|\theta _{0})}{\int _{-\infty }^{\theta _{0}}f( \mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta } \nonumber \\\ge & {} \lim _{n\rightarrow \infty }\frac{f(\mathbf {x}_{n}|\theta _{0})}{ \sup _{\theta \le \theta _{0}}f(\mathbf {x}_{n}|\theta )}=1, [P_{\theta _{2}}]. \end{aligned}$$
(14)

Thus, from (13) and (14) we have that

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{21}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty }B_{20}(\mathbf {x}_{n})B_{01}(\mathbf {x}_{n})=\infty , [P_{\theta _{2}}]. \end{aligned}$$
(15)

This proves the second assertion and completes the proof of the theorem. \(\square \)
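As an illustration of the rate in Theorem 1, consider the exponential model \(f(x|\theta )=(1/\theta )\exp (-x/\theta )\) of Sect. 3.2 with \(\theta _{0}=1\). Since \(E_{\theta }X=\theta \), a direct computation gives

$$\begin{aligned} A(\theta )=E_{\theta }\log f(x|\theta )-E_{\theta }\log f(x|\theta _{0})=\log \frac{\theta _{0}}{\theta }+\frac{\theta }{\theta _{0}}-1=\theta -\log \theta -1, \end{aligned}$$

which is the Kullback–Leibler divergence \(KL\left( f(\cdot |\theta )\,\Vert \,f(\cdot |\theta _{0})\right) \), and it is strictly positive for all \(\theta \ne 1\).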

Corollary 1

Let \(f(x|\theta ,\xi )\) be a sampling model, where \(\xi \in \varXi \) is a nuisance parameter, and consider the one-sided testing \(H_{1}:\theta \le \theta _{0}\) versus \(H_{2}:\theta \ge \theta _{0}\). Then, the Bayes factor \( B_{21}^{I}(\mathbf {x}_{n})\) for the intrinsic priors \(\pi _{i}^{I}(\theta ,\xi |\theta _{0})\) for \(i=1,2\) given in (2) is consistent.

Proof

The Bayes factor \(B_{21}^{I}(\mathbf {x}_{n})\) can be written as

$$\begin{aligned} B_{21}^{I}(\mathbf {x}_{n})=\frac{\int _{\theta _{0}}^{\infty }h_{2}(\mathbf {x} _{n}|\theta )\mathrm{d}\theta }{\int _{-\infty }^{\theta _{0}}h_{1}(\mathbf {x} _{n}|\theta )\mathrm{d}\theta }, \end{aligned}$$

where

$$\begin{aligned} h_{i}(\mathbf {x}_{n}|\theta )=\int _{\varXi }\int _{\varXi }f(\mathbf { x}_{n}|\theta ,\xi )\pi _{i}^{I}(\theta ,\xi |\theta _{0},\xi _{0})\pi ^{N}(\xi _{0})\mathrm{d}\xi \mathrm{d}\xi _{0} . \end{aligned}$$

Since the functions \(h_{i}(\mathbf {x}_{n}|\theta )\), \(i=1,2\), play the role of the integrated likelihoods in Theorem 1 and satisfy analogous conditions, the arguments in the proof of Theorem 1 apply, and the assertion follows. \(\square \)

Let us consider the multiple model comparison between the models \( M_{0}:f(x|\theta _{0})\), \(M_{1}:\{f(x|\theta _{1}),\pi _{1}(\theta _{1}),\theta _{1}<\theta _{0}\}\) and \(M_{2}:\{f(x|\theta _{2}),\pi _{2}(\theta _{2}),\theta _{2}>\theta _{0}\}\) with model prior \(P (M_{i})>0\) for \(i=0,1,2\). Theorem 2 proves that the Bayesian model selection is posterior model consistent and that the rate of convergence is exponential, except when sampling from \(M_{0}\), in which case the rate is polynomial.

Theorem 2

Under the conditions in Theorem 1, and model priors \(P(M_{i})>0\), \(i=0,1,2\), we have that

(i)

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr (M_{0}|\mathbf {x}_{n})=\left\{ \begin{array}{l} 1, [P_{\theta }], \text { if } \theta =\theta _{0}, \\ 0, [P_{\theta }], \text { if } \theta \ne \theta _{0}, \end{array} \right. \end{aligned}$$

and the rate of convergence for \(\theta =\theta _{0}\) is \(O(n^{b})\) with \( b>0\), while for \(\theta \ne \theta _{0}\) it is \(O(e^{-nA(\theta )})\), where \( A(\theta )=E_{\theta }\log f(x|\theta )-E_{\theta }\log f(x|\theta _{0})>0\).

(ii) For \(i\ne 0\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\Pr (M_{1}|\mathbf {x}_{n})=\left\{ \begin{array}{l} 1, [P_{\theta }], \text { if } \theta <\theta _{0}, \\ 0, [P_{\theta }], \text { if } \theta \ge \theta _{0}, \end{array} \right. \quad \lim _{n\rightarrow \infty }\Pr (M_{2}|\mathbf {x}_{n})=\left\{ \begin{array}{l} 1, [P_{\theta }], \text { if } \theta >\theta _{0}, \\ 0, [P_{\theta }], \text { if } \theta \le \theta _{0}, \end{array} \right. \end{aligned}$$

and the rate of convergence is again exponential.

Proof

We prove that Eq. (6) holds for \(i,j=0,1,2,\) and any \(\theta \) in the real line \(\mathbb {R}\).

To prove (i), we have to show that when sampling from \(f(x|\theta _{0})\)

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( B_{10}(\mathbf {x}_{n})\frac{P (M_{1})}{ P(M_{0})}+B_{20}(\mathbf {x}_{n})\frac{P(M_{2})}{P(M_{0})}\right) =0 , [P_{\theta _{0}}], \end{aligned}$$
(16)

and when sampling from \(f(x|\theta )\) with \(\theta \ne \theta _{0}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( B_{10}(\mathbf {x}_{n})\frac{P (M_{1})}{ P(M_{0})}+B_{20}(\mathbf {x}_{n})\frac{P(M_{2})}{P (M_{0})}\right) =\infty , [P_{\theta }],\theta \ne \theta _{0}. \end{aligned}$$
(17)

To prove (16), we use the Laplace approximation to the integral on \(B_{10}( \mathbf {x}_{n}),\) and write

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{10}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty } \frac{\int _{-\infty }^{\theta _{0}}f(\mathbf {x}_{n}|\theta )\pi _{1}(\theta )\mathrm{d}\theta }{f(\mathbf {x}_{n}|\theta _{0})}=\lim _{n\rightarrow \infty }\frac{f( \mathbf {x}_{n}|\hat{\theta }_{1})}{f(\mathbf {x}_{n}|\theta _{0})} n^{-1/2}=0,[P_{\theta _{0}}]. \end{aligned}$$
(18)

The last equality follows from the fact that \(\lim _{n\rightarrow \infty }f(\mathbf {x}_{n}|\hat{\theta }_{1})/f(\mathbf {x}_{n}|\theta _{0})\), \([P_{\theta _{0}}]\), is a positive random variable. The same arguments yield

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{20}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty } \frac{\int _{\theta _{0}}^{\infty }f(\mathbf {x}_{n}|\theta )\pi _{2}(\theta )\mathrm{d}\theta }{f(\mathbf {x}_{n}|\theta _{0})}=\lim _{n\rightarrow \infty }\frac{f( \mathbf {x}_{n}|\hat{\theta }_{2})}{f(\mathbf {x}_{n}|\theta _{0})} n^{-1/2}=0,[P_{\theta _{0}}]. \end{aligned}$$
(19)

Then, (18) and (19) prove (16), and the first assertion in (i) follows. If we now sample from \(f(x|\theta _{1})\) for \(\theta _{1}<\theta _{0}\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{10}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty } \frac{f(\mathbf {x}_{n}|\hat{\theta }_{1})}{f(\mathbf {x}_{n}|\theta _{1})} \frac{f(\mathbf {x}_{n}|\theta _{1})}{f(\mathbf {x}_{n}|\theta _{0})} n^{-1/2}=\infty ,[P_{\theta _{1}}]. \end{aligned}$$
(20)

The last equality follows from equality (10) in Theorem 1. A similar argument proves that when sampling from \(f(x|\theta _{2})\) for \(\theta _{2}>\theta _{0},\)

$$\begin{aligned} \lim _{n\rightarrow \infty }B_{20}(\mathbf {x}_{n})=\lim _{n\rightarrow \infty } \frac{f(\mathbf {x}_{n}|\hat{\theta }_{2})}{f(\mathbf {x}_{n}|\theta _{2})} \frac{f(\mathbf {x}_{n}|\theta _{2})}{f(\mathbf {x}_{n}|\theta _{0})} n^{-1/2}=\infty ,[P_{\theta _{2}}]. \end{aligned}$$
(21)

Expressions (20) and (21) prove the second assertion in (i). This completes the proof of part (i).

To prove (ii) for \(i=1\), we have to show that, for any \(\theta _{1}<\theta _{0}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( B_{01}(\mathbf {x}_{n})\frac{P (M_{0})}{ P(M_{1})}+B_{21}(\mathbf {x}_{n})\frac{P (M_{2})}{P (M_{1})}\right) =0 , [P_{\theta _{1}}], \end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( B_{01}(\mathbf {x}_{n})\frac{P (M_{0})}{ P (M_{1})}+B_{21}(\mathbf {x}_{n})\frac{P (M_{2})}{P (M_{1})}\right) =\infty , [P_{\theta }], \text { for } \theta \ge \theta _{0}. \end{aligned}$$

For \(i=2\), arguments similar to those used for proving (i) show that, for any \(\theta _{2}>\theta _{0}\),

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( B_{02}(\mathbf {x}_{n})\frac{P (M_{0})}{ P (M_{2})}+B_{12}(\mathbf {x}_{n})\frac{P (M_{1})}{P(M_{2})}\right) =0 , [P_{\theta _{2}}], \end{aligned}$$

and

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( B_{02}(\mathbf {x}_{n})\frac{P (M_{0})}{ P(M_{2})}+B_{12}(\mathbf {x}_{n})\frac{P (M_{1})}{P(M_{2})}\right) =\infty , [P_{\theta }], \text { for } \theta \le \theta _{0}. \end{aligned}$$

This completes the proof of the theorem. \(\square \)

We note that these theorems are valid for the intrinsic priors \(\pi _{1}^{I}(\theta |\theta _{0})\) and \(\pi _{2}^{I}(\theta |\theta _{0})\).

5 Concluding remarks

For small and moderate sample sizes, a comparison between the decisions of the Bayesian and frequentist one-sided tests has been presented. The comparison is based on the probability that the procedures make opposite decisions. This comparison indicates that for usual sampling families the probability is largest when the true model is in the alternative hypothesis and not at the boundary of the hypotheses. This asymmetric behavior is inherited from the asymmetric roles the frequentist approach assigns to the null and the alternative hypotheses.

It is interesting to point out that in our one-sided testing examples the sampling region in which the decisions disagree contains points for which the frequentist analysis accepts the null hypothesis and the Bayesian rejects it \((W^{F}\subset W^{B})\). For other hypothesis testing problems, the disagreement region is not necessarily of this type, as the following simple example shows.

Example 2

Let us consider the two-sided testing problem on the mean \(\theta \) of the normal distribution \(N(x|\theta ,1)\). For the null \(H_{1}:\theta =0\) and the alternative \(H_{2}:\theta \ne 0\) with the prior \(N(\theta |0,2)\), the frequentist rejection region contains the Bayesian rejection region \( (W^{B}\subset W^{F})\), which is exactly the opposite of what occurs in one-sided testing.

Figure 4 displays the probability of disagreement, conditional on \(\theta \), as \(\theta \) ranges over \((-5,5)\) for \(n=1\). The disagreement region is given by \(D_{1}=\{x:1.96\le |x|\le 2.22\}\). The curve, as a function of \(\theta \), represents the probability that the Bayesian analysis accepts the null and the frequentist rejects it. The maximum probability of disagreement is equal to 0.103, and it is attained at \(|\theta |=2.105\). This probability grows as n grows. For instance, for \(n=20\) the shape of the curve is similar to that in Fig. 4, but the mode is as large as 0.64 and it is attained at \(|\theta |=0.34\).

Fig. 4
figure 4

Disagreement probabilities for the frequentist and Bayesian test for the null \(H_{0}:\) \(\theta =0\) versus \(H_{1}:\theta \ne 0\) for the normal model \(N(x|\theta ,1)\) and \(n=\) 1
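The disagreement curve in Fig. 4 can be reproduced directly from the region \(D_{1}\) given above, since for a single observation \(X\sim N(\theta ,1)\) the probability of \(D_{1}\) is a difference of normal cdf values. A sketch, assuming scipy:

```python
import numpy as np
from scipy.stats import norm

def disagreement_prob_twosided(theta, lower=1.96, upper=2.22):
    """Pr_theta(lower <= |X| <= upper) for one observation X ~ N(theta, 1):
    the probability of the disagreement region D_1 of Example 2."""
    return (norm.cdf(upper - theta) - norm.cdf(lower - theta)
            + norm.cdf(-lower - theta) - norm.cdf(-upper - theta))

# The curve peaks at about 0.103 near |theta| = 2.1, as reported in the text
for theta in [0.0, 1.0, 2.1, 3.0]:
    print(theta, round(disagreement_prob_twosided(theta), 3))
```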

On the other hand, for large sample sizes we have proved that under mild conditions the Bayesian procedure provides consistent posterior model probabilities. This assertion is also valid in the presence of nuisance parameters. Moreover, the speed of convergence is surprisingly fast. For the multiple test \(H_{0}:\theta =\theta _{0}\), \(H_{1}:\theta \le \theta _{0}\), \(H_{2}:\theta \ge \theta _{0}\), the Bayesian testing procedure is also consistent, and the speed of convergence of the posterior probabilities is also exponential, except when sampling from \(H_{0}\), in which case it is \(O(n^{1/2})\). We recall that the frequentist test encounters serious difficulties in dealing with multiple tests.