1 Introduction

The Full Bayesian Significance Test (FBST) is a statistical hypothesis test published in 1999 by the authors [34] and further extended in Refs. [37, 86]. This solution is anchored by a novel measure of statistical significance known as the e-value, \({\text{ev}}(H\,\vert \,X)\), a.k.a. the evidence value provided by observational data X in support of the statistical hypothesis H or, the other way around, the epistemic value of hypothesis H given the observational data X. The e-value, its theoretical properties and its applications have been a topic of research for the Bayesian Group at USP, the University of São Paulo, for the last 20 years, including collaborators working at UNICAMP, the State University of Campinas, UFSCar, the Federal University of São Carlos, and other universities in Brazil and around the world. The bibliographic references list a selection of contributions to the FBST research program and its applications.

The FBST was specially designed to provide a significance measure for sharp or precise statistical hypotheses, namely, hypotheses consisting of a zero-volume (or zero Lebesgue measure) subset of the parameter space. Furthermore, the e-value has many necessary or desirable properties for a statistical support function, such as:

  1. (i)

    Give an intuitive and simple measure of significance for the hypothesis under test, ideally, a probability defined directly in the original or natural parameter space.

  2. (ii)

    Have an intrinsically geometric definition, independent of any non-geometric aspect, like the particular parameterization of the (manifold representing the) null hypothesis being tested, or the particular coordinate system chosen for the parameter space, in short, be defined as an invariant procedure.

  3. (iii)

    Give a measure of significance that is smooth, i.e. continuous and differentiable, on the hypothesis parameters and sample statistics, under appropriate regularity conditions for the model.

  4. (iv)

    Obey the likelihood principle, i.e., the information gathered from observations should be represented by, and only by, the likelihood function [13, 96, 143].

  5. (v)

    Require no ad hoc artifice like assigning a positive prior probability to zero measure sets, or setting an arbitrary initial belief ratio between hypotheses.

  6. (vi)

    Be a possibilistic support function, where the support of a logical disjunction is the maximum support among the support of the disjuncts, see [133].

  7. (vii)

    Be able to provide a consistent test for a given sharp hypothesis.

  8. (viii)

    Be able to provide compositionality operations in complex models.

  9. (ix)

    Be an exact procedure, i.e., make no use of “large sample” asymptotic approximations when computing the e-value.

  10. (x)

    Allow the incorporation of previous experience or expert’s opinion via (subjective) prior distributions.

The objective of the next two sections is to recall standard nomenclature and provide a short survey of the FBST theoretical framework, summarizing the most important properties of its statistical significance measure, the e-value; these introductory sections follow closely the tutorial [122, appendix A], see also [37].

2 Bayesian statistical models

A standard model of (parametric) Bayesian statistics concerns an observed (vector) random variable, x, that has a sampling distribution with a specified functional form, \(p(x \,\vert \,\theta )\), indexed by the (vector) parameter \(\theta\). This same function, regarded as a function of the free variable \(\theta\) with a fixed argument x, is the model’s likelihood function.

In frequentist or classical statistics, one is allowed to use probability calculus in the sample space, but strictly forbidden to do so in the parameter space, that is, x is to be considered as a random variable, while \(\theta\) is not to be regarded as random in any way. In frequentist statistics, \(\theta\) should be taken as a “fixed but unknown quantity”, and neither probability nor any other belief calculus may be used to directly represent or handle the uncertain knowledge about the parameter.

In the Bayesian context, the parameter \(\theta\) is regarded as a latent (non-observed) random variable. Hence, the same formalism used to express (un)certainty or belief, namely, probability theory, is used in both the sample and the parameter space. Accordingly, the joint probability distribution, \(p(x,\theta )\) should summarize all the information available in a statistical model. Following the rules of probability calculus, the model’s joint distribution of x and \(\theta\) can be factorized either as the likelihood function of the parameter given the observation times the prior distribution on \(\theta\), or as the posterior density of the parameter times the observation’s marginal density,

$$\begin{aligned} p(x,\theta ) = p(x \,\vert \,\theta ) p(\theta ) = p(\theta \,\vert \,x) p(x) \ . \end{aligned}$$

The prior probability distribution \(p_0(\theta )\) represents the initial information available about the parameter. In this setting, a predictive distribution for the observed random variable, x, is represented by a mixture (or superposition) of stochastic processes, all of them with the functional form of the sampling distribution, according to the prior mixing (or weight) distribution,

$$\begin{aligned} p(x) = \int _{\theta } p(x \,\vert \,\theta ) p_0(\theta ) d \theta \ . \end{aligned}$$

If we now observe a single event, x, it follows from the factorizations of the joint distribution above that the posterior probability distribution of \(\theta\), representing the available information about the parameter after the observation, is given by

$$\begin{aligned} p_1(\theta ) \propto p(x \,\vert \,\theta ) p_0(\theta ) \ . \end{aligned}$$

In order to replace the ‘proportional to’ symbol, \(\propto\), by an equality, it is necessary to divide the right hand side by the normalization constant, \(c_1 = \int _\theta p(x \,\vert \,\theta ) p_0(\theta ) d\theta\). This is the Bayes rule, giving the (inverse) probability of the parameter given the data. That is the basic learning mechanism of Bayesian statistics. Computing normalization constants is often difficult or cumbersome. Hence, especially in large models, it is customary to work with unnormalized densities or potentials as long as possible in the intermediate calculations, computing only the final normalization constants. It is interesting to observe that the joint distribution function, taken with fixed x and free argument \(\theta\), is a potential for the posterior distribution.
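
As a concrete illustration (a hypothetical Binomial-Beta example on a discretized parameter space, not taken from the cited references), the following minimal sketch computes the normalization constant \(c_1\) and the normalized posterior from the unnormalized potential:

```python
# Minimal sketch of the Bayes rule on a grid, for a Binomial sampling model.
# The Beta(2, 2) prior and the data (n = 20 trials, k = 14 successes) are
# hypothetical choices, made only for illustration.
import numpy as np
from scipy import stats

theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter space
dtheta = theta[1] - theta[0]
prior = stats.beta.pdf(theta, 2, 2)           # p_0(theta)
likelihood = stats.binom.pmf(14, 20, theta)   # p(x | theta), with x = (n = 20, k = 14)

potential = likelihood * prior                # unnormalized posterior: p_1(theta) ∝ p(x|theta) p_0(theta)
c1 = np.sum(potential) * dtheta               # normalization constant c_1 ≈ ∫ p(x|theta) p_0(theta) dtheta
posterior = potential / c1                    # normalized posterior density p_1(theta)

print(f"c_1 = p(x) ≈ {c1:.4g}")
print(f"posterior mass on the grid ≈ {np.sum(posterior) * dtheta:.4f}")
```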

Bayesian learning is a recursive process, where the posterior distribution after a learning step becomes the prior distribution for the next step. Assuming that the observations are i.i.d. (independent and identically distributed), the posterior distribution after n observations, \(x^{(1)},\ldots ,x^{(n)}\), becomes,

$$\begin{aligned} p_n(\theta ) \ \propto \ p(x^{(n)} \,\vert \,\theta ) p_{n-1}(\theta ) \ \propto \ \prod \nolimits _{i=1}^n p(x^{(i)} \,\vert \,\theta ) p_0(\theta ) \ . \end{aligned}$$

If possible, it is very convenient to use a conjugate prior, that is, a mixing distribution whose functional form is invariant by the Bayes operation in the statistical model at hand. For example, the conjugate priors for the Multivariate Normal and the Multinomial models are, respectively, the Wishart and the Dirichlet distributions, see [55, 145].
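
As a minimal illustration of this recursion (a hypothetical Bernoulli-Beta example, not taken from the cited references), the following sketch updates a conjugate Beta prior one observation at a time and checks that the result agrees with a single batch update:

```python
# Minimal sketch of recursive Bayesian learning with a conjugate prior:
# a Bernoulli sampling model with a Beta prior (hypothetical example).
# Conjugacy: Beta(a, b) prior + Bernoulli data -> Beta(a + #successes, b + #failures).
import numpy as np

rng = np.random.default_rng(0)
theta_true = 0.3                       # "fixed but unknown" value generating the data
x = rng.binomial(1, theta_true, size=50)

a, b = 1.0, 1.0                        # Beta(1, 1), i.e. a uniform prior p_0(theta)
for xi in x:                           # the posterior of step i is the prior of step i + 1
    a, b = a + xi, b + 1 - xi

# A single batch update gives the same posterior, since the likelihood
# factorizes over (conditionally) i.i.d. observations.
a_batch, b_batch = 1.0 + x.sum(), 1.0 + len(x) - x.sum()
assert a == a_batch and b == b_batch
print(f"posterior: Beta({a:.0f}, {b:.0f}), posterior mean = {a / (a + b):.3f}")
```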

The founding fathers of the Bayesian school, namely, Reverend Thomas Bayes, Richard Price and Pierre-Simon de Laplace, interpreted the Bayesian operation as a path taken for learning about probabilities related to unobservable causes, represented by the parameters of a statistical model, from probabilities related to their consequences, represented by observed data. Nevertheless, later interpretations of statistical inference, like those of Bruno de Finetti who endorsed the epistemological perspectives of empirical positivism, strongly discouraged such causal interpretations, see [128, 129] for further discussion of this controversy.

The ‘beginnings and the endings’ of the Bayesian learning process deserve further discussion, that is, we should present some rationale for choosing the prior distribution used to start the learning process, and some convergence theorems for the posterior as the number of observations increases. In order to do so, we must assess and measure the information content of a (posterior) distribution. References [64, 69, 123, 145] explain how the concept of entropy can be used to unlock many of the mysteries related to the problems at hand. In particular, they discuss some fine details about criteria for prior selection and important properties of posterior convergence.

3 The Epistemic e-values

Let \(\theta \in \Theta \subseteq {R}^p\) be a vector parameter of interest, and \(p(x \,\vert \,\theta )\) be the likelihood associated to the observed data x, as in the standard statistical model. Under the Bayesian paradigm the posterior density, \(p_n(\theta )\), is proportional to the product of the likelihood and a prior density,

$$\begin{aligned} p_n(\theta ) \propto p(x \,\vert \,\theta ) \, p_0(\theta ) \ . \end{aligned}$$

A hypothesis H states that the parameter lies in the null set, defined by inequality and equality constraints given by vector functions g and h in the parameter space,

$$\begin{aligned} \Theta _H = \{ \theta \in \Theta \,\vert \,g(\theta ) \le \mathbf{0} \ \wedge \ h(\theta ) = \mathbf{0} \} \ . \end{aligned}$$

From now on, we use a relaxed notation, writing H instead of \(\Theta _H\). We are particularly interested in sharp (precise) hypotheses, i.e., those in which there is at least one equality constraint and, therefore, \(\dim (H) < \dim (\Theta )\).

The FBST defines \({\text{ev}}(H)\), the e-value supporting (in favor of) the hypothesis H, and \(\overline{{\text{ev}}}(H)\), the e-value against H, as

$$\begin{aligned} s(\theta ) = {p_n(\theta )}/{r(\theta )} \ , \\ s^* = s(\theta ^*) = \sup \nolimits _{\theta \in H} s(\theta ) \ , \ \ \widehat{s}= s(\widehat{\theta }) = \sup \nolimits _{\theta \in \Theta } s(\theta ) \ , \\ T(v) = \{ \theta \in \Theta \,\vert \,s(\theta ) \le v \} \ , \ \ W(v) = \int _{T(v)} p_n\left( \theta \right) d\theta \ , \\ \overline{T}(v) = \Theta - T(v) \ , \ \ \overline{W}(v) = 1-W(v) \ , \\ \,{\text{ev}}\,(H) = W(s^*)\ , \ \ \overline{\,{\text{ev}}}(H) = \overline{W}(s^*) = 1-{\text{ev}}(H) \ . \end{aligned}$$

The function \(s(\theta )\) is known as the posterior surprise function relative to a given reference density, \(r(\theta )\). W(v) is the cumulative surprise distribution. Due to its interpretation in mathematical and philosophical logic, see [16], W(v) is also known as (the statistical model’s) truth function or Wahrheitsfunktion. The surprise function was used in the context of statistical inference by Good [56], Evans [48], Royall [103] and Shackle [109, 110], among others. Its role in the FBST is to make \({\text{ev}}(H)\) explicitly invariant under suitable transformations of the coordinate system of the parameter space, see next section.

The tangential (to the hypothesis) set \(\overline{T}=\overline{T}(s^*)\), is a Highest Relative Surprise Set (HRSS). It contains the points of the parameter space with higher surprise, relative to the reference density, than any point in the null set H. When \(r(\theta )\propto 1\), the possibly improper uniform density, \(\overline{T}\) is the Posterior’s Highest Density Probability Set (HDPS) tangential to the null set H. Small values of \(\overline{\text{ev}}(H)\) indicate that the hypothesis traverses high density regions, favoring the hypothesis.

Notice that, in the FBST definition, there is an optimization step and an integration step. The optimization step follows a typical maximum probability argument, according to which, “a system is best represented by its highest probability realization”. The integration step extracts information from the system as a probability weighted average. Many inference procedures of classical statistics rely basically on maximization operations, while many inference procedures of Bayesian statistics rely on integration (or marginalization) operations. In order to achieve all its desired properties, the FBST procedure has to use both types of operation.
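
As a concrete illustration, the following minimal sketch carries out both steps for a hypothetical one-dimensional example, with a Beta(15, 7) posterior, a flat reference density and the sharp hypothesis \(H: \theta = 1/2\); none of these choices come from the cited works:

```python
# Minimal sketch of the FBST for a one-dimensional example:
# posterior p_n = Beta(15, 7) (hypothetical), reference density r(theta) ∝ 1,
# sharp hypothesis H: theta = 0.5.
import numpy as np
from scipy import stats

posterior = stats.beta(15, 7)
theta_H = 0.5

# Optimization step: s* = sup_{theta in H} s(theta).  Here H is a single point,
# so the supremum is the surprise at theta_H; with r ∝ 1, s(theta) = p_n(theta).
s_star = posterior.pdf(theta_H)

# Integration step: ev(H) = W(s*) is the posterior mass of T(s*) = {theta : s(theta) <= s*},
# estimated here by simple Monte Carlo; ev_bar(H) is the mass of the tangential set.
rng = np.random.default_rng(0)
draws = posterior.rvs(size=200_000, random_state=rng)
ev = np.mean(posterior.pdf(draws) <= s_star)
ev_bar = 1.0 - ev

print(f"ev(H)     ≈ {ev:.3f}  (e-value supporting H)")
print(f"ev_bar(H) ≈ {ev_bar:.3f}  (e-value against H)")
```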

3.1 Nuisance parameters

Let us consider the situation where the hypothesis constraint, \(H:\ h(\theta )=h(\delta )=0\), with \(\theta =[\delta ,\lambda ]\), does not depend on some of the parameters, \(\lambda\). This situation is described in [11] by Debabrata Basu as follows:

If the inference problem at hand relates only to \(\delta\), and if information gained on \(\lambda\) is of no direct relevance to the problem, then we classify \(\lambda\) as the Nuisance Parameter. The big question in statistics is: How can we eliminate the nuisance parameter from the argument?

Basu goes on to list at least 10 categories of procedures to achieve this goal, like using \(\max _\lambda\) or \(\int \ d\lambda\), the maximization or integration operators, in order to obtain a projected profile or marginal posterior function, \(p(\delta \,\vert \,x)\). The FBST does not follow the nuisance parameters elimination paradigm; it works in the original parameter space, in its full dimension.

3.2 Reference prior and invariance

In the FBST the role of the reference density, \(r(\theta )\) is to make \(\overline{{\text{ev}}}(H)\) explicitly invariant under suitable transformations of the coordinate system. The natural choice of reference density is an uninformative prior, interpreted as a representation of no information in the parameter space, or the limit prior for no observations, or the neutral ground state for the Bayesian learning operation. Standard (possibly improper) uninformative priors include the uniform, maximum entropy densities, or Jeffreys’ invariant prior. Finally, invariance, as used in statistics, is a metric concept, and the reference density can be interpreted as induced by the statistical model’s information metric in the parameter space, \(dl^2=d\theta 'G(\theta )d\theta\), see [2, 12, 17, 49, 55, 64, 69, 70, 145] for a detailed discussion. Jeffreys’ invariant prior is proportional to the square root of the information matrix determinant, \(p(\theta )\propto \sqrt{\text{ det } G(\theta )}\).
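
As a simple worked example (a standard textbook case, stated here only for concreteness), consider a Bernoulli sampling model. Its Fisher information and the corresponding Jeffreys’ invariant prior are

$$\begin{aligned} G(\theta ) = E\left[ \left( \frac{\partial }{\partial \,\theta } \log p(x \,\vert \,\theta ) \right) ^2 \right] = \frac{1}{\theta (1-\theta )} \ , \ \ p(\theta ) \propto \sqrt{\text{ det } G(\theta )} = \theta ^{-1/2} (1-\theta )^{-1/2} \ , \end{aligned}$$

that is, a Beta(1/2, 1/2) density, which could then serve as the reference density \(r(\theta )\).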

3.3 Proof of invariance

Consider a proper (bijective, integrable, and almost surely continuously differentiable) reparameterization \(\omega =\phi (\theta )\). Under the reparameterization, the Jacobian, surprise, posterior and reference functions are:

$$\begin{aligned} J(\omega ) = \left[ \frac{\partial \,\theta }{\partial \,\omega } \right] = \left[ \frac{\partial \,\phi ^{-1}(\omega )}{\partial \,\omega } \right] = \left[ \begin{array}{ccc} \frac{\partial \,\theta _1}{\partial \,\omega _1} &\quad \ldots &\quad \frac{\partial \,\theta _1}{\partial \,\omega _n} \\ \vdots &\quad \ddots &\quad \vdots \\ \frac{\partial \,\theta _n}{\partial \,\omega _1} &\quad \ldots &\quad \frac{\partial \,\theta _n}{\partial \,\omega _n} \end{array} \right] \ , \\ \widetilde{s}(\omega ) = \frac{ \widetilde{p}_n(\omega ) }{ \widetilde{r}(\omega ) } = \frac{ p_n(\phi ^{-1}(\omega )) \left| J(\omega ) \right| }{ r(\phi ^{-1}(\omega )) \left| J(\omega ) \right| } = s\left( \phi ^{-1}(\omega ) \right) \ . \end{aligned}$$

Let \(\Omega _H = \phi (\Theta _H)\). It follows that

$$\begin{aligned} \widetilde{s}^* = \sup _{\omega \in \Omega _H} \widetilde{s}(\omega ) = \sup _{\theta \in \Theta _H} s(\theta ) = s^* \end{aligned}$$

hence, the tangential set, \(\overline{T}\mapsto \phi (\overline{T}) = \widetilde{\overline{T}}\), and

$$\begin{aligned} \widetilde{\overline{\text{ev}}}(H) = \int _{\widetilde{\overline{T}}} \widetilde{p}_n(\omega ) d \omega = \int _{\overline{T}} p_n(\theta ) d \theta = \overline{{\text{ev} }}(H) . \end{aligned}$$
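
The cancellation of the Jacobian factors, which is the heart of this proof, can be checked numerically. The minimal sketch below reuses the hypothetical Beta(15, 7) posterior and flat reference from the earlier sketch, with the log-odds reparameterization \(\omega = \log (\theta /(1-\theta ))\):

```python
# Minimal numerical check that the surprise function (hence the e-value) does not
# change under the reparameterization omega = log(theta / (1 - theta)).
# Hypothetical example: posterior Beta(15, 7), reference r(theta) ∝ 1 on (0, 1).
import numpy as np
from scipy import stats

posterior = stats.beta(15, 7)
theta = np.linspace(0.01, 0.99, 99)

# Original coordinates: s(theta) = p_n(theta) / r(theta), with r(theta) = 1.
s_theta = posterior.pdf(theta) / 1.0

# New coordinates: omega = logit(theta); Jacobian |d theta / d omega| = theta (1 - theta).
jac = theta * (1.0 - theta)
p_n_omega = posterior.pdf(theta) * jac   # transformed posterior density, at omega(theta)
r_omega = 1.0 * jac                      # transformed reference density
s_omega = p_n_omega / r_omega            # the Jacobian factors cancel, exactly as in the proof

assert np.allclose(s_theta, s_omega)
print("maximum discrepancy:", float(np.max(np.abs(s_theta - s_omega))))
```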

3.4 Asymptotics and consistency

Let us consider the cumulative distribution of the evidence value against the hypothesis, \(\overline{V}(c)= {\text{ Pr }}(\,\overline{\text{ev}}\le c)\), given \(\theta ^0\), the true value of the parameter. Under appropriate regularity conditions, for increasing sample size, \(n\rightarrow \infty\), we can say the following:

  • If H is false, \(\theta ^0\notin H\), then \(\overline{{\text{ev}}}\) converges (in probability) to 1, that is, \(\overline{V}(0\le c <1)\rightarrow 0\).

  • If H is true, \(\theta ^0\in H\), then \(\overline{V}(c)\), the confidence level, is approximated by the function

$$\begin{aligned} QQ(t,h,c) = {\text{ Q }}\left( t-h, {\text{ Q }}^{-1}\left( t,c\right) \right) \ , \ \ {\text{ where }} \\ {\text{ Q }}(k,x) = \frac{\Gamma (k/2, x/2)}{\Gamma (k/2, \infty )} \ , \ \ \Gamma (k,x) = \int _0^x y^{k-1}e^{-y} dy \ , \end{aligned}$$

\(t=\dim (\Theta )\), \(h=\dim (H)\) and \({\text{ Q }}(k,x)\) is the cumulative chi-square distribution with k degrees of freedom.

Under the same regularity conditions, an appropriate choice of threshold or critical level, c(n), provides a consistent test, \(\tau_c\), that rejects the hypothesis if \(\overline{{\text{ev}}}(H) > c\). The empirical power analysis developed in [76, 135] provides critical levels that are consistent and also effective for small samples.
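
Since \({\text{ Q }}(k,\cdot )\) is the chi-square cumulative distribution, the asymptotic confidence function \(QQ(t,h,c)\) can be evaluated directly with standard numerical libraries; a minimal sketch follows (the dimensions \(t=3\), \(h=1\) are arbitrary illustrative choices):

```python
# Minimal sketch of the asymptotic null distribution of ev_bar(H):
# QQ(t, h, c) = Q(t - h, Q^{-1}(t, c)), where Q(k, .) is the chi-square
# cumulative distribution with k degrees of freedom.
# The dimensions t = 3, h = 1 are arbitrary illustrative choices.
from scipy.stats import chi2

def QQ(t, h, c):
    """Asymptotic approximation of Pr(ev_bar <= c) under a true sharp hypothesis H."""
    return chi2.cdf(chi2.ppf(c, df=t), df=t - h)

t, h = 3, 1
for c in (0.50, 0.90, 0.95, 0.99):
    print(f"QQ({t}, {h}, {c:.2f}) = {QQ(t, h, c):.4f}")
```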

3.5 Proof of consistency

Let \(\overline{V}(c)= {\text{ Pr }}(\,\overline{{\text{ev}}}\le c )\) be the cumulative distribution of the evidence value against the hypothesis, given \(\theta\). We stated that, under appropriate regularity conditions, for increasing sample size, \(n\rightarrow \infty\), if H is true, i.e., \(\theta \in H\), then \(\overline{V}(c)\) is approximated by the function

$$\begin{aligned} QQ(t,h,c) = {\text{ Q }}\left( t-h, {\text{ Q }}^{-1}\left( t,c\right) \right) \ . \end{aligned}$$

Let \(\theta ^0\), \(\widehat{\theta }\) and \(\theta ^*\) be the true value, the unconstrained MAP (Maximum A Posteriori), and constrained (to H) MAP estimators of the parameter \(\theta\).

Since the FBST is invariant, we can choose a coordinate system where the (likelihood function) Fisher information matrix at the true parameter value is the identity, i.e., \(J(\theta ^0)=I\). From the posterior Normal approximation theorem, see [55], we know that the standardized total difference between \(\widehat{\theta }\) and \(\theta ^0\) converges in distribution to a standard Normal distribution, i.e.

$$\begin{aligned} \sqrt{n}(\widehat{\theta }-\theta ^0) \rightarrow N\left( 0, J(\theta ^0)^{-1} J(\theta ^0) J(\theta ^0)^{-1} \right) \\ = N\left( 0, J(\theta ^0)^{-1} \right) = N\left( 0, I \right) \end{aligned}$$

This standardized total difference can be decomposed into tangent (to the hypothesis manifold) and transversal orthogonal components, i.e.

$$\begin{aligned} d_t = d_h + d_{t-h} \ , \ \ d_t = \sqrt{n}(\widehat{\theta }-\theta ^0) \ , \\ d_h = \sqrt{n}(\theta ^* -\theta ^0) \ , \ \ d_{t-h} = \sqrt{n}(\widehat{\theta }-\theta ^*) \ . \end{aligned}$$

Hence, the total, tangent and transversal squared distances (squared \(L^2\) norms), \(||d_t||^2\), \(||d_h||^2\) and \(||d_{t-h}||^2\), converge in distribution to chi-square variates with, respectively, t, h and \(t-h\) degrees of freedom.

Also, from the MAP consistency, we know that the MAP estimate of the Fisher information matrix, \(\widehat{J}\), converges in probability to the true value, \(J(\theta ^0)\).

Now, if \(X_n\) converges in distribution to X, and \(Y_n\) converges in probability to Y, we know that the pair \([X_n, Y_n]\) converges in distribution to [X, Y]. Hence, the pair \([||d_{t-h}||^2, \widehat{J}]\) converges in distribution to \([x, J(\theta ^0)]\), where x is a chi-square variate with \(t-h\) degrees of freedom. So, from the continuous mapping theorem, the evidence value against H, \(\overline{{\text{ev}}}(H)\), converges in distribution to \(\overline{e}={\text{ Q }}(t,x)\).

Since the cumulative chi-square distribution is an increasing function, we can invert the last formula, i.e., \(\overline{e}={\text{ Q }}(t,x)\le c \Leftrightarrow x \le {\text{ Q }}^{-1}(t,c)\). But, since x is a chi-square variate with \(t-h\) degrees of freedom,

$$\begin{aligned} {\text{ Pr }}(\overline{e} \le c) = {\text{ Q }}\left( t-h, {\text{ Q }}^{-1}\left( t,c\right) \right) = QQ(t,h,c) \ . \end{aligned}$$

Q.E.D.

A similar argument, using a non-central chi-square distribution, proves the other asymptotic statement.

3.6 Decisions: reject H, remain neutral, or accept

In this subsection we briefly discuss the important question of deciding when to Accept, or Reject, or remain Neutral about a statistical hypothesis H, given observed data X. We start our discussion by elaborating on the asymptotic results derived in the last subsection.

If a random variable, x, has a continuous cumulative distribution function, F(x), its probability integral transform generates a uniformly distributed random variable, \(u=F(x)\), see [5]. Hence, the transformation \(\overline{\text{sev}}=QQ(t,h,\overline{\text{ev}})\) defines a “standardized e-value”, \({\text{sev}}=1-\overline{\text{sev}}\), that can be used somewhat in the same way as a p-value of classical statistics. This standardized e-value may be a convenient value to report, since its asymptotically uniform distribution (under H) provides a large-sample limit interpretation, and many researchers will already be familiar with the consequent diagnostic procedures for scientific hypotheses based on adequately large empirical data sets. In particular, a researcher may use cut-off thresholds already familiar to him or her when dealing with p-values. Efficient procedures for computing empirical cut-off thresholds that are effective for small data sets are developed in [14, 76,77,78,79].

Traditionally, statisticians are used to establishing a dichotomy: Reject/ Accept (technically, Not-Reject) H if the significance measure in use is below or above the established cut-off threshold. Nevertheless, a thorough analysis of consistent desiderata for logical properties of such a decision procedure takes us to an unavoidable conclusion: The classical Reject/ Accept dichotomy must be replaced by a trichotomy, namely, Reject/ remain Neutral (a.k.a. remain undecided or agnostic)/ Accept H if, respectively, \(0 \le {\text{sev}}(H\,\vert \,X) < c_1\), \(c_1 \le {\text{sev}}(H\,\vert \,X) < c_2\), or \(c_2 \le {\text{sev}}(H\,\vert \,X) \le 1\), where \(0< c_1< c_2 < 1\). For an extensive and detailed analysis of consistent desiderata for statistical test procedures, see [63, 112].
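
A minimal sketch combining the last two paragraphs is given below; the thresholds \(c_1\), \(c_2\) and the inputs are hypothetical illustrative choices, not the calibrated values of the cited references:

```python
# Minimal sketch: standardized e-value and the Reject / remain Neutral / Accept
# trichotomy.  The thresholds c1, c2 and the inputs ev_bar, t, h below are
# hypothetical illustrative choices; in practice the thresholds would be
# calibrated as in the cited references.
from scipy.stats import chi2

def sev(ev_bar, t, h):
    """Standardized e-value: sev = 1 - QQ(t, h, ev_bar)."""
    return 1.0 - chi2.cdf(chi2.ppf(ev_bar, df=t), df=t - h)

def decide(ev_bar, t, h, c1=0.05, c2=0.95):
    s = sev(ev_bar, t, h)
    if s < c1:
        return s, "Reject H"
    if s < c2:
        return s, "remain Neutral about H"
    return s, "Accept H"

print(decide(ev_bar=0.98, t=3, h=1))   # strong evidence against H -> Reject
print(decide(ev_bar=0.30, t=3, h=1))   # inconclusive -> remain Neutral
```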

The study of such logical desiderata was in part motivated as a way to contrast the statistical properties of the FBST with other statistical tests of hypotheses. Surprisingly, it is possible to travel this path in the opposite direction, that is, it is possible to start from consistent desiderata for logical properties of statistical tests and, from those, derive a complete characterization of a class of statistical significance measures and hypothesis tests that coherently generalizes the FBST, see [46, 47, 131] for further details. Moreover, this Generalized FBST finds interesting applications in metrology and related fields, where reliable bounds for the precision of experimental measurements can be obtained from sources external to the statistical experiment designed to test the hypothesis under scrutiny. Finally, this kind of detailed error analysis for crucial scientific experiments finds valuable applications in the fields of metrology, epistemology, and philosophy of science, see [47]; this is also a topic for future research.

4 A survey of FBST related literature

A systematic cataloging of all published articles related to this research program is beyond the scope of this article; in the next subsections we survey a selection of such articles. The selected articles provide a sample covering diverse areas like statistical theory and methods, applications to statistical modeling and operations research, and research in foundations of statistics, logic and epistemology. This selection is certainly biased, favoring the authors’ personal taste or involvement.

4.1 Statistical theory

Several authors have developed the statistical theory that provides the mathematical formalism and demonstrates the outstanding statistical properties of the FBST and its significance measure, the e-value. The following articles have explored and developed these themes of research:

  • Reference [34] is the first article of this research program. It presents the basic definition of the e-value and the FBST, and gives several simple and intuitive applications. Reference [86] provides an explicitly invariant version of the inference procedures. After a long process in which the authors had to overcome objections raised by influential mainstream Bayesian thinkers, [37] was published in the flagship journal of ISBA - the International Society for Bayesian Analysis. Reference [30] provides an entry on the FBST in the International Encyclopedia of Statistical Science.

  • References [6, 7, 32, 77,78,79] give an extensive treatment of the case of non-nested and separate hypotheses, including a detailed analysis of some Bayesian classifiers.

  • References [28, 44, 84, 94, 95, 113] establish several theoretical or empirical relations between the e-value and alternative significance measures.

  • References [19, 98, 99, 104, 105, 137,138,139] develop higher order asymptotic approximations of (log) likelihood and pseudo-likelihood functions that, in turn, are used to develop high-precision but fast computational algorithms for calculating e-values in parametric models. The availability of a good library of such fast and reliable computer programs will, we believe, facilitate the incorporation of the FBST in statistical software intended for end-users or routine applications.

4.2 Statistical modeling

Several authors have developed a wide range of applications of the FBST to statistical modeling and operations research. The following articles have explored and developed these themes of research:

  • Reference [35] applies the FBST to software compliance testing and certification.

  • Reference [76] provides a unified and coherent treatment to a large class of structural models based on the multivariate normal distribution. Previously, sub-classes of these models had to be handled individually using tailor-made tests. Reference [141] gives some simple applications.

  • References [24, 42, 43, 142] develop or use unit root and cointegration testing for time series. The FBST is shown to be more reliable and effective than several previously published tests, without the need of any ad hoc artifices, like specially designed artificial priors (an obvious oxymoron).

  • References [61, 88, 102] apply the FBST to failure analysis and systems’ reliability.

  • Reference [22] applies the FBST to detect equilibrium conditions, or the lack thereof, in market prices of economic commodities or financial derivative contracts.

  • Reference [25] applies the FBST in the context of empirical economic studies.

  • Reference [54] uses the FBST for selection and testing of statistical copulas.

  • References [57, 135] consider applications using generalizations of the Poisson distribution.

  • References [58,59,60] use the FBST for signal processing and detection of acoustic events.

  • Reference [114] applies the FBST to model selection in statistical studies conducted under informative sampling conditions.

  • References [18, 38, 62, 80, 83, 87, 91, 144] use the FBST to verify Hardy-Weinberg equilibrium conditions, and for other applications of statistical modeling in the area of genetics.

  • References [103, 29, 75, 100, 101] use the FBST to test parametric hypotheses related to generalized Brownian motions, continuous or jump diffusions, extremal distributions, persistent memory and other stochastic processes.

  • References [4, 14, 39, 93] develop theory or applications of the FBST for statistical hypotheses related to independence in contingency tables and other multinomial models.

  • References [20, 21, 31, 36, 41, 74, 82, 107, 108, 115] apply the FBST for checking hypotheses in statistical models applied to biological sciences, ecology, environmental sciences, medical diagnostics and efficacy evaluation, psychology and psychiatry.

  • References [23, 65] apply the FBST to test hypotheses in astronomy and astrophysics.

4.3 Foundations of statistics, logic and epistemology

Traditional significance measures used in statistics are always designed to work in tandem with a specific epistemological framework that gives them an appropriate interpretive context and support. For example, p-values are usually presented in the context of the “judgment metaphor” and the deductive fallibilism epistemological framework, as developed by the philosopher Karl Popper, among others. Meanwhile, Bayes factors are presented in the context of the “gambling metaphor” and utility based decision theory, as developed by Bruno de Finetti, see Refs. [40, 45, 66, 67]. Furthermore, the logic or algebraic properties of each significance measure, in its appropriate domain of statistical hypotheses, must be mutually supportive and compatible with intended interpretations. The following articles have explored and developed these themes of research:

  • References [85, 112, 136] analyze the FBST from a decision-theoretic Bayesian perspective. The first paper proves the “Bayesianity” of the FBST, in the sense that its inference procedures can be derived by minimization of an appropriate loss function.

  • References [86, 133] compare the theoretical properties of the e-value with those of traditional significance measures, like the p-value and Bayes Factors. These articles analyze in great detail historical arguments given by celebrated statisticians against the use of procedures based on highest density probability sets. Among those who opposed such ideas was Dennis Lindley, an influential figure at IME-USP and a personal friend of the first author. Finally, Refs. [86, 133] analyze historical desiderata for an acceptable Bayesian significance test that the frequentist statistician Oscar Kempthorne formulated to the first author, and show how the FBST successfully achieves all these desired characteristics.

  • Reference [16] analyzes the composition of hypotheses defined in independent statistical models and the corresponding composition rules for e-values and truth functions.

  • Reference [140] studies significance measures for evidence amalgamation and meta-analysis.

  • References [46, 47, 51, 63, 131] analyze conditions of logical consistency for significance measures and test procedures for several hypotheses defined in the same statistical model. Conversely, these articles fully characterize some (agnostic or trivalent) generalizations of the FBST as the only statistical tests satisfying such logical consistency conditions.

  • References [119,120,121, 123,124,125,126,127] develop the Objective Cognitive Constructivism as an epistemological framework formally compatible and semantically amenable to the e-value significance measure and the FBST hypothesis test.

  • Reference [27] analyzes solutions to the problem of (statistical) induction, including Bayesian perspectives in general and the FBST in particular.

  • References [59, 97, 117, 118, 129] apply concepts related to the FBST or the Objective Cognitive Constructivism epistemological framework to the study of economic or legal systems.

  • References [128, 130] analyze the philosophical premises used by Karl Pearson to define the p-value and to establish the epistemological foundations of frequentist statistics; why Pearson’s work and the subsequent work of Bruno de Finetti reversed previous commitments of Bayesian statistics; and how the FBST can be seen as a way to reenter the path envisioned by the founding fathers of the Bayesian school, namely, Reverend Thomas Bayes, Richard Price and Pierre-Simon de Laplace.

  • References [15, 50, 81, 89, 106, 121] analyze the role of randomization procedures in the context of the Objective Cognitive Constructivism epistemological framework in particular, and in Bayesian statistics in general.

5 Future research and final remarks

The FBST research program has grown and spread far and wide, in some directions suggested by these authors, and also in other directions that were for us completely unforeseen and wonderfully surprising. We are confident that this research program will continue to flourish and expand, exploring new areas of theory and application. The authors would like to suggest a few topics (focusing on theoretical and applied statistics) worthy of further attention as possible entry points for those interested (be all welcome) in joining this research program:

  1. (1)

    In the context of information based medicine, see [31], it is important to compare and test the sensitivity and specificity of alternative diagnostic tools, assess the bio-equivalence of drugs coming from different suppliers, identify and test the efficacy of possible genetic markers for clinical conditions, etc. How to combine fast and computationally inexpensive heuristic algorithms and reliable statistical test procedures to best handle these and similar problems?

  2. (2a)

    Influence diagrams are a powerful tool for decision modeling, see [9, 33]. Nevertheless, it is often hard to select optimal diagrams to model complex applications, see for example [41, 111]. How can the FBST best be used for sequential or concomitant inclusion/ exclusion of links or edge selection in influence graphs?

  3. (2b)

    The aforementioned questions also arise in the context of Bayesian networks. In this context, it is important not only to test the significance of individual edges, but also to test the integrity of higher level sparsity structures, like the network clique structure or its block factors, see [116, 121, 132, 134].

  4. (3)

    The e-value and the FBST were originally developed for parametric models. How can the e-value be used, interpreted, computed (and maybe generalized) in semi-parametric or non-parametric settings? For instance, in models using functional bases, how can we test speeds of convergence for series expansions?

  5. (4a)

    The compositionality rules established in [16] are based on functional operations over the truth functions, W(v). References [8, 71] present similar rules (for serial-parallel composition) in the context of reliability theory. Can these theories be seen as particular cases of more general and abstract logical formalisms?

  6. (4b)

    The same compositionality rules assume independence between distinct models in a given structure. Could statistical copulas, see [52, 54, 68, 92], be used to successfully capture weak dependencies between distinct truth functions?

  7. (5)

    The conditions for pragmatic acceptance of sharp hypotheses stated in [47] depend on consensual bounds for background uncertainties. For universal physical constants, metrologists establish such bounds by aggregating results of diverse experiments; similar situations occur in meta-analysis studies. Several statistical methods have been proposed to aggregate such diverse data-sets, see [26, 53, 72, 73, 90]. What are the best ways to coherently establish and represent aggregate uncertainty bounds in the FBST framework?