Abstract
The uniform prior probability density for the means of normal data leads to inconsistent Bayesian inference of their mean power and jeopardizes the possibility of selecting among different models that explain the data. We reinvestigated the problem in a novel way that avoids delivering unrecognised information. Namely, to account for a finite power, we used a normal prior minimally diverging from the uniform one, hyper-parameterised by the mean and variance, and let the data choose the most supported parameters. We also obtained an extended James–Stein estimator by averaging over the hyper-parameters, thus avoiding empirical Bayes techniques.
1 Introduction
Bayes' theorem encodes the measurement uncertainty in the probability assignments to the possible values of a measurand. It also applies to encoding the lack of certainty about the data model. If the data explanations are questionable and different models compete, their posterior probabilities provide the framework to assess the model uncertainties. The likelihood of a model is proportional to the data evidence (or marginal likelihood). Since the evidence must remain unchanged after changes in the model parameters, the prior distributions of different parameterisations must be proper and comply with the change-of-variable rule.
When no objective prior information is available, subjective inferences provide the required compliance with the transformation of the prior distributions under parameter changes. Alternatives, discussed by [16] and known as Jeffreys’ priors, are distributions obtained as a model functional encoding the symmetries of the measurand link with the data. Others, developed by [5] and [3], maximise the mutual information between the data and measurand, which measures how much we learn about the measurand from the data.
If improper, these distributions can lead to inconsistencies. For example, the Jeffreys’ uniform prior over the reals for the mean of Gaussian data leads to an inconsistent inference of their mean power, see [4, 5, 12, 13, 28].
We consider the simultaneous inferences of the individual means of multivariate Gaussian data, the individual means' squares, and the average of the means' squares. Supposing the data are the measured values of a discrete-time signal, we will refer to the latter as powers. The uniform distribution for the means' prior delivers information about their squares. To avoid introducing this information into the problem and to take a finite power into account, we investigate a novel solution which extends the approaches of [1, 8, 9, 24, 25].
Our investigations were prompted by related problems in the estimate of the power of signals and the modulus of complex quantities from measurement results affected by additive uncorrelated Gaussian errors, which were discussed at length in [2, 6, 14, 15, 19, 32].
The manuscript is organised as follows. Section 2 states the problem, identifies its origin, and outlines a solution. Next, we overview hierarchical modelling and averaging. In Sect. 4, we apply them to derive consistent inferences of the data power. Numerical examples are given in Sect. 5.
All the integrations were carried out in terms of standard mathematical functions with the aid of Mathematica [30]. The relevant notebook is given as supplementary material. To read and interact with it, download the Wolfram Player free of charge [31].
2 Stein paradox
2.1 Problem statement
In the simplest form of the Stein paradox, \(x_i \sim N(x_i|\mu _i,\sigma =1)\) are observations of independent normal variables with unknown means \(\mu _i\) and known variance \(\sigma ^2\), for \(i = 1, 2,..., m\). With a somewhat inconsistent use of notation, we will use the same symbol to indicate both the random variables and the labels of their possible values. To keep the algebra simple, without loss of generality, we use units where \(\sigma =1\).
For example, \(x_i\) are samples of a \(\mu (t)\) signal affected by additive white Gaussian noise, where the uncertainty for each datum is known, as in [6] and [2]. Suppose that, in addition to the measurands \(\mu _i = \mu (t_i)\), we are also interested in the signal power, that is, in \(\theta ^2=|{\varvec{\mu }}|^2/m\), where \({\varvec{\mu }}= \{\mu _1, \mu _2,...\, \mu _m\}\) [32].
The Jeffreys prior of every instantaneous signal \(\mu _i\) is the uniform distribution over the reals \(U_\infty (\mu _i)\propto \) const., resulting in independent normal posteriors, \(\mu _i \sim N(\mu _i|x_i,\sigma =1)\). By changing variable in the posterior distributions, \(\mu _i^2\) are independent non-central \(\chi ^2_1\) variables having one degree of freedom, mean \(1 + x_i^2\), and variance \(2(1+2 x_i^2)\). Similarly, \(m\theta ^2\) is a non-central \(\chi ^2_m\) variable having m degrees of freedom. Therefore, it follows that the posterior mean and variance of \(\theta ^2\) are

$$\mathbb {E}(\theta ^2|\textbf{x},\mathscr {M}_\infty ) = \overline{x^2} + 1, \qquad \mathrm{(1a)}$$

where \(\textbf{x}=\{x_1, x_2,\, ...\, x_m\}\), \(\overline{x^2} = |\textbf{x}|^2/m\), and \(\mathscr {M}_\infty \) is the data model assuming the uniform prior, and

$$\textrm{Var}(\theta ^2|\textbf{x},\mathscr {M}_\infty ) = \frac{2(1+2\overline{x^2})}{m}. \qquad \mathrm{(1b)}$$

From a frequentist viewpoint, since \(m\overline{x^2}\) is a non-central \(\chi ^2_m\) variable having m degrees of freedom, \(\overline{x^2}\) is a biased estimator of \(\theta ^2\). In fact,

$$\mathbb {E}(\overline{x^2}|\theta ^2) = \theta ^2 + 1 \qquad \mathrm{(2a)}$$

and

$$\textrm{Var}(\overline{x^2}|\theta ^2) = \frac{2(1+2\theta ^2)}{m}. \qquad \mathrm{(2b)}$$
As m tends to infinity, provided that \(\theta ^2\) and, consequently, \(\overline{x^2}\) converge, a bad situation occurs: (1a) and (1b) jointly predict that \(\theta ^2\) is certainly \(\overline{x^2} + 1\) but, at the same time, (2a) and (2b) jointly predict that \(\theta ^2\) is certainly \(\overline{x^2}-1\).
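The conflict can be illustrated numerically. The following sketch, with a hypothetical signal, verifies that the sampling mean of \(\overline{x^2}\) is \(\theta ^2+1\), as in (2a), so that the frequentist estimate \(\overline{x^2}-1\) and the uniform-prior posterior mean \(\overline{x^2}+1\) disagree by 2 for every data set:

```python
import numpy as np

# Sketch of the inconsistency, with a hypothetical signal: the sampling
# mean of x2bar = |x|^2/m is theta^2 + 1, as in (2a), so the frequentist
# estimate x2bar - 1 and the uniform-prior posterior mean x2bar + 1
# disagree by 2 for every data set.
rng = np.random.default_rng(42)

m = 50
mu = rng.normal(0.0, 2.0, size=m)             # arbitrary "true" signal
theta2 = np.sum(mu**2) / m                    # true mean power

trials = 20_000
x = rng.normal(mu, 1.0, size=(trials, m))     # x_i ~ N(mu_i, 1)
x2bar = np.sum(x**2, axis=1) / m

print(x2bar.mean())   # ~ theta2 + 1, so x2bar - 1 is unbiased for theta2
print(theta2)
```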
2.2 Paradox explanation
The \(U_\infty (\mu _i)\propto \) const. usage can be justified by showing that the \(\mu _i \sim N(\mu _i|x_i,\sigma =1)\) posterior is a suitable limit obtained from proper priors having increasingly large variance [3]. The difference between the Bayesian and frequentist certainties is due to the information encoded in \(U_\infty (\mu _i)\), which makes the data explanations underlying the Bayesian and frequentist analyses different.
As most of its mass is at infinity, \(U_\infty (\mu _i)\) encodes that \(|\mu _i|\) is greater than any positive number. This information is irrelevant to the inference of \(\mu _i\), but not to that of \(\mu _i^2\). It is worth noting that, if really \(\mu _i \sim U_\infty (\mu _i)\), then \(\overline{x^2}\) and \(\theta ^2\) must diverge and the inconsistency disappears. Therefore, the problem originates in the conflict between the information encapsulated in \(U_\infty (\mu _i)\) and the data, see also [11]. Also, if we are not happy with the difference between (1a) and (2a), it means that we believe that \(\theta ^2\) and, consequently, \(\mu _i^2\) are finite.
2.3 Proposed solution
To remove the inconsistency between (1a) and (2a), we must take into account the assumption that \(|\mu _i|\) is bounded and use a proper prior. Therefore, extending [1], we set \(\mu _i|a_i,b_i \sim N(\mu _i|a_i,b_i)\), where the mean \(a_i\) and standard deviation \(b_i\) are hyper-parameters and the \(b_i \rightarrow \infty \) limit is the uniform distribution. The normal distribution has been chosen because, for a fixed variance, it has the minimum relative entropy with respect to the uniform one [26, 27].
The simplest way to encode \(|\mu _i|<\infty \) is setting \(a_i=a\) and \(b_i=b\) for all the samples, a condition that is sufficient, though not necessary, for convergence. If the samples are indistinguishable, i.e., the data labelling is unknown, assigning the same mean and variance is reasonable.
Hence, we assume the prior

$$\pi ({\varvec{\mu }}|a,b) = \prod _{i=1}^m N(\mu _i|a,b). \qquad \mathrm{(3)}$$
According to the Bayesian viewpoint, firstly, the \({\varvec{\mu }}\) measurands are sampled from the \(\pi ({\varvec{\mu }}|a,b)\) prior, then the \(\textbf{x}\) data are sampled from the \(N(\textbf{x}|{\varvec{\mu }},\sigma =1)\) distribution. Therefore, different priors identify different models, and we can let the data select the most likely.
Since the prior (3) encodes the belief that all the measurands (e.g., the instantaneous signals \(\mu _i\)) have the same mean, it causes the inferred means to shrink towards the sample mean. Our prior choice is not new. However, previous investigations, for instance, [4, 12, 24], set \(\mu _i|b \sim N(\mu _i|0,b)\), which encodes the stronger belief that all the measurands are expected to have a zero mean.
3 Outline of the Bayesian model selection
To determine the most likely prior, we let the data choose. Let \(N(\textbf{x}|{\varvec{\mu }},\sigma =1)\) be the multivariate distribution of the \(\textbf{x}\) data. Hence, the hierarchical models competing to explain the data are

$$\mathscr {M}_{a,b}: \quad \textbf{x}|{\varvec{\mu }} \sim N(\textbf{x}|{\varvec{\mu }},\sigma =1), \quad {\varvec{\mu }}|a,b \sim \pi ({\varvec{\mu }}|a,b), \qquad \mathrm{(4)}$$
where \(\pi ({\varvec{\mu }}|a,b)\) is the prior distribution (3) and a and b index the models. The posterior distribution of the measurands is

$$P({\varvec{\mu }}|\textbf{x},a,b) = \frac{N(\textbf{x}|{\varvec{\mu }},\sigma =1)\, \pi ({\varvec{\mu }}|a,b)}{Z(\textbf{x}|a,b)}. \qquad \mathrm{(5)}$$
The marginal likelihood or evidence,

$$Z(\textbf{x}|a,b) = \int N(\textbf{x}|{\varvec{\mu }},\sigma =1)\, \pi ({\varvec{\mu }}|a,b)\, \textrm{d}^m\mu , \qquad \mathrm{(6)}$$
is the sampling distribution of \(\textbf{x}\) given the model indexed by a and b, but no matter what the values of \({\varvec{\mu }}\) – or of other model parameterisations – might be.
The marginal likelihood can be used to compare the competing models by their probability \(Q(a,b|\textbf{x})\) as provided by the data,

$$Q(a,b|\textbf{x}) = \frac{Z(\textbf{x}|a,b)\, \varpi (a,b)}{\int Z(\textbf{x}|a,b)\, \varpi (a,b)\, \textrm{d}a\, \textrm{d}b}, \qquad \mathrm{(7)}$$
where \(\varpi (a,b)\) is the prior probability of a, b and the integration is carried out on its support.
The model-averaged posterior of \({\varvec{\mu }}\) is

$$P({\varvec{\mu }}|\textbf{x}) = \int P({\varvec{\mu }}|\textbf{x},a,b)\, Q(a,b|\textbf{x})\, \textrm{d}a\, \textrm{d}b, \qquad \mathrm{(8)}$$
which can also be obtained by marginalising (5) for the hyper-parameters. The model uncertainty can be embedded in the expected value of the data mean by

$$\mathbb {E}(\mu _i|\textbf{x}) = \int \mathbb {E}(\mu _i|\textbf{x},a,b)\, Q(a,b|\textbf{x})\, \textrm{d}a\, \textrm{d}b, \qquad \mathrm{(9)}$$

where

$$\mathbb {E}(\mu _i|\textbf{x},a,b) = \int \mu _i\, P({\varvec{\mu }}|\textbf{x},a,b)\, \textrm{d}^m\mu . \qquad \mathrm{(10)}$$
4 Application to the Stein paradox
4.1 Posteriors of the instantaneous signals
By application of (5) and (6), the prior (3) results in independent and identically distributed \(\mu _i\) having marginal likelihood

$$Z(x_i|a,b) = N\big (x_i\big |a,\sqrt{1+b^2}\,\big ) \qquad \mathrm{(11)}$$

and normal posterior

$$\mu _i|x_i,a,b \sim N(\mu _i|\overline{\mu _i},\sigma _\mu ), \qquad \mathrm{(12)}$$

where

$$\overline{\mu _i} = \frac{b^2 x_i + a}{1+b^2} \qquad \mathrm{(13a)}$$

and

$$\sigma _\mu ^2 = \frac{b^2}{1+b^2} \qquad \mathrm{(13b)}$$
are the posterior mean and variance of \(\mu _i\). It is worth noting that if \(b\rightarrow \infty \) then \(\overline{\mu _i} = x_i\) and \(\sigma _\mu ^2 = 1\). The relevant integrations are given in the supplementary material.
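As a check of this conjugate update, a brute-force grid integration of the product of the \(N(x_i|\mu _i,1)\) likelihood and the \(N(\mu _i|a,b)\) prior reproduces the stated posterior mean and variance; the values of \(x_i\), a, and b below are illustrative:

```python
import numpy as np

# Sketch: a brute-force check of the conjugate normal-normal update.
# With x_i ~ N(mu_i, 1) and prior mu_i ~ N(a, b), the posterior of mu_i
# is normal with mean (b^2 x_i + a)/(1 + b^2) and variance b^2/(1 + b^2);
# the b -> infinity limit recovers mean x_i and unit variance.
x_i, a, b = 1.7, 0.4, 2.0                    # illustrative values

post_m = (b**2 * x_i + a) / (1.0 + b**2)     # closed-form posterior mean
post_v = b**2 / (1.0 + b**2)                 # closed-form posterior variance

g = np.linspace(-15.0, 15.0, 200001)         # integration grid for mu_i
w = np.exp(-0.5 * (x_i - g)**2 - 0.5 * ((g - a) / b)**2)
w /= w.sum()                                 # normalised posterior weights
num_m = (g * w).sum()
num_v = ((g - num_m)**2 * w).sum()

print(post_m, num_m)   # both 1.44
print(post_v, num_v)   # both 0.8
```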
4.2 Hyper-prior
We assign prior probabilities to a and b to continue the analysis. The sampling distribution of \(\textbf{x}\) given a and b is (see the supplementary material)

$$Z(\textbf{x}|a,b) = \frac{1}{[2\pi (1+b^2)]^{m/2}}\, \exp \!\left[ -\frac{m\big (s_x^2+(\overline{x}-a)^2\big )}{2(1+b^2)}\right] , \qquad \mathrm{(14)}$$
where \(\overline{x}=\sum _{i=1}^m x_i/m\) and \(s_x^2=\sum _{i=1}^m (x_i-\overline{x})^2/m\) are the sample mean and (biased) sample variance, respectively.
Since each of the models (4) is uncertain, they must allow for comparisons so that the data can offer evidence for or against them. This requires proper prior distributions of the different parameterisations and compliance with the change-of-variable rule. In the absence of measurable information, the normalised Jeffreys' hyper-prior (see the supplementary material),

$$\varpi (a,b) = \frac{b}{V_a (1+b^2)^{3/2}}, \qquad \mathrm{(15)}$$
where \(b>0\) and \(V_a\) is the length of the a domain, does the job. It is worth noting that (15) preserves the convergence of \(\theta ^2\) when \(m\rightarrow \infty \).
4.3 Model probabilities
By application of (7) with (14) and (15), the probability of a, b explaining the data is (see the supplementary material)

$$Q(a,b|\overline{x},s_x^2) = \sqrt{\frac{m}{2\pi }}\, \frac{2 u_x^m\, b}{(1+b^2)^{(m+3)/2}\, \Gamma (m/2,0,u_x^2)}\, \exp \!\left[ -\frac{m\big (s_x^2+(\overline{x}-a)^2\big )}{2(1+b^2)}\right] , \qquad \mathrm{(16)}$$
where \(u_x^2=ms_x^2/2\) and \(\Gamma (a,z_1,z_2)\) is the generalized incomplete gamma function [20]. Since \(V_a\) cancels in the \(Q(a,b|\overline{x},s_x^2)\) calculation and, provided it is large enough, (7) can be approximated by extending the integration over a to the reals, there is no need to introduce additional undefined parameters.
Figure 1 shows (16) when \(m=1\) (top) and \(m=20\) and \(s_x = 2\) (bottom). It is worth noting that \(Q(a,b|\overline{x},s_x^2)\) depends only on the sample mean and biased variance; its mode is \(a=\overline{x}\) and, as \(m \rightarrow \infty \), \(b^2=s_x^2-1\) (see the supplementary material).
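The stated modes can be checked numerically. The sketch below assumes the marginal likelihood of i.i.d. draws from \(N(a,\sqrt{1+b^2})\) and a hyper-prior proportional to \(b/(1+b^2)^{3/2}\), as described above, and locates the maximum of the log hyper-posterior on a grid; for large m it sits at \(a=\overline{x}\) and \(b^2 \approx s_x^2-1\):

```python
import numpy as np

# Sketch under stated assumptions: the evidence of m unit-variance normal
# data with prior mean a and prior standard deviation b is that of i.i.d.
# draws from N(a, sqrt(1+b^2)); the hyper-prior is proportional to
# b/(1+b^2)^(3/2).  The (a, b) mode of the hyper-posterior should sit at
# a = xbar and, for large m, at b^2 = s_x^2 - 1.
def log_q(a, b, m, xbar, sx2):
    log_z = -0.5 * m * np.log(1.0 + b**2) \
            - 0.5 * m * (sx2 + (xbar - a)**2) / (1.0 + b**2)
    log_prior = np.log(b) - 1.5 * np.log(1.0 + b**2)
    return log_z + log_prior

m, xbar, sx2 = 500, 0.8, 4.0                 # illustrative values
a = np.linspace(-1.0, 3.0, 801)
b = np.linspace(0.01, 5.0, 2000)
A, B = np.meshgrid(a, b)
lq = log_q(A, B, m, xbar, sx2)
ib, ia = np.unravel_index(np.argmax(lq), lq.shape)
print(a[ia], b[ib]**2)   # close to xbar = 0.8 and s_x^2 - 1 = 3
```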
4.4 Expectations of the instantaneous signals
4.4.1 \(m=1\) case
By application of (7) with (11) and (15), the probability density reduces to

$$Q(a,b|x_1) = \frac{b}{\sqrt{2\pi }\,(1+b^2)^2}\, \exp \!\left[ -\frac{(x_1-a)^2}{2(1+b^2)}\right] , \qquad \mathrm{(17)}$$
which can also be obtained as the \(s_x^2 \rightarrow 0\) limit of (16) evaluated for \(m=1\) (see the supplementary material). The most supported model is indexed by \(a=x_1\) and \(b=1/\sqrt{3}\), whereas the \(b\rightarrow \infty \) model – corresponding to the uniform prior – is excluded.
According to (8), after averaging the \(\mu _1\) posterior (12) over the model probability (17),

$$\mu _1|x_1 \sim N(\mu _1|x_1,\sigma =1), \qquad \mathrm{(18)}$$
i.e. the distribution of \(\mu _1\) is a normal distribution having mean \(x_1\) and unit variance, and the distribution of \(\mu _1^2\) is a non-central \(\chi _1^2\) distribution, having one degree of freedom and non-centrality parameter \(x_1^2\) (see the supplementary material). These are important and non-trivial results. They demonstrate that the hierarchical models (4) are consistent with the uniform prior (see Sect. 2.1) and that there is no hyper-prior effect on the posterior distributions of \(\mu _1\) and \(\mu _1^2\), as it occurs in [4].
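This exact recovery of the uniform-prior posterior can be verified by simulation. The sketch below draws (a, b) from the single-datum hyper-posterior – b from a marginal proportional to \(b/(1+b^2)^{3/2}\), sampled by inverting its distribution function, and a|b from \(N(x_1, 1+b^2)\), both assumptions consistent with the model described above – and then \(\mu _1\) from the conditional posterior; the resulting sample has mean \(x_1\) and unit variance:

```python
import numpy as np

# Sketch: for a single datum x1, draw (a, b) from the hyper-posterior --
# b from its marginal, proportional to b/(1+b^2)^(3/2), sampled by
# inverting the CDF 1 - 1/sqrt(1+b^2), and a | b from N(x1, 1 + b^2) --
# then mu from the conditional posterior with mean (b^2 x1 + a)/(1+b^2)
# and variance b^2/(1+b^2).  The mixture should be exactly N(x1, 1).
rng = np.random.default_rng(0)
x1, n = 1.3, 400_000

u = rng.uniform(size=n)
b = np.sqrt(1.0 / (1.0 - u)**2 - 1.0)             # marginal of b
a = rng.normal(x1, np.sqrt(1.0 + b**2))           # a | b
mu = rng.normal((b**2 * x1 + a) / (1.0 + b**2),
                np.sqrt(b**2 / (1.0 + b**2)))     # mu | a, b

print(mu.mean(), mu.var())   # ~ x1 and ~ 1
```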
4.4.2 \(m \ge 2\) case
By substitution of (13a) and (16) into (9), the model-averaged value of \(\mathbb {E}(\mu _i|x_i,a,b)\) is (see the supplementary material)
where
\(u_x^2=ms_x^2/2\) and \(\Gamma (a,z_1,z_2)\) is the generalised incomplete gamma function [20]. Given the belief encoded in the priors, this inference minimises the (Bayesian) quadratic risk. It belongs to the estimator class considered in [21], but it does not comply with the condition required to dominate (from a frequentist perspective) the James–Stein estimator. In this regard, we note that this paper is not about the dominance over the James–Stein estimators, but about highlighting and encoding the belief \(\theta ^2<\infty \).
If \(m=1\) then \(s_x = 0\) and \(\overline{x}=x_i\). In this case, the last term of (19a) vanishes and the mean is the observed value (see the \(m=1\) case). Also, since its value is irrelevant, we set \(R(m=1)\) conventionally to zero (incidentally, the \(s_x\rightarrow 0\) limit is 1/3). As shown in Fig. 2, the mean (19a) is between \(x_i\) and the sample mean \(\overline{x}\). This behaviour follows from the mild assumption of a constant \(\mu _i\) (see Sect. 2.3) encapsulated in the most supported data model.
As shown in the supplementary material, when \(s_x^2 \ll 1\), (19a) is approximated by
which, as the sample size increases and \(s_x \rightarrow 0\), tends to \(\overline{x}\). When \(s_x^2 \gg 1\), (19a) is approximated by
and supports the \(x_i\) datum. For many observations, we obtain
As shown in the supplementary material, when \(m\rightarrow \infty \), it is certain that \(s_x^2 \ge 1\). These asymptotic expressions are consistent with the expectation that a sample variance larger than the data variance (which was set to one) supports a varying signal and a smaller one the opposite.
4.5 James–Stein estimate
Empirical Bayes methods set the a and b hyper-parameters in (13a) to specific values, see [1], instead of integrating them out. For instance, if in (13a) and following [8] we set a to its posterior mode \(\overline{x}\) and \(1+b^2\) to \(ms_x^2/(m-3)\), then \(\overline{\mu _i}\) reduces to the (positive) James–Stein estimate given in [9, 10],

$$\begin{aligned} \overline{\mu _i}&= \overline{x} + \left( 1 - \frac{m-3}{m s_x^2}\right) (x_i - \overline{x}) \\ &\rightarrow \overline{x} + \left( 1 - \frac{m-3}{m s_x^2}\right) _{\!+} (x_i - \overline{x}), \end{aligned} \qquad \mathrm{(21)}$$
the first line of which is derived in the supplementary material and shown in Fig. 2 for \(m=5\). The replacement in the second line avoids pulling the estimate away from the \([x_i,\overline{x}]\) interval, see [10]. The reason for the \(1+b^2=ms_x^2/(m-3)\) choice resides in the fact that \((m-3)/(ms_x^2)\) is an unbiased estimator of \(1/(1+b^2)\), see [8].
In Sect. 4.3 we have shown that, as \(m\rightarrow \infty \), the \(b^2\) mode is \(s_x^2-1\), see (16). By using this value in (13a), we obtain (see the supplementary material) the same asymptotic limit of (21).
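The positive-part James–Stein rule quoted above is easy to exercise. The following sketch, with hypothetical true means, compares its total squared error with that of the raw observations; for \(m \ge 4\) the shrinkage estimator wins on average:

```python
import numpy as np

# Sketch of the positive-part James-Stein estimate discussed above,
# shrinking each datum towards the sample mean with factor
# (1 - (m-3)/(m s_x^2))_+, compared with the raw observations.
def james_stein_plus(x):
    m = x.size
    xbar = x.mean()
    sx2 = np.mean((x - xbar)**2)          # biased sample variance
    shrink = max(0.0, 1.0 - (m - 3) / (m * sx2))
    return xbar + shrink * (x - xbar)

rng = np.random.default_rng(1)
m, trials = 10, 5000
mu = rng.normal(0.5, 1.0, size=m)          # hypothetical "true" means
x = rng.normal(mu, 1.0, size=(trials, m))

err_raw = ((x - mu)**2).sum(axis=1).mean()
err_js = np.mean([((james_stein_plus(xi) - mu)**2).sum() for xi in x])
print(err_raw, err_js)   # err_js < err_raw
```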
4.6 Expectations of the instantaneous powers
From (12), it follows that the normalized powers \((\mu _i/\sigma _\mu )^2\) are independent non-central \(\chi _1^2\) variables having one degree of freedom and non-centrality parameter \(\lambda _i=(\overline{\mu _i}/\sigma _\mu )^2\), where \(\overline{\mu _i}\) and \(\sigma _\mu ^2\) are given by (13a) and (13b), respectively. Hence,

$$(\mu _i/\sigma _\mu )^2\,\big |\,x_i,a,b \;\sim \; {\chi '}_1^2(\lambda _i). \qquad \mathrm{(22)}$$
By taking the mean of the non-central \(\chi _1^2\) distribution and the \(\mu _i/\sigma _\mu \) normalisation into account, the posterior mean of the \(\mu _i^2\) powers is

$$\mathbb {E}(\mu _i^2|x_i,a,b) = \sigma _\mu ^2 + \overline{\mu _i}^{\,2} \qquad \mathrm{(23)}$$
and, by application of (9) to (23), its model-averaged value is
where (see the supplementary material)
\(u_x^2=ms_x^2/2\), and \(\Gamma (a,z_1,z_2)\) is the generalised incomplete gamma function [20].
The asymptotic means of the \(\mu _i^2\) powers are derived in the supplementary material. When \(s_x^2 \ll 1\) the data support equal \(\mu _i\),
where we used \(x_i \rightarrow \overline{x}\), and the mean shrinks to \(\overline{x}^2\). If \(m=1\), then \(s_x=0\) and the mean is \(x_1^2+1\). A large sample variance supports different \(\mu _i\) and
Eventually, when m tends to infinity,
where \(s_x^2 \ge 1\), see the supplementary material. Since \(x_i\), \(\overline{x}\), and \(s_x^2\) are not independent, a general graphical display of (24a) is impossible. To make it feasible, in Fig. 3, we considered the \(x_i=\overline{x}\) case.
4.7 Expectation of the mean power
Let us turn the attention to the mean power \(\theta ^2=|{\varvec{\mu }}|^2/m\). From (12), \(m\theta ^2/\sigma _\mu ^2\) is a non-central \(\chi _m^2\) variable having m degrees of freedom and non-centrality parameter \(\lambda =\sum (\overline{\mu _i}^2/\sigma _\mu ^2)\), where \(\overline{\mu _i}\) and \(\sigma _\mu ^2\) are given by (13a) and (13b), respectively. Hence, from (23), the expectation and variance of the mean power are (see the supplementary material)

$$\mathbb {E}(\theta ^2|\textbf{x},a,b) = \sigma _\mu ^2 + \frac{1}{m}\sum _{i=1}^m \overline{\mu _i}^{\,2}$$

and

$$\textrm{Var}(\theta ^2|\textbf{x},a,b) = \frac{2\sigma _\mu ^2}{m}\left( \sigma _\mu ^2 + \frac{2}{m}\sum _{i=1}^m \overline{\mu _i}^{\,2}\right) .$$
Averaging \( \mathbb {E}(\theta ^2|\overline{x},s_x^2,a,b)\) over the models via (9), we obtain (see the supplementary material)
where \(\overline{x^2} = |\textbf{x}|^2/m\) is the sample mean power,
\(u_x^2=ms_x^2/2\), and \(\Gamma (a,z_1,z_2)\) is the generalised incomplete gamma function [20]. This inference, which minimises the (Bayesian) quadratic risk, belongs to estimator classes previously considered by [7, 17, 18, 23, 24, 25].
As shown in Fig. 4, when \(s_x^2 \ll 1\) the data support \(\mu _i =\) const. and
If \(m=1\), then \(s_x=0\) and the power of this datum is again \(x_1^2+1\). When \(s_x^2 \gg 1\), the data support a varying signal and
Eventually, it is non-obvious and remarkable that, as \(m\rightarrow \infty \), the expectation of the mean power converges to the frequentist estimate \(\overline{x^2}-1\), see (2a). In fact,

$$\mathbb {E}(\theta ^2|\overline{x},s_x^2) \rightarrow \overline{x^2} - 1,$$
where \(s_x^2 \ge 1\), see the supplementary material. These asymptotic expressions are derived in the supplementary material.
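The convergence to \(\overline{x^2}-1\) can be checked by plugging the asymptotic hyper-parameter mode \(a=\overline{x}\), \(b^2=s_x^2-1\) into the conjugate posterior moments; the identity holds exactly for any data set with \(s_x^2 \ge 1\) (illustrative data below):

```python
import numpy as np

# Sketch: plugging the asymptotic hyper-parameter mode a = xbar,
# b^2 = s_x^2 - 1 into the conjugate posterior (mean (b^2 x_i + a)/(1+b^2),
# variance b^2/(1+b^2)) and averaging E(mu_i^2) = var + mean^2 reproduces
# the frequentist estimate x2bar - 1 identically whenever s_x^2 >= 1.
rng = np.random.default_rng(2)
x = rng.normal(0.0, 3.0, size=40)           # illustrative data, s_x^2 > 1
xbar, sx2 = x.mean(), np.mean((x - x.mean())**2)
assert sx2 >= 1.0

b2 = sx2 - 1.0
post_mean = (b2 * x + xbar) / (1.0 + b2)    # = x_i - (x_i - xbar)/s_x^2
post_var = b2 / (1.0 + b2)
theta2 = post_var + np.mean(post_mean**2)

x2bar = np.mean(x**2)
print(theta2, x2bar - 1.0)   # identical up to rounding
```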
5 Application examples
According to Bayes' theorem, the posterior probability of a model is proportional to the marginal likelihood of its parameters based on the data. However, if the parameter prior density is improper, the marginal likelihood cannot be determined. In fact, when a probability density is non-integrable, it is given only up to an arbitrary scale factor, which means that the marginal likelihood depends on the chosen value of this factor. This is the case for the Jeffreys' uniform prior over the reals for the mean of Gaussian data.
The problem is avoided by the prior (3), which has been proved to produce sound posteriors for the data mean and power while avoiding inconsistencies and, contrary to the uniform one, is proper and encodes a finite measurand value.
To give examples, we considered the measured values of the Newtonian constant of gravitation G, the Planck constant h, and the Boltzmann constant k given in [22] and [29]. These measured values have been used by the CODATA Task Group on Fundamental Physical Constants to determine mutually consistent values for use in science and technology [29]. Their differences from the weighted mean are shown in Fig. 5.
These examples have been selected to represent the cases where a visual inspection of the data suggests disagreement (G values), agreement (k values), or uncertain judgment (h values). Where the data are mutually inconsistent, they most probably reflect systematic errors. Still, it is possible that they are pointing to unknown subtleties – perhaps the value of the constant depends on how it is measured.
The objective of a Bayesian equal-mean test is to quantify these qualitative judgments by assigning them probabilities. Therefore, we compare the hypothesis \(H_0\) that the measured values are sampled from Gaussian distributions having the same mean against the hypothesis \(H_1\) that they are sampled from Gaussian distributions whose means might be different. Assuming the same 50% prior probability of the two data models, their posterior probabilities are

$$P(H_n|\textbf{x}) = \frac{Z(\textbf{x}|H_n)}{Z(\textbf{x}|H_0) + Z(\textbf{x}|H_1)},$$
where \(Z(\textbf{x}|H_n)\) is the marginal likelihood of the n-th model parameters.
Calculating the \(Z(\textbf{x}|H_1)\) marginal likelihood in the simplest way, by resting on the previous results, requires equal and unit variances of the input data. The unequal-variance case makes the algebra cumbersome without adding conceptual novelty. To comply with the unit-variance constraint, we consider the normalised differences \((x_i - x_0)/u_i\) of the measured values from their weighted mean \(x_0\), where \(u_i^2\) is the variance of the i-th datum. However, these scaled data only have the same mean if it equals \(x_0\). Therefore, we must restrict \(H_0\) to this case and, to take the \(x_0\) variance, \(\sigma _0^2\), into account, increase the data variances to \(\sigma _i^2 = u_i^2 + \sigma _0^2\).
5.1 \(H_0\) hypothesis
Let us consider the normalised differences \((x_i - x_0)/\sigma _i\) of the measured values from their weighted mean \(x_0\), where \(\sigma _i^2 = u_i^2 + \sigma _0^2\) is the sum of the variances of the i-th datum and of the mean, \(u_i^2\) and \(\sigma ^2_0\), respectively. If each normalised difference is independently sampled from the same Gaussian distribution having zero mean and unit variance, their joint distribution is

$$N(\textbf{x}|\textbf{0},\sigma =1) = \frac{\exp (-|\textbf{x}|^2/2)}{(2\pi )^{m/2}}, \qquad \mathrm{(30)}$$
Since the distribution (30) is free of parameters, the marginal likelihood coincides with it. Hence,

$$Z(\textbf{x}|H_0) = \frac{\exp \!\left[ -(\chi ^2 + m\overline{x}^2)/2\right] }{(2\pi )^{m/2}}, \qquad \mathrm{(31)}$$
where \(\overline{x}\) is the arithmetic mean of the normalised data and \(\chi ^2\) is the sum of the squared residuals.
5.2 \(H_1\) hypothesis
On the contrary, if the measured values are independently sampled from Gaussian distributions having (or not having) different means and standard deviations, the likelihood of the scaled data \(x_i \rightarrow (x_i - x_0)\big /u_i\) is

$$N(\textbf{x}|{\varvec{\mu }},\sigma =1) = \frac{\exp (-|\textbf{x}-{\varvec{\mu }}|^2/2)}{(2\pi )^{m/2}}, \qquad \mathrm{(32)}$$
where \(\mu _i\) is the scaled mean. By using the prior (3),
the marginal likelihood \(Z(\textbf{x}|a,b)\) of the scaled data is given by (14).
To determine the most probable model in the family (4), we look for the values of the hyper-parameters a and b maximising their posterior density, \(Q(a,b|\textbf{x})\), which is given by (16). They are \(a_0=\overline{x}\) and \(b_0 = \textrm{argmax}\big [ Q(a=\overline{x},b|\textbf{x}) \big ]\), which must be found numerically. Eventually,
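A rough numerical sketch of the equal-mean test is given below, on synthetic (hypothetical) normalised differences rather than the actual G, h, and k data. It scores \(H_0\) by the i.i.d. \(N(0,1)\) evidence and \(H_1\) by integrating the \(N(a,\sqrt{1+b^2})\) evidence against the hyper-prior \(b/[V_a(1+b^2)^{3/2}]\) with an assumed window \(V_a = 20\); the paper's exact normalisation may differ, but the qualitative behaviour is the intended one: consistent data favour \(H_0\), scattered data favour \(H_1\):

```python
import numpy as np

# Rough sketch of the equal-mean test on synthetic (hypothetical)
# normalised differences.  H0 scores the data as i.i.d. N(0, 1); Z(x|H1)
# integrates the evidence of i.i.d. draws from N(a, sqrt(1+b^2)) against
# a hyper-prior proportional to b/(1+b^2)^(3/2) over an assumed window
# V_a = 20, by plain grid quadrature.  Equal 50% prior model
# probabilities are assumed, as in the text.
def p_h1(z, v_a=20.0):
    m = z.size
    log_z0 = -0.5 * m * np.log(2 * np.pi) - 0.5 * np.sum(z**2)  # H0 evidence

    a = np.linspace(-v_a / 2, v_a / 2, 2001)
    b = np.linspace(1e-3, 100.0, 4000)
    da, db = a[1] - a[0], b[1] - b[0]
    sq = np.sum((z[:, None] - a[None, :])**2, axis=0)   # sum_i (z_i - a)^2

    z1 = 0.0
    for bk in b:                                        # quadrature over b
        t = 1.0 + bk * bk
        lik = np.exp(-0.5 * sq / t) / (2 * np.pi * t)**(m / 2)
        z1 += lik.sum() * da * bk / (v_a * t**1.5) * db
    return z1 / (z1 + np.exp(log_z0))

print(p_h1(np.array([0.1, -0.4, 0.3, -0.2, 0.2])))   # small: supports H0
print(p_h1(np.array([3.0, -2.5, 4.0, -3.5, 2.8])))   # near 1: supports H1
```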
5.3 Results
The calculations relevant to this analysis are available in the supplementary material. The results are summarised in Table 1. The posterior probabilities confirm our expectations regarding the measured G and k values and resolve the uncertainty for the h values.
The probabilities of the \(H_0\) models are smaller than expected, which may be because we assumed not only a common mean but also that it is equal to the weighted mean of the measured values. In addition, \(H_1\) does not exclude that the data are sampled from distributions with the same mean. It is worth noting that, assuming a uniform prior for the data means, the Bayesian test of equal means would have been impossible.
6 Conclusion
Given measurement results affected by additive uncorrelated Gaussian errors, we investigated the Bayesian inferences of the data means, the individual means' squares, and the average of the means' squares. The result is a new way to cope with the inconsistency originated by the uniform prior. This inconsistency occurs because the uniform prior – contrary to what was intended and to the belief that it is finite – encodes that the data power is infinite.
To minimise the difference (expressed by the Kullback–Leibler divergence) from the uniform distribution, we encoded the measurands' indistinguishability and the belief of finite measurand values in a normal prior hyper-parameterised by the mean and variance. Averaging over the unknown hyper-parameters or letting the data choose the most supported ones removes the shortcomings of the uniform distribution.
In the case of a single datum, the inferred measurand is not biased towards the smallest value, as occurs in [4], but is the measurement result itself. With more than one datum, we derived a James–Stein estimate of every single measurand consistent with the stated belief. This result was obtained without resorting to empirical Bayes methods as in [8]. We showed that, as the sample size grows, the inference of the mean power is consistent and converges to the frequentist estimate.
After proving that it produces sound posteriors for the data mean and power while avoiding inconsistencies, we applied the hyper-parameterised normal prior to determining whether the measured values of the Newtonian constant of gravitation came from populations with the same mean or not. We repeated the test using the results of the measurements of the Planck and Boltzmann constants. If we had used an improper prior for the data mean, this Bayesian test would have been impossible.
Availability of data and material
Not applicable.
Code availability
A Mathematica notebook including the relevant symbolic and numerical computations is supplied as supplementary material. To read and interact with it, download the Wolfram Player free of charge from Wolfram Research.
References
H. Akaike, Ignorance prior distribution of a hyperparameter and Stein’s estimator. Ann. Inst. Stat. Math. 32, 171–178 (1980)
F. Attivissimo, N. Giaquinto, M. Savino, A Bayesian paradox and its impact on the GUM approach to uncertainty. Measurement 45(9), 2194–2202 (2012)
J.O. Berger, J.M. Bernardo, D. Sun, The formal definition of reference priors. Ann. Stat. 37(2), 905–938 (2009)
J.O. Berger, J.M. Bernardo, D. Sun, Overall objective priors. Bayesian Anal. 10(1), 189–221 (2015)
J.M. Bernardo, Reference analysis, in Handbook of Statistics, vol. 25, ed. by D.K. Dey, C.R. Rao (North Holland, Amsterdam, 2005), pp. 17–99
C. Carobbi, Bayesian inference on a squared quantity. Measurement 48(1), 13–20 (2014)
M.S. Chow, A complete class theorem for estimating a noncentrality parameter. Ann. Stat. 15(2), 800–804 (1987)
B. Efron, T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, 1st edn. (Cambridge University Press, Cambridge, 2016)
B. Efron, C. Morris, Stein’s estimation rule and its competitors – an empirical Bayes approach. J. Am. Stat. Assoc. 68(341), 117–130 (1973)
B. Efron, C. Morris, Data analysis using Stein’s estimator and its generalizations. J. Am. Stat. Assoc. 70(350), 311–319 (1975)
M. Evans, H. Moshonov, Checking for prior-data conflict. Bayesian Anal. 1(4), 893–914 (2006). https://doi.org/10.1214/06-BA129
M. Evans, M. Shakhatreh, Consistency of Bayesian estimates for the sum of squared normal means with a normal prior. Sankhya A 76(1), 25–47 (2014)
J.F. Ferragud, Una solucion bayesiana a la paradoja de Stein. Trabajos de Estadística e Investigación Operativa 33, 31–46 (1982)
N. Giaquinto, L. Fabbiano, Examples of S1 coverage intervals with very good and very bad long-run success rate. Metrologia 53(2), S65–S73 (2016)
B.D. Hall, Evaluating methods of calculating measurement uncertainty. Metrologia 45(2), L5–L8 (2008)
H. Harney, Bayesian Inference: Data Evaluation and Decisions (Springer International Publishing, Berlin, 2018)
T. Kubokawa, C.P. Robert, A.K.M.E. Saleh, Estimation of noncentrality parameters. Can. J. Stat. La Revue Canadienne de Statistique 21(1), 45–57 (1993)
Q. Li, J. Zhang, S. Dai, On estimating the non-centrality parameter of a chi-squared distribution. Stat. Probab. Lett. 79(1), 98–104 (2009)
I. Lira, On the long-run success rate of coverage intervals. Metrologia 45(4), L21–L23 (2008)
O. Marichev, M. Trott, Mathematical functions site. http://functions.wolfram.com/06.07.02.0001.01 (2019)
Y. Maruyama, Stein’s idea and minimax admissible estimation of a multivariate normal mean. J. Multivar. Anal. 88(2), 320–334 (2004)
P.J. Mohr, D.B. Newell, B.N. Taylor, E. Tiesinga, Data and analysis for the CODATA 2017 special fundamental constants adjustment. Metrologia 55(1), 125 (2018)
N. Neff, W.E. Strawderman, Further remarks on estimating the parameter of a noncentral chi-square distribution. Commun. Stat. Theory Methods 5(1), 65–76 (1976). https://doi.org/10.1080/03610927608827332
M.D. Perlman, U.A. Rasmussen, Some remarks on estimating a noncentrality parameter. Commun. Stat. 4(5), 455–468 (1975)
K.M.L. Saxena, K. Alam, Estimation of the non-centrality parameter of a chi squared distribution. Ann. Stat. 10(3), 1012–1016 (1982). https://doi.org/10.1214/aos/1176345892
D. Sivia, J. Skilling, Data Analysis: A Bayesian Tutorial (Oxford University Press, Oxford, 2006)
M. Smerlak, Minimum relative entropy distributions with a large mean are gaussian. Phys. Rev. E 94, 062107 (2016)
C. Stein, An example of wide discrepancy between fiducial and confidence intervals. Ann. Math. Stat. 30(4), 877–880 (1959)
E. Tiesinga, P.J. Mohr, D.B. Newell, B.N. Taylor, CODATA recommended values of the fundamental physical constants: 2018. J. Phys. Chem. Ref. Data 50(3), 033105 (2021)
Wolfram Research, Inc., Mathematica, Version 14.0 (Champaign, 2024a). https://www.wolfram.com/mathematica
Wolfram Research, Inc., Wolfram Player, Version 14.0 (Champaign, 2024b). https://www.wolfram.com/player
G. Wuebbeler, C. Elster, On the transferability of the GUM S1 type A uncertainty. Metrologia 57, 015005 (2020)
Funding
Open access funding provided by Istituto Nazionale di Ricerca Metrologica within the CRUI-CARE Agreement. This work was funded by the Ministero dell’Università e della Ricerca.
Contributions
Giovanni Mana and Carlo Palmisano contributed equally to the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Mana, G., Palmisano, C. Bayesian inference of the mean power of several Gaussian data. Eur. Phys. J. B 97, 93 (2024). https://doi.org/10.1140/epjb/s10051-024-00737-w