The Guide to the Expression of Uncertainty in Measurement (the GUM) is one of the main documents in metrology [1]. The GUM was first published in 1993 on behalf of seven international organizations: BIPM, IEC, IFCC, ISO, IUPAC, IUPAP, and OIML. Currently, Working Group 1 (WG 1) of the Joint Committee for Guides in Metrology (JCGM), which was specially created in 1997 and is headed by the Director of the BIPM, is engaged in the maintenance and development not only of the GUM but also of the whole concept of uncertainty.

According to the authors, two main objectives of the GUM must be delineated:

  1) provision of uniformity and transparency of the methods for estimating the quality of measurements at any level, which is a necessary condition for comparing and correlating the results of measurements performed in different laboratories and, in the final analysis, leads to mutual recognition of measurement results;

  2) strict mathematical justification of the rules and algorithms for calculating measurement uncertainty, since the authors of the GUM considered it incorrect to describe systematic errors as random variables and subsequently "sum" the characteristics of systematic and random errors, as is accepted in classical measurement error theory.

The first objective has certainly been achieved. The GUM has been used for more than two decades practically unchanged. It has been translated into various languages and adopted as a national standard in a number of countries. The GUM is applied to analyze the results of Key Comparisons of national standards conducted for the purpose of implementing the Mutual Recognition Arrangement for national measurement standards and for calibration and measurement certificates issued by national metrology institutes (CIPM MRA) [2]. The concept of uncertainty is widely used in the calibration of measuring instruments (MI) [3, 4].

The second objective has been only partially achieved. An undoubted advantage of the GUM is the conceptual approach that permits considering jointly and uniformly the contributions to uncertainty arising both from random and from systematic factors. As a result, the uncertainty of measurement is calculated uniformly, using a methodology for combining both types of contribution that has been approved by the international metrological community. However, analysis of the probability-theoretic foundations of the calculation of uncertainty in accordance with the GUM shows that from the outset the GUM was an internally inconsistent document. The reason for these contradictions is that different interpretations of probability are accepted for the calculation of standard uncertainties: the frequency interpretation for type A evaluations and the Bayesian (subjective) interpretation for type B evaluations. Such a "mixture" of the frequency and Bayesian approaches resulted in internal inconsistency of the GUM, even led to "mistakes," and made revision of the GUM inevitable [5].

The concept of uncertainty arose to a significant degree because of disagreement of the authors of the GUM with how the probability-theoretic approach is used in the theory of errors. The authors of the GUM pointed to the ill-posedness of describing systematic errors by random variables under the frequency interpretation of probability and offered a different approach to expressing the accuracy of measurement, based on the uncertainty of measurement. Other possible reasons for the emergence of the concept of uncertainty are given in [6].

Standard uncertainty was selected as the main indicator of uncertainty: the standard deviation of the distribution of the random variable used to describe the possible values of a quantity, regardless of the source of uncertainty, whether it be random variability of data in repeated measurements or systematic effects brought about by introducing corrections or by the MI being used. The rules for estimating standard uncertainties are considered in the GUM depending on the available information (statistical or non-statistical), which corresponds to evaluation of measurement uncertainty by type A or type B. The selection of the standard deviation as the uniform indicator of uncertainty simplifies the summation of the component uncertainties caused by various systematic and random factors. The approach to summation of standard uncertainties proposed in the GUM is based on the laws of probability theory and mathematical statistics, specifically: the variance of a sum of independent random variables is the sum of the variances of these variables. The quadratic summation of standard uncertainties under the square root sign is called the law of transformation of uncertainties, which is based on expansion of the measurement model \( Y=f\left({X}_1,{X}_2,\dots, {X}_N\right) \) (Y is the measurand (output quantity); X1, X2, ..., XN are the input quantities) in a first-order Taylor series: \( {u}^2(y)=\sum_{i=1}^N{\left(\partial f/\partial {x}_i\right)}^2{u}^2\left({x}_i\right), \) where u(xi) is the standard uncertainty of the estimate of the i-th input quantity. When the input quantities are correlated, the corresponding covariance terms must be added.
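As a simple numerical illustration, the law of transformation of uncertainties can be sketched in a few lines of code (a minimal sketch; the model function and numerical values are hypothetical, and the partial derivatives are approximated by central differences rather than written analytically):

```python
import math

def combined_standard_uncertainty(f, x, u, h=1e-6):
    """First-order law of transformation of uncertainties for independent
    input quantities: u^2(y) = sum_i (df/dx_i)^2 * u^2(x_i).
    Partial derivatives are approximated by central differences."""
    total = 0.0
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        dfdx = (f(xp) - f(xm)) / (2 * h)
        total += (dfdx * u[i]) ** 2
    return math.sqrt(total)

# Hypothetical model: power P = V^2 / R with V = 10 V, R = 100 ohm
f = lambda v: v[0] ** 2 / v[1]
u_y = combined_standard_uncertainty(f, [10.0, 100.0], [0.05, 0.2])
```

Here the dominant contribution comes from the voltage term, since the sensitivity coefficient 2V/R is much larger than V²/R² for these values.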

The law of transformation of uncertainty can be applied to nonlinear models if the following conditions are fulfilled:

  1) the measurement function has continuous derivatives with respect to the input quantities Xi in neighborhoods of the best estimates xi, up to the order used in the law of transformation of uncertainties;

  2) the input quantities Xi entering into the significant higher-order terms of the Taylor series expansion of the measurement function are independent; and

  3) the higher-order terms not included in the approximation of the measurement function by a Taylor series are negligible.

However, checking these conditions is not always given the necessary attention in practice, and the law of transformation of uncertainties is often applied in a formal manner, which can lead to absurd results. A clear example of obtaining untrustworthy estimates is contained in JCGM 101:2008 [7, Sec. 9.4.2], where a nonlinear measurement model of the form \( Y={X}_1^2+{X}_2^2 \) is considered. When the estimates of the input quantities are x1 = x2 = 0 and the standard uncertainties are u(x1) = u(x2) = 0.005, then in accordance with the GUM both the estimate of the output quantity, y = 0, and the standard uncertainty, u(y) = 0, are equal to zero. It is obvious that in this case the conditions of applicability of the law of transformation of uncertainties are violated. Partially in order to correct this situation, it is permitted to take into account higher-order terms when expanding the measurement function in a Taylor series, as likewise recommended in the GUM. Since all partial derivatives of the measurement function considered in the example above the second order are zero, taking into account only the second-order terms of the Taylor series expansion automatically accounts for all higher-order terms, and that means complete accounting for the nonlinearity of the measurement function. In this example, the standard uncertainty of the output quantity turns out to be u(y) = 50·10⁻⁶, while the estimate of the measurand remains equal to zero.
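The numbers in this example are easy to verify directly (a sketch; the second-order contribution uses the GUM terms ½(∂²f/∂xi∂xj)²u²(xi)u²(xj), of which only the diagonal second derivatives, equal to 2 for f = x1² + x2², are nonzero):

```python
import math

u1 = u2 = 0.005          # standard uncertainties of x1 = x2 = 0

# First-order terms: df/dx_i = 2*x_i = 0 at the estimates, so u(y) = 0.
u_first = math.sqrt((2 * 0.0 * u1) ** 2 + (2 * 0.0 * u2) ** 2)

# Second-order terms: 0.5 * (d2f/dx_i^2)^2 * u_i^4 with d2f/dx_i^2 = 2,
# giving u^2(y) = 2*u1^4 + 2*u2^4 and hence u(y) = 2*u^2 = 50e-6.
u_second = math.sqrt(0.5 * 2.0**2 * u1**4 + 0.5 * 2.0**2 * u2**4)
```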

The simplicity of the probability-theoretic approach implemented in calculating the standard uncertainty disappears upon transition to calculating the expanded uncertainty of measurement. It is the expanded uncertainty that is demanded in practice; in particular, the calibration and measurement capabilities (CMC) defined by the CIPM MRA and declared in the key comparison database of the International Bureau of Weights and Measures (BIPM KCDB) represent expanded uncertainties for a confidence level of 0.95.

To calculate the expanded uncertainty (coverage interval), it is necessary to know the distribution law of the measurand. Application of the GUM permits obtaining only an estimate of the measurand and the corresponding standard uncertainty, which can then be used as the mean and standard deviation of the probability distribution of the measurand. However, within the GUM the probability distribution density of the output quantity is not handled, and the problem of deriving it from the distribution densities of the input quantities is not studied. Therefore, various approximations for constructing coverage intervals are proposed in the GUM. It is recommended to calculate the expanded uncertainty U by multiplying the standard uncertainty u(y) by a "suitable" coverage factor k. But in order to determine the value of the coverage factor it is necessary to know the distribution law of the measurand, so the circle is closed. Knowing the distribution law is key to establishing the coverage interval (expanded uncertainty).

In this situation, it is proposed in the GUM to "assign" to the measurand an "approximate" probability distribution which, based on the central limit theorem, can be the normal distribution (or, in certain cases, a t-distribution). For the normal distribution and a standard coverage probability P = 95%, k = 2 is suggested. However, the normal distribution can be reasonably accepted only when a number of conditions are observed: a linear approximation of the measurement function, and a large number of input quantities in the measurement model that make approximately comparable contributions to the uncertainty and are described by regular probability distributions (for example, normal and rectangular).
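The adequacy of this normal approximation is easy to probe numerically (a sketch with hypothetical inputs: the output is the sum of four independent rectangular quantities with comparable contributions, so by the central limit theorem the interval y ± 2u(y) should cover roughly 95% of the distribution):

```python
import math
import random

random.seed(1)
N, M = 4, 200_000
a = 1.0                              # half-width of each rectangular input
u_y = math.sqrt(N * a**2 / 3)        # combined standard uncertainty of the sum

# Fraction of the simulated output distribution inside y +/- 2*u(y)
inside = sum(
    abs(sum(random.uniform(-a, a) for _ in range(N))) <= 2 * u_y
    for _ in range(M)
)
coverage = inside / M                # close to 0.95 for a near-normal output
```

With fewer inputs, or with one rectangular component dominating, the coverage obtained for k = 2 deviates noticeably from 95%, which is exactly the limitation discussed above.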

Another approximate approach to calculating the coverage factor is based on calculation of the effective number of degrees of freedom by the Welch–Satterthwaite formula; a separate Appendix G of the GUM is devoted to this procedure. The proposed method of calculating expanded uncertainty turned out to be a consequence of the previously mentioned "mixture" of the frequency and Bayesian interpretations, in which the coverage factors are taken as fractiles of the Student distribution with the effective number of degrees of freedom corresponding to the combined standard uncertainty. However, the use of the Welch–Satterthwaite formula for constructing coverage intervals, as well as extending the number of degrees of freedom, borrowed from the frequency approach, to the uncertainties evaluated as type B, raises objections and even leads to contradictory results, to which many authors, in particular [8, 9], have repeatedly drawn attention.
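The Welch–Satterthwaite formula itself is straightforward (a sketch with a hypothetical uncertainty budget; by convention, type B components are assigned infinite degrees of freedom):

```python
import math

def welch_satterthwaite(u_components, dof):
    """Effective number of degrees of freedom (GUM, Appendix G):
    nu_eff = u_c^4(y) / sum_i (u_i^4 / nu_i)."""
    u_c2 = sum(u * u for u in u_components)              # combined variance
    denom = sum(u**4 / nu for u, nu in zip(u_components, dof))
    return math.inf if denom == 0.0 else u_c2**2 / denom

# Hypothetical budget: a type A component from n = 10 readings (nu = 9)
# and a type B component with infinite degrees of freedom.
nu_eff = welch_satterthwaite([0.03, 0.04], [9, math.inf])
```

The resulting nu_eff would then be used to pick a Student fractile as the coverage factor; it is precisely this step, applied to type B components that carry no frequency-based degrees of freedom, that draws the criticism cited above.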

For an illustration, we return to the example being studied. For a normal distribution and a coverage probability of 95%, the coverage interval for the measurand turns out to be [0; 0] when only the linear terms of the Taylor series expansion are taken into account, and [–98·10⁻⁶; 98·10⁻⁶] when the second-order terms are included. It is clear that in this case application of the GUM leads to unsatisfactory results. The coverage interval [0; 0] cannot be considered plausible. In constructing the second interval, the nonlinearity of the measurement model was taken into account, but the interval turned out to be symmetric relative to y = 0, implying a 50% probability of negative values of Y, which according to the measurement model is not meaningful. The example presented shows that the procedure for constructing coverage intervals in the GUM is suitable only for a limited number of situations and in some cases can lead to outright paradoxical results.

Unfortunately, in practice the conditions for applicability of the GUM are checked extremely rarely, and the expanded uncertainty U = 2u(y) is used without additional justification in constructing a coverage interval for P = 95%. It should also be noted that for asymmetric distributions of the output quantity (which are practically not discussed in the GUM), construction of symmetric coverage intervals based on the expanded uncertainty appears to be generally unacceptable.

Thus, use of the GUM is limited to measurement models permitting linearization and to output quantities with close to normal distributions. When these conditions are fulfilled, the calculation of expanded uncertainty may be recognized as correct. The specified restrictions were overcome by the creation of Supplements to this document. The general strategy of development of Supplements to the GUM is stated in JCGM 104:2009 [10]. In terms of content this document precedes the GUM and serves as an introduction to the concept of uncertainty in measurement, the GUM, and the other documents developed by the JCGM. The main steps in estimating uncertainty are listed in [10], and the methods of calculating the standard uncertainty of the measurand used today and described in the GUM and its Supplements are briefly characterized. These are: calculation of uncertainty according to the GUM based on the law of transformation of uncertainties, and calculation of the standard deviation of the probability distribution of the measurand, derived analytically or by numerical modeling using the Monte Carlo method [6].

At the current time, two Supplements to the GUM have been developed: JCGM 101:2008 [7] and JCGM 102:2011 [11], hereinafter called the GUM S1 and GUM S2. The use of the Monte Carlo method to calculate measurement uncertainty is examined in the GUM S1. Numerical modeling repeatedly draws samples from the probability distributions of the input quantities and then, by means of the measurement model, forms a sample of the output quantity. From this sample distribution it is possible to obtain the estimate of the measurand and the standard uncertainty as the mean and standard deviation, respectively, and also to construct coverage intervals for any specified coverage probability without additional assumptions about the distribution law of the measurand. This approach, by analogy with the one examined in the GUM, is called the law of transformation of distributions.

The GUM S1 Supplement develops the GUM. The proposed method is applicable both to linear and to nonlinear measurement models. For linear and linearized measurement functions and input quantities subject to the normal distribution, this approach is consonant with the GUM approach. However, when the conditions for application of the GUM approach are not fulfilled, using the GUM S1 makes it possible to derive more reliable and well-founded conclusions about uncertainty than the GUM does. The GUM S1 employs the measurement model and an "obligatory" assignment of probability distributions to the input quantities, which, unlike in the GUM method, are used in explicit form to obtain the distribution density of the output quantity.

Since additional assumptions are not required in order to apply the Monte Carlo method, it can be used both for the uncertainty calculation itself and for validating the results obtained by means of the GUM.

Let us return to the example being considered. The estimate y = 50·10⁻⁶ and standard uncertainty u(y) = 50·10⁻⁶ were obtained by the Monte Carlo method of the GUM S1. The coverage interval for P = 95% is [0; 150·10⁻⁶] (in this case, the shortest interval, i.e., the one of least length among all possible coverage intervals with the same coverage probability). In this example, the distribution density of the output quantity can also be obtained by analytical methods, which is quite uncommon in practice. The mean and standard deviation of the analytically found probability distribution density of the output quantity Y, accepted as the estimate of the measurand y and the standard uncertainty u(y), were y = 50·10⁻⁶ and u(y) = 50·10⁻⁶, and the shortest coverage interval for P = 95% was [0; 150·10⁻⁶]. A detailed analytical derivation of the formulas for this problem can be found in [7, Appendix F]. Table 1 presents a comparison of the results obtained in the various ways (GUM, Monte Carlo, analytical derivation) for the example under consideration.
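The Monte Carlo figures quoted above are easy to reproduce (a sketch following the GUM S1 scheme of sampling the inputs, propagating them through the model, and extracting the shortest coverage interval from the sorted sample):

```python
import random
import statistics

random.seed(2)
M = 200_000
u = 0.005

# Propagate samples of X1, X2 ~ N(0, u) through Y = X1^2 + X2^2
y = sorted(random.gauss(0.0, u) ** 2 + random.gauss(0.0, u) ** 2
           for _ in range(M))

y_hat = statistics.fmean(y)   # estimate of the measurand, ~50e-6
u_y = statistics.stdev(y)     # standard uncertainty, ~50e-6

# Shortest 95% coverage interval: the narrowest window of 0.95*M sorted values
k = int(0.95 * M)
i = min(range(M - k + 1), key=lambda j: y[j + k - 1] - y[j])
interval = (y[i], y[i + k - 1])   # close to [0; 150e-6]
```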

Table 1. Results of Calculations by Various Methods of Transforming Distributions for Measurement Model \( Y={X}_1^2+{X}_2^2 \)

The values of the standard uncertainty of the output quantity obtained by using the higher-order terms of the Taylor series and by the Monte Carlo method coincide with each other and with the analytical solution of this problem. However, the zero estimate of the output quantity calculated using the GUM differs from the estimate y = 50·10⁻⁶ found by the Monte Carlo and analytical methods. In accordance with the GUM approach (and the Supplements to it), the best estimate is associated with the mean of the output quantity, but the question naturally arises as to the validity of such a choice for distributions of this type and also for asymmetric unimodal distributions.

For the purpose of graphical illustration of the results obtained in the various ways, Fig. 1 shows the probability distribution densities for Y, and also the coverage intervals constructed on their basis for P = 95%. According to the GUM (taking into account the higher-order terms of the Taylor series expansion), the measurand is described by a normal distribution density (the bell-shaped dot-dash curve). The probability distribution density obtained by the analytical derivation (the exponentially decreasing curve) represents a special case of a χ2 distribution and differs significantly from the normal distribution density used in the GUM. The histogram constructed by the Monte Carlo method and the analytical curve practically coincide. The dot-dash vertical lines designate the coverage interval [–98·10⁻⁶; 98·10⁻⁶] according to the GUM, taking into account the higher-order terms of the Taylor series expansion. The continuous vertical lines mark the borders of the shortest coverage interval for P = 95% found by the analytical and Monte Carlo methods, [0; 150·10⁻⁶].

Fig. 1.

Results of constructing distribution densities and coverage intervals by various methods of estimating uncertainty for the measurement model \( Y={X}_1^2+{X}_2^2 \) and the case x1 = x2 = 0, u(x1) = u(x2) = 0.005: the dot-dash and solid lines show the results obtained by the GUM method taking into account the higher-order terms of the Taylor series and by the analytical method, respectively; the histogram is constructed using the Monte Carlo method; X is the value of the output quantity Y; Y is the probability distribution density.

Hence, using the Monte Carlo method in the GUM S1 to calculate uncertainty made it possible to extend the uncertainty concept to nonlinear measurement models and to remove the limitation of only using the normal distribution law in calculating coverage intervals.

The GUM S2 Supplement [11] extends the uncertainty calculation to the multivariate measurement models frequently met in practice, with any number of both input and output quantities. Any calibration of a set of artifacts, such as weights, capacitors, gauge blocks, and others, is a multivariate case. And only multivariate processing with matrix notation makes it possible to examine and present correctly the covariance matrices of the input and output quantities. For the estimation of uncertainty, the GUM S2 proposes both the Monte Carlo method, generalized so as to obtain a discrete representation of the joint probability distribution of the output quantities of a multivariate model, and the estimation method of the GUM, generalized to the case of a multivariate measurement model, with application of the Bayesian approach to all input quantities, including those for which statistical data are available.

For a specified coverage probability, the GUM S2 establishes a method of determining the coverage region for the output quantities, an analog of the coverage interval for a one-dimensional model with a scalar output quantity. The coverage regions have the form of hyperellipsoids and rectangular hyperparallelepipeds in the multivariate space of output quantities.

The Supplements to the GUM were developed, first of all, for the purpose of expanding the field of application of the concept of uncertainty to nonlinear measurement models, to distributions other than normal, and to multivariate measurands. It was initially planned that the documents developed must follow the GUM methodology to the maximum degree. However, as noted above, the GUM is an internally inconsistent document relying both on the frequency and on the Bayesian concept of probability. The frequency approach accepted for interpreting the probability distributions of the input quantities when estimating type A uncertainty and transforming the uncertainties of the input quantities, and also for finding the coverage interval of the output quantity, contradicts the definition of the concept of "uncertainty" in the VIM dictionary [12] and the method of calculating type B uncertainty, and consequently the Supplements, if they retained it, could not claim to fully correct the shortcomings of the GUM.

According to the frequency approach, the uncertainties and estimates of quantities are estimates of the moments of frequency distributions, and there are degrees of freedom associated with them, which are considered in the GUM as measures of the accuracy of the uncertainties. The GUM provides a formula for calculating "the uncertainty of the uncertainty" obtained from a limited set of statistical data (even for ten repeated observations, the uncertainty of the uncertainty is about 24%). At the same time, the estimation of type B uncertainty accepted in the GUM is based on the Bayesian approach, according to which the estimates of quantities and uncertainties are the exact moments of distributions based on the available knowledge of the quantity, and consequently there are no degrees of freedom associated with them. Thus, according to the GUM, the law of transformation of uncertainties combines squares of standard deviations with squares of estimates of standard deviations, which rely on mutually exclusive treatments of probability.
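The figure of 24% follows from the GUM expression for the relative standard deviation of an experimental standard deviation, approximately 1/√(2(n − 1)) (a one-line check):

```python
import math

n = 10                                    # number of repeated observations
rel_u_of_u = 1 / math.sqrt(2 * (n - 1))   # relative "uncertainty of the uncertainty"
# For n = 10 this is about 0.236, i.e., roughly 24%.
```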

In the development of the Supplements, the main methodological shortcomings of the GUM in the exposition of the concept of uncertainty were eliminated. Placed at the foundation was the Bayesian approach, in which probability serves as a quantitative expression of the degree of confidence, based on the available information, in the validity of some statement. Measurement is impossible without a priori information. A priori information must be presented in the form of some a priori (prior) probability distribution of a given parameter. The distribution describes the possible values of this parameter even before measurement (but can rely on experimental data of previous measurements). As new information (experimental data) is obtained, the a priori distribution is refined and converted into the a posteriori distribution using Bayes' formula. Bayesian methods work even in the case of a sample of zero size: then the a priori and a posteriori distributions simply coincide. So the Bayesian approach permits approaching the calculation of both type A and type B uncertainties from a uniform position. Hence, in accordance with the Supplements to the GUM, it is necessary to assign to all input quantities, in explicit form, probability distribution densities (instead of the standard uncertainties of the estimates of these quantities, as in the GUM), based on the available information on these quantities [13]. For such probability densities, a special term, "state-of-knowledge PDF," is introduced, and the mean and standard deviation (the exact moments of the corresponding order of this probability density) serve as the estimate of the quantity and the standard uncertainty.

The distribution density of a quantity the information about which is contained in a series of observations (type A estimation) is calculated based on Bayes' theorem. A series of observations is considered as a realization of independent identically distributed random variables with a specified form of probability distribution density but with unknown parameters. As a rule, a Gaussian distribution with unknown mean and variance is used. A noninformative joint prior probability distribution is attributed to the mean and variance, and then the joint probability distribution density is refined based on the data of the series of observations (multiplication by the likelihood function); as a result, the joint posterior density for the two unknown parameters is derived. Integration of the joint posterior probability distribution density over the nuisance parameter (in this case, the variance) makes it possible to find the distribution density of the values of the quantity (the unknown mean). To select the probability distribution density of quantities for which observational data are inaccessible or that are impossible to measure (type B estimation), it is proposed to apply the maximum entropy principle introduced by Jaynes. According to this principle, a unique probability distribution density is selected from all possible distributions on the specified interval on which the density is not equal to zero [7]. Thus, in the Supplements to the GUM the standard uncertainty evaluated by type A is no longer an estimate of a standard deviation but a parameter of the probability distribution density function established by taking into account the available information, just like the uncertainty evaluated by type B. Therefore, a division of the methods of estimating uncertainty into types A and B is in fact no longer necessary.
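Both recipes reduce to simple closed-form expressions (a sketch; under a noninformative prior the posterior for the mean of a Gaussian series is a scaled and shifted t-distribution with n − 1 degrees of freedom, whose standard deviation exists only for n > 3, while maximum entropy on a known interval [a, b] yields a rectangular distribution; the data series below is hypothetical):

```python
import math
import statistics

def type_a_bayesian(data):
    """Type A standard uncertainty in the Bayesian treatment of GUM S1:
    the standard deviation of the posterior t-distribution of the mean,
    u = (s / sqrt(n)) * sqrt((n - 1) / (n - 3)); requires n > 3."""
    n = len(data)
    s = statistics.stdev(data)
    return s / math.sqrt(n) * math.sqrt((n - 1) / (n - 3))

def type_b_rectangular(a, b):
    """Type B standard uncertainty for a quantity known only to lie in
    [a, b]: the maximum entropy principle gives a rectangular distribution
    with u = (b - a) / (2 * sqrt(3))."""
    return (b - a) / (2 * math.sqrt(3))

u_a = type_a_bayesian([9.9, 10.1, 10.0, 10.2, 9.8, 10.0])  # hypothetical series
u_b = type_b_rectangular(-0.5, 0.5)
```

Note that the Bayesian type A result is slightly larger than the classical s/√n, by the factor √((n − 1)/(n − 3)), which is precisely what removes the need for separate degrees of freedom.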

Next, the probability distribution densities of the input quantities are transformed through the measurement model to derive the probability density of the measurand, or the joint probability density of a multivariate measurand. The necessary numerical characteristics of the measurand can then be found in explicit form from this probability density: the mean and standard deviation, which are identified with the estimate of the measurand and the associated standard uncertainty, and also the required exact coverage interval (as a rule, the shortest or the probabilistically symmetric one) for any specified coverage probability, without any assumptions or approximations. This makes it possible to exclude the calculation of expanded uncertainty via effective degrees of freedom, which, as already mentioned, has been repeatedly criticized.

The Bayesian approach was made the basis of the GUM S1 and the GUM S2, so the GUM is now conceptually inconsistent with its own Supplements. Therefore, in 2008 the JCGM made the decision to revise the GUM. The decision to revise such a successful and universally adopted document was, in the view of the authors of this article, based on very weighty reasons and careful analysis of the planned changes. However, it is difficult to weigh the advantages of revision against the obvious inconvenience caused by introducing changes into the set of documents associated with the GUM. The question of the revision of the GUM has been widely covered in various publications [5, 13], and the following may be listed among the main objectives of the revision [5]:

  1) scientific refinement of the basic principles of the concept of uncertainty based on the Bayesian approach;

  2) terminological coordination with the VIM3 dictionary [12], developed significantly later than the GUM and containing terminology rather strongly different from the previous version, the VIM2 dictionary;

  3) terminological coordination with the GUM S1 and the GUM S2 and elimination of ambiguity in notation;

  4) correction of the general style and method of exposition of the document, which must become more rigorous from the scientific point of view; and

  5) precise delineation of the region of application of the GUM.

At the end of 2014, for the purpose of receiving the necessary responses and comments, the first draft of the new GUM (JCGM 100:201X) prepared by WG 1 was presented to all member organizations of the JCGM, national metrology institutes, and other interested organizations. There was extensive discussion of the draft of the new GUM, many critical remarks were expressed, and the document is still being worked on. A seminar [14] was devoted to the application of the GUM and its planned revision.

Analysis of the materials of the seminar shows that in the draft of the new GUM, the law of transformation of uncertainties remains the basic method of calculating standard uncertainty when its application is properly justified [14, 15]. The draft of the new GUM is based exclusively on the Bayesian approach, and considerable attention is paid to the calculation of the standard uncertainties of the input quantities as standard deviations of the state-of-knowledge PDFs, which represent in total the available information on these quantities. An analysis of the specific possible changes in the GUM method of estimating measurement uncertainty will be presented in the second part of this article.

In the opinion of the authors, the revision of the GUM based on a uniform Bayesian approach is an important step in the development of the concept of uncertainty and in the elaboration of a consistent and theoretically well-founded method of estimating the quality of measurement. Among the objectives of the revision of the GUM is the precise delineation of its region of application. It is also important to clarify the place of uncertainty among the quantitative measures of the accuracy of measurement. In the current GUM, great attention is paid to the comparison of uncertainty and measurement error. One might even say that the concept of uncertainty is to some extent presented "by contradiction," as an opposition to the error of measurement. Certainly, one must expect that this will not be the case in the new GUM, but the issue of comparing the two concepts remains open. This opposition has been preserved also in relation to such a generalized concept as the accuracy of measurement. It should be noted that uncertainty is mentioned in the draft of the new GUM exclusively as a measure of the quality of measurements (without mention of the accuracy of measurement), although the definition of the concept of uncertainty indicates that what is at issue is the quantitative expression of the accuracy of measurement. It is possible that examining uncertainty as one of the possible measures of the accuracy of measurement would permit a more logical connection between the calculation of uncertainty, the estimation of the precision and trueness of measurement methods [16], and the use of uncertainty in conformity assessment.