Introduction. In Russian metrological practice, the problem of measuring instrument calibration on the basis of a correction function emerged following the publication of a Russian translation of the ISO publication Guide to the Expression of Uncertainty in Measurement (GUM), whose editorial process did not include a scientific review [1]. The absence of such commentary was compensated to some extent by a paper by the present author [2] that discussed the concept of measurement uncertainty as an indirect measurement method according to a subjective probability interpretation, as well as providing an example of thermometer calibration according to a correction function. Here, the concept of measurement uncertainty was discussed in terms of an error contradiction taking the form of so-called normal theory, whose lack of statistical inferential logic for the new stage of metrology development was disguised by statistical terminological misrepresentation; as a condition of Russia’s accession to the WTO; and, most importantly, as “the drama of inadequacy.”

Although there are several studies addressing the calibration problem (e.g., [3,4,5,6,7,8]), the fundamental issue concerning the origin of the so-called measurement equation, comprising an indirect measurement method for solving the measurement problem, is generally avoided, including by the authors of the aforementioned works. Moreover, this issue is not problematized under the concept of measurement uncertainty.

In fact, while, on closer examination, GUM claims to provide the only realistic characteristics of measurement accuracy, it correctly cites the primary cause for the ambiguity of “measurement uncertainty” as being the inadequate mathematical description of quantities using measurement equations. Thus “although a measurand should be defined in sufficient detail that any uncertainty arising from its incomplete definition is negligible in comparison with the required accuracy of the measurement, it must be recognized that this may not always be practicable” [1]. This “not always” part is precisely the reason why the applicability of GUM cannot be verified in practice. In fact, the issue of negligible uncertainty [1] remains unresolved in GUM.

A different view on corrections during calibration is presented in MI 1747-87, MU GSI. General-Purpose and Reference Weights. Verification Procedure, and GOST 7328-2001, Weights. General Specifications. In 2002, another paper by the present author [9] provided an example of the structural and parametric identification of a mass scale according to R 50.2.004-2000, GSI. Determination of the Characteristics Exhibited by the Mathematical Models of Dependences between Physical Quantities when Solving Measurement Problems. Basic Provisions. Here GO-1-1110, a set of E2 class weights defined against the working measurement standard of the kilogram, served as the physical model, while the mathematical models of the weights consisted in the convolutions of probability distributions in the cross-sectional observation of inadequacy-related error [10]:

$$ {f}_{\Delta}\left(\Theta; \updelta \right)=\left[{F}_{\ast}\left(\updelta +{\uptheta}_0\right)-{F}_{\ast}\left(\updelta -{\uptheta}_0\right)\right]/\left(2{\uptheta}_0\right), $$

where F*(δ) – type “*” distribution function of the observed inadequacy-related error component; θ0 – scattering parameter of its unobserved component. The consideration of the unobserved error component led to rejecting the hypothesis about the conformity of the weights to the accuracy class E2.
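The convolution above can be evaluated numerically. The following is a minimal sketch, not the author's program: it assumes that F* is a zero-mean normal distribution function (the source leaves the type “*” open) and reads θ0 as the half-width of a uniformly distributed unobserved component; the names normal_cdf and convolved_density are hypothetical.

```python
import math


def normal_cdf(x, sigma):
    """Zero-mean normal distribution function (an assumed choice for F*)."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


def convolved_density(delta, cdf, theta0):
    """f(delta) = [F*(delta + theta0) - F*(delta - theta0)] / (2 * theta0)."""
    return (cdf(delta + theta0) - cdf(delta - theta0)) / (2.0 * theta0)


if __name__ == "__main__":
    # Hypothetical parameters: sigma = 1.0 for the observed component,
    # theta0 = 0.5 for the unobserved one.
    for d in (-1.0, 0.0, 1.0):
        value = convolved_density(d, lambda x: normal_cdf(x, 1.0), 0.5)
        print(d, value)
```

Any other distribution function can be passed as cdf without changing convolved_density, which is the reason for keeping the two functions separate.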

In 2007, the International Vocabulary of Metrology: Basic and General Concepts and Associated Terms (VIM3) gave the following definition of calibration [11]:

  • 2.39 (6.11) calibration: operation that, under specified conditions, in a first step, establishes a relation between the quantity values with measurement uncertainties provided by measurement standards and corresponding indications with associated measurement uncertainties and, in a second step, uses this information to establish a relation for obtaining a measurement result from an indication.

  • Note 1. A calibration may be expressed by a statement, calibration function, calibration diagram, calibration curve, or calibration table. In some cases, it may consist of an additive or multiplicative correction of the indication with associated measurement uncertainty. ... Note 3. Often, the first step alone in the above definition is perceived as being calibration.

Note 3 to the term 2.39 presented in VIM3 refers to the result of calibration in the international context “often” comprising a table containing combined indications of the measurement standard (calibrator) and measuring instruments (MIs).

In practice, however, authors often show unawareness of the main issue pertaining to the concept of measurement uncertainty, i.e., “the drama of inadequacy” [2], and do not mention the problems related to “specified conditions” and “indications with associated measurement uncertainties.” Nevertheless, it is the unresolved problem of negligible inadequacy that prompted the international standard ISO/IEC 31010:2009, Risk Management. Risk Assessment Techniques, to clearly state for the first time since GUM that the Bayesian approach and Monte Carlo method were not applicable for calculating risk. Admittedly, the shortcomings inherent to the subjective concept of probability and the statistical modeling method had been known long before the publication of GUM.

The aforementioned statement given in ISO/IEC 31010:2009 was used in GOST R ISO/IEC 31010-2011, Risk Management. Risk Assessment Techniques.

In addition, the VIM3 definition of the term 2.39 (calibration) suggests that the relation established during the first calibration step comprises a MI transformation function, whereas the relation established during the second calibration step is the characteristic understood as the inverse of the transformation function. This represents yet another approach to solving the MI calibration problem, regarding which there is ambiguity in RMG 29-2013.

A definition of calibration performed “under specified conditions” is also given in GOST OIML R 111-1-2009, GSI. Weights of Classes E1, E2, F1, F2, M1, M1–2, M2, M2–3, and M3. Part 1. Metrological and Technical Requirements:

  • 2.3 calibration: set of operations that establish, under specified conditions, the relationship between values of quantities indicated by a measuring instrument or measuring system, or values represented by a material measure or a reference material, and the corresponding values realized by standards.

  • Note 3. The result of a calibration may be recorded in a document, sometimes called calibration certificate or calibration report.

In addition, VIM3 defines the term measurement result as follows [11]:

  • 2.9 (3.1) measurement result: set of quantity values being attributed to a measurand together with any other available relevant information.

  • Note 1. A measurement result generally contains “relevant information” about the set of quantity values, such that some may be more representative of the measurand than others. This may be expressed in the form of a probability density function (PDF).

Another definition of the term calibration is provided in RMG 29-2013, GSI. Metrology. Basic Terms and Definitions, without taking “the specified conditions” into account:

  • 9.6 calibration (of measuring instruments): set of operations that establish a relationship between the quantity value indicated by a given measuring instrument and the corresponding quantity value realized by a measurement standard in order to ascertain the metrological characteristics of this measuring instrument.

  • 9.7 calibration diagram: graphical expression of the relationship between the indication and the corresponding measurement result.

GOST R 8.879-2014, GSI. Procedures of Measuring Instrument Calibration. General Requirements for the Contents and Presentation, introduces the following terms:

  • 1.3 validation: confirmation via examination and provision of objective evidence that particular requirements for the specific intended use are fulfilled.

  • 1.4 calibration certificate: document certifying the fact and results of measuring instrument calibration, which is issued by the organization performing calibration.

  • 2.5 target measurement uncertainty: measurement uncertainty predetermined as the upper limit and adopted based on the assumed intended use of measurement results.

In other words, GOST R 8.879-2014 requires that the compliance of calibration results be ascertained (as with verification) against MI validation requirements, which must be confirmed by a calibration certificate according to objective evidence.

However, the problem consists in more than just MI calibration employing verification procedures under normal conditions, since the calibration protocols often differ from the verification protocols in name only.

Firstly, “calibration is generally performed to enable the use of a measuring instrument under conditions characterized by a set of influence quantities. Therefore, calibration facilities should include test equipment reproducing these conditions with the required accuracy” [12].

Secondly, the position characteristic of the calibration diagram should be the solution to the problem of the structural and parametric identification of the calibration function of one or more arguments, which are associated with various statistical assumptions. Conversely, the scattering characteristic of the calibration diagram requires a solution to the problem related to the statistical testing of hypotheses, concerning the probability distribution type of the random component of deviations from the calibration function and the construction of its convolution with unobservable measurement and inadequacy-related error components. Moreover, inadequacy-related errors can be estimated during the solution of both problems performing simultaneous measurements with the use of cross-sectional observation.

Thirdly, measurements conducted during MI calibration yield results of varying accuracy due to the non-additive nature of calibrator errors, which depend on the value of reproducible quantities.

Finally, GOST ISO/IEC 17025-2019, General Requirements for the Competence of Testing and Calibration Laboratories, requires an objective assessment of risks associated with statistical assumptions, as well as false-positive and false-negative solutions. It was for this reason that GOST R 58771-2019, Risk Management. Risk Assessment Techniques, was hastily adopted to address the case. The shortcomings of the concept of measurement uncertainty specified in GOST R ISO/IEC 31010-2011 were renamed in GOST R 58771-2019 as limitations, while the statements about the inapplicability of the Bayesian approach and Monte Carlo method for calculating risk were excluded.

However, this quick fix only added to the criticality of the situation, with neither the Guide [1] nor GUM revision drafts providing statistical tests for hypotheses.

Moreover, the introduction of the term definitional uncertainty in VIM3-2007 and RMG 29-2013 brought the solution of the inadequacy-related problem to a dead end “in any measurement of a given measurand”:

  • 5.44 definitional uncertainty: component of measurement uncertainty, resulting from the finite amount of detail in the definition of a measurand.

  • Note 1. Definitional uncertainty is the practical minimum measurement uncertainty achievable in any measurement of a given measurand.

  • Note 2. Any change in the descriptive detail leads to another definitional uncertainty.

After all, it is precisely such detailing that requires the statistical testing of hypotheses about the structure and parameters of the so-called measurement equation.

Issue regarding the confidence level norm. GOST 8.061-80, GSI. Verification Schemes. Scope and Layout, states that:

  • 2.9.2. The errors of reference measuring instruments should be characterized by the limit of error ... or the confidence limit of error ... at the corresponding confidence level. A single confidence level is adopted for each hierarchy scheme, which is chosen from the following options: 0.90, 0.95, or 0.99.

  • 2.9.3. The metrological characteristics of ordinary measuring instruments must meet the requirements of GOST 8.009-84. The errors of ordinary measuring instruments should be characterized by the limit of error of measuring instruments.

For symmetric bounds, this limit is denoted by the symbol Δp.

The word “could” is used imprudently in the definition provided in GOST 8.009-84, GSI. Standardized Metrological Characteristics of Measuring Instruments:

3.17. Permissible limits ... are the bounds of an interval, within which, with a probability of P = 1, the value of the characteristic ... of a measurement instrument sample falls. The probability P = 1 is a reference characteristic, whose individual control could be omitted during testing and verification of measuring instruments.

In fact, it is this probability that is not controlled at all.

MI 1317-2004, GSI. Results and Characteristics of Measurement Errors. Forms of Presentation. Application in Testing Product Samples and Controlling their Parameters, (Note 4 to Table 1) states that:

Table 1. CMSU Test Results

The permissible limits of characteristics exhibited by error define the interval, within which the given characteristic lies, i.e., they correspond to the probability of finding the characteristic in this interval, which is equal to one.

In addition, RMG 29-2013 provides the following definitions:

  • 5.22 confidence bounds (of measurement error): upper and lower bounds of the interval, within which, with a specified probability, the value of measurement error falls.

  • Note 1. At a probability equal to 1, confidence bounds are called error bounds.

  • Note 2. Confidence bounds of error are sometimes incorrectly referred to as confidence error.

  • 9.1 uniformity of measurement: state of measurements in which the results thereof are expressed in legal units of quantities or values according to the established measurement scales, with the measurement accuracy indicators lying within the established bounds.

If we replace the word “limits” in Clause 13.1 of RMG 29-99, GSI. Metrology. Basic Terms and Definitions, with the word “bounds,” it becomes apparent that the definition presented in Clause 9.1 of RMG 29-2013 lacks an integral component of requirements for ensuring uniform measurements, i.e., the confidence level:

  • 13.1 uniformity of measurement (traceability): state of measurements in which the results thereof are expressed in legal units of physical quantities, whose sizes within the established limits are equal to those of units reproduced by primary measurement standards; the errors of measurement results are known and, with the specified probability, do not exceed the established limits.

This specified probability constitutes the norm, a requirement both for the validity of verification results and the width of the MI calibration diagram.

That being said, it remains questionable whether it is possible to stay within the established limits with the specified probability.

Thus, a question arises about the possibility of applying the results of MI verification and calibration under normal conditions to the operating conditions of measurements – “the issue of 35%” found in GOST 8.395-80, GSI. Normal Conditions of Measurements during Verification. In this case, would it not be reasonable to introduce a coefficient by which the 0.35 acceptance tolerance limit could be lowered? From this perspective, MI calibration performed under normal conditions is meaningless, since it is necessary to reproduce the operating conditions. However, this issue constitutes a particular case of a broader problem statement.

MI calibration under specified conditions. The defining metrological characteristic of any MI is an error function Δ(x, x1, x2, ..., xW), generally described by the multidimensional probability distribution of its possible error values within the measurement range for quantity X and the variation range for influence quantities X1, X2, ..., XW. An almost inverse metrological characteristic of an MI consists in the correction function ∇(y, x1, x2,..., xW), which represents a multidimensional distribution of possible correction values within the ranges of its indications Y and influence quantities X1, X2, ..., XW. Both functions have two characteristics: position and scattering.

Under normal conditions (according to GOST 8.395-80), the error and correction functions are assumed to be the functions only of the measurand and instrument indications, respectively. In the course of MI tests, both these functions are sampled at the measurement plan points; their observed components (i.e., the differences between the indications of the standard and the MI) differ only in the statistical representation sign and random component formation. The systematic component of these functions, which is generally estimated by performing regression analysis on the basis of the random component, plays a leading role when developing a measurement plan in the procedures of MI verification and calibration.

A measurement plan for verification / calibration procedures establishes the number M, placement of verification/calibration points on the MI scale, as well as the number n and the sequence of measurements at each of them.

In MI verification procedures, the parameter Ωp characterizes the quality of the measurement plan or a complementary component of error arising due to the drift or shift of the continuous normalized function δ′(x) = ΔOS(x) of the systematic component of the intrinsic error within the MI measurement range relative to the fixed verification points. According to MI 187-86, MU GSI Measuring Instruments. Validation Criteria and Parameters of Verification Procedures, this parameter is defined as a permissible “difference between the maximum modulus of the continuous normalized function δ′(x) and its value at the neighboring verification point when selecting M.” In accordance with MI 188-86, MU GSI Measuring Instruments. Determination of Parameters of Verification Procedures, it is assumed that for many MIs, the systematic component of the error function ΔS(x) (position characteristic), described by a trigonometric or power polynomial of degree 3 or less, makes no more than one or two complete oscillations within the measurement range. Thus, the difference Ω between the maxima of the function æS(x) = |ΔS(x)|/Δp and its values at neighboring verification points must satisfy the condition Ω ≤ Ωp ~ 0.05–0.10. Given the current methods of selecting verification points, their number M does not exceed 5–6.
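The condition Ω ≤ Ωp can be checked numerically for a candidate set of verification points. The sketch below is an illustration under stated assumptions rather than the MI 187-86 or MI 188-86 procedure: the systematic component ΔS(x) is a hypothetical quadratic, the limit Δp is set to 1, only the global maximum of the normalized function is considered, and the function name plan_shift is invented for this example.

```python
def plan_shift(delta_s, limit, points, lo, hi, grid=1000):
    """Difference between the maximum of |delta_s(x)| / limit on [lo, hi]
    and the value of the same function at the nearest verification point."""
    xs = [lo + (hi - lo) * i / grid for i in range(grid + 1)]

    def normalized(x):
        return abs(delta_s(x)) / limit

    x_max = max(xs, key=normalized)                           # grid location of the maximum
    nearest = min(points, key=lambda p: abs(p - x_max))       # neighboring plan point
    return normalized(x_max) - normalized(nearest)


if __name__ == "__main__":
    # Hypothetical systematic component with one extremum at x = 0.5.
    delta_s = lambda x: 2.0 * x * (1.0 - x)
    omega_p = 0.10                                            # upper end of the 0.05-0.10 range
    omega = plan_shift(delta_s, 1.0, [0.0, 0.3, 0.7, 1.0], 0.0, 1.0)
    print(omega, omega <= omega_p)
```

Placing a verification point at the extremum itself drives the shift to zero, which is consistent with the rule of selecting points at the extrema of ΔS(x).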

At ΔS(x) = a0 + a1x, the measurement plan must be uniform, while in the other cases, the verification points are selected at the extrema of the function ΔS(x) and breakpoints. If the normalized function of the systematic component of the intrinsic error ΔS(x) specified in MI 187-86 is continuous, the minimum required number of verification points is equal to the number of parameters of the mathematical model of ΔS(x) plus one, which meets the condition of the cross-sectional observation of inadequacy-related error according to R 50.2.004-2000. If ΔS(x) exhibits breakpoints, the number of verification points is increased.
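For the linear case, the rule on the minimum number of verification points can be sketched as follows. The function uniform_plan and its arguments are hypothetical; the sketch covers only a continuous ΔS(x) without breakpoints and places the points uniformly, as required for ΔS(x) = a0 + a1x.

```python
def uniform_plan(lo, hi, n_params, n_repeats):
    """Return (point, repeats) pairs for a uniform measurement plan with
    M = n_params + 1 verification points on the range [lo, hi]."""
    if n_params < 1:
        raise ValueError("the model must have at least one parameter")
    m = n_params + 1                       # one point more than the parameter count
    step = (hi - lo) / (m - 1)
    return [(lo + k * step, n_repeats) for k in range(m)]


if __name__ == "__main__":
    # A linear model a0 + a1*x has two parameters, hence three points.
    for point, repeats in uniform_plan(0.0, 100.0, n_params=2, n_repeats=5):
        print(point, repeats)
```

The extra point beyond the parameter count is what leaves a residual available for estimating the inadequacy-related error of the model.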

It seems sufficient to consider the parameter Ωp only when developing the verification procedure. However, we cannot discard the following aspects: the spread of characteristics exhibited by MIs belonging to one type, drift thereof during operation and testing, inadequacy-related errors of the mathematical model of ΔS(x), and the probability distribution of deviations from it. Moreover, experience has shown that complementary errors associated with the parameter Ωp, which are comparable to verification errors, require statistical control [13]; this is also characteristic of the systematic component of the correction function.

Thus, the most important steps in solving the calibration problem under specified conditions comprise a statement of the measurement problem in accordance with R 50.2.004-2000, MI 1317-2004, and MI 2916-2005, GSI. Identification of Probability Distributions When Solving Measurement Problems, along with the selection of test equipment.

The problem statement must contain a mathematical model of the calibration curve structured in binary code to enumerate possibilities (i.e., detailing) and a set of truncated distributions to form the boundaries of the calibration diagram.

In this respect, a case in point is the MCM-stat M program recommended in R 50.2.004-2000, which has been used for calibration purposes since 1998. The program identifies the systematic component of the correction function (position characteristic) having the binary code structure θ = θ000θ100θ010…θmij…θKKK:

$$ {\tilde{\nabla}}_{\upvartheta}\left({x}_1,{x}_2,\kern0.5em \dots, {x}_M\right)=\sum \limits_{i=0}^I\sum \limits_{j=0}^J\dots \sum \limits_{m=0}^M{\upvartheta}_{ijm}{\uptheta}_{ijm}{x}_1^i{x}_2^j\dots {x}_M^m,\kern1em \max \left\{I,J,M\right\}\le K, $$

where K sets the model of maximum complexity, while its shift Ωp is determined by the difference between the average modulus of inadequacy-related error (AMIE) and the mean absolute deviation (MAD) of the observed component of corrections.
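The role of the binary structure code can be illustrated for two arguments. This is a sketch only and does not reproduce the MCM-stat M program: the coefficients are hypothetical, the structure code is stored as a dictionary keyed by exponent pairs (i, j), and the names correction and structures are invented for the example.

```python
from itertools import product


def correction(x1, x2, theta, mask):
    """Sum of mask[i, j] * theta[i, j] * x1**i * x2**j over all listed terms."""
    return sum(mask[k] * theta[k] * x1 ** k[0] * x2 ** k[1] for k in theta)


def structures(max_degree):
    """Enumerate every binary structure code for exponents 0..max_degree."""
    terms = [(i, j) for i in range(max_degree + 1) for j in range(max_degree + 1)]
    for bits in product((0, 1), repeat=len(terms)):
        yield dict(zip(terms, bits))


if __name__ == "__main__":
    # Hypothetical coefficients for the terms 1, x2, x1 and x1*x2.
    theta = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 4.0}
    linear = {(0, 0): 1, (0, 1): 1, (1, 0): 1, (1, 1): 0}   # cross term switched off
    print(correction(2, 5, theta, linear))
    print(sum(1 for _ in structures(1)))                    # number of candidate structures
```

Each binary digit switches one term of the model of maximum complexity on or off, so structural identification amounts to a search over these codes.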

Theoretically, any model can be used so long as the errors arising from its inadequacy for the position characteristic in cross-sectional observation are estimated. The scattering characteristic, on the other hand, requires the use of contour estimates and the establishment of the confidence level norm in accordance with the state hierarchy scheme.

The easiest way to create an extended measurement plan for determinative tests under specified conditions is to introduce intermediate calibration points. Under these circumstances, it becomes possible to ensure the correctness of selected parameters while testing the verification procedure, for which purpose so-called “extended verification” is performed [14]. Data from type approval tests can also be used to calibrate MIs since these tests determine the dependence of MI metrological characteristics on various influence quantities.

Drawing on the determinative test data and taking the varying accuracy of measurements into account, the identification of the boundaries of the calibration diagram employing MMI-verification and MCM-calibration programs on the basis of the truncated probability distributions requires preliminary normalization according to MI 187-86, MI 1317-2004, and MI 2916-2005. Thus, the calibration result can be obtained using normalizing values.

Example of calibration under specified conditions. On December 5, 2010, the launch of a Proton-M rocket from the Baikonur Cosmodrome, having a Block DM-03 upper stage and carrying three GLONASS-M navigation satellites, resulted in an accident. The rocket, upper stage, and satellites fell into the Pacific Ocean. While all the launch vehicle systems had operated correctly, the satellites were lost due to applying the wrong formula in the design documentation, resulting in an additional 1.5 tons of liquid oxygen being poured into the upper stage.

On February 20, 2020, the launch of a Meridian-M telecommunications satellite in Plesetsk nearly caused an accident due to the third-stage engine of the Soyuz 2.1a rocket shutting down a few seconds prematurely. Fortunately, the accident was prevented thanks to the intelligent control system and propellant reserves of the Fregat upper stage, which were used to compensate for the lack of velocity. While a signal about a lack of propellant had been generated during filling, experts had concluded that it resulted from a software error.

In order to demonstrate the calibration procedure, control and measurement subunits (CMSUs) were selected for determining the capacitance of filling level sensors. Table 1 presents the data on CMSU tests conducted in an MHU-CNSA environmental chamber in 2016 when measuring the capacitance of R597 reference measures according to Control and Measurement Subunits. Description of Type No. 64452-16; RT-PI-3173-551-2016, Control and Measurement Subunits. Type Approval Test Program; and R597 Reference Capacitance Measures. Description of Type No. 2684-70. The following symbols are used in Table 1: Xn, Xc – nominal and certified values of capacitance measures; X1 – CMSU indications; X2 – temperature in the environmental chamber; Δ = X1 – Xc; θC, θT – bounds of errors in the certification of capacitance measures C and temperature measurements T; Δp(Xn) – limit of CMSU error.

In accordance with R 50.2.004-2000, the following estimate was obtained for the structural and parametric identification of the systematic component of the normalized correction function Y′(Θ; X1; X2) by the minimum AMIE criterion using the MCMLSM algorithm (see Fig. 1):

$$ {Y}^{\prime}\left(\Theta; {X}_1,{X}_2\right)={\uptheta}_{00}+{\uptheta}_{10}{X}_1+{\uptheta}_{01}{X}_2\pm {\Omega}_{\mathrm{p}}, $$

where θ00 = 1.8974200·10⁻¹, θ10 = –1.8985258·10⁻⁴, θ01 = 2.5074412·10⁻⁴ (estimates of parameters are given in protocol form); given the average modulus of residual (AMR), the shift of the position characteristic, analogous to the concept of “residual systematic error,” comes to

$$ {\Omega}_{\mathrm{p}}=9.2938386\cdot {10}^{-2}-8.9129418\cdot {10}^{-2}=3.808968\cdot {10}^{-3}. $$

Fig. 1. MCM-stat M program: projections of Y′(X1, X2) on the planes {Y, X1} and {Y, X2} (a and b, respectively).
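The arithmetic of the estimate above can be reproduced directly. The sketch uses the parameter values quoted in the text; the sample point (X1 = 100, X2 = 20) is hypothetical and stands for whatever units the protocol uses, and the function name normalized_correction is an assumption.

```python
# Parameter estimates quoted in the text.
THETA_00 = 1.8974200e-1
THETA_10 = -1.8985258e-4
THETA_01 = 2.5074412e-4

# Shift of the position characteristic: difference of the two quoted moduli.
OMEGA_P = 9.2938386e-2 - 8.9129418e-2


def normalized_correction(x1, x2):
    """Return the lower bound, center, and upper bound of Y'(x1, x2)."""
    center = THETA_00 + THETA_10 * x1 + THETA_01 * x2
    return center - OMEGA_P, center, center + OMEGA_P


if __name__ == "__main__":
    print(OMEGA_P)
    # Hypothetical indication and chamber temperature.
    print(normalized_correction(100.0, 20.0))
```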

In GOST 8.317-80, GSI. State Primary Standard and State Hierarchy Scheme for Capacitance Measuring Instruments, the value of the maximum permissible error is normalized, which corresponds to a confidence level of P = 1 in GOST 8.061-80, GSI. Hierarchy Schemes. Scope and Layout, and MI 1317-2004.

The data presented in Table 1 were used to identify the CMSU calibration diagram according to the convolution of the distributions of error components using the MMI-verification program (Table 2).

Table 2. Data on the Identification of the Systematic Component of the Correction Function

For the observed calibration error component, uniform distribution was found to be the most probable, whereas the normalized contour estimate of the inadequacy-related error corresponded to an uncertainty interval of ±0.198594. In general, the trapezoidal convolution of calibration error components falls within the interval of normalized values [–0.331011; +0.318053].
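A trapezoidal density of this kind arises when two uniform distributions are convolved. The sketch below assumes that exactly two components are involved: the inadequacy-related one on ±0.198594 and a uniform observed component whose bounds are back-computed here from the reported interval. The source does not list those bounds, so they are an inference, and the function name trapezoid_pdf is hypothetical.

```python
REPORTED_LO, REPORTED_HI = -0.331011, 0.318053     # interval quoted in the text
INADEQUACY_HALF_WIDTH = 0.198594                   # contour estimate quoted in the text

# Back-computed bounds of the observed uniform component (an inference).
OBS_LO = REPORTED_LO + INADEQUACY_HALF_WIDTH
OBS_HI = REPORTED_HI - INADEQUACY_HALF_WIDTH


def trapezoid_pdf(x, a, b, c):
    """Density of the sum of independent U[a, b] and U[-c, c] variables."""
    lo, hi = a - c, b + c
    if x <= lo or x >= hi:
        return 0.0
    short, long_ = sorted((b - a, 2.0 * c))
    plateau = 1.0 / long_                           # height of the flat top
    edge_distance = min(x - lo, hi - x)             # distance to the nearer support bound
    return plateau * min(1.0, edge_distance / short)


if __name__ == "__main__":
    print(OBS_LO, OBS_HI)
    for x in (-0.3, 0.0, 0.3):
        print(x, trapezoid_pdf(x, OBS_LO, OBS_HI, INADEQUACY_HALF_WIDTH))
```

Under this reading the inadequacy-related component is the wider of the two, so it determines the height of the plateau, while the observed component determines the width of the sloping sides.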

Conclusion. Since the calibration of measuring instruments under specified conditions constitutes a part of a broader problem associated with developing, certifying, and testing calibration procedures, its solution is directly related to the certification of test equipment. Guided only by GOST R 8.879-2014, it is not possible to develop even the simplest calibration procedure that employs auxiliary equipment to reproduce influence quantities at the same time as meeting the requirements for uniform measurements, whose integral component is the confidence level norm established in state hierarchy schemes. In general, unless the solution to the calibration measurement problem is determined by the statistical inference logic for the mathematical model of the diagram, the assessment of risks according to GOST ISO/IEC 17025-2019 is meaningless.