1 Introduction

The first International Comparison of Absolute Gravimeters of the International Association of Geodesy (IAG) was hosted by the International Bureau of Weights and Measures (BIPM) in 1981 (Boulanger et al. 1983), initiating a successful 40-year cooperation between the geoscientific and metrology communities, developed especially within working groups of the IAG and the Consultative Committee for Mass and Related Quantities (CCM). The comparisons took place at regular four-year intervals in Paris until 2009 and, beginning in 2001, were augmented in particular by the so-called European comparisons in Walferdange (Francis et al. 2012) and the North-American comparisons in Boulder (Schmerge et al. 2012). More frequent comparisons became necessary after the widespread adoption of the commercial FG5 (Niebauer et al. 1995) and FG5X (Niebauer et al. 2011) absolute gravimeters (AG), which have demonstrated a repeatability better than 1.6 µGal, as shown, e.g., in Van Camp et al. (2005), Rosat et al. (2009) or Pálinkáš et al. (2010). Nevertheless, bias changes over time for a particular gravimeter (e.g. after maintenance) or bias differences between instruments have to be known at the microgal level (1 µGal = 10 nm/s²) to ensure consistent gravity measurements needed in geophysics (e.g. Van Camp et al. 2017), geodynamics (e.g. Olsson et al. 2016) and geodesy (Pálinkáš et al. 2013). In metrology, the highest requirements on g-values are related to the realization of the kilogram by the Kibble balance (Stock 2013; Robinson and Schlamminger 2016), where the absolute gravity value at the centre of the test mass has to be known with an accuracy better than 5 µGal (e.g. Jiang et al. 2013).

With the establishment of a new absolute gravity reference system (Wilmes et al. 2016), the international comparisons will gain importance as a backbone of its realization. The absolute gravimeters contributing to the realization of the system (International Gravity Reference Frame) should participate in the comparisons or validate their measurements at the reference stations established within the frame. The combination of results from comparisons and the monitoring of temporal gravity changes at reference stations equipped with superconducting gravimeters should make it possible to provide information on the biases of a particular AG.

Since at a given time only one gravimeter can measure at a given site, the comparisons were organized at stations with several sites allowing simultaneous measurements of different AGs, taking care to optimize the measurement schedule and also applying corrections for the time-variable gravity field during comparisons lasting from a few days to a few months. The reference values and biases of the comparisons were determined by an adjustment procedure that has evolved significantly over time, applying different weighting schemes and uncertainty estimates. Moreover, since 2009 (Jiang et al. 2012), as a consequence of the Mutual Recognition Arrangement of the International Committee for Weights and Measures (CIPM MRA), the international comparisons of AGs have been split into a key comparison (KC) and a pilot study, recently termed an additional comparison (Marti et al. 2014), to comply with the established rules in metrology. As a consequence, different subsets of gravimeters have to be considered in the comparisons, with different contributions to the comparison reference values. While gravimeters with a metrological background (assigned to a National Metrology Institute or Designated Institute, NMI/DI), providing a complete uncertainty budget with their results, take part in the KC and can contribute to the definition of the absolute reference values, the majority of gravimeters operated by geodetic or geophysical institutions (non-NMI/DI) can be treated in the adjustment procedure only as relative meters. At first glance, this separation seems illogical, especially when both groups of institutions use the same type of gravimeter (mostly FG5 or FG5X); however, it is a typical situation in metrology, where the NMI/DIs have the responsibility to maintain the national standards and should declare Calibration and Measurement Capabilities (CMC) published in the Key Comparison Database (KCDB), which comprises a review of laboratories and declared CMCs (including uncertainties). Therefore, in an ideal case, the measurements of NMI/DIs should be associated with an uncertainty budget in which all possible error sources influencing the measurements are taken into account, so that the declared uncertainties represent realistic estimates of the measured gravity values. Consequently, from a metrological point of view, the main goal of key comparisons is the validation of the CMCs published in the KCDB, in other words, to verify whether the determined bias (degree of equivalence) is consistent with the declared uncertainties. On the other hand, users from geodesy and geophysics plan to use the determined biases for correcting their measurements (Olsson et al. 2016; Pálinkáš et al. 2013) and in this way to obtain consistent time series, exploiting the excellent long-term repeatability of FG5 and FG5X (FG5/X) gravimeters. Nevertheless, the real accuracy of the gravity reference itself must always be taken into account when the “absolute g” is under discussion. This is relevant not only for the Kibble balance experiments, but also for long-term gravity time series in geosciences, since the realization of the gravity reference depends on the cloud of participating gravimeters. If one type of AG dominates (FG5/X since 1997, cf. Table 1), possible common systematic errors might affect the reference.
Therefore, it is necessary to investigate the systematic errors of the individual technologies, mainly the diffraction effect (Van Westrum and Niebauer 2003; Křen and Pálinkáš 2018) and the distortion effect (Křen et al. 2016) in the case of FG5/X gravimeters, and the wavefront aberration (Schkolnik et al. 2015) in the case of cold-atom gravimeters (e.g. Pereira and Bonvalot 2016). Of course, comparisons might also serve, as in Jiang et al. (2012), for an experimental determination of the biases between different technologies.

Table 1 List of key comparisons of absolute gravimeters organized by the Consultative Committee for Mass and Related Quantities (CCM) and EURAMET since 2009

As can be seen, the comparisons and their processing play a key role in the determination of the absolute gravity reference. Although the processing strategies have been evolving over time, several open questions still remain. In this paper, we present and discuss the results of reprocessing the data obtained in the recent International and European comparisons since 2009, see Table 1. The least squares approach is used for obtaining estimates of the reference values and biases of the comparisons. However, for the first time in gravimetry, it takes into account correlations between measurements. Such an approach has also been used for other comparisons in metrology (Sutton 2004; Woolliams et al. 2006). As discussed in White (2004) and Koo and Clare (2012), it allows appropriate uncertainty estimates to be obtained for the determined biases. Since the comparisons of gravimeters have several specific features, e.g. the existence of two groups of gravimeters (NMI/DI and non-NMI/DI), more than one reference value, and several measurements from a particular gravimeter, in Sect. 2 we show and explain, step by step, the important aspects that influence the elaboration of comparisons: (1) harmonization of uncertainties, (2) definition of the constraint, (3) link of the regional comparisons, (4) correlation of measurements, (5) consistency check and outlier detection, (6) uncertainty estimates for the results of the comparison. In the published reports of AG comparisons, some of these aspects have not been treated consistently and the final results have been associated with uncertainty estimates that do not always follow the law of error propagation. Acknowledging the enormous efforts in the preparation of these reports, the elaboration of comparisons described in Sect. 2 should not be understood as a criticism of the published results, but as an effort to contribute to the discussion on the elaboration of comparisons in general, and to find a consensus on the elaboration of AG comparisons in the future.

In Sect. 3, the results of the reprocessing are presented by two solutions for each comparison. The first solution, labelled KCN, takes into account the division of gravimeters, with only those belonging to NMI/DIs contributing to the definition of the KCRV (key comparison reference values). The second solution, labelled ICN, treats all gravimeters at the same level, i.e. all are considered appropriate to contribute to the definition of the reference values. According to the Strategy paper of CCM and IAG (Marti et al. 2014), the already published KC solutions are the official ones. Therefore, the KCN solutions provide improved estimates of the official results.

In Sect. 4, we analyse the results of the selected comparisons as a whole, to identify the significance of differences between different types of gravimeters and between comparison sites. Further, the variability of the biases of individual gravimeters is shown.

All the uncertainties reported in this study represent standard uncertainties with coverage factor k = 1, while expanded uncertainties (k = 2) are used in the KCDB. Accordingly, we use the symbol ± to express the standard uncertainty.

2 Elaboration of comparisons

Each gravimeter participating in a comparison is operated at several sites (usually 3, but 4 in 2017) and reports the results of the absolute gravity measurements \({g}_{\mathrm{raw}}\) associated with the standard uncertainty \({u}_{\mathrm{raw}}\). These measurements represent the mean acceleration of free fall at a given site at the specific measurement height of a gravimeter (Pálinkáš et al. 2012), corrected for defined geophysical effects (tides, atmospheric mass variations, polar motion) and all known instrumental effects. The reported values (\({g}_{\mathrm{raw}}\), \({u}_{\mathrm{raw}}\)) are then transferred to a common comparison reference height using vertical gravity gradients determined from measurements with relative gravimeters. If a superconducting gravimeter is operated at the station, corrections due to residual temporal gravity variations are also applied (Francis et al. 2012). Every measurement made by the gravimeter “i” (with a bias \({\delta }_{i}\)) at the site “j” during the comparison, at the given comparison reference height, may be described by the observation equation

$${g}_{ij}={g}_{j}+{\delta }_{i}+{\varepsilon }_{ij}$$
(1)

where \({\varepsilon }_{ij}\) is the random error associated with the measurement, distributed around zero mean, E(ε) = 0, with a variance \({s}_{i}^{2}\). The input g-values \({g}_{ij}\) are associated with standard uncertainties \({u}_{ij}\) that, besides \({u}_{\mathrm{raw}}\), also include the error contributions due to the vertical transfer and the temporal variations.

Equation (1) may be rewritten in matrix form as

$${\varvec{y}}={\varvec{X}}{\varvec{\beta}}+{\varvec{\varepsilon}}$$
(2)

where y is the column vector of all measured g-values, ε is the column vector of random errors, X is the design matrix representing the functional relationship between observations, sites and gravimeters, and \({\varvec{\beta}}=\left(\begin{array}{c}\varvec{g}\\ \varvec{\delta }\end{array}\right)\) represents a column vector of the unknown reference values \({\varvec{g}}\) at each site and biases δ for each gravimeter. It is clear that since the measurements are realized by different gravimeters at different times and sites, the input covariance matrix V associated with y will not be composed of equal diagonal elements, and weighted least squares have to be applied for the solution of (2). The corresponding weight matrix is W = V−1, assuming the standard error of unit weight is equal to one.

As the set of observation equations has no unique solution, a constraint, which can be interpreted as the definition of the consensus reference values, is required (White 2004). The reference values in absolute gravimetry, similarly to comparisons where the true value of an artefact is unknown, are obtained by constraining the biases of the participating laboratories/gravimeters as

$$ \mathop \sum \limits_{i = 1}^{n} {\textit {w}_i}\,\delta_{i} = l, $$
(3)

where \( {\textit {w}_i}\) are the weights assigned to each of the n gravimeters and l is the linking converter discussed in Sect. 2.3. The constraint (3) can be written in matrix form, using the column vector of weights \( {\varvec {w}}\) and consequently the column vector of constraint coefficients c, as one linear equation

$$\left(\begin{array}{cc}{\mathbf{0}} & {\varvec {w}}^{T}\end{array}\right) \left(\begin{array}{c}\varvec{g}\\ \varvec{\delta }\end{array}\right)={{\varvec{c}}}^{T}{\varvec{\beta}}=l$$
(4)

The linearly constrained weighted least squares problem that minimizes \({{\varvec{\varepsilon}}}^{T}{\varvec{W}}{\varvec{\varepsilon}}\) and fulfils the constraint (4) can be solved by the Lagrangian approach in the following form

$$ \left( {\begin{array}{*{20}c} {{\varvec{X}}^{T} {\varvec{WX}}} & {\varvec{c}} \\ {{\varvec{c}}^{T} } & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\varvec{\beta }}\\ k \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {{\varvec{X}}^{T} {\varvec{W}} {\varvec{y}}} \\ l \\ \end{array} } \right) $$
(5)

where k is the unknown Lagrange coefficient (a scalar quantity because only one constraint has been defined). In textbooks on adjustment theory, e.g. Reissmann (1976), it is shown that the inverse of the normal equations (the covariance matrix of unknown parameters) has the following configuration

$${\left(\begin{array}{cc}{{\varvec{X}}}^{T}{\varvec{W}}{\varvec{X}}& {\varvec{c}}\\ {{\varvec{c}}}^{T}& 0\end{array}\right)}^{-1}=\left(\begin{array}{cc}{{\varvec{V}}}_{\varvec{\beta\beta}} & {{\varvec{V}}}_{{\varvec{\beta}}{\varvec{k}}}\\ {{\varvec{V}}}_{{\varvec{\beta}}{\varvec{k}}}^{T} & {V}_{kk}\end{array}\right)$$
(6)

By this, the column vector of adjusted g-values and biases can be expressed as a linear relation between unknowns and measurements

$${\varvec{\beta}}={{\varvec{V}}}_{{\varvec{\beta}}{\varvec{\beta}}}{ {\varvec{X}}}^{T}{\varvec{W}}{\varvec{y}} + {{\varvec{V}}}_{{\varvec{\beta}}{\varvec{k}}} l={\varvec{R}}{\varvec{y}}+ {{\varvec{V}}}_{{\varvec{\beta}}{\varvec{k}}} l.$$
(7)

In our case, with one constraint defined by Eq. (3), \({{\varvec{V}}}_{{\varvec{\beta}}{\varvec{k}}}\) is a column vector containing the values 1 and − 1 for the biases δ and the reference values \({\varvec{g}}\), respectively. This is because the linking converter only shifts the mean level by l: a positive shift of the biases by l causes a shift of the reference values by − l. Therefore, the covariance matrix associated with the estimated unknowns is obtained as

$$ {\text{cov}} \left( {\varvec{\beta}} \right) = {\varvec{R}}\,{\varvec{V}}\,{\varvec{R}}^{T} + V_{ll} {\varvec{I}}, $$
(8)

where \({V}_{ll}\) is the variance of the linking converter, represented by the squared uncertainty of the linking converter, and I is the identity matrix.
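To make the adjustment explicit, the following Python/NumPy sketch implements Eqs. (2)–(8) for generic inputs; the function name and the variable layout are illustrative assumptions and do not represent the software actually used for the comparisons.

```python
# Minimal sketch of the linearly constrained, weighted least squares
# adjustment of Eqs. (2)-(8); all names and inputs are illustrative.
import numpy as np

def adjust_comparison(y, X, V, c, l=0.0, V_ll=0.0):
    """y : (no,) measured g-values at the comparison reference height
       X : (no, nu) design matrix linking measurements to sites and biases
       V : (no, no) input covariance matrix of the measurements
       c : (nu,) constraint coefficients (zeros for the reference values,
           normalized weights w_i for the biases), cf. Eq. (4)
       l : linking converter, V_ll its variance"""
    W = np.linalg.inv(V)                        # weight matrix W = V^-1
    N = X.T @ W @ X                             # normal equations
    cc = c.reshape(-1, 1)
    # bordered system of Eq. (5): unknowns beta and Lagrange multiplier k
    A = np.block([[N, cc], [cc.T, np.zeros((1, 1))]])
    b = np.concatenate([X.T @ W @ y, [l]])
    sol = np.linalg.solve(A, b)
    beta = sol[:-1]
    V_bb = np.linalg.inv(A)[:-1, :-1]           # V_beta_beta from Eq. (6)
    R = V_bb @ X.T @ W                          # beta = R y + V_bk l, Eq. (7)
    cov_beta = R @ V @ R.T + V_ll * np.eye(len(beta))   # Eq. (8)
    return beta, cov_beta, R
```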

2.1 Harmonization of uncertainties

Not all non-NMI/DI laboratories present a full uncertainty budget, and it is therefore legitimate to assume that some error sources might have been left unaccounted for or underestimated. This becomes obvious from the declared measurement uncertainties of the FG5/X gravimeters at all the comparisons except for 2018, where the uncertainties declared by non-NMI/DIs were roughly 20% lower for the same type of gravimeter. This causes problems especially in the case of the weighted constraint, and therefore we decided to harmonize the uncertainties of the FG5/X gravimeters, similarly as in Pálinkáš et al. (2017). Such a harmonization should ensure a more realistic weighting of the g-values in Eq. (1) and especially in the constraint given by Eq. (3). We determined an average uncertainty of 2.4 µGal from the uncertainties declared by NMI/DIs for the FG5/X gravimeters. Declared uncertainties of those FG5/X gravimeters that were lower than this value were changed to 2.4 µGal in the ICN solution. In the case of the KCN solution, only the uncertainties of non-NMI/DIs were harmonized.

2.2 Construction of the input covariance/weight matrix

As discussed in Koo and Clare (2012), of special importance in the analysis of comparison results is the construction of the input covariance matrix V associated with the measurements y. For the comparison results documented in gravimetry, a diagonal weight matrix W = V−1 has always been applied, obtained from the diagonal covariance matrix V based on the squared declared uncertainties. It is important to note that the weight matrix has no impact on the definition of the mean level of the reference values (g-values in an absolute sense), since it is only connected with Eq. (1), where all absolute gravity measurements are treated relative to the (unknown) reference value of each site and to the systematic deviation (bias) of each gravimeter from this value. Similarly, it would be possible to include measurements of a relative gravimeter in Eq. (1). Therefore, it is adequate to include the measurements of non-NMI/DI gravimeters in the official metrological key comparison solution and thereby to improve the robustness of the gravity differences between reference values. The covariance or weight matrix within the constrained least squares approach describes the stochastic component of the functional model and therefore only the capability of an absolute gravimeter to determine gravity values in a relative sense, as observed by a relative gravimeter. Strictly speaking, the covariance matrix should therefore reflect the measurement repeatability instead of the uncertainty, which is also obvious from the definition of ε in Eq. (1). However, if the contribution of the systematic error components is excluded from the uncertainties, error propagation leads to inappropriate covariance estimates for the unknowns from Eq. (8). Nevertheless, the results published in Koo and Clare (2012) show that there is no difference in the estimates of the unknown parameters, regardless of whether V is constructed only as a diagonal matrix from the measurement repeatability \({s}^{2}\) or from the measurement uncertainty \({u}^{2}\), once the correlations between measurements are accounted for in the non-diagonal elements of V as \({cov}={u}^{2}-{s}^{2}\). Therefore, the same covariance matrices can be used for the determination of the unknowns in Eq. (7) and of their error estimates by Eq. (8).

As pointed out in White (2004), the uncertainties reported by the laboratories should be separated into two parts: “…that characterizing the laboratory repeatability and that characterizing the range of values that may reasonably be attributed to the laboratory bias”. Do we have enough information to provide such a separation in absolute gravimetry? Certainly yes, at least for the most common FG5/X gravimeters, whose repeatability is known from the combination of measurements with superconducting gravimeters at reference stations, as mentioned in Sect. 1. Further, the repeatability of different gravimeters has been computed from the scatter of the measurements of a particular gravimeter with respect to the reference values at comparisons. From the published results (Francis et al. 2012, 2015; Jiang et al. 2012; Pálinkáš et al. 2017), the repeatability of an FG5/X is about s = 1.2 µGal, i.e. roughly half of the typical FG5/X uncertainty (u = 2.4 µGal). Generally, this means that the measurements of a particular gravimeter are correlated, which should be reflected in the input covariance matrix V by introducing non-diagonal elements at least for the measurements made with the same gravimeter. For the given example of an FG5/X gravimeter with an uncertainty of 2.4 µGal and a repeatability of 1.2 µGal, the covariance will be \({cov}={u}^{2}-{s}^{2}=4.32\,\upmu{\mathrm{Gal}}^{2}\), describing the joint variability of an FG5/X due to errors common to all measurements of the laboratory, expressed by a correlation coefficient of \(\rho =({u}^{2}-{s}^{2})/{u}^{2}=0.75\). Therefore, we used this correlation coefficient to determine the covariances for a particular FG5/X “i” from the harmonized uncertainties \({u}_{ij}\) as \({cov}=0.75\, {{u}_{ij,\mathrm{min}}}^{2}\), where \({u}_{ij,\mathrm{min}}\) is the minimum of all \({u}_{ij}\). This approach can be understood as if the measurements of a particular gravimeter carried out within a few days were affected by a group of systematic errors that remain the same.
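As an illustration of this construction, the following sketch assembles an input covariance matrix V with the correlation rule described above; the function name and identifiers are hypothetical and only mirror the relation cov = 0.75 u_ij,min² for measurements of the same gravimeter.

```python
# Sketch of the input covariance matrix V of Sect. 2.2; off-diagonal
# elements for measurements of the same gravimeter are set to
# rho * min(u_ij)^2, with rho = 0.75 assumed for FG5/X-type instruments.
import numpy as np

def build_covariance(u, meter_id, rho=0.75):
    """u        : (no,) harmonized standard uncertainties u_ij (µGal)
       meter_id : (no,) gravimeter identifier of each measurement"""
    u = np.asarray(u, dtype=float)
    no = len(u)
    V = np.diag(u ** 2)
    # minimum harmonized uncertainty per gravimeter (u_ij,min)
    u_min = {m: u[[k for k in range(no) if meter_id[k] == m]].min()
             for m in set(meter_id)}
    for i in range(no):
        for j in range(i + 1, no):
            if meter_id[i] == meter_id[j]:
                V[i, j] = V[j, i] = rho * u_min[meter_id[i]] ** 2
    return V

# Example: one FG5/X with u = 2.4 µGal and repeatability s = 1.2 µGal gives
# cov = u^2 - s^2 = 4.32 µGal^2, i.e. rho = 0.75, as in the text.
```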

For a few other types of instruments, such as the A-10 and CAG-01, a similar ratio between repeatability and declared uncertainty has been published (Falk et al. 2012; Karcher et al. 2018); therefore, we decided to apply the approach described for the FG5/X to all other types of gravimeters, except for the rise-and-fall gravimeter IMGC-02 (D’Agostino et al. 2008), for which random errors dominate the error budget (A. Prato, personal communication) and the covariances have been set to zero.

A comparison between the standard approach (V is diagonal with elements \({u}^{2}\)) and our approach, where the covariances \({cov}={u}^{2}-{s}^{2}\) corresponding to a correlation coefficient of 0.75 appear in the off-diagonal elements, showed that the influence on the estimates of the unknowns is negligible, with differences below 0.1 µGal. However, as expected, the error estimates are significantly different, as shown in Fig. 1. Here, the variances of the estimated biases are roughly twice as large as those obtained without accounting for the correlations.

Fig. 1

Comparison of covariance matrices related to the measurements (Top) and the estimates of unknowns (Bottom) for comparison in 2018 with 48 measurements of 16 gravimeters (unknowns #1–#16) at 4 sites (unknowns #17–#20). Left: All the measurements are treated as independent with diagonal elements computed as square of declared uncertainties. Right: Correlation with coefficient of ρ = 0.75 is taken into account for measurements of a particular gravimeter

2.3 Construction of the constraint

The constraint given by Eq. (3) defines the consensus value of a comparison in gravimetry, since the true g-values are unknown. The weighted constraint was used for the processing of the comparisons in 2009, 2015, 2017 and 2018 (Jiang et al. 2012; Pálinkáš et al. 2017; Wu et al. 2020; Falk et al. 2020). On the other hand, a non-weighted constraint was used for the processing of the comparisons in 2011 and 2013 (Francis et al. 2012, 2015).

The linking converter l in Eq. (3) is conventionally taken to be zero in CCM key comparisons. However, in regional comparisons that have to be linked to a CCM comparison, l should rigorously be computed as the (weighted) average of the biases obtained at this CCM comparison for those gravimeters that provide the link. In such a case, the sum of the respective weights in Eq. (3) must be Σwi = 1.
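One possible computation of the linking converter is sketched below; the inverse-variance weighting of the linking biases and the simple variance propagation are assumptions made for illustration only, since the weighting of the link is not prescribed beyond the condition Σwi = 1.

```python
# Hypothetical sketch of the linking converter l of Sect. 2.3: a weighted
# average of the biases of the linking gravimeters at the CCM comparison.
# Inverse-variance weights and uncorrelated biases are assumed here.
import numpy as np

def linking_converter(delta_ccm, u_delta_ccm):
    """delta_ccm   : biases of the linking gravimeters (CCM comparison)
       u_delta_ccm : their standard uncertainties"""
    delta = np.asarray(delta_ccm, dtype=float)
    u = np.asarray(u_delta_ccm, dtype=float)
    w = 1.0 / u ** 2
    w /= w.sum()                      # weights must sum to 1, cf. Eq. (3)
    l = np.sum(w * delta)             # linking converter
    V_ll = np.sum(w ** 2 * u ** 2)    # its variance (no correlations assumed)
    return l, V_ll
```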

In de Viron et al. (2011), it was recommended to use a constraint which minimizes the L1 norm of the biases instead of imposing zero mean of biases.

As can be seen, several possibilities have to be considered in the construction of the constraint. Due to its importance, the following aspects are elaborated:

  A.

    Derivation of the weights. Laboratories participating in comparisons are treated as independent, even when they use the same type of instrument. Of course, some of the error sources might therefore be common to these instruments, but this can hardly be captured, and thus the results of all the comparisons presented here will be FG5/X dependent, as can be expected from the dominance of these meters in Table 1. The weighted constraint was used for all the results presented here, and the respective weights are computed from the declared uncertainties in case of the KCN solution and from the harmonized uncertainties (see Sect. 2.1) in case of the ICN solutions. This raises the question of which uncertainty estimate \({u}_{i}\) from several measurements of a particular gravimeter should be associated with the weight \({\textit {w}_i}={u}_{0}^{2}/{u}_{i}^{2}\) in the constraint. Usually, all the declared uncertainties of a particular gravimeter agree within a range of 10%, and commonly the root mean square of the declared uncertainties \({u}_{ij}\) of each laboratory is used to compute \({u}_{i}\). Nevertheless, this approach fails in case of large discrepancies between the uncertainty estimates of several observations of a particular gravimeter. Another possibility is to estimate the uncertainty of the mean value analogously to the standard error of the weighted mean as \({u}_{i}=\sqrt{\frac{1}{\sum {u}_{ij}^{-2}}}\), which is, however, correct only for uncorrelated observations and therefore not applicable for the majority of gravimeters with dominating contributions from systematic error sources. The rigorous way has to account for the correlations as discussed in Sect. 2.2, i.e. to separate the error contributions that do not decrease with the number of measurements due to averaging. Nevertheless, such a computation is impractical and also inaccurate due to the limited ability to estimate the true value of the correlation coefficient. Therefore, an appropriate and easy possibility is to use the minimum of the declared/harmonized uncertainties, \({u}_{i}=\mathrm{min}({u}_{ij})\), for a particular gravimeter, practically saying that the contribution of a gravimeter to the realization of the reference value will not be smaller than that of its best measurement with the lowest uncertainty.

  B.

    Contribution of laboratories and linking converter. In case of the ICN solution, all the laboratories (except those showing incompatibility, see Sect. 2.4) contribute to the constraint based on the harmonized uncertainties. Further, l is set to zero, which means that no difference between the CCM and the regional comparison is considered, expecting that the cloud of participating gravimeters (at least 16 for all comparisons) defines a consistent reference independently. In case of the KCN solution, the contributions/weights of the non-NMI/DIs have to be zero in the constraint. Further, the linking converter should be applied for regional comparisons. It is important to test the quality of the link by the differences between the biases obtained at both (CCM and regional) comparisons for those gravimeters providing the link, see Fig. 2. The variability of such differences is independent of the choice of the constraint and allows us to test whether the gravimeters were stable enough to keep their mutual deviations from the CCM to the regional comparison (organized a few years later). In an ideal case, all the differences should stay within the repeatability of the gravimeters. As can be seen from Fig. 2, an appropriate link was achieved only in 2018. In contrast, the mutual differences between the biases clearly changed for FG5-209 and FG5-215 between 2009 and 2011, and similarly for FG5-215 and FG5-221 between 2013 and 2015. Therefore, in the results presented in Sect. 3, we set the linking converters for the KCN solutions to 0.0 µGal, 0.0 µGal and (− 0.78 ± 1.26) µGal for the comparisons of 2011, 2015 and 2018, respectively. These values differ from the original KC solution only for the comparison in 2015, where a linking converter of + 0.32 µGal was used; however, as shown in Fig. 2, the quality of this link is weak and the approach of 2011 should have been followed (l = 0).

  C.

    The solution minimizing the L1 norm of the biases was computed numerically, as sketched below Eq. (9). The “L1 norm” results were obtained from the “zero mean” results by shifting the biases by a value δc in the range of ± 10 µGal with a step of 0.01 µGal. Finally, we determined the δc for which:

    $$ \mathop \sum \limits_{{{{i}} = 1}}^{{{n}}} \left| {\textit{w}_{i} (\delta_{{{i}}} + \delta_{{\rm c}} )} \right| = {\rm min}{.} $$
    (9)
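The numerical search described in item C can be condensed into the following sketch; the function is illustrative and simply scans the shift δc on the stated grid.

```python
# Sketch of the numerical minimization of the weighted L1 norm, Eq. (9);
# delta are the biases of the 'zero mean' solution, w the constraint weights.
import numpy as np

def l1_shift(delta, w, span=10.0, step=0.01):
    shifts = np.arange(-span, span + step, step)   # +/- 10 µGal, 0.01 µGal step
    cost = [np.sum(np.abs(np.asarray(w) * (np.asarray(delta) + dc)))
            for dc in shifts]
    return shifts[int(np.argmin(cost))]

# The 'L1 norm' results follow by adding the returned delta_c to the biases
# and subtracting it from the reference values.
```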
Fig. 2

Quality of the link carried out at comparisons in 2011, 2015 and 2018 is expressed as differences between biases (from the ICN solutions) at regional and CCM comparison (in 2009, 2013 and 2017) of those gravimeters that provide the link. Error bars have been determined based on repeatabilities of gravimeters. Note Correction for systematic effects (self-attraction, diffraction) was not unified between comparisons

2.4 Consistency check

According to Cox (2002), a Chi-square test should be used for testing the overall consistency of the results before accepting the consensus reference values of a comparison. In the case of AG comparisons, we have the possibility to directly test the biases

$${\chi }_{\mathrm{ave}}^{2}=\sum_{i=1}^{n}\frac{{\delta }_{i}^{2}}{{u}_{i}^{2}}$$
(10)

or the complete set of deviations from the reference values, \({d}_{ij}={g}_{ij}-{g}_{j}\), expressed by the column vector \({\varvec{d}}\) [Eq. (16)]. In this case, the related Chi-square value has to be computed from the covariance matrix in order to take into account the correlations, according to Woolliams et al. (2006)

$$ \chi_{{{\text{obs}}}}^{2} = {\varvec{d}}^{T} {\varvec{V}}^{ - 1} {\varvec{d}}.$$
(11)

To specify the critical Chi-square value with probability of 5% for statistics given by Eqs. (10) and (11), the degrees of freedom have to be assigned.

The Chi-square test has not been used in published results of AG comparisons. Nevertheless, inconsistent measurements were investigated based on normalized deviations (Newell et al. 2017)

$${r}_{ij}=\frac{({g}_{ij}-{g}_{j})}{{u}_{ij}}$$
(12)

or the compatibility index (Francis et al. 2015)

$$ E_{ij} = \frac{{\left( {g_{ij} - g_{j} } \right)}}{{\sqrt {u_{ij}^{2} + u_{j}^{2} } }}. $$
(13)

However, these equations are not in agreement with the error propagation after the adjustment, since the measurements and the reference values are correlated, which is not taken into account in the denominators of Eqs. (12) and (13). As given, for example, in Cox (2002), for a reference value \({g}_{\mathrm{ref}}\) computed as a simple weighted mean, the deviation of institute i has the variance \({u}^{2}\left({d}_{i}\right)={u}^{2}\left({g}_{i}\right)-{u}^{2}\left({g}_{\mathrm{ref}}\right)\). Similarly, to find the correct uncertainty estimate for the model given by Eqs. (1) and (3), the covariance matrix of the deviations has to be computed.

The matrix R from Eq. (7) expresses the unknowns (biases and reference values) as the linear combination of measurements. The complete vector of measurements and estimated unknowns can be written (without accounting for the linking converter) as

$$\left(\begin{array}{c}\varvec{y}\\ \varvec{\beta} \end{array}\right)=\left(\begin{array}{c}\varvec{I}\\ \varvec{R}\end{array}\right){\varvec{y}}$$
(14)

where I is the identity matrix with size (no,no), and no is the number of measurements. The associated covariance matrix is then

$$\mathrm{cov}\left(\begin{array}{c}\varvec{y}\\ \varvec{\beta }\end{array}\right)=\left(\begin{array}{c}\varvec{I}\\ \varvec{R}\end{array}\right){\varvec{V}} {\left(\begin{array}{c}\varvec{I}\\ \varvec{R}\end{array}\right)}^{T}={{\varvec{V}}}_{{\varvec{a}}{\varvec{l}}{\varvec{l}}}$$
(15)

reflecting correlations between measurements and unknowns according to the functional model. Therefore, the matrix from Eq. (15) can be used to express covariances for any functional dependency between measurements and estimated unknowns, including the deviations from reference values: \({d}_{ij}={g}_{ij}-{g}_{j}\). These deviations can be described by a design matrix Ad (composed of values 0, 1 and − 1) as

$${\varvec{d}}={{\varvec{A}}}_{{\varvec{d}}}\left(\begin{array}{c}\varvec{y}\\ \varvec{\beta }\end{array}\right)$$
(16)

with the associated covariance matrix

$$ {\text{cov}}\left( {\varvec{d}} \right) = {\varvec{A}}_{{\varvec{d}}}\, {\varvec{V}}_{{{\varvec{all}}}}\, {\varvec{A}}_{{\varvec{d}}}^{T} \left\{ + V_{ll} {\varvec{I}} \right\}, $$
(17)

where the variance of the linking converter \({V}_{ll}\) should be included. Finally, the compatibility index is

$$ E_{ij} = \frac{{\left( {g_{ij} - g_{j} } \right)}}{{u(d_{ij} )}}, $$
(18)

where the uncertainty of the deviations \(u({d}_{ij})\) is obtained as the square root of the diagonal elements of cov(d). Note that \(\sqrt{{u}_{ij}^{2}-{u}_{j}^{2}}\le u({d}_{ij}) \le {u}_{ij}\) always holds, and outliers should be identified according to the level of confidence with the coverage factor k as \(\left|{E}_{ij}\right|>k\).
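The error propagation of Eqs. (14)–(18) can be summarized in the following sketch; the inputs (R from Eq. (7), the deviation design matrix A_d) are assumed to be available from the adjustment, and the function name is illustrative.

```python
# Sketch of the compatibility check, Eqs. (14)-(18): covariance of the
# deviations d = g_ij - g_j including the correlation between measurements
# and adjusted reference values.
import numpy as np

def compatibility_index(y, beta, V, R, A_d, V_ll=0.0):
    """R   : matrix from Eq. (7) mapping measurements to unknowns
       A_d : design matrix of the deviations (entries 0, 1, -1), Eq. (16)"""
    IR = np.vstack([np.eye(len(y)), R])
    V_all = IR @ V @ IR.T                                  # Eq. (15)
    d = A_d @ np.concatenate([y, beta])                    # Eq. (16)
    cov_d = A_d @ V_all @ A_d.T + V_ll * np.eye(len(d))    # Eq. (17)
    E = d / np.sqrt(np.diag(cov_d))                        # Eq. (18)
    return d, E          # |E_ij| > k flags an incompatible measurement
```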

If a gravimeter does not meet the compatibility criteria, it should not contribute to the definition of the reference value (weight = 0 within the constraint). The treatment of incompatible measurements within the weight matrix is, however, a different issue, since the repeatability is the relevant parameter in this case, i.e. a gravimeter with a significant bias should not automatically be excluded from the weight matrix. For those gravimeters i that showed incompatible measurements at the confidence level of 95% (k = 2), we also computed the repeatability, and the following modifications were made in the adjustment for the ICN solution: (1) if more than one measurement of a gravimeter showed incompatibility, the gravimeter was excluded from contributing to the constraint, (2) if one measurement was incompatible, the harmonized uncertainties were increased by 50%, and (3) if the repeatability of the measurements was higher than the harmonized uncertainty, the harmonized uncertainty was enlarged by 50%, which consequently reduced the contribution of this gravimeter to the constraint and to the parameter estimate. The increase in the uncertainties by 50% is therefore just a parameter of an iterative cycle for the progressive adaptation of the uncertainties within the stochastic model. In case of the KCN solutions, we followed the rules applied in the published KC solutions: non-compatible measurements of NMI/DIs were excluded and the harmonized uncertainties of non-NMI/DIs were used in the covariance matrix as in the ICN solution.

2.5 Degree of equivalence and related uncertainty estimates

In metrology, the degree of equivalence (DoE) of a measurement standard is quantitatively defined as its deviation from the key comparison reference value. Therefore, the biases determined from the adjustment should be equivalent to the DoE. Nevertheless, the harmonization of uncertainties described in Sect. 2.1 and the treatment of inconsistent measurements explained in Sect. 2.4 mean that the individual deviations of a particular gravimeter from the reference value are not weighted exactly according to the assumptions of the operators. Therefore, the DoE should be computed, according to Jiang et al. (2012), as the weighted average of the deviations from the reference values, using the formula

$$ D_{i} = \left[ {\sum \textit{w}_{ij} \left( {g_{ij} - g_{j} } \right)} \right]{/}\sum \textit{w}_{ij} , $$
(19)

where the weights are computed from the declared uncertainties as \( \textit{w}_{ij}=1/{u}_{ij}^{2}\). To derive the covariance estimates of the DoEs, it is necessary to construct a design matrix \({{\varvec{A}}}_{{\varvec{D}}}\) that contains, in correspondence with Eq. (19), the ratios \(\textit{w}_{ij}/\sum\textit {w}_{ij}\) and zeros. Further, analogously to Eq. (17), we have to derive the covariance matrix of the individual deviations, in this case, however, based on the declared uncertainties, in order to determine the error propagation according to the declared estimates. To do so, it is necessary to construct a matrix \({{\varvec{V}}}_{{\varvec{d}}{\varvec{e}}{\varvec{c}}}\) (similarly to V) with the squared declared uncertainties as diagonal elements and the non-diagonal elements treated using the assumed correlation coefficients. Consequently, \({{\varvec{V}}}_{\varvec{dec}}\) instead of V is propagated according to Eq. (15), yielding \({{\varvec{V}}}_{\varvec{alldec}}\) instead of \({{\varvec{V}}}_{\varvec{all}}\), and, analogously to Eq. (17), resulting in \({\rm cov}\left({{\varvec{d}}}_{\varvec{dec}}\right)={{\varvec{A}}}_{{\varvec{d}}}\, {{\varvec{V}}}_{\varvec{alldec}}\, {{\varvec{A}}}_{{\varvec{d}}}^{T} \left\{+{V}_{ll }{\varvec{I}}\right\}\), which is used to determine the covariance matrix of the DoE

$$ {\text{cov}}\left( {\varvec{D}} \right) = {\varvec{A}}_{{\varvec{D}}} {\text{ cov}}\left( {{\varvec{d}}_{{{\text{dec}}}} } \right) {\varvec{A}}_{D}^{T} \left\{ { + V_{ll} {\varvec{I}}\} } \right.. $$
(20)

The discussion concerning the uncertainties of the individual deviations might seem marginal and fruitless. Nevertheless, the main goal of a comparison is to provide not only the deviations of the laboratories from the reference value, but also the associated uncertainty estimates, which indicate to the laboratories whether their measurements are consistent within the declared uncertainties.

In the past, different approaches were used to determine the uncertainty of the DoE. Since no correlations were assumed for the covariance matrix of the observations, the mathematically correct estimates given by Eq. (8) were too optimistic and were therefore not used. As pointed out in Francis et al. (2015): “It can be shown that with increasing N (number of measurements for a particular gravimeter) the uncertainty of the DoE determined in this way decreases approximately in proportion to 1/√N. Thus this uncertainty is not appropriate for assessing the compatibility of the DoE with the declared uncertainty of the gravimeter. Using it effectively implies an uncertainty model where with increasing N the DoE of a gravimeter should converge towards zero for the gravimeter to stay in equivalence”. Therefore, the DoE was linked to the RMS of the uncertainties of the differences between the gravimeter measurements and the KCRV, obtained by summing the variances of the measurement and of the KCRV. However, this solution is incorrect for two reasons: (1) as already discussed in Sect. 2.4, the uncertainty of the difference should be smaller than the uncertainty of the measurement itself, because the correlation with the reference values has to be taken into account; (2) the RMS cannot be a good estimate, as can easily be demonstrated when the measurements of a particular gravimeter have different uncertainties, because the weighted average of a set of deviations cannot be associated with a higher uncertainty than the lowest uncertainty of the individual measurements.

In the final check of consistency, the square root of the diagonal elements of cov(D) given by Eq. (20) should be compared with the DoE at a given confidence level.
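The computation of the DoE and of cov(D) may be sketched as follows; the covariance of the deviations propagated from the declared uncertainties, cov(d_dec), is assumed to be available from the analogue of Eq. (17), and all identifiers are illustrative.

```python
# Sketch of the degree of equivalence, Eq. (19), and its covariance,
# Eq. (20), based on the declared uncertainties.
import numpy as np

def degrees_of_equivalence(d, u_declared, meter_id, cov_d_dec, V_ll=0.0):
    """d          : deviations g_ij - g_j, Eq. (16)
       u_declared : declared standard uncertainties u_ij
       meter_id   : gravimeter identifier of each deviation
       cov_d_dec  : covariance of d propagated from declared uncertainties"""
    meters = sorted(set(meter_id))
    w = 1.0 / np.asarray(u_declared, dtype=float) ** 2
    # design matrix A_D holding the normalized weights w_ij / sum(w_ij)
    A_D = np.zeros((len(meters), len(d)))
    for row, m in enumerate(meters):
        idx = [k for k, mid in enumerate(meter_id) if mid == m]
        A_D[row, idx] = w[idx] / w[idx].sum()
    D = A_D @ d                                                   # Eq. (19)
    cov_D = A_D @ cov_d_dec @ A_D.T + V_ll * np.eye(len(meters))  # Eq. (20)
    return D, cov_D
```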

2.6 Mean square error of the unit weight

As can be seen from Eqs. (8) or (20), for obtaining the uncertainty estimates of the unknown parameters, the covariance matrices have not been multiplied/scaled by the mean square error of the unit weight (MSE)

$${s}_{0}^{2}=\frac{{{\varvec{\varepsilon}}}^{T}{{\varvec{V}}}^{-1}{\varvec{\varepsilon}}}{\mathrm{DoF}}$$
(21)

with DoF degrees of freedom and \({\varvec{\varepsilon}}\) the vector of residuals from Eq. (1) estimated in the adjustment. This means that the given uncertainty estimates (square root of the diagonal elements of the respective covariance matrix) are based directly on the declared or harmonized uncertainties, the assumed correlation between measurements and the error propagation through the adjustment, in agreement with White (2004) and Koo and Clare (2012). This approach (non-scaled covariance matrix) is not usual in geodesy. However, the main role of a comparison in metrology is to check whether the declared uncertainties are consistent with the measurements. Therefore, the input covariance matrix V can be understood as built on known/a priori uncertainty estimates, and the goal is not to obtain an a posteriori estimate based on the dispersion of the residuals, but to determine how the a priori errors are propagated through the functional model. The relevance of such an approach is demonstrated in Fig. 3, where both scaled and non-scaled estimates of the bias uncertainties for different choices of the correlation coefficient are shown for the comparison in 2018. It is evident that the scaled estimates depend significantly on the choice of the correlation coefficient, since the estimates of s0 change by 100%, from 0.4 to 0.8 µGal, for correlation coefficients of 0.0 and 0.75, respectively.

Fig. 3

Uncertainty estimates for the bias (ICN solution) of gravimeters participating in the comparison of 2018 depending on the choice of the correlation coefficient between measurements of the same gravimeter and scaling/non-scaling of the covariance matrix given by Eq. (8) by the mean square error s0

The MSE should reach \({s}_{0}^{2}\cong 1\) for measurements associated with a proper covariance matrix and a complete functional model. Therefore, \({s}_{0}^{2}\) can be understood as a measure of the goodness of fit of the model to the measurement results and can be used for an overall consistency test between model and measurements by comparing \({\mathrm{DoF}\cdot s}_{0}^{2}\) with the expected value of the Chi-square distribution, DoF, with the standard deviation \(\sqrt{2\cdot \mathrm{DoF}}\) (Sutton 2004).
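A compact sketch of this consistency test is given below; the two-sided chi-square limits at the 5% level are an assumption consistent with the description above.

```python
# Sketch of the mean square error of unit weight, Eq. (21), and the overall
# chi-square consistency check (cf. Sutton 2004); inputs are illustrative.
import numpy as np
from scipy.stats import chi2

def mse_consistency(residuals, V, dof, alpha=0.05):
    eps = np.asarray(residuals, dtype=float)
    s0_sq = eps @ np.linalg.solve(V, eps) / dof         # Eq. (21)
    chi_obs = dof * s0_sq                               # expected: dof +/- sqrt(2*dof)
    lo, hi = chi2.ppf([alpha / 2, 1 - alpha / 2], dof)  # two-sided 5% limits
    return s0_sq, lo <= chi_obs <= hi
```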

3 Results of reprocessing

Tables 3, 4, 5, 6, 7, 8 and 9 in the Appendix show the results of the comparison reprocessing according to the approach described in Sect. 2. The estimated parameters of the adjustment are the gravimeter biases and the reference gravity values. The bias estimates are related to the harmonized uncertainties and are therefore not so strongly influenced by a potential overestimation of the measurement accuracy by some laboratories. On the other hand, the degrees of equivalence (DoE) are related to the declared uncertainties and should be used to validate the equivalence of the estimated deviations of a laboratory/gravimeter from the reference values, including all observations. If the harmonized uncertainties of a particular gravimeter are higher than the declared ones, the error estimates for the biases are also higher than those for the DoEs. This is exactly the reason why a few non-NMI/DI gravimeters were found to be inconsistent: their declared uncertainties are too low, see Tables 3, 4, 5, 6, 7 and 8. Generally, our approach of computing the DoE uncertainties yields narrower limits for indicating a gravimeter inconsistency than the approaches published previously.

The uncertainty estimates of the reference values of the ICN solution are (20–50)% smaller than those of the KCN solution. This is caused by the larger number of gravimeters used for constraining the adjustment, and it clearly indicates the advantage of including the non-NMI/DI gravimeters in the comparisons. The ICN solution cannot be used as the official result, since key comparisons are mandatory according to the Strategy paper (Marti et al. 2014). Nevertheless, it is a valuable test of the robustness of the KCN solution and therefore very useful especially in regional comparisons, where the link may be uncertain (see Sect. 2.3). Besides the huge benefit of the non-NMI/DIs for the precise determination of the relative ties between reference values, this is another argument to keep the concept of joint comparisons of all absolute gravimeters.

The main results of the reprocessing are summarized in Table 2. The differences between the official reference values (KC) and the corresponding new reprocessing (KCN) are, on average, below 0.5 µGal. The systematic difference between the ICN and KCN solutions is due to the different definition of the constraint. Surprisingly, the largest difference of 1.5 µGal can be seen for the comparison in 2018, where the link was done in a rigorous way, accounting for the linking converter with a precision of about 0.6 µGal. This indicates that the reference levels of the cloud of all gravimeters in 2017 and 2018 would differ by 1.5 µGal. This example shows how important an additional solution to the KC is for checking the robustness and reliability of the results.

Table 2 Main results of the reprocessing of comparisons

As expected, the difference between the approaches constraining the biases to zero mean and minimizing the L1 norm of the biases exceeds 1 µGal only for the KCN solutions of the regional comparisons, due to the small number of NMI/DI gravimeters contributing to the definition of the constraint. Naturally, the results of the ICN solution are therefore statistically more robust with respect to the definition of the constraint. Generally, it is worthwhile to also apply the L1 norm statistics to the comparisons, since it brings additional information on the robustness of the results.

The mean square errors \({s}_{0}\) range between 0.73 and 1.27 µGal for all the comparisons. If the correlations between the measurements of a particular gravimeter were not accounted for, \({s}_{0}\) would always drop below 0.7 µGal, which clearly indicates an inconsistency between the stochastic model and the measurements. Table 2 also contains a Chi-square test of the consistency of \({s}_{0}^{2}\) according to Sect. 2.6. This test indicates a discrepancy between the measurement uncertainty (including correlations) and the consistency of the observations with the functional model. If the assumption about the input covariance matrix matches the dispersion of the residuals, the mean square error \({s}_{0}\) should be close to 1. As can be seen, the comparisons in 2013 and 2017 show a significant deviation from this assumption, reflected by the DoF and its limits. In contrast, for 2018, the consistency between observations and model is much better than reflected by the input covariance matrix. Both cases may be related to the harmonized uncertainties, the implied repeatability and the selected correlation coefficient, which may have caused an over- or underestimation in these cases.

4 Analyses of comparison results

The data obtained from the reprocessing of 6 comparisons allowed us to investigate some interesting aspects of the final results that help to answer questions related to the significance of deviations between different types of gravimeters or to the influence of noisy sites. Results from the ICN solutions have been used for this analysis.

4.1 Deviation between FG5 and FG5X gravimeters

Altogether, 97 biases of FG5/X gravimeters were determined at the 6 comparisons; 6 of them were excluded from all further analyses as outliers detected at the 99% level of confidence. These outliers are related to only three gravimeters (FG5-102 in 2009, 2011 and 2013; FG5-230 in 2009 and 2011; and FG5X-247 in 2015). The data set of the 91 remaining biases seems to be normally distributed, see Fig. 4. Nevertheless, is it correct to assume that FG5 and FG5X gravimeters share the same probability distribution of biases?

Fig. 4

Histogram of biases for FG5/X gravimeters at 6 comparisons. Left: 97 biases (filled columns) together with the corresponding histogram of the normal distribution N(− 0.4, 2.8²) depicted by empty columns. Right: histogram of 91 biases after excluding 6 outliers (filled columns) together with the corresponding histogram (empty columns) of the normal distribution N(0, 2.1²)

Of the remaining 91 biases, 62 relate to FG5 and 29 to FG5X gravimeters. For the FG5 biases, the mean value is (0.07 ± 0.27) µGal with a sample standard deviation of 2.08 µGal. For the FG5X biases, the mean value is (− 0.21 ± 0.39) µGal with a sample standard deviation of 2.08 µGal. Based on statistical t and F tests, we conclude that both data sets show a very high level of consistency and can therefore be described with a single distribution function, with a mean value of (− 0.02 ± 0.22) µGal and a sample standard deviation of 2.08 µGal (with a standard error of 0.16 µGal). The obtained standard deviation of about 2.1 µGal can be interpreted as an experimental estimate of the reproducibility of FG5/X gravimeters in general. This is quite an important result, since it is not related to a single instrument (as the estimates given in Sect. 2) but reflects the repeatability of this specific type of gravimeter. Further, this result shows that the experimental standard uncertainty of FG5/X gravimeters should be larger than 2.1 µGal due to possible systematic errors of this gravimeter type.
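The statistical tests mentioned above can be reproduced with a short sketch like the following; the bias arrays are placeholders for the values from the ICN solutions, and the two-sided F test is one possible implementation of the variance comparison.

```python
# Sketch of the t test (equal means) and F test (equal variances) applied
# to the FG5 and FG5X bias samples of Sect. 4.1; data are placeholders.
import numpy as np
from scipy import stats

def compare_bias_samples(bias_fg5, bias_fg5x, alpha=0.05):
    b1 = np.asarray(bias_fg5, dtype=float)
    b2 = np.asarray(bias_fg5x, dtype=float)
    t_stat, p_t = stats.ttest_ind(b1, b2, equal_var=True)    # two-sample t test
    f_stat = np.var(b1, ddof=1) / np.var(b2, ddof=1)          # variance ratio
    dfn, dfd = len(b1) - 1, len(b2) - 1
    p_f = 2 * min(stats.f.cdf(f_stat, dfn, dfd), stats.f.sf(f_stat, dfn, dfd))
    return (p_t > alpha), (p_f > alpha)   # True, True: common distribution
```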

4.2 Comparison sites

It was shown in Sect. 4.1 that the biases of FG5 and FG5X gravimeters might be treated as a random variable with a common Gaussian distribution. Nevertheless, the invariability of the variances of the biases between different comparisons also has to be fulfilled for describing all the biases with a single distribution. This condition is not self-evident, because the comparisons have been carried out at different sites with different noise conditions. The standard deviations of the FG5/X biases at the particular comparisons are shown in Fig. 5. As can be seen, the scatter of the results is within the standard errors associated with the standard deviations. Nevertheless, the largest standard deviation is related to the comparison in 2015 in Belval, where the measurement conditions were significantly worse than at the other comparisons, because the comparison sites were located close to sources of anthropogenic noise (traffic, construction works in the surroundings). Further, the sites were located on a “strong floor” (for load tests in civil engineering) that was not founded directly on the subsoil but supported by three 3 m high and 10 m long girders grounded on the building foundation. To test whether the variance at the Belval comparison is consistent with the other comparisons, an F-test statistic of 1.88 was determined, which corresponds to a 10.2% F critical value; the hypothesis on the equality of the standard deviations therefore cannot be rejected at the 95% confidence level, so the data set might be described by a common distribution function N(0, 2.1²). In fact, this finding shows that FG5/X gravimeters are able to reach consistent results also at sites with a high noise level.

Fig. 5

Standard deviations of biases of FG5/X gravimeters achieved at 6 comparisons. Error bars represent the standard error of standard deviations

4.3 FG5/X bias

The results determined at the comparisons are significantly influenced by the FG5/X gravimeters due to their dominance and high weights in the constraint. To investigate whether there is a significant difference between FG5/X and other types of gravimeters, we divided the biases of the gravimeters into these two groups and computed the corresponding mean differences for all the comparisons. As can be seen in Fig. 6, all these differences are within the respective standard uncertainties. Nevertheless, due to the higher uncertainties of the non-FG5/X gravimeters, the standard uncertainties of the mean differences range from 1.7 µGal to 4.1 µGal for the individual comparisons. The weighted average of all differences is (− 0.4 ± 1.0) µGal.

Fig. 6

Differences between the average biases of FG5/X gravimeters and other types of gravimeters at particular comparisons. Error bars represent the standard uncertainty of the difference

4.4 Bias variability

The variability of the biases from the ICN solutions for gravimeters participating in more than 3 comparisons is shown in Fig. 7. Significant changes are obvious for FG5-102 after its rebuild to FG5X-102 in 2014. On the other hand, highly consistent results are documented for FG5-202 and FG5-242. Also, slight transient changes, which barely exceed the uncertainty, are visible for FG5-301 and FG5X-302. In the case of FG5-301, the change could also be related to a service by the manufacturer in 2016.

Fig. 7

Variability of biases for gravimeters participating in more than 3 of the elaborated comparisons. Error bars represent the standard uncertainty of the determined biases. An upgrade of a gravimeter from FG5 to FG5X is indicated by a gap in the connecting lines and by different markers. In the case of FG5-215, the gravimeter was not upgraded to FG5X but equipped with a new measurement system described in Křen et al. (2016)

5 Summary and conclusions

A general procedure for processing and analysing comparisons of absolute gravimeters by the method of constrained least squares has been described, focusing on several details which have been treated differently in the past and addressing specifically the separation into groups of gravimeters operated by metrology laboratories and by the geodetic/geoscience community. It has been shown that for a reliable and thorough evaluation of comparison results, it is necessary to distinguish between stochastic and systematic contributions within the declared uncertainties of the gravimeters. This allows the construction of a realistic covariance matrix of the measurements, which in turn allows the law of error propagation to be followed rigorously from the observational data to the measure of compatibility. This approach was described mathematically in detail and applied to 6 comparisons, in which the covariances between the measurements of a particular gravimeter were taken into account. Based on the ratio between repeatability and uncertainty, a correlation coefficient of up to 75% was applied. Compared to the common approach without accounting for correlations, the estimated unknown parameters, biases and reference values, were affected only slightly (within a range of 0.2 µGal). However, the estimated variances of these parameters are much more realistic, reaching roughly doubled values. The new approach for computing the DoE uncertainties results in smaller values than the approach used in the past. Therefore, more gravimeters appear to be inconsistent with their declared uncertainties than published previously. In the majority of cases, this concerns non-NMI/DI gravimeters, which tend to underestimate their measurement uncertainties. Of course, the constraint might be influenced accordingly and, in fact, from a scientific point of view, this is the only argument against a solution in which all gravimeters are treated equally. Nevertheless, as we have shown, this can be solved by a harmonization of the uncertainties.

As can be anticipated, the main impact on the determination of the reference values and biases comes from the definition of the constraint. We demonstrated that a solution (ICN) in which the gravimeters are not separated into groups according to metrological institutes or geodetic operators yields parameter estimates which are naturally more robust, because more gravimeters contribute to the constraint. This solution should also be used for testing the robustness of the official key comparison results. It is needed especially for the regional (e.g. EURAMET) comparisons, where the link to the international (CIPM) comparisons is provided only by a few linking gravimeters. We showed that for the EURAMET comparisons in 2011 and 2015, carried out two years after the respective CIPM comparisons, the 4 gravimeters providing the link were not able to keep their biases within the expected precision. On the other hand, in 2018, when the EURAMET comparison was organized one year after the CIPM comparison, the link was realized with a precision well below 1 µGal.

The differences between the published and the newly presented solutions (including those based on the L1 norm) are within a range of 2 µGal for all the reprocessed comparisons. Such a difference is still in agreement with the presented uncertainty estimates for the reference values (0.7–1.5 µGal). Of course, the fact that all comparisons are affected by the significant dominance of FG5/X gravimeters is not reflected in the error estimates, since all the gravimeters were treated as independent. Nevertheless, the method of elaboration of comparisons presented here generally allows the inclusion of possible correlations related to the whole set of observations of a gravimeter type.

The results of the uniformly processed comparisons were used to verify the statistical significance of some interesting information contained in the results. We compared the biases obtained by FG5 and FG5X gravimeters, showing that both can be described by the same normal distribution N(0, 2.1²). Such a finding can be interpreted to mean that the different dropping chambers of these instruments cause neither systematic biases between the instruments nor differences in the reproducibility. However, the zero mean of the normal distribution for the FG5/X stands for the consistency of this particular group of gravimeters rather than for bias-free results or for the ability to determine the bias of these types of gravimeters with respect to other gravimeters with an accuracy well below 1 µGal. On the other hand, the obtained standard deviation of 2.1 µGal of the normal distribution represents an experimentally documented reproducibility of the FG5/X.

Comparisons of absolute gravimeters and their processing play an essential role in the determination of the absolute gravity reference, which is important specifically in geodesy and metrology in the frame of the realization of the International Gravity Reference System and of the Kibble balance experiment. With this study, we introduce a consistent treatment of the comparisons carried out in the past, allowing for a thorough reanalysis with regard to the stability of the reference values over time and the compatibility of the applied systematic corrections. We further provide a frame for the elaboration of future comparisons (KCN and ICN solutions) as a basis for the establishment of a long-term stable gravity reference frame, compliant with the requirements in geodesy and metrology.

The official (KC) reference values published in the past differ on average by less than 0.5 µGal from our equivalent KCN solutions, showing that adequate processing strategies have been applied in the past. The significant improvement highlighted in this paper is the more realistic uncertainty estimates for the evaluated parameters (reference values, biases, DoE). The solution including all gravimeters (ICN), which does not strictly follow the rules in metrology, has a huge benefit at least for testing the robustness of the KC results, which might be less reliable especially in regional comparisons with only a few linking gravimeters. From this point of view, the ICN solution, as the statistically most robust solution, should also be provided along with the official results and should be documented for the realization of the IGRS.

Presently, only 7 NMI/DIs have published Calibration and Measurement Capabilities in gravity; hence, we strongly recommend keeping the model of joint comparisons of absolute gravimeters from the metrology and geoscience communities in the future. This seems to us to be a key factor for continuing the successful 40-year cooperation between the metrology and geoscientific communities that has boosted knowledge in absolute gravimetry.