1 Introduction

The superconducting gravimeter (SG) is presently the most precise instrument for measuring time variations of local gravity. The measured quantity is the voltage controlling the current in a feedback coil, which keeps a superconducting sphere in equilibrium between gravity and magnetic levitation. This voltage is converted into gravity changes using amplitude and phase calibration factors. A precise amplitude calibration of superconducting gravimeters is required to constrain oceanic tidal loading models or to evaluate recent global Earth models, which do not differ by more than in their tidal gravimetric factors and only 0.01 degree (equivalently 1.2 s at a frequency of 2 cycles per day) in the phase (Baker and Bos 2003). Hence, it is advisable to achieve a calibration with a precision level of 1 per mille in amplitude and 0.01 degree in phase.

In practice, the phase can be estimated at the 0.01 s level by measuring the instrument response to step or sine waves (Van Camp et al. 2000). Nowadays, the amplitude factor is classically determined using side-by-side measurements with an absolute gravimeter (AG). The tidal signal allows a determination of the calibration factor at the level, given its large amplitude, about 20 times larger than the influence of the atmosphere, the hydrosphere, or the polar motion. Other methods, such as moving a known mass around the SG (Achilli et al. 1995), installing the SG on a calibration platform (Richter et al. 1995), or comparing with spring gravimeters (Meurers 2012), also allow an estimation of the calibration factor at the level.

Francis (1997) already noticed that a strong tidal signal is required to obtain a good precision and could achieve the level in less than 2 days when the tidal signal was strong. Hinderer et al. (1998) could achieve the level after 6.5 days. Then, Francis et al. (1998) reported that at least 5 to 7 days of measurements side-by-side with an AG are required to reach the precision level on the SG calibration factor, but in that study they could not benefit from time series starting at a tidal extremum. Combining data from more calibration experiments allows a calibration below the precision (Rosat et al. 2009; Meurers 2012; Virtanen et al. 2014).

Presently, the calibration precision of SGs operating in the framework of the Global Geodynamics Project (GGP) (Crossley and Hinderer 2009) reaches a few per mille (Meurers 2001). Clearly, as discussed by Rosat et al. (2009), Meurers (2012) and Virtanen et al. (2014), different calibration experiments on the same instrument can give results that differ by more than . As far as we know, this is the first study quantifying the number of experiments needed to achieve the 1 per mille level with a given confidence interval, investigating the causes of the uncertainties on the amplitude of the calibration factor, and proposing methods to mitigate their impact. In particular, we quantify the aliasing effect affecting AG measurements and the attenuation bias caused by noisy SG series.

2 Effect of the noise on the absolute gravity measurements

The calibration factor \(\beta \) is computed by a least-squares (LSQ) fit on the observation equations:

$$\begin{aligned} y_i =\beta x_i +P( {t_i })+\varepsilon _i, \end{aligned}$$
(1)

where \(x_i\) and \(y_i\) represent the SG and AG time series, respectively (\(i=1,\ldots ,N\)). \(\varepsilon _i \) are the measurement errors on the AG time series, called drop-to-drop scatter. As this is usually a Gaussian white noise (Van Camp et al. 2005), we call it GW noise hereafter. \(P(t_i )\) is a first- or second-degree polynomial, which accounts for the differential instrumental drift between the two instruments (Imanishi et al. 2002; Meurers 2012) and which is estimated by the least-squares fit simultaneously with \(\beta \). If the drift is not accounted for, the calibration factor is biased, as shown by e.g., Hinderer et al. (1991), Francis and Hendrickx (2001), or Meurers (2012).
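As an illustration of Eq. (1), the following minimal sketch fits \(\beta \) and a polynomial drift simultaneously. It is not the processing code used in this study: the function name `fit_calibration`, the single-line toy tide, the drift amplitude, and the choice of 784 nm/s\(^{2}\)/V as the true factor are assumptions made for the example only.

```python
import numpy as np

def fit_calibration(t, x, y, drift_degree=1):
    """Least-squares estimate of beta in y = beta*x + P(t) + eps (Eq. 1).

    t: times (s), x: SG series (V), y: AG series (nm/s^2).
    P(t) is a polynomial of degree `drift_degree`, fitted together with beta.
    Returns (beta, formal standard error of beta).
    """
    t = np.asarray(t, dtype=float)
    ts = (t - t.mean()) / (np.ptp(t) or 1.0)   # scaled time, for conditioning
    # Design matrix: SG series, then the polynomial terms 1, ts, ts^2, ...
    A = np.column_stack([x] + [ts**k for k in range(drift_degree + 1)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    dof = len(y) - A.shape[1]
    sigma2 = resid @ resid / dof               # a posteriori variance of unit weight
    cov = sigma2 * np.linalg.inv(A.T @ A)
    return coef[0], np.sqrt(cov[0, 0])

# Toy data (all values below are illustrative assumptions)
rng = np.random.default_rng(0)
t = np.arange(0.0, 5 * 86400.0, 10.0)               # 5 days, one drop per 10 s
tide = 1000.0 * np.sin(2 * np.pi * t / 44712.0)     # single M2-like line, nm/s^2
beta_true = 784.0                                   # nm/s^2/V
x = tide / beta_true                                # noise-free SG output, V
drift = 20.0 * t / t[-1]                            # linear differential drift, nm/s^2
y = beta_true * x + drift + rng.normal(0.0, 70.0, t.size)  # AG with 70 nm/s^2 GW noise

beta_hat, beta_err = fit_calibration(t, x, y, drift_degree=1)
print(f"beta = {beta_hat:.2f} +/- {beta_err:.2f} nm/s^2/V")
```

The formal error returned here is the one provided by the LSQ process; it assumes white AG noise and an error-free SG series.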

2.1 Amplitude of the tidal signal

To enhance the signal-to-noise ratio (SNR), calibrations should be performed at spring tides, when the amplitude of the gravity variation reaches its maximum. As the tide amplitude decreases, this will at some stage cancel out the gain expected from the factor \(\sqrt{N} \). As an example, we generated a 120-day-long synthetic tide at the Membach station (Belgium, 50.61\(^{\circ }\)N, 6.01\(^{\circ }\)E), with a starting date 3 days before spring tides, and added a Gaussian white noise with zero mean and 70 nm/s\(^{2}\) rms amplitude, consistent with a typical GW. The sampling rate is one measurement (or drop) per 10 s. We then fitted this series on the synthetic tide acting as an SG time series, and calculated the error on the calibration factor provided by the LSQ process as a function of the length of the time series. This is shown by the black curve in Fig. 1a, b, where it is compared with the \(1/\sqrt{N} \) function shown in red. This red curve is normalized, so only the comparison of rates of change is meaningful. One can see that the 1.0 and levels of precision are reached after 1.5 and 5 days, respectively. Note that this precision depends linearly on the drop-to-drop standard deviation, as pointed out by Hinderer et al. (1998).

Figure 1a, b also shows the step-like decrease in the error: the decrease slows down every 14 days, at neap tides. In other words, as discussed by Francis (1997), the standard deviation of the estimator of the calibration factor decreases when the tidal signal is large. Note that in Fig. 1b, the error first decreases faster than \(1/\sqrt{N} \), because the amplitude of the tide increases over the first 3 days.
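The departure from the \(1/\sqrt{N} \) law can be reproduced with a short sketch. It assumes a toy tide made of two semi-diurnal lines only (M2 and S2, with illustrative amplitudes), a series starting at spring tide, no drift term, and an error-free SG series; under these assumptions the formal relative error of \(\beta \) after \(N\) drops is \(\sigma _\varepsilon /\sqrt{\sum _{i\le N} x_i^2}\), with \(x_i\) expressed in gravity units.

```python
import numpy as np

dt = 10.0                                    # s, one drop per 10 s
t = np.arange(0.0, 30 * 86400.0, dt)         # 30 days
# Toy tide (assumption): M2 + S2 only, in phase at t = 0 (spring tide), nm/s^2
tide = (700.0 * np.cos(2 * np.pi * t / (12.4206 * 3600.0))
        + 300.0 * np.cos(2 * np.pi * t / (12.0 * 3600.0)))
sigma_eps = 70.0                             # AG drop-to-drop scatter, nm/s^2

# Formal relative error on beta as a function of the number of drops
rel_err = sigma_eps / np.sqrt(np.cumsum(tide**2))

# 1/sqrt(N) law normalized on the error reached after the first day
n = np.arange(1, t.size + 1)
n1 = int(86400 / dt)                         # number of drops in one day
sqrt_law = rel_err[n1 - 1] * np.sqrt(n1 / n)

for day in (1, 2, 8, 15, 30):
    k = day * n1 - 1
    print(f"day {day:2d}: error {1e3 * rel_err[k]:.3f} per mille, "
          f"1/sqrt(N) law {1e3 * sqrt_law[k]:.3f} per mille")
```

Because the cumulated tidal energy grows slowly around neap tides, the error curve flattens there and lies above the normalized \(1/\sqrt{N} \) curve.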

In our example, less than 5, 42, and 98 days are required to achieve, respectively, the 1, 0.5, and level of precision, provided that the GW remains lower than 135 nm/s\(^{2}\), for a peak-to-peak amplitude of the tidal signal of about 2000 nm/s\(^{2}\) during those 5 days. Using real data, Francis (1997) could achieve the level in less than 2 days when the tidal signal was maximum, while Francis et al. (1998) obtained error bars at the level of starting from day #7. This is consistent with our simulations, considering their experiment parameters: GW equal to 63.5 nm/s\(^{2}\) and 100-drop sets observed hourly at a rate of 1 drop per 10 s (Fig. 2). If this calibration experiment had started at day #8, corresponding to a tidal maximum of 2550 nm/s\(^{2}\), our simulation shows that 2.8 days would have sufficed for the same GW of 63.5 nm/s\(^{2}\).

Fig. 1

a In gray, the tidal signal simulated at the Membach station (Belgium, 50.61\(^{\circ }\)N, 6.01\(^{\circ }\)E, days since 2014-07-08 00:00). AG series synthesized by adding a GW noise of amplitude 70 nm/s\(^{2}\) to the tidal signal (10 s sampling interval, continuously). In black, evolution of the error on the calibration () as a function of the number of days. In red, the \(1/\sqrt{N} \) law normalized on the first value of the standard deviation. Due to this normalization, only the comparison of rates of change is meaningful. b Same as (a), focusing on the first 11 days

Fig. 2

In gray, the tidal signal simulated at the Boulder station (USA, 40.13\(^{\circ }\)N, \(-\)105.20\(^{\circ }\)E, days since 1996-07-20 00:00). AG series synthesized by adding a GW noise of amplitude 63.2 nm/s\(^{2}\) to the tidal signal [10 s sampling interval, 100 drops per set, one set per hour as in Francis et al. (1998)]. In black, evolution of the error on the calibration () as a function of the number of days. In red, the \(1/\sqrt{N} \) law normalized on the first value of the standard deviation. Due to this normalization, only the comparison of rates of change is meaningful

Fig. 3

Signal used to test the influence of the truncation: in red, the full series; in blue, the truncated series. For legibility, the truncated series was shifted to the right

2.2 Measuring during tidal extremes

As the noise level of the SG does not depend on the gravity value, the SNR is maximum when the gravimetric tide reaches its extrema. In the same experiment as in the example above, we used the full time series and then kept only the samples around the gravity extrema, as shown in Fig. 3. This was done considering one AG measurement per 5 s and one per 10 s, for an AG GW noise with a standard deviation of 62 nm/s\(^{2}\). The errors, in per mille, are given in Table 1, for 10,000 runs.

For a given number of drops, we can reach the same precision of by doubling the sampling rate and limiting the measurements according to the truncation than what is obtained when measuring on the whole time series. In the present case, where the AG noise reflects quiet conditions and does not experience aliasing, there is no preferred choice. This is rather an economic choice to be made by the operator, as a function of the actual environmental noise and taking into account the AG operational costs. The protocol should also be adapted as a function of the station and of the amplitude of the tidal signal, especially near the poles where the diurnal and semi-diurnal tides are missing, or near the equator where there is no diurnal tide.

Table 1 Error on the calibration factor using the whole or truncated series shown in Fig. 3, as a function of the sampling rate

Fig. 4

PSDs of simulated AG noise. In black, using one measurement per 5 s; in red, using one measurement per 10 s. The semi-diurnal (2 cycles per day) and diurnal (1 cycle per day) tidal frequencies are indicated by the arrows

2.3 Aliasing

When the microseismic noise is high, causing a drop-to-drop scatter higher than about 150–200 nm/s\(^{2}\), an aliasing effect influences the AG measurements. Increasing the sampling rate from 1 drop per 10 s to 1 drop per 5 s is a way to reduce this effect (Van Camp et al. 2005).

To quantify the actual effect of the aliasing on the calibration factor, a GW noise of 150 nm/s\(^{2}\) is generated, to which a high-frequency noise is added. To obtain this high-frequency noise, a GW noise of 1000 nm/s\(^{2}\) amplitude is generated. Then, in the frequency domain, its amplitude spectrum is multiplied by \(f^3\), before coming back to the time domain. This allows simulating the spectral edge of the high-frequency microseismic noise. The total contribution to the rms noise amplitude in the frequency band up to 0.1 Hz is 231 nm/s\(^{2}\). The PSD of the noise model has a value of \(3.4\cdot 10^{5}\) nm\(^2\) s\(^{-4}\) Hz\(^{-1}\) at 0.001 Hz and is shown in black in Fig. 4. Discarding every other sample creates a strong aliasing effect, as shown in red in Fig. 4.
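The noise construction described above can be sketched as follows. The helper names `colored_noise` and `low_freq_psd` are hypothetical, and the frequency normalization is an assumption: the text does not state how \(f\) is scaled, so \(f\) is here divided by the Nyquist frequency. The resulting rms amplitudes therefore need not match the values quoted above; the sketch only illustrates how discarding every other sample folds the high-frequency noise into the tidal band.

```python
import numpy as np

def colored_noise(n, sigma, exponent, rng):
    """Gaussian white noise (std sigma) whose amplitude spectrum is
    multiplied by (f / f_Nyquist)**exponent (normalization is an assumption)."""
    white = rng.normal(0.0, sigma, n)
    spec = np.fft.rfft(white)
    u = np.fft.rfftfreq(n)                  # cycles per sample, 0 ... 0.5
    u = u / u[-1]                           # normalized to the Nyquist frequency (n even)
    shaping = np.zeros_like(u)
    shaping[1:] = u[1:] ** exponent         # the zero-frequency bin is set to 0
    return np.fft.irfft(spec * shaping, n=n)

def low_freq_psd(series, dt, fmax=0.005):
    """Mean one-sided periodogram level (units^2/Hz) for 0 < f < fmax."""
    n = series.size
    f = np.fft.rfftfreq(n, d=dt)
    psd = 2.0 * dt * np.abs(np.fft.rfft(series)) ** 2 / n
    band = (f > 0) & (f < fmax)
    return psd[band].mean()

rng = np.random.default_rng(1)
n, dt = 2**16, 5.0                          # one drop per 5 s
noise_5s = rng.normal(0.0, 150.0, n) + colored_noise(n, 1000.0, 3, rng)
noise_10s = noise_5s[::2]                   # discard every other sample

psd_5s = low_freq_psd(noise_5s, dt)
psd_10s = low_freq_psd(noise_10s, 2 * dt)
print(f"low-frequency PSD at 1 drop/5 s : {psd_5s:.3g} nm^2 s^-4 / Hz")
print(f"low-frequency PSD at 1 drop/10 s: {psd_10s:.3g} nm^2 s^-4 / Hz")
```

Since the \(f^3\)-shaped noise is concentrated close to the original Nyquist frequency (0.1 Hz), it is mapped close to zero frequency after decimation, which raises the noise floor at tidal frequencies.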

The ratio of the standard deviations is linked to the ratio of the PSDs according to

$$\begin{aligned} \frac{\sigma _1 }{\sigma _2 }=\sqrt{\frac{\mathrm{PSD}_1 }{\mathrm{PSD}_2 }}, \end{aligned}$$
(2)

where PSD\(_{1}\) is related to the noise level of the 1 per 10 s data and PSD\(_{2}\) of the 1 per 5 s ones. Consequently, for white noise PSD levels of \(1.0\cdot 10^7\) and \(3.4\cdot 10^5( {\mathrm{nm}/\mathrm{s}^{2}})^{2}/\mathrm{Hz}\) (Fig. 4), doubling the sampling rate should improve the uncertainty by a factor of \(\sqrt{10^2/3.4} =5.4\) provided the observation period is the same.
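As a quick numerical check of Eq. (2) with the two PSD levels quoted above (variable names are illustrative):

```python
import math

psd_10s = 1.0e7   # (nm/s^2)^2/Hz, aliased level at 1 drop per 10 s (Fig. 4)
psd_5s = 3.4e5    # (nm/s^2)^2/Hz, level at 1 drop per 5 s (Fig. 4)

# Eq. (2): the ratio of standard deviations is the square root of the PSD ratio
improvement = math.sqrt(psd_10s / psd_5s)
print(f"expected improvement factor: {improvement:.1f}")
```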

As the noises are colored, the actual influence was quantified by computing the calibration factor for 10,000 generated noise series with the same amplitude. As shown in Table 2, doubling the sampling rate increases the precision dramatically. In other words, for the same number of drops, restricting measurements to tidal extremes as in Fig. 3 with a doubled rate improves the precision by a factor \(5.2/1.1 = 4.7\) with respect to what would be obtained measuring continuously with a rate of one measurement every 10 s.

Table 2 Error on the calibration factor for the series shown in Fig. 3, for two different sampling rates, for an AG experiencing a 150 nm/s\(^{2}\) GW noise with high-frequency noise of which the PSD is shown in black in Fig. 4

3 Bias caused by the noise from superconducting gravimeter

So far, the time series \(x_i \) was considered as noise-free. According to previous studies (Banka and Crossley 1999; Van Camp et al. 2005; Rosat and Hinderer 2011), the SG instrumental noise is white, at the 10 \(\mathrm{nm}^2~\mathrm{s}^{-4}~\mathrm{Hz}^{-1}\) level, corresponding to signal rms amplitude of 1 nm/s\(^{2}\) when taking one drop every five seconds. This is lower than the AG GW noise by 80–120 dB, which dominates the spectrum at frequencies larger than 1 cycle per day, as illustrated in Fig. 4. Of course, the actual signal of the SG increases at frequencies higher than 0.01 Hz, given the structure of the geophysical noise (Peterson 1993), but most often, it remains lower than the AG white noise.

When the microseism is high, the AG experiences aliasing as previously discussed. However, in some circumstances the PSD of the SG can reach a level close to that of the AG, at periods shorter than about 25 s. The superspring (Niebauer et al. 1995) allows the AG to maintain a reasonable GW, while the SG experiences a dramatic increase in the high-frequency noise if it is not properly low-pass filtered. This is shown in Fig. 5, for a calibration experiment performed at the Membach station in May 2014. This figure shows the gravity residuals, which are the gravity signals after correcting for tidal and atmospheric effects. On the one hand, the AG drop-to-drop scatter amounts to an rms amplitude of 62 nm/s\(^{2}\), equivalent to 77,000 \(\mathrm{nm}^2~\mathrm{s}^{-4}~\mathrm{Hz}^{-1}\) when taking one drop every ten seconds, which can be considered as a low-noise condition. On the other hand, the SG experienced several perturbations due to moderate and strong earthquakes around the Pacific Ocean, with magnitudes Mw ranging from 5.5 to 6.3. Note that those events are still too small to be observable in the AG series.
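The equivalence between rms amplitude and PSD level used here follows from the fact that, for white noise sampled every \(\Delta t\), the variance is spread uniformly up to the Nyquist frequency \(1/(2\Delta t)\), so that the one-sided PSD equals \(2\Delta t\,\sigma ^2\). A minimal sketch (function names are illustrative):

```python
import math

def white_psd(sigma, dt):
    """One-sided PSD level of white noise with std sigma sampled every dt seconds."""
    return 2.0 * dt * sigma**2

def white_rms(psd, dt):
    """Inverse relation: rms amplitude for a white PSD level and sampling interval dt."""
    return math.sqrt(psd / (2.0 * dt))

# AG: 62 nm/s^2 scatter at one drop per 10 s
print(white_psd(62.0, 10.0))   # close to the 77,000 nm^2 s^-4 / Hz quoted above
# SG: 10 nm^2 s^-4 / Hz white level at one sample per 5 s
print(white_rms(10.0, 5.0))    # rms amplitude in nm/s^2
```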

In that case, unless the SG signal is correctly low-pass filtered, the classical LSQ theory cannot be applied anymore, as x is not error free. Let us consider that the SG time series \(x_i \) includes an independent measurement error \(\eta _i\):

$$\begin{aligned} x_i =\tilde{x}_i +\eta _i, \end{aligned}$$
(3)

where \(\tilde{x}_i\) would be the SG output voltage in the error-free case.

We then have

$$\begin{aligned} y_i =\beta ( {x_i -\eta _i })+\varepsilon _i =\beta x_i +u_i \end{aligned}$$
(4)

with

$$\begin{aligned} u_i =\varepsilon _i -\beta \eta _i. \end{aligned}$$
(5)

As x and \(\eta \) are not independent, the regressor x is correlated with the error term u. Considering that \(\tilde{x},\varepsilon ,\eta \) are mutually independent, we have

$$\begin{aligned} \mathrm{Cov}( {x,u})=\mathrm{Cov}( {\tilde{x}+\eta ,\varepsilon -\beta \eta })=-\beta \mathrm{var}(\eta ). \end{aligned}$$
(6)

Consequently, the estimator of the calibration factor becomes

$$\begin{aligned} \hat{\beta }=\frac{\mathrm{cov}(x,y)}{\mathrm{var}(x)}=\beta \left( {1-\frac{\mathrm{var}(\eta )}{\mathrm{var}(x)}}\right) =\beta \cdot \frac{\mathrm{var}(\tilde{x})}{\mathrm{var}( {\tilde{x}})+\mathrm{var}(\eta )}. \end{aligned}$$
(7)

We see that \({\hat{\beta }}\) systematically underestimates \(\beta \), by a factor \(\frac{\mathrm{var}(\tilde{x})}{\mathrm{var}( {\tilde{x}})+\mathrm{var}(\eta )}\). Note that this effect does not add variance to the estimator, but introduces a systematic negative bias in the estimate. This phenomenon is known as attenuation or regression dilution bias (Hutcheon et al. 2010).
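The attenuation predicted by Eq. (7) can be checked with a small Monte Carlo sketch. All values are illustrative assumptions: the error-free SG series is a single sinusoid of 1 V amplitude, and the SG noise \(\eta \) is deliberately exaggerated (0.1 V of white noise) so that the bias is clearly visible.

```python
import numpy as np

rng = np.random.default_rng(42)
beta = 784.0                       # true calibration factor, nm/s^2/V
n = 20000
x_true = np.sin(2 * np.pi * np.arange(n) / 4471.2)   # error-free SG series, V
sigma_eta = 0.1                    # SG noise, V (exaggerated on purpose)
sigma_eps = 62.0                   # AG drop-to-drop scatter, nm/s^2

estimates = []
for _ in range(200):
    x = x_true + rng.normal(0.0, sigma_eta, n)            # Eq. (3)
    y = beta * x_true + rng.normal(0.0, sigma_eps, n)     # AG observes the true signal
    xc, yc = x - x.mean(), y - y.mean()
    estimates.append((xc @ yc) / (xc @ xc))               # cov(x, y) / var(x)

mean_estimate = np.mean(estimates)
predicted = beta * x_true.var() / (x_true.var() + sigma_eta**2)   # Eq. (7)
print(f"mean LSQ estimate: {mean_estimate:.2f} nm/s^2/V")
print(f"Eq. (7) prediction: {predicted:.2f} nm/s^2/V")
```

The scatter of the individual estimates is much smaller than the offset from the true factor, which illustrates that the formal LSQ error does not reveal this bias.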

In many cases evaluating this bias is not straightforward (Frost and Thompson 2000). Fortunately, the attenuation bias is easy to determine in case of gravity measurements, given that \(\tilde{x}\) is essentially the tidal signal, and \(\eta \) the SG residual, obtained after removing a synthetic tide and correcting the atmosphere effect. Application of an appropriate low-pass filter to the SG series will mitigate this bias.

Another SG instrumental effect is the time lag. However, our tests show that an uncorrected lag as large as 30 s influences the calibration factor at a level smaller than 0.1 per mille (see also Meurers 2002).

3.1 Simulation

We consider the first 10 days of the Membach time series shown in Fig. 1, of which the rms amplitude equals 542.8 nm/s\(^{2}\). To mimic the SG noise, we added a red noise, a violet noise, and an ultraviolet noise. To obtain a red noise we generate a GW noise of 2 nm/s\(^{2}\) amplitude. Then, in the frequency domain its amplitude spectrum is multiplied by \(f^{-1}\), before coming back to the time domain. Similarly, the violet and ultraviolet noises are obtained from 20 and 30 nm/s\(^{2}\) GW noises, of which the amplitude spectra are multiplied by, respectively, f and \(f^2\). The total contribution to the rms noise amplitude in the whole frequency band up to 0.05 Hz is 7.9 nm/s\(^{2}\). The PSD of the noise model has a value of 2.4 \(\mathrm{nm}^2\mathrm{s}^{-4}\, \mathrm{Hz}^{-1}\) at 0.001 Hz and is shown in red in Fig. 5b, together with the PSD of the SG residual of Membach. The AG noise is modeled by a 77,000 \(\mathrm{nm}^2\mathrm{s}^{-4}\,\mathrm{Hz}^{-1}\) GW noise (equivalent to 62 nm/s\(^{2}\) standard deviation at a period of 10 s).

Fig. 5

a SG and AG residuals and b their PSDs at Membach (SG data taken when the AG drops are available) from 2014-05-14 01:00 to 2014-05-18 14:46, 1 drop/10 s, 100 drops/set, 2 sets/hour; the PSD of the noise model used in our simulations is also shown in red. The semi-diurnal (2 cycles per day) and diurnal (1 cycle per day) tidal frequencies are indicated by the arrows

The calibration factor was calculated using the LSQ approach; this was repeated 10,000 times, generating a different random noise for each run.

The observed distribution from the simulation has a 0.31 nm/s\(^{2}\)/V precision, with a 783.82 nm/s\(^{2}\)/V mean. This is lower than the expected factor of 784 nm/s\(^{2}\)/V by , consistent with the bias predicted by Eq. (7):

$$\begin{aligned} \frac{542.8^2}{542.8^2+7.9^2}=0.99978. \end{aligned}$$

This result is more than 4 times smaller than the targeted per mille level, but this is a systematic bias which is not accounted for in the error bars provided by the LSQ process.

To ensure that the bias remains at a negligible level compared to the target per mille level, it makes sense to achieve a calibration for which the bias is smaller than .

According to Eq. (7), we need

$$\begin{aligned} \frac{\sigma _\eta }{\sigma _{\tilde{x}}}<\sqrt{\frac{1-0.9999}{0.9999}} =10^{-2}, \end{aligned}$$
(8)

where \(\eta \) is the SG residual, estimated after removing a synthetic tide and correcting the atmosphere effect using a linear admittance of \(-\)3.3 nm s\(^{-2}\)/hPa, and \(\tilde{x}\) is the actual tidal signal.
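The two numerical values above follow directly from Eq. (7). A minimal sketch, with illustrative function names, using the rms amplitudes of the simulation (542.8 nm/s\(^{2}\) for the tide, 7.9 nm/s\(^{2}\) for the SG noise):

```python
import math

def attenuation_factor(rms_signal, rms_noise):
    """Eq. (7): ratio of the expected LSQ estimate to the true calibration factor."""
    return rms_signal**2 / (rms_signal**2 + rms_noise**2)

def max_noise_ratio(factor):
    """Eq. (8): largest noise-to-signal rms ratio giving at least this attenuation factor."""
    return math.sqrt((1.0 - factor) / factor)

a = attenuation_factor(542.8, 7.9)
print(f"attenuation factor: {a:.5f}, bias: {1e3 * (1.0 - a):.2f} per mille")
print(f"noise-to-signal limit for a factor of 0.9999: {max_noise_ratio(0.9999):.4f}")
```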

Formula (8) holds for white noise; our test shows that it is the high-frequency noise of the SG that might be high enough to bias the estimate of the calibration factor. Table 3 provides the attenuation factor for a 100-day time series at Membach as a function of the noise level.

Table 3 Attenuation factor estimated according to Eq. (7) for rms amplitude of 616 nm/s\(^{2}\) for the tidal signal at Membach, as a function of the rms amplitude of the noise \(\eta \) affecting the \(\tilde{x}_i \) values

4 Conclusions

We demonstrated that the per mille precision can be reached within 48 h by measuring at spring tides and by increasing the AG sampling rate. This is shorter than what is reported in previous empirical studies (Hinderer et al. 1998; Francis et al. 1998; Rosat et al. 2009), but supports the result of Francis (1997). We also showed that the error does not decrease as \(1/\sqrt{N} \), given the tidal modulation effect. We then showed that, if the standard deviation of the noise affecting the SG is at least 100 times lower than the rms amplitude of the tidal signal used to compute the calibration factor, then the attenuation bias remains lower than the level. To mitigate this bias, a least-squares (LSQ) filter with a cutoff frequency of 0.05 Hz and a length of 60 s is an appropriate choice, given that the microseismic noise is strong above 0.05 Hz. The cutoff frequency must remain high enough not to remove a signal common to the AG and SG.

Filtering would not help in other cases, for example if the noise is induced by spikes or steps, as they contaminate the whole frequency band. Carefully editing the AG and SG time series to remove earthquakes, spikes, and other disturbances is the minimum to be done before applying the LSQ process (Hinderer et al. 2007). Note that at frequencies lower than about one cycle per day, the geophysical red noise affects the SG and AG in the same way (Van Camp et al. 2005) and does not bias the determination of the calibration factor.

The calibration precision of SGs operating in the framework of the GGP is about a few per mille, and different calibration experiments for the same SG can differ by more than (Rosat et al. 2009; Meurers 2012; Virtanen et al. 2014). This is expected given that the per mille value represents the one sigma level, such that 32 % of the calibration factors lie outside the error bars, assuming that the attenuation bias only plays a negligible role. Assuming that the SG calibration factor remains stable, to ensure a robust calibration factor, with error well below the level, we need to average over several experiments, as shown by Rosat et al. (2009) and Virtanen et al. (2014). Considering that a calibration experiment is performed at the 1 per mille level, using more than 30,000 drops, we can assume an infinite number of degrees of freedom for the Student’s t-distribution. If we take a risk that 5 times in 100 the error in the calibration estimate will be larger than the 1 per mille level, 4 experiments will be required, or 16 at the 0.5 per mille level, or 43 at the 0.3 per mille level (Natrella 1963). For a risk of 1 time in 100, the number of experiments becomes 7, 27, and 74, respectively.
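These numbers of experiments can be reproduced with the usual sample-size rule for a mean, \(n=\lceil (z_{1-\alpha /2}\,\sigma /d)^2\rceil \), where \(\sigma \) is the precision of a single experiment, \(d\) the target error, and \(z\) the normal quantile (infinite degrees of freedom). The sketch below assumes that this is the rule behind the tabulated values; the function name is illustrative.

```python
import math
from statistics import NormalDist

def experiments_needed(sigma_single, target, risk):
    """Number of independent experiments so that the error on the mean exceeds
    `target` only with probability `risk` (two-sided, normal quantile)."""
    z = NormalDist().inv_cdf(1.0 - risk / 2.0)
    return math.ceil((z * sigma_single / target) ** 2)

for risk in (0.05, 0.01):
    counts = [experiments_needed(1.0, d, risk) for d in (1.0, 0.5, 0.3)]
    print(f"risk {risk}: {counts} experiments for 1, 0.5 and 0.3 per mille")
```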

Fig. 6

Calibration factors of the SG GWR-C025 during different experiments using an FG5 (F) and a JILAg (J) AG. Light red, single results; light blue, their errors associated with the LSQ process. Gray the range around the average. With the exception of 3 experiments, all results deviate from the overall weighted average by less than . All but the first formal error (light blue) are well below , which is equivalent to 0.79 nm s\(^{-2 }\)V\(^{-1}\) for this SG. The dark red line shows how the weighted average develops with an increasing number of experiments. The dotted dark blue line shows how the standard deviation from the average develops

This is not contradicted by Fig. 6, which shows the calibration factors determined from 13 experiments on the Austrian SG GWR-C025, as well as the evolution of the average factor as a function of the number of experiments. These experiments were done using JILAg (Faller et al. 1983) and FG5 AGs. The first three experiments lasted more than 3.5 days, the later ones 4–8 days. As shown by the light red curve, 3 factors differ by 1 per mille or more (050620; 111121 and 120609); however, the calibration factor stabilizes well below the level after the second experiment (dark blue line).

In the future, atom absolute gravimeters may change that picture as they do not rely on a mechanical process, and can thus be operated continuously (de Angelis et al. 2009). In that case for the same conditions as illustrated in Fig. 1, it would require 21 days to achieve the level, which would ensure a calibration factor at the level with a 99.7 % confidence interval. If the GW noise of future absolute gravimeters decreases, this precision could be obtained in less than 21 days.

When calibration factors are discussed, the amplitude of the tidal signal, the duration of the AG measurements, the attenuation bias as well as the AG sampling rate, and the number of drops should be provided.

Finally, even if a calibration experiment could be performed for months, other factors may prevent reaching a precision better than a few tenths of per mille: small instabilities in the SG calibration factor, changing drifts, or possible non-linearity in the sensors. Other factors such as the calibration of the AG atomic clock, tilts of the instrument, or changes in the refraction index or magnetic field (see Niebauer et al. 1995 for a comprehensive review of the possible sources of error) are all at a level smaller than 10\(^{-8}\) and are not presently of concern.