Introduction

As a critical piece of contemporary infrastructure, Global Navigation Satellite Systems (GNSS) are advancing rapidly to cater to the exponential growth of GNSS user groups. However, numerous factors hinder the optimal performance of these systems; they can typically be divided into several categories: satellite and signal errors, atmospheric anomalies, product (correction) errors, operating environment anomalies, and user-end errors (Du et al. 2021). The portions of the above errors that are not completely eliminated by data processing collectively form the residual errors of the navigation system, termed navigation residual estimation errors. Large errors can lead to navigation faults/failures, including range/positioning failure, integrity failure, overbounding failure, and statistical failure (Milner et al. 2016). Range/positioning failure happens when errors in the range/position domain exceed a tolerable threshold. Integrity failure represents the event that the probability of positioning failure without a timely alert exceeds the predefined integrity risk. Overbounding failure refers to the cumulative distribution function (CDF) curve of real data lying outside the CDF of the assumed model. And statistical failure denotes the event that the probability density function (PDF) of the error distribution shifts to a failure state because of misdefined statistical parameters. Aiming at better navigation performance, a thorough comprehension of GNSS error characteristics is the foundation for handling all the above possible GNSS error sources and fault/failure modes.

A considerable amount of research has been devoted to the characterization of GNSS errors and faults (Anderson and Ellis 1971). Errors are approximated by simplified time-dependent mathematical functions, as demonstrated in Bhatti's work (Bhatti et al. 2007), including step error, ramp error, random noise, random walk, oscillation, and bias. It should be noted that these approximations aim to depict temporal trends rather than provide precise models. Various common distributions are utilized to capture the CDF/PDF of errors for simplicity and intuitiveness, and the Gaussian assumption is the most widely adopted in the navigation community. Navigation residual errors are usually assumed to be normally distributed, and the test statistics governing the probabilities of false alert and missed detection in GNSS integrity monitoring are thereby assumed to follow Chi-squared distributions (Panagiotakopoulos et al. 2014). However, error distributions are usually heavy-tailed in the presence of gross errors, blunders, and faults (Hsu 1979). The behavior of the tails is significant, especially in integrity risk assessment, so it must be treated in a more 'pessimistic' way. Alternatives such as the Exponential, Laplace (Shively and Braff 2000), and Generalized Extreme Value (GEV) distributions (Panagiotakopoulos et al. 2008) have been suggested by researchers to better describe the heavy tails. To decide which is the best alternative to the Gaussian model, we need to estimate which distribution is closest to the characteristics of real data, and we therefore require a technique to fully validate the goodness-of-fit of each Gaussian alternative for the tails of error distributions.

The 'tails,' in other words the faults, are exactly where the characterization problem lies. The goodness-of-fit verification of GNSS fault modeling can generally be categorized into two types: mathematical and positional. From the mathematical perspective, for faults that can be fitted with known distributions, conventional statistical metrics such as correlation coefficients, root mean square error, skewness, and excess kurtosis (Joanes and Gill 1998) are generally utilized to gauge the similarity between fault models and actual data. Besides, several goodness-of-fit techniques (described in the Methodology section) originating from mathematical statistics and machine learning are available. From the positional perspective, modeling accuracy is evaluated through positioning performance. For instance, when a machine learning technique is used to model multipath faults (Pan et al. 2023), any improvement in position accuracy is considered an indicator of the accuracy of GNSS fault modeling. However, the current techniques for GNSS fault model evaluation have several shortcomings. One problem stems from the limited evaluation capacity of the common statistical metrics. On the one hand, these metrics often require parameter estimation, typically capture only specific characteristics of a distribution, and do not consider the overall similarity of distributions. On the other hand, they are suitable for low-dimensional, single-factor fault situations with relatively simple mathematical distributions but impractical for high-dimensional, multifactor fault types. Another issue arising with the growth of machine learning techniques is the increasing number of novel fault models, particularly those generated by black-box methods or those with extremely complex structures.
There is a need for an evaluation method that can be applied to both new and established models, enabling us to determine which model is more suitable.

Brief reviews of goodness-of-fit methods and the details of the Wasserstein distance-based scoring method are presented first. This is followed by method validation using simulation and demonstration using real data from correction services and road tests. The conclusions are given in the last section.

Methodology

This part initially provides a concise overview of existing goodness-of-fit methodologies, laying the groundwork for selecting the Wasserstein distance as the fundamental element of the proposed algorithm. We introduce the primary implementation of the Wasserstein distance, outlining its strengths and limitations. Recognizing that the raw Wasserstein distance is inadequate for measuring the similarity between real and modeled data across varied dimensions, we propose a novel Gaussian Wasserstein distance-based precision metric. This metric rectifies the evaluation disparities arising from different dimensionalities. Additionally, we establish a model scoring criterion, enhancing the interpretability of model assessments.

Goodness-of-fit

In scientific research, evaluating the compatibility between models and observed data, known as goodness-of-fit, is essential for validating a model's applicability. Several techniques have been proposed to check the goodness-of-fit between data and candidate distributions, each with its unique strengths and weaknesses. The Chi-Square Goodness-of-Fit Test (Pearson 1992), suited for discrete distributions, is straightforward but less effective for small samples and continuous distributions. The Kolmogorov–Smirnov Test (Smirnov 1948) evaluates continuous distributions, especially normality, but loses accuracy with small samples. The Anderson–Darling Test (Anderson and Darling 1954), sensitive to tail fitting in continuous distributions, is ideal for small samples but involves complex calculations. The Cramér–von Mises Test (Cramér 1928) also assesses continuous distributions and tail fitting, is suitable for small samples, but may miss non-normal distributions. The Lilliefors Test (Lilliefors 1967) checks goodness-of-fit to normal distributions, fits small samples, but is limited to normality. Bayesian Goodness-of-Fit Tests (Rubin 1984) incorporate prior information, which is beneficial for small-sample or data-scarce cases, but require careful prior selection and entail computational complexity.

The major limitation of traditional goodness-of-fit tests is that they often focus on specific distributional properties or assume parametric models, and they may provide misleading results if the data do not conform to known distributions. To tackle these limitations, many sophisticated and flexible statistical tools have been developed alongside the advancements in machine learning, which offer a more comprehensive assessment of goodness-of-fit and provide more accurate results. The popular methods are the Kullback–Leibler (KL) divergence (Kullback and Leibler 1951; Dwass and Kullback 1960), the Jensen–Shannon (JS) divergence (Jensen 1906; Shannon 1948), and the Wasserstein distance (Vaserstein 1969). These methods significantly outperform traditional statistical metrics and goodness-of-fit techniques in that 1) they offer global comparison measures that take into account not only the central tendency (mean) and dispersion (variance) of distributions but also capture differences in their shapes; 2) they do not require prior assumptions on the distributions; and 3) they can serve as distance metrics, whereas traditional methods may not possess metric properties. While the KL divergence suffers from asymmetry when measuring the similarity of two distributions, this issue can be rectified by the JS divergence. However, both the KL and JS divergences become ineffective when the two distributions do not overlap.

Wasserstein distance addresses this problem by providing an effective distance metric and gradient information even for two non-overlapping distributions in a high-dimensional space (Panaretos and Zemel 2019). Given these advantages, Wasserstein distance is employed here as a method of evaluating the precision of GNSS fault models.
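This limitation is easy to reproduce numerically. The sketch below, using SciPy's `jensenshannon` and the one-dimensional `wasserstein_distance` (the support grid and probability masses are illustrative assumptions, not data from this study), shows the JS divergence saturating at its maximum for two disjoint distributions while the Wasserstein distance still distinguishes a near shift from a far one:

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from scipy.stats import wasserstein_distance

# Two discrete error distributions on a common support that do not overlap:
# the JS divergence saturates, while the Wasserstein distance still reflects
# how far apart the distributions are.
support = np.arange(10.0)
p = np.zeros(10); p[[0, 1]] = 0.5          # mass near zero
q_near = np.zeros(10); q_near[[4, 5]] = 0.5
q_far = np.zeros(10); q_far[[8, 9]] = 0.5

js_near = jensenshannon(p, q_near, base=2)  # 1.0 (saturated)
js_far = jensenshannon(p, q_far, base=2)    # 1.0 (cannot tell q_far is farther)
w_near = wasserstein_distance(support, support, p, q_near)  # 4.0
w_far = wasserstein_distance(support, support, p, q_far)    # 8.0
```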

Raw Wasserstein distance

Consider a fault type \(f\) with a dimension of \(n(n\ge 1)\), with its true distribution denoted as \({P}_{{\text{real}}}\) and its modeled distribution denoted as \({P}_{{\text{simu}}}\). Suppose that the sequence of \(k\) samples \({x}_{1},{x}_{2},\cdots ,{x}_{k}\) is collected from the real datasets or sampled from \({P}_{{\text{real}}}\). Since each sample is \(n\)-dimensional, we denote the sequence of the sample points in \({i}^{th}\) dimension as \({X}^{\left(i\right)}\) and rewrite it as

$$X^{\left( i \right)} = \left[ {x_{1}^{\left( i \right)} ,x_{2}^{\left( i \right)} \ldots ,x_{k}^{\left( i \right)} } \right]^{T} ,{ }i = 1,{ }2,{ } \ldots ,{ }n$$
(1)

where \({x}_{k}^{\left(i\right)}\) is the \({i}^{th}\) dimension of sample point \(k\). Then the real sample points can be represented as

$$X = \left\{ {X^{\left( 1 \right)} ,X^{\left( 2 \right)} , \ldots ,X^{\left( n \right)} } \right\}$$
(2)

We denote the fault model of fault type \(f\) as \({M}_{f}\). A sequence of \(k\) samples \({y}_{1},{y}_{2},\cdots ,{y}_{k}\) is collected from the fault model. Similarly, we denote the sequence of the simulated points in the \({i}^{th}\) dimension as \({Y}^{\left(i\right)}\) and rewrite it as

$$Y^{\left( i \right)} = \left[ {y_{1}^{\left( i \right)} ,y_{2}^{\left( i \right)} \ldots ,y_{k}^{\left( i \right)} } \right]^{T} ,{ }i = 1,2, \ldots ,n$$
(3)

where \({y}_{k}^{\left(i\right)}\) is the \({i}^{th}\) dimension of simulated point \(k\). Then the simulated sample points can be represented as

$$\begin{array}{*{20}c} {Y = \left\{ {Y^{\left( 1 \right)} ,Y^{\left( 2 \right)} , \ldots ,Y^{\left( n \right)} } \right\}} \\ \end{array}$$
(4)

Both \(X\) and \(Y\) can be treated as \(k\times n\) matrices.

The \(p\)-order Wasserstein distance \({W\left({P}_{{\text{real}}},{P}_{{\text{simu}}}\right)}^{p}\) between \({P}_{{\text{real}}}\) and \({P}_{{\text{simu}}}\) can be calculated by (Vaserstein 1969)

$$W\left( {P_{{{\text{real}}}} ,P_{{{\text{simu}}}} } \right)^{p} = \mathop {\inf }\limits_{\gamma \in \Pi \left( {P_{{{\text{real}}}} ,P_{{{\text{simu}}}} } \right)} {\mathrm{E}}_{{\left( {x,y} \right) \sim \gamma }} \left[ {d\left( {x,y} \right)^{p} } \right]$$
(5)

where \(\Pi \left({P}_{{\text{real}}},{P}_{{\text{simu}}}\right)\) represents the collection of all possible joint distributions between \({P}_{{\text{real}}}\) and \({P}_{{\text{simu}}}\) for the fault type \(f\); \(\gamma\) denotes a possible joint distribution belonging to this collection; \(d\left(x,y\right)\) is the Euclidean distance between the sample points \(x\) and \(y\); \(\left(x,y\right)\sim \gamma\) means sampling from the joint distribution \(\gamma\); \({\mathrm{E}}_{\left(x,y\right)\sim \gamma }[{d(x,y)}^{p}]\) represents the mathematical expectation of \({d\left(x,y\right)}^{p}\); and \(\inf\) represents the infimum.

The smaller the Wasserstein distance, the lower the cost required to move the modeled distribution to the true distribution, indicating a higher similarity between the two distributions. This, in turn, demonstrates a higher accuracy of the established fault model.
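For equally sized empirical samples with uniform weights, the infimum in (5) reduces to an optimal assignment between the two point sets, which gives a simple way to approximate the distance numerically. The helper below is an illustrative sketch of this reduction, not the numerical scheme used in this study:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_w2_squared(X, Y):
    """Empirical squared 2-Wasserstein distance between two k x n sample
    matrices with uniform weights, via the optimal transport assignment."""
    # Pairwise squared Euclidean costs between all sample pairs.
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)  # optimal coupling
    return cost[rows, cols].mean()            # E[d(x, y)^2] under it

# Shifting a 2-D point cloud by one unit yields W2^2 = 1.
X = np.array([[0.0, 0.0], [1.0, 0.0]])
Y = X + np.array([0.0, 1.0])
print(empirical_w2_squared(X, Y))  # 1.0
```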

Wasserstein distance exhibits many good properties which can be summarized as follows:

  • The properties of the distributions are not tightly constrained. In other words, the fault model can be of arbitrary finite dimension, discrete or continuous, and generated by known distributions or black-box methods.

  • Even when the real faulty data and its fault model do not overlap in high dimensions, the Wasserstein distance is still able to measure their similarity.

  • Although computing the Wasserstein distance in high-dimensional space can be very complex, the second-order Gaussian Wasserstein distance is a special case with an analytical solution, which we can take good advantage of.

However, when utilizing Wasserstein distance for fault model evaluation, two problems need to be tackled:

  • Models with different dimensions may have the same value of Wasserstein distance but different precision. The precision metric of models should be decoupled from the dimensionality and more intuitive.

  • Users cannot decide whether a model is good or not if merely given the numeric value of the Wasserstein distance.

Based on the above discussions, an indicative criterion should be established for reference.

Gaussian Wasserstein distance-based precision metric

For two n-dimensional Gaussian distributions (each dimension is independently and identically distributed), \({\mu }_{1}\sim {N}_{1}({m}_{1},{C}_{1}), {\mu }_{2}\sim {N}_{2}({m}_{2},{C}_{2})\), the second-order Gaussian Wasserstein distance, denoted as \({W}_{2}{\left({\mu }_{1},{\mu }_{2}\right)}^{2}\), has a closed-form solution, which is expressed as (Takatsu 2011)

$$W_{2} \left( {\mu_{1} ,\mu_{2} } \right)^{2} = |m_{1} - m_{2} |^{2} + {\text{Tr}}\left( {C_{1} + C_{2} - 2\left( {C_{1}^{1/2} C_{2} C_{1}^{1/2} } \right)^{1/2} } \right)$$
(6)

where \(|\cdot |\) denotes the Euclidean norm, \({\text{Tr}}\left(\cdot \right)\) is the trace of a matrix, and \({C}_{1}^{1/2}\) represents the square root of the covariance matrix \({C}_{1}\).
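Equation (6) can be implemented directly with a matrix square root; the sketch below assumes SciPy's `sqrtm` and takes the real part to discard numerical round-off:

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_squared(m1, C1, m2, C2):
    """Closed-form squared 2-Wasserstein distance between N(m1, C1)
    and N(m2, C2), following (6)."""
    s1 = sqrtm(C1)
    cross = sqrtm(s1 @ C2 @ s1)  # (C1^{1/2} C2 C1^{1/2})^{1/2}
    w2 = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2) \
        + np.trace(C1 + C2 - 2.0 * cross)
    return float(np.real(w2))    # drop round-off imaginary part

# Example: N(0, I_2) vs N(0, 4 I_2), i.e., sigma = 2 in each dimension;
# by (7) this should give n * (sigma - 1)^2 = 2.
print(gaussian_w2_squared(np.zeros(2), np.eye(2), np.zeros(2), 4 * np.eye(2)))  # 2.0
```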

More intuitively, assume \({\mu }_{1}\) is an n-dimensional standard Gaussian distribution and \({\mu }_{2}\) is an n-dimensional Gaussian distribution with a zero mean vector and a diagonal covariance matrix whose diagonal elements equal \({\sigma }^{2}\). In this case, (6) can be further simplified to

$$W_{2} \left( {\mu_{1} ,\mu_{2} } \right)^{2} = n \cdot \left( {\sigma - 1} \right)^{2}$$
(7)

We denote the second-order Wasserstein distance between the true distribution and the generated distribution corresponding to the \(n\)-dimensional fault type \(f\) as \({{W}_{f}}^{2}\). For fault model \({M}_{f}\), let \({{W}_{f}}^{2}\) be equal to \({W}_{2}{\left({\mu }_{1},{\mu }_{2}\right)}^{2}\) with \(\sigma \ge 1\). Hence, \(\sigma\) can be calculated by

$$\sigma = \sqrt {\frac{{W_{f}^{2} }}{n}} + 1$$
(8)

Since the dimension \(n\) is known to us, \(\sigma\) can serve as the precision metric of the fault model. The smaller \(\sigma\) is, the higher the accuracy of the model.

It can be derived from (8) that the Wasserstein distance is related to the dimensionality of the model. Taking a 2D model and a 3D model as examples, even if both models have similar accuracy, the Wasserstein distance of the 2D model will be smaller than that of the 3D model due to the difference in dimensionality. Converting the Wasserstein distance into the precision metric \(\sigma\) eliminates the evaluation discrepancy caused by different dimensions and facilitates the establishment of a unified model scoring criterion.
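The conversion in (8) and the resulting dimensional decoupling can be sketched as follows; the numerical values are illustrative only:

```python
import numpy as np

def precision_metric(w2_squared, n):
    """Precision metric sigma from a squared 2-Wasserstein distance
    and model dimensionality n, following (8)."""
    return np.sqrt(w2_squared / n) + 1.0

# Per (7), a model off by sigma = 1.5 in every dimension yields
# W^2 = n * (1.5 - 1)^2, so its raw distance grows with n ...
w2_2d = 2 * 0.25
w2_3d = 3 * 0.25
# ... while the precision metric recovers the same sigma either way.
print(precision_metric(w2_2d, 2), precision_metric(w2_3d, 3))  # 1.5 1.5
```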

Model scoring criterion

For a given fault type \(f\), a full database generated from real-world data is available. Three datasets with the same sample size \(N\) are built: two real datasets drawn from the full database (denoted as set 1 and set 2) and one simulated dataset drawn from the fault model (denoted as set 3). We suppose that samples drawn from the same database exhibit the same statistical characteristics.

The second-order Wasserstein distance \({W}_{1, 2}^{2}\) between set 1 and set 2 is calculated by Eq. (5); letting \({W}_{1, 2}^{2}\) equal \({{W}_{f}}^{2}\) in Eq. (8), we can derive \(\sigma\) between set 1 and set 2. Since set 1 and set 2 are real datasets from the same database, this \(\sigma\) is treated as the baseline, which serves as the benchmark value for model scoring and is denoted as \({\sigma }_{{\text{baseline}}}^{f}\). Then, we sample \(N\) points from set 1 and set 2 to generate set 0, and the \(\sigma\) calculated between set 0 and set 3 is denoted as \({\sigma }^{f}\).

A scoring criterion is proposed to convert \(\sigma\) into a model score. The relative deviation of \({\sigma }^{f}\) from \({\sigma }_{{\text{baseline}}}^{f}\) is derived by

$$\sigma_{inc} = {\text{abs}}\left( {\sigma^{f} - \sigma_{{{\text{baseline}}}}^{f} } \right)/ \sigma_{{{\text{baseline}}}}^{f} \left( {\text{\% }} \right)$$
(9)

where \({\text{abs}}(\cdot )\) represents the absolute value. \({\sigma }_{{\text{inc}}}\) is treated as an indicator of the loss of precision in the modeling process, and the scoring criterion is expressed as

$${\text{score}}\, = \,\left\{ {\begin{array}{*{20}l} {100\left( {1 - \sigma_{{{\text{inc}}}} } \right),} \hfill & {0\, < \,\sigma_{{{\text{inc}}}} \, < 1} \hfill \\ {0,} \hfill & {\sigma_{{{\text{inc}}}} \, \ge \,1} \hfill \\ \end{array} } \right.$$
(10)

Based on user requirements and model scores, we can decide whether to select a particular model. For example, if the user’s requirement is to select models with a score above 80, then models with scores below 80 are considered to have insufficient accuracy and not recommended to use.
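The scoring rule in (9) and (10) reduces to a few lines of code; a minimal sketch in Python:

```python
def model_score(sigma_f, sigma_baseline):
    """Model score from (9) and (10): the relative deviation of sigma_f
    from the baseline sigma, mapped onto a 0-100 scale."""
    sigma_inc = abs(sigma_f - sigma_baseline) / sigma_baseline
    return 0.0 if sigma_inc >= 1.0 else 100.0 * (1.0 - sigma_inc)

# A model whose sigma deviates 20% from the baseline scores about 80;
# a deviation of 100% or more scores 0.
print(model_score(1.2, 1.0))  # ≈ 80.0
print(model_score(2.5, 1.0))  # 0.0
```

A perfect match (\(\sigma^{f} = \sigma_{\text{baseline}}^{f}\)) scores 100, so a user threshold such as 80 corresponds directly to a 20% precision loss relative to the baseline.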

Experiments and analysis

To validate the performance of the proposed evaluation method, three experiments are conducted and analyzed subsequently. Real-world samples of each fault type are collected and divided into a training set and a testing set in a 4:1 ratio. The evaluation process can be divided into four steps: 1) dataset construction, 2) fault modeling, 3) fault simulation, and 4) model evaluation. The first step extracts the required data from raw data and establishes the training and testing datasets. The second step exploits the training datasets to generate fault models. The third step simulates faults with the models built in step two. The final step evaluates the performance of the fault models on the testing set with the proposed evaluation method. A flowchart is demonstrated in Fig. 1 for better understanding.

Fig. 1
figure 1

Flowchart of evaluation process

Validation of the proposed method

In high-dimensional space, since an analytical solution is available only for the second-order Gaussian Wasserstein distance, the Wasserstein distance has to be calculated by numerical approximation techniques, and a tolerable inaccuracy in the final model score should be allowed. A numerical example is presented below to validate the proposed method.

Given that the second-order n-dimensional Gaussian Wasserstein distance has a closed-form formula, we assume a Gaussian model as the fault model to be evaluated. Suppose the fault model follows a four-dimensional Gaussian distribution in which each dimension has a mean of 0 and a standard deviation of 1; the theoretical precision metric \(\sigma\) is then 1. Since we are sampling from a known distribution, the ideal model score should be 100.

Then, we simulated data following this distribution with sample sizes of 100, 1000, 10,000, 50,000, and 100,000. Wasserstein distances between these datasets and the four-dimensional standard Gaussian distribution are calculated through a numerical approximation method (with a convergence tolerance of \(10^{-12}\)). Table 1 shows that the calculated values of \(\sigma\) approach the theoretical value of 1 and that the model scores become progressively more accurate as the sample size increases. The relationship between sample size and model score is plotted in Fig. 2. We can conclude that when the sample size is relatively large (for example, ≥ 10,000 in this case), the precision loss is tolerable and can be ignored. Moreover, even for a precise model, undersampling can introduce large inaccuracy in model scoring, which should be avoided in practical use.
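The effect of sample size in Table 1 can be reproduced in miniature. The self-contained sketch below approximates the Wasserstein distance by an optimal assignment between two finite samples, a different (and much smaller-scale) numerical scheme than the one used for Table 1, so only the qualitative trend should be expected to match:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def empirical_sigma(k, n=4, seed=0):
    """Precision metric between two independent k-point samples of the
    n-dimensional standard Gaussian, via an assignment-based W2 estimate."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((k, n))
    Y = rng.standard_normal((k, n))
    cost = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(cost)
    w2_squared = cost[rows, cols].mean()
    return np.sqrt(w2_squared / n) + 1.0  # conversion of (8)

# The empirical sigma sits above the theoretical value of 1 for small
# samples and approaches it as the sample size grows.
for k in (64, 256, 1024):
    print(k, empirical_sigma(k))
```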

Table 1 The model scores corresponding to different sample sizes
Fig. 2
figure 2

Relationship between sample size and model score

Fault model evaluation using the proposed method

Sklar's theorem (Sklar 1959) states that an n-dimensional multivariate probability distribution can be decomposed into two parts: n univariate marginal distributions and a copula (Oh 2014), which is solely a function of the dependence structure and captures the inter-correlations between the n dimensions. To validate the method's ability to evaluate complex models, two copula-based multivariate models with correlated variables are built and scored in this part.

GNSS correction product fault model evaluation

A GNSS correction product fault is one of the fault sources in GNSS positioning. In this part, a Slant Troposphere Delay (STD) correction fault model is established and evaluated. In general, the precision of STD correction products degrades under extreme tropospheric conditions such as typhoons and heavy rain.

The real-world STD correction faults are generated from the grid STD correction products broadcast by the Qianxun Spatial Intelligence (QXSI) company. With the hydrostatic components corrected by the Saastamoinen model (Saastamoinen 1972), the generation is based on the difference, denoted as \({\text{dSTD}}\), between \({{\text{STD}}}_{{\text{wet}},{\text{real}}}\) (true values of the wet components of STD) obtained by monitoring stations and \({{\text{STD}}}_{{\text{wet}},{\text{est}}}\) (estimated values of the wet components of STD) obtained with the troposphere correction, as follows:

$${\text{dSTD}}_{i}^{s} = {\text{STD}}_{{{\text{wet}},{\text{real}},i}}^{s} - {\text{STD}}_{{{\text{wet}},{\text{est}},i}}^{s}$$
(11)

where \({{\text{dSTD}}}_{i}^{s}\) represents the correction error on grid \(i\) of satellite \(s\).

With six months of STD correction data broadcast in Shanghai, China, \({{\text{dSTD}}}_{i}^{s}\) sequences are collected for faulty data extraction. Time series anomaly detection methods are applied to detect STD correction faults, and the faulty \({{\text{dSTD}}}_{i}^{s}\) sequences are added to the training and testing datasets. Two examples of real faults detected in the \({\text{dSTD}}\) sequences are demonstrated in Fig. 3, showing that STD correction temporal anomalies undergo a sudden rise, a sustained period, and then a fall in value.

Fig. 3
figure 3

Real faults (outlined with red dashed lines) detected in the \({\text{dSTD}}\) sequences on grid 188 of Sat C03 (left) and on grid 84, 85 of Sat E19 (right) during July 1–8, 2023

Based on the observations and analysis of faulty samples, STD correction faults can be abstracted as a trapezoid-like model, as depicted in Fig. 4. This 'trapezoid' is characterized by four parameters: (a) rising slope (unit: m/s), denoted as \({k}_{1}\); (b) descending slope (unit: m/s), denoted as \({k}_{2}\); (c) ramp duration (unit: s), denoted as \({\Delta t}_{1}\); and (d) step duration (unit: s), denoted as \({\Delta t}_{2}\). Since rainfall events typically include several correlated features, such as intensity, precipitation, and duration, the model is naturally multivariate and multidimensional (Balistrocchi and Bacchi 2011). It is verified by Kendall coefficients that the aforementioned four parameters exhibit rank correlations. A multivariate, four-dimensional trapezoid-like model is therefore established from four marginal functions and one copula function to characterize STD correction faults. The joint distribution \(F\left( {k_{1} ,k_{2} ,\Delta t_{1} ,\Delta t_{2} } \right)\) is expressed as

$$F\,\left( {k_{1} ,k_{2} ,\Delta t_{1} ,\Delta t_{2} } \right)\, = \,C\left( {F_{1} \,\left( {k_{1} } \right), F_{2} \,\left( {k_{2} } \right), F_{3} \,\left( {\Delta t_{1} } \right), F_{4} \,\left( {\Delta t_{2} } \right)} \right)$$
(12)

where \({\text{C}}(\cdot )\) is the copula function, and \({\text{F}}_{*} \left( \cdot \right)\) denotes each marginal function. The distribution fittings of four marginal functions are shown in Fig. 5.
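The paper does not specify the copula family; as an illustration of the construction in (12), the sketch below draws correlated fault parameters through a Gaussian copula with assumed marginals (the distribution families and the 0.7 correlation are hypothetical, not fitted values from this study):

```python
import numpy as np
from scipy import stats

def sample_gaussian_copula(marginals, corr, size, seed=0):
    """Draw `size` samples whose j-th column follows marginals[j] and whose
    dependence structure follows a Gaussian copula with correlation `corr`."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(marginals)), corr, size)
    u = stats.norm.cdf(z)  # correlated uniforms on (0, 1)
    return np.column_stack([m.ppf(u[:, j]) for j, m in enumerate(marginals)])

# Hypothetical marginals for two correlated fault parameters, e.g., a
# positively skewed slope and a duration.
marginals = [stats.lognorm(s=0.5), stats.gamma(a=2.0)]
corr = np.array([[1.0, 0.7], [0.7, 1.0]])
samples = sample_gaussian_copula(marginals, corr, size=2000)
```

Each column then follows its own marginal while the columns remain rank-correlated, which is the property the Kendall-coefficient check above verifies on the real data.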

Fig. 4
figure 4

Abstract trapezoidal model for STD correction faults

Fig. 5
figure 5

Distribution fittings of the four parameters \({k}_{1}\) (top left), \({k}_{2}\) (top right), \({\Delta t}_{1}\) (bottom left), and \({\Delta t}_{2}\) (bottom right) of STD correction fault model

In the previous step, the STD correction fault model was built on the training set. The testing set is equally divided in two, named set 1 and set 2, with the number of sample points in each denoted as N. Then, we generate N four-dimensional points from the established STD correction fault model to build set 3. Thereby, we have three datasets with equal sample counts for model evaluation.

We obtained three sets of \(n\)-dimensional STD correction fault samples in the fault simulation: set 1 and set 2 are the real-data testing sets, and set 3 is the simulated dataset. The sample size of each set is 10,000. Then, we use set 1 and set 2 to calculate the baseline result, and set 0 (sampled from the testing sets) together with set 3 to calculate the model result. The results are summarized in Table 2. The values of the raw Wasserstein distance are derived by numerical approximation techniques, and the numeric values of the precision metric are calculated by (8). The model score of the baseline result is set as 100, and the model score of the established model, computed by (9) and (10), is 96.80.

Table 2 The precision metric and model scores of STD correction fault model

Multipath fault model evaluation

Multipath is a typical unmodeled fault source that dramatically deteriorates GNSS positioning performance. It arises when the satellite signals received by GNSS receivers travel not only directly but also indirectly via reflections off nearby surfaces such as skyscrapers and terrain. A double-frequency code multipath fault model is constructed and evaluated in this section.

Real-world multipath faults are generated from drive test data collected by QXSI in March 2022. An undifferenced and uncombined precise point positioning model is used to extract the code multipath, which can be expressed as

$$v_{{p_{i} }} \, = \,{\text{MP}}_{{p_{i} }} \, + \,\varepsilon_{{p_{i} }} \, = \,P_{i, r}^{s} \, - \,\rho_{r}^{s} \, - \,c\left( {{\text{dt}}_{r} \, - \,{\text{dt}}^{s} } \right)\, - \,T_{r}^{s} \, - \,I_{i, r}^{s}$$
(13)

where \(s, r, i\) denote the satellite, receiver, and frequency index, respectively; \(c\) represents the velocity of light in vacuum; \({P}_{i, r}^{s}\) is the pseudorange measurement from satellite \(s\) to receiver \(r\) on frequency \(i\); \({\rho }_{r}^{s}\) is the receiver-to-satellite geometric distance; \({{\text{dt}}}_{r}\) and \({{\text{dt}}}^{s}\) denote the clock error of receiver \(r\) and satellite \(s\), respectively; \({T}_{r}^{s}\) is the tropospheric delay, and \({I}_{i, r}^{s}\) is the ionospheric delay.

Since the receivers' accurate coordinates can be derived by a map-matching technique, satellite-end and atmosphere-end errors can be corrected with QXSI correction products, and code multipath datasets are thereby established from the drive test datasets.
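The correction-and-differencing in (13) is plain arithmetic once the geometric, clock, and atmospheric terms are known; a minimal sketch (all numerical values invented for illustration):

```python
# Code multipath residual from (13): subtract the geometric range, clock
# terms, and atmospheric delays from the pseudorange measurement.
def code_multipath_residual(P, rho, dt_r, dt_s, T, I, c=299792458.0):
    """v_p = MP + noise = P - rho - c*(dt_r - dt_s) - T - I (meters; clocks in s)."""
    return P - rho - c * (dt_r - dt_s) - T - I

# Hypothetical values: a 100 m pseudorange with 95 m geometry, zero clock
# offsets, 3 m tropospheric delay, and 1.5 m ionospheric delay.
print(code_multipath_residual(P=100.0, rho=95.0, dt_r=0.0, dt_s=0.0, T=3.0, I=1.5))  # 0.5
```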

Temporal code multipath faults on frequency \(i\) can be characterized as a triangle-like model, abstracted in Fig. 6, where the 'base' represents the fault duration and the 'height' represents the peak value of the fault. Besides, as shown in Fig. 7, the signal-to-noise ratio (SNR) measured by receivers usually descends as the magnitude of code multipath faults grows.

Fig. 6
figure 6

Abstract triangle model for code multipath faults on frequency \({\text{i}}\)

Fig. 7
figure 7

Relationship between code multipath on frequency L1 and SNR value

A double-frequency code multipath fault is described by four parameters: (a) fault duration (unit: s), denoted as \(\Delta t\); (b) the peak value of multipath on the first frequency, denoted as \({{\text{MP}}}_{L1}\); (c) the peak value of multipath on the second frequency, denoted as \({{\text{MP}}}_{L2}\); and (d) the delta SNR value, denoted as \(\Delta {\text{SNR}}\). It is verified by Kendall coefficients that the four parameters are correlated. Hence, a four-dimensional double-frequency code multipath model is established from four marginal functions (the distribution fittings are shown in Fig. 8) and one copula function, and the joint distribution \(F\left(\Delta t,{{\text{MP}}}_{L1},{{\text{MP}}}_{L2},\Delta {\text{SNR}}\right)\) is written as

$$F\left( {\Delta t,{\text{MP}}_{L1} ,{\text{MP}}_{L2} ,\Delta {\text{SNR}}} \right) = C\left( {F_{1} \left( {\Delta t} \right),F_{2} \left( {{\text{MP}}_{L1} } \right),F_{3} \left( {{\text{MP}}_{L2} } \right),F_{4} \left( {\Delta {\text{SNR}}} \right)} \right)$$
(14)

where \(C(\cdot )\) is the copula function, and \({F}_{*}(\cdot )\) denotes each marginal function, as in (12).

Fig. 8
figure 8

Distribution fittings of the four parameters \(\Delta t\) (top left), \({{\text{MP}}}_{L1}\) (top right), \({{\text{MP}}}_{L2}\) (bottom left), and \(\Delta {\text{SNR}}\) (bottom right) of the double-frequency code multipath fault model

The procedure of fault simulation is the same as for the STD product fault model. We obtained three sets of n-dimensional double-frequency code multipath fault samples: set 1 and set 2 are the real-data testing sets, and set 3 is the simulated dataset. The sample size of each set is 10,000. As shown in Table 3, the model score of the baseline result is set as 100, and the model score of the established model, computed by Eqs. (9) and (10), is 97.15.

Table 3 The precision metric and model scores of double-frequency code multipath fault model

Conclusions

GNSS errors often exhibit heavy-tailed, non-Gaussian behavior and are nearly impossible to eliminate completely. The Gaussian assumption, although widely applied in the field of GNSS, fails to capture the heaviness of the tails, which is typically attributed to the projection of measurement faults into the position domain. Error tails and faults should be modeled more precisely to deliver improved navigation performance. Researchers not only need precise models but also want to quantify how 'precise' a model is, so universal goodness-of-fit evaluation criteria are required.

This study has focused on the goodness-of-fit evaluation of GNSS fault characteristics and models by exploiting the Wasserstein distance, a machine learning metric that measures the similarity between two distributions. Considering that the numeric values of the raw Wasserstein distance cannot reflect model precision intuitively (models with different dimensions may have the same Wasserstein distance but different precision), a second-order Gaussian Wasserstein distance-based precision metric \(\sigma\) is presented to convert the cost between the n-dimensional real distribution and the fault model into the cost between two n-dimensional Gaussian distributions. A scoring criterion is thereby established based on \(\sigma\) to provide reference opinions to users. The contributions of the proposed method can be summarized as follows:

  1. (1)

    The method inherits the good properties of Wasserstein distance, without imposing any constraints on the models’ dimensionality or specific distribution, so we can evaluate fault models as long as we can sample from them.

  2. (2)

    The proposed precision metric decouples modeling precision from dimensionality, thus unifying the evaluation criteria among models with different dimensions.

  3. (3)

    The scoring criterion further provides a more intuitive way for GNSS practitioners to select fault models according to their individual needs and contributes to the validation of the ubiquitous non-Gaussian problems.

From the experimental results, the proposed method is first validated through known n-dimensional Gaussian distributions, indicating that adequate sampling should be conducted to keep the model scoring inaccuracy resulting from undersampling within an acceptable range. An STD product fault model and a double-frequency code multipath model are given as examples of applying the proposed method to fault model evaluation. Detailed implementation processes have been presented to show that this method is able to evaluate and score GNSS fault models. In future work, fault models can be scored, compared, and selected more conveniently with the method introduced in this study. The computed model score can be used as an indicator for model iteration. Advanced fault models can provide sufficient simulated faulty measurements for large-scale system robustness testing and positioning algorithm refinement, as well as significantly reduce the cost of data acquisition. Furthermore, the proposed technique is expected to have broader applications, such as assisting in the validation of non-Gaussian phenomena and the identification of Gaussian alternatives, being incorporated into novel integrity algorithms, and helping derive tighter protection levels to improve availability.