Introduction

GNSS positioning accuracy is affected by various observation errors, which seriously degrades its performance in displacement monitoring applications such as settlement and landslide monitoring. Many studies have explored possible methods to mitigate the effects of these observation errors. To date, however, no definitive model can eliminate these errors because of their complex spatiotemporal characteristics. The errors that cannot be eliminated are referred to as unmodeled errors. Among the many unmodeled error mitigation strategies, a common one is to model the observation errors explicitly. For example, Hoque and Jakowski (2008) investigated the higher-order ionospheric effects in precise GNSS positioning. Another well-known example is multipath mitigation, such as multipath ray-tracing (Lau and Cross 2007), multipath sidereal filtering (Choi et al. 2004; Ragheb et al. 2007; Shen et al. 2020), multipath hemispheric mapping (Dong et al. 2016), adaptive tracking of line-of-sight/non-line-of-sight signals (Chen et al. 2012; Chen et al. 2014; Chen et al. 2017), and so on. Nevertheless, the effects of unmodeled errors cannot be fully eliminated owing to the complex spatiotemporal characteristics of the observation errors.

The second strategy for error mitigation is to select an appropriate positioning strategy based on the characteristics of the observation residuals of the original positioning results. Most research on GNSS error characteristics has been carried out based on observation residuals. Zhang et al. (2017) compared and analyzed the unmodeled errors in Global Positioning System (GPS) and BeiDou signals. Their residual analysis of the observation data shows that unmodeled errors are present in the positioning time series. The short-term correlation of the unmodeled error is analyzed by correlation analysis, two empirical models are used to estimate the correlation coefficient, and a sequential adjustment is then used in the baseline solution to take this short-term correlation into account. Zhang et al. (2018) introduced a real-time adaptive weighting model that can mitigate the site-specific unmodeled error of undifferenced code observations. The model is a combination of an elevation-based model and a carrier-to-noise power density ratio (C/N0) based model, in which the parameters of the C/N0 model need to be determined through static observations of the receiver in a low-multipath environment. Li et al. (2018a) proposed a procedure to test the significance of unmodeled errors and identify their components in GNSS observations. In this method, the unmodeled error is divided into three categories, and suggestions for processing each type of unmodeled error are given. A recent study by Zhang and Li (2020) used this procedure to test the significance of observation residuals; an unmodeled error mitigation method based on multi-epoch partial parameterization is then adopted to process the observations with significant unmodeled errors. In this kind of method, the choice of functional model or stochastic model mainly depends on hypothesis tests of the residuals (Wang et al. 2013), and other observation features are not fully utilized.

Another kind of strategy is the time–frequency analysis of geographic coordinate time series. Different trajectory models have been defined to describe coordinate time series. Among them, the standard linear trajectory model (SLTM) is defined as the sum of three types of displacements: trend terms, jump terms, and periodic terms (Bevis et al. 2020). In addition, Bevis and Brown (2014) defined the extended trajectory model (ETM), which adds one or more transients to the SLTM. Different parameter estimation methods are used to estimate the parameters involved in these models, including maximum likelihood estimation (Langbein 2017; Bos et al. 2020), Bayesian inference (Olivares-Pulido et al. 2020), and the Kalman filter (Engels 2020). Usually, the error characteristics of coordinate time series are analyzed with the power spectrum, and the parameters of each error component can be estimated by least-squares variance component estimation (Teunissen and Amiri-Simkooei 2008). However, most of the above methods are designed for offline analysis of coordinate time series, which makes them unsuitable for online applications.

The last strategy is based on regional filtering, which exploits the regional correlation of unmodeled errors within a reference station network to reduce their impact. In regional network analysis, regionally correlated errors caused by satellite orbits, earth orientation parameters, atmospheric effects, and other factors are called common-mode errors (CME) (Wdowinski et al. 1997; Dong et al. 2006). Many studies have attempted to mitigate the CME through regional filtering techniques. The first detailed regional filtering study was carried out to estimate coseismic and postseismic displacements (Wdowinski et al. 1997). This technique assumes that the CME is spatially uniform, which leads to a decrease of the calculated CME as the regional network size increases. To handle this problem, principal component analysis (PCA) (Wold et al. 1987) and the Karhunen–Loeve expansion (Kirby and Sirovich 1990) were introduced into regional filtering, in which the nonuniform spatial response of the network stations to a CME source is considered (Dong et al. 2006). The time series of the entire network are taken into account and decomposed into spatially and temporally coherent orthogonal modes. Multiscale PCA techniques have also been introduced into regional filtering (Li et al. 2017, 2018b): wavelet denoising is applied to the coordinate time series before the PCA-based CME mitigation (Li et al. 2017), and, similarly, empirical mode decomposition (EMD) has been adopted to denoise the coordinate time series before the CME mitigation with PCA (Li et al. 2018b). In addition, multi-channel singular spectrum analysis has been adopted to estimate common environmental effects on GPS observations (Gruszczynska et al. 2018), but it is not suitable for site-related errors such as multipath. All of these methods are based on the time series analysis of coordinate residuals, so the observation features are not fully utilized.

The observation features derived from single-epoch GNSS positioning results, such as the satellite elevation angle, pseudorange residual, carrier-phase residual, and signal strength, are related to the unmodeled error. Some machine learning algorithms have been introduced for multipath detection (Hsu 2017). A convolutional neural network (CNN)-based carrier-phase multipath detection method has been proposed for static and kinematic GPS high-precision positioning (Quan et al. 2018). However, the down-weighting or exclusion strategy applied after detection needs further research (Lau and Cross 2006; Shen et al. 2020), and end-to-end unmodeled error removal remains largely unexplored. So far, very little attention has been paid to the relationship between the unmodeled error and the observation features. Previous studies of unmodeled error mitigation are limited to correcting the observation errors, adapting the positioning strategy, or analyzing the coordinate time series, and have not demonstrated a link between the observation features and the unmodeled error. Using historical observations, we seek to establish this relationship. The purpose of this paper is to explore a data-driven unmodeled error prediction method using machine learning. The method is divided into offline training and online prediction, for both of which a machine learning algorithm is adopted. The research data in this paper are drawn from International GNSS Service (IGS) stations. This study offers some important insights into the unmodeled error mitigation of a specific positioning model.

The remaining part of the paper proceeds as follows: the second section of this paper is concerned with the methodology used for this study. The experiments and results are presented next, followed by a discussion of the main issues of the proposed method and the conclusion.

Methods

This section first presents the principle of the unmodeled error mitigation method based on machine learning. Then, the machine learning algorithm adopted in the proposed method is described. Finally, the basic principle of the positioning model used in the experiments is introduced.

Data-driven unmodeled error prediction

Similar to other applications based on machine learning, unmodeled error prediction is also divided into two processes: offline training and online prediction, which are introduced below.

Online prediction

The unmodeled error mitigation in this work is achieved by training an unmodeled error prediction model for a specific positioning model using historical observation data. The trained prediction model is then used to predict the unmodeled error of that positioning model and thus improve the accuracy of single-epoch positioning. The unmodeled error mitigation process is shown in Fig. 1.

Fig. 1
figure 1

Unmodeled error mitigation process

As shown in Fig. 1, the output of single-epoch positioning is the basis for model training and unmodeled error prediction. The GNSS positioning in Fig. 1 is described as

$$Y = F(L)$$
(1)

where \(L\) represents the positioning information such as the observation data, navigation message, precise ephemeris, etc.; \(F\) denotes the GNSS positioning model, which can be single-point positioning, precise point positioning, relative positioning, etc.; \(Y\) denotes the unknown parameters to be estimated, of which only the coordinates are considered here. The coordinate and observation features extracted from the single-epoch positioning are the basis for model training and error prediction. The data for model training are extracted from historical observations, as described in more detail in the following subsection. The observation features for model prediction are obtained from the single-epoch positioning of the current epoch. The trained model predicts the unmodeled error, and the general definition of the prediction model in Fig. 1 is as follows

$$y^{ - } = f(x)$$
(2)

where \(x\) denotes the observation features, including the original observation information and the intermediate results of the single-epoch GNSS positioning; \(f\) is the trained prediction model, which will be described in the following subsection; \(y^{ - }\) is the predicted unmodeled error of the current epoch. As shown in Fig. 1, after obtaining the original positioning value and the predicted unmodeled error, the final position estimate is obtained by the following formula

$$\hat{Y} = Y - y^{ - }$$
(3)
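For illustration, a minimal numerical sketch of the correction step in (3) is given below, assuming the original single-epoch solution and the predicted unmodeled error are available as three-component vectors; all values are synthetic.

```python
# Minimal sketch of the correction step in (3); all values are synthetic.
import numpy as np

Y = np.array([0.012, -0.008, 0.035])       # original single-epoch coordinates (m), relative to a reference
y_pred = np.array([0.010, -0.006, 0.028])  # unmodeled error predicted by the trained model f(x), eq. (2)

Y_hat = Y - y_pred                         # corrected position estimate, eq. (3)
print(Y_hat)                               # remaining error after the correction
```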

Offline training

The above description covers the online unmodeled error prediction process. However, training a reliable prediction model \(f\) is the key to online prediction. The problem of unmodeled error prediction is essentially a regression problem in supervised learning: a machine learning algorithm learns the mapping between input and output variables from training data. The training data are the basis of model training and are expressed as follows

$$D = \{ (x_{1} ,y_{1} ),\,(x_{2} ,y_{2} ),\, \cdots ,\,(x_{i} ,y_{i} ),\, \cdots ,\,(x_{n} ,y_{n} )\} ,\,y_{i} \in R$$
(4)

where \(x_{i}\) represents the input of the \(i\)th sample, that is, the selected observation feature; \(y_{i}\) denotes the output of the \(i\)th sample, that is, the unmodeled error.

The selected features can be divided into three categories: the first comprises features closely related to the unmodeled error in previous studies, such as the satellite elevation, signal strength, and observation residuals; the second is GPS time, which reflects the temporal characteristics of the errors; the third comprises observation quality features, such as the valid data flag, cycle-slip flag, and cycle-slip count. To eliminate the influence of unit and scale differences between features and treat all features equally, the features need to be normalized. Min–max normalization and z-score normalization are adopted for this purpose. The details of the observation features and the corresponding normalization methods are shown in Table 1.

Table 1 Observation features of the machine learning algorithm, \(x\) denotes the original value of the feature, \(\overline{x}\) is the mean of the original value, \(\sigma\) represents the standard deviation of the original value, DoS is short for depends on samples
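As a concrete illustration of the two normalization schemes listed in Table 1, a minimal Python sketch is given below; the feature values are synthetic, and which scheme applies to which feature follows the table.

```python
# Minimal sketch of min-max and z-score normalization; feature values are synthetic.
import numpy as np

def min_max_normalize(x):
    """Scale a feature to [0, 1]: (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

def z_score_normalize(x):
    """Standardize a feature: (x - mean) / sigma."""
    x = np.asarray(x, dtype=float)
    sigma = x.std()
    return (x - x.mean()) / sigma if sigma > 0 else np.zeros_like(x)

elevation = [12.0, 35.0, 67.0, 81.0]       # satellite elevation angles (degrees), synthetic
residuals = [0.003, -0.011, 0.006, 0.001]  # carrier-phase residuals (m), synthetic

print(min_max_normalize(elevation))
print(z_score_normalize(residuals))
```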

The features of all GPS satellites are organized in a matrix, forming a so-called feature matrix. Each column of the matrix corresponds to a satellite, and each row corresponds to a feature. The size of the feature matrix is fixed, and the columns corresponding to unavailable satellites are retained and filled with zeros. The output of the training data, the unmodeled error of GNSS positioning, is usually obtained as the difference between the GNSS single-epoch positioning result and a relatively high-precision positioning result. More precise positioning results could serve as the reference, but they are only available with a time delay. We therefore use the difference between the single-epoch positioning result and the daily mean positioning result as the training data output. The unmodeled error is obtained as follows

$$y_{i} = Y_{i} - \overline{Y}$$
(5)

where \(Y_{i}\) is the single-epoch solution and \(\overline{Y}\) is the daily average of the single-epoch solutions. After preparing the training samples, the key to unmodeled error prediction is to train a desirable prediction model with these sample data. The machine learning algorithm used in this method is given below.
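A sketch of how one day of training samples could be assembled is given below, assuming per-epoch dictionaries that map each visible GPS PRN to its 13-element feature vector, together with the single-epoch coordinate solutions of the day; columns of unavailable satellites are zero-filled, and the labels follow (5). All names here are illustrative, not part of the authors' implementation.

```python
# Sketch of feature-matrix and label construction (13 features x 32 satellites, labels from eq. 5).
import numpy as np

N_FEATURES, N_SATS = 13, 32

def build_feature_matrix(per_sat_features):
    """per_sat_features: dict {PRN (1-32): 13-element normalized feature vector}."""
    X = np.zeros((N_FEATURES, N_SATS))        # fixed size; unavailable satellites stay zero
    for prn, feats in per_sat_features.items():
        X[:, prn - 1] = feats                 # one column per satellite, one row per feature
    return X

def build_labels(single_epoch_solutions):
    """single_epoch_solutions: (n_epochs, 3) coordinates of one day."""
    Y = np.asarray(single_epoch_solutions)
    return Y - Y.mean(axis=0)                 # eq. (5): deviation from the daily mean

# synthetic example: two epochs with two visible satellites each
epochs = [{5: np.random.rand(N_FEATURES), 12: np.random.rand(N_FEATURES)},
          {5: np.random.rand(N_FEATURES), 17: np.random.rand(N_FEATURES)}]
X = np.stack([build_feature_matrix(e) for e in epochs])   # shape (2, 13, 32)
y = build_labels(np.random.randn(2, 3) * 0.01)            # shape (2, 3)
```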

Convolutional neural network

CNN has been widely used in the field of computer vision, and it has also performed well in other application fields (Goodfellow et al. 2016). Therefore, a deep convolutional neural network is designed to regress the unmodeled error in GNSS positioning.

Network architecture

The network consists of an input layer, 4 convolutional layers, a pooling layer, a flattening layer, a fully connected layer, and an output layer. The unmodeled error CNN regression model is shown in Fig. 2.

Fig. 2
figure 2

Unmodeled error CNN regression model

The dimension of the input layer is 13 × 32, where 13 represents the number of features used and 32 represents the number of GPS satellites. The size of each convolution kernel in this network is 3 × 3, and the stride in both directions is set to 1. The convolution layers adopt the valid padding strategy (Goodfellow et al. 2016), that is, no zero-padding is applied, so each convolution reduces the height and width of the output by 2. The numbers of convolution kernels in the four convolution layers are set to 64, 64, 32, and 32, respectively, to extract multiple types of features.

The convolution operation of the convolution layer \(l\) is described below (LeCun and Bengio 1998).

$$X_{j}^{l} = f_{{{\text{relu}}}} (\sum\limits_{i = 1}^{d} {X_{i}^{l - 1} \cdot K_{ij}^{l} } + B_{j}^{l} )$$
(6)

where \(X_{i}^{l - 1}\) represents the input feature matrix of the convolution layer on the \(i\)th channel; \(d\) denotes the number of input channels of the convolution layer; \(K_{ij}^{l}\) is the \(j\)th convolution kernel of the convolution layer corresponding to the \(i\)th input channel; \(\cdot\) denotes the convolution operator; \(B_{j}^{l}\) represents the bias of the \(j\)th convolution kernel of the convolution layer; \(X_{j}^{l}\) is the output feature matrix of the \(j\)th convolution kernel, also known as a feature map in computer vision. Each convolutional layer is immediately followed by an activation layer. The activation layer adds nonlinearity to the model through the activation function, thereby compensating for the limited expressive power of a linear model (Buduma and Locascio 2017). The activation layer does not change the dimensions of the input data. The rectified linear unit (ReLU) is selected as the activation function of the convolution layers and is defined as follows (Nair and Hinton 2010).

$$f_{{{\text{relu}}}} (x) = \max \,(0,x)$$
(7)

This activation function mitigates the vanishing gradient problem and is fast to compute. A pooling layer follows the last convolution layer, with a pooling size of 2 × 2, which halves the spatial size of the data while leaving the number of channels unchanged. Convolution and pooling can be regarded as infinitely strong priors (Goodfellow et al. 2016). The final flattening layer converts the multidimensional data into a one-dimensional vector.
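A sketch of this architecture in Keras, the library the authors adopt for the implementation (described later), is given below. The layer sizes follow the text; the 768-unit dense layer is inferred from the 192 inactivated neurons at a dropout ratio of 0.25 reported in the regularization subsection, and the 3-component output (north, east, up) is an assumption rather than a stated design choice.

```python
# Sketch of the CNN regressor: 13x32 input, four 3x3 valid-padding conv layers
# (64, 64, 32, 32 kernels, stride 1, ReLU), 2x2 max pooling, flatten, dense, output.
from tensorflow.keras import layers, models

def build_model():
    model = models.Sequential([
        layers.Conv2D(64, (3, 3), padding='valid', activation='relu',
                      input_shape=(13, 32, 1)),           # features x satellites, one channel
        layers.Conv2D(64, (3, 3), padding='valid', activation='relu'),
        layers.Conv2D(32, (3, 3), padding='valid', activation='relu'),
        layers.Conv2D(32, (3, 3), padding='valid', activation='relu'),
        layers.MaxPooling2D(pool_size=(2, 2)),             # halves height and width: 5x24 -> 2x12
        layers.Flatten(),                                  # 2 * 12 * 32 = 768 values
        layers.Dense(768, activation='relu'),              # assumed size of the last hidden layer
        layers.Dropout(0.25),                              # 0.25 inactivation, see the regularization subsection
        layers.Dense(3)                                    # assumed output: north, east, up errors
    ])
    return model
```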

Back propagation

The output \(y\) is obtained through the above process, in which the input \(x\) passes through hidden layers such as the convolutional layers, the pooling layer, and the fully connected layer. This process is called forward propagation. In each forward pass, the value of the loss function is obtained. The loss function is defined as follows

$$J(\theta ) = \frac{1}{2n}\sum\limits_{i = 1}^{n} {\left\| {y_{i} - f(x_{i} ;\theta )} \right\|^{2} }$$
(8)

where \(\theta\) denotes the parameters to be optimized, and \(n\) is the number of samples. The mean square error cost is used here. The loss function value \(J(\theta )\) is obtained in each forward pass, and the parameters are adjusted using the gradient of the loss function. The process of propagating the loss information back through the network is called backpropagation (Rumelhart et al. 1988). In this work, the stochastic gradient descent method is used to calculate the gradient.

$$\nabla_{\theta } J(\theta ) = - \frac{1}{n}\sum\limits_{i = 1}^{n} {(y_{i} - f(x_{i} ;\theta ))\frac{{\partial f(x_{i} ;\theta )}}{\partial \theta }}$$
(9)

The minibatch stochastic gradient descent is adopted to improve computing efficiency, and the batch size is set to 32. After obtaining the gradient of the loss function, the parameters are updated by the following formula, and the network training process enters the next epoch.

$$\theta^{\prime} = \theta - \varepsilon \nabla_{\theta } J(\theta )$$
(10)

where \(\theta^{\prime}\) is the updated parameter and \(\varepsilon\) is the learning rate. In this work, an adaptive learning rate algorithm named adaptive moment estimation (Adam) is adopted. The Adam algorithm dynamically adjusts the learning rate of each parameter based on the first- and second-moment estimates of its gradients (Kingma and Ba 2015).
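To make the update rule in (9) and (10) concrete, a toy minibatch gradient-descent example for a simple linear model \(f(x;\theta ) = \theta^{T} x\) is sketched below; in the actual method the gradient of the CNN is obtained by backpropagation, and the learning rate is adapted per parameter by Adam rather than held fixed as here.

```python
# Toy minibatch gradient descent for a linear model, illustrating eqs. (9)-(10).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))                  # synthetic inputs
y = X @ np.array([0.5, -1.0, 2.0, 0.1])        # synthetic targets
theta = np.zeros(4)                            # parameters to optimize
lr, batch_size = 0.1, 32                       # learning rate (epsilon) and minibatch size

for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        residual = y[b] - X[b] @ theta         # y_i - f(x_i; theta)
        grad = -(X[b].T @ residual) / len(b)   # eq. (9) for the linear model
        theta = theta - lr * grad              # eq. (10)

print(theta)                                   # approaches [0.5, -1.0, 2.0, 0.1]
```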

Regularization

To reduce the generalization error of the model, two regularization methods are adopted: early stopping and dropout. Early stopping evaluates the performance of the model on the validation set during training and stops training when the validation performance starts to decline, thereby avoiding the overfitting caused by continued training (Prechelt 1998). In this work, the early stopping criterion is to stop training when the validation error has increased continuously for 50 epochs. Dropout is the other method adopted to prevent overfitting. The dropout strategy randomly inactivates a certain percentage of neurons; inactivated neurons do not participate in the forward propagation of the network (Srivastava et al. 2014), and backpropagation only updates the weights of the active neurons, that is, the weights of the inactive neurons are not updated. In this work, dropout is applied to the last hidden layer with an inactivation ratio of 0.25, that is, within each training step, 192 neurons in the last hidden layer are inactivated.

For the implementation of the CNN regression model described above, we make use of open-source software. Rather than implementing the model directly in TensorFlow (Abadi et al. 2016), we adopt Keras, a high-level wrapper of TensorFlow that supports rapid prototyping, so that ideas can be quickly turned into results without attending to the low-level details (Gulli and Pal 2017).
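The training configuration described above can be expressed compactly in Keras; the sketch below assumes the `build_model` function from the architecture sketch given earlier and placeholder arrays `X_train` and `y_train` holding the feature matrices and unmodeled-error labels. The maximum number of epochs and the `restore_best_weights` option are assumptions, not stated in the text.

```python
# Sketch of the training setup: Adam optimizer, MSE loss (eq. 8), batch size 32,
# early stopping after 50 epochs without validation improvement, 25% validation split.
from tensorflow.keras.callbacks import EarlyStopping

model = build_model()
model.compile(optimizer='adam', loss='mse')

early_stop = EarlyStopping(monitor='val_loss', patience=50,
                           restore_best_weights=True)

history = model.fit(X_train, y_train,
                    batch_size=32,             # minibatch size
                    epochs=1000,               # upper bound; early stopping ends training
                    validation_split=0.25,     # a quarter of the samples as the validation set
                    shuffle=True,
                    callbacks=[early_stop])
```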

Evaluation of predicted results

To evaluate the predicted unmodeled error, we compute the standard deviation (STD) of the difference between the original time series and the predicted time series. In addition, wavelet tools are used to analyze the original and predicted time series. As an effective time–frequency analysis tool, the wavelet transform is widely used in image and audio processing, as well as in geodesy. Its time–frequency resolution is variable: the low-frequency part has lower time resolution and higher frequency resolution, while the high-frequency part has lower frequency resolution and higher time resolution (Ruch and Fleet 2009). This adaptability to the signal makes the wavelet transform widely used in signal analysis. The continuous wavelet transform (CWT) is defined as follows (Addison 2017)

$${\text{CWT}}_{x}^{{\uppsi }} (s,\tau ) = \frac{1}{\sqrt s }\int_{ - \infty }^{\infty } {x(t){\uppsi }\left( {\frac{t - \tau }{s}} \right){\text{d}}t}$$
(11)

where \(s\) is the scale factor, \(\tau\) is the translation factor, \({\uppsi }\) is the mother wavelet, and \(x\) is the signal to be analyzed. In the experiments, the CWT is used to analyze the original and the predicted time series.
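A sketch of this evaluation is shown below: the STD of the difference between the original and the predicted series, plus a CWT of each series. The PyWavelets package and the Morlet mother wavelet are assumptions made for illustration; the text does not state which wavelet implementation was used, and the two series here are synthetic stand-ins.

```python
# Sketch of the evaluation: STD of (original - predicted) and CWT scalograms of both series.
import numpy as np
import pywt

dt = 30.0                                        # sampling interval (s), as in the experiments
t = np.arange(0, 86400, dt)                      # one day of epochs
original = 0.02 * np.sin(2 * np.pi * t / (8 * 3600)) + 0.005 * np.random.randn(t.size)
predicted = 0.02 * np.sin(2 * np.pi * t / (8 * 3600))   # synthetic stand-ins for the two series

std_diff = np.std(original - predicted)          # evaluation metric used in the paper
print(f"STD of difference: {std_diff * 1000:.2f} mm")

scales = np.arange(1, 256)
coef_orig, freqs = pywt.cwt(original, scales, 'morl', sampling_period=dt)   # eq. (11)
coef_pred, _ = pywt.cwt(predicted, scales, 'morl', sampling_period=dt)
# |coef_orig| and |coef_pred| can be plotted as scalograms for visual comparison
```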

Precise point positioning (PPP)

Above, the unmodeled error prediction model has been proposed and the machine learning algorithm introduced. We will study the prediction of the unmodeled error of the precise point positioning (PPP) kinematic positioning model; the basic principle of PPP is introduced here. PPP is a technique that uses a single receiver to achieve high-precision positioning globally. It makes use of the precise satellite orbit and clock products provided by external organizations such as the IGS and relies on the fine modeling of various errors. The dual-frequency ionosphere-free (IF) combination is the most widely used observation model in PPP, as it eliminates the first-order ionospheric effect. The ionosphere-free combination observation equations of PPP are as follows (Kouba and Héroux 2001; Malys and Jensen 1990),

$$\left\{ {\begin{array}{*{20}l} {P_{{r,{\text{IF}}_{12} }}^{s} = \frac{{f_{1}^{2} }}{{f_{1}^{2} - f_{2}^{2} }}P_{r,1}^{s} - \frac{{f_{2}^{2} }}{{f_{1}^{2} - f_{2}^{2} }}P_{r,2}^{s} = \rho_{r}^{s} + c({\text{d}}t_{r} - {\text{d}}t^{s} ) + T_{r}^{s} + e_{{{\text{IF}}_{12} }} } \hfill \\ {\Phi_{{r,{\text{IF}}_{12} }}^{s} = \frac{{f_{1}^{2} }}{{f_{1}^{2} - f_{2}^{2} }}\Phi_{r,1}^{s} - \frac{{f_{2}^{2} }}{{f_{1}^{2} - f_{2}^{2} }}\Phi_{r,2}^{s} = \rho_{r}^{s} + c({\text{d}}t_{r} - {\text{d}}t^{s} ) + T_{r}^{s} + B_{{r,{\text{IF}}_{12} }}^{s} + {\text{d}}\Phi_{{r,{\text{IF}}_{12} }}^{s} + \varepsilon_{{{\text{IF}}_{12} }} } \hfill \\ \end{array} } \right.$$
(12)

where \(f_{i} (i = 1,2)\) denote the carrier frequencies; \(P_{r,i}^{s}\) and \(\Phi_{r,i}^{s} (i = 1,2)\) are the pseudorange and carrier phase observations, both in meters; \(P_{{r,{\text{IF}}_{12} }}^{s}\) is the ionosphere-free combination of the pseudoranges; \(\Phi_{{r,{\text{IF}}_{12} }}^{s}\) is the ionosphere-free combination of the carrier phases; \(\rho_{r}^{s}\) is the geometric range from the receiver to the satellite; \(c\) is the speed of light in vacuum; \({\text{d}}t_{r}\) is the receiver clock offset; \({\text{d}}t^{s}\) is the satellite clock offset; \(T_{r}^{s}\) is the tropospheric delay; \(B_{{r,{\text{IF}}_{12} }}^{s}\) is the carrier phase bias, including the ionosphere-free combination of the integer ambiguity and the phase delays; \({\text{d}}\Phi_{{r,{\text{IF}}_{12} }}^{s}\) is the ionosphere-free carrier phase correction, including the receiver antenna phase center correction, satellite antenna phase center correction, earth tide correction, and phase wind-up correction; \(e_{{{\text{IF}}_{12} }}\) and \(\varepsilon_{{{\text{IF}}_{12} }}\) are the measurement noises of the combined pseudorange and carrier observations, respectively. For the subsequent method verification, only the float solutions of the ionosphere-free model are used. The satellite clock corrections provided by the IGS are based on the ionosphere-free combination and absorb the ionosphere-free combination of the satellite pseudorange hardware delays; the ionosphere-free combination of the receiver pseudorange hardware delays is absorbed by the receiver clock offset, and the carrier phase hardware delays are absorbed by the carrier phase bias \(B_{{r,{\text{IF}}_{12} }}^{s}\) defined here. Therefore, the pseudorange and carrier phase hardware delays are not written explicitly in (12). To obtain high-precision, centimeter-level positioning results with PPP, the error terms in (12) have to be carefully modeled or estimated together with the user position. In this work, the open-source software RTKLIB (Takasu 2011) is used to realize PPP, and the kinematic mode is used.
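As a small numerical illustration of the ionosphere-free combination in (12), the sketch below forms the IF pseudorange from synthetic GPS L1/L2 observations; the frequencies are the standard GPS L1 and L2 values, and the combination coefficients are about 2.546 and 1.546.

```python
# Sketch of the ionosphere-free (IF) pseudorange combination for GPS L1/L2.
F1, F2 = 1575.42e6, 1227.60e6          # GPS L1 and L2 carrier frequencies (Hz)

def iono_free(obs1, obs2):
    """Ionosphere-free combination of two observations given in meters."""
    a = F1**2 / (F1**2 - F2**2)        # about 2.546
    b = F2**2 / (F1**2 - F2**2)        # about 1.546
    return a * obs1 - b * obs2

RHO = 22_000_000.00                    # synthetic geometric range plus non-dispersive terms (m)
I1 = 2.00                              # synthetic first-order ionospheric delay on L1 (m)
P1 = RHO + I1
P2 = RHO + I1 * (F1 / F2) ** 2         # first-order delay scales with the inverse frequency squared
print(iono_free(P1, P2))               # recovers RHO: the first-order ionospheric delay is removed
```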

Experiments and results

The data used in this study come from IGS stations. Since the model training is very time-consuming, 12 stations around the world were randomly selected for training; their distribution is shown in Fig. 3. The observation data of these stations in October 2018 were used, with a sampling interval of 30 s. Only the \(L_{1}\) and \(L_{2}\) GPS observations were used in the experiment. RTKLIB, an open-source GNSS data processing software, was used for the data processing. The positioning mode was set to PPP kinematic mode, the filter type to the forward filter solution, and the elevation mask angle to 15°. The model corrections considered in the refinement include the antenna phase center, phase wind-up, solid earth tide, ocean loading, polar tide, and others, as described in the RTKLIB manual (Takasu 2011). Despite these corrections, millimeter-level daily variations can still occur due to imperfect ocean tide models, other types of loading, and temperature effects on station monuments. Therefore, we assume that the actual displacement of each station is far smaller than the unmodeled error we want to correct, and we adopt certain strategies to exclude data with large displacements from the training data.

Fig. 3
figure 3

Geographical distribution of IGS stations

After obtaining the positioning results with the above processing strategy, we extracted the features and labels according to the method described in the methods section. To reduce the influence of abnormal training data, we discarded the data of days with a large standard deviation in the positioning results; specifically, days with label standard deviations greater than 10 cm were removed. Although the daily standard deviations of the kinematic positioning results at the HKWS station all exceed 10 cm, the processing results obtained from these data are also provided. We use the data of October 30 and October 31 as the test data and use the 8-day and 18-day data sets that meet the above condition as the training data, respectively. In addition, all single-epoch samples in the training data are shuffled, and a quarter of them are used as the validation set.
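A sketch of the screening and shuffling described above is given below; the dictionary `daily_samples` (day → (feature matrices, labels)) is an illustrative placeholder for the extracted data, and applying the 10 cm threshold per coordinate component is an assumption.

```python
# Sketch of training-data screening (drop days whose label STD exceeds 10 cm) and shuffling.
import numpy as np

rng = np.random.default_rng(42)
MAX_LABEL_STD = 0.10                                   # 10 cm threshold on the daily label STD

kept_X, kept_y = [], []
for day, (X_day, y_day) in daily_samples.items():      # y_day: (n_epochs, 3) labels from eq. (5)
    if np.all(y_day.std(axis=0) <= MAX_LABEL_STD):     # discard days with large label STD
        kept_X.append(X_day)
        kept_y.append(y_day)

X_train = np.concatenate(kept_X)
y_train = np.concatenate(kept_y)
order = rng.permutation(len(X_train))                  # shuffle all single-epoch samples
X_train, y_train = X_train[order], y_train[order]
# a quarter of these shuffled samples is then held out as the validation set,
# e.g. via validation_split=0.25 in the fit call shown earlier
```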

Unmodeled error prediction of PPP kinematic positioning

The STD of the difference between the original time series and the predicted time series for the two-day test data is shown in Table 2. On the whole, the STD in the horizontal direction is smaller than that in the vertical direction, mainly because the measurement noise in the horizontal direction is smaller. In addition, there is no significant difference between the prediction results of day 1 and day 2. Taking the ALIC station as an example, the time–frequency analysis results of the predictions of the north, east, and up coordinate components on the first day are shown in Figs. 4, 5, 6, respectively. In these figures, the top panel shows the original and the predicted time series, and the bottom-left and bottom-right panels show the wavelet analysis results of the original and the predicted time series, respectively. It can be seen from Fig. 4 that both the original and the predicted time series of the north coordinate component contain spectral components with a period of about 8 h, especially between 10 and 20 h. The spectral components with a period of about 10 h in the unmodeled error of the east coordinate component have been effectively predicted, and Fig. 5 shows that they are mainly concentrated between 0 and 15 h. Figure 6 shows that the unmodeled error of the up coordinate component is noisier than that of the other two directions, and its spectral composition is more complex. Nonetheless, the time–frequency analyses of the original and the predicted time series also show a certain degree of similarity. The time–frequency analysis results of the SCH2 station are given in Figs. 7, 8, 9. Unlike the previous station, the STD of the north component of this station is larger than that of the other two components. As can be seen from Figs. 7, 8, 9, the original time series of the north component is noisier than those of the other two components.

Table 2 STD of prediction results of different stations
Fig. 4
figure 4

Prediction results of the north coordinate component at the ALIC station

Fig. 5
figure 5

Prediction results of the east coordinate component at the ALIC station

Fig. 6
figure 6

Prediction results of the up coordinate component at the ALIC station

Fig. 7
figure 7

Prediction results of the north coordinate component at the SCH2 station

Fig. 8
figure 8

Prediction results of the east coordinate component at the SCH2 station

Fig. 9
figure 9

Prediction results of the up coordinate component at the SCH2 station

Unmodeled error prediction of the combined filtering mode

In the above experiments, the forward filtering mode is used; the prediction results corresponding to the combined filtering mode are given below. Using 8 days of data from the ALIC station, the STDs of the difference between the original time series and the predicted time series on the first day are 18.12 mm, 16.59 mm, and 52.58 mm, and those on the second day are 17.41 mm, 15.55 mm, and 48.56 mm. The STD of combined filtering is smaller than that of forward filtering, mainly because the noise is further suppressed in combined filtering. The wavelet analysis results of the predicted and original time series of the ALIC station are given in Figs. 10, 11, 12. Compared with the forward filtering mode, the wavelet analysis results of the combined filtering mode are more similar to those of the original time series. It can be seen from the above experiments that the low-frequency component of the unmodeled error is effectively predicted.

Fig. 10
figure 10

Prediction results of the north coordinate component using the combined filtering mode at the ALIC station

Fig. 11
figure 11

Prediction results of the east coordinate component using the combined filtering mode at the ALIC station

Fig. 12
figure 12

Prediction results of the up coordinate component using the combined filtering mode at the ALIC station

Discussions

In the experiments of data-driven unmodeled error prediction in the previous section, the number of days of training data was set to 8. The influence of the number of training days on the prediction results is discussed here. In addition, the influence of different features on the prediction is explored, and the corresponding STD is given when each feature is excluded in turn.

Influence of more training data on unmodeled error prediction

To further verify the influence of the amount of training data on the prediction results, the prediction results for different numbers of training days at the ALIC station are provided in Figs. 13, 14. In these figures, the X-axis represents the number of training days, and the Y-axis represents the STD of the difference between the original time series and the predicted time series. The results show that as the number of days increases, the STD decreases and then stabilizes after about 14 days of data. For all stations, the prediction results of the 8-day and 18-day training data sets are compared in Figs. 15, 16. These figures show that the prediction results of most stations are improved when more training data are used. The main reason is that PPP positioning results exhibit both low-frequency and high-frequency characteristics, and 8 days of training data are not enough to meet the training requirements of the model, that is, the model is under-fitting. Therefore, in PPP kinematic positioning mode, more training data are conducive to the prediction of the unmodeled error.

Fig. 13
figure 13

First-day prediction results of different days of training data for PPP kinematic mode at the ALIC station

Fig. 14
figure 14

Second-day prediction results of different days training data for PPP kinematic mode at the ALIC station

Fig. 15
figure 15

Comparison of the first-day prediction results of 8-day training data and 18-day training data

Fig. 16
figure 16

Comparison of the second-day prediction results of 8-day training data and 18-day training data

Influence of different features on prediction results

To further explore the influence of different features on the predicted results, we exclude the features one by one. Taking the ALIC and BAKE stations as examples, the STD of the difference between the original time series and the predicted time series obtained when each feature is excluded is shown in Tables 3 and 4. When GPS time is excluded from the features, the STD of the ALIC station changes noticeably, and some components of the BAKE station also increase. Sidereal filtering is one of the methods proposed for site multipath mitigation, in which time is a crucial parameter. When the satellite elevation, satellite azimuth, or signal-to-noise ratio is excluded, the STD of some components also increases. The multipath hemispheric mapping model is a multipath mitigation method based on the satellite elevation and azimuth at the station, and many stochastic models for positioning depend on the satellite elevation and the signal-to-noise ratio. Observation residuals have a significant impact on the prediction results because the residuals reflect the suitability of the stochastic model. In addition, the STD is also influenced by flag-type features such as the valid data flag, which reflect the quality of the current positioning.

Table 3 STD of the ALIC station when each feature is excluded
Table 4 STD of the BAKE station when each feature is excluded

Different features have different effects on the prediction results, and the same feature has different effects at different stations. The influence of these features on the prediction results is nonlinear, and no simple regularity can be observed intuitively. Therefore, in the feature design, it is recommended to include as many features as possible that may affect the unmodeled error and to leave the determination of each feature's weight to the large amount of training data.

Conclusions

This study set out to develop a data-driven method for the unmodeled error prediction of a specific positioning model. An unmodeled error prediction method based on machine learning has been proposed, with a convolutional neural network adopted as the regression model. The effectiveness of the method applied to PPP kinematic positioning is verified with historical observation data of IGS stations distributed all over the world, and wavelet analysis is employed to evaluate the prediction results. The most obvious finding of this study is that the data-driven model can effectively predict the unmodeled error in GNSS positioning, especially its low-frequency component. More training data are conducive to better prediction results; however, the prediction results do not keep improving with increasing training data but stabilize after about 14 days of training data. In addition, we investigated 13 training features and found that, for different stations, different features have different importance in the neural network time series prediction. The impact of these features on the prediction results is nonlinear and complex, so it is recommended to include as many features as possible that may affect the unmodeled error. This is the first study to evaluate the effectiveness of a GNSS unmodeled error prediction method based on historical data. Although only PPP kinematic positioning was used in the experiments, the method should be applicable to other positioning strategies, which will be investigated in future research.