Abstract
Data pre-analysis plays a significant role in noise determination. The most important issue is to find an optimum criterion for outlier removal, since outliers can affect any further analysis. The noise in GNSS time series is characterized by a spectral index and amplitudes that can be determined with a few different methods. In this research, Maximum Likelihood Estimation (MLE) was used. The noise amplitudes and spectral indices were obtained for daily topocentric coordinates from a few selected EPN (EUREF Permanent Network) stations. The data were obtained within the EPN reprocessing performed by the Military University of Technology Local Analysis Centre (MUT LAC). Outliers were removed from the 12 noisiest EPN stations with the criteria of 3 and 5 times the standard deviation (3σ, 5σ) as well as the Median Absolute Deviation (MAD) to investigate how they affect the noise parameters. The results show that the removal of outliers is necessary before any further analysis; otherwise one may obtain quite odd and unrealistic values. A probability analysis with skewness and kurtosis was performed in addition to the noise analysis. The values of skewness and kurtosis show that a wrong criterion of outlier removal leads to wrong conclusions about the probability distribution. On the basis of the results, we propose using the MAD method for outlier removal in GNSS time series.
1 Introduction
Noise in most geophysical time series is commonly described as a power-law process (Agnew 1992) with the power spectrum:
\[ P_x(f)=P_0\left(\frac{f}{f_0}\right)^{\kappa} \]
where f is the spatial or temporal frequency, \( P_0 \) and \( f_0 \) are normalising constants and κ is the spectral index of the noise (Mandelbrot and Van Ness 1968). Agnew (1992) reported that the spectral indices of geophysical processes often fall between −3 and −1. Integer values of the index indicate special types of noise: “κ = −2” represents a random-walk process, which is related to the monument instability of GPS antennae (Johnson and Agnew 1995; Williams et al. 2004; Klos et al. 2014); “κ = −1” stands for the flicker noise process (Mandelbrot 1983), which is recognized in most GNSS time series (Mao et al. 1999; Williams et al. 2004; Bogusz and Kontny 2011); “κ = 0” corresponds to white noise, which is not correlated in time.
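As an illustration of the power-law model, the following minimal Python sketch evaluates a spectrum proportional to \( (f/f_0)^{\kappa} \) and recovers κ as the slope of a straight-line fit in log-log space. The function names and the use of NumPy are assumptions made for this example; they are not part of the processing described in this paper.

```python
import numpy as np


def power_law_psd(f, p0, f0, kappa):
    """Power-law spectrum P(f) = p0 * (f / f0)**kappa (illustrative helper)."""
    f = np.asarray(f, dtype=float)
    return p0 * (f / f0) ** kappa


def fit_spectral_index(f, p):
    """Estimate kappa as the slope of log10(P) versus log10(f)."""
    slope, _intercept = np.polyfit(np.log10(f), np.log10(p), 1)
    return slope


# Hypothetical frequencies (cycles per year) and a flicker-noise spectrum.
freqs = np.logspace(-1, 2, 50)
psd = power_law_psd(freqs, p0=1.0, f0=1.0, kappa=-1.0)
kappa_hat = fit_spectral_index(freqs, psd)
```

For a real periodogram the fit is only approximate, because white noise flattens the spectrum at high frequencies; this is one reason why a likelihood-based estimate is preferred in this paper.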
Each topocentric component is assumed to follow the sum:
\[ x(t)=x_0+v_x\cdot t+\sum_{i}A_i\sin\left(\omega_i t+\phi_i\right)+O_x+\sum_{j}x_{\mathrm{off},j}\cdot p\left(t-t_j\right)+\varepsilon_x(t) \]
where \( x_0 \) is the initial value, \( v_x \) is the velocity, \( A_i \), \( \omega_i \), \( \phi_i \) are the amplitude, angular velocity and phase shift of the i-th periodic component of the time series, \( O_x \) stands for any known outliers, \( x_{\mathrm{off},j} \) for the offset at epoch \( t_j \), p is the Heaviside step function and \( \varepsilon_x \) is the noise. The noise in geophysical time series is correlated in time. This correlation has a great impact on any linear parameters that are estimated from these time series (Williams 2003).
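The deterministic part of such a model can be sketched in Python as follows. This is only an illustration: the sine form of the periodic terms, the argument names and the representation of offsets as (epoch, size) pairs are assumptions of this example, and the outlier and noise terms are left out.

```python
import numpy as np


def deterministic_model(t, x0, velocity, periodic=(), offsets=()):
    """Initial value + linear trend + periodic terms + step offsets.

    periodic: iterable of (amplitude, angular_velocity, phase) tuples.
    offsets:  iterable of (epoch, size) tuples (hypothetical representation).
    """
    t = np.asarray(t, dtype=float)
    x = x0 + velocity * t
    for amplitude, omega, phase in periodic:
        x = x + amplitude * np.sin(omega * t + phase)
    for epoch, size in offsets:
        # Heaviside step: 0 before the offset epoch, 1 at and after it.
        x = x + size * np.heaviside(t - epoch, 1.0)
    return x


# Hypothetical example: annual signal and one 5 mm offset at t = 1 yr.
t_years = np.array([0.0, 1.0, 2.0])
trend_and_step = deterministic_model(t_years, x0=1.0, velocity=2.0,
                                     offsets=[(1.0, 5.0)])
with_annual = deterministic_model(t_years, 0.0, 0.0,
                                  periodic=[(3.0, 2.0 * np.pi, np.pi / 2.0)])
```

Subtracting such a model from the observations leaves residuals that contain the noise and any outliers, which is the quantity that the removal criteria act upon.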
The detection and removal of outliers plays a significant role in the interpretation of GNSS data. The disputable issue is the criterion. The most common criteria, which depend on the character of the time series, remove values that exceed 3 or 5 times the standard deviation. Bergstrand et al. (2007) estimated the noise in GPS time series after removing outliers with the 5σ criterion, which was stated to be a more conservative approach than the 3σ criterion used, for instance, by Johansson et al. (2002). Dong et al. (2006) discarded residuals exceeding constant values of 100, 100 and 300 mm for the east, north and vertical components, respectively, to remove outliers before performing Principal Component Analysis (PCA). It is worth noting that sigma-based methods strictly presuppose normally distributed data. However, what about data that are not normally distributed?
With this in mind, we decided to investigate the influence that the outlier removal method may have on the characteristics of the time series, using skewness and kurtosis (derived from the moments of the data probability density function, PDF) and noise analysis (with Maximum Likelihood Estimation). We took 12 EPN time series with extremely large scatter and removed the outliers with three chosen criteria. First, the commonly used thresholds of 3 and 5 times the standard deviation, which assume normally distributed data, were applied. Then, the Median Absolute Deviation criterion was used. The main goal of this research was to show how the proper removal of outliers affects the estimation of kurtosis and skewness and therefore our understanding of the nature of the data. As shown previously by Peinke et al. (2004) and Sura and Gille (2003), geophysical phenomena are not necessarily Gaussian. Deviations from Gaussianity can have an impact on the real dynamics.
On the other hand, Sura and Gille (2010) stated that the skewness is positive if the additive and multiplicative noises are positively correlated and the skewness is negative if the noise terms are negatively correlated.
2 Data Processing and Methods
The time series used in this research were obtained within the reprocessing project (“repro-1”) according to the EPN guidelines (Bruyninx et al. 1996) using the Bernese 5.0 software (Dach et al. 2007). The reprocessing was performed at the Centre of Applied Geomatics of the Military University of Technology, which is one of the 16 independent Local Analysis Centres (MUT LAC). The result was a set of coordinates in the ITRF2005 reference frame (Altamimi et al. 2007). The 12 stations with the greatest number of outliers were selected for this research. A combination of white and power-law noise was assumed to be present in the time series for the Maximum Likelihood Estimation (MLE) with the CATS software (Williams 2008). The MLE method follows the equation:
\[ \ln\left[\mathrm{lik}\left(\hat{v},C\right)\right]=-\frac{1}{2}\left[\ln\left(\det C\right)+\hat{v}^{T}C^{-1}\hat{v}+N\ln\left(2\pi\right)\right] \]
where \( \hat{v} \) is the vector of residuals of the time series, C is the covariance matrix of the assumed noise model and N is the number of epochs.
The power-law noise is characterized by the spectral index κ and the amplitude A. The MLE method has already been used successfully to evaluate noise in many studies, e.g. Beavan (2005), Bergstrand et al. (2007), Teferle et al. (2008) and Bos et al. (2008).
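To make the role of κ and the amplitudes concrete, the sketch below evaluates a Gaussian log-likelihood of residuals under a white plus power-law covariance. It is not the CATS implementation: the covariance is built here with the fractional-differencing (Hosking) recursion described by Williams (2003), and the function names, the argument names and the sampling interval `dt` (in years) are assumptions of this example.

```python
import numpy as np


def power_law_covariance(kappa, n, dt=1.0):
    """Unit-amplitude power-law covariance J = T T^T (after Williams 2003)."""
    h = np.ones(n)
    for i in range(1, n):
        # Hosking recursion for the fractional-differencing coefficients.
        h[i] = h[i - 1] * (i - 1 - kappa / 2.0) / i
    T = np.zeros((n, n))
    for col in range(n):
        T[col:, col] = h[: n - col]      # lower-triangular Toeplitz matrix
    T *= dt ** (-kappa / 4.0)            # scaling with the sampling interval
    return T @ T.T


def log_likelihood(v, a_white, b_pl, kappa, dt=1.0):
    """Gaussian log-likelihood of residuals v for white + power-law noise."""
    v = np.asarray(v, dtype=float)
    n = v.size
    C = a_white ** 2 * np.eye(n) + b_pl ** 2 * power_law_covariance(kappa, n, dt)
    _sign, logdet = np.linalg.slogdet(C)
    quad = v @ np.linalg.solve(C, v)
    return -0.5 * (logdet + quad + n * np.log(2.0 * np.pi))
```

In an actual estimation the amplitudes and κ would be varied by a numerical optimiser until this value is maximised; a single gross outlier enlarges the quadratic term and can therefore distort the estimated amplitudes.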
3 Outliers Removal in the Noise Analysis
Three methods of outlier removal were tested in this research. The first and the second removed the outliers exceeding 3 and 5 times the standard deviation of the time series (referred to as 3 sigma (3σ) and 5 sigma (5σ)), respectively. The third one was based on the Median Absolute Deviation, MAD (Mosteller and Tukey 1977; Sachs 1984), of the time series. No interpolation of the removed data was performed. The advantage of the MAD method is that it is much more robust to outliers than sigma-based methods. The term ‘robust’ is used throughout the paper when describing the MAD method; by this we mean that, being based on the median of the data, MAD is not as sensitive to outliers as the sigma-based criteria are. The MAD is calculated from:
\[ \mathrm{MAD}=\underset{i}{\mathrm{median}}\left(\left|x_i-\underset{j}{\mathrm{median}}\left(x_j\right)\right|\right) \]
where \( x_i \) are the values of the time series.
To use the MAD value in a similar way to the standard deviation of the normal distribution, we multiply it by 1.4826 (Ruppert 2011). Later in this paper, whenever we refer to MAD we actually mean \( 3\cdot 1.4826\cdot \mathrm{MAD} \), which makes the threshold close to 3 times the standard deviation, but never equal to it. Twelve extremely noisy EPN stations (BISK, BOLG, CNIV, BZRG, HERS, MDVO, MEDI, MOPI, NYIR, SNEC, ZWEN, SFER) were chosen to investigate how the outliers influence noise estimation (Figs. 1 and 2).
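The difference between the sigma-based and the MAD-based criteria can be sketched in a few lines of Python. The sketch assumes that the criteria are applied to residuals, with deviations measured from the mean for the sigma criterion and from the median for MAD; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def sigma_mask(x, k=3.0):
    """True for values kept by the k-sigma criterion."""
    x = np.asarray(x, dtype=float)
    return np.abs(x - x.mean()) <= k * x.std(ddof=1)


def mad_mask(x, k=3.0, scale=1.4826):
    """True for values kept by the k * 1.4826 * MAD criterion."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    return np.abs(x - med) <= k * scale * mad


# Synthetic residuals (mm): 20 consistent values followed by 3 gross outliers.
residuals = np.array([0.0, 1.0, -1.0, 0.5, -0.5] * 4 + [50.0, 60.0, 70.0])
kept_sigma = sigma_mask(residuals)
kept_mad = mad_mask(residuals)
```

In this synthetic case the three gross values inflate the standard deviation so much that the 3σ criterion keeps them, whereas the MAD threshold, computed from medians, rejects all three and keeps the 20 consistent values.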
With the 3 sigma criterion, the percentage of removed outliers among the twelve analyzed stations is greatest for ZWEN, where it reaches 4%, whereas with MAD it exceeds 15% for the same station (Fig. 3). The MLE was performed after outlier removal with 3σ, 5σ and MAD, assuming white plus power-law noise. As a result, the spectral indices and noise amplitudes with their uncertainties were obtained (Fig. 4a–c).
The spectral indices for the twelve analyzed stations range between −2 and 0. The noise amplitudes for stations with widely scattered time series (HERS, SNEC, SFER) reach quite odd and unrealistic values. When no outliers are removed, the uncertainties of the noise amplitudes are unacceptably large. All stations demonstrate the necessity of outlier removal. The disputable issue is the criterion. No removal or the 5σ criterion gives unacceptable results for stations with just a few outliers (BISK; BOLG; CNIV; BZRG; HERS, the North and East components; MDVO; MEDI; MOPI; NYIR; ZWEN). For consistent time series, the noise amplitudes obtained with the 3σ or MAD criterion are smaller than \( 10\ \mathrm{mm}\cdot \mathrm{yr}^{\kappa /4} \) and quite close to each other. The situation changes for widely scattered time series. Here, the MAD criterion results in smaller noise amplitudes and smaller uncertainties. The most interesting time series, with extremely scattered values for both the horizontal and vertical components, comes from the SNEC station. The spectral index for SNEC was estimated to be close to random walk, which may be interpreted as changes related to monument instability. As stated by King and Williams (2009), random-walk amplitudes for well-monumented stations are probably no higher than \( 0.5\ \mathrm{mm}\cdot \mathrm{yr}^{-0.5} \). The SNEC station, with such a scattered time series, reaches the highest noise amplitude, which is still too large even after MAD outlier removal. The BZRG station, in contrast, has a fairly consistent time series with two periods of strong departures from the trend. No removal of outliers and the 5σ and 3σ criteria result in similar amplitudes, while the MAD criterion results in smaller and interpretable noise parameters: the amplitudes are reduced to around \( 10\ \mathrm{mm}\cdot \mathrm{yr}^{\kappa /4} \) and the spectral index of the Up component increases to −1.
Bearing in mind that the type and amplitude of the noise enter the estimation of the linear parameters from the time series, one has to understand the values one obtains. Sometimes they do not strictly reflect the actual noise, but are simply the effect of incorrect, or even absent, data pre-analysis.
4 The Probability Analysis
The probability analysis was conducted in addition to the noise analysis. The question is whether it is appropriate to treat GNSS time series as normally distributed and therefore to use the 3σ criterion for outlier removal, or whether a robust method (here MAD) should be used. The analysis was performed by estimating moments of the data’s probability density function (PDF), namely the skewness and kurtosis. Their advantage in this study, however, is their high sensitivity to outliers.
The asymmetry of the PDF’s shape can be described by the skewness:
\[ \mathrm{skewness}=\frac{E\left[\left(x-\overline{x}\right)^3\right]}{\sigma^3} \]
where \( \overline{x} \) is the mean of x, σ is the standard deviation of the data and E is the expected value. For the classic Gaussian distribution the skewness is equal to zero. Otherwise, the distribution is skewed right for values greater than zero or skewed left for values below zero. The standard error of skewness (SES) can be computed by (Cramer 1977):
\[ \mathit{SES}=\sqrt{\frac{6n\left(n-1\right)}{\left(n-2\right)\left(n+1\right)\left(n+3\right)}} \]
where n is the number of data in the time series. In this paper, \( {\it SES}=\pm 0.06 \). The value of \( 3\times {\it SES}=\pm 0.18 \) was assumed here as the boundary value for normal distribution.
The kurtosis is a measure of the “peakedness” of the probability distribution of a real-valued random variable. It is computed by the formula:
\[ \mathrm{kurtosis}=\frac{E\left[\left(x-\overline{x}\right)^4\right]}{\sigma^4} \]
If the kurtosis is equal to 3, we deal with the normal distribution. High kurtosis means that the peak near the mean is distinct and the probability distribution declines rather rapidly. The standard error of kurtosis (SEK) can be estimated by (Cramer 1977):
\[ \mathit{SEK}=2\cdot \mathit{SES}\cdot \sqrt{\frac{n^2-1}{\left(n-3\right)\left(n+5\right)}} \]
where n is the number of data in the time series. Here, \( {\it SEK}=\pm 0.12 \) and \( 3\times {\it SEK}=\pm 0.36 \) were assumed as the boundary values for the normal distribution. Taken together, the skewness and kurtosis can indicate whether a time series is normally distributed.
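The quantities of this section can be computed with the short Python sketch below. It assumes moment-based (population) definitions of skewness and kurtosis and the commonly cited closed forms of the standard errors, \( \sqrt{6n(n-1)/((n-2)(n+1)(n+3))} \) for SES and \( 2\cdot SES\cdot \sqrt{(n^2-1)/((n-3)(n+5))} \) for SEK; the function names and the sample length used at the end are hypothetical.

```python
import numpy as np


def skewness(x):
    """Third standardized moment; zero for a symmetric sample."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


def kurtosis(x):
    """Fourth standardized moment; equal to 3 for the normal distribution."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2


def ses(n):
    """Standard error of skewness for a sample of size n."""
    return np.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def sek(n):
    """Standard error of kurtosis for a sample of size n."""
    return 2.0 * ses(n) * np.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))


def looks_normal(x):
    """Check both moments against the 3*SES and 3*SEK bounds."""
    n = len(x)
    return bool(abs(skewness(x)) <= 3.0 * ses(n)
                and abs(kurtosis(x) - 3.0) <= 3.0 * sek(n))


n_example = 1667  # hypothetical sample length, used only for illustration
```

With the hypothetical length above, `ses(n_example)` and `sek(n_example)` round to 0.06 and 0.12. Both standard errors depend only on n, so series of a few thousand daily values share practically the same bounds.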
First, the skewness and kurtosis were calculated for the data with no removal of outliers, and then for the 5σ, 3σ and MAD criteria. The 5σ criterion brought an unexpectedly large improvement in the analyzed values (even though only a few values exceeded this limit), which showed that the skewness and kurtosis are really sensitive to outliers (Fig. 5). The differences between the skewness values after removal of outliers with the 3σ and MAD criteria are mostly within 3 times the SES for the horizontal components, which shows that the choice between these two criteria does not change the probability distribution. For the Up component, three stations (HERS, SNEC, SFER) show quite large differences between the skewness after 3σ and MAD. The differences between the kurtosis values after 3σ and MAD removal in most cases fall within 3 times the SEK. However, the differences are greater for a few stations: HERS (the East and Up components), MDVO (the East component), SNEC (the East and Up components) and SFER (the North and Up components). One interpretation of kurtosis is the precision of the gathered data. If the kurtosis is high, the precision is also high, since the peak near the mean is very distinct (but only if the skewness is equal to 0). With an inappropriate criterion of outlier removal and no analysis of skewness, the remaining outliers can have a significant impact on the kurtosis values and therefore lead to false conclusions. An example of data that would be judged highly precise (without analysing its skewness) is presented in Fig. 6. However, it is well known that high values of kurtosis can also mean heavy tails, which is exactly what would be expected if outliers are present. Thus, the large value of kurtosis obtained without outlier removal is entirely expected. This is why data pre-analysis is so essential before any further estimation.
5 Discussion and Conclusions
Our main goal in this research was to show how the proper removal of outliers affects the estimation of kurtosis and skewness and therefore our understanding of the nature of the data. The pre-analysis of data, including outlier removal, has to be well matched to the type of time series. The commonly used 3σ criterion seems to fail for widely scattered GNSS time series because the standard deviation is calculated from the whole data set, outliers included. In contrast, the MAD criterion seems to be more appropriate for outlier removal, since it is calculated from the median value and is therefore much more robust to outliers than sigma-based methods. Clearly, the outliers have to be removed, since the further analyses to be conducted can be very sensitive to them. As shown in this research, although the MLE method resulted in quite consistent spectral indices, the noise amplitudes were unacceptable in a few cases. Their differences even exceeded the range of their uncertainties, which may result in a variety of wrong interpretations. To show how the outliers can affect any further estimation, the probability analysis was performed, since skewness and kurtosis are highly sensitive to outliers. We showed that a wrongly chosen criterion leads to misinterpretation of the time series distribution and also of the data precision. A few of the differences in skewness and kurtosis shown in this research were higher than the set value of 3 times the SES and SEK. This shows that the 3σ criterion is sometimes not adequate for removing outliers, since the analyzed time series do not strictly follow the normal distribution. On the basis of the results, the MAD criterion is recommended for GNSS data. Its advantages over the commonly used sigma-based criteria follow from the results presented in this paper. Being less sensitive to outliers, it removes a greater number of them, thereby providing a better interpretation of real effects.
This paper discusses univariate time series. In the future, the authors plan to extend the work to multivariate cases, as in Feng (2012).
References
Agnew DC (1992) The time-domain behaviour of power-law noises. Geophys Res Lett 19(4):333–336
Altamimi Z, Collilieux X, Legrand J, Garayt B, Boucher C (2007) ITRF2005: a new release of the International Terrestrial Reference Frame based on time series of station positions and earth orientation parameters. J Geophys Res Solid Earth 112(B9). doi:10.1029/2007JB004949
Beavan J (2005) Noise properties of continuous GPS data from concrete pillar geodetic monuments in New Zealand and comparison with data from U.S. deep drilled braced monuments. J Geophys Res 110:B08410. doi:10.1029/2005JB003642
Bergstrand S, Scherneck H-G, Lidberg M, Johansson JM (2007) BIFROST: Noise properties of GPS time series. Dyn Planet Int Assoc Geodesy Symposia 130:123–130. doi:10.1007/978-3-540-49350-1_20
Bogusz J, Kontny B (2011) Estimation of sub-diurnal noise level in GNSS time series. Acta Geodynamica et Geomaterialia 8(3(163)):273–281
Bos MS, Fernandes RMS, Williams SDP, Bastos L (2008) Fast error analysis of continuous GPS observations. J Geodesy 82:157–166. doi:10.1007/s00190-007-0165-x
Bruyninx C, Gurtner W, Muls A (1996) The EUREF permanent GPS network. Ankara, Turkey, May 22–25 1996, EUREF Publication No. 5, Veröffentlichungen der Bayerischen Kommission für die Internationale Erdmessung der Bayerischen Akademie der Wissenschaften, pp 123–130
Cramer D (1977) Basic statistics for social research: step-by-step calculations and computer techniques using Minitab. Routledge. ISBN 0-419-12004-7
Dach R, Hugentobler U, Fridez S, Meindl M (eds) (2007) Bernese GPS software version 5.0. Astronomical Institute, the University of Bern, Bern
Dong D, Fang P, Bock Y, Webb F, Prawirodirdjo L, Kedar S, Jamason P (2006) Spatiotemporal filtering using principal component analysis and Karhunen-Loeve expansion approaches for regional GPS network analysis. J Geophys Res 111:B03405. doi:10.1029/2005JB003806
Feng Y (2012) Regression and hypothesis tests for multivariate GNSS state time series. J Global Positioning Syst 11(1):33–45
Johansson JM, Davis JL, Scherneck H-G, Milne GA, Vermeer M, Mitrovica JX, Bennett RA, Jonsson B, Elgered G, Elosegui P, Koivula H, Poutanen M, Ronnang BO, Shapiro II (2002) Continuous GPS measurements of postglacial adjustment in Fennoscandia 1. Geodetic results. J Geophys Res 107(B8):2157. doi:10.1029/2001JB000400
Johnson HO, Agnew DC (1995) Monument motion and measurements of crustal velocities. Geophys Res Lett 22(21):2905–2908. doi:10.1029/95GL02661
King MA, Williams SDP (2009) Apparent stability of GPS monumentation from short-baseline time series. J Geophys Res 114:B10. doi:10.1029/2009JB006319
Klos A, Bogusz J, Figurski M, Kosek W (2014) Noise analysis of continuous GPS time series of selected EPN stations to investigate variations in stability of monument types. Accepted for publication by Springer in the IAG Symposium Series volume 142
Mandelbrot B (1983) The fractal geometry of nature. W.H. Freeman, San Francisco, 466 pp
Mandelbrot B, Van Ness J (1968) Fractional Brownian motions, fractional noises, and applications. SIAM Rev 10:422–439
Mao A, Harrison CGA, Dixon TH (1999) Noise in GPS coordinate time series. J Geophys Res 104(B2):2797–2816
Mosteller F, Tukey J (1977) Data analysis and regression. Addison-Wesley, Upper Saddle River
Peinke J, Bottcher F, Barth S (2004) Anomalous statistics in turbulence, financial markets and other complex systems. Ann Phys 13:450–460
Ruppert D (2011) Statistics and data analysis for financial engineering. Springer, New York, Dordrecht, Heidelberg, London. doi:10.1007/978-1-4419-7787-8
Sachs L (1984) Applied statistics: a handbook of techniques. Springer-Verlag, New York, p 253
Sura P, Gille ST (2003) Interpreting wind-driven Southern Ocean variability in a stochastic framework. J Mar Res 61:313–334
Sura P, Gille ST (2010) Stochastic dynamics of sea surface height variability. J Phys Oceanogr 40(7):1582–1596. doi:10.1175/2010JPO4331.1
Teferle FN, Williams SDP, Kierulf HP, Bingley RM, Plag HP (2008) A continuous GPS coordinate time series analysis strategy for high-accuracy vertical land movements. Phys Chem Earth 33(3–4):205–216. doi:10.1016/j.pce.2006.11.002
Williams SDP (2003) The effect of coloured noise on the uncertainties of rates estimated from geodetic time series. J Geodesy 76:483–494. doi:10.1007/s00190-002-0283-4
Williams SDP (2008) CATS: GPS coordinate time series analysis software. GPS Solutions 12:147–153. doi:10.1007/s10291-007-0086-4
Williams SDP, Bock Y, Fang P, Jamason P, Nikolaidis RM, Prawirodirdjo L, Miller M, Johnson D (2004) Error analysis of continuous GPS position time series. J Geophys Res 109:B03412. doi:10.1029/2003JB002741
Acknowledgments
This research was financed by the Faculty of Civil Engineering and Geodesy MUT statutory research.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Klos, A., Bogusz, J., Figurski, M., Kosek, W. (2015). On the Handling of Outliers in the GNSS Time Series by Means of the Noise and Probability Analysis. In: Rizos, C., Willis, P. (eds) IAG 150 Years. International Association of Geodesy Symposia, vol 143. Springer, Cham. https://doi.org/10.1007/1345_2015_78
DOI: https://doi.org/10.1007/1345_2015_78
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24603-1
Online ISBN: 978-3-319-30895-1
eBook Packages: Earth and Environmental Science (R0)