Introduction

Global Positioning System (GPS) measurements are an extremely valuable source of data for a wide range of geophysical studies, many of which, until recently, had to rely on data that were imprecise, sparse, or difficult to obtain. GPS data are useful for studies ranging from different fields of tectonics (e.g. Blewitt and Lavallée 2002; Davis et al. 2006; Heflin et al. 2020; Ji et al. 2020) to seismology (e.g. Davis et al. 2006; Wernicke and Davis 2010; Gonzalez-Ortega et al. 2014). The present paper deals with the application of GPS to studies in volcanology (e.g. Mann et al. 2002; Bartel et al. 2003; Miyagi et al. 2004; Puglisi and Bonforte 2004; Janssen 2007; Geirsson et al. 2017; Larson et al. 2010; Lee et al. 2015), where GPS data are used to monitor deformations in volcanic edifices. Because these deformations can be related to magma- or gas-related processes going on underground, GPS measurements are useful for studying these volcanic processes and sometimes for anticipating possible eruptive activity.

The subject of the present study is the identification of very large periodic (or semi-periodic) components due to seasonal signals (e.g., Dong et al. 2002; Ray et al. 2008; Tregoning and Watson 2009; Chen et al. 2013; He et al. 2017; Wang et al. 2018). The causes of these signals can be many (e.g., Chanard et al. 2020) and are outside the scope of this paper.

Because seasonal components present large amplitudes with respect to the amplitudes of background volcanic processes, many methods to identify seasonal components have been proposed (e.g., Davis et al. 2006, 2012; Freymueller 2009; Chen et al. 2013, 2020; Gruszczynska et al. 2017; He et al. 2017; Klos et al. 2019; Ji et al. 2020; and many others).

In the present paper, we contribute a simple-minded, intuitive approach to seasonal component identification. The analysis program we present identifies each seasonal component, and if desired, outputs a series consisting of the original one minus the seasonal component. If the seasonal component is the object of interest, it can be easily obtained by subtracting the resulting series from the original one.

Method

In what follows, we present a method for identifying seasonal components, whether annual, semiannual, draconitic, etc. We will use a simple method to identify a periodic component within a time series, based on the wonderful capacity of Fourier analysis to identify a component with a given period, even in the presence of other periods and noise.

Let \(d=\left\{{d}_{i}; \,i=1,2,\dots ,N\right\}\) be a time series having N elements sampled with sampling interval \(\Delta t\), so that element \({d}_{i}\) corresponds to time \({t}_{i}={t}_{1}+(i-1)\Delta t\), where \({t}_{1}\) is the time of the first element.

Let the period of the seasonal component to be identified, and perhaps later eliminated, be T, so there will be \({N}_{T}=\mathcal{R}\left(T/\Delta t\right)\) elements in one period, where \(\mathcal{R}\) is the “rounding” function that assigns the integer closest to the real number argument. Hence, we will be really identifying the period \({N}_{T} \Delta t\), which we will suppose to be close enough to T for practical purposes.

To eliminate the seasonal component two cases are considered. Case one is when series are at least one period long, \(N\ge {N}_{T}\). It is well known that Fourier analysis can correctly characterize only periods that are submultiples of the total observation time, and other periods present in the sampled series are represented in terms of various submultiple periods. Thus, except for the case \(N=k{N}_{T}\), where k is some positive integer, straightforward application of the Fourier transform to the whole series is not appropriate to determine amplitude and phase of a seasonal component with period T.
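The limitation described above can be illustrated with a short sketch. It is written in Python rather than in the MATLAB of the programs discussed later, and the function names, the daily sampling, and the 365-sample period are illustrative assumptions, not part of the method. It evaluates discrete Fourier components of a noise-free sinusoid for a series of exactly two periods and for a series of about one and a half periods.

```python
import cmath
import math

def dft_bin_amplitude(series, m):
    """Amplitude of the Fourier component with m cycles over the whole series."""
    n = len(series)
    total = sum(x * cmath.exp(-2j * math.pi * m * k / n)
                for k, x in enumerate(series))
    return 2.0 * abs(total) / n

def annual_cosine(n, n_t=365, amplitude=5.0, phase_deg=33.0):
    """Noise-free sinusoid with n_t samples per period (illustrative values)."""
    phase = math.radians(phase_deg)
    return [amplitude * math.cos(2.0 * math.pi * k / n_t + phase)
            for k in range(n)]

# Two full periods: component 2 has exactly the period of the sinusoid.
exact = dft_bin_amplitude(annual_cosine(730), 2)

# About 1.5 periods: the true period falls between components 1 and 2,
# so neither component recovers the amplitude of 5.
leaky = [dft_bin_amplitude(annual_cosine(547), m) for m in (1, 2)]

print(exact, leaky)
```

In the second case the energy of the sinusoid is spread over several submultiple periods, which is why the segments used below are chosen to be exactly one period long.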

From a series with N elements, \(m=N-{N}_{T}+1\) different continuous segments one period long can be chosen, \({d}^{(j)}=\left\{{d}_{j},{d}_{j+1},\dots ,{d}_{j+{N}_{T}-1};\, j=1,\dots ,m\right\}\), and the Fourier transform of each of these series:

$$D_{1}^{\left( j \right)} = \mathop \sum \limits_{q = j}^{{j + N_{T} - 1}} {\text{e}}^{{ - i \,2{\uppi }f_{o} \left( {q - j} \right)\Delta t}} d_{q} \Delta t,$$
(1)

(only one spectral component, the fundamental, is needed) yields an amplitude \({A}^{(j)}\) and a phase \({\psi }^{(j)}\) measured with reference to \({t}_{j}\). This phase is changed to a phase \({\phi }^{(j)}\) referred to the time of the first element of the series \({t}_{1}\) as \(\phi^{\left( j \right)} = \psi^{\left( j \right)} - 2\pi f_{o} \left( {t_{j} - t_{1} } \right)\).
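As a minimal sketch of Eq. (1) and of the phase referral just described, the following Python fragment evaluates the fundamental component of one segment. The function names are hypothetical, and the normalization used to turn the transform into an amplitude, \(2\left|{D}_{1}\right|/({N}_{T}\Delta t)\), is an assumption that holds for a sinusoid sampled over exactly one period; the text does not state the normalization used in the authors' program.

```python
import cmath
import math

def segment_fundamental(segment, dt):
    """Amplitude and phase (radians) of the fundamental of a one-period segment.

    The phase is measured with reference to the first sample of the segment.
    """
    n_t = len(segment)
    f_o = 1.0 / (n_t * dt)  # frequency of the period actually identified
    d1 = sum(cmath.exp(-2j * math.pi * f_o * k * dt) * x * dt
             for k, x in enumerate(segment))
    amplitude = 2.0 * abs(d1) / (n_t * dt)  # assumed normalization
    psi = cmath.phase(d1)
    return amplitude, psi

def refer_to_first_sample(psi, f_o, t_j, t_1):
    """Change the phase reference from t_j to t_1, wrapped to (-pi, pi]."""
    phi = psi - 2.0 * math.pi * f_o * (t_j - t_1)
    return math.atan2(math.sin(phi), math.cos(phi))

# Illustrative test signal: daily samples, 365-day period, A = 5, phase = 33 deg.
dt = 1.0
n_t = 365
f_o = 1.0 / (n_t * dt)
true_phi = math.radians(33.0)
series = [5.0 * math.cos(2.0 * math.pi * f_o * i * dt + true_phi)
          for i in range(800)]

j = 100  # 0-based index of the first sample of the segment
amplitude, psi = segment_fundamental(series[j:j + n_t], dt)
phi = refer_to_first_sample(psi, f_o, j * dt, 0.0)
print(amplitude, math.degrees(phi))
```

For a noise-free sinusoid the recovered amplitude and referred phase should match the generating values to within rounding error, whichever segment is chosen.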

Transforms from different segments, as many as possible, are used because, as will be discussed below, seasonal signals are usually not invariant over time, which means the amplitude and phase determined from one segment may not be representative of the whole series.

Once amplitudes and phases have been computed, a first option consists of using the mean values of the m amplitudes and phases, \(\overline{A }\) and \(\overline{\phi }\), respectively, to define an average seasonal component as:

$${s}_{i}=\overline{A }\,\mathrm{cos}\left[2\pi {f}_{o}\left({t}_{i}-{t}_{1}\right)+\overline{\phi }\right] ,$$
(2)

which gives good results for non-varying seasonal signals and yields estimates of the spread in A and \(\phi\) through their standard deviations. Figure 1 is an example of this option applied to a segment of a GPS displacement series from Colima volcano.
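The averaging option can be sketched as follows, taking the seasonal component as the mean amplitude multiplying the cosine, consistent with Eq. (3). This is a Python illustration under stated assumptions, not the authors' program: the function names are hypothetical, the amplitude normalization is the one valid for exactly one period, and the phases are averaged with a circular mean (a choice made here to avoid wrap-around problems near ±180°; the text only specifies mean values).

```python
import cmath
import math

def sliding_estimates(series, n_t):
    """Return A*exp(i*phi) for every one-period segment, phi referred to t_1."""
    if len(series) < n_t:
        raise ValueError("series must be at least one period long")
    kernel = [cmath.exp(-2j * math.pi * k / n_t) for k in range(n_t)]
    estimates = []
    for j in range(len(series) - n_t + 1):
        d1 = sum(w * x for w, x in zip(kernel, series[j:j + n_t]))
        c = 2.0 * d1 / n_t  # phase still referred to the segment start
        estimates.append(c * cmath.exp(-2j * math.pi * j / n_t))
    return estimates

def average_seasonal(series, n_t):
    """Mean amplitude, mean phase, and the resulting average seasonal component."""
    est = sliding_estimates(series, n_t)
    mean_amp = sum(abs(c) for c in est) / len(est)
    # Circular mean of the phases (assumption, see lead-in).
    mean_phi = cmath.phase(sum(cmath.rect(1.0, cmath.phase(c)) for c in est))
    seasonal = [mean_amp * math.cos(2.0 * math.pi * i / n_t + mean_phi)
                for i in range(len(series))]
    return mean_amp, mean_phi, seasonal

# Illustrative invariant seasonal: A = 5, phase = 33 deg, 365 samples per period.
series = [5.0 * math.cos(2.0 * math.pi * i / 365 + math.radians(33.0))
          for i in range(800)]
mean_amp, mean_phi, seasonal = average_seasonal(series, 365)
residual = [d - s for d, s in zip(series, seasonal)]
print(mean_amp, math.degrees(mean_phi), max(abs(r) for r in residual))
```

The sampling interval cancels in the amplitude normalization, so it does not appear explicitly. Standard deviations of the per-segment amplitudes and phases could be computed from the same list of estimates.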

Fig. 1

A Displacement GPS time series recorded at Colima volcano (blue) and seasonal component from the average phase and amplitude option (red). B Resulting displacement after elimination of the seasonal component (black line) and original displacement (thin blue line)

However, it has been observed that the seasonal signal in geodetic time series is not invariant (Tregoning and Watson 2009; Davis et al. 2012; Chen et al. 2013; Ji et al. 2020; Tucikešić et al. 2020; Ruttner 2021), and different methods for characterizing the seasonal components are proposed in the papers referenced above.

Our approach to varying seasonal components is based on Eqs. (2) and (4) of Davis et al. (2012), who observe that a small segment of the time signal can be considered as belonging to a seasonal sinusoid if the amplitude does not change too rapidly over the period \(T\), in which case the seasonal \(s\left(t\right)\) can be written as:

$$\mathrm{s}\left(t\right)=A\left(t\right)\mathrm{ cos}\theta \left(t\right)=A\left(t\right)\mathrm{cos}\left[2\uppi {f}_{o}t+\phi (t)\right] ,$$
(3)

where \({f}_{o}={T}^{-1}\), \(A(t)\) and \(\theta (t)\) are instantaneous amplitude and total phase, respectively, and \(\phi (t)\) is the instantaneous phase offset, so that a variation in the instantaneous frequency \(f(t)\) implied in Eq. (3) can be expressed through the variation in \(\phi (t)\) as:

$$f\left(t\right)={f}_{o}+\frac{1}{2\uppi }\frac{{\text{d}}\phi \left(t\right)}{{\text{d}}t} ,$$
(4)

keeping \({f}_{o}\) constant.
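The step from Eq. (3) to Eq. (4) can be spelled out by taking the instantaneous frequency to be the time derivative of the total phase divided by \(2\pi\), which is the usual definition and is assumed here:

$$f\left(t\right)=\frac{1}{2\pi }\frac{{\text{d}}\theta \left(t\right)}{{\text{d}}t}=\frac{1}{2\pi }\frac{{\text{d}}}{{\text{d}}t}\left[2\pi {f}_{o}t+\phi \left(t\right)\right]={f}_{o}+\frac{1}{2\pi }\frac{{\text{d}}\phi \left(t\right)}{{\text{d}}t} .$$

As an illustrative number, a phase offset drifting linearly at 0.1 rad/yr would shift the frequency by \(0.1/(2\pi )\approx 0.016\ {\text{yr}}^{-1}\), a small change relative to \({f}_{o}=1\ {\text{yr}}^{-1}\) for an annual seasonal.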

Thus, for each of the \({d}^{(j)}\) series, instantaneous amplitudes \({A}^{(j)}\) and phases \({\phi }^{(j)}\) are determined from the Fourier transform as mentioned above. Since each pair of these values is representative of the whole segment, the values are assigned to the middle element of the series, i.e., \({A}_{j+{N}_{T}/2-k}={A}^{(j)}\) and \({\phi }_{j+{N}_{T}/2-k}={\phi }^{(j)}\), where \(k=0\) for even \({N}_{T}\) and \(k=1\) for odd \({N}_{T}\), and we have two series of instantaneous parameters, \(A\) and \(\phi\), to substitute in (3). The \({N}_{T}/2\) elements at each extreme are “tails” of the method; they can be discarded for series that are long enough that information from the tails is not needed for interpretation. For short or medium-sized series, with lengths only a few times the tail length, the tails can be assigned the closest values, i.e., \({A}^{(1)}\) and \({\phi }^{(1)}\) at the beginning, and \({A}^{(m)}\) and \({\phi }^{(m)}\) at the end.

The seasonal component is then computed as:

$${s}_{i}={A}_{i}\,\mathrm{cos}\left[2\pi {f}_{o}\left({t}_{i}-{t}_{1}\right)+{\phi }_{i}\right] .$$
(5)
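A minimal Python sketch of the varying-phase option follows, with the seasonal taken as the instantaneous amplitude multiplying the cosine, as in Eq. (3). The function name is hypothetical, the middle element of each segment is taken at a 0-based offset of `n_t // 2` from the segment start (one reading of the index rule given above), and the tails are filled with the closest estimates. The test signal, with a phase that varies sinusoidally by ±20° over 5 years, is an illustrative assumption.

```python
import cmath
import math

def varying_seasonal(series, n_t):
    """Instantaneous amplitude and phase series and the resulting seasonal."""
    n = len(series)
    if n < n_t:
        raise ValueError("series must be at least one period long")
    m = n - n_t + 1          # number of one-period segments
    half = n_t // 2          # offset of the middle element (assumed convention)
    kernel = [cmath.exp(-2j * math.pi * k / n_t) for k in range(n_t)]
    amp = [0.0] * n
    phi = [0.0] * n
    for j in range(m):
        d1 = sum(w * x for w, x in zip(kernel, series[j:j + n_t]))
        # Normalize and refer the phase to the first sample of the series.
        c = (2.0 * d1 / n_t) * cmath.exp(-2j * math.pi * j / n_t)
        amp[j + half] = abs(c)
        phi[j + half] = cmath.phase(c)
    # Tails: copy the closest available estimates.
    for i in range(half):
        amp[i], phi[i] = amp[half], phi[half]
    for i in range(m + half, n):
        amp[i], phi[i] = amp[m - 1 + half], phi[m - 1 + half]
    seasonal = [amp[i] * math.cos(2.0 * math.pi * i / n_t + phi[i])
                for i in range(n)]
    return amp, phi, seasonal

# Illustrative signal: A = 5, phase = 33 deg +/- 20 deg with a 5-year cycle.
n_t = 365
series = [5.0 * math.cos(2.0 * math.pi * i / n_t
                         + math.radians(33.0 + 20.0 * math.sin(2.0 * math.pi * i / (5 * n_t))))
          for i in range(4 * n_t)]
amp, phi, seasonal = varying_seasonal(series, n_t)

# rms misfit excluding the tails
half = n_t // 2
core = range(half, len(series) - n_t + 1 + half)
rms_core = math.sqrt(sum((series[i] - seasonal[i]) ** 2 for i in core) / len(core))
print(rms_core)
```

Because each estimate averages over a full period, the instantaneous parameters are smoothed versions of the true ones, so a small misfit remains even outside the tails when the phase varies.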

Figure 2 shows an example of the application of this method to a long segment of GPS displacement data from the Colima Volcano, and Fig. 3 shows the instantaneous phases and amplitudes.

Fig. 2

A Displacement GPS time series recorded at Colima volcano (blue) and seasonal component from varying phase method (red); dotted lines indicate the tails of the fit. B Resulting displacement after elimination of the seasonal component (black line) and original displacement (thin blue line)

Fig. 3

Blue lines are A instantaneous phase and B amplitude for the seasonal fit shown in Fig. 2. Horizontal red lines indicate the mean values and dotted lines are the mean ± one standard deviation

Case two is for series that are shorter than one period, \(N<{N}_{T}\). It is not feasible to characterize a signal with a period longer than the total duration by direct application of Fourier analysis, but we propose a bootstrap method to get an approximate characterization, based on the fact that the period of interest is already known (postulated).

First, we compute an incomplete transform with \(K=N\) as in (6):

$$D_{1} = \mathop \sum \limits_{q = 1}^{K} {\text{e}}^{{ - i 2{\uppi }f_{o} \left( {q - 1} \right) \Delta t}} d_{q} \Delta t,$$
(6)

which results in approximate amplitude \({A}^{[0]}\) and phase \({\phi }^{[0]}\), where [0] indicates that it is the zero’th iteration.

Next, the “missing” data are computed as:

$${d}_{i}={A}^{[y]}\,\mathrm{cos}\left[2\pi {f}_{o}\left({t}_{i}-{t}_{1}\right)+{\phi }^{[y]}\right] ;\, i=N+1,\dots ,{N}_{T},$$
(7)

for \(y=0\). These extrapolated data are appended to the original series, and a new “complete” transform is calculated using (6) with \(K={N}_{T}\) to obtain \({A}^{[1]}\) and \({\phi }^{[1]}\).

The process is iterated, replacing at each iteration the previously extrapolated data by the new ones, until, for some iteration y, \(\left|{A}^{[y]}-{A}^{[y-1]}\right|<{\epsilon }_{A}\) and \(\left|{\phi }^{[y]}-{\phi }^{[y-1]}\right|<{\epsilon }_{\phi }\), where \({\epsilon }_{A}\) and \({\epsilon }_{\phi }\) are convergence criteria (we use \({\epsilon }_{A}=0.001\ {\text{mm}}\) and \({\epsilon }_{\phi }=0.001^\circ\)), or until a maximum number of iterations is reached (we use \({y}_{\mathrm{max}}=2000\), but for all cases having a reasonable length, estimates converge after at most a few tens of iterations).

After convergence, the seasonal component is computed from:

$${s}_{i}={A}^{[y]}\,\mathrm{cos}\left[2\pi {f}_{o}\left({t}_{i}-{t}_{1}\right)+{\phi }^{[y]}\right] ;\, i=1,\dots ,N,$$
(8)

and subtracted from the original series.
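The iteration described by Eqs. (6) to (8) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' diseas.m: the function name is hypothetical, the extrapolated data are taken as amplitude times cosine, the amplitude is normalized by the full period length at every iteration (so the zeroth estimate treats the missing samples as zeros), and convergence is tested on the absolute changes in amplitude and phase.

```python
import cmath
import math

def bootstrap_seasonal(series, n_t, eps_a=0.001, eps_phi_deg=0.001, max_iter=2000):
    """Approximate amplitude and phase of a period-n_t seasonal from n < n_t samples."""
    n = len(series)
    if n >= n_t:
        raise ValueError("the bootstrap scheme is meant for series shorter than one period")
    kernel = [cmath.exp(-2j * math.pi * k / n_t) for k in range(n_t)]
    # Contribution of the observed samples; it does not change between iterations.
    known = sum(w * x for w, x in zip(kernel, series))
    c = 2.0 * known / n_t  # zeroth iteration: incomplete transform, K = N
    amp, phi = abs(c), cmath.phase(c)
    eps_phi = math.radians(eps_phi_deg)
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Extrapolate the missing samples from the current estimate, Eq. (7).
        extrap = sum(kernel[k] * amp * math.cos(2.0 * math.pi * k / n_t + phi)
                     for k in range(n, n_t))
        c = 2.0 * (known + extrap) / n_t  # complete transform, K = N_T
        new_amp, new_phi = abs(c), cmath.phase(c)
        d_phi = math.atan2(math.sin(new_phi - phi), math.cos(new_phi - phi))
        converged = abs(new_amp - amp) < eps_a and abs(d_phi) < eps_phi
        amp, phi = new_amp, new_phi
        if converged:
            break
    seasonal = [amp * math.cos(2.0 * math.pi * i / n_t + phi) for i in range(n)]
    return amp, phi, iterations, seasonal

# Illustrative short series: first 300 daily samples of A = 5, phase = 33 deg.
short = [5.0 * math.cos(2.0 * math.pi * i / 365 + math.radians(33.0))
         for i in range(300)]
amp, phi, n_it, seasonal = bootstrap_seasonal(short, 365)
print(amp, math.degrees(phi), n_it)
```

Only the extrapolated part of the sum has to be recomputed at each iteration, since the observed samples are never modified.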

Figure 4 shows an example of this bootstrap fitting to the first 300 elements of the signal shown in Fig. 1. A comparison of both the fitted seasonal component and the resulting series shows that our approximate iterative fitting works quite well.

Fig. 4

A Short segment of the displacement GPS time series recorded at Colima volcano shown in Fig. 1 (blue) and seasonal component from the bootstrap method (red). The yellow lines are the fits for the 9 iterations needed to attain convergence. B Resulting displacement after elimination of the seasonal component. Fit and resulting series can be compared with the equivalent segment in Fig. 1

Synthetics and discussion

To keep this paper at a reasonable length, only a few examples of application to real data were shown here, but, although results look nice, it is not possible to judge from real data whether seasonal components were correctly identified, since it is not known a priori which, if any, are present in the data. It is necessary to test the proposed method with synthetic signals that consist of, or incorporate, known components.

The method was implemented as a MATLAB program, diseas.m, which can be found at http://cicese.repositorioinstitucional.mx/jspui/handle/1007/3925. Also found at the same address is a complementary program, syntseas.m, which generates synthetic series with one or several different periods and corresponding original phases, with the option of varying the phase in time for each period by adding (or subtracting) a quantity that changes in time as a sine function with a chosen period. The synthetics program can also add a trend to the seasonals, and add random noise, either normally or uniformly distributed, with a chosen standard deviation. Evaluating the effects of a trend in the data is important because it represents a signal with longer “period” than the duration of the time series and, being the double integral of a Dirac delta function, has a Fourier spectrum \(\propto {f}^{-2}\), which means that it can be an important factor in the treatment of short series.

Instructions for the use of the abovementioned programs are included in the corresponding headers, and the programs themselves are user-friendly. Thus, interested readers can use the attached programs to reproduce the results we will now show, and get figures showing how seasonal components are fitted.

First, we will show a compilation of results from seasonal identification in long series. Figure 5 shows \(\epsilon\), the rms fit error between true and estimated seasonals, for a yearly seasonal with \(T=1\ {\text{year}}\), amplitude \(A=5.0\ {\text{mm}}\), and initial phase \({\phi }_{0}=33.0^\circ\), for various series lengths \(N\). The first seasonal long series model, L0, a sine curve without noise, phase variations, or trend, is not illustrated because the identification is exact and errors are zero up to at least six decimal places.

Fig. 5

Fit rms error \(\epsilon\) between true and estimated yearly seasonals for different series lengths N. Blue lines are the total error for the whole series and dashed red lines are the error excluding the tails

Seasonal model L1 (Fig. 5A) illustrates the effect of adding normally distributed random noise with 0.5 mm standard deviation to L0 (random number generator seed = 88). The error corresponding to the whole series is plotted as a continuous blue line, and the red dashed line is the error without including the tails. Error increases as the series grows longer, i.e., as more noise is added, but it is always very small, smaller than the noise standard deviation.

Seasonal model L2 (Fig. 5B) is L0 with added trend of \(1.0 \text{mm/yr}\). The error is null outside the tails and the total error decreases as the series grows and tails become smaller in relation to the total length.

Seasonal model L3 (Fig. 5C) shows the effect of adding varying phase to L0 as:

$$\phi ={\phi }_{0}+\Delta \phi \mathrm{ sin}\frac{2\uppi }{\theta }(t-{t}_{1}) ,$$
(9)

where \(\Delta \phi\) is the maximum increment (or decrement if negative) that is attained at time \(\theta /4\) after \({t}_{1}\). The example features \(\Delta \phi =20^\circ\) and \(\theta =5 {\text{yr}}\). The error for no tails is quite small, and the error with tails decreases as tails get relatively smaller.
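For readers who want to experiment outside MATLAB, a synthetic series of the kind described in this section can be sketched in Python. This is not a port of syntseas.m: the function name and argument names are hypothetical, the seasonal is written as a cosine for consistency with Eq. (3), and Python's random number generator differs from MATLAB's, so the same seed will not reproduce the noise used for the figures.

```python
import math
import random

def synthetic_series(n, dt_yr=1.0 / 365.0, period_yr=1.0, amplitude=5.0,
                     phase0_deg=33.0, dphase_deg=0.0, theta_yr=5.0,
                     trend_per_yr=0.0, noise_sd=0.0, seed=88):
    """Seasonal with optional sinusoidal phase variation, linear trend and normal noise."""
    rng = random.Random(seed)
    series = []
    for i in range(n):
        t = i * dt_yr  # time since the first sample, in years
        # Phase variation following the form of Eq. (9).
        phi = math.radians(phase0_deg
                           + dphase_deg * math.sin(2.0 * math.pi * t / theta_yr))
        value = amplitude * math.cos(2.0 * math.pi * t / period_yr + phi)
        value += trend_per_yr * t
        if noise_sd > 0.0:
            value += rng.gauss(0.0, noise_sd)
        series.append(value)
    return series

# Counterparts of the models discussed in this section (illustrative settings).
l0 = synthetic_series(3 * 365)                                  # plain seasonal
l1 = synthetic_series(3 * 365, noise_sd=0.5)                    # plus noise
l2 = synthetic_series(3 * 365, trend_per_yr=1.0)                # plus trend
l3 = synthetic_series(3 * 365, dphase_deg=20.0, theta_yr=5.0)   # varying phase
```

With \(\Delta \phi =20^\circ\) and \(\theta =5\ {\text{yr}}\), the phase in this sketch reaches \({\phi }_{0}+20^\circ\) at 1.25 yr after the first sample, as stated above.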

Finally, model L4 (Fig. 5D) includes both varying phase and trend. Again, trend does not increase the error for the segments without tails, but it produces some error within the tails. Adding noise to L4 (not shown) does not result in a significant increase in error.

In the examples shown above, we have added fairly large amounts of noise, phase variation and trend, yet errors remain small, with the largest ones being less than ~ 0.04 of the seasonal range (here 10 mm), and the difference between postulated and adjusted seasonals is quite small.

Next, we use synthetics to test the bootstrap scheme (referred to from here on as BSS) for identifying seasonal components of a given period in short series with durations less than one period. The wisdom of trying to do this may be questioned, because Fourier analysis tells us that it cannot be done exactly. However, it is not uncommon for observed time series to have gaps, due mainly to power shortages or instrument malfunction, and the problem is compounded because access to GPS stations located on volcanoes may be hindered by weather or by the volcanic activity itself. Hence, relatively short data segments may be important to the interpretation of the volcanic activity, and often some overall curvature in the data segment will indicate the presence of a seasonal component that must be extracted for the data to be useful. We therefore propose to identify, albeit approximately, seasonal components in short series; application of the BSS to synthetic series lets us estimate how reliable the results will be depending on the length of the series.

Figure 6 shows how parameters A and \(\phi\) are identified (plots A and B, respectively), the number of iterations to convergence (plot C), and, most important, the rms error between the true seasonal component and the identified one (plot D), for different series lengths N from 25 to 364. This example uses a series having the same characteristics as the one used previously, \(T=1\ {\text{year}}\), amplitude \(A=5.0\ {\text{mm}}\), and initial phase \({\phi }_{0}=33.0^\circ\); we will not consider varying phases.

Fig. 6

Estimated amplitude A (A) and phase \(\phi\) (B), number of iterations to convergence \({N}_{it}\) (C), and rms fit error between true and estimated seasonals \(\epsilon\) (D) for four different models as a function of series length N. Model S0 is a simple sinusoid, model S1 is model S0 plus normally distributed random noise, model S2 is model S0 plus a trend, and model S3 is model S2 plus noise. In (C), the approximate \({N}_{it}\) values for very short series are written rather than plotted in order to see the details for longer series

Since the most important question is whether the BSS works at all, we consider as model S0 the above mentioned seasonal with no noise or trend added. The results for this basic model are shown as blue lines with circles, and it is clear that the BSS works very well even for quite small N.

The next model, S1, is model S0 with added normally distributed random noise with 0.5 mm standard deviation, and is plotted as red dashed lines with diamond markers. Although noise slightly distorts the phase and amplitude identification, the BSS error is still quite small all the way to lengths ~ 225 samples and is small for lengths as short as ~ 50.

Model S2 is model S0 with an added trend of \(1.0\ \text{mm/yr}\), plotted as a black dotted line with triangle markers; it is clear in Fig. 6 that a trend superposed on the seasonal components causes large errors in the BSS. The values obtained for A and \(\phi\) differ significantly from the true values, but these differences appear to cancel somewhat, because the resulting \(\epsilon\), although much larger than for the trendless case, is much smaller than the differences in A. The case is much the same for model S3, which is S2 with normal noise added; curiously, the presence of noise slightly reduces the differences and errors, particularly for small N values.

It should be pointed out that even the largest values of \(\epsilon \sim 0.289\ {\text{mm}}\), occurring for \(N\sim 225\), amount only to about 0.029 of the total range of the seasonal signal (10 mm), which indicates that, although the BSS is an approximate method, it can yield useful results.

For short series that are close in time to other larger series, it should be possible to complement the results of BSS with the seasonal identifications from the longer series.

A curious feature of BSS is that the number of iterations needed for convergence, \({N}_{\mathrm{it}}\), appears to depend only on the length of the signal and is essentially equal for all models, except for extremely short series with 50 or fewer elements, and even then it differs among models by only a small number of iterations.

The main assumption behind the method presented here is that the seasonal signals can be represented as a sinusoid, an assumption common to most of the papers dealing with seasonal identification (e.g. Blewitt and Lavallée 2002; Ding et al. 2005; Davis et al. 2006, 2012; Bogusz and Figurski 2014). This assumption is essential for short series, where variability is hard or impossible to appreciate, but it is relaxed in the variable phase approach for long series, and long real series usually do show at least some variability.

The real signals shown in the examples were low-pass filtered, using a Butterworth filter (Hamming 1977) with reference frequency \({f}_{c}=20 {\mathrm{yr}}^{-1}\) and filter order \({N}_{B}=4\), before the seasonal identification was done; but it turns out that results do not change significantly if the identification method is applied directly to unfiltered signals, and the resulting signals filtered afterwards.

The averages approach works very well for long invariant series or for series longer than one period but short enough so that invariancy is not a major issue. For long variant series, the varying phase approach results in better fits, but for real data, it is impossible to tell whether the fit may be including in it any non-seasonal features.

In any case, since the method employs one-period-long segments, it is imperative that long periods be identified and eliminated before shorter ones.

Conclusions

We have presented a simple-minded, intuitive method for identifying seasonal components in GPS displacement time series. For long series, the method uses an approach based on multiple transforms, each of a one-period segment of the time series, which can take into account non-stationarity of the seasonal components; it also features a scheme to deal with series somewhat shorter than one period of the seasonal component.

Unlike other more sophisticated and complicated methods, the method presented here has the advantage of simplicity, so that it is possible to recognize how the components are identified in the processed series.

Results are, of course, approximate, but considering the length limitations and the usual noise in the real data it has been applied to, it would not be practical to complicate seasonal signal identification to achieve a better, but meaningless, precision.

Two MATLAB programs, one to do the seasonal components identification, and another to generate synthetic signals to test the identification program and methods, are given in full. Interested readers can test and evaluate the method by themselves using these programs.