1 Introduction

Signal-to-noise ratio (SNR) statistics are widely used to describe the strength of the variations of the signal relative to those of the noise. SNR statistics quantify diverse aspects of models where an observable quantity Y is decomposed into a predictable or structural component s, often called signal or model, and a stochastic component \(\varepsilon \), called noise or error. Although the definition of SNR is rather general, in this paper we focus on a typical situation where one assumes a sequence \( \{Y_i\}_{i \in \mathbb {Z}}\) is determined by

$$\begin{aligned} Y_i:= s(t_i) + \varepsilon _i, \end{aligned}$$
(1)

where i is a time index, \(s(\cdot )\) is a smooth function of time evaluated at the time point \(t_i\) with \(i\in \mathbb {Z}\), and \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is some random sequence. We assume \(t_i \in (0,1)\); however, if a time series is observed at time points \(t_i \in (a,b)\), these can be rescaled onto the interval (0, 1) without changing the results of this paper.

Equation (1) is a popular model in many applications ranging from the physical sciences to engineering, the biosciences, the social sciences, etc. (see Parzen 1966, 1999 and references therein). Although we use the conventional term “noise” for \(\varepsilon _i\), this term may have a rich structure well beyond what one would usually consider noise. Some of the terminology here originates from the physical sciences, where the following concepts were first explored. Consider a non-stochastic signal s(t) defined on the time interval (0, 1), and assume that s(t) has a zero average level (that is, \(\int _0^1 s(t)dt=0\)). The average “variation” (or magnitude) of the signal is quantified as

$$\begin{aligned} P_\text {signal}:= \int _{0}^{1} s^2(t) dt. \end{aligned}$$
(2)

In physical science terminology (2) is the average power of the signal, that is, the “energy” contained in \(s(\cdot )\) per time unit (if the reference time interval is (a, b) the integral in (2) is divided by \((b-a)\)). If the average signal level is not zero, s(t) is centered on its mean value before (2) is computed. The magnitude, or “power”, of the noise component is given by \(P_\text {noise}:= {{\,\mathrm{Var}\,}}[\varepsilon _i]\). The SNR of the process is the ratio

$$\begin{aligned} \text {SNR}:= 10 \log _{10} \frac{P_\text {signal}}{P_\text {noise}}, \end{aligned}$$
(3)

expressed in decibel (dB) units. The SNR can also be defined as the plain ratio \(P_\text {signal} / P_\text {noise}\); however, the decibel scale is more common. Low SNR implies that the strength of the random component of (1) makes the signal \(s(\cdot )\) barely distinguishable in the observations of \(Y_i\). On the other hand, high SNR means that samples of \(Y_i\) convey enough information about the predictable/structural component \(s(\cdot )\).
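For concreteness, the following minimal Python sketch evaluates (2) numerically and applies the decibel conversion (3); the sinusoidal signal, its amplitude, and the noise variance are illustrative values, not taken from the paper.

```python
import numpy as np

def snr_db(p_signal, p_noise):
    """SNR in decibels as in equation (3)."""
    return 10.0 * np.log10(p_signal / p_noise)

# Illustrative example: s(t) = A*sin(2*pi*5*t) has average power A**2/2 on (0, 1).
A, sigma2 = 1.0, 0.05                               # assumed amplitude and noise variance
t = np.linspace(0.0, 1.0, 10_000, endpoint=False)
p_signal = np.trapz((A * np.sin(2 * np.pi * 5 * t))**2, t)  # numerical version of (2)
print(snr_db(p_signal, sigma2))                     # approx 10*log10(0.5/0.05) = 10 dB
```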

In many analyses, the SNR is a crucial parameter that must be known. In radar detection (Richards 2014), speech recognition (Loizou 2013), and audio and video signal processing (Haykin and Kosko 2001), it is crucial to build filtering algorithms able to reconstruct \(s(\cdot )\) with the largest possible SNR. In neuroscience there is strong interest in quantifying the SNR of signals produced by neuronal activity. In fact, the puzzle is that single neurons seem to have low SNR, meaning that they emit “weak signals” that are nevertheless processed efficiently by the brain (Czanner et al. 2015). In medical diagnostics, physiological activity is measured and digitally sampled (e.g., fMRI, EEG) with methods and devices that need to guarantee the largest SNR possible (Ullsperger and Debener 2010). The historic first detection of a gravitational wave, announced on 11 February 2016, was made possible by decades of research efforts in designing instruments and measurement methods able to work in an extremely low SNR environment (Kalogera 2017). These are just a few examples of the relevance of the SNR concept. The main goal of this paper is to define an SNR statistic, and to provide an estimator for its distribution with proven statistical guarantees under general assumptions on the elements of (1).

Let \(\mathscr {Y}_n:=\{ y_1, y_2, \ldots , y_n\}\) be sample values of Y observed at equally spaced time points \(t_i=i/n\), \(i=1,2,\ldots ,n\), with \(t_i \in (0,1)\). That is, in this work we focus on situations where Y is sampled at a constant sampling rate (also known as fixed frequency sampling, or uniform design), although the theory developed here can be extended to non-constant sampling rates. Let \(\hat{s}(\cdot )\) and \(\hat{\varepsilon }\) be estimates based on \(\mathscr {Y}_n\). Consider the observed SNR statistic

$$\begin{aligned} \widehat{SNR} := 10 \log _{10} \left( \frac{\frac{1}{n}\sum _{i=1}^{n} \hat{s}^2\left( \frac{i}{n}\right) }{\frac{1}{m}\sum _{i=1}^m\left( \hat{\varepsilon }_i-\bar{\hat{\varepsilon }} \right) ^2} \right) , \quad \text {with} \quad \bar{\hat{\varepsilon }}=\frac{1}{m} \sum _{i=1}^m \hat{\varepsilon }_i, \end{aligned}$$
(4)

for some choice of an appropriate sequence \(\{m\}\) such that \(m \rightarrow \infty \) and \(m/n \rightarrow 0\) as \(n \rightarrow \infty \). In this paper we propose a subsampling strategy that consistently estimates the quantiles of the distribution of \(\tau _m (\widehat{SNR} - \text {SNR})\) for an appropriate sequence \(\{\tau _m\}\) (see Theorem 3). These quantiles are used to construct simple confidence intervals for the SNR parameter.
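The statistic (4) is straightforward to compute once a signal estimate and residuals are available. The Python sketch below assumes that a smoother has already produced the values \(\hat{s}(i/n)\) and the residuals \(\hat{\varepsilon }_i\); the optimal smoother is developed in Sect. 4.

```python
import numpy as np

def snr_hat(s_hat, eps_hat, m):
    """Observed SNR statistic (4) in dB: signal power over all n design points,
    noise variance over the first m residuals (m << n)."""
    p_signal = np.mean(s_hat**2)
    e = eps_hat[:m]
    p_noise = np.mean((e - e.mean())**2)
    return 10.0 * np.log10(p_signal / p_noise)
```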

In most applications the observed \( \{Y_i\}_{i \in \mathbb {Z}}\) is treated as “stable” enough that smoothing methods (typically linear filtering) can be applied to obtain \(\hat{s}(\cdot )\) and the error terms \(\hat{\varepsilon }_i\), \(i=1,2,\ldots ,n\). The general practice is to divide the observed data stream into sequential blocks of overlapping observations of some length (time windowing), and to compute \(\widehat{SNR}\) for each block (see Haykin and Kosko 2001; Weinberg 2017). These SNR measurements are then used to construct an empirical distribution from which inference statements about the underlying SNR are made. These windowing methods implicitly assume some sort of local stationarity and uncorrelated noise, but they lack theoretical justification. Moreover, real data structures often exhibit strong time-variations and other complexities not consistent with these simplifying assumptions. To our knowledge, a framework and a method for estimating the distribution of SNR statistics like (4) with provable statistical guarantees does not exist in the literature.

The major contribution of this paper is a subsampling method for the approximation of the quantiles of the distribution of the centered statistic \(\tau _m (\widehat{SNR} - \text {SNR})\). The method is based on the observation that \(\tau _m (\widehat{SNR} - \text {SNR})\) can be decomposed into the sum of the following components:

$$\begin{aligned} -10\tau _m\left[ \underbrace{\log _{10}\left( 1+\frac{\hat{V}_m-\sigma _{\varepsilon }^2}{\sigma _{\varepsilon }^2}\right) }_{\text {error contribution}} -\underbrace{\log _{10}\left( 1+\frac{\frac{1}{n}\sum _{i=1}^{n} \hat{s}^2(i/n)-\int s^2(t) dt}{\int s^2(t) dt}\right) }_{\text {signal contribution}}\right] ,\nonumber \\ \end{aligned}$$
(5)

where \(\hat{V}_m = m^{-1}\sum _{i=1}^m\left( \hat{\varepsilon }_i-\bar{\hat{\varepsilon }} \right) ^2\). In (5) the two components reflect the power contribution of the signal, estimated based on \(\hat{s}^2(\cdot )\), and the power contribution of the error term, estimated in terms of \(\hat{V}_m\). In practice, the proposed method, formalized in Algorithm 1 and sketched in code below, works as follows: (i) the observed time series is randomly divided into subsamples, that is, random blocks of consecutive observations; (ii) in each subsample the estimates \(\hat{s}^2(\cdot )\) and \(\hat{V}_m\) in (5) are computed; (iii) finally, these subsample estimates are used to approximate the distribution of \(\tau _m (\widehat{SNR} - \text {SNR})\) and its quantiles.
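The following Python sketch mirrors steps (i)-(iii) under simplifying assumptions: `smooth` stands for the subsample-wise kernel estimator of Sect. 4, and `snr_hat` is the plug-in statistic sketched above. It is a schematic outline of Algorithm 1, not the full procedure.

```python
import numpy as np

def subsample_snr(y, b, b1, K, smooth, rng=np.random.default_rng(0)):
    """Schematic version of steps (i)-(iii)."""
    starts = rng.choice(len(y) - b + 1, size=K, replace=False)  # (i) random blocks
    snr_values = np.empty(K)
    for k, t0 in enumerate(starts):
        block = y[t0:t0 + b]
        s_hat = smooth(block)              # (ii) blockwise signal estimate
        eps_hat = block - s_hat
        snr_values[k] = snr_hat(s_hat, eps_hat, b1)
    return np.sort(snr_values)             # (iii) empirical distribution of subsample SNRs
```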

Based on Altman (1990), a consistent kernel smoother with an optimal bandwidth estimator is derived. The smoother does not require any further tuning parameters, even though the stochastic structure here is richer than that considered in the original paper by Altman (1990). The subsampling procedure extends the contributions of Politis and Romano (1994) and Politis et al. (2001). The main difference from classical subsampling is that the method proposed in this paper does not require computations on the entire observed time series: the kernel smoothing is performed subsample-wise, which is particularly beneficial in applications where the sample size is very large. In sleep studies, an electrophysiological record (EEG) of brain activity is often taken at several positions on the scalp, where each sensor samples an electrical signal for 24 h at 100 Hz, implying \(n=8,640,000\) data points per sensor (see Kemp et al. 2000). Music is usually recorded at 44.1 kHz (the audio CD standard), which implies that a stereo song of 5 min produces \(n=26,460,000\) data points. This approach has been explored by Coretto and Giordano (2017) for the estimation of the dynamic range of music signals; however, that work deals with noise structures less general than those studied here. A further original element of this work is that, although the setup for \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) does not exclude long memory regimes, the proposed methods do not require the identification of any long memory parameter.

The rest of the paper is organized as follows. In Sect. 2 we define and discuss the reference framework for \( \{Y_i\}_{i \in \mathbb {Z}}\). In Sect. 3 the main estimation Algorithm 1 is introduced. The smoothing step of Algorithm 1 is studied in Sect. 4, while the subsampling step is investigated in Sect. 5. In Sect. 6 we show finite sample results of the proposed method based on simulated data, and in Sect. 7 an application to real data is illustrated. Final remarks and conclusions are given in Sect. 8. All proofs are given in the final “Appendix”.

2 Setup and assumptions

The framework (1) underpins a popular strategy to study experiments where a continuous time (analog) signal \(s(\cdot )\) is sampled at fixed time points \(t_i\). The stochastic term \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) represents various sources of randomness. In some cases, the source and the structure of the random component are known, but this does not apply universally. A ubiquitous assumption about \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is that it is white noise; sometimes the simplification is pushed further towards Gaussianity (see Parzen 1966, and references therein). However, in various applications the evidence of departures from this simplicity is quite rich.

The most elementary source of randomness is quantization noise, i.e. the noise introduced by quantizing the signal. In music, speech, EEG and many other applications, a voltage amplitude is recorded at fixed time intervals using a limited range of integer numbers. This is the so-called Pulse Code Modulation (PCM), which is at the basis of digital encoding techniques. Quantization noise is produced by the rounding error of PCM sampling. Theoretically the quantization noise is a uniform white noise process; however, Gray (1990) showed that the structure of quantization noise varies a lot across applications and measurement techniques, and often the white noise assumption is too restrictive. Apart from quantization noise, the recorded signal may be affected by a number of disturbances unrelated to the signal. Take for example an EEG acquisition where electrical noise from the power lines is injected into the measuring device. In speech recording, microphones capture stray radio frequency energy. Another example is wireless signal transmission affected by multi-path interference, that is, waves bouncing off and around surfaces creating unpredictable phase distortions. Complex effects like these happen in radar transmission too, where it is well known that the Gaussian white noise assumption is generally violated (see Conte and Maio 2002, and references therein).

Sometimes the stochastic component does not only include unpredictable external artifacts. There are cases where the structure of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is the result of several complex phenomena occurring within the system under study. In their pioneering works, Voss and Clarke (1975, 1978) found evidence of 1/f-noise or similar fractal processes in recorded music. Similar evidence is documented in Levitin et al. (2012). 1/f-noise is a stochastic process whose spectral density follows the power law \(c|f|^{-\beta }\), where f is the frequency, \(\beta \) is the exponent, and c is a scaling constant; \(\beta =1\) gives pink noise, which is just one example of such processes. Depending on \(\beta \), these forms of noise are characterized by slowly vanishing serial correlations and/or what is known as long memory. Many electronic devices found in data acquisition instruments introduce 1/f-type noise (Kogan 1996; Weissman 1988; Kenig and Cross 2014). Evidence of departures from linearity and Gaussianity in the transient components of music recordings was also found in Brillinger and Irizarry (1998) and Coretto and Giordano (2017).

The main goal of this paper is to build an estimation method for the distribution of the SNR that works under the most general setting. Of course achieving universality is impossible, but here we set a model environment that is as rich as possible. Our model is restricted by the following assumptions:

Assumption A1

The function \(s(\cdot )\) has a continuous second derivative.

Assumption A2

The sequence \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) fulfills one of the following:

  1. (SRD)

    \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is a strictly stationary and \(\alpha \)-mixing process with mixing coefficients \(\alpha (k)\), \({{\,\mathrm{E}\,}}[\varepsilon _i]=0\), \({{\,\mathrm{E}\,}}\left| \varepsilon _i^2 \right| ^{2+\delta } < +\infty \), and \(\sum _{k=1}^{+\infty } \alpha ^{\delta /(2+\delta )}(k)<\infty \) for some \(\delta >0\).

  2. (LRD)

    \(\varepsilon _i=\sum _{j=0}^{\infty }\psi _ja_{i-j}\) with \({{\,\mathrm{E}\,}}[a_i]=0\), \({{\,\mathrm{E}\,}}[a_i^4]<\infty \quad \forall i\), \(\{a_i\}\sim i.i.d.\), \(\psi _j\sim C_1j^{-\eta }\) with \(\eta =\frac{1}{2}(1+\gamma _1)\), \(C_1>0\) and \(0<\gamma _1\le 1\).

Assumption A1 reflects a common smoothness requirement for \(s(\cdot )\) which does not need further discussion. In most applications, \(s(\cdot )\) will represent the sum of possibly many harmonic components, or long term smooth trends. A2 sets a wide range of possible structures for the stochastic component. Two regimes are considered here: short range dependence (SRD) and long range dependence (LRD). SRD is a rather general \(\alpha \)-mixing assumption that allows one to move beyond the usual linear process assumption; relaxing linearity is essential to model the fast decaying energy variations typical of some forms of noise. Assumptions A1 and A2-SRD are also considered in Coretto and Giordano (2017) for the estimation of the dynamic range of music signals. However, in this paper we are interested in SNR statistics, and we extend the analysis to cases where LRD occurs. A2-LRD has the role of capturing situations where the noise spectrum shows long-range dependence; in practice this assumption accommodates 1/f-type noise. The strength of the LRD is controlled by \(\gamma _1 \in (0, 1]\).

Note that A2-LRD assumes a linear structure while A2-SRD does not. The SRD assumption allows for dependence, and the rate at which it vanishes is controlled by \(\delta \). Under SRD, terms of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) that are far apart in time act as an independent sequence; hence SRD can capture many different forms of dependence, but not long memory features. Is the linearity structure of LRD a strong assumption for the long-memory cases? The class of long-memory linear processes is well known in the literature, and in most cases LRD effects appear within a linear autoregressive structure. Moreover, A2-LRD is compatible with the classical parametric models for LRD, e.g. the well known ARFIMA class, already used to capture the 1/f-noise phenomenon. One could relax the linearity assumption in LRD, but at the expense of serious technical complications. It is important to stress that we are not interested in discriminating SRD vs LRD, and we want to avoid the additional estimation of the LRD order; the latter is crucial in most parametric models for LRD. Assumption A2 only defines plausible stochastic structures that can occur in the most diverse applications. Note that A2-LRD does not imply that \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is a Gaussian process or a function of one, as is assumed in Jach et al. (2012) and Hall et al. (1998). A short simulation of an A2-LRD process is sketched below.
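To make A2-LRD concrete, the following sketch simulates a truncated version of the linear process \(\varepsilon _i=\sum _{j}\psi _ja_{i-j}\) with \(\psi _j \sim C_1 j^{-\eta }\), \(\eta =(1+\gamma _1)/2\), and Gaussian innovations. The truncation point J, the choice \(\psi _0=1\), and the parameter values are illustrative conveniences, not part of the assumption.

```python
import numpy as np

def lrd_noise(n, gamma1=0.4, C1=1.0, J=5_000, rng=np.random.default_rng(1)):
    """Truncated MA(infinity) process with psi_j ~ C1 * j^(-eta), eta = (1+gamma1)/2."""
    eta = 0.5 * (1.0 + gamma1)
    psi = np.concatenate(([1.0], C1 * np.arange(1.0, J + 1.0)**(-eta)))
    a = rng.standard_normal(n + J)          # i.i.d. innovations with finite 4th moment
    return np.convolve(a, psi, mode="valid")[:n]

eps = lrd_noise(20_000)
e = eps - eps.mean()
# Under LRD the sample autocorrelations decay hyperbolically, not exponentially.
print([np.dot(e[:-k], e[k:]) / np.dot(e, e) for k in (1, 10, 100)])
```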

3 The smoothing-subsampling procedure

[Algorithm 1: pseudocode for the Monte Carlo smoothing-subsampling procedure, displayed as a figure in the original]

The SNR distribution is estimated by performing Algorithm 1. This is a simple smoothing-subsampling procedure where, for each subsample, \(P_{\text {signal}}\) is consistently estimated by \(\hat{U}_{n,b,t}\), and \(P_{\text {noise}}\) is estimated by \(\hat{V}_{n,b_1,t}\) on a secondary subsample taken from the first one. Details and theoretical motivation of the procedure are treated in Sects. 4 and 5. The distribution is constructed for the SNR expressed in decibel scale.

The procedure is called “Monte Carlo” because the subsample selection is randomized, which reduces the huge number of subsamples to be explored. Note that none of the calculations involve computations over the entire observed sample \(\mathscr {Y}_n\). This differs from the classical subsampling for time series data introduced in Politis and Romano (1994) and Politis et al. (2001). In classical subsampling, one would estimate the variance of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) based on the entire sample, which would require that the estimation of \(s(\cdot )\) be performed globally on \(\mathscr {Y}_n\). In Algorithm 1 both \(s(\cdot )\) and the variance of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) in (7) are estimated blockwise. This blockwise smoothing strategy, where computations are performed only at the subsample level, was proposed in Coretto and Giordano (2017). The advantages over classical subsampling are twofold. First, thanks to improved data acquisition technology, in most of the applications mentioned in Sect. 1, n scales in terms of millions or billions of data points. It is well known that kernel and other nonparametric smoothing methods become computationally intractable for such large sample sizes. In Algorithm 1 the computational complexity of the calculation of \(\hat{s}(\cdot )\) is governed by the subsample size b, which is chosen much smaller than n (see Theorem 2). Second, the kind of signals we want to reconstruct may exhibit strong structural variations along the time axis; therefore, estimation of \(s(\cdot )\) on the entire sample would require optimal kernel methods with a local bandwidth, increasing the computational burden even more. Working on smaller data chunks allows treating the signal locally, so simpler kernel methods based on a global bandwidth within the subsampled block are better suited to capture the local structure of the signal. Optimal estimation of \(s(\cdot )\) and the random subsampling part of Algorithm 1 are developed in the next two sections.

4 Optimal signal reconstruction

Unless one has enough information about the shape of \(s(\cdot )\), nonparametric function estimators with proven statistical properties are natural candidates to reconstruct the underlying signal. Our choice is the classical Priestley-Chao kernel estimator (Priestley and Chao 1972), because it can easily be optimized in regression models where the error is not necessarily uncorrelated. The estimator for \(s(\cdot )\) is defined as

$$\begin{aligned} \hat{s}(t) = \frac{1}{nh} \sum _{i=1}^n \mathscr {K}\left( \frac{t-i/n}{h}\right) y_i. \end{aligned}$$
(9)

The following assumption on the kernel function \(\mathscr {K}\left( \cdot \right) \) and the bandwidth h is imposed.

Assumption A3

\(\mathscr {K}\left( \cdot \right) \) is a density function with compact support and symmetric about 0. Moreover, \(\mathscr {K}\left( \cdot \right) \) is Lipschitz continuous of some order. The bandwidth \(h \in H=[c_1\Lambda _n^{-1/5}, \; c_2\Lambda _n^{-1/5}]\), where \(c_1 < c_2\) are two positive constants such that: \(c_1\) is arbitrarily small, \(c_2\) is arbitrarily large.

Define

$$\begin{aligned} \Lambda _n:= {\left\{ \begin{array}{ll} n &{} \text {if } \mathrm{A2}\text {-SRD holds,} \\ \frac{n}{\log n} &{} \text {if } \mathrm{A2}\text {-LRD holds with} \quad \gamma _1=1,\\ n^{\gamma _1} &{} \text {if } \mathrm{A2}\text {-LRD holds with} \quad 0<\gamma _1<1. \end{array}\right. } \end{aligned}$$
(10)

As \(n \rightarrow \infty \), we have \(h \rightarrow 0\) and \(\Lambda _nh \rightarrow \infty \).

There are a number of possible choices for \(\mathscr {K}\left( \cdot \right) \) satisfying A3; we use the Epanechnikov kernel for its well known efficiency properties. Setting an optimal bandwidth in (9) when the error term may be correlated requires special care. Here an optimal choice of h is even more involved because \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) may follow either the SRD or the LRD regime; the sequence (10) has the role of managing this added complexity. Altman (1990) developed the Priestley-Chao kernel estimator (9) with dependent additive errors, and showed that under serial correlation standard bandwidth optimality theory does not apply. Altman (1990) proposed to estimate an optimal h based on a cross-validation function accounting for the dependence structure of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\). Altman’s contribution deals with errors belonging to the class of linear processes with finite memory; therefore, Altman’s assumptions do not allow the LRD case. Moreover, we consider the SRD assumption because it is typical of stochastic processes with a nonlinear model representation in the time series framework. Finally, Altman (1990) assumes that the true autocorrelation function of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is known, which is not the case in real world applications.
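A direct transcription of (9) with the Epanechnikov kernel is given below. It is a minimal sketch that evaluates \(\hat{s}\) at the design points i/n; the vectorized form is quadratic in n and therefore only meant for the short blocks used in Algorithm 1.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: density 0.75*(1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def priestley_chao(y, h):
    """Priestley-Chao estimator (9) evaluated at the design points t_i = i/n."""
    n = len(y)
    grid = np.arange(1, n + 1) / n
    u = (grid[:, None] - grid[None, :]) / h   # (t - i/n)/h for all pairs
    return epanechnikov(u) @ y / (n * h)
```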

Let \(\hat{\varepsilon }_i = y_i - \hat{s}(i/n)\), and define the cross-validation objective function

$$\begin{aligned} \text {CV}(h)= \left[ 1-\frac{1}{nh}\sum _{j=-M}^M \mathscr {K}\left( \frac{j}{nh}\right) \hat{\rho }(j) \right] ^{-2} \frac{1}{n} \sum _{i=1}^n \hat{\varepsilon }_i^2. \end{aligned}$$
(11)

The optimal bandwidth is estimated by minimizing (11), that is

$$\begin{aligned} \hat{h} = {\text {argmin}}_{h \in H} \; \text {CV}(h). \end{aligned}$$

The first term in (11) is the correction factor proposed by Altman (1990), with the true unknown autocorrelations replaced by their sample counterparts \(\hat{\rho }(\cdot )\) up to order M. M is an additional smoothing parameter whose choice Altman’s contribution does not address. Consistency of the optimal bandwidth estimator is obtained if M increases at a rate smaller than the product nh. As in Coretto and Giordano (2017), M is chosen so that the following holds.

Assumption A4

Whenever \(n\rightarrow \infty \), then \(M \rightarrow \infty \) and \(M=O(\sqrt{nh})\).
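A possible implementation of the corrected criterion (11), assuming the `priestley_chao` and `epanechnikov` sketches above, follows. The grid search is a practical stand-in for the argmin over the interval H in A3, and M is supplied by the caller in accordance with A4.

```python
import numpy as np

def cv(y, h, M):
    """Altman-type corrected cross-validation criterion (11), using sample
    autocorrelations rho_hat(j), j = 0, ..., M, of the residuals."""
    n = len(y)
    eps = y - priestley_chao(y, h)
    e = eps - eps.mean()
    rho = np.array([np.dot(e[:n - j], e[j:]) / np.dot(e, e) for j in range(M + 1)])
    k = epanechnikov(np.arange(M + 1) / (n * h))
    sym_sum = k[0] * rho[0] + 2.0 * np.sum(k[1:] * rho[1:])   # sum over j = -M..M
    return (1.0 - sym_sum / (n * h))**(-2) * np.mean(eps**2)

def h_opt(y, M, h_grid):
    """Grid-search minimizer of CV(h)."""
    return min(h_grid, key=lambda h: cv(y, h, M))
```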

Let \(\text {MISE}(\hat{s};h)\) be the mean integrated square error of \(\hat{s}(\cdot )\), that is

$$\begin{aligned} \text {MISE}(\hat{s}; h) = \int _0^1 \text {MSE}(\hat{s}(t);h)\;dt \quad \text {where} \quad \text {MSE}(\hat{s}(t);h) = \text {E}[(\hat{s}(t) - s(t))^2]. \end{aligned}$$

Let \(h^\star \) be the global minimizer of \(\text {MISE}(\hat{s};h)\). The next result states the optimality of the kernel estimator.

Theorem 1

Assume A1, A2, A3 and A4. Then \(\hat{h}/{h^\star } {\mathop {\longrightarrow }\limits ^{\text {p}}}1\) as \(n \rightarrow \infty \).

The previous result relates \(\hat{h}\) to the optimal global bandwidth, whose convergence rate is known to be \(O(\Lambda _n^{-1/5} )\). Theorem 1 is analogous to the result given in Coretto and Giordano (2017); the difference here is that \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) may well follow LRD. Therefore, the proof of Theorem 1 (given in the “Appendix”) needs some further developments.

Remark 1

Theorem 1 improves on the existing literature in several respects. First of all, the proposed signal reconstruction is optimal (in the MISE sense) under both SRD and LRD. Its key feature is that one does not need to identify the type of dependence, that is, SRD vs LRD. There are only two smoothing tunings: h, which is estimated optimally, and M, fixed according to A4. The SRD regime is already treated in Coretto and Giordano (2017). Regarding LRD, the result should be compared to Hall et al. (1995). The advantages of our approach compared to the latter are: (i) the method is simplified by eliminating a tuning parameter needed to deal with LRD, namely the block length for the leave-k-out cross-validation in Hall et al. (1995); this is because Altman’s cross-validation correction in (11) already incorporates the dependence structure via \(\hat{\rho }(\cdot )\), and M is able to correct (11) without any further step identifying whether LRD or SRD occurs; (ii) here we do not assume the existence of higher order moments of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\).

5 Monte Carlo approximation of the subsampling distribution

In this section we develop the subsampling procedure underlying Algorithm 1. We call this procedure “Monte Carlo” because it is based on a random selection of subsamples, and we provide a Monte Carlo approximation of the subsampling distribution of the statistic of interest. Let us introduce the following quantities:

$$\begin{aligned} V_n=\frac{1}{n}\sum _{i=1}^n\left( \varepsilon _i-\bar{\varepsilon } \right) ^2, \qquad \text {with} \quad \bar{\varepsilon }=\frac{1}{n} \sum _{i=1}^n \varepsilon _i. \end{aligned}$$
(12)

Although the random sequence \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is not observable, one can work with its estimate. Replacing \(\varepsilon _i\) with \(\hat{\varepsilon }_i\) in the previous formula gives

$$\begin{aligned} \hat{V}_n=\frac{1}{n}\sum _{i=1}^n\left( \hat{\varepsilon }_i-\bar{\hat{\varepsilon }} \right) ^2, \qquad \text {with} \quad \bar{\hat{\varepsilon }}=\frac{1}{n} \sum _{i=1}^n \hat{\varepsilon }_i. \end{aligned}$$

The distribution of a properly scaled and centered \(\hat{V}_n\) can now be used to approximate the distribution of \(\tau _n(V_n-\sigma ^2_{\varepsilon })\), where \(\tau _n\) is defined in (13) and \(\sigma _{\varepsilon }^2:=\text {E}[\varepsilon _t^2]\). One way to do this is to perform subsampling as proposed in Politis et al. (1999) and Politis et al. (2001).

That is, for all blocks of observations of length b (the subsample size), compute \(\hat{V}_n\). However, the number of possible subsamples is huge even for moderate n. Moreover, in typical cases where n is of the order of millions or billions of samples, the computation of the optimal \(\hat{s}(\cdot )\) would require enormous computing power. The problem is solved by performing the blockwise smoothing of Algorithm 1 proposed in Coretto and Giordano (2017). The signal and the average error are estimated block-wise, so that the computing effort is driven only by b. This makes the algorithm scalable with respect to n, a very important feature for processing data from modern data acquisition systems. Here we investigate the theoretical properties of the estimation Algorithm 1. The formalization is similar to that given in Coretto and Giordano (2017); however, here we deal with a different target statistic, and we face the added complexity of possible LRD regimes in \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\).

First define

$$\begin{aligned} \tau _n:= {\left\{ \begin{array}{ll} n^{1/2} &{} \text {if } \mathrm{A2}\text {-SRD holds},\\ n^{1/2} &{} \text {if } \mathrm{A2}\text {-LRD holds with} \quad 1/2<\gamma _1\le 1,\\ \left( \frac{n}{\log n}\right) ^{1/2} &{} \text {if } \mathrm{A2}\text {-LRD holds with} \quad \gamma _1=1/2,\\ n^{\gamma _1} &{} \text {if }\mathrm{A2}\text {-LRD holds with} \quad 0<\gamma _1<1/2. \end{array}\right. } \end{aligned}$$
(13)
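For illustration only, the scaling sequence (13) can be encoded as below; the regime and \(\gamma _1\) appear as explicit inputs, whereas, as stressed throughout, the proposed method itself never requires identifying them.

```python
import numpy as np

def tau(n, regime="SRD", gamma1=None):
    """Scaling sequence tau_n of (13)."""
    if regime == "SRD" or gamma1 > 0.5:
        return np.sqrt(n)
    if gamma1 == 0.5:
        return np.sqrt(n / np.log(n))
    return n**gamma1                        # LRD with 0 < gamma1 < 1/2
```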

At a given time point t consider a block of observations of length b, and the statistics computed in Algorithm 1:

$$\begin{aligned} V_{n,b,t}=\frac{1}{b}\sum _{i=t}^{t+b-1} (\varepsilon _i- \bar{\varepsilon }_{b,t})^2, \qquad \text {and} \qquad \hat{V}_{n,b,t}=\frac{1}{b}\sum _{i=t}^{t+b-1} (\hat{\varepsilon }_i-\bar{\hat{\varepsilon }}_{b,t})^2, \end{aligned}$$

with \(\bar{{\varepsilon }}_{b,t}=b^{-1}\sum _{i=t}^{t+b-1} {\varepsilon }_i\) and \(\bar{\hat{\varepsilon }}_{b,t}=b^{-1}\sum _{i=t}^{t+b-1}\hat{\varepsilon }_i\). The empirical distribution functions of \(\tau _n (V_n -\sigma _{\varepsilon }^2)\), based on the true and estimated noise, respectively, are given by

$$\begin{aligned} G_{n,b}(x) =&\frac{1}{n-b+1}\sum _{t=1}^{n-b+1}\mathbb {I}\left\{ \tau _b\left( V_{n,b,t}-V_n\right) \le x\right\} , \\ \hat{G}_{n,b}(x) =&\frac{1}{n-b+1}\sum _{t=1}^{n-b+1} \mathbb {I}\left\{ \tau _b(\hat{V}_{n,b,t}-V_n)\le x\right\} . \end{aligned}$$

\(\mathbb {I}\left\{ A\right\} \) denotes the usual indicator function of the set A, and \(\tau _b\) is defined in (13). Lemmas 2 and 3 in the “Appendix” state that the subsampling based on the statistic (12) is consistent under both A2-SRD and A2-LRD. Notice that the results in Politis et al. (2001) can only be used to deal with SRD. The LRD treatment is inspired by Hall et al. (1998) and Jach et al. (2012); however, we improve upon their results in the sense that the Gaussianity assumption for \(\varepsilon _t\) is avoided under A2-LRD with \(1/2<\gamma _1\le 1\). The quantiles of the subsampling distribution also converge to the quantiles of the asymptotic distribution of \(\tau _n(V_n - \sigma ^2_{\varepsilon })\). This is a consequence of the fact that \(\tau _n (V_n-\sigma _{\varepsilon }^2)\) converges weakly (see Remark 2). For \( \gamma _2 \in (0,1)\) the quantities \(q(\gamma _2)\), \(q_{n,b}(\gamma _2)\) and \(\hat{q}_{n,b}(\gamma _2)\) denote the \(\gamma _2\)-quantiles of the distributions G (see Remark 2), \(G_{n,b}\) and \(\hat{G}_{n,b}\), respectively. We adopt the usual definition \(q(\gamma _2)=\inf \left\{ x: G(x)\ge \gamma _2\right\} \). Lemma 4 in the “Appendix” states the same consistency for the quantiles. The following remark covers the different cases (A2-SRD and A2-LRD) for the asymptotic distribution of \(\tau _n (V_n - \sigma ^2_{\varepsilon })\).

Remark 2

By A2 it can be shown that \(\tau _n (V_n - \sigma ^2_{\varepsilon } )\) converges weakly to a random variable with distribution, say \(G(\cdot )\), where \(\sigma ^2_{\varepsilon } ={{\,\mathrm{E}\,}}[\varepsilon _t^2]\). Under A2-SRD, \(G(\cdot )\) is a Normal distribution. \(G(\cdot )\) is still a Normal distribution under A2-LRD with \(1/2<\gamma _1\le 1\), which follows from Theorem 4 of Hosking (1996). The same Theorem implies that \(G(\cdot )\) is Normal under A2-LRD with \(\gamma _1=1/2\) when \(a_t\) is normally distributed. Moreover, \(G(\cdot )\) is not Normal under A2-LRD with \(0<\gamma _1<1/2\).
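For intuition, the full-enumeration distribution \(\hat{G}_{n,b}\) defined above can be sketched as follows from the residual series; Algorithm 1 replaces the enumeration over all \(n-b+1\) blocks with K randomly chosen ones, as formalized next. The function is a plain transcription of the definition, with `np.var` computing the centered second moment.

```python
import numpy as np

def g_hat(eps_hat, b, tau_b):
    """Subsampling distribution G_hat_{n,b}: one statistic per block start t."""
    n = len(eps_hat)
    v_n = np.var(eps_hat)                   # V_hat_n computed on the whole sample
    stats = np.empty(n - b + 1)
    for t in range(n - b + 1):
        stats[t] = tau_b * (np.var(eps_hat[t:t + b]) - v_n)
    return np.sort(stats)                   # ECDF support of G_hat_{n,b}
```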

A variant reduces the number of subsamples by introducing a random block selection, with \(s(\cdot )\) estimated blockwise on subsamples of length b. Let \(I_i\), \(i=1,\ldots, K\), be random variables indicating the initial points of the blocks of length b. We draw, without replacement and with uniform probabilities, the sequence \(\left\{ I_i\right\} _{i=1}^K\) from the set \(I=\{1,2,\ldots ,n-b+1\}\). The empirical distribution function of the subsampling variances of \(\hat{\varepsilon }_{t}\) over the random blocks is

$$\begin{aligned} \tilde{G}_{n,b}(x)=\frac{1}{K}\sum _{i=1}^{K} \mathbb {I}\left\{ \tau _b \left( \hat{V}_{n,b,I_i}-V_{n} \right) \le x \right\} . \end{aligned}$$

In order to get the consistency of the subsampling procedure in both the SRD and LRD cases, we consider two subsamples. The first one has length b and is used to estimate the signal \(s(\cdot )\). The second subsample, which is a subset of the first, has length \(b_1=o(b^{4/5})\) and is used to estimate the variance and its distribution. The following result states the consistency of \(\tilde{G}\) in approximating G.

Theorem 2

Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Let \(\hat{s}(t)\) be the estimate of s(t) on a subsample of length b. Let \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{4/5})\) and \(K \rightarrow \infty \); then \(\sup _x\left| \tilde{G}_{n,b_1}(x)-G(x)\right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).

Proof of Theorem 2 is given in the “Appendix”. In analogy with what we have seen before we also establish consistency for the quantiles of \(\tilde{G}(\cdot )\). Let \(\tilde{q}_{n,b_1}(\gamma _2)\) be the \(\gamma _2\)-quantile with respect to \(\tilde{G}(\cdot )\).

Corollary 1

Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Let \(\hat{s}(t)\) be the estimate of s(t) on a subsample of length b. Let \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{4/5})\) and \(K \rightarrow \infty \), then \(\tilde{q}_{n,b_1}(\gamma _2) {\mathop {\longrightarrow }\limits ^{\text {p}}}q(\gamma _2)\).

Proof of Corollary 1 is given in the “Appendix”.

Remark 3

Note that the need for the second subsample of length \(b_1\) is a consequence of the optimal rate at which s(t) is estimated subsample-wise.

The next result states the consistency of subsample procedure when \(\hat{V}_n\) replaces \(V_n\) in \(\tilde{G}_{n,b}(x)\). Define

$$\begin{aligned} \tilde{G}_{n,b}^0(x)=\frac{1}{K}\sum _{i=1}^{K} \mathbb {I}\left\{ \tau _b \left( \hat{V}_{n,b,I_i}-\hat{V}_{n} \right) \le x \right\} . \end{aligned}$$

Corollary 2

Assume the same assumptions as in Theorem 2. Let \(\hat{s}(t)\) be the estimate of s(t) on a subsample of length b. Let \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{4/5})\) and \(K \rightarrow \infty \), then \(\sup _x\left| \tilde{G}_{n,b_1}^0(x)-G(x)\right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).

Proof of Corollary 2 is given in the “Appendix”.

Remark 4

Following the same arguments as in the proof of Corollary 2, we have that \(V_n-\sigma _{\varepsilon }^2=O_p(\tau _n^{-1})\) and also \(\hat{V}_n-\sigma _{\varepsilon }^2=O_p(\tau _n^{-1})\) in the cases of SRD and LRD with \(5/8\le \gamma _1\le 1\). Moreover, in the proof of Theorem 2 the key point for the consistency is \(\tau _{b_1}\Lambda _b^{-4/5}\rightarrow 0\) when \(n\rightarrow \infty \). Therefore, if we set \(b_1=b\), the results of Theorem 2 and Corollary 2 still hold in the cases of SRD and LRD with \(5/8\le \gamma _1\le 1\), since \(\tau _b\Lambda _b^{-4/5}\rightarrow 0\) and \(\tau _b\tau _n^{-1}\rightarrow 0\) when \(n\rightarrow \infty \).

Now, using the previous results, we can state that the subsampling strategy consistently estimates the asymptotic distribution of \(\tau _m(\widehat{SNR}-SNR)\), where \(\widehat{SNR}\) is defined in (4). The numerator and denominator of \(\widehat{SNR}\) depend on n and m, respectively. This is mimicked in the subsampling procedure: a subsample of length b is used for the estimation of the signal power, while a subsample of length \(b_1\) is used to estimate the variance of the error term.

Theorem 3

Let \(\mathscr {Y}_n:=\{ y_1, y_2, \ldots , y_n\}\) be a sampling realization of \( \{Y_i\}_{i \in \mathbb {Z}}\). Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Assume \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{2/5})\), \(m=o(n^{2/5})\), \(b_1/m\rightarrow 0\) and \(K \rightarrow \infty \), then

$$\begin{aligned} \sup _x|\mathbb {Q}_n(x)-\mathbb {Q}(x)|{\mathop {\longrightarrow }\limits ^{p}} 0 \end{aligned}$$

where

$$\begin{aligned} \mathbb {Q}_n(x):= & {} \frac{1}{K}\sum _{i=1}^K\mathbb {I}\left\{ \tau _{b_1}(\widehat{SNR}_{n,b,I_i}-\widehat{SNR})\le x\right\} , \\&\text{ with } \widehat{SNR}_{n,b,I_i}:=10\log _{10}\left( \frac{\hat{U}_{n,b,I_i}}{\hat{V}_{n,b_1,I_i}}\right) \end{aligned}$$

and \(\mathbb {Q}(x)\) is the asymptotic distribution of \(\tau _m(\widehat{SNR}-SNR)\).

Proof of Theorem 3 is given in the “Appendix”. Note that in Theorem 3 we need \(b_1=o(b^{2/5})\) instead of the \(b_1=o(b^{4/5})\) found in the previous results. The reason is that the statistical functional \(\widehat{SNR}\) is more complex than \(\hat{V}_n\), and a different relative speed for the secondary block size \(b_1\) is required.

Theorem 3 provides the theoretical justification for the consistency of the subsampling procedure with respect to the statistic \(\widehat{SNR}\). Let \(q^{Q}(\gamma _2)\) and \(\tilde{q}^{Q}_{n,b_1}(\gamma _2)\equiv \tilde{q}^{Q}_{n,b_1}(\gamma _2|\tau _{b_1})\) be the quantiles with respect to \(\mathbb {Q}(x)\) and \(\mathbb {Q}_n(x)\), respectively. Note that we write \(\tilde{q}^{Q}_{n,b_1}(\gamma _2|\tau _{b_1})\) to highlight the dependence on the scaling factor \(\tau _{b_1}\), as in Sect. 8 of Politis et al. (1999). The main goal is to make inference for SNR without estimating the long memory parameter, and without using the sample statistic \(\widehat{SNR}\); in this way, we do not need to fix or estimate m. To do this, we use Lemma 8.2.1 in Politis et al. (1999). First, \(\mathbb {Q}(x)\) always has a strictly positive density function, at least on a subset of the real line (see Hosking (1996) and references therein). So, by Lemma 8.2.1 in Politis et al. (1999), and using the same arguments as in the proof of Corollary 1, we have that

$$\begin{aligned} \tilde{q}^{Q}_{n,b_1}(\gamma _2|\tau _{b_1})=q^{Q}(\gamma _2)+o_p(1). \end{aligned}$$
(14)

Following the same lines as in Sect. 8 of Politis et al. (1999), we have that

$$\begin{aligned} \tilde{q}^{Q}_{n,b_1}(\gamma _2|1)=\frac{\tilde{q}^{Q}_{n,b_1}(\gamma _2|\tau _{b_1})}{\tau _{b_1}}+\widehat{SNR}. \end{aligned}$$
(15)

Note that \(\tilde{q}^{Q}_{n,b_1}(\gamma _2|1)\) is the quantile with respect to the empirical distribution function \(1/K\sum _{i=1}^K\mathbb {I}\left( \widehat{SNR}_{n,b,I_i}\le \frac{x}{\tau _{b_1}}+\widehat{SNR}\right) \). Therefore, by (14) and (15) it follows that

$$\begin{aligned} \tilde{q}^{Q}_{n,b_1}(\gamma _2|1)\tau _{b_1}=\tau _{b_1}\widehat{SNR}+q^{Q}(\gamma _2)+o_p(1). \end{aligned}$$

Since \(\widehat{SNR}=SNR+O_p(\tau _m^{-1})\) and \(\tau _{b_1}/\tau _{m}\rightarrow 0\) when \(n\rightarrow \infty \), we have that

$$\begin{aligned} \tilde{q}^{Q}_{n,b_1}(\gamma _2|1)=SNR+\frac{q^{Q}(\gamma _2)}{\tau _{b_1}}+o_p(\tau _{b_1}^{-1}). \end{aligned}$$

Therefore, a confidence interval for SNR with nominal level \(1-\gamma _2\) is given by

$$\begin{aligned} \left[ \tilde{q}^{Q}_{n,b_1}(\gamma _2/2|1),\quad \tilde{q}^{Q}_{n,b_1}(1-\gamma _2/2|1)\right] . \end{aligned}$$
(16)

It is possible to consider self-normalization methods as in Jach et al. (2012), and the estimation of the scaling factor \(\tau \) as in Politis et al. (1999). These methods would lead to more efficient confidence bands, in the sense that they would be first order correct with a rate of \(\tau _m^{-1}\) instead of \(\tau _{b_1}^{-1}\). However, this would require the estimation of unknown constants as in Jach et al. (2012).
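Operationally, since \(\tilde{q}^{Q}_{n,b_1}(\gamma _2|1)\) is simply a quantile of the subsample values \(\widehat{SNR}_{n,b,I_i}\), the interval (16) reduces to the short computation sketched below (applied, e.g., to the output of the subsampling sketch in Sect. 1).

```python
import numpy as np

def snr_confidence_interval(subsample_snrs, gamma2=0.05):
    """Confidence interval (16): empirical quantiles of the subsample SNR values.
    Neither the full-sample statistic nor tau_{b_1} is needed (cf. Remark 5)."""
    return (np.quantile(subsample_snrs, gamma2 / 2),
            np.quantile(subsample_snrs, 1 - gamma2 / 2))
```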

Remark 5

In Theorem 2, Corollary 2 and Theorem 3 the definitions of \(\tilde{G}(\cdot )\), \(\tilde{G}^0(\cdot )\) and \(\mathbb {Q}_n(\cdot )\) depend on quantities (\(V_n\), \(\hat{V}_n\) and \(\widehat{\text {SNR}}\)) computed on the whole sample. On the other hand, these results give the theoretical framework for computing confidence intervals as in (16), and those calculations do not require any computation on the entire sample. In other words, \(V_n\), \(\hat{V}_n\), and \(\widehat{SNR}\) are needed to center the involved distributions, but not to approximate the quantiles as in (16). Therefore, in this work these quantities play only a theoretical role in showing that the subsampling procedure does not produce degenerate asymptotic distributions.

6 Numerical experiments

In this section we present numerical experiments on simulated data. The assumptions given in this paper are rather general, and it is not possible to design a computer experiment representative of all the structures consistent with A1–A4. Here we assess the performance of Algorithm 1 under different scenarios for the structure of the noise term. To do this we keep the structure of the true signal fixed, and we investigate three variations of the noise data generating process. Data are sampled at a fixed sampling frequency Fs = 44100 Hz, a common value in audio applications. Let [0, T] be the data acquisition interval, where T is the duration of the simulated signal in seconds. The signal is sampled at times \(t_i = (i-1) / \text {Fs}\) for \(i=1,2,\ldots , T{\times }\text {Fs}\), as follows

$$\begin{aligned} y_{i} = A_s \, \sin (2 \pi \, 50 \, t_i) + \varepsilon _i, \qquad \text {with}\quad {i=1,2,\ldots T{\times }\text {Fs}}, \end{aligned}$$

where, after rescaling time onto the unit interval as in Sect. 1, the design points become \(t_i = i/({T{\times }\text {Fs}})\). Therefore, the signal consists of a sinusoidal wave producing energy at 50 Hz. The signal power equals \(A_s^2/2\), where \(A_s\) is a scaling constant tuned to achieve a given true SNR. We set \(T=30\) sec (implying \(n=1,323,000\)), and we consider the following three cases for the noise.

AR:

The noise is generated from an AR(1) process with independent normal innovations. This produces serial correlation in the error term and represents a case of SRD. In particular \(\varepsilon _i = -0.7\varepsilon _{i-1} + u_i\), where \(\{u_i\}\) is an i.i.d. sequence with distribution \(\text {Normal}(0, A_\varepsilon )\), and \(A_{\varepsilon }\) is set to achieve a given SNR.

P1:

The random sequence \(\{\varepsilon _i \}\) has power spectrum \(P(f) = A_{\varepsilon } / f^{\beta }\), where P(f) is the power spectral density at frequency f Hz. Here \(\beta =0.2\), which induces moderate LRD in \(\{\varepsilon _i\}\). The scaling constant \(A_\varepsilon \) is set to achieve the desired SNR.

P2:

Same as P1 but with \(\beta =0.6\). This design introduces much stronger LRD.

In P1 and P2 the noise follows a so-called \(1/f^\beta \) “power law”, where \(\beta \) controls the amount of long range dependence: larger values of \(\beta \) imply slower decay rates for the serial correlations. For \(\beta =1\) pink noise is obtained, and values of \(\beta \in [0,1]\) give a behavior between white noise and pink noise. In the case P1, \(\gamma _1=1-\beta =0.8\) in A2-LRD, so the asymptotic distribution of \(\tau _n(V_n-\sigma _{\varepsilon }^2)\) is Normal. In the case P2, \(\gamma _1=1-\beta =0.4\), which implies that the asymptotic distribution of \(\tau _n(V_n-\sigma _{\varepsilon }^2)\) is not Normal (see Remark 2).
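A simplified frequency-domain recipe in the spirit of the Timmer and König (1995) algorithm cited below can reproduce the P1/P2 designs: shape Gaussian white noise by \(f^{-\beta /2}\) in the Fourier domain, then rescale so that the 50 Hz sinusoid attains a target SNR. The constants match the setup described above, but the implementation details are our own simplification.

```python
import numpy as np

def power_law_noise(n, beta, rng):
    """Gaussian noise with power spectral density proportional to 1/f^beta."""
    freqs = np.fft.rfftfreq(n)
    amp = np.zeros_like(freqs)
    amp[1:] = freqs[1:]**(-beta / 2.0)       # sqrt of the power law; skip f = 0
    spec = amp * (rng.standard_normal(len(freqs))
                  + 1j * rng.standard_normal(len(freqs)))
    return np.fft.irfft(spec, n)

rng = np.random.default_rng(2)
fs, T, snr_target = 44_100, 30, 6.0          # P2 design at SNR = 6 dB
n = fs * T
t = np.arange(n) / fs
A_s = 1.0
signal = A_s * np.sin(2 * np.pi * 50.0 * t)  # signal power A_s**2 / 2
noise = power_law_noise(n, beta=0.6, rng=rng)
noise *= np.sqrt((A_s**2 / 2) / (np.var(noise) * 10**(snr_target / 10)))
y = signal + noise
```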

P1 and P2 are simulated using the algorithm of Timmer and König (1995) implemented in the tuneR software of Ligges et al. (2016). For each of the three sampling designs we consider two values of the true SNR: 10 dB and 6 dB. In most applications an SNR = 6 dB is considered a rather noisy situation. We recall that at 6 dB the signal power is only about four times the variance of the noise, while 10 dB means that the signal power is ten times the noise variance. There is a challenging aspect to these designs: the case with P2 noise and SNR = 6 dB is particularly difficult for our method. In fact, P2 puts a relatively large amount of variance (power) at low frequencies around 50 Hz, so that the signal is not well distinguished from some spectral components of the noise. The two parameters of Algorithm 1 are b and K. We consider three settings for the subsample window: \(b=10\,\text {ms}=441\) samples, \(b=15\,\text {ms}=662\) samples, and b estimated with the method proposed in Götze and Račkauskas (2001). In the latter case the optimal b is computed over a grid ranging from \(b=2\) ms to \(b=20\) ms. In many applications it is not easy to fix a value for b. However, in certain situations researchers have an idea of the structure of the signal, and the time series is windowed with blocks of a certain length. In applications where the underlying signal is expected to be composed of harmonic components, the usual practice is to take blocks of size approximately equal to the period of the harmonic component with the lowest expected frequency. The rationale is to take the smallest window size such that each block is still expected to carry some information about the low frequency components. For example, for speech data blocks of 10 ms are normally considered (Haykin and Kosko 2001), whereas for music data 50 ms is a common choice (Weihs et al. 2016). Note that the artificial data here have a harmonic component at 50 Hz with a period of 20 ms, and we consider the fixed alternatives \(b=10\) ms and \(b=15\) ms as a robustness check.

We set \(K=200\); larger values of K would of course ensure less subsampling-induced variability. The window length \(b_1\) of the secondary subsample, needed to estimate the distribution of the sampling variance, is set according to Theorem 2 by taking \(b_1 = [b^{2/5}]\). For each combination of noise type, SNR, and b we ran 500 Monte Carlo replicates and computed statistics to assess the performance of the procedure. Two aspects of the method are investigated, corresponding to the two main contributions of the paper.

Table 1 Monte Carlo averages for the Mean Square Error (MSE) of the estimated signal power
Table 2 Monte Carlo averages for the optimal b estimated using the method proposed in Götze and Račkauskas (2001)

The first contribution of the paper is Theorem 1, where optimality and consistency of the Priestley-Chao kernel estimator are established under rather general assumptions on the error term. The kernel smoother is used in Algorithm 1 to estimate the signal power in the numerator of (8). In Table 1 we report the Monte Carlo averages of the Mean Square Error (MSE) of the estimated signal power. Going from the simplest AR model to the complex P2 noise model, the MSE increases as expected. The longer \(b=15\) ms subsample window always produced better results. An apparently counterintuitive finding is that with larger amounts of noise (lower SNR) the signal power is slightly better estimated. To understand this, note that the noise (in all three cases) produces most of its power in a low frequency region containing the signal frequency (i.e. 50 Hz). In the lower noise case there is still a considerable amount of noise acting at low frequency that the adaptive nature of the kernel smoother cannot recognize properly. In Table 2 we report the estimated average b with its Monte Carlo standard error. The estimated b is always near 10 ms, and it produced results only slightly worse than those obtained with fixed \(b=15\) ms.

The second contribution of the paper is the consistency result (see Theorem 3 and related results) for the distribution of the SNR statistic. In order to measure the quality of the method one needs to define the ground truth in terms of the sampling distribution of the target SNR statistic. The derivation of an analytical expression for this distribution would be intractable; therefore, we computed the quantiles of the true SNR statistic by Monte Carlo integration, and in Table 3 we report the average absolute differences between the estimated quantiles and their true counterparts. This comparison is meaningful because, by Corollary 1, convergence of the distribution of the SNR statistic maps into convergence of its quantiles. The comparison involves five different quantile levels to assess the behavior of the procedure both in the tails and in the center of the distribution. The average deviations in Table 3 are expressed in decibels. Overall, the method captures the center of the distribution well in all cases. The estimation error increases in the tails of the distribution, as one would expect, with the right tail estimated better than the left tail. In all cases the tails of the SNR distribution are better captured with a \(b=15\) ms window, although in the center of the distribution the differences implied by different values of b are much smaller. Going from SNR = 6 dB to SNR = 10 dB, results are clearly better in the left tail of the distribution, especially in the case P2. Again, the estimated version of b pushes the corresponding results towards the \(b=10\) ms case.

Table 3 Monte Carlo averages for the absolute deviation of the estimated quantiles of the SNR distribution from the true counterpart

Every method has its own tuning parameters, and the evidence here is that b has some effect on the proposed method, mainly on the tails of the SNR distribution. The selection of b based on the method of Götze and Račkauskas (2001) delivers a fully satisfying solution that does not require any prior knowledge of the data structure. The only drawback of estimating b is that the overall algorithm needs to be executed for several candidate values of b. As a final remark, we stress that the method proposed here is designed to cope with much larger values of n. In this experiment the sampling is repeated many times to produce Monte Carlo estimates, so we had to choose an n compatible with reasonable computing times on the available hardware. A limited number of trials with T up to several minutes (so that n reaches several millions) were run successfully without changing the final results. Therefore, we can conclude that the algorithm scales well with the sample size.

7 Application to EEG data

In this section we illustrate an application of the proposed methodology to electroencephalography (EEG) data obtained from the PhysioNet repository (Goldberger et al. 2000). In particular we considered the “CHB-MIT Scalp EEG Database” available at https://physionet.org/pn6/chbmit/. The database contains EEG traces recorded at the Children’s Hospital Boston on pediatric subjects with intractable seizures. Subjects were monitored for several days after the withdrawal of anti-seizure medication before the final decision about surgical intervention. 22 subjects were monitored for several days using the international 10–20 EEG system, a standard that specifies electrode positions and nomenclature. For each subject, 21 electrodes were placed at certain positions on the scalp; each electrode produced an electric signal sampled at 256 Hz and measured with 16 bit precision. This means that each day (24 h) the EEG machine produced 21 time series, each containing \(n = 22,118,400\) data points, for a total of 464,486,400 amplitude measurements per subject per day. A description of the “CHB-MIT Scalp EEG Database”, as well as details about the data acquisition, is given in Shoeb (2009).

Fig. 1

Time series plots of the amplitude of the EEG signals recorded in positions P8-02 and T8-P8 for two distinct subjects in the experiment. The duration of each fragment is 5 sec, starting 12 min after the beginning of the recording

EEG signals have complex structures. Various sources of noise can enter the measurement chain; therefore, it is always of interest to understand the behavior of the SNR. For this application we considered data for the first 3 subjects in the database, and two electrode positions labeled P8-02 and T8-P8 in the 10–20 EEG system. The P8-02 electrode is placed over the parietal lobe, responsible for integrating sensory information of various types. The T8-P8 electrode is placed over the temporal lobe, which transforms sensory inputs into meanings retained as visual memory, language comprehension, and emotion association. An example of these traces is given in Fig. 1.

The method proposed here has been applied to obtain confidence intervals for the SNR. An SNR \(\ge 10\) dB can be considered a requirement for a favorable noise floor in these applications. In order to assess the robustness of the procedure with respect to the choice of the subsampling window b, for each case we considered windows of fixed sizes \( b = \{3\,\text {sec},5\,\text {sec},7\,\text {sec}\}\), i.e. \(b=\{768, 1280, 1792\}\) samples, plus b estimated with the method proposed by Götze and Račkauskas (2001). The estimation of b is performed on a grid of equispaced points between 2 sec and 10 sec. The literature on EEG signals does not tell us whether the processes involved have a clear time scale, but 5 sec is considered approximately the time length needed to identify interesting cerebral activities. For each b the corresponding \(b_1\) is set according to \(b_1 = [b^{2/5}]\), as in the numerical experiments. In Table 4 the lower and upper limits of 90% and 95% confidence intervals for the SNR are reported.

Overall the results with estimated b are comparable to those with fixed b. The upper limit of these confidence intervals is never smaller than 10 dB. The lower limit is negative in all cases, which means that in all cases there is a chance that the power of the stochastic component dominates that of the deterministic component in model (1). While the upper limit of these confidence intervals is rather stable across units for the same b value, larger differences are observed in the lower limit. All this is a clear indication of the asymmetry of the SNR statistic. This is expected, since the two tails of the SNR statistic reflect two distinct mechanisms: a negative value of the SNR statistic (left tail) corresponds to situations where the dynamics of the observed time series are driven by the error term of equation (1), while a positive value (right tail) corresponds to situations where the dynamics are driven by the smooth changes induced by \(s(\cdot )\). Ceteris paribus, going from the 90% level to 95% does not change the results dramatically. Note that in this kind of application a 3 dB difference is not considered large. For data recorded in the P8-02 position, the length of the confidence interval, going from 90% to 95%, changes between 1.28 dB and 3.6 dB, with the maximum variation measured for Subject 3 when \(b=7\) sec. For the T8-P8 case the length of the confidence interval, going from 90% to 95%, changes between 1.52 dB and 3.83 dB, with the maximum variation measured for Subject 2 when \(b=7\) sec.

Some patterns are observed across experimental units. For a given confidence level and b, Subject 1 reports the shortest confidence intervals overall. Subject 2 reports the longest intervals for records in position P8-02, and Subject 3 the longest in position T8-P8. The variations across values of b, all else equal, are not dramatic. The settings \(b=3\) sec and \(b=5\) sec produced longer intervals compared with \(b=7\) sec and the optimal b. The data-driven method of Götze and Račkauskas (2001) produced an estimated b in the range [3 sec, 7 sec] for P8-02 data, and [6 sec, 8 sec] for T8-P8 data. These values are comparable with the rule of thumb that 5 sec is a reasonable time scale for the kind of signals involved here. The general conclusion is that in the absence of relevant information, the method of Götze and Račkauskas (2001) gives a useful data-driven choice of b.

Table 4 Lower and upper limit of the confidence interval for the SNR (expressed in dB) of the EEG records in positions P8-02 and T8-P8

8 Conclusions and final remarks

In this paper we developed a method that consistently estimates the distribution of an SNR statistic in the context of time series data with errors belonging to a rich class of stochastic processes. We restricted the model to the case where the signal is a smooth function of time, but the theory developed here can easily be adapted to more general time series additive regression models. The reference model for the observed data, and the theory developed here, adapt to many possible applications that will be the object of a distinct paper. In this work we concentrated on the theoretical guarantees of the proposed method. The estimation is based on a random subsampling algorithm that can cope with massive sample sizes. Both the smoothing and the subsampling techniques at the heart of Algorithm 1 embody original innovations compared to the existing literature on the subject. Numerical experiments described in Sect. 6 showed that the proposed algorithm performs well in finite samples.