Abstract
Signal-to-noise ratio (SNR) statistics play a central role in many applications. A common situation where SNR is studied is when a continuous time signal is sampled at a fixed frequency with some noise in the background. While estimation methods exist, little is known about the distribution of the SNR statistic when the noise is not weakly stationary. In this paper we develop a nonparametric method to estimate the distribution of an SNR statistic when the noise belongs to a fairly general class of stochastic processes that encompasses both short and long-range dependence, as well as nonlinearities. The method is based on a combination of smoothing and subsampling techniques. Computations are carried out only at the subsample level, which makes it possible to manage the enormous sample sizes typically produced by modern data acquisition technologies. We derive asymptotic guarantees for the proposed method, and we assess its finite sample performance through numerical experiments. Finally, we propose an application to electroencephalography data.
1 Introduction
Signal-to-noise ratio (SNR) statistics are widely used to describe the strength of the variations of the signal relative to those expressed by the noise. SNR statistics quantify diverse aspects of models where an observable quantity Y is decomposed into a predictable or structural component s, often called signal or model, and a stochastic component \(\varepsilon \), called noise or error. Although the definition of SNR is rather general, in this paper we focus on a typical situation where one assumes that a sequence \( \{Y_i\}_{i \in \mathbb {Z}}\) is determined by
where i is a time index, \(s(\cdot )\) is a smooth function of time evaluated at the time point \(t_i\) with \(i\in \mathbb {Z}\), and \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is some random sequence. We assume \(t_i \in (0,1)\); if a time series is observed at time points \(t_i \in (a,b)\), these can be rescaled onto the interval (0, 1) without changing the results of this paper.
Equation (1) is a popular model in many applications that range from physical sciences to engineering, biosciences, social sciences, etc. (see Parzen 1966, 1999 and references therein). Although we use the conventional term “noise” for \(\varepsilon _i\), this term may have a rich structure well beyond what we would usually consider noise. Some of the terminology here originates from the physical sciences, where the following concepts were first explored. Consider a non-stochastic signal s(t) defined on the time interval (0, 1), and assume that s(t) has a zero average level (that is, \(\int _0^1 s(t)dt=0\)). The average “variation” (or magnitude) of the signal is quantified as
In physical science terminology (2) is the average power of the signal, that is the “energy” contained in \(s(\cdot )\) per time unit (if the reference time interval is (a, b) the integral in (2) is divided by \((b-a)\)). If the average signal level is not zero, s(t) is centered on its mean value, and then (2) is computed. The magnitude, or the “power”, of the noise component is given by \(P_\text {noise}:= {{\,\mathrm{Var}\,}}[\varepsilon _i]\). The SNR of the process is the ratio
expressed in decibel units. The SNR can also be defined as the plain ratio \((P_\text {signal} / P_\text {noise})\), but the decibel scale is more common. A low SNR means that the random component of (1) is strong enough to make the signal \(s(\cdot )\) barely distinguishable in the observations of \(Y_i\). Conversely, a high SNR means that sampling \(Y_i\) conveys substantial information about the predictable/structural component \(s(\cdot )\).
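For illustration only (no such code appears in the paper), the definitions above can be turned into a small numerical sketch; the function name is ours, and the signal is centered on its mean as stated for (2):

```python
import numpy as np

def snr_db(signal, noise):
    """Sample analogue of the SNR in decibels: P_signal over P_noise.

    `signal` holds s(.) sampled on a grid of (0, 1); it is centered on
    its mean first, matching the convention stated for (2).
    """
    s = signal - signal.mean()
    p_signal = np.mean(s ** 2)   # discrete version of the integral in (2)
    p_noise = np.var(noise)      # P_noise = Var[eps_i]
    return 10.0 * np.log10(p_signal / p_noise)

# A unit-amplitude sine has average power 1/2; with noise variance 0.05
# the theoretical SNR is 10*log10(0.5/0.05) = 10 dB.
t = np.linspace(0, 1, 10_000, endpoint=False)
rng = np.random.default_rng(0)
estimate = snr_db(np.sin(2 * np.pi * 5 * t), rng.normal(0, np.sqrt(0.05), 10_000))
```

The sample estimate is close to the theoretical 10 dB, with the discrepancy driven by the sampling error of the noise variance.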
In many analysis, SNR is a crucial parameter to be known. In radar detection applications (Richards 2014), speech recognition (Loizou 2013), audio and video applications of signal processing (Haykin and Kosko 2001), it is crucial to build filtering algorithms that are able to reconstruct \(s(\cdot )\) with the largest possible SNR. In neuroscience there is strong interest in quantifying the SNR of signals produced by neurons activity. In fact, the puzzle is that single neurons seem to have low SNR meaning that they emit “weak signals” that are still processed so efficiently by the brain system (Czanner et al. 2015). In medical diagnostics, a physiological activity is measured and digitally sampled (e.g. fMRI, EEG, etc) with methods and devices that need to guarantee the largest SNR possible (Ullsperger and Debener 2010). The historic discovery of the first detection of a gravitational wave announced on 11 February 2016 has been made possible because of decades of research efforts in designing instruments and measurement methods able to work in an extremely low SNR environment (Kalogera 2017). These are just a few examples of the relevance of the SNR concept. The main goal of this paper is to define a SNR statistic, and provide an estimator for its distribution with proven statistical guarantees under general assumptions on the elements of (1).
Let \(\mathscr {Y}_n:=\{ y_1, y_2, \ldots , y_n\}\) be sample values of Y observed at equally spaced time points \(t_i=i/n\) for \(i=1,2,\ldots ,n\), with \(t_i \in (0,1)\). That is, in this work we focus on situations where Y is sampled at constant sampling rate (also known as fixed frequency sampling, or uniform design), although the theory developed here can be extended to non-constant sampling rates. Let \(\hat{s}(\cdot )\) and \(\hat{\varepsilon }\) be estimated quantities based on \(\mathscr {Y}_n\). Consider the observed SNR statistic
for some choice of an appropriate sequence \(\{m\}\) such that \(m \rightarrow \infty \) and \(m/n \rightarrow 0\) as \(n \rightarrow \infty \). In this paper we propose a subsampling strategy that consistently estimates the quantiles of the distribution of \(\tau _m (\widehat{SNR} - \text {SNR})\) for an appropriate sequence \(\{\tau _m\}\) (see Theorem 3). These quantiles are used to construct simple confidence intervals for the SNR parameter.
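The final interval construction can be sketched as follows. This is a toy version under our own assumptions: the function name is ours, and in the usage example we plug in \(\tau_m = \sqrt{m}\), the familiar short-range-dependence rate, whereas the paper's general scaling sequence is the one defined in (13):

```python
import numpy as np

def snr_confidence_interval(snr_hat, centered_stats, tau_m, alpha=0.05):
    """Two-sided interval for SNR obtained by inverting the empirical
    quantiles of subsample approximations of tau_m * (SNR_hat - SNR)."""
    q_lo, q_hi = np.quantile(centered_stats, [alpha / 2, 1 - alpha / 2])
    # P(q_lo <= tau_m*(SNR_hat - SNR) <= q_hi) ~ 1 - alpha  =>  invert for SNR
    return snr_hat - q_hi / tau_m, snr_hat - q_lo / tau_m

# Illustration with synthetic centered statistics (standard normal stand-in)
rng = np.random.default_rng(0)
m = 100
lo, hi = snr_confidence_interval(5.0, rng.standard_normal(10_000),
                                 tau_m=np.sqrt(m))
```

In practice `centered_stats` would hold the scaled, centered subsample statistics produced by the procedure of Sect. 5.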
In most applications the observed \( \{Y_i\}_{i \in \mathbb {Z}}\) is treated as “stable” enough that smoothing methods (typically linear filtering) can be applied to obtain \(\hat{s}(\cdot )\) and the error terms \(\hat{\varepsilon }_i\), \(i=1,2,\ldots ,n\). The general practice is therefore to divide the observed data stream into sequential blocks of overlapping observations of some length (time windowing), and to compute \(\widehat{SNR}\) for each block (see Haykin and Kosko 2001; Weinberg 2017). These SNR measurements are then used to construct an empirical distribution and make inference statements about the underlying SNR. Such windowing methods implicitly assume some sort of local stationarity and uncorrelated noise, but they lack theoretical justification, and real data structures often exhibit strong time-variations and other complexities not consistent with these simplifying assumptions. To our knowledge, a framework and a method for estimating the distribution of SNR statistics like (4) with provable statistical guarantees does not exist in the literature.
The major contribution of this paper is a subsampling method for the approximation of the quantiles of the distribution of the centered statistic \(\tau _m (\widehat{SNR} - \text {SNR})\). The method is based on the observation that \(\tau _m (\widehat{SNR} - \text {SNR})\) can be decomposed into the sum of the following components:
where \(\hat{V}_m = m^{-1}\sum _{i=1}^m\left( \hat{\varepsilon }_i-\bar{\hat{\varepsilon }} \right) ^2\). In (5) the two components reflect the power contribution of the signal, estimated based on \(\hat{s}^2(\cdot )\), and the power contribution of the error term, estimated in terms of \(\hat{V}_m\). In practice, the proposed method, formalized in Algorithm 1, works as follows: (i) the observed time series is randomly divided into subsamples, that is, random blocks of consecutive observations; (ii) in each subsample the estimates \(\hat{s}^2(\cdot )\) and \(\hat{V}_m\) in (5) are computed; (iii) finally, these subsample estimates are used to approximate the distribution of \(\tau _m (\widehat{SNR} - \text {SNR})\) and its quantiles.
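A toy rendition of steps (i)–(iii) may help fix ideas. This is only a sketch: the function names are ours, and a crude moving average stands in for the optimized kernel smoother developed in Sect. 4:

```python
import numpy as np

def subsample_snr_distribution(y, b, K, rng=None):
    """Toy version of steps (i)-(iii): draw K random blocks of length b,
    smooth each block to split it into signal and residual, and collect
    the per-block SNR estimates (in dB) whose centered, scaled empirical
    distribution is the target of the procedure."""
    rng = np.random.default_rng(rng)
    n = len(y)
    starts = rng.choice(n - b + 1, size=K, replace=False)  # random block starts
    stats = []
    for t0 in starts:
        block = y[t0:t0 + b]
        # crude stand-in for the kernel smoother: centered moving average
        w = b // 20 or 1
        s_hat = np.convolve(block, np.ones(w) / w, mode="same")
        eps_hat = block - s_hat
        p_signal = np.mean((s_hat - s_hat.mean()) ** 2)
        v_hat = np.var(eps_hat)
        stats.append(10 * np.log10(p_signal / v_hat))
    return np.array(stats)

# Usage on synthetic data: slow sinusoid plus i.i.d. noise
t = np.linspace(0, 1, 5_000, endpoint=False)
y = np.sin(2 * np.pi * 3 * t) + np.random.default_rng(1).normal(0, 0.2, 5_000)
out = subsample_snr_distribution(y, b=500, K=50, rng=2)
```

The empirical quantiles of these subsample statistics, after the centering and scaling treated in Sect. 5, approximate the quantiles of the target distribution.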
Based on Altman (1990), a consistent kernel smoother with an optimal bandwidth estimator is derived. The smoother does not require any further tuning parameters, even though the stochastic structure here is richer than that considered in the original paper by Altman (1990). The subsampling procedure extends the contributions of Politis and Romano (1994) and Politis et al. (2001). The main difference from classical subsampling is that the method proposed in this paper does not require computations on the entire observed time series. The kernel smoothing is performed subsample-wise, which is particularly beneficial in applications where the sample size is very large. In sleep studies, for example, an electroencephalographic (EEG) record of brain activity is often taken at several positions on the scalp, where each sensor samples an electrical signal for 24 h at 100 Hz, implying \(n=8,640,000\) data points per sensor (see Kemp et al. 2000). Music is usually recorded at 44.1 kHz (ISO 9660), which implies that a stereo song of 5 min produces \(n=26,460,000\) data points. This approach has been explored by Coretto and Giordano (2017) for the estimation of the dynamic range of music signals; however, that work deals with noise structures less general than those studied here. A further original element of this work is that, although the setup for \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) does not exclude long memory regimes, the proposed methods do not require the identification of any long memory parameter.
The rest of the paper is organized as follows. In Sect. 2 we define and discuss the reference framework for \( \{Y_i\}_{i \in \mathbb {Z}}\). In Sect. 3 the main estimation Algorithm 1 is introduced. The smoothing step of Algorithm 1 is studied in Sect. 4, while the subsampling step is investigated in Sect. 5. In Sect. 6 we show finite sample results of the proposed method based on simulated data, and an application to real data is illustrated. Final remarks and conclusions are given in Sect. 8. All proofs are given in the final “Appendix”.
2 Setup and assumptions
The framework (1) underpins a popular strategy to study experiments where a continuous time (analog) signal \(s(\cdot )\) is sampled at fixed time points \(t_i\). The stochastic term \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) represents various sources of randomness. In some cases, the source and the structure of the random component are known, but this does not apply universally. A ubiquitous assumption about \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is that it is white noise; sometimes the simplification pushes further towards Gaussianity (see Parzen 1966, and references therein). However, in various applications the evidence of departures from this simplicity is quite rich.
The most elementary source of randomness is quantization noise, i.e. the noise introduced by the quantization of the signal. In music, speech, EEG and many other applications, a voltage amplitude is recorded at fixed time intervals using a limited range of integer numbers. This is the so-called Pulse Code Modulation (PCM), which is at the basis of digital encoding techniques. The quantization noise is produced by the rounding error of the PCM sampling. Theoretically, quantization noise is a uniform white noise process; however, Gray (1990) showed that its structure varies a lot across applications and measurement techniques, and often the white noise assumption is too restrictive. Apart from quantization noise, the recorded signal may be affected by a number of disturbances unrelated to the signal. Take for example an EEG acquisition where electrical noise from the power lines is injected into the measuring device. In speech recording, microphones capture stray radio frequency energy. Another example is that of wireless signal transmission affected by multi-path interference, that is, waves bounce off and around surfaces, creating unpredictable phase distortions. Complex effects like these happen in radar transmission too, where it is well known that the Gaussian white noise assumption is generally violated (see Conte and Maio 2002, and references therein).
Sometimes the stochastic component does not only include unpredictable external artifacts. There are cases where the structure of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is the result of several complex phenomena occurring within the system under study. In their pioneering works, Voss and Clarke (1975, 1978) found evidence of 1/f-noise and similar fractal processes in recorded music. Similar evidence is documented in Levitin et al. (2012). 1/f-noise is a stochastic process whose spectral density follows the power law \(c|f|^{-\beta }\), where f is the frequency, \(\beta \) is the exponent, and c is a scaling constant. The case \(\beta =1\) gives pink noise, just one example of such processes. Depending on \(\beta \), these forms of noise are characterized by slowly vanishing serial correlations and/or what is known as long memory. Many electronic devices found in data acquisition instruments introduce 1/f-type noise (Kogan 1996; Weissman 1988; Kenig and Cross 2014). Evidence of departure from linearity and Gaussianity in the transient components of music recordings was also found in Brillinger and Irizarry (1998) and Coretto and Giordano (2017).
The main goal of this paper is to build an estimation method for the distribution of the SNR that works under the most general setting. Of course achieving universality is impossible, but here we set a model environment that is as rich as possible. Our model is restricted by the following assumptions:
Assumption A1
The function \(s(\cdot )\) has a continuous second derivative.
Assumption A2
The sequence \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) fulfills one of the following:
-
(SRD)
\( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is a strictly stationary and \(\alpha \)-mixing process with mixing coefficients \(\alpha (k)\), \({{\,\mathrm{E}\,}}[\varepsilon _i]=0\), \({{\,\mathrm{E}\,}}\left| \varepsilon _i^2 \right| ^{2+\delta } < +\infty \), and \(\sum _{k=1}^{+\infty } \alpha ^{\delta /(2+\delta )}(k)<\infty \) for some \(\delta >0\).
-
(LRD)
\(\varepsilon _i=\sum _{j=0}^{\infty }\psi _ja_{i-j}\) with \({{\,\mathrm{E}\,}}[a_i]=0\), \({{\,\mathrm{E}\,}}[a_i^4]<\infty \quad \forall i\), \(\{a_i\}\sim i.i.d.\), \(\psi _j\sim C_1j^{-\eta }\) with \(\eta =\frac{1}{2}(1+\gamma _1)\), \(C_1>0\) and \(0<\gamma _1\le 1\).
Assumption A1 reflects a common smoothness requirement for \(s(\cdot )\) which does not need further discussion. In most applications, \(s(\cdot )\) will represent the sum of possibly many harmonic components, or long term smooth trends. A2 sets a wide range of possible structures for the stochastic component. Two regimes are considered here: short range dependence (SRD) and long range dependence (LRD). SRD is a rather general \(\alpha \)-mixing assumption that allows us to go beyond the usual linear process assumption; this generality is essential to model the fast decaying energy variations typical of some forms of noise. Assumptions A1 and A2-SRD are also considered in Coretto and Giordano (2017) for the estimation of the dynamic range of music signals. However, in this paper we are interested in SNR statistics, and we extend the analysis to the cases where LRD occurs. A2-LRD captures situations where the noise spectrum exhibits long-range dependence; in practice this assumption accommodates 1/f-type noise. The strength of the LRD is controlled by \(\gamma _1\), which lies between zero and 1.
Note that A2-LRD assumes a linear structure while A2-SRD does not. The SRD assumption allows for dependence, and the rate at which it vanishes is controlled by \(\delta \). Under SRD, terms of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) that are far apart in time behave asymptotically like an independent sequence. Hence SRD can capture many different forms of dependence, but not long memory features. Is the linearity structure of LRD a strong assumption for the long-memory cases? The class of long-memory linear processes is well known in the literature, and in most cases LRD effects appear together with a linear autoregressive structure. Moreover, A2-LRD is compatible with the classical parametric models for LRD, e.g. the well known ARFIMA class, already used to capture the 1/f-noise phenomenon. One could overcome the linearity assumption in LRD, but at the expense of serious technical complications. It is important to stress that we are not interested in discriminating SRD from LRD, and we want to avoid the additional estimation of the LRD order, which is crucial in most parametric models for LRD. Assumption A2 only defines plausible stochastic structures that can occur in the most diverse applications. Note that A2-LRD does not imply that \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is a Gaussian process or a function of one, as is assumed in Jach et al. (2012) and Hall et al. (1998).
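As an illustration of A2-LRD, one can simulate a truncated version of the linear process in the assumption. This sketch is ours, not part of the paper; the truncation length, the value of \(\psi_0\), and the Gaussian innovations are arbitrary illustrative choices:

```python
import numpy as np

def simulate_lrd(n, gamma1=0.4, c1=1.0, trunc=2000, rng=None):
    """Truncated version of the A2-LRD linear process:
    eps_i = sum_j psi_j a_{i-j}, with psi_j ~ C1 * j^{-eta} and
    eta = (1 + gamma1)/2. Smaller gamma1 means slower coefficient
    decay, i.e. stronger long memory."""
    rng = np.random.default_rng(rng)
    eta = 0.5 * (1.0 + gamma1)
    psi = np.empty(trunc)
    psi[0] = 1.0                        # psi_0: any finite value; A2-LRD only
    j = np.arange(1, trunc)             # constrains the asymptotic decay
    psi[1:] = c1 * j ** (-eta)
    a = rng.standard_normal(n + trunc)  # i.i.d. innovations with E[a^4] < inf
    # eps_i = sum_{j=0}^{trunc-1} psi_j * a_{i-j}: a moving-average filter
    return np.convolve(a, psi, mode="valid")[:n]

eps = simulate_lrd(4000, rng=0)
```

Because the coefficients decay slowly, the simulated path shows strong positive serial correlation, consistent with the slowly vanishing correlations described above.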
3 The smoothing-subsampling procedure
The SNR distribution is estimated by performing Algorithm 1. This is a simple smoothing-subsampling procedure where for each subsample \(P_{\text {signal}}\) is consistently estimated by \(\hat{U}_{n,b,t}\), and \(P_{\text {noise}}\) is estimated by \(\hat{V}_{n,b_1,t}\) on a secondary subsample taken from the first. Details and theoretical motivation of the procedure are treated in Sects. 4 and 5. The distribution is constructed for the SNR expressed in decibel scale.
The procedure is called “Monte Carlo” because the subsample selection is randomized. This randomization reduces the huge number of subsamples to be explored. Note that none of the calculations involve computations over the entire observed sample \(\mathscr {Y}_n\). This differs from the classical subsampling for time series data introduced in Politis and Romano (1994) and Politis et al. (2001). In the classical subsampling, one would estimate the variance of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) based on the entire sample, which would require the estimation of \(s(\cdot )\) to be performed globally on \(\mathscr {Y}_n\). In Algorithm 1 both \(s(\cdot )\) and the variance of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) in (7) are estimated blockwise. This blockwise smoothing strategy, where computations are performed only at the subsample level, was proposed in Coretto and Giordano (2017). The advantages over the classical subsampling are twofold. First, thanks to advances in data acquisition technology, in most of the applications mentioned in Sect. 1, n scales in terms of millions or billions of data points. It is well known that kernel and other nonparametric smoothing methods become computationally intractable for such large sample sizes. In Algorithm 1 the computational complexity of the calculation of \(\hat{s}(\cdot )\) is governed by the subsample size b, which is chosen much smaller than n (see Theorem 2). Second, the kind of signals we want to reconstruct may exhibit strong structural variations along the time axis; therefore, estimation of \(s(\cdot )\) on the entire sample would require optimal kernel methods with a local bandwidth, increasing the computational burden even more. Working on smaller data chunks allows treating the signal locally. Therefore, simpler kernel methods based on a global bandwidth within the subsampled block are better suited to capture the local structure of the signal.
Optimal estimation of \(s(\cdot )\), and the random subsampling part of Algorithm 1 are developed in the next two sections.
4 Optimal signal reconstruction
Unless one has enough information about the shape of \(s(\cdot )\), nonparametric estimators of functions with proven statistical properties are natural candidates to reconstruct the underlying signal. Our choice is the classical Priestley-Chao kernel estimator (Priestley and Chao 1972), because it can be easily optimized in regression models where the error is not necessarily uncorrelated. The estimator for \(s(\cdot )\) is defined as
The following assumption involving the kernel function \(\mathscr {K}\left( \cdot \right) \) and the bandwidth h is assumed to hold.
Assumption A3
\(\mathscr {K}\left( \cdot \right) \) is a density function with compact support and symmetric about 0. Moreover, \(\mathscr {K}\left( \cdot \right) \) is Lipschitz continuous of some order. The bandwidth \(h \in H=[c_1\Lambda _n^{-1/5}, \; c_2\Lambda _n^{-1/5}]\), where \(c_1 < c_2\) are two positive constants such that: \(c_1\) is arbitrarily small, \(c_2\) is arbitrarily large.
Define
As \(n \rightarrow \infty \), it follows that \(h \rightarrow 0\) and \(\Lambda _nh \rightarrow \infty \).
There are a number of possible choices for \(\mathscr {K}\left( \cdot \right) \) satisfying A3; we use the Epanechnikov kernel for its well known efficiency properties. Setting an optimal bandwidth in (9) when the error term may be correlated requires special care. Here an optimal choice of h is even more involved because \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) may follow either the SRD or the LRD regime. The sequence (10) has a role in managing this added complexity. Altman (1990) studied the Priestley-Chao kernel estimator (9) under dependent additive errors, and showed that under serial correlation the standard bandwidth optimality theory does not apply. Altman (1990) proposed to estimate an optimal h based on a cross-validation function accounting for the dependence structure of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\). Altman’s contribution deals with errors belonging to the class of linear processes with finite memory; therefore, Altman’s assumptions do not allow the LRD case. Moreover, we adopt the SRD assumption because it covers stochastic processes with a nonlinear model representation, which are common in the time series framework. Finally, Altman (1990) assumes that the true autocorrelation function of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is known, which is not the case in real world applications.
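A minimal version of the Priestley-Chao estimator with the Epanechnikov kernel might read as follows. This is a sketch only: it assumes the equally spaced design \(t_i = i/n\) of Sect. 1, and the function names and vectorization choices are ours:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: compactly supported, symmetric about 0,
    and Lipschitz continuous, as required by A3."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def priestley_chao(y, t_grid, h):
    """Priestley-Chao estimate of s at the points in t_grid, for data y
    observed at t_i = i/n, i = 1..n: (nh)^{-1} sum_i K((t - t_i)/h) y_i."""
    n = len(y)
    ti = np.arange(1, n + 1) / n
    u = (t_grid[:, None] - ti[None, :]) / h   # shape (len(t_grid), n)
    return epanechnikov(u) @ y / (n * h)
```

On noiseless data the estimator recovers a smooth signal at interior points up to the usual \(O(h^2)\) bias.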
Let \(\hat{\varepsilon }_i = y_i - \hat{s}(i/n)\), and define the cross-validation objective function
The optimal bandwidth is estimated by minimizing (11), that is
The first term in (11) is the correction factor proposed by Altman (1990), but with the true unknown autocorrelations replaced by their sample counterparts \(\hat{\rho }(\cdot )\) up to the Mth order. M is an additional smoothing parameter, but Altman’s contribution does not deal with its choice. Consistency of the optimal bandwidth estimator is obtained if M increases at a rate smaller than the product nh. As in Coretto and Giordano (2017), M is chosen so that the following holds.
Assumption A4
As \(n\rightarrow \infty \), \(M \rightarrow \infty \) with \(M=O(\sqrt{nh})\).
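The sample autocorrelations entering (11), together with an A4-compatible choice of M, can be sketched as follows (function names are ours, and the unit proportionality constant in the choice of M is an arbitrary illustration):

```python
import numpy as np

def sample_acf(eps_hat, m):
    """Sample autocorrelations rho_hat(k), k = 1..m, of the residuals;
    these are the plug-in quantities used in the corrected criterion (11)."""
    e = eps_hat - eps_hat.mean()
    n = len(e)
    denom = np.dot(e, e) / n
    return np.array([np.dot(e[:-k], e[k:]) / n / denom
                     for k in range(1, m + 1)])

def choose_m(n, h):
    """One admissible choice satisfying A4: M grows like sqrt(n*h)."""
    return max(1, int(np.sqrt(n * h)))
```

For instance, on residuals resembling an AR(1) process with coefficient 0.8, `sample_acf` returns a first-order autocorrelation close to 0.8.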
Let \(\text {MISE}(\hat{s};h)\) be the mean integrated square error of \(\hat{s}(\cdot )\), that is
Let \(h^\star \) be the global minimizer of \(\text {MISE}(\hat{s};h)\). The next result states the optimality of the kernel estimator.
Theorem 1
Assume A1, A2, A3 and A4. Then \(\hat{h}/{h^\star } {\mathop {\longrightarrow }\limits ^{\text {p}}}1\) as \(n \rightarrow \infty \).
The previous result relates \(\hat{h}\) to the optimal global bandwidth, whose convergence rate is known, that is \(O(\Lambda _n^{-1/5} )\). Theorem 1 parallels the result in Coretto and Giordano (2017); the difference here is that \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) may well follow LRD. Therefore, the proof of Theorem 1 (given in the “Appendix”) needs some further developments.
Remark 1
Theorem 1 improves on the existing literature in several respects. First of all, the proposed signal reconstruction is optimal (in the MISE sense) under both SRD and LRD. Its key feature is that one does not need to identify the type of dependence, that is, SRD vs LRD. There are only two smoothing tuning parameters: h, which is estimated optimally, and M, fixed according to A4. The SRD regime is already treated in Coretto and Giordano (2017). Regarding LRD, the result should be compared to Hall et al. (1995). The advantages of our approach compared to the latter are: (i) the method is simplified by eliminating a tuning parameter needed to deal with LRD, namely the block length for the leave-k-out cross-validation in Hall et al. (1995); this is because Altman’s cross-validation correction in (11) already incorporates the dependence structure via \(\hat{\rho }(\cdot )\), and M is able to correct (11) without any further step identifying whether LRD or SRD occurs; (ii) here we do not assume the existence of higher order moments of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\).
5 Monte Carlo approximation of the subsampling distribution
In this section we develop the subsampling procedure underlying Algorithm 1. We call this procedure “Monte Carlo” because it is based on a random selection of subsamples, and we provide a Monte Carlo approximation of the subsampling distribution of the statistic of interest. Let us introduce the following quantities:
Although the random sequence \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\) is not observable, one can work with its estimate. Replace \(\varepsilon _i\) with \(\hat{\varepsilon }_i\) in the previous formula and obtain
The distribution of a properly scaled and centered \(\hat{V}_n\) can now be used to approximate the distribution of \(\tau _n(V_n-\sigma ^2_{\varepsilon })\), where \(\tau _n\) is defined in (13) and \(\sigma _{\varepsilon }^2:=\text {E}[\varepsilon _t^2]\). One way to do this is to perform the subsampling as proposed in Politis et al. (1999) and Politis et al. (2001).
That is, for all blocks of observations of length b (subsample size) compute \(\hat{V}_n\). However, the number of possible subsamples is huge even for moderate n. Moreover, in typical cases where n is of the order of millions or billions of samples, the computation of the optimal \(\hat{s}(\cdot )\) would require enormous computing power. The problem is solved by performing the blockwise smoothing of Algorithm 1 proposed in Coretto and Giordano (2017). The signal and the average error are estimated block-wise, so that the computing effort is driven only by b. This makes the algorithm scalable with respect to n, a very important feature for processing data from modern data acquisition systems. Here we investigate the theoretical properties of the estimation Algorithm 1. The formalization is similar to that given in Coretto and Giordano (2017); however, here we deal with a different target statistic, and we face the added complexity of possible LRD regimes in \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\).
First define
At a given time point t consider a block of observations of length b, and the statistics computed in Algorithm 1:
with \(\bar{{\varepsilon }}_{b,t}=b^{-1}\sum _{i=t}^{t+b-1} {\varepsilon }_i\) and \(\bar{\hat{\varepsilon }}_{b,t}=b^{-1}\sum _{i=t}^{t+b-1}\hat{\varepsilon }_i\). The empirical distribution functions of \(\tau _n (V_n -\sigma _{\varepsilon }^2)\), based on the true and estimated noise, respectively, are given by
\(\mathbb {I}\left\{ A\right\} \) denotes the usual indicator function of the set A, and \(\tau _b\) is defined in (13). Lemmas 2 and 3 in the “Appendix” state that the subsampling based on statistic (12) is consistent under both A2-SRD and A2-LRD. Notice that the results in Politis et al. (2001) can only be used to deal with SRD. The LRD treatment is inspired by Hall et al. (1998) and Jach et al. (2012); however, we improve upon their results in the sense that the Gaussianity assumption for \(\varepsilon _t\) is avoided under A2-LRD with \(1/2<\gamma _1\le 1\). The quantiles of the subsampling distribution also converge to the quantiles of the asymptotic distribution of \(\tau _n(V_n - \sigma ^2_{\varepsilon })\). This is a consequence of the fact that \(\tau _n (V_n-\sigma _{\varepsilon }^2)\) converges weakly (see Remark 2). For \( \gamma _2 \in (0,1)\) the quantities \(q(\gamma _2)\), \(q_{n,b}(\gamma _2)\) and \(\hat{q}_{n,b}(\gamma _2)\) denote the \(\gamma _2\)-quantiles of the distributions G (see Remark 2), \(G_{n,b}\) and \(\hat{G}_{n,b}\), respectively. We adopt the usual definition \(q(\gamma _2)=\inf \left\{ x: G(x)\ge \gamma _2\right\} \). Lemma 4 in the “Appendix” states the same consistency for the quantiles. The following remark covers the different cases (A2-SRD and A2-LRD) for the asymptotic distribution of \(\tau _n (V_n - \sigma ^2_{\varepsilon })\).
Remark 2
By A2 it can be shown that \(\tau _n (V_n - \sigma ^2_{\varepsilon } )\) converges weakly to a random variable with distribution, say, \(G(\cdot )\), where \(\sigma ^2_{\varepsilon } ={{\,\mathrm{E}\,}}[\varepsilon _t^2]\). Under A2-SRD, \(G(\cdot )\) is a Normal distribution. \(G(\cdot )\) is still a Normal distribution under A2-LRD with \(1/2<\gamma _1\le 1\), which follows from Theorem 4 of Hosking (1996). The same theorem implies that \(G(\cdot )\) is Normal under A2-LRD with \(\gamma _1=1/2\) when \(a_t\) is normally distributed. Moreover, \(G(\cdot )\) is not Normal under A2-LRD with \(0<\gamma _1<1/2\).
A variant is to reduce the number of subsamples by introducing a random block selection with \(s(\cdot )\) estimated blockwise on subsamples of length b. Let \(I_i\), \(i=1,\ldots ,K\) be random variables indicating the initial point of each block of length b. We draw, without replacement and with uniform probabilities, the sequence \(\left\{ I_i\right\} _{i=1}^K\) from the set \(I=\{1,2,\ldots ,n-b+1\}\). The empirical distribution function of the subsampling variances of \(\hat{\varepsilon }_{t}\) over the random blocks is
In order to obtain the consistency of the subsampling procedure in both the SRD and LRD cases, we consider two subsamples. The first has length b and is used to estimate the signal \(s(\cdot )\). The second, a subset of the first, has length \(b_1=o(b^{4/5})\) and is used to estimate the variance and its distribution. The following result states the consistency of \(\tilde{G}\) in approximating G.
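The two nested subsamples can be sketched as follows. This is a toy illustration: the choice \(b_1 = \lfloor b^{3/4}\rfloor\) is just one admissible rate satisfying \(b_1 = o(b^{4/5})\), and the `smoother` argument stands in for the subsample-wise kernel estimate of Sect. 4:

```python
import numpy as np

def nested_block_variance(y, t0, b, smoother, rng=None):
    """One subsampling step: smooth an outer block of length b to get
    residuals, then estimate the noise variance on an inner block of
    length b1 = floor(b^{3/4}), one admissible b1 = o(b^{4/5})."""
    rng = np.random.default_rng(rng)
    block = y[t0:t0 + b]
    eps_hat = block - smoother(block)       # residuals from the outer block
    b1 = int(b ** 0.75)
    start = rng.integers(0, b - b1 + 1)     # inner block within the outer one
    inner = eps_hat[start:start + b1]
    return np.var(inner), b1
```

With a zero signal and i.i.d. noise of variance 0.25, the inner-block variance estimate is close to 0.25, up to the smoothing and sampling error of the \(b_1\) inner observations.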
Theorem 2
Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Let \(\hat{s}(t)\) be the estimate of s(t) on a subsample of length b. Let \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{4/5})\) and \(K \rightarrow \infty \), then \(\sup _x\left| \tilde{G}_{n,b_1}(x)-G(x)\right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).
Proof of Theorem 2 is given in the “Appendix”. In analogy with the results above, we also establish consistency for the quantiles of \(\tilde{G}(\cdot )\). Let \(\tilde{q}_{n,b_1}(\gamma _2)\) be the \(\gamma _2\)-quantile with respect to \(\tilde{G}(\cdot )\).
Corollary 1
Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Let \(\hat{s}(t)\) be the estimate of s(t) on a subsample of length b. Let \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{4/5})\) and \(K \rightarrow \infty \), then \(\tilde{q}_{n,b_1}(\gamma _2) {\mathop {\longrightarrow }\limits ^{\text {p}}}q(\gamma _2)\).
Proof of Corollary 1 is given in the “Appendix”.
Remark 3
Note that the use of a second subsample of length \(b_1\) is a consequence of the optimal rate for the subsample-wise estimation of s(t).
The next result states the consistency of subsample procedure when \(\hat{V}_n\) replaces \(V_n\) in \(\tilde{G}_{n,b}(x)\). Define
Corollary 2
Assume the same assumptions as in Theorem 2. Let \(\hat{s}(t)\) be the estimate of s(t) on a subsample of length b. Let \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{4/5})\) and \(K \rightarrow \infty \), then \(\sup _x\left| \tilde{G}_{n,b_1}^0(x)-G(x)\right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).
Proof of Corollary 2 is given in the “Appendix”.
Remark 4
Following the same arguments as in the proof of Corollary 2, we have that \(V_n-\sigma _{\varepsilon }^2=O_p(\tau _n^{-1})\) and also \(\hat{V}_n-\sigma _{\varepsilon }^2=O_p(\tau _n^{-1})\) in the cases of SRD and LRD with \(5/8\le \gamma _1\le 1\). Moreover, in the proof of Theorem 2 the key point for the consistency is \(\tau _{b_1}\Lambda _b^{-4/5}\rightarrow 0\) when \(n\rightarrow \infty \). Therefore, if we set \(b_1=b\) the results of Theorem 2 and Corollary 2 hold again in the cases of SRD and LRD with \(5/8\le \gamma _1\le 1\). In fact, \(\tau _b\Lambda _b^{-4/5}\rightarrow 0\) and \(\tau _b\tau _n^{-1}\rightarrow 0\) when \(n\rightarrow \infty \).
Now, by using the previous results, we can state that the subsample strategy consistently estimates the asymptotic distribution of \(\tau _m(\widehat{SNR}-SNR)\), where \(\widehat{SNR}\) is defined in (4). The numerator and denominator of \(\widehat{SNR}\) depend on n and m, respectively, and this is mimicked in the subsample procedure: a subsample of length b is used for the estimation of the signal power, while a subsample of length \(b_1\) is used to estimate the variance of the error term.
Theorem 3
Let \(\mathscr {Y}_n:=\{ y_1, y_2, \ldots , y_n\}\) be a sampling realization of \( \{Y_i\}_{i \in \mathbb {Z}}\). Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Assume \(n\rightarrow \infty \), \(b\rightarrow \infty \), \(b/n \rightarrow 0\), \(b_1=o(b^{2/5})\), \(m=o(n^{2/5})\), \(b_1/m\rightarrow 0\) and \(K \rightarrow \infty \), then
where
and \(\mathbb {Q}(x)\) is the asymptotic distribution of \(\tau _m(\widehat{SNR}-SNR)\).
Proof of Theorem 3 is given in the “Appendix”. Note that in Theorem 3 we need \(b_1=o(b^{2/5})\) instead of the \(b_1=o(b^{4/5})\) required by the previous results. The reason is that the statistical functional \(\widehat{SNR}\) is more complex than \(\hat{V}_n\), and a different relative speed for the secondary block size \(b_1\) is required.
Theorem 3 provides the theoretical justification for the consistency of the subsample procedure with respect to the statistic \(\widehat{SNR}\). Let \(q^{Q}(\gamma _2)\) and \(\tilde{q}^{Q}_{n,b_1}(\gamma _2)\equiv \tilde{q}^{Q}_{n,b_1}(\gamma _2|\tau _{b_1})\) be the quantiles with respect to \(\mathbb {Q}(x)\) and \(\mathbb {Q}_n(x)\), respectively. We write \(\tilde{q}^{Q}_{n,b_1}(\gamma _2|\tau _{b_1})\) to highlight the dependence on the scaling factor \(\tau _{b_1}\), as in Sect. 8 of Politis et al. (1999). The main goal is to do inference for SNR without estimating the long memory parameter, and without using the sample statistic \(\widehat{SNR}\); in this way, we do not need to fix or estimate m. To do this, we use Lemma 8.2.1 in Politis et al. (1999). First, \(\mathbb {Q}(x)\) always has a strictly positive density function, at least on a subset of the real line (see Hosking (1996) and references therein). So, by Lemma 8.2.1 in Politis et al. (1999), and using the same arguments as in the proof of Corollary 1, we have that
Following the same lines as in Sect. 8 of Politis et al. (1999), we have that
Note that \(\tilde{q}^{Q}_{n,b_1}(\gamma _2|1)\) is the quantile with respect to the empirical distribution function \(1/K\sum _{i=1}^K\mathbb {I}\left( \widehat{SNR}_{n,b,I_i}\le \frac{x}{\tau _{b_1}}+\widehat{SNR}\right) \). Therefore, by (14) and (15) it follows that
Since \(\widehat{SNR}=SNR+O_p(\tau _m^{-1})\) and \(\tau _{b_1}/\tau _{m}\rightarrow 0\) when \(n\rightarrow \infty \), we have that
Therefore, a confidence interval for SNR with a nominal level of \(\gamma _2\) is given by
It is possible to consider the methods of self-normalization as in Jach et al. (2012), or the estimation of the scaling factor \(\tau \) as in Politis et al. (1999). These methods would lead to more efficient confidence bands, in the sense that they would be first-order correct with a rate of \(\tau _m^{-1}\) instead of \(\tau _{b_1}^{-1}\). However, this would require the estimation of unknown constants as in Jach et al. (2012).
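By the results above, the interval in (16) can be computed from the K subsample statistics alone. The following sketch assumes the equal-tailed form based on empirical subsample quantiles; the function name and the toy inputs are illustrative, not taken from the paper.

```python
import numpy as np

def snr_confidence_interval(subsample_snrs, level=0.90):
    """Equal-tailed interval from the empirical quantiles of the K
    subsample SNR statistics; no full-sample statistic and no
    scaling-factor estimate is needed (assumed form, see lead-in)."""
    alpha = 1.0 - level
    lo, hi = np.quantile(subsample_snrs, [alpha / 2.0, 1.0 - alpha / 2.0])
    return lo, hi

# Toy usage with K = 200 artificial subsample SNR values (in dB).
rng = np.random.default_rng(1)
subsample_snrs = rng.normal(loc=6.0, scale=1.5, size=200)
lo, hi = snr_confidence_interval(subsample_snrs, level=0.90)
```

The point of the construction is that the quantiles come from the subsample replicates only, which is what makes the procedure feasible for very large n.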
Remark 5
In Theorem 2, Corollary 2 and Theorem 3 the definitions of \(\tilde{G}(\cdot )\), \(\tilde{G}^0(\cdot )\) and \(\mathbb {Q}_n(\cdot )\) depend on quantities (\(V_n\), \(\hat{V}_n\) and \(\widehat{\text {SNR}}\)) computed on the whole sample. On the other hand, these results provide the theoretical framework for computing confidence intervals as in (16), and this does not require any computation on the entire sample. In other words, \(V_n\), \(\hat{V}_n\) and \(\widehat{SNR}\) are needed to center the involved distributions, but not to approximate the quantiles as in (16). Therefore, in this work these quantities only play a theoretical role, showing that the subsample procedure does not produce degenerate asymptotic distributions.
6 Numerical experiments
In this section we present numerical experiments on simulated data. The assumptions given in this paper are rather general, and it is not possible to design a computer experiment that can be considered representative of all the kinds of structures consistent with A1–A4. Here we assess the performance of Algorithm 1 under different scenarios for the structure of the noise term. In order to do this we keep the structure of the true signal fixed, and we investigate three variations of the noise data generating process. Data are sampled at a fixed sampling frequency Fs = 44100 Hz, a common value in audio applications. Let [0, T] be the data acquisition time interval, where T is the duration of the simulated signal in seconds. The signal is sampled at time points \(t_1, t_2, \ldots \), with \(t_i = (i-1) / \text {Fs}\) for \(i=1,2,\ldots , T{\times }\text {Fs}\), as follows
and \(t_i = i/({T{\times }\text {Fs}})\). Therefore, the signal consists of a sinusoidal wave that produces energy at 50Hz. The signal power is equal to \(A_s^2/2\), where \(A_s\) is a scaling constant properly tuned to achieve a given true SNR. We set \(T=30\text {sec}\) (implying \(n=1,323,000\)), and we consider the following three cases for the noise.
- AR: The noise is generated from an AR(1) process with independent normal innovations. This produces serial correlation in the error term and represents a case of SRD. In particular \(\varepsilon _i = -0.7\varepsilon _{i-1} + u_i\), where \(\{u_i\}\) is an i.i.d. sequence with distribution \(\text {Normal}(0, A_\varepsilon )\), and \(A_{\varepsilon }\) is set to achieve a given SNR.
- P1: The random sequence \(\{\varepsilon _i \}\) has power spectrum equal to \(P(f) = A_{\varepsilon } / f^{\beta }\), where P(f) is the power spectral density at frequency f Hz. Here \(\beta =0.2\), which induces moderate LRD in \(\{\varepsilon _i\}\). The scaling constant \(A_\varepsilon \) is set to achieve the desired SNR.
- P2: Same as P1 but with \(\beta =0.6\). This design introduces much stronger LRD.
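The tuning of the amplitude \(A_s\) described above can be sketched as follows. White Gaussian noise replaces the AR and power-law designs purely for illustration, and the specific sinusoid form is an assumption, since the signal equation is not reproduced here.

```python
import numpy as np

def sine_plus_noise(snr_db, noise_var, T, fs=44100, f0=50.0, seed=None):
    """Generate a 50 Hz sinusoid plus white noise with the amplitude A_s
    chosen so that the signal power A_s^2 / 2 equals 10^(snr_db/10)
    times the noise variance (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = int(T * fs)
    t = np.arange(1, n + 1) / fs                      # time in seconds
    a_s = np.sqrt(2.0 * noise_var * 10.0 ** (snr_db / 10.0))
    signal = a_s * np.sin(2.0 * np.pi * f0 * t)
    noise = rng.normal(0.0, np.sqrt(noise_var), size=n)
    return signal + noise, a_s

y, a_s = sine_plus_noise(snr_db=6.0, noise_var=1.0, T=0.5)
# at 6 dB the signal power is roughly four times the noise variance
ratio = (a_s ** 2 / 2.0) / 1.0
```

This makes concrete the remark below that at 6 dB the signal power is only about four times the noise variance.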
In P1 and P2 the noise follows a so-called \(1/f^\beta \) “power law”, where \(\beta \) controls the amount of long-range dependence. Larger values of \(\beta \) imply a slower rate of decay for the serial correlations. For \(\beta =1\) pink noise is obtained, while values of \(\beta \in [0,1]\) give a behavior between white noise and pink noise. In the case P1, \(\gamma _1=1-\beta =0.8\) in A2-LRD, so the asymptotic distribution of \(\tau _n(V_n-\sigma _{\varepsilon }^2)\) is Normal. In the case P2, \(\gamma _1=1-\beta =0.4\), which implies that the asymptotic distribution of \(\tau _n(V_n-\sigma _{\varepsilon }^2)\) is not Normal (see Remark 2).
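A minimal frequency-domain generator of \(1/f^\beta \) noise, in the spirit of Timmer and König (1995), can be sketched as follows. This simplified recipe (names and normalization are our own) is not the tuneR implementation used in the experiments.

```python
import numpy as np

def power_law_noise(n, beta, seed=None):
    """Draw Gaussian Fourier coefficients, scale them by
    sqrt(P(f)) = f^(-beta/2), and inverse-transform to obtain a
    real-valued series with an approximate 1/f^beta spectrum."""
    rng = np.random.default_rng(seed)
    freqs = np.fft.rfftfreq(n, d=1.0)
    scale = np.zeros_like(freqs)
    scale[1:] = freqs[1:] ** (-beta / 2.0)    # DC term left at zero
    spectrum = (rng.normal(size=freqs.size) +
                1j * rng.normal(size=freqs.size)) * scale
    x = np.fft.irfft(spectrum, n=n)
    return x / np.std(x)                      # normalize to unit variance

eps = power_law_noise(2 ** 14, beta=0.6, seed=2)
```

With `beta=0.2` and `beta=0.6` this reproduces the flavor of the P1 and P2 designs, respectively.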
P1 and P2 are simulated with the algorithm of Timmer and König (1995) implemented in the tuneR software of Ligges et al. (2016). For each of the three sampling designs we consider two values for the true SNR: 10 dB and 6 dB. In most applications an SNR of 6 dB is considered a rather noisy situation. We recall that at 6 dB the signal power is only about four times the variance of the noise, while at 10 dB the signal power is ten times the noise variance. These designs contain a challenging aspect: the case with P2 noise and SNR = 6 dB is particularly difficult for our method. In fact, P2 puts a relatively large amount of variance (power) at low frequencies around 50 Hz, so that the signal is not well separated from some spectral components of the noise. The two parameters of Algorithm 1 are b and K. We consider three settings for the subsample window: \(b=10\text {ms}=441\) samples, \(b=15\text {ms}=662\) samples, and b estimated with the method proposed in Götze and Račkauskas (2001). In the latter case the optimal b is computed over a grid ranging from \(b=2\) ms to \(b=20\) ms. In many applications it is not easy to fix a value for b. However, in certain situations researchers have an idea about the structure of the signal, and the time series is windowed with blocks of a certain length. In applications where the underlying signal is expected to be composed of harmonic components, the usual practice is to take blocks of size approximately equal to the period of the harmonic component with the lowest expected frequency. The rationale is to take the smallest window size such that each block is still expected to carry some information about the low-frequency components. For example, for speech data blocks of 10 ms are typically considered (Haykin and Kosko 2001), whereas for music data 50 ms is a common choice (Weihs et al. 2016).
Note that the artificial data here have a harmonic component at 50 Hz with a period of 20 ms, and we consider the fixed alternatives \(b=10\) ms and \(b=15\) ms as a robustness check.
We set \(K=200\); larger values of K would of course reduce the subsample-induced variability. The secondary window length \(b_1\), needed to estimate the distribution of the sampling variance, is set according to Theorem 2 by taking \(b_1 = [b^{2/5}]\). For each combination of noise type, SNR, and b we considered 500 Monte Carlo replications, and we computed statistics to assess the performance of the procedure. Two aspects of the method are investigated, corresponding to the two main contributions of the paper.
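The mapping from window lengths in milliseconds to the block sizes b and \(b_1\) can be sketched as follows (a simple helper of our own, not part of Algorithm 1):

```python
fs = 44100  # sampling frequency in Hz

def window_params(b_ms, fs=fs):
    """Primary block size b in samples for a window of b_ms milliseconds,
    and secondary block size b1 = floor(b^(2/5)) used to estimate the
    distribution of the sampling variance."""
    b = int(round(b_ms * 1e-3 * fs))
    b1 = int(b ** 0.4)
    return b, b1

b, b1 = window_params(10)   # 10 ms window -> b = 441 samples
```

For the 10 ms window this gives b = 441 and a very small secondary block, which is what keeps the per-subsample computations cheap.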
The first contribution of the paper is Theorem 1, where optimality and consistency of the Priestley-Chao kernel estimator are established under rather general assumptions on the error term. The kernel smoothing is used in Algorithm 1 to estimate the signal power in the numerator of (8). In Table 1 we report the Monte Carlo averages of the Mean Square Error (MSE) of the estimated signal power. Going from the simplest AR model to the more complex P2 noise model there is an increase in MSE, as expected. The longer \(b=15\) ms subsample window always produced better results. An apparently counterintuitive finding is that with a larger amount of noise (lower SNR) the signal power is estimated slightly better. To understand this, note that the noise (in all three cases) produces most of its power in a low-frequency region containing the signal frequency (i.e. 50 Hz). In the lower-noise case there is still a considerable amount of noise acting at low frequencies that the adaptive nature of the kernel smoother is not able to recognize properly. In Table 2 we report the estimated average b with its Monte Carlo standard error. The estimated b is always near 10 ms, and the latter produced results only slightly worse than those obtained with fixed \(b=15\) ms.
The second contribution of the paper is the consistency result (see Theorem 3 and related results) for the distribution of the SNR statistic. In order to measure the quality of the method one needs to define the ground truth in terms of the sampling distribution of the target SNR statistic. The derivation of an expression for such a distribution would be analytically intractable. Therefore, we computed the quantiles of the true SNR statistic by Monte Carlo integration, and in Table 3 we report the average absolute differences between the estimated quantiles and their true counterparts. This comparison is meaningful because, by Corollary 1, convergence of the distribution of the SNR statistic carries over to its quantiles. The comparison involves five quantile levels, so as to assess the behavior of the procedure both in the tails and in the center of the distribution. The average deviations in Table 3 are expressed in decibels. Overall the method captures the center of the distribution rather well in all cases. The estimation error increases in the tails of the distribution, as one would expect, and the right tail is estimated better than the left tail. In all cases the tails of the SNR distribution are better captured with a \(b=15\) ms window, although in the center of the distribution the differences implied by different values of b are much smaller. Going from SNR \(=\) 6 to SNR \(=\) 10 the results clearly improve in the left tail of the distribution, especially in the case P2. Again, the estimated version of b pushes the corresponding results towards the \(b=10\) ms case.
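The quantile-based performance measure can be sketched as follows; the five levels and the helper name are illustrative assumptions, not necessarily those used in Table 3.

```python
import numpy as np

def quantile_abs_dev(estimated, truth,
                     levels=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Absolute differences between estimated and reference quantiles
    at a few levels (in the units of the statistic, e.g. dB)."""
    q_est = np.quantile(estimated, levels)
    q_ref = np.quantile(truth, levels)
    return np.abs(q_est - q_ref)

# Toy usage: compare two Monte Carlo samples from the same law.
rng = np.random.default_rng(3)
dev = quantile_abs_dev(rng.normal(0, 1, 5000), rng.normal(0, 1, 5000))
```

Averaging such deviations over Monte Carlo replications yields a table of the same shape as Table 3, with larger entries expected in the tail levels.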
Every method has its own tunings, and the evidence here is that b has some effect on the proposed method, mainly on the tails of the SNR distribution. The selection of b based on the method of Götze and Račkauskas (2001) delivers a satisfactory solution that does not require any prior knowledge about the data structure. The only drawback of estimating b is that the overall algorithm needs to be executed for several candidate values of b. As a final remark, we stress that the method proposed here is designed to cope with much larger values of n. In this experiment the sampling is repeated a number of times to produce Monte Carlo estimates, and therefore we had to choose an n compatible with reasonable computing times on the available hardware. A limited number of trials with T up to several minutes (which implies that n grows up to several millions) were successfully run without changes in the final results. Therefore, we can conclude that the algorithm scales well with the sample size.
7 Application to EEG data
In this section we illustrate an application of the proposed methodology to electroencephalography (EEG) data obtained from the PhysioNet repository (Goldberger et al. 2000). In particular we considered the “CHB-MIT Scalp EEG Database” available at https://physionet.org/pn6/chbmit/. The database contains EEG traces recorded at the Children’s Hospital Boston on pediatric subjects with intractable seizures. Subjects were monitored for several days after the withdrawal of anti-seizure medication, before the final decision about surgical intervention. During the experiment 22 subjects were monitored for several days using the international 10–20 EEG system, a standard that specifies electrode positions and nomenclature. For each subject 21 electrodes were placed at certain positions on the scalp, and each electrode produced an electric signal sampled at 256 Hz and measured with 16-bit precision. This means that each day (24 h) the EEG machine produced 21 time series, each containing \(n = 22,118,400\) data points, for a total of 464,486,400 amplitude measurements per subject. A description of the “CHB-MIT Scalp EEG Database”, as well as details about the data acquisition, is given in Shoeb (2009).
EEG signals have complex structures. Various sources of noise can be injected into the measurement chain, and therefore it is always of interest to understand the behavior of the SNR. For this application we considered data for the first three subjects of the database, and two electrode positions, labeled P8-02 and T8-P8 in the 10–20 EEG system. The P8-02 electrode is placed on the parietal lobe, responsible for integrating sensory information of various types. The T8-P8 electrode is placed on the temporal lobe, which transforms sensory inputs into meanings retained as visual memory, language comprehension, and emotional association. An example of these traces is given in Fig. 1.
The method proposed here was applied to obtain confidence intervals for the SNR. An SNR \(\ge 10\) dB can be considered a requirement for a favorable noise floor in these applications. In order to assess the robustness of the procedure with respect to the choice of the subsampling window b, for each case we considered windows of fixed size \( b = \{3\text {sec},5\text {sec},7\text {sec}\}\), which means \(b=\{768, 1280, 1792\}\) samples, plus a b estimated with the method proposed by Götze and Račkauskas (2001). The estimation of b is performed on a grid of equispaced points between 2 sec and 10 sec. The literature about EEG signals does not indicate whether the processes involved have a clear time scale, but 5 sec is considered approximately the time length needed to identify interesting cerebral activities. For each b the corresponding \(b_1\) is set according to \(b_1 = [b^{2/5}]\), as in the numerical experiments. In Table 4 lower and upper limits of 90% and 95% confidence intervals for the SNR are reported.
Overall the results with estimated b are comparable to those with fixed b. The upper limits of these confidence intervals are never smaller than 10 dB. The lower limits are negative in all cases, which means that there is always a chance that the power of the stochastic component dominates that of the deterministic component in model (1). While the upper limit of these confidence intervals is rather stable across units for the same value of b, larger differences are observed in the lower limits. All this is a clear indication of the asymmetry of the distribution of the SNR statistic. This is expected, since the two tails of the SNR statistic reflect two distinct mechanisms. In fact, a negative value of the SNR statistic (left tail) corresponds to situations where the dynamic of the observed time series is driven by the error term of equation (1). On the other hand, a positive value of the SNR statistic (right tail) corresponds to situations where the dynamic is driven by the smooth changes induced by \(s(\cdot )\). Ceteris paribus, going from the 90% level to 95% does not change the results dramatically. Note that in this kind of application a 3 dB difference is not considered large. For the data recorded at position P8-02, the length of the confidence interval, going from 90% to 95%, changes between 1.28 dB and 3.6 dB, with the maximum variation measured for Subject 3 when \(b=7\) sec. For the T8-P8 case the length of the confidence interval, going from 90% to 95%, changes between 1.52 dB and 3.83 dB, with the maximum variation measured for Subject 2 when \(b=7\) sec.
Some patterns are observed across experimental units. For a given confidence level and b, Subject 1 reports the shortest confidence intervals overall. Subject 2 reports the longest intervals for records in position P8-02, and Subject 3 the longest intervals in position T8-P8. The variations across values of b, all else being equal, are not dramatic. The settings \(b=3\) sec and \(b=5\) sec produced longer intervals compared with \(b=7\) sec and the optimal b. The data-driven method of Götze and Račkauskas (2001) produced an estimated b in the range [3 sec, 7 sec] for the P8-02 data, and [6 sec, 8 sec] for the T8-P8 data. These values are comparable with the rule of thumb that 5 sec is a reasonable time scale for the kind of signals involved here. The general conclusion is that, in the absence of relevant information, the method of Götze and Račkauskas (2001) gives a useful data-driven choice of b.
8 Conclusions and final remarks
In this paper we developed a method that consistently estimates the distribution of an SNR statistic in the context of time series data with errors belonging to a rich class of stochastic processes. We restricted the model to the case where the signal is a smooth function of time, but the theory developed here can be easily adapted to more general time series additive regression models. The reference model for the observed data, and the theory developed here, adapt to many possible applications that will be the object of a distinct paper; in this work we concentrated on the theoretical guarantees of the proposed method. The estimation is based on a random subsampling algorithm that can cope with massive sample sizes. Both the smoothing and the subsampling techniques at the heart of Algorithm 1 embody original innovations compared to the existing literature on the subject. The numerical experiments described in Sect. 6 showed that the proposed algorithm performs well in finite samples.
References
Altman NS (1990) Kernel smoothing of data with correlated errors. J Am Stat Assoc 85(411):749–759
Brillinger DR, Irizarry RA (1998) An investigation of the second-and higher-order spectra of music. Sig Process 65(2):161–179
Conte E, Maio AD (2002) Adaptive radar detection of distributed targets in non-Gaussian noise. In: RADAR 2002. IEE
Coretto P, Giordano F (2017) Nonparametric estimation of the dynamic range of music signals. Aust N Z J Stat 59(4):389–412
Czanner G, Sarma SV, Ba D, Eden UT, Wu W, Eskandar E, Lim HH, Temereanca S, Suzuki WA, Brown EN (2015) Measuring the signal-to-noise ratio of a neuron. Proc Natl Acad Sci 112(23):7141–7146
Goldberger AL, Amaral LAN, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng C-K, Stanley HE (2000) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101(23):e215–e220
Götze F, Račkauskas A (2001) Adaptive choice of bootstrap sample sizes. In: State of the art in probability and statistics. Institute of Mathematical Statistics, pp 286–309
Gray R (1990) Quantization noise spectra. IEEE Trans Inf Theory 36(6):1220–1244
Hall P, Jing BY, Lahiri SN (1998) On the sampling window method for long-range dependent data. Stat Sin 8:1189–1204
Hall P, Lahiri SN, Polzehl J (1995) On bandwidth choice in nonparametric regression with both short- and long-range dependent errors. Ann Stat 23(6):1921–1936
Haykin SS, Kosko B (2001) Intelligent signal processing. Wiley-IEEE Press
Hosking JRM (1996) Asymptotic distributions of the sample mean, autocovariances, and autocorrelations of long-memory time series. J Econom 73:261–284
Jach A, McElroy T, Politis DN (2012) Subsampling inference for the mean of heavy-tailed long-memory time series. J Time Ser Anal 33:96–111
Kalogera V (2017) Too good to be true? Nat Astron 1(0112):1–4
Kay SM (1993) Fundamentals of statistical signal processing, volume 1. Estimation theory. Prentice Hall, Englewood Cliffs
Kemp B, Zwinderman A, Tuk B, Kamphuisen H, Oberye J (2000) Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG. IEEE Trans Biomed Eng 47(9):1185–1194
Kenig E, Cross MC (2014) Eliminating 1/f noise in oscillators. Phys Rev E 89(4):0429011–0429017
Kogan S (1996) Electronic noise and fluctuations in solids. Cambridge University Press, Cambridge
Levitin DJ, Chordia P, Menon V (2012) Musical rhythm spectra from Bach to Joplin obey a 1/f power law. Proc Natl Acad Sci 109(10):3716–3720
Ligges U, Krey S, Mersmann O, Schnackenberg S (2016) tuneR: analysis of music. CRAN
Loizou PC (2013) Speech enhancement: theory and practice, 2nd edn. CRC Press, Boca Raton
Parzen E (1966) Time series analysis for models of signal plus white noise. Technical report, Department of Statistics, Stanford University
Parzen E (1999) Stochastic processes (classics in applied mathematics). Society for Industrial and Applied Mathematics
Politis DN, Romano JP (1994) Large sample confidence regions based on subsamples under minimal assumptions. Ann Stat 22(4):2031–2050
Politis DN, Romano JP, Wolf M (1999) Subsampling. Springer, New York
Politis DN, Romano JP, Wolf M (2001) On the asymptotic theory of subsampling. Stat Sin 11(4):1105–1124
Priestley MB, Chao MT (1972) Nonparametric function fitting. J R Stat Soc 34:385–392
Richards MA (2014) Fundamentals of Radar signal processing, second edition (McGraw-Hill professional engineering). McGraw-Hill Education, New York
Romano JP (1989) Bootstrap and randomization tests of some nonparametric hypotheses. Ann Stat 17(1):141–159
Shoeb AH (2009) Application of machine learning to epileptic seizure onset detection and treatment. Ph.D. thesis, Massachusetts Institute of Technology
Timmer J, König M (1995) On generating power law noise. Astron Astrophys 300:707
Ullsperger M, Debener S (2010) Simultaneous EEG and fMRI: recording, analysis, and application. Oxford University Press, Oxford
Voss RF, Clarke J (1975) “1/f noise” in music and speech. Nature 258:317–318
Voss RF, Clarke J (1978) “1/f noise” in music: music from 1/f noise. J Acoust Soc Am 63:258
Weihs C, Jannach D, Vatolkin I, Rudolph G (2016) Music data analysis: foundations and applications. Chapman and Hall/CRC, New York
Weinberg G (2017) Radar detection theory of sliding window processes. CRC Press, Boca Raton
Weissman MB (1988) 1/f noise and other slow, nonexponential kinetics in condensed matter. Rev Mod Phys 60(2):537–571
Acknowledgements
We thank the editor and two anonymous reviewers for their constructive comments, which helped to improve the manuscript.
Appendix
In this section we report the proofs of the statements and some useful technical lemmas. First, we state a lemma to evaluate \(\text {MISE}(\hat{s};h)\).
Lemma 1
Assume A1, A2 and A3. For \(t\in I_h=(h,1-h)\)
where \(\hat{s}\) is the kernel estimator in (9), \(R(s'')=\int _{I_h}[s''(t)]^2dt\), \(d_K=\int u^2\mathscr {K}(u)du\), \(N_K=\int \mathscr {K}^2(u)du\), \(\sigma ^2_{\varepsilon }={{\,\mathrm{E}\,}}[\varepsilon _t^2]\), \(\Lambda _n\) is defined in (10) and
Proof
By A3 it follows that conditions A–C of Altman (1990) are satisfied. Now, let
For the cases SRD and LRD with \(\gamma _1=1\) the conditions D and E of Altman (1990) are still satisfied with \(\rho _n(j)\). Following the same arguments as in the proof of Theorem 1 of Altman (1990) the result follows. Finally, in the last case, \(\rho _n(j)\) satisfies condition D but not condition E of Altman (1990). So, we have
Therefore, using Lemma A.4 in Altman (1990), it follows that
The latter completes the proof. \(\square \)
The \(\text {AMISE}(\hat{s};h)\) is the asymptotic MISE, that is, the leading part of the MISE. Note that Lemma 1 gives a formula similar to (2.8) in Theorem 2.1 of Hall et al. (1995). However, differently from Hall et al. (1995), our approach does not need to introduce an additional parameter to capture SRD and LRD. Also notice that taking \(h\in H\) as in A3 implies that \(\text {MISE}(\hat{s};h)=O\left( \Lambda _n^{-4/5}\right) \), which means that the kernel estimator achieves the global optimal rate.
Proof of Theorem 1
Lemma 1 holds under A1, A2 and A3. Let \(\hat{\gamma }(j)=\frac{1}{n}\sum _{t=1}^{n-j}\hat{\varepsilon }_t\hat{\varepsilon }_{t+j}\) be the estimator of the autocovariance \(\gamma (j)\) with \(j=0,1,\ldots \). By A3, \(r_n=\frac{1}{\Lambda _nh}+h^4=\Lambda _n^{-4/5}\), and by the Markov inequality
for some \(\eta >0\) and when \(n\rightarrow \infty \).
It means that \(\frac{1}{n}\sum _{i=1}^{n}\left( s(i/n)-\hat{s}(i/n)\right) ^2=\text {AMISE}(\hat{s};h)+o_p(r_n)\). Rewrite \(\hat{\gamma }(j)\) as
By (18) and the Cauchy–Schwarz inequality it results that term I\(=O_p(r_n)\) in \(\hat{\gamma }(j)\). Consider term III in (19). Without loss of generality assume that \(s(t)\not = 0\). By the Chebyshev inequality
for some \(\eta >0\). By using the same arguments as in the proof of Lemma 1, it follows that \(MSE(\hat{s};h)=O\left( r_n\right) \) so that \(\hat{s}(t)=s(t)(1+O_p(r_n^{1/2}))\). Therefore, it is sufficient to investigate the behaviour of
\(\sum _j^n\rho (j)=O(\log n)\) under LRD with \(\gamma _1=1\), and \(\sum _j^n\rho (j)=O(n^{1-\gamma _1})\) under LRD with \(0<\gamma _1<1\). By A1, and applying the Chebyshev inequality, it follows that III\(=O_p(\Lambda _n^{-1/2})\). Based on similar arguments, term II\(=O_p(\Lambda _n^{-1/2})\). Now consider the last term of (19), and notice that it is the series of products of autocovariances. Theorem 3 in Hosking (1996) is used to conclude that the series is convergent under SRD and LRD with \(1/2<\gamma _1\le 1\), while it is divergent under LRD with \(0<\gamma _1\le 1/2\). Based on this, a direct application of the Chebyshev inequality to term IV implies that IV\(=o_p(\Lambda _n^{-1/2})\). Then \(\hat{\gamma }(j)=\gamma (j) + O_p(r_n) +O_p(\Lambda _n^{-1/2}) +O_p(j/n)\), where the \(O_p(j/n)\) term is due to the bias of \(\hat{\gamma }(j)\). This means that \(\hat{\rho }(j) = \rho (j) + O_p(r_n)+ O_p(\Lambda _n^{-1/2})+O_p(j/n)\). Since \(\mathscr {K}(\cdot )\) is bounded, one can write
Using A4 and \(h=O(\Lambda _n^{-1/5})\), A3 implies that
Consider
and by (20) it follows that
which implies that \(Q_1=o_p(r_n)\). It means that the CV function, as defined in (22) of Altman (1990) with the estimated correlation function, has an error rate of \(o_p(r_n)\) with respect to
Now, we can apply the classical bias correction and based on (14) in Altman (1990), we have that
Since \(\text {AMISE}(\hat{s};h)=O(r_n)\), it follows that \(\hat{h}\), the minimizer of \(\text {CV}(h)\), is equal to \(h^\star \), the minimizer of \(\text {MISE}(\hat{s};h)\), asymptotically in probability. By Lemma 1, it follows that \(h^\star \) is the same minimizer with respect to \(\text {AMISE}(\hat{s};h)\) asymptotically. \(\square \)
The subsequent Lemmas are needed to show Theorem 2 and Corollary 1.
Lemma 2
Assume A2. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Then \(n\rightarrow \infty \), \(b\rightarrow \infty \) and \(b/n \rightarrow 0\) implies \(\sup _x\left| G_{n,b}(x)-G(x)\right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\), and \(q_{n,b}(\gamma _2) {\mathop {\longrightarrow }\limits ^{\text {p}}}q(\gamma _2)\) for all \(\gamma _2 \in (0,1)\).
Proof
Under A2-SRD, Theorems 4.1 and 5.1 of Politis et al. (2001) hold and the results follow. The rest of the proof deals with the LRD case. Since G(x) is continuous (Hosking 1996), we follow the proof of Theorem 4 of Jach et al. (2012). Define \(G_{n,b}^0(x)=\frac{1}{N}\sum _{i=1}^N \mathbb {I}\left\{ \tau _b\left( V_{n,b,i}-\sigma _{\varepsilon }^2\right) \le x\right\} \) with \(N=n-b+1\). It is sufficient to show that \({{\,\mathrm{Var}\,}}[G_{n,b}^0(x)] \rightarrow 0\) as \(n\rightarrow \infty \). Apply Theorem 2 in Hosking (1996) to conclude that \(\tau _n\left( V_n-\sigma _{\varepsilon }^2\right) \) has the same distribution as \(\tau _n\left( V_n^{1}\right) \), where \(V_n^{1}=\frac{1}{n}\sum _{i=1}^n(\varepsilon _i^2-\sigma _{\varepsilon }^2)\). Therefore, we have to show that \({{\,\mathrm{Var}\,}}[G^1_{n,b}(x)] \rightarrow 0\) as \(n \rightarrow \infty \), where
Using the stationarity of \( \{\varepsilon _i\}_{i \in \mathbb {Z}}\), it follows that \({{\,\mathrm{Var}\,}}[G_{n,b}^1(x)] = {{\,\mathrm{E}\,}}[(G_{n,b}^1(x)-G_b^1(x))^2]\), where \(G_b^1(x)=P\left( \tau _bV_b^1\le x\right) \). By Hall et al. (1998) the Hermite rank of the square function is 2. Then, based on the same arguments as in the proof of Theorem 2.2 of Hall et al. (1998) with \(q=2\), we can write
Consider
After some algebra, we obtain
where for \(k=1,2,\ldots \), \(\phi _2(k)\) are the autocovariances of \(\{\varepsilon _t^2\}_{t_\in \mathbb {Z}}\). For \(k \rightarrow \infty \), A2-LRD with \(0<\gamma _1\le 1\) implies that \(\phi _2(k)=O(k^{-2\gamma _1})\) by Theorem 3 of Hosking (1996). Take (22) and note that
where
The latter implies that for \(n\rightarrow \infty \), (22) converges to zero. Therefore, \(\tau _bV_{n,b,1}^1\) and \(\tau _bV_{n,b,N}^1\) are asymptotically independent. This can be argued based on asymptotic normality when \(1/2 \le \gamma _1 \le 1\). For the case \(0< \gamma _1 < 1/2\) the asymptotic independence can be obtained by using Theorem 2.3 of Hall et al. (1998). Thus, the right-hand side of (21) converges to zero as \(n\rightarrow \infty \) by the Cesàro theorem. This shows that \(\sup _x\left| G_{n,b}(x)-G(x)\right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).
Following the same arguments as in Theorem 5.1 of Politis et al. (2001), and by using the first part of this proof one shows that \(q_{n,b}(\gamma _2){\mathop {\longrightarrow }\limits ^{p}}q(\gamma _2)\). The latter completes the proof. \(\square \)
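The subsampling distribution \(G_{n,b}^0(x)\) used above lends itself to a direct numerical check. Below is a minimal sketch, assuming \(V_{n,b,i}\) is the block mean of the squared errors, \(\tau _b=\sqrt{b}\) (the SRD rate), and centring at the full-sample estimate in place of the unknown \(\sigma _{\varepsilon }^2\); the function name is ours.

```python
import numpy as np

def subsample_ecdf(eps, b, x, tau=np.sqrt):
    """Sliding-block analogue of G^0_{n,b}(x): the fraction of the
    N = n - b + 1 overlapping blocks whose centred, tau_b-scaled
    block statistic falls at or below x."""
    sq = eps ** 2
    # block means of squared errors via one cumulative-sum pass
    cs = np.concatenate(([0.0], np.cumsum(sq)))
    V = (cs[b:] - cs[:-b]) / b      # V_{n,b,i}, i = 1, ..., n - b + 1
    sigma2 = sq.mean()              # full-sample variance estimate
    return np.mean(tau(b) * (V - sigma2) <= x)

rng = np.random.default_rng(0)
eps = rng.normal(size=10_000)
print(subsample_ecdf(eps, b=200, x=0.0))  # roughly 0.5 for symmetric noise
```

All \(N\) overlapping blocks are scanned here; the Monte Carlo step of the paper replaces this full scan with a random subset of blocks.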
Lemma 3
Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Let \(\hat{s}(t)\) be the estimate of s(t) computed on the entire sample (of length n). Then, as \(n \rightarrow \infty \) with \(b=o(n^{4/5})\), \(\sup _x \left| \hat{G}_{n,b}(x)-G(x) \right| {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).
Proof
Denote \(r_n=\frac{1}{\Lambda _nh}+h^4\). By Lemma 1 and A3, \(r_n=\Lambda _n^{-\frac{4}{5}}\). Since \(\hat{s}(t)\) is computed on the whole time series, by Lemma 2 we can use the same approach as in part (i) of Lemma 1 of Coretto and Giordano (2017). It only remains to verify that \(\tau _b r_n \rightarrow 0\) as \(n \rightarrow \infty \), which holds whenever \(b=o(n^{4/5})\). \(\square \)
Lemma 4
Assume A1, A2, A3 and A4. Suppose that \(\{a_t\}\), in A2, is Normally distributed when \(0<\gamma _1\le 1/2\). Let \(\hat{s}(t)\) be the estimate of s(t) computed on the entire sample (of length n). Then, as \(n \rightarrow \infty \) with \(b=o(n^{4/5})\), \(\hat{q}_{n,b}(\gamma _2) {\mathop {\longrightarrow }\limits ^{\text {p}}}q(\gamma _2)\) for any \(\gamma _2 \in (0,1)\).
Proof
Using the same arguments as in Lemma 3 we have that \(\hat{G}_{n,b}(x)-G_{n,b}(x) = o_p(1)\) at each point x. By the continuity of G(x) at all x, \(q_{n,b}(\gamma _2) {\mathop {\longrightarrow }\limits ^{\text {p}}}q(\gamma _2)\) by Lemma 2, and therefore \(\hat{q}_{n,b}(\gamma _2) {\mathop {\longrightarrow }\limits ^{\text {p}}}q(\gamma _2)\). Note that the assumption \(b=o(n^{4/5})\) is needed to deal with A2-LRD; under A2-SRD alone, \(b=o(n)\) would suffice. \(\square \)
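Computationally, Lemma 4 says the plug-in quantile \(\hat{q}_{n,b}(\gamma _2)\) is the empirical \(\gamma _2\)-quantile of the centred, scaled block statistics. A minimal sketch, assuming \(\tau _b=\sqrt{b}\) (the SRD rate) and block means of squared errors centred at the full-sample estimate; the function name is ours.

```python
import numpy as np

def subsample_quantile(eps, b, gamma2):
    """Empirical gamma2-quantile of tau_b (V_{n,b,i} - V_n) over all
    N = n - b + 1 overlapping blocks: the plug-in quantile of Lemma 4."""
    sq = eps ** 2
    cs = np.concatenate(([0.0], np.cumsum(sq)))
    V = (cs[b:] - cs[:-b]) / b            # block means V_{n,b,i}
    stats = np.sqrt(b) * (V - sq.mean())  # centred at the full-sample estimate
    return np.quantile(stats, gamma2)

rng = np.random.default_rng(2)
eps = rng.normal(size=20_000)
print(subsample_quantile(eps, b=400, gamma2=0.95))
```

Such a quantile is the ingredient of one-sided confidence bounds for the variance of the error term.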
Proof of Theorem 2
Let \(P^*(X)\) and \({{\,\mathrm{E}\,}}^*(X)\) be the conditional probability and the conditional expectation of a random variable X with respect to a set \(\chi = \left\{ Y_1,\ldots ,Y_n\right\} \). Let \(\hat{G}_{n,b_1}^b(x)\) be the same as \(\hat{G}_{n,b}(x)\), but now \(\hat{s}(t)\) is estimated on each subsample of length b, and the variance of the error term is computed on the same subsample of length \(b_1 < b\). Without loss of generality, we consider the first observation with \(t=1\) as in Algorithm 1. Then,
using Lemma 1 as in the proof of Lemma 3. Let \(b_1=o(b^{4/5})\).
Let \(Z_i(x)=\mathbb {I}\left\{ \tau _{b_1}\left( \hat{V}_{n,b_1,i}-V_n\right) \le x\right\} \) and \(Z_i^*(x)=\mathbb {I}\left\{ \tau _{b_1}\left( \hat{V}_{n,b_1,I_i}-V_n\right) \le x\right\} \), where \(I_i\) is a uniform random variable on \(I=\left\{ 1,2,\ldots ,n-b+1\right\} \), so that \(P(Z_i^*(x)=Z_i(x)\mid \chi )=\frac{1}{n-b+1}\) for all \(i\) at each x. Writing \(\tilde{G}_{n,b_1}(x)=\frac{1}{K}\sum _{i=1}^KZ_i^*(x)\), it follows that
as \(n\rightarrow \infty \); the latter is implied by Lemma 3 and the fact that \(\tau _{b_1}\Lambda _b^{-4/5}\rightarrow 0\) when \(0<\gamma _1\le 1\) in assumption A2.
Since the indices \(\{I_i\}\) are sampled uniformly without replacement, we can apply Corollary 4.1 of Romano (1989).
Therefore it follows that \(\tilde{G}_{n,b_1}(x)-\hat{G}_{n,b_1}^b(x){\mathop {\longrightarrow }\limits ^{\text {p}}}0\) as \(K\rightarrow \infty \) and \(n\rightarrow \infty \). Applying the delta method approach
as \(K\rightarrow \infty \), \(n\rightarrow \infty \) and \(\forall x\). Since G(x) is continuous, the convergence is uniform because of the argument of the last part of the proof of Theorem 2.2.1 in Politis et al. (1999). This concludes the proof. \(\square \)
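The Monte Carlo estimator \(\tilde{G}_{n,b_1}(x)\) in the proof averages the indicator over K block statistics whose starting indices are drawn uniformly without replacement, instead of over all overlapping blocks. A minimal sketch, assuming \(\tau _{b_1}=\sqrt{b_1}\) (the SRD rate) and block means of squared errors centred at the full-sample estimate; the function name is ours.

```python
import numpy as np

def mc_subsample_dist(eps, b1, K, x, rng):
    """Monte Carlo analogue of G~_{n,b1}(x): average the indicator over
    K blocks whose starting indices I_i are drawn uniformly, without
    replacement, from {1, ..., n - b1 + 1}."""
    sq = eps ** 2
    sigma2 = sq.mean()                          # full-sample estimate V_n
    N = len(eps) - b1 + 1
    idx = rng.choice(N, size=K, replace=False)  # starting points I_i
    V = np.array([sq[i:i + b1].mean() for i in idx])
    return np.mean(np.sqrt(b1) * (V - sigma2) <= x)

rng = np.random.default_rng(1)
eps = rng.normal(size=50_000)
print(mc_subsample_dist(eps, b1=500, K=300, x=0.0, rng=rng))
```

Only K block statistics are ever computed, which is the computational point of the method: the cost is \(O(Kb_1)\) per evaluation, regardless of the full sample size n.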
Proof of Corollary 1
The results follow from the proof of Lemma 4 by replacing Lemma 3 with Theorem 2. \(\square \)
Proof of Corollary 2
We can write \(\tilde{G}_{n,b_1}^0(x)\) as
So, it is sufficient to use Theorem 2 to show that \(\sup _x|\tilde{G}_{n,b_1}(x)-G(x)|{\mathop {\longrightarrow }\limits ^{\text {p}}}0\) and we only need to show that \(\tau _{b_1}\left( V_n-\hat{V}_n\right) {\mathop {\longrightarrow }\limits ^{\text {p}}}0\).
Following the same arguments as in the proof of Theorem 1, we have that \(V_n-\sigma _{\varepsilon }^2=O_p\left( \tau _n^{-1}\right) \) and
So, \(\tau _{b_1}\left( V_n-\hat{V}_n\right) =\tau _{b_1}\left( V_n-\sigma _{\varepsilon }^2\right) -\tau _{b_1}\left( \hat{V}_n-\sigma _{\varepsilon }^2\right) \).
Since \(b_1=o(b^{4/5})\), we have that \(\tau _{b_1}\left( V_n-\sigma _{\varepsilon }^2\right) =O_p\left( \tau _{b_1}\tau _n^{-1}\right) =o_p(1)\) and
In both cases, it follows that \(O_p\left( \tau _{b_1}\tau _n^{-1}\right) =o_p(1)\) and \(O_p\left( \tau _{b_1}\Lambda _n^{-4/5}\right) =o_p(1)\), respectively. Finally, we can conclude that the result follows. \(\square \)
Proof of Theorem 3
By (4) we have that
and \(SNR=10\log _{10}\left( {\sigma _{\varepsilon }^{-2}{\int s^2(t)dt}}\right) \). First, we analyze the quantity \(\tau _m(\widehat{SNR}-SNR)\). So we can write
Using the same arguments as in the proof of Theorem 1, it follows that \(\hat{V}_m-\sigma _{\varepsilon }^2=O_p(\tau _m^{-1})\). Expanding \(\log _{10}(1+x)\) in Taylor’s series, we have that
and
Now, we show the last result. From the proof of Theorem 1 and by assumption A3, we have that \(\hat{s}(t)=s(t)\left( 1+O_p(\Lambda _n^{-2/5})\right) \). Therefore,
Now, we can write
By using the convergence of the quadrature of a bounded and continuous function to its integral, it follows that \(II_s=O(n^{-1})\). By (24), we have that
Since \(m=o(n^{2/5})\), it follows that \(\tau _m\Lambda _n^{-2/5}\rightarrow 0\) as \(n\rightarrow \infty \). So, (23) is shown.
Hence, we can conclude that \(\tau _m(\widehat{SNR}-SNR)\) has the same asymptotic distribution as \(\frac{\tau _m}{\sigma _{\varepsilon }^2}\left( \hat{V}_m-\sigma _{\varepsilon }^2\right) \) by Slutsky's theorem. Therefore, assumption 3.2.1 of Politis et al. (1999) is verified by Theorem 2.
Consider the SNR evaluated at a given point, namely \(SNR_i=10\log _{10}\left( \frac{s^2(t_i)}{\sigma _{\varepsilon }^2}\right) \), and write \(\tau _{b_1}\left( \widehat{SNR}_{n,b,I_i}-\widehat{SNR}\right) \) in \(\mathbb {Q}_n(x)\) as
for a given subsample starting at \(I_i\). By using the first part of this proof, it follows that \(S_2=O_p(\tau _{b_1}/\tau _{m})=o_p(1)\) since \(b_1/m\rightarrow 0\) when \(n\rightarrow \infty \). Now, in order to deal with the quantity \(S_1\), we need to show that
where \(t_i\) is the initial point in the block of b values. By using again the convergence of the quadrature of a bounded and continuous function to its integral, we have that \(\frac{1}{b}\sum _{j=i}^{i+b-1}\left[ s\left( \frac{j-i+1}{b}\right) \right] ^2\rightarrow \int _0^1\left( s^b_i(t)\right) ^2dt\) as \(n\rightarrow \infty \), \(b\rightarrow \infty \) and \(b/n\rightarrow 0\). The quantity \(s_i^b(\cdot )\) denotes the portion of the signal in the block of b values in (0, 1) with i the index for the initial point. Note that \(b/n\rightarrow 0\), and by the mean value theorem \(\int _0^1\left( s_i^b(t)\right) ^2dt\rightarrow s^2(t_i)\). By using, again, the first part of this proof and by (25), we have that \(\tau _{b_1}\left( \widehat{SNR}_{n,b,I_i}-SNR_{I_i}\right) \) has the same asymptotic distribution as \(\frac{\tau _{b_1}}{\sigma _{\varepsilon }^2}\left( \hat{V}_{n,b_1,I_i}-\sigma _{\varepsilon }^2\right) \). Now we study the quantity \(S_3\). First, we show that
when \(n\rightarrow \infty \) with some \(x>0\). Since \(SNR_i-SNR=10\log _{10}\left( \frac{s^2(t_i)}{\int s^2(t)dt}\right) \), the equation in (26) becomes
We have that
Moreover, \(\frac{s^2(t_i)}{\int s^2(t)dt}>10^{\frac{x}{10\tau _{b_1}}}\) can be written as
Summing over the index i and dividing by \(n-b+1\), we can write
Since \(\tau _{b_1}\left( 10^{\frac{x}{10\tau _{b_1}}}-1\right) \rightarrow c>0\) when \(b_1\rightarrow \infty \), by using equation (27) we obtain
Therefore \(\frac{N_n^b}{n-b+1}\rightarrow 0\) as \(n\rightarrow \infty \), where \(N_n^b=\sum _{i=1}^{n-b+1}\mathbb {I}\left\{ \frac{s^2(t_i)}{\int s^2(t)dt}>10^{\frac{x}{10\tau _{b_1}}}\right\} \). Then, (26) is shown.
As in the proof of Slutsky's theorem, we split \(\mathbb {Q}_n(x)\) as the sum of three empirical distribution functions computed over \(S_1\), \(S_2\) and \(S_3\), respectively. Here the random variables \(I_i\) are treated as in the proof of Theorem 2. Based on the argument above, only the component of \(\mathbb {Q}_n(x)\) computed over \(S_1\) has a non-degenerate limit distribution, and this will be the same as the asymptotic distribution of the estimator for the variance of the error term. The proof is now complete. \(\square \)
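The SNR statistic analysed in Theorem 3, \(SNR=10\log _{10}\left( \sigma _{\varepsilon }^{-2}\int s^2(t)dt\right) \), admits a simple plug-in computation. The sketch below replaces the paper's kernel estimator of s(t) with a crude moving average as a stand-in; all names and tuning choices are ours.

```python
import numpy as np

def snr_db(y, window):
    """Plug-in SNR in dB, 10 * log10( mean(s_hat^2) / sigma2_hat ),
    with the signal estimated by a moving-average smoother and the
    noise variance by the mean squared residual."""
    kernel = np.ones(window) / window
    s_hat = np.convolve(y, kernel, mode="same")  # rough estimate of s(t_i)
    resid = y - s_hat
    sigma2_hat = np.mean(resid ** 2)             # residual variance estimate
    return 10.0 * np.log10(np.mean(s_hat ** 2) / sigma2_hat)

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 5_000)
y = np.sin(2 * np.pi * 3 * t) + 0.5 * rng.normal(size=t.size)
print(snr_db(y, window=51))  # close to 10*log10(0.5 / 0.25), about 3 dB
```

Subsampling the centred, scaled version of this statistic over blocks, as in Theorem 3, then yields confidence intervals for the SNR.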
Giordano, F., Coretto, P. A Monte Carlo subsampling method for estimating the distribution of signal-to-noise ratio statistics in nonparametric time series regression models. Stat Methods Appl 29, 483–514 (2020). https://doi.org/10.1007/s10260-019-00487-5