1 Introduction

Non-stationary and non-linear signals are ubiquitous in applications. Standard techniques like the Fourier or wavelet transform prove inadequate for properly capturing their hidden features [10]. For this reason Huang et al. proposed in 1998 a new kind of algorithm, called Empirical Mode Decomposition (EMD) [41], which allows one to unravel the hidden features of a non-stationary signal s(x), \(x\in {\mathbb {R}}\), by iteratively decomposing it into a finite sequence of simple components, called Intrinsic Mode Functions (IMFs). Such IMFs fulfill two properties: the number of extrema and the number of zero crossings must either be equal or differ at most by one; and the upper and lower envelopes, connecting respectively all the local maxima and all the local minima of the function, must have zero mean at every point.

The EMD, over the last two decades, has attracted the attention of many researchers from a wide variety of applied fields. We mention here geophysical studies [6, 42, 70], with applications in Seismology (data denoising and/or detrending [5, 37, 73], pre-seismic signal analysis [3, 7, 78], earthquake-induced co/post-seismic anomalies analysis [8]), Exploration Seismology (for improving signal-to-noise ratio in seismic data processing routines [4, 9] or for seismic interpretation [76]), Geomagnetism [43, 63, 91], Engineering Seismology (mainly for analysing ground motion data [39, 52, 87, 92, 93]), climate, atmospheric and oceanographic sciences [21,22,23,24,25, 45]. Its use is also common in Physics (for data analysis [32, 40, 56, 71], data denoising and/or detrending [31, 77], to assess causal relationships between two time series [86], or to extract information on multiple time scales [18]); Medicine and Biology [16, 19, 29, 30, 35, 49, 82, 85, 97]; Engineering [2, 46, 51, 59, 65, 81]; Economics and Finance [89, 94, 95]; Computer vision [1, 47, 84], just to mention a few. The impact of this technique is also attested by the high number of citations that the original paper by Huang et al. [41] has received so far.

However, while the EMD method proves to be extremely powerful in extracting simple components from a given signal, it is unstable to perturbations [83] and susceptible to mode splitting and mode mixing [75]. These are the reasons why the Ensemble Empirical Mode Decomposition (EEMD) method [83] was proposed first, followed by several alternative noise-assisted EMD-based methods (e.g. the complementary EEMD [88], the complete EEMD [72], the partly EEMD [96], the noise assisted multivariate EMD (NA-MEMD) [74], and the fast multivariate EMD (FMEMD) [44]). While these newly developed methods are based on EMD, they all address the so-called mode mixing problem and guarantee the stability of the decomposition with respect to noise [38]. However, mode splitting is still an open problem for all these methods and, more importantly, their mathematical analysis is by no means complete [38].

For all these reasons many alternative methods have been proposed recently, like, for instance, the sparse time-frequency representation [33, 34], the Geometric mode decomposition [90], the Empirical wavelet transform [28], the Variational mode decomposition [20], and similar techniques [54, 62, 64].

All of these methods are based on optimization. The only alternative technique proposed so far in the literature which is based on iterations, like the EMD-based methods, is the so-called Iterative Filtering (IF) method proposed by Lin et al. in [50]. In recent years, this alternative iterative method has been used in a wide variety of applied fields like, for instance, Engineering [48, 55, 65], Chemistry [15], Economics [60], Medicine [66], and Physics [26, 27, 53, 57, 58, 61, 67, 68].

The mathematical analysis of IF has been tackled by several authors in the last few years [11, 12, 14, 36, 79, 80], even for 2D or higher dimensional signals [17]. However, several problems regarding this technique are still unsolved. In particular, it is not yet clear how the stopping criterion used to discontinue the calculations of the IF algorithm influences the decomposition. Furthermore, all the aforementioned analyses have focused on the convergence of IF when applied to the extraction of a single IMF from a given signal, the so-called inner loop. Regarding the decomposition of all the IMFs contained in a signal, which is related to the convergence of the outer loop and the potential finiteness of the decomposition itself, nothing has been said so far. In this work we further analyze the IF technique, addressing these and other questions.

The rest of this work is organized as follows: in Sect. 2 we review the details and properties of the method in the continuous setting and we provide new results regarding its inner loop convergence in the presence of a stopping criterion, as well as the outer loop convergence and finiteness. In Sect. 3 we address the convergence analysis in the discrete setting for both the inner and the outer loop of the algorithm. Based on these results, in Sect. 4 we propose a new algorithm, called Fast Iterative Filtering (FIF), which produces the very same results as IF, but at a considerably lower computational cost.

2 IF algorithm in the continuous setting

The key idea behind this decomposition technique is to separate the simple oscillatory components contained in a signal s(x), \(x\in {\mathbb {R}}\), the so-called IMFs, by approximating the moving average of s and subtracting it from s itself. The approximated moving average is computed by convolving s with a window/filter function w.

Definition 1

A filter/window w is a nonnegative and even function in \(C^0\left( [-L,\ L]\right) \), \(L>0\), and such that \(\int _{\mathbb {R}}w(z){\text {d}}z=\int _{-L}^{L} w(z){\text {d}}z=1\).

We point out that the idea of iteratively subtracting the moving average comes from the Empirical Mode Decomposition (EMD) method [41] where the moving average was computed as a local average between an envelope connecting the maxima and one connecting the minima of the signal under study. The use of envelopes in an iterative way is the reason why the EMD algorithm is still lacking a rigorous mathematical framework.

The pseudocode of IF is given in Algorithm 1 where \(w_m(t)\) is a nonnegative and compactly supported window/filter with area equal to one and support in \([-l_m,\ l_m]\), where \(l_m\) is called filter length and represents the half support length.

Algorithm 1 (Iterative Filtering pseudocode)

The IF algorithm contains two loops, the inner and the outer loop, which correspond respectively to the second and the first while loop in the pseudocode. The former captures a single IMF, while the latter produces all the IMFs embedded in a signal.
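For the reader's convenience, the two nested loops can be sketched in a few lines of code. We stress that this is only an illustrative Python transcription of Algorithm 1, not the implementation discussed in this work: the triangular window, the periodic boundary extension, the safeguards on the filter length, and all names are our own choices; the filter length is chosen via formula (3) below, and the stopping test exploits the fact that the increment between two inner iterations equals the moving average of the current iterate (cf. Sect. 2.1).

```python
import numpy as np

def count_extrema(s):
    # Number of local maxima and minima, via sign changes of the first difference.
    d = np.sign(np.diff(s))
    d = d[d != 0]
    return int(np.sum(d[:-1] != d[1:]))

def triangular_filter(l):
    # Discrete triangular window with half support l, normalized to unit mass.
    t = np.arange(-l, l + 1)
    w = l + 1.0 - np.abs(t)
    return w / w.sum()

def moving_average(s, w):
    # Convolution of s with w, with periodic extension at the boundaries.
    l = (len(w) - 1) // 2
    sp = np.concatenate([s[-l:], s, s[:l]])
    return np.convolve(sp, w, mode='valid')

def iterative_filtering(s, nu=1.6, delta=1e-6, max_inner=100, max_imfs=10):
    imfs, r = [], np.asarray(s, dtype=float).copy()
    while count_extrema(r) > 1 and len(imfs) < max_imfs:    # outer loop
        k = count_extrema(r)
        l = min(max(1, 2 * int(nu * len(r) / k)), (len(r) - 1) // 2)
        w, sm = triangular_filter(l), r.copy()
        for _ in range(max_inner):                          # inner loop
            avg = moving_average(sm, w)
            if np.linalg.norm(avg) < delta:                 # stopping criterion
                break
            sm = sm - avg                                   # fluctuation part (2)
        imfs.append(sm)
        r = r - sm
    imfs.append(r)                                          # trend/remainder
    return imfs
```

By construction the components sum back to the original signal exactly, since each outer step subtracts the extracted IMF from the running remainder.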

Assuming \(s_1=s\), the key step of the algorithm consists in computing the moving average of \(s_m\) as

$$\begin{aligned} {\mathcal {L}}_m(s_m)(x)=\int _{-l_m}^{l_m} s_m(x+t)w_m(t){\text {d}t}, \end{aligned}$$
(1)

which represents the convolution of the signal itself with the window/filter \(w_m(t)\).

The moving average is then subtracted from \(s_m\) to capture the fluctuation part as

$$\begin{aligned} {\mathcal {M}}_{m}(s_m)= s_m-{\mathcal {L}}_m(s_m)=s_{m+1} \end{aligned}$$
(2)

The first IMF, \(\text {IMF}_1\), is computed by iterating this procedure on the signal \(s_m\), \(m\in {\mathbb {N}}\), until a stopping criterion is satisfied, as described in the following section.

To produce the second IMF we apply the same procedure to the remainder signal \(r=s-\text {IMF}_1\). Subsequent IMFs are produced by iterating the previous steps.

The algorithm stops when r becomes a trend signal, meaning it has at most one local extremum.

We observe that, even though the algorithm potentially allows recomputing the filter length \(l_m\) at every step of each inner loop, in practice we always compute the filter length only at the first step of an inner loop and then keep it constant throughout the subsequent iterations. Hence \(l_m=l_1=l\) for every \(m\ge 1\).

Following [50], one possible way of computing the filter length l is given by the formula

$$\begin{aligned} l:=2\left\lfloor \nu \frac{N}{k}\right\rfloor \end{aligned}$$
(3)

where N is the total number of sample points of the signal s(x), k is the number of its extrema, \(\nu \) is a tuning parameter usually fixed around 1.6, and \(\left\lfloor \cdot \right\rfloor \) rounds a positive number down to the nearest integer. In doing so we are estimating some sort of average highest frequency contained in s.
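Formula (3) can be transcribed directly; the convention for counting extrema (sign changes of the first difference) and all names below are our own illustrative choices.

```python
import numpy as np

def count_extrema(s):
    # Extrema counted as sign changes of the first difference (our convention).
    d = np.sign(np.diff(s))
    d = d[d != 0]
    return int(np.sum(d[:-1] != d[1:]))

def filter_length(s, nu=1.6):
    # Formula (3): l = 2 * floor(nu * N / k).
    return 2 * int(np.floor(nu * len(s) / count_extrema(s)))

x = np.linspace(0, 1, 200, endpoint=False)
tone = np.sin(2 * np.pi * 5 * x)   # 5 full periods -> 10 extrema
l = filter_length(tone)
```

For this pure tone, N/k = 20 samples per half period, so the half support l = 64 spans roughly \(2\nu \) half periods of the fastest oscillation.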

Another possible way is to compute the Fourier spectrum of s and identify its highest-frequency peak. The filter length l can then be chosen proportional to the reciprocal of this frequency.

The computation of the filter length l is an important step of the IF technique. Clearly, l is strictly positive and, more importantly, it is based solely on the signal itself. This last property makes the method nonlinear.

In fact, consider two signals p and q with \(p\ne q\), and let \(\text {IMFs}(\bullet )\) denote the decomposition of a signal into IMFs produced by IF. The fact that we choose the half support length based on the signal itself implies that in general

$$\begin{aligned} \text {IMFs}(p+q)\ne \text {IMFs}(p)+\text {IMFs}(q) \end{aligned}$$
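This nonlinearity can already be seen at the level of the filter length: the half support selected via (3) for p + q differs from the ones selected for p and q separately, so different smoothing operators are applied. A small numerical check (illustrative; extrema counted, as before, by sign changes of the first difference):

```python
import numpy as np

def count_extrema(s):
    d = np.sign(np.diff(s))
    d = d[d != 0]
    return int(np.sum(d[:-1] != d[1:]))

def filter_length(s, nu=1.6):
    # Half support length chosen via (3): it depends on the signal itself.
    return 2 * int(nu * len(s) / count_extrema(s))

x = np.linspace(0, 1, 1000, endpoint=False)
p = np.sin(2 * np.pi * 5 * x)
q = 0.5 * np.sin(2 * np.pi * 40 * x)
# The filter applied to p + q is not the one applied to p, nor the one
# applied to q: the selected half support lengths differ.
lp, lq, lpq = filter_length(p), filter_length(q), filter_length(p + q)
```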

Regarding the convergence analysis of the Iterative Filtering inner loop, we recall here the following theorem.

Theorem 1

(Convergence of the Iterative Filtering method [14, 36]) Let the filter function \(w(t)\), \(t\in [-l,l]\), be \(L^2\), symmetric, nonnegative, with \(\int _{-l}^l w(t){\text {d}t}= 1\), and let \(s(x)\in L^2({\mathbb {R}})\).

If, for every \(\xi \in {\mathbb {R}}\), either \(|1- {\widehat{w}}(\xi )| < 1 \) or \({\widehat{w}}(\xi )=0\), where \({\widehat{w}}(\xi )\) is the Fourier transform of w computed at the frequency \(\xi \),

Then \(\{{\mathcal {M}}^m(s)\}\) converges and

$$\begin{aligned} \text {IMF}_1 = \lim \limits _{m\rightarrow \infty }{{\mathcal {M}}^m(s)(x)}= \int _{-\infty }^{\infty } {\widehat{s}}(\xi ) \chi _{\{\xi \in {\mathbb {R}}| {\widehat{w}}(\xi )=0 \}} e^{2\pi i \xi x} \text {d}\xi \end{aligned}$$
(4)

where \(\chi _{\{A \}}\) represents the indicator function of the subset A of \({\mathbb {R}}\).

We observe here that, given \(h:[-\frac{l}{4},\frac{l}{4}]\rightarrow {\mathbb {R}}\), \(z\mapsto h(z)\), nonnegative, symmetric, with \(\int _{\mathbb {R}}h(z){\text {d}}z=\int _{-\frac{l}{4}}^{\frac{l}{4}} h(z){\text {d}}z=1\), if we construct the window \(w_1\) as the convolution of h with itself and we fix \(w_m=w_1\) throughout all the steps m of an inner loop, then the method is guaranteed to converge to the limit function (4), which depends only on the shape of the chosen filter function and on the support length selected by the method [13, 14].

In general we can assume that the filter functions \(w_m(u)\) are defined as some scaling of an a priori fixed filter shape \(w:[-1,1]\rightarrow {\mathbb {R}}\). In particular we define the scaling function

$$\begin{aligned} g_m:[-1,1]\rightarrow [-l_m,l_m],\qquad t\mapsto u=g_m(t), \end{aligned}$$
(5)

where \(g_m\) is assumed to be invertible and monotone, such that \(w_m(u)=C_m w(g_m^{-1}(u))=C_m w(t)\), where \(t=g_m^{-1}(u)\), \(u=g_m(t)\) and \(C_m\) is a scaling coefficient which is required to ensure that \(\int _{\mathbb {R}}w_m(u){\text {d}}u=\int _{-l_m}^{l_m} w_m(u){\text {d}}u=1\).

We point out that, from Theorem 1, it is clear that the choice of the filter function w(t) can have an important influence on the decomposition produced by the IF algorithm. This is an important direction of research which, to the best of our knowledge, has never been explored in the literature so far. However, its analysis is out of the scope of this work; we plan to study it in the future.

Regarding the computation of the scaling coefficient \(C_m\), from the observation that \({\text {d}}u = g_m'(t) {\text {d}}t\), it follows that

$$\begin{aligned} \int _{-l_m}^{l_m} w_m(u){\text {d}}u=\int _{-l_m}^{l_m} C_m w(g_m^{-1}(u)){\text {d}}u=C_m \int _{-1}^{1} w(t) |g_m'(t)| {\text {d}}t \end{aligned}$$
(6)

hence

$$\begin{aligned} C_m= \frac{1}{\int _{-1}^{1} w(t) |g_m'(t)| {\text {d}}t} \end{aligned}$$
(7)

and

$$\begin{aligned} w_m(u)=C_m w(g_m^{-1}(u))=\frac{w(g_m^{-1}(u))}{\int _{-1}^{1} w(t) |g_m'(t)| {\text {d}}t} \end{aligned}$$
(8)

As examples of scaling functions we can consider the linear scaling \(g_m(t)=l_m t\) and the quadratic scaling \(g_m(t)=l_m t^2\).

In the case of linear scaling we have that \(g_m^{-1}(u)=\frac{u}{l_m}\), \(g'_m(t)=l_m\ge 0\), for every \(t\in {\mathbb {R}}\), and \(C_m= \frac{1}{l_m}\). Hence

$$\begin{aligned} w_m(u)=\frac{w\left( \frac{u}{l_m}\right) }{l_m}. \end{aligned}$$
(9)
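The linearly scaled filter (9) can be checked numerically to retain unit area, as required by (7); the triangular prototype shape below is our own illustrative choice for w.

```python
import numpy as np

def w_shape(t):
    # A priori fixed filter shape on [-1, 1]: triangular, unit area.
    return np.where(np.abs(t) <= 1, 1.0 - np.abs(t), 0.0)

def w_scaled(u, l):
    # Linear scaling (9): w_l(u) = w(u/l) / l, supported on [-l, l].
    return w_shape(u / l) / l

l = 5.0
u = np.linspace(-l, l, 200001)
vals = w_scaled(u, l)
# Trapezoid-rule approximation of the area of the scaled filter.
area = np.sum((vals[:-1] + vals[1:]) / 2.0) * (u[1] - u[0])
```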

2.1 IF inner loop convergence in presence of a stopping criterion

In Algorithm 1 the inner loop has to be iterated infinitely many times. In numerical computations, however, some stopping criterion has to be introduced. One possible stopping criterion follows from the solution of

Problem 1

For a given \(\delta > 0\) we want to find the value \(N_0\in {\mathbb {N}}\) such that

$$\begin{aligned} \Vert {\mathcal {M}}^N(s)(x)-{\mathcal {M}}^{N+1}(s)(x)\Vert _{L^2}<\delta \qquad \forall N\ge N_0 \end{aligned}$$

Applying the aforementioned stopping criterion, the inner loop of Algorithm 1 converges in finitely many steps to an IMF whose explicit form is given in the following theorem, where \({\widehat{s}}(\xi )\) represents the Fourier transform of s at frequency \(\xi \).

Theorem 2

Let \(s\in L^2({\mathbb {R}})\), let w be obtained as the convolution \({\widetilde{w}}*{\widetilde{w}}\), where \({\widetilde{w}}\) is a filter/window as in Definition 1, and fix \(\delta >0\).

Then, for the minimum \(N_0\in {\mathbb {N}}\) such that the following inequality holds true

$$\begin{aligned} \frac{N_0^{N_0}}{(N_0+1)^{N_0+1}}<\frac{\delta }{\left\| {\widehat{s}}(\xi )\right\| _{L^2}} \quad \forall \xi \in {\mathbb {R}}\end{aligned}$$
(10)

we have that \(\left\| {\mathcal {M}}^N(s)(x)-{\mathcal {M}}^{N+1}(s)(x)\right\| _{L^2}<\delta \quad \forall N\ge N_0\) and the first IMF is given by

$$\begin{aligned} \text {IMF}_1^\text {SC}={\mathcal {M}}^N(s)(x)=\int _{{\mathbb {R}}} (1-{\widehat{w}}(\xi ))^N {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \quad \forall N\ge N_0 \end{aligned}$$
(11)

Proof

From the hypotheses on the filter w it follows that its Fourier transform takes values in the interval \([0,\ 1]\), see [14]. Furthermore, from the linearity of the Fourier transform it follows that

$$\begin{aligned} {\widehat{{\mathcal {M}}^N(s)(x)}(\xi )}= (1- {\widehat{w}}(\xi ))^N {\widehat{s}}(\xi )=\left\{ \begin{array}{ll} {\widehat{s}}(\xi ) &{}\quad \text { if } {\widehat{w}}(\xi )=0\\ (1- {\widehat{w}}(\xi ))^N {\widehat{s}}(\xi ) &{} \quad \text { if } |1- {\widehat{w}}(\xi )| < 1\\ \end{array} \right. \end{aligned}$$

Since the Fourier transform is a unitary operator, by Parseval's theorem it follows that

$$\begin{aligned} \left\| {\mathcal {M}}^N(s)(x)-{\mathcal {M}}^{N+1}(s)(x)\right\| _{L^2}= & {} \left\| \widehat{{\mathcal {M}}^N(s)(x)}(\xi )-\widehat{{\mathcal {M}}^{N+1}(s)(x)}(\xi )\right\| _{L^2} \\= & {} \left\| (1-{\widehat{w}}(\xi ))^N \left[ 1 -(1-{\widehat{w}}(\xi ))\right] {\widehat{s}}(\xi )\right\| _{L^2} \\= & {} \left\| (1-{\widehat{w}}(\xi ))^N {\widehat{w}}(\xi ){\widehat{s}}(\xi )\right\| _{L^2} \end{aligned}$$

We point out that this formula can also be interpreted as the \(L^2\)-norm of the moving average of \({\mathcal {M}}^N\) which is given by the convolution \({\mathcal {M}}^N*w\).

For a fixed N we can compute the maximum of the function \((1-{\widehat{w}}(\xi ))^N {\widehat{w}}\) for \({\widehat{w}}\in [0,\ 1]\), which is attained at \({\widehat{w}}(\xi )=\frac{1}{N+1}\). Therefore

$$\begin{aligned} \left\| (1-{\widehat{w}}(\xi ))^N {\widehat{w}}(\xi ){\widehat{s}}(\xi )\right\| _{L^2}\le & {} \left\| \left( 1-\frac{1}{N+1}\right) ^N\frac{1}{N+1}{\widehat{s}}(\xi )\right\| _{L^2} \\= & {} \left\| \frac{N^N}{(N+1)^{N+1}}{\widehat{s}}(\xi )\right\| _{L^2}<\delta \end{aligned}$$

Hence we consider the smallest \(N_0\in {\mathbb {N}}\) such that

$$\begin{aligned} \frac{N_0^{N_0}}{(N_0+1)^{N_0+1}}<\frac{\delta }{\left\| {\widehat{s}}(\xi )\right\| _{L^2}}. \end{aligned}$$

\(\square \)
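The bound (10) can be evaluated directly to obtain the number of iterations guaranteed by the theorem. A small sketch (ours), which computes the left-hand side in log space to avoid overflow; note that \(N^N/(N+1)^{N+1}\) behaves like \(1/(eN)\), so \(N_0\) grows like \(\Vert {\widehat{s}}\Vert _{L^2}/(e\delta )\).

```python
import math

def smallest_N0(delta, s_hat_norm):
    # Smallest N0 such that N0^N0 / (N0+1)^(N0+1) < delta / ||s_hat||_{L^2},
    # cf. (10); the left-hand side is evaluated in log space.
    target = math.log(delta / s_hat_norm)
    N = 1
    while N * math.log(N) - (N + 1) * math.log(N + 1) >= target:
        N += 1
    return N
```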

Equation (11) provides valuable insight into how the implemented algorithm actually decomposes a signal into IMFs. We recall that, without any stopping criterion, each IMF of a signal s is given by the inverse Fourier transform of \({\widehat{s}}\) computed at the frequencies corresponding to the zeros of \({\widehat{w}}\), as stated in (4).

Therefore, since \({\widehat{w}}\) is not compactly supported and has isolated zeros, the IMFs produced by IF in this case are given by the summation of pure and well-separated tones.

When we enforce a stopping criterion, instead, we end up producing IMFs containing a much richer spectrum. In fact, from (11) we discover that an IMF is now given by the inverse Fourier transform of \({\widehat{s}}\) computed at every possible frequency in \({\mathbb {R}}\), each multiplied by the coefficient \((1-{\widehat{w}}(\xi ))^N\). Since, by construction, \(0 \le {\widehat{w}}(\xi )\le 1\) for all \(\xi \in {\mathbb {R}}\), the coefficient \((1-{\widehat{w}}(\xi ))^N\) is equal to 1 if and only if \({\widehat{w}}(\xi )=0\), whereas for all the other frequencies it is smaller than 1 and tends to zero as N grows. In practice \((1-{\widehat{w}}(\xi ))^N\) represents the percentage with which each frequency of the original signal enters the reconstruction of an IMF. The higher the number of iterations N, the narrower the intervals of frequencies that are almost completely captured in each IMF; as \(N\rightarrow \infty \) such intervals coalesce into the isolated points corresponding to the zeros of \({\widehat{w}}\).
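This narrowing effect can be observed numerically, e.g. for a triangular filter with unit half support, whose transform is \(\left( \sin (\pi \xi )/(\pi \xi )\right) ^2\) (cf. Sect. 2.1.1); the snippet below is purely illustrative.

```python
import numpy as np

def w_hat(xi):
    # Fourier transform of the triangular filter (16) with L = 1.
    xi = np.asarray(xi, dtype=float)
    out = np.ones_like(xi)
    nz = xi != 0
    out[nz] = (np.sin(np.pi * xi[nz]) / (np.pi * xi[nz])) ** 2
    return out

def weight(xi, N):
    # Coefficient (1 - w_hat(xi))^N appearing in (11).
    return (1.0 - w_hat(xi)) ** N

# xi = 1 is a zero of w_hat: its weight stays 1 for every N.
# Away from the zeros, the weight decays geometrically with N.
xi = np.array([1.0, 0.5, 0.25])
w10, w1000 = weight(xi, 10), weight(xi, 1000)
```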

2.1.1 Convergence with a threshold

We start by recalling a few properties of the filter functions w. Assuming w(x), \(x\in {\mathbb {R}}\), is a filter function supported on \((-1,\ 1)\), if we use the linear scaling described in (9) we can construct

$$\begin{aligned} w^a(x)=\frac{1}{a} w\left( \frac{x}{a}\right) \end{aligned}$$
(12)

where \(w^a(x)\) is supported on \((-a,\ a)\).

If we define \({\widehat{w}}(\xi )=\int _{-\infty }^{+\infty } w(x) e^{-i\xi x 2 \pi } {\text {d}}x\), then

$$\begin{aligned} \widehat{w^a}(\xi )=\int _{-\infty }^{+\infty } \frac{1}{a}w\left( \frac{x}{a}\right) e^{-i\xi \frac{x}{a} a 2 \pi } {\text {d}}x={\widehat{w}}(a\xi ) \end{aligned}$$
(13)

Therefore, if \(\xi _0\) is a root of \({\widehat{w}}(\xi )=0\), then \(\frac{\xi _0}{a}\) is a root of \(\widehat{w^a}(\xi )=0\) because \(\widehat{w^a}\left( \frac{\xi _0}{a}\right) ={\widehat{w}}\left( a\frac{\xi _0}{a}\right) ={\widehat{w}}(\xi _0)=0\).
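The scaling relation (13) can be verified by direct quadrature; the triangular prototype and the quadrature parameters below are our own illustrative choices.

```python
import numpy as np

def w(x):
    # Triangular filter supported on (-1, 1), unit area.
    return np.where(np.abs(x) < 1, 1.0 - np.abs(x), 0.0)

def fourier_transform(f, xi, half_width, n=40001):
    # Trapezoid-rule quadrature of int f(x) exp(-2 pi i xi x) dx.
    x = np.linspace(-half_width, half_width, n)
    vals = f(x) * np.exp(-2j * np.pi * xi * x)
    return np.sum((vals[:-1] + vals[1:]) / 2.0) * (x[1] - x[0])

a, xi = 3.0, 0.4
w_a_hat = fourier_transform(lambda x: w(x / a) / a, xi, half_width=a)  # FT of w^a at xi
w_hat_axi = fourier_transform(w, a * xi, half_width=1.0)               # FT of w at a*xi
```

The two quadratures agree, and both match the closed-form transform of the triangular filter evaluated at \(a\xi \).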

We recall that, since the filters w are compactly supported, their Fourier transforms are defined on the whole of \({\mathbb {R}}\) and their zeros are isolated points.

Given \(0< \gamma < 1\), we identify the set

$$\begin{aligned} I_{w,\gamma , N}=\left\{ \xi \in {\mathbb {R}}\ :\ {\widehat{w}}(\xi ) \le 1 - \root N \of {1-\gamma }\right\} . \end{aligned}$$
(14)

As \(N\rightarrow \infty \) the quantity \(1 - \root N \of {1-\gamma }\rightarrow 0\), therefore \(I_{w,\gamma , N}\) coalesces into isolated points corresponding to the zeros of \({\widehat{w}}\).
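This coalescence can be checked numerically; the sketch below assumes, for illustration, a triangular filter with unit half support, whose transform is \(\left( \sin (\pi \xi )/(\pi \xi )\right) ^2\) (cf. (16) below).

```python
import numpy as np

def w_hat(xi):
    # Fourier transform of the triangular filter (16) with L = 1.
    xi = np.asarray(xi, dtype=float)
    out = np.ones_like(xi)
    nz = xi != 0
    out[nz] = (np.sin(np.pi * xi[nz]) / (np.pi * xi[nz])) ** 2
    return out

def threshold(gamma, N):
    # Right-hand side of (14); it shrinks to 0 as N grows.
    return 1.0 - (1.0 - gamma) ** (1.0 / N)

def in_I(xi, gamma, N):
    # Membership in the set I_{w, gamma, N} of (14).
    return w_hat(xi) <= threshold(gamma, N)

gamma = 0.5
# xi = 1 is a zero of w_hat, so it belongs to I for every N;
# xi = 0.8 belongs only while the threshold is still large enough.
member_small_N = in_I(np.array([1.0, 0.8]), gamma, N=2)
member_large_N = in_I(np.array([1.0, 0.8]), gamma, N=200)
```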

If we consider filters like the Fokker-Planck filters [14], or any smooth and compactly supported filter, then for a fixed \(N\in {\mathbb {N}}\) and \(\gamma >0\) there must exist \(\varXi _0>0\) such that

$$\begin{aligned} {\widehat{w}}(\xi ) \le 1 - \root N \of {1-\gamma } < 1 \qquad \text { for all }\ |\xi |\ge \varXi _0 \end{aligned}$$
(15)

In fact, since \(\int |w(x)|^2{\text {d}}x < + \infty \), with w(x) a smooth function, then \(\int |{\widehat{w}}(\xi )|^2{\text {d}}\xi < + \infty \), which implies that \({\widehat{w}}(\xi )\) decays as \(|\xi |\rightarrow \infty \).

So, for a smooth and compactly supported filter w, the set \(I_{w,\gamma , N}\) is made up of a finite number of disjoint compact intervals, containing zeros of \({\widehat{w}}\), together with the two intervals \((-\infty ,\ -\varXi _0]\) and \([\varXi _0,\ \infty )\).

Furthermore, if we scale these filters using a linear scaling with coefficient \(a>1\), it follows from the previous observations that \(\varXi _0\rightarrow 0\) as a grows and, as a consequence, \(I_{w,\gamma , N}\) converges to \({\mathbb {R}}\backslash \{0\}\).

As an example of a compactly supported filter we can consider the triangular filter function

$$\begin{aligned} w(x) = \left\{ \begin{array}{ll} \frac{1}{L}-\frac{1}{L^2}|x| &{}\quad \text {for} \; |x|\le L\\ 0 &{} \quad \text {otherwise} \\ \end{array} \right. \end{aligned}$$
(16)

whose Fourier transform is

$$\begin{aligned} {\widehat{w}}(\xi ) = \frac{\sin ^2\left( L\pi \xi \right) }{\left( L\pi \xi \right) ^2}. \end{aligned}$$
(17)

The triangular filter and its Fourier transform are depicted in Fig. 1.

Fig. 1

Left panel, triangular filter (16) with \(L=1\). Right panel, in black the Fourier transform (17) and in red the threshold value \(1 - \root N \of {1-\gamma }\)

Given the threshold value \(1 - \root N \of {1-\gamma }\) depicted in the right panel of Fig. 1 and the triangular filter (16) with \(L=1\), the set \(I_{w,\gamma , N}\) is made up of four intervals: two compact ones centered around 1/2 and \(-\,1/2\), and the other two starting around 0.8 and \(-\,0.8\) and extending to plus and minus infinity, respectively.

We can use the threshold value \(1 - \root N \of {1-\gamma }\) in the computation of an IMF as follows: given (11), whenever \((1-{\widehat{w}}(\xi ))^N \ge 1-\gamma \), we substitute \({\widehat{w}}(\xi )\) with zero. This is equivalent to setting \({\widehat{w}}(\xi )=0\) whenever \(\xi \in I_{w,\gamma , N}\).

Therefore, using the previously described thresholding and based on Theorem 2, Algorithm 1 converges to

$$\begin{aligned} \text {IMF}_1^\text {TH} = \int _{{\mathbb {R}}\backslash I_{w,\gamma , N}} (1-{\widehat{w}}(\xi ))^N {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi + \int _{I_{w,\gamma , N}} {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \quad \forall N\ge N_0 \end{aligned}$$
(18)

where \(I_{w,\gamma , N}\) is defined in (14).
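A discrete, FFT-based sketch of (18) can be written as follows. This construction is ours and purely illustrative: we sample the transform of a triangular filter with half support l on an n-sample grid, and the two-tone test signal is chosen so that one frequency falls inside the threshold set \(I_{w,\gamma , N}\) (weight forced to 1) while the other is damped by \((1-{\widehat{w}}(\xi ))^N\).

```python
import numpy as np

def imf1_thresholded(s, what, N, gamma):
    # Discrete analogue of (18): each frequency is weighted by (1 - what)^N,
    # and the weight is forced to 1 on the threshold set I_{w,gamma,N} of (14).
    thr = 1.0 - (1.0 - gamma) ** (1.0 / N)
    weight = (1.0 - what) ** N
    weight[what <= thr] = 1.0
    return np.fft.ifft(weight * np.fft.fft(s)).real

n, l = 256, 16
xi = np.fft.fftfreq(n, d=1.0 / n)              # integer frequency bins
arg = np.pi * xi * l / n
what = np.ones(n)
nz = arg != 0
what[nz] = (np.sin(arg[nz]) / arg[nz]) ** 2    # sampled triangular-filter transform

t = np.arange(n) / n
s = np.sin(2 * np.pi * 3 * t) + np.sin(2 * np.pi * 30 * t)
imf = imf1_thresholded(s, what, N=50, gamma=0.3)
```

With these parameters the fast tone is captured almost exactly in the first IMF, while the slow tone is suppressed, mimicking the behaviour described above.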

We are now ready to prove the following

Proposition 1

Assume that all the hypotheses of Theorem 2 are fulfilled. Then for every \(\epsilon >0\) there exist a stopping criterion value \(\delta >0\) and a threshold \(0<\gamma <1\) such that

$$\begin{aligned} \left\| \text {IMF}_1-\text {IMF}_1^\text {TH}\right\| \le \frac{\epsilon }{2}, \qquad \left\| \text {IMF}_1^\text {TH}-\text {IMF}_1^\text {SC}\right\| \le \frac{\epsilon }{2} \end{aligned}$$
(19)

and

$$\begin{aligned} \left\| \text {IMF}_1-\text {IMF}_1^\text {SC}\right\| \le \epsilon \end{aligned}$$
(20)

where \(\text {IMF}_1\), \(\text {IMF}_1^\text {SC}\), and \(\text {IMF}_1^\text {TH}\) are defined in (4), (11), and (18) respectively.

Proof

First of all we have that

$$\begin{aligned} \left\| \text {IMF}_1-\text {IMF}_1^\text {SC}\right\| \le \left\| \text {IMF}_1-\text {IMF}_1^\text {TH}\right\| + \left\| \text {IMF}_1^\text {TH}-\text {IMF}_1^\text {SC}\right\| \end{aligned}$$

where

$$\begin{aligned}&\left\| \text {IMF}_1-\text {IMF}_1^\text {TH}\right\| \nonumber \\&\quad \le \left\| \int _{\mathbb {R}}{\widehat{s}}(\xi ) \chi _{\left\{ \xi \in {\mathbb {R}}\ |\ {\widehat{w}}(\xi )=0 \right\} } e^{2\pi i \xi x} {\text {d}}\xi \right. \nonumber \\&\left. \qquad - \int _{{\mathbb {R}}\backslash I_{w,\gamma , N}} (1-{\widehat{w}}(\xi ))^N{\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi - \int _{I_{w,\gamma , N}} {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \nonumber \\&\quad \le \left\| \int _{{\mathbb {R}}\backslash I_{w,\gamma , N}} (1-{\widehat{w}}(\xi ))^N{\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| + \left\| \int _{I_{w,\gamma , N}\backslash \left\{ \xi \in {\mathbb {R}}\ |\ {\widehat{w}}(\xi )=0 \right\} } {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \nonumber \\ \end{aligned}$$
(21)

and

$$\begin{aligned} \left\| \text {IMF}_1^\text {TH}-\text {IMF}_1^\text {SC}\right\|\le & {} \left\| \int _{{\mathbb {R}}\backslash I_{w,\gamma , N}} (1-{\widehat{w}}(\xi ))^N{\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right. \nonumber \\&\left. \quad + \int _{I_{w,\gamma , N}} {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi - \int _{\mathbb {R}}(1-{\widehat{w}}(\xi ))^N{\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \nonumber \\\le & {} \left\| \int _{I_{w,\gamma , N}} \left[ 1-(1-{\widehat{w}}(\xi ))^N\right] {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \end{aligned}$$
(22)

From (14) and the fact that \(\int _{I_{w,\gamma , N}} {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \rightarrow \int _{ \left\{ \xi \in {\mathbb {R}}\ |\ {\widehat{w}}(\xi )=0 \right\} } {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \) as \(\gamma \rightarrow 0\) or \(N\rightarrow \infty \), it follows that there exist \(N_1\in {\mathbb {N}}\) big enough and \(0<\gamma _1<1\) small enough such that

$$\begin{aligned} \left\| \int _{I_{w,\gamma _1, N_1}\backslash \left\{ \xi \in {\mathbb {R}}\ |\ {\widehat{w}}(\xi )=0 \right\} } {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \le \frac{\epsilon }{4} \end{aligned}$$

Furthermore, there exist \(0<\gamma _2<1\) small enough and \(N_2\in {\mathbb {N}}\) such that

$$\begin{aligned} \left\| \int _{I_{w,\gamma _2, N_2}} \left[ 1-(1-{\widehat{w}}(\xi ))^{N_2}\right] {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \le \frac{\epsilon }{2} \end{aligned}$$

In fact, as \(\gamma _2\rightarrow 0\) the set \(I_{w,\gamma _2, N_2}\) tends to the set of frequencies corresponding to the zeros of \({\widehat{w}}(\xi )\). Given \(\gamma =\min \left\{ \gamma _1,\ \gamma _2\right\} \), there exists \(N_3\in {\mathbb {N}}\) big enough such that \((1-{\widehat{w}}(\xi ))^{N_3}\) is small enough to have

$$\begin{aligned} \left\| \int _{{\mathbb {R}}\backslash I_{w,\gamma , N}} (1-{\widehat{w}}(\xi ))^{N_3}{\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| \le \frac{\epsilon }{4} \end{aligned}$$

If we consider \(N_0=\max \left\{ N_1,\ N_2,\ N_3\right\} \) there exists \(\delta >0\) such that (10) holds true for every \(N\ge N_0\).\(\square \)

This proposition implies that \(\text {IMF}_1^\text {TH}\) can be as close as we like to both \(\text {IMF}_1^\text {SC}\) and \(\text {IMF}_1\), provided that we choose the stopping criterion value \(\delta \) and the threshold \(\gamma \) wisely.

2.2 IF outer loop convergence

We now have all the tools needed to study the convergence of the Iterative Filtering outer loop.

Definition 2

(Significant IMFs with respect to \(\eta >0\)) Fix \(\eta >0\). Given a signal s and its decomposition into IMFs obtained using Algorithm 1, we call significant IMFs with respect to \(\eta \) all the IMFs whose \(L^\infty \)-norm is bigger than \(\eta \).

Theorem 3

Let \(s\in L^2({\mathbb {R}})\cap L^\infty ({\mathbb {R}})\) be a signal whose continuous frequency spectrum is compactly supported, with upper limit \(B>0\) and lower limit \(b>0\), and such that \(\Vert {\widehat{s}}\Vert _{\infty } = c<\infty \). Choose a filter w produced as the convolution of a filter with itself, and fix \(\delta >0\) and \(\eta >0\).

Then the inner loop of Algorithm 1 converges to (11) and the outer loop produces only a finite number \(M\in {\mathbb {N}}\) of significant IMFs, i.e. of IMFs whose \(L^\infty \)-norm is bigger than \(\eta \).

Proof

Let us consider the Fourier transform of the signal s. From the hypotheses it follows that \(|{{\widehat{s}}}(\xi )| = 0\) for every \(|\xi | \ge B\).

We can assume that, in the first step of its outer loop, Algorithm 1 starts by selecting a filter \(w_1\) such that the zero of \(\widehat{w_1}\) with smallest frequency is at B. We recall, in fact, that one of the possible ways to choose the filter length is based on the Fourier transform of s, as explained in Sect. 2. Given \(\delta >0\) we can identify \(N_1\in {\mathbb {N}}\) such that (10) is fulfilled for every \(N\ge N_1\).

Now, from the hypothesis \(\Vert {\widehat{s}}\Vert _{\infty } = c<\infty \) it follows that c is an upper bound on \({{\widehat{s}}}(\xi )\) uniformly in \(\xi \in {\mathbb {R}}\). From the hypotheses on the filter function it follows that \(0<\widehat{w_1}<1\), ref. end of Sect. 2 in [14]. Furthermore, from the assumption on the lower bound b and the upper bound B of the continuous frequency spectrum of s, from the fact that \(\left\| e^{2\pi i \xi x} \right\| _{\infty }\le 1\) for every \(x,\xi \in {\mathbb {R}}\), and by definition of the interval \(I_{w_1,\gamma , {\widetilde{N}}_1}\), it follows that, for every \({\widetilde{N}}_1\ge N_1\) and \(0<\gamma < \frac{\eta }{2c(B-b)}\),

$$\begin{aligned}&\left\| \text {IMF}_1^\text {SC}-\text {IMF}_1^\text {TH}\right\| _{\infty } \nonumber \\&\quad \le \left\| \int _{[b,\ B]\cap I_{w_1,\gamma , {\widetilde{N}}_1}} \left[ 1-(1-\widehat{w_1}(\xi ))^{{\widetilde{N}}_1}\right] {\widehat{s}}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| _{\infty } \nonumber \\&\quad \le \int _{[b,\ B] \cap I_{w_1,\gamma , {\widetilde{N}}_1}} \left\| \left[ 1-(1-\widehat{w_1}(\xi ))^{{\widetilde{N}}_1}\right] \right\| _{\infty } \left\| {\widehat{s}}(\xi ) \right\| _{\infty } \left\| e^{2\pi i \xi x} \right\| _{\infty } {\text {d}}\xi \nonumber \\&\quad \le c \int _{[b,\ B]\cap I_{w_1,\gamma , {\widetilde{N}}_1}} \left\| \left[ 1-(1-\widehat{w_1}(\xi ))^{{\widetilde{N}}_1}\right] \right\| _{\infty } {\text {d}}\xi \le c\gamma (B-b) < \frac{\eta }{2} \end{aligned}$$
(23)

In particular we point out that \(I_{w_1,\gamma , {\widetilde{N}}_1}\), defined as in (14), covers the interval of frequencies \([B-r_1,\ B+r_1]\), for some \(r_1>\varepsilon >0\).

This last inequality follows from the fact that, if we scale the filter function w linearly to enlarge its support, as in (12) with \(a>1\), its Fourier transform is proportionally shrunk (13). However, the signal s has a lower bound b in its continuous frequency spectrum, which implies that the filter function \(w_1\) cannot have too wide a support and, as a consequence, its Fourier transform cannot be squeezed too much. Therefore there exists \(\varepsilon >0\) which bounds the radius \(r_1\) from below.

If \(\left\| \text {IMF}_1^\text {TH}\right\| _{\infty }< \frac{\eta }{2}\), then we can certainly regard this component as not significant because \(\left\| \text {IMF}_1^\text {SC}\right\| _{\infty }\le \left\| \text {IMF}_1^\text {SC}-\text {IMF}_1^\text {TH}\right\| _{\infty } + \left\| \text {IMF}_1^\text {TH}\right\| _{\infty }< \eta \). Otherwise, assuming \(\left\| \text {IMF}_1^\text {TH}\right\| _{\infty } \ge \frac{\eta }{2}\), if \(\left\| \text {IMF}_1^\text {SC}\right\| _{\infty }\ge \eta \), then \(\text {IMF}_1^\text {SC}\) represents the first significant IMF in the decomposition. This concludes the first step of the outer loop in Algorithm 1.

In the second step of the outer loop Algorithm 1 iterates the previous passages using now the remainder signal \(s_2=s-\text {IMF}_1^{SC}\) and selecting a filter \(w_2\) such that the zero of \(\widehat{w_2}\) with smallest frequency is at \(B-r_1\).

Also in this case, given \(\delta >0\), we can identify \(N_2\in {\mathbb {N}}\) such that (10) is fulfilled for every \(N\ge N_2\). Furthermore \({\widehat{s}}_2(\xi )={\widehat{s}}(\xi )-\widehat{\text {IMF}}_1^{SC}(\xi )=\left[ 1-\left( 1-\widehat{w_1}(\xi )\right) ^{{\widetilde{N}}_1}\right] {\widehat{s}}(\xi ),\ \forall \xi \in {\mathbb {R}}\), which implies that

$$\begin{aligned} \left\| {\widehat{s}}_2\right\| _{\infty }\le \left\| \left[ 1-\left( 1-\widehat{w_1}(\xi )\right) ^{{\widetilde{N}}_1}\right] \right\| _{\infty }\left\| {\widehat{s}}(\xi )\right\| _{\infty }\le \left\| {\widehat{s}}(\xi )\right\| _{\infty } \end{aligned}$$
(24)

since \(\widehat{w_1}(\xi )\in [0,\ 1],\ \forall \xi \in {\mathbb {R}}\) [14]. Hence \({\widehat{s}}_2\) has the same uniform upper bound c over all \(\xi \in {\mathbb {R}}^+\) as \({{\widehat{s}}}(\xi )\).

Therefore

$$\begin{aligned} \left\| \text {IMF}_2^\text {SC}-\text {IMF}_2^\text {TH}\right\| _{\infty }\le & {} \left\| \int _{I_{w_2,\gamma , {\widetilde{N}}_2}} \left[ 1-(1-\widehat{w_2}(\xi ))^{{\widetilde{N}}_2}\right] \widehat{s_2}(\xi ) e^{2\pi i \xi x}{\text {d}}\xi \right\| _{\infty }\nonumber \\\le & {} c\gamma (B-b) < \frac{\eta }{2} \end{aligned}$$
(25)

for every \({\widetilde{N}}_2\ge N_2\) and \(0<\gamma < \frac{\eta }{2c(B-b)}\).

Furthermore \(I_{w_2,\gamma , {\widetilde{N}}_2}\) covers the interval of frequencies \([B-r_2,\ B+r_2]\), for some \(r_2>\varepsilon >0\). This last inequality follows from the same reasoning as before, together with three facts: the lower bound on the continuous frequency spectrum of \(s_2\) is again b, by construction of \(s_2\); \(\gamma \) is fixed for every IMF; and the Fourier transform of the scaled filter \(w_2\) is a squeezed version of \({{\widehat{w}}}\), ref. equation (13).

If \(\left\| \text {IMF}_2^\text {TH}\right\| _{\infty }< \frac{\eta }{2}\) then we can regard this component as not significant. If instead \(\left\| \text {IMF}_2^\text {TH}\right\| _{\infty }\ge \frac{\eta }{2}\) and \(\left\| \text {IMF}_2^\text {SC}\right\| _{\infty }\ge \eta \), then \(\text {IMF}_2^\text {SC}\) represents another significant IMF in the decomposition.

The subsequent outer loop steps follow similarly. The existence of the lower limit \(\varepsilon \) for all \(r_k>0\), \(k\ge 1\), ensures that we can have a finite coverage of the interval of frequencies \([b,\ B]\). In particular the algorithm generates a set \(\left\{ r_k\right\} _{k=1}^{R}\) such that \(\sum _{k=1}^R r_k = B-b\) and there exists a natural number \(0\le M\le R\) which represents the number of significant IMFs with respect to \(\eta \).\(\square \)

We point out that this theorem holds true also if we consider the \(L^2\)-norm instead of the \(L^\infty \)-norm thanks to the inclusion of \(L^p\) spaces on a finite measure space.

From this Theorem it follows that IF with a stopping criterion allows us to decompose a signal into a finite number of components, given by (11), each of which contains frequencies of the original signal filtered in a smart way.

We observe also that this theorem, together with Theorems 1 and 2, allows us to conclude that the IF method cannot produce fake oscillations. Each IMF in fact contains part of the oscillatory content of the original signal, as described in (4) and (11).

3 IF algorithm in the discrete setting

Real life signals are discrete and compactly supported; therefore we want to analyze the discretization of the IF algorithm and study its properties.

Consider a signal s(x), \(x\in {\mathbb {R}}\); we assume for simplicity that it is supported on \([0,\ 1]\) and sampled at n points \(x_j= \frac{j}{n-1}\), with \(j= 0,\ldots , n-1\), with a sampling rate that captures all its fine details, so that aliasing does not play any role. The goal is to decompose the vector \(\left[ s(x_j)\right] _{j=0}^{n-1}\) into vectorial IMFs. Without loss of generality we can assume that \(\Vert \left[ s(x_j)\right] \Vert _2=1\).

From now on, to simplify the formulas, we use the notation \(s=\left[ s(x_j)\right] _{j=0}^{n-1}\). Furthermore, if not specified differently, we consider as matrix norm the so called Frobenius norm \(\Vert A\Vert _2=\sqrt{\sum _{i,\ j=0}^{n-1}\left| a_{ij}\right| ^2}\) which is unitarily invariant.

Definition 3

A vector \(w\in {\mathbb {R}}^n\), n odd number, is called a filter if its values are symmetric with respect to the middle, nonnegative, and \(\sum _{p=1}^n w_p = 1\).
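These three requirements are easy to check numerically. The sketch below (the helper name `is_filter` and the triangular example are ours, not from the paper) verifies odd length, symmetry with respect to the middle, nonnegativity and unit sum for a candidate vector:

```python
import numpy as np

def is_filter(w, tol=1e-12):
    """Check the properties of Definition 3 for a vector w: odd length,
    symmetry with respect to the middle, nonnegative entries, unit sum."""
    w = np.asarray(w, dtype=float)
    return (len(w) % 2 == 1
            and np.allclose(w, w[::-1], atol=tol)   # symmetric about the middle
            and np.all(w >= 0)                      # nonnegative values
            and abs(w.sum() - 1.0) < tol)           # entries sum to one

# A hypothetical triangular filter with 7 points (odd length)
tri = np.array([1., 2., 3., 4., 3., 2., 1.])
tri /= tri.sum()
```
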

We assume that a filter shape has been selected a priori, like one of the Fokker-Planck filters described in [14], and that some invertible and monotone scaling function \(g_m\) has been chosen so that \(w_m(\xi )\) can be computed as described in (8). Therefore, assuming \(s_1=s\), the main step of the IF method becomes

$$\begin{aligned} s_{m+1}(x_i)= & {} s_{m}(x_i)-\int _{x_i-l_m}^{x_i+l_m} s_m(y)w_m(x_i-y){\text {d}y}\approx s_{m}(x_i) \nonumber \\&- \sum _{x_j=x_i-l_m}^{x_i+l_m} s_m(x_j)w_m(x_i-x_j)\frac{1}{n}, \quad j=0,\ldots ,n-1 \end{aligned}$$
(26)

In matrix form we have

$$\begin{aligned} s_{m+1}=(I-W_m)s_m \end{aligned}$$
(27)

where

$$\begin{aligned} W_m=\left[ w_m(x_i-x_j)\cdot \frac{1}{n}\right] _{i,\ j=0}^{n-1}=\left[ \frac{w(g_{m}^{-1}(x_i-x_j))}{\sum _{z_r=-1}^{1} w(z_r) |g'_{m}(z_r)| \varDelta z_r}\cdot \frac{1}{n}\right] _{i,\ j=0}^{n-1} \end{aligned}$$
(28)

Algorithm 2 provides the discrete version of Algorithm 1.

figure b

We recall that the first while loop is called the outer loop, whereas the second one is the inner loop.

The first IMF is given by \(\text {IMF}_1=\lim _{m\rightarrow \infty } (I-W_m)s_m\), where we point out that the matrix \(W_m=[w_m(x_i-x_j)]_{i,\ j=0}^{n-1}\) depends on the half support length \(l_m\) at every step m.

However, in the implemented code, the value \(l_m\) is usually computed only in the first iteration of each inner loop and then kept constant in the subsequent steps, so that the matrix \(W_m\) equals W for every \(m\in {\mathbb {N}}\). Hence the first IMF is given by

$$\begin{aligned} \text {IMF}_1=\lim _{m\rightarrow \infty } (I-W)^{m} s \end{aligned}$$
(29)

Furthermore, in the implemented algorithm we do not let m go to infinity; instead we use a stopping criterion as described in Sect. 2.1. For instance, we can define the following quantity

$$\begin{aligned} SD:=\frac{\Vert s_{m+1}-s_{m}\Vert _2}{\Vert s_{m}\Vert _2} \end{aligned}$$
(30)

and we can stop the process when the value SD reaches a certain threshold. Another possible option is to introduce a limit on the maximal number of iterations for all the inner loops. It is always possible to adopt different stopping criteria for different inner loops.
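As an illustration of the inner loop with the stopping criterion (30), the following sketch (the function name and the toy averaging matrix are ours) iterates \(s_{m+1}=(I-W)s_m\) until SD drops below a threshold or a maximal number of iterations is reached:

```python
import numpy as np

def inner_loop(s, W, delta, max_iter=500):
    """Iterate s_{m+1} = (I - W) s_m, cf. (27), stopping when the quantity
    SD of (30) drops below delta or max_iter steps have been performed."""
    s_m = np.asarray(s, dtype=float)
    for _ in range(max_iter):
        denom = np.linalg.norm(s_m)
        if denom == 0.0:                         # nothing left to extract
            break
        s_next = s_m - W @ s_m                   # subtract the moving average
        if np.linalg.norm(s_next - s_m) / denom < delta:
            return s_next
        s_m = s_next
    return s_m

# Demo on a toy circulant averaging matrix (the filter coefficients are ours):
n = 8
c = np.zeros(n); c[0], c[1], c[-1] = 0.5, 0.25, 0.25
W = c[(np.arange(n)[:, None] - np.arange(n)[None, :]) % n]
s = 1.0 + np.cos(2 * np.pi * 3 * np.arange(n) / n)   # constant plus oscillation
imf = inner_loop(s, W, delta=0.2)
```

In the demo the constant component (eigenvalue 1 of W) is removed immediately, while the oscillation survives the iterations.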

If we consider the case of linear scaling, making use of (9), the matrix \(W_m\) becomes

$$\begin{aligned} W_m=\left[ \frac{w\left( \frac{x_i-x_j}{l_m}\right) }{l_m}\cdot \frac{1}{n}\right] _{i,\ j=0}^{n-1}=\left[ \frac{w\left( \frac{i-j}{(n-1)l_m}\right) }{l_m}\cdot \frac{1}{n}\right] _{i,\ j=0}^{n-1} \end{aligned}$$
(31)

We point out that the previous formula represents an ideal \(W_m\); however, we need to take into account the quadrature formula used to compute the numerical convolution in order to build the appropriate \(W_m\) to be used in the DIF algorithm.

For instance, if we use the rectangle rule, we need to substitute the exact value of w(y) at y with its average value on the interval of length \(\frac{1}{n}\) centered at y and multiply this value by the length of the interval itself. Furthermore we should handle the boundaries of the support of w(y) appropriately; in fact the half length of the support is, in general, a non integer value. This can be done by handling separately the first and last interval in the quadrature formula: we can scale the value of the integral on these two intervals proportionally to their actual length.

If we take into account all the aforementioned details we obtain a matrix \(W_m\) which is row stochastic.

We observe that in the implemented code we simply scale each row of \(W_m\) by its sum so that the matrix becomes row stochastic.

3.1 Spectrum of \(W_m\)

Since \(W_m\in {\mathbb {R}}^{n\times n}\) represents the discrete convolution operator, it can be a circulant matrix, a Toeplitz matrix, or it can have a more complex structure, depending on the way we extend the signal outside its boundaries.

From now on we assume for simplicity that n is an odd natural number and that signals are extended periodically outside the boundaries; therefore \(W_m\) is a circulant matrix given by

$$\begin{aligned} W_m=\left[ \begin{array}{cccc} c_0 &{}\quad c_{n-1} &{}\quad \ldots &{}\quad c_1 \\ c_{1} &{}\quad c_0 &{}\quad \ldots &{}\quad c_2 \\ \vdots &{}\quad \vdots &{} \quad \ddots &{}\quad \vdots \\ c_{n-1} &{} \quad c_{n-2} &{} \quad \ldots &{}\quad c_0 \\ \end{array} \right] \end{aligned}$$
(32)

where \(c_j\ge 0\), for every \(j=0,\ldots , \ n-1\), and \(\sum _{j=0}^{n-1}c_j=1\). Each row contains a circular shift of the entries of a chosen vector filter \(w_m\). For the non periodic extension case we refer the reader to [12].
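A minimal sketch of the construction of the circulant matrix (32) from a symmetric coefficient vector, together with a numerical check that it is row stochastic and symmetric (the coefficients are a made-up example, not from the paper):

```python
import numpy as np

def circulant(c):
    """Build the circulant matrix (32): entry (i, j) equals c_{(i-j) mod n},
    so each row is a circular shift of the previous one."""
    c = np.asarray(c, dtype=float)
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return c[idx]

# Hypothetical coefficients with c_{n-j} = c_j and unit sum:
c = np.array([0.4, 0.2, 0.1, 0.0, 0.0, 0.1, 0.2])
W = circulant(c)
```
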

Denoting by \(\sigma (W_m)\) the spectrum of the matrix, in the case of a circulant matrix it is well known that the eigenvalues \(\lambda _j\in \sigma (W_m)\), \(j=0,\ldots , \ n-1\) are given by the formula

$$\begin{aligned} \lambda _j=c_0+c_{n-1}\omega _j+\cdots +c_1 \omega _j^{n-1},\quad \text { for } \qquad j=0,\ldots , \ n-1 \end{aligned}$$
(33)

where \(i=\sqrt{-1}\) and \(\omega _j=e^\frac{2\pi i j}{n}\) is the j-th power of the n-th root of unity, for \(j=0,\ldots , \ n-1\).

Since we construct the matrices \(W_m\) using symmetric filters \(w_m\), we have that \(c_{n-j}=c_j\) for every \(j=1,\ldots ,\frac{n-1}{2}\). Hence \(W_m\) is circulant, symmetric and

$$\begin{aligned} \lambda _j= & {} c_0+c_{1}\left( \omega _j+\omega _j^{n-1}\right) +c_{2}\left( \omega _j^2+\omega _j^{n-2}\right) +\cdots + c_{\frac{n-1}{2}}\left( \omega _j^\frac{n-1}{2}+\omega _j^{\frac{n+1}{2}}\right) \nonumber \\= & {} c_0+\sum _{k=1}^{\frac{n-1}{2}}c_{k}\left( \omega _j^k+\omega _j^{n-k}\right) =c_0+\sum _{k=1}^{\frac{n-1}{2}}c_{k}\left( e^{\frac{2\pi i j}{n}k}+e^{\frac{2\pi i j}{n}(n-k)}\right) \nonumber \\= & {} c_0+\sum _{k=1}^{\frac{n-1}{2}}c_{k}\left( e^{\frac{2\pi i j}{n}k}+e^{-\frac{2\pi i j}{n}k}e^{2\pi i j}\right) \end{aligned}$$
(34)

Therefore

$$\begin{aligned} \lambda _j = c_0+2\sum _{k=1}^{\frac{n-1}{2}}c_{k} \cos \left( \frac{2\pi j k}{n}\right) ,\quad \text { for } \qquad j=0,\ldots , \ n-1 \end{aligned}$$
(35)

It is evident that, for any \(j=0,\ldots , \ n-1\), \(\lambda _j\) is real and \(\sigma (W_m)\subseteq [-1,\ 1]\) since \(W_m\) is a stochastic matrix.
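Formula (35) can be verified numerically by comparing it against the eigenvalues computed by a standard symmetric solver; the coefficient vector below is a made-up symmetric example of ours:

```python
import numpy as np

# Numerical check of (35): for a symmetric circulant matrix the eigenvalues
# are lambda_j = c_0 + 2 sum_k c_k cos(2 pi j k / n), real and in [-1, 1].
n = 9
c = np.zeros(n)
c[0], c[1], c[-1], c[2], c[-2] = 0.4, 0.2, 0.2, 0.1, 0.1    # symmetric, unit sum
W = c[(np.arange(n)[:, None] - np.arange(n)[None, :]) % n]  # circulant (32)

j = np.arange(n)
k = np.arange(1, (n - 1) // 2 + 1)
lam_formula = c[0] + 2 * (c[k] * np.cos(2 * np.pi * np.outer(j, k) / n)).sum(axis=1)
lam_numeric = np.linalg.eigvalsh(W)                         # ascending order
```
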

Furthermore, if we make the assumption that the filter half support length always satisfies \(l_m\le \frac{n-1}{4}\), then the entries \(c_j\) of the matrix \(W_m\) are zero at least for every \(j\in [\frac{n-1}{4},\frac{3}{4}(n-1)]\).

We observe that the previous assumption is reasonable since it implies that we can study oscillations with periods at most equal to half of the length of a signal.

Theorem 4

Consider the circulant matrix \(W_m\) given in (32), and assume that \(n>1\), \(\sum _{j=0}^{n-1}c_j=1\), \(c_j\ge 0\), and \(c_{n-j}=c_j\), for every \(j=1,\ldots ,n-1\).

Then \(W_m\) is non-defective, diagonalizable and has real eigenvalues.

Furthermore, if the filter half support length \(l_m\) is small enough that \(c_0=1\) and \(c_j=0\), for every \(j=1,\ldots , \ n-1\), then we have n eigenvalues \(\lambda _j\) all equal to 1.

Otherwise, if the filter half support length \(l_m\) is big enough that \(c_0<1\) and the values \(c_k\) correspond to the discretization of a function with compact and connected support, then there is one and only one eigenvalue equal to 1, namely \(\lambda _0\); all the other eigenvalues \(\lambda _j\) are real and strictly less than one in absolute value, so they belong to the interval \((-1,1)\).

Proof

First of all we recall that symmetric matrices are always non-defective, diagonalizable and with a real spectrum.

In the case of \(c_0=1\) the conclusion follows immediately from the observation that \(W_m\) reduces to an identity matrix.

When \(c_0<1\) from (35) it follows that \(\lambda _0=1\) and all the other eigenvalues belong to the interval \([-1,1]\). Let us assume, by contradiction, that there exists another eigenvalue \(\lambda _d=1\) for some \(d\in \left\{ 1,\ 2,\ldots ,\ n-1\right\} \). We assume for simplicity that n is odd. The proof in the even case works in a similar way.

From (35) and the fact that \(c_{n-j}=c_j\), for every \(j=1,\ldots ,\frac{n-1}{2}\), it follows that

$$\begin{aligned} \lambda _d = c_0+2\sum _{k=1}^{\frac{n-1}{2}}c_{k} \cos \left( \frac{2\pi d k}{n}\right) ,\quad \text { for } \qquad d\in \left\{ 1,\ 2,\ldots ,\ n-1\right\} \end{aligned}$$
(36)

On the right hand side the coefficients \(c_{k}\), which by themselves would add up to 1, include at least \(c_1>0\), which is multiplied by \(\cos \left( \frac{2\pi d}{n}\right) <1\) for any \(d\in \left\{ 1,\ 2,\ldots ,\ n-1\right\} \). Therefore the right hand side can never add up to 1. Hence we have a contradiction.

From (35) it follows also that \(\lambda _d\ne -1\) for any \(d\in \left\{ 1,\ 2,\ldots ,\ n-1\right\} \) because \(\lambda _d\) is given by a convex combination of cosines and \(+1\).

So all the eigenvalues of \(W_m\) except \(\lambda _0\) are real and strictly less than one in modulus.\(\square \)

We observe that in the discrete iterative filtering algorithm the entries \(c_k\) derive from the discretization of a filter function which is by Definition 1 compactly supported. Furthermore, since the filter function is used to compute the moving average of a signal, it is reasonable to require its support to be connected.

From this theorem it follows that

Corollary 1

Consider the matrix \(W_m\) given in the previous theorem, and assume \(c_0<1\) and that \(W_m\) is constructed using a filter \(w_m\) produced as the convolution of a symmetric filter \(h_m\) with itself. Then there is one and only one eigenvalue equal to 1; all the other eigenvalues belong to the interval [0, 1).

Proof

The proof follows directly from the previous theorem and the fact that \(W_m={\widetilde{W}}_m^T{\widetilde{W}}_m={\widetilde{W}}_m^2\), where \({\widetilde{W}}_m\) is the circulant symmetric convolution matrix associated with the filter \(h_m\).\(\square \)
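Corollary 1 can be illustrated numerically: building w as the circular self-convolution of a hypothetical symmetric filter h (our example, not a filter from the paper), all eigenvalues of W fall in [0, 1] and exactly one equals 1:

```python
import numpy as np

n = 11
h = np.zeros(n)
h[0], h[1], h[-1] = 0.5, 0.25, 0.25                  # symmetric, unit sum

# circular self-convolution of h, i.e. w = h * h, done in the frequency domain
w = np.real(np.fft.ifft(np.fft.fft(h) ** 2))

idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
W = w[idx]                                           # circulant matrix built from w
lam = np.linalg.eigvalsh(W)                          # real eigenvalues, ascending
```
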

Corollary 2

Assuming \(c_0<1\), the eigenvector of \(W_m\) corresponding to \(\lambda _0=1\) is a basis for the kernel of the matrix \((I-W_m)\), which has dimension one.

Before presenting the main proposition we recall that, given a circulant matrix \(C=\left[ c_{pq}\right] _{p,\ q=0,\ldots ,n-1}\), its eigenvalues are

$$\begin{aligned} \lambda _p\ =\ \sum _{q=0}^{n-1} c_{1q}e^{-2\pi i p \frac{q}{n}} \quad p\ =\ 0,\ldots ,\ n-1 \end{aligned}$$
(37)

and the corresponding eigenvectors are

$$\begin{aligned} u_p\ =\ \frac{1}{\sqrt{n}} \left[ 1,\ e^{-2\pi i p\frac{1}{n}},\ldots ,\ e^{-2\pi i p\frac{n-1}{n}}\right] ^T \quad p\ =\ 0,\ldots ,\ n-1 \end{aligned}$$
(38)

which form an orthonormal set.
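Formulas (37) and (38) can be checked directly: for a generic circulant matrix, the DFT of its first row gives the eigenvalues, and the normalized DFT vectors are the corresponding eigenvectors (the kernel below is an arbitrary example of ours):

```python
import numpy as np

n = 8
c = np.array([0.4, 0.25, 0.15, 0.1, 0.05, 0.03, 0.01, 0.01])  # arbitrary kernel
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
C = c[idx]                                  # generic circulant matrix
lam = np.fft.fft(C[0])                      # eigenvalues by (37): DFT of first row
grid = np.outer(np.arange(n), np.arange(n))
U = np.exp(-2j * np.pi * grid / n) / np.sqrt(n)   # column p is u_p of (38)
```
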

We recall that an eigenvalue of a matrix is called semisimple whenever its algebraic multiplicity coincides with its geometric multiplicity.

Proposition 2

Given a matrix \(W_m\), assume that all the assumptions of Theorem 4 and Corollary 1 hold true and that \(W_m=W\) for any \(m\ge 1\). Given \(\left\{ \lambda _p\right\} _{p=0,\ldots ,n-1}\), semisimple eigenvalues of W, and the corresponding eigenvectors \(\left\{ u_p\right\} _{p=0,\ldots ,n-1}\), we define the matrix U having as columns the eigenvectors \(u_p\). Assume that W has k zero eigenvalues, where k is a number in the set \(\{0,\ 1,\ldots ,\ n-1\}\),

Then

$$\begin{aligned} \lim _{m\rightarrow \infty }(I-W)^m=U Z U^T \end{aligned}$$
(39)

where U is unitary and Z is a diagonal matrix with entries all zero except k elements in the diagonal which are equal to one.

Proof

From Theorem 4 we know that W is diagonalizable and that all its eigenvalues are semisimple. Since the eigenvectors of W are orthonormal, it follows that U is a unitary matrix. Hence \(W=U D U^T\), where D is a diagonal matrix containing in its diagonal the eigenvalues of W. From the assumption that W is associated with a doubly convolved filter it follows that the spectrum of W is contained in [0, 1], ref. Corollary 1. Therefore also the spectrum of \((I-W)\) is contained in [0, 1]. Furthermore

$$\begin{aligned} (I-W)=U(I-D)U^T \end{aligned}$$

and \(I-D\) is a diagonal matrix whose diagonal entries are in the interval (0, 1) except the first one which equals 0, ref. Corollary 2, and k entries that are equal to 1. Hence

$$\begin{aligned} \lim _{m\rightarrow \infty }(I-W)^m=\lim _{m\rightarrow \infty }U(I-D)^m U^T= U Z U^T \end{aligned}$$

where Z is a diagonal matrix with entries all zero except k elements in the diagonal which are equal to one.\(\square \)

From the previous proposition it follows

Corollary 3

Given a signal \(s\in {\mathbb {R}}^n\), assuming that we are considering a doubly convolved filter, and the half filter support length is constant throughout all the steps of an inner loop,

Then the first outer loop step of the DIF method converges to

$$\begin{aligned} \text {IMF}_1=\lim _{m\rightarrow \infty }(I-W)^m s=U Z U^Ts \end{aligned}$$
(40)

So the DIF method in the limit produces IMFs that are projections of the given signal s onto the eigenspace of W corresponding to the zero eigenvalue, which has algebraic and geometric multiplicity \(k\in \{0,\ 1,\ldots ,\ n-1\}\). Clearly, if W has only a trivial kernel, then the method converges to the zero vector. We point out that since (37) is also the Discrete Fourier Transform (DFT) formula of the sequence \(\{c_{1q}\}_{q=0,\ldots ,n-1}\), where \(C=[c_{pq}]\) is a circulant matrix, the eigenvalues of W can be computed directly as the DFT of the sequence \(\{w_{1q}\}_{q=0,\ldots ,n-1}\) by means of the Fast Fourier Transform (FFT). If we regard the DFT as a discretization of the Fourier transform of the filter function w, it becomes clear that, since the latter has only isolated zeros, in many cases we will not have eigenvalues exactly equal to zero. So in general W has only a trivial kernel and (40) converges to the zero vector. In order to ensure that the method produces a nonzero vector we need to discontinue the calculation by introducing some stopping criterion.

3.2 DIF inner and outer loop convergence in presence of a stopping criterion

If we assume that the half support length \(l_m\) is computed only in the beginning of each inner loop, then the first IMF is given by (29) and (40).

In order to have a finite time method we may introduce a stopping criterion in the DIF algorithm, like the condition

$$\begin{aligned} \Vert s_{m+1}-s_m\Vert _2<\delta \qquad \forall m\ge N_0 \end{aligned}$$
(41)

for some fixed \(\delta >0\).

Then, based on Corollary 3, we produce an approximated first IMF given by

$$\begin{aligned} \overline{\text {IMF}}_1=(I-W)^{N_0} s = U(I-D)^{N_0} U^T s. \end{aligned}$$
(42)

Theorem 5

Given \(s\in {\mathbb {R}}^n\), we consider the convolution matrix W defined in (32), associated with a filter vector w given as a symmetric filter h convolved with itself. Assuming that W has k zero eigenvalues, where k is a number in the set \(\{0,\ 1,\ldots ,\ n-1\}\), and fixed \(\delta >0\),

Then, calling \({\widetilde{s}}=U^T s\), for the minimum \(N_0\in {\mathbb {N}}\) fulfilling the inequality

$$\begin{aligned} \frac{N_0^{N_0}}{\left( N_0+1\right) ^{N_0+1}}<\frac{\delta }{\Vert {\widetilde{s}}\Vert _\infty {\sqrt{n-1-k}}} \end{aligned}$$
(43)

we have that \(\left\| s_{m+1}-s_m\right\| _{2}<\delta \quad \forall m\ge N_0\) and the first IMF is given by

$$\begin{aligned} \overline{\text {IMF}}_1=U(I-D)^{N_0} U^T s= U P \left[ \begin{array}{ccccccc} 0 &{} &{} &{} &{} &{} &{} \\ &{} (1-\lambda _1 )^{N_0} &{} &{} &{} &{} &{} \\ &{} &{} \ddots &{} &{} &{} &{} \\ &{} &{} &{} (1-\lambda _{n-1-k} )^{N_0} &{} &{} &{} \\ &{} &{} &{} &{} 1 &{} &{} \\ &{} &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} &{} 1 \\ \end{array} \right] P^T U^T s \end{aligned}$$
(44)

where P is a permutation matrix which reorders the columns of U, which correspond to eigenvectors of W, so that the corresponding eigenvalues \(\{\lambda _p\}_{p=1,\ldots ,\ n-1}\) are in decreasing order.

Proof

$$\begin{aligned} \Vert s_{m+1}-s_m\Vert _2= & {} \Vert (I-W)^{m+1}s-(I-W)^{m}s\Vert _2=\Vert U(I-D)^{m}(I-D-I)U^Ts\Vert _2 \nonumber \\= & {} \Vert (I-D)^{m}(I-D-I)U^Ts\Vert _2 = \Vert (I-D)^{m}(I-D-I){\widetilde{s}}\Vert _2\nonumber \\ \end{aligned}$$
(45)

since U is a unitary matrix and where \({\widetilde{s}}=U^T s\).

Given a permutation matrix P such that the diagonal entries of \(PDP^T\) are the eigenvalues of W in decreasing order of magnitude, starting from \(\lambda _0=1\), and assuming that W has k zero eigenvalues, where k is a number in the set \(\{0,\ 1,\ldots ,\ n-1\}\), then

$$\begin{aligned}&\Vert (I-D)^{m}(I-D-I){\widetilde{s}}\Vert _2 \nonumber \\&\quad \le \left\| P\left[ \begin{array}{ccccccc} 0 &{} &{} &{} &{} &{} &{} \\ &{} (1-\lambda _1 )^m \lambda _1 &{} &{} &{} &{} &{} \\ &{} &{} \ddots &{} &{} &{} &{} \\ &{} &{} &{} (1-\lambda _{n-1-k} )^m \lambda _{n-1-k} &{} &{} &{} \\ &{} &{} &{} &{} 0 &{} &{} \\ &{} &{} &{} &{} &{} \ddots &{} \\ &{} &{} &{} &{} &{} &{} 0 \\ \end{array} \right] P^T\left[ \begin{array}{c} \Vert {\widetilde{s}}\Vert _\infty \\ \vdots \\ \Vert {\widetilde{s}}\Vert _\infty \\ \end{array} \right] \right\| _2 \nonumber \\&\quad \le {\sqrt{n-1-k}} \left( 1-\frac{1}{m+1} \right) ^m \frac{1}{m+1} \Vert {\widetilde{s}}\Vert _\infty = {\sqrt{n-1-k}} \frac{m^m}{(m+1)^{m+1}} \Vert {\widetilde{s}}\Vert _\infty \nonumber \\ \end{aligned}$$
(46)

because the function \((1-\lambda )^m \lambda \) achieves its maximum at \(\lambda =\frac{1}{m+1}\) for \(\lambda \in [0,\ 1]\).

Hence the stopping criterion (41) is fulfilled for the minimum natural number \(N_0\) such that (43) holds true.\(\square \)
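Inequality (43) gives an a priori iteration count. A small sketch (the function name and interface are ours) that finds the minimal \(N_0\), working in logarithms to avoid overflow of \(N_0^{N_0}\):

```python
import math

def minimal_N0(delta, s_tilde_inf, n, k):
    """Smallest natural N_0 satisfying (43), i.e.
    N_0^{N_0} / (N_0+1)^{N_0+1} < delta / (||s~||_inf * sqrt(n-1-k)).
    The left-hand side decreases like 1/(e*N_0), so the loop terminates."""
    log_rhs = math.log(delta) - math.log(s_tilde_inf) - 0.5 * math.log(n - 1 - k)
    N0 = 1
    while N0 * math.log(N0) - (N0 + 1) * math.log(N0 + 1) >= log_rhs:
        N0 += 1
    return N0
```
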

We observe that, as mentioned earlier, since (37) is also the Discrete Fourier Transform (DFT) formula of the sequence \(\{c_{1q}\}_{q=0,\ldots ,n-1}\), the eigenvalues of \(W=\left[ w_{pq}\right] _{p,\ q=0,\ldots ,n-1}\) can be computed directly as the DFT of the sequence \(\{w_{1q}\}_{q=0,\ldots ,n-1}\) by means of the Fast Fourier Transform (FFT). This calculation can be done “off line”: once the filter shape w has been fixed, we can compute and store its FFT for different sizes of its support. This fact, together with the other previous results, can be used to improve the efficiency of the method, as explained in the following section.

It is interesting to notice that each IMF is generated as a linear combination of elements of an orthonormal basis. Therefore we can regard the IMFs as elements of a frame which allows us to decompose a given signal into a few significant components. From this perspective the IF algorithm can be viewed as a method that automatically produces elements of a frame associated with a signal. The possible connections between IF and frame theory are fascinating, but outside the scope of the present work. We plan to follow this direction of research in a future work.

Regarding the DIF outer loop convergence, the same results described in Sect. 2.2 for the continuous setting hold true. In fact, while the inner loop of the IF algorithm requires a discretization to deal with discrete signals, the outer loop does not require any form of discretization and works the very same way as in the continuous setting.

4 Efficient implementation of the DIF algorithm

In this section we present an efficient implementation of the DIF algorithm applied to the decomposition of a signal s of length n, named the Fast Iterative Filtering algorithm. This approach implicitly assumes that the signal is periodic at the boundaries. This is clearly a limitation, since most signals are far from being periodic at the edges, but it can be treated by pre-extending the signal and making it periodic at the new boundaries [12, 69].

We start from Theorem 5, which allows us to compute each IMF as fast as the FFT. In fact, we can compute the IMF using (44), where the eigenvalues \(\{\lambda _k\}_{k=0,\ 1,\ldots ,\ n-1}\) can be evaluated using (37), or by means of the Fast Fourier Transform, since (37) is equivalent to the Discrete Fourier Transform of the sequence \(\{w_{1q}\}_{q=0,\ldots ,n-1}\). Furthermore we recall that \(U^Ts\) is the DFT of s, which can be computed using the FFT algorithm, whose computational complexity is \(O(n\log n)\), and that multiplying on the left by the matrix \(U=\left[ u_k\right] \) is equivalent to computing the inverse DFT (iDFT), which can be done using the inverse FFT. Hence the following corollary holds true and provides an explicit formula for the calculation of each IMF.

Corollary 4

Given a signal \(s\in {\mathbb {R}}^n\) and a filter vector w derived from a symmetric filter h convolved with itself, fixed \(\delta >0\), then, for the minimum \(N_0\in {\mathbb {N}}\) such that the stopping criterion \(SD=\frac{\Vert s_{N_0+1}-s_{N_0}\Vert _2}{\Vert s_{N_0}\Vert _2}<\delta \) is satisfied, the first IMF contained in s is given by

$$\begin{aligned} \text {IMF} = \sum _{k=0}^{n-1} u_k (1-\lambda _k)^{N_0}\sigma _k = \text {iDFT}\left( (I-D)^{N_0}\text {DFT}(s)\right) \end{aligned}$$
(47)

where \(\sigma _k\) represents the k-th element of the DFT of the signal s, and D is a diagonal matrix containing as entries the eigenvalues of the convolution matrix W defined in (32), associated with the filter vector w.
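Formula (47) suggests the following sketch of the FFT-based computation of the first IMF (the function name, the stopping loop and the toy filter are ours; the SD ratio can be evaluated directly on the DFT coefficients since the unnormalized DFT scales all norms by the same factor \(\sqrt{n}\)):

```python
import numpy as np

def fif_first_imf(s, w, delta=1e-6, max_iter=10**4):
    """Sketch of the idea behind Corollary 4 / formula (47): the whole inner
    loop runs in the frequency domain. w must be embedded circularly in a
    vector of the same length as s; its DFT gives the eigenvalues lambda_k."""
    lam = np.real(np.fft.fft(w))                # eigenvalues of W, cf. (37)
    sigma = np.fft.fft(s)                       # DFT of the signal
    N0 = 1
    while N0 < max_iter:
        curr = (1 - lam) ** N0 * sigma
        nxt = (1 - lam) ** (N0 + 1) * sigma
        if np.linalg.norm(nxt - curr) / np.linalg.norm(curr) < delta:
            break
        N0 += 1
    return np.real(np.fft.ifft((1 - lam) ** N0 * sigma))

# Toy check: this filter gives lambda_j = 0.5 + 0.5 cos(2 pi j / n), so the
# Nyquist oscillation (lambda = 0) passes unchanged and the constant is removed.
n = 8
w = np.zeros(n); w[0], w[1], w[-1] = 0.5, 0.25, 0.25
s = 1.0 + np.cos(np.pi * np.arange(n))          # constant + Nyquist oscillation
imf = fif_first_imf(s, w)
```
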

We exploit this new theoretical result in the so called Fast Iterative Filtering (FIF) method, implemented for Matlab and available online.Footnote 2 The pseudocode of FIF is given in Algorithm 3.

figure c

In the FIF method we speed up the calculations significantly by computing the convolution as a product in the frequency domain via FFT, as suggested by Corollary 4. In the recently published paper [11] several tests and a comparison of the standard DIF and the newly proposed FIF method have been conducted on both artificial and real life signals. The results obtained in those examples show, as expected, that the two techniques produce decompositions that are identical up to machine precision. However, based on those tests, the newly proposed FIF method proves to be roughly three orders of magnitude faster, from a computational time perspective, than the standard DIF method when applied to signals containing \(10^4\) sample points or more, Table 1.

Table 1 Computational times versus the number of sample points n for the decomposition of both artificial and real life signals via DIF and FIF method

We point out that other approaches can be implemented and explored to further reduce the computational cost of the FIF method. For instance, one possibility is the precomputation of the number of iterations needed to achieve the required accuracy \(\delta \) in the computation of a certain IMF. This number of iterations can be approximated by the minimum \(N_0\in {\mathbb {N}}\) satisfying the inequality (43).

Another idea that can be implemented is the identification of the appropriate number of iterations via a bisection approach. The FIF algorithm inner loop can in fact be initiated with a reasonably big value of m and then, via bisection, the minimal value of m which satisfies the stopping criterion can be identified.
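A possible sketch of this bisection idea (the names, the interface and the assumption that SD(m) is nonincreasing in m when all eigenvalues lie in [0, 1] are ours):

```python
import numpy as np

def minimal_m_bisection(lam, sigma, delta, m_hi=2**20):
    """Find the minimal inner-loop iteration count by bisection, working in
    the frequency domain: the DFT of s_{m+1} - s_m is -lam * (1-lam)^m * sigma,
    so SD(m) can be evaluated without forming any matrix power. Assumes
    SD(m) is nonincreasing and drops below delta before m_hi."""
    def sd(m):
        tail = (1 - lam) ** m * sigma           # DFT of s_m
        den = np.linalg.norm(tail)
        return np.linalg.norm(lam * tail) / den if den > 0 else 0.0
    if sd(1) < delta:
        return 1
    lo, hi = 1, m_hi                            # invariant: sd(lo) >= delta > sd(hi)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if sd(mid) < delta:
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical eigenvalues and DFT coefficients for a quick check:
lam = np.array([1.0, 0.5, 0.0, 0.5])
sigma = np.array([0.0, 1.0, 1.0, 1.0])
m_star = minimal_m_bisection(lam, sigma, 1e-3)
```
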

To speed up the FIF computation it is also possible to precompute the eigenvalues \( \lambda _k \) corresponding to any possible scaling of a filter w. In doing so we can reduce the computational time of the algorithm even further.

Further acceleration of the FIF algorithm is, however, outside the scope of this paper, whose focus is the numerical analysis of the IF method. The interested reader can find more ideas and details on this direction of research in the recently published paper [11].

5 Conclusions and outlook

In this work we tackle the problem of a complete analysis of the IF algorithm both in the continuous and discrete setting. In particular in the continuous setting we show how IF can decompose a signal into a finite number of so called IMFs and that each IMF contains frequencies of the original signal filtered in a “smart way”.

In the discrete setting we prove that the DIF method is also convergent and, in the case of periodic extension at the boundaries of the given signal, we provide an explicit formula for the a priori calculation of each IMF. From this equation it follows that each IMF is a smart summation of eigenvectors of a circulant matrix.

We show that fake oscillations can be produced neither in the continuous nor in the discrete setting.

From the properties of the DIF algorithm and the explicit formula for the IMFs produced by this method, derived in this work, we propose new ideas that have been directly incorporated in the implemented algorithm in order to increase its efficiency and reduce its computational complexity. The result is the so called FIF method, which allows us to quickly decompose a signal by means of the FFT. This is an important result in this area of research, which opens the door to an almost instantaneous analysis of non stationary signals.

There are several open problems that remain unsolved. First of all, from the proposed analysis it is clear that different filter functions have different Fourier transforms, and hence the decomposition produced by the IF and DIF algorithms is directly influenced by this choice. In a future work we plan to study in more detail the connections between the shape of the filters and the quality of the decomposition produced by these methods.

In the current work we analyzed the DIF method assuming a periodic extension of the signals at the boundaries. We plan to study in a future work the behavior of the DIF method in the case of reflective, anti-reflective and other boundary extensions of a signal.

Based on the numerical evidence [14, 17] we claim that the Iterative Filtering method is stable under perturbations of the signal. We plan to study rigorously such stability in a future work.

The results about the convergence of the DIF algorithm suggest that the method allows, in general, the automatic generation of a frame associated with a given signal. We plan to further analyze this connection in a future work.

Finally we recall that it is still an open problem how to extend all the results obtained for the Iterative Filtering technique to the case of the Adaptive Local Iterative Filtering method, whose convergence and stability analysis is still under investigation [13, 14, 61].