1 Introduction

Economic-mathematical modeling (Kolosinska and Kolosinskyi 2013), which combines mathematical models with new engineering solutions, has become one of the most effective methods for describing and analyzing complex socio-economic objects and activities, and such modeling has itself become part of economics. As an abstract category, knowledge economics (Svarc and Dabic 2017) must be expressed in a tangible, concrete form, and this can be achieved by means of mathematical modeling of its processes as managerial objects. The results of theoretical research and the development of mathematical models and methods (Neittaanmäki et al. 2016) have been used successfully to solve real and important economic and financial problems: analysis and forecasting of budget financial flows (Konchitchki and Patatoukas 2014), control of development and operational support at any budget level (Power et al. 2017), quality (risk) assessment (Bruzda 2017) of economic (financial) systems management in terms of the energy-entropic approach, and risk management (Boubaker and Raza 2017) of investment processes. Traditional economic (financial) analysis methods are still widely used. However, the requirements of the market economy and global finance, together with the rapid development of science and technology, call for stronger theoretical support and more precise analytical methods.

In a wider framework that treats time series as signals, economic (financial) time series models make use of new mathematical tools and signal processing technologies that go beyond the traditional Fourier transform into two new and related methodologies: time–frequency analysis and the time-scale method. Both have benefited from a newly developed mathematical tool, the wavelet transform (Tiwari et al. 2016; Jiang et al. 2017). The concept of the “wavelet” originated from the study of time–frequency signal analysis, wave propagation and sampling theory (Morlet et al. 1982a, b). When a signal is nonstationary, traditional spectral analysis cannot preserve the time dependence of the relevant patterns. The results of the wavelet transform can be presented as a contour map in the time–frequency plane, allowing the changing spectral composition of nonstationary signals to be measured and compared. The wavelet transform is therefore well suited to the analysis of economic data, since most economic and financial time series are nonstationary (Yang et al. 2017). One of the main reasons for the discovery and use of wavelets and wavelet transforms is that the Fourier transform does not contain local information about signals (Pal and Mitra 2017). Wavelet analysis allows a general function to be decomposed into a series of (orthogonal) basis functions called wavelets, with different frequency and time locations. In wavelet analysis, signals consist of different features in the time and frequency domains, but their high-frequency components have a shorter time duration than their low-frequency components (Dewandaru et al. 2016). A particular feature of the function can be identified with the positions of the wavelets into which it is decomposed (Chen and Li 2016). Since it provides a very useful tool for time–frequency localization, the wavelet transform has enjoyed ever-increasing success in various scientific, engineering, and signal processing applications. Like many other complex systems, economic and financial systems include variables that interact simultaneously on different time scales (Tzagkarakis et al. 2016), so that relationships between variables can occur at different horizons. Wavelet analysis is therefore an attractive method for solving difficult problems in economics and finance.

In the fields of economics and finance, prediction has always played a very important role (Singh and Challa 2016). Long-term prediction is used to control the behavior of a dynamic economic system, while short-term prediction is intended to give an accurate track of economic activities. The wavelet method has also received considerable attention in the analysis of economic and financial data for forecasting purposes, including crude oil price forecasting (de Souza e Silva et al. 2010; Jammazi 2012; Jammazi and Aloui 2012; Chang and Lee 2015; Alzahrani et al. 2014; Tiwari 2013; He et al. 2012a, b; Naccache 2011; Yousefi et al. 2005), stock market (financial time series) forecasting (Tiwari and Kyophilavong 2014; Kim and In 2007; In and Kim 2006; Bagheri et al. 2014; Huang 2011; Shin and Han 2000), commodity market and energy futures price forecasting (Vacha and Barunik 2012; Vacha et al. 2013), and metal market forecasting and risk management (Kriechbaumer et al. 2014; Michis 2014; He et al. 2012a, b).

In the field of signal processing, a signal is rarely observed free of noise, since noise is present to varying degrees in almost all environments; it can correspond to actual physical noise in the measuring devices or to inaccuracies of the model used (Vaseghi 2008). Similarly, in economic and financial fields, almost all economic data records contain noise because of inevitable measurement errors and seasonalities. The sources of noise mainly include Gaussian noise (Dhifaoui 2018), white noise (Benedetto et al. 2015; Li and Shang 2018; Kinateder et al. 2017; Oet et al. 2016; Gidea and Katz 2018), red (colored) noise (Ftiti et al. 2015), shot noise (Egami and Kevkhishvili 2017; Liang and Lu 2017) and so on. In addition to these noises in the classical sense, economic systems also have their own specific types of noise: micro-market noise (Hammoudeh and McAleer 2015), micro-structure noise (Zu and Boswijk 2014; Barunik and Hlinkova 2016), noise traders (Hoesli et al. 2017; Page and Siemroth 2017; Shahzad et al. 2017; Yildirim-Karaman 2018; Ramiah et al. 2015) and so on. Traditionally, therefore, the economic (financial) behavior underlying noisy data would have been regarded as almost completely unpredictable (Cao et al. 1996), since noise is a random process; we now know that, at least in principle, it is possible to forecast its future evolution (Wang et al. 2012; Furlaneto et al. 2017; Cavalcante et al. 2016). In practice, there are limits to the predictability of economic and financial behavior due to noise and measurement errors. Given noisy economic (financial) time series data, there are two ways to predict economic behavior. One is to make predictions directly on the original time series without noise reduction (Lahmiri 2016; Sehgal and Pandey 2015). The other is to reduce the noise first, then make predictions on the denoised time series, and finally translate those predictions back into predictions of the original data (Gao et al. 2010; Afanasyev and Fedorova 2016; Bai and Wei 2015). It can be argued that, in practice, the better approach is often to reduce the noise in the data before forecasting. Since the noisy model is not invertible, estimation of the noise-free data is quite difficult and requires new methods.

The success of a noise-processing method depends on its ability to characterize and model the noise process and to use the noise characteristics to differentiate the signal from the noise. This problem has led to some interesting forms of denoising (Jammazi et al. 2015; Sun and Meinl 2012). When looking at a financial time series, it is often useful to consider the data as observations sampled from a noisy version of some underlying function, known as the data generating process. The data generating process may be considered to be a function from a function space. We can specify very simple functions, known as atoms (Greenblatt 1998), which may be taken in linear combinations to represent any function within a particular function space. The well-known continuous wavelets yield a redundant decomposition (Elad 2010), but they have properties that may be more suitable for handling noise than those of other decomposition techniques (Ftiti et al. 2015). Wavelet denoising methods have been widely used in financial (economic) data analysis and processing. They include the compressed-sensing-based denoising method (Yu et al. 2014), the generalized optimal wavelet decomposition algorithm (Sun et al. 2015), the wavelet band least squares method (Barunik and Hlinkova 2016), wavelet coherence and phase-difference methods (Ferrer et al. 2016), a seasonal Mackey–Glass-GARCH(1, 1) model (Kyrtsou and Terraza 2010), instrumental variables combined with wavelet analysis (Ramsey et al. 2010), a wavelet-denoising-based back-propagation neural network (Wang et al. 2011), the combination of wavelet decomposition, particle swarm optimization and neural networks (Chiang et al. 2016), the cross-wavelet power spectrum, wavelet coherency and phase-difference method (Aguiar-Conraria et al. 2008), and statistical white noise tests with the generalized variance portmanteau test (Boubaker 2015).

The above analysis describes two aspects of the application of the wavelet transform in economic and financial fields: prediction and denoising. Although wavelet analysis is not itself a forecasting technique, its distinguishing features (excellent temporal resolution, computational efficiency, overall methodological simplicity, and the possibility of decomposing time series according to time scales) suggest that it can be helpful in forecasting economic and financial processes. In theory, wavelets should enable a more precise study by building models specified within (or across) different frequency bands and computing forecasts as aggregates of the forecasted values for the component series; at the same time, they may significantly simplify the analysis by transforming the predicted variables in such a way that better forecast models can be found. Another of the most important characteristics of wavelets is their denoising property. The wavelet transform (WT) can be adapted to distinguish noise in a signal through its properties in the time and frequency domains. The idea of noise removal by the WT is based on singularity analysis and the thresholding of the wavelet coefficients. Only a few wavelet coefficients in the lower bands are needed to approximate the main features of the clean signal. Hence, by setting the smaller details to zero, up to a predetermined threshold value, we can achieve a nearly optimal elimination of noise while preserving the important information of the clean (desired) signal. A particular wavelet method called wavelet shrinkage denoising (Donoho and Johnstone 1995) has led its advocates to claim that it offers all that one might desire of a technique, from optimality to generality. Wavelet shrinkage denoising is also a nonparametric method; it is thus distinct from parametric methods, in which parameters must be estimated for a particular model that is assumed a priori (Taswell 2000).

Time series analysis is usually about separating signal from noise. A time series is a collection of well-defined data items obtained through repeated measurements over a period of time. But data, particularly economic (financial) data, can be truly noisy, partly because the outcome is often a human construct that can only be measured with some error. Noise can also stem from other factors, such as the collection of data on variables that are not correlated with the outcome of interest, or the addition of interaction and/or higher-order terms, which can easily fail in-sample goodness-of-fit tests (Breiman 2001); that is, the combination of many variables and their transformations in a conventional, say, linear regression (Zografidou et al. 2017; Wei et al. 2018; Amasyali and EI-Ghoary 2018) can be so flexible, or fit the observed data so well, that it has little value in explaining new or out-of-sample data. Because of the noise from factors internal and external to the system, uncertainty increases in financial markets and economic activities, and this uncertainty makes the prediction task more complex and difficult. In recent years, researchers have therefore been attracted to studying the effect of noise on economic (and financial) predictions. These predictions include economic model predictive control (Rashid et al. 2016; Bayer et al. 2016), bankruptcy prediction (Jardin 2017; Barboza et al. 2017), intraday prediction (Ruan and Ma 2017; Gunduz et al. 2017), contagion prediction for financial networks (Sui et al. 2016), stock price trend prediction (Gopal and Ramasamy 2017), residential price trend prediction (Francke and Minne 2017) and so on. Obviously, financial time series involve different time scales, such as intraday (high-frequency), hourly, daily, weekly, monthly, or tick-by-tick stock prices and exchange rates. The analysis of financial time series has economic importance. In particular, stock index (price) prediction (Chiang et al. 2016) is a challenging application of modern time series forecasting and is essential for the success of many businesses and financial institutions. On the other hand, with the continued liberalization of cross-border cash flows, international financial markets have become increasingly interdependent, and investors are highly susceptible to exchange risk and to fluctuations in equity prices throughout the world. Consequently, this paper only addresses the stock index (price) prediction problem. For investors, stock price (index) data are among the most important pieces of information. Stock price prediction usually provides the fundamental basis for decision models aimed at achieving good returns, which is the first and most important goal for any investor. It can help to improve companies’ strategies and decrease the risk of potentially high losses; it can also help investors to cover potential market risk and to establish techniques that improve the quality of financial decisions (Hussain et al. 2016). Unfortunately, stock prices (indices) are non-stationary and highly noisy because stock markets are affected by a variety of random factors. Predicting a stock price or index directly from the noisy data is usually subject to large errors. In order to improve the prediction accuracy of the stock price, it is therefore necessary to denoise the original stock data.

In this paper, we mainly study denoising methods, and the WT is used to remove noise from stock price data for prediction purposes. It is well known that volatility is an intrinsic characteristic of the stock market. While the prices of stock indices fluctuate daily, the price movements do not always reflect the value of the indices or their associated long-term trends. This is due in part to various noises (random factors) in the stock market, such as those caused by speculation and program trading, among others. In fact, the reasons for the volatility of stock prices are very complicated: it results from the combined effect of many random factors rather than from a single cause. In practice, the observed stock data are contaminated by a great deal of noise, which may be of diverse types, including classical (Gaussian, white, colored, shot) noise and financially specific noise (micro-market noise, micro-structure noise and noise traders). Therefore, the noise in the stock price data considered in this paper is a mixture of many random factors, not a single type of noise. The goal of this paper is to investigate wavelet shrinkage denoising methods and threshold rules for stock price data, with a focus on constructing a new shrinkage (thresholding) function. The remainder of the paper is organized as follows. In Sect. 2 we briefly introduce the concepts of shrinkage denoising and the wavelet transform, together with several classical thresholding functions. In order to improve the denoising performance, Sect. 3 is devoted to designing a novel shrinkage function that has continuous derivatives (i.e., the differentiability property). In Sect. 4 the L2 risks of different thresholding functions are compared and analyzed, and the two classic (universal, minimax) threshold rules are discussed. Experiments based on stock data are carried out in Sect. 5 to validate the denoising performance of the proposed thresholding function. Finally, Sect. 6 summarizes the results and presents the conclusions.

2 Wavelet Transform and Shrinkage Denoising

In the processing and analysis of economic and financial time series (signals), the detection of signals in the presence of noise and interference is a critical issue. The wavelet transform (WT) has become an important tool for suppressing noise because of its ability to detect transient features in signals. The WT has primary properties such as sparsity or compression, which means that the WTs of real-world signals tend to be sparse, i.e., they have a few large coefficients that contain the main energy of the signal and other small coefficients that can be ignored. In essence, the WT is equivalent to a correlation analysis system, so we can expect its output to be maximal when the input signal most resembles the analysis template and much smaller when the input is mostly noise. This is the basic principle of wavelet denoising, and it is also the basis of the matched filter, which is the optimum detector of a deterministic signal in the presence of additive Gaussian white noise. Compared with the matched filter, however, the advantage of the WT is that it uses several scales to decompose the signal into several resolutions. Denoising techniques are based on the idea that the amplitude (correlation factor), rather than the location, of the spectrum of the signal differs from that of the noise. It should also be noted that denoising should not be confused with smoothing (despite the use by some authors of the term smoothing as a synonym for denoising). Whereas smoothing removes high frequencies and retains low ones, denoising attempts to remove whatever noise is present and retain whatever signal is present, regardless of the signal’s frequency content. It is well known that the energy of the noise is spread among all the coefficients in the wavelet domain. Since the WT of a noisy signal is a linear combination of the WT of the noise and that of the clean signal, the noise power can be suppressed significantly with a suitable threshold value while the main signal features are preserved. Therefore, in the wavelet domain, denoising can be performed by simply thresholding the WT coefficients, an operation known as wavelet shrinkage. The effectiveness of the shrinkage process depends on the threshold value and the thresholding rule. Related work on signal denoising via wavelet shrinkage has shown that various wavelet thresholding schemes have near-optimal properties in the minimax sense (Donoho and Johnstone 1998).

The wavelet transform is generally used to analyze non-stationary time series, generating information in both the time and frequency domains. It may also be regarded as a special type of Fourier transform at multiple scales, decomposing a signal into shifted and scaled versions of a “mother” wavelet. The continuous wavelet transform (CWT) is defined as the convolution of a time series x(t) with a wavelet function ψ(t) (Gomes and Velho 2015):

$$ CWT_{x}^{\psi } (b,a) = \varphi_{x}^{\psi } (b,a) = \frac{1}{{\sqrt {|a|} }}\int {x(t)\psi^{ * } \left( {\frac{t - b}{a}} \right)dt} $$
(1)

where a is the scale parameter, b is the translation parameter and ψ* is the complex conjugate of ψ(t). Let a = 1/2^s and b = k/2^s, where s and k belong to the integer set Z. The CWT of x(t) is then a number at (k/2^s, 1/2^s) on the time-scale plane, representing the correlation between x(t) and the correspondingly shifted and scaled wavelet ψ*(t). Thus, as a discrete version of (1), the discrete wavelet transform (DWT) is defined as

$$ DWT_{x}^{\psi } (k,s) = \varphi_{x}^{\psi } \left( {\frac{k}{{2^{s} }},\frac{1}{{2^{s} }}} \right) = \sqrt{2^{s}} \int_{ - \infty }^{\infty } {x(t)\,\psi^{ * } \left( {\frac{{t - k/2^{s} }}{{1/2^{s} }}} \right)dt} $$
(2)

which separates the signal into components at various scales corresponding to successive frequency bands. Note that the DWT corresponds to the multi-resolution approximation expressions for analyzing a signal in many frequency bands (or at many scales). In practice, multi-resolution analysis is carried out by starting with a two-channel filter bank composed of a low-pass and a high-pass filter, and the output of each filter bank is then down-sampled by a factor of two (1/2 down-sampling) before the next stage. The number of repetitions of this decomposition procedure depends on the length of the data. The down-sampling procedure keeps the scaling factor constant (1/2) throughout successive wavelet transforms (Li and Kuo 2008), which makes computer implementation simple. The DWT has been well developed and applied in signal analysis in various fields. In this paper, the DWT is used to remove noise from stock price data by constructing an appropriate shrinkage (thresholding) function.
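To make the multi-resolution decomposition described above concrete, the following minimal Python sketch uses the third-party PyWavelets package (an assumption about tooling; the paper does not state which software was used) to split a hypothetical price series into one approximation and six detail subbands and to reconstruct it.

```python
import numpy as np
import pywt  # PyWavelets, assumed available

# Hypothetical daily closing prices standing in for the SCI series
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.standard_normal(1024))

# Multi-resolution analysis: successive low-pass/high-pass filtering with
# 1/2 down-sampling at each stage (here 6 levels with the Daubechies db5 wavelet)
coeffs = pywt.wavedec(prices, 'db5', level=6)
approx, details = coeffs[0], coeffs[1:]        # cA6 and cD6, cD5, ..., cD1

# Untouched coefficients reconstruct the original series (up to rounding)
reconstructed = pywt.waverec(coeffs, 'db5')[:len(prices)]
print(np.allclose(reconstructed, prices))      # True
```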

The pioneering studies on wavelet shrinkage denoising include RiskShrink, a practical spatially adaptive method that works by shrinkage of empirical wavelet coefficients (Donoho and Johnstone 1994), an asymptotically minimax method that translates the empirical wavelet coefficients towards the origin for curve estimation from noisy data (Donoho et al. 1995), and SureShrink, which suppresses noise by thresholding the empirical wavelet coefficients (Donoho and Johnstone 1995). These methods are collectively referred to as “WaveShrink”, which is based on the principle of shrinking wavelet coefficients towards zero to remove noise. WaveShrink is now well established as a technique for removing noise from corrupted signals.

Many scholars have provided a large number of applications of wavelets in economics and finance by using the DWT to decompose economic and financial data. These works paved the way for the application of wavelet analysis in empirical economics and finance. Ensemble empirical mode decomposition (EEMD) (Wu and Huang 2009), the ensemble version of empirical mode decomposition (EMD) (Huang et al. 1998), applied with the goal of feature extraction, aims at creating features by extracting quasi-periodic components from signals. The components generated by this nonparametric method can be used as inputs to classification models, effectively removing most of the human bias from feature generation. This decomposition technique and the DWT have been applied successfully in many fields and are especially useful for non-stationary time series. Many studies have applied EEMD and the DWT to predictive tasks in economics and finance. Prediction of the highly oscillatory, non-stationary time series generated by financial instruments such as stock prices and market indices is one of the most popular and important problems in time series research. Wavelet analysis is well known for the ability of certain wavelet-based signal estimation methods to forecast economic and financial time series, and for its ability to examine the frequency content of the processes under scrutiny with good joint time–frequency resolution. The endogenously varying time window that underlies this type of frequency analysis makes the approach computationally efficient, enables precise timing of events causing or influencing economic fluctuations, and makes it possible to analyze economic relationships decomposed according to time horizons. From the above analysis we can draw the following conclusions. In addition to its application to signal (time series) denoising, the application of wavelet analysis to economic prediction is even more important, because the good decorrelation property of the conventional DWT suggests that it can be helpful in forecasting economic and financial processes. A surprising implication of the development of forecasting techniques for real and financial economic variables is the recognition that the results depend strongly on the scale of analysis. Only in the simplest of circumstances will forecasts based on traditional time series aggregates accurately reflect what is revealed by the time-scale decomposition of the series. In practice, the WT (CWT and DWT) leads to an analysis of less complicated univariate and multivariate processes, to which tailor-made forecasting techniques can be applied.

Suppose we observe data (time series) y = (y1, y2, …, yn)T given by

$$ y_{k} = f(t_{k} ) + \sigma z_{k} \quad (k = 1,2, \ldots ,n) $$
(3)

where tk =  k/n, f(·) is a deterministic function which is potentially complex and spatially inhomogeneous, {zk} are independent Gaussian random variables with zero mean and unit variance, and σ is the standard deviation of the Gaussian noise. Our goal is to estimate f(·) with small L2 risk (Donoho and Johnstone 1994, 1995):

$$ R\left( {\hat{f},f} \right) = \frac{1}{n}\sum\limits_{k = 1}^{n} {E\left[ {\hat{f}(t_{k} ) - f(t_{k} )} \right]^{2} } $$
(4)

where \( \hat{f}(t_{k} ) \) is the estimated sample value of f(tk). Obviously, L2 risk is the mean-square-error (MSE).

WaveShrink has been widely used to estimate the function f(·), and it has very broad asymptotic near-optimality properties; e.g., WaveShrink achieves, within a factor of log n, the optimal minimax risk over each functional class in a variety of smoothness classes and with respect to a variety of loss functions, including the L2 risk. The WaveShrink approach comprises the following steps (a minimal code sketch is given after the list):

(a) A forward WT of the observed data, i.e., transform the data y into the wavelet domain;

(b) Thresholding of the wavelet coefficients, i.e., shrink the empirical wavelet coefficients towards zero;

(c) Inverse wavelet transform of the thresholded coefficients, i.e., transform the shrunken coefficients back to the data domain.
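A minimal sketch of steps (a)–(c) is given below, again assuming PyWavelets; following common practice, only the detail coefficients are thresholded and the coarsest approximation is left untouched, and the wavelet, decomposition level, and threshold value shown are illustrative defaults rather than the paper's settings.

```python
import pywt  # PyWavelets, assumed available

def waveshrink(y, wavelet='db5', level=6, lam=3.31, mode='hard'):
    """Generic WaveShrink: forward DWT, shrink the details, inverse DWT."""
    coeffs = pywt.wavedec(y, wavelet, level=level)             # step (a)
    shrunk = [coeffs[0]] + [pywt.threshold(c, lam, mode=mode)  # step (b)
                            for c in coeffs[1:]]
    return pywt.waverec(shrunk, wavelet)[:len(y)]              # step (c)
```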

Shrinkage of the empirical wavelet coefficients works best when the underlying set of true coefficients of f(·) is sparse, i.e., when the overwhelming majority of the (non-important) coefficients of f(·) are small and the remaining few (important) large ones explain most of the functional form of f(·). Therefore, an admissible shrinkage function (also known as a thresholding function) should do two things: (a) discard the small, non-important coefficients; and (b) keep the large, important coefficients.

The shrinkage functions used in WaveShrink are the hard and soft shrinkage functions, which are the most basic ones. Hard shrinkage keeps the important coefficients unchanged and sets a non-important coefficient to zero if its absolute value does not exceed the threshold λ. This is expressed mathematically as

$$ \delta_{\lambda }^{H} (x) = \left\{ {\begin{array}{*{20}c} {0,} & {|x| \le \lambda } \\ {x,} & {|x| > \lambda } \\ \end{array} } \right. $$
(5)

where threshold value λ ∈ [0, ∞).

The soft shrinkage function can be considered an extension of the hard shrinkage. It sets the non-important coefficients to zero if their absolute values do not exceed the threshold, and shrinks the remaining (important) coefficients toward zero; more specifically, the important coefficients are reduced in magnitude by the threshold value. Mathematically, this can be expressed as (Donoho 1995)

$$ \delta_{\lambda }^{S} (x) = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {|x| \le \lambda } \hfill \\ {x - \lambda ,} \hfill & {x > \lambda } \hfill \\ {x + \lambda ,} \hfill & {x < - \lambda } \hfill \\ \end{array} } \right. $$
(6)

Both the hard and soft shrinkages have advantages and disadvantages. The soft shrinkage is continuous but has a discontinuous derivative, and its estimates tend to have larger bias due to the shrinkage of large coefficients. Because of the discontinuity of the shrinkage function, hard-thresholding estimates tend to have larger variance and can be unstable, i.e., sensitive to small changes in the data (Gao 1998).

In order to remedy the drawbacks of the hard and the soft shrinkages, the WaveShrink denoising technique can use the non-negative garrote shrinkage function which was first introduced by Breiman (1995). The non-negative garrote shrinkage function is defined as follows

$$ \delta_{\lambda }^{G} (x) = \left\{ {\begin{array}{*{20}l} {0,} & {|x| \le \lambda } \\ {x - {\lambda^{2} }/{x},} & {|x| > \lambda } \\ \end{array} } \right. $$
(7)

From the above three shrinkage functions we can draw the following conclusions: (a) the hard shrinkage function is discontinuous at the threshold λ; (b) the soft shrinkage function is continuous everywhere; (c) like the soft shrinkage, the non-negative garrote shrinkage function is continuous and therefore more stable than the hard shrinkage. In addition, the non-negative garrote shrinkage approaches the identity line as |x| gets large (close to the hard shrinkage), so it has smaller bias than the soft shrinkage for large coefficients. These results suggest that the non-negative garrote shrinkage function provides a good compromise between the hard and soft shrinkage functions. What these three functions have in common, however, is that they each use a single threshold. Gao and Bruce (1997) introduced a general firm (also called semisoft) shrinkage function with two thresholds (λ1, λ2):

$$ \delta_{{\lambda_{1} ,\lambda_{2} }} (x) = \left\{ {\begin{array}{*{20}l} { 0,} \hfill & {|x| \le \lambda_{ 1} } \hfill \\ {\text{sgn} (x)\frac{{\lambda_{2} (|x| - \lambda_{1} )}}{{\lambda_{2} - \lambda_{1} }},} \hfill & {\lambda_{1} < |x| \le \lambda_{2} } \hfill \\ {x,} \hfill & {|x| > \lambda_{2} } \hfill \\ \end{array} } \right. $$
(8)

where sgn(·) is the sign function. For values of x near the lower threshold λ1, the firm shrinkage δλ1, λ2(x) behaves like the soft shrinkage \( \delta_{\lambda 1}^{S} (x) \). For values of x above the upper threshold λ2, \( \delta_{\lambda 1,\lambda 2} (x) = \delta_{\lambda 2}^{H} (x) = x \). Obviously, the firm shrinkage with λ1 = λ2 is the hard shrinkage, and the firm shrinkage with λ2 = ∞ is the soft shrinkage. This means that the firm shrinkage generalizes the hard and soft shrinkage in the WaveShrink.

By choosing appropriate thresholds (λ1, λ2), the semisoft shrinkage can outperform both the hard and soft shrinkages, offering the benefits of each without the drawbacks of either (Gao 1998). However, a distinct disadvantage of the semisoft shrinkage is that it requires two thresholds, which makes the threshold selection problem much harder and computationally more expensive for adaptive threshold selection procedures.
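For reference, the four classical shrinkage rules (5)–(8) translate directly into NumPy as follows; this is a plain, unoptimized transcription of the formulas.

```python
import numpy as np

def hard(x, lam):                       # Eq. (5)
    return np.where(np.abs(x) > lam, x, 0.0)

def soft(x, lam):                       # Eq. (6)
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def garrote(x, lam):                    # Eq. (7), non-negative garrote
    safe_x = np.where(x == 0, 1.0, x)   # avoid division by zero; masked out below
    return np.where(np.abs(x) > lam, x - lam**2 / safe_x, 0.0)

def firm(x, lam1, lam2):                # Eq. (8), firm (semisoft), lam1 < lam2
    mid = np.sign(x) * lam2 * (np.abs(x) - lam1) / (lam2 - lam1)
    return np.where(np.abs(x) <= lam1, 0.0,
                    np.where(np.abs(x) <= lam2, mid, x))
```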

The waveforms of the hard (λ = 3.31), soft (λ = 3.31), non-negative garrote shrinkage functions (λ = 3.31) and the semisoft shrinkage function (λ1 = 2.331, λ2 = 7.259) are shown in Fig. 1.

Fig. 1

The waveforms of the shrinkage functions

Although improved thresholding functions such as the non-negative garrote and the semisoft are a good compromise between the hard and soft functions and have advantages over both, their fixed structure, their dependency on the threshold value, and sometimes their lack of higher-order differentiability reduce their functionality and flexibility. Therefore, it is worthwhile to seek a more desirable thresholding function with continuous higher-order derivatives.

3 New Nonlinear Thresholding Function

In order to simplify the threshold selection process and reduce the computational burden, we construct a new shrinkage function that contains only one threshold parameter λ. At the same time, the designed thresholding function should overcome, as far as possible, the defects of the hard and soft shrinkage functions.

The basic principles for constructing the new thresholding function are as follows. First, the proposed shrinkage function should have continuous derivatives in order to avoid the potential problems of other shrinkage functions. Second, the function should leave the wavelet coefficients essentially unchanged if their absolute values are higher than the threshold value λ; from a mathematical point of view, the proposed shrinkage function should approximate the linear function y = x as closely as possible in that region. Finally, the proposed function should set the wavelet coefficients to zero if their absolute values are lower than the threshold value λ; that is, it should approximate the function y = 0 there. Because the necessary condition for implementing an adaptive gradient-based learning algorithm is the differentiability of the thresholding function, the first principle addresses the differentiability property of the shrinkage function. The second and third principles address the disadvantages of the hard and soft shrinkages, i.e., the proposed thresholding function tries to avoid both large estimation variance and large estimation bias.

According to these principles, the proposed shrinkage function should have the following properties: if a wavelet coefficient x falls within the interval [− λ, λ], the thresholding function should behave like y = 0; if it falls outside [− λ, λ], the thresholding function should behave like y = x. Based on this idea, we look for a suitable function with which to approximate these properties.

Mathematically, to approximate the desired behavior near the origin we need a smooth, bounded, odd function, and the arctangent function is a convenient choice. The arctangent function is defined as the inverse of the tangent function and is denoted by y = arctan(x). Its domain is the whole real line and its range is {y | − π/2 < y < π/2}, so the function y = arctan(x) has two horizontal asymptotes, y = π/2 and y = − π/2, and it is a one-to-one mapping of the real line onto (− π/2, π/2). The waveform of the arctangent function is shown in Fig. 2.

Fig. 2

The waveform of arctangent function

From Fig. 2 we can see that the arctangent function passes through the origin (y ≈ 0 when x is very small) and flattens toward its horizontal asymptotes as x moves away from the origin, while remaining smooth everywhere. When combined with an odd power of x and suitable scaling factors, as below, it can therefore reproduce the desired behavior of the thresholding function: y ≈ 0 near the origin and a smooth transition to y = x at the threshold. For this reason, arctan(x) is used as the basic building block of the new shrinkage function with continuous derivatives.

In order to increase the flexibility and capability of the proposed shrinkage (thresholding) function, three shape tuning factors are added to it as follows:

$$ \eta (\lambda ,x,\alpha ,\beta ,k) = \left\{ {\begin{array}{*{20}l} {x - \text{sgn} (x)\left[ {\lambda - \beta \cdot \arctan (k \cdot \lambda^{2\alpha + 1} )} \right],} \hfill & {|x| \ge \lambda } \hfill \\ {\beta \cdot \arctan (k \cdot x^{2\alpha + 1} ),} \hfill & {|x| < \lambda } \hfill \\ \end{array} } \right. $$
(9)

where α, β and k are real parameters to be determined; α should be a positive integer so that the thresholding function is an odd function. To facilitate the processing of various signals, the proposed shrinkage function should be not only continuous but also continuously differentiable to higher order.

We now impose the differentiability property on the proposed function. The necessary and sufficient condition for differentiability of the function at the point x = λ is the continuity of the function and the equality of its left and right derivatives at that point.

The partial derivative of the proposed shrinkage function is

$$ \frac{\partial \eta (\lambda ,x,\alpha ,\beta ,k)}{\partial x} = \left\{ {\begin{array}{*{20}l} {1,} & {|x| \ge \lambda } \\ {\beta \cdot \frac{{(2\alpha + 1) \cdot k \cdot x^{2\alpha } }}{{1 + (k \cdot x^{2\alpha + 1} )^{2} }},} & {|x| < \lambda } \\ \end{array} } \right. $$
(10)

due to the continuity of the partial derivative, i.e.,

$$ \left. {\frac{\partial \eta (\lambda ,x,\alpha ,\beta ,k)}{\partial x}} \right|_{x = \lambda - } = \left. {\frac{\partial \eta (\lambda ,x,\alpha ,\beta ,k)}{\partial x}} \right|_{x = \lambda + } $$
(11)

the parameter β can be obtained

$$ \beta = \frac{{1 + (k \cdot \lambda^{2\alpha + 1} )^{2} }}{{(2\alpha + 1) \cdot k \cdot \lambda^{2\alpha } }} $$
(12)

The second order partial derivative of the proposed shrinkage function is

$$ \frac{{\partial^{2} \eta (\lambda ,x,\alpha ,\beta ,k)}}{{\partial x^{2} }} = \left\{ {\begin{array}{*{20}l} {0,} & {|x| \ge \lambda } \\ {2(2\alpha + 1)k\beta \frac{{\alpha x^{2\alpha - 1} - k^{2} (\alpha + 1)x^{6\alpha + 1} }}{{[1 + (kx^{2\alpha + 1} )^{2} ]^{2} }},} & {|x| < \lambda } \\ \end{array} } \right. $$
(13)

Similarly, due to the continuity of the second order partial derivative, i.e.,

$$ \left. {\frac{{\partial^{2} \eta (\lambda ,x,\alpha ,\beta ,k)}}{{\partial x^{2} }}} \right|_{x = \lambda - } = \left. {\frac{{\partial^{2} \eta (\lambda ,x,\alpha ,\beta ,k)}}{{\partial x^{2} }}} \right|_{x = \lambda + } $$
(14)

the parameter k can also be obtained

$$ k = \sqrt {\frac{\alpha }{\alpha + 1}} \frac{1}{{\lambda^{2\alpha + 1} }} $$
(15)

By substituting (15) into (12) we get

$$ \beta = \frac{\lambda }{{\sqrt {\alpha (\alpha + 1)} }} $$
(16)

Similarly, by substituting (15) and (16) into (9), a differentiable shrinkage function results:

$$ \eta (\lambda ,x,\alpha ) = \left\{ {\begin{array}{*{20}l} {x - \text{sgn} (x)\left[ {\lambda - \frac{\lambda }{{\sqrt {\alpha (\alpha + 1)} }}\arctan \left( {\sqrt {\frac{\alpha }{\alpha + 1}} } \right)} \right],} & {|x| \ge \lambda } \\ {\frac{\lambda }{{\sqrt {\alpha (\alpha + 1)} }}\arctan \left( {\sqrt {\frac{\alpha }{\alpha + 1}} \frac{1}{{\lambda^{2\alpha + 1} }}x^{2\alpha + 1} } \right),} & {|x| < \lambda } \\ \end{array} } \right. $$
(17)

In practical applications, the parameter α must be assigned a specific value before the shrinkage function is used. To determine the optimal value of α, we can further simplify formula (17) as follows:

$$ \eta (\lambda ,x,\alpha ) = \left\{ {\begin{array}{*{20}l} {x - \lambda \text{sgn} (x)\left[ {1 - \frac{1}{{\sqrt {\alpha (\alpha + 1)} }}\arctan \left( {\sqrt {\frac{\alpha }{\alpha + 1}} } \right)} \right],} & {|x| \ge \lambda } \\ {\frac{\lambda }{{\sqrt {\alpha (\alpha + 1)} }}\arctan \left[ {\sqrt {\frac{\alpha }{\alpha + 1}} \left( {\frac{x}{\lambda }} \right)^{2\alpha + 1} } \right],} & {|x| < \lambda } \\ \end{array} } \right. $$
(18)
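A direct NumPy transcription of (18) reads as follows (a sketch; the function and parameter names are ours):

```python
import numpy as np

def proposed_shrink(x, lam, alpha=1):
    """Arctangent-based shrinkage of Eq. (18); alpha must be a positive integer."""
    x = np.asarray(x, dtype=float)
    r = np.sqrt(alpha / (alpha + 1.0))          # sqrt(alpha / (alpha + 1))
    s = np.sqrt(alpha * (alpha + 1.0))          # sqrt(alpha * (alpha + 1))
    outside = x - lam * np.sign(x) * (1.0 - np.arctan(r) / s)
    inside = (lam / s) * np.arctan(r * (x / lam) ** (2 * alpha + 1))
    return np.where(np.abs(x) >= lam, outside, inside)
```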

By analyzing the formula (18), we can get the following two results:

(a) When |x| ≥ λ, i.e., when the absolute values of the wavelet coefficients are higher than the threshold λ, the proposed shrinkage function should be as close to hard thresholding as possible, because the hard shrinkage has a smaller estimation bias than the soft thresholding. Therefore, for the (important) wavelet coefficients, the value of α must satisfy the following condition:

$$ q_{1} = 1 - \frac{1}{{\sqrt {\alpha (\alpha + 1)} }}\arctan \left( {\sqrt {\frac{\alpha }{\alpha + 1}} } \right)\;{\text{ is as small as possible}} $$

Since q1 increases with α and α is restricted to positive integers, q1 is minimized at the smallest admissible value α = 1, giving q1 = 0.5648. This result shows that, for the important wavelet coefficients, the proposed shrinkage function is closer to the non-negative garrote shrinkage, so the proposed function also inherits the advantage of the non-negative garrote shrinkage.

(b) When |x| < λ, i.e., the absolute values of the wavelet coefficients are lower than the threshold λ, the proposed function should set the non-important wavelet coefficients to zero as much as possible in order to reduce the noise. Thus, for the wavelet coefficients belonging to the noise, the value of α must satisfy the following condition:

$$ \frac{\lambda }{{\sqrt {\alpha (\alpha + 1)} }}\arctan \left[ {\sqrt {\frac{\alpha }{\alpha + 1}} \left( {\frac{x}{\lambda }} \right)^{2\alpha + 1} } \right] \to 0 $$

This condition requires that the value of α be as large as possible.

From the above analysis we can see that the two requirements are contradictory, so they must be balanced when selecting the optimal value of α. To this end, we use the L2 risk (or MSE) as the criterion of judgement, because the L2 risk itself is a comprehensive balance of estimation variance and bias (Gao and Bruce 1997). In the next section we discuss the selection of the optimal α value and give formulas for the bias, variance and L2 risk of the proposed shrinkage estimate of a Gaussian random variable.

4 The L2 Risks and the Optimal Thresholds

For the sake of convenience, the proposed shrinkage function can be simply expressed as ηλ(x). Let X ~ N(θ, 1), where the symbol N represents the Gaussian distribution. Under shrinkage function ηλ(·) and threshold λ, define the mean, variance and L2 risk function of the shrinkage estimate of θ by (Gao 1998)

$$ M_{\lambda } (\theta ) = E[\eta_{\lambda } (X)],\quad \quad V_{\lambda } (\theta ) = \text{var} [\eta_{\lambda } (X)] $$
(19)
$$ R_{\lambda } (\theta ) = E[\eta_{\lambda } (X) - \theta ]^{2} = V_{\lambda } (\theta ) + [M_{\lambda } (\theta ) - \theta ]^{2} $$
(20)

where [Mλ(θ) − θ]2 is the squared bias. Expanding (20) gives

$$ R_{\lambda } (\theta ) = V_{\lambda } (\theta ) + M_{\lambda }^{2} (\theta ) - 2\theta \cdot M_{\lambda } (\theta ) + \theta^{2} $$
(21)

Obviously, from (20) we know that the L2 risk can be decomposed into squared bias and variance components. For the shrinkage estimates considered here, the bias component contributes more to the L2 risk than the variance component. Therefore, minimizing the bias by selecting the optimal α value largely minimizes the L2 risk; even if the variance is not at its minimum, the L2 risk can still achieve an appropriate tradeoff between bias and variance.

In order to determine the value of the parameter α that minimizes the L2 risk, we compute and compare the L2 risk functions for different values of α. With α = 1, 2, 3, 5, 7 and threshold λ = 3.31, the corresponding L2 risk curves of the proposed shrinkage function are shown in Fig. 3.

Fig. 3

The risk functions of the proposed shrinkage with different α values

From Fig. 3 we can see that the L2 risk of the proposed shrinkage function is lowest when α = 1; that is, the proposed function with α = 1 provides the best compromise between estimation bias and variance. Thus, with α = 1, the waveform of the proposed thresholding function is also shown in Fig. 1.
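Risk curves of the kind shown in Figs. 3 and 4 can be approximated by Monte Carlo simulation of (19)–(20); the sketch below assumes the proposed_shrink function transcribed from (18) above and reproduces only the method, not the exact figures.

```python
import numpy as np

def mc_risk(shrink, thetas, lam, n_draws=20000, seed=0, **kwargs):
    """Monte Carlo estimate of R_lambda(theta) = E[(eta_lambda(X) - theta)^2], X ~ N(theta, 1)."""
    rng = np.random.default_rng(seed)
    risks = []
    for theta in thetas:
        x = theta + rng.standard_normal(n_draws)
        risks.append(np.mean((shrink(x, lam, **kwargs) - theta) ** 2))
    return np.array(risks)

thetas = np.linspace(0.0, 10.0, 101)
curves = {alpha: mc_risk(proposed_shrink, thetas, lam=3.31, alpha=alpha)
          for alpha in (1, 2, 3, 5, 7)}
```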

For comparison, we also plot L2 risk functions of the hard shrinkage (λ = 3.31), the soft shrinkage (λ = 3.31) and the proposed shrinkage functions (λ = 3.31). The results are shown in Fig. 4.

Fig. 4

The risk functions of the hard, soft and the proposed functions

From Fig. 4 we can see that the proposed shrinkage function indeed provides a good tradeoff between the hard and soft thresholding functions. In addition, although the risk of the proposed thresholding function may not be uniformly the lowest, it is important that we obtain a new nonlinear, continuously differentiable shrinkage function by imposing the differentiability property; that is, the proposed function has potential advantages in flexibility and capability.

Besides the thresholding function, selection of the optimum threshold value also plays an important role in the wavelet shrinkage denoising. Threshold selection methods are divided into three main groups.

The first group contains universal-threshold methods, in which a single threshold value is chosen for all wavelet coefficients of the noisy signal. The main method of this group (RiskShrink or VisuShrink) was introduced, together with the hard and soft thresholding functions above, as the first practical technique for signal denoising (Donoho and Johnstone 1994; Donoho et al. 1995). The second group (SureShrink) consists of subband-adaptive methods, in which the threshold value is selected separately for each detail subband (Donoho and Johnstone 1995); here “Sure” is the abbreviation of Stein’s unbiased estimate of risk. In the third group, spatially adaptive threshold selection, each detail wavelet coefficient has its own threshold value (Mihcak et al. 1999).

In the VisuShrink method, the well-known universal threshold proposed by Donoho and Johnstone is

$$ \lambda = \sigma \cdot \sqrt {2\ln N} $$
(22)

where σ is the standard deviation of the Gaussian white noise and N is the total number of wavelet coefficients (in general, N is also the number of samples of the noisy data). The wavelet transform of many noiseless objects is very sparse and filled with essentially zero coefficients. After contamination with noise, these coefficients are all nonzero. If a sample that ought to be zero in the noiseless case is nonzero in the noisy case, and that character is preserved in the reconstruction, the reconstruction will have an annoying visual appearance: it will contain small blips against an otherwise clean background.

The threshold (2 ln N)^{1/2} avoids this problem because, when {zi} is a white noise sequence that is independent and identically distributed as N(0, 1), then, as N → ∞,

$$ P\{ \max |z_{i} | > \sqrt {2\ln N} \} \to 0 $$
(23)

where P represents the probability. So, with high probability, every sample in the wavelet transform in which the underlying signal is exactly zero will be estimated as zero (Donoho and Johnstone 1994).

The universal threshold also has a limitation: the standard deviation of the noise is unknown in practice. For practical use, it is important to estimate the noise intensity σ from the data rather than assume that it is known. We can therefore use an estimate based on the finest-scale empirical wavelet coefficients (Donoho and Johnstone 1995):

$$ \hat{\sigma } = \frac{{\text{median} \{ |x_{J - 1,k} |\} }}{0.6745}\quad \quad (0 \le k < 2^{J - 1} ) $$
(24)

where {x_{J−1,k}} are the empirical wavelet coefficients at the finest scale, and 2^J = N is the number of noisy data points. It should be noted that this assumes the number of signal samples is an integer power of 2, although this need not be the case in general.

From (24) we know that it is important to use a robust estimator like the median, in case the fine-scale wavelet coefficients contain a small proportion of strong useful signals mixed in with noise.
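Equations (22) and (24) together give a fully data-driven universal threshold; a minimal sketch (with an illustrative wavelet and decomposition level, and using PyWavelets as before) is:

```python
import numpy as np
import pywt  # PyWavelets, assumed available

def universal_threshold(y, wavelet='db5', level=6):
    """sigma from the MAD of the finest-scale details (Eq. 24), lambda from Eq. (22)."""
    coeffs = pywt.wavedec(y, wavelet, level=level)
    finest_details = coeffs[-1]                    # cD1, the finest scale
    sigma = np.median(np.abs(finest_details)) / 0.6745
    lam = sigma * np.sqrt(2.0 * np.log(len(y)))    # natural logarithm
    return lam, sigma
```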

In addition to VisuShrink, we also use another procedure (SureShrink) that suppresses noise by thresholding the empirical wavelet coefficients (Donoho and Johnstone 1995). The thresholding is adaptive: a threshold level is assigned to each dyadic resolution level by the principle of minimizing the Stein unbiased estimate of risk (Sure) for threshold estimates. The procedure is used to recover a function of unknown smoothness from noisy sampled data, and the computational effort of the overall procedure is of order N·log(N) in the sample size N.

Here we do not elaborate on the derivation of SureShrink but mainly introduce its ingredients: (a) Discrete wavelet transform of the noisy data: the N noisy data points are transformed via the discrete wavelet transform to obtain N noisy wavelet coefficients {xj,k}. (b) Thresholding of the noisy wavelet coefficients: let ηλ(x) = sgn(x)(|x| − λ)+ denote the soft threshold, which sets to zero data x below λ in absolute value and pulls other data toward the origin by an amount λ; the wavelet coefficients xj,k are subjected to soft thresholding with a level-dependent threshold λj. (c) Stein’s unbiased estimate of risk (Sure) for threshold choice: the level-dependent thresholds are found by regarding the different resolution levels (i.e., different j) of the wavelet transform as independent multivariate normal estimation problems.
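For ingredient (c), a minimal sketch of the SURE criterion for soft thresholding at a single resolution level is given below, assuming the detail coefficients have been rescaled to unit noise standard deviation; following Donoho and Johnstone, the search is capped at the universal threshold.

```python
import numpy as np

def sure_threshold(d):
    """SURE-minimizing soft threshold for one detail subband (unit noise std assumed)."""
    d = np.abs(np.asarray(d, dtype=float))
    n = d.size
    cap = np.sqrt(2.0 * np.log(n))                 # universal threshold as upper bound
    best_t, best_sure = 0.0, np.inf
    for t in np.sort(d[d <= cap]):                 # the minimizer lies at one of the |d_i|
        sure = n - 2.0 * np.sum(d <= t) + np.sum(np.minimum(d, t) ** 2)
        if sure < best_sure:
            best_t, best_sure = t, sure
    return best_t
```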

Through use of a data-based choice of threshold, SureShrink is more explicitly adaptive to unknown smoothness and has better large-sample MSE properties. In our experiments we also use the minimax threshold rule: the minimax threshold values for unit noise standard deviation (σ = 1) and various numbers of sample points are listed in Table 2 of Donoho and Johnstone (1994). That table covers the case where the number of samples N is an integer power of 2; in practice, if N does not satisfy this condition, we use the value tabulated for the nearest integer power of 2.

5 Experimentation and Numerical Results

Stock prices, as time series, are among the most important pieces of information for investors. However, stock prices are essentially dynamic, nonlinear, nonparametric, and chaotic in nature, which implies that investors have to handle time series that are non-stationary, noisy, and subject to frequent structural breaks (Oh and Kim 2002). In fact, stock price movements are affected by many macro-economic factors such as political events, company policies, general economic conditions, commodity price indices, bank rates, investors’ expectations, institutional investors’ choices, and investors’ psychology. Obviously, these factors are uncertain (random); for a financial time series, they constitute the noise that causes the stock price to fluctuate. Thus, analyzing stock price movements accurately is not only extremely challenging but also of great interest to investors.

5.1 Data Preparation

The empirical data set of Shanghai Composite Index (SCI) closing prices is used for the numerical analysis. These data were collected from the Shanghai Stock Exchange (SSE). The data set contains the SCI closing prices of 5790 trading days (about 284 trading months), from January 2, 1991 to August 29, 2014. The original (raw) series of daily closing prices is shown in Fig. 5.

Fig. 5

The raw data of each trading day

From Fig. 5 we can see that the observed data are contaminated by a considerable amount of noise (Wang et al. 2011). This noise may stem from the classical noise sources or from noise specific to the financial system. The noise can be removed from the observed data by different wavelet shrinkage (thresholding) functions.

5.2 Evaluation Criterion

For actual stock price data y, we can represent it as

$$ {\mathbf{y}} = {\mathbf{s}} + {\mathbf{n}} $$
(25)

where s is the clean source signal and n is additive noise. As mentioned above, there are many sources of this noise: n may reflect market fluctuations caused by economic (or political) factors, or by investors’ expectations and psychological factors, etc. Taking these factors together, the noise n is treated here as white noise. Two measures are used to evaluate the denoising performance relative to the actual stock price: the signal-to-noise ratio (SNR) and the root-mean-square error (RMSE), given as follows:

$$ SNR = 10\log_{10} \frac{{||{\mathbf{s}}||_{2}^{2} }}{{||{\mathbf{s}} - {\hat{\mathbf{s}}}||_{2}^{2} }} $$
(26)
$$ RMSE = \sqrt {\frac{1}{N}||{\mathbf{s}} - {\hat{\mathbf{s}}}||_{2}^{2} } $$
(27)

where \( {\hat{\mathbf{s}}} \) is the denoised signal and N is the number of samples of the signal s. A larger SNR and a smaller RMSE indicate better denoising performance. In practice, however, it is very difficult to separate the clean signal s and the noise n from the observed signal y. Therefore, in this paper we use the actual data y in place of the clean data s to estimate the performance indices SNR and RMSE approximately. Of course, this introduces a certain estimation error.
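The two criteria (26)–(27) translate directly into code; as noted above, the raw series y is used in place of the unobservable clean signal s.

```python
import numpy as np

def snr_rmse(reference, denoised):
    """SNR in dB and RMSE per Eqs. (26)-(27); 'reference' is y used in place of s."""
    reference = np.asarray(reference, dtype=float)
    err = reference - np.asarray(denoised, dtype=float)
    snr = 10.0 * np.log10(np.sum(reference ** 2) / np.sum(err ** 2))
    rmse = np.sqrt(np.mean(err ** 2))
    return snr, rmse
```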

5.3 Wavelet Decomposition

There are a variety of wavelets proposed in the literature for performing DWT, and each has its own application domain with unique resolution capability, efficiency, and computational cost etc. In addition, the appropriate number of levels of wavelet decomposition can be determined by the nature of a time series, according to entropy criterion (Coifman and Wickerhauser 1992), or application’s characteristics (Li and Shue 2001). In general, the number of levels of decomposition depends on the length of the time series.

The type of wavelet is denoted “dbk” (k = 1, 2, …, 45). Here db refers to a particular family of wavelets, technically called the Daubechies extremal-phase wavelets, and the number refers to the number of vanishing moments. Basically, the higher the number of vanishing moments, the smoother the wavelet (and the longer the wavelet filter); the length of the wavelet (and scaling) filter is twice that number. For example, the length of the db2 wavelet filter is 4, the length of the db3 wavelet filter is 6, and so on. Different types of wavelet give different sparse representations of the same signal, and the sparser the representation, the better the denoising effect. For the SCI data (signal), we therefore look for the wavelet type that represents the signal most sparsely, so as to achieve optimal denoising.

In addition, boundary effects are very common in the processing of finite-length signals. The wavelet transform is calculated by shifting the wavelet function in time along the input signal and computing their convolution. As the wavelet gets closer to the edge of the signal, computing the convolution requires non-existent values beyond the boundary, which creates boundary effects caused by incomplete information in the boundary regions. To deal with boundary effects, the boundaries should be treated differently from the other parts of the signal; if this is not done properly, distortion appears at the boundaries. Two alternatives are commonly used. The first is to accept the loss of data and truncate the unreliable results at the boundaries after the convolution between signal and wavelet; but simply neglecting these regions leads to a considerable loss of data, which is not acceptable in situations where the edges of the signal contain critical information. The second is to extend the signal artificially at the boundaries before processing. A further approach employs the usual wavelet filters for the interior of the signal and constructs different boundary wavelets at the ends of the signal; this method can be viewed as a form of signal extension. Various extension schemes have been developed to deal with the boundaries of finite-length signals (Brislawn 1996; Ferretti and Rizzo 2000; Khashman and Dimililer 2008; Sezen 2009; Su et al. 2012). Because the SCI series is very long and the edges of the signal do not contain vital information, boundary effects are not considered in this paper.

We use a series of experiments to determine the wavelet type that best fits the SCI data (closing prices). For ease of comparison, we use the hard thresholding function and the minimax threshold rule to denoise part of the SCI data (about 5000 closing prices). The wavelet decomposition level is 6, and the wavelet types are “db1”, “db2”, …, “db12”. The experimental results are shown in Table 1.

Table 1 Comparison of the wavelet types for the part of the daily data

From Table 1 we find that “db5” yields both the largest SNR and the smallest RMSE, in each case slightly better than the other wavelets. The theoretical significance of this result is that db5 gives the sparsest representation of the SCI data, and hence the best denoising effect, which is why we choose db5 rather than the other wavelets.
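The selection experiment just described can be sketched as follows. The sketch reuses the helpers introduced earlier (universal_threshold and snr_rmse) and a price array prices; note that the paper uses the minimax threshold values tabulated by Donoho and Johnstone (1994), whereas the universal threshold is used here only as an illustrative stand-in.

```python
import pywt  # PyWavelets, assumed available

# 'prices' is the SCI closing-price series; universal_threshold and snr_rmse
# are the helper functions sketched earlier.  The paper uses the minimax
# threshold (Donoho and Johnstone 1994, Table 2); the universal threshold is
# used here only as an illustrative stand-in.
results = {}
for k in range(1, 13):
    wavelet = f'db{k}'
    lam, _ = universal_threshold(prices, wavelet=wavelet, level=6)
    coeffs = pywt.wavedec(prices, wavelet, level=6)
    shrunk = [coeffs[0]] + [pywt.threshold(c, lam, mode='hard') for c in coeffs[1:]]
    denoised = pywt.waverec(shrunk, wavelet)[:len(prices)]
    results[wavelet] = snr_rmse(prices, denoised)   # (SNR, RMSE) per wavelet
```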

In theory, the wavelet analysis method can handle the non-stationarity and nonlinearity of financial time series, but its effectiveness depends strongly on the result of the wavelet decomposition of the series. In discrete wavelet decomposition, a key issue is the choice of the decomposition level. Different decomposition levels correspond to different periods; for example, levels 1–3 correspond to periods of 2–4, 4–8 and 8–16 days, and so on. It is well known that the maximum decomposition level M can be calculated as M = log2(N), where N is the series length. For denoising purposes, it is necessary to determine the most suitable decomposition level between 1 and M. As the decomposition level increases, more sub-signals and more detailed information about the series at larger temporal scales appear. In spite of this, it is not true that a deeper decomposition always gives a better denoising effect; in other words, the result at level 2 is not necessarily less noisy than at level 1, the result at level 3 is not necessarily less noisy than at level 2, and so on. In general, the most suitable decomposition level differs from one noisy signal to another. For this reason, we again determine the most suitable wavelet decomposition level experimentally. For the above SCI data (about 5000 closing prices), we use the “db5” wavelet, the hard thresholding function and the minimax threshold rule to perform the denoising. The wavelet decomposition levels are 1, 2, …, 12. The experimental results are shown in Table 2.

Table 2 Comparison of the wavelet decomposition levels for the part of the daily data

From Table 2 we also find that the largest SNR and the smallest RMSE are obtained with a decomposition level of 6. The results show that, for the denoising task on the SCI data, the most suitable wavelet decomposition level is 6. So “db5” and a six-level decomposition are used in the next subsection. In this case, the noise variance in the raw data is estimated to be σ2 = 17.32 (σ = 4.16) using Eq. (24).
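
A corresponding sketch of the level sweep, reusing denoise() and snr_rmse() from the previous sketch and the same assumed “closing” array, could look like this; dwt_max_level gives an upper bound on the admissible level.

import pywt

# Upper bound on the decomposition level (PyWavelets also accounts for the
# filter length, so this is at most floor(log2(N))).
max_level = pywt.dwt_max_level(len(closing), pywt.Wavelet("db5").dec_len)

# Sweep the level with db5, hard thresholding and the minimax rule (cf. Table 2).
for level in range(1, min(12, max_level) + 1):
    snr, rmse = snr_rmse(closing, denoise(closing, "db5", level=level))
    print(f"level {level}: SNR = {snr:.2f} dB, RMSE = {rmse:.2f}")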

5.4 Denoising Performance

In order to analyze the denoising performance, we use different shrinkage functions and different threshold rules to remove the noise (given σ2 = 17.32 or σ = 4.16) from the original (raw) data (as shown in Fig. 5).

We first investigate the universal threshold rule. In this experiment, the corresponding SNR and RMSE values of the different thresholding functions are computed and recorded. The results are shown in Table 3.

Table 3 Comparison of the universal threshold rule for daily data

For the universal threshold rule, Table 3 shows that the denoising performance (SNR and RMSE) of the hard shrinkage function is the best and that of the soft shrinkage function is the worst. In addition, the denoising performance of the proposed thresholding function is better than that of the soft and non-negative garrote shrinkage functions, but slightly worse than that of the hard shrinkage. That is to say, the denoising performance of the proposed shrinkage function is close to the best one.
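
A sketch of this comparison under the universal threshold rule is given below, again assuming PyWavelets and the hypothetical “closing” array; “garrote” is PyWavelets' built-in non-negative garrote rule, while the paper's proposed thresholding function would have to be supplied as a user-defined rule (not shown).

import numpy as np
import pywt

def denoise_universal(x, wavelet="db5", level=6, mode="hard"):
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # MAD noise estimate
    thr = sigma * np.sqrt(2.0 * np.log(len(x)))          # universal threshold
    coeffs[1:] = [pywt.threshold(c, thr, mode=mode) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(x)]

# Compare the built-in shrinkage rules (cf. Table 3).
for mode in ("hard", "soft", "garrote"):
    print(mode, snr_rmse(closing, denoise_universal(closing, mode=mode)))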

To observe the denoising effect more intuitively, we plot part of the raw data together with the corresponding denoised data. The waveforms are shown in Figs. 6, 7, 8 and 9.

Fig. 6 Comparison of raw data and hard shrinkage denoising data for the universal threshold rule

Fig. 7 Comparison of raw data and soft shrinkage denoising data for the universal threshold rule

Fig. 8 Comparison of raw data and non-negative garrote shrinkage denoising data for the universal threshold rule

Fig. 9 Comparison of raw data and proposed shrinkage denoising data for the universal threshold rule

Next, we investigate the minimax threshold rule. Similarly, for the original (raw) data, we compute and record the corresponding SNR and RMSE values of the different shrinkage functions. The results of the denoising performance are shown in Table 4.

Table 4 Comparison of the minimax threshold rule for daily data

From Table 4 we can see that, for the minimax threshold rule, the denoising performance of the proposed thresholding function is the best and that of the soft shrinkage is still the worst.

In addition, the data in Table 4 also show that the denoising performance of the proposed thresholding function is much better than that of the hard and non-negative garrote shrinkage functions. This means that the proposed shrinkage function is very well suited to the minimax threshold rule.
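
Under the minimax rule, the same comparison reduces to varying the shrinkage mode in the denoise() helper sketched earlier (with the paper's proposed function again supplied as a user-defined rule):

# Compare shrinkage rules under the minimax threshold (cf. Table 4).
for mode in ("hard", "soft", "garrote"):
    print(mode, snr_rmse(closing, denoise(closing, "db5", level=6, mode=mode)))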

For the minimax threshold rule, we also plot part of the raw data together with the corresponding denoised data. The waveforms are shown in Figs. 10, 11, 12 and 13.

Fig. 10 Comparison of raw data and hard shrinkage denoising data for the minimax threshold rule

Fig. 11 Comparison of raw data and soft shrinkage denoising data for the minimax threshold rule

Fig. 12 Comparison of raw data and non-negative garrote shrinkage denoising data for the minimax threshold rule

Fig. 13 Comparison of raw data and proposed shrinkage denoising data for the minimax threshold rule

Taking Tables 3 and 4 together, we can draw the following conclusion: the proposed thresholding function provides excellent denoising effects for both the universal threshold rule and the minimax threshold rule.

The above analysis is based on daily trading data. Because the daily closing-price series is very long (5790 samples in total), it is not easy to observe the overall trend of the data. Therefore, we can observe the monthly closing price to predict the future trend of the stock. More importantly, the monthly closing price is also a focus of investor attention. We define the closing price of the last trading day of each month as the closing price for that month. Thus, the above 5790 trading days are converted into 284 trading months. The corresponding waveform of the monthly trading data is shown in Fig. 14.
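
The daily-to-monthly conversion described above could be sketched with pandas as follows; the file name and column name are hypothetical.

import pandas as pd

# Hypothetical CSV with a date index and a 'close' column (about 5790 rows).
daily_close = pd.read_csv("sci_daily.csv", index_col=0, parse_dates=True)["close"]

# Monthly closing price = closing price of the last trading day of each month.
monthly_close = daily_close.resample("M").last().dropna()
print(len(monthly_close))    # roughly 284 monthly observations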

Fig. 14 The monthly trading data

From Fig. 14 we can see that the observed (monthly) data are also contaminated by a considerable amount of noise (Wang et al. 2011). For the noisy (monthly) closing-price data, we again use the different thresholding functions and threshold rules to perform the denoising. The results for the universal threshold rule are shown in Table 5.

Table 5 Comparison of the universal threshold rule for monthly data

From Table 5 we can see that the denoising performance of the hard shrinkage is the best and that of the soft shrinkage is the worst. The denoising performance of the proposed thresholding function is similar to, but slightly better than, that of the non-negative garrote shrinkage. That is, the proposed thresholding function has a near-best denoising effect.

Similarly, for the monthly closing prices and the different thresholding functions, the denoising performance under the minimax threshold rule is shown in Table 6.

Table 6 Comparison of the minimax threshold rule for monthly data

From Table 6 we can also see that for the monthly trading data, the proposed thresholding function has the best denoising performance.

In addition, comparing Table 3 with Table 5 and Table 4 with Table 6, we can see that the denoising performance indexes for the monthly trading data differ from those for the daily trading data. This is because the structure of the monthly data differs from that of the daily data: by the above definition, the monthly trading data takes only the last daily observation of each month. Nonetheless, Tables 5 and 6 show that the proposed thresholding function still provides excellent denoising performance for the monthly trading data.

In the above analysis, we have seen that the closing price of every trading day of the Shanghai Composite Index (SCI) itself contains noise, which may be caused by many random factors. In general, the noise manifests itself as zero-mean Gaussian white noise, and the noise intensity σ can be estimated using Eq. (24). However, the estimated noise intensity differs across wavelet types. Given a wavelet decomposition level of 6, we calculate the noise intensity for “db1”, “db2”, …, “db12”, respectively. The results are shown in Table 7.

Table 7 The noise intensities for different wavelet types

From Table 7 we can see that the noise variance σ2 obtained with “db1” is slightly larger than that of the other wavelets. This value (σ2 = 19.54, σ = 4.42) means that the raw (closing price) data are most strongly affected by the noise (random factors) under this wavelet. The above experiments (for daily and monthly data) considered only the case of noise variance σ2 = 17.32 (σ = 4.16); to investigate the denoising performance of the various thresholding functions more fully, we use the largest noise variance (σ2 = 19.54) in the following experiments.
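
The per-wavelet noise intensities of Table 7 could be reproduced, under the assumption that Eq. (24) is the usual MAD-based estimator applied to the finest-level detail coefficients, with a short loop over the wavelet types (reusing the assumed “closing” array):

import numpy as np
import pywt

# Estimate the noise intensity for each wavelet type at level 6 (cf. Table 7).
for k in range(1, 13):
    coeffs = pywt.wavedec(closing, f"db{k}", level=6)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    print(f"db{k}: sigma^2 = {sigma ** 2:.2f}")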

We first verify the universal threshold rule. In this case, we can compute and record the corresponding SNR and RMSE values of the different shrinkage functions. The results are shown in Table 8.

Table 8 Comparison of the universal threshold rule for the largest noise variance

For the universal threshold rule and the largest noise variance, Table 8 shows that the denoising performance of the hard shrinkage function is the best and that of the soft shrinkage function is the worst. Clearly, the denoising performance of the proposed thresholding function is better than that of the soft and non-negative garrote shrinkage functions, and almost the same as that of the hard thresholding function.

Next, we investigate the minimax threshold rule. Similarly, for the largest noise variance (σ2 = 19.54), we also compute and record the corresponding SNR and RMSE values of the various thresholding functions. The results are shown in Table 9.

Table 9 Comparison of the minimax threshold rule for the largest noise variance

From Table 9 we can see that the denoising performance of the proposed shrinkage function is the best and that of the soft shrinkage is still the worst. Moreover, the denoising performance of the proposed shrinkage function is much better than that of the other shrinkage functions. This result again illustrates that the proposed thresholding function is very well suited to the minimax threshold rule.

6 Conclusion

When predicting relevant economic indicators from noisy time series data, the prediction accuracy without a denoising step is not satisfactory. Therefore, the noisy observed data are usually denoised first, and wavelet shrinkage denoising is the most commonly used method. For actual economic (financial) data, the denoising effect of the traditional shrinkage functions (e.g., soft shrinkage) is not ideal. This motivates us to find a more appropriate shrinkage function to improve the denoising performance. The proposed thresholding function has a continuous derivative, which satisfies the necessary condition for implementing an adaptive gradient-based learning algorithm. In addition, we have conducted experiments with the SCI closing prices in order to verify the performance of the proposed thresholding function. The experiments give the following results: (a) for the universal threshold rule, the denoising performance of the proposed function is better than that of the soft and the non-negative garrote shrinkage functions, but slightly worse than that of the hard shrinkage; (b) for the minimax threshold rule, the denoising performance of the proposed thresholding function is better than that of the other shrinkage functions.

In this paper, we have only considered the denoising of the time series data. How to combine denoising and forecasting effectively is a topic for future research.