1 Introduction

Time series prediction is an important and challenging task in finance. Whereas traditional time series analysis emphasizes modeling the conditional first moment, Engle [1] and Bollerslev [2] developed the generalized autoregressive conditional heteroscedasticity (GARCH) model to bring the dependency of the conditional second moments into the modeling. Since this accommodates the increasingly important demand to explain and model risk and uncertainty in financial time series, the GARCH model has been a main tool for volatility forecasting [3–8]. To further enhance the forecasting performance of GARCH, Perez-Cruz proposed the GARCH-SVM model and showed that forecasting volatility using the support vector machine (SVM) is not only feasible but also effective [9].

SVMs were originally used for classification, but their principles extend readily to regression and time series prediction [10–12]. The prediction performance of an SVM depends greatly on the kernel [13]. Many support vector kernels exist, such as the Gaussian and polynomial kernels, which map the data from the input space to a high-dimensional feature space in which the problem becomes linearly separable. Since wavelet functions can describe stock time series both at various locations and at varying time granularities [14–17], they should describe the clustering feature of volatility well. One of the basic methods for constructing wavelets uses spline functions, which are probably the simplest functions with small supports [18, 19]. It is therefore worth investigating whether a desirable performance can be achieved by combining SVM with spline wavelet theory. In this paper, we construct a novel spline wavelet kernel for SVM to forecast volatility.

The objective of this paper is to evaluate the performance of the spline wavelet kernel in a spline wavelet support vector machine (SWSVM) for volatility prediction under the GARCH model, by comparing it with the Gaussian kernel in SVM. This paper is organized as follows: Sect. 2 provides a brief introduction to the multi-resolution theory of wavelets. Section 3 describes how to construct the spline wavelet kernel and proves that it is an admissible support vector kernel. Section 4 discusses the experimental results on simulated and real data sets, followed by conclusions in the last section.

2 Multi-resolution theory

Multi-resolution analysis is the decomposition of the Hilbert space \( L_{2}(R) \) into a nested sequence of closed subspaces \( \{V_{j}\}_{j \in Z} \), which satisfy the following relations [15]:

  1. \( \cdots \subset V_{2} \subset V_{1} \subset V_{0} \subset V_{-1} \subset V_{-2} \subset \cdots \)

  2. \( {\rm {close}}\left\{ \bigcup\limits_{j \in Z} V_{j} \right\} = L_{2}(R), \quad \bigcap\limits_{j \in Z} V_{j} = \{0\} \)

  3. \( \forall f \in L_{2}(R),\ \forall j \in Z: \quad f(x) \in V_{j} \Leftrightarrow f(2x) \in V_{j - 1} \)

  4. \( \forall f \in L_{2}(R),\ \forall k \in Z: \quad f(x) \in V_{0} \Leftrightarrow f(x - k) \in V_{0} \)

  5. There exists \( \varphi \in V_{0} \) such that \( \{\varphi(x - k)\}_{k \in Z} \) is a Riesz basis of \( V_{0} \).

By dilations and translations of the scaling function φ(x), \( \{\varphi_{j,k}(x) = \varphi(2^{-j} x - k)\}_{k \in Z} \) is a Riesz basis of \( V_{j} \). Let \( W_{j} \) denote the complement of the subspace \( V_{j} \) in the space \( V_{j-1} \), so that \( V_{j-1} = V_{j} \oplus W_{j} \), \( j \in Z \). Similarly, \( \{\psi_{j,k}(x) = \psi(2^{-j} x - k)\}_{k \in Z} \), obtained by dilations and translations of the wavelet function ψ(x), is a Riesz basis of \( W_{j} \).

3 Spline wavelet kernel and SWSVM

Theorem 1: Let x ∈ R. The scaling function \( \varphi^{(N)}(x) \) is the B-spline of order (N − 1) with compact support on the interval [0, N] over the integer division \( \{x_{k} = k,\ k \in Z\} \), and the following recursion formula is valid [20]:

$$ \begin{aligned}\varphi_{k}^{(N)} (x) & = \frac{x-k}{N - 1}\varphi_{k}^{(N - 1)} (x) + \frac{k + N - x}{N - 1}\varphi_{k + 1}^{(N - 1)} (x),\quad N = 2,3, \ldots, \\ \varphi_{k}^{(1)} (x) & = A_{[k,k + 1)} (x) = \left\{\begin{gathered} 1, \quad k \le x < k + 1 \hfill \\ 0,\quad {\rm {otherwise}}, \hfill \\ \end{gathered} \right. \\ \end{aligned} $$
(1)

where \( \varphi_{k}^{(N)}(x) = \varphi^{(N)}(x - k) \).
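As an illustration, the following Python sketch evaluates \( \varphi_{k}^{(N)}(x) \) directly from the recursion (1); the function name and the vectorization are our own choices, not part of the original paper.

```python
import numpy as np

def bspline(x, N, k=0):
    """Cardinal B-spline phi_k^{(N)}(x) of order N-1 on integer knots,
    evaluated via the recursion in Eq. (1)."""
    x = np.asarray(x, dtype=float)
    if N == 1:
        # phi_k^{(1)} is the indicator of [k, k+1)
        return np.where((x >= k) & (x < k + 1), 1.0, 0.0)
    return ((x - k) * bspline(x, N - 1, k)
            + (k + N - x) * bspline(x, N - 1, k + 1)) / (N - 1)
```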

Theorem 2: The compactly supported spline wavelet \( \psi^{(N)}(x) \) can be expressed by the wavelet equation \( \psi^{(N)} (x) = \sum_{k} {d(k)} \varphi^{(N)} (2x - k), \) where the scaling function is the B-spline \( \varphi^{(N)}(x) \) given in (1), and the coefficients d(k) are equal to [19]

$$ d(k) = \frac{(-1)^{k}}{2^{N - 1}} \sum\limits_{l = 0}^{N} \binom{N}{l} \varphi^{(2N)} (k - l + 1), \quad k = 0, \ldots, 3N - 2 $$
(2)
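Continuing the sketch above (and reusing `bspline`), the coefficients (2) and the wavelet itself might be computed as follows; the helper names are ours.

```python
from math import comb

import numpy as np

def wavelet_coeffs(N):
    """Coefficients d(k), k = 0, ..., 3N-2, from Eq. (2)."""
    return np.array([(-1) ** k / 2 ** (N - 1)
                     * sum(comb(N, l) * float(bspline(k - l + 1, 2 * N))
                           for l in range(N + 1))
                     for k in range(3 * N - 1)])

def spline_wavelet(x, N, d=None):
    """psi^{(N)}(x) = sum_k d(k) * phi^{(N)}(2x - k), the wavelet
    equation of Theorem 2."""
    d = wavelet_coeffs(N) if d is None else d
    return sum(dk * bspline(2 * np.asarray(x, dtype=float) - k, N)
               for k, dk in enumerate(d))
```

For N = 2 this reproduces the familiar piecewise-linear spline wavelet with coefficients (1/12)[1, −6, 10, −6, 1].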

Theorem 3: Let \( \psi^{(N)}(x) \) be a mother wavelet, and let j and k denote the dilation and translation, respectively. All calculations are done on integer divisions, with the finest resolution-level division \( \{x_{k} = k,\ k = 0, \ldots, 2^{m}\} \). If \( s, t \in R^{M} \), then the dot-product wavelet kernel is:

$$ K(s,t) = \prod\limits_{i = 1}^{M} \sum\limits_{j = 1}^{J} \sum\limits_{k = 1 - N}^{L_{j} - N} \psi_{j,k}^{(N)}(s_{i})\, \psi_{j,k}^{(N)}(t_{i}), \quad L_{j} = 2^{m - j} $$
(3)

Proof: Let \( x^{1}, \ldots, x^{l} \in R^{M} \) and \( a_{1}, \ldots, a_{l} \in R \). Then

$$ \begin{aligned} \sum\limits_{p,q = 1}^{l} a_{p} a_{q} K(x^{p}, x^{q}) & = \sum\limits_{p,q = 1}^{l} a_{p} a_{q} \prod\limits_{i = 1}^{M} \sum\limits_{j = 1}^{J} \sum\limits_{k = 1 - N}^{L_{j} - N} \psi_{j,k}^{(N)}(x_{i}^{p})\, \psi_{j,k}^{(N)}(x_{i}^{q}) \\ & = \left( \sum\limits_{p = 1}^{l} a_{p} \prod\limits_{i = 1}^{M} \sum\limits_{j = 1}^{J} \sum\limits_{k = 1 - N}^{L_{j} - N} \psi_{j,k}^{(N)}(x_{i}^{p}) \right) \left( \sum\limits_{q = 1}^{l} a_{q} \prod\limits_{i = 1}^{M} \sum\limits_{j = 1}^{J} \sum\limits_{k = 1 - N}^{L_{j} - N} \psi_{j,k}^{(N)}(x_{i}^{q}) \right) \\ & = \left( \sum\limits_{p = 1}^{l} a_{p} \prod\limits_{i = 1}^{M} \sum\limits_{j = 1}^{J} \sum\limits_{k = 1 - N}^{L_{j} - N} \psi_{j,k}^{(N)}(x_{i}^{p}) \right)^{2} \ge 0 \end{aligned} $$

Hence, the dot-product kernel satisfies Mercer's condition, and it is therefore an admissible support vector kernel.

The Gaussian kernel requires computing an l × l kernel matrix, at cost O(l²). The main additional cost of the spline wavelet kernel lies in the construction stage, where the multi-resolution frame is built. Hence, if h frame elements describe the frame-based kernel, the computational complexity is O(h² × l²).
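Putting the pieces together, a minimal sketch of the kernel (3) could look as follows. It reuses `bspline`, `wavelet_coeffs` and `spline_wavelet` from the sketches above and assumes the dilation convention \( \psi_{j,k}^{(N)}(x) = \psi^{(N)}(2^{-j}x - k) \), which is consistent with \( L_{j} = 2^{m-j} \) but is our reading rather than the authors' stated implementation; the defaults for N, J and m are likewise illustrative.

```python
import numpy as np

def spline_wavelet_kernel(s, t, N=2, J=3, m=5):
    """Dot-product wavelet kernel K(s, t) of Eq. (3).

    Assumes psi_{j,k}(x) = psi(2^{-j} x - k), with translates
    k = 1-N, ..., L_j - N and L_j = 2^{m-j} at level j.
    """
    s, t = np.atleast_1d(s), np.atleast_1d(t)
    d = wavelet_coeffs(N)  # compute once, reuse at every level
    K = 1.0
    for s_i, t_i in zip(s, t):
        level_sum = 0.0
        for j in range(1, J + 1):
            for k in range(1 - N, 2 ** (m - j) - N + 1):
                level_sum += (float(spline_wavelet(2.0 ** -j * s_i - k, N, d))
                              * float(spline_wavelet(2.0 ** -j * t_i - k, N, d)))
        K *= level_sum
    return K
```

Such a callable can then be used to fill an l × l Gram matrix for any SVR implementation that accepts a custom or precomputed kernel (e.g. scikit-learn's `SVR(kernel='precomputed')`).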

4 Experimental analysis

Two simulated data sets are examined in the first series of experiments. Each data set consists of 1,560 samples generated by a GARCH(1,1) model:

$$ y_{t} = \mu + \sigma_{t} \varepsilon_{t} $$
(4)
$$ \sigma_{t}^{2} = \omega + \alpha y_{t - 1}^{2} + \beta \sigma_{t - 1}^{2} $$
(5)

where \( y_{t} \) is the daily return and \( \varepsilon_{t} \) is the innovation, an uncorrelated process with zero mean and unit variance. For the sake of simplicity, the mean μ of a financial return series is often neglected. In addition, the parameters ω, α and β must satisfy ω > 0 and α, β ≥ 0 to ensure that the conditional variance \( \sigma_{t}^{2} \) is positive. The experimental setup is as follows: μ = 0, ω = 0.1, α = 0.4 and β = 0.5, with the disturbance term \( \varepsilon_{t} \) distributed first as a Gaussian and then as a Student's t with four degrees of freedom. The second distribution models the excess kurtosis that appears in real financial series. The two resulting series are referred to as Data-1 and Data-2.
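As a hedged illustration (not the authors' code), the two series could be simulated under these settings as follows; the initialization at the unconditional variance, the rescaling of the t innovations to unit variance, and the seed are our assumptions.

```python
import numpy as np

def simulate_garch(n=1560, omega=0.1, alpha=0.4, beta=0.5, mu=0.0,
                   dist="gaussian", df=4, seed=0):
    """Simulate returns y_t from the GARCH(1,1) of Eqs. (4)-(5)."""
    rng = np.random.default_rng(seed)
    y = np.empty(n)
    sigma2 = omega / (1 - alpha - beta)  # start at unconditional variance
    for t in range(n):
        if dist == "gaussian":
            eps = rng.standard_normal()
        else:  # Student's t, rescaled to unit variance (df > 2)
            eps = rng.standard_t(df) * np.sqrt((df - 2) / df)
        y[t] = mu + np.sqrt(sigma2) * eps
        sigma2 = omega + alpha * (y[t] - mu) ** 2 + beta * sigma2
    return y

data1 = simulate_garch(dist="gaussian")  # Data-1
data2 = simulate_garch(dist="student")   # Data-2
```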

The real data examined in the experiment comprise the following daily indices: DAXINDX, FRCAC40, FTSE100, JAPDOWA and SPCOMP. These stock market index levels \( P_{t} \) are transformed into daily returns \( y_{t} \) as 100 times their log differences:

$$ y_{t} = 100 \ln \left( P_{t} / P_{t - 1} \right) $$
(6)

All the index data cover the period from 1 January 1992 to 31 December 1997, giving 1,560 observations for each time series of daily returns. Each whole data set is divided into several overlapping training and test sets according to the walk-forward testing routine [21] (see the sketch below). Each training and test set pair is moved forward through the time series by 130 observations; there are 520 observations in each training set and 520 in each test set. The optimal values of C, ε and γ for the SVM with the Gaussian kernel are chosen by fivefold cross-validation; the same method is used for the SWSVM to choose C, ε and the dilation j. All parameters are shown in Table 1. The results are collated, and the best results, obtained from the first sets (January 1992–January 1996), are recorded below.
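A minimal sketch of this walk-forward scheme, including the returns transform of Eq. (6), might read as follows; the function names and the resulting number of splits are our own illustration.

```python
import numpy as np

def to_returns(p):
    """Eq. (6): index levels P_t -> daily returns y_t."""
    return 100.0 * np.diff(np.log(p))

def walk_forward_splits(n, train_len=520, test_len=520, step=130):
    """Yield overlapping (train, test) index arrays, moved forward
    through the series by `step` observations each time."""
    start = 0
    while start + train_len + test_len <= n:
        yield (np.arange(start, start + train_len),
               np.arange(start + train_len, start + train_len + test_len))
        start += step

# With n = 1,560 this yields five overlapping train/test pairs.
for train_idx, test_idx in walk_forward_splits(1560):
    pass  # fit the SVM/SWSVM on train_idx, evaluate on test_idx
```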

Table 1 Parameters of the models

The prediction performance is evaluated using the following statistical metrics: normalized mean squared error (NMSE), normalized mean absolute error (NMAE) and the hit rate (HR). These metrics are calculated as follows:

$$ {\rm{NMSE}} = \sqrt{ \frac{\sum\nolimits_{t = 1}^{N} \left( \hat{\sigma}_{t}^{2} - y_{t}^{2} \right)^{2}}{\sum\nolimits_{t = 1}^{N} \left( y_{t - 1}^{2} - y_{t}^{2} \right)^{2}} } $$
(7)
$$ {\rm{NMAE}} = \frac{\sum\nolimits_{t = 1}^{N} \left| \hat{\sigma}_{t}^{2} - y_{t}^{2} \right|}{\sum\nolimits_{t = 1}^{N} \left| y_{t - 1}^{2} - y_{t}^{2} \right|} $$
(8)
$$ {\rm{HR}} = \frac{1}{N} \sum\limits_{t = 1}^{N} q_{t}, \quad q_{t} = \begin{cases} 1, & \left( \hat{\sigma}_{t}^{2} - y_{t - 1}^{2} \right) \left( y_{t}^{2} - y_{t - 1}^{2} \right) \ge 0 \\ 0, & {\rm{otherwise}} \end{cases} $$
(9)

where N represents the total number of data points in the test set, \( \hat{\sigma}_{t}^{2} \) denotes the predicted conditional variance, and \( y_{t} \) denotes the actual return. The NMSE relates the mean square error of the volatility \( \hat{\sigma}_{t}^{2} \) predicted by the SVM to the mean square error of the naive model \( \hat{\sigma}_{t}^{2} = y_{t - 1}^{2} \). The NMAE is more robust against outliers than the NMSE. Both measure the deviation between the actual and predicted values: the smaller they are, the closer the predicted series is to the actual one. Conversely, HR measures how often the model predicts the correct direction of change of volatility, so the larger its value, the better the prediction performance.
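The three metrics could be computed as in the following sketch; the alignment convention (scoring starts at t = 1 so that the naive forecast \( y_{t-1}^{2} \) is always available) is our assumption.

```python
import numpy as np

def volatility_metrics(sigma2_hat, y):
    """NMSE, NMAE and hit rate of Eqs. (7)-(9).

    sigma2_hat[t] predicts y[t]**2; the naive model predicts y[t-1]**2.
    """
    y2, naive = y[1:] ** 2, y[:-1] ** 2
    s2 = sigma2_hat[1:]
    nmse = np.sqrt(np.sum((s2 - y2) ** 2) / np.sum((naive - y2) ** 2))
    nmae = np.sum(np.abs(s2 - y2)) / np.sum(np.abs(naive - y2))
    hr = np.mean((s2 - naive) * (y2 - naive) >= 0)  # direction-of-change hits
    return nmse, nmae, hr
```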

The smaller the cross-validation error, the stronger the generalization ability of the model. After obtaining the parameters by cross-validation to avoid over-fitting, we first inspect the prediction results on the training set, which supplement the prediction results on the test set.

The results on the training set are listed in Table 2. It can be observed that for all the daily indices, the spline wavelet kernel yields the smaller NMSE values. It also yields the smaller NMAE values, with the exception of FTSE100 and JAPDOWA. As for HR, only on Data-1 and DAXINDX does the Gaussian kernel yield the larger values. A paired t-test [22] is performed to determine whether there is a significant difference between the two kernels based on the NMSE of the training set. The calculated t-value indicates that the spline wavelet kernel outperforms the Gaussian kernel at the 5% significance level in a one-tailed test.

Table 2 Results on the training set

The results on the test set in Table 3 provide a better basis for comparing the two kernels, since over-fitting issues can be neglected there. As expected, the results on the test set are worse than those on the training set in terms of NMSE, NMAE and HR, but a similar conclusion can still be drawn. The table shows that apart from Data-1, the smaller NMSE values are found with the spline wavelet kernel, and the smaller NMAE values are found with the spline wavelet kernel except for DAXINDX and FTSE100. The larger HR values all occur with the spline wavelet kernel. A paired t-test on the NMSE of the test set also shows that the spline wavelet kernel outperforms the Gaussian kernel at the 5% significance level in a one-tailed test.

Table 3 Results on the test set

The squared observations \( y_{t}^{2} \) and the predicted values \( \hat{\sigma}_{t}^{2} \) from both kernels on the test sets are illustrated in Figs. 1 and 2. Only the FRCAC40 is drawn, since its NMSE and NMAE values are representative of all the daily indices. In this investigation, all of the parameters are given in Table 1. It is clear that both the Gaussian and the spline wavelet kernels are able to capture the features reflected by the naive model. The predictions made by the two kernels are very similar according to Figs. 1 and 2, although, as Table 3 shows, the performance of the spline wavelet kernel is better than that of the Gaussian kernel.

Fig. 1 Squared observations and forecasted volatility by the Gaussian kernel

Fig. 2 Squared observations and forecasted volatility by the spline wavelet kernel

All experiments are run on a 1.6 GHz Intel processor with 2 GB of main memory under Windows XP Professional. The training time of the SVM is 3.5688 CPU-seconds, while that of the SWSVM is 3.1182 × 10³ CPU-seconds. The SWSVM is slower than the SVM because the better prediction performance gained by constructing the wavelet kernel on a multi-scale frame comes at the cost of additional computation time.

5 Conclusion and discussion

This paper presents an effective spline wavelet kernel for volatility forecasting, constructed by combining spline theory and wavelet methods with SVM. The admissibility of the spline wavelet kernel is proven first, and the forecasting performance is then evaluated on two simulated data sets and five real daily indices. As demonstrated in the experiments, the spline wavelet kernel forecasts significantly better than the Gaussian kernel. Its superior performance mostly lies in the fact that spline wavelets form a set of bases that can approximate arbitrary functions. Future work will involve a theoretical analysis of the multi-scale frame on splines. More sophisticated spline wavelet kernels that can closely follow volatility clustering will be explored to further improve the performance of the SWSVM in volatility forecasting.