Volatility forecasting via SVR–GARCH with mixture of Gaussian kernels

Bezerra, Pedro Correia S.; Albuquerque, Pedro Henrique M.

doi:10.1007/s10287-016-0267-0

Original Paper
Published: 16 November 2016

Volume 14, pages 179–196, (2017)
Computational Management Science

Abstract

Support vector regression (SVR) is a supervised machine learning technique that has been successfully employed to forecast financial volatility. As the SVR is a kernel-based technique, the choice of kernel has a great impact on its forecasting accuracy. Empirical results show that SVRs with hybrid kernels tend to beat single-kernel models in terms of forecasting accuracy. Nevertheless, no application of hybrid-kernel SVR to financial volatility forecasting has been reported in previous research. Given the empirical evidence that the stock market oscillates between several possible regimes, in which the overall distribution of returns is a mixture of normals, we attempt to find the optimal number of Gaussian kernels in a mixture that improves the one-period-ahead volatility forecasting of SVR based on GARCH(1,1). The forecast performance of mixtures of one, two, three and four Gaussian kernels is evaluated on the daily returns of the Nikkei and Ibovespa indexes and compared with SVR–GARCH with Morlet wavelet kernel and with standard GARCH, Glosten–Jagannathan–Runkle (GJR) and nonlinear EGARCH models with normal, Student-t, skew-Student-t and generalized error distribution (GED) innovations, using the mean absolute error (MAE), the root mean squared error (RMSE) and a robust Diebold–Mariano test. The results of the out-of-sample forecasts suggest that the SVR–GARCH with a mixture of Gaussian kernels can improve the volatility forecasts and capture the regime-switching behavior.

1 Introduction

Financial time series prediction is an important and challenging task in empirical finance. The data generation process of these series is complex because of its chaotic, noisy, non-stationary and nonlinear nature (Cao and Tay 2001). Thus, the use of support vector regression (SVR) in financial forecasting has been proposed in the literature, because it makes no assumptions about the distribution of the data, is a purely data-driven technique, is very flexible, has excellent forecasting accuracy, and shows theoretically and empirically superior results to artificial neural networks and traditional statistical methods (Sapankevych and Sankar 2009; Cavalcante et al. 2016). Moreover, the SVR is a machine learning technique based on statistical learning theory that implements the Structural Risk Minimization Principle, which results in better generalization performance (Cao and Tay 2003).

Volatility is a measure of the degree of fluctuation of financial returns and is a proxy for risk; it is thus a key variable in risk management, asset pricing and portfolio selection (Brownlees and Gallo 2009). Linear and nonlinear parametric generalized autoregressive conditional heteroscedasticity (GARCH) models make assumptions about the functional form of the data generating process and the error distribution. Besides, empirical studies provide evidence that GARCH has low forecasting performance (Jorion 1995; Brailsford and Faff 1996; Mcmillan and Speight 2000; Choudhry and Wu 2008). Therefore, modifications have been proposed to improve its forecast accuracy, such as changes in specification and model estimation, the use of different proxies for volatility, and changes in the forecast evaluation metrics (Chen et al. 2010). To overcome these limitations, volatility forecasting models based on SVR have been proposed in the literature, because SVR is able to capture nonlinear characteristics of financial time series such as volatility clustering, leptokurtosis and the leverage effect, without any assumptions about the distributional properties of the data. As shown by Fernando et al. (2003), Chen et al. (2010), Li (2014) and Santamaría-Bonfil et al. (2015), SVR shows superior results in volatility forecasting compared with GARCH models, due to its ability to capture the dynamic and nonlinear behavior of financial time series.

The volatility of financial asset returns changes over time due to the behavior of the capital market. Empirical evidence shows that the financial market oscillates between several regimes, in which the overall distribution of returns is a mixture of normals (Levy and Kaplanski 2015). In general, studies report the existence of two regimes (one with high and the other with low volatility) for the distribution of stock returns in the equity market. However, markets can have more than two states (Bae et al. 2014), in which case a mixture of more than two normal distributions is needed to model the regime-switching behavior (Guidolin 2011). As the SVR is a kernel-based methodology, its forecasting performance depends greatly on the selection of the kernel function. To improve the SVR learning and generalization ability and take advantage of different kernel functions, it is possible to construct hybrid kernels via linear or non-linear combinations of kernels (Huang et al. 2014). Empirical evidence shows that the SVR with a hybrid kernel achieves better forecasting accuracy than the SVR with a single kernel (Huang et al. 2014). However, to the best of our knowledge, no research on volatility forecasting via SVR has used a hybrid kernel. In this context, the main purpose of this article is to use a mixture of Gaussian kernels in the SVR based on GARCH(1,1) (hereafter SVR–GARCH) in order to improve the prediction accuracy and to model: (i) market regimes and (ii) stylized characteristics of financial returns such as high kurtosis, heavy tails and volatility clustering.

The forecasting accuracy of SVR–GARCH with a linear combination of one, two, three and four Gaussian kernels in one-period-ahead volatility forecasting is compared with that of the SVR–GARCH with Morlet wavelet kernel and of the GARCH, EGARCH and GJR models. For each GARCH-type model, four different distributions for the innovations are considered: the Normal, the Student’s t, the skewed Student’s t and the GED. The comparison is made in terms of two evaluation metrics, the mean absolute error (MAE) and the root mean squared error (RMSE).

The remainder of this paper is organized as follows. Section 2 provides a brief explanation of the Support Vector Machine (SVM) for regression. Section 3 explains the use of mixture of Gaussian kernels in the SVR–GARCH. Section 4 describes the empirical modeling. Section 5 shows the empirical results of the proposed model on daily financial returns of Nikkei 225 and Ibovespa indexes. Section 6 provides the concluding remarks of this paper.

2 Support vector regression

The support vector machine (SVM) is a machine learning algorithm based on the statistical learning theory developed by Vapnik and Chervonenkis (1974). SVM for linear and non-linear regression is called support vector regression (SVR) (Smola and Schölkopf 2004). The non-linear SVR can be written as follows: we are given a set of training data $\{(x_1,y_1),\ldots ,(x_n,y_n)\}$, where $x_i\in \mathcal {X} \subseteq \mathbb {R}^n$ is the input vector and $y_i\in Y \subseteq \mathbb {R}$ is the output scalar. The goal of SVR is to find a function f(x) that approximates the output scalar $y_i$ to within a given forecast error. To achieve this goal, the SVR nonlinearly maps the input vector space ($\mathbb {R}^n$) into a higher-dimensional feature space ($\mathcal {F}$), where the non-linear relations of the input space are approximated by a linear regression in the feature space (Vapnik 1995):

$$\begin{aligned} f(x)=w^T\phi (x)+b, \quad \text{ with } \; \phi :\mathbb {R}^n \rightarrow \mathcal {F},w \in \mathcal {F} \end{aligned}$$

(1)

where w and b are the regression parameters (the weight vector and the bias term) and $\phi ({.})$ is the nonlinear mapping function, which projects the input vector into a higher-dimensional feature space, where the linear regression is defined. Vapnik (1995) introduced the $\epsilon $-insensitive loss function ($L_\epsilon $) to measure the difference between the actual and predicted values. The goal of $\epsilon $-SVR is to find a function that deviates from $y_i$ by at most $\epsilon $. The vector w and the constant b can be found by minimizing the following regularized function R(C) (Vapnik 1995):

$$\begin{aligned} {\textit{Minimize}}: R(C)=\dfrac{1}{2}\Vert w\Vert ^2+\dfrac{C}{n}\sum _{i=1}^{n}L_\epsilon (y_i, f(x_i)); \end{aligned}$$

(2)

where:

$$\begin{aligned} L_\epsilon (y_i, f(x_i))= {\left\{ \begin{array}{ll} |y_i-f(x_i)|-\epsilon , &{} \quad \text{ if } \; |y_i-f(x_i)|>\epsilon \\ 0, &{} \quad \text{ otherwise } \end{array}\right. }, \epsilon \ge 0 \end{aligned}$$

(3)

is the $\epsilon $-insensitive loss function ($L_\epsilon $). Only the observations on or outside the $\epsilon $-insensitive zone serve as support vectors to construct the decision function ($f(\mathbf x )$). To account for the errors outside the $\epsilon $-insensitive zone, slack variables ($\xi _i,\xi _i^*$, $i=1,2,\ldots ,n$) are introduced (Smola and Schölkopf 2004). Then, the primal problem of SVR is given by:

$$\begin{aligned} {\textit{Minimize}}: \dfrac{1}{2}\Vert w\Vert ^2+ C \sum _{i=1}^{n}(\xi _i+\xi _i^*), \end{aligned}$$

(4)

$$\begin{aligned} \text{ subject } \text{ to }\ {\left\{ \begin{array}{ll} y_i-w^T\phi (x_i)-b\le \epsilon +\xi _i,\\ w^T\phi (x_i)+b-y_i\le \epsilon +\xi _i^*,\\ \xi _i,\xi _i^*\ge 0, \quad i=1,\ldots ,n \end{array}\right. } \end{aligned}$$
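As a minimal sketch of Eqs. 3 and 4, the following Python fragment evaluates the $\epsilon $-insensitive loss and the primal objective for a candidate solution. The linear feature map $\phi (x)=x$, the toy data and the function names are illustrative assumptions and are not part of the original study.

```python
def eps_insensitive_loss(y, f_x, eps):
    """Eq. 3: zero inside the eps-tube, linear outside it."""
    return max(abs(y - f_x) - eps, 0.0)


def primal_objective(w, b, X, y, C, eps):
    """Eq. 4 with the assumed linear feature map phi(x) = x.

    For fixed (w, b) the smallest feasible slack xi_i + xi_i^* of an
    observation equals its eps-insensitive loss, so the sum of losses
    stands in for the slack term.
    """
    complexity = 0.5 * sum(w_j * w_j for w_j in w)
    slack = 0.0
    for x_i, y_i in zip(X, y):
        f_x = sum(w_j * x_ij for w_j, x_ij in zip(w, x_i)) + b
        slack += eps_insensitive_loss(y_i, f_x, eps)
    return complexity + C * slack


# Hypothetical toy data: only the third point lies outside the eps-tube.
X = [[0.0], [1.0], [2.0]]
y = [0.1, 0.9, 2.6]
print(primal_objective(w=[1.0], b=0.0, X=X, y=y, C=10.0, eps=0.2))
```

Increasing C in this sketch raises the penalty on the single point outside the tube, which illustrates the trade-off between model complexity and training error.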

The convex quadratic objective and the linear constraints of the primal problem above ensure that the SVR always reaches the global optimum. The term $\dfrac{1}{2}\Vert w\Vert ^2$ characterizes the model complexity. The parameter C controls the trade-off between the function complexity and the training error $\sum _{i=1}^{n}(\xi _i+\xi _i^*)$ (Sermpinis et al. 2014). If the value of C is too large, the algorithm tends to overfit the data and will have lower generalization ability. The parameter $\epsilon $ controls the width of the $\epsilon $-insensitive zone: the higher $\epsilon $ is, the fewer support vectors are selected (Cherkassky and Ma 2004). The parameters C and $\epsilon $ are the SVR meta-parameters and, in general, are determined by cross-validation (Haykin 1998). In order to solve Eq. 4, it is possible to use the Lagrange multipliers ($\alpha _i$ and $\alpha _i^*$) and the Karush–Kuhn–Tucker conditions (Karush 1939; Kuhn and Tucker 1951), transforming the problem into its dual form (Vapnik 1995):

$$\begin{aligned} \text{ Maximize }:\mathcal {L}&=-\dfrac{1}{2}\sum _{i=1}^{n}\sum _{j=1}^{n}(\alpha _i-\alpha _i^*)(\alpha _j-\alpha _j^*)\langle \phi ({x}_i),\phi ({x}_j)\rangle \nonumber \\&\quad \,+\;\sum _{i=1}^{n}y_i(\alpha _i-\alpha _i^*) -\epsilon \sum _{i=1}^{n}(\alpha _i+\alpha _i^*) \end{aligned}$$

(5)

$$\begin{aligned} \text{ subject } \text{ to }\ {\left\{ \begin{array}{ll} \sum _{i=1}^{n}(\alpha _i^*-\alpha _i)=0, \\ 0\le \alpha _i \le C, i=1,\ldots ,n \\ 0\le \alpha _i^* \le C, i=1,\ldots ,n \end{array}\right. } \end{aligned}$$

Once the Lagrange multipliers are calculated, we obtain the support vector expansion: $w=\sum _{i=1}^{n}(\alpha _i-\alpha _i^*)\phi ({x}_i)$. From the solution of the dual problem, the $\epsilon $-SVR function can be written as (Vapnik 1995):

$$\begin{aligned} f(x)=\sum _{i=1}^{n}(\alpha _i-\alpha _i^*)\langle \phi ({x}_i),\phi (x) \rangle +b \end{aligned}$$

(6)

where $\langle \phi (x_i),\phi (x) \rangle $ is the dot product in the feature space $\mathcal {F}$. To avoid the complexity of computing $\phi (.)$, we can substitute the dot product by a kernel function:

$$\begin{aligned} f(x)=\sum _{i=1}^{n}(\alpha _i-\alpha _i^*) K(x_i,x)+b^* \end{aligned}$$

(7)

The kernel function $K(x,x')=\langle \phi (x),\phi (x')\rangle $ is critical to the forecasting performance of the SVR. Any function that satisfies Mercer's theorem (Mercer 1909) is an admissible kernel. So far there is no analytical method for choosing the most appropriate kernel for a given problem (Sangeetha and Kalpana 2010). Before estimating w and b with SVR, it is necessary to choose the regularization parameter C, the loss function parameter $\epsilon $ and the parameters of the chosen kernel (Sermpinis et al. 2014).
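The kernel expansion of Eq. 7 can be sketched in Python as follows. The support vectors, dual coefficients and bias below are hypothetical values chosen for illustration; in practice they come from solving the dual problem of Eq. 5.

```python
import math


def gaussian_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, dual_coefs, b, kernel):
    """Eq. 7: f(x) = sum_i (alpha_i - alpha_i^*) K(x_i, x) + b.

    dual_coefs[i] holds the difference alpha_i - alpha_i^*.
    """
    return sum(c * kernel(sv, x) for sv, c in zip(support_vectors, dual_coefs)) + b


# Hypothetical solution: the coefficients sum to zero, as the dual constraint requires.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.5]
b = 0.1
kernel = lambda u, v: gaussian_kernel(u, v, gamma=0.9)

print(svr_predict([0.0, 0.0], support_vectors, dual_coefs, b, kernel))
print(svr_predict([0.5, 0.5], support_vectors, dual_coefs, b, kernel))
```

The feature map $\phi $ never appears in the code: only kernel evaluations between the new point and the support vectors are needed.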

3 SVR–GARCH with a mixture of Gaussian kernels

In empirical finance, the normal distribution is the most convenient distributional assumption for asset returns (Wirjanto and Xu 2009). Nevertheless, empirical studies show that the distribution of returns departs from the normal shape due to its substantial leptokurtosis (fat tails) and skewness (asymmetry) (Wang and Taaffe 2015). Besides, the stock market oscillates between regimes (or states) (Tu 2010). To explain these facts, it is more appropriate to use a mixture of two or more normal distributions (Wang and Taaffe 2015), because mixtures are more flexible and can model complex phenomena (Marron and Wand 1992; McLachlan and Peel 2004).

Conditional volatility models such as the Autoregressive Conditional Heteroscedasticity (ARCH) (Engle 1982) and Generalized ARCH (GARCH) (Bollerslev 1986) models can capture volatility clustering and time-varying volatility. However, empirical evidence shows that these models with Gaussian or heavier-tailed innovations cannot model the full extent of skewness and kurtosis (Wirjanto and Xu 2009). Since any continuous distribution can be well approximated by a finite mixture of normal distributions, the use of mixtures of normals for GARCH innovations has been proposed by Bai et al. (2003), Haas et al. (2004), Marcucci (2005), Alexander and Lazar (2006) and Wirjanto and Xu (2009).

Given that financial returns are subject to regime-switching behavior between k states, even if the distribution of financial returns in each regime is normal, the overall distribution, given the probability of each state, is not normal. In fact, it is a mixture of k normal distributions (Levy and Kaplanski 2015). One way to accommodate this situation in the context of volatility forecasting via SVR is to use a mixture of Gaussian kernels. The mixture of normal distributions can capture extreme events, high kurtosis and heavy tails of financial returns and can approximate any continuous probability distribution arbitrarily well (McLachlan and Peel 2004; Wang and Taaffe 2015). In this context, we attempt to find the optimal number of Gaussian kernels in the mixture for the SVR–GARCH model. We test a linear combination of one, two, three and four Gaussian kernels in SVR based on GARCH(1,1), which can capture market regimes and may show better out-of-sample forecasting results than the SVR–GARCH with a single kernel.
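To illustrate why a mixture of normal regimes is not itself normal, the following Python sketch computes the kurtosis of a zero-mean mixture of normals in closed form. The regime probabilities and variances are hypothetical, and the zero-mean restriction is an assumption made to keep the moments simple.

```python
def mixture_kurtosis(weights, variances):
    """Kurtosis E[r^4] / E[r^2]^2 of a zero-mean mixture of normals.

    For a zero-mean normal component with variance s2,
    E[r^2] = s2 and E[r^4] = 3 * s2**2.
    """
    m2 = sum(p * s2 for p, s2 in zip(weights, variances))
    m4 = sum(p * 3.0 * s2 ** 2 for p, s2 in zip(weights, variances))
    return m4 / m2 ** 2


# A single regime reproduces the normal kurtosis of 3.
print(mixture_kurtosis([1.0], [1.0]))

# Hypothetical calm (80%) and turbulent (20%) regimes: kurtosis exceeds 3.
print(mixture_kurtosis([0.8, 0.2], [1.0, 9.0]))
```

Even though each regime is normal, mixing a low-variance and a high-variance state produces the heavy tails observed in financial returns.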

To verify which kernels are most used in volatility forecasting via SVR, we conduct a search covering 2000 to 2016 through each of the following online search engines: Elsevier, Wiley Online Library, IEEE Xplore Digital Library, SCOPUS, ISI Web of Knowledge, ScienceDirect, Google Scholar and ProQuest Journals. We use the same query for every search engine:

(“support vector machine” OR “support vector regression”) AND (“financial time series forecasting” OR “volatility forecasting” OR “volatility”)

Then, we select papers about volatility forecasting via SVR published in peer-reviewed journals with an impact factor. We found that the ten selected research articles used only a single-kernel and that the Gaussian is the most widely used kernel function (Table 1):

Table 1 Kernel choice in volatility forecasting

The kernel function maps non-linear observations of the input data into a higher-dimensional feature space, in which the data are linearly separable (Vapnik 1995). In this paper, we use a linear combination of $K=1,2,3,4$ Gaussian kernels:

$$\begin{aligned} K_{mix}(x,x')= \sum _{k=1}^{K}\rho _k \times K_k(x,x'), \quad \rho _k\ge 0 \quad \text{ and } \quad \sum _{k=1}^{K}\rho _k=1 \end{aligned}$$

(8)

where $\rho _k$ is the weighting coefficient of the kth kernel and $K_k(x,x')=\exp \left( -\gamma _k \Vert x-x' \Vert ^2\right) $. Following Huang et al. (2014), the optimal $\rho _k$ are obtained by a grid search with a step length of 0.1.
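A minimal Python sketch of the mixture kernel of Eq. 8 and of the candidate weights visited by a grid search with step 0.1 is given below. The function names are illustrative, and the enumeration of the weight simplex is one possible reading of the grid search; the original implementation may differ.

```python
import math
from itertools import product


def gaussian_kernel(x, z, gamma):
    """Gaussian kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def mixture_kernel(x, z, rhos, gammas):
    """Eq. 8: convex combination of Gaussian kernels with widths gammas."""
    if any(r < 0 for r in rhos) or abs(sum(rhos) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(r * gaussian_kernel(x, z, g) for r, g in zip(rhos, gammas))


def weight_grid(n_kernels, steps=10):
    """All weight vectors on the simplex with step 1/steps (0.1 by default)."""
    grid = []
    for combo in product(range(steps + 1), repeat=n_kernels - 1):
        last = steps - sum(combo)
        if last >= 0:
            grid.append(tuple(c / steps for c in combo) + (last / steps,))
    return grid


# Two-kernel example using the widths and weight reported for the Nikkei series.
print(mixture_kernel([0.1, 0.2], [0.3, 0.1], rhos=(0.37, 0.63), gammas=(0.9801, 0.01)))
print(len(weight_grid(2)), len(weight_grid(3)), len(weight_grid(4)))
```

Because a non-negative combination of Mercer kernels is again a Mercer kernel, the mixture remains admissible for SVR. Note that this grid includes weight vectors with zero entries, which reduce to mixtures with fewer kernels.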

Empirical research shows that wavelet kernels achieve better volatility forecasting results than the Gaussian kernel (Tang et al. 2009a, b; Li 2014). Therefore, we also use the Morlet wavelet kernel in the SVR–GARCH (Zhang et al. 2004):

$$\begin{aligned} k(x,x')=\prod _{i=1}^{N}\cos \Big (1.75\times \dfrac{x_i-x_i'}{a}\Big )\exp \Big ( -\dfrac{(x_i-x_i')^2}{2a^2}\Big ),\quad {x},{x}' \in \mathbb {R}^N \end{aligned}$$

(9)
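The Morlet wavelet kernel of Eq. 9 can be sketched in Python as follows. The function name and the test points are illustrative; a is the dilation parameter of the kernel.

```python
import math


def morlet_kernel(x, z, a):
    """Eq. 9: product over coordinates of cos(1.75 d / a) * exp(-d^2 / (2 a^2)),
    where d = x_i - z_i is the coordinate-wise difference."""
    value = 1.0
    for x_i, z_i in zip(x, z):
        d = x_i - z_i
        value *= math.cos(1.75 * d / a) * math.exp(-d * d / (2.0 * a * a))
    return value


print(morlet_kernel([0.1, 0.2], [0.1, 0.2], a=1.0))   # identical points
print(morlet_kernel([0.1, 0.2], [0.4, -0.3], a=1.0))  # distinct points
```

Unlike the Gaussian kernel, the cosine factor lets the kernel take negative values, so it oscillates as the distance between the two points grows.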

4 Empirical modelling

4.1 Parametric volatility models

Let $P_t$ be the asset price at time t. Then, the return of the asset at time t is given by:

$$\begin{aligned} r_t=\log \left( \frac{P_t}{P_{t-1}}\right) \end{aligned}$$

(10)
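As a small illustration of Eq. 10, the following Python sketch converts a price series into log returns. The prices are hypothetical.

```python
import math


def log_returns(prices):
    """Eq. 10: r_t = log(P_t / P_{t-1}) for consecutive prices."""
    return [math.log(p_t / p_prev) for p_prev, p_t in zip(prices, prices[1:])]


prices = [100.0, 102.0, 101.0, 105.0]  # hypothetical closing prices
returns = log_returns(prices)
print(returns)
```

A series of T prices yields T - 1 returns, and the log returns add up to the log return over the whole period.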

The GARCH can capture volatility clustering and persistence. The GARCH(1,1) is specified as follows:

$$\begin{aligned} r_t= & {} u_t+a_t \end{aligned}$$

(11)

$$\begin{aligned} a_t= & {} \sqrt{h_t}z_t,\quad z_t \sim i.i.d(0,1) \end{aligned}$$

(12)

$$\begin{aligned} h_t= & {} \alpha _0+\alpha _1a^2_{t-1}+\beta _1h_{t-1} \end{aligned}$$

(13)

where $\alpha _0>0$ and $\alpha _1, \beta _1\ge 0$. One of the drawbacks of the standard GARCH is that negative and positive shocks have the same impact on the volatility forecasts (Franses and van Dijk 1996). Glosten et al. (1993) introduced the GJR model to capture the asymmetric response of volatility to shocks. The GJR(1,1) is defined as:

$$\begin{aligned} h_t=\alpha _0+\alpha _1a^2_{t-1}+\beta _1h_{t-1}+\gamma S^-_{t-1}a^2_{t-1}, \end{aligned}$$

(14)

where

$$\begin{aligned} S^-_{t-1}= {\left\{ \begin{array}{ll} 1, &{} \quad \text{ if }\quad a_{t-1} < 0 \\ 0, &{} \quad \text{ otherwise } \end{array}\right. } \end{aligned}$$

(15)

where $\alpha _0>0$, $\alpha _1, \beta _1\ge 0$ and $\alpha _1+\gamma \ge 0$. The exponential generalized autoregressive conditional heteroscedasticity (EGARCH) model (Nelson 1991) can model the skewness of financial returns and ensures that the variance is always positive. The EGARCH(1,1) is written as follows:

$$\begin{aligned} \ln (h_t)=\alpha _0+\beta _1\ln \Big (h_{t-1}\Big )+\alpha _1\Bigg [\dfrac{|a_{t-1}|}{\sqrt{h_{t-1}}}- \sqrt{\dfrac{2}{\pi }}\Bigg ]-\gamma \Bigg (\dfrac{a_{t-1}}{\sqrt{h_{t-1}}}\Bigg ) \end{aligned}$$

(16)

where $\gamma $ is the asymmetric response parameter. If this parameter is positive, a negative return generates more volatility than a positive return of the same magnitude. In order to model the fat tails of the empirical distribution of financial returns, the errors $z_t$ can follow a Student’s t, a skewed Student’s t or a generalized error distribution (GED) (Marcucci 2005):

1.
A random variable X that follows a Student’s t distribution has the following probability density function (pdf) (Casella and Berger 2001):
$$\begin{aligned} f(x) = \frac{\Gamma (\frac{\nu +1}{2})}{\sqrt{\nu \pi }\,\Gamma (\frac{\nu }{2})} \left( 1+\frac{x^2}{\nu } \right) ^{(-\frac{\nu +1}{2})} \end{aligned}$$
(17)
where $\nu $ is the degree of freedom parameter and $\Gamma (.)$ is the Gamma function.
2.
Generalized error distribution (GED): a random variable X that follows a GED with zero mean and unit variance has the following pdf (Tsay 2010):
$$\begin{aligned} f\left( x\right) =\dfrac{\nu \exp \left[ -\frac{1}{2}\left| x / \lambda \right| ^\nu \right] }{\lambda 2^{(\nu +1) / \nu }\Gamma (1/\nu ) }, \quad 0<\nu \le \infty \end{aligned}$$
(18)
where:
$$\begin{aligned} \lambda =\left[ \dfrac{2^{-(2 / \nu )}\Gamma \left( 1 / \nu \right) }{\Gamma (3 / \nu ) }\right] ^{1 / 2} \end{aligned}$$
(19)
where the parameter $\nu $ denotes the thickness-of-tail. When $0<\nu <2$, GED has thicker tails than the normal distribution.
3.
The skewed Student’s t-distribution can model asymmetric effects and excess kurtosis; its pdf takes the following form (Fernandez and Steel 1998):
$$\begin{aligned}&f( x| \iota ,\nu ) =\dfrac{2}{\iota +1 / \iota } [ g(\iota (sx+m)|\nu ) I_{( -\infty ,0)} (x+ m / s)] \end{aligned}$$
(20)

$$\begin{aligned}&+\dfrac{2}{\iota +1 / \iota }[ g((sx+m)/\iota |\nu ) I_{( 0,+\infty )} (x+m / s)], \end{aligned}$$
(21)
where $g(\cdot |\nu )$ is a Student’s t-distribution with $\nu $ degrees of freedom,
$$\begin{aligned} m= & {} \dfrac{\Gamma \left( \left( \nu +1\right) / 2\right) \sqrt{\nu -2}}{\sqrt{\pi }\Gamma \left( \nu / 2\right) }(\iota -1 / \iota ), \end{aligned}$$
(22)

$$\begin{aligned} s= & {} \sqrt{(\iota ^{2}+1 /\iota ^2-1)-m^2} \end{aligned}$$
(23)
where $\iota $ is the asymmetry parameter.
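The conditional variance recursions of the GARCH(1,1) model (Eq. 13) and the GJR(1,1) model (Eqs. 14 and 15) can be sketched in Python as follows. The parameter values and residuals are hypothetical, and starting the recursion from the mean of the squared residuals is an assumption; in practice the parameters are estimated by maximum likelihood under the chosen innovation distribution.

```python
def garch_variance(a, alpha0, alpha1, beta1, gamma=0.0, h0=None):
    """Conditional variances from Eq. 13 (gamma = 0) or Eq. 14 (GJR, gamma != 0).

    a  : list of mean-equation residuals a_t
    h0 : starting variance; defaults to the mean of the squared residuals
         (an assumed initialization)
    Returns len(a) + 1 values; the last one is the one-step-ahead forecast.
    """
    if h0 is None:
        h0 = sum(x * x for x in a) / len(a)
    h = [h0]
    for a_prev in a:
        s_neg = 1.0 if a_prev < 0 else 0.0  # indicator S^-_{t-1} of Eq. 15
        h.append(alpha0 + (alpha1 + gamma * s_neg) * a_prev ** 2 + beta1 * h[-1])
    return h


residuals = [0.01, -0.02, 0.015]  # hypothetical residuals
h_garch = garch_variance(residuals, alpha0=1e-6, alpha1=0.05, beta1=0.9)
h_gjr = garch_variance(residuals, alpha0=1e-6, alpha1=0.05, beta1=0.9, gamma=0.1)
print(h_garch)
print(h_gjr)
```

The two paths coincide after the positive shock and separate after the negative one, which is the asymmetric response that the GJR term introduces.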

4.2 SVR based on GARCH

In order to forecast volatility, we have to define the inputs and outputs of the SVR decision function. Previous studies showed that the GARCH(1,1) is sufficient to model financial volatility (Poon and Granger 2003; Hansen and Lunde 2005). Thus, in this work the conditional variance is modeled by a GARCH(1,1), while the conditional mean is modeled by an AR(1) process (Franses and van Dijk 1996). Then, to forecast volatility, we use an SVR based on GARCH(1,1) (hereafter SVR–GARCH). The output variable is $h_t$ and the input vector is $x_t=[a^2_{t-1},h_{t-1}]$. The SVR–GARCH is given by the following structure:

$$\begin{aligned} r_t=f\left( r_{t-1}\right) +a_t \end{aligned}$$

(24)

where f is the decision function estimated by SVR for the mean equation. We get the squared residuals from the conditional mean estimation of the SVR–GARCH, then we estimate the conditional variance equation given by:

$$\begin{aligned} \tilde{h}_t=g(\tilde{h}_{t-1},a_{t-1}^2) \end{aligned}$$

(25)

where g is the decision function estimated by SVR, $a_{t}^2$ are the squared residuals and $\tilde{h}$ is the volatility proxy. In the mean equation, we use a single Gaussian kernel ($K(x,x')= \exp \left( -\gamma \Vert x-x' \Vert ^2\right) $), because it is a common choice in financial time series forecasting via SVR (Sapankevych and Sankar 2009). In the volatility equation of SVR–GARCH, we use a linear combination of one, two, three and four Gaussian kernels given by Eq. 8.

As volatility is not directly observable, a proxy is needed. As in Brooks (2001), Brooks and Persand (2003) and Chen et al. (2010), we use the following proxy:

$$\begin{aligned} \tilde{h}_t=(r_t-\bar{r})^2 \end{aligned}$$

(26)

where $r_t$ are the financial returns and $\bar{r}$ is the mean of the returns. Any volatility proxy is an imperfect estimator of the true conditional variance (Patton 2011). The use of another proxy might alter the results presented here. However, this issue is beyond the scope of this paper.
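The proxy of Eq. 26 and the input and output pairs of the variance equation (Eq. 25) can be sketched in Python as follows. The return and residual values are hypothetical, and the function names are illustrative; the residuals would normally come from the SVR mean equation of Eq. 24.

```python
def volatility_proxy(returns):
    """Eq. 26: squared deviation of each return from the sample mean."""
    r_bar = sum(returns) / len(returns)
    return [(r - r_bar) ** 2 for r in returns]


def variance_equation_samples(proxy, residuals):
    """Training pairs for Eq. 25.

    Input  x_t = [a_{t-1}^2, h_{t-1}] (squared lagged residual, lagged proxy)
    Target y_t = h_t (current proxy)
    """
    X = [[residuals[t - 1] ** 2, proxy[t - 1]] for t in range(1, len(proxy))]
    y = proxy[1:]
    return X, y


returns = [0.010, -0.020, 0.005, 0.015, -0.010]    # hypothetical log returns
residuals = [0.008, -0.021, 0.004, 0.013, -0.012]  # hypothetical residuals a_t
proxy = volatility_proxy(returns)
X, y = variance_equation_samples(proxy, residuals)
print(X[0], y[0])
```

Each target is paired only with information available one period earlier, which is what makes the fitted function a one-period-ahead forecaster.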

Before applying the SVR–GARCH for volatility prediction, we use the validation procedure (also known as the holdout method) based on grid search and sensitivity analysis to select the kernel parameter $\gamma $ (for the Gaussian kernel), the regularization parameter C and the loss function parameter $\epsilon $ (Stone 1974; Kohavi 1995; Arlot and Celisse 2010). We divide the database into three mutually exclusive sets: training, validation and testing (Shalev-Shwartz and Ben-David 2014). The training set is used to estimate the model parameters, and the performance of various values of the parameters is then evaluated on the validation set. Following Cao and Tay (2001) and Chen et al. (2010), we carry out a sensitivity analysis to assess the effects of varying the parameters C, $\epsilon $ and $\gamma $ on the MAE of volatility forecasting in the validation set. To this end, we perform a grid search for each parameter, keeping the others fixed. For each value of each parameter, we forecast the validation set and then calculate the MAE. We choose the parameter values that yield the smallest MAE. Finally, we evaluate the SVR–GARCH generalization performance on the test set. In this paper, the whole data set is divided into three subsets: the first 50% composes the training set, the next 20% composes the validation set and the last 30% is reserved for the test set.
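The chronological split and the one-parameter-at-a-time search described above can be sketched in Python as follows. The helper names, the candidate grids and the toy validation function are hypothetical; in the actual procedure the validation function would fit the SVR on the training set and return the MAE of its forecasts on the validation set.

```python
def chronological_split(n, train_pct=50, test_pct=30):
    """Sizes of the training, validation and test sets, in time order."""
    n_train = n * train_pct // 100
    n_test = n * test_pct // 100
    return n_train, n - n_train - n_test, n_test


def sensitivity_search(validation_mae, start, grids):
    """Vary one parameter over its grid while the others stay fixed,
    keep the value with the smallest validation MAE, then move on."""
    best = dict(start)
    for name, candidates in grids.items():
        scores = [(validation_mae({**best, name: value}), value) for value in candidates]
        best[name] = min(scores)[1]
    return best


def toy_validation_mae(params):
    # Hypothetical stand-in for "fit on training data, forecast validation data".
    return abs(params["C"] - 1.0) + abs(params["eps"] - 0.1) + abs(params["gamma"] - 0.5)


grids = {"C": [0.01, 0.1, 1.0, 10.0], "eps": [0.001, 0.01, 0.1], "gamma": [0.1, 0.5, 1.0]}
start = {"C": 10.0, "eps": 0.001, "gamma": 1.0}
best = sensitivity_search(toy_validation_mae, start, grids)
print(chronological_split(1422), best)
```

A search of this kind evaluates far fewer combinations than a full grid over all parameters jointly, at the cost of possibly missing interactions between C, $\epsilon $ and $\gamma $.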

We use the MAE and the RMSE to evaluate the prediction performance. The RMSE is given by:

$$\begin{aligned} RMSE=\sqrt{\dfrac{1}{n}\sum _{t=1}^{n} (y_t-\hat{y_t})^2 } \end{aligned}$$

(27)

MAE measures the magnitude of overall error and is given by the following equation (Hyndman and Koehler 2006):

$$\begin{aligned} MAE=\frac{1}{n} \sum _{t=1}^{n} |y_t-\hat{y_t} |\end{aligned}$$

(28)

where $y_t$ denotes the observation at time t and $\hat{y_t}$ denotes the forecast of $y_t$. The model that produces the smallest values of MAE and RMSE is judged to be the best model. Since the MAE is a random variable, a statistical procedure is needed to determine whether one model shows superior predictive performance over another. We use the two-sided Diebold–Mariano (DM) test (Diebold and Mariano 1995) to compare the forecast performance of two competing models. The DM test statistic is based on the difference of the MAE loss function and it has the following null and alternative hypotheses:

$$\begin{aligned} H_0: MAE_1-MAE_0=0\quad \text{ versus }\quad H_1:MAE_1-MAE_0\ne 0 \end{aligned}$$

where $MAE_0$ is the MAE of the competing model and $MAE_1$ is the MAE of the proposed model. Thus, if the null hypothesis is rejected, there is evidence that one model is superior to the other. Moreover, according to Chen et al. (2010), the Diebold–Mariano (DM) statistic in a robust form for a time series with volatility $\sigma _t$ is given by:

$$\begin{aligned} DM={\frac{1}{\sqrt{n}}}{\frac{1}{\sqrt{\hat{S^2}}}}\sum _{t=T_1}^{T-1} (|\sigma _{t+1}^2 -\hat{\sigma }_{1,t+1}^2 |- |\sigma _{t+1}^2 -\hat{\sigma }_{0,t+1}^2 |)\sim N(0,1) \end{aligned}$$

(29)

where $\hat{\sigma }_{0,t+1}^2$ is the volatility estimated by the competing model, $\hat{\sigma }_{1,t+1}^2$ is the volatility estimated by the proposed model and $\hat{S^2}$ denotes the (co)variance estimated by the Newey and West (1987) procedure. Negative (positive) values of the DM statistic indicate that the proposed model performs better (worse) than the competing model.
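The evaluation metrics and the DM statistic of Eq. 29 can be sketched in Python as follows. The Bartlett-weighted long-run variance is the usual Newey–West form, but the lag truncation, the function names and the toy series are assumptions made for illustration.

```python
import math


def mae(actual, forecast):
    """Eq. 28: mean absolute error."""
    return sum(abs(y - f) for y, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Eq. 27: root mean squared error."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual))


def newey_west_variance(d, lags):
    """Long-run variance of the loss differential with Bartlett weights."""
    n = len(d)
    d_bar = sum(d) / n
    dc = [x - d_bar for x in d]
    lrv = sum(x * x for x in dc) / n
    for k in range(1, lags + 1):
        gamma_k = sum(dc[t] * dc[t - k] for t in range(k, n)) / n
        lrv += 2.0 * (1.0 - k / (lags + 1)) * gamma_k
    return lrv


def diebold_mariano(actual, proposed, competing, lags=0):
    """Eq. 29 with absolute-error loss; returns (DM, two-sided p-value)."""
    d = [abs(y - f1) - abs(y - f0) for y, f1, f0 in zip(actual, proposed, competing)]
    s2 = newey_west_variance(d, lags)
    dm = sum(d) / (math.sqrt(len(d)) * math.sqrt(s2))
    return dm, math.erfc(abs(dm) / math.sqrt(2.0))


# Hypothetical proxy values and forecasts from two models.
actual = [1.0, 2.0, 1.5, 3.0, 2.5, 1.0]
proposed = [1.1, 1.9, 1.55, 3.2, 2.4, 1.0]
competing = [1.5, 1.6, 2.1, 3.3, 1.8, 1.2]
dm, p_value = diebold_mariano(actual, proposed, competing, lags=1)
print(mae(actual, proposed), rmse(actual, proposed), dm, p_value)
```

A negative statistic means the first forecast series has the smaller absolute errors on average, matching the sign convention stated above.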

5 Empirical results

In this section, we apply the SVR–GARCH with a linear combination of one, two, three and four Gaussian kernels to volatility forecasting and compare its performance with that of three parametric volatility models, specifically the GARCH, EGARCH and GJR. The first dataset consists of the Nikkei 225 daily closing price from May 1, 2010 to January 28, 2016, for a total of 1422 observations, all obtained from Yahoo Finance and then transformed into log returns as in Eq. 10. The second dataset consists of the daily closing price of the Bovespa index for the period December 22, 2007 to January 04, 2016. The first half of the data is used as the training set, the next 20% is reserved for the validation set and the remaining 30% is used as the test set. Table 2 shows the summary of the descriptive statistics for the Nikkei 225 and Ibovespa over the whole sample period.

Table 2 Descriptive statistics for daily returns

The returns are characterized by excess kurtosis and deviate from the normal distribution. Tables 3 and 4 show the parameter estimates for the GARCH(1,1), EGARCH(1,1) and GJR(1,1) models for the Nikkei and Ibovespa returns. For each model, four different distributions for the innovations are considered: the Normal, the Student’s t, the skewed Student’s t and the GED (generalized error distribution). The Nikkei and Ibovespa series are best fitted by the GJR with skewed Student’s t innovations, according to the highest value of the log likelihood (LL) and the smallest values of AIC and BIC.

Table 3 Goodness of fit for Nikkei returns

Table 4 Goodness of fit for Ibovespa returns

Then, we select the parameters C, $\epsilon $ and the kernel parameters for the conditional mean and volatility equations via cross-validation. For the mean equation, we use a Gaussian kernel and for the volatility equation we use a linear combination of one, two, three and four Gaussian kernels. The first 711 observations of the Nikkei returns series are used for training, observations 712 to 996 for validation and observations 997 to 1422 for the test set. We use the training set to estimate the function f of the mean equation and g of the volatility equation of SVR–GARCH. In this section, we only report the parameter selection for the SVR–GARCH with a linear combination of two Gaussian kernels. The parameter selection for the SVR–GARCH with one, three and four Gaussian kernels and with the Morlet kernel is similar and is not reported here to save space. For the same reason, we do not report the results for the Ibovespa returns.

First, we estimate the conditional mean equation in the training set:

$$\begin{aligned} r_t=f(r_{t-1})\quad \text{ for } \; t \in (2,\ldots ,711) \end{aligned}$$

(30)

For the selection of the optimal parameters, we use a grid search for each parameter, while keeping the others fixed. For each value of each parameter, we make a forecast in the validation set in order to minimize the following expression:

$$\begin{aligned} MAE=\frac{1}{284} \sum _{t=712}^{996} |r_t - f(r_{t-1}) |\end{aligned}$$

(31)

For the sensitivity analysis of C, we fix $\epsilon =0.0001$ and $\gamma =1.25$, and the parameter C takes values in the range [0, 10]. The value $C=0.004$ leads to the best validation performance. The parameter $\epsilon $ varies in the range [0, 5], with $\gamma =1.25$ and $C=0.025$. The validation MAE attains its minimum when $\epsilon =0.2205$. The parameter $\gamma $ takes values in the range [0, 10], with $C=0.004$ and $\epsilon =0.2205$. The value $\gamma =0.9$ results in the best validation performance. Thus, the best parameters of SVR–GARCH for the conditional mean of returns are $C=0.004$, $\epsilon =0.2205$ and $\gamma =0.9$ (Table 5):

Table 5 Sensitivity analysis of SVR in conditional mean estimation

We then estimate the conditional mean equation by using the SVR–GARCH with the best parameters for the conditional mean up to the 996th observation to obtain the residuals $a_t$ in the following way:

$$\begin{aligned} a_t=r_t-f(r_{t-1})\quad \text{ for } \; t \in (2,\ldots ,996) \end{aligned}$$

(32)

Then we estimate the volatility equation of SVR–GARCH(1,1):

$$\begin{aligned} \tilde{h}_t=g(\tilde{h}_{t-1},a_{t-1}^2)\quad \text{ for } \; t \in (2,\ldots ,711) \end{aligned}$$

(33)

where $a_t^2$ are the squared residuals. The volatility proxy $\tilde{h}_t$ is calculated up to the 996th observation and the parameter selection is made in order to minimize the following expression:

$$\begin{aligned} MAE=\frac{1}{284} \sum _{t=712}^{996} |\tilde{h}_t - g(\tilde{h}_{t-1},a_{t-1}^2) |\end{aligned}$$

(34)

For the sensitivity analysis of C, we fix $\epsilon =0.0001$, $\gamma _1=0.01$, $\gamma _2=0.07$ and $\rho =0.25$, and the parameter C takes values in the range [0, 10]. The validation MAE attains its minimum when $C=0.625$. As in the mean equation, we follow the same procedure for the other parameters (Table 6).

Table 6 Sensitivity analysis of SVR in conditional variance estimation

Thus, the appropriate parameters of SVR–GARCH for the conditional variance are $C=5.184$, $\epsilon =0.05929$, $\gamma _1=0.9801$, $\gamma _2=0.01 $ and $\rho =0.37$.

5.1 Volatility forecasting evaluation

With the SVR–GARCH optimal parameters (C, $\epsilon $ and kernel parameters), we make the one-period-ahead volatility forecasts in the test set (i.e. out-of-sample). After each forecast, we calculate the forecast errors and repeat the forecasting process for the next period. Table 7 reports the values of MAE and RMSE obtained from the different models for the Nikkei and Ibovespa returns.

Table 7 Out-of-sample evaluation of one-period-ahead volatility forecasts
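
The rolling out-of-sample procedure and the two loss measures can be illustrated with a small sketch. The `last_two_mean` forecaster is a hypothetical placeholder for the fitted SVR–GARCH model, and the series is toy data; only the loop structure and the MAE and RMSE definitions are meant to carry over.

```python
import math
from typing import Callable, List, Sequence


def rolling_one_step(
    series: Sequence[float],
    first_test: int,
    forecaster: Callable[[Sequence[float]], float],
) -> List[float]:
    """Forecast series[t] from series[:t] for every t >= first_test."""
    return [forecaster(series[:t]) for t in range(first_test, len(series))]


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def last_two_mean(history: Sequence[float]) -> float:
    """Hypothetical forecaster: average of the two most recent observations."""
    return (history[-1] + history[-2]) / 2.0


proxy = [1.0, 2.0, 3.0, 2.0, 1.0]  # toy volatility proxy
first_test = 2                      # index of the first out-of-sample point
forecasts = rolling_one_step(proxy, first_test, last_two_mean)
test_mae = mae(proxy[first_test:], forecasts)
test_rmse = rmse(proxy[first_test:], forecasts)
```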

For the Nikkei 225 series, the SVR–GARCH with a mixture of three Gaussian kernels achieves the smallest MAE, whereas the SVR–GARCH with Morlet wavelet kernel achieves the smallest RMSE. According to both the MAE and RMSE measures in Table 7, the SVR–GARCH with a linear combination of four Gaussian kernels is the best model for the Ibovespa series. To compare the predictive power of two models, we use the two-sided Diebold–Mariano (DM) test, given by the following null and alternative hypotheses for the Nikkei returns:

$$\begin{aligned}&H_0: \dfrac{1}{426}\sum _{t=996}^{1422}\left( \left| \tilde{h}_t- \hat{h}_{1,t}\right| -\left| \tilde{h}_t- \hat{h}_{0,t}\right| \right) =0 \quad \text{ versus } \nonumber \\&H_1:\dfrac{1}{426}\sum _{t=996}^{1422}\left( \left| \tilde{h}_t- \hat{h}_{1,t}\right| - \left| \tilde{h}_t- \hat{h}_{0,t}\right| \right) \ne 0, \end{aligned}$$

(35)

where $\tilde{h}_t$ is the volatility proxy, $\hat{h}_{0,t}$ is the volatility estimated by the proposed model and ${\hat{h}}_{1,t}$ is the volatility estimated by the competing model. Following Chen et al. (2010), the DM test statistic is given by:

$$\begin{aligned} DM={\frac{1}{\sqrt{426}}}{\frac{1}{\sqrt{{\hat{S}}^{2}}}}\sum _{t=996}^{1422} \left( \left| {\tilde{h}}_{t}- {\hat{h}}_{1,t}\right| -\left| {\tilde{h}}_{t}- {\hat{h}}_{0,t}\right| \right) \sim N(0,1) \end{aligned}$$

(36)

Tables 8 and 9 report the DM statistics and p-values of the Diebold–Mariano test for the difference in the MAE loss function for the Nikkei 225 and Ibovespa daily returns, respectively.

Table 8 Diebold–Mariano test (benchmark: SVR–GARCH ⁴, one-step-ahead)

Table 9 Diebold–Mariano test (benchmark: SVR–GARCH ³, one-step-ahead)

For both the Nikkei 225 and Ibovespa series, all SVR–GARCH models significantly outperform every GARCH model at all conventional significance levels. For the Nikkei 225 series, except for the SVR–GARCH with Morlet wavelet kernel, the sign of the DM statistic is always negative, implying that the benchmark's loss is lower than the loss implied by the competing models. However, we cannot reject the null hypothesis for the SVR–GARCH with Morlet wavelet kernel and with one and two Gaussian kernels, which means that the forecasting ability of these models is statistically indistinguishable from that of the benchmark. For the other models, we always reject the null hypothesis of equal forecast accuracy at all conventional significance levels. For the Ibovespa series, the sign of the DM statistic for the SVR–GARCH with a linear combination of three kernels is always negative and we always reject the null hypothesis of equal forecast accuracy.
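
As a rough illustration of the statistic in Eq. (36), the sketch below computes a DM statistic and a two-sided normal p-value for absolute-error loss. It assumes that $\hat{S}^2$ is the plain sample variance of the loss differential, with no autocorrelation correction; a heteroskedasticity and autocorrelation consistent estimator may be more appropriate in practice. The input series are toy values.

```python
import math
from typing import Sequence, Tuple


def dm_test(
    proxy: Sequence[float], bench: Sequence[float], comp: Sequence[float]
) -> Tuple[float, float]:
    """DM statistic and two-sided p-value under absolute-error loss.

    The loss differential is d_t = |proxy - comp| - |proxy - bench|,
    matching the ordering of terms in Eq. (36).
    """
    d = [abs(p - c) - abs(p - b) for p, b, c in zip(proxy, bench, comp)]
    n = len(d)
    mean_d = sum(d) / n
    # Assumption: S^2 is the unbiased sample variance of d_t.
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    dm = sum(d) / (math.sqrt(n) * math.sqrt(var_d))
    # Two-sided p-value under N(0, 1): 2 * (1 - Phi(|dm|)) = erfc(|dm| / sqrt(2)).
    p_value = math.erfc(abs(dm) / math.sqrt(2.0))
    return dm, p_value


# Toy data: proxy, benchmark forecasts and competing forecasts.
proxy = [1.0, 1.0, 1.0, 1.0]
bench = [1.1, 0.9, 1.2, 1.0]
comp = [1.5, 0.6, 1.4, 0.7]

dm_stat, p_val = dm_test(proxy, bench, comp)
dm_swapped, _ = dm_test(proxy, comp, bench)  # swapping the models flips the sign
```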

6 Concluding remarks

The main contribution of this paper is to use a mixture of one, two, three and four Gaussian kernels in the SVR based on GARCH(1,1) to take into account the existence of market regimes. We compare these models with the SVR–GARCH with Morlet wavelet kernel and with the GARCH, EGARCH and GJR models in terms of their ability to forecast volatility, using MAE, RMSE and the Diebold–Mariano test. All GARCH models are estimated assuming Gaussian, Student's t, skewed Student's t and GED innovations. To determine the optimal SVR parameters, we used the validation technique (holdout method) based on grid search and sensitivity analysis. Nikkei 225 and Ibovespa daily returns were used as the dataset. The empirical results indicate that the mixture of Gaussian kernels can improve the SVR–GARCH one-period-ahead volatility forecasts. In sum, the mixture of normal distributions can model the overall distribution of financial returns when markets display regime behaviour and can also better approximate nonlinear characteristics of financial returns such as heavy tails, volatility clustering and time-varying skewness.

References

Alexander C, Lazar E (2006) Normal mixture GARCH(1,1): applications to exchange rate modelling. J Appl Econom 21(3):307–336
Arlot S, Celisse A (2010) A survey of cross-validation procedures for model selection. Stat Surv 4:40–79
Bae GI, Kim WC, Mulvey JM (2014) Dynamic asset allocation for varied financial markets under regime switching framework. Eur J Oper Res 234(2):450–458
Bai X, Russell JR, Tiao GC (2003) Kurtosis of GARCH and stochastic volatility models with non-normal innovations. J Econom 114(2):349–360
Bollerslev T (1986) Generalized autoregressive conditional heteroskedasticity. J Econom 31:307–327
Brailsford TJ, Faff RW (1996) An evaluation of volatility forecasting techniques. J Bank Finance 20:419–438
Brooks C (2001) A Double-threshold GARCH Model for the French Franc/Deutschmark exchange rate. J Forecast 20(2):135–143
Brooks C, Persand G (2003) Volatility forecasting for risk management. J Forecast 22(1):1–22
Brownlees CT, Gallo GM (2009) Comparison of volatility measures: a risk management perspective. J Financial Econom 8(1):29–56
Cao L, Tay F (2003) Support vector machine with adaptive parameters in financial time series forecasting. IEEE Trans Neural Netw 14(6):1506–1518
Cao L, Tay FE (2001) Financial forecasting using support vector machines. Neural Comput Appl 10(2):184–192
Casella G, Berger RL (2001) Statistical inference, 2nd edn. Duxbury Press, California
Cavalcante RC, Brasileiro RC, Souza VL, Nobrega JP, Oliveira AL (2016) Computational intelligence and financial markets: a survey and future directions. Expert Syst Appl 55:194–211
Chen S, Härdle WK, Jeong K (2010) Forecasting volatility with support vector machine-based GARCH model. J Forecast 29(4):406–433
Cherkassky V, Ma Y (2004) Practical selection of SVM parameters and noise estimation for SVM regression. Neural Netw 17(1):113–126
Choudhry T, Wu H (2008) Forecasting ability of GARCH vs Kalman filter method: evidence from daily UK time-varying beta. J Forecast 27:670–689
Diebold FX, Mariano RS (1995) Comparing predictive accuracy. J Bus Econ Stat 13(3):253–263
Engle RF (1982) Autoregressive conditional Heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4):987–1007
Fernandez C, Steel MFJ (1998) On Bayesian modeling of fat tails and skewness. J Am Stat Assoc 93(441):359
Pérez-Cruz F, Afonso-Rodríguez JA, Giner J (2003) Estimating GARCH models using support vector machines. Quant Finance 3:1–10
Franses PH, van Dijk D (1996) Forecasting stock market volatility using (non-linear) Garch models. J Forecast 15(3):229–235
Gavrishchaka VV, Banerjee S (2006) Support vector machine as an efficient framework for stock market volatility forecasting. Comput Manag Sci 3(2):147–160
Gavrishchaka VV, Ganguli SB (2003) Volatility forecasting from multiscale and high-dimensional market data. Neurocomputing 55(1–2):285–305
Glosten LR, Jagannathan R, Runkle DE (1993) On the relation between the expected value and the volatility of the nominal excess return on stocks. J Finance 48(5):1779–1801
Guidolin M (2011) Markov switching models in empirical finance. In: Drukker DM (ed) Missing data methods: time-series methods and applications (advances in econometrics), vol 27. Emerald Group Publishing Limited, UK, pp 1–86
Haas M, Mittnik S, Paolella MS (2004) Mixed normal conditional heteroskedasticity. J Financial Econom 2(2):211–250
Hansen PR, Lunde A (2005) A forecast comparison of volatility models: does anything beat a GARCH(1,1)? J Appl Econom 20(7):873–889
Haykin S (1998) Neural networks: a comprehensive foundation, 2nd edn. Prentice Hall, New York (ISBN-10: 0132733501, ISBN-13: 978-0132733502)
Huang C, Gao F, Jiang H (2014) Combination of biorthogonal wavelet hybrid kernel OCSVM with feature weighted approach based on EVA and GRA in financial distress prediction. Math Probl Eng 2014
Hyndman RJ, Koehler AB (2006) Another look at measures of forecast accuracy. Int J Forecast 22(4):679–688
Jorion P (1995) Predicting volatility in the foreign exchange market. J Finance 50(2):507–528
Karush W (1939) Minima of functions of several variables with inequalities as side constraints. Ph.D. thesis, University of Chicago
Kohavi R (1995) A study of cross-validation and bootstrap for accuracy estimation and model selection. In: Proceedings of the 14th international joint conference on artificial intelligence, vol 14. Morgan Kaufmann Publishers Inc., Montreal, pp 1137–1143
Kuhn HW, Tucker A (1951) Nonlinear programming. University of California Press, California
Levy M, Kaplanski G (2015) Portfolio selection in a two-regime world. Eur J Oper Res 242(2):514–524
Li Y (2014) Estimating and forecasting APARCH-Skew-t model by wavelet support vector machines. J Forecast 33:259–269
Marcucci J (2005) Forecasting stock market volatility with regime-switching GARCH models. Stud Nonlinear Dyn Econom 9(4):1–55
Marron JS, Wand M (1992) Exact mean integrated squared error. Ann Stat 20(2):712–736
McLachlan G, Peel D (2004) Finite mixture models. Wiley, Canada
McMillan DG, Speight A (2000) Forecasting UK stock market volatility. Appl Financial Econ 10:435–448
Mercer J (1909) Functions of positive and negative type and their connection with the theory of integral equations. Philos Trans R Soc Lond 209(A):415–446
Nelson DB (1991) Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59(2):347–370
Newey WK, West KD (1987) A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica 55:703–708
Ou P, Wang H (2013) Volatility modelling and prediction by hybrid support vector regression with chaotic genetic algorithms. Int Arab J Inf Technol 11(3):287–292
Patton AJ (2011) Volatility forecast comparison using imperfect volatility proxies. J Econom 160(1):246–256
Poon SH, Granger CW (2003) Forecasting volatility in financial markets: a review. J Econ Lit 41(2):478–539
Sangeetha R, Kalpana B (2010) A comparative study and choice of an appropriate kernel for support vector machines. In: Information and communication technologies, pp 549–553
Santamaría-Bonfil G, Frausto-Solís J, Vázquez-Rodarte I (2015) Volatility forecasting using support vector regression and a hybrid genetic algorithm. Comput Econ 45:111–133
Sapankevych NI, Sankar R (2009) Time series prediction using support vector machines: a survey. IEEE Comput Intell Mag 4(2):24–38
Sermpinis G, Stasinakis C, Theofilatos K, Karathanasopoulos A (2014) Inflation and unemployment forecasting with genetic support vector regression. J Forecast 33(6):471–487
Shalev-Shwartz S, Ben-David S (2014) Understanding machine learning: from theory to algorithms. Cambridge University Press, New York
Smola AJ, Schölkopf B (2004) A tutorial on support vector regression. Stat Comput 14:199–222
Stone M (1974) Cross-validatory choice and assessment of statistical predictions. J R Stat Soc 36(2):111–147
Tang L-B, Sheng H-Y, Tang L-X (2009a) GARCH prediction using spline wavelet support vector machine. Neural Comput Appl 18(8):913–917
Tang L-B, Tang L-X, Sheng H-Y (2009b) Forecasting volatility based on wavelet support vector machine. Expert Syst Appl 36(2):2901–2909
Tsay RS (2010) Analysis of financial time series, 3rd edn, vol 48. Wiley, New York
Tu J (2010) Is regime switching in stock returns important in portfolio decisions? Manag Sci 56(7):1198–1215
Vapnik VN (1995) The nature of statistical learning theory. Springer, Berlin
Vapnik VN, Chervonenkis AY (1974) Theory of pattern recognition: statistical problems of learning. Nauka, Moscow
Wang B, Huang H, Wang X (2011) A support vector machine based MSM model for financial short-term volatility forecasting. Neural Comput Appl 22(1):21–28
Wang J, Taaffe MR (2015) Multivariate mixtures of normal distributions: properties, random vector generation, fitting, and as models of market daily changes. INFORMS J Comput 27(2):193–203
Wirjanto TS, Xu D (2009) The applications of mixtures of normal distributions in empirical finance: a selected survey, Working paper 0904. University of Waterloo, Department of Economics. http://economics.uwaterloo.ca/documents/mn-review-paper-CES.pdf
Zhang L, Zhou W, Jiao L (2004) Wavelet support vector machine. IEEE Trans Syst Man Cybern Part B 34(1):34–39

Author information

Authors and Affiliations

Department of Management, Universidade de Brasília, Brasília, Brazil
Pedro Correia S. Bezerra & Pedro Henrique M. Albuquerque

Authors

Pedro Correia S. Bezerra
Pedro Henrique M. Albuquerque

Corresponding author

Correspondence to Pedro Henrique M. Albuquerque.

About this article

Cite this article

Bezerra, P.C.S., Albuquerque, P.H.M. Volatility forecasting via SVR–GARCH with mixture of Gaussian kernels. Comput Manag Sci 14, 179–196 (2017). https://doi.org/10.1007/s10287-016-0267-0

Received: 20 April 2016
Accepted: 01 November 2016
Published: 16 November 2016
Issue Date: April 2017
DOI: https://doi.org/10.1007/s10287-016-0267-0
