1 Introduction

Accurately measuring and forecasting financial volatility plays a crucial role in asset and derivative pricing, hedging strategies, portfolio allocation and risk management (Franke and Westerhoff 2011; Huang 2011; Berument et al. 2012; Brandão et al. 2012; Lin et al. 2012; Haugom et al. 2014; Seo and Kim 2015). A convenient framework for dealing with time-dependent volatility in financial markets is the generalized autoregressive conditional heteroskedasticity (GARCH) model (Bollerslev 1986), which generalizes the ARCH model of Engle (1982) and is a popular tool for volatility modeling. The GARCH model jointly estimates a conditional mean and variance, accommodates fat tails and excess kurtosis, and is regularly used in studying daily stock market return data (Han and Park 2008; Bentes 2015).

Despite its success, GARCH modeling has been criticized for failing to capture the volatility dynamics of asset returns in highly unstable environments, such as during a financial crisis (Kung and Yu 2008; Tseng et al. 2008; Lim and Sek 2013; Apergis 2015). Methods based on artificial neural networks have been used extensively because they provide a flexible way to describe stock return volatility (Hajizadeh et al. 2012; Kristjanpoller et al. 2014; Monfared and Enke 2014; Dash et al. 2015). Neural network approaches use volatility as input to achieve more accurate forecasts (Tung and Quek 2011; Wang et al. 2012; Fernandes et al. 2014; Vortelinos 2015). Despite their attractiveness, neural networks have drawbacks such as their “black box” nature, proneness to overfitting, and the largely trial-and-error character of model development. Moreover, they do not account for one important stylized fact of asset returns: volatility clustering (Liu and Hung 2010; Ning et al. 2015).

To overcome these limitations, methods based on fuzzy set theory have been developed as a flexible framework to explain complex dynamics such as stock trading decisions (Troiano and Kriplani 2011; Vella and Ng 2014), exchange rate forecasting (Korol 2014; Gharleghi et al. 2014), portfolio selection (Bermúdez et al. 2012; Zhang and Zhang 2014; Li et al. 2015), estimation of the term structure of interest rates (Sánchez and Gómez 2003), asset pricing modeling (Moussa et al. 2014), and financial time-series volatility modeling and forecasting (Hung 2011a; Capotorti et al. 2013; Muzzioli et al. 2015). Basically, these methods construct hybrid models combining fuzzy systems and GARCH models whose structure addresses both time-varying volatility and volatility clustering. Popov and Bykhanov (2005), Chang et al. (2011), Helin and Koivisto (2011) and Hung (2011b), for example, combine fuzzy and GARCH models to tackle the problem of volatility modeling and forecasting. Modeling in this framework requires high computational effort because model parameters must be estimated using all data currently stored in the database. This may be troublesome in situations in which forecasts are needed whenever new data arrive. Therefore, current hybrid modeling methods may be impractical in dynamic environments such as volatility forecasting, immunization strategies, portfolio allocation, and risk management.

This paper suggests an evolving fuzzy-GARCH approach for financial time-series volatility modeling and forecasting. The model is a collection of fuzzy rules whose consequents are GARCH models. The collection of fuzzy rules, or rule base for short, is continuously revised whenever new data are input. The number of fuzzy rules and the parameters of the rule antecedents and consequents are adjusted simultaneously, using a recursive clustering procedure to create, exclude, and update rules, and a recursive algorithm to estimate the parameters. This is an essential requirement for developing models in time-varying, nonstationary environments (Lemos et al. 2011). Evolving fundamentally means high-level adaptation that accommodates new data into existing models on a recursive basis. Adaptation may add new rules to the rule base, or remove or update current rules whenever necessary. The parameters of the GARCH models in the rule consequents are also objects of adaptation. This means that the evolving fuzzy model captures new information from data streams, adapts itself to the new scenario, and avoids redesign and retraining.

Recent literature reveals several applications of evolving fuzzy rule-based models in finance and economics. Examples include: Value-at-Risk modeling and forecasting (Ballini et al. 2009a), sovereign bonds modeling (Ballini et al. 2009b), exchange rates forecasting (McDonald and Angelov 2010), fixed income option pricing (Maciel et al. 2012b), interest rate term structure forecasting (Maciel et al. 2012a), financial volatility forecasting (Luna and Ballini 2012a), stochastic volatility prediction (Luna and Ballini 2012b), and volatility forecasting with jumps (Maciel et al. 2013; Rosa et al. 2014).

Recently, Maciel (2012, 2013) proposed a fuzzy GJR-GARCH model to forecast the volatility of the S&P 500 and Ibovespa indexes. The model combines fuzzy inference systems with the GJR-GARCH framework, which is appropriate to account for leverage effects. Moreover, a differential evolution (DE) algorithm is suggested to solve the fuzzy GJR-GARCH parameter estimation problem. The results indicate that the proposed method offers significant improvements in volatility forecasting performance in comparison with traditional GARCH-type models. Despite the good results, the model processes data in batch, which requires large quantities of data in the database. Moreover, the DE algorithm requires a considerable number of user-defined tuning parameters and is computationally time-consuming, i.e., it is inefficient in on-line domains.

The evolving fuzzy-GARCH model suggested here brings novel features to the existing approaches in the literature. First, it combines adaptive fuzzy modeling with GARCH models, which gives a more realistic framework to capture the imprecise and time-varying nature of volatility and volatility clustering. Second, the approach translates into a simple, fast, and memory-efficient recursive algorithm which processes data streams naturally. Indeed, the computational experiments performed using the S&P 500 (United States) and Ibovespa (Brazil) indexes from January 3, 2000 through September 30, 2011 show that the evolving fuzzy-GARCH model outperforms the GARCH family of models and provides results comparable with the fuzzy GJR-GARCH methodology.

The remainder of this paper proceeds as follows. Section 2 details the evolving fuzzy-GARCH model suggested in this work. Section 3 describes the computational experiments and analyzes stock market volatility forecasting. Section 4 concludes the paper, summarizing its contributions and issues for further investigation.

2 Evolving Fuzzy-GARCH Modeling

2.1 Evolving Fuzzy Systems

The effectiveness of data stream oriented learning algorithms is rooted in their aptitude to quickly evolve models from nonstationary data. The key issues are incremental learning and recursive data processing. New data may either reinforce or suggest revision of the current model, depending on whether the data are compatible with existing knowledge. Data stream and recursive processing approaches are particularly important in time-varying, nonstationary dynamic system modeling because system operating conditions usually change, faults occur, and parameters drift.

The main concern in evolving systems modeling is how to update the current model structure and parameters using the newest data sample. An evolving system can both develop a model structure from scratch and perform recursive computation on incoming data to continuously develop the model structure and functionality through self-organization.

Fuzzy rule-based models whose rules are endowed with functions forming their consequents are commonly referred to as fuzzy functional models. The Takagi-Sugeno model is a typical example. A particularly important case is when rule consequents are linear functions of the variables that appear in the rule antecedents. For instance, the evolving Takagi-Sugeno model and its variations (Angelov and Filev 2004) assume rule-based models whose fuzzy rules are as follows:

$$\begin{aligned} {\mathcal {R}}_{i}: \hbox {IF}\quad {\mathbf {x}}~\hbox { is }~{\mathcal {A}}_{i}\quad \hbox {THEN}\,y_{i} = a_{i0} + \displaystyle \sum _{j=1}^{m}a_{ij}x_{j}, \quad i=1,\ldots ,R, \end{aligned}$$

where \({\mathcal {R}}_{i}\) is the ith fuzzy rule, R is the number of fuzzy rules, \({\mathbf {x}}\in \mathfrak {R}^{m}\) is the input data, \(y_{i}\) is the output of the ith rule, \({\mathcal {A}}_{i}\) is the vector of antecedent fuzzy sets, and \(a_{i0}\) and \(a_{ij}\) are the parameters of the consequent.

The collection of the R rules constructs the model as a combination of local linear models. Given an input \({\mathbf {x}}=[x_1,x_2,\ldots ,x_m]^T\), the contribution of a local linear model to the overall output is proportional to the membership degree \({\mathcal {A}}_{i}({\mathbf {x}})\), called the activation level of the ith rule.

Antecedent fuzzy sets may have triangular, rectangular, or Gaussian membership functions. Recursive learning of evolving fuzzy models requires recursive clustering to find the rules and the membership functions of the fuzzy sets of the rule antecedents. Each cluster corresponds to a fuzzy rule. The fuzzy sets of \({\mathcal {A}}_{i}\) may have parameterized membership functions. For instance, if they are Gaussians, then cluster centers are assigned as their central values and the spreads are chosen to partition the input space properly. Often, the recursive least squares algorithm is used to compute the parameters of the rule consequents, as shown in Sect. 2.3.

2.2 GARCH-Type Models

The GARCH(p, q) model considers that the current conditional variance depends on p past conditional variances and on q past squared innovations. Let \(r_t = 100\times ({\text {ln}} \ P_t - {\text {ln}} \ P_{t-1})\) denote the continuously compounded rate of stock returns from time \(t-1\) to t, where \(P_t\) is the daily closing stock price at t. The GARCH(p, q) model can be written as:

$$\begin{aligned}&\displaystyle r_t = \sigma _t \xi _t, \end{aligned}$$
(1)
$$\begin{aligned}&\displaystyle \sigma ^2_t = \alpha _0 + \sum _{n=1}^{q}{\alpha _{1,n} r_{t-n}^2} + \sum _{j=1}^{p}{\alpha _{2,j} \sigma _{t-j}^2}, \end{aligned}$$
(2)

where \(\xi _t\) is a sequence of independent and identically distributed random variables with zero mean and unit variance, \(\sigma _t^2\) is the conditional variance of \(r_t\), and \(\alpha _0\), \(\alpha _{1,n}\) and \(\alpha _{2,j}\) are unknown coefficients to be estimated.

The GARCH model reduces the number of parameters by considering the information in the lags of the conditional variance in addition to the lagged \(r_{t-n}^2\) terms of ARCH-type models. The simplicity of GARCH modeling and its ability to capture volatility persistence explain its empirical and theoretical attractiveness. However, it fails to capture stock fluctuations with volatility clustering well, which can lead to model inadequacy and poor forecasting ability.
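As an illustration, the variance recursion in (2) with \(p = q = 1\) can be written in a few lines of Python. This is a minimal sketch; the function name and the coefficient values in the usage comment are hypothetical placeholders, not estimates from this paper.

```python
import numpy as np

def garch11_variance(returns, alpha0, alpha1, alpha2):
    """Conditional variance recursion of Eq. (2) with p = q = 1:
    sigma2[t] = alpha0 + alpha1 * r[t-1]**2 + alpha2 * sigma2[t-1]."""
    r = np.asarray(returns, dtype=float)
    sigma2 = np.empty_like(r)
    sigma2[0] = r.var()  # a common initialization choice: the sample variance
    for t in range(1, len(r)):
        sigma2[t] = alpha0 + alpha1 * r[t - 1] ** 2 + alpha2 * sigma2[t - 1]
    return sigma2

# Hypothetical usage with placeholder coefficients (not estimated values):
# r = 100.0 * np.diff(np.log(prices))   # returns as defined in the text
# sigma2 = garch11_variance(r, 0.05, 0.08, 0.90)
```

The fuzzy-GARCH model appears as an alternative approach for volatility modeling and forecasting in the presence of volatility clustering, as discussed next.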

2.3 The Evolving Fuzzy-GARCH Model

Fuzzy functional models and inference systems are universal approximators: they can uniformly approximate any continuous function on compact domains with arbitrary accuracy (Ji et al. 2007; Kreinovich et al. 1998). GARCH models are able to capture time-varying volatility. The fuzzy-GARCH approach combines the approximation power of functional fuzzy modeling with the GARCH ability to encapsulate time-varying volatility to model the behavior of stock fluctuations with volatility clustering. A fuzzy-GARCH(p, q) model is a collection of functional fuzzy rules \({\mathcal {R}}_i\) of the form:

$$\begin{aligned} {\mathcal {R}}_{i}: \hbox {IF}\ \ {\mathcal {A}}_r\left( r_{t-n} \ \hbox {is} \ {\mathcal {A}}_{i,n}\right) \ \hbox {AND} \ {\mathcal {A}}_{\sigma ^2}\left( \sigma ^2_{t-j} \ \hbox {is} \ {\mathcal {A}}_{i,q+j}\right) \quad \hbox {THEN}\ \ \sigma ^2_{t,i} = \alpha _{0}^{i} + \sum _{n=1}^{q}{\alpha _{1,n}^{i} r_{t-n}^2} + \sum _{j=1}^{p}{\alpha _{2,j}^{i} \sigma _{t-j}^2}, \end{aligned}$$
(3)

where \({\mathcal {A}}_r\left( r_{t-n} \ {{\text {is}}} \ {\mathcal {A}}_{i,n} \right) \) and \({\mathcal {A}}_{\sigma ^2}\left( \sigma ^2_{t-j} \ {{\text {is}}} \ {\mathcal {A}}_{i,q+j} \right) \) are the rule antecedents associated with stock market returns r and volatility \(\sigma ^2\), \(i=1,2,\ldots ,R\); variables \(r_{t-n}\) and \(\sigma ^2_{t-j}\), \(n=1,2,\ldots ,q\) and \(j = 1,2,\ldots ,p\), are lagged values of stock market returns and volatility. Rule antecedent \({\mathcal {A}}_r \left( r_{t-n} \ {{\text {is}}} \ {\mathcal {A}}_{i,n} \right) \) denotes \((r_{t-1} \ {\mathrm {is}} \ {\mathcal {A}}_{i,1} \ {\mathrm {AND}} \ r_{t-2} \ {\mathrm {is}} \ {\mathcal {A}}_{i,2} \ {\mathrm {AND}} \ \cdots \ {\mathrm {AND}} \ r_{t-q} \ {\mathrm {is}} \ {\mathcal {A}}_{i,q})\) for short. Similarly, \({\mathcal {A}}_{\sigma ^2}\left( \sigma ^2_{t-j} \ {{\text {is}}} \ {\mathcal {A}}_{i,q+j} \right) \) denotes \((\sigma _{t-1}^{2} \ {\mathrm {is}} \ {\mathcal {A}}_{i,q+1} \ {\mathrm {AND}} \ \sigma ^2_{t-2} \ {\mathrm {is}} \ {\mathcal {A}}_{i,q+2} \ {\mathrm {AND}} \ \cdots \ {\mathrm {AND}} \ \sigma _{t-p}^{2} \ {\mathrm {is}} \ {\mathcal {A}}_{i,q+p})\).

Let \({\mathbf {x}} \in \mathfrak {R}^{q+p}\) be the vector \({\mathbf {x}} = [x_{1}, \ x_{2}, \ \ldots ,\ x_{l}, \ \ldots , \ x_{q+p}]^T\) such that \(x_{1} = r_{t-1}, \ \ldots , \ x_{q} = r_{t-q}, \ x_{q+1} = \sigma ^2_{t-1}, \ \ldots , \ x_{q+p}= \sigma ^2_{t-p}\) be the input, and let \(y = \sigma ^2_t \in \mathfrak {R}\) be the output. Assuming Gaussian membership functions for the fuzzy sets of the rule antecedents, we have:

$$\begin{aligned} {\mathcal {A}}_{i,l} \left( x_l\right) = {\mathrm {exp}}\left( -\frac{\left( x_{i,l}^{*}-x_{l}\right) ^2}{2s^2}\right) , \end{aligned}$$
(4)

where \({\mathcal {A}}_{i,l} \left( x_l\right) \) denotes the membership degree of the lth input component \(x_l\), \(x_{i,l}^{*}\) is the ith cluster center for the lth input component, and s is the spread of the fuzzy sets in the rule antecedents.

The activation level of the ith rule, with AND implemented as the product T-norm in the rule antecedents, is:

$$\begin{aligned} \tau _i\left( {\mathbf {x}}\right) = \prod _{l=1}^{q+p}{{\mathcal {A}}_{i,l}\left( x_l\right) }. \end{aligned}$$
(5)

The output y of the model is the weighted average of the individual rule contributions:

$$\begin{aligned} y= \sum _{i=1}^R{\lambda _i y_i}=\sum _{i=1}^R{\lambda _i {\mathbf {x}}^T_e\varTheta _i}, \ \ \lambda _i = \frac{\tau _i}{\sum _{h=1}^R{\tau _h}}, \end{aligned}$$
(6)

where \(\varTheta _i = \left[ \alpha _{0}^{i}, \alpha _{1,1}^{i}, \alpha _{1,2}^{i}, \ldots , \alpha _{1,q}^{i}, \alpha _{2,1}^{i}, \alpha _{2,2}^{i}, \ldots , \alpha _{2,p}^{i} \right] ^T\) is the vector of parameters of the ith rule consequent, \(\lambda _i\) is the normalized activation level of the ith rule, and \({\mathbf {x}}_e = \left[ 1 \ {\mathbf {x}}^T\right] ^T\) is the expanded input vector.
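A minimal Python sketch of the inference step defined by (4)–(6) follows. The array names (`centers`, `thetas`) and the function name are illustrative conventions, and a single spread s shared by all fuzzy sets is assumed, as in (4).

```python
import numpy as np

def fuzzy_garch_forecast(x, centers, s, thetas):
    """One inference step of the fuzzy-GARCH(p, q) model, Eqs. (4)-(6).

    x       : input [r_{t-1}, ..., r_{t-q}, sigma2_{t-1}, ..., sigma2_{t-p}]
    centers : (R, q+p) array of cluster centers x*_{i,l}
    s       : spread shared by the Gaussian membership functions
    thetas  : (R, q+p+1) array; row i is Theta_i = [alpha0_i, alpha1_i, ...]
    """
    x = np.asarray(x, dtype=float)
    # Eq. (4): Gaussian membership degrees A_{i,l}(x_l)
    memberships = np.exp(-(centers - x) ** 2 / (2.0 * s ** 2))
    # Eq. (5): activation levels via the product T-norm
    tau = memberships.prod(axis=1)
    # Eq. (6): normalized activations and weighted sum of local GARCH models
    lam = tau / tau.sum()
    x_e = np.concatenate(([1.0], x))     # expanded input vector
    return float(lam @ (thetas @ x_e))   # one-step-ahead sigma2_t
```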

There are essentially two sub-tasks in the recursive identification of evolving fuzzy-GARCH models: clustering, to learn the rules and the central points of the membership functions of the rule antecedents; and estimation of the parameters of the linear functions of the rule consequents. Rule learning and consequent parameter estimation are detailed in the next sections.

2.3.1 Learning Rules and Their Antecedents

Antecedent learning in fuzzy-GARCH modeling uses the eClustering algorithm developed by Angelov (2010). eClustering is a recursive procedure that processes streaming data and finds clusters in the input space. The collection of fuzzy clusters defines the fuzzy rule base: each cluster and a corresponding linear function form a fuzzy rule. The mechanism to form a new rule, modify an existing one, or remove a rule from the rule base is rooted in the notion of the density of a data point, computed using Cauchy functions.

Density is computed recursively; information related to the spatial distribution of all data up to step k is accumulated in the variables \(\beta ^k\) and \(\delta ^k_l\) as follows:

$$\begin{aligned} D^k\left( {\mathbf {z}}^k\right) = \frac{k-1}{\left( k-1\right) \left( \sum _{l=1}^{q+p}{\left( {\mathbf {z}}^k_l\right) ^2}+1\right) + \beta ^k -2\sum _{l=1}^{q+p}{{\mathbf {z}}^k_l\delta ^k_l}}, \end{aligned}$$
(7)

where \(D^k\left( {\mathbf {z}}^k\right) \) is the density of the data around the last data point of the data stream input to the algorithm, \({\mathbf {z}}^k=([{\mathbf {x}}^T,y]^T)^k\) is an input/output pair at step k (\(k=2,3,\ldots \)), and

$$\begin{aligned} \beta ^k = \beta ^{k-1} + \sum _{l=1}^{q+p}\left( z^{k-1}_l\right) ^2, \quad \ \beta ^1=0,\quad \ \delta ^k_l = \delta ^{k-1}_l + z^{k-1}_l \ {\mathrm {and}} \ \delta ^1_l=0. \end{aligned}$$
(8)

eClustering ensures a gradual change of the rule base. High-density data points are potential candidates for becoming central points of the fuzzy sets in the antecedents of the fuzzy rules. A data point selected to be a center has its density computed using (7), and this density is updated whenever new data are input. The density of the central points is recursively updated by:

$$\begin{aligned} D^k\left( {\mathbf {z}}^{i^{*}}\right) = \frac{k-1}{k-1 + (k-2)\left( \frac{1}{D^{k-1}\left( {\mathbf {z}}^{i^{*}}\right) }-1\right) + \sum _{l=1}^{q+p}\left( {\mathbf {z}}^{i^{*}}_l-{\mathbf {z}}^k_l \right) ^2}, \end{aligned}$$
(9)

where \(D^1\left( {\mathbf {z}}^{i^{*}}\right) = 1\), \(k = 2,3,\ldots \), and \(i^{*}\) denotes the center point of the ith fuzzy rule. Notice that initialization (\(k=1\)) sets \({\mathbf {z}}^{1^{*}}\leftarrow {\mathbf {z}}^1\), \(R\leftarrow 1\), that is, the first data point is set as the cluster center to form the first rule.
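The bookkeeping behind (7)–(9) can be sketched as below; the class and helper names are illustrative only.

```python
import numpy as np

class DensityTracker:
    """Recursive density of Eqs. (7)-(8). beta and delta accumulate the
    spatial information about all past samples z^1, ..., z^{k-1}."""

    def __init__(self, dim):
        self.k = 1                   # current step
        self.beta = 0.0              # beta^1 = 0
        self.delta = np.zeros(dim)   # delta^1_l = 0

    def step(self, z_prev):
        """Eq. (8): fold the previous sample z^{k-1} into the accumulators."""
        z_prev = np.asarray(z_prev, dtype=float)
        self.beta += float(z_prev @ z_prev)
        self.delta += z_prev
        self.k += 1

    def density(self, z):
        """Eq. (7): density of the data around the newest point z^k."""
        z = np.asarray(z, dtype=float)
        k = self.k
        denom = (k - 1) * (float(z @ z) + 1.0) + self.beta - 2.0 * float(z @ self.delta)
        return (k - 1) / denom

def update_center_density(d_prev, z_center, z_new, k):
    """Eq. (9): recursive density update of an existing cluster center."""
    dist2 = float(np.sum((np.asarray(z_center) - np.asarray(z_new)) ** 2))
    return (k - 1) / ((k - 1) + (k - 2) * (1.0 / d_prev - 1.0) + dist2)
```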

The recursive density-based clustering approach does not rely on user- or problem-specific thresholds, unlike methods such as subtractive clustering or participatory learning, for example. The density is evaluated recursively and accumulates the information about the spatial distribution of all the data by a small number of variables (Angelov 2010).

In the eClustering procedure, representative clusters with high generalization capability are formed by considering the data points with the highest value of D. This is translated into Condition (I) as follows:

$$\begin{aligned} ({\mathrm {I}})&:&{\mathrm {IF}} \ \ D^k\left( {\mathbf {z}}^k\right) >\max _{i=1}^R{D^k\left( {\mathbf {z}}^{i^{*}}\right) } \ \ {\mathrm {OR}} \ \ D^k\left( {\mathbf {z}}^k\right) <\min _{i=1}^R{D^k\left( {\mathbf {z}}^{i^{*}}\right) }\nonumber \\&{\mathrm {THEN}} \ \ {\mathbf {z}}^{(R+1)^{*}} \leftarrow {\mathbf {z}}^k ,\ \ \ \ R \leftarrow R+1. \end{aligned}$$
(10)

If the current data point satisfies Condition (I), then it becomes the center of a new cluster and a new rule is formed (\({\mathbf {z}}^{(R+1)^{*}} \leftarrow {\mathbf {z}}^k\), \(R\leftarrow R+1\)). This condition ensures good convergence, but it is sensitive to outliers; the influence of outliers can be smoothed using cluster quality indicators (Angelov 2010).

To control the level of overlap and avoid redundant clusters, the following condition, Condition (II), is verified:

$$\begin{aligned} ({\mathrm {II}})&:&{\mathrm {IF}} \ \ \exists \ \ i: \ \ {\mathcal {A}}_{i,l}(x_l^k)>e^{-1} \ \ \forall \ \ l \nonumber \\&{\mathrm {THEN}} \ {\mathrm {remove}} \ {\mathbf {z}}^{i^{*}} \ {\mathrm {and}} \ {\mathrm {update}} \ R \ (R\leftarrow R-1). \end{aligned}$$
(11)

Condition (II) removes highly overlapping clusters, avoiding redundant rules. The previously existing central point(s) for which this condition holds is (are) removed. These mechanisms keep the rule base compact, since the number of rules depends only on the information contained in the data.
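Conditions (I) and (II) reduce to simple tests over the densities and membership degrees; a sketch, with illustrative function names:

```python
import numpy as np

def condition_I(d_new, d_centers):
    """Condition (I), Eq. (10): the newest point becomes a new cluster center
    if its density exceeds the maximum, or falls below the minimum, density
    of the existing centers."""
    return d_new > max(d_centers) or d_new < min(d_centers)

def condition_II(memberships):
    """Condition (II), Eq. (11). memberships is an (R, q+p) array holding
    A_{i,l}(x_l) for the current input; rule i is redundant if every one of
    its membership degrees exceeds exp(-1)."""
    overlapping = np.all(memberships > np.exp(-1.0), axis=1)
    return np.flatnonzero(overlapping)   # centers to remove (R <- R - 1)
```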

Quality measures for recursive monitoring of the clusters include support, age, utility, zone of influence, and local density (Angelov 2010). In this paper, as in Angelov (2010), the quality of the clusters is monitored using the relative accumulated activation level of a rule at step k:

$$\begin{aligned} U^k_i = \frac{\sum _{t=1}^k{\lambda ^t_i}}{k-T^{i^{*}}}, \quad \ \ i=1,2,\ldots ,R; \quad \ \ k = 2,3,\ldots , \end{aligned}$$
(12)

where \(T^{i^{*}}\) is a time tag to indicate when the ith fuzzy rule was generated.

The utility of the clusters is evaluated using (12) and Condition (III):

$$\begin{aligned} ({\mathrm {III}})&:&{\mathrm {IF}} \ \ U^k_i < \varepsilon \nonumber \\&{\mathrm {THEN}} \ {\mathrm {remove}} \ {\mathbf {z}}^{i^{*}} \ {\mathrm {and}} \ {\mathrm {update}} \ R \ \left( R\leftarrow R-1\right) , \end{aligned}$$
(13)

where \(\varepsilon \) is a threshold related to the minimum utility of a cluster (threshold values are typically in the range [0.03, 0.1]).

Condition (III) states that if a cluster has utility lower than the threshold \(\varepsilon \), then the data pattern has shifted away from the central point of the corresponding rule, and the rule is removed from the rule base. This quality measure evaluates the importance of fuzzy rules and assists the evolving process (Angelov 2010).
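Assuming the accumulated activation of each rule and its creation tag \(T^{i^{*}}\) are tracked on-line, the utility check (12)–(13) can be sketched as:

```python
def rule_utilities(acc_activation, birth_tags, k):
    """Eq. (12): U_i = (accumulated activation of rule i) / (k - T_i*).
    Assumes k > T_i* for every existing rule."""
    return [acc / (k - t0) for acc, t0 in zip(acc_activation, birth_tags)]

def condition_III(utilities, eps):
    """Condition (III), Eq. (13): rules whose utility fell below eps."""
    return [i for i, u in enumerate(utilities) if u < eps]
```

The next step is to estimate the parameters of the linear rule consequents.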

2.3.2 Recursive Consequent Parameter Identification

Expression (6) can be put into the following vector form:

$$\begin{aligned} y = \varLambda ^T \varPhi , \end{aligned}$$
(14)

where y is the output, \(\varLambda = \left[ \lambda _1 {\mathbf {x}}_e^T,\lambda _2 {\mathbf {x}}_e^T,\ldots ,\lambda _{R} {\mathbf {x}}_e^T\right] ^T\) denotes the fuzzily weighted extended input vector, and \(\varPhi = \left[ \varTheta _1^T,\varTheta _2^T,\ldots ,\varTheta _R^T\right] ^T\) is the vector of rule base parameters.

Since the target output is known at each step, the parameters of the consequents can be updated using the recursive least squares (RLS) algorithm, either locally or globally (Ljung 1988). In this paper we use the locally optimal error criterion \(E_L^i\):

$$\begin{aligned} \min {E_L^i} = \min {\sum _{t=1}^k{\lambda ^t_i\left( y^t - \left( {\mathbf {x}}_{e}^t\right) ^T\varTheta _{i}^t\right) ^2}}. \end{aligned}$$
(15)

The optimal update of the parameters of the ith rule is:

$$\begin{aligned} \varTheta _i^{k+1}= & {} \varTheta _i^k + P_i^k {\mathbf {x}}_e^k\lambda ^k_i\left( y^k-\left( {\mathbf {x}}_e^k\right) ^T\varTheta ^k_i \right) ,\quad \ \varTheta ^1_i=0, \end{aligned}$$
(16)
$$\begin{aligned} P_i^{k+1}= & {} P_i^k - \frac{\lambda ^k_i P^k_i {\mathbf {x}}_e^k \left( {\mathbf {x}}_e^k\right) ^T P^k_i}{1+ \lambda ^k_i \left( {\mathbf {x}}_e^k\right) ^T P^k_i {\mathbf {x}}_e^k}, \quad \ P^1_i = \varOmega I, \end{aligned}$$
(17)

where I is a \((q+p+1)\times (q+p+1)\) identity matrix, \(\varOmega \) denotes a large number, usually \(\varOmega = 1000\), and P is a dispersion matrix. Angelov (2010) performed simulations with several benchmarks and verified the stability and convergence of the RLS updating Eqs. (16) and (17).
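For a single rule, the local RLS step (16)–(17) can be sketched as follows; this is a minimal sketch with illustrative names, in which the target y is the volatility proxy adopted in this paper (the squared return).

```python
import numpy as np

def rls_step(theta, P, x_e, lam, y):
    """Weighted local RLS update for one rule, Eqs. (16)-(17).

    theta : (q+p+1,) consequent parameters of the rule
    P     : (q+p+1, q+p+1) dispersion matrix of the rule
    x_e   : expanded input [1, x^T]
    lam   : normalized activation level of the rule at this step
    y     : observed target (here, the squared-return volatility proxy)
    """
    err = y - x_e @ theta
    theta = theta + P @ x_e * (lam * err)                        # Eq. (16)
    Px = P @ x_e
    P = P - (lam * np.outer(Px, Px)) / (1.0 + lam * (x_e @ Px))  # Eq. (17)
    return theta, P

# Initialization, as in the text: theta^1 = 0 and P^1 = Omega * I, Omega = 1000.
```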

A new fuzzy rule created at step k requires a corresponding dispersion matrix, which is set as \(P^k_{R+1}=\varOmega I\). The parameters of the new rule are found from the parameters of the existing R fuzzy rules as follows:

$$\begin{aligned} \varTheta _{R+1}^k = \sum _{i=1}^R{\lambda _i\varTheta _i^{k-1}}. \end{aligned}$$
(18)

The parameters of all other rules are inherited from the previous step, and the corresponding dispersion matrices are updated independently. When a central point is replaced due to Condition (II), the parameters and the dispersion matrix are inherited from the replaced fuzzy rule:

$$\begin{aligned} \varTheta _{R+1}^k= & {} \varTheta _{i^{*}}^{k-1}, \quad \ \ {\mathcal {A}}_{i^{*},l}\left( x_l^k\right) >e^{-1}, \quad \ \ \forall \ \ l, \quad \ \ l = 1,2,\ldots ,q+p, \end{aligned}$$
(19)
$$\begin{aligned} P^k_{R+1}= & {} P^{k-1}_{i^{*}}, \quad \ \ {\mathcal {A}}_{i^{*},l}\left( x_l^k\right) >e^{-1}, \quad \ \ \forall \ \ l, \quad \ \ l = 1,2,\ldots ,q+p. \end{aligned}$$
(20)

Once the consequent parameters are found, the model output is computed using (6). Notice that the learning algorithm has only two control parameters, s (cluster spread) and \(\varepsilon \) (rule utility threshold).

2.4 Evolving Fuzzy-GARCH Algorithm

The detailed steps of the evolving fuzzy-GARCH model are as follows. All steps of the algorithm are non-iterative; the model can evolve an existing structure when the data pattern changes and, being recursive, is computationally efficient.

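Gathering the expressions of Sects. 2.3.1 and 2.3.2, each new input/output pair \({\mathbf {z}}^k\) is processed as follows:

1. Read the new pair \({\mathbf {z}}^k=([{\mathbf {x}}^T,y]^T)^k\).
2. Update the accumulators \(\beta ^k\) and \(\delta ^k_l\) using (8) and compute the density \(D^k({\mathbf {z}}^k)\) using (7).
3. Update the densities of the existing cluster centers using (9).
4. If Condition (I) in (10) holds, create a new rule centered at \({\mathbf {z}}^k\), initializing its consequent parameters by (18) and its dispersion matrix as \(\varOmega I\); if Condition (II) in (11) also holds, remove the overlapping center(s) and let the new rule inherit their parameters and dispersion matrix by (19) and (20).
5. Remove any rule whose utility (12) violates Condition (III) in (13).
6. Update the consequent parameters and dispersion matrices of all rules with the RLS equations (16) and (17).
7. Compute the model output, the volatility forecast, using (6).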

3 Computational Results

To illustrate the performance of the evolving fuzzy-GARCH model for forecasting stock market volatility, this section uses the daily prices of the S&P 500 (US) and the Ibovespa (Brazil) over the period from January 3, 2000 through September 30, 2011 to compare evolving fuzzy-GARCH against the GARCH (Bollerslev 1986), EGARCH (Nelson 1991), GJR-GARCH (Glosten et al. 1993), and fuzzy GJR-GARCH (Maciel 2012, 2013) models. The daily stock return series were generated by taking the difference between the natural logarithm of the daily stock index and that of the previous day, multiplied by 100. The data set was partitioned into two parts: the in-sample period consists of data from January 3, 2000 through December 29, 2005, and the out-of-sample forecast period runs from January 2, 2006 through September 30, 2011. This partition is only necessary for GARCH-type modeling, because the evolving fuzzy-GARCH model learns recursively and does not require a pre-training phase.

Table 1 shows the basic statistics of the return series. The average daily return is negative for the S&P 500 and positive for the Ibovespa. The daily returns display evidence of skewness and excess kurtosis.

Table 1 Descriptive statistics of S&P 500 and Ibovespa daily returns

The return series are skewed to the left and characterized by distributions with tails significantly thicker than those of a normal distribution. The Jarque–Bera test statistics further confirm that the daily returns are non-normally distributed. Compared with a Gaussian distribution, the kurtosis of the S&P 500 and Ibovespa suggests that their daily returns are fat tailed (Table 1). The Ibovespa index has a higher kurtosis than the S&P 500, consistent with the fact that emerging markets generally exhibit more leptokurtic behavior. The Ljung–Box \(Q^2(10)\) statistics reject the null hypothesis of no serial correlation in the squared returns, indicating dependence for both series. Furthermore, Engle's ARCH test on the squared returns reveals strong ARCH effects, evidence in support of GARCH effects (i.e., heteroskedasticity). Accordingly, this preliminary analysis of the data encourages the adoption of a sophisticated model that embodies fat-tailed features and conditional structure to allow for time-varying volatility.

GARCH-type models capture fat tails and conditional volatility, but they do not capture volatility clustering, as characterized by Fama (1965). The stock indexes are shown in Fig. 1 and the corresponding returns in Fig. 2. In Fig. 2, volatility clustering becomes particularly clear, especially in the context of the recent US subprime crisis.

Fig. 1 Daily closing stock price indexes for the S&P 500 and Ibovespa

Fig. 2 S&P 500 and Ibovespa daily returns

The Bayesian information criterion (BIC) and Akaike's information criterion (AIC) were used to select appropriate lag values for the evolving fuzzy-GARCH and GARCH-type models (Akaike 1974; Schwarz 1978). Models with combinations of (p, q) values ranging from (1, 1) to (15, 15) were developed using the return data. According to the BIC and AIC, the best choice for all volatility models was (1, 1), i.e., \(p=1\) and \(q=1\).

To choose appropriate control parameters for the fuzzy-GARCH model, simulations were conducted with different parameter values and compared in terms of accuracy. The values found for the spread and the utility threshold were \(s = 0.05\) and \(\varepsilon = 0.1\), respectively.

Volatility forecasts were compared using one-step-ahead forecasts and the mean squared forecast error (MSFE), mean absolute forecast error (MAFE), and mean percentage forecast error (MPFE):

$$\begin{aligned} {\mathrm {MSFE}}= & {} \frac{1}{N}\sum _{t=1}^N{\left( \sigma ^2_t - \hat{\sigma }^2_t \right) ^2}, \end{aligned}$$
(21)
$$\begin{aligned} {\mathrm {MAFE}}= & {} \frac{1}{N}\sum _{t=1}^N{|\sigma ^2_t - \hat{\sigma }^2_t|}, \end{aligned}$$
(22)
$$\begin{aligned} {\mathrm {MPFE}}= & {} \frac{1}{N}\sum _{t=1}^N{\frac{|\sigma ^2_t - \hat{\sigma }^2_t|}{\sigma ^2_t}}, \end{aligned}$$
(23)

where N is the number of out-of-sample observations, \(\sigma ^2_t\) is the actual volatility at t, measured as the squared daily return, and \(\hat{\sigma }^2_t\) is the forecast volatility at t.
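The three criteria translate directly into code; a minimal sketch, assuming the squared-return proxy \(\sigma ^2_t\) is strictly positive so that the MPFE denominator is well defined:

```python
import numpy as np

def forecast_errors(sigma2_true, sigma2_hat):
    """MSFE, MAFE and MPFE of Eqs. (21)-(23) over N out-of-sample points."""
    s2 = np.asarray(sigma2_true, dtype=float)
    s2_hat = np.asarray(sigma2_hat, dtype=float)
    diff = s2 - s2_hat
    msfe = np.mean(diff ** 2)
    mafe = np.mean(np.abs(diff))
    mpfe = np.mean(np.abs(diff) / s2)   # requires sigma2_true > 0
    return msfe, mafe, mpfe
```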

Table 2 summarizes the performance of the models when forecasting S&P 500 and Ibovespa volatility. The evolving fuzzy-GARCH model performs better than the GARCH-family models because its structure provides a combination of rules as a mechanism to deal with volatility clustering. The GARCH-type models achieve similar performance among themselves. Comparing the fuzzy approaches, the evolving fuzzy-GARCH and fuzzy GJR-GARCH models provide very similar results, since both address volatility clustering and nonlinear volatility dynamics in their structures. Furthermore, it is worth noting that both methodologies are able to account for market uncertainty and vagueness through their fuzzy nature.

Table 2 Volatility modeling performance for one-step ahead forecast

The results of evolving fuzzy-GARCH and fuzzy GJR-GARCH are clearly superior to those of the remaining models (Table 2). The theoretical justification for this performance may be summarized as follows. First, fuzzy approaches are suitable for the identification of nonlinear, time-varying, and complex systems. Since the volatility of asset returns shows these features, fuzzy methods are more appropriate than traditional GARCH-family models, which assume a linear relationship for volatility behavior. Second, structure identification in the fuzzy techniques considered in this paper is a fuzzy clustering problem: data are divided into subsets, or clusters, each representing similar data points in terms of the distance of a data point to its centroid (the representative of the cluster center). Therefore, data are associated by similarity. The volatility clustering stylized fact states that large changes tend to be followed by large changes, of either sign, and small changes tend to be followed by small changes; i.e., volatility changes are grouped by similarity. Thus, the fuzzy methods account for volatility clustering naturally, through their similarity-based fuzzy clustering framework. Third, since financial markets are affected by news, expectations, and investors' psychological states, uncertainty and vagueness are present in market dynamics. The fuzzy nature of the evolving fuzzy-GARCH and fuzzy GJR-GARCH models provides mechanisms to treat the uncertainties of volatility processes, which also results in more accurate forecasts.

To illustrate the capability of the evolving fuzzy-GARCH model to deal with volatility clustering, Figs. 3 and 4 show the “true volatility”, measured as the squared returns, and the volatility estimated by the GARCH and fuzzy-GARCH models for the S&P 500 and Ibovespa indexes, respectively. Periods of high volatility correspond to volatility clustering in the series of stock returns, also revealing volatility persistence. The GARCH and fuzzy-GARCH models perform similarly in stable or low-volatility environments. However, during periods of high variability, the evolving fuzzy-GARCH model captures the volatility levels more accurately. The September 11, 2001 terrorist attacks and the subprime crisis that began in the second half of 2008 correspond to the episodes of high volatility. For both the S&P 500 (Fig. 3) and the Ibovespa (Fig. 4), the evolving fuzzy-GARCH model captures the instabilities suffered by the economies evaluated. In the Brazilian stock market, characterized by high volatility, the effectiveness of the suggested methodology becomes even clearer, surpassing the GARCH approach in capturing volatility dynamics, mainly in the presence of significant market fluctuations.

Fig. 3 S&P 500 actual and estimated volatility using the GARCH and fuzzy-GARCH models

Fig. 4 Ibovespa actual and estimated volatility using the GARCH and fuzzy-GARCH models

Although forecasting accuracy is extensively employed in practice for comparison purposes, it does not reveal whether a forecasting model is statistically superior to another. Therefore, additional tests must be pursued to compare two or more competing models fairly.

This paper adopts the parametric Morgan–Granger–Newbold (MGN) test suggested in Diebold and Mariano (1995). The MGN test remains valid when the assumption of contemporaneously uncorrelated forecast errors is relaxed. The statistic for this test is computed as follows:

$$\begin{aligned} {\mathrm {MGN}} = \frac{\hat{\rho }_{ab}}{\left( \frac{1 - \hat{\rho }_{ab}^2}{N - 1} \right) ^{\frac{1}{2}}}, \end{aligned}$$
(24)

where \(\hat{\rho }_{ab}\) is the estimated correlation coefficient between \(a=e_1+e_2\) and \(b = e_1 - e_2\), with \(e_1\) and \(e_2\) the residuals of the two models, for example, the fuzzy-GARCH and GARCH models. The statistic follows a Student's t distribution with \(N-1\) degrees of freedom, where N is the number of out-of-sample observations. Under the null hypothesis that the estimates are equally accurate, the correlation between a and b is zero.
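A sketch of the test statistic and its p-value under the Student's t distribution with \(N-1\) degrees of freedom (illustrative function names; SciPy assumed available):

```python
import numpy as np
from scipy import stats

def mgn_test(e1, e2):
    """Morgan-Granger-Newbold statistic of Eq. (24).

    e1, e2 : out-of-sample forecast errors of two competing models.
    Returns the MGN statistic and the two-sided p-value under the null of
    equally accurate forecasts (zero correlation between a and b)."""
    e1 = np.asarray(e1, dtype=float)
    e2 = np.asarray(e2, dtype=float)
    a, b = e1 + e2, e1 - e2
    n = len(a)
    rho = np.corrcoef(a, b)[0, 1]
    mgn = rho / np.sqrt((1.0 - rho ** 2) / (n - 1))
    p_value = 2.0 * stats.t.sf(abs(mgn), df=n - 1)
    return mgn, p_value
```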

The results of the MGN test, shown in Table 3, agree with the previous results. The MGN statistics reveal that the evolving fuzzy-GARCH and fuzzy GJR-GARCH forecasting models are statistically superior to the GARCH-family models. By the same token, the remaining models, GARCH, EGARCH, and GJR-GARCH, are equally accurate among themselves. According to the MGN statistics, the evolving fuzzy-GARCH and fuzzy GJR-GARCH models are also equally accurate with respect to each other.

Table 3 MGN volatility forecast statistics for the S&P 500 and Ibovespa

Figure 5 shows how the number of fuzzy rules changes during the evolving fuzzy-GARCH modeling steps. The number of rules is similar for both markets, but the S&P 500 shows greater variability, revealing the continuous adaptation of the model structure. It is interesting to note that the number of rules increases significantly between 2008 and 2009, revealing the capability of evolving fuzzy-GARCH to capture crisis instabilities. This period corresponds to the US subprime mortgage crisis, which led to plunging property prices, a slowdown in the US economy, and billions of dollars in bank losses, affecting the world's main financial markets, including the Brazilian market.

Fig. 5 Number of fuzzy-GARCH rules for the S&P 500 and Ibovespa indexes

The fuzzy-GARCH model exhibits a strong ability to forecast the volatility of real market returns, since it accounts for both stock market asymmetry and volatility clustering. The fuzzy-GARCH approach statistically outperforms GARCH, EGARCH, and GJR-GARCH, and produces forecasts comparable with the fuzzy GJR-GARCH methodology. Moreover, the adaptive modeling and incremental/recursive nature of fuzzy-GARCH provides a more efficient algorithm that can be used on-line, an essential requirement in volatility forecasting and actual decision-making settings.

4 Conclusion

Volatility forecasting plays a central role in several financial decisions such as asset allocation and hedging, option pricing, and risk analysis. This paper has introduced an evolving fuzzy-GARCH approach for financial volatility modeling and forecasting. Fuzzy-GARCH combines evolving fuzzy systems and conditional variance GARCH modeling to deal with stylized facts such as time-varying volatility and volatility clustering. Since volatility mirrors the behavior of nonstationary, nonlinear environments, evolving models have proven very suitable. Empirical evidence based on S&P 500 and Ibovespa market data illustrates the potential of the evolving fuzzy-GARCH approach to forecast volatility. Statistically speaking, fuzzy-GARCH develops more accurate forecasts than GARCH-type models, and results comparable with the fuzzy GJR-GARCH method. Fuzzy-GARCH was also able to handle periods of high instability such as the recent subprime mortgage crisis. Future work should include applications of the evolving fuzzy-GARCH approach in financial decision-making problems related to volatility, such as option pricing, portfolio selection, and risk analysis, as well as the use of realized volatility as the “true volatility” and its use to construct jump component series as inputs to improve forecasts. Moreover, extending the evolving fuzzy-GARCH approach with exponential and threshold GARCH-type models is also an issue for further investigation.