1 Introduction

Choosing a model that effectively forecasts the conditional variance of financial time series is a challenging problem. Many studies reinforce the stylised fact that the conditional volatility of stock market returns is time varying (e.g. Bekaert and Harvey 1997; Pagan and Schwert 1990; Andersen et al. 2001; Bollerslev and Zhou 2002; Engle 2002; Brownlees and Engle 2012). Such time-varying volatility directly affects financial analyses and investment decisions, such as asset pricing, risk management, asset allocation and portfolio optimisation (e.g. Hodrick 1981; Bollerslev et al. 1988; De Santis and Gerard 1997; Bali and Engle 2010; Creal et al. 2011). Therefore, it is highly desirable to find a modelling framework that properly captures the volatility clustering phenomenon.

When modelling non-linear and asymmetrically dependent time series, the ability of a model to capture volatility clustering effects depends notably on the versatility of its structure. Motivated by the difficulties of the widely adopted generalized autoregressive conditional heteroscedasticity (GARCH)-type models (Engle 1982; Bollerslev 1986), which cannot properly capture the properties of the conditional distribution and lack robustness, a class of score-driven volatility models was introduced in Harvey and Chakravarty (2008) and Creal et al. (2008), namely the dynamic conditional score (DCS) models, also known as generalized autoregressive score (GAS) models.

A particular attraction of score-driven models is that they produce locally and asymptotically optimal filters with respect to the Kullback–Leibler divergence (Blasques et al. 2015). Moreover, this model class can identify skewness of the distribution in the presence of outliers and, owing to its conditional score dynamics, its parameters may be estimated straightforwardly by the maximum likelihood (ML) method. More specifically, by scaling the score function in an appropriate manner, score-driven models are able to capture properties of established observation-driven models, such as the GARCH model (Engle 1982; Bollerslev 1986), the dynamic conditional correlation (DCC) model of Engle (2002), the autoregressive conditional multinomial (ACM) model of Russell and Engle (2005) and the dynamic copula models of Patton (2006), among others (e.g. Creal et al. 2013; Blasques et al. 2014).

Another advantage of the score-driven framework is that many such models are generalizations of classical observation-driven models (Creal et al. 2013; Harvey 2013). However, some of the classical dynamic volatility models are not special cases of corresponding score-driven models. For example, the t-GARCH model (Bollerslev 1987), which we explore in the empirical application of the present paper, is not a score-driven model. An additional advantage of score-driven models compared with classical time series models such as autoregressive moving average (ARMA), GARCH and vector ARMA (VARMA) models is that score-driven models are robust to outliers and missing data (Harvey 2013).

The practical usefulness of score-driven models has been demonstrated through works on different topics and purposes. Empirical problems for which the score-driven model framework has been applied include systemic risk forecasting (Oh and Patton 2013; Eckernkemper 2018; Bernardi and Catania 2019), credit risk analysis (Creal et al. 2014), dependence modelling (Harvey and Thiele 2016; Janus et al. 2014), spatial econometrics (Blasques et al. 2014; Catania and Billé 2017), credit default swap spread (Lange et al. 2017; Oh and Patton 2018) and high-frequency data (Opschoor et al. 2018; Gorgi et al. 2019), among others (Ardia et al. 2019; Patton et al. 2019; Lazar and Xue 2020).

Particularly relevant to the present paper is the work of Harvey and Lange (2018), in which the one- and two-component Beta-t-EGARCH-M (i.e. Beta-t-exponential GARCH-in-mean) models are introduced. Those models adopt the ideas of the two-component GARCH model (Engle and Lee 1999) and GARCH-M model (Engle et al. 1987) into score-driven volatility models. Two-component volatility models (e.g. Engle and Lee 1999; Alizadeh et al. 2002) consider that volatility is driven by a long- and a short-run component, where the latter captures temporary variation in volatility and includes leverage effects (Black 1976). As presented in Alizadeh et al. (2002), two-component volatility models are capable of capturing long memory behaviour. In GARCH-M models (e.g. Engle et al. 1987) an equity risk premium is included in the expected return, which is driven by conditional volatility. There is a large body of literature on GARCH-M models (e.g. Adrian and Rosenberg 2008), where statistical and volatility forecasting performances of such models are studied.

The work of Harvey and Lange (2018) shows that the one- and two-component Beta-t-EGARCH-M models improve the volatility forecasting performance of the corresponding Beta-t-EGARCH models. Another recent work which is also relevant to the present paper is Blazsek et al. (2022), in which the one-component Beta-t-QVAR-M model is introduced, extending the one-component Beta-t-EGARCH-M model of Harvey and Lange (2018). Volatility in the Beta-t-QVAR-M is driven by a bivariate score-driven filter, which is updated by score functions with respect to location (i.e. expected return) and log-scale (i.e. non-linear transformation of volatility). In the most general specification of the one-component Beta-t-QVAR-M (Blazsek et al. 2022), leverage effects are included in the score-driven volatility filter (i.e. Beta-t-QVAR-M-lev), measuring asymmetric effects of unexpected returns on volatility.

The present work makes two contributions to the literature. Firstly, we introduce the two-component Beta-t-QVAR-M-lev model, showing that the asymptotic properties of this model are directly obtained from the theoretical results of Blazsek et al. (2022). This is performed by assuming covariance stationarity for both volatility components. The one-component Beta-t-QVAR-M and the one- and two-component Beta-t-EGARCH-M models are special cases of the two-component Beta-t-QVAR-M model. Secondly, we extend the stock market data explored in Blazsek et al. (2022) to a larger sample of G20 stock indices. The results indicate that the volatility forecasting superiority of the one-component Beta-t-QVAR-M-lev continues to hold over a range of stock market indices in addition to that of the USA. Moreover, the two-component Beta-t-QVAR-M-lev improves the forecasting performance of the one-component Beta-t-QVAR-M-lev for all indices included in the sample.

The full dataset includes 20 stock indices of the G20 countries, covering the period from January 2000 to April 2022. For the volatility forecasting analysis, we explore 5-min realized volatility data as a benchmark of true volatility (Liu et al. 2015). For the comparison of volatility forecasting accuracy, we adopt the Giacomini–White test of forecasting accuracy (Giacomini and White 2006) for 2500 rolling windows. The realized volatility data are available for 13 stock market indices. Hence, we focus on volatility forecasting for the following countries: Brazil, Canada, China, France, Germany, India, Italy, Japan, Mexico, South Korea, Spain, the UK and the USA.

It is worth noting that the full sample includes high-volatility periods such as the Dot-com bubble of 2002, the Global Financial Crisis of 2007–08, the Covid-19 pandemic and the beginning of the Russian invasion of Ukraine. Thus, our 22-year sample includes several low- and high-volatility periods. For all countries, we study volatility forecasting accuracy for the last 2500 trading days of the full sample, from 2012 to 2022.

Motivated by the work in Blazsek et al. (2022), in the present paper we focus on the M-lev-type volatility models. In particular, we use one- and two-component Gaussian-GARCH-M-lev, t-GARCH-M-lev and Beta-t-QVAR-M-lev models for all stock indices. The volatility forecasting results for the one-component models expand and confirm the findings in Blazsek et al. (2022) for a larger sample of G20 stock market indices. They indicate that the forecasting performance of the one-component Beta-t-QVAR-M-lev is superior to the forecasting performances of the one-component Gaussian-GARCH-M-lev and t-GARCH-M-lev models.

In addition, the Giacomini–White test results indicate that the volatility forecasting performance of the two-component Beta-t-QVAR-M-lev model is superior to that of the alternative volatility models considered in the present paper. Therefore, the volatility forecasting performance of the one-component Beta-t-QVAR-M-lev model (Blazsek et al. 2022) is improved. Moreover, we find that using two components instead of one improves volatility forecasting for the Beta-t-QVAR-M-lev. In contrast, for the Gaussian-GARCH-M-lev and t-GARCH-M-lev, the one-component models show superior forecasting performances. This particular finding suggests that score-driven models may provide superior empirical performances compared with classical GARCH-type volatility models.

The remainder of this paper is organized as follows. Section 2 contains a review of the literature. Section 3 details the proposed model. Section 4 describes the data and empirical results. Section 5 concludes.

2 Literature review

In the existing literature, time series models with parameters that vary dynamically through time are traditionally grouped into two categories: observation-driven and parameter-driven models (Box and Cox 1982; Creal et al. 2013). Observation-driven models have been developed to exploit large changes (also known as shifts or jumps) and distributional asymmetries that are frequently present in financial time series. This category includes the ARCH model, originally introduced in Engle (1982), and its subsequent extension, the generalized ARCH (i.e. GARCH) model proposed in Bollerslev (1986). Both models have been widely applied in empirical studies (e.g. Engle and Sheppard 2001; Bauwens et al. 2006).

Score-driven models are observation-driven models (Creal et al. 2008, 2013), consisting of a modification of the GARCH model, proposed to capture large changes through time while mitigating the negative impact of outliers. In the score-driven framework, it is frequently assumed that the innovations follow a non-normal distribution, and the second central moment is modelled through a GARCH-type equation based on the conditional score of the distribution with respect to the variance.

The first score-driven model introduced in Harvey and Chakravarty (2008) and Creal et al. (2008) is the Beta-t-EGARCH model, which is a member of the EGARCH family, originally proposed in Nelson (1991). A positive outcome of the adoption of the new model is that the deleterious effect of potential outliers on the time-varying volatility equation is relatively smaller. Harvey (2013) notes that GARCH models may overreact to extreme observations, while the score-driven Beta-t-EGARCH may underreact to them, inspiring the proposition of two-component score-driven volatility models.

Score-driven models make it possible to derive volatility forecasting expressions, provided that the respective conditional variances are properly calculated, and the conditional distribution can be simulated straightforwardly. Moreover, general analytic expressions of the serial correlation function of squared observations may also be calculated (Harvey 2013; Sucarrat 2013). Owing to their attractive properties, score-driven models have been further developed into many model extensions. Such models include the dynamic models for location, volatility and multivariate dependence for fat-tailed densities introduced in Creal et al. (2011), the asymmetric exponential score-driven model introduced in Creal et al. (2013) and the observation-driven mixed measurement dynamic factor models proposed in Creal et al. (2014), among other extensions (e.g. Harvey and Luati 2014; Babatunde et al. 2019).

Caivano and Harvey (2014) propose another extension of the EGARCH model class, which uses the exponential generalized beta distribution of the second kind (i.e. EGB2). Such a model is complementary to the model under the Student’s t-distribution explored in Creal et al. (2013) and Harvey (2013). The proposed framework is then applied to macroeconomic data, and the empirical results indicate that the EGB2 distribution may provide a superior fit for certain macroeconomic time series (e.g. exchange rates).

More recently, further score-driven EGARCH models using the generalized Student’s t-distribution and two-component EGARCH models for the t-distribution were introduced in Harvey and Lange (2017, 2018). These models are capable of encompassing asymmetry and skewness, and expressions for the respective information matrix are derived by the authors. The models are tested empirically on stock market and commodity return series, and the results suggest that they provide a flexible and robust volatility modelling framework. Regarding the use of alternative probability distributions for score-driven EGARCH models, we refer to the works in Blazsek et al. (2018), Blazsek and Licht (2022) and Blazsek and Haddad (2022).

3 Model

In this section, we present a general score-driven modelling framework and the proposed two-component score-driven model for expected return and volatility, namely the two-component Beta-t-QVAR-M-lev. Subsequently, we present technical details about the updating terms (i.e. score functions), and the statistical inference of the Beta-t-QVAR-M-lev. Lastly, we present a classical alternative, the two-component GARCH-M-lev model.

3.1 General framework

The score-driven framework consists of observation-driven models (Cox 1981) that are updated by the partial derivatives of the log conditional density of the dependent variables with respect to dynamic parameters. To illustrate the formulation of score-driven models, we present a score-driven model with a univariate score-driven filter (Creal et al. 2008). We assume that the random dependent variable \(y_{t}\) is generated according to the following density function:

$$\begin{aligned} y_{t} \sim f(y_{t}|f_{t},{\mathcal {F}}_{t-1},\Theta ) \end{aligned}$$
(1)

where \(f_{t}\) is the score-driven filter, \({\mathcal {F}}_{t-1}\) is a \(\sigma \)-algebra representing the information set of period t, and \(\Theta \) is a vector of time-invariant parameters. The filter \(f_{t}\) is formulated as follows:

$$\begin{aligned} f_{t}=\omega +\sum _{i=1}^{p}a_{i}s_{t-i}+\sum _{j=1}^{q}b_{j}f_{t-j} \end{aligned}$$
(2)

where \(\omega \), \(a_{i}\) for \(i=1,\ldots ,p\), and \(b_{j}\) for \(j=1,\ldots ,q\), are time-invariant scalar parameters. Moreover, \(s_{t}\) is the scaled score function, defined as follows:

$$\begin{aligned} s_{t}=S_{t}\nabla _{t}, \qquad \nabla _{t}=\frac{\partial \ln f(y_{t}|f_{t},{\mathcal {F}}_{t-1},\Theta )}{\partial f_{t}}, \qquad S_{t}=S(t,f_{t},{\mathcal {F}}_{t-1},\Theta ) \end{aligned}$$
(3)

where \(\nabla _{t}\) is the conditional score with respect to \(f_{t}\), and \(S_{t}\) is the scaling parameter of score. For the scaling parameter of score \(S_{t}\), a popular choice is the conditional inverse information matrix, i.e. \(S_{t}={\mathcal {I}}^{-1}_{t}({\mathcal {F}}_{t-1})\) where \({\mathcal {I}}_{t}({\mathcal {F}}_{t-1})=E(\nabla _{t}\nabla _{t}'|{\mathcal {F}}_{t-1})\) (Creal et al. 2008).

Regarding volatility modelling, the following Beta-t-GARCH(1,1) model is presented as an example:

$$\begin{aligned} y_{t}= \;& {} v_{t}= f_{t}^{1/2}\epsilon _{t}, \qquad \epsilon _{t} \sim t(\nu ) \text { i.i.d.} \end{aligned}$$
(4)
$$\begin{aligned} f_{t}= \;& {} \omega +a_{1} s_{t-1} +b_{1} f_{t-1} \end{aligned}$$
(5)

In this formulation we assume that the expected return is zero, i.e. observed \(y_{t}\) and unexpected return \(v_{t}\) are identical. In the Beta-t-GARCH, it is assumed that the independent and identically distributed (i.i.d.) standardized error term \(\epsilon _{t}\) follows the Student’s t-distribution. It may be shown that the Beta-t-GARCH becomes a Gaussian-GARCH model (Bollerslev 1986) if the degrees of freedom parameter goes to infinity, i.e. the Student’s t-distributed error term becomes the standard normal distribution, and the scaled score function becomes a quadratic transformation of unexpected return, i.e. \(s_{t} \stackrel{p} {\rightarrow} g(v_{t}^{2})\) if \(\nu \rightarrow \infty \).
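The Gaussian limit can be illustrated with a minimal sketch. It assumes \(y_{t}|f_{t}\sim N(0,f_{t})\) and inverse-information scaling, for which the conditional score is \((y_{t}^{2}-f_{t})/(2f_{t}^{2})\), the information is \(1/(2f_{t}^{2})\) and the scaled score is \(s_{t}=y_{t}^{2}-f_{t}\). Under these assumptions, the recursion of Eq. (5) coincides with a GARCH(1,1) recursion with coefficients \(a_{1}\) and \(b_{1}-a_{1}\). The function names and the initial value `f0` are hypothetical and are not taken from the paper.

```python
def score_driven_variance_filter(y, omega, a1, b1, f0):
    """Eq. (5), f_t = omega + a1*s_{t-1} + b1*f_{t-1}, with the Gaussian scaled score.

    f0 is an assumed initial variance; the source does not specify one here.
    """
    f = [f0]
    for y_prev in y[:-1]:
        f_prev = f[-1]
        s_prev = y_prev ** 2 - f_prev  # scaled score in the Gaussian case (assumption)
        f.append(omega + a1 * s_prev + b1 * f_prev)
    return f


def garch_filter(y, omega, alpha, beta, f0):
    """Classical GARCH(1,1): f_t = omega + alpha*y_{t-1}^2 + beta*f_{t-1}."""
    f = [f0]
    for y_prev in y[:-1]:
        f.append(omega + alpha * y_prev ** 2 + beta * f[-1])
    return f


# Illustrative inputs only: both recursions give the same variance path.
returns = [0.5, -1.2, 0.3, 2.0, -0.7]
path_score = score_driven_variance_filter(returns, omega=0.05, a1=0.1, b1=0.95, f0=1.0)
path_garch = garch_filter(returns, omega=0.05, alpha=0.1, beta=0.85, f0=1.0)
print(max(abs(u - w) for u, w in zip(path_score, path_garch)))
```

The sketch only covers the limiting Gaussian case; the Student’s t case replaces the quadratic update by a bounded function of the unexpected return.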

3.2 Two-component Beta-t-QVAR-M-lev

The model for the log-return \(y_{t}= \ln (p_{t}/p_{t-1})\), where pre-sample data define the initial price \(p_{0}\), is formulated as follows:

$$\begin{aligned} y_{t}= \mu _{t}+v_{t}= \mu _{t}+\exp (\lambda _{t})\epsilon _{t} \end{aligned}$$
(6)

for \(t=1,\ldots ,T\), where \(\mu _{t}\) is the expected return, \(v_{t}\) is the unexpected return, and \(\exp (\lambda _{t})\equiv \exp (\omega +\lambda ^{\dagger }_{t})\) is a score-driven scale parameter with time-invariant parameter \(\omega \) and filter \(\lambda ^{\dagger }_{t}\), for which \(E(\lambda _{t})=\omega \). The error term \(\epsilon _{t}\sim t(\nu )\) is i.i.d. Student’s t-distributed with degrees of freedom \(2<\nu <\infty \). The restriction \(\nu >2\) ensures that the conditional variance of \(y_{t}\) is finite, so that the conditional volatility exists.

For the specification of expected return \(\mu _{t}\), we extend the expected return specification of the Beta-t-EGARCH-M model in Harvey and Lange (2018), i.e. \(\mu _{t}=c+\beta _{2} \exp (\lambda _{t})\), and we use the following expected return specification for the two-component Beta-t-QVAR-M-lev:

$$\begin{aligned} \mu _{t}=c+\beta _{1}\mu ^{\dagger }_{t}+\beta _{2} \exp (\lambda _{t}) =c+\beta _{1}\mu ^{\dagger }_{t}+\beta _{2} \exp (\omega +\lambda ^{\dagger }_{t}) \end{aligned}$$
(7)

In addition, for the specification of \(\mu _{t}^{\dagger }\) and \(\lambda _{t}^{\dagger }\), we extend the one-component Beta-t-QVAR-M-lev model in Blazsek et al. (2022) into the following two-component score-driven filter:

$$\begin{aligned} \left[ \begin{array}{l} \mu _{t}^{\dagger }\\ \lambda _{t}^{\dagger } \end{array} \right] = \left[ \begin{array}{l} \mu _{1,t}^{\dagger }\\ \lambda _{1,t}^{\dagger } \end{array} \right] + \left[ \begin{array}{l} \mu _{2,t}^{\dagger }\\ \lambda _{2,t}^{\dagger } \end{array} \right] \end{aligned}$$
(8)

with a short- and a long-run component, respectively. For those components of the bivariate score-driven filter, we have the following: \(E(\mu ^{\dagger }_{1,t})=E(\mu ^{\dagger }_{2,t})=E(\lambda ^{\dagger }_{1,t})=E(\lambda ^{\dagger }_{2,t})=0\). Hence, \(E(\mu ^{\dagger }_{t})=E(\lambda ^{\dagger }_{t})=0\).

The short-run component is expressed as follows:

$$\begin{aligned} \left[ \begin{array}{c} \mu ^{\dagger }_{1,t}\\ \lambda ^{\dagger }_{1,t} \end{array} \right]= & {} \left[ \begin{array}{cc} \phi _{1,11}&{}\phi _{1,12}\\ \phi _{1,21}&{}\phi _{1,22}\\ \end{array} \right] \left[ \begin{array}{c} \mu ^{\dagger }_{1,t-1}\\ \lambda ^{\dagger }_{1,t-1} \end{array} \right] \nonumber \\{} & {} + \left[ \begin{array}{cc} \psi _{1,11}&{}\psi _{1,12}\\ \psi _{1,21}&{}\psi _{1,22}\\ \end{array} \right] \left[ \begin{array}{c} s_{\mu ,t-1}\\ s_{\lambda ,t-1} \end{array} \right] + \psi ^{*} \left[ \begin{array}{c} 0\\ \text {sgn}(-\epsilon _{t-1})(s_{\lambda ,t-1}+1) \end{array} \right] \end{aligned}$$
(9)

where the updating terms \(s_{\mu ,t}\) and \(s_{\lambda ,t}\) are defined in the following section, \(\psi ^{*} \in \mathrm{I\!R}\) measures leverage effects, and \(\text {sgn}(\cdot )\) is the signum function. It is worth noting that \(\text {sgn}(x)=-1\) for \(x<0\); \(\text {sgn}(x)=0\) for \(x=0\); \(\text {sgn}(x)=1\) for \(x>0\), and the leverage effects formulation follows Harvey (2013). The use of leverage effects in volatility equations is motivated by several works. As an example, we refer to Hansen and Lunde (2005), in which it is shown that among 330 GARCH-type models the most accurate volatility models include leverage effects.

We include a leverage effects term in the less persistent filter, following Engle and Lee (1999) as well as Harvey and Lange (2018). In matrix form, Eq. (9) is formulated as follows: \(\theta _{1,t}=\Phi _{1}\theta _{1,t-1}+g(s_{t-1})\) where \(\theta _{1,t}=(\mu ^{\dagger }_{1,t},\lambda ^{\dagger }_{1,t})'\), and \(g(s_{t})\equiv g[(s_{\mu ,t},s_{\lambda ,t})']\) represents the second and third terms on the right side of Eq. (9). Moreover, we use \(\theta _{1,1}=E(\theta _{1,t})=0_{2\times 1}\) for initialization, assuming that the maximum modulus of the eigenvalues of \(\Phi _{1}\) is less than one.

The long-run component is expressed as follows:

$$\begin{aligned} \left[ \begin{array}{c} \mu ^{\dagger }_{2,t}\\ \lambda ^{\dagger }_{2,t} \end{array} \right] = \left[ \begin{array}{cc} \phi _{2,11}&{}\phi _{2,12}\\ \phi _{2,21}&{}\phi _{2,22}\\ \end{array} \right] \left[ \begin{array}{c} \mu ^{\dagger }_{2,t-1}\\ \lambda ^{\dagger }_{2,t-1} \end{array} \right] + \left[ \begin{array}{cc} \psi _{2,11}&{}\psi _{2,12}\\ \psi _{2,21}&{}\psi _{2,22}\\ \end{array} \right] \left[ \begin{array}{c} s_{\mu ,t-1}\\ s_{\lambda ,t-1} \end{array} \right] \end{aligned}$$
(10)

In matrix form, Eq. (10) may be written as follows: \(\theta _{2,t}=\Phi _{2}\theta _{2,t-1}+\Psi _{2} s_{t-1}\), where \(\theta _{2,t}=(\mu ^{\dagger }_{2,t},\lambda ^{\dagger }_{2,t})'\), and \(s_{t}=(s_{\mu ,t},s_{\lambda ,t})'\). It is also assumed that the maximum modulus of the eigenvalues of \(\Phi _{2}\) is less than one. The filter \(\theta _{2,t}\) is initialized by its deterministic unconditional mean \(\theta _{2,1}=E(\theta _{2,t})=0_{2\times 1}\). We assume Beta-t-QVAR(1) specifications for both the short- and long-run components, which could be extended to the Beta-t-QVARMA(p,q).
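One step of the short- and long-run recursions in Eqs. (9) and (10), followed by the sum in Eq. (8), can be sketched as below. This is a minimal illustration in plain Python: the function names are hypothetical, the 2x2 matrices are passed as nested lists, and the scaled scores \(s_{\mu ,t-1}\), \(s_{\lambda ,t-1}\) and the standardized error \(\epsilon _{t-1}\) are treated as given inputs.

```python
def sgn(x):
    """Signum function: -1 for x < 0, 0 for x = 0, 1 for x > 0."""
    return (x > 0) - (x < 0)


def matvec(m, v):
    """Product of a 2x2 matrix (nested lists) and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def update_components(theta1, theta2, s, eps, Phi1, Psi1, psi_star, Phi2, Psi2):
    """Map (theta_{1,t-1}, theta_{2,t-1}) to (theta_{1,t}, theta_{2,t}, theta_t).

    theta1, theta2: [mu_dagger, lambda_dagger] of the short- and long-run component.
    s: (s_mu, s_lambda) at t-1; eps: standardized error at t-1.
    """
    s_lam = s[1]
    ar1, up1 = matvec(Phi1, theta1), matvec(Psi1, s)
    # Leverage term of Eq. (9): it enters the log-scale equation only.
    lev = psi_star * sgn(-eps) * (s_lam + 1.0)
    theta1_next = [ar1[0] + up1[0], ar1[1] + up1[1] + lev]
    ar2, up2 = matvec(Phi2, theta2), matvec(Psi2, s)
    theta2_next = [ar2[0] + up2[0], ar2[1] + up2[1]]  # Eq. (10), no leverage
    theta_next = [theta1_next[0] + theta2_next[0],    # Eq. (8)
                  theta1_next[1] + theta2_next[1]]
    return theta1_next, theta2_next, theta_next
```

Because \(s_{\lambda ,t}+1\ge 0\), the leverage term raises the short-run log-scale after a negative standardized error when \(\psi ^{*}>0\) and lowers it after a positive one.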

We define the information set for period t as follows: \({\mathcal {F}}_{t-1}=(\mu ^{\dagger }_{1},\lambda ^{\dagger }_{1},y_{1},\ldots ,y_{t-1})\). The conditional volatility for period t is finite, being formulated as follows: \(\text {SD}_{t}(y_{t}|{\mathcal {F}}_{t-1})=\sigma _{t}=\exp (\lambda _{t})[\nu /(\nu -2)]^{1/2}\). Furthermore, the log conditional density of \(y_{t}|{\mathcal {F}}_{t-1}\) for the Student’s t-distribution is formulated as follows:

$$\begin{aligned}{} & {} \ln f(y_{t}|{\mathcal {F}}_{t-1},\Theta )\nonumber \\{} & {} \quad = \ln \Gamma \left( \frac{\nu +1}{2}\right) - \ln \Gamma \left( \frac{\nu }{2}\right) - \frac{1}{2}\ln (\pi \nu )- \lambda _{t}- \frac{\nu +1}{2}\ln \left[ 1+\frac{(y_{t}-\mu _{t})^{2}}{\nu \exp (2\lambda _{t})} \right] \end{aligned}$$
(11)

where \(\Theta \) is a vector of time-invariant parameters.
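As a minimal sketch, the log conditional density in Eq. (11) and the conditional volatility \(\sigma _{t}=\exp (\lambda _{t})[\nu /(\nu -2)]^{1/2}\) can be evaluated with the Python standard library as follows. The function names are hypothetical, and \(\mu _{t}\), \(\lambda _{t}\) and \(\nu \) are taken as given.

```python
import math


def student_t_logpdf(y, mu, lam, nu):
    """Log conditional density of Eq. (11) with location mu and log-scale lam."""
    z2 = (y - mu) ** 2 / (nu * math.exp(2.0 * lam))
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(math.pi * nu) - lam
            - 0.5 * (nu + 1.0) * math.log1p(z2))


def conditional_sd(lam, nu):
    """Conditional volatility exp(lam) * sqrt(nu / (nu - 2)); requires nu > 2."""
    if nu <= 2.0:
        raise ValueError("nu must exceed 2 for a finite conditional variance")
    return math.exp(lam) * math.sqrt(nu / (nu - 2.0))


# Illustrative values only.
print(student_t_logpdf(y=0.01, mu=0.0, lam=math.log(0.01), nu=5.0))
print(conditional_sd(lam=math.log(0.01), nu=5.0))
```

Using `math.log1p` keeps the last term accurate when the standardized squared residual is small.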

For the Beta-t-QVAR(1)-M-lev model, the most recent information on return influences the expected return in two manners. Firstly, through \(\mu _{t}^{\dagger }\), which is updated by a linear transformation of \(s_{t}\). Secondly, through \(\exp (\omega +\lambda ^{\dagger }_{t})\), which is updated through a non-linear transformation of \(s_{t}\). The functional forms of those updates are different. In practice \(\beta _{1}\), which multiplies \(\mu _{t}^{\dagger }\) in the expected return, may or may not be significantly different from zero. If it is significant, then additional dynamics would be captured for the expected return by \(\mu _{t}^{\dagger }\). If it is not significant, then the expected return formulation in Harvey and Lange (2018) would be obtained.

By adopting \(\theta _{2,t}=0\), we get the one-component Beta-t-QVAR-M-lev in Blazsek et al. (2022). If we instead restrict \(\beta _{1}=\phi _{1,11}=\phi _{1,21}=\psi _{1,11}=\psi _{1,21}=\phi _{2,11}=\phi _{2,21}=\psi _{2,11}=\psi _{2,21}=0\), without imposing \(\theta _{2,t}=0\), then we get the two-component Beta-t-EGARCH-M-lev proposed in Harvey and Lange (2018). The further restrictions \(\theta _{2,t}=\beta _{2}=0\) and \(\theta _{2,t}=\beta _{2}=\psi ^{*}=0\) provide the one-component Beta-t-EGARCH-lev and Beta-t-EGARCH, respectively (Harvey and Chakravarty 2008; Harvey 2013).

3.3 Score functions of the two-component Beta-t-QVAR-M-lev

The scaled score function \(s_{\mu ,t}\) is the scaled conditional score of the log-likelihood (LL) with respect to \(\mu _{t}\), as follows:

$$\begin{aligned} \frac{\partial \ln f(y_{t}|{\mathcal {F}}_{t-1},\Theta )}{\partial \mu _{t}}= \frac{\nu +1}{\nu \exp (2\lambda _{t})}\times s_{\mu ,t} \end{aligned}$$
(12)

where the scaling parameter of score \(S_{t}\) is defined by the inverse information matrix, and

$$\begin{aligned} s_{\mu ,t}=\frac{\nu \exp (2\lambda _{t})(y_{t}-\mu _{t})}{\nu \exp (2\lambda _{t})+(y_{t}-\mu _{t})^{2}} =\frac{\nu \exp (\lambda _{t})\epsilon _{t}}{\nu +\epsilon _{t}^{2}} \end{aligned}$$
(13)

The score function \(s_{\lambda ,t}\) is the conditional score of the LL with respect to \(\lambda _{t}\), as follows:

$$\begin{aligned} s_{\lambda ,t}=\frac{\partial \ln f(y_{t}|{\mathcal {F}}_{t-1},\Theta )}{\partial \lambda _{t}} =\frac{(\nu +1)(y_{t}-\mu _{t})^{2}}{\nu \exp (2\lambda _{t})+(y_{t}-\mu _{t})^{2}}-1 =\frac{(\nu +1)\epsilon _{t}^{2}}{\nu +\epsilon _{t}^{2}}-1 \end{aligned}$$
(14)

where the scaling parameter of score \(S_{t}\) equals one (Harvey 2013). Both \(s_{\mu ,t}\) and \(s_{\lambda ,t}\) are martingale difference sequences, asymptotically at the true values of parameters \(\Theta _{0}\), with zero mean and finite variance. Moreover, \(s_{t}=(s_{1,t},s_{2,t})'\equiv (s_{\mu ,t},s_{\lambda ,t})'\) is white noise, asymptotically at \(\Theta _{0}\). These results extend the theoretical results for the one-component Beta-t-QVAR-M-lev model in Blazsek et al. (2022).
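The updating terms in Eqs. (13) and (14) can be sketched as follows. The function name is hypothetical, and the inputs \(y_{t}\), \(\mu _{t}\), \(\lambda _{t}\) and \(\nu \) are assumed to be given. The small example illustrates how an extreme observation is discounted: as \(|y_{t}-\mu _{t}|\) grows, \(s_{\mu ,t}\) tends to zero and \(s_{\lambda ,t}\) stays below \(\nu \).

```python
import math


def scaled_scores(y, mu, lam, nu):
    """Return (s_mu, s_lambda) from Eqs. (13) and (14)."""
    v = y - mu                                  # unexpected return
    denom = nu * math.exp(2.0 * lam) + v ** 2
    s_mu = nu * math.exp(2.0 * lam) * v / denom
    s_lam = (nu + 1.0) * v ** 2 / denom - 1.0
    return s_mu, s_lam


# Illustrative values only: a moderate and an extreme unexpected return.
for v in (1.0, 50.0):
    print(v, scaled_scores(y=v, mu=0.0, lam=0.0, nu=5.0))
```

This bounded response is what distinguishes the score-driven update from the quadratic update \(v_{t-1}^{2}\) of GARCH-type models.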

3.4 Statistical inference of the two-component Beta-t-QVAR-M-lev

The parameters of Beta-t-QVAR are estimated by using the ML method, as follows:

$$\begin{aligned} {\hat{\Theta }}=\arg \max _{\Theta \in {\tilde{\Theta }}} \text {LL}(y_{1},\ldots ,y_{T}|\Theta ) =\arg \max _{\Theta \in {\tilde{\Theta }}} \sum _{t=1}^{T} \ln f(y_{t}|{\mathcal {F}}_{t-1},\Theta ) \end{aligned}$$
(15)

where \(\Theta \) is the vector of time-invariant parameters, \({\hat{\Theta }}\) is the ML estimate, \({\tilde{\Theta }}\) is the parameter space, the \(\sigma \)-algebra is \({\mathcal {F}}_{t-1}=(\mu ^{\dagger }_{1},\lambda ^{\dagger }_{1},y_{1},\ldots ,y_{t-1})\), and the log conditional density is defined in Eq. (11).

We refer to the ML theory for the one-component Beta-t-QVAR-M-lev model (Blazsek et al. 2022). The results of that theory hold for the two-component Beta-t-QVAR-M-lev model, under the assumption of covariance stationarity of both short- and long-run components of the score-driven filter – i.e. the maximum moduli of the eigenvalues of \(\Phi _{1}\) and \(\Phi _{2}\) are less than 1.

Under such an assumption, the covariance stationarity in Blazsek et al. (2022, proposition 1b) may be straightforwardly extended to the short- and long-run filters of the two-component Beta-t-QVAR-M-lev. Therefore, asymptotically at the true values of parameters \(\Theta _{0}\), both score-driven filters are covariance stationary. Moreover, the exponentially fast almost sure (e.a.s.) convergence of the score-driven filter to a unique stationary and ergodic solution in Blazsek et al. (2022, proposition 2b) may also be extended to the short- and long-run filters of the two-component Beta-t-QVAR-M-lev.

Thus, the short- and long-run components of the score-driven filter converge e.a.s. to unique strictly stationary and ergodic vector sequences for all \(\Theta \in {\tilde{\Theta }}\). Lastly, under the same assumption on \(\Phi _{1}\) and \(\Phi _{2}\) in Blazsek et al. (2022, propositions 3–7), the consistency and asymptotic normality of the ML estimates of \(\Theta \) hold for the two-component Beta-t-QVAR-M-lev model.
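The objective in Eq. (15) can be evaluated by running the filter of Eqs. (6)–(10) through the sample and accumulating the log densities of Eq. (11). The following is a minimal sketch under stated assumptions and not the authors' implementation: the function names and the parameter dictionary layout are hypothetical, both components are initialized at zero, and no stationarity or parameter-space checks are performed.

```python
import math


def sgn(x):
    return (x > 0) - (x < 0)


def dot2(row, vec):
    return row[0] * vec[0] + row[1] * vec[1]


def neg_log_likelihood(y, p):
    """Return (-LL, sigma) for returns y and a hypothetical parameter dict p.

    p has keys c, beta1, beta2, omega, nu, psi_star and the 2x2 nested lists
    Phi1, Psi1, Phi2, Psi2. sigma holds the conditional volatilities.
    """
    nu = p["nu"]
    const = (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
             - 0.5 * math.log(math.pi * nu))
    th1, th2 = [0.0, 0.0], [0.0, 0.0]   # short- and long-run components at t = 1
    ll, sigma = 0.0, []
    for y_t in y:
        lam = p["omega"] + th1[1] + th2[1]
        scale = math.exp(lam)
        mu = p["c"] + p["beta1"] * (th1[0] + th2[0]) + p["beta2"] * scale  # Eq. (7)
        eps = (y_t - mu) / scale                                           # Eq. (6)
        ll += const - lam - 0.5 * (nu + 1.0) * math.log1p(eps ** 2 / nu)   # Eq. (11)
        sigma.append(scale * math.sqrt(nu / (nu - 2.0)))
        s = (nu * scale * eps / (nu + eps ** 2),                 # Eq. (13)
             (nu + 1.0) * eps ** 2 / (nu + eps ** 2) - 1.0)      # Eq. (14)
        lev = p["psi_star"] * sgn(-eps) * (s[1] + 1.0)
        th1 = [dot2(p["Phi1"][0], th1) + dot2(p["Psi1"][0], s),
               dot2(p["Phi1"][1], th1) + dot2(p["Psi1"][1], s) + lev]      # Eq. (9)
        th2 = [dot2(p["Phi2"][0], th2) + dot2(p["Psi2"][0], s),
               dot2(p["Phi2"][1], th2) + dot2(p["Psi2"][1], s)]            # Eq. (10)
    return -ll, sigma


# Arbitrary illustrative parameter values, not estimates from the paper.
params = {"c": 0.0, "beta1": 0.1, "beta2": 0.05, "omega": -4.5, "nu": 6.0,
          "psi_star": 0.03,
          "Phi1": [[0.5, 0.0], [0.0, 0.8]], "Psi1": [[0.05, 0.0], [0.0, 0.05]],
          "Phi2": [[0.9, 0.0], [0.0, 0.98]], "Psi2": [[0.01, 0.0], [0.0, 0.02]]}
nll, sig = neg_log_likelihood([0.004, -0.012, 0.007, -0.020, 0.001], params)
print(nll, sig[-1])
```

In practice, the returned negative log-likelihood would be passed to a numerical optimizer over the parameter space; the choice of optimizer and of starting values is left open here.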

3.5 The classical alternative: two-component GARCH-M-lev models

It is shown in Blazsek et al. (2022) that the forecasting performance of the one-component Beta-t-QVAR-M-lev model is superior to that of the one-component Beta-t-EGARCH-M-lev, Beta-t-EGARCH-M, Beta-t-EGARCH-lev, and Beta-t-EGARCH models. Drawing on these results, special cases of the Beta-t-QVAR-M-lev are not considered in the present work. Instead, the volatility forecasting performances of the one- and two-component Beta-t-QVAR-M-lev models as well as the one- and two-component GARCH-M-lev models are compared.

The two-component GARCH-M-lev model is specified as follows:

$$\begin{aligned} y_{t}=\mu _{t}+v_{t}=\mu _{t}+\lambda _{t}^{1/2}\epsilon _{t} \end{aligned}$$
(16)

where \(\mu _{t}\) and \(v_{t}\) are the expected and unexpected returns, respectively, \(\lambda _{t}\) is a filter which influences volatility, and for the standardized error term we consider two alternatives: \(\epsilon _{t}\sim N(0,1)\) i.i.d. and \(\epsilon _{t}\sim t(\nu )\) with degrees of freedom \(2<\nu <\infty \). These alternatives define the two-component Gaussian-GARCH-M-lev and two-component t-GARCH-M-lev models, respectively, for which the conditional volatilities are given by \(\sigma _{t}=\lambda _{t}^{1/2}\) and \(\sigma _{t}=[\lambda _{t}\nu /(\nu -2)]^{1/2}\), respectively.

The expected return is specified according to the GARCH-M model, as follows:

$$\begin{aligned} \mu _{t}=c+\beta _{2}\lambda _{t} \end{aligned}$$
(17)

The volatility filter is the sum of a short- and long-run component, as detailed below:

$$\begin{aligned} \lambda _{t}=\lambda _{1,t}+\lambda _{2,t} \end{aligned}$$
(18)

where the short-run component with leverage effects is expressed as follows:

$$\begin{aligned} \lambda _{1,t}=\omega +\phi _{1}\lambda _{1,t-1}+\psi _{1} v_{t-1}^{2}+\psi ^{*} v_{t-1}^{2} \mathbbm {1}(v_{t-1}<0) \end{aligned}$$
(19)

in which \(\mathbbm {1}(\cdot )\) is the indicator function, and the long-run component consists of the following:

$$\begin{aligned} \lambda _{2,t}=\phi _{2}\lambda _{2,t-1}+\psi _{2} v_{t-1}^{2} \end{aligned}$$
(20)

The \(\lambda _{2,t}=0\) restriction provides the one-component GARCH-M-lev model. All GARCH-M-lev model specifications are estimated through the ML method.
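The recursions in Eqs. (16)–(20) can be sketched as a simple filter. The function name and the initial values `lam1_0` and `lam2_0` are hypothetical, since the initialization is not specified here. Passing `nu=None` gives the Gaussian conditional volatility \(\lambda _{t}^{1/2}\); passing a value \(\nu >2\) gives \([\lambda _{t}\nu /(\nu -2)]^{1/2}\).

```python
import math


def garch_m_lev_volatility(y, c, beta2, omega, phi1, psi1, psi_star,
                           phi2, psi2, lam1_0, lam2_0, nu=None):
    """Conditional volatilities of the two-component GARCH-M-lev model.

    lam1_0, lam2_0: assumed initial short- and long-run components.
    nu: None for Gaussian errors, or degrees of freedom (> 2) for t errors.
    """
    lam1, lam2 = lam1_0, lam2_0
    scale = 1.0 if nu is None else nu / (nu - 2.0)
    sigma = []
    for y_t in y:
        lam = lam1 + lam2                  # Eq. (18)
        mu = c + beta2 * lam               # Eq. (17)
        v = y_t - mu                       # unexpected return, Eq. (16)
        sigma.append(math.sqrt(lam * scale))
        neg = 1.0 if v < 0 else 0.0        # indicator of a negative shock
        lam1 = omega + phi1 * lam1 + psi1 * v ** 2 + psi_star * v ** 2 * neg  # Eq. (19)
        lam2 = phi2 * lam2 + psi2 * v ** 2                                    # Eq. (20)
    return sigma


# Arbitrary illustrative values; setting phi2 = psi2 = lam2_0 = 0 gives one component.
print(garch_m_lev_volatility([0.01, -0.02, 0.005], c=0.0, beta2=0.0, omega=1e-6,
                             phi1=0.6, psi1=0.05, psi_star=0.1, phi2=0.98,
                             psi2=0.02, lam1_0=5e-5, lam2_0=5e-5))
```

In contrast to the score-driven update, the squared unexpected return enters these recursions without any downweighting, so a single extreme observation moves the filter in proportion to its square.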

4 Data and results

In this section, we detail the stock market return data, report descriptive statistics and present out-of-sample volatility forecasting results for the G20 indices.

4.1 Data

We explore daily stock market returns of G20 countries for the period from 2000 to 2022, collected from Bloomberg. The stock index data \(p_{t}\) are transformed into daily log-returns \(y_{t}\), which are computed by using the opening and closing prices of each trading day. We use the same sample period for all countries. The dataset includes 20 stock indices, as follows: MERVAL Index (Argentina), S&P/ASX 300 (Australia), Ibovespa (Brazil), S&P/TSX Composite Index (Canada), Shanghai Stock Exchange Composite Index (China), CAC 40 (France), DAX (Germany), S&P BSE SENSEX Index (India), Jakarta Stock Exchange Composite Index (Indonesia), FTSE MIB Index (Italy), Nikkei 225 (Japan), S&P/BMV IPC (Mexico), MOEX Russia Index (Russia), Tadawul All Share Index (Saudi Arabia), FTSE/JSE Africa All Share Index (South Africa), Korea Stock Exchange KOSPI Index (South Korea), IBEX 35 Index (Spain), Borsa Istanbul 100 Index (Turkey), FTSE 100 Index (UK) and S&P 500 Index (USA).

In addition, motivated by the work in Liu et al. (2015), we explore the 5-min realized volatility as a proxy of true volatility. This dataset was collected from the Oxford-Man Institute of Quantitative Finance (OMI). We present the data availability of stock index and realized volatility in Table 1 (Appendix A), which shows that realized volatility is not available for seven countries. Therefore, we perform volatility forecasting analysis considering the following 13 countries: Brazil, Canada, China, France, Germany, India, Italy, Japan, Mexico, South Korea, Spain, the UK and the USA.

The descriptive statistics of \(y_{t}\) for these 13 countries are reported in Table 2 (Appendix A). Most of the series have a mean around zero, negative skewness and excess kurtosis. The skewness and excess kurtosis suggest that none of the series follows a Gaussian distribution. The correlation between the absolute return in period t and the return in period \(t-1\) is negative, which is consistent with leverage effects.

4.2 Volatility forecasting results

One-step-ahead volatility forecasts \(\sigma _{t}\) are computed for the forecasting window, which includes the last 2500 trading days of the full sample period. The start and end dates of the forecasting window are reported in Table 1. As a proxy of true volatility \(\sigma _{t}^{*}\), the square root of the 5-min realized variance is adopted. Forecasts are compared through the Giacomini–White test, considering the following loss functions:

$$\begin{aligned} \begin{array}{ll} \text {MSE}_{1,i,t}=(\sigma _{t}^{*}-\sigma _{i,t})^{2}&{} \text {MSE}_{2,i,t}=[(\sigma _{t}^{*})^{2}-\sigma _{i,t}^{2}]^{2}\\ \text {QLIKE}_{i,t}= \left\{ \frac{(\sigma _{t}^{*})^{2}}{\sigma _{i,t}^{2}}-\ln \left[ \frac{(\sigma _{t}^{*})^{2}}{\sigma _{i,t}^{2}}\right] -1\right\} &{} \text {R}^{2}\text {LOG}_{i,t}= \left\{ \ln \left[ \frac{(\sigma _{t}^{*})^{2}}{\sigma _{i,t}^{2}}\right] \right\} ^{2}\\ \text {MAE}_{1,i,t}=|\sigma _{t}^{*}-\sigma _{i,t}|&{} \text {MAE}_{2,i,t}=|(\sigma _{t}^{*})^{2}-\sigma _{i,t}^{2}|\\ \end{array} \end{aligned}$$
(21)

for model \(i\) and for each period of the forecasting window \(t=1,\ldots ,T_{f}\) (Hansen and Lunde 2005; Patton 2011). The null hypothesis of the Giacomini–White test is that two models have equal out-of-sample forecasting accuracy, where parameters are estimated using a rolling-window estimation and forecasting approach. For each rolling window, we check the conditions of consistency and asymptotic normality of the ML estimator. Due to the large number of rolling windows and stock indices included in our sample, we do not report those empirical estimates, although they are available from the authors upon request.
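The six loss functions of Eq. (21) can be sketched as follows for a single proxy and forecast pair, together with their average over the forecasting window. The function names `volatility_losses` and `mean_losses` are hypothetical and only serve this illustration; both inputs are assumed to be strictly positive volatilities (not variances).

```python
import math


def volatility_losses(sigma_true, sigma_fc):
    """Per-period losses of Eq. (21) for proxy sigma_true and forecast sigma_fc."""
    var_true, var_fc = sigma_true ** 2, sigma_fc ** 2
    ratio = var_true / var_fc  # (sigma*_t)^2 / sigma_{i,t}^2
    return {
        "MSE1": (sigma_true - sigma_fc) ** 2,
        "MSE2": (var_true - var_fc) ** 2,
        "QLIKE": ratio - math.log(ratio) - 1.0,
        "R2LOG": math.log(ratio) ** 2,
        "MAE1": abs(sigma_true - sigma_fc),
        "MAE2": abs(var_true - var_fc),
    }


def mean_losses(proxy, forecasts):
    """Average each loss over the forecasting window t = 1, ..., T_f."""
    per_period = [volatility_losses(s, f) for s, f in zip(proxy, forecasts)]
    return {
        name: sum(loss[name] for loss in per_period) / len(per_period)
        for name in per_period[0]
    }
```

All six losses are zero when the forecast equals the proxy and positive otherwise, so lower mean losses indicate better forecasts.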

In Table 3 (Appendix A), we present the mean loss functions for the forecasting window for each stock market index, considering the one- and two-component Beta-t-QVAR-M-lev models, one- and two-component Gaussian-GARCH-M-lev, and t-GARCH-M-lev models. We also report the statistical significance of the Giacomini–White test statistic, for which the benchmark model is the two-component Beta-t-QVAR-M-lev model – highlighted in bold in Table 3.
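To make the pairwise comparison concrete, the sketch below computes a Diebold–Mariano-type t-statistic on the loss differential between two models, which is the form taken by the unconditional version of the Giacomini–White test. It is an illustration under stated assumptions rather than the procedure behind Table 3: the function name `gw_unconditional`, the Bartlett-kernel long-run variance and the `lags` truncation parameter are hypothetical choices, and the text does not state which version of the test or which variance estimator is used.

```python
import math


def gw_unconditional(loss_a, loss_b, lags=0):
    """t-statistic and two-sided p-value for H0: E[loss_a - loss_b] = 0.

    loss_a, loss_b: per-period losses of two models over the same window.
    lags: truncation lag of the Bartlett-kernel variance (assumed setting).
    """
    d = [a - b for a, b in zip(loss_a, loss_b)]  # loss differential
    n = len(d)
    d_bar = sum(d) / n
    e = [v - d_bar for v in d]
    # Long-run variance: lag-0 term plus Bartlett-weighted autocovariances
    lrv = sum(v * v for v in e) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        gamma_k = sum(e[t] * e[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma_k
    t_stat = d_bar / math.sqrt(lrv / n)
    # Two-sided p-value from the standard normal limit distribution
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    return t_stat, p_value
```

With the two-component Beta-t-QVAR-M-lev model as `loss_b`, a significantly positive statistic would indicate that the alternative model in `loss_a` has a larger mean loss than the benchmark.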

For all countries, we obtain the following three results. Firstly, the forecasting results of the one-component models confirm the findings of Blazsek et al. (2022) for all stock market indices. The loss function estimates indicate that the forecasting performance of the one-component Beta-t-QVAR-M-lev is superior to that of the one-component Gaussian-GARCH-M-lev and t-GARCH-M-lev models. We do not report the corresponding Giacomini–White test results in Table 3, although they are available from the authors upon request.

Secondly, the Giacomini–White test results of Table 3 show that the volatility forecasting performance of the two-component Beta-t-QVAR-M-lev model is superior to that of all alternative volatility models considered in the present work. This suggests that, for all stock indices, the two-component model improves upon the volatility forecasting performance of the one-component Beta-t-QVAR-M-lev model. Thirdly, Table 3 shows that adopting two volatility components instead of one improves the volatility forecasting of the Beta-t-QVAR-M-lev model for all stock market indices, although the opposite is true for the Gaussian-GARCH-M-lev and t-GARCH-M-lev models.

For the GARCH-M-lev models, we find that the one-component models have superior forecasting performance. This finding is specific to the G20 stock market index data explored in the present work. It indicates that two-component score-driven volatility models may outperform two-component GARCH-type volatility models, possibly because the parameter constraints of the two-component Beta-t-QVAR-M-lev model are easier to handle than the non-negativity constraints of the two-component GARCH model. For further details, we refer to Engle and Lee (1999) and Harvey (2013).

5 Conclusion

In the present paper, we propose the new two-component Beta-t-QVAR-M-lev model. The one-component Beta-t-QVAR-M, and the one- and two-component Beta-t-EGARCH-M models are special cases of the two-component Beta-t-QVAR-M-lev model. We extend the US stock market dataset of Blazsek et al. (2022) to 13 stock indices of the G20 countries. This extension shows that the volatility forecasting superiority of the one-component Beta-t-QVAR-M also holds for other stock market indices. The results also indicate that the two-component Beta-t-QVAR-M-lev improves the forecasting performance of the one-component Beta-t-QVAR-M for all stock indices.

The full dataset includes 20 stock market indices of G20 countries, from January 2000 to April 2022. Regarding the volatility forecasting accuracy, we use the 5-min realized volatility as a proxy of true volatility. For the comparison of volatility forecasting accuracy, we adopt the Giacomini–White test of forecasting accuracy for 2500 rolling windows. The realized volatility data are available for 13 stock market indices of the G20. Our full sample period includes several low- and high-volatility regimes. For all countries, we study volatility forecasting accuracy for the last 2500 trading days of the full sample, from 2012 to 2022.

In terms of classical models of conditional volatility, we consider the one- and two-component Gaussian-GARCH-M-lev and t-GARCH-M-lev models. We compare the forecasting performances of those models with the one- and two-component Beta-t-QVAR-M-lev models for all stock market indices. The out-of-sample volatility forecasts for the one-component models confirm the findings in Blazsek et al. (2022) for all 13 stock market indices explored in this paper. The forecasting performance of the one-component Beta-t-QVAR-M-lev is superior to that of the one-component Gaussian-GARCH-M-lev and t-GARCH-M-lev models for all countries.

In addition, the Giacomini–White test results reveal that the volatility forecasting performance of the two-component Beta-t-QVAR-M-lev model is superior to that of all competing alternatives. Therefore, the two-component model improves upon the volatility forecasting performance of the one-component Beta-t-QVAR-M-lev model from the existing literature. Moreover, we find that considering two components instead of one improves volatility forecasting for the Beta-t-QVAR-M-lev model, while the opposite holds for the Gaussian-GARCH-M-lev and t-GARCH-M-lev models. This particular finding indicates that two-component score-driven volatility models may forecast volatility better than two-component GARCH models.