1 Introduction

Volatility forecasting plays a critical role in derivatives valuation, portfolio management, and risk measurement, and it has attracted extensive research aimed at improving the forecasting performance of time series volatility models (Barunik et al., 2016; Ma et al., 2019). The development of big data technology and artificial intelligence has significantly changed how econometric volatility estimation models are developed (Papanagnou & Matthews-Amune, 2018; Zhu et al., 2023). Advances in financial storage technologies enable investors and quantitative traders to make effective use of all available trading information, such as highest, lowest, and closing prices, for risk management or arbitrage purposes (Treleaven et al., 2013; Nuti et al., 2011). This calls for improving on the previous paradigm, which relies solely on closing prices or other point-valued trading data for risk management. This study proposes a new Generalized Autoregressive Conditional Heteroskedasticity (GARCH)-type volatility forecasting model based on random set theory and sets-valued time series, namely, the Set-GARCH model.

In the context of estimating volatility, GARCH models remain the most popular tools. Both univariate and multivariate GARCH models employ point-valued data, i.e., each observation of the price time series is a point, most notably the closing price-based return (Hansen & Lunde, 2005). However, with the development of big data in finance, the structure of asset price data has changed (Treleaven et al., 2013; Nuti et al., 2011). All trading information during day t informs investors’ decision-making, i.e., investors may short (long) assets at any price during the trading day rather than focusing only on the closing price. Conditional autoregressive range (CARR) group models reveal the relationship between return volatility and the highest and lowest prices (Chou, 2005; Parkinson, 1980). This diverse set of price information facilitates volatility forecasting in the context of big data (Lyócsa et al., 2021; Molnár, 2012).

Non-point-valued data, particularly interval-valued price characteristics and interval-valued price forecasting, have been studied extensively in recent years (Buansing et al., 2020; Joshi & Kumar, 2016; Maia & de Carvalho, 2011). Interval-valued data may provide more information than the point-valued data used by traditional GARCH or CARR group models and could be used to forecast price volatility. Interval-valued models, such as auto-regressive conditional interval (ACI) group models and the GARCH model with interval-valued variables (Int-GARCH) (Han et al., 2016; He et al., 2021; Sun et al., 2018, 2020; Yang et al., 2016), could explain the evolution of an interval-valued price proxy. The use of interval-valued variables and the incorporation of random set theory distinguish these models from the standard multivariate time series model.Footnote 1

However, the interval-valued variables are driven only by information regarding the highest and lowest prices. Could we use a sets-valued variable to represent prices and construct a sets-valued time series volatility model by incorporating additional price information (such as the closing price)? The interval-valued variable contains all possible price points with “equal” weighting, but do these points actually weigh equally? Could we use sets-valued information to achieve our desired point values? Motivated by these considerations, we intend to develop a sets-valued time series model to describe return dynamics and, from this, to forecast volatility.

The key element of our methodology is using random fuzzy sets to characterize the stochastic process of returns and calculate volatility. Numerous studies have characterized prices (or returns) with fuzzy sets-valued data (Atsalakis et al., 2019; Ezbakhe & Pérez-Foguet, 2021; Nowak & Romaniuk, 2010), and it has been proven that fuzzy sets-valued data contain more information than interval-valued data.Footnote 2 Not only does using fuzzy sets-valued prices as a proxy for one day’s returns incorporate multiple types of price information, but it also highlights key information with a continuum of membership grades (Hocine et al., 2020; Jones et al., 2022). However, forecasting volatility with fuzzy sets-valued data is still underdeveloped due to its complexity. Given that returns are stochastic, fuzzy sets-valued data must be transformed into random fuzzy sets-valued time series, followed by evolution equations describing the time-varying pattern of volatility. Some studies have combined the fuzzy set concept with GARCH models, but in their models, prices are still point-valued or are not treated as random fuzzy sets-valued variables (D’Urso et al., 2016).

In this vein, we construct fuzzy sets-valued prices in a stochastic manner and propose a novel Set-GARCH model to address the aforementioned issues in volatility forecasting. We incorporate additional price information into a set and predict the volatility of returns by modelling the time series of sets-valued variables. The highest log-price \(H_t\), lowest log-price \(L_t\), and closing log-price \(C_t\) of a day are integrated into a set to form a sets-valued stochastic variable, that is, \(\tilde{{\varvec{P}}}_t=\{H_t,L_t,C_t\}\). In general, the Set-GARCH model has a volatility equation structure similar to that of the GARCH model. Moreover, the addition/multiplication operations and the distance measurement of the Set-GARCH model are performed in the random fuzzy set space (Körner & Näther, 2002; Li et al., 2013; Sun et al., 2020; Wang et al., 2016). In the volatility forecasting exercises of this study, the Set-GARCH model is adaptable to different derivative family models and demonstrates distinctive advantages.

The main contributions of this paper are two-fold. First, we establish a theoretical framework for modelling dynamic volatility using random fuzzy sets-valued returns data. We propose the Set-GARCH model based on the characteristics of the data structure and sets-valued operation rules. The Set-GARCH model extends the ACI model (Sun et al., 2018), the Int-GARCH model (Sun et al., 2020), and other interval-valued time series models (Wu et al., 2023; González-Rivera & Lin, 2013; Gonzalez-Rivera et al., 2020) from interval-valued data to sets-valued data. Our Set-GARCH model is, to the best of our knowledge, the first model to describe the dynamic volatility of sets-valued returns time series.

Second, we address the specification limits and extended specification space in the Set-GARCH model and thus propose the Set-GARCH-LR model as a variation. We apply the proposed models to crude oil, gold, and the S&P 500 index, which are representative of the market, using daily, weekly, and monthly trading data, respectively. In-sample and out-of-sample volatility forecasting demonstrate that the Set-GARCH model and Set-GARCH-LR model outperform conventional GARCH-type, CARR-type, ACI, and Int-GARCH models.

The rest of the paper is organized as follows. In Sect. 2, the definition of fuzzy sets-valued returns is provided. The specifications of our proposed Set-GARCH models are provided in Sect. 3. Section 4 presents an application to empirical data. The paper concludes with Sect. 5. “Appendix A” illustrates the benchmark volatility models used to validate the superiority of our approach by separating the point-valued and interval-valued cases. “Appendix B” contains an empirical example of a relevant technical point of the methodological section.

2 Construction of fuzzy sets-valued returns

2.1 Preliminaries of the fuzzy sets-valued random variable

We begin by defining returns in the sets-valued variable space using random set theory (Li & Guan, 2007; Li et al., 2013).

2.1.1 Sets-valued variable

Let \({\varvec{P}}_0({\mathbb {R}}^d)\) be the family of all non-empty subsets of \({\mathbb {R}}^d\). For any \(\tilde{{\varvec{A}}}\in {\varvec{P}}_0({\mathbb {R}}^d)\), we first define the membership function \(m_{\tilde{{\varvec{A}}}}(x):{\mathbb {R}}^d\rightarrow \{0,1\}\) as

$$\begin{aligned} m_{\tilde{{\varvec{A}}}}(x)=\begin{aligned}\left\{ \begin{array}{rcl} 0&{},&{}x\notin \tilde{{\varvec{A}}}\\ 1&{},&{}x\in \tilde{{\varvec{A}}} \end{array} \right. \\ \end{aligned} \end{aligned}$$
(1)

Membership function \(m_{\tilde{{\varvec{A}}}}\) reflects whether x belongs to \(\tilde{{\varvec{A}}}\). For \(\tilde{{\varvec{A}}}, \tilde{{\varvec{B}}}\in {\varvec{P}}_0({\mathbb {R}}^d)\), we have the addition and scalar multiplication operation:

$$\begin{aligned} \tilde{{\varvec{A}}}\oplus \tilde{{\varvec{B}}}=\{a+b:a\in \tilde{{\varvec{A}}},b\in \tilde{{\varvec{B}}}\}\nonumber \\ \lambda \otimes \tilde{{\varvec{A}}}=\{\lambda a:a\in \tilde{{\varvec{A}}}\},\lambda \in {\mathbb {R}} \end{aligned}$$
(2)

Interval-valued subtraction comes in two forms, and the same issue arises for sets-valued subtraction. The sets-valued subtraction rule of the ACI model (named Type-A subtraction in this paper) treats subtraction as the inverse of the sets-valued addition operation, as shown in Eq. (3). By contrast, the sets-valued subtraction rule of Int-GARCH models (named Type-B subtraction in this paper) requires subtraction to adhere strictly to the basic arithmetic operations on sets, as shown in Eq. (4). “Appendix A” explains in detail the ACI and Int-GARCH models, which are two benchmark models, along with their respective subtraction rules.

Type-A Subtraction:

$$\begin{aligned} \tilde{{\varvec{A}}}\ominus _A\tilde{{\varvec{B}}}=\{x\in {\mathbb {R}}^d,x+\tilde{{\varvec{B}}}\subset \tilde{{\varvec{A}}}\} \end{aligned}$$
(3)

where \(x+\tilde{{\varvec{B}}}=\{y=x+b:b\in \tilde{{\varvec{B}}}\}\).

Type-B Subtraction:

$$\begin{aligned} \tilde{{\varvec{A}}}\ominus _B\tilde{{\varvec{B}}}=\tilde{{\varvec{A}}}\oplus (-1)\otimes \tilde{{\varvec{B}}} \end{aligned}$$
(4)

where \(\oplus \) and \(\otimes \) are the addition and scalar multiplication operations in Eq. (2). We pay close attention to the subtraction operation because the choice between the Type-A subtraction of the ACI model and the Type-B subtraction of the Int-GARCH model directly affects the structure of our sets-valued volatility model.
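The difference between the two subtraction rules is easiest to see for closed intervals, which are the simplest sets in \({\varvec{P}}_0({\mathbb {R}})\). The following is a minimal sketch under that assumption; the `Interval` type and all function names are illustrative and do not come from the paper. For intervals, Eq. (3) reduces to subtracting matching endpoints, while Eq. (4) subtracts opposite endpoints.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower, upper) with lower <= upper


def add(a: Interval, b: Interval) -> Interval:
    """Set addition of Eq. (2), specialised to closed intervals."""
    return (a[0] + b[0], a[1] + b[1])


def scale(lam: float, a: Interval) -> Interval:
    """Scalar multiplication of Eq. (2); a negative lam flips the endpoints."""
    lo, hi = lam * a[0], lam * a[1]
    return (min(lo, hi), max(lo, hi))


def sub_type_a(a: Interval, b: Interval) -> Interval:
    """Type-A subtraction of Eq. (3): all x such that x + B lies inside A."""
    lo, hi = a[0] - b[0], a[1] - b[1]
    if lo > hi:
        raise ValueError("B is wider than A, so no shift of B fits inside A")
    return (lo, hi)


def sub_type_b(a: Interval, b: Interval) -> Interval:
    """Type-B subtraction of Eq. (4): A plus (-1) times B."""
    return add(a, scale(-1.0, b))


A: Interval = (1.0, 3.0)
B: Interval = (0.5, 1.0)
print(sub_type_a(A, B))  # (0.5, 2.0)
print(sub_type_b(A, B))  # (0.0, 2.5)
print(sub_type_a(A, A))  # (0.0, 0.0)
print(sub_type_b(A, A))  # (-2.0, 2.0)
```

In this example Type-A subtraction undoes addition, since adding `B` back to `sub_type_a(A, B)` recovers `A`, whereas Type-B subtraction of a set from itself does not return the zero set.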

2.1.2 Fuzzy sets-valued variable

A fuzzy set \(\tilde{{\varvec{A}}}\) on \({\mathbb {R}}^d\) is identified by its membership function \(m_{\tilde{{\varvec{A}}}}(x):{\mathbb {R}}^d\rightarrow [0,1]\), where \(m_{\tilde{{\varvec{A}}}}(x)\) is interpreted as the degree of acceptance that \(x\in {\mathbb {R}}^d\) is a member of \(\tilde{{\varvec{A}}}\). Unlike the situation described in Eq. (1), whether x belongs to \(\tilde{{\varvec{A}}}\) is not a definite yes or no, but a matter of degree.

The crisp set

$$\begin{aligned} \tilde{{\varvec{A}}}_\alpha \doteq \{x\in {\mathbb {R}}^d:m_{\tilde{{\varvec{A}}}}(x)\ge \alpha \},\quad \alpha \in [0,1] \end{aligned}$$
(5)

is called the \(\alpha \)-cut of \(\tilde{{\varvec{A}}}\). For \(\alpha =0\), the support of \(\tilde{{\varvec{A}}}\) is defined as \(\tilde{{\varvec{A}}}_{\alpha =0}\doteq cl\{x\in {\mathbb {R}}^d:m_{\tilde{{\varvec{A}}}}(x)>0\}\doteq supp\tilde{{\varvec{A}}}\). For any two fuzzy sets \(\tilde{{\varvec{A}}}\) with membership function \(m^A(x)\) and \(\tilde{{\varvec{B}}}\) with membership function \(m^B(x)\), the addition operation is \(\tilde{{\varvec{A}}}\oplus \tilde{{\varvec{B}}}=\tilde{{\varvec{C}}}\) and the scalar multiplication operation is \(\lambda \otimes \tilde{{\varvec{A}}}=\tilde{{\varvec{D}}}\). Denoting the membership function of \(\tilde{{\varvec{C}}}\) by \(m^C(x)\) and that of \(\tilde{{\varvec{D}}}\) by \(m^D(x)\), we have

$$\begin{aligned} m^C(x)&=\sup \{\alpha \in [0,1]:x\in \tilde{{\varvec{A}}}_\alpha \oplus \tilde{{\varvec{B}}}_\alpha \} \nonumber \\ m^D(x)&=\begin{aligned}\left\{ \begin{array}{rcl} m^A(\frac{x}{\lambda })&{},&{}\lambda \ne 0\\ {\varvec{1}}(x=0)&{},&{}\lambda =0 \end{array} \right. \\ \end{aligned} \end{aligned}$$
(6)

Similar to the sets-valued variable case, given \(\tilde{{\varvec{A}}}\ominus \tilde{{\varvec{B}}}=\tilde{{\varvec{E}}}\), \(\tilde{{\varvec{E}}}\) has the membership function \(m^E(x)\). The subtraction \(\ominus \) of fuzzy sets-valued variables could also be defined in Type-A Subtraction like Eqs. (A13) and (3),

$$\begin{aligned} m^E(x)=&m^A(x)\ominus _Am^B(x) \nonumber \\ =&\sup \{\alpha \in [0,1]:x\in \tilde{{\varvec{A}}}_\alpha \ominus _A\tilde{{\varvec{B}}}_\alpha \},\quad x\in {\mathbb {R}}^d \end{aligned}$$
(7)

or Type-B subtraction like Eqs. (A15) and (4)

$$\begin{aligned} m^E(x_E)=m^A(x_A)\ominus _Bm^B(x_B)=\mathop {Sup}\limits _{x_A-x_B=x_E}Inf\{m^A(x_A),m^B(x_B)\} \end{aligned}$$
(8)

Fuzzy sets-valued Type-A subtraction is consistent with the idea of the ACI model. Fuzzy sets-valued Type-B subtraction is consistent with the idea of the Int-GARCH model and a classic fuzzy sets-valued subtraction rule (Zhü 2014).
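The sup-inf rule of Eq. (8) can be evaluated directly when the membership functions are tabulated on a finite grid. The sketch below is a simplified illustration under that assumption; the integer grid, the two example fuzzy sets, and the function name are hypothetical.

```python
from typing import Dict


def sub_type_b_fuzzy(m_a: Dict[int, float], m_b: Dict[int, float]) -> Dict[int, float]:
    """Type-B subtraction of Eq. (8) on a grid.

    For every pair (x_a, x_b) the difference x_e = x_a - x_b receives the
    grade min(m_a, m_b); the result keeps the largest grade per x_e.
    """
    m_e: Dict[int, float] = {}
    for x_a, g_a in m_a.items():
        for x_b, g_b in m_b.items():
            x_e = x_a - x_b
            m_e[x_e] = max(m_e.get(x_e, 0.0), min(g_a, g_b))
    return m_e


# Two triangular-shaped fuzzy sets tabulated on integers (illustrative values).
m_a = {0: 0.0, 1: 0.5, 2: 1.0, 3: 0.5, 4: 0.0}
m_b = {0: 0.5, 1: 1.0, 2: 0.5}

m_e = sub_type_b_fuzzy(m_a, m_b)
for x in sorted(m_e):
    print(x, m_e[x])
```

The grade of the difference peaks at the difference of the two modes (here 2 - 1 = 1), and its support spans every attainable difference, which is wider than the support of either operand.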

2.1.3 The distance of fuzzy sets-valued variable

We give the concept of the support function first. The support function of a sets-valued variable \(\tilde{{\varvec{M}}}\) is \(S_{\tilde{{\varvec{M}}}}(u)\doteq \mathop {Sup}\limits _{y\in \tilde{{\varvec{M}}}}<u,y>\), \(u\in {\mathbb {S}}^{d-1}\), where \(<,>\) is the scalar inner product and \({\mathbb {S}}^{d-1}\) is the unit sphere of \({\mathbb {R}}^d\). The support function of a fuzzy sets-valued variable \(\tilde{{\varvec{A}}}\) is \(S_{\tilde{{\varvec{A}}}}(u,\alpha )\doteq S_{\tilde{{\varvec{A}}}_\alpha }(u)\), \(\alpha \in (0,1]\), \(u\in {\mathbb {S}}^{d-1}\), where \(\tilde{{\varvec{A}}}_\alpha \) is the \(\alpha \)-cut of \(\tilde{{\varvec{A}}}\) in Eq. (5).

Using the concept of support function (Körner & Näther, 2002; Li et al., 2013), a popular \(\rho _2\) distance measure between fuzzy sets-valued variable \(\tilde{{\varvec{A}}}\) and \(\tilde{{\varvec{B}}}\) is

$$\begin{aligned} \rho _2(\tilde{{\varvec{A}}},\tilde{{\varvec{B}}}) =\int \limits _{[0,1]^2\times ({\mathbb {S}}^{d-1})^2}{((S_{\tilde{{\varvec{A}}}}(u,\alpha )-S_{\tilde{{\varvec{B}}}}(u,\alpha ))(S_{\tilde{{\varvec{A}}}}(v,\beta )-S_{\tilde{{\varvec{B}}}}(v,\beta )))dK(u,\alpha ,v,\beta )} \nonumber \\ \end{aligned}$$
(9)

where \(dK(u,\alpha ,v,\beta )\) is a kernel, and \(\int \limits _{[0,1]^2\times ({\mathbb {S}}^{d-1})^2}dK(u,\alpha ,v,\beta )=1\). Moreover, the inner product between fuzzy sets-valued variable \(\tilde{{\varvec{A}}}\) and \(\tilde{{\varvec{B}}}\) is

$$\begin{aligned} <\tilde{{\varvec{A}}},\tilde{{\varvec{B}}}>\doteq \int \limits _{[0,1]^2\times ({\mathbb {S}}^{d-1})^2}S_{\tilde{{\varvec{A}}}}(u,\alpha )S_{\tilde{{\varvec{B}}}}(v,\beta )dK(u,\alpha ,v,\beta ) \end{aligned}$$
(10)

The expectation of any fuzzy random set \(\tilde{{\varvec{X}}}\), denoted by \({\mathbb {E}}(\tilde{{\varvec{X}}})\), is also a fuzzy sets-valued variable such that, for every \(\alpha \in [0,1]\),

$$\begin{aligned} ({\mathbb {E}}(\tilde{{\varvec{X}}}))_\alpha =cl\{{\mathbb {E}}f:f\in S_{\tilde{{\varvec{X}}}_\alpha }\} \end{aligned}$$
(11)

The variance of \(\tilde{{\varvec{X}}}\) can be defined as

$$\begin{aligned} {\mathbb {D}}(\tilde{{\varvec{X}}})&=\int \limits _{[0,1]^2\times ({\mathbb {S}}^{d-1})^2}COV(S_{\tilde{{\varvec{X}}}}(u,\alpha ),S_{\tilde{{\varvec{X}}}}(v,\beta ))dK(u,\alpha ,v,\beta )\nonumber \\&={\mathbb {E}}(<\tilde{{\varvec{X}}},\tilde{{\varvec{X}}}>)-<{\mathbb {E}}(\tilde{{\varvec{X}}}),{\mathbb {E}}(\tilde{{\varvec{X}}})> \end{aligned}$$
(12)

where \(<\tilde{{\varvec{X}}},\tilde{{\varvec{X}}}>\) is a random variable, and the fuzzy sets-valued inner product is defined in Eq. (10).

2.2 Fuzzy sets-valued price and returns

We build the highest log-price \(H_t\), lowest log-price \(L_t\), and closing log-price \(C_t\) information of day t into a set to form a sets-valued stochastic variable, that is, \(\tilde{{\varvec{P}}}_t=\{H_t,L_t,C_t\}\). The range of asset price movements is formed by the \(H_t\) and \(L_t\) of the asset. Compared to the opening and settlement prices, the closing price of an asset contains richer information relating to investors’ market perceptions. The closing price often reflects the level of market attention from investors towards a particular stock and can serve as an indicator of the expected movement for the next trading day.Footnote 3 Therefore, the performance of the closing price is worth paying attention to. In empirical research, most studies use \(H_t\), \(L_t\), and \(C_t\) as the LR-fuzzy set-valued price for asset prices (Moussa et al., 2014; Hassan, 2009).

We specify \(\tilde{{\varvec{P}}}_t\) as a classic LR-type fuzzy sets-valued variable with membership function \(m_{\tilde{{\varvec{P}}}_t}(x)\):

$$\begin{aligned} m_{\tilde{{\varvec{P}}}_t}(x)=\begin{aligned}\left\{ \begin{array}{rcl} \phi (\frac{C_t-x}{C_t-L_t};p)&{},&{}L_t\le x\le C_t\\ \phi (\frac{x-C_t}{H_t-C_t};p)&{},&{}H_t\ge x \ge C_t \end{array} \right. \\ \end{aligned} \end{aligned}$$
(13)

where \(\phi (x;p)\doteq \frac{e^{-(-x)^p}-e^{-1}}{1-e^{-1}}{\varvec{1}}(x\le 0)+\frac{e^{-x^p}-e^{-1}}{1-e^{-1}}{\varvec{1}}(x>0)\) and \(x\in [L_t,H_t]\). The benefit of choosing such a \(\phi (x;p)\) is that the parameter p controls the shape of \(m_{\tilde{{\varvec{P}}}_t}(x)\) and can therefore produce rich variations (see “Appendix B”).
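To make the role of p concrete, the membership function of Eq. (13) can be evaluated numerically. The sketch below uses hypothetical log-prices; the function names are illustrative, and returning a grade of 0 outside \([L_t,H_t]\) is an assumption added here, since Eq. (13) is only stated on that interval.

```python
import math


def phi(x: float, p: float) -> float:
    """Shape function of Eq. (13): equals 1 at x = 0 and 0 at |x| = 1."""
    return (math.exp(-abs(x) ** p) - math.exp(-1.0)) / (1.0 - math.exp(-1.0))


def membership(x: float, low: float, close: float, high: float, p: float) -> float:
    """Membership grade of a log-price x in the LR fuzzy price of Eq. (13)."""
    if x < low or x > high:
        return 0.0  # assumption: zero grade outside the day's price range
    if x <= close:
        spread = close - low
        return 1.0 if spread == 0 else phi((close - x) / spread, p)
    spread = high - close
    return 1.0 if spread == 0 else phi((x - close) / spread, p)


# Hypothetical log-prices for one day: low, close, high.
low, close, high = 1.0, 2.0, 4.0
for p in (1.0, 2.0, 5.0):
    grades = [membership(x, low, close, high, p) for x in (1.0, 1.5, 2.0, 3.0, 4.0)]
    print(p, [round(g, 3) for g in grades])
```

The grade is 1 at the closing price and falls to 0 at the daily low and high. For a fixed interior point, a larger p yields a higher grade, so the membership function flattens toward an interval as p grows.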

Let the closing-price return of day t be \(R_{C,t}=C_t-C_{t-1}\); similarly, the highest return of day t is \(R_{H,t}=H_t-L_{t-1}\) and the lowest return of day t is \(R_{L,t}=L_t-H_{t-1}\). Then the sets-valued stochastic variable \(\tilde{{\varvec{R}}}_t\) under the Type-B subtraction of Eq. (8) is also a fuzzy sets-valued variable, with membership function

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{{\varvec{P}}}_t\ominus _B\tilde{{\varvec{P}}}_{t-1}=\tilde{{\varvec{P}}}_t\oplus (-1)\otimes \tilde{{\varvec{P}}}_{t-1}\nonumber \\ m_{\tilde{{\varvec{P}}}_t}(x)&=\begin{aligned}\left\{ \begin{array}{rcl} \phi (\frac{R_{C,t}-x}{R_{l,t}};p)&{},&{}R_{C,t}-R_{l,t}\le x\le R_{C,t}\\ \phi (\frac{x-R_{C,t}}{R_{r,t}};p)&{},&{}R_{C,t}\le x< R_{C,t}+R_{r,t} \end{array} \right. \\ \end{aligned} \end{aligned}$$
(14)

where \(R_{l,t}=C_t-L_t\) and \(R_{r,t}=H_t-C_t\). We will explain in Sect. 2.3 why we chose the Type-B subtraction of Eq. (8) instead of the Type-A subtraction of Eq. (7). A numerical example in “Appendix B” shows the membership function trajectory of our fuzzy sets-valued returns.

There are three benefits of using the fuzzy sets-valued variable of Eq. (13): (1) Compared with the point-valued returns of GARCH-type models, it adds the \(H_t\) and \(L_t\) information. (2) Compared with the range-based point-valued returns of CARR-type models, it adds the “trend” information of \(H_t\) and \(L_t\). (3) Compared with the interval-valued returns of ACI and Int-GARCH models, Eq. (13) can flexibly highlight the closing price information.

2.3 Why do we choose the Type-B subtraction?

If the Type-A subtraction (like the ACI model as introduced in Section A.2) is used in the calculation of returns when \(L_t-H_{t-1}\le L_t-L_{t-1}\le C_t-C_{t-1}\le H_t-H_{t-1}\le H_t-L_{t-1}\) and \(L_t-H_{t-1}\le H_t-H_{t-1} \le C_t-C_{t-1} \le L_t-L_{t-1} \le H_t-L_{t-1}\),Footnote 4 the fuzzy sets-valued returns calculated by the Type-A subtraction \(\tilde{{\varvec{R}}}_{Type-A}\) with membership function \(m_{Type-A}(x)\) could be

$$\begin{aligned} \tilde{{\varvec{R}}}_{Type-A}&=\tilde{{\varvec{P}}}_t\ominus _A\tilde{{\varvec{P}}}_{t-1}\nonumber \\ m_{Type-A}(x)&=\begin{aligned}\left\{ \begin{array}{rcl} \phi (\frac{R_{C,t}-x}{R_{l,t}-(L_t-L_{t-1})};p)&{},&{}L_t-L_{t-1}\le x\le R_{C,t}\\ \phi (\frac{x-R_{C,t}}{(H_t-H_{t-1})-R_{r,t}};p)&{},&{}R_{C,t}\le x< H_t-H_{t-1} \end{array} \right. \\ \end{aligned} \end{aligned}$$
(15)

If we ignore the fuzziness of \(\tilde{{\varvec{R}}}_{Type-A,t}\) or assume a relatively high value of p in Eq. (15), \(\tilde{{\varvec{R}}}_{Type-A,t}\) becomes \(\tilde{{\varvec{R}}}^*_{Type-A,t}\), i.e.,

$$\begin{aligned} \tilde{{\varvec{R}}}^*_{Type-A,t}=[L_t-L_{t-1},H_t-H_{t-1}]=[\tilde{{\varvec{P}}}_{L,t}-\tilde{{\varvec{P}}}_{L,t-1},\tilde{{\varvec{P}}}_{R,t}-\tilde{{\varvec{P}}}_{R,t-1}] \end{aligned}$$
(16)

Because the only orderings available are \(L_t\le C_t\le H_t\) and \(L_{t-1}\le C_{t-1}\le H_{t-1}\), the Type-A subtraction cannot guarantee that \(R_{C,t}=C_t-C_{t-1}\in Supp \tilde{{\varvec{R}}}^*_{Type-A,t}\). This would be contrary to our original intent of absorbing closing price data. Further, let \(\Delta m(x;p)\) be the difference between \(m_{\tilde{{\varvec{P}}}_t}(x)\) in Eq. (14) and \(m_{Type-A}(x)\) in Eq. (15), i.e.,

$$\begin{aligned} \Delta m(x;p)=\begin{aligned}\left\{ \begin{array}{rcl} m_{\tilde{{\varvec{P}}}_t}(x)-m_{Type-A}(x)&{},&{}L_t-L_{t-1}\le x\le H_t-H_{t-1}\\ m_{\tilde{{\varvec{P}}}_t}(x)&{},&{}others \end{array} \right. \\ \end{aligned} \end{aligned}$$
(17)

then Fig 1 demonstrates the trajectory of \(\Delta m(x;p)\).

Fig. 1 The trajectory of \(\Delta m(x;p)\) for different values of p

As Fig. 1 shows, when a real-world point-valued trading return \(r_t\) lies in \([L_t-H_{t-1},L_t-L_{t-1}]\) or \([H_t-H_{t-1},H_t-L_{t-1}]\), the degree of membership of \(r_t\) in \(\tilde{{\varvec{R}}}_t\) of Eq. (14) exceeds that in \(\tilde{{\varvec{R}}}_{Type-A,t}\) of Eq. (15) by a larger margin as p in Eq. (17) increases. When \(r_t\) lies in \([L_t-L_{t-1},H_t-H_{t-1}]\), the greater the p, the smaller the difference between the membership of \(r_t\) in \(\tilde{{\varvec{R}}}_t\) and its membership in \(\tilde{{\varvec{R}}}_{Type-A,t}\).

From the perspective of information absorption, when p is smaller, the difference between selecting Type-A and Type-B subtraction is smaller; when p is larger, selecting the Type-B subtraction has a higher degree of information absorption on \([L_t-H_{t-1},L_t-L_{t-1}]\) and \([H_t-H_{t-1},H_t-L_{t-1}]\). This implies that we should treat p as a prior parameter rather than putting it into our model and then estimating its value. All in all, if a real-world trading return \(r_t\) falls in the interval \([L_t-H_{t-1},L_t-L_{t-1}]\) or \([H_t-H_{t-1},H_t-L_{t-1}]\), the returns defined by Type-A subtraction (like the ACI model) cannot cover \(r_t\). This goes against the original intent of the model we wish to create, and it reinforces treating the parameter p in Eq. (13) as a prior parameter.
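The coverage argument of this subsection can be checked with two days of made-up log-prices. The following sketch compares the crisp supports only, i.e., the interval of Eq. (16) for Type-A subtraction and \([L_t-H_{t-1},H_t-L_{t-1}]\) for Type-B subtraction; the price values and function names are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def type_a_support(l_prev: float, h_prev: float, l_t: float, h_t: float) -> Interval:
    """Support under Type-A subtraction, Eq. (16): [L_t - L_{t-1}, H_t - H_{t-1}]."""
    return (l_t - l_prev, h_t - h_prev)


def type_b_support(l_prev: float, h_prev: float, l_t: float, h_t: float) -> Interval:
    """Support under Type-B subtraction: [L_t - H_{t-1}, H_t - L_{t-1}]."""
    return (l_t - h_prev, h_t - l_prev)


def contains(interval: Interval, x: float) -> bool:
    return interval[0] <= x <= interval[1]


# Hypothetical log-prices: day t-1 closes at its low, day t closes at its high.
l_prev, c_prev, h_prev = 4.0, 4.0, 4.5
l_t, c_t, h_t = 4.0, 4.5, 4.5

r_close = c_t - c_prev  # closing-price return R_{C,t} = 0.5
support_a = type_a_support(l_prev, h_prev, l_t, h_t)  # (0.0, 0.0)
support_b = type_b_support(l_prev, h_prev, l_t, h_t)  # (-0.5, 0.5)

print(contains(support_a, r_close))  # False
print(contains(support_b, r_close))  # True
```

Both days have the same range, so the Type-A support collapses to a single point at zero and misses the closing-price return, while the Type-B support still contains it.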

2.4 Discussion of \(K(u,\alpha ,v,\beta )\)

Here we discuss the setting of \(K(u,\alpha ,v,\beta )\) in fuzzy sets-valued returns (He et al., 2021; Sun et al., 2018; Yang et al., 2016), which is used in the scalar inner product, distance, and variance calculations of \(\tilde{{\varvec{R}}}_t\) in Eqs. (9) to (12). Given that the sets-valued volatility model in this study is for univariate fuzzy sets-valued time series, we have \({\mathbb {S}}^{d-1}={\mathbb {S}}^0=\{1,-1\}\) in the support function in Eq. (9). The u and v in Eq. (9) and \(K(u,\alpha ,v,\beta )\) take only the values 1 or \(-1\) in this study. We have (He et al., 2021; Sun et al., 2020, 2018; Yang et al., 2016)

$$\begin{aligned} K(u,\alpha ,v,\beta )=\begin{aligned}\left\{ \begin{array}{rcl} a\cdot \delta _{\alpha }(\beta )d\alpha &{},&{}u=v=1\\ b\cdot \delta _{\alpha }(\beta )d\alpha &{},&{}u=v=-1\\ c\cdot \delta _{\alpha }(\beta )d\alpha &{},&{}u=-v\\ \end{array} \right. \\ \end{aligned} \end{aligned}$$
(18)

where \(\delta _{\alpha }(\beta )=1\) when \(\alpha =\beta \) and \(\delta _{\alpha }(\beta )=0\) when \(\alpha \ne \beta \). For the settings of a, b, c, a classic form is (Körner & Näther, 2002; Näther, 2001)

$$\begin{aligned} a&=1-2\int _0^1td\psi (t)+\int ^1_0t^2d\psi (t)\nonumber \\ b&=\int ^1_0t^2d\psi (t)\nonumber \\ c&=\int _0^1td\psi (t)-\int ^1_0t^2d\psi (t) \end{aligned}$$
(19)

where \(\psi (t)\) is the weight function. We set \(\psi (t)=t\) in this study, thus in Eqs. (18) and (19), we have \(a=1/3\), \(b=1/3\), and \(c=1/6\). The \(\alpha \)-cut of \(\tilde{{\varvec{R}}}_t\) is \(\tilde{{\varvec{R}}}_{\alpha ,t}\), and

$$\begin{aligned} \tilde{{\varvec{R}}}_{\alpha ,t}=[R_{C,t}-\phi ^{-1}(\alpha )R_{l,t},R_{C,t}+\phi ^{-1}(\alpha )R_{r,t}] \end{aligned}$$
(20)

where \(\phi (x)\) is defined in Eq. (13). Combining Eqs. (10) and (20), the scalar inner product is

$$\begin{aligned}&<\tilde{{\varvec{R}}}_t,\tilde{{\varvec{R}}}_t>_{a=1/3,b=1/3,c=1/6} \nonumber \\&\quad =\int ^1_0(a(R_{C,t}+\phi ^{-1}(\alpha )R_{r,t})^2+b(\phi ^{-1}(\alpha )R_{l,t}-R_{C,t})^2)d\alpha \nonumber \\&\quad +\int ^1_0(2c(R_{C,t}+\phi ^{-1}(\alpha )R_{r,t})(\phi ^{-1}(\alpha )R_{l,t}-R_{C,t}))d\alpha \nonumber \\&\quad =(a+b-2c)R^2_{C,t}+aR^2_{r,t}\int ^1_0(\phi ^{-1}(\alpha ))^2d\alpha +bR^2_{l,t}\int ^1_0(\phi ^{-1}(\alpha ))^2d\alpha \nonumber \\&\quad +2R_{C,t}R_{r,t}(a-c)\int ^1_0\phi ^{-1}(\alpha )d\alpha +2R_{C,t}R_{l,t}(c-b)\int ^1_0\phi ^{-1}(\alpha )d\alpha \nonumber \\&\quad +2cR_{l,t}R_{r,t}\int ^1_0(\phi ^{-1}(\alpha ))^2d\alpha \end{aligned}$$
(21)

Given that the distance between \(\tilde{{\varvec{R}}}_t\) and \(\tilde{{\varvec{0}}}\) is \(\rho _2(\tilde{{\varvec{R}}}_t,\tilde{{\varvec{0}}})=<\tilde{{\varvec{R}}}_t,\tilde{{\varvec{R}}}_t>\doteq \Vert \tilde{{\varvec{R}}}_t\Vert ^2_{\rho _2}\), and that \({\mathbb {E}}(S_{\tilde{{\varvec{R}}}_t})=S_{{\mathbb {E}}\tilde{{\varvec{R}}}_t}\), the variance \({\mathbb {D}}(\tilde{{\varvec{R}}}_t)\) is

$$\begin{aligned} {\mathbb {D}}(\tilde{{\varvec{R}}}_t)&={\mathbb {E}}(<\tilde{{\varvec{R}}}_t,\tilde{{\varvec{R}}}_t>_{a=\frac{1}{3},b=\frac{1}{3}, c=\frac{1}{6}})-<{\mathbb {E}}\tilde{{\varvec{R}}}_t,{\mathbb {E}}\tilde{{\varvec{R}}}_t>_{a=\frac{1}{3},b=\frac{1}{3},c=\frac{1}{6}} \nonumber \\&=(a+b-2c){\mathbb {D}}(R_{C,t})+a{\mathbb {D}}(R_{r,t})\int ^1_0(\phi ^{-1}(\alpha ))^2d\alpha \nonumber \\&\quad +b{\mathbb {D}}(R_{l,t})\int ^1_0(\phi ^{-1}(\alpha ))^2d\alpha +2COV(R_{C,t},R_{r,t})(a-c)\int ^1_0\phi ^{-1}(\alpha )d\alpha \nonumber \\&\quad +2COV(R_{C,t},R_{l,t})(c-b)\int ^1_0\phi ^{-1}(\alpha )d\alpha \nonumber \\&\quad +2COV(R_{l,t},R_{r,t})c\int ^1_0(\phi ^{-1}(\alpha ))^2d\alpha \end{aligned}$$
(22)
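The quantities of this subsection can be verified numerically. The sketch below first evaluates Eq. (19) for \(\psi (t)=t\) with exact fractions, then builds the \(\alpha \)-cut of Eq. (20), and finally compares a direct quadrature of the first line of Eq. (21) with its expanded closed form. The inverse \(\phi ^{-1}\) is derived here from the definition of \(\phi \) in Eq. (13); the midpoint rule, the sample values of \(R_{C,t}\), \(R_{l,t}\), \(R_{r,t}\), and all function names are assumptions made for illustration.

```python
import math
from fractions import Fraction

# Eq. (19) with psi(t) = t, so that d psi(t) = dt and the moments are exact.
m1, m2 = Fraction(1, 2), Fraction(1, 3)   # integrals of t and t^2 over [0, 1]
a, b, c = 1 - 2 * m1 + m2, m2, m1 - m2
print(a, b, c)  # 1/3 1/3 1/6


def phi_inv(alpha: float, p: float) -> float:
    """Inverse of phi(.; p) on [0, 1]: the x in [0, 1] with phi(x; p) = alpha."""
    e1 = math.exp(-1.0)
    inner = -math.log(alpha * (1.0 - e1) + e1)
    return max(0.0, inner) ** (1.0 / p)   # clamp guards against rounding below 0


def alpha_cut(alpha, r_c, r_l, r_r, p):
    """Alpha-cut of the fuzzy return, Eq. (20)."""
    q = phi_inv(alpha, p)
    return (r_c - q * r_l, r_c + q * r_r)


def inner_numeric(r_c, r_l, r_r, p, a, b, c, n=2000):
    """First line of Eq. (21) by the midpoint rule over alpha in (0, 1)."""
    total = 0.0
    for k in range(n):
        q = phi_inv((k + 0.5) / n, p)
        s_plus = r_c + q * r_r       # support function at u = 1
        s_minus = q * r_l - r_c      # support function at u = -1
        total += a * s_plus ** 2 + b * s_minus ** 2 + 2 * c * s_plus * s_minus
    return total / n


def inner_closed(r_c, r_l, r_r, p, a, b, c, n=2000):
    """Expanded form of Eq. (21), with the two integrals of phi_inv by the same rule."""
    qs = [phi_inv((k + 0.5) / n, p) for k in range(n)]
    i1 = sum(qs) / n
    i2 = sum(q * q for q in qs) / n
    return ((a + b - 2 * c) * r_c ** 2 + a * r_r ** 2 * i2 + b * r_l ** 2 * i2
            + 2 * r_c * r_r * (a - c) * i1 + 2 * r_c * r_l * (c - b) * i1
            + 2 * c * r_l * r_r * i2)


# Hypothetical daily values for R_C, R_l, R_r and a prior shape parameter p.
r_c, r_l, r_r, p = 0.01, 0.02, 0.015, 2.0
fa, fb, fc = float(a), float(b), float(c)
lhs = inner_numeric(r_c, r_l, r_r, p, fa, fb, fc)
rhs = inner_closed(r_c, r_l, r_r, p, fa, fb, fc)
print(abs(lhs - rhs) < 1e-12)
```

The weights satisfy a + b + 2c = 1, in line with the normalisation of the kernel stated after Eq. (9). Because both routines use the same quadrature nodes, their agreement checks the algebra of the expansion rather than the accuracy of the integral itself.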

3 The random fuzzy sets-valued based GARCH model

3.1 Grounding ideas on the model setting

The modeling philosophy embodied in Eq. (A5) in subsection A.1.1 implies that changes in current observations are driven by historical observations. If we also wish to apply this modeling philosophy to the proposed model with parameter \(\theta \), one classic form is:

$$\begin{aligned} \tilde{{\varvec{R}}}_{t}=f(\tilde{{\varvec{R}}}_{t-1},\tilde{{\varvec{R}}}_{t-2},\ldots ,\tilde{\varvec{\epsilon }}_{t-1},\tilde{\varvec{\epsilon }}_{t-2},\ldots ;\theta )+\tilde{\varvec{\epsilon }}_{t} \end{aligned}$$
(23)

In Eq. (23), at time t, the term \(f(\tilde{{\varvec{R}}}_{t-1},\tilde{{\varvec{R}}}_{t-2},\ldots ,\tilde{\varvec{\epsilon }}_{t-1},\tilde{\varvec{\epsilon }}_{t-2},\ldots ;\theta )\) is not stochastic any more, but is still a fuzzy sets-valued number, while \(\tilde{\varvec{\epsilon }}_{t}\) is a random sets-valued variable that gives randomness to \(\tilde{{\varvec{R}}}_{t}\).

Let \(\tilde{{\varvec{r}}}_{t}=\{\tilde{{\varvec{r}}}_{T},\tilde{{\varvec{r}}}_{T-1},\tilde{{\varvec{r}}}_{T-2},\ldots ,\tilde{{\varvec{r}}}_{1},\tilde{{\varvec{r}}}_{0}\}\) be the observations of \(\tilde{{\varvec{R}}}_{t}\), where each \(\tilde{{\varvec{r}}}_{t}\) is a fuzzy sets-valued variable. Under the classic model structure of Eq. (23), when one estimates the parameter \(\theta \) by minimizing some loss function \(\Psi \), the estimates of \(\theta \) under Type-A and Type-B subtraction are

$$\begin{aligned} {\hat{\theta }}_{Type-A}&=\mathop {argmin}\limits _{\theta }\sum ^T_{i=1}\Psi (\tilde{{\varvec{r}}}_{i}\ominus _Af(\tilde{{\varvec{r}}}_{i-1},\tilde{{\varvec{r}}}_{i-2},\ldots ,\tilde{\varvec{\epsilon }}_{i-1},\tilde{\varvec{\epsilon }}_{i-2},\ldots ;\theta )) \nonumber \\ {\hat{\theta }}_{Type-B}&=\mathop {argmin}\limits _{\theta }\sum ^T_{i=1}\Psi (\tilde{{\varvec{r}}}_{i}\ominus _Bf(\tilde{{\varvec{r}}}_{i-1},\tilde{{\varvec{r}}}_{i-2},\ldots ,\tilde{\varvec{\epsilon }}_{i-1},\tilde{\varvec{\epsilon }}_{i-2},\ldots ;\theta )) \end{aligned}$$
(24)

respectively. However, given a true parameter \(\theta ^*\), we will never find a \({\hat{\theta }}_{Type-B}=\theta ^*\) under the minimum loss function method, because \(\tilde{{\varvec{r}}}_{i}\ominus _Bf(\tilde{{\varvec{r}}}_{i-1},\tilde{{\varvec{r}}}_{i-2},\ldots ,\tilde{\varvec{\epsilon }}_{i-1},\tilde{\varvec{\epsilon }}_{i-2},\ldots ;\theta )\ne \tilde{{\varvec{0}}}\). The reason is that if we have \(\tilde{{\varvec{A}}}=\tilde{{\varvec{B}}}\), then \(\tilde{{\varvec{A}}}\ominus _A\tilde{{\varvec{B}}}=\tilde{{\varvec{0}}}\), while \(\tilde{{\varvec{A}}}\ominus _B\tilde{{\varvec{B}}}\ne \tilde{{\varvec{0}}}\). By contrast, we could find a \({\hat{\theta }}_{Type-A}=\theta ^*\).

Using maximum likelihood (ML) for parameter estimation, Type-B subtraction suffers from the same issue. Given that both \(\tilde{{\varvec{R}}}_t\) and \(\tilde{{\varvec{r}}}_t\) in Eq. (23) are random variables, one could maximize the likelihood function of \(\tilde{{\varvec{R}}}_t\) and \(\tilde{{\varvec{r}}}_t\) to estimate \(\theta \) in Eq. (23). Given the fact that

$$\begin{aligned} \tilde{{\varvec{R}}}_t\ominus _Af(\tilde{{\varvec{R}}}_{t-1},\tilde{{\varvec{R}}}_{t-2},\ldots ,\tilde{\varvec{\epsilon }}_{t-1},\tilde{\varvec{\epsilon }}_{t-2},\ldots ;\theta )&=\tilde{\varvec{\epsilon }}_t \nonumber \\ \tilde{{\varvec{R}}}_t\ominus _Bf(\tilde{{\varvec{R}}}_{t-1},\tilde{{\varvec{R}}}_{t-2},\ldots ,\tilde{\varvec{\epsilon }}_{t-1},\tilde{\varvec{\epsilon }}_{t-2},\ldots ;\theta )&\ne \tilde{\varvec{\epsilon }}_t \end{aligned}$$
(25)

let the density functions of \(\tilde{{\varvec{R}}}_t\) and \(\tilde{\varvec{\epsilon }}_t\) be \(f_{\tilde{{\varvec{R}}}}\) and \(f_{\tilde{\varvec{\epsilon }}}\). Whether we maximize \(f_{\tilde{{\varvec{R}}}}\) to obtain \({\hat{\theta }}^{\tilde{{\varvec{R}}}}\) or maximize \(f_{\tilde{\varvec{\epsilon }}}\) to obtain \({\hat{\theta }}^{\tilde{\varvec{\epsilon }}}\), we should have \({\hat{\theta }}^{\tilde{{\varvec{R}}}}={\hat{\theta }}^{\tilde{\varvec{\epsilon }}}\). However, there is a paradox in the following maximum likelihood estimation under the model structure of Eq. (23),

$$\begin{aligned} {\hat{\theta }}^{\tilde{{\varvec{R}}}}_{Type-A}=\mathop {argmax}\limits _{\theta } \prod ^T_{t=1}{f_{\tilde{{\varvec{R}}}}(\theta \vert \tilde{{\varvec{r}}}_t)}=\mathop {argmax}\limits _{\theta } \prod ^T_{t=1}{f_{\tilde{\varvec{\epsilon }}}(\theta \vert \tilde{\varvec{\epsilon }}_t)}={\hat{\theta }}^{\tilde{\varvec{\epsilon }}}_{Type-A} \nonumber \\ {\hat{\theta }}^{\tilde{{\varvec{R}}}}_{Type-B}=\mathop {argmax}\limits _{\theta } \prod ^T_{t=1}{f_{\tilde{{\varvec{R}}}}(\theta \vert \tilde{{\varvec{r}}}_t)}=\mathop {argmax}\limits _{\theta } \prod ^T_{t=1}{f_{\tilde{\varvec{\epsilon }}}(\theta \vert \tilde{\varvec{\epsilon }}_t)}\ne {\hat{\theta }}^{\tilde{\varvec{\epsilon }}}_{Type-B} \end{aligned}$$
(26)

To solve this problem in the estimation process, one solution is to drive the dynamics of \(\tilde{{\varvec{R}}}_t\) in the following model structure instead of Eq. (23)’s structure as

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{\varvec{\epsilon }}_t \nonumber \\ \tilde{\varvec{\epsilon }}_t&\sim f_{\tilde{\varvec{\epsilon }}_t}(\tilde{{\varvec{x}}};\theta ) \end{aligned}$$
(27)

where the evolution and randomness of \(\tilde{{\varvec{R}}}_t\) all come from a fuzzy sets-valued stochastic variable \(\tilde{\varvec{\epsilon }}_t\) with time-varying probability density \(f_{\tilde{\varvec{\epsilon }}_t}(\tilde{{\varvec{x}}};\theta )\). The problem in Eq. (26) is thereby resolved, because Eq. (27) sets \(\tilde{{\varvec{R}}}_t=\tilde{\varvec{\epsilon }}_t\) by construction. This kind of setting is similar to the Int-GARCH model and most GARCH-type models. In Sect. 4, we find that the model structure of Eq. (27) is still capable of predicting volatility accurately. The limitation of the model setting does not necessarily impair the proposed model’s predictive power.

Recall Eq. (27): \(\tilde{{\varvec{R}}}_t=\tilde{\varvec{\epsilon }}_t\), \(\tilde{\varvec{\epsilon }}_t\sim f_{\tilde{\varvec{\epsilon }}_t}(\tilde{{\varvec{x}}};\theta )\). To determine how \(\tilde{\varvec{\epsilon }}_t\) changes, a straightforward idea from Eq. (A5) of GARCH-type models is to construct a time-varying parameter \(\theta _t\) (see Footnote 5) and use the past observations of \(\tilde{\varvec{\epsilon }}_t\) (or \(\tilde{{\varvec{R}}}_t\)) to obtain \(\theta _t\). From an economic perspective, whether we use point values or the fuzzy set values described in this paper to represent returns (or the innovations in returns), we must account for the fact that current returns (or innovations) may be driven by, and correlated with, past values. The concept of lagged terms influencing current terms is widely applied in econometric models (Creal et al., 2013; Koop & Korobilis, 2013).

Similar to the GARCH-type model, the type of distribution law of \(\tilde{\varvec{\epsilon }}_t\) does not change over time. Let the parameter set \(\theta _t\) in Eq. (27) be \(\theta _t=\{\nu _{1,t},\nu _{2,t},\ldots ,\nu _{n,t}\}\). We provide the following general model structure:

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{\varvec{\epsilon }}_t,\quad \tilde{\varvec{\epsilon }}_t=\tilde{\varvec{\epsilon }}_t(\nu _{1,t},\nu _{2,t},\ldots ,\nu _{n,t}) \nonumber \\ \nu _{1,t}&\sim l_{1,t}(\theta _{\nu _{1,t}}),\quad \theta _{\nu _{1,t}}=f_1(\tilde{{\varvec{R}}}_{t-1},\tilde{{\varvec{R}}}_{t-2},\ldots ,\theta _{\nu _{1,t-1}},\theta _{\nu _{1,t-2}},\ldots ) \nonumber \\&\dots \nonumber \\ \nu _{n,t}&\sim l_{n,t}(\theta _{\nu _{n,t}}),\quad \theta _{\nu _{n,t}}=f_n(\tilde{{\varvec{R}}}_{t-1},\tilde{{\varvec{R}}}_{t-2},\ldots ,\theta _{\nu _{n,t-1}},\theta _{\nu _{n,t-2}},\ldots ) \end{aligned}$$
(28)

where \(\nu _{1,t},\nu _{2,t},\ldots ,\nu _{n,t}\) are the random scalar parameters in \(\tilde{\varvec{\epsilon }}_t\), with density functions \(l_{1,t},l_{2,t},\ldots ,l_{n,t}\). Following the GARCH-type model, in Eq. (28), the scalar parameters \(\theta _{\nu _{1,t}},\theta _{\nu _{2,t}},\ldots ,\theta _{\nu _{n,t}}\) of the density functions \(l_{1,t},l_{2,t},\ldots ,l_{n,t}\) are obtained from the past observations of \(\tilde{{\varvec{R}}}_t\) and from their own lagged terms.

We further explore the drivers of changes in \(\tilde{{\varvec{R}}}_t\). Once the prior parameter p in Eq. (13) is given, the shape of \(\tilde{{\varvec{R}}}_t\) depends on \(R_{C,t}\), \(R_{r,t}\), and \(R_{l,t}\). The evolution of the scalar \(R_{C,t}\) is driven first by its own past terms and by the distance between \(R_{C,t}\) and 0. The \(\rho _2\) distance of Eq. (9) between \(\tilde{{\varvec{R}}}_t\) and \(\tilde{{\varvec{0}}}\) represents the degree of change in the overall price information set, which we denote in 2-norm form by \(\Vert \tilde{{\varvec{R}}}_t\Vert ^2_{\rho _2}\). The overall change will also cause a change in the distance between \(R_{C,t}\) and 0. In this vein, we have

$$\begin{aligned} R_{C,t}=g_1(R_{C,t-1},R_{C,t-2},\ldots ,\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2},\Vert \tilde{{\varvec{R}}}_{t-2}\Vert ^2_{\rho _2},\ldots ) \end{aligned}$$
(29)

If \(R_{C,t}\) reflects a “standard” returns level, then \(R_{r,t}\) and \(R_{l,t}\) reflect the degree of extreme deviation from this “standard” level in the positive and negative directions, respectively (see Footnote 6). This implies that the current \(R_{r,t}\) may be related to the past \(R_{r,t-1}\) and the past \(R_{l,t-1}\). The same holds for \(R_{l,t}\). Therefore, we set the following drive mode:

$$\begin{aligned} R_{l,t}&=g_2(R_{l,t-1},R_{l,t-2},\ldots ,R_{r,t-1},R_{r,t-2},\ldots ,\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2},\Vert \tilde{{\varvec{R}}}_{t-2}\Vert ^2_{\rho _2},\ldots ) \nonumber \\ R_{r,t}&=g_3(R_{r,t-1},R_{r,t-2},\ldots ,R_{l,t-1},R_{l,t-2},\ldots ,\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2},\Vert \tilde{{\varvec{R}}}_{t-2}\Vert ^2_{\rho _2},\ldots ) \end{aligned}$$
(30)

where the form of functions \(g_1\), \(g_2\), and \(g_3\) will be discussed later. From Eq. (28) to Eq. (30), we have

$$\begin{aligned} \tilde{{\varvec{R}}}_{t}&=\tilde{\varvec{\epsilon }}_{t} \nonumber \\ \tilde{\varvec{\epsilon }}_t&\sim f_{\tilde{\varvec{\epsilon }}_t}(R_{C,t-1},R_{C,t-2},\ldots ,R_{l,t-1},R_{l,t-2},\ldots ,R_{r,t-1},\nonumber \\ {}&R_{r,t-2},\ldots ,\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2},\Vert \tilde{{\varvec{R}}}_{t-2}\Vert ^2_{\rho _2},\ldots ) \end{aligned}$$
(31)

In this vein, we can determine the evolution of the conditional random sets-valued \(\tilde{{\varvec{R}}}_{t}\vert \Omega _{t-1}\) (where \(\Omega _t=\{R_{C,t},R_{C,t-1},\ldots ,R_{l,t},R_{l,t-1},\ldots ,R_{r,t},R_{r,t-1},\ldots \}\) is the information set at time t) and calculate the in-sample volatility \({\mathbb {D}}(\tilde{{\varvec{R}}}_{t}\vert \Omega _{t-1})\) and out-of-sample volatility \({\mathbb {D}}(\tilde{{\varvec{R}}}_{t}\vert \Omega _{t-1})\) using Eq. (12).

3.2 Relationship between \({\mathbb {D}}(\tilde{{\varvec{R}}}_{t})\) and \(\sigma _t\)

\({\mathbb {D}}(\tilde{{\varvec{R}}}_{t})\) is the volatility of the fuzzy sets-valued returns \(\tilde{{\varvec{R}}}_{t}\), which is not exactly the daily volatility (Sun et al., 2020). To get \(\sigma _t\), we need to perform an “average operation” over the “degree of fuzziness” of \(\tilde{{\varvec{R}}}_{t}\). First, we use a fuzziness control parameter \(\zeta \in [0,1]\) to control the “degree of fuzziness” of \(\tilde{{\varvec{R}}}_{t}\), that is:

$$\begin{aligned} \tilde{{\varvec{R}}}(\zeta )_t=\begin{aligned}\left\{ \begin{array}{rcl} \phi \left( \frac{R_{C,t}-x}{\zeta R_{l,t}};p\right) &{},&{}R_{C,t}-R_{l,t}\le x\le R_{C,t}\\ \phi \left( \frac{x-R_{C,t}}{\zeta R_{r,t}};p\right) &{},&{}R_{C,t}\le x< R_{C,t}+R_{r,t} \end{array} \right. \\ \end{aligned} \end{aligned}$$
(32)

The smaller the value of \(\zeta \), the better \(R_{C,t}\) represents the fuzzy information of the day. In particular, when \(\zeta \) is 0, the fuzzy sets-valued \(\tilde{{\varvec{R}}}(\zeta )_t\) collapses to \(R_{C,t}\). Following Sun et al. (2020), we define an aggregate sets-valued volatility \(\sigma _t^{set}\), which reflects the average change between accepting all possible returns information with a certain membership and accepting only \(R_{C,t}\). For any fuzzy sets-valued returns \(\tilde{{\varvec{R}}}(\zeta )_t\) under a set information reception level \(\zeta \), we assign \(\zeta \) a weight \(W(\zeta )\). Then, the volatility \(\sigma _t^{set}\) defined in our study is

$$\begin{aligned} \sigma _t^{set}=\frac{\int ^1_0{\mathbb {D}}(\tilde{{\varvec{R}}}(\zeta )_t)dW(\zeta )}{\int ^1_0dW(\zeta )} \end{aligned}$$
(33)

We set a general weight function \(W(\zeta )=-\zeta +1\), \(\zeta \in [0,1]\) in this study.
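
Read literally as a Riemann-Stieltjes integral, this weight turns Eq. (33) into a plain average over \(\zeta \). The short derivation below is a sketch under that reading; treating \(dW(\zeta )\) as \(-d\zeta \) is our assumption about how the integral is intended, not a statement taken from the original derivation:

$$\begin{aligned} dW(\zeta )&=-d\zeta ,\qquad \int ^1_0dW(\zeta )=W(1)-W(0)=-1 \nonumber \\ \sigma _t^{set}&=\frac{-\int ^1_0{\mathbb {D}}(\tilde{{\varvec{R}}}(\zeta )_t)d\zeta }{-1}=\int ^1_0{\mathbb {D}}(\tilde{{\varvec{R}}}(\zeta )_t)d\zeta \end{aligned}$$

Under this reading, every fuzziness level \(\zeta \in [0,1]\) contributes equally to \(\sigma _t^{set}\), because the two minus signs cancel.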

3.3 Model specification

In accordance with the analysis framework of subsections 3.1 and 3.2, we present our proposed random sets-valued GARCH model, Set-GARCH model, and its derivatives.

3.3.1 Set-GARCH model

We set \(\theta =\{\nu _1,\nu _2,\nu _3\}\) in Eq. (28) as \(\theta =\{R_{C,t}, R_{l,t},R_{r,t}\}\), which also means that \(R_{C,t}\), \(R_{l,t}\), and \(R_{r,t}\) are stochastic processes. Thus, Eq. (27) can be expressed as:

$$\begin{aligned} \tilde{{\varvec{R}}}_t=\tilde{\varvec{\epsilon }}_t,\quad \tilde{\varvec{\epsilon }}_t=\tilde{\varvec{\epsilon }}_t(R_{C,t},R_{l,t},R_{r,t}) \end{aligned}$$
(34)

Following the GARCH-type models, we set \(l_{q,t}\) in Eq. (28) to be a normal distribution \({\mathcal {N}}(0,1)\) (see Footnote 7). Thus, we have:

$$\begin{aligned} R_{C,t}&\mathop {\sim }\limits _{i.i.d}\sqrt{h_t}{\mathcal {N}}(0,1) \nonumber \\ h_t&=\omega _h+\sum ^{P_h}_{i=1}\alpha _{h,i}h_{t-i}+\sum ^{Q_h}_{i=1}\beta _{h,i}(\Vert \tilde{{\varvec{R}}}_{t-i}\Vert ^2_{\rho _2}-\frac{1}{3}R_{C,t-i}^2)+\sum ^{R_h}_{i=1}\gamma _{h,i}R_{C,t-i}^2 \end{aligned}$$
(35)

From Eq. (21), \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2}\) contains a term \(\frac{1}{3}R_{C,t-1}^2\). We remove this term from the norm because the \(\gamma _{h,i}\) term in \(h_t\) already contains the squared lagged \(R_{C,t}\). Since \(R_{r,t}\) and \(R_{l,t}\) must be positive, we set \(l_{2,t}\) and \(l_{3,t}\) in Eq. (28) as Gamma distributions \(\Gamma (1,\theta _{l_{2,t}})\) and \(\Gamma (1,\theta _{l_{3,t}})\), which can flexibly control the variance and mean of \(l_{2,t}\) and \(l_{3,t}\) in Eq. (28) while reducing the complexity of the model. We have \({\mathbb {E}}(l_{2,t})=1/\theta _{l_{2,t}}\) and \({\mathbb {D}}(l_{2,t})=1/\theta ^2_{l_{2,t}}\). We denote \(1/\theta _{l_{2,t}}\) as \(\lambda _{l,t}\) and \(1/\theta _{l_{3,t}}\) as \(\lambda _{r,t}\). Here, we first adopt a simple setting in which \(R_{l,t}\) and \(R_{r,t}\) are independent of each other, so that \(COV(R_{l,t},R_{r,t})=0\). This assumption is not strong, because in the analysis of Eq. (30) we only discussed some possible influence paths of \(R_{l,t}\) and \(R_{r,t}\). In Sect. 3.3.2, we discuss the case where \(R_{l,t}\) and \(R_{r,t}\) are not independent of each other. According to the setting of Eqs. (28) and (30), we provide the following structure:

$$\begin{aligned} R_{l,t}&\mathop {\sim }\limits _{i.i.d.}\lambda _{l,t}\Gamma (1,1),\qquad R_{r,t}\mathop {\sim }\limits _{i.i.d.}\lambda _{r,t}\Gamma (1,1) \nonumber \\ \lambda _{l,t}&=\Lambda (\omega _l+\sum ^{P_l}_{i=1}\alpha _{l,i}\lambda _{l,t-i}+\sum ^{Q_l}_{i=1}\beta _{l,i}\Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2}+\sum ^{R_l}_{i=1}\gamma _{l,i}R_{l,t-i}) \nonumber \\ \lambda _{r,t}&=\Lambda (\omega _r+\sum ^{P_r}_{i=1}\alpha _{r,i}\lambda _{r,t-i}+\sum ^{Q_r}_{i=1}\beta _{r,i}\Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2}+\sum ^{R_r}_{i=1}\gamma _{r,i}R_{r,t-i}) \end{aligned}$$
(36)

where \(\Lambda :{\mathbb {R}}\rightarrow (0,\infty )\) is a conversion function that ensures \(\lambda _{l,t}\) and \(\lambda _{r,t}\) are positive. Compared to Eq. (35), Eq. (36) uses the \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}\) term instead of the \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}^2\) term, given that the distance, rather than the squared distance, is more suitable for describing \(R_{l,t}\) and \(R_{r,t}\). We now have the proposed Set-GARCH model, a GARCH-type model for sets-valued time series:

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{\varvec{\epsilon }}_t,\quad \tilde{\varvec{\epsilon }}_t=\tilde{\varvec{\epsilon }}_t(R_{C,t},R_{l,t},R_{r,t}) \nonumber \\ R_{C,t}&\mathop {\sim }\limits _{i.i.d}\sqrt{h_t}{\mathcal {N}}(0,1),\quad R_{l,t}\mathop {\sim }\limits _{i.i.d.}\lambda _{l,t}\Gamma (1,1),\quad R_{r,t}\mathop {\sim }\limits _{i.i.d.}\lambda _{r,t}\Gamma (1,1) \nonumber \\ h_t&=\omega _h+\sum ^{P_h}_{i=1}\alpha _{h,i}h_{t-i}+\sum ^{Q_h}_{i=1}\beta _{h,i}(\Vert \tilde{{\varvec{R}}}_{t-i}\Vert ^2_{\rho _2}-\frac{1}{3}R_{C,t-i}^2)+\sum ^{R_h}_{i=1}\gamma _{h,i}R_{C,t-i}^2 \nonumber \\ \lambda _{l,t}&=\Lambda (\omega _l+\sum ^{P_l}_{i=1}\alpha _{l,i}\lambda _{l,t-i}+\sum ^{Q_l}_{i=1}\beta _{l,i}\Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2}+\sum ^{R_l}_{i=1}\gamma _{l,i}R_{l,t-i}) \nonumber \\ \lambda _{r,t}&=\Lambda (\omega _r+\sum ^{P_r}_{i=1}\alpha _{r,i}\lambda _{r,t-i}+\sum ^{Q_r}_{i=1}\beta _{r,i}\Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2}+\sum ^{R_r}_{i=1}\gamma _{r,i}R_{r,t-i} ) \end{aligned}$$
(37)

Given \(COV(R_{l,t},R_{C,t})=0\), \(COV(R_{C,t},R_{r,t})=0\), and \(COV(R_{l,t},R_{r,t})=0\), now we give the set volatility \(\sigma _t^{set}\) of Eq. (33) of Set-GARCH model as:

$$\begin{aligned} \sigma _t^{set}&=\frac{\int ^1_0{\mathbb {D}}(\tilde{{\varvec{R}}}(\zeta )_t)dW(\zeta )}{\int ^1_0dW(\zeta )} \nonumber \\&=\frac{1}{3}h_t+\frac{\lambda ^2_{l,t}+\lambda ^2_{r,t}}{18}\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \end{aligned}$$
(38)

where \(\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \) depends on p, \(h_t\) is given in Eq. (35), and \(\lambda _{l,t}\) and \(\lambda _{r,t}\) are given in Eq. (36).

3.3.2 Set-GARCH-LR model

One assumption in the Set-GARCH model is \(COV(R_{l,t},R_{r,t})=0\). We now remove this condition and allow \(R_{l,t}\) and \(R_{r,t}\) to be dependent. We call the resulting model the Set-GARCH-LR model.

We consider using a joint bivariate Gamma distribution \(\Gamma ^2\) to characterize \(R_{l,t}\) and \(R_{r,t}\) (Furman, 2008), where the marginal distribution \(R_{l,t}\) or \(R_{r,t}\) is a univariate Gamma distribution. This setup follows the CARR model’s specification of the distribution of returns’ range (Chou, 2005) (also see “Appendix A.1.2”).

Let \({\varvec{Y}}=(Y_0,Y_1,Y_2)'\) be a trivariate vector with \(Y_i\sim \Gamma (\gamma _i,\alpha _i)\), which has the density \(f_{Y_i}(y)=e^{-\alpha _iy}\frac{y^{\gamma _i-1}\alpha ^{\gamma _i}_i}{\Gamma (\gamma _i)}\), \(y>0\), \(\alpha _i>0\), \(\gamma _i>0\). Let \({\varvec{A}}=\begin{bmatrix} \alpha _0/\alpha _1&{}1&{}0\\ \alpha _0/\alpha _2&{}\alpha _1/\alpha _2&{}1 \end{bmatrix}\) and \((R_l,R_r)'={\varvec{A}}{\varvec{Y}}\). The joint distribution of \((R_l,R_r)\) is then a bivariate Gamma distribution controlled by the parameters \(\{\alpha _0,\alpha _1,\alpha _2,\gamma _0,\gamma _1,\gamma _2\}\). Letting \(x^*=\min \{\frac{\alpha _1}{\alpha _0}R_l,\frac{\alpha _2}{\alpha _0}R_r\}\), the density function of the bivariate Gamma is:

$$\begin{aligned} f(x_1,x_2)=e^{-\alpha _2x_2}\left( x_2-\frac{\alpha _1}{\alpha _2}x_1\right) ^{\gamma _2-1}\prod ^2_{j=0}\left( \frac{\alpha ^{\gamma _j}_j}{\Gamma (\gamma _j)}\right) \int ^{x^*}_0y_0^{\gamma _0-1}(x_1-\frac{\alpha _0}{\alpha _1}y_0)^{\gamma _1-1}dy_0\nonumber \\ \end{aligned}$$
(39)

According to the definition of Eq. (39), the marginal distributions of \(R_{l,t}\) and \(R_{r,t}\) are Gamma distributions, and their expectations, variances, and covariance are \({\mathbb {E}}(R_l)=\frac{\gamma _0+\gamma _1}{\alpha _1}\), \({\mathbb {E}}(R_r)=\frac{\gamma _0+\gamma _1+\gamma _2}{\alpha _2}\), \({\mathbb {D}}(R_l)=\frac{\gamma _0+\gamma _1}{\alpha _1^2}\), \({\mathbb {D}}(R_r)=\frac{\gamma _0+\gamma _1+\gamma _2}{\alpha _2^2}\), and \(COV(R_l,R_r)=\frac{\gamma _0+\gamma _1}{\alpha _1\alpha _2}\). In this paper, we reparametrize Eq. (39). Let \(\gamma _0=1\), \(\alpha _0=1\), \(\bar{\gamma }_1=\gamma _0+\gamma _1\), and \(\bar{\gamma }_2=\gamma _0+\gamma _1+\gamma _2\), and thus we have:

$$\begin{aligned} f_{(R_l,R_r)'}(x_1,x_2)&=e^{-\alpha _2x_2}(x_2-\frac{\alpha _1}{\alpha _2}x_1)^{\bar{\gamma }_2-\bar{\gamma }_1-1}\frac{\alpha _1^{\bar{\gamma }_1-1}}{\Gamma (\bar{\gamma }_1-1)} \nonumber \\&\quad \cdot \frac{\alpha _2^{\bar{\gamma }_2-\bar{\gamma }_1}}{\Gamma (\bar{\gamma }_2-\bar{\gamma }_1)}\int ^{x^*}_0(x_1-\frac{1}{\alpha _1}y_0)^{\bar{\gamma }_1-2}dy_0 \end{aligned}$$
(40)
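
The moments quoted for the bivariate Gamma construction can be checked directly from \((R_l,R_r)'={\varvec{A}}{\varvec{Y}}\). The derivation below is a sketch that assumes \(Y_0\), \(Y_1\), and \(Y_2\) are mutually independent and that \(\Gamma (\gamma _i,\alpha _i)\) is in shape-rate form, so that \({\mathbb {E}}(Y_i)=\gamma _i/\alpha _i\) and \({\mathbb {D}}(Y_i)=\gamma _i/\alpha _i^2\); independence is our reading of the construction rather than something stated explicitly here:

$$\begin{aligned} R_l&=\frac{\alpha _0}{\alpha _1}Y_0+Y_1,\qquad R_r=\frac{\alpha _0}{\alpha _2}Y_0+\frac{\alpha _1}{\alpha _2}Y_1+Y_2 \nonumber \\ {\mathbb {E}}(R_l)&=\frac{\alpha _0}{\alpha _1}\frac{\gamma _0}{\alpha _0}+\frac{\gamma _1}{\alpha _1}=\frac{\gamma _0+\gamma _1}{\alpha _1} \nonumber \\ {\mathbb {D}}(R_l)&=\frac{\alpha _0^2}{\alpha _1^2}\frac{\gamma _0}{\alpha _0^2}+\frac{\gamma _1}{\alpha _1^2}=\frac{\gamma _0+\gamma _1}{\alpha _1^2} \nonumber \\ COV(R_l,R_r)&=\frac{\alpha _0}{\alpha _1}\frac{\alpha _0}{\alpha _2}{\mathbb {D}}(Y_0)+\frac{\alpha _1}{\alpha _2}{\mathbb {D}}(Y_1)=\frac{\gamma _0}{\alpha _1\alpha _2}+\frac{\gamma _1}{\alpha _1\alpha _2}=\frac{\gamma _0+\gamma _1}{\alpha _1\alpha _2} \end{aligned}$$

The same steps applied to \(R_r\) give \({\mathbb {E}}(R_r)=\frac{\gamma _0+\gamma _1+\gamma _2}{\alpha _2}\) and \({\mathbb {D}}(R_r)=\frac{\gamma _0+\gamma _1+\gamma _2}{\alpha _2^2}\). The covariance is positive because \(Y_0\) and \(Y_1\) enter both spreads.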

We keep \(\bar{\gamma }_1\) and \(\bar{\gamma }_2\) time-invariant, and let \(\alpha _1\) and \(\alpha _2\) change dynamically in Eq. (40). Under this condition, we simplify Eq. (39) while ensuring that the marginal distributions of \(R_{l,t}\) and \(R_{r,t}\) are Gamma distributions \(\Gamma (\gamma ,\alpha )\) and, more importantly, we maintain the dynamics of the first and second moments of \(R_{l,t}\) and \(R_{r,t}\). Further, we have:

$$\begin{aligned} (R_{l,t},R_{r,t})\mathop {\sim }\limits _{i.i.d.}&\Gamma ^2(\alpha _{1,t},\alpha _{2,t},\bar{\gamma }_{1},\bar{\gamma }_2) \nonumber \\ \left[ \begin{array}{c} \alpha _{1,t} \\ \alpha _{2,t} \end{array} \right]&=\left[ \begin{array}{c} \omega _1 \\ \omega _2 \end{array} \right] +\sum ^{P_{lr}}_{i=1}\left[ \begin{array}{cc} a_{11,i}&{}0 \\ 0&{}a_{22,i} \end{array} \right] \left[ \begin{array}{c} \alpha ^*_{1,t-i} \\ \alpha ^*_{2,t-i} \end{array} \right] +\sum ^{Q_{lr}}_{i=1}\left[ \begin{array}{c} b_{1,i} \\ b_{2,i} \end{array} \right] \circ \left[ \begin{array}{c} \Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2} \\ \Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2} \end{array} \right] \nonumber \\&\quad +\sum ^{R_{lr}}_{i=1}\left[ \begin{array}{cc} c_{11,i}&{}c_{12,i} \\ c_{21,i}&{}c_{22,i} \end{array} \right] \left[ \begin{array}{c} R_{l,t-i} \\ R_{r,t-i} \end{array} \right] \nonumber \\ \alpha ^*_{1,t}&=\Lambda (\alpha _{1,t}),\quad \alpha ^*_{2,t}=\Lambda (\alpha _{2,t}) \end{aligned}$$
(41)

where \(\circ \) is the Hadamard product, \(\Gamma ^2\) is the density function of Eq. (40) with four parameters, and \(\Lambda (x)\) is a transformation function \(\Lambda :{\mathbb {R}}\rightarrow (0,\infty )\). Combining Eqs. (34), (35), and (41), we propose a derivative of Set-GARCH, named the Set-GARCH-LR model:

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{\varvec{\epsilon }}_t,\quad \tilde{\varvec{\epsilon }}_t=\tilde{\varvec{\epsilon }}_t(R_{C,t},R_{l,t},R_{r,t}) \nonumber \\ R_{C,t}&\mathop {\sim }\limits _{i.i.d}\sqrt{h_t}{\mathcal {N}}(0,1) \nonumber \\ h_t&=\omega _h+\sum ^{P_h}_{i=1}\alpha _{h,i}h_{t-i}+\sum ^{Q_h}_{i=1}\beta _{h,i}(\Vert \tilde{{\varvec{R}}}_{t-i}\Vert ^2_{\rho _2}-\frac{1}{3}R_{C,t-i}^2)+\sum ^{R_h}_{i=1}\gamma _{h,i}R_{C,t-i}^2 \nonumber \\ (R_{l,t},R_{r,t})\mathop {\sim }\limits _{i.i.d.}&\Gamma ^2(\alpha _{1,t},\alpha _{2,t},\bar{\gamma }_{1},\bar{\gamma }_2) \nonumber \\ \left[ \begin{array}{c} \alpha _{1,t} \\ \alpha _{2,t} \end{array} \right]&=\left[ \begin{array}{c} \omega _1 \\ \omega _2 \end{array} \right] +\sum ^{P_{lr}}_{i=1}\left[ \begin{array}{cc} a_{11,i}&{}0 \\ 0&{}a_{22,i} \end{array} \right] \left[ \begin{array}{c} \alpha ^*_{1,t-i} \\ \alpha ^*_{2,t-i} \end{array} \right] +\sum ^{Q_{lr}}_{i=1}\left[ \begin{array}{c} b_{1,i} \\ b_{2,i} \end{array} \right] \circ \left[ \begin{array}{c} \Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2} \\ \Vert \tilde{{\varvec{R}}}_{t-i}\Vert _{\rho _2} \end{array} \right] \nonumber \\&\quad +\sum ^{R_{lr}}_{i=1}\left[ \begin{array}{cc} c_{11,i}&{}c_{12,i} \\ c_{21,i}&{}c_{22,i} \end{array} \right] \left[ \begin{array}{c} R_{l,t-i} \\ R_{r,t-i} \end{array} \right] \nonumber \\ \alpha ^*_{1,t}&=\Lambda (\alpha _{1,t}),\quad \alpha ^*_{2,t}=\Lambda (\alpha _{2,t}) \end{aligned}$$
(42)

The “LR” in the name of Set-GARCH-LR indicates that the model captures the dependence between \(R_{l,t}\) and \(R_{r,t}\). Recalling Eq. (33), the set volatility \(\sigma _t^{set}\) calculated by the Set-GARCH-LR model is:

$$\begin{aligned} \sigma _t^{set}=\frac{1}{3}h_t+\frac{1}{18}(\frac{\bar{\gamma }_1}{\alpha ^2_{1,t}}+\frac{\bar{\gamma }_2}{\alpha ^2_{2,t}}+\frac{\bar{\gamma }_1}{\alpha _{1,t}\alpha _{2,t}})\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \end{aligned}$$
(43)

where \(\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \) depends on p, \(h_t\) is given in Eq. (35), and \(\bar{\gamma }_1\), \(\bar{\gamma }_2\), \(\alpha _{1,t}\), and \(\alpha _{2,t}\) are given in Eq. (41).

3.4 Parameter estimation

Given that \(\tilde{{\varvec{R}}}_t=\tilde{\varvec{\epsilon }}_t\), we can directly use historical observations for maximum likelihood estimation. For the Set-GARCH model, given that \(f_{(R_C,R_l,R_r)}=f_{R_C}f_{R_l}f_{R_r}\), the log-likelihood function \(ll_{Set-GARCH}\) w.r.t. the parameter set \(\varvec{\theta }_{Set-GARCH}=(\omega _h,\varvec{\alpha }_h,\varvec{\beta }_h,\varvec{\gamma }_h,\omega _l,\varvec{\alpha }_l,\varvec{\beta }_l,\varvec{\gamma }_l,\omega _r,\varvec{\alpha }_r,\varvec{\beta }_r,\varvec{\gamma }_r)\) is

$$\begin{aligned}&ll_{Set-GARCH}(\omega _h,\varvec{\alpha }_h,\varvec{\beta }_h,\varvec{\gamma }_h,\omega _l,\varvec{\alpha }_l,\varvec{\beta }_l,\varvec{\gamma }_l,\omega _r,\varvec{\alpha }_r,\varvec{\beta }_r,\varvec{\gamma }_r\vert \tilde{{\varvec{r}}}_t) \nonumber \\&\quad =\sum ^T_{t=1}\ln f_{R_C}(x\vert \Omega _{t-1})+\sum ^T_{t=1}\ln f_{R_l}(x\vert \Omega _{t-1})+\sum ^T_{t=1}\ln f_{R_r}(x\vert \Omega _{t-1}) \nonumber \\&\quad \propto -\frac{1}{2}\sum ^T_{t=1}\ln h_t-\sum ^T_{t=1}\frac{r^2_{C,t}}{2h_t}-\sum ^T_{t=1}\ln \lambda _{l,t}-\sum ^T_{t=1}\frac{r_{l,t}}{\lambda _{l,t}}-\sum ^T_{t=1}\ln \lambda _{r,t}-\sum ^T_{t=1}\frac{r_{r,t}}{\lambda _{r,t}} \end{aligned}$$
(44)

Thus, we can decompose the maximum likelihood estimation into three separate maximum likelihood sub-problems, i.e.,

$$\begin{aligned} (\omega _h,\varvec{\alpha }_h,\varvec{\beta }_h,\varvec{\gamma }_h)&=argmax\{-\frac{1}{2}\sum ^T_{t=1}\ln h_t-\sum ^T_{t=1}\frac{r^2_{C,t}}{2h_t}\} \nonumber \\ (\omega _l,\varvec{\alpha }_l,\varvec{\beta }_l,\varvec{\gamma }_l)&=argmax\{-\sum ^T_{t=1}\ln \lambda _{l,t}-\sum ^T_{t=1}\frac{r_{l,t}}{\lambda _{l,t}}\} \nonumber \\ (\omega _r,\varvec{\alpha }_r,\varvec{\beta }_r,\varvec{\gamma }_r)&=argmax\{-\sum ^T_{t=1}\ln \lambda _{r,t}-\sum ^T_{t=1}\frac{r_{r,t}}{\lambda _{r,t}}\} \end{aligned}$$
(45)
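
The separable structure of Eqs. (44) and (45) can be sketched in a few lines of Python. This is a minimal illustration, not the estimation code used in the paper: the function names are hypothetical, and the conditional series \(h_t\), \(\lambda _{l,t}\), and \(\lambda _{r,t}\) are assumed to have been computed beforehand from the recursions of Eq. (37).

```python
import numpy as np


def gaussian_block(r_c, h):
    """Center-return block of Eq. (45): -1/2 * sum(ln h_t) - sum(r_C,t^2 / (2 h_t))."""
    r_c = np.asarray(r_c, dtype=float)
    h = np.asarray(h, dtype=float)
    return float(-0.5 * np.sum(np.log(h)) - np.sum(r_c ** 2 / (2.0 * h)))


def exponential_block(r, lam):
    """Spread block of Eq. (45): -sum(ln lambda_t) - sum(r_t / lambda_t).

    This is the log-density of lambda_t * Gamma(1, 1), i.e. an exponential
    variable with mean lambda_t, summed over time.
    """
    r = np.asarray(r, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return float(-np.sum(np.log(lam)) - np.sum(r / lam))


def set_garch_loglik(r_c, r_l, r_r, h, lam_l, lam_r):
    """Eq. (44) up to an additive constant: the sum of the three blocks."""
    return (gaussian_block(r_c, h)
            + exponential_block(r_l, lam_l)
            + exponential_block(r_r, lam_r))
```

Because each block involves a disjoint subset of the parameters once the observed series are given, maximizing the three blocks one at a time yields the same estimates as maximizing their sum, which is what Eq. (45) exploits.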

For the Set-GARCH-LR model, given that \(f_{(R_C,R_l,R_r)}=f_{R_C}f_{(R_l,R_r)}\), the log-likelihood function \(ll_{Set-GARCH-LR}\) w.r.t. the parameter set \(\varvec{\theta }_{Set-GARCH-LR}=(\omega _h,\varvec{\alpha }_h,\varvec{\beta }_h,\varvec{\gamma }_h,\omega _1,\omega _2,\bar{\gamma }_1,\bar{\gamma }_2,{\varvec{a}}_{P_{lr}},{\varvec{b}}_{Q_{lr}},{\varvec{c}}_{R_{lr}})\) is:

$$\begin{aligned}&ll_{Set-GARCH-LR}(\omega _h,\varvec{\alpha }_h,\varvec{\beta }_h,\varvec{\gamma }_h,\omega _1,\omega _2,\bar{\gamma }_1,\bar{\gamma }_2,{\varvec{a}}_{P_{lr}},{\varvec{b}}_{Q_{lr}},{\varvec{c}}_{R_{lr}}\vert \tilde{{\varvec{r}}}_t) \nonumber \\&\quad =\sum ^T_{t=1}\ln f_{R_C}(x\vert \Omega _{t-1})+\sum ^T_{t=1}\ln f_{(R_l,R_r)}(x,y\vert \Omega _{t-1}) \nonumber \\&\quad \propto -\frac{1}{2}\sum ^T_{t=1}\ln h_t-\sum ^T_{t=1}\frac{r^2_{C,t}}{2h_t}-\sum ^T_{t=1}\alpha _{2,t}r_{r,t}+(\bar{\gamma }_2-\bar{\gamma }_1-1)\sum ^T_{t=1}\ln (r_{r,t}-\frac{\alpha _{1,t}}{\alpha _{2,t}}r_{l,t}) \nonumber \\&\quad +(\bar{\gamma }_1-1)\sum ^T_{t=1}\ln \alpha _{1,t}+(\bar{\gamma }_2-\bar{\gamma }_1)\sum ^T_{t=1}\ln \alpha _{2,t}-\sum ^T_{t=1}\ln \Gamma (\bar{\gamma }_1-1) \nonumber \\&\quad -\sum ^T_{t=1}\ln \Gamma (\bar{\gamma }_2-\bar{\gamma }_1)+\sum ^T_{t=1}\ln (\int ^{x^*}_0(r_{l,t}-\frac{y_0}{\alpha _{1,t}})^{\bar{\gamma }_1-2}dy_0) \end{aligned}$$
(46)

We can still find the optimal parameters to be estimated using a method similar to Eq. (45), that is:

$$\begin{aligned}&(\omega _h,\varvec{\alpha }_h,\varvec{\beta }_h,\varvec{\gamma }_h)=argmax\{-\frac{1}{2}\sum ^T_{t=1}\ln h_t-\sum ^T_{t=1}\frac{r^2_{C,t}}{2h_t}\} \nonumber \\&\quad (\omega _1,\omega _2,\bar{\gamma }_1,\bar{\gamma }_2,{\varvec{a}}_{P_{lr}},{\varvec{b}}_{Q_{lr}},{\varvec{c}}_{R_{lr}}) \nonumber \\&\quad =argmax\{-\sum ^T_{t=1}\alpha _{2,t}r_{r,t}+(\bar{\gamma }_2-\bar{\gamma }_1-1)\sum ^T_{t=1}\ln (r_{r,t}-\frac{\alpha _{1,t}}{\alpha _{2,t}}r_{l,t}) \nonumber \\&\quad +(\bar{\gamma }_1-1)\sum ^T_{t=1}\ln \alpha _{1,t}+(\bar{\gamma }_2-\bar{\gamma }_1)\sum ^T_{t=1}\ln \alpha _{2,t}-\sum ^T_{t=1}\ln \Gamma (\bar{\gamma }_1-1) \nonumber \\&\quad -\sum ^T_{t=1}\ln \Gamma (\bar{\gamma }_2-\bar{\gamma }_1)+\sum ^T_{t=1}\ln (\int ^{x^*}_0(r_{l,t}-\frac{y_0}{\alpha _{1,t}})^{\bar{\gamma }_1-2}dy_0)\} \end{aligned}$$
(47)

where \(x^*=\min \{\frac{\alpha _1}{\alpha _0}R_l,\frac{\alpha _2}{\alpha _0}R_r\}\). The scoring direction search optimization method is used to solve Eqs. (45) and (47). Let \(\varvec{\theta }^*_{Set-GARCH}\) and \(\varvec{\theta }^*_{Set-GARCH-LR}\) be the true parameters of the Set-GARCH and Set-GARCH-LR models. We have that:

$$\begin{aligned}&(T\rightarrow \infty )\hat{\varvec{\theta }}_{Set-GARCH}\mathop {\rightarrow }\limits ^{{\mathcal {P}}}\varvec{\theta }^*_{Set-GARCH} \nonumber \\&(T\rightarrow \infty )\hat{\varvec{\theta }}_{Set-GARCH-LR}\mathop {\rightarrow }\limits ^{{\mathcal {P}}}\varvec{\theta }^*_{Set-GARCH-LR} \nonumber \\&(T\rightarrow \infty )T^{\frac{1}{2}}(\hat{\varvec{\theta }}_{Set-GARCH}-\varvec{\theta }^*_{Set-GARCH})\mathop {\rightarrow }\limits ^{{\mathcal {D}}}{\mathcal {N}}\left( 0,-\left[ {\mathbb {E}}\frac{\partial ^2ll_{Set-GARCH}}{\partial \varvec{{\theta }^*}^2_{Set-GARCH}}\right] '\right) \nonumber \\&(T\rightarrow \infty )T^{\frac{1}{2}}(\hat{\varvec{\theta }}_{Set-GARCH-LR}-\varvec{\theta }^*_{Set-GARCH-LR})\nonumber \\&\mathop {\rightarrow }\limits ^{{\mathcal {D}}}{\mathcal {N}}\left( 0,-\left[ {\mathbb {E}}\frac{\partial ^2ll_{Set-GARCH-LR}}{\partial \varvec{{\theta }^*}^2_{Set-GARCH-LR}}\right] '\right) \end{aligned}$$
(48)

where \(-[{\mathbb {E}}\frac{\partial ^2ll_{Set-GARCH}}{\partial \varvec{{\theta }^*}^2_{Set-GARCH}}]'\) and \(-[{\mathbb {E}}\frac{\partial ^2ll_{Set-GARCH-LR}}{\partial \varvec{{\theta }^*}^2_{Set-GARCH-LR}}]'\) are the Fisher information matrices of \(ll_{Set-GARCH}\) in Eq. (44) and \(ll_{Set-GARCH-LR}\) in Eq. (46) at \(\varvec{\theta }^*_{Set-GARCH}\) and \(\varvec{\theta }^*_{Set-GARCH-LR}\), respectively. We compute the numerical solution of Eq. (48) to obtain the standard errors of the estimated parameters.

4 An empirical application

4.1 Data selection

We select daily, weekly, and monthly data from Datastream for WTI oil futures, the S&P500 stock index, and NYMEX gold futures to demonstrate the in-sample and out-of-sample volatility forecasting and returns interval forecasting capabilities of the proposed Set-GARCH model. Futures prices are chosen so that they represent the highest, lowest, and closing prices. The selected data are shown in Table 1.

Table 1 Data selection

As shown in Table 1, the SD of the highest, lowest, and closing prices of an asset generally increases as the timescale lengthens. This is because the cumulative change in asset prices over a month is typically greater than the change over a day or a week. Daily data are a good test of the forecasting performance of a model that incorporates range information, but we would also like to investigate how our Set-GARCH and Set-GARCH-LR models perform across high- and low-frequency data. Figure 2 depicts the high, low, and closing price (returns) trajectories over the same sample period since 2018 for crude oil. If we consider only the closing price, we appear to lose a great deal of information.

Fig. 2 The trajectory of monthly \(C_t\), \(H_t\), \(L_t\), \(R_{C,t}\), \(R_{L,t}\) and \(R_{H,t}\) of oil, S&P500, and gold since 2018

4.2 In-sample volatility forecasting

Without loss of generality, we specify the proposed Set-GARCH model as follows:

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{\varvec{\epsilon }}_t,\quad \tilde{\varvec{\epsilon }}_t=\tilde{\varvec{\epsilon }}_t(R_{C,t},R_{l,t},R_{r,t}) \nonumber \\ R_{C,t}&\mathop {\sim }\limits _{i.i.d}\sqrt{h_t}{\mathcal {N}}(0,1),\quad R_{l,t}\mathop {\sim }\limits _{i.i.d.}\lambda _{l,t}\Gamma (1,1),\quad R_{r,t}\mathop {\sim }\limits _{i.i.d.}\lambda _{r,t}\Gamma (1,1) \nonumber \\ h_t&=\omega _h+\alpha _{h,1}h_{t-1}+\beta _{h,1}(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2}-\frac{1}{3}R_{C,t-1}^2)+\gamma _{h,1}R_{C,t-1}^2 \nonumber \\ \lambda _{l,t}&=(\omega _l+\alpha _{l,1}\lambda _{l,t-1}+\beta _{l,1}\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}+\gamma _{l,1}R_{l,t-1})^2+0.001 \nonumber \\ \lambda _{r,t}&=(\omega _r+\alpha _{r,1}\lambda _{r,t-1}+\beta _{r,1}\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}+\gamma _{r,1}R_{r,t-1})^2+0.001 \end{aligned}$$
(49)
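
The first-order recursions of Eq. (49), together with the set volatility of Eq. (38), can be sketched as a simple filter. The code below is a minimal illustration under stated assumptions rather than the authors' implementation: the function and argument names are hypothetical, the norm series \(\Vert \tilde{{\varvec{R}}}_t\Vert _{\rho _2}\) and the constant \(\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \) are taken as precomputed inputs, and the start-up values (sample variance and sample means) are our own choice.

```python
import numpy as np


def set_garch_filter(r_c, r_l, r_r, norm, params, phi_int, h0=None, lam0=None):
    """Run the recursions of Eq. (49) and the set volatility of Eq. (38).

    norm    : observed ||R_t||_{rho_2}, computed elsewhere (assumed input)
    params  : dict with 4-tuples (omega, alpha, beta, gamma) under "h", "l", "r"
    phi_int : value of int_0^1 (phi^{-1}(a; p))^2 da for the chosen p (assumed input)
    """
    r_c, r_l, r_r, norm = (np.asarray(a, dtype=float) for a in (r_c, r_l, r_r, norm))
    n = len(r_c)
    wh, ah, bh, gh = params["h"]
    wl, al, bl, gl = params["l"]
    wr, ar, br, gr = params["r"]
    h, lam_l, lam_r = np.empty(n), np.empty(n), np.empty(n)
    # Start-up values are an assumption: sample variance and sample means.
    h[0] = np.var(r_c) if h0 is None else h0
    lam_l[0], lam_r[0] = (np.mean(r_l), np.mean(r_r)) if lam0 is None else lam0
    for t in range(1, n):
        # h_t uses the squared norm net of the (1/3) R_C^2 term.
        h[t] = (wh + ah * h[t - 1]
                + bh * (norm[t - 1] ** 2 - r_c[t - 1] ** 2 / 3.0)
                + gh * r_c[t - 1] ** 2)
        # Lambda(x) = x^2 + 0.001 keeps the spread means positive.
        lam_l[t] = (wl + al * lam_l[t - 1] + bl * norm[t - 1] + gl * r_l[t - 1]) ** 2 + 0.001
        lam_r[t] = (wr + ar * lam_r[t - 1] + br * norm[t - 1] + gr * r_r[t - 1]) ** 2 + 0.001
    # Eq. (38): set volatility from h_t and the two spread means.
    sigma_set = h / 3.0 + (lam_l ** 2 + lam_r ** 2) / 18.0 * phi_int
    return h, lam_l, lam_r, sigma_set
```

Squaring the linear predictor and adding 0.001 plays the role of \(\Lambda \): it bounds \(\lambda _{l,t}\) and \(\lambda _{r,t}\) away from zero without constraining the signs of the coefficients.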

Similarly, we set each order in the Set-GARCH-LR model to 1, i.e.,

$$\begin{aligned} \tilde{{\varvec{R}}}_t&=\tilde{\varvec{\epsilon }}_t,\quad \tilde{\varvec{\epsilon }}_t=\tilde{\varvec{\epsilon }}_t(R_{C,t},R_{l,t},R_{r,t}) \nonumber \\ R_{C,t}&\mathop {\sim }\limits _{i.i.d}\sqrt{h_t}{\mathcal {N}}(0,1) \nonumber \\ h_t&=\omega _h+\alpha _{h,1}h_{t-1}+\beta _{h,1}(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert ^2_{\rho _2}-\frac{1}{3}R_{C,t-1}^2)+\gamma _{h,1}R_{C,t-1}^2 \nonumber \\ (R_{l,t},R_{r,t})\mathop {\sim }\limits _{i.i.d.}&\Gamma ^2(\alpha _{1,t},\alpha _{2,t},\bar{\gamma }_{1},\bar{\gamma }_2) \nonumber \\ \left[ \begin{array}{c} \alpha _{1,t} \\ \alpha _{2,t} \end{array} \right]&=\left[ \begin{array}{c} \omega _1 \\ \omega _2 \end{array} \right] +\left[ \begin{array}{cc} a_{11,1}&{}0 \\ 0&{}a_{22,1} \end{array} \right] \left[ \begin{array}{c} \alpha ^*_{1,t-1} \\ \alpha ^*_{2,t-1} \end{array} \right] +\left[ \begin{array}{c} b_{1,1} \\ b_{2,1} \end{array} \right] \circ \left[ \begin{array}{c} \Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2} \\ \Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2} \end{array} \right] \nonumber \\&\quad +\left[ \begin{array}{cc} c_{11,1}&{}c_{12,1} \\ c_{21,1}&{}c_{22,1} \end{array} \right] \left[ \begin{array}{c} R_{l,t-1} \\ R_{r,t-1} \end{array} \right] \nonumber \\ \alpha ^*_{1,t}&=(\alpha _{1,t})^2+0.001,\quad \alpha ^*_{2,t}=(\alpha _{2,t})^2+0.001 \end{aligned}$$
(50)
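
One step of the rate recursion in Eq. (50), and the set volatility of Eq. (43), can be sketched as follows. This is an illustrative sketch only: the function names, argument names, and argument layout are hypothetical, and the norm \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}\) and the constant \(\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \) are assumed to be supplied by the caller.

```python
import numpy as np


def lr_rate_step(alpha_star_prev, norm_prev, spreads_prev, omega, a_diag, b, c):
    """One step of Eq. (50) for (alpha_1t, alpha_2t) and their transforms.

    alpha_star_prev : (alpha*_{1,t-1}, alpha*_{2,t-1})
    norm_prev       : scalar ||R_{t-1}||_{rho_2} (assumed precomputed)
    spreads_prev    : (R_{l,t-1}, R_{r,t-1})
    a_diag          : diagonal entries (a_11, a_22) of the autoregressive matrix
    c               : full 2x2 matrix, so c_12 and c_21 carry the cross effects
    """
    alpha_star_prev = np.asarray(alpha_star_prev, dtype=float)
    spreads_prev = np.asarray(spreads_prev, dtype=float)
    omega = np.asarray(omega, dtype=float)
    a_diag = np.asarray(a_diag, dtype=float)
    b = np.asarray(b, dtype=float)
    c = np.asarray(c, dtype=float)
    # A diagonal matrix times a vector is an elementwise product.
    alpha = omega + a_diag * alpha_star_prev + b * norm_prev + c @ spreads_prev
    # Lambda(x) = x^2 + 0.001, as in the last line of Eq. (50).
    alpha_star = alpha ** 2 + 0.001
    return alpha, alpha_star


def set_volatility_lr(h, alpha1, alpha2, gamma1_bar, gamma2_bar, phi_int):
    """Eq. (43): set volatility of the Set-GARCH-LR model."""
    spread_term = (gamma1_bar / alpha1 ** 2
                   + gamma2_bar / alpha2 ** 2
                   + gamma1_bar / (alpha1 * alpha2))
    return h / 3.0 + spread_term / 18.0 * phi_int
```

The off-diagonal entries of `c` are what distinguish this recursion from the independent-spread case: a large past right spread can move the left-spread rate, and vice versa.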

Meanwhile, we set the prior parameter p reflecting the shape of the fuzzy set to three different values of 1, 2, and 10. Tables 2, 3, 4, 5, 6 and 7 demonstrate the parameter estimation results.

\(\beta _{h,1}\) shows how the term \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}^2-\frac{1}{3}R^2_{C,t-1}\) in the Set-GARCH and Set-GARCH-LR models affects the change of \(h_t\), and it is an important coefficient for revealing the contribution of the fuzzy sets-valued variable. We find that, for the same asset, \(\beta _{h,1}\) is mostly insignificant with daily data, whereas with weekly and monthly data it is statistically significant. In Sect. 4.1, we found that as the data frequency decreases, the volatility of \(H_t\) and \(L_t\) becomes greater. Consequently, the term \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}^2-\frac{1}{3}R^2_{C,t-1}\) varies more at the weekly and monthly frequencies and carries more information than at the daily frequency, which helps to predict \(h_t\) at those frequencies.

In the Set-GARCH model, \(\beta _{l,1}\) and \(\beta _{r,1}\) show how \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}\) affects the expectation (and also the variance) of \(R_{r,t}\) and \(R_{l,t}\). From Tables 2, 3 and 4, \(\beta _{l,1}\) and \(\beta _{r,1}\) are significant for almost all assets and time scales, indicating that the influence of the fuzzy sets-valued variable on \(R_{r,t}\) and \(R_{l,t}\) does not change with the data frequency. Further, \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}\) always has a positive impact on \(\lambda _{l,t}\) and \(\lambda _{r,t}\) in our empirical application, which differs from the pattern of its influence on \(h_t\). The \(h_t\), \(\lambda _{l,t}\), and \(\lambda _{r,t}\) demonstrate strong dynamic patterns, which drive the change of \(\tilde{{\varvec{R}}}_{t}\). This supports the rationality of our settings for the Set-GARCH model.

Eqs. (45) and (47) imply that \(\omega _h\), \(\alpha _{h,1}\), \(\beta _{h,1}\), and \(\gamma _{h,1}\) take the same values in the Set-GARCH and Set-GARCH-LR models, which is confirmed in Tables 5, 6 and 7. The coefficients \(b_{1,1}\) and \(b_{2,1}\) of the Set-GARCH-LR model are almost all significant across assets and sample frequencies, which shows that when \(R_{r,t}\) and \(R_{l,t}\) are not independent, \(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}\) affects the distribution parameters \(\alpha _{1,t}\) and \(\alpha _{2,t}\) in a time-varying manner. This demonstrates once again the importance of representing returns as fuzzy random sets.

Unlike in the Set-GARCH model, the cross coefficients \(c_{12,1}\) and \(c_{21,1}\) of the Set-GARCH-LR model capture the interaction between the two spreads, and they are significant in most cases in Tables 5, 6 and 7. This illustrates the interaction of \(R_{r,t}\) and \(R_{l,t}\) of assets, and this interaction does not disappear as the data frequency changes. In summary, the model settings of Set-GARCH and Set-GARCH-LR make full use of the past information in \(\tilde{{\varvec{R}}}_{t-1}\) to drive changes of \(\tilde{{\varvec{R}}}_{t}\).

Table 2 In-sample Set-GARCH estimation of oil price
Table 3 In-sample Set-GARCH estimation of S&P500 price
Table 4 In-sample Set-GARCH estimation of gold price
Table 5 In-sample Set-GARCH-LR estimation of oil price
Table 6 In-sample Set-GARCH-LR estimation of S&P500 price
Table 7 In-sample Set-GARCH-LR estimation of gold price

We select the following two loss functions to measure volatility forecasting accuracy (Patton, 2011). We use the squared returns \(\sigma ^2_t\) as the proxy of real volatility and denote the predicted volatility by \({\hat{\sigma }}^2\) (Wang et al., 2020; Zhang et al., 2020):

$$\begin{aligned} \text {MSE-SD}&=\frac{1}{N}\sum ^N_{i=1}({\hat{\sigma }}_i-\sigma _i)^2 \nonumber \\ \text {MAE}&=\frac{1}{N}\sum ^N_{i=1}\vert {\hat{\sigma }}_i^2-\sigma _i^2\vert \end{aligned}$$
(51)
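As an illustration, the two loss functions in Eq. (51) could be computed as in the following minimal sketch. It assumes that both inputs are variance-scale series (predicted variances and squared-return proxies), so that MSE-SD is evaluated on their square roots. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
import math


def mse_sd(pred_var, proxy_var):
    """Mean squared error on standard deviations (first line of Eq. 51)."""
    n = len(proxy_var)
    return sum((math.sqrt(p) - math.sqrt(s)) ** 2
               for p, s in zip(pred_var, proxy_var)) / n


def mae(pred_var, proxy_var):
    """Mean absolute error on variances (second line of Eq. 51)."""
    n = len(proxy_var)
    return sum(abs(p - s) for p, s in zip(pred_var, proxy_var)) / n


# Made-up returns; the squared returns serve as the volatility proxy.
returns = [0.01, -0.02, 0.03]
proxy = [r * r for r in returns]
pred = [4e-4, 4e-4, 4e-4]  # hypothetical constant variance forecast

loss_mse_sd = mse_sd(pred, proxy)
loss_mae = mae(pred, proxy)
```

Lower values of either loss indicate a closer fit of the predicted volatility to the proxy.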

The Model Confidence Set (MCS) test (Hansen et al., 2011) is used to determine whether a model is included in the acceptance set at a specified confidence level. The MCS statistics range between 0 and 1; the greater the number, the higher the acceptance of a model (Wang et al., 2020, 2016). For the CARR group models and the ACI model, after calculating their range, we use \(\frac{H_t-L_t}{4\ln 2}\) to calculate their in-sample predicted volatility.
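The range-to-volatility conversion used for the CARR group and ACI benchmarks can be sketched as follows. The sketch transcribes the expression \(\frac{H_t-L_t}{4\ln 2}\) exactly as printed. The function name is hypothetical, and whether \(H_t\) and \(L_t\) are raw or log prices is left to the caller, since the text here does not state it.

```python
import math


def range_volatility(high, low):
    """Return (H_t - L_t) / (4 ln 2) for each period, as printed in the text."""
    scale = 4.0 * math.log(2.0)
    return [(h - l) / scale for h, l in zip(high, low)]


# Made-up highs and lows for three periods.
highs = [1.05, 1.10, 1.02]
lows = [1.00, 1.01, 1.02]
vol = range_volatility(highs, lows)
```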

In general, compared to the benchmark models, the Set-GARCH and Set-GARCH-LR models exhibit significantly superior in-sample volatility prediction capabilities. Their advantage becomes more pronounced as the sample frequency decreases, i.e., it is larger for monthly data than for daily or weekly data. For the same frequency and asset, the in-sample prediction performance of the Set-GARCH-LR model is frequently superior to that of the Set-GARCH model. These observations indicate that absorbing more of the sets-valued information in the sample enables the model to better fit the in-sample data. We elaborate on this claim using the evidence presented in Tables 8, 9 and 10.

Referring to the analysis in sections A.1 and 2, the Set-GARCH and Set-GARCH-LR models capture the “range” and “level” information of \(H_t\) and \(L_t\), and the “point” information of \(R_{C,t}\). The GARCH group models only contain the “point” information, while the ACI and Int-GARCH models do not contain the “point” information of \(R_{C,t}\). The CARR group models only engage the “range” information of \(H_t\) and \(L_t\), and the “point” information of \(R_{C,t}\). Provided that the “range” and “level” information of \(R_{L,t}\) and \(R_{H,t}\) is rich (or changes by a relatively large amount), absorbing it would improve in-sample forecasting. Compared to daily and weekly time intervals, the changes of \(R_{L,t}\) and \(R_{H,t}\) in monthly data are more profound. The complete information enables the Set-GARCH and Set-GARCH-LR models to close the information gap existing in the benchmark models.

The empirical results also show that the in-sample prediction performance of the Set-GARCH-LR model is superior to that of the Set-GARCH model when applied to crude oil. Crude oil is a highly volatile asset (Cerqueti & Fanelli, 2021; Cerqueti et al., 2020), and the mechanism of change between \(R_{L,t}\) and \(R_{H,t}\) is more significant, which could make Set-GARCH-LR superior for in-sample forecasting.

A high value of p means that we increase the degree of membership of return values close to \(R_{L,t}\) and \(R_{H,t}\) in \(\tilde{{\varvec{R}}}_t\). In terms of in-sample prediction, the Set-GARCH and Set-GARCH-LR models with \(p=1\) and \(p=2\) have better fitting results. Compared to the ACI and Int-GARCH models, which absorb all the interval-valued information equally, a small p controls the “degree of membership” of the various points in the interval-valued information. As shown in Fig. 3, there is no difference between \(\tilde{{\varvec{R}}}_t\) and interval-valued variables for extremely large p values, making our model inferior to ACI and Int-GARCH. Figure 3 shows the real volatility and the fitted volatility of the best models.

Table 8 In-sample daily returns volatility forecasting goodness-of-fit and MCS test
Table 9 In-sample weekly returns volatility forecasting goodness-of-fit and MCS test
Table 10 In-sample monthly returns volatility forecasting goodness-of-fit and MCS test
Fig. 3 Real volatility and fitted volatility of the best model under the MSE-SD and MAE loss functions

4.3 Out-of-sample volatility forecasting

We use a rolling window of length 300 with one-step-ahead prediction to evaluate the out-of-sample volatility prediction performance of the Set-GARCH model and the Set-GARCH-LR model. From Eqs. (49) and (50), the one-step-ahead forecast \({\hat{\sigma }}_{set,t}(1)\) is:

$$\begin{aligned} {\hat{\sigma }}_{set,t}(1)&=\frac{1}{3}{\hat{h}}_t(1)\frac{{\hat{\lambda }}^2_{l,t}(1)+{\hat{\lambda }}^2_{r,t}(1)}{18}\int ^1_0(\phi ^{-1}(\alpha ;p))^2d\alpha \nonumber \\ {\hat{h}}_t(1)&={\hat{\omega }}_h+{\hat{\alpha }}_{h,1}h_t+{\hat{\beta }}_{h,1}(\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}^2-\frac{R^2_{C,t}}{3})+{\hat{\gamma }}_{h,1}R^2_{C,t} \nonumber \\ {\hat{\lambda }}_{l,t}(1)&=({\hat{\omega }}_l+{\hat{\alpha }}_{l,1}\lambda _{l,t}+{\hat{\beta }}_{l,1}\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}+{\hat{\gamma }}_{l,1}R_{l,t})^2+0.001 \nonumber \\ {\hat{\lambda }}_{r,t}(1)&=({\hat{\omega }}_r+{\hat{\alpha }}_{r,1}\lambda _{r,t}+{\hat{\beta }}_{r,1}\Vert \tilde{{\varvec{R}}}_{t-1}\Vert _{\rho _2}+{\hat{\gamma }}_{r,1}R_{r,t})^2+0.001 \end{aligned}$$
(52)

and the one-step forward prediction of the Set-GARCH-LR is also calculated in a similar way.
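The component recursions in Eq. (52) and the rolling-window scheme can be illustrated with the following minimal sketch. It transcribes only the last three lines of Eq. (52) as printed (the forecasts \({\hat{h}}_t(1)\), \({\hat{\lambda }}_{l,t}(1)\) and \({\hat{\lambda }}_{r,t}(1)\)) and does not reproduce the final combination into \({\hat{\sigma }}_{set,t}(1)\), which depends on the membership function defined earlier in the paper. All names (`Coeffs`, `h_forecast`, `lambda_forecast`, `rolling_one_step`) and all numerical values are hypothetical. The `fit` and `forecast` callables stand in for the maximum likelihood estimation and forecasting steps, which are not implemented here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Coeffs:
    """One (omega, alpha, beta, gamma) coefficient set; values below are made up."""
    omega: float
    alpha: float
    beta: float
    gamma: float


def h_forecast(c: Coeffs, h_t: float, norm_sq: float, r_c: float) -> float:
    """h_hat(1) = omega + alpha*h_t + beta*(norm^2 - R_C^2/3) + gamma*R_C^2."""
    return (c.omega + c.alpha * h_t
            + c.beta * (norm_sq - r_c ** 2 / 3.0)
            + c.gamma * r_c ** 2)


def lambda_forecast(c: Coeffs, lam_t: float, norm: float, spread: float) -> float:
    """lambda_hat(1) = (omega + alpha*lambda_t + beta*norm + gamma*spread)^2 + 0.001."""
    return (c.omega + c.alpha * lam_t + c.beta * norm + c.gamma * spread) ** 2 + 0.001


def rolling_one_step(series: Sequence[float], window: int,
                     fit: Callable, forecast: Callable) -> List[float]:
    """Re-fit on each window of `window` observations and forecast the next period."""
    preds = []
    for end in range(window, len(series)):
        sample = series[end - window:end]
        preds.append(forecast(fit(sample), sample))
    return preds


# Hypothetical coefficients and state values, for illustration only.
c = Coeffs(omega=0.1, alpha=0.5, beta=0.2, gamma=0.3)
h_next = h_forecast(c, h_t=1.0, norm_sq=0.6, r_c=0.3)
lam_next = lambda_forecast(c, lam_t=1.0, norm=0.5, spread=1.0)

# Toy rolling run: the "model" is the window mean (the paper uses window=300).
toy = [float(x) for x in range(10)]
preds = rolling_one_step(toy, window=3,
                         fit=lambda s: sum(s) / len(s),
                         forecast=lambda params, s: params)
```

Passing the estimation and forecasting steps as callables keeps the rolling loop independent of the model, so the same loop could in principle be reused for the benchmark models.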

As evidenced in Tables 11 and 13, the Set-GARCH group models generally demonstrate a superior ability to predict out-of-sample volatility compared to the benchmark models. There are some exceptions: the CARR-B model exhibits certain predictive advantages for weekly data.

Unlike the Set-GARCH or Set-GARCH-LR model with \(p=1\) or \(p=2\), the Set-GARCH model with the prior parameter p equal to 10 does not deliver satisfactory out-of-sample predictions in our empirical applications. This may be because a higher p value increases the absorption of \(R_l\) and \(R_r\) information in out-of-sample predictions, which may lead to instability in the Set-GARCH model. The poor prediction performance of the Set-GARCH model under this large p is consistent with the poor out-of-sample volatility prediction performance of the Int-GARCH model presented in Tables 11, 12 and 13. According to the analysis in Sect. 2.3, when p is large, \(\tilde{{\varvec{R}}}_t\) turns into an interval-valued random variable equivalent to that of the Int-GARCH model. In this case, the Int-GARCH and Set-GARCH(-LR) models under \(p=10\) absorb approximately the same information, and the prediction results of both are unsatisfactory. This may suggest that extra interval information does not necessarily improve out-of-sample volatility prediction.

Under \(p=1\) or \(p=2\), the Set-GARCH and Set-GARCH-LR models perform well in out-of-sample volatility prediction. The out-of-sample prediction performance of the GARCH group models is significantly inferior to that of our proposed Set-GARCH(-LR) model. This is largely due to the limited information processed by the GARCH model, which only considers \(R_{C,t}\) information. This confirms the importance of Set-GARCH(-LR) absorbing \(R_{L,t}\) and \(R_{R,t}\) information in out-of-sample volatility prediction.

As discussed in Sect. 2.3, when p is small, the information absorbed by Set-GARCH-LR is close to the information absorbed by the ACI model. In most cases, the Set-GARCH(-LR) out-of-sample prediction with \(p=1\) or \(p=2\) performs better than the ACI model. This may imply that the calculation of \(\sigma ^{Set}_t\) is better than the calculation \(\frac{H_t-L_t}{4\ln 2}\) used for the ACI model. Recalling Eqs. (38) and (43), \(\sigma ^{Set}_t\) is a linear combination of the terms \({\mathbb {D}}(R_{C})\), \({\mathbb {D}}(R_r)\), \({\mathbb {D}}(R_l)\), \(COV(R_C,R_r)\), \(COV(R_C,R_l)\), and \(COV(R_l,R_r)\) (see Footnote 8). Blending these different pieces of information gives \(\sigma ^{Set}_t\) an enhanced predictive capability. Different sample frequencies do not appear to have a substantial effect on the ability of the Set-GARCH(-LR) model to predict out-of-sample volatility.

Table 11 Out-of-sample daily returns volatility forecasting goodness-of-fit and MCS test
Table 12 Out-of-sample weekly returns volatility forecasting goodness-of-fit and MCS test
Table 13 Out-of-sample monthly returns volatility forecasting goodness-of-fit and MCS test

5 Conclusion

Over the last few decades, the data structures used by financial time series volatility models have evolved significantly, from GARCH-type models with point-valued data, to CARR-type models with range-valued data, to the ACI and Int-GARCH models with interval-valued data based on random set theory. This study proposes a Set-GARCH model that drives the volatility changes in random fuzzy sets-valued time series. Adapting to the rules of random set operations, the proposed Set-GARCH model exhibits accurate volatility prediction.

We construct the sets-valued asset price using a fuzzy LR-form set. We present a general and adaptable form of the membership function with a prior parameter p that controls the shape of these functions. We examine the impact of various subtraction rules on sets-valued returns. This paper provides the inner-product definition, distance definition, and variance definition between two random fuzzy sets-valued returns.

Based on the selected sets-valued variable subtraction rule, we discuss the specifications that a model driving sets-valued variable changes should have and provide the specifications of our Set-GARCH model. We also propose the Set-GARCH-LR model as a derivative of the Set-GARCH model to increase the flexibility of the structure settings. The Set-GARCH model differs from the Set-GARCH-LR model in that the latter assumes that the two shape parameters in fuzzy sets-valued returns are dependent and follow a bivariate Gamma distribution. Maximum likelihood can be used to estimate the parameters of both the Set-GARCH and the Set-GARCH-LR models. In addition, we provide a formula that transforms the variance of fuzzy sets-valued returns into the volatility of real returns.

In the empirical applications, we compare the volatility forecasting performance of the Set-GARCH model to that of three classic GARCH-type models, three classic CARR-type models, the interval-valued ACI model, and the interval-valued Int-GARCH model using daily, weekly, and monthly trading data for oil, gold, and the S&P500. The proposed Set-GARCH and Set-GARCH-LR models perform well in both in-sample and out-of-sample volatility prediction tests.

This paper also points out possible directions for future research on sets-valued time series volatility models. First, sets-valued time series models could be developed that absorb more price information (our model only absorbs three prices). Second, the model could be extended to multivariate sets-valued time series.