1 Introduction

Conventionally, time series analysis uses a succession of single-valued data samples. This may be restrictive in situations in which complex data analysis is needed to comprehend the inherent variability and uncertainty of a phenomenon. For instance, in economics, the daily stock price of a corporation is expressed as the daily minimum and maximum trading prices. If only the lowest (or the highest, or closing) price of each day is considered, then the resulting time series is single-valued, neglecting the inherent intra-daily price variability. If both the daily lowest and highest prices are accounted for, then an interval-valued time series (ITS) is formed with its intrinsic trend (or level) and volatility information (the range between the boundaries) [29]. ITS are encountered in many fields such as finance [3], energy [30], environment and climate [8], and agriculture [32], to mention but a few.

This paper introduces a novel adaptive interval fuzzy modeling method to model and forecast interval-valued time series. The method processes interval-valued input data streams, employs participatory learning to cluster interval-valued data, develops a fuzzy rule for each cluster, uses weighted recursive least squares to update the parameters of the rule consequents, and outputs interval-valued forecasts. The adaptive interval fuzzy method is evaluated in modeling and forecasting the daily lowest and highest prices of four of the most traded cryptocurrencies: BitCoin, Ethereum, XRP, and LiteCoin. Its performance is compared against the adaptive neuro-fuzzy inference system (ANFIS), long short-term memory (LSTM) neural network, autoregressive integrated moving average (ARIMA), exponential smoothing state-space model (ETS), and naïve random walk benchmarks. These methods do not rely on interval arithmetic, and their interval forecasts are produced from the individual lower and upper bound forecasts.

The rest of this paper proceeds as follows. Section 2 overviews the current literature and state of the art of the area. Section 3 details the nature of the data, the structure of the fuzzy models, and suggests the adaptive method to develop fuzzy models from interval stream data. Section 4 concerns cryptocurrencies modeling and forecasting using actual daily lowest and highest prices data. Section 5 concludes the paper, summarizing its contributions and suggesting issues for future investigation.

2 Literature review

Different approaches have been developed to model and forecast interval-valued data. The first linear regression model for interval-valued data was investigated in [4] using the center of the interval method. A constrained center and interval range method with nonnegative constraints on the coefficients of the range regression model is addressed in [16]. The use of ARIMA and neural networks models to forecast the center and range of intervals is pursued in [21]. An autoregressive time series modeling approach is addressed in [33], and a threshold autoregressive interval model in [29]. Nonparametric alternatives such as the interval kernel regression method [7] and the nonparametric additive approach [15] have also been developed.

Machine learning has also been used in interval-valued data modeling and forecasting. Examples include the interval multilayer perceptron (iMLP) model [26], a multilayer perceptron (MLP) neural network combined with Holt's exponential smoothing [20], and a multiple-output support vector regression method for interval-valued stock price index forecasting [34].

A possibilistic evolving fuzzy method for interval time series modeling and forecasting was proposed in [19]. The model employs memberships and typicalities to recursively cluster data, uses participatory learning to update the forecasting model structure as stream data are input, and processes interval-valued data.

The Intuitionistic Fuzzy Grey Cognitive Map (IFGCM) for interval-valued data [9] has also been explored in modeling and forecasting. IFGCM was evaluated using stock market data, and the results show its high efficiency, especially when compared with state-of-the-art models.

Participatory learning in fuzzy clustering [27] was extended for clustering interval data in [18]. The clustering method uses interval arithmetic, and the computational experiments are reported using synthetic interval data sets with linearly non-separable clusters of different shapes and sizes.

This paper introduces an adaptive interval fuzzy method to model stream interval-valued data. The method employs participatory learning to cluster interval-valued data, using the Hausdorff–Pompeiu distance [11, 14] to compute the (dis)similarity between intervals. The Moore and generalized Hukuhara subtractions are studied to verify and evaluate the impact of the distinct subtraction operations on ITS modeling and forecasting performance. Moreover, the paper contributes an alternative forecasting technique for cryptocurrencies based on the highest and lowest prices, which serve as a measure of volatility, essential information for decision making and risk management in high price variation markets such as digital coins.

3 Adaptive fuzzy modeling of interval-valued stream data

This section addresses the nature of data, the structure of interval fuzzy models, and develops an adaptive method to construct the models within the framework of evolving participatory learning from interval-valued stream data [2, 18]. First, a brief reminder of interval-valued time series and interval arithmetic is given. Next, the structure of the rule-based model is shown. Finally, the interval adaptive fuzzy modeling method is explained, and the computational steps of the corresponding procedure are summarized.

3.1 Interval time series and interval arithmetic

An interval-valued time series (ITS) is a sequence of interval-valued data indexed by successive time steps \(t=1,2,\ldots ,N\). An interval datum is expressed as \([x]=[x^{\mathrm{L}},x^{\mathrm{U}}]\in {\mathcal {K}}_c({\mathbb {R}})\), where \( {\mathcal {K}}_c({\mathbb {R}}) = \{[x^{\mathrm{L}},x^{\mathrm{U}}]: x^{\mathrm{L}}, x^{\mathrm{U}} \in {\mathbb {R}}, x^{\mathrm{L}}\le x^{\mathrm{U}}\}\) is the set of closed intervals of the real line \({\mathbb {R}}\), and \(x^{\mathrm{L}}\) and \(x^{\mathrm{U}}\) are the lower and upper bounds of the interval [x]. An interval [x] may also be expressed by a two-dimensional vector \([x] = [x^{\mathrm{L}},x^{\mathrm{U}}]^T\).

Modeling ITS requires interval arithmetic, an extension of traditional arithmetic to operate on intervals. This paper uses the arithmetic operations introduced by Moore [23] summarized below:

$$\begin{aligned} [x] + [y]&= [x^{\mathrm{L}} + y^{\mathrm{L}},\, x^{\mathrm{U}} + y^{\mathrm{U}}],\\ [x] - [y]&= [x^{\mathrm{L}} - y^{\mathrm{U}},\, x^{\mathrm{U}} - y^{\mathrm{L}}],\\ [x] \cdot [y]&= \left[ \min \{x^{\mathrm{L}} y^{\mathrm{L}}, x^{\mathrm{L}} y^{\mathrm{U}}, x^{\mathrm{U}} y^{\mathrm{L}}, x^{\mathrm{U}} y^{\mathrm{U}}\},\, \max \{x^{\mathrm{L}} y^{\mathrm{L}}, x^{\mathrm{L}} y^{\mathrm{U}}, x^{\mathrm{U}} y^{\mathrm{L}}, x^{\mathrm{U}} y^{\mathrm{U}}\}\right] ,\\ [x] / [y]&= [x] \cdot \left( 1/[y]\right) \quad \text {with} \quad 1/[y] = [1/y^{\mathrm{U}},\, 1/y^{\mathrm{L}}]. \end{aligned}$$
(1)

It is well known that \([x] - [x] \ne [0]\) for Moore subtraction, where \([0] = [0,0]\). An alternative subtraction operation for which \([x] - [x] = [0]\) holds is the generalized Hukuhara difference [28], defined as follows:

$$\begin{aligned} [x] - [y] = \left[ \min \{x^{\mathrm{L}} - y^{\mathrm{L}},\, x^{\mathrm{U}}- y^{\mathrm{U}}\},\, \max \{x^{\mathrm{L}} - y^{\mathrm{L}},\, x^{\mathrm{U}}- y^{\mathrm{U}}\}\right] . \end{aligned}$$
(2)

This paper considers both Moore and Hukuhara subtraction to verify and evaluate their impact on ITS modeling and forecasting performance.
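To make the difference between the two operations concrete, the following is a minimal Python sketch of Moore addition and subtraction (1) and the generalized Hukuhara difference (2); the class and method names are illustrative, not part of the method.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float  # lower bound x^L
    hi: float  # upper bound x^U

    def add(self, other):
        # Moore addition (1): [x^L + y^L, x^U + y^U]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def moore_sub(self, other):
        # Moore subtraction (1): [x^L - y^U, x^U - y^L]
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def gh_sub(self, other):
        # Generalized Hukuhara difference (2)
        d_lo, d_hi = self.lo - other.lo, self.hi - other.hi
        return Interval(min(d_lo, d_hi), max(d_lo, d_hi))

x = Interval(2.0, 5.0)
print(x.moore_sub(x))  # Interval(lo=-3.0, hi=3.0): [x] - [x] != [0]
print(x.gh_sub(x))     # Interval(lo=0.0, hi=0.0):  [x] - [x] == [0]
```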

The forecasting ability of the models must be verified and tested using interval-based accuracy metrics, which requires the notions of union and intersection of intervals [x] and [y] [23]. They are defined as follows:

$$\begin{aligned}&[x] \cup [y] = [\min {\{x^{\mathrm{L}},y^{\mathrm{L}}\}},\max {\{x^{\mathrm{U}},y^{\mathrm{U}}\}}], \end{aligned}$$
(3)
$$\begin{aligned}&[x] \cap [y] = [\max {\{x^{\mathrm{L}},y^{\mathrm{L}}\}},\min {\{x^{\mathrm{U}},y^{\mathrm{U}}\}}]. \end{aligned}$$
(4)

The intersection of [x] and [y] is empty if \(\max {\{x^{\mathrm{L}},y^{\mathrm{L}}\}} > \min {\{x^{\mathrm{U}},y^{\mathrm{U}}\}}\). Real numbers are treated as degenerate intervals of zero width, that is, intervals whose lower and upper bounds coincide.

The concept of distance between two intervals is important to measure their (dis)similarity. The Hausdorff–Pompeiu distance measures how far two intervals [x] and [y] are from each other [11, 14]. Denoted \(d_H([x],[y])\), it is computed as:

$$\begin{aligned} d_H([x],[y])= \max \left\{ |x^{\mathrm{L}} - y^{\mathrm{L}}|,\, |x^{\mathrm{U}} - y^{\mathrm{U}}|\right\} . \end{aligned}$$
(5)
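A minimal sketch of the union (3), intersection (4), and Hausdorff–Pompeiu distance (5), with intervals represented as (lower, upper) tuples; the function names are illustrative.

```python
def interval_union(x, y):
    # (3): smallest interval containing both x and y
    return (min(x[0], y[0]), max(x[1], y[1]))

def interval_intersection(x, y):
    # (4): None when the max of the lower bounds exceeds the min of the upper bounds
    lo, hi = max(x[0], y[0]), min(x[1], y[1])
    return (lo, hi) if lo <= hi else None

def hausdorff(x, y):
    # (5): max of the absolute bound-wise differences
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

print(interval_union((1.0, 3.0), (2.0, 5.0)))         # (1.0, 5.0)
print(interval_intersection((1.0, 3.0), (4.0, 5.0)))  # None (disjoint)
print(hausdorff((1.0, 3.0), (2.0, 5.0)))              # 2.0
```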

3.2 Fuzzy interval model structure

A fuzzy interval model is a collection of interval fuzzy rules, a fuzzy rule base in which each rule has two parts: an antecedent part identifying the state of the interval-valued input variable, and a consequent part specifying the corresponding interval-valued output variable. Adaptive fuzzy interval modeling (aFIM) uses functional interval fuzzy rules, an interval extension of functional fuzzy rules of the form introduced in [31]. An interval fuzzy rule base is a collection of functional interval fuzzy rules of the form:

$$\begin{aligned} {\mathcal {R}}_i:\; \text {IF}\; [{\mathbf {x}}]\; \text {is}\; {\mathcal {M}}_i\; \text {THEN}\; [y_i] = [\beta _{i,0}] + [\beta _{i,1}][x_1]+\cdots +[\beta _{i,n}][x_n], \end{aligned}$$
(6)

where \({\mathcal {R}}_i\) is the ith fuzzy rule, \(i=1,2,\ldots ,c\), c is the number of fuzzy rules in the rule base, \([{\mathbf {x}}]=\left( [x_1],[x_2],\ldots ,[x_n]\right) ^T\) is the vector of inputs, \([x_j] =[x^{\mathrm{L}}_j,x^{\mathrm{U}}_j]\in {\mathcal {K}}_c({\mathbb {R}})\), \(j=1,\ldots ,n\), and \([\beta _{i,l}]=[\beta ^{\mathrm{L}}_{i,l},\beta ^{\mathrm{U}}_{i,l}]\in {\mathcal {K}}_c({\mathbb {R}})\), \(l=0,1,\ldots ,n\), are the interval-valued parameters of the rule consequent. \({\mathcal {M}}_i\) is the fuzzy set of the antecedent, whose membership function is \(\mu _i([{\mathbf {x}}]): {\mathcal {K}}_c({\mathbb {R}})^{n} \rightarrow [0,1]\), and \([y_i] = [y^{\mathrm{L}}_i,y^{\mathrm{U}}_i] \in {\mathcal {K}}_c({\mathbb {R}})\) is the output of the ith rule. Fuzzy inference with interval functional rules (6) produces the output [y] as the weighted average:

$$\begin{aligned} [y] = \sum _{i=1}^{c}{\left( \frac{\mu _i([{\mathbf {x}}])[y_i]}{\sum _{j=1}^{c}{{\mu _j}([{\mathbf {x}}])}}\right) } = \sum _{i=1}^{c}{\lambda _i[y_i]}, \end{aligned}$$
(7)

where \(\lambda _i = \displaystyle {\frac{\mu _i([{\mathbf {x}}])}{\sum _{j=1}^{c}{\mu _j([{\mathbf {x}}])}}}\) is the normalized degree of activation of the ith rule. The membership degree \(\mu _i([{\mathbf {x}}])\) of datum \([{\mathbf {x}}]\) is given by:

$$\begin{aligned}&\mu _i([{\mathbf {x}}])= \left[ \sum _{h=1}^c{\left( \frac{\sum _{j=1}^n{\left( {\mathrm {max}}\left\{ |x^{\mathrm{L}}_j - v^{\mathrm{L}}_{i,j}|, |x^{\mathrm{U}}_j - v^{\mathrm{U}}_{i,j}|\right\} \right) }}{\sum _{j=1}^n{\left( {\mathrm {max}}\left\{ |x^{\mathrm{L}}_j - v^{\mathrm{L}}_{h,j}|, |x^{\mathrm{U}}_j - v^{\mathrm{U}}_{h,j}|\right\} \right) }}\right) ^{\frac{2}{(m-1)}}}\right] ^{-1}, \end{aligned}$$
(8)

where m is a fuzzification parameter (usually \(m=2\)), and \([v_{i,j}] = [v^{\mathrm{L}}_{i,j},v^{\mathrm{U}}_{i,j}]\in {\mathcal {K}}_c({\mathbb {R}})\), \(j = 1,\ldots ,n\), are the components of the center \([{\mathbf {v}}_i]\) of the ith cluster/rule, \(i=1,\ldots ,c\).
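A sketch of the membership computation (8), assuming intervals are stored as NumPy arrays with one row per input variable and columns for the lower and upper bounds; the names, shapes, and guard constant are illustrative.

```python
import numpy as np

def hausdorff_sum(x, v):
    # Sum over the n input variables of the Hausdorff distance (5)
    # between interval input x and cluster center v, both of shape (n, 2).
    return np.sum(np.maximum(np.abs(x[:, 0] - v[:, 0]),
                             np.abs(x[:, 1] - v[:, 1])))

def memberships(x, centers, m=2.0):
    # Membership degrees (8) of interval datum x in each of the c clusters;
    # `centers` has shape (c, n, 2).
    d = np.array([hausdorff_sum(x, v) for v in centers])
    d = np.maximum(d, 1e-12)  # guard against a zero distance to a center
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratios.sum(axis=1)  # memberships sum to one over clusters

x = np.array([[0.2, 0.4], [0.5, 0.7]])          # one 2-variable interval datum
centers = np.array([[[0.1, 0.3], [0.4, 0.6]],
                    [[0.7, 0.9], [0.1, 0.2]]])  # c = 2 cluster centers
print(memberships(x, centers))                  # approx. [0.9615, 0.0385]
```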

Functional interval fuzzy modeling uses parameterized fuzzy regions of the data space, and associates each region with a local affine interval-valued model. The nature of the rule-based model emerges from the fuzzy weighted combination of the collection of the multiple local interval models. The contribution of a local model to the model output is proportional to the normalized activation degree of the corresponding rule. The construction of interval-valued fuzzy models needs two steps: (1) to learn the antecedent part of the rules using fuzzy clustering to granulate the input interval-valued data space into parameterized regions; and (2) to estimate the parameters of the local models of the rules consequents. These two steps are detailed in the next section.

3.3 Learning the antecedents of interval fuzzy rules

Identification of the fuzzy rules of an interval fuzzy model is done using the participatory learning fuzzy clustering algorithm extended to handle interval-valued data [18]. The aim of interval-valued data clustering is to partition an interval data set \([{\mathbf {X}}] = \{ [{\mathbf {x}}_1], \ldots , [{\mathbf {x}}_N]\}\) into c fuzzy subsets, \(2\le c \le N\), where N is the number of samples. The bounds of the intervals \([{\mathbf {x}}_{j}]\) are assumed to be normalized to [0, 1] using the min–max operator:

$$\begin{aligned} x^{{\mathrm{B}},t}_{\mathrm{norm}} = \frac{x^{{\mathrm{B}},t}-\min _t{\{x^{{\mathrm{L}},t}\}}}{\max _t{\{x^{{\mathrm{U}},t}\}}-\min _t{\{x^{{\mathrm{L}},t}\}}}, \end{aligned}$$
(9)

where B denotes either the lower bound L or the upper bound U of the interval, and the minimum and maximum are taken over all samples of the corresponding variable.
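A one-line sketch of the min–max normalization (9), under the assumption stated above that the minimum and maximum are taken over all observed bounds of the series:

```python
import numpy as np

def minmax_normalize(its):
    # `its` is an (N, 2) array of interval bounds [lower, upper]; since
    # lower <= upper, its.min() is the smallest lower bound and its.max()
    # the largest upper bound observed over the whole series.
    return (its - its.min()) / (its.max() - its.min())
```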

Participatory learning clustering is based on the idea that the current cluster structure influences the cluster learning process whenever a new data sample arrives. Model learning depends on what the system already knows about the model itself. The input data and the cluster structure affect self-organization depending on the compatibility of the data with the current cluster structure [27]. The impact of new data in inducing model revision depends on its compatibility with the current rule base structure or, equivalently, with the current cluster structure.

The cluster structure is defined by the cluster centers. If \([{\mathbf {V}}] = [[{\mathbf {v}}_1],\ldots ,[{\mathbf {v}}_c]]\), \([{\mathbf {v}}_i]=\left( [v_{i,1}],\ldots ,[v_{i,n}]\right) ^T\), \([v_{i,j}] = [v^{\mathrm{L}}_{i,j},v^{\mathrm{U}}_{i,j}]\in {\mathcal {K}}_c([0,1])\), \(j = 1,\ldots ,n\) and \(i=1,\ldots ,c\), then \([{\mathbf {V}}]\) represents a cluster structure. The aim of antecedent learning is to determine \([{\mathbf {V}}]\) from the inputs \([{\mathbf {x}}]^t \in {\mathcal {K}}_c([0,1])^n\), \(t=1,\ldots \), i.e., \([{\mathbf {x}}]^t\) is used as a vehicle to learn about \([{\mathbf {V}}]\).

The learning process is participatory if the contribution of \([{\mathbf {x}}]^t\) to the learning process depends upon its acceptance of the current cluster structure as valid. In other words, the datum \([{\mathbf {x}}]^t\) must be compatible with the current cluster structure. The compatibility \(\rho _i^{t}\in [0,1]\) of input \([{\mathbf {x}}]^t\) with the cluster center \([{\mathbf {v}}_i]^{t-1}\), \(i=1,\ldots ,c\), is calculated as:

$$\begin{aligned}&\rho _i^{t} = 1 - \frac{1}{n}\sum _{j=1}^n{\left( {\mathrm {max}}\left\{ |x^{{\mathrm{L}},t}_j - v^{{\mathrm{L}},t-1}_{i,j}|, |x^{{\mathrm{U}},t}_j - v^{{\mathrm{U}},t-1}_{i,j}|\right\} \right) }. \end{aligned}$$
(10)

Participatory clustering updates only the cluster center whose compatibility with input \([{\mathbf {x}}]^t\) is the highest. Thus, if cluster center \([{\mathbf {v}}_i]^{t-1}\) is the most compatible with \([{\mathbf {x}}]^t\), that is, \(i = {\mathrm {arg}} \max _{j=1,\ldots ,c}{\{\rho _j^t\}}\), then it is updated as follows:

$$\begin{aligned} [{\mathbf {v}}_i]^{t} = [{\mathbf {v}}_i]^{t-1} + G_{i}^{t}\left( [{\mathbf {x}}]^{t} - [{\mathbf {v}}_{i}]^{t-1}\right) , \quad G_{i}^{t} = \alpha \rho _{i}^{t}, \end{aligned}$$
(11)

where \(\alpha \in [0,1]\) is the basic learning rate. Notice that (11) requires the value of \([{\mathbf {x}}]^{t} - [{\mathbf {v}}_{i}]^{t-1}\). Both Moore's subtraction (1) and the generalized Hukuhara subtraction (2) are used in (11) to verify whether they make a difference in interval time series modeling and forecasting performance.

Adaptive modeling is particularly relevant in time-varying domains. A sequence of input data with low compatibility with the current cluster structure indicates that the current model should be revised in light of the new information. Participatory learning uses an arousal mechanism to monitor how the compatibility measure values progress, i.e., the arousal mechanism acts as a reminder of when the current cluster structure should be revised, given the new information carried by the data. A high arousal value indicates less confidence in how the current model fits recent input data. Thus, arousal can be seen as the complement of confidence [18]. A way to express the arousal \(a_i^{t} \in [0,1]\) at step t is:

$$\begin{aligned} a_{i}^{t} = a_{i}^{t-1} + \beta (1-\rho _{i}^{t} - a_{i}^{t-1}), \end{aligned}$$
(12)

where \(\beta \in [0,1]\) controls the rate of change of arousal. The closer \(\beta \) is to one, the faster the learning process senses compatibility variations. If the arousal values \(a_i^{t}\) are greater than or equal to a threshold \(\tau \in [0,1]\) for all \(i=1,\ldots ,c\), then a new cluster is created, with the current datum assigned as its center, that is, \([{\mathbf {v}}_{c+1}]^{t} = [{\mathbf {x}}]^{t}\). Otherwise, the center with the highest compatibility is updated to accommodate the input data using (11). The arousal mechanism (12) becomes part of the learning process by converting \( G_{i}^{t}\) of (11) into an effective learning rate:

$$\begin{aligned} G_{i}^{t} = \alpha (\rho _{i}^{t})^{1-a_{i}^{t}}. \end{aligned}$$
(13)

Updates of cluster centers (11) can be viewed as a form of exponential smoothing modulated by the compatibility of the data with the model structure, the very nature of participatory learning. Participatory clustering also accounts for redundant clusters: updating a cluster center using (11) can bring the center closer to another one, so a redundant cluster may be formed. A cluster i is redundant if its similarity with any other cluster h, \(\rho _{i,h}^{t}\), is greater than or equal to a threshold \(\lambda \in [0,1]\). The similarity between cluster centers i and h is found using:

$$\begin{aligned} \rho _{i,h}^{t}&= 1 - \frac{1}{n}\,d_H([{\mathbf {v}}_i]^t,[{\mathbf {v}}_h]^t) \nonumber \\&= 1 - \frac{1}{n}\sum _{j=1}^n{\max \left\{ |v^{{\mathrm{L}},t}_{i,j} - v^{{\mathrm{L}},t}_{h,j}|,\, |v^{{\mathrm{U}},t}_{i,j} - v^{{\mathrm{U}},t}_{h,j}|\right\} }. \end{aligned}$$
(14)

If clusters i and h are declared redundant, then they are replaced by a cluster whose center is the average of their centers. Figure 1 summarizes the rule antecedent learning process of aFIM.

Fig. 1 Participatory learning in clustering interval stream data and antecedents learning
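Putting (10)–(14) together, the following is a minimal Python sketch of one participatory learning step over a normalized interval sample; the default parameter values, the zero initial arousal of new clusters, and the function name are illustrative assumptions, not prescriptions of the method.

```python
import numpy as np

def pl_cluster_step(x, centers, arousal, alpha=0.01, beta=0.1, tau=0.2, lam=0.9):
    # One participatory learning step (10)-(14) for a normalized interval
    # sample x of shape (n, 2). `centers` is a list of (n, 2) arrays and
    # `arousal` a NumPy array of per-cluster arousal values; assumes at
    # least one cluster already exists.
    rho = np.array([1.0 - np.mean(np.maximum(np.abs(x[:, 0] - v[:, 0]),
                                             np.abs(x[:, 1] - v[:, 1])))
                    for v in centers])                        # compatibility (10)
    arousal = arousal + beta * (1.0 - rho - arousal)          # arousal (12)
    if np.all(arousal >= tau):
        centers.append(x.copy())                  # new cluster centered at x
        arousal = np.append(arousal, 0.0)         # new clusters start calm (assumption)
    else:
        i = int(np.argmax(rho))                   # most compatible cluster
        G = alpha * rho[i] ** (1.0 - arousal[i])  # effective learning rate (13)
        diff = np.sort(x - centers[i], axis=1)    # coordinate-wise gH difference (2)
        centers[i] = centers[i] + G * diff        # center update (11)
    # redundancy check (14): merge cluster pairs whose similarity >= lam
    merged = True
    while merged and len(centers) > 1:
        merged = False
        for i in range(len(centers)):
            for h in range(i + 1, len(centers)):
                sim = 1.0 - np.mean(np.maximum(
                    np.abs(centers[i][:, 0] - centers[h][:, 0]),
                    np.abs(centers[i][:, 1] - centers[h][:, 1])))
                if sim >= lam:
                    centers[i] = 0.5 * (centers[i] + centers[h])
                    del centers[h]
                    arousal = np.delete(arousal, h)
                    merged = True
                    break
            if merged:
                break
    return centers, arousal
```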

3.4 Parameter estimation of interval fuzzy rule consequents

To complete the interval rule-based model construction procedure, the interval-valued parameters \( [\beta _{i,0}],[\beta _{i,1}], \ldots , [\beta _{i,n}]\) of the rule consequents are estimated using the classic weighted recursive least squares (wRLS) algorithm [1, 17], computing the lower bounds \(\beta ^{\mathrm{L}}_{i,l}\) and the upper bounds \(\beta ^{\mathrm{U}}_{i,l}\), \(i = 1,\ldots , c\), \(l=0,1,\ldots ,n\), separately. To do this, expression (7) is rewritten in terms of each interval data bound individually, namely \(y^{\mathrm{B}} = \Lambda ^T \Theta \), where \(\Lambda = \left[ \lambda _1{\mathbf {x}}^T_e,\lambda _2{\mathbf {x}}^T_e,\ldots ,\lambda _c{\mathbf {x}}^T_e\right] ^T\) is the fuzzily weighted input data, \({\mathbf {x}}_e = \left[ 1, x^{\mathrm{B}}_1, x^{\mathrm{B}}_2, \ldots , x^{\mathrm{B}}_n\right] ^T\) is the extended input, \(\Theta = \left[ {\varvec {{\beta }}}_1^T,{\varvec {{\beta }}}_2^T,\ldots ,{\varvec {{\beta }}}_c^T\right] ^T\), and \({\varvec {{\beta }}}_i = [{\beta }^{\mathrm{B}}_{i,0}, {\beta }^{\mathrm{B}}_{i,1}, \ldots , {\beta }^{\mathrm{B}}_{i,n}]^T\). Recall that the superscript B denotes either the lower bound L or the upper bound U of the interval. wRLS minimizes the locally optimal error criterion:

$$\begin{aligned} \min {E_i^{t}} = \min {\sum _{k=1}^t{\lambda _i\left( y^{{\mathrm{B}},k} - ({\mathbf {x}}_{e}^k)^T{\varvec {\beta }}_{i}^k\right) ^2}}, \end{aligned}$$
(15)

whose solution can be expressed recursively by [17]:

$$\begin{aligned}&{\varvec {{\beta }}}_{i}^{t+1} = {\varvec {{\beta }}}_{i}^t + \varSigma _{i}^t{\mathbf {x}}_{e}^t \lambda _{i}^t\left( y^{{\mathrm{B}},t} - ({\mathbf {x}}_{e}^t)^T{\varvec {{\beta }}}_{i}^t \right) , \quad {\varvec {{\beta }}}_{i}^0=0, \end{aligned}$$
(16)
$$\begin{aligned}&\varSigma _{i}^{t+1} = \varSigma _{i}^t - \frac{\lambda _{i}^t \varSigma _{i}^t {\mathbf {x}}_{e}^t({\mathbf {x}}_{e}^t)^T \varSigma _{i}^t}{1+ \lambda _{i}^t ({\mathbf {x}}_{e}^t)^T \varSigma _{i}^t{\mathbf {x}}_{e}^t}, \quad \varSigma _{i}^0 = \varOmega I, \end{aligned}$$
(17)

where \(\varOmega \) is a large number (typically \(\varOmega = 1000\)), and \(\varSigma \) is the dispersion matrix.
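A minimal sketch of one wRLS update (16)–(17) for a single rule and a single interval bound; the variable names and example values are illustrative.

```python
import numpy as np

def wrls_update(beta, Sigma, x_e, y, lam):
    # beta:  (n+1,) consequent parameters of one rule, one bound (L or U)
    # Sigma: (n+1, n+1) dispersion matrix
    # x_e:   extended input [1, x_1, ..., x_n] for the same bound
    # y:     observed output bound; lam: normalized activation degree
    err = y - x_e @ beta
    beta = beta + Sigma @ x_e * (lam * err)                           # (16)
    denom = 1.0 + lam * (x_e @ Sigma @ x_e)
    Sigma = Sigma - lam * np.outer(Sigma @ x_e, x_e @ Sigma) / denom  # (17)
    return beta, Sigma

# initialization as in the text: beta^0 = 0, Sigma^0 = Omega * I
n, Omega = 2, 1000.0
beta = np.zeros(n + 1)
Sigma = Omega * np.eye(n + 1)
beta, Sigma = wrls_update(beta, Sigma, np.array([1.0, 0.3, 0.5]), 0.42, 0.8)
```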

More precisely, parameter estimation of the interval fuzzy rule consequents is done in two steps. The first step uses expressions (16)–(17) to individually estimate the lower bounds \(\beta ^{\mathrm{L}}_{i,l}\) and the upper bounds \(\beta ^{\mathrm{U}}_{i,l}\) of the interval-valued parameters \([\beta _{i,l}]=[\beta ^{\mathrm{L}}_{i,l},\beta ^{\mathrm{U}}_{i,l}]\), \(l=0,1,\ldots ,n\). The second step computes the outputs of the fuzzy rules (6) at \(t+1\) using:

$$\begin{aligned} [y_i]^{t+1} = [\beta _{i,0}]^t + [\beta _{i,1}]^t[x_1]^t+\cdots +[\beta _{i,n}]^t[x_n]^t, \end{aligned}$$
(18)

for \(i=1,\ldots ,c\). The model output \([y]^{t+1}\) at \(t+1\) is the weighted average of the outputs \([y_i]^{t+1}\), computed using (7).
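Equation (18) multiplies interval parameters by interval inputs; assuming Moore multiplication (1) is used for these products, a minimal sketch of one rule's consequent output is:

```python
def moore_mul(x, y):
    # Moore multiplication (1): min and max over the four bound products
    p = (x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1])
    return (min(p), max(p))

def rule_output(betas, x):
    # (18): [y_i] = [beta_0] + [beta_1][x_1] + ... + [beta_n][x_n];
    # `betas` is a list of n+1 (lower, upper) pairs, `x` a list of n pairs.
    lo, hi = betas[0]
    for b, xj in zip(betas[1:], x):
        p_lo, p_hi = moore_mul(b, xj)
        lo, hi = lo + p_lo, hi + p_hi
    return (lo, hi)

print(rule_output([(0.1, 0.2), (0.5, 0.6)], [(1.0, 1.2)]))  # (0.6, 0.92)
```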

3.5 Adaptive interval fuzzy modeling method

Adaptive interval fuzzy modeling is inherently recursive, which makes it memory efficient for continuous, endless learning and adaptation using stream data. This is of major importance, especially in online and real-time applications. The procedure to construct interval fuzzy models is summarized below.

Procedure aFIM: for each new interval sample \([{\mathbf {x}}]^t\): (1) normalize the interval bounds using (9); (2) compute the compatibility \(\rho _i^t\) (10) and the arousal \(a_i^t\) (12) for every cluster; (3) if \(a_i^t \ge \tau \) for all i, create a new cluster with center \([{\mathbf {x}}]^t\); otherwise, update the most compatible cluster center using (11) with the effective learning rate (13); (4) merge redundant clusters according to (14); (5) update the consequent parameters of the rules using wRLS (16)–(17); (6) compute the rule outputs (18) and output the forecast as the weighted average (7).

4 Computational experiments

This section illustrates the adaptive fuzzy interval modeling method (aFIM) introduced in this paper in modeling and forecasting the daily lowest and highest prices of four of the currently most traded cryptocurrencies: BitCoin, Ethereum, XRP, and LiteCoin. Moreover, the performance of aFIM is compared against ANFIS, LSTM, ARIMA, ETS, and the naïve random walk.

4.1 Cryptocurrency trade

Current interest in cryptocurrencies has offered investors and speculators a diversity of electronic crypto assets to trade, with benefits such as anonymity, transparency, lower transaction costs, and diversification [6]. Because of their highly volatile dynamics, risk management of cryptocurrency investment positions based on volatility modeling and forecasting is of key interest. When risk management models are built on interval forecasts, more accurate analyses can be done. Intervals of daily highest and lowest digital currency prices convey a better idea of portfolio risk perception (volatility) over a time period than a single value such as the closing price. Interval-valued modeling and forecasting inherently embed the price variability information contained in the data samples. This section addresses the performance evaluation of adaptive fuzzy interval modeling (aFIM) using real-world ITS data from the cryptocurrency market.

4.2 Data

Adaptive fuzzy interval modeling and forecasting use daily highest and lowest prices as interval bounds of four leading cryptocurrencies: BitCoin, XRP (Ripple), Ethereum, and LiteCoin. Data cover January 1, 2016 to December 31, 2019, for a total of 1461 observations. The data set was divided into in-sample and out-of-sample sets. The in-sample set contains data from January 1, 2016 to December 31, 2018, and the out-of-sample set contains the 2019 data. Thus, the 2019 data are used to evaluate the modeling power and forecasting performance of aFIM and its counterparts. Within the in-sample set, data from January 1, 2016 to December 31, 2017 were used for model learning, and the year 2018 was used as a validation set for choosing control parameters. aFIM processes data as a stream, and learning may proceed endlessly, which means that there is no need to split the data into training, validation, and testing sets as is usual in machine learning experiments. In spite of that, the data split was done to keep evaluations and comparisons with the competing methods fair. Forecasting is one-step-ahead with an iterated strategy. The forecasting techniques considered for comparison are the autoregressive integrated moving average (ARIMA), the exponential smoothing state-space model (ETS) [12], the adaptive neuro-fuzzy inference system (ANFIS), a long short-term memory (LSTM) neural network, and the naïve random walk (RW). These methods produce interval forecasts from the individual lowest and highest daily price data.

4.3 Performance measures

The performance of the methods is assessed using the mean absolute percentage error (MAPE) and the root mean squared scaled error (RMSSE) [3], respectively:

$$\begin{aligned}&{\mathrm {MAPE}}^{\mathrm{B}} = \frac{100}{N} \sum _{t=1}^N{\frac{|y^{{\mathrm{B}},t} - {\hat{y}}^{{\mathrm{B}},t}|}{y^{{\mathrm{B}},t}}}, \end{aligned}$$
(19)
$$\begin{aligned}&{\mathrm {RMSSE}}^{\mathrm{B}} = \sqrt{{\mathrm {mean}}\left( \frac{(y^{{\mathrm{B}},t} - {\hat{y}}^{{\mathrm{B}},t})^2}{\frac{1}{N-1}\sum _{i=2}^N{(y^{{\mathrm{B}},i}-y^{{\mathrm{B}},i-1})^2}}\right) }, \end{aligned}$$
(20)

where the superscript B denotes either the lower bound L, or the upper bound U of the interval prices, \([y]^t=[y^{{\mathrm{L}},t},y^{{\mathrm{U}},t}]\) is the actual price interval, \([{\hat{y}}]^t=[{\hat{y}}^{{\mathrm{L}},t},{\hat{y}}^{{\mathrm{U}},t}]\) is the forecast price interval at t, and N is the size of the out-of-sample data set.

Additionally, in practice, the direction of price change is as important as, and sometimes more important than, the magnitude of the forecasting error [5]. The direction accuracy (DA) measure [10] is defined as follows:

$$\begin{aligned} {\mathrm {DA}}^{\mathrm{B}} &= \frac{1}{N}\sum _{t=1}^N{Z^{{\mathrm{B}},t}}, \\ Z^{{\mathrm{B}},t} &= \left\{ \begin{array}{ll} 1, &\quad {\mathrm {if}} \ \ \left( {\hat{y}}^{{\mathrm{B}},t+1}- y^{{\mathrm{B}},t}\right) \left( y^{{\mathrm{B}},t+1}- y^{{\mathrm{B}},t}\right) >0, \\ 0, &\quad {\mathrm {otherwise}}. \end{array} \right. \end{aligned}$$
(21)

Because MAPE, RMSSE, and DA are computed individually for each bound of an interval, they neglect the inherent interval nature of the data. Performance evaluation of interval time series forecasting methods therefore also uses the normalized symmetric difference of intervals (NSD) [25], defined by:

$$\begin{aligned}&{\mathrm {NSD}}([y]^t,[{\hat{y}}]^t) \nonumber \\&\quad = \frac{1}{N} \sum _{t=1}^N{\frac{w([y]^t \cup [{\hat{y}}]^t)-w([y]^t \cap [{\hat{y}}]^t)}{w([y]^t \cup [{\hat{y}}]^t)}}, \end{aligned}$$
(22)

as well as the average coverage rate (\(R^{\mathrm{C}}\)) and the efficiency rate (\(R^{\mathrm{E}}\)) [25]:

$$\begin{aligned}&R^{\mathrm{C}} = \frac{1}{N} \sum _{t=1}^N{\frac{w([y]^t \cap [{\hat{y}}]^t)}{w([y]^t)}}, \nonumber \\&R^{\mathrm{E}} = \frac{1}{N} \sum _{t=1}^N{\frac{w([y]^t \cap [{\hat{y}}]^t)}{w([{\hat{y}}]^t)}}, \end{aligned}$$
(23)

where \(w([x]) = x^{\mathrm{U}} - x^{\mathrm{L}}\) denotes the width of an interval. \(R^{\mathrm{C}}\) and \(R^{\mathrm{E}}\) give information on what part of the ITS data is covered by the forecasts (coverage), and what part of the forecasts covers the ITS data (efficiency). If the observed intervals are fully enclosed in the predicted intervals, then the coverage rate is 100%, but the efficiency could be less than 100%, revealing that the forecast ITS is on average wider than the actual ITS data. Hence, \(R^{\mathrm{C}}\) and \(R^{\mathrm{E}}\) values must be considered jointly. As the literature indicates [25], good forecast performance is expected when the average coverage and efficiency rates are reasonably high and the difference between them is small.
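A sketch of the interval-oriented metrics (22)–(23), with actual and forecast ITS stored as (N, 2) arrays of [lower, upper] bounds; the function names and example values are illustrative.

```python
import numpy as np

def interval_metrics(y, y_hat):
    # y, y_hat: (N, 2) arrays of actual and forecast intervals
    union_w = np.maximum(y[:, 1], y_hat[:, 1]) - np.minimum(y[:, 0], y_hat[:, 0])
    inter_w = np.maximum(np.minimum(y[:, 1], y_hat[:, 1])
                         - np.maximum(y[:, 0], y_hat[:, 0]), 0.0)  # empty -> 0
    nsd = np.mean((union_w - inter_w) / union_w)          # (22)
    r_c = np.mean(inter_w / (y[:, 1] - y[:, 0]))          # (23), coverage
    r_e = np.mean(inter_w / (y_hat[:, 1] - y_hat[:, 0]))  # (23), efficiency
    return nsd, r_c, r_e

y = np.array([[1.0, 2.0], [2.0, 3.0]])
y_hat = np.array([[0.8, 2.1], [2.2, 2.9]])
print(interval_metrics(y, y_hat))  # (NSD, R^C, R^E)
```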

4.4 Results

aFIM is compared with ARIMA, ETS, ANFIS, LSTM, and RW in forecasting the lowest and highest cryptocurrency prices. ARIMA, ETS, ANFIS, LSTM, and RW are univariate techniques, and their price forecasts are developed for each interval bound individually. In contrast, aFIM is an interval-based method and develops interval-valued price forecasts. aFIM produces forecasts as:

$$\begin{aligned} [{\hat{y}}]^{t+1} = f\left( [y]^t, [y]^{t-1},\ldots ,[y]^{t-l} \right) , \end{aligned}$$
(24)

that is, the inputs are lagged values of the interval time series \([y]^{t}=[y^{{\mathrm{L}},t},y^{{\mathrm{U}},t}]\), the output is the one-step-ahead forecast \([{\hat{y}}]^{t+1}=[{\hat{y}}^{{\mathrm{L}},t+1},{\hat{y}}^{{\mathrm{U}},t+1}]\), and f encodes the aFIM output in (7).

Performance evaluation was done using the out-of-sample data, i.e., from January 1, 2019 to December 31, 2019. aFIM was implemented in MATLAB, while ARIMA, ETS, ANFIS, and LSTM were developed in R. Table 1 shows the values of the control parameters for each method. aFIM parameters were chosen by running simulations and selecting the values that give the best MAPE and RMSSE. The structures of ARIMA, ETS, ANFIS, and LSTM differ for the lower (L) and upper (U) interval bounds, and were selected to achieve the highest accuracy of each method. In ARIMA(p, d, q), p, d, and q are the numbers of autoregressive, difference, and moving average terms, respectively. In ETS(er, tr, sea), er, tr, and sea denote the error, trend, and season components, respectively, each of type A, M, or N (additive, multiplicative, or none). In ANFIS(l), l is the number of lagged time series values (inputs). In LSTM(l, h), l is the number of lagged time series values (inputs), and h is the number of hidden units. Recall that aFIM may use either the Moore (\({\hbox {aFIM}}_{{\mathrm {M}}}\)) or the generalized Hukuhara (\({\hbox {aFIM}}_{\mathrm {H}}\)) subtraction. ANFIS is as detailed in [13]. LSTM has one layer, 800 as the maximum number of epochs, and uses the hyperbolic tangent and sigmoid as state and gate activation functions, respectively.

Table 1 Structure of the forecasting models and their respective control parameters

Table 2 summarizes the forecasting performance in terms of MAPE, RMSSE, and DA. Best results are highlighted in bold. aFIM outperforms all competitors from the point of view of MAPE and RMSSE, except for the MAPE of the Ethereum highest prices. LSTM is, in general, the second best approach in terms of MAPE and RMSSE. It is worth noting that the naïve random walk performs worse than aFIM, but better than ARIMA, ETS, and ANFIS, in line with the well-known “Meese–Rogoff puzzle” [22], which states that exchange rate forecasting models are not able to outperform the random walk. The simulations show that this is also the case for cryptocurrency forecasts; however, when an adaptive nonlinear model such as aFIM or LSTM is considered, the RW can be outperformed. Concerning the two interval subtraction operations in aFIM, the generalized Hukuhara subtraction generally produces slightly higher performance for the lowest and highest price forecasts of all currencies, except Ethereum.

It is interesting to recall that the Meese–Rogoff puzzle [24] also suggests that dynamic forecasting models can outperform the random walk in out-of-sample forecasting if performance is measured by the direction of change and profitability. Indeed, this is the case in cryptocurrency price forecasting as well. Table 2 shows that aFIM outperforms the random walk, ARIMA, ETS, ANFIS, and LSTM from the point of view of direction accuracy (DA). These results agree with those of [5, 24], in which the random walk is decisively beaten when direction is used as a comparison measure. The high performance that aFIM achieves in predicting the direction of price changes is due to its evolving, continuous adaptation ability, which captures price changes more accurately in time-varying environments such as digital currency markets. When trading strategies use direction, the potential to anticipate price changes is crucial.

Table 2 MAPE, RMSSE and DA performance for one-step-ahead forecasting of the daily cryptocurrencies lowest and highest prices

To further illustrate the forecasting efficiency of aFIM, Fig. 2 shows an example of the actual and forecast values developed for the BitCoin (BTC) daily lowest and highest prices from July to December 2019. Prices are in logarithmic values for better visualization. aFIM is able to accurately predict both the lowest and highest price dynamics of BTC during the period evaluated. Because the difference between the lowest and highest values gives an efficient measure of volatility, aFIM appears to be a potentially strategic tool to manage risk in digital coin markets, especially in real-time situations.

Fig. 2 BitCoin actual and aFIM forecast values of the lowest and highest daily log-prices

Forecasting performance accounting for the interval nature of the data, measured by the normalized symmetric difference (NSD), coverage rate (\(R^{\mathrm{C}}\)), and efficiency rate (\(R^{\mathrm{E}}\)), is shown in Table 3. Again, aFIM outperforms all the remaining methods. Table 3 also indicates that when forecasts are evaluated by interval-oriented metrics, aFIM achieves superior performance, especially when it uses the generalized Hukuhara subtraction (\({\hbox {aFIM}}_{\mathrm {H}}\)).

Table 3 NSD, \(R^{\mathrm{C}}\) and \(R^{\mathrm{E}}\) for one-step-ahead forecast of cryptocurrencies lowest and highest prices

The average numbers of fuzzy rules that aFIM developed on the out-of-sample data (the year 2019) when using the Moore/Hukuhara subtraction operators were 4/4 (BitCoin), 4/3 (Ethereum), 3/3 (XRP), and 3/2 (LiteCoin). This shows that model complexity varies slightly depending on the dynamic nature of the digital coin, and that model complexity is generally insensitive to the choice of either Moore or Hukuhara subtraction.

5 Conclusion

This paper has introduced a novel adaptive fuzzy interval modeling and forecasting method for interval-valued time series. The model is a collection of functional fuzzy rules with a learning mechanism that continuously updates the rule base structure and its parameters using interval stream input data. The structure of the rule base is found employing participatory learning clustering, and the parameters of the fuzzy rule consequents are estimated using the weighted recursive least squares algorithm. Experimental evaluation was done using actual interval time series of cryptocurrency prices, whose lowest and highest prices are the lower and upper interval bounds. One-step-ahead forecasts of the interval-valued prices of BitCoin, Ethereum, XRP, and LiteCoin for the period from January 2016 to December 2019 were produced. Comparison of the adaptive fuzzy interval model aFIM with ARIMA, ETS, ANFIS, LSTM, and the naïve random walk indicates that aFIM outperforms all of them. The performance of aFIM is also the highest when measured by the direction of price changes, a key feature when trading with price- and direction-based strategies. Future work shall consider autonomous mechanisms to select and adjust control parameters such as thresholds in real time, the evaluation of the results through a trading strategy based on the forecasts, and modeling and forecasting interval time series from commodity markets, energy, the environment, and transportation systems.