1 Introduction

As the application of information technology grows rapidly, data in various formats have proliferated over time. However, these data are useless unless they are analyzed and utilized. There is little value in gathering information on floods caused by heavy rainfall, heart attacks caused by high blood pressure, or stock market crashes caused by bad economic policy if such events are not identified and predicted in advance. This is where the concept of “Advance Prediction/Forecast” arises.

An expert system for prediction should be able to discover patterns or rules, learn from those patterns, and produce the desired outputs. For all these purposes, a huge amount of data is required as input. Sometimes the input data need to be preprocessed to meet the requirements of the expert system. An expert prediction system can be designed using data mining (machine learning, statistics) and soft computing (SC). The designed system must be assessed very carefully on the basis of accuracy, cost, complexity, importance and utility. That is, prediction is not an informal process. It involves many well-planned steps and domain knowledge, which directly affect the skill of the system as well as the final outputs.

Thus, although the prediction made by an expert system may appear to be fully automatic, this is far from the real story. Reaching the best level of performance requires a great deal of experimentation, intelligence and artful investigation of the input data. The poor performance of existing systems, in turn, motivates the development of new expert systems, which must contain better algorithms or techniques than the existing ones.

Time series are generated by the day-to-day activities of humans and machines. The running speed of athletes per second, the velocity of cars per minute, the amount of rainfall per hour, and the maximum temperature from \(12\) noon to 5 p.m. all represent time series that can be analyzed and employed for prediction. However, time series analysis and prediction involve several tedious tasks, such as preprocessing and transforming the data, identifying suitable input predictors that can enhance the prediction, and adjusting the various parameters associated with the models [12, 104].

1.1 Issues in time series forecasting

Some major issues in time series forecasting are discussed as follows.

  • Models: Is it possible to predict the time series values in advance? If so, which models best fit data that are characterized by different variables?

  • Quantity of data: What amount of data (i.e., small or massive) is needed for the prediction so that the model fits well?

  • Improvement in Models: Is there any possibility to improve the efficiency of the models? If yes, then how could it be possible?

  • Factors: What are the factors that influence the time series prediction? Is there any possibility to deal with these factors together? Can integration of these factors affect the prediction capability of the models?

  • Analysis of results: Are the results given by the models statistically acceptable? If not, which parameters need to be adjusted to influence the performance of the models?

  • Model constraints: Can linear statistical or mathematical models successfully deal with the non-linear nature of time series data? If not, which models can, and how are they advantageous?

  • Data preprocessing: Do data need to be transformed from one form to another? In general, what type of transformation is suitable for data that can directly be employed as input in the models?

  • Consequences of prediction: What are the possible consequences of time series prediction? Are advance predictions advantageous for society, politics and economics?

All these issues indicate the need for an intelligent forecasting technique that can discover useful information from data. The term “soft computing (SC)” refers to the overall set of techniques for designing intelligent or expert systems. It has been widely used in machine learning, artificial intelligence, pattern recognition, and reasoning under uncertainty. A more detailed discussion of the SC techniques is provided next.

1.2 Soft computing

Soft computing (SC) is a multidisciplinary field that extends from mathematical science to computer science, information technology, engineering applications, etc. Conventional computing, or hard computing, generally deals with precision, certainty and rigor [182]. In contrast, the main desideratum of SC is to tolerate imprecision, uncertainty, partial truth, and approximation [168]. SC has been shaped by many researchers; among them, Zadeh’s contribution is invaluable. Zadeh published his most influential work in SC in 1965 [176]. Later, he contributed to this area by publishing numerous research articles on the analysis of complex systems and decision processes [177], approximate reasoning [178–180], knowledge representation [181], design and deployment of intelligent systems [183], etc. According to Jiang et al. [76], “SC is not a single methodology. Rather, it is a partnership in which each of the partners contributes a distinct methodology for addressing problems in its domain.” Therefore, SC has been evolving as an amalgamated field of different methodologies such as fuzzy sets, neural computing, evolutionary computing and probabilistic computing [47, 83, 87]. Later, rough sets, chaos computing and immune network theory were included in SC [13, 119, 174]. The main objective of hybridizing these methodologies is to design intelligent machines and find solutions to nonlinear problems which cannot be modeled mathematically [184].

1.3 Time series events and uncertainty

Since the advance prediction of events such as temperature, rainfall, stock price, population growth, and economic growth is a major scientific issue in the domain of time series forecasting, imprecise knowledge or information cannot be overlooked in this domain. Because time series data are highly non-stationary and uncertain, the decision-making process becomes very tedious. For example, a sudden rise and fall of daily temperature, of the daily stock index price, or of the rainfall amount indicates that these events are very uncertain. The characteristics of such events cannot be described accurately; this is therefore referred to as “imprecise knowledge” or “incomplete knowledge”. Mathematical or statistical models cannot deal with this imprecise knowledge, which reduces their accuracy significantly.

Although forecasting time series events with \(100\,\%\) accuracy may not be possible, the forecasting accuracy and the speed of the forecasting process can be improved. To this end, Song and Chissom [144] developed a model in \(1993\) based on the uncertainty and imprecise knowledge contained in time series data. They used the fuzzy sets concept to represent and manage these uncertainties, and referred to this concept as “Fuzzy Time Series (FTS)”.

From \(1994\) onwards, researchers have developed numerous models based on the FTS concept to improve the forecasting accuracy of time series. This study focuses on the application and use of the fuzzy sets concept in time series forecasting. The basics of artificial neural networks (ANNs), rough sets (RS) and evolutionary computing (EC) are provided to complement the background on fuzzy sets, because in many cases a problem can be solved most effectively by hybridizing these techniques rather than applying them independently. Hence, one of the objectives of this article is also to introduce the SC methodologies (such as ANNs, RS and EC) that are employed with FTS to represent and manage the imprecise knowledge in time series forecasting.

1.4 Structure of this work

We begin with some essential definitions associated with fuzzy sets and their extended application in time series forecasting, referred to as “fuzzy time series (FTS)”, in Sect. 2. Next, the preamble for the FTS modeling approach is discussed in Sect. 3. Articles that contribute to the FTS modeling approach are also discussed in this section. In Sect. 4, hybridized techniques associated with the FTS modeling approach are discussed. Type-2 FTS models are reviewed in Sect. 5. The performance measure parameters employed in the FTS modeling approach are listed in Sect. 6. A classification of various FTS models based on input variables is presented in Sect. 7. Various unsolved problems and research trends associated with the FTS modeling approach are discussed in Sect. 8. Future directions and conclusions are discussed in Sects. 9 and 10, respectively. The list of abbreviations used in the article is provided in the Appendix.

2 Definitions

In this section, we provide various definitions for the terminologies used throughout this article.

In \(1965\), Zadeh [176] introduced fuzzy set theory, which involves continuous set membership for processing data in the presence of uncertainty. He also presented fuzzy arithmetic theory and its applications [175, 177, 178].

Definition 1

(Fuzzy Set) [176]. A fuzzy set is a class with varying degrees of membership in the set. Let \(U\) be the universe of discourse, which is discrete and finite, then fuzzy set \(\tilde{A}\) can be defined as follows:

$$\begin{aligned} \tilde{A}=\left\{ \mu _{\tilde{A}}(x_1)\big /x_1+\mu _{\tilde{A}}(x_2)\big /x_2+ \ldots \right\} =\Sigma _{i}\mu _{\tilde{A}}(x_i)\big /x_i \end{aligned}$$
(1)

where \(\mu _{\tilde{A}}\) is the membership function of \(\tilde{A}\), \(\mu _{\tilde{A}}\): \(U\) \(\rightarrow\) \(\left[ 0,1\right]\), and \(\mu _{\tilde{A}}(x_i)\) is the degree of membership of the element \(x_i\) in the fuzzy set \(\tilde{A}\). Here, the symbol “+” indicates the union operation and the symbol “/” is a separator; they do not denote the summation and division commonly used in algebra.

When \(U\) is continuous and infinite, then the fuzzy set \(\tilde{A}\) of \(U\) can be defined as:

$$\begin{aligned} \tilde{A}=\left\{ \int \mu _{\tilde{A}}(x_i)\big /x_i\right\} ,\forall x_i \in U \end{aligned}$$
(2)

where the integral sign stands for the union of the fuzzy singletons, \(\mu _{\tilde{A}}(x_i)/x_i\).
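To make the notation of Eq. 1 concrete, a discrete fuzzy set can be stored as a mapping from elements of \(U\) to their degrees of membership. The following is a minimal sketch, not taken from the cited works: the names `FuzzySet`, `membership`, `union`, `intersection` and `to_zadeh_notation`, as well as the temperature values, are assumptions made for illustration. Union and intersection use the standard max and min operators.

```python
from typing import Dict, Hashable

FuzzySet = Dict[Hashable, float]  # element x_i -> degree of membership in [0, 1]


def membership(fuzzy_set: FuzzySet, x: Hashable) -> float:
    """Return mu_A(x); elements that are not listed have degree 0."""
    return fuzzy_set.get(x, 0.0)


def union(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Standard (max) union of two discrete fuzzy sets."""
    return {x: max(membership(a, x), membership(b, x)) for x in set(a) | set(b)}


def intersection(a: FuzzySet, b: FuzzySet) -> FuzzySet:
    """Standard (min) intersection of two discrete fuzzy sets."""
    return {x: min(membership(a, x), membership(b, x)) for x in set(a) | set(b)}


def to_zadeh_notation(fuzzy_set: FuzzySet) -> str:
    """Render the set in the mu/x + mu/x + ... form of Eq. 1."""
    return " + ".join(f"{mu:g}/{x}" for x, mu in sorted(fuzzy_set.items()))


# Hypothetical fuzzy sets over daily temperatures (degrees Celsius).
hot = {25: 0.2, 30: 0.6, 35: 1.0}
very_hot = {30: 0.1, 35: 0.7, 40: 1.0}

print(to_zadeh_notation(hot))
print(union(hot, very_hot))
```

Here the “+” in the printed string is only a separator between singletons, in line with the remark following Eq. 1.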

Definition 2

(Fuzzy Time Series) [144–146]. Let \(Y(t)(t = 0,1,2,\ldots )\) be a subset of \(Z\) and the universe of discourse on which fuzzy sets \(\mu _i(t)(i=1, 2, \ldots )\) are defined and let \(F(t)\) be a collection of \(\mu _i(t)(i=1, 2, \ldots )\). Then, \(F(t)\) is called a fuzzy time series on \(Y(t)(t=0,1,2,\ldots )\).

The concept of FTS can be explained with the help of the following two examples:

[Example 1] The common observations of the daily weather condition for a certain region can be described using everyday words such as “hot”, “very hot”, “cold”, “very cold”, “good”, “very good”, etc. All these words can be represented by fuzzy sets.

[Example 2] The common observations of the performance of a student during the final year of degree examination can be represented using the fuzzy sets “good”, “very good”, “poor”, “bad”, “very bad”, etc.

The above two examples are dynamic processes, and conventional time series models are not applicable to describe them [145]. Therefore, Song and Chissom [145] were the first to use the fuzzy sets concept in time series forecasting. Later, their proposed method gained popularity in the scientific community as an “FTS forecasting model”.

Definition 3

(Universe of discourse) [144]. Let \(L_{bd}\) and \(U_{bd}\) be the lower-bound and upper-bound of the time series data, respectively. Based on \(L_{bd}\) and \(U_{bd}\), we can define the universe of discourse \(U\) as:

$$\begin{aligned} U=\left[ L_{bd},U_{bd}\right] \end{aligned}$$
(3)

Definition 4

(Fuzzy logical relationship) [18, 144, 146]. Assume that \(F(t-1)={\tilde{A}_i}\) and \(F(t)={\tilde{A}_j}\). The relationship between \(F(t)\) and \(F(t-1)\) is referred to as a fuzzy logical relationship (FLR), which can be represented as:

$$\begin{aligned} {\tilde{A}_i} \rightarrow {\tilde{A}_j}, \end{aligned}$$
(4)

where \({\tilde{A}_i}\) and \({\tilde{A}_j}\) refer to the left-hand side and right-hand side of the FLR, respectively.

Definition 5

(Fuzzy logical relationship group) [18, 144, 146]. Assume the following FLRs:

$$\begin{aligned}&{\tilde{A}_i} \rightarrow {\tilde{A}}_{k1},\\ &{\tilde{A}_i} \rightarrow {\tilde{A}}_{k2},\\ &\cdots\\ &{\tilde{A}}_i \rightarrow {\tilde{A}}_{km}\end{aligned}$$

Chen [18] suggested that FLRs having the same fuzzy set on the left-hand side can be grouped into a fuzzy logical relationship group (FLRG). So, based on Chen’s model [18], these FLRs can be grouped into the FLRG as:

$$\begin{aligned} {\tilde{A}_i} \rightarrow {\tilde{A}_{k1}}, {\tilde{A}_{k2}}, \ldots , {\tilde{A}_{km}}. \end{aligned}$$
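Definitions 4 and 5 can be illustrated with a short sketch. The function names `build_flrs` and `group_flrs` and the sample sequence of linguistic labels are hypothetical. The sketch also assumes that a right-hand side which occurs several times for the same left-hand side is recorded only once in the FLRG.

```python
from typing import Dict, List, Sequence, Tuple


def build_flrs(fuzzified: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each fuzzified value F(t-1) with its successor F(t) (Definition 4)."""
    return list(zip(fuzzified[:-1], fuzzified[1:]))


def group_flrs(flrs: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group FLRs that share a left-hand side into FLRGs (Definition 5).

    Assumption: repeated right-hand sides are kept once, in order of first
    appearance.
    """
    groups: Dict[str, List[str]] = {}
    for lhs, rhs in flrs:
        rhs_list = groups.setdefault(lhs, [])
        if rhs not in rhs_list:
            rhs_list.append(rhs)
    return groups


# Hypothetical fuzzified time series.
fuzzified = ["A1", "A1", "A2", "A2", "A3", "A2", "A1"]
flrs = build_flrs(fuzzified)
flrgs = group_flrs(flrs)

for lhs, rhs_list in flrgs.items():
    print(f"{lhs} -> {', '.join(rhs_list)}")
```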

Definition 6

(High-order FLR) [22]. Assume that \(F(t)\) is caused by \(F(t-1), F(t-2), \ldots ,\) and \(F(t-n)\) \((n > 0)\), then high-order FLR can be expressed as:

$$\begin{aligned} F(t-n),\ldots ,F(t-2),F(t-1)\rightarrow F(t) \end{aligned}$$
(5)
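As a sketch of Definition 6, the nth-order FLRs of a fuzzified series can be collected with a sliding window of \(n+1\) consecutive values. The function name `high_order_flrs` and the sample labels are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

HighOrderFLR = Tuple[Tuple[str, ...], str]  # (F(t-n), ..., F(t-1)) -> F(t)


def high_order_flrs(fuzzified: Sequence[str], order: int) -> List[HighOrderFLR]:
    """Return every nth-order FLR found in the fuzzified series (Eq. 5)."""
    if order < 1:
        raise ValueError("order must be a positive integer")
    return [
        (tuple(fuzzified[t - order:t]), fuzzified[t])
        for t in range(order, len(fuzzified))
    ]


# Hypothetical fuzzified time series.
fuzzified = ["A1", "A2", "A2", "A3", "A4"]
second_order = high_order_flrs(fuzzified, order=2)

for previous_state, current_state in second_order:
    print(f"{', '.join(previous_state)} -> {current_state}")
```

With `order=1` the function reduces to the first-order FLRs of Definition 4.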

Definition 7

(M-factors FTS). Let FTS \(A(t), B(t), C(t), \ldots , M(t)\) be the factors/observations of the forecasting problems. If we only use \(A(t)\) to solve the forecasting problems, then it is called a one-factor FTS. If we use remaining secondary-factors/secondary-observations \(B(t), C(t), \ldots , M(t)\) with \(A(t)\) to solve the forecasting problems, then it is called M-factors FTS.

One-factor FTS models (referred to as Type-1 FTS models) employ only one variable for forecasting [64, 70]. For example, researchers in articles [32, 64] consider only closing price in forecasting of the stock index. However, the stock index price consists of many different observations, such as opening, high, low, etc. If these additional observations are used with one-factor variable, then it is referred to as M-factors FTS model. The model proposed by Huarng and Yu [71] is based on the M-factors, because they use high and low as the secondary-observations to forecast the closing price of TAIEX.

Definition 8

(Type-2 fuzzy set) [61]. Let \(\tilde{A}(U)\) be the set of fuzzy sets in \(U\). A Type-2 fuzzy set \(\tilde{A}\) in \(X\) is a fuzzy set whose membership grades are themselves fuzzy. This implies that \(\mu _{\tilde{A}}(x)\) is a fuzzy set in \(U\) for all \(x\), i.e., \(\mu _{\tilde{A}}: X \rightarrow \tilde{A}(U)\) and

$$\begin{aligned} \tilde{A}=\left\{ (x, \mu _{\tilde{A}}(x)) \mid \mu _{\tilde{A}}(x) \in \tilde{A}(U), \forall x \in X \right\} \end{aligned}$$
(6)

The concept of the Type-2 fuzzy set and its application in the FTS modeling approach can be found in the recent article by Singh and Borah [1].

Definition 9

(Type-2 FTS model) [71]. A Type-2 FTS model can be defined as an extension of a Type-1 FTS model. The Type-2 FTS model employs the FLRs established by a Type-1 model based on Type-1 observations. Fuzzy operators such as union and intersection are used to establish the new FLRs obtained from both the Type-1 and Type-2 observations. Then, Type-2 forecasts are obtained from these FLRs.

In many cases, the rough set (RS) concept [37] is hybridized with the FTS modeling approach. To understand RS theory in depth, we need to review some basic definitions, as follows [126]:

\(U\) is a finite set of objects, i.e. \(U=\{x_1, x_2, x_3,\ldots , x_n\}\). Here, each \(x_1, x_2, x_3,\ldots , x_n\) represents the object.

Definition 10

(Equivalence relation) Let \(R\) be an equivalence relation over \(U\), then the family of all equivalence classes of \(R\) is represented by \(U/R\).

Definition 11

(Lower approximation and upper approximation) Let \(X\) be a subset of \(U\) and \(R\) be an equivalence relation. The lower approximation of \(X\) [i.e., \(\underline{R}(X)\)] and the upper approximation of \(X\) [i.e., \(\overline{R}(X)\)] are defined as follows:

$$\begin{aligned} \underline{R}(X)&=\cup \left\{ x \in U \mid [x]_R \subseteq X\right\} \end{aligned}$$
(7)
$$\begin{aligned} \overline{R}(X)&=\cup \left\{ x \in U \mid [x]_R \cap X \ne \emptyset \right\} \end{aligned}$$
(8)

The lower approximation comprises all objects that definitely belong to the set, and the upper approximation comprises all objects that possibly belong to the set.

Definition 12

(Boundary region) The set of all objects which can be decisively classified neither as members of \(X\) nor as members of non-X with respect to \(R\) is called the boundary region of a set \(X\) with respect to \(R\), and denoted by \(RS_B\).

$$\begin{aligned} RS_B=\overline{R}(X)-\underline{R}(X) \end{aligned}$$
(9)
Fig. 1 Basic notations of the rough set

Based on the notions shown in Fig. 1, we can formulate the definitions of crisp set and rough set as follows:

Definition 13

(Crisp set) A set \(X\) is called crisp (exact) with respect to \(R\) if and only if the boundary region of \(X\) is empty.

Definition 14

(Rough set) A set \(X\) is called rough (inexact) with respect to \(R\) if and only if the boundary region of \(X\) is nonempty.
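Definitions 10 to 14 can be sketched on a small finite universe. In the code below, the equivalence relation is induced by a hypothetical `key` function (two objects are equivalent when their keys are equal); the function names and the example sets are assumptions made for illustration.

```python
from typing import Callable, Hashable, Iterable, List, Set


def equivalence_classes(
    universe: Iterable[int], key: Callable[[int], Hashable]
) -> List[Set[int]]:
    """Build U/R, where x R y holds if and only if key(x) == key(y)."""
    classes: dict = {}
    for x in universe:
        classes.setdefault(key(x), set()).add(x)
    return list(classes.values())


def lower_approximation(classes: List[Set[int]], target: Set[int]) -> Set[int]:
    """Union of the equivalence classes fully contained in X (Eq. 7)."""
    result: Set[int] = set()
    for cls in classes:
        if cls <= target:
            result |= cls
    return result


def upper_approximation(classes: List[Set[int]], target: Set[int]) -> Set[int]:
    """Union of the equivalence classes that intersect X (Eq. 8)."""
    result: Set[int] = set()
    for cls in classes:
        if cls & target:
            result |= cls
    return result


def boundary_region(classes: List[Set[int]], target: Set[int]) -> Set[int]:
    """Upper approximation minus lower approximation (Eq. 9)."""
    return upper_approximation(classes, target) - lower_approximation(classes, target)


def is_rough(classes: List[Set[int]], target: Set[int]) -> bool:
    """X is rough when its boundary region is nonempty, crisp otherwise."""
    return bool(boundary_region(classes, target))


# Hypothetical universe of eight objects; objects 1-2, 3-4, 5-6, 7-8 are equivalent.
U = range(1, 9)
classes = equivalence_classes(U, key=lambda x: (x - 1) // 2)
X = {1, 2, 3}

print(lower_approximation(classes, X), upper_approximation(classes, X))
print(boundary_region(classes, X), is_rough(classes, X))
```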

3 FTS modeling approach

Chen [18] proposed a simple calculation method to obtain higher forecasting accuracy in the FTS modeling approach. This model is still used as the basis of FTS modeling. The basic architecture of this model is depicted in Fig. 2. The model deals with time series forecasting problems in the following five common steps, which are explained below. Contributions of various research articles to the different phases of this model are also categorized in this section.

  • Step 1. Partition the universe of discourse into intervals. The universe of discourse can be defined based on Eq. 3. Once the length of the intervals is determined, \(U\) can be partitioned into several intervals of equal length. Many researchers have proposed solutions for determining the universe of discourse and partitioning it into intervals of effective length (see Footnote 1).

  • Step 2. Define linguistic terms for each of the intervals. After generating the intervals, linguistic terms are defined for each interval. In this step, we assume that the historical time series data set is distributed among \(n\) intervals (i.e., \(a_1, a_2,\ldots ,\) and \(a_{n}\)). Then, define \(n\) linguistic variables \({\tilde{A}_1}, {\tilde{A}_2}, \ldots , {\tilde{A}_{n}}\), which can be represented by fuzzy sets, as shown below:

    $$\begin{aligned} {\tilde{A}_{1}}&= 1/a_1 + 0.5/a_2 + 0/a_3 + \ldots + 0/a_{n-2} + 0/a_{n-1} + 0/a_{n}, \nonumber \\ {\tilde{A}_{2}}&= 0.5/a_1 + 1/a_2 + 0.5/a_3 + \ldots + 0/a_{n-2} + 0/a_{n-1} + 0/a_{n}, \nonumber \\ {\tilde{A}_{3}}&= 0/a_1 + 0.5/a_2 + 1/a_3 + \ldots + 0/a_{n-2} + 0/a_{n-1} + 0/a_{n}, \nonumber \\&\vdots \nonumber \\ {\tilde{A}_{n}}&= 0/a_1 + 0/a_2 + 0/a_3 + \ldots + 0/a_{n-2} + 0.5/a_{n-1} + 1/a_{n}. \end{aligned}$$
    (10)

    Then, we obtain the degree of membership of each time series value belonging to each \(\tilde{A}_i\). Here, maximum degree of membership of fuzzy set \(\tilde{A}_i\) occurs at interval \(a_i\), and \(1 \le i \le n\). Then, each historical time series value is fuzzified. For example, if any time series value belongs to the interval \(a_i\), then it is fuzzified into \(\tilde{A}_i\), where \(1 \le i \le n\).

  • For ease of computation, the degrees of membership of fuzzy set \(\tilde{A}_j\) \((j=1, 2, \ldots , n)\) are taken to be \(0\), \(0.5\) or \(1\). In Eq. 10, for example, \(\tilde{A}_1\) represents a linguistic value, which denotes a fuzzy set \(= \{a_1, a_2, \ldots , a_{n}\}\). This fuzzy set consists of \(n\) members with different degrees of membership \(= \{1, 0.5, 0, \ldots , 0\}\). Similarly, the linguistic value \(\tilde{A}_2\) denotes the fuzzy set \(= \{a_1, a_2, \ldots , a_{n}\}\), which also consists of \(n\) members with different degrees of membership \(= \{0.5, 1, 0.5, \ldots , 0\}\). The remaining linguistic variables, viz., \(\tilde{A}_3, \tilde{A}_4, \ldots , \tilde{A}_{n}\), can be described in a similar manner. Each fuzzy set thus contains \(n\) intervals, and each interval corresponds to all fuzzy sets with different degrees of membership. For example, interval \(a_1\) corresponds to linguistic variables \(\tilde{A}_1\) and \(\tilde{A}_2\) with degrees of membership \(1\) and \(0.5\), respectively, and to the remaining fuzzy sets with degree of membership \(0\). Similarly, interval \(a_2\) corresponds to linguistic variables \(\tilde{A}_1\), \(\tilde{A}_2\) and \(\tilde{A}_3\) with degrees of membership \(0.5\), \(1\), and \(0.5\), respectively, and to the remaining fuzzy sets with degree of membership \(0\). The remaining intervals, viz., \(a_3, a_4, \ldots , a_{n}\), can be described in a similar manner.

  • Liu [115] introduced an improved FTS forecasting method in which the forecasted value is regarded as a trapezoidal fuzzy number instead of a single-point value. The method replaces the discrete fuzzy sets of Eq. 10 with trapezoidal fuzzy numbers. Its main advantage is that the decision analyst can accumulate information about the possible forecasted ranges under different degrees of confidence.

  • Step 3. Fuzzify the historical time series data set. In order to fuzzify the historical time series data, it is essential to obtain the degree of membership value of each observation belonging to each \(\tilde{A}_j\) \((j = 1, 2, \ldots , n)\) for each day/year. If the maximum membership value of one day’s/year’s observation occurs at interval \(a_i\) and \(1 \le i \le n\), then the fuzzified value for that particular day/year is considered as \(\tilde{A}_i\).

  • In an FTS model, each fuzzy set carries information about the occurrence of historical events. If these fuzzy sets are not handled efficiently, important information may be lost. Therefore, many researchers have proposed different fuzzification techniques [34, 74, 130].

  • Step 4. Establish the FLRs between the fuzzified time series values, and create the FLRGs. After the time series data are completely fuzzified, the FLRs are established based on Definition 4. The first-order FLR is established based on two consecutive linguistic values. For example, if the fuzzified values of time \(t-1\) and \(t\) are \(\tilde{A}_i\) and \(\tilde{A}_j\), respectively, then establish the first-order FLR as “\(\tilde{A}_i \rightarrow \tilde{A}_j\)”, where “\(\tilde{A}_i\)” and “\(\tilde{A}_j\)” are called the previous state and current state of the FLR, respectively. Similarly, the nth-order FLR is established based on \(n+1\) consecutive linguistic values. For example, if the fuzzified values of time \(t-4\), \(t-3\), \(t-2\), \(t-1\) and \(t\) are \(\tilde{A}_{ai}\), \(\tilde{A}_{bi}\), \(\tilde{A}_{ci}\), \(\tilde{A}_{di}\) and \(\tilde{A}_{ej}\), respectively, then establish the fourth-order FLR as “\(\tilde{A}_{ai}, \tilde{A}_{bi}, \tilde{A}_{ci}, \tilde{A}_{di} \rightarrow \tilde{A}_{ej}\)”, where “\(\tilde{A}_{ai}, \tilde{A}_{bi}, \tilde{A}_{ci}, \tilde{A}_{di}\)” and “\(\tilde{A}_{ej}\)” are called the previous state and current state of the FLR, respectively.

  • Most of the existing FTS models (see Footnote 2) use first-order FLRs to obtain the forecasting results. Several articles (see Footnote 3) show that high-order FLRs (see Definition 6) can improve the forecasting accuracy. The main reason for the higher accuracy of these high-order FTS models is that they consider more linguistic values, which represent the high uncertainty involved in various dynamic processes. On the other hand, to extract rules from the fuzzified time series data set, Qiu et al. [129] utilized C-fuzzy decision trees [127] in an FTS model. They introduced two major improvements to C-fuzzy decision trees: first, a new stop condition that reduces the computational cost, and second, a weighted C-fuzzy decision tree (WCDT) in which the weighted distance is computed with information gain. In this approach, the forecast rules are expressed as “if input value is \(\ldots\) then it can be labeled as \(\ldots\)”. FLRs that share the same previous state can be grouped into an FLRG (see Definition 5). For example, the FLRG “\(\tilde{A}_i \rightarrow \tilde{A}_m, \tilde{A}_n\)” indicates that there are the following FLRs:

    $$\begin{aligned} \tilde{A}_i&\rightarrow \tilde{A}_m,\\ \tilde{A}_i&\rightarrow \tilde{A}_n. \end{aligned}$$
  • Step 5. Defuzzify and compute the forecasted values. In articles [144, 155], researchers adopted the following method to forecast enrollments of the University of Alabama:

    $$\begin{aligned} Y(t)=Y(t-1)\circ R, \end{aligned}$$
    (11)

    where \(Y(t-1)\) is the fuzzified enrollment of year \((t-1)\), \(Y(t)\) is the forecasted enrollment of year \(t\) represented by a fuzzy set, “\(\circ\)” is the max-min composition operator, and “\(R\)” is the union of fuzzy relations. This method takes much time to compute the union of fuzzy relations \(R\), especially when the number of fuzzy relations in Eq. 11 is large [27, 68]. Therefore, some researchers (see Footnote 4) introduced various solutions for the defuzzification operation. One of these solutions, introduced by Chen [18], is presented below.

  • This includes the following two principles, viz., Principle 1 and Principle 2. The procedure for Principle 1 is given as follows:

    • Principle 1: For forecasting \(F(t)\), the fuzzified value for \(F(t-1)\) is required, where “t” is the time for which we want to forecast. Principle 1 is applicable only if there is more than one fuzzified value in the current state. The steps under Principle 1 are explained next.

      • Step 1. Obtain the fuzzified value for \(F(t-1)\) as \(\tilde{A}_{i}\) \((i=1,2,3\ldots ,n)\).

      • Step 2. Obtain the FLR whose previous state is \(\tilde{A}_{i}\), and the current state is \(\tilde{A}_{j1}, \tilde{A}_{j2}, \ldots , \tilde{A}_{jp}\), i.e., the FLR is in the form of “\(\tilde{A}_{i} \rightarrow \tilde{A}_{j1}, \tilde{A}_{j2}, \ldots , \tilde{A}_{jp}\)”.

      • Step 3. Find the interval where the maximum membership value of the fuzzy sets \(\tilde{A}_{j1}, \tilde{A}_{j2}, \ldots , \tilde{A}_{jp}\) (current state) occur, and let these intervals be \(a_{j1}, a_{j2}, \ldots , a_{jp}\). All these intervals have the corresponding mid-values \(C_{j1}, C_{j2}, \ldots , C_{jp}\).

      • Step 4. Compute the forecasted value as:

        $$\begin{aligned} Forecasted_{value}= \left[ \frac{C_{j1} + C_{j2} + \ldots + C_{jp}}{p}\right] \end{aligned}$$
        (12)

        Here, \(p\) represents the total number of fuzzy sets associated with the current state of the FLR.

    • Principle 2: This principle is applicable only if there is only one fuzzified value in the current state. The steps under Principle 2 are given as follows:

      • Step 1. Obtain the fuzzified value for \(F(t-1)\) as \(\tilde{A}_{i}\) \((i=1,2,\ldots ,n)\).

      • Step 2. Find the FLR whose previous state is \(\tilde{A}_{i}\) and the current state is \(\tilde{A}_{j}\), i.e., the FLR is in the form of “\(\tilde{A}_{i} \rightarrow \tilde{A}_{j}\)”.

      • Step 3. Find the interval where the maximum membership value of the fuzzy set \(\tilde{A}_j\) occurs. Let this interval be \(a_j\) \((j=1,2,3,\ldots ,n)\), and its corresponding mid-value be \(C_j\). This \(C_j\) is the forecasted value for \(F(t)\).
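The five steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not a reproduction of the implementation in [18]: the function names and the sample observations are hypothetical, fuzzy sets are represented by 0-based indices (index 0 stands for \(\tilde{A}_1\)), the universe of discourse is taken directly from the data as in Definition 3, and, as a further assumption, a previous state without any FLRG is forecast by the mid-value of its own interval.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[float, float]


def partition(low: float, high: float, n: int) -> List[Interval]:
    """Step 1: split U = [low, high] into n intervals of equal length."""
    width = (high - low) / n
    return [(low + i * width, low + (i + 1) * width) for i in range(n)]


def fuzzify(value: float, low: float, high: float, n: int) -> int:
    """Steps 2-3: index of the interval holding the value, i.e. the fuzzy
    set whose degree of membership is maximal for this value."""
    width = (high - low) / n
    return min(int((value - low) / width), n - 1)  # the upper bound falls in a_n


def build_flrgs(states: Sequence[int]) -> Dict[int, List[int]]:
    """Step 4: group first-order FLRs A_i -> A_j by their previous state."""
    groups: Dict[int, List[int]] = {}
    for prev, curr in zip(states[:-1], states[1:]):
        rhs = groups.setdefault(prev, [])
        if curr not in rhs:
            rhs.append(curr)
    return groups


def forecast(prev_state: int, groups: Dict[int, List[int]],
             intervals: Sequence[Interval]) -> float:
    """Step 5: average the mid-values of the current-state intervals (Eq. 12).

    With a single current state this reduces to Principle 2.
    Assumption: with no matching FLRG, use the mid-value of a_i itself.
    """
    targets = groups.get(prev_state, [prev_state])
    mids = [(intervals[j][0] + intervals[j][1]) / 2 for j in targets]
    return sum(mids) / len(mids)


# Hypothetical observations, used only to exercise the functions.
series = [10, 18, 27, 22, 35, 40, 24, 16]
low, high, n = min(series), max(series), 3  # Definition 3: U = [L_bd, U_bd]
intervals = partition(low, high, n)
states = [fuzzify(v, low, high, n) for v in series]
groups = build_flrgs(states)
next_value = forecast(states[-1], groups, intervals)

print(states)
print(groups)
print(next_value)
```

In this example the last observation is fuzzified into \(\tilde{A}_1\), whose FLRG is \(\tilde{A}_1 \rightarrow \tilde{A}_1, \tilde{A}_2\), so Principle 1 averages the mid-values of \(a_1\) and \(a_2\).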

Fig. 2 Architecture of Chen’s Model
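For comparison with the arithmetic averaging of Chen’s model, the max-min composition of Eq. 11 can be sketched as below. The sketch assumes the usual construction in which the fuzzy relation of one FLR \(\tilde{A}_i \rightarrow \tilde{A}_j\) is the element-wise minimum of the two membership vectors, and \(R\) is the element-wise maximum over all such relations. The function names and the three fuzzy sets (which follow the pattern of Eq. 10 with \(n = 3\)) are chosen for illustration only.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def relation(a_prev: Sequence[float], a_curr: Sequence[float]) -> Matrix:
    """Fuzzy relation of one FLR A_i -> A_j: R[p][q] = min(A_i[p], A_j[q])."""
    return [[min(p, q) for q in a_curr] for p in a_prev]


def union_of_relations(relations: Sequence[Matrix]) -> Matrix:
    """Element-wise maximum over all relations, giving R of Eq. 11."""
    n_rows, n_cols = len(relations[0]), len(relations[0][0])
    return [
        [max(r[p][q] for r in relations) for q in range(n_cols)]
        for p in range(n_rows)
    ]


def max_min_composition(y_prev: Sequence[float], r: Matrix) -> Vector:
    """Y(t)[q] = max over p of min(Y(t-1)[p], R[p][q])."""
    return [
        max(min(y_prev[p], r[p][q]) for p in range(len(y_prev)))
        for q in range(len(r[0]))
    ]


# Membership vectors over three intervals, following the pattern of Eq. 10.
A1 = [1.0, 0.5, 0.0]
A2 = [0.5, 1.0, 0.5]
A3 = [0.0, 0.5, 1.0]

# Hypothetical FLRs: A1 -> A2 and A2 -> A3.
R = union_of_relations([relation(A1, A2), relation(A2, A3)])
y_next = max_min_composition(A2, R)

print(R)
print(y_next)
```

Every FLR contributes an \(n \times n\) matrix to the union, which is consistent with the remark above that computing \(R\) becomes costly when the number of fuzzy relations is large.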

4 Hybridize modeling approach for FTS

Recently, numerous SC techniques have been employed to deal with the different challenges imposed by the FTS modeling approach. The main SC techniques for this purpose include ANN, RS, and EC. Each of them provides a significant solution for addressing domain-specific problems. The combination of these techniques leads to the development of new architectures, which are more advantageous than conventional techniques because they provide robust, cost-effective and approximate solutions. However, this hybridization should be carried out in a reasonable, rather than an expensive or complicated, manner.

In the following, we describe the basics of the individual SC techniques and their hybridization, along with several hybridized models developed for handling forecasting problems in the FTS modeling approach. It should be noted that there is still no universally recognized method for selecting the particular SC technique(s) suitable for resolving a given problem. The selection of technique(s) depends entirely on the problem and its application, and requires human interpretation to determine the suitability of a particular technique.

4.1 Artificial neural network (ANN)

ANNs are massively parallel adaptive networks of simple nonlinear computing elements called neurons, which are intended to abstract and model some of the functionality of the human nervous system in an attempt to partially capture some of its computational strengths [91]. The neurons in an ANN are organized into different layers. Inputs to the network are presented at the input layer, whereas outputs are produced as signals at the output layer. These signals may pass through one or more intermediate or hidden layers, which transform the signals depending upon the neuron signal functions.

Neural networks are classified as either single-layer or multi-layer networks, depending on the layers that exist between the input layer and the output layer. A single-layer feed forward (SLFF) neural network is formed when the nodes of the input layer are connected to processing nodes through various weights, resulting in a series of output nodes. A multi-layer feed forward (MLFF) neural network architecture can be developed by increasing the number of layers in an SLFF neural network.

Fig. 3 A BPNN architecture with one hidden layer

Researchers employ ANN in various forecasting problems such as electric load forecasting [149], short-term precipitation forecasting [89], credit ratings forecasting [90], tourism demand forecasting [94], etc., due to its capability to discover complex nonlinear relationships [45, 46, 75] in the observations. In literature, several types of neural networks could be found, but usually feed-forward neural network (FFNN) and back-propagation neural network (BPNN) are used in time series forecasting (especially seasonal forecasting).

In Fig. 3, an architecture of a BPNN with only one hidden layer is shown. In this figure, each \(I_n\) represents the input to the input node \(Z_n\), each \(Y_n\) represents a node in the hidden layer, and each \(O_n\) represents a node in the output layer. The main objective of using a BPNN with an MLFF neural network is to minimize the output error, obtained from the difference between the calculated output (\(o_1, o_2, \ldots , o_n\)) and the target output (\(n_1, n_2, \ldots , n_n\)) of the neural network, by adjusting the weights. So, in the BPNN, the error information is sent back in the reverse direction repeatedly until the output error is very small or zero. The BPNN is trained in three phases: (a) feed-forward processing of the input information using the FFNN, in which the adjustment of weights and nodes is made, (b) calculation of the error, and (c) updating of the weights. A more detailed description of the applications of ANN (especially BPNN) can be found in [136, 162].
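The three training phases can be sketched with a very small network written in plain Python. The class name `TinyBPNN`, the sigmoid activation, the squared-error measure, the learning rate, the number of epochs and the training pairs are all assumptions made for illustration; the sketch has one hidden layer and a single output node, and is not the architecture of any cited model.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNN:
    """One hidden layer and one output node, all with sigmoid activations."""

    def __init__(self, n_inputs: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each row holds the weights of one node; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                         for _ in range(n_hidden)]
        self.w_output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: Sequence[float]) -> Tuple[List[float], float]:
        """Phase (a): feed the inputs forward to the output node."""
        inputs = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs)))
                  for row in self.w_hidden]
        output = sigmoid(sum(w * v for w, v in zip(self.w_output, hidden + [1.0])))
        return hidden, output

    def train_step(self, x: Sequence[float], target: float, rate: float) -> float:
        hidden, output = self.forward(x)
        # Phase (b): output error, propagated back to the hidden layer.
        error = target - output
        delta_out = error * output * (1.0 - output)
        delta_hidden = [delta_out * self.w_output[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        # Phase (c): update the weights to reduce the squared error.
        for j, h in enumerate(hidden + [1.0]):
            self.w_output[j] += rate * delta_out * h
        inputs = list(x) + [1.0]
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(inputs):
                row[i] += rate * delta_hidden[j] * v
        return 0.5 * error ** 2


# Hypothetical training pairs: a scaled value and the value that follows it.
samples = [([x / 10.0], x / 10.0 + 0.1) for x in range(1, 9)]
net = TinyBPNN(n_inputs=1, n_hidden=3)
errors = []
for epoch in range(5000):
    errors.append(sum(net.train_step(x, t, rate=0.5) for x, t in samples))
first_error, last_error = errors[0], errors[-1]

print(first_error, last_error)
```

The summed error of the last epoch can be compared with that of the first epoch to check that the weight updates reduce the output error on these samples.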

Fig. 4 Block diagrams of FTS-ANN hybridized models

Hybridization of ANN with FTS is a significant development in the domain of forecasting. It combines the merits of ANN and FTS, compensating for the demerits of one technique with the merits of the other. The advantages of ANN include parallel processing, handling of large data sets, fast learning capability, etc. Imprecision, uncertainty and linguistic variables are handled through fuzzy sets. Besides these advantages, the FTS-ANN hybridization helps in designing complex decision-making systems.

ANN can be used in different steps of the FTS modeling approach, which are discussed in Sect. 3. Fig. 4 presents three different hybridized architectures, each demonstrating the application of ANN in a different step of the FTS modeling approach. In the first architecture (i.e., Fig. 4a), ANN is responsible for the determination of FLRs; in the second architecture (i.e., Fig. 4b), ANN is responsible for partitioning the Universe of discourse; and in the third architecture (i.e., Fig. 4c), ANN is responsible for the defuzzification operation. The roles of ANN in these architectures are explained in detail below.

  1. (a)

    For defining FLRs: In this case, the primary inputs for the connection-oriented neural network are fuzzified time series values. The neural network is trained in terms of the number of input nodes, hidden nodes and desired outputs. One or more hidden layers are employed to automatically generate the FLRs, which may later be clustered into similar FLRGs. In articles [2, 3], researchers employ FFNN to define high-order FLRs in the FTS model. Both these models are applied in forecasting the enrollments of the University of Alabama. Similar to these two approaches, many researchers [52, 67, 72, 169, 173] use the ANN in the FTS model to capture the FLRs and thereby improve the forecasting accuracy.

    For defining high-order FLRs, a neural network architecture for the nth-order FLRs is shown in Fig. 5. Here, each input node takes the previous days' \(F(t-n)\), \(\ldots\), \(F(t-2)\), \(F(t-1)\) fuzzified time series values, e.g., \(\tilde{A}_l, \ldots , \tilde{A}_m, \tilde{A}_n\), respectively, to predict the current day's \(F(t)\) fuzzified time series value, e.g., \(\tilde{A}_j\). Here, each “\(t\)” represents the day of the corresponding fuzzified time series value. Based on the input and output fuzzified values, the nth-order FLRs are established as: \(\tilde{A}_l, \ldots , \tilde{A}_m, \tilde{A}_n \rightarrow \tilde{A}_j\). During simulation, the indices of the previous state fuzzy sets (e.g., \(l, \ldots , m, n\)) are used as inputs, whereas the index of the current state fuzzy set (e.g., \(j\)) is used as the target output.

  2. (b)

    For partitioning the Universe of discourse: Data clustering is a popular approach for automatically finding classes, concepts, or groups of patterns [60]. Time series data are pervasive across all human endeavors, and their clustering is one of the most fundamental applications of data mining [86]. Many data clustering algorithms [56, 121, 166] have been proposed in the literature, but their applications are limited to the extraction of patterns that represent points in multidimensional spaces of fixed dimensionality [167]. In two recent articles [7, 133], a distance-based clustering algorithm, namely the self-organizing feature map (SOFM), is employed for determining the intervals of the historical time series data sets by clustering them into different groups. The SOFM, developed by Kohonen [88], is a class of neural networks with neurons arranged in a low dimensional (often two-dimensional) structure, and trained by an iterative unsupervised or self-organizing procedure [112]. The SOFM converts patterns of arbitrary dimensionality into the response of one-dimensional or two-dimensional arrays of neurons, i.e., it converts a wide pattern space into a feature space. The neural network performing such a mapping is called a feature map [142].

  3. (c)

    For defuzzification operation: Based on the BPNN architecture as shown in Fig. 3, Singh and Borah [135] design an ANN architecture, and hybridize it with the FTS model to defuzzify the fuzzified time series values. This neural network architecture is shown in Fig. 5. In this figure, the arrangement of nodes in input layer can be done in the following sequence:

    $$\begin{aligned} F(t-n), \ldots , F(t-2), F(t-1) \rightarrow F(t) \end{aligned}$$
    (13)

    Here, each input node takes the previous days' \((t-n), \ldots , (t-2), (t-1)\) fuzzified time series values (e.g., \(\tilde{A}_l, \ldots , \tilde{A}_m, \tilde{A}_n\)) to predict the one-day-ahead \((t)\) time series value “\(\tilde{A}_j\)”. In Eq. 13, each “\(t\)” represents the day of the considered fuzzified time series values.
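The arrangement of Eq. 13, in which the indices of the previous-state fuzzy sets serve as inputs and the index of the current-state fuzzy set serves as the target, can be sketched as follows. The function name `build_flr_patterns` and the sample series are hypothetical, and each fuzzy set \(\tilde{A}_k\) is assumed to be encoded by its integer index \(k\).

```python
def build_flr_patterns(fuzzified, order):
    """Return (inputs, target) index pairs for nth-order FLRs.

    `fuzzified` holds one fuzzy-set index per time step, so the value 3
    stands for the fuzzy set A_3.
    """
    if order < 1 or order >= len(fuzzified):
        raise ValueError("order must be in [1, len(fuzzified) - 1]")
    patterns = []
    for t in range(order, len(fuzzified)):
        inputs = tuple(fuzzified[t - order:t])   # F(t-n), ..., F(t-1)
        target = fuzzified[t]                    # F(t)
        patterns.append((inputs, target))
    return patterns


# Hypothetical fuzzified series A_1, A_1, A_2, A_3, A_3, A_4.
series = [1, 1, 2, 3, 3, 4]
patterns = build_flr_patterns(series, order=2)
# First pattern: ((1, 1), 2), i.e. the second-order FLR A_1, A_1 -> A_2.
```

Each pair can then be presented to a network as one training sample, with the tuple as the input layer values and the target index as the desired output.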

Fig. 5 ANN architecture for the nth-order FLRs

4.2 Rough set (RS)

RS is a mathematical tool proposed by Pawlak [125]. The RS concept [37] is based on the assumption that some information is associated with every object of the universe of discourse, and that objects characterized by the same information are indiscernible in view of the available information about them. Any set of all indiscernible objects is called an elementary set and forms a basic granule of knowledge about the universe. Any union of elementary sets is referred to as a precise set; otherwise the set is rough. A fundamental advantage of RS theory is the ability to handle a category that cannot be sharply defined given a knowledge base [124]. Therefore, the RS theory is used in attribute selection, rule discovery and various knowledge discovery applications such as data mining, machine learning and medical diagnosis [33].

The role of RS in the FTS modeling approach is discussed below.

  • For rule induction: In the FTS modeling approach, each fuzzy set carries information about the occurrence of a historic event. If these fuzzy sets are not handled efficiently, important information may be lost. Therefore, after generating the intervals, the historical time series data set is fuzzified and used to generate the rules. Sometimes the number of fuzzy sets is very large, and generating rules from these fuzzy sets becomes very tedious. For this purpose, Teoh et al. [150, 151] employ the concept of RS in the FTS modeling approach to generate rules from the various FLRs, because the RS [125] acts as a powerful tool for analyzing data and information tables. In this case, fuzzy sets are used to establish the FLRs, and then an RS based rule induction technique (the LEM2 algorithm) is employed to mine reasonable rules from the information table. The rules produced by the RS rule induction method are in the “If-Then” form, combining a condition value (\(\tilde{A}_i\)) with several decision values (\(\tilde{A}_j, \tilde{A}_k, \ldots , \tilde{A}_n\)). For example, these decision values can be represented with “Then” as follows:

    $$\begin{aligned} If\,\left( condition = \tilde{A}_i\right)\,Then\,\left( decision = \tilde{A}_j [S], \tilde{A}_k [S], \ldots , \tilde{A}_n [S]\right) \end{aligned}$$
    (14)

    Here, each \(S\) represents the rule support value. The following is an example of a first-order rule generated from the FLRs:

    $$\begin{aligned} If\,\left( condition = \tilde{A}_4\right) \,Then\,\left( decision = \tilde{A}_5 [3], \tilde{A}_7 [4], \tilde{A}_9 [1]\right) \end{aligned}$$
    (15)

    In Eq. 15, when the condition \(\tilde{A}_4\) occurs, there are three possible decision values: \(\tilde{A}_5\), \(\tilde{A}_7\) and \(\tilde{A}_9\). Here, the value in square brackets is the support value of the corresponding decision value. For example, \(\tilde{A}_5[3]\) indicates that the fuzzy set \(\tilde{A}_5\) occurred in three cases when the condition value was \(\tilde{A}_4\), so the support value of this decision is 3. The rest of the rules can be explained similarly.
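The support values attached to the decision values in Eqs. 14 and 15 can be obtained by simple counting, as the following sketch shows. It only tallies first-order FLRs per condition value and is not the LEM2 algorithm itself; the helper names and the list of FLRs, chosen so that the supports match Eq. 15, are hypothetical.

```python
from collections import Counter, defaultdict


def rule_support(flrs):
    """Group first-order FLRs by condition and count each decision value."""
    table = defaultdict(Counter)
    for condition, decision in flrs:
        table[condition][decision] += 1
    return {cond: dict(counts) for cond, counts in table.items()}


def format_rule(condition, decisions):
    """Render one rule in the If-Then form of Eq. 14."""
    parts = ", ".join(f"A{d}[{s}]" for d, s in sorted(decisions.items()))
    return f"If (condition = A{condition}) Then (decision = {parts})"


# Hypothetical FLRs (condition index, decision index) reproducing Eq. 15.
flrs = [(4, 5)] * 3 + [(4, 7)] * 4 + [(4, 9)]
rules = rule_support(flrs)
rule_text = format_rule(4, rules[4])
# rule_text: "If (condition = A4) Then (decision = A5[3], A7[4], A9[1])"
```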

4.3 Evolutionary computing (EC)

EC is a collection of problem solving techniques that includes paradigms such as Evolutionary Strategies, Evolutionary Programs, Genetic Algorithms (GAs), and Genetic Programming (GP) [11]. The GA concept was first proposed by Holland [62]. In the GA, a population consists of chromosomes and a chromosome consists of genes, where the number of chromosomes in a population is called the population size [96]. In the following, we briefly review the basic concept of GA from articles [58, 59, 142]:

  • Step 1. Create a random initial state. An initial population is created from a random selection of solutions (chromosomes).

  • Step 2. Evaluate fitness. A value for fitness is assigned to each solution depending on how close it actually is to solving the problem.

  • Step 3. Reproduce. Those chromosomes with a higher fitness value are more likely to reproduce offspring.

  • Step 4. Next generation. If the new generation contains a solution that produces an output that is close enough or equal to the desired answer then the problem has been solved. Otherwise, iterate the whole process with the new generation.
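The four steps above can be sketched with a deliberately small GA. Several choices here are assumptions made for illustration and are not prescribed by the reviewed articles: chromosomes are bit strings, parents are drawn from the better half of the population (a simplification of fitness-proportional reproduction), the best chromosome is copied unchanged into the next generation, and the loop runs for a fixed number of generations instead of testing for a "close enough" solution.

```python
import random


def run_ga(fitness, n_genes=12, pop_size=20, generations=40,
           mutation_rate=0.05, seed=1):
    rng = random.Random(seed)
    # Step 1: random initial population of bit-string chromosomes.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # Step 2: evaluate and rank every chromosome by fitness.
        scored = sorted(population, key=fitness, reverse=True)
        best_history.append(fitness(scored[0]))
        # Step 3: only the fitter half is allowed to reproduce.
        parents = scored[:pop_size // 2]
        children = [scored[0][:]]                  # elitism keeps the best
        while len(children) < pop_size:
            mum, dad = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)      # one-point crossover
            child = mum[:cut] + dad[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]               # bit-flip mutation
            children.append(child)
        # Step 4: the offspring become the next generation.
        population = children
    best = max(population, key=fitness)
    return best, best_history


# Hypothetical fitness: the number of 1-genes in the chromosome.
best, best_history = run_ga(fitness=sum)
```

Because the best chromosome is always carried over, the values in `best_history` never decrease from one generation to the next.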

Particle swarm optimization (PSO) is a newer EC algorithm, which has been applied to solve the bilevel programming problem [156]. To deal with complicated optimization problems, many researchers have recently hybridized this optimization technique with the FTS modeling approach. In the following, we briefly review the basic concept of the PSO from articles [77, 100, 120].

The PSO algorithm was first introduced in article [48]. It is a population-based evolutionary computation technique, which is inspired by the social behavior of animals such as bird flocking, fish schooling, and swarming theory [49, 113, 114]. The PSO can be employed to solve many of the same kinds of problems as GAs [85]. The PSO algorithm is applied to a set of particles, where each particle is assigned a randomized velocity. Each particle is then allowed to move through the problem space. At each movement, each particle keeps track of its own best solution (fitness) and the best solution of its neighboring particles. The value of that fitness is called “p-best”. Each particle is then attracted towards the global best value by keeping track of the overall best value among all particles, and its location [152]. The particle that obtains the global best fitness value is called “g-best”.

At each step of optimization, velocity of each particle is dynamically adjusted according to its own experience and its neighboring particles, which is represented by the following equations:

$$\begin{aligned} Vel_{id, t}\,=\,&\alpha \times Vel_{id, t} + M_1 \times R_{and} \times (PB_{id}-CP_{id, t})+\nonumber \\ &M_2 \times R_{and} \times (PG_{best}-CP_{id, t}) \end{aligned}$$
(16)

The position of a new particle can be determined by the following equation:

$$\begin{aligned} CP_{id, t}= CP_{id, t} + Vel_{id, t} \end{aligned}$$
(17)

where \(i\) represents the \(ith\) particle and \(d\) represents the dimension of the problem space. In Eq. 16, \(\alpha\) represents the inertia weight factor; \(CP_{id, t}\) represents the current position of the particle \(i\) in iteration \(t\); \(PB_{id}\) denotes the previous best position of the particle \(i\) that experiences the best fitness value so far (p-best); \(PG_{best}\) represents the global best fitness value (g-best) among all the particles; \(R_{and}\) gives the random value in the range of \([0,1]\); \(M_1\) and \(M_2\) represent the self-confidence coefficient and the social coefficient, respectively; and \(Vel_{id, t}\) represents the velocity of the particle \(i\) in iteration \(t\). Here, \(Vel_{id, t}\) is limited to the range \([-Vel_{max},Vel_{max}]\), where \(Vel_{max}\) is a constant and defined by users.
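Eqs. 16 and 17 can be read as in-place updates applied to every particle and every dimension in each iteration. The sketch below follows that reading under several assumptions: the objective is minimized, the two occurrences of \(R_{and}\) are drawn independently, the swarm uses a single global neighborhood, and the parameter values and the `sphere` test function are hypothetical.

```python
import random


def pso(objective, dim, bounds, n_particles=15, iterations=60,
        alpha=0.7, m1=1.5, m2=1.5, vel_max=1.0, seed=2):
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[rng.uniform(-vel_max, vel_max) for _ in range(dim)]
           for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                      # PB_id
    p_best_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=p_best_val.__getitem__)
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]   # PG_best
    history = [g_best_val]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                # Eq. 16: inertia term + pull towards p-best + pull towards g-best.
                v = (alpha * vel[i][d]
                     + m1 * rng.random() * (p_best[i][d] - pos[i][d])
                     + m2 * rng.random() * (g_best[d] - pos[i][d]))
                # Keep the velocity inside [-Vel_max, Vel_max].
                vel[i][d] = max(-vel_max, min(vel_max, v))
                pos[i][d] += vel[i][d]                # Eq. 17
            val = objective(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
        history.append(g_best_val)
    return g_best, g_best_val, history


def sphere(x):
    """Hypothetical objective with its minimum at the origin."""
    return sum(v * v for v in x)


g_best, g_best_val, history = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

The list `history` records the g-best value after each iteration and is non-increasing by construction, since g-best is replaced only when a strictly better position is found.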

The role of EC in the FTS modeling approach is categorized below based on different functions.

  1. (a)

    For determination of optimal interval lengths: In the FTS modeling approach, GA is used to arrive at optimal interval lengths using certain genetic operators. In this case, some chromosomes are defined as the initial population based on the number of intervals, where each chromosome consists of genes. Initially, each chromosome is randomly generated by the system. Then, the system randomly selects chromosomes and genes from the population to perform the crossover and mutation operations, respectively. The whole process is repeated until optimal interval lengths are achieved. The achievement of optimality can be measured with the performance measure parameters (refer to Sect. 6), such as AFER, MSE, etc. Based on this concept, researchers in articles [25, 26] presented methods for forecasting the enrollments by hybridizing the GA technique with the FTS modeling approach. The basic difference between the models presented in [26] and [25] is that the first model is based on high-order FLRs, whereas the second model is based on first-order FLRs. Similar to the above approach, Lee et al. [96, 97] presented new methods for temperature and TAIFEX forecasting based on two-factors high-order FLRs.

  2. (b)

    For finding best intervals using PSO: The main downside of the FTS forecasting model is that an increase in the number of intervals increases the accuracy rate of forecasting, but decreases the fuzziness of the time series data sets [133]. Recently, many researchers [65, 66, 92, 93] have shown that appropriate selection of intervals also increases the forecasting accuracy of the model. Therefore, in order to get the optimal intervals, they used the PSO algorithm in their proposed models [65, 66, 92, 93]. They report that the PSO algorithm is more efficient and powerful than the GA applied by [26] in the selection of proper intervals. The basic concept of the FTS-PSO hybridized model is explained below. Let \(n\) be the number of intervals, and let \(x_{0}\) and \(x_{n}\) be the lower and upper bounds of the universe of discourse \(U\) on the historical time series data set \(D(t)\), respectively. A particle is an array consisting of \(n-1\) elements \(x_{1}\), \(x_{2}\), \(\ldots\), \(x_{i}\), \(\ldots\), \(x_{n-2}\) and \(x_{n-1}\), where \(1 \le i \le n-1\) and \(x_{i-1} < x_i\). Based on these \(n-1\) elements, the \(n\) intervals are defined as \(I_1=(x_0, x_1]\), \(I_2=(x_1, x_2]\), ..., \(I_i=(x_{i-1}, x_i]\), ..., \(I_{n-1}=(x_{n-2}, x_{n-1}]\) and \(I_{n}=(x_{n-1}, x_{n}]\), respectively. In Fig. 6, each of \(x_{1}\), \(x_{2}\), \(\ldots\), \(x_{i}\), \(\ldots\), \(x_{n-2}\) and \(x_{n-1}\) represents the position of the particle at the corresponding interval \(I_1=(x_0, x_1]\), \(I_2=(x_1, x_2]\), ..., \(I_i=(x_{i-1}, x_i]\), ..., \(I_{n-1}=(x_{n-2}, x_{n-1}]\) and \(I_{n}=(x_{n-1}, x_{n}]\), respectively. When a particle moves from one position to another, e.g., from \(x_1\) to \(x_2\), the elements of the corresponding new array must always be rearranged in ascending order such that \(x_1 \le x_2 \le \ldots \le x_{n-1}\). In this process, the FTS-PSO hybridized model allows the particles to move to other positions based on Eqs. 16 and 17, and repeats the steps until the stopping criterion is satisfied or the optimal solution is found. If the stopping criterion is satisfied, then all the FLRs obtained by the global best position (g-best) among all personal best positions (p-best) of all particles are employed. Based on a similar concept, Singh and Borah [1] introduced a new FTS-PSO hybridized model that can deal with M-factors time series data sets. The main difference between the existing models [92, 93] and the Singh and Borah [1] model is the procedure for handling the intervals based on their importance. The Singh and Borah [1] model also incorporates more information in terms of observations, which are represented in terms of FLRs. These FLRs are later employed for the defuzzification operation based on a technique proposed in the article.

  3. (c)

    For determination of membership values using PSO: The PSO technique was first employed by Aladag et al. [4] to obtain the optimal membership values of the fuzzy sets in the fuzzy relationship matrix “R” (refer to Eq. 11). In this approach, the FCM clustering algorithm is first used for the fuzzification phase of the time series data set. Then, Eq. 11 is used to compute the forecasted values.
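The particle encoding described in item (b), where \(n-1\) cut points between \(x_0\) and \(x_n\) define \(n\) half-open intervals, can be sketched as follows. The function names and the numeric bounds are hypothetical; the sketch only shows how a (possibly unsorted) particle is mapped to intervals and how a value is assigned to its interval, not how any cited model evaluates a particle.

```python
from bisect import bisect_left


def particle_to_intervals(x0, xn, cut_points):
    """Turn n-1 cut points into n intervals (x_{i-1}, x_i] over (x0, xn]."""
    cuts = sorted(cut_points)        # restore ascending order after a move
    if cuts and not (x0 < cuts[0] and cuts[-1] < xn):
        raise ValueError("cut points must lie strictly inside (x0, xn)")
    edges = [x0] + cuts + [xn]
    return list(zip(edges[:-1], edges[1:]))


def interval_index(value, intervals):
    """Return the 1-based i such that value lies in I_i = (x_{i-1}, x_i]."""
    uppers = [hi for _, hi in intervals]
    idx = bisect_left(uppers, value)     # first upper bound >= value
    if idx == len(intervals) or value <= intervals[0][0]:
        raise ValueError("value lies outside the universe of discourse")
    return idx + 1


# Hypothetical universe of discourse and a particle that became unsorted.
intervals = particle_to_intervals(13000, 20000, [17000, 14500, 15500])
idx_a = interval_index(15500, intervals)   # right end of I_2 belongs to I_2
idx_b = interval_index(16000, intervals)   # falls in I_3 = (15500, 17000]
```

Sorting restores the ascending order but does not remove duplicate cut points, so a fuller implementation would also have to handle empty intervals.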

Fig. 6 The graphical representation of particle

4.4 Other hybridization approaches

To improve the forecasting accuracy, some researchers also hybridize Hidden-Markov model (HMM), Adaptive expectation (AE), Intuitionistic fuzzy set (IFS) and statistical linear models with the FTS modeling approach. Related works corresponding to hybridization of these techniques with the FTS modeling approach are discussed next.

  • FTS with HMM: Sullivan and Woodall [147] proposed an HMM based model that uses conventional matrix multiplication to minimize the computation time overhead of deriving the FLRs in Song and Chissom’s model [144–146]. Similarly, Hsu et al. [64] applied a fuzzy Markov relationship matrix to perform forecasting. However, their applications are limited to forecasting the price limit and trading volume difference of the Taiwan weighted stock index (TWSI). Researchers in articles [108] and [43] extended the work of [147], and proposed a novel stochastic forecasting model for the FTS modeling approach. This model is based on the HMM, in which the FLRs are formulated as state transitions so that it can handle two-factors forecasting problems.

  • FTS with AE: The AE is a post-defuzzification operation used to enhance the forecasting accuracy. In articles [3, 35, 36, 38, 118], the authors employ the following adaptation equation:

    $$\begin{aligned} Adapted (t+1)=Actual (t-1)+\alpha \times \left( Forecast (t)-Actual (t-1)\right) \end{aligned}$$
    (18)

    Chen et al. [31] proposed a Fibonacci based AE method, which can be represented as:

    $$\begin{aligned} Adapted (t+1)=\,&Actual (t)+\alpha \times \left( Forecast (t+1)-Actual (t)\right) + \nonumber \\ &\beta\times \left( Forecast (t)-Actual (t-1)\right) \end{aligned}$$
    (19)

    Chen et al. [32] extended the AE model to derive a multi-period AE model based on the following Eq. 20:

    $$\begin{aligned} Adapted (t+1)=Actual (t)+ \sum _{i=1}^{k}h_i\times \varepsilon _i \end{aligned}$$
    (20)

    Tsaur and Kuo [154] obtained the forecasting value using the proposed adaptive FTS model as:

    $$\begin{aligned} Adapted (t)=\alpha \times Actual (t)+ (1-\alpha )\times Adapted (t-1) \end{aligned}$$
    (21)

    In Eqs. 18 and 19, \(\alpha\) and \(\beta\) represent adaptation parameters. In Eq. 20, \(\varepsilon _i\) is the forecast error of the ith period, and \(h_i\) is the adaptation parameter for \(\varepsilon _i\).

  • FTS with IFS: A significant feature of IFS [5] is that it assigns to each element a membership degree and a nonmembership degree. Therefore, it can be regarded as a powerful tool to deal with uncertainty and vagueness in real applications. Based on the hybridization of IFS and FTS, Joshi and Kumar [81] propose a novel computational model of forecasting. In the proposed method, the degree of nondeterminacy is used to establish the FLRs. The time series data are fuzzified on the basis of the degree of nondeterminacy in the IFSs.

  • FTS with statistical linear models: Egrioglu et al. [52] introduced a new hybrid model based on FTS, SARIMA and ANN. In the first phase of the proposed method, the best SARIMA model for the crisp time series is determined using the Box-Jenkins method. In the second phase, the parameters and order of the proposed model, which is called the partial high order bivariate FTS forecasting model, are determined from the inputs identified by the SARIMA model. Then, FLRs are established using ANN.

    Wong et al. [164] compared two families of forecasting methods, viz., traditional time series methods (ARIMA model and VARMA model) and FTS methods (two-factor model, Heuristic model, and HMM). Their comparison studies show that the ARIMA model yields comparatively smaller forecasting errors for data sets covering a longer period. However, for data covering a short period, the forecasting accuracy of the FTS model is higher. In the comparison of forecasting accuracy between one-variable and two-variable models, the HMM with one variable forecasts better than the two-variable model.

    Wang [158] presented comparison studies of two forecasting methods, viz., the ARIMA time series method and the FTS method. Based on the FTS method, three models, referred to as the Factor model, the Heuristic model, and the HMM, are designed. The comparison studies show that the ARIMA model has the forecasting advantage, with small prediction errors, when the test period is lengthy. On the other hand, if the test period is relatively short, the FTS model proves more effective than the ARIMA model. The overall analysis shows that the Heuristic model has the lowest prediction error, followed by the HMM.
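As a small illustration of the AE item above, the adaptation rules of Eqs. 18 and 21 can be written as plain functions. The function names, the sample values and the choice of the initial adapted value in the recursive form are assumptions; the time indices follow the equations exactly as printed.

```python
def adapt_eq18(actual_prev, forecast, alpha):
    """Eq. 18: Adapted(t+1) = Actual(t-1) + alpha * (Forecast(t) - Actual(t-1))."""
    return actual_prev + alpha * (forecast - actual_prev)


def adapt_eq21(actuals, alpha, initial):
    """Eq. 21 applied step by step:
    Adapted(t) = alpha * Actual(t) + (1 - alpha) * Adapted(t-1).

    `initial` plays the role of Adapted(0) and is a free choice here.
    """
    adapted = [initial]
    for a in actuals:
        adapted.append(alpha * a + (1.0 - alpha) * adapted[-1])
    return adapted[1:]


# Hypothetical values: alpha = 0.5 moves the result halfway towards the forecast.
halfway = adapt_eq18(actual_prev=100.0, forecast=110.0, alpha=0.5)   # 105.0
smoothed = adapt_eq21([10.0, 20.0], alpha=0.5, initial=10.0)         # [10.0, 15.0]
```

With \(\alpha = 0\) Eq. 18 returns the actual value unchanged, and with \(\alpha = 1\) it returns the raw forecast, which shows how \(\alpha\) trades off the two sources.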

5 Financial forecasting and type-2 FTS models

The application of FTS in financial forecasting [102] has attracted the attention of many researchers in recent years. Many researchers focus on designing models for TAIEX [42, 67, 161, 173] and TAIFEX [4, 6, 8, 92] forecasting. Their applications are limited to dealing with either one-factor or two-factors time series data sets. However, the forecasting accuracy for a financial data set can be improved by including more observations (e.g., close, high, and low) in the models. In the Type-2 FTS modeling approach, the observation that is handled by the Type-1 FTS model is termed the “main-factor/Type-1 observation”, whereas the observations that are handled by the Type-2 FTS model are termed “secondary-factors/Type-2 observations”. Later, both kinds of observations are combined to take the final decision. However, due to the involvement of Type-2 observations alongside the Type-1 observation, massive FLRGs are generated in the Type-2 model. For this reason, the Type-2 FTS model suffers from the burden of extra computation, and most researchers still prefer the Type-1 FTS modeling approach for forecasting. As far as forecasting accuracy is concerned, however, Type-2 FTS models produce better results than Type-1 FTS models. The basic steps involved in the Type-2 FTS modeling approach, which can deal with multiple observations together, are presented in Algorithm 1.

Algorithm 1 Basic steps of the Type-2 FTS modeling approach

Contributions of various researchers in Type-2 FTS models are presented below:

  1. (a)

    Huarng and Yu [71] model: This model is the first to employ the Type-2 FTS concept in financial forecasting (TAIEX) by considering close, high, and low observations together. In this model, they suggested the following improvements to Algorithm 1: (1) introduction of union (\(\vee\)) and intersection (\(\wedge\)) operators, which are applied in Step 8 of Algorithm 1 and are both used to include Type-1 and Type-2 observations, and (2) for the defuzzification operation, they employ Principal 1 and Principal 2 (as discussed in Sect. 3) in Step 9 of Algorithm 1.

  2. (b)

    Bajestani and Zare [9] model: This model is an enhancement of the model proposed by Huarng and Yu [71]. In this model, the researchers introduce four changes: (1) using triangular fuzzy sets with indeterminate legs and optimizing these triangular fuzzy sets (applied in Step 3 of Algorithm 1), (2) using an indeterminate coefficient in calculating the Type-2 forecast (applied in Step 9 of Algorithm 1), (3) using the center of gravity defuzzifier (applied in Step 9 of Algorithm 1), and (4) using a 4-order Type-2 FTS (applied in Step 5 of Algorithm 1).

  3. (c)

    Lertworaprachaya et al. [101] model: Based on articles [71, 137], a novel high-order Type-2 FTS model is proposed in [101]. This model is divided into two parts: high-order Type-1 FTS forecasting and Type-2 FTS forecasting. The high-order Type-1 FTS model is employed to define the FLRs. This improvement is suggested in Step 5 of Algorithm 1. The high-order FLRs can be defined based on Definition 6. Then the rule in the high-order Type-1 FTS is used in Type-2 FTS forecasting.

  4. (d)

    Singh and Borah [1] model: This Type-2 FTS model can utilize multiple observations together in forecasting, which was a limitation of the previous Type-2 FTS models. This model suggested the following changes to Algorithm 1: (1) utilize the PSO in Step 2 of Algorithm 1 to adjust the lengths of the intervals in the universe of discourse that are employed in forecasting, without increasing the number of intervals, and (2) introduce two new operators \(\cup\) and \(\cap\), and apply them to the FLRGs of Type-1 and Type-2 observations to obtain the fuzzified forecasting data. The latter improvement is suggested in Step 8 of Algorithm 1. Owing to these two improvements, the accuracy rate of this model is better than that of various existing FTS models [18, 19, 71, 135, 171].

6 Performance measure parameters

To assess the performance of the time series forecasting models (especially FTS models), researchers use numerous performance measure parameters, such as \(AFER\), \(MAPE\), \(MSE\), \(RMSE\), \(\bar{A}\), \(SD\), \(U\), \(TS\), \(DA\), \(\delta _r\), \(R\), \(R^2\), \(PP\), etc. All these parameters and their statistical significance are presented in Table 1. In this table, each \(F_i\) and \(A_i\) is the forecasted and actual value of day/year \(i\), respectively, and \(N\) is the total number of days/years to be forecasted.

Table 1 Performance measure parameters and their statistical significance
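Three of the listed measures can be computed as in the sketch below, using \(F_i\), \(A_i\) and \(N\) as defined above. The formulas are the forms commonly used in the FTS literature and are stated here as an assumption: \(AFER = \frac{100}{N}\sum_{i=1}^{N} |F_i - A_i| / A_i\), \(MSE = \frac{1}{N}\sum_{i=1}^{N} (F_i - A_i)^2\) and \(RMSE = \sqrt{MSE}\). They should be checked against the definitions in Table 1 before use.

```python
import math


def mse(forecast, actual):
    """Mean square error between forecasted and actual values."""
    return sum((f - a) ** 2 for f, a in zip(forecast, actual)) / len(actual)


def rmse(forecast, actual):
    """Root mean square error."""
    return math.sqrt(mse(forecast, actual))


def afer(forecast, actual):
    """Average forecasting error rate, in percent (assumes nonzero actuals)."""
    return 100.0 * sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)


# Hypothetical forecasted and actual values for N = 2 periods.
forecast = [110.0, 190.0]
actual = [100.0, 200.0]
scores = {"MSE": mse(forecast, actual),
          "RMSE": rmse(forecast, actual),
          "AFER": afer(forecast, actual)}
# MSE is 100.0 and RMSE is 10.0; AFER is approximately 7.5 (percent).
```

Smaller values of all three measures indicate a closer match between the forecasted and the actual series.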

7 Classification of FTS models based on input variables

Based on the number of input variables, FTS models can be classified as either one-factor or M-factors models. Each one-factor model employs only one input variable, each two-factors model employs only two input variables, and so on. However, there are some models that employ M factors. In Tables 2 and 3, we summarize the details of the input variable choices (in the second column) for several FTS forecasting models. In these tables, we also present the list of articles (in the first column) in which researchers compare their forecasting accuracies with various FTS models (listed in the third column).

Table 2 Categorization of one-factor FTS models based on input data sets

[Discussion] In Table 2, a total of 103 articles are cited. Of the cited articles, \(44\,\%\) [see Fig. 7 (left)] use the university enrollment data set in one-factor FTS models. About \(22\,\%\) of the cited articles [see Fig. 7 (left)] use stock index prices as input data, i.e., the daily closing price and its dependent variables. The most used stock index prices are TAIFEX and TAIEX. Some researchers use TAIEX in their one-factor FTS models [72, 150, 151], whereas others use TAIFEX [4, 92]. About \(34\,\%\) of the reviewed articles [see Fig. 7 (left)] concentrate on using different time series data as inputs. For example, in article [52], seasonal sulfur dioxide time series data are used; in articles [158, 164], TE data are used as inputs.

Table 3 Categorization of M-factors FTS models based on input data sets

In Table 3, a total of 48 articles are cited. In 2000, the first two-factors FTS model was proposed in article [27]. In this model, the main-factor is DTDST, whereas the second-factor is DCDDST. Later, many researchers provided several solutions to enhance the predictability of the proposed model [27], and these contribute \(31\,\%\) of the cited articles [see Fig. 7 (right)]. In 2001, Huarng [70] was the first to use both TAIEX and TAIFEX data sets in a two-factors model. In this model, Huarng forecasts TAIFEX by employing TAIFEX as the main-factor and TAIEX as the second-factor. Later works on TAIFEX forecasting that consider TAIEX as the second-factor contribute \(17\,\%\) of the cited articles [see Fig. 7 (right)]. In 2009, Chen and Chen [16] were the first to design an M-factors FTS model for TAIEX forecasting. In this model, the remaining factors are either the Dow Jones, the NASDAQ, the \(M_{1b}\), or their combination. Based on similar data sets, researchers designed several models, which contribute \(10\,\%\) of the cited articles [see Fig. 7 (right)].

Fig. 7 Time distributions of the referred articles based on one-factor (left) and M-factors (right) data sets

8 Existing unsolved problems and research trends

The FTS modeling approach is an open-ended and stimulating research domain whose challenges and problems have continually increased over the last decade. In this section, we present various research problems and trends associated with the FTS modeling approach. These discussions are based on the recent research articles published by Singh and Borah [1, 133–135].

Problem Definition 1 (lengths of intervals). For fuzzification of a time series data set, determining the lengths of the intervals of the historical time series data set is very important. In most of the FTS models [18, 69, 74, 144, 146], the lengths of the intervals were kept the same, and no specific reason is given for using fixed interval lengths.

Problem Definition 2 (ignorance of repeated FLRs). After generating the intervals, the historical time series data sets are fuzzified based on the FTS theory. The fuzzified time series values are then used to create the FLRs. Still, most of the existing FTS models ignore repeated FLRs. To explain this, consider the following four FLRs at four different time functions, \(F(t=1,2,3,4)\):

$$\begin{aligned} F(t&=4)\quad\tilde{A}_{i} \rightarrow \tilde{A}_{i}, \nonumber \\ F(t&=3)\quad\tilde{A}_{k} \rightarrow \tilde{A}_{j}, \nonumber \\ F(t&=2)\quad\tilde{A}_{i} \rightarrow \tilde{A}_{i}, \nonumber \\ F(t&=1)\quad\tilde{A}_{i} \rightarrow \tilde{A}_{j}. \end{aligned}$$
(22)

In Eq. 22, three FLRs at functions \(F(t=4)\), \(F(t=2)\) and \(F(t=1)\) have the same fuzzy set, (\(\tilde{A}_{i}\)), in the previous state. Hence, these FLRs can be represented in the following FLRG as:

$$\begin{aligned} \tilde{A}_{i} \rightarrow \tilde{A}_{i}, \tilde{A}_{j}. \end{aligned}$$
(23)

Existing FTS models do not consider identical FLRs during forecasting: they simply use the FLRG shown in Eq. 23, discarding the repeated FLRs in the FLRG (here, \(\tilde{A}_{i} \rightarrow \tilde{A}_{i}\), which occurs at both \(F(t=2)\) and \(F(t=4)\)).
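The effect of discarding repeated FLRs can be made concrete with the four FLRs of Eq. 22. In the sketch below, the function name `build_flrgs` and the string labels are hypothetical; the flag `keep_repeats` switches between the conventional grouping of Eq. 23 and a grouping that retains every occurrence.

```python
from collections import defaultdict


def build_flrgs(flrs, keep_repeats=False):
    """Group FLRs (previous state, current state) by their previous state."""
    groups = defaultdict(list)
    for previous, current in flrs:
        if keep_repeats or current not in groups[previous]:
            groups[previous].append(current)
    return dict(groups)


# The FLRs of Eq. 22 in chronological order (t = 1, 2, 3, 4).
flrs = [("A_i", "A_j"), ("A_i", "A_i"), ("A_k", "A_j"), ("A_i", "A_i")]
without_repeats = build_flrgs(flrs)
with_repeats = build_flrgs(flrs, keep_repeats=True)
# without_repeats["A_i"] holds A_j and A_i once each, as in Eq. 23;
# with_repeats["A_i"] additionally keeps the second occurrence of A_i.
```

A model that works on `with_repeats` can use the number of occurrences of each current state as frequency information, which the conventional grouping throws away.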

Problem Definition 3 (equal importance to FLRs). In existing FTS models, each FLR is given equal importance, which is not an effective way to solve real time problems, because each fuzzy set in the FLR represents a different uncertainty involved in the domain. According to Yu [171], there are two possible ways to assign weights, i.e., (1) assign weights based on human interpretation, and (2) assign weights based on their chronological order. Assignment of weights based on human knowledge is not an acceptable solution for real world problems, as human interpretation varies from one person to another. Moreover, human interpretation is still an issue that is not well understood by computational scientists [134]. Therefore, Yu [171] considered the second way, where all the FLRs are given importance based on their chronological order. In this scheme, the weight of each FLR is determined by its position in the sequence of occurrence.
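One simple way to realize chronological weighting is sketched below. It assumes linearly increasing weights \(w_i = i / \sum_{k=1}^{K} k\) for the \(i\)th of \(K\) FLRs in order of occurrence, and it assumes that the forecast is the weighted average of the midpoints of the intervals on the right-hand sides; both choices, the function names and the sample midpoints are illustrative and should be checked against [171].

```python
def chronological_weights(k):
    """Weights 1, 2, ..., k normalized to sum to one (newest FLR weighs most)."""
    total = k * (k + 1) // 2
    return [i / total for i in range(1, k + 1)]


def weighted_forecast(midpoints):
    """Weighted average of interval midpoints listed in chronological order."""
    weights = chronological_weights(len(midpoints))
    return sum(w * m for w, m in zip(weights, midpoints))


# Hypothetical midpoints of the intervals of three FLRs, oldest first.
weights = chronological_weights(3)            # 1/6, 2/6, 3/6
forecast = weighted_forecast([100.0, 200.0, 300.0])
```

Compared with the unweighted mean of 200, the weighted result is pulled towards the most recent midpoint, which is the intended effect of giving recent FLRs more importance.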

Problem Definition 4 (utilization of first-order FLRs). Most of the previous FTS models [18, 34, 74, 144–146] use first-order FLRs (see Eq. 4) to get the forecasting results. The first-order FLRs based models use only the previous one day/year fuzzified value for forecasting. Hence, the models that employ first-order FLRs are unable to capture more of the uncertainty residing in the events.

Problem Definition 5 (utilization of current state’s fuzzy sets). Previous FTS models [144–146] utilize the current state’s fuzzified values (i.e., the right-hand side of the FLR, see Eq. 4) for forecasting. This approach, no doubt, improves the forecasting accuracy, but it degrades the predictive skill of the FTS models, because the predicted values lie within the sample.

Problem Definition 6 (defuzzification operation). In \(1996\), Chen [18] used simplified arithmetic operations for the defuzzification operation, avoiding the complicated max-min operations (see Eq. 11), and this method produced better results than the Song and Chissom models [144–146]. Most of the existing FTS models (e.g., refer to articles [50, 70, 73, 105]) have used Chen’s defuzzification method [18] to acquire the forecasting results. However, the forecasting accuracy of these models is not good enough.

9 Future directions

In this article, we reviewed various models based on the FTS modeling approach. However, this study deserves further work; therefore, this section is dedicated to a few significant future works closely related to our study.

  • In the observation of certain events, recorded time series values depend not only on previous values but also on current values. Therefore, representing FLRs in high-order form is a worthwhile idea in the FTS modeling approach [34]. However, defining high-order FLRs is more complicated and computationally more expensive than defining first-order FLRs [3]. Therefore, many researchers employ ANN-based methods to define high-order FLRs [2, 52, 53, 55]. However, no method has yet been suggested for finding the optimal order of the high-order FLRs. There is therefore a need to put more emphasis on developing new methods that can automatically determine the optimal order of the high-order FLRs for forecasting problems.

  • The multivariate FTS models are based on the prior assumption that one factor always depends on other factors. Therefore, in order to fuzzify all these factors together, it is essential to extract the hidden information from the data and then determine the membership values of each datum. To tackle this problem, many researchers use the FCM technique [4, 21, 39, 109–111], while others [7, 10, 29, 30, 50, 73, 135] introduce unsupervised clustering techniques that determine the membership values efficiently. In spite of all these developments, there is a need for future research on developing more robust data clustering algorithms for multivariate FTS models.

  • The FTS model introduced in [144] tries to predict future values by capturing past uncertainties. For example, how large a difference between past and current values is considered very low, low, medium, high or very high is determined based on human perception. The early FTS models can only predict future values; they do not consider the change in trend associated with the time series values in terms of upward, downward or unchanged. Thirteen years later, some researchers [17, 30, 35, 38, 131] considered these trends and proposed trend-based FTS models. However, such trend-based models are very few in number, so this approach still needs attention. More robust trend-based models can therefore be expected from researchers in the future.

  • The study indicates that hybridized models are more robust than conventional FTS models. However, difficulties arise in determining the phase in which such techniques should be applied. Therefore, there is a need to develop a model selection technique that can make effective use of both input variables and knowledge, and fulfill the forecasting objectives.

  • Most of the existing FTS models have used Chen’s defuzzification method [18] to obtain the forecasting results. However, the forecasting accuracy of these models is not good enough. Researchers have also introduced various other defuzzification techniques. In spite of these contributions, there is scope for proposing new defuzzification techniques. For example, one can employ entropy [148, 163] for defuzzification. For this purpose, we first need to obtain the entropy of each interval based on the frequencies of the intervals, and then apply the following steps for the defuzzification operation:

    • If the forecasting day is \(Y(t)\), then obtain the fuzzified value for day \(Y(t-1)\) as \(\tilde{A}_i (i=1,2,3,\ldots ,n)\).

    • Obtain the FLRG whose previous state is \(\tilde{A}_i (i=1,2,3,\ldots ,n)\) and whose current state is \(\tilde{A}_k, \tilde{A}_s, \ldots , \tilde{A}_n\), i.e., the FLRG is in the form of \(\tilde{A}_i \rightarrow \tilde{A}_k, \tilde{A}_s, \ldots , \tilde{A}_n\).

    • Find the intervals where the maximum membership value of the fuzzy sets \(\tilde{A}_k, \tilde{A}_s, \ldots , \tilde{A}_n\) occurs, and let these intervals be \(a_k, a_s, \ldots , a_n\), respectively. All these intervals have the corresponding mid-points \(M_k, M_s, \ldots , M_n\), and the corresponding entropies \(H_k, H_s, \ldots , H_n\), respectively.

    • Apply the following formula to calculate the forecasted value for day \(Y(t)\):

      $$\begin{aligned} Forecast\left( t \right) = M_k &\times \left[ \frac{H_k}{H_k+H_s+\ldots +H_n}\right] + \nonumber \\ M_s &\times \left[ \frac{H_s}{H_k+H_s+\ldots +H_n}\right] +\nonumber \\&\qquad\qquad\qquad\ldots + \nonumber \\ M_n &\times \left[ \frac{H_n}{H_k+H_s+\ldots +H_n}\right] . \end{aligned}$$
      (24)
  • In further research work, one can present a model based on the hybridization of FTS and Grey system theory [82] to predict the time series values. The predicted results can be analyzed based on the various statistical parameters discussed in Sect. 6. The performance of this model can also be compared with various statistical models (http://www.spss.com.hk/statistics/).
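
The entropy-based defuzzification steps listed above (Eq. 24) can be sketched as follows. The text does not fix how the per-interval entropy is computed from the interval frequencies, so this sketch assumes \(H_j = -p_j \log_2 p_j\), where \(p_j\) is the relative frequency of observations in interval \(a_j\). The function names, the fallback for a zero total entropy, and the sample data are all hypothetical.

```python
import math
from typing import Dict, List


def interval_entropies(counts: Dict[str, int]) -> Dict[str, float]:
    """Assumed per-interval entropy H_j = -p_j * log2(p_j),
    with p_j the relative frequency of observations in interval j."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("at least one observation is required")
    entropies = {}
    for label, count in counts.items():
        p = count / total
        entropies[label] = -p * math.log2(p) if p > 0 else 0.0
    return entropies


def entropy_forecast(
    rhs_labels: List[str],
    midpoints: Dict[str, float],
    entropies: Dict[str, float],
) -> float:
    """Eq. 24: entropy-weighted mean of the midpoints M_k, M_s, ..., M_n."""
    total_entropy = sum(entropies[label] for label in rhs_labels)
    if total_entropy == 0:
        # Assumption: fall back to the plain mean when all entropies vanish.
        return sum(midpoints[label] for label in rhs_labels) / len(rhs_labels)
    weighted = sum(midpoints[label] * entropies[label] for label in rhs_labels)
    return weighted / total_entropy


# Hypothetical interval frequencies, midpoints, and FLRG A_i -> A1, A2.
counts = {"A1": 2, "A2": 4, "A3": 2}
midpoints = {"A1": 10.0, "A2": 20.0, "A3": 30.0}
entropies = interval_entropies(counts)
forecast = entropy_forecast(["A1", "A2"], midpoints, entropies)
```

Under this assumed entropy, an interval that contains every observation has zero entropy and hence zero weight, which is why the sketch includes a fallback to the unweighted mean.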

10 Conclusion and discussion

From 1994 onwards, numerous time series forecasting models have been proposed based on the FTS modeling approach. Due to the uncertain nature of time series, the scope for extensive applications in this domain has grown alongside the development of new algorithms and architectures. The FTS modeling approach is currently applied to a diverse range of fields, from the economy, population growth, weather forecasting and stock index price forecasting to pollution forecasting. Various complexities arise in this research domain if the number of factors in the time series data sets is large. These complexities arise in (a) the determination of the length of intervals, (b) the establishment of FLRs between different factors, and (c) the defuzzification of fuzzified time series values.

Present research in the FTS modeling approach mainly aims at designing algorithms for the discretization of time series data sets, generating rules from the fuzzified time series values, proposing techniques for the defuzzification operation, and designing various hybridized architectures for resolving complex decision-making problems.

SC techniques, comprising ANN, RS, EC, and their hybridizations, have recently been employed to solve FTS modeling problems. They endeavor to provide approximate results in a very cost-effective manner, thereby reducing the time complexity. In this survey, a categorization has been presented based on the utilization of different SC techniques with the FTS modeling approach, along with the basic architectures of different hybridized FTS models.

Fuzzy sets are the oldest component of SC; they are known for representing real-time or uncertain events in a linguistic manner, and can support very fast decision making. ANNs are especially used in discovering rules, and can establish a linear association between the inputs and outputs. RSs are mainly employed for extracting hidden patterns from the data in terms of rules. EC provides efficient search algorithms to select the best intervals from the discretized time series data set, based on some evaluation criterion.

FTS-ANN hybridization exploits the features of both ANN and fuzzy sets in the establishment of FLRs/linguistic rules, data discretization, and defuzzification of the fuzzified time series data set. FTS-RS hybridization uses the features of both RS and fuzzy sets in discovering meaningful rules from the fuzzified time series data set, and then employs these rules in the defuzzification operation. FTS-EC hybridization utilizes the characteristics of both EC and fuzzy sets in the determination of optimal interval lengths of the discretized time series data set, which are further used to represent the time series data set in terms of fuzzy sets/linguistic terms. From this survey, it is evident that the research scope of FTS will increase in the near future, owing to its flexibility in representing real-life problems in a very natural way. This study also describes the different phases of the FTS modeling approach in detail. Various research issues and challenges in the FTS modeling approach are presented in the subsequent section. All these inclusions may help researchers to identify: (a) what the problems in the FTS modeling approach are, (b) how to resolve these problems using heuristic approaches, and (c) how to employ different SC methodologies in the FTS modeling approach to improve its efficiency.