1 Introduction

In many real-world applications in science and technology, such as stock price forecasting, university enrollments and weather forecasting, prediction often plays a pre-eminent part: it saves precious time and allows imperative measures to be taken before adverse outcomes occur. Many widely used crisp and fuzzy set theory-based methods for prediction are available in the literature (Aladag et al. 2008; Aliev et al. 2008; Bulut 2014; Bulut et al. 2012; Chen et al. 2013; Chen and Tanuwijaya 2011; Chen 1996; Chatterjee and Roy 2014a, b; Duru 2010, 2012; Duru and Bulut 2014; Dunn 1973; Huarng 2001a, b; Huarng and Yu 2005, 2006; Mamdani 1977; Ross 2010; Song and Chissom 1993a, b, 1994; Tanaka 1996; Tseng et al. 2001; Tsaur 2008; Khashei 2012; Zadeh 1975); of these, the fuzzy set theory-based methods can handle uncertainty efficiently. Among the several fuzzy set theory-based prediction techniques, models exploiting the potential of fuzzy time series are the main subject of the present paper, as they have wide application in the aforementioned areas.

Fuzzy time series, based on the concept of fuzzy reasoning (Rabiei et al. 2014) proposed by Mamdani (1977), was first introduced by Song and Chissom (1993a, b, 1994), following which many variations of fuzzy time series and their applications were discussed and published by several researchers (Aladag et al. 2008; Chen et al. 2013; Chen and Tanuwijaya 2011; Chatterjee and Roy 2014a, b; Chen 1996; Duru and Bulut 2014; Dunn 1973; Huarng 2001a, b; Huarng and Yu 2005, 2006; Mamdani 1977; Ross 2010; Song and Chissom 1993a, b, 1994; Tanaka 1996; Tseng et al. 2001). In their extensive study, Song and Chissom (1993a, b, 1994) developed both time-invariant and time-variant time series models and explained them with real-life examples; these were improved by Chen (1996) and Chen and Tanuwijaya (2011) into concrete prediction algorithms. Another significant improvement was made by Aladag et al. (2008), who used a feed-forward neural network to define fuzzy relations in higher-order fuzzy time series. Several other modifications of this work are found in the literature. In a recent study, Chen and Tanuwijaya (2011) adopted steps to make the interval size variable; however, no provision was made to generate membership values other than 0, 0.5, and 1, and the influences of the other factors are still not considered in the defuzzification process.

Most of the existing, extensively used fuzzy forecasting methods based on fuzzy time series employ a static (i.e., uniform) interval length. However, Huarng (2001a, b) pointed out that the length of the intervals greatly affects the forecasting results: with static intervals the historical data are put into the intervals only roughly, even when the variance of the historical data is not high. Most importantly, the predictive accuracy of the existing fuzzy time series-based forecasting techniques is usually unsatisfactory. Additionally, the defuzzification of the main factor does not consider the effects of the remaining secondary factors, which is a major drawback of the aforementioned widely used approaches. Furthermore, the membership distribution techniques employed by these approaches suffer from an unrealistic restriction: the membership values can only be 0, 0.5, or 1. This is unrealistic because, in many real-life applications, the membership of an element within a cluster (or interval) lies in (0, 0.5) or (0.5, 1); in such cases, forcing the membership to one of 0, 0.5, or 1 can hamper the predictive accuracy.

Moreover, the existing fuzzy forecasting algorithms each rely on their own clustering technique, which may not partition all data sets correctly, since the nature of a data set changes with the circumstances: a data set may contain categorical entries (true-false type), numeric values (real, integer), or a mixture of both (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979), and its statistical properties, viz., coefficient of variation, correlation, etc., may also vary (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979). As a result, no single clustering algorithm can correctly partition (i.e., produce partitions containing similar types of data for) all data sets. Consequently, the accuracy of the corresponding forecasting algorithm decreases significantly and becomes data-dependent (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979), since the clustering algorithm used may suit only certain types of data sets. One possible solution is to search for the clustering algorithm most suitable for the data set under consideration. Apart from this, none of the existing, frequently used fuzzy time series-based forecasting algorithms checks whether the data set is stationary, which is another possible reason for their unsatisfactory predictive accuracy.

Hence, in the present paper, the author first checks whether the data set is stationary. If it is found to be stationary, the proposed forecasting algorithm proceeds directly; otherwise, the non-stationarity, trend components, etc., of the data set are removed first. Next, the author selects a suitable clustering algorithm by checking the DVI index (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979) of the generated clusters, which is frequently used to evaluate their quality.

From the foregoing survey of the literature, it is clear that the existing fuzzy time series-based forecasting techniques have the aforementioned limitations, which call for certain improvements in the modeling technique to obtain better predictive accuracy. Motivated by this comparative study, the present author improved the multivariate fuzzy 'if-then' rule-based model, incorporating both clustering (overlapping and non-overlapping) and prediction, by making the interval size variable (through the selected suitable clustering algorithm, described in Sect. 3), by letting the membership value vary over the whole range from 0 to 1, in contrast to other researchers who have used only the three values 0, 0.5, and 1, and by employing the influences of the different factors during defuzzification and prediction, thereby achieving better accuracy in predicting different instances of the data set. The membership distribution technique adopted by the proposed fuzzy prediction algorithm is based on well-defined functions that can generate any real number between 0 and 1, making it more realistic than the others. Moreover, the defuzzification process of the proposed algorithm exploits the influences of the different factors on the factor to be predicted. Again, the selected suitable clustering algorithm (which depends on the nature of the data set) partitions the data set well and, as a result, the forecasting accuracy increases (Huarng 2001a). Consequently, this improved model is capable of overcoming the aforesaid drawbacks.

Finally, three different examples related to three different areas of science and technology are cited to demonstrate the potential and applicability of the proposed algorithm over a vast domain. For the first example, failure data collected from the logs (access and error logs) of www.ismdhanbad.ac.in, the official website of ISM Dhanbad, India, were used, and the findings were satisfactorily validated. In contrast, the next example concerns the popular oil agglomeration process for the beneficiation of coal fines (a coal washing technique) heavily used in the coal industries, where the proposed algorithm proves its better predictive accuracy; the corresponding data were collected from CIMFER (a CSIR lab, Govt. of India), Dhanbad, India. It can therefore be concluded that the proposed algorithm has a vast area of implementation. The remaining example concerns financial data collected from the Ministry of Statistics and Program Implementation, Govt. of India, and the findings are satisfactorily validated again. Next, the outcomes of the proposed algorithm were compared with various fuzzy and statistical techniques. Moreover, the method of Chen and Tanuwijaya (2011) was applied to the aforementioned examples with its clustering technique replaced by the \(c\)-means (Bezdek et al. 1984) and \(k\)-means (Hartigan and Wong 1979) algorithms, respectively, to demonstrate the influence of clustering on the predictive accuracy of fuzzy time series-based forecasting models. Additionally, the proposed algorithm was validated by the chi-square test of goodness of fit.

Before proceeding to develop the fuzzy logic-based clustering and prediction algorithm, it is apt, for clarity, to describe the organizational structure of the paper. Section 2 reviews fuzzy time series; Sect. 3 develops the proposed algorithm; Sect. 4 discusses the test results. Towards the end, the important findings and conclusions of the present work are encapsulated in Sect. 5.

2 Review of fuzzy time series

This section presents a brief review of fuzzy time series.

Fuzzy time series, based on the concept of fuzzy reasoning proposed by Zadeh (1975) and Mamdani (1977), was first introduced by Song and Chissom (1993a, b, 1994), following which many variations of fuzzy time series and their applications were discussed and published by several researchers. A fuzzy time series is defined as follows:

Definition 1

(Fuzzy time series) Assume \(Y(t),(t=1,2\ldots )\), a subset of \({\mathbb {R}}^1\) (one-dimensional Euclidean space), is the universe of discourse on which fuzzy subsets \(m_i (t), ({i=1,2\ldots })\) are defined, and let \(F(t)\) be a collection of \(m_i(t),({i=1,2\ldots })\); then \(F(t)\) is called a fuzzy time series defined on \(Y(t),(t=1,2\ldots )\). Here, \(F(t)\) is regarded as a linguistic variable and the \(m_i (t),({i=1,2\ldots })\) can be viewed as possible linguistic values of \(F(t)\), where each \(m_i (t)\) is represented by a fuzzy set.

From this, it can be observed that \(F(t)\) is a function of time \(t\), i.e., the value of \(F(t)\) can differ at different times. According to Mamdani (1977), Chen and Tanuwijaya (2011) and Chen et al. (2013), if there exists a fuzzy relationship \(R(t,t-1)\) such that \(F(t)=F(t-1)\circ R(t,t-1)\), where '\(\circ \)' is the fuzzy Max-Min composition operator, then \(F(t)\) is said to be caused by \(F(t-1)\). This relationship is denoted by \(F(t-1)\rightarrow F(t)\). For example, for \(t=2013\), the fuzzy relationship between \(F({t-1})\) and \(F(t)\) is written \(F(2012) \rightarrow F(2013)\). Note that the right-hand side of the fuzzy relation represents the future fuzzy set (the forecast), whose crisp counterpart is denoted \(Y(t)\).

It is significant to note that the main difference between fuzzy and conventional time series lies in the fact that the values of the former are fuzzy sets, while the values of the latter are real numbers. Roughly speaking, a fuzzy set can be regarded as a class with fuzzy boundaries.

Definition 2

(\(n\)th-order fuzzy relations) Let \(F(t)\) be a fuzzy time series. If \(F(t)\) is caused by \(F({t-1}),F({t-2}),\ldots ,F({t-n})\), i.e., the next state is caused by the \(n\) preceding states, then this fuzzy logical relationship (FLR) is represented by:

$$\begin{aligned} F({t-n}),\ldots F({t-2}),F({t-1})\rightarrow F(t) \end{aligned}$$

and \(F(t)\) is called an \(n\)th-order fuzzy time series. \(n\)th-order fuzzy time series models are referred to as higher-order models.

If for any time \(t\), \(F(t)=F({t-1})\) and \(F(t)\) has only finitely many elements, then \(F(t)\) is called a time-invariant fuzzy time series; otherwise, it is called a time-variant fuzzy time series.

Different relevant examples of fuzzy time series were cited by Song and Chissom (1993a, b, 1994) as also by Chen and Tanuwijaya (2011), Chen et al. (2013).

3 The proposed algorithm

In this section, the proposed multivariate fuzzy forecasting algorithm is developed. Before attempting to develop the algorithm, it is appropriate to briefly touch upon the existing fuzzy forecasting techniques, against which the developed predictive algorithm will later be compared for accuracy.

Most of the existing fuzzy forecasting techniques, in general, employ the following four steps (Aladag et al. 2008; Bulut 2014; Bulut et al. 2012; Chatterjee and Roy 2014a, b; Chen et al. 2013; Chen and Tanuwijaya 2011; Chen 1996; Duru 2012, 2010; Dunn 1973; Huarng 2001a, b; Huarng and Yu 2005, 2006; Mamdani 1977; Ross 2010; Song and Chissom 1993a, b, 1994; Tanaka 1996; Tseng et al. 2001; Zadeh 1975):

  • Step 1: Partitioning the universe of discourse into intervals,

  • Step 2: Fuzzifying the historical data,

  • Step 3: Building fuzzy logical relationship and obtaining fuzzy logical relationship groups, and

  • Step 4: Calculating the forecast output.
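For concreteness, the following is a minimal Python sketch of this generic four-step scheme. It is an illustration only, assuming equal-width intervals and first-order relationships; the proposed algorithm below replaces each step with more refined machinery.

```python
import numpy as np

def fuzzy_ts_forecast(history, n_intervals=7):
    """Toy first-order fuzzy time series forecast.

    Step 1: partition the universe of discourse into equal-width intervals.
    Step 2: fuzzify each observation to the interval containing it.
    Step 3: build first-order fuzzy logical relationship groups A_i -> {A_j}.
    Step 4: defuzzify by averaging the midpoints of the consequent intervals.
    """
    history = np.asarray(history, dtype=float)
    edges = np.linspace(history.min(), history.max(), n_intervals + 1)
    mids = (edges[:-1] + edges[1:]) / 2
    labels = np.clip(np.searchsorted(edges, history, side="right") - 1,
                     0, n_intervals - 1)                      # Step 2
    groups = {}                                               # Step 3
    for a, b in zip(labels[:-1], labels[1:]):
        groups.setdefault(a, set()).add(b)
    consequents = groups.get(labels[-1], {labels[-1]})        # Step 4
    return float(np.mean([mids[j] for j in consequents]))

print(fuzzy_ts_forecast([13055, 13563, 13867, 14696, 15460, 15311, 15603]))
```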

However, none of the aforementioned fuzzy time series-based forecasting algorithms checks the stationarity of the data set and, as a consequence, their predictive accuracy suffers. With this in mind, the author introduces a stationarity-checking step (Step 1) and, using the above four steps as a basis, develops a novel fuzzy clustering and prediction algorithm for forecasting different instances of the data set. The flow chart of the proposed algorithm is depicted in Fig. 1. The new algorithm has five steps, viz., Step 1: stationarity checking; Step 2: clustering; Step 3: computation of different parameters of the proposed algorithm; Step 4: distribution of membership; Step 5: multivariate fuzzy forecasting algorithm. The development of the proposed algorithm proceeds as follows:

Fig. 1
figure 1

Flow chart of the proposed multivariate fuzzy clustering algorithm

Step 1: stationarity checking

Data points are often non-stationary, i.e., their means, variances and covariances change over time (Lutkepohl 2005). Non-stationary behaviors can be trends, cycles, random walks or combinations of the three. Non-stationary data, as a rule, are unpredictable and cannot be modeled or forecasted (Lutkepohl 2005): results obtained from non-stationary time series may be spurious, indicating a relationship between two variables where none exists. To obtain consistent, reliable results, non-stationary data need to be transformed into stationary data (Lutkepohl 2005). In contrast to a non-stationary process, whose variance is time-dependent and whose mean does not remain near, or return to, a long-run value over time, a stationary process reverts around a constant long-term mean and has a constant variance independent of time.

However, many important real time series are not even approximately stationary; most share-market data, for example, fall into this category. Hence, to check the stationarity of the input series, the Dickey-Fuller test (Lutkepohl 2005) is conducted first. If the series is stationary, a suitable clustering algorithm is simply selected for partitioning the data set. Otherwise, the data set is treated as a non-stationary time series (Lutkepohl 2005). The conventional approach is to separate such a time series into a persistent trend and stationary fluctuations (or deviations) around the trend (Lutkepohl 2005),

$$\begin{aligned} Y_t =X_t +Z_t, \text{ i.e., } \text{ series } = \text{ fluctuations } + \text{ trend }. \end{aligned}$$
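Before any decomposition, the stationarity gate itself can be sketched as follows; this assumes the augmented Dickey-Fuller implementation in statsmodels (the library choice is an assumption, not part of the original method):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def is_stationary(series, alpha=0.01):
    """Dickey-Fuller test: rejecting the unit-root null -> treat as stationary."""
    adf_stat, p_value = adfuller(series)[:2]
    return p_value < alpha

rng = np.random.default_rng(0)
noise = rng.normal(size=300)          # stationary white noise
walk = np.cumsum(noise)               # random walk: non-stationary
print(is_stationary(noise), is_stationary(walk))   # typically: True False
```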

Since a constant can be added to or subtracted from each \(X_t\) without changing whether the series is stationary, it can be stipulated that \(E({X_{t}})=0\), i.e., \(E({Y_{t}})=E({Z_{t}})\). In other situations, the decomposition might be multiplicative instead of additive, etc. (Lutkepohl 2005). Again, given multiple independent realizations \(Y_{i,t}\) of the same process, say \(m\) of them sharing the same trend \(Z_t\), the common trend can be found by averaging the time series:

$$\begin{aligned} Z_t =E({Y_{i,t}})\approx \frac{1}{m}\mathop {\sum }\limits _{i=1}^m Y_{i,t} \end{aligned}$$

Multiple time series with the same trend do exist, especially in the experimental sciences (Lutkepohl 2005).

Once we have the fluctuations, and are reasonably satisfied that they are stationary, we can model them like any other stationary time series. Of course, to actually make predictions, the trend needs to be extrapolated, which is harder, as described below:

3.1 Trend components

The problem with making predictions when there is a substantial trend is that it is usually hard to know how to continue or extrapolate the trend beyond the last datapoint. If we are in the situation where we have multiple runs of the same process, we can at least extrapolate up to the limits of the different runs. If we have an actual model which tells us that the trend should follow a certain functional form, and we have estimated that model, we can use it to extrapolate (Lutkepohl 2005).

3.2 Pure random walk \((Y_{t} =Y_{t-1} +\varepsilon _{t})\)

Random walk predicts that the value at time “\(t\)” will be equal to the last period value plus a stochastic (non-systematic) component that is a white noise, which means \(\upvarepsilon _{t}\) is independent and identically distributed with mean “0” and variance “\(\sigma ^2\)” (Lutkepohl 2005). Random walk can also be considered as a process integrated of some order, a process with a unit root or a process with a stochastic trend. It is a non-mean reverting process that can move away from the mean either in a positive or negative direction. Another characteristic of a random walk is that the variance evolves over time and goes to infinity as time goes to infinity and hence, a random walk cannot be predicted (Lutkepohl 2005).

3.3 Random walk with drift \((Y_{t} =\alpha + Y_{t-1} + \varepsilon _{t})\)

If the random walk model predicts that the value at time “\(t\)” will equal the last period’s value plus a constant, or drift (\(\upalpha \)), and a white noise term (\(\upvarepsilon _{t})\), then the process is random walk with a drift. It also does not revert to a long-run mean and has variance dependent on time (Lutkepohl 2005).

3.4 Deterministic trend \((Y_{t} = \alpha + \beta t+\varepsilon _{t})\)

Often a random walk with a drift is confused for a deterministic trend. Both include a drift and a white noise component, but the value at time “\(t\)” in the case of a random walk is regressed on the last period’s value (\(\text{ Y }_{{ t}-1})\), while in the case of a deterministic trend it is regressed on a time trend (\(\beta t\)). A non-stationary process with a deterministic trend has a mean that grows around a fixed trend, which is constant and independent of time (Lutkepohl 2005).

3.5 Random walk with drift and deterministic trend \((Y_{t} = \alpha + Y_{t-1} + \beta t+\varepsilon _{t})\)

Another example is a non-stationary process that combines a random walk with a drift component (\(\upalpha \)) and a deterministic trend (\(\beta \)t). It specifies the value at time “\(t\)” by the last period’s value, a drift, a trend and a stochastic component (Lutkepohl 2005).

3.6 Seasonal components

Sometimes time series contain components which repeat, almost exactly, over regular periods. These are called seasonal components, after the obvious example of trends which cycle each year with the seasons, but they could equally cycle over months, weeks, days, etc. (Lutkepohl 2005). The decomposition of the process is thus

$$\begin{aligned} Y_t =X_t +Z_t +S_t, \end{aligned}$$

where \(X_t\) can be considered the stationary fluctuations, \(Z_t\) is the long-term trend and \(S_t\) is the repeating seasonal component. If \(Z_t =0\), or equivalently if we have a good estimate of it and can subtract it out, \(S_t\) can be found by averaging over multiple cycles of the seasonal trend (Lutkepohl 2005). Assume the period of the cycle is \(T\); then \(m=\frac{n}{T}\) full cycles are available and \(S_t\) can be calculated as follows:

$$\begin{aligned} S_t \approx \frac{1}{m}\mathop {\sum }\limits _{j=0}^{m-1} Y_{t+jT}. \end{aligned}$$

This holds because, with \(Z_t =0\), \(Y_t =X_t +S_t\) and \(S_t\) is periodic, i.e., \(S_t = S_{t+T}\). Sometimes it is necessary to know the overall trend present in the data; if there are seasonal components, they have to be subtracted before trying to find \(Z_t\). The detrending can be done as follows:

Let \(Y_t\) have the linear time trend

$$\begin{aligned} Y_t ={{\beta }_{0}} +\beta t+X_t \end{aligned}$$

with \(X_t\) stationary. Then, taking the difference between successive values of \(Y_t\) makes the trend go away:

$$\begin{aligned} Y_t -Y_{t-1} =\beta +X_t -X_{t-1}. \end{aligned}$$

Since \(X_t\) is stationary, \(({\beta +X_t -X_{t-1}})\) is also stationary. If the first difference does not look stationary, further differences can be taken until the input series becomes stationary. In this way, the trend components can be removed from the data set. Similarly, applying the above procedure, a random walk with or without a drift can be transformed into a stationary process (Lutkepohl 2005). Moreover, once \(({Y_{t+1}-Y_t})\) has been predicted, \(Y_t\) can be added back to obtain \(Y_{t+1}\).
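The differencing procedure just described can be sketched as follows; this sketch reuses the hypothetical is_stationary helper from the stationarity-checking sketch above:

```python
import numpy as np

def difference_until_stationary(y, max_d=2):
    """Take first differences Y_t - Y_{t-1} until the series passes the
    stationarity check (or max_d differences have been taken). Returns the
    differenced series and the order d, so that a forecast of the
    differences can be integrated back: predicting (Y_{t+1} - Y_t) and
    adding Y_t recovers Y_{t+1}."""
    y = np.asarray(y, dtype=float)
    d = 0
    while d < max_d and not is_stationary(y):
        y = np.diff(y)       # removes one (e.g. linear) trend component per pass
        d += 1
    return y, d
```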

Step 2: clustering

(i) Initially, suitable clustering algorithms [selected by checking the DVI indices (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979) of the generated clusters, given in Sect. 4] are applied to the data sets corresponding to the main factor (dependent variable) and the secondary factors (independent variables) to generate variable-sized overlapping or non-overlapping clusters (a minimal selection sketch follows this step). In many cases, different clustering algorithms turn out to be suitable for the data sets of the main and the secondary factors. Moreover, the number of clusters generated from the data sets of the main and the secondary factors may differ, as it depends on the nature of the data set.

(ii) Next, the resulting clusters of the secondary factors are related to those of the main factor to establish the fuzzy logical relationships, which are discussed later.
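A minimal sketch of the selection in item (i), assuming one-dimensional data, scikit-learn's KMeans as the candidate family, and the dvi_term helper sketched later in Sect. 4 (all of these are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import KMeans

def best_partition(values, candidate_ks=(2, 3, 4, 5)):
    """Fit several candidate k-means partitions and keep the one with the
    smallest normalised validity term, realising the min over p in the DVI
    of Eq. (19); dvi_term() is the helper sketched in Sect. 4."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    fits = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(x)
            for k in candidate_ks]
    terms = [dvi_term(x, f.labels_, f.cluster_centers_) for f in fits]
    max_intra = max(t[0] for t in terms)
    max_inter = max(t[1] for t in terms)
    scores = [t[0] / max_intra + t[1] / max_inter for t in terms]
    best = fits[int(np.argmin(scores))]
    return best.labels_, best.cluster_centers_.ravel()
```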

Step 3: computation of different parameters of the proposed algorithm

Some parameters of the proposed algorithm are now defined as below:

max\(\left[ i\right] \!,({ i\in \text{ Z }^+})\): Maximum element of the \(i\)th cluster.

min\(\left[ i\right] \!,({i \in \text{ Z }^+})\): Minimum element of the \(i\)th cluster.

$$\begin{aligned} \mathrm{mid}\left[ i\right] \!, ({i \in \text{ Z }^+})=0.5*({\mathrm{max}[i]+\mathrm{min}[i]}) \end{aligned}$$
$$\begin{aligned} \mathrm{mean}=\frac{1}{\mathrm{no.\,of\,clusters\,of\,the\,main\,factor}}\mathop {\sum }\limits _{i\in \mathrm{Z}^+} \mathrm{mid}\left[ i\right] \end{aligned}$$
(1)
$$\begin{aligned} \mathrm{sum}\_\mathrm{deviation}&= \frac{1}{\mathrm{no.\,of\,clusters\,of\,the\,main\,factor}}\nonumber \\&*\sqrt{\mathop {\sum }\limits _{i\in \mathrm{Z}^+}({\mathrm{mean}-\mathrm{mid}\left[ i\right] })^2} \end{aligned}$$
(2)
$$\begin{aligned}&\mathrm{mean}\_\mathrm{sec}\left[ j \right] \\&\quad =\mathop {\sum }\limits _{i\in \mathrm{Z}^+} \left( {\frac{\mathrm{mid\_sec}[j][i]}{\mathrm{no.\,of\,clusters\,of\,the}\,j\mathrm{th\,secondary\,factor}}}\right) \end{aligned}$$
$$\begin{aligned}&\mathrm{sum\_deviation\_sec}[j]\nonumber \\&\quad =\frac{1}{\mathrm{no.\,of\,clusters\,of\,the}\,j\mathrm{th\,secondary\,factor}}\nonumber \\&\qquad *\sqrt{\mathop {\sum }\limits _{i\in \mathrm{Z}^+} \left( {\mathrm{mean\_sec}\left[ j\right] -\mathrm{mid\_sec}\left[ j\right] \left[ i\right] }\right) ^2}, \end{aligned}$$
(3)

where mean_sec\(\left[ j\right] ; ({j\in \mathrm{Z}^+})\) is the mean of the cluster mids of the \(j\)th secondary factor; consequently, for \(k\) secondary factors, \(k\) values of mean_sec are obtained.

$$\begin{aligned}&\mathrm{global\_deviation}=\frac{1}{\mathop \prod \nolimits _{k\in \mathrm{Z}^+} \mathop \prod \nolimits _{p\in \mathrm{Z}^+} n_{k,p}} \sqrt{\mathop {\sum }\limits _{i\in \mathrm{Z}^+} ({\mathrm{mean}-\mathrm{mid}\left[ i\right] })^2+ \mathop {\sum }\limits _{j\in \mathrm{Z}^+} \mathop {\sum }\limits _{i\in \mathrm{Z}^+} ({\mathrm{mean\_sec}\left[ j\right] -\mathrm{mid\_sec}\left[ j\right] \left[ i\right] })^2}, \end{aligned}$$
(4)

where mid_sec\(\left[ j \right] \left[ i \right] \!;({i,j\in \mathrm{Z}^+})\) is the mid of the \(i\)th\(({i\in \mathrm{Z}^+})\) cluster of the \(j\)th\(({j\in \mathrm{Z}^+})\) secondary factor.

Here, \(n_{k,p}\) is the total number of elements of the \(p\)th cluster of the \(k\)th main factor.

These parameters will be eventually used in the development of the algorithm.
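For illustration, the parameters of Eqs. (1)-(2) for a single factor can be computed as in the following sketch (the cluster contents shown are hypothetical):

```python
import math

def cluster_parameters(clusters):
    """Compute mid[i], mean and sum_deviation for one factor following
    Eqs. (1)-(2); `clusters` is a list of lists of raw values."""
    mids = [0.5 * (max(c) + min(c)) for c in clusters]   # mid[i]
    n = len(clusters)
    mean = sum(mids) / n                                 # Eq. (1)
    sum_deviation = math.sqrt(
        sum((mean - m) ** 2 for m in mids)) / n          # Eq. (2)
    return mids, mean, sum_deviation

# Hypothetical clusters of a main factor:
mids, mean, sdev = cluster_parameters([[3870, 3205], [5120, 4980], [2761]])
```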

Step 4: distribution of membership

In this step, it is checked whether the distance between an element and mid\([i],(i\in \text{ Z }^+)\) is less than sum_deviation. If it is, the algorithm generates a membership for the element in that cluster.

On the other hand, to capture the influence of the different secondary factors on the main factor, it is checked whether the distances between the elements of the main factor and mid_sec\(\left[ j \right] \left[ i \right] ;({i,j\in \text{ Z }^+})\) are less than global_deviation. If a distance is less, then the influence of the \(i\)th \(({i\in \text{ Z }^+})\) cluster of the \(j\)th \(({j\in \text{ Z }^+})\) secondary factor on that element of the main factor is counted. This is what enables the present forecasting algorithm to consider the influences of different factors on a particular factor, as sketched below.
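A minimal sketch of these two distance checks (the helper names are hypothetical):

```python
def within_sum_deviation(element, mids, sum_deviation):
    """Step 4, main factor: cluster i contributes a membership for the
    element only when |element - mid[i]| < sum_deviation."""
    return [i for i, m in enumerate(mids) if abs(element - m) < sum_deviation]

def influencing_secondary_clusters(element, mid_sec, global_deviation):
    """Step 4, secondary factors: collect the (factor j, cluster i) pairs
    whose mid_sec[j][i] lies within global_deviation of the element."""
    return [(j, i) for j, row in enumerate(mid_sec)
            for i, m in enumerate(row) if abs(element - m) < global_deviation]
```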

Step 5: multivariate fuzzy forecasting algorithm

In this step, the multivariate fuzzy forecasting algorithm, based on the \(k\)-means clustering and fuzzy time series technique is developed. The novel feature of this algorithm is that it takes care of overlapping as well as non-overlapping clusters.

(a) Clustering Let the universe of discourse of the main factor be divided into a number of disjoint intervals or clusters (by the chosen clustering algorithm), denoted by \(a_i, (i\in \text{ Z }^+)\). The corresponding linguistic variables are denoted by \(A_i, ({i\in \text{ Z }^+})\). Similarly, \(b_{j,p}, (j,p\in \text{ Z }^+)\) is the \(p\)th cluster of the \(j\)th secondary factor and the corresponding linguistic variable is denoted by \(B_{j,p}, (j,p\in \text{ Z }^+)\). In this paper, the main factor is the dependent variable present in the system, while the secondary factors are the independent variables.

(b) Defining fuzzy sets The memberships of \(A_p\) in \(a_p\) (where \(p\in \text{ Z }^+\)), i.e., the local influences (\(f_L\)), are determined by the following symmetric triangular fuzzy membership function, which removes the drawback of the fixed membership values 0, 0.5 and 1 by admitting any real number between 0 and 1:

$$\begin{aligned} f_L ({A_p })=\left\{ {{\begin{array}{l} {1; \quad \mathrm{membership\,of}\,A_p \,\mathrm{in}\,a_p}\\ {\left( {1-\frac{x_i}{n_i*n_p}}\right) \!;\quad \mathrm{where}\,i \in \mathrm{Z}^+; i\ne p} \\ {0;\quad \mathrm{in\,case\,of\,the\,empty\,clusters\,of\,the\,main}}\\ \qquad \mathrm{as\,well\,as\,the\,secondary\,factors}.\\ \end{array}}}\right. \end{aligned}$$
(5)

Again, the memberships of \(A_p\) in the different clusters of the secondary factors \(b_{j,i}, ({i,j\in \text{ Z }^+})\), i.e., the global influences (\(f_G\)), are determined by the following triangular membership function, which likewise admits any real number between 0 and 1 apart from 0, 0.5 and 1:

$$\begin{aligned} f_G ({A_p })=\left( {1-\frac{x_{j,i} }{n_{j,i} *n_p }}\right) ;\,i,j\in \mathrm{Z}^+; p = \mathrm{fixed} \end{aligned}$$
(6)

Here, the memberships of the linguistic variable corresponding to a cluster of a particular factor in the other clusters of the same factor are called the local influence (\(f_L\)). On the other hand, the memberships of the linguistic variable corresponding to a cluster of the main factor in the clusters of the other factors are called the global influence (\(f_{G}\)). The significance of the local and global influences is to capture the influences of the several other clusters, belonging to the same or different factors, on a cluster of a particular factor.

Variables used in the above equations are explained as follows:

\(n_p, ({p\in \text{ Z }^+})\!:\) Total number of elements of \(a_p, (p\in \text{ Z }^+)\).

\(x_i, ({0\le x_i \le n_i *n_p })\!:\) Number of distances of the elements of \(a_i, ({i\in \text{ Z }^+})\) from \(a_p, ({p\in \text{ Z }^+})\) that are greater than the sum_deviation of the main factor (cf. Step 3 above).

\(x_{j,i}, ({i,j\in \text{ Z }^+})\!:\) Number of distances of the elements of \(a_i, ({i\in \text{ Z }^+})\) from \(b_{j,i}, ({i,j\in \text{ Z }^+})\) that are greater than global_deviation (cf. Step 3 above).

When all the distances are less than the sum_deviation of the main factor, i.e., \(x_i =0\), it follows that \(a_i =a_p \) and \(f_L ({A_p })=1\).

For the fuzzy set representation of the linguistic variables \(A_p \) of main factor, both the local (\(f_{L})\) and the global (\(f_{G})\) influences are considered as follows:

$$\begin{aligned} A_p =\mathop {\sum }\limits _{i\in \text{ Z }^+} \frac{f_L ({A_p })}{a_i}+\mathop {\sum }\limits _{j\in \text{ Z }^+} \mathop {\sum }\limits _{i\in \text{ Z }^+} \frac{f_G ({A_p })}{b_{j,i} }. \end{aligned}$$
(7)
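The following sketch assembles \(A_p\) from Eqs. (5)-(7) for non-empty clusters; the count arrays x_main, x_sec, n_main and n_sec are hypothetical inputs corresponding to the variables defined above:

```python
def fuzzy_set_A(p, x_main, n_main, x_sec, n_sec):
    """Assemble the fuzzy set A_p of Eq. (7) for non-empty clusters.

    x_main[i]   : count of distances from a_p exceeding sum_deviation
    n_main[i]   : number of elements of cluster a_i
    x_sec[j][i] : count of distances exceeding global_deviation
    n_sec[j][i] : number of elements of cluster b_{j,i}
    """
    local = {i: 1.0 if i == p else 1.0 - x_main[i] / (n_main[i] * n_main[p])
             for i in range(len(n_main))}                     # f_L, Eq. (5)
    glob = {(j, i): 1.0 - x_sec[j][i] / (n_sec[j][i] * n_main[p])
            for j in range(len(n_sec))
            for i in range(len(n_sec[j]))}                    # f_G, Eq. (6)
    return local, glob
```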

As mentioned previously, the secondary factors are the independent variables present in the system; hence, no other factor influences them, and for the secondary factors the global influences are not considered. Now, the memberships of \(B_{p,q}\) in \(b_{p,i}\) can be defined with the help of the following triangular fuzzy membership function, which again admits any real number between 0 and 1 apart from 0, 0.5 and 1:

$$\begin{aligned} f({B_{p,q} })=\left\{ \!{{\begin{array}{l} {1;\quad \mathrm{membership\,of}\,B_{p,q} \,\mathrm{in}\,b_{p,q} }\\ {\!\left( {1-\frac{x_{p,i} }{n_{p,i} *n_{p,q}}}\!\right) \!;\quad \mathrm{for\,other\,overlapping\,clusters};}\\ \qquad \qquad \qquad \qquad \qquad p,q=\mathrm{fixed};i\in \text{ Z }^+\\ {0;\quad \mathrm{for\,the\,empty\,clusters}}\\ \end{array} }} \right. \end{aligned}$$
(8)

The fuzzy set representation of the linguistic variable \(B_{p,q}\) is given as follows:

$$\begin{aligned} B_{p,q} =\mathop {\sum }\limits _{i\in \text{ Z }^+} \frac{f({B_{p,q}})}{b_{p,i} }. \end{aligned}$$
(9)

Variables used in the above equation are defined as follows:

\(n_{p,q}, ({p,q\in \text{ Z }^+})\): Total number of elements in \(b_{p,q},\) \( ({p,q\in \text{ Z }^+})\).

\(x_{p,i}, ({0\le x_{p,i} \le n_{p,i} *n_{p,q} })\): Number of distances of the elements of \(b_{p,i}, ({i\in \text{ Z }^+})\) from \(b_{p,q} \) that are greater than the sum_deviation of the \(p\)th secondary factor.

(c) Prediction: Rule 1 The elements within the data set can be predicted by this rule. The membership of \(M(p, q)\) in \(M(i, j)\) (i.e., the local influence \(g_{L}\), the influence of one occurrence of the main factor on another of its occurrences) can be defined with the help of the following triangular fuzzy membership function, which admits any real number between 0 and 1 apart from 0, 0.5 and 1:

$$\begin{aligned}&g_L ({{M(p,q)}\_{M(i,j)}})=\left( {1-\frac{\left| {{M(p,q)}-{M(i,j)}} \right| }{\mathrm{sum\_deviation}}}\right) ;\nonumber \\&\left| {{M(p,q)}-{M(i,j)}} \right| \le \mathrm{sum\_deviation} \ne 0 \end{aligned}$$
(10)

Here, \(M({p,q})\) is the \(q\)th element of the \(p\)th cluster of the main factor. Again, the memberships of \(M({p,q})\) in \(S(j)_{i,l}\), i.e., the global influences (\(g_G\)), are determined by the following fuzzy triangular membership function:

$$\begin{aligned}&g_G ({{M(p,q)}\_S(j)_{i,l} })=\left( {1-\frac{\left| {{M(p,q)}-S(j)_{i,l} } \right| }{\mathrm{global\_deviation}}}\right) ;\nonumber \\&\left| {{M(p,q)}-S(j)_{i,l} } \right| \le \mathrm{global\_deviation} \ne 0 \end{aligned}$$
(11)

Variables used in the above equations are, in turn, defined as follows:

\(g_L ({{M(p,q)}\_{M(i,j)}})\): local membership of M(p, q) on M(i, j).

\(g_G ({{M(p,q)}\_S(j)_{i,l} })\): global membership of M(p, q) on \(S(j)_{i,l} \).

Next, the fuzzy sets corresponding to \(M(p,q),({p,q\in \text{ Z }^+})\) are then defined in the following manner:

$$\begin{aligned} {M}(p,q)&= \mathop {\sum }\limits _{i\in \text{ Z }^+} \mathop {\sum }\limits _{j\in \text{ Z }^+} \frac{g_L ({{M(p,q)}\_{M(i,j)}})}{a_i }\nonumber \\&+\mathop {\sum }\limits _{j\in \text{ Z }^+} \mathop {\sum }\limits _{i\in \text{ Z }^+} \mathop {\sum }\limits _{l\in \text{ Z }^+} \frac{g_G ({{M(p,q)}\_S(j)_{i,l} })}{b_{j,i} }. \end{aligned}$$
(12)

In the next step, the fuzzy logical relationship on the fuzzified main and secondary factors is constructed as follows:

$$\begin{aligned} {M}({{i,j}}),S(1)_{a,b}, S(2)_{c,d}, \ldots S(k)_{l,p} \rightarrow {M(m,n)}, \end{aligned}$$

where \({M}({{i,j}}),S(1)_{a,b}, S(2)_{c,d}, \ldots S(k)_{l,p} \) denote the fuzzified value of the main factor and the fuzzified values of the first, second, ..., \(k\)th secondary factors at stage \(t\); then at stage \((t+1)\) the main factor will be the \(n\)th element of the \(m\)th cluster of the main factor.

The defuzzified predicted occurrences of the main factor can be calculated in the following manner:

$$\begin{aligned}&\mathrm{predicted}(M(p, q))\nonumber \\&\quad =\frac{1*\mathrm{mid}({a_p })+\mathop \sum \nolimits _{i\in \text{ Z }^+}\left\{ {\mathop \sum \nolimits _{{{j\in \text{ Z }^+}}} \left( {1-\frac{\left| {{M(p, \,q)}-{M(i,\,j)}}\right| }{\mathrm{sum\_deviation}}}\right) }\right\} *\mathrm{mid}({a_i })+ \mathop \sum \nolimits _{j\in \text{ Z }^+} \left\{ {\mathop \sum \nolimits _{i\in \text{ Z }^+} \mathop \sum \nolimits _{l\in \text{ Z }^+} \left( {1-\frac{\left| {{M(p,\, q)}-S(j)_{i,l} }\right| }{\mathrm{global\_deviation}}}\right) }\right\} *\mathrm{mid}({b_{j, i}})}{1+\mathop \sum \nolimits _{{{j\in \text{ Z }^+}}} \left( {1-\frac{\left| {{M(p,\, q)}-{M(i, \,j)}} \right| }{\mathrm{sum\_deviation}}}\right) +\mathop \sum \nolimits _{i\in \text{ Z }^+} \mathop \sum \nolimits _{l\in \text{ Z }^+} \left( {1-\frac{\left| {{M(p,\, q)}-S(j)_{i,l}}\right| }{\mathrm{global\_deviation}}}\right) }\nonumber \\ \end{aligned}$$
(13)
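A sketch of this defuzzification, implementing the weighted average of Eq. (13) under the assumption that only terms within sum_deviation / global_deviation contribute (the argument names are hypothetical):

```python
def defuzzify_main(m_pq, mid_own, peers, peer_mids, sec_vals, sec_mids,
                   sum_dev, glob_dev):
    """Weighted average of cluster mids per Eq. (13): the own cluster's mid
    gets weight 1; other occurrences M(i,j) are weighted by the local
    membership of Eq. (10) and secondary occurrences S(j)_{i,l} by the
    global membership of Eq. (11)."""
    num, den = 1.0 * mid_own, 1.0
    for v, mid in zip(peers, peer_mids):
        if abs(m_pq - v) <= sum_dev:
            w = 1.0 - abs(m_pq - v) / sum_dev        # Eq. (10)
            num, den = num + w * mid, den + w
    for v, mid in zip(sec_vals, sec_mids):
        if abs(m_pq - v) <= glob_dev:
            w = 1.0 - abs(m_pq - v) / glob_dev       # Eq. (11)
            num, den = num + w * mid, den + w
    return num / den
```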

Suppose the fuzzified values of the main factor and of the first, second, ..., \(k\)th secondary factors at time \((t-1)\) are \(\text{ M }({{i,j}}),S(1)_{a,b}, S(2)_{c,d}, \ldots S(k)_{l,p}\), respectively, and that the fuzzy logical relationship group contains the following fuzzy logical relationship:

$$\begin{aligned}&\text{ M }({{i,j}}), S(1)_{a,b}, S(2)_{c,d}, \ldots S(k)_{l,p} \rightarrow {M}({{m}_1, {n}_1 }),\\&\quad \text{ M }({{m}_2, {n}_2}),{M}({{m}_3, {n}_3})\ldots \text{ M }({{m}_r, {n}_r }) \end{aligned}$$

In case of the secondary factors (mainly the independent variables), the memberships of \(S(i)_{p,q}\) on \(S(i)_{l,j}\) (local influence) are defined with the help of the following triangular fuzzy membership function that can take more real numbers lying between 0 and 1 apart from 0, 0.5 and 1:

$$\begin{aligned}&g({S(i)_{p,q} \_S(i)_{l,j} })=\left( {1-\frac{\left| {S(i)_{p,q} -S(i)_{l,j} } \right| }{\mathrm{sum\_deviation\_sec}\left[ i \right] }}\right) ;\nonumber \\&| {S(i)_{p,q} -S(i)_{l,j} } |\le \mathrm{sum\_deviation\_sec}\left[ i\right] \ne 0, \end{aligned}$$
(14)

where \(g({S(i)_{p,q}\_S(i)_{l,j} })\) = local membership of \(S(i)_{p,q} \) in \(S(i)_{l,j} \).

Fuzzy set representations for the elements of the secondary factors are defined as follows:

$$\begin{aligned} S(i)_{p,q} =\mathop {\sum }\limits _{l\in \text{ Z }^+} \mathop {\sum }\limits _{j\in \text{ Z }^+} \frac{g({S(i)_{p,q} \_S(i)_{l,j}})}{b_{i,l} } \end{aligned}$$
(15)

The secondary factors are mainly the independent variables present in the system. Hence, in this case, to form the fuzzy logical relationship the concept of univariate time series is used as follows:

$$\begin{aligned} S(l)_{p,q} \rightarrow S(l)_{r,s}, \end{aligned}$$

where \(S(l)_{p,q} \rightarrow S(l)_{r,s}\) denotes that 'if the fuzzified value of the \(l\)th secondary factor at stage \(t\) is the \(q\)th element of its \(p\)th cluster, then at stage \((t+1)\) that secondary factor will be the \(s\)th element of its \(r\)th cluster'. This is because no factor other than itself influences an independent variable.

The defuzzified predicted occurrences of the \(l\)th\(({l\in \mathrm{Z}^+})\) secondary factor can be calculated using the following equation:

$$\begin{aligned}&\mathrm{predicted}({S(l)_{p,q} })\nonumber \\&\quad =\frac{\mathop \sum \nolimits _{i\in \text{ Z }^+}\left\{ {\mathop \sum \nolimits _{j\in \text{ Z }^+} \left( {1-\frac{\left| {S(l)_{p,q} -S(l)_{i,j} } \right| }{\mathrm{sum\_deviation\_sec}\left[ l \right] }}\right) }\right\} *\mathrm{mid}({b_{l,i} })}{\mathop \sum \nolimits _{i\in \text{ Z }^+}\mathop \sum \nolimits _{j\in \text{ Z }^+} \left( {1-\frac{\left| {S(l)_{p,q} -S(l)_{i,j} }\right| }{\mathrm{sum\_deviation\_sec}\left[ l \right] }}\right) }\nonumber \\ \end{aligned}$$
(16)

Rule 2 An important feature of this rule is that the elements lying outside the data set can precisely be predicted. The corresponding fuzzy logical relationship can be constructed as follows:

\({M}(i,j),S(1)_{p,q}, S(2)_{r,s}, \ldots S(n)_{x,y} \rightarrow \#, \) where ‘\(\# \)’ is the element lying outside the data set. Then the predicted value of \(\# \) can be calculated as follows:

$$\begin{aligned} \mathrm{predicted}(\# )= \frac{\mathop \sum \nolimits _{i\in \mathrm{Z}^+}\mathrm{mid}\left[ {a_i } \right] +\mathop \sum \nolimits _{j\in \text{ Z }^+}\mathop \sum \nolimits _{i\in \text{ Z }^+} \left( {1-\frac{x_{j,i}}{n}}\right) *\mathrm{mid\_sec}\left[ {b_{j,i} }\right] }{\text{ Total } \text{ number } \text{ of } \text{ clusters } \text{ of } \text{ the } \text{ main } \text{ factor }+\mathop \sum \nolimits _{j\in \text{ Z }^+} \mathop \sum \nolimits _{i\in \text{ Z }^+} ({1-\frac{x_{j,i} }{n}})} \end{aligned}$$
(17)

where \(x_{j,i}\) is the number of mid values of the clusters of the main factor whose distances from mid\(\left[ {b_{j,i}}\right] \) are greater than global_deviation, and \(n\) is the total number of mid values of the main factor.

The secondary factors are mainly the independent variables present in the system. Hence, in this case too, the concept of univariate time series is used to form the fuzzy logical relationship:

$$\begin{aligned} S(j)_{p,q} \rightarrow \# \_S(j), \end{aligned}$$

where \(S(j)_{p,q} \rightarrow \# \_S(j)\) denotes that 'if the fuzzified value of the \(j\)th secondary factor at stage \(t\) is the \(q\)th element of its \(p\)th cluster, then at stage \((t+1)\) the value of that secondary factor will be the unknown \(\#\_S(j)\)'. The defuzzified predicted value of '\(\#\_S(j)\)' (the unknown occurrence of the \(j\)th secondary factor) can then be calculated as follows:

$$\begin{aligned}&\mathrm{predicted}({\# \_S(j)})\nonumber \\&\quad =\frac{\mathop \sum \nolimits _{i\in \mathrm{Z}^+} \mathrm{mid}\left[ {b_{j,i} }\right] }{\text{ Total } \text{ number } \text{ of } \text{ clusters } \text{ of } \text{ the } j{\text{ th }} \text{ secondary } \text{ factor }}\nonumber \\ \end{aligned}$$
(18)

After this, the defuzzified unknown occurrences of the main and the secondary factors are inserted into the data set, and steps 1 to 4 of the developed algorithm are repeated to predict their next occurrences.

The above-developed algorithm can thus effectively handle both the overlapping and non-overlapping clusters, besides making predictions and handling the uncertainty as well.

4 Test results of the developed algorithm

In this section, three different real-life examples related to three different areas, viz., web technology, the coal industry and finance, are cited to demonstrate the potential and applicability of the proposed algorithm over a vast domain. The first example deals with the prediction of some frequently occurring web errors during the execution of www.ismdhanbad.ac.in, the official website of the Indian School of Mines Dhanbad, India. In contrast, the next one deals with the prediction of the yield % of clean coal during the oil agglomeration process for the beneficiation of coal fines. The last one deals with the prediction of records (mainly financial data) provided by the Ministry of Statistics and Program Implementation, Govt. of India.

The proposed forecasting method was compared with thirteen different conventional (univariate and multivariate) time series models, e.g., VAR, MA, Holt-Winters and Box-Jenkins (Lutkepohl 2005), and fuzzy time series-based forecasting algorithms, viz., Bulut et al. (2012), Bulut (2014), Duru (2010, 2012), Chatterjee and Roy (2014a), Chatterjee and Roy (2014b) and Chen and Tanuwijaya (2011) (with its clustering algorithm replaced by the \(c\)-means and \(k\)-means techniques, respectively). Moreover, the accuracy of the proposed algorithm has also been compared with the ANN approach (Aladag et al. 2008). To check the forecasting accuracy of the proposed algorithm, the root mean squared error (RMSE), root median squared error (RMdSE) and median relative absolute error (MdRAE) are used as accuracy metrics; note that RMSE is a biased accuracy metric (Hyndman 2006), while RMdSE is not scale-free (Hyndman 2006). The comparative study can be found in Tables 1, 5, 9, 11, 12, 13 and 14.

However, in many cases different clustering techniques turn out to be suitable for the main and the different secondary factors, owing to changes in the nature or properties (viz., statistical, etc.) of the corresponding data sets. Hence, for simplicity of calculation, the author concentrates on three well-known clustering techniques, viz., \(c\)-means, \(k\)-means and the automatic clustering algorithm, while discussing the experimental results obtained with the proposed algorithm. Before discussing the experimental results, it is apt, for clarity, to briefly explain the concept of the DVI (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979), as it is used to check the quality of the generated clusters.

A validity index is used to evaluate the quality of the clusters generated by a clustering algorithm (Bezdek et al. 1984; Dunn 1973; Hartigan and Wong 1979). To measure the performance of the proposed algorithm and the quality of the generated clusters, the dynamic validity index (DVI) is used in this paper; it is defined as follows:

Let \(n\) be the number of data points, \(k\) be the pre-defined upper bound of the number of clusters, and \(z_i \) be the center of the cluster \(c_i \). The dynamic validity index (DVI) is given as follows:

$$\begin{aligned} \text{ DVI }=\mathop {\min }\limits _{p} \left\{ {\text{ IntraRatio }(p)+\gamma *\text{ InterRatio }(p)}\right\} , \end{aligned}$$
(19)

where IntraRatio and InterRatio are defined as follows:

$$\begin{aligned} \text{ IntraRatio }(p)=\frac{\text{ Intra }(p)}{\text{ MaxIntra }},\quad \text{ InterRatio }(p)=\frac{\text{ Inter }(p)}{\text{ MaxInter }} \end{aligned}$$
$$\begin{aligned} \text{ Intra }(p)=\frac{1}{n}\mathop {\sum }\limits _{i=1}^{p} \mathop {\sum }\limits _{x\in C_{i}} \Vert x-z_i\Vert ^2,\quad \text{ MaxIntra }=\mathop {\max }\limits _{i} \left\{ {\text{ Intra }(i)}\right\} \end{aligned}$$
$$\begin{aligned} \text{ Inter }(p)=\frac{\mathop {\max }\nolimits _{i,j} \Vert z_i -z_j\Vert ^2}{\mathop {\min }\nolimits _{i\ne j} \Vert z_i -z_j\Vert ^2}\mathop {\sum }\limits _{i=1}^p \frac{1}{\mathop \sum \nolimits _{j=1}^p \Vert z_i -z_j\Vert ^2},\quad \text{ MaxInter }=\mathop {\max }\limits _{i} \left\{ {\text{ Inter }(i)}\right\} \end{aligned}$$
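For illustration, the DVI can be computed as in the following sketch, assuming at least two clusters per candidate partition and distinct cluster centers:

```python
import numpy as np

def dvi_term(x, labels, centers):
    """Return (Intra(p), Inter(p)) for one partition with p >= 2 clusters
    and distinct centers, following the definitions above."""
    p = len(centers)
    intra = sum(np.sum((x[labels == i] - centers[i]) ** 2)
                for i in range(p)) / len(x)
    d2 = np.array([[np.sum((centers[i] - centers[j]) ** 2)
                    for j in range(p)] for i in range(p)])
    off_diag = d2[~np.eye(p, dtype=bool)]
    inter = (off_diag.max() / off_diag.min()) \
        * sum(1.0 / d2[i].sum() for i in range(p))
    return intra, inter

def dvi(partitions, gamma=1.0):
    """Eq. (19): `partitions` is a list of (x, labels, centers) candidates;
    the ratios are normalised by the maxima over the candidate set."""
    terms = [dvi_term(x, l, c) for x, l, c in partitions]
    max_intra = max(t[0] for t in terms)
    max_inter = max(t[1] for t in terms)
    return min(t[0] / max_intra + gamma * t[1] / max_inter for t in terms)
```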

For simplicity of calculation, in the present study, \(k\)-means, \(c\)-means (with 3 clusters each) and the automatic clustering algorithms are applied to partition the data sets. The DVI of the clusters of all the data sets, used in this study, generated by the aforementioned clustering algorithms are shown in Table 1. The forecasted values may change with the clustering algorithm.

Table 1 DVI index values of different clusters of the data sets of the main factors

4.1 An example regarding web error prediction

In this subsection, the developed algorithm is validated by predicting frequently occurring web errors using data collected from the HTTP log files (error and access logs) (Huynh and Miller 2009) of http://www.ismdhanbad.ac.in, the official website of the Indian School of Mines Dhanbad, India, which is a non-commercial, dynamic website that uses the PHP (http://www.php.net) scripting language and MySql (http://www.mysql.com) for the backend database, and is hosted on an Apache HTTP daemon. To scrutinize the stability and reliability of the data, the log files (HTTP access and error logs) were chosen to cover 387 consecutive days, from 30th September 2010 to 22nd October 2011, during which the website received approximately 6,367,893 hits, 188,369 unique visitors, 13,612 unique URLs and 25,433 unique user agents [viz. Mozilla/5.0+ (compatible; +Googlebot/2.1; ++http://www.google.com/bot.html)], transferred a total of 87,964,646 KBytes of data, and created approximately 35,137 sessions. The most frequently occurring web failures for each day of http://www.ismdhanbad.ac.in are tabulated in Table 2. From this table, it is observed that the error code 404 numerically dominates the others, which is in tune with the survey results from 1994 to 1998 of the Graphics, Visualization, and Usability Center of Georgia Institute of Technology (http://www.gvu.gatech.edu/user_surveys/), which state that 404 errors are the most commonly occurring errors users encounter while browsing the web. The other two most frequently occurring web error codes for http://www.ismdhanbad.ac.in are 406 (not acceptable) and 403 (forbidden) (Huynh and Miller 2009). Therefore, the error code 404 is considered the main factor, and the error codes 406 and 403 are treated as the two secondary factors; accordingly, the occurrences of 404 (main factor) are predicted on the basis of 406 and 403 (secondary factors). For modeling purposes, 70 % of the data set, i.e., \(387 \times 0.7 \approx 270\) data points, have been used, and the remaining 117 (30 %) have been left for prediction (the post-sample period).

Table 2 The occurrences of the different frequently occurring error codes from 30/9/2010 to 22/10/2011, along with their positions in the respective clusters generated by the \(k\)-means clustering algorithm, and the Dickey–Fuller test for checking the stationarity of the series corresponding to the main, 1st and 2nd secondary factors

Again, the stationarity of the series of occurrences corresponding to the error codes 404, 406 and 403 has been checked using the Dickey–Fuller test (Lutkepohl 2005), and all the series are found to be stationary (Table 2).

Initially, for simplicity of calculation, the main and the secondary factors, shown in Table 2, are partitioned into clusters with the help of the \(k\)-means (with 3 clusters), \(c\)-means (with 3 clusters) and automatic clustering algorithms (Chen and Tanuwijaya 2011), and the corresponding results are shown in Table 3. Next, the clustering index DVI is calculated to check the quality of the generated clusters, and it is found that \(k\)-means (with 3 clusters) is the most suitable for the data set shown in Table 2, having the lowest DVI among all the aforementioned clustering algorithms. Consequently, the proposed algorithm employs the \(k\)-means clustering technique to partition the main and the secondary factors in step 2 (cf. Sect. 3). The resulting non-overlapping, non-empty clusters of the main and the secondary factors are denoted by \(a_i ;({i=1,2,3})\) and \(b_{i,j} ;({i=1,2;j=1,2,3})\), respectively. The linguistic variables corresponding to the clusters of the main and the secondary factors are denoted by \(A_i ;({i=1,2,3})\) and \(B_{i,j} ;({i=1,2;j=1,2,3})\), respectively.

Next, using step 3 of the proposed algorithm (cf. Sect. 3), the different parameters are calculated and shown in Table 3. The fuzzy sets corresponding to \(A_i ;({i=1,2,3})\) and \(B_{i,j} ;({i=1,2;j=1,2,3})\) are defined using step 5(b) (cf. Sect. 3) of the developed algorithm. For example, the fuzzy set \(A_1\) (the linguistic variable corresponding to \(a_1\)) can be defined as follows [using Eq. (7)]:

$$\begin{aligned} A_1&= \frac{1}{a_1 }+\frac{\left( {1-\frac{8}{8*12}}\right) }{a_2 }+\frac{\left( {1-\frac{1}{8*3}}\right) }{a_3 }\\&+\mathop {\sum }\limits _{j=1}^2 \mathop {\sum }\limits _{i=1}^3 \frac{\left( {1-\frac{x_{j,i} }{n_{j,i} *n_p}}\right) }{b_{j,i} }\\&= \frac{1}{a_1 }+\frac{0.917}{a_2 }+\frac{0.958}{a_3 }+\mathop {\sum }\limits _{j=1}^2 \mathop {\sum }\limits _{i=1}^3 \frac{0}{b_{j,i} } \end{aligned}$$

From the above fuzzy set representation, it is clear that the membership values can be real numbers lying anywhere between 0 and 1, apart from 0, 0.5 and 1. Moreover, the equation shows the influences of the different secondary factors on the main factor, which is a remarkable feature of the proposed algorithm. Similarly, the fuzzy sets corresponding to the remaining elements of the main, 1st and 2nd secondary factors can be calculated. From Table 2, it is found that 3870 = \(\text{ M }({1,1})\in a_1 \); the positions of the other elements of the main and the secondary factors can be found similarly.

Table 3 Different clusters of the main and the secondary factors using the \(k\)-means, \(c\)-means and the automatic clustering algorithm

Again, sum_deviation_sec\([1]=3.623\) and sum_deviation_sec\([2]=0.7577\). The fuzzy set representation for \(B_{1,3}\) (the linguistic variable corresponding to \(b_{1,3}\)) is given as follows:

$$\begin{aligned} B_{1,3}&= \mathop {\sum }\limits _{i=1}^3 \frac{f({B_{1,3} })}{b_{1,i} },\\&\text{ or },B_{1,3} =\frac{1}{b_{1,3} }+\frac{\left( {1-\frac{4}{20}}\right) }{b_{1,1} }+\frac{\left( {1-\frac{5}{10}}\right) }{b_{1,2}}\\&\quad =\frac{1}{b_{1,3} }+\frac{0.8}{b_{1,1} }+\frac{0.5}{b_{1,2} }. \end{aligned}$$

Next, the different fuzzified occurrences of the main, 1st and 2nd secondary factors of www.ismdhanbad.ac.in were tabulated; the data are shown in Table 2. Using Rule 1 of the developed algorithm, different fuzzy logical relationships were established. From Table 2, it is seen that 3,205 \(= {M}({1,5})\in a_1 \). Accordingly, the corresponding fuzzy set can be defined, using Eq. (12), as given below:

$$\begin{aligned} {M}(1,5)&= \mathop {\sum }\limits _{i\in \text{ Z }^+} \mathop {\sum }\limits _{j\in \text{ Z }^+} \frac{g_L ({{M}(1,5)\_{M}(i,j)})}{a_i}\\&+\mathop {\sum }\limits _{j\in \text{ Z }^+} \mathop {\sum }\limits _{i\in \text{ Z }^+} \mathop {\sum }\limits _{l\in \text{ Z }^+} \frac{g_G ({{M}(1,5)\_S(j)_{i,l}})}{b_{j,i}} \end{aligned}$$

In the same way, fuzzy sets corresponding to the other fuzzified occurrences of the main and secondary factors were defined. The fuzzy set representation for \(4=S(1)_{3,1}\) is given as follows:

$$\begin{aligned} S(1)_{3,1} =\mathop {\sum }\limits _{i=1}^3 \mathop {\sum }\limits _{j\in Z^+} \frac{g_L ({S(1)_{3,1} \_S(1)_{i,j} })}{b_{1,i} } \end{aligned}$$

Again, from the first row of Table 2, using Rule 1 (cf. Sect. 3), the following fuzzy logical relationship was formed.

'If the 1st element of the 1st cluster of the main factor (i.e., fuzzified value \(M\)(1,1)), the 1st element of the 1st cluster of the 1st secondary factor (i.e., fuzzified value \(S(1)_{1,1} \)) and the 1st element of the 1st cluster of the 2nd secondary factor (i.e., fuzzified value \(S(2)_{1,1} \)) are at stage 1, then at stage 2 the main factor will be the 1st element of the 3rd cluster of the main factor (i.e., fuzzified value \(M\)(3,1))'. Symbolically, it is expressed as:

$$\begin{aligned} {M}({1,1}),S(1)_{1,1}, S(2)_{1,1} \rightarrow {M}({3,1}). \end{aligned}$$

For the fuzzified one-step-ahead occurrence of the main factor (i.e., error code 404) on 23/10/2011, i.e., '#', a fuzzy logical relationship was established by applying Rule 2 (cf. Sect. 3) of the developed algorithm as follows:

'If the 11th element of the 1st cluster of the main factor (i.e., fuzzified value \(M\)(1, 11)), the 17th element of the 1st cluster of the 1st secondary factor (i.e., fuzzified value \(S(1)_{1,17} \)) and the 12th element of the 2nd cluster of the 2nd secondary factor (i.e., fuzzified value \(S(2)_{2,12}\)) are at stage 21, then at stage 22 the fuzzified occurrence of the main factor will be "#"'. The fuzzy logical relationship would then symbolically be expressed as:

$$\begin{aligned} {M}({1,11}),S(1)_{1,17}, S(2)_{2,12} \rightarrow \# . \end{aligned}$$

Different fuzzy logical relationships are contained in Table 4.

Table 4 Fuzzy logical relationship among main (404), \(1\)st secondary (406) and \(2\)nd secondary (403) factors

With the help of Rule 1 of the developed algorithm (Cf. Sect. 3), different known occurrences of the main factor given in Table 2 can be predicted. The defuzzified predicted value of M(1,5), i.e., 3,205, was calculated as follows (Cf. Sect. 3):

$$\begin{aligned} \mathrm{predicted}({3636})=3633 \end{aligned}$$

In the same way, the remaining known occurrences of the main factor can easily be predicted; they are shown in Table 5.

Table 5 Forecasted outcomes (approx.) along with the RMSE, RMdSE and MdRAE values for the post-sample period

Again, using Rule 2 of the developed algorithm (Cf. Sect. 3), the occurrence of the main factor on 23/10/2011, i.e., #, can be predicted as follows:

$$\begin{aligned} \mathrm{predicted}(\# )=\frac{3719+3374.5+2761+0}{3}\approx 3285. \end{aligned}$$

Next, to check the predictive accuracy, the \(\text{ RMSE }\) values are calculated as follows:

$$\begin{aligned} \text{ RMSE }=\sqrt{\frac{\mathop \sum \nolimits _{i=1}^n ({\mathrm{Forecasted\_occurrence}_{i} -\mathrm{Actual\_occurrence}_{i}})^2}{n}} \end{aligned}$$
(20)

The variables, used in the above equation, are defined as follows:

Forecasted_occurrence\(_{i}\): \(i\)th forecasted occurrence of the main factor.

Actual_occurrence\(_{i}\): \(i\)th actual occurrence of the main factor.
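The three accuracy metrics can be sketched as follows; the MdRAE here follows Hyndman's (2006) benchmark-relative definition, which is an assumption about the exact variant used:

```python
import numpy as np

def accuracy_metrics(forecast, actual, benchmark):
    """RMSE of Eq. (20), plus RMdSE and MdRAE. The benchmark series (e.g. a
    random-walk forecast) is needed for MdRAE, which scales each error by
    the corresponding benchmark error."""
    f, a, b = (np.asarray(v, dtype=float) for v in (forecast, actual, benchmark))
    err = f - a
    rmse = np.sqrt(np.mean(err ** 2))            # Eq. (20)
    rmdse = np.sqrt(np.median(err ** 2))
    mdrae = np.median(np.abs(err / (b - a)))     # median relative absolute error
    return rmse, rmdse, mdrae
```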

Next, the outputs of the proposed algorithm were compared with those of the algorithm developed by Chen and Tanuwijaya (2011) using the automatic clustering algorithm, and the results, given in Table 5, establish the superiority of the former. Afterwards, the outcomes of the proposed algorithm were compared with those of the Chen and Tanuwijaya (2011) algorithm with the automatic clustering replaced by \(k\)-means and \(c\)-means, respectively; the results, also given in Table 5, again show the better predictive accuracy of the proposed algorithm. Quite interestingly, it is found that the predictive accuracy of the algorithm of Chen and Tanuwijaya (2011) increases when the automatic clustering algorithm is replaced by \(k\)-means or \(c\)-means, one possible reason being that they produce better quality clusters. Additionally, it is found that, in this case, the quality of the clusters generated by the \(k\)-means algorithm (checking the DVI index given in Table 1) is better than that of the \(c\)-means approach; consequently, better forecasted outputs are obtained from the Chen and Tanuwijaya (2011) algorithm when the automatic clustering is replaced by \(k\)-means rather than \(c\)-means. This discussion clearly establishes the influence of choosing a suitable clustering algorithm on the forecasted results of fuzzy time series-based prediction algorithms. The RMSE, RMdSE and MdRAE values for the proposed forecasting algorithm are lower than those of all its competitors, as can be seen from Tables 5, 9, 11, 12, 13 and 14; the bold portions of the tables confirm this.

Moreover, the outputs of the proposed algorithm are compared with those of two statistical models, viz., MA(1) (a univariate time series model) and VAR(1) (a multivariate time series model) (Lutkepohl 2005), and the proposed algorithm is found to have better predictive accuracy. The corresponding MA(1) model is given as follows:

$$\begin{aligned} \text{ MA }(1):X_T =Z_T -0.2024Z_{T-1}, \end{aligned}$$

where \(\left\{ {Z_T }\right\} \) is the white noise series corresponding to the series of the occurrences of the 404 error code, i.e., \(\left\{ {X_T }\right\} \) (given in Table 2). The MA coefficient is determined with the help of the maximum likelihood method. Here, the white noise variance corresponding to \(\left\{ {X_T }\right\} \) is calculated as \(0.369892\times 10^6\), whereas the standard error of the MA coefficient is 0.324468. The AICC and BIC (Lutkepohl 2005) of the proposed MA(1) model are calculated as \(0.364795\times 10^3\) and \(0.360366\times 10^3\), respectively. The predicted occurrences of the different web errors are given in Table 5.
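As a hedged illustration, an MA(1) model of this kind can be estimated with statsmodels; the series below is a placeholder standing in for the data of Table 2, and the exact estimation settings used in the paper are not stated:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Placeholder stand-in for the 404-occurrence series {X_T} of Table 2.
x = np.array([3636, 3719, 3205, 2761, 3374, 3290, 3511, 3402,
              3198, 3605, 3450, 3333], dtype=float)

ma1 = ARIMA(x, order=(0, 0, 1)).fit()  # MA(1) with a constant, (approx.) ML
print(ma1.params)                      # constant, theta_1, innovation variance
print(ma1.bse)                         # standard errors of the estimates
print(ma1.aicc, ma1.bic)               # small-sample AIC and BIC
```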

Similarly, the corresponding VAR(1) model is given as follows:

$$\begin{aligned} \begin{pmatrix} Y_T \\ X_{1T} \\ X_{2T} \end{pmatrix}&= \begin{pmatrix} -0.2218 &{} 1.0948 &{} 0.6846\\ -0.05292 &{} -0.01587 &{} -15.08921\\ -0.002207 &{} -0.004290 &{} -0.019705 \end{pmatrix} \begin{pmatrix} Y_{T-1} \\ X_{1T-1} \\ X_{2T-1} \end{pmatrix} \\&\quad +\begin{pmatrix} 3892.9216\\ 460.21221\\ 14.357282 \end{pmatrix} +\begin{pmatrix} -16.7561\\ -10.67506\\ -0.080135 \end{pmatrix}, \end{aligned}$$

where \(Y_T =\) occurrences of error code 404 (the main factor), \(X_{1T} =\) occurrences of error code 406 (the first secondary factor) and \(X_{2T} =\) occurrences of error code 403 (the second secondary factor). Again, \((3892.9216\,\,460.21221\,\,14.357282)^{\mathrm{T}}\) and \(({-16.7561}\,\,{-10.67506}\,\,{-0.080135})^{\mathrm{T}}\) are the constant and the trend components of the above-mentioned VAR(1) model, respectively. The forecasted outputs of the VAR(1) model are shown in Table 5. Moreover, from Fig. 2, it is clear that the predictive accuracy of the proposed algorithm is better than that of all the other approaches used in the present study.
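For completeness, a VAR(1) of the above form (constant plus linear trend) can be fitted with statsmodels as sketched below. The three series are placeholders standing in for Table 2; only the API pattern is illustrated here:

```python
import pandas as pd
from statsmodels.tsa.api import VAR

# Placeholder columns for the main factor (404) and the two secondary
# factors (406, 403); the real series are those of Table 2.
data = pd.DataFrame({
    "err404": [3636.0, 3719.0, 3205.0, 2761.0, 3374.0, 3290.0, 3511.0, 3402.0],
    "err406": [410.0, 395.0, 388.0, 402.0, 399.0, 405.0, 391.0, 397.0],
    "err403": [14.0, 16.0, 13.0, 15.0, 14.0, 15.0, 13.0, 16.0],
})

fit = VAR(data).fit(maxlags=1, trend="ct")  # VAR(1): constant + linear trend
print(fit.coefs)    # the 3x3 lag-1 coefficient matrix
print(fit.params)   # all estimates, including constant and trend terms
```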

Fig. 2 Original and the predicted occurrences of 404

Additionally, the \(\chi ^2\) goodness-of-fit test was also carried out to validate the developed multivariate fuzzy forecasting algorithm. Here, \(\chi _{\mathrm{Computed}}^2 =20.83<40.289=\chi _{\mathrm{Tabulated}}^2\) at 22 degrees of freedom and the 1 % level of significance for the data set given in Table 2. Therefore, the developed algorithm stands fully validated.
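This validation step can be reproduced in outline with scipy; the critical value \(\chi ^2_{0.01}(22)\approx 40.289\) matches the tabulated value quoted above. The observed/expected vectors below are placeholders, not the paper's data:

```python
import numpy as np
from scipy import stats

# Placeholders for the actual and predicted occurrences (Tables 2 and 5).
observed = np.array([3636.0, 3719.0, 3205.0, 2761.0])
expected = np.array([3633.0, 3700.0, 3190.0, 2790.0])

chi2_computed = float(np.sum((observed - expected) ** 2 / expected))
chi2_tabulated = float(stats.chi2.ppf(0.99, df=22))  # ~40.289, as above
print(chi2_computed < chi2_tabulated)                # True -> not rejected
```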

4.2 An example regarding coal processing

This sub-section showcases a real example of the oil agglomeration process (Sahinoglu and Uslu 2011) for the beneficiation of coal fines (coal washing), where the environment is quite different from that of websites. Oil agglomeration can be used to separate particles suspended in water that differ in their affinity towards oil drops. This affinity is called aquaoleophilicity (Sahinoglu and Uslu 2011); the term reflects the fact that such particles ‘like’ oil drops in water. The property is similar to the hydrophobicity utilized in flotation, in which the oil drop is substituted with a gas bubble. Successful oil agglomeration requires vigorous stirring to disperse the oil drops and particles and to facilitate a sufficient number of collisions between them (Sahinoglu and Uslu 2011). For this purpose, impellers are required as the main equipment (Sahinoglu and Uslu 2011). The different properties of the impellers (independent variables), along with the experimental (dependent variable) and predicted % yield (clean coal) used in this experiment, are given in Tables 6 and 9. Figure 3 pictorially demonstrates the oil agglomeration process. The abbreviations of the independent variables used in Tables 6 and 9 are as follows:

Table 6 Different independent and dependent variables of the oil agglomeration data set along with their respective positions in the clusters generated by the \(k\)-means clustering algorithm

\(\mathbf{I_{B}}\): number of impeller blades; \(\mathbf{I_{N}}\): number of impellers; \(\mathbf{I_{D}}\): diameter of the impellers; \(\mathbf{I_{W}}\): width of the impellers’ blades; RPM: impellers’ speed (in revolutions per minute).

The experimental yield (%) of clean coal is the dependent variable; it has been predicted based on \({\hbox {I}}_{\mathrm{B}}, \hbox {I}_{\mathrm{N}}, \hbox {I}_{\mathrm{D}}, \hbox {I}_{\mathrm{W}}\) and RPM (the independent variables). For the experimental purpose, a data set of length 81 has been used in this paper; a part of it is shown in Table 6. Here, 70 % of the data set, i.e., \(({81\times 0.7})\approx 57\) observations, has been used for modeling and the remaining observations are used for prediction (the post-sample period).

From the first row of Table 6, it is found that if the number of impeller blades is 2, the number of impellers is 4, the diameter of the impellers is 90 mm, the width of the impeller blades is 20 mm, and the speed of the impellers is 1200 RPM, then the experimental yield (%) of clean coal is 58.2300 %, whereas the yields (%) predicted by the developed algorithm and the ANN approach are 57.37 % and 62.17563 %, respectively (Table 9). The main motive behind citing this example is to unveil the broad applicability of the developed prediction algorithm in different parts of science and technology. In this case, the ceiling of the experimental yield (the dependent variable) is considered as the main factor, whereas \({\hbox {I}}_{\mathrm{B}}, \hbox {I}_{\mathrm{N}}, \hbox {I}_{\mathrm{D}}, \hbox {I}_{\mathrm{W}}\) and RPM (all the independent variables) are considered as the secondary factors. Table 6 shows the different instances, or observations, of the independent and dependent variables involved in this experiment.

Fig. 3 Oil agglomeration process

Applying steps 1 and 2 of the developed algorithm (Cf. Sect. 3) on the data set (shown in Table 6), the non-overlapping clusters generated by the \(k\)-means, automatic and \(c\)-means clustering algorithms are shown in Table 7. Next, to choose the suitable clustering algorithm, the DVI values of the clusters generated by the aforementioned clustering algorithms are calculated and shown in Table 1, from which it is quite clear that the \(k\)-means clustering algorithm produces the best quality clusters. Consequently, in this case, the \(k\)-means clustering algorithm has been applied to partition the data set.

Table 7 Clusters of the main and the secondary factors generated by the above-mentioned clustering algorithms

The mean, sum_deviation and the global_deviation (Cf. Sect. 3) are given as follows:

$$\begin{aligned}&\mathrm{mean}=62.625,\quad \mathrm{sum\_deviation}=2.165,\\&\mathrm{global\_deviation}=14.07. \end{aligned}$$

In each case, the \(k\)-means clustering algorithm was applied with three clusters; consequently, the data corresponding to each of the main and secondary factors were divided into 3 clusters.
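A sketch of this partitioning step with scikit-learn is given below. Relabelling the clusters in ascending order of their centres is a convenience assumption of this sketch, and the sample values merely stand in for a column of Table 6:

```python
import numpy as np
from sklearn.cluster import KMeans

def three_clusters(series):
    """Partition a 1-D factor series into 3 non-overlapping clusters with
    k-means; labels are re-ordered so that cluster 1 has the lowest centre."""
    x = np.asarray(series, dtype=float).reshape(-1, 1)
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)
    order = np.argsort(km.cluster_centers_.ravel())
    relabel = np.empty(3, dtype=int)
    relabel[order] = np.arange(1, 4)      # clusters numbered 1, 2, 3
    return relabel[km.labels_]

# Placeholder stand-in for the experimental-yield column of Table 6.
print(three_clusters([58.23, 62.5, 64.64, 63.15, 65.0, 65.25, 63.0, 60.1]))
```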

The fuzzy set representation for \(A_0\) (the linguistic variable corresponding to \(a_0\)) is as follows [Cf. Eq. (7)]:

$$\begin{aligned} A_0&= \frac{1}{a_0 }+\frac{\left( {1-\frac{1}{36}}\right) }{a_1 }+\frac{0}{a_3}+\mathop {\sum }\limits _{j=1}^5 \mathop {\sum }\limits _{i=1}^3 \frac{\left( {1-\frac{x_{j,i} }{n_{j,i} *n_p }}\right) }{b_{j,i} }\\&= \frac{1}{a_0 }+\frac{0.97}{a_1 }+\frac{0}{a_3 } \end{aligned}$$

Fuzzy set representations for the other linguistic variables of the main factor can be defined similarly. The fuzzy set representation for \(B_{1,4}\) is given as follows:

$$\begin{aligned} B_{1,4}&= \frac{0}{b_{1,0} }+\frac{0}{b_{1,1} }+\frac{0}{b_{1,2}}+\frac{0}{b_{1,3} }+\frac{1}{b_{1,4} }\\&+\frac{0}{b_{1,5}}+\frac{0}{b_{1,6} }+\frac{0}{b_{1,7} }+\frac{0}{b_{1,8}}+\frac{0}{b_{1,9}}. \end{aligned}$$

The fuzzy set representation for M(2,1) is defined as follows [Cf. Eq. (12)]:

$$\begin{aligned} \text{ M }({2,1})&= \frac{1}{a_2}+\frac{0}{a_1}+\frac{0}{a_3}\\&+\mathop {\sum }\limits _{j=1}^5 \mathop {\sum }\limits _{i=1}^3 \mathop {\sum }\limits _l \frac{g_G ({M({2,1})\_S(j)_{i,l} })}{b_{j,i} } \end{aligned}$$
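To make these graded memberships concrete, the following sketch represents a fuzzy set as a mapping from linguistic terms to grades in [0, 1]. The helper mirrors the \(1-x/(n\cdot n_p)\) pattern of Eq. (7), but its name and arguments are hypothetical, not the paper's notation:

```python
# A fuzzy set as a dict from linguistic term to membership grade in [0, 1].
# The grades mirror the expansion of A_0 above: 1 - 1/36 ~= 0.97.
A0 = {"a0": 1.0, "a1": 1 - 1 / 36, "a3": 0.0}

def graded_membership(x, n, n_p):
    """Hypothetical helper following the 1 - x/(n * n_p) pattern of Eq. (7);
    grades shrink as the element sits farther from the cluster in question."""
    return max(0.0, 1.0 - x / (n * n_p))

print(round(A0["a1"], 2))           # 0.97, as in the expansion above
print(graded_membership(1, 6, 6))   # ~0.972, an intermediate grade in (0, 1)
```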

The different fuzzified occurrences of the main and the secondary factors, as well as the possible fuzzy logical relationships, are tabulated in Tables 6 and 8.

Table 8 Fuzzy logical relationships of the data set given in Table 6

Again, from the first row of Table 6, using Rule 1 (Cf. Sect. 3), the following fuzzy logical relationship was formed:

‘If the \(1\)st element of the \(2\)nd cluster of the main factor [i.e., fuzzified value M(2,1)], \(1\)st element of the \(3\)rd cluster of the \(1\)st secondary factor [i.e., fuzzified value \(S(1)_{({3,1})}\)], \(1\)st element of the \(1\)st cluster of the \(2\)nd secondary factor [i.e., fuzzified value \(S(2)_{({1,1})}\)], \(1\)st element of the \(2\)nd cluster of the \(3\)rd secondary factor [i.e., fuzzified value \(S(3)_{({2,1})}\)], \(1\)st element of the \(3\)rd cluster of the \(4\)th secondary factor [i.e., fuzzified value \(S(4)_{({3,1})}\)], and \(1\)st element of the \(2\)nd cluster of the \(5\)th secondary factor [i.e., fuzzified value \(S(5)_{({2,1})}\)] are at stage 1, then at the next stage the fuzzified main factor will be M(3,1)’. The possible fuzzy logical relationships are tabulated and shown in Table 8 below.

Table 9 Forecasted outputs (approx.) of different prediction methods for the oil agglomeration data set

The defuzzified predicted value of M(3,1) can be calculated as follows [Cf. Eq. (13)]:

$$\begin{aligned} \mathrm{predicted}({\text{ M }({3,1})}) =\frac{1*64.64+\left\{ {\left( {1-\frac{65-63.15}{2.165}}\right) +\left( {1-\frac{65.25-63.15}{2.165}}\right) +\left( {1-\frac{63.15-63}{2.165}}\right) }\right\} *65.865}{1+\left\{ {\left( {1-\frac{65-63.15}{2.165}}\right) +\left( {1-\frac{65.25-63.15}{2.165}}\right) +\left( {1-\frac{63.15-63}{2.165}}\right) }\right\} }\approx 65.26. \end{aligned}$$

Similarly, the other predicted experimental yields (%) can be calculated; they are shown in Table 9.
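The defuzzification of Eq. (13) is a membership-weighted average, which the following sketch reproduces for M(3,1); the function name is an assumption, and the small difference from the ≈65.26 above is intermediate rounding:

```python
import numpy as np

def defuzzify(values, grades):
    """Membership-weighted average in the spirit of Eq. (13): each candidate
    value contributes in proportion to its membership grade."""
    v, g = np.asarray(values, dtype=float), np.asarray(grades, dtype=float)
    return float(np.sum(g * v) / np.sum(g))

# Reproducing the M(3,1) calculation above: one grade-1 value (64.64) and
# three graded neighbours whose common contribution value is 65.865.
sum_dev = 2.165
grades = [1.0,
          1 - (65.00 - 63.15) / sum_dev,
          1 - (65.25 - 63.15) / sum_dev,
          1 - (63.15 - 63.00) / sum_dev]
print(defuzzify([64.64, 65.865, 65.865, 65.865], grades))  # ~65.28
```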

Tables 9 and 13 show the forecasted outputs of the proposed method and of several other prediction methods [Chen and Tanuwijaya (2011), using the automatic, \(c\)-means and \(k\)-means clustering algorithms; ANN; VAR(1); MA(3); Holt-Winter; Box-Jenkins; Bulut et al. (2012); Bulut (2014); Duru (2010, 2012); Chatterjee and Roy (2014a, b)], along with their RMSE, RMdSE and MdRAE values, which establish the better efficiency and accuracy of the proposed method. The pictorial representations of these results are shown in Fig. 4. Comparing the DVI values, it has been found that the quality of the clusters generated by the \(k\)-means clustering algorithm is better than that of the other two aforementioned clustering methods. Quite interestingly, the forecasting accuracy of the Chen and Tanuwijaya (2011) method is again enhanced if the automatic clustering technique is replaced by the \(k\)-means clustering algorithm. From the above discussion, it is quite clear that the clustering technique has an influence on fuzzy time series-based forecasting methods, which corroborates the findings of Huarng (2001a). Hence, the proposed algorithm employs the \(k\)-means clustering technique to partition the data set given in Table 6 for better predictive accuracy. Moreover, the proposed forecasting algorithm considers the contributions of the different secondary factors in the defuzzified predicted occurrences of the main factor, which makes it more realistic than the other existing, extensively used fuzzy time series-based forecasting algorithms. Finally, the proposed algorithm has been compared with the ANN approach (Aladag et al. 2008) and two statistical methods, viz., VAR(1) (a multivariate time series model) and MA(3) (a univariate time series model), to establish its better predictive accuracy. The corresponding VAR(1) model is given as follows:

$$\begin{aligned} \begin{pmatrix} Y_t \\ X_{1t} \\ X_{2t} \\ X_{3t} \\ X_{4t} \\ X_{5t} \end{pmatrix}&= \begin{pmatrix} 1.94318 &{} -4.72287 &{} -2.83968 &{} -0.36636 &{} -2.60480 &{} -0.03826\\ 0.004213 &{} 0.016857 &{} 0.0467286 &{} 0.0041039 &{} 0.0004224 &{} -0.0002691\\ 0.210601 &{} -1.749276 &{} 0.014056 &{} -0.033934 &{} -0.197276 &{} -0.003480\\ 1.99407 &{} -2.80078 &{} -5.28686 &{} 0.14733 &{} -1.95601 &{} -0.03234\\ 0.81219 &{} -1.03003 &{} -1.06770 &{} -0.19284 &{} -0.77454 &{} -0.01384\\ 173.799 &{} -220.979 &{} -226.111 &{} 24.399 &{} -205.855 &{} -2.551 \end{pmatrix} \begin{pmatrix} Y_{t-1} \\ X_{1t-1} \\ X_{2t-1} \\ X_{3t-1} \\ X_{4t-1} \\ X_{5t-1} \end{pmatrix} \\&\quad +\begin{pmatrix} 97.17546\\ 3.7110951\\ 7.627039\\ 51.93884\\ 23.98482\\ 825.107 \end{pmatrix} +\begin{pmatrix} 0.27969\\ -0.0210704\\ 0.038676\\ 0.80671\\ 0.21784\\ 47.180 \end{pmatrix}, \end{aligned}$$

where \((97.17546\,\,3.7110951\,\,7.627039\,\,51.93884\,\,23.98482\,\,825.107)^{\mathrm{T}}\) and \((0.27969\,\,{-0.0210704}\,\,0.038676\,\,0.80671\,\,0.21784\,\,47.180)^{\mathrm{T}}\) are the constant and the trend components of the proposed VAR(1) model, respectively. Similarly, the MA(3) model is given as follows:

$$\begin{aligned} \text{ MA }(3):X_t&= Z_t -0.5711Z_{t-1} -0.5727Z_{t-2}\\&+\,0.9981Z_{t-3}, \end{aligned}$$

where \(\left\{ {X_t }\right\} \) is the series of the experimental yield (%) and \(\left\{ {Z_t }\right\} \) is the corresponding white noise. The white noise variance corresponding to \(\left\{ {X_t }\right\} \) is calculated as 2.964867, whereas the standard errors of the MA coefficients are 0.01408, 0.010068 and 0.01408. The corresponding AICC and BIC (Lutkepohl 2005) of the MA(3) model are calculated as 91.686132 and 84.536399, respectively.

Fig. 4 The pictorial representation of the original and the predicted values (approx.) of the oil agglomeration data set

Additionally, \(\chi _{\mathrm{computed}}^2 =1.688<\chi _{\mathrm{tabulated}}^2 =7.05\) at the 99 % confidence level shows that the proposed method is fully validated for the present example. Figure 4 presents the pictorial representations of the original and the predicted values of the different elements of the oil agglomeration data set, using the different algorithms considered in the present study, and also shows that the predictive accuracy of the proposed algorithm is better than that of all of its competitors.

4.3 An example regarding finance data forecasting

In this sub-section, the proposed algorithm is applied on a real financial data set, collected from the Ministry of Statistics and Programme Implementation, Govt. of India (http://mospi.nic.in/Mospi_New/upload/asi/mospi_asi_rate_list.pdf), to show its efficiency, accuracy and applicability to financial data forecasting. The data set contains only 18 observations, which is quite small for modeling as well as prediction; with this in mind, the entire data set has been used for both the modeling and the prediction purposes. The detailed description of the data set is given on the aforementioned web site. Here, the number of records (year-wise) is considered the main factor, whereas the number of schedules, the users in India and the users outside India are considered the \(1\)st, \(2\)nd and \(3\)rd secondary factors, respectively. For simplicity of calculation, the author has considered only three clusters for the \(k\)-means and the \(c\)-means clustering algorithms. The different clusters of the main and the secondary factors generated by the \(k\)-means, automatic and \(c\)-means clustering algorithms are tabulated in Table 10.

Table 10 Different clusters of the finance data generated by the aforementioned clustering algorithms
Table 11 Original and different predicted outcomes (approx.) of the finance data along with their RMSE, RMdSE and MdRAE

From Table 11, it can be found that, in case of the finance data set, the forecasted output of the Chen and Tanuwijaya method improves if its clustering technique (automatic clustering) is replaced by the hard \(c\)-means or \(k\)-means clustering algorithm. Specifically, Table 11 shows that the RMSE of the Chen and Tanuwijaya method improves up to 8 times (approximately) with the \(k\)-means clustering algorithm and up to 3 times (approximately) with the \(c\)-means clustering algorithm. This clearly evinces the influence of the selection of a suitable clustering algorithm on the forecasted output and corroborates the findings of Huarng and Yu (2006). Further, to check the quality of the clusters generated by the automatic clustering algorithm (Chen and Tanuwijaya 2011), the \(k\)-means clustering algorithm and the \(c\)-means clustering algorithm, shown in Table 10, the DVI indices are calculated; they clearly show that the \(k\)-means algorithm (among the aforementioned three) is the most suitable clustering technique for the finance data set. Hence, the above study strongly establishes that a poor selection of clustering strategy may hamper the forecasted outputs of fuzzy time series-based prediction algorithms. To overcome this drawback, the proposed algorithm provides the flexibility to choose a suitable clustering technique, which in turn decreases the RMSE. In this case, the proposed algorithm employs the \(k\)-means clustering algorithm due to its suitability for the data set.

The existing, extensively used fuzzy time series-based forecasting algorithms do not incorporate the influences of the secondary factors at the time of defuzzification of the main factor; this drawback is, however, removed by the proposed algorithm. Consequently, Table 11 shows the superiority of the proposed algorithm in terms of RMSE over the algorithm proposed by Chen and Tanuwijaya (2011) (using the automatic clustering method, \(k\)-means and \(c\)-means), the MA(1) and the VAR(1) models. The corresponding MA(1) model is given as follows:

$$\begin{aligned} \text{ MA }(1):X_t =Z_t +0.2996\,Z_{t-1}. \end{aligned}$$

Additionally, \(\chi _{\mathrm{computed}}^2 =24.98<\chi _{\mathrm{tabulated}}^2 =25.989\) at the 90 % confidence level shows that the proposed method is fully validated for the present example. The proposed algorithm has been compared with the algorithms proposed by Chen and Tanuwijaya (2011) (using the automatic, \(c\)-means and \(k\)-means clustering algorithms), MA(1), Holt-Winter, Box-Jenkins, Bulut et al. (2012), Bulut (2014), Duru (2010, 2012) and Chatterjee and Roy (2014a, b), and the corresponding results are given in Table 11. The pictorial representation of these results, shown in Fig. 5, confirms the better predictive accuracy of the proposed algorithm.

4.4 Comparison with the ‘traditional four step’ algorithm

The membership values of the ‘traditional four step’ algorithms (Aladag et al. 2008; Bulut et al. 2012; Bulut 2014; Chen et al. 2013; Chen and Tanuwijaya 2011; Chen 1996; Duru 2010, 2012; Duru and Bulut 2014; Dunn 1973; Huarng 2001a, b; Huarng and Yu 2005, 2006; Mamdani 1977; Ross 2010; Song and Chissom 1993a, b, 1994; Tanaka 1996; Tseng et al. 2001; Zadeh 1975) can only be 0, 0.5 and 1. On the other hand, the membership values in the proposed algorithm can be any real number lying in the interval [0, 1], not just 0, 0.5 and 1. It is to be remembered that most of the existing ‘traditional four step’ algorithms can only be used with static-length intervals. Quite on the contrary, the developed algorithm is capable of handling both static and variable-sized, overlapping as well as non-overlapping, intervals. Moreover, almost all the existing ‘traditional four step’ algorithms consider only the effects of the immediately previous and next elements of a particular point, whereas the developed algorithm can consider the effects of all the elements present in the data set when predicting a particular element. This feature makes the developed algorithm more flexible and also superior to the ‘traditional four step’ algorithms. Apart from this, the developed algorithm can take care of both stationary as well as non-stationary data sets, which the other ‘traditional four step’ algorithms cannot, as illustrated by the sketch below. Consequently, the predictive accuracy of the proposed algorithm increases.
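A hedged sketch of such a stationarity check follows. The paper does not name the specific test it applies, so the augmented Dickey-Fuller test and the simple differencing loop below are assumptions of this sketch:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def make_stationary(series, alpha=0.05, max_diff=2):
    """Difference a series until the augmented Dickey-Fuller test rejects
    the unit-root null, i.e. until the series looks stationary; returns
    the (possibly differenced) series and the number of differences taken."""
    x = np.asarray(series, dtype=float)
    for d in range(max_diff + 1):
        if adfuller(x)[1] < alpha:   # element [1] is the ADF p-value
            return x, d
        x = np.diff(x)               # remove trend/random-walk behaviour
    return x, max_diff
```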

From the foregoing analysis and discussion of the algorithm implementation test results, it can safely be concluded that the developed algorithm is not only accurate but also superior to the other existing algorithms.

4.5 Comparison with some other well-known, recently developed fuzzy time series-based algorithms

This sub-section presents a comparative study of the proposed algorithm with some well-known, recently developed fuzzy time series-based forecasting algorithms, as follows:

A large number of recently developed, well-known fuzzy time series-based forecasting algorithms are available in the literature (Bulut 2014; Bulut et al. 2012; Duru 2010, 2012; Chatterjee and Roy 2014a, b). The work of Duru (2010) suffers from equi-spaced, fixed-size intervals. Moreover, for the partitioning purpose Duru (2010) developed his own clustering strategy, which may not be able to generate the best quality clusters for different types of data sets. Hence, the predictive accuracy of that algorithm is affected, as the lengths of the intervals have an influence on the predictive accuracy (Huarng 2001a); consequently, the algorithm becomes data-dependent. Again, at the time of forecasting the main factor, the contributions of the secondary factors are not considered, and the membership values are only 0, 0.5 and 1. Further, this algorithm is not able to judge the stationarity of the data set, i.e., even if the data set is non-stationary the prediction mechanism remains the same, which can be considered a major drawback. Later, Duru (2012) developed a fuzzy integrated logical forecasting model for dry bulk shipping index forecasting, in which, again, the membership values can only be 0, 0.5 and 1, the sizes of the intervals are fixed and the stationarity of the data set is not checked.

In their extensive study, Bulut et al. (2012) developed a fuzzy integrated logical forecasting (FILF) model of time charter rates in dry bulk shipping, which is mainly a vector autoregressive design of fuzzy time series with the fuzzy \(c\)-means clustering algorithm. But this approach is again data-dependent, as the fuzzy \(c\)-means clustering algorithm may not be able to partition every data set into best quality clusters; this can be verified with the help of the corresponding DVI values (Dunn 1973). Later, Bulut (2014) modified this approach by modeling seasonality within the FILF framework, which is, however, not free from the aforementioned drawbacks.

Fig. 5 The pictorial representation of the original and the predicted values (approx.) of the finance data set

Table 12 The occurrences of different frequently occurring error codes from 30/9/2010 to 22/10/2011, along with their positions in the respective clusters generated by the \(k\)-means clustering algorithm

In some recent studies, Chatterjee and Roy (2014a, b) developed two novel fuzzy time series-based forecasting algorithms, which are, however, not free from certain important drawbacks in their modeling techniques. The first major drawback is the inability to judge whether the data set is stationary or non-stationary. The non-stationary behaviours can be trends, cycles, random walks or combinations of the three. Non-stationary data, as a rule, are unpredictable and cannot be modeled or forecasted (Lutkepohl 2005); the results obtained using non-stationary time series may be spurious, in that they may indicate a relationship between two variables where one does not exist (Lutkepohl 2005). As a consequence, the predictive accuracy of these algorithms (Chatterjee and Roy 2014a, b) reduces. Another major drawback is that, in both cases, the authors developed their own clustering algorithms, which may not be able to generate best quality clusters for all types of data sets (clearly shown in Table 1); as a consequence, the predictive accuracy of the forecasting algorithms decreases (Huarng 2001a, b). Moreover, developing one's own clustering algorithm makes the corresponding fuzzy time series-based forecasting algorithm data-dependent, since no single clustering algorithm is suitable for, or able to generate best quality clusters from, all types of data sets. With this in mind, in the present paper, first the suitable clustering algorithm for the data set is chosen and then the proposed forecasting algorithm is applied. Again, the algorithm proposed by Chatterjee and Roy (2014b) does not incorporate the influences of the secondary factors on the main factor at the time of defuzzification, which can be considered a severe drawback in the modeling technique; this drawback has been removed in the proposed algorithm.

Additionally, the algorithms proposed by Chatterjee and Roy (2014a, b) have presented both sample and post-sample period results to investigate estimation accuracy and forecasting accuracy, respectively. However, any forecasting method must confirm that the post-sample period is not used for clustering the data set: if the aforementioned algorithms (Bulut 2014; Bulut et al. 2012; Duru 2010, 2012; Chatterjee and Roy 2014a, b) are to forecast the unknown future, the clusters should not be estimated using the test period, since those values are assumed to be unknown future values; otherwise, the methods may not contribute to business practice. The proposed forecasting method removes the above-mentioned drawbacks, making it a very powerful tool for forecasting. Apart from this, the proposed algorithm has better predictive accuracy than all the aforementioned fuzzy time series-based forecasting algorithms; this can easily be found from Tables 12, 13 and 14, as in each case the proposed algorithm has the least RMSE, RMdSE and MdRAE values (Hyndman 2006). From the above study, it is quite clear that the proposed algorithm is not only capable of removing all the drawbacks of the existing fuzzy time series-based forecasting algorithms, but can also correctly incorporate the influences of the different secondary factors on the main factor. As a result, the predictive accuracy of the proposed algorithm increases and the modeling becomes more realistic.

Table 13 The original and the predicted values (approx.) of the oil agglomeration data set using the proposed and different fuzzy time series based approaches
Table 14 The original and the predicted values (approx.) of the finance data set using the proposed and different fuzzy time series-based approaches

From Tables 12, 13 and 14, it can be found that the predictive accuracy of the proposed algorithm is the highest, and that of the algorithm developed by Chatterjee and Roy (2014a) remains in the second position, for all the data sets used in the present study.

Apart from this, some more differences (regarding the modeling technique) between the proposed algorithm and the algorithms proposed by Chatterjee and Roy (2014a, b) are given as follows:

(i) The function defined in prediction Rule 1 [Eq. (10)] of the proposed algorithm is more realistic than that of Chatterjee and Roy (2014a, b). This is because, in the former case, the absolute distance between the two points (main and secondary factors) has been considered, and the rules of prediction have been modified accordingly for the main as well as the secondary factors, whereas, in the latter cases, only the number of points whose distances are greater than sum_deviation is considered. Consequently, the predictive accuracy of the proposed model increases. Moreover, the algorithm developed by Chatterjee and Roy (2014b) is not able to consider the influences of the different secondary factors at the time of defuzzification of the main factor.

(ii) In case of the algorithm proposed by Chatterjee and Roy (2014a), the accuracy_factor has to be chosen based on expert judgment alone, as no hard and fast rule has been given for it. Hence, if the accuracy_factor \(\in (0,\left| {\frac{\mathrm{mean\_distance}}{2}} \right| ]\subset {\mathbb {R}}\) is perfectly chosen, the predictive accuracy increases; otherwise, it decreases. But it is not always possible to choose the correct accuracy_factor and, as a consequence, the result deteriorates. Keeping this in mind, in the present paper, the author has removed this concept. For example, the mean_distance (Chatterjee and Roy 2014a) of the web error data set given in Table 2 is 792. Hence, by Chatterjee and Roy (2014a),

$$\begin{aligned}&0<\mathrm{accuracy\_factor}\le \left| {\frac{\mathrm{mean\_distance}}{2}} \right| ,\\&\mathrm{i.e.,}\quad 0<\mathrm{accuracy\_factor}\le \left| {\frac{792}{2}} \right| ,\\&\mathrm{i.e.,}\quad 0<\mathrm{accuracy\_factor}\le 396. \end{aligned}$$

Hence, accuracy_factor \(\in (0,396]\subset {\mathbb {R}}\), i.e., the accuracy_factor can be any real number among the infinitely many real numbers lying between 0 and 396, and picking the right one is one of the most difficult tasks. Choosing the accuracy_factor is thus the biggest challenge in case of the algorithm proposed by Chatterjee and Roy (2014a); keeping this in mind, the concept has been removed from the present forecasting algorithm.

(iii) In the modern, fast and competitive world, every algorithm needs both accuracy and low computational complexity, i.e., faster execution, since people and industries will choose the algorithm having better accuracy and lesser execution time. Hence, the proposed algorithm has also been compared on the ground of computational complexity (Knuth 1973) with the algorithms proposed by Chatterjee and Roy (2014a, b).

The computational complexity of the algorithm proposed by Chatterjee and Roy (2014a) can be calculated as \(({C*\theta (s)+D*M(s)});C,D\in {{\mathbb {Z}}}^+\), excluding the calculation of the accuracy_factor. If that calculation is included, the complexity increases heavily. This is because, every time, an accuracy_factor has to be selected from the set \((0,\left| {\frac{\mathrm{mean\_distance}}{2}} \right| ]\), which contains infinitely many elements, and that value is then used for prediction; continuing this process, infinitely many accuracy_factors can be tried, along with infinitely many predicted values, and the accuracy_factor corresponding to the best predicted data (having the least RMSE, RMdSE and MdRAE) is finally selected. This may involve an unbounded number of comparisons and therefore increases the complexity of the algorithm greatly. Quite on the contrary, the computational complexity of the proposed algorithm, if the \(k\)-means clustering algorithm is adopted (it is found to be the best in partitioning the experimental data sets, which can be checked by comparing the DVI values (Dunn 1973) of the generated clusters), is at most \(({O({nkdi})+D*M(s)});n,k,d,i,D\in {{\mathbb {Z}}}^+\), where \(M(s)\) is the complexity of the chosen multiplication algorithm when the inputs are two \(s\)-digit numbers, \(n\) is the number of \(d\)-dimensional vectors, \(k\) the number of clusters and \(i\) the number of iterations needed until convergence (Knuth 1973). This bound will change if a different clustering algorithm is adopted, but the complexity still remains less than that of the algorithm proposed by Chatterjee and Roy (2014a), as no clustering algorithm involves an infinite number of comparisons.

On the other hand, the computational complexity of the algorithm proposed by Chatterjee and Roy (2014b) is \(({C*M(s)*M(s)*\theta (s)+D*M(s)});C,D\in {{\mathbb {Z}}}^+\), which is greater than that of the proposed algorithm. This increment is because the clustering algorithm adopted there implements the concept of the Mahalanobis distance (Chatterjee and Roy 2014b). Hence, the computational complexity of the proposed algorithm is less than that of the algorithm proposed by Chatterjee and Roy (2014b). However, the complexity of the algorithms proposed by Chatterjee and Roy (2014a, b) is less than that of the algorithm proposed by Chen and Tanuwijaya (2011).

(iv) The proposed algorithm is easier to implement than the algorithms proposed by Chatterjee and Roy (2014a, b).

4.6 Analysis of the residuals

This sub-section showcases the analysis of the residuals of the proposed multivariate fuzzy forecasting algorithm. The residual of an observed value is the difference between the observed value and the estimated function value. The main motive of this analysis is to confirm that the remaining residuals do not follow any specific pattern, so that they can be considered white noise (Lutkepohl 2005). Figure 6 confirms that the residuals of the web error data set for the proposed algorithm do not exhibit any pattern and, as a consequence, can be considered white noise. Similarly, Fig. 7 confirms that the residuals of the oil agglomeration data set (given in Table 13) do not follow any pattern and hence can be considered white noise. In a similar manner, from Fig. 8, it can be found that the residuals of the finance data set for the proposed algorithm can also be considered white noise, as they do not follow any pattern.
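The figures make the white-noise case visually; numerically, one standard check (an assumption of this sketch, not the paper's stated procedure) is the Ljung-Box test on the residual series:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Placeholder residuals (observed minus predicted values of one data set).
residuals = np.array([3.0, -16.0, 15.0, -30.0, 12.0, -5.0, 8.0, -2.0,
                      6.0, -9.0, 11.0, -4.0])

lb = acorr_ljungbox(residuals, lags=[4], return_df=True)
print(lb)   # a large lb_pvalue means no significant autocorrelation,
            # i.e. the residuals are consistent with white noise
```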

Fig. 6 The rescaled residuals of the web error data set

Fig. 7 The rescaled residuals of the oil agglomeration data set

Fig. 8 The rescaled residuals of the finance data set

5 Conclusion

The present paper demonstrates a novel multivariate fuzzy time series-based forecasting algorithm that is able to remove the drawbacks of the previously developed fuzzy time series-based techniques. Initially, the proposed algorithm checks the stationarity of the data set: if the data set is stationary, the algorithm continues with its different steps; otherwise, it first removes the non-stationarity of the data set and then continues. Again, this novel algorithm can generate variable-sized clusters or intervals by applying a suitable clustering algorithm, assign membership values that can be any real number lying between 0 and 1, and incorporate the effects of the different secondary factors in the defuzzification process; these are considered the important findings arising out of this work. Moreover, the developed algorithm shows better predictive accuracy. For testing purposes, the developed algorithm was applied on three different domains, viz., the oil agglomeration process for the beneficiation of coal fines (a coal washing technique), the prediction of frequently occurring web errors (a burning topic related to web technology) and financial data forecasting, which manifests its applicability over broad domains like the coal industries, web technology and finance. The real data set related to the oil agglomeration for the beneficiation of coal fines was collected from CIMFER, Dhanbad, India (a CSIR Lab, run by the Govt. of India); that regarding the frequently occurring web error codes of www.ismdhanbad.ac.in, the official website of ISM Dhanbad, was collected from the server of the Indian School of Mines, Dhanbad, India; and the remaining data set was collected from the Ministry of Statistics and Programme Implementation, Govt. of India. The proposed forecasting method was compared with thirteen different conventional (univariate and multivariate) forecasting algorithms, e.g., VAR, MA, Holt–Winter and Box–Jenkins (Lutkepohl 2005), and fuzzy time series-based forecasting algorithms, viz., Bulut et al. (2012), Bulut (2014), Duru (2010, 2012), Chatterjee and Roy (2014a, b) and Chen and Tanuwijaya (2011) (also with its clustering algorithm replaced by the \(c\)-means and \(k\)-means techniques, respectively). Moreover, the accuracy of the proposed algorithm has also been compared with that of the ANN approach (Aladag et al. 2008). In every case, the proposed algorithm proves its efficiency and better predictive accuracy. Hence, from the above study, it is quite clear that the proposed algorithm can be applied over a large domain for more accurate forecasting.