1 Introduction

Fuzzy time series procedures do not require the assumptions, such as a large sample, linear model, and stationary and normal distribution, typically needed by classic time series approaches such as autoregressive moving average models [1]. Recent studies have mainly focused on fuzzy time series since they do not require the strict assumptions and generally provide remarkable forecasting performance. Besides, artificial intelligence optimization algorithms such as genetic algorithm (GA) have been used in almost all areas [218].

The fuzzy set was first introduced in [19] and since then has been used in many application areas. Fuzzy time series were first introduced by Song and Chissom [2022]. The proposed fuzzy time series techniques generally consist of three stages: fuzzification, determining fuzzy relations, and defuzzification.

According to the literature, decomposition of the universe of discourse is mostly used in the fuzzification stage with intervals thereof determined arbitrarily in the studies by Song and Chissom [2022] and Chen [23, 24]. In addition, Huarng [25] suggested the importance of the interval length on forecasting performance and proposed two new techniques for finding intervals based on the mean and distribution.

Egrioglu et al. [26, 27] suggested considering the problem of finding intervals as an optimization problem. The authors in [2830] used different interval lengths, instead of a fixed interval length, obtained by using a genetic algorithm, while those in [3137] used particle swarm optimization. Furthermore, the authors in [3840] used fuzzy c-means clustering, while those in [41, 42] used Gustafson-Kessel fuzzy clustering in this stage.

With respect to determining fuzzy relations, several studies have made contributions. Whereas Song and Chissom [2022] used matrix operations, Chen [23] and some others used a fuzzy logic relations group (FLRG) table, while yet others [4347] used artificial neural networks to determine fuzzy relations. Various other methods focusing on this stage have also been proposed [4446, 4850].

For the defuzzification process, most reported studies use either the centroid method [23, 25, 51] or the adaptive expectation method [38, 50].

The proposed method contributes to the fuzzification stage by using a modified genetic algorithm (MGA). In particular, we aim to check the negative effects of the mutation operation so as to obtain better forecasts. Moreover, by using MGA we avoid subjective judgments in determining the intervals, and to obtain more realistic results, we find dynamic instead of fixed length intervals.

An outline of the rest of the paper is given below. The fundamental definitions of fuzzy time series are given in Section 2. The third section of the article reviews GAs, while a concise explanation of the differential evaluation algorithm (DEA) is given in Section 4. In Section 5, the proposed method is introduced. Section 6 presents the results of applying the proposed method to three real life datasets and finally, Section 7 presents our conclusions.

2 Fuzzy time series

A definition of fuzzy time series was first introduced in [20, 21]. A number of studies in the literature have used fuzzy time series together with GAs. These include a method to optimize fuzzy time series using GAs [52], an efficient non-linear time series prediction system using a GA and fuzzy time series [53], a forecasting model using a GA with fuzzy time series [54], and a time invariant fuzzy time series forecasting method based on a GA [55]. The authors in [30, 54] also used a GA in the fuzzification stage, which is the first stage of fuzzy time series.

In contrast to conventional time series methods, various theoretical assumptions do not need to be checked in fuzzy time series approaches. The most important advantage of the fuzzy time series approaches is their ability to work with a very small set of data.

Let U be the universe of discourse, where U = {u 1, u 2, ⋯ , u n }. A fuzzy set A i of U can be defined as

$$A_i =\frac{\mu_{A_i } ( {u_1 })}{u_1 }+\frac{\mu_{A_i } ( {u_2 } )}{u_2 }+\cdots +\frac{\mu_{A_i } ( {u_n } )}{u_n }, $$
(1)

where μ A i is the membership function of fuzzy set A i and μ A i ; U → [0, 1]. In addition, μ A i (u j ), j = 1, 2, ⋯ , n denotes a generic element of fuzzy set A i , andμ A i (u j ) is the degree of belongingness of u j to A i , withμ A i (u j )∈[0, 1].

Definition 1

Fuzzy time series

Let Y(t)(t = ⋯ , 0, 1, 2, ⋯), a subset of real numbers, be the universe of discourse by which fuzzy sets f i (t) are defined. If F(t) is a collection of f 1(t), f 2(t), ⋯, then F(t) is called a fuzzy time series defined on Y(t).

Definition 2

Fuzzy time series relationships

Assume that F(t) is caused only by F(t − 1); then the relationship can be expressed as F(t) = F(t − 1)∗R(t, t − 1), which is the fuzzy relationship between F(t) and F(t − 1), where * denotes an operator. To summarize, let F(t − 1) = A i and F(t) = A j . The fuzzy logical relationship between F(t) and F(t − 1) can be denoted as A i A j , where A i (current state) refers to the left-hand side and A j (next state) refers to the right-hand side of the fuzzy logical relationship. Furthermore, these fuzzy logical relationships can be grouped to establish different fuzzy relationships.

3 Genetic algorithms

GAs were proposed by Holland [56] and developed by Goldberg [57]. A GA is a stochastic global search technique that solves problems by imitating processes observed during natural evolution. The GA procedure is a simulation that depends on biological evolution behavior. The first decision to make when solving a problem using a GA is the encoding. There are many types of encoding in GAs including binary, value, and permutation encoding. First, based on the population size an initial population is randomly generated after determining the encoding and GA operators, namely, crossover, mutation and natural selection, to be applied to the population. For the crossover operation, the researcher determines the crossover rate (cor). Then, a random number is generated with the help of uniform distribution U (0, 1). If cor is greater than or equal to the random number, the crossover operation is performed by randomly choosing both a pair of chromosomes and a crossover point and then swapping the genes of the selectedchromosomes.

After the crossover operation, the mutation operator is applied if needed. First, a chromosome is randomly selected and then a mutation point is determined. A gene of the selected chromosome is changed by considering the encoding. Thereafter, the researcher determines the mutation rate (mr) and a random number is generated with the help of uniform distribution U (0, 1). If this number is smaller than or equal to mr, the mutation operation is performed. In the natural selection operation, each chromosome of the population is evaluated using an evaluation function. All chromosomes are ordered according to their corresponding evaluation function values and the best chromosomes are transferred to the next generation. Some of the worst chromosomes are discarded from the population. The natural selection operation may be used at either the beginning or the end of the algorithm.

4 Differential evaluation algorithm

The DEA, proposed by Storn and Price [58], is a heuristic algorithm based on the population like the GA. It has some operators including mutation and crossover, which are used to create new generations. At the end of the process, candidate solutions are found by using some mathematical operations and these solutions are compared with the current solutions in the population. The best solution based on the evaluation function is transferred to the new generation, while the best chromosome is taken as the optimal solution. For more details of this algorithm, consult the study in [58].

5 Proposed method

As is well known, all the stages of fuzzy time series approaches have a marked influence on the forecasting performance of the applied model. Recently, artificial intelligence algorithms have frequently been used in these stages. GAs are the most popular of these algorithms because of their operators and rapid evaluation process. The crossover and mutation operators prevent achieving sub-optimal solutions. The most important GA operator is mutation, which ensures that solutions are found in as yet unsearched areas.

In classic GAs, the only approach to counteract the loss of relevant alleles, thereby avoiding premature convergence, is mutation. However, as mutation is used as an undirected operator, the probability to recover relevant alleles by lucky mutations decreases rapidly when trying to solve larger problems. In addition to mutation, various other ideas to reduce the negative effects of premature convergence have also been discussed in the literature. Among these, the most common are preselection [59], crowding [60], and fitness-sharing [57]. The main idea of these approaches is to maintain genetic diversity by replacing solutions occupying similar regions of the search space more frequently (preselection, crowding) or to reduce the fitness value of solutions that are located in densely populated regions of the search space (fitness-sharing). All three approaches require the definition of a distance measure to calculate the similarity of solutions in the search space, while fitness-sharing is additionally quite restricted to fitness proportional selection. As such, these approaches are not applicable in all cases. Furthermore, they do not really address the problem of losing relevant alleles, but instead try to reduce the loss of genetic diversity in general [61].

Chromosomes represent the solution sets in GAs. When applying the mutation operator in a GA, a chromosome is first randomly selected and then a random gene of this selected chromosome is changed. However, the changed gene may be a gene that provides a positive contribution to the solution set. In other words, this gene may be a useful gene for the chromosome. Therefore, changing this gene may have a negative contribution. To avoid changing useful genes of the chromosomes, and thus eliminating a good solution, we propose a new mutation algorithm for fuzzy time series.

As explained above, applying the mutation operator can be both beneficial and harmful in the GA process. It may alter a harmful gene of a chromosome, but it could also alter a reliable gene thereby moving further from the optimal solution. Thus, to control the effect of the mutation operator we propose a new GA called MGA. In this method, the mutation operator of the GA is inspired by the DEA.

In this study, by combining two population-based artificial intelligence techniques in the fuzzification stage in the form of MGA, we aim to improve the forecasting performance of the model and to determine the interval lengths without resorting to subjective decisions. From this perspective we feel that a more flexible solution process is provided by obtaining dynamic instead of fixed length intervals.

The advantages of our proposed method are as follows:

  • ✓ Negative consequences caused by the mutation operator are prevented.

  • ✓ More realistic results are obtained by finding dynamic length intervals instead of fixed length ones.

  • ✓ The interval lengths are determined by avoiding subjective decisions through the use of the MGA.

The modified genetic algorithm

  1. Step 1.

    Define the universe of discourse and number of intervals.

    The margins of the universe of discourse and the actual universe are first defined as (D min ) and (D max ) and After the number of intervals (m) has been determined, the number of genes in a chromosome in MGA is defined as (m − 1). These are represented by x i (i = 1, 2, ⋯m − 1); the structure of a chromosome in MGA is illustrated in Fig. 1.

    Then the margins of all intervals are given by

    $$u_1 = [{{D}_{\textit{min} } , x_1 } ], u_m = [ {{x}_{m-1} , D_{\textit{max}}} ].$$
    (2)
  2. Step 2.

    Determine the parameters of MGA and generate the initial population.

    This step determines the values of the parameters such as the number of chromosomes (cn), cor (0 < c o r ≤ 1), the number of chromosomes to be eliminated in the natural selection stage (dcn), and the number of iterations (itr).

    Then, the initial population with (m − 1) genes and cn chromosomes is randomly generated from the uniform distribution with parameters (D min , D max ). The generated data for each chromosome is ordered from smallest to largest.

  3. Step 3.

    For each chromosome in the population, the root of the mean squared error (RMSE) selected as the evaluation function is calculated by applying steps 3.1 to 3.5.

  4. Step 3.1.

    Form fuzzy sets using m intervals based on the values of the genes in the chromosomes as follows:

    $$A_i =\frac{a_{i1} }{u_1 }+\frac{a_{i2} }{u_2 }+\cdots +\frac{a_{im} }{u_m } i=1, 2, \cdots m, $$
    (3)

    where a i k is the membership degree as defined in Eq. (4).

    $$a_{ik} =\left\{\begin{array}{ll} 1 & { k=i} \\ {0.5} & {, k=i-1, i+1, i=1, 2, \cdots m} \\ 0 & {, d.d} \\ \end{array}\right.$$
    (4)

    The observations of a crisp time series are converted to fuzzy sets where the interval in which the corresponding observation is included, has the highest membership values.

  5. Step 3.2.

    Obtain the FLRs and FLRG tables.

    When we observe a relation such as F(t − 1) = A i and F(t) = A j for any time t, this fuzzy logic relation (FLR) is represented by A i A j . For the whole series, if we obtain the relation F(t − 1) = A i and F(t) = A k for any time t, we express the FLR as A i A j , A k . We can also save the number of times an FLR such as A i A j occurs, as the weight w j .

  6. Step 3.3.

    Obtain fuzzy forecasts.

    Fuzzy forecasts are obtained with respect to the FLR table formed in Step 3.2. For example; if F(t − 1) = A i and there is a relation such as A i A j in the FLR table, the fuzzy forecast will be A j . If F(t − 1) = A i and there is a relation such as A i A j , A k in the FLR table, the fuzzy forecast will be A j , A k . If F(t − 1) = A i and A i E m p t y in the FLR table, the fuzzy forecast will be A i .

  7. Step 3.4.

    Defuzzify the fuzzy forecasts.

    The weights w j obtained from the FLR table are used in the defuzzification stage.

    For example, if F(t − 1) = A i and there exists relation A i A j in the table, the defuzzified forecast will be m j , which is the midpoint of u j , which is the subinterval of fuzzy set A j with the largest membership degree. That is, we do not consider how many times the relation is repeated in the table.

    If F(t − 1) = A i and there exists relation A i A j , A k and w j denotes the number of times relation A i A j is repeated in the whole time series and w k denotes the number of times relation A i A k is repeated, the defuzzified forecast is calculated as

    $$\Hat{{x}}_t =\frac{w_j m_j +w_k m_k }{w_j +w_k }.$$
    (5)

    If F(t − 1) = A i and there exists relation A i E m p t y in the FLR table, the defuzzified forecast will be m i , which is the midpoint of subinterval u i , which is the fuzzy set A i with the largest membership degree.

  8. Step 3.5.

    Let x t be the original time series and \(\Hat {{x}}_t\) be its defuzzyfied forecasts with n observations. RMSE is calculated as

    $$RMSE=\sqrt{\frac{\sum_{t=1}^n \left( {x_t -\Hat{{x}}_t } \right)^2}{n}}.$$
    (6)
  1. Step 4.

    Save the chromosome with the minimum RMSE value in the population.

  2. Step 5.

    Apply the MGA parameters.

Natural selection, crossover, and mutation operations are applied.

  1. Step 5.1.

    Initially, the first dcn chromosomes in the population are removed from the list of the ordered RMSE values from largest to smallest. Then, dcn new chromosomes, generated randomly as in Step 2, are added to the population.

  2. Step 5.2.

    To determine whether the crossover operation should be applied, a number is randomly generated from the uniform distribution with parameters (0, 1). If the number is smaller than cor, the crossover operation is applied. When applying the crossover operation two chromosomes are randomly chosen from the population. The crossover point at which the genes are swapped is also randomly determined. The crossover operation is illustrated in Fig. 2.

    To better understand the crossover operation, consider the example in Fig. 3 with the universe of discourse U = 13000, 20000 and assuming that two random chromosomes in the population have been chosen.

    If there is a problem after crossover, such as the first chromosome in Fig. 4 not being sorted, a repair operator must be used to sort the chromosomes in ascending order

  3. Step 5.3.

    After applying the crossover operator, mutation is performed. First, the chromosome to be mutated is selected randomly. Then three chromosomes, differing from the chromosome to be mutated, are selected randomly. The first two chromosomes are subtracted from each other to form the difference vector. Then the difference vector is multiplied by parameter F (this parameter takes values between 0 and 2, but is generally set to 0.8). This new chromosome, called the weighted difference vector is summed with the third chromosome to obtain the total vector. Next, the mutation rate (mr) is determined and a random number is generated between 0 and 1 with the help of the uniform distribution.

Fig. 1
figure 1

The structure of a chromosome in MGA

Fig. 2
figure 2

The crossover operation between two chromosomes

Fig. 3
figure 3

Example of the crossover operation

Fig. 4
figure 4

The chromosomes after applying the repair operator

If this random number is smaller than mr, the gene is taken from the total vector. If not, the gene is taken from the interested chromosome and a candidate chromosome is generated and its fitness value calculated.

The candidate chromosome and interested chromosome are compared in terms of fitness values. The chromosome with the smaller RMSEvalue, used to evaluate the function, is transferred to the population. The mutation operation is illustrated in Fig. 5.

Fig. 5
figure 5

Example of the mutation operation

Assume a population with seven chromosomes and universe of discourse U = [13000, 2000] as given in Table 1.

Table 1 Example of a random population

To better understand the mutation operation, consider the example given in Fig. 5 assuming that Chromosome 2 is the chromosome to be mutated and Chromosomes 6, 7, and 1 are the chromosomes randomly selected in addition to Chromosome 2.

figure a

Let the mutation ratio be 0.10, and then we generate random numbers for each gene, (0.01, 0.08, 0.15, 0.20, 0.16), respectively, to obtain the candidate chromosome.

Finally, we calculate the RMSE value of the candidate chromosome. If its RMSE value is smaller than that of Chromosome 2, the new chromosome is taken as the candidate chromosome and the mutation operation is completed. If not, there is no need to apply the mutation operation to the population, because this mutated chromosome may be the optimal solution at the end of the process.

Then we can obtain the intervals given below:

$${\begin{array}{ll} {u_1 =\left[ {13000, 13560} \right], u_2 =\left[ {13560, 14840} \right], } \\ {u_3 =\left[ {14840, 15200} \right], u_4 =\left[ {15200, 15600} \right], } \\ {u_5 =\left[ {15600, 16000} \right], u_6 =\left[ {16000, 17100} \right], } \\ {u_7 =\left[ {17100, 20000} \right]} \\ \end{array} }$$
  1. Step 6.

    Steps 3 to 5 are repeated as many times as defined by the iteration number, which was previously determined by the researcher.

6 Application

To verify the performance of the proposed method, we applied it to three different time series datasets. The results are compared with those from other fuzzy time series methods in terms of RMSE and mean absolute percent error (MAPE).

For each of the time series datasets, MGA parameters were set as follows:

  • cn was varied between 20 and 100 in increments of 10.

  • corwas varied between 0.1 and 1 in increments of 0.1.

  • For each chromosome, dcn was set to 7, 10, 13, 17, 20, 23, 26, 30, and 33.

  • For all possible cases derived from the above, MGA was executed 100 times in MATLAB.

  • m was varied from 5 to 20.

At the end of the process, 1440 different solutions were obtained. The parameters (m, cn, cor, dcn) with the smallest RMSE value were taken as the optimal solution of all these solutions.

6.1 Enrollment data

The performance of the proposed method was evaluated separately for test and training datasets of enrollment data. First all data were used as the training set like almost all studies in the literature and then the last three observations of the enrollment data were used for the test set. Results obtained by the proposed method were compared with those of other studies, clearly showing that our proposed method has superior forecasting performance.

As the first experiment, the proposed method was applied to the training dataset. The best result was obtained with m = 17, c n = 50, c o r = 0.01, and d c n = 17. Table 2 presents a comparison of the forecasts together with RMSE and MAPE values obtained from the proposed method and various other methods from the literature. The results from both belong to the best case. The graph of the forecasts of proposed method and real data is given in Fig. 6.

Table 2 Comparative presentation of enrollment forecasts for training set
Fig. 6
figure 6

The forecasts and actual values of enrollment data for training set

Next, a comparison of enrollment forecasts in terms of RMSE and MAPE values for various methods is given in Table 3.

Table 3 Comparison of enrollment forecasts for training set in terms of RMSE and MAPE

Finally, a comparison of enrollment forecasts in terms of RMSE values for various methods is given in Table 4.

Table 4 Comparison of enrollment forecasts for training set in terms of RMSE

As the second experiment, the proposed method was applied to the test dataset. The best result was obtained with m = 12, c o r = 60, m r = 0.05, and d c n = 20. Table 5 presents a comparison of all the RMSE values obtained from the proposed method and various other methods (Fig. 6).

Table 5 Comparison of the results for test set

In addition, the results obtained from conventional time series forecasting methods are given in Table 6.

Table 6 Results obtained from conventional time series forecasting methods

6.2 Taifex data

In the second case, the proposed method was applied to TAIFEX data, with observations between 03.08.1998 and 30.09.1998. The last 16 observations were used as the test dataset. The best result was obtained with m = 11, cn = 60, cor = 0.05, and dcn = 20. Table 7 gives a comparison of the forecasts as well as RMSE and MAPE values obtained from the proposed method and various other methods from the literature. The results from both belong to the best case.

Table 7 Comparison of the results for test set

6.3 Data from vehicle road accidents in Belgium

In the third case, the proposed method was applied to time series data for ‘killed in car road accidents in Belgium’. The best result was obtained with m = 18, cn = 70, cor = 0.05, and dcn = 23. Table 8 shows a comparison of all the results, including forecasts and RMSE and MAPE values, obtained from the proposed method and various other methods. The results from both belong to the best case.

Table 8 Comparison of “killed” forecasts by various methods and the actual observations

7 Conclusion

Researchers have recently found that there are many factors affecting forecasting performance of various fuzzy time series approaches. Although the GA is a popular optimization algorithm owing to its process, the mutation operator can cause unexpected solutions in the population. Thus, we proposed a modified genetic algorithm to avoid any unexpected solutions.

In this study, we used the MGA in the fuzzification stage, combining two-population based algorithms, DEA and GA, to control the harmful effects of the mutation operator. By avoiding the harmful effects of the mutation operation, superior forecasting performance is obtained with the proposed method.

We expect that in future studies researchers will concentrate on different optimization techniques in the defuzzification stage.