1 Introduction

Intelligent models that combine artificial neural networks with concepts from fuzzy systems are widely useful for classification, regression, and time series prediction. Fuzzy neural networks use the structure of an artificial neural network in which the classical artificial neurons are replaced by fuzzy neurons [16]. A key advantage is their transparency, which allows a priori information to be used to define the initial structure of the network and relevant information to be extracted from the resulting topology. The neural network can thus be seen as a linguistic system with a level of interpretability, while preserving the learning capacity of an ANN. Its fuzzy neurons are built from triangular norms, which generalize the union and intersection operations of classical sets to the theory of fuzzy sets.

Examples of fuzzy neurons include and-neurons, or-neurons, nullneurons, and unineurons, which differ in the way inputs and weights are aggregated [8]. Extreme learning machine (ELM) theory [9] has been used to train fuzzy neural networks, as in [3, 14, 18, 20]. This choice is motivated mainly by the performance and low computational cost of ELM-based algorithms; however, most of these works address pattern classification. In contrast, [12, 17, 21, 22] use fuzzy neural networks for time series prediction.

This paper proposes a learning methodology for feedforward fuzzy neural networks with a fuzzification layer, a hidden layer composed of fuzzy neurons, and a linear output layer. The algorithm generates consistent and accurate models while adding interpretability to the resulting structure. The model is trained with extreme learning machine concepts, but a regularization term is added to the cost function which, together with a resampling technique, selects the best neurons in the internal layers, yielding parsimonious models. Initially, fuzzy sets with equally spaced membership functions are defined for each input variable in the fuzzification layer. An initial set of candidate fuzzy neurons is then generated. From this initial set, the bootstrap lasso algorithm [2] is used to define the network topology by selecting a subset of significant fuzzy neurons. Finally, the least squares algorithm is used to estimate the weights of the network output layer. This technique was previously used to classify binary patterns with unineurons [18] and andneurons [20]; given the characteristics of the ELM, we verify here the model's ability to predict time series. Through all these steps the fuzzy neural network can act effectively in time series forecasting. We performed time series prediction tests on the Box and Jenkins gas furnace data [5] and compared the proposed model with other fuzzy neural network algorithms widely used in the literature for the same purpose.

The remainder of the paper is organized as follows. Section 2 presents the theoretical concepts related to fuzzy neural networks and neural logic neurons. Section 3 describes the methodology used to train fuzzy neural networks. In Sect. 4 results of numerical experiments are presented. Finally, Sect. 5 presents the conclusions.

2 Fuzzy Neural Networks

2.1 Artificial Neural Networks and Fuzzy Systems

In [23] the authors define an artificial neural network as composed of an input layer, one or more hidden layers, and an output layer. The network can be completely connected, where each neuron is connected to all neurons of the next layer; partially connected, where each neuron is connected to only some of them; or locally connected, where the partial connections are oriented to a specific type of functionality. Training a neural network requires a data set containing training patterns and desired outputs. The training problem is thus summarized as an optimization problem in which we seek the set of weights that minimizes the mean square error between the network outputs and the desired outputs. Fuzzy systems are based on fuzzy logic, developed by [24]. That work was motivated by the wide variety of vague and uncertain information involved in human decision making. Some problems cannot be solved with classical Boolean logic: in some situations two truth values are simply insufficient.

2.2 Neural Logic Neurons

Among the several studies performed to simulate the behavior of the human neuron, we highlight those that add a fuzzy nature to the artificial neuron model, giving it the ability to treat imprecise information. Such a neuron is called a fuzzy neuron [8]. This paper deals with a class of neurons called fuzzy logic neurons [16]. These neurons perform a mapping from the space formed by the Cartesian product of the input space and the weight space into the unit interval, i.e., X × W → [0, 1] [8]. Examples of such neurons are the and and or neurons [16] and the unineuron [14, 16].

The logical or-neuron uses a t-norm in the weighting operation and an s-norm in the final aggregation [8]. Given an input vector \( \mathbf{x} = [x_1, x_2, \ldots, x_n] \) and a neuron weight vector \( \mathbf{w} = [w_1, w_2, \ldots, w_n] \), with \( x_i \in [0, 1] \) and \( w_i \in [0, 1] \) for \( i = 1, \ldots, n \), the output of the logical or-neuron is described as [16]:

$$ z = \mathrm{OR}(\mathbf{x}, \mathbf{w}) = \mathop{S}_{i=1}^{n} (x_i \; t \; w_i) $$
(1)

where t is a t-norm and S is an s-norm. Figure 1 presents the structure of an OR-type neuron.

Fig. 1. Orneuron architecture.
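To make Eq. (1) concrete, the sketch below implements an orneuron in Python. It is a minimal illustration assuming the product as the t-norm and the probabilistic sum as the s-norm; the paper leaves the particular norm pair open, so these choices are not prescribed by the model.

```python
import numpy as np

def orneuron(x: np.ndarray, w: np.ndarray) -> float:
    """Eq. (1): z = S_{i=1}^{n}(x_i t w_i), with x_i, w_i in [0, 1].

    Assumes the product t-norm (a t b = a*b) and the probabilistic-sum
    s-norm (a s b = a + b - a*b); other norm pairs are equally valid.
    """
    weighted = x * w          # t-norm applied element-wise to inputs and weights
    z = 0.0                   # identity element of the probabilistic sum
    for a in weighted:
        z = z + a - z * a     # fold the s-norm over the n weighted inputs
    return z

# Example: two membership degrees aggregated with random weights in [0, 1]
rng = np.random.default_rng(0)
print(orneuron(np.array([0.8, 0.3]), rng.uniform(size=2)))
```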

2.3 Fuzzy Neural Networks

Fuzzy neural networks have several characteristics by which the models can be distinguished, such as the way the network is connected, the type of fuzzy neurons used, the type of learning (training), and the way the inputs are handled in the first layers. In these models, each layer is responsible for a specific function or task. Usually the first layer handles the inputs and the last produces the network response; between these two layers there may be intermediate layers, hidden or not, each with a specific function depending on the model and its purpose. Regarding the training of fuzzy neural networks, these algorithms are sets of well-defined rules for solving learning problems. The training methodologies seek to simulate human learning by acquiring or updating concepts, mainly by updating network factors such as synaptic weights.

Since the extreme learning machine has proved to be a faster and more efficient alternative for adjusting the parameters of a fuzzy neural network, several modifications of these methodologies have been proposed, changing either the way the parameters are updated or the way the input is granulated in the model. In [4] a new methodology to train fuzzy neural networks, also based on extreme learning machine concepts, produced a model called XUninet. To train their fuzzy network, [17] used the extreme learning machine, where the weights of the hidden layer are randomly chosen and the weights of the output layer are found with recursive weighted least squares. In their algorithm, the weights of the hidden layer and the identity elements of the uninorm [19] are defined between zero and one, and these values are updated recursively during training. Finally, [20] uses the ELM to train the network parameters and then applies a regularization method to select the neurons most representative of the problem. In that context, pattern classification techniques are used.

3 Fuzzy Neural Networks for Time Series Forecasting

3.1 Fuzzy Neural Networks Architecture

The fuzzy neural networks discussed below use the neural logic neuron described in the previous subsection. The logical neurons that make up the network are described in (1). The structure of the network is illustrated in Fig. 2, in which the z-neurons are orneurons.

Fig. 2. Fuzzy neural network.

The architecture of the model used in this article follows the structure presented in [20]; however, changes are necessary so that the network can act as a model capable of predicting a time series. The first layer is a fuzzification layer composed of neurons whose activation functions are the membership functions of the fuzzy sets used in the partition of the input variables. For each input variable \( x_{ij} \), M fuzzy sets \( A_j^m \) are defined, for m = 1, …, M. The outputs of the first layer are the membership degrees associated with the input values, that is, \( a_{jm} = \mu_{A_j^m}(x_{ij}) \) for j = 1, …, N and m = 1, …, M, where N is the number of inputs and M is the number of fuzzy sets for each input variable. The second layer is composed of L fuzzy neurons of the orneuron type. Each neuron performs a weighted aggregation of some outputs of the first layer, using the weights \( w_{il} \) (for i = 1, …, N and l = 1, …, L). The strategy for creating the fuzzy neurons uses the grid partition defined by the ANFIS model [10]. Finally, the output layer is composed of a single linear neuron. In [15, 20] the output neuron was adapted for pattern recognition, mapping the final responses to −1 or 1. For time series problems, we consider the following neuron:

$$ y = \sum_{j = 0}^{L} z_j v_j $$
(2)

where \( z_0 = 1 \), \( v_0 \) is the bias, and \( z_j \) and \( v_j \), j = 1, …, L, are the output of each fuzzy neuron of the second layer and its corresponding weight, respectively. Fuzzy rules can be extracted from the network topology; to see how they are generated, see [15, 20].

The ELM [9] is a learning algorithm developed for single hidden layer feedforward neural networks (SLFNs) in which random values are assigned to the weights of the first layer and the weights of the output layer are estimated analytically. In [15, 20] a training model for the fuzzy neural network was defined in which the parameters of the neurons are randomly assigned and the output parameters are calculated through least squares. The two approaches differ in how the fuzzy rules of the first layer are created, directly from the input data in [20] and through a grid partition of the input space in [15]; in addition, [20] employs the andneuron, whereas [15] uses the unineuron.

This paper uses the same partitioning technique proposed in [15]: equally spaced membership functions for each input variable define the fuzzification layer neurons, and a regularization-based selection technique defines the topology of the hidden layer. The method is able to generate parsimonious models, selecting the neurons most relevant to the context of the problem. From the resulting model it is possible to extract a set of fuzzy rules.

The learning algorithm initially defines the neurons of the first layer by partitioning the interval of each input variable into M fuzzy sets with equally spaced Gaussian membership functions, with neighboring functions crossing at a membership degree of 0.5. Then, a grid partition of the input space [10] is used to define an initial set of candidate neurons. The initial number of neurons in the hidden layer is M^N; that is, a neuron is generated, and its inputs defined, for each possible combination of the membership functions of the inputs. The weights associated with the neuron inputs are randomly defined in the interval [0, 1], as in the ELM. This approach to defining the network topology facilitates the interpretability of the extracted rules. For example, if three fuzzy sets are used per input variable (M = 3), each fuzzy set can be interpreted as "Small", "Medium", or "Large". Figure 3 presents the Gaussian membership functions proposed for the problem.

Fig. 3. Gaussian membership functions.
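As an illustration of this construction, the sketch below builds M equally spaced Gaussian membership functions per input (assuming inputs scaled to [0, 1]) and enumerates the M^N candidate neuron combinations of the grid partition; the spread sigma is an assumption of the example, not a value fixed by the paper.

```python
import itertools
import numpy as np

M, N = 3, 2                          # fuzzy sets per input, number of inputs
centers = np.linspace(0.0, 1.0, M)   # equally spaced centers: "Small", "Medium", "Large"
sigma = 0.5 / (M - 1)                # spread chosen for overlap (illustrative assumption)

def gaussian_mf(x: float, center: float) -> float:
    """Gaussian membership degree of x in the set centered at `center`."""
    return float(np.exp(-0.5 * ((x - center) / sigma) ** 2))

def fuzzify(x: np.ndarray) -> np.ndarray:
    """First-layer outputs a_jm for one sample x = [x_1, ..., x_N]."""
    return np.array([[gaussian_mf(x[j], c) for c in centers] for j in range(N)])

# Grid partition: one candidate neuron per combination of one fuzzy set
# per input, giving M**N candidates in total.
candidates = list(itertools.product(range(M), repeat=N))
print(len(candidates))               # 9 candidates for M = 3, N = 2
print(fuzzify(np.array([0.2, 0.7]))) # membership degrees, shape (N, M)
```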

The final architecture of the network is defined by a feature selection technique based on L1 regularization and resampling, called Bolasso [2]. The learning algorithm assumes that the output of the hidden layer composed of the candidate neurons can be written as:

$$ f(x_i) = \sum_{l = 0}^{L} v_l z_l(x_i) = \mathbf{z}(x_i)\mathbf{v} $$
(3)

where \( \mathbf{v} = [v_0, v_1, v_2, \ldots, v_L] \) is the weight vector of the output layer and \( \mathbf{z}(x_i) = [z_0, z_1(x_i), \ldots, z_L(x_i)] \) is the output vector of the second layer, with \( z_0 = 1 \). In this context, \( \mathbf{z}(x_i) \) is the non-linear mapping of the input space into a space of fuzzy features of dimension L. Since the weights connecting the first two layers are randomly assigned, the only parameters to be estimated are the weights of the output layer. Thus, the network parameter estimation can be seen as a simple linear regression problem, allowing the use of regression techniques [7] for estimating the parameters and selecting the candidate neurons. The regression algorithm used in [15], which for high-dimensional data estimates both the regression coefficients and the subset of candidate regressors to be included in the final model, is LARS [6]. Given a set of K distinct samples \( (x_i, y_i) \), where \( x_i = [x_{i1}, x_{i2}, \ldots, x_{iN}] \in \mathbb{R}^N \) and \( y_i \in \mathbb{R} \) for i = 1, …, K, the cost function of this regression algorithm can be defined as:

$$ \sum_{i = 1}^{K} \left\| \mathbf{z}(x_i)\mathbf{v} - y_i \right\|_2 + \lambda \left\| \mathbf{v} \right\|_1 $$
(4)

where λ is the regularization parameter of the L1 norm, commonly estimated via cross-validation [15].

The LARS algorithm performs model selection since, for a given value of λ, only a fraction (or none) of the regressors have nonzero weights. For the problem considered in this work, the regressors \( z_l \) are the outputs of the candidate neurons, so LARS can be used to select an optimal subset of \( L_s \) significant neurons that minimize (4) for a given value of λ. To increase the stability of the model selection, resampling is used. This procedure, developed by [2], is defined as a bootstrap-enhanced least absolute shrinkage and selection operator in which the LARS algorithm runs on several bootstrap replications of the training data set. For each replication, a distinct subset of the regressors is selected, and the regressors included in the final model are defined according to the frequency with which each is selected across the runs: given a consensus threshold, say γ = 60%, a regressor is included if it is selected in at least 60% of the replications. Finally, after the network topology is defined, the weight vector of the output layer is estimated. In this paper, this vector is obtained with the Moore-Penrose pseudoinverse:

$$ \mathbf{v} = \mathbf{Z}^{+} \mathbf{y} $$
(5)

where \( \mathbf{Z}^{+} \) is the Moore-Penrose pseudoinverse of \( \mathbf{Z} \), yielding the minimum-norm least-squares solution for the output weights. The learning process is synthesized in Algorithm 1, and a sketch of the selection and estimation steps is given after it. The algorithm has three parameters:

  • the number of fuzzy sets that will partition the input space, M;

  • the number of bootstrap replications, b;

  • the consensus threshold, γ.

Algorithm 1.
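The sketch below illustrates the Bolasso selection and the final least-squares estimation (Eq. 5). It uses scikit-learn's Lasso as a stand-in for the LARS solver of [6], so it approximates the procedure rather than reproducing the authors' implementation; lam, b, and gamma play the roles of λ, b, and γ above, and Z and y are synthetic stand-ins for the hidden-layer outputs and targets.

```python
import numpy as np
from sklearn.linear_model import Lasso

def bolasso_select(Z, y, lam=0.01, b=32, gamma=0.6, seed=0):
    """Indices of neurons with nonzero weight in >= gamma of b bootstrap runs."""
    rng = np.random.default_rng(seed)
    K, L = Z.shape
    votes = np.zeros(L)
    for _ in range(b):
        idx = rng.integers(0, K, size=K)                    # bootstrap replication
        coef = Lasso(alpha=lam, max_iter=10_000).fit(Z[idx], y[idx]).coef_
        votes += (coef != 0.0)                              # consensus vote
    return np.flatnonzero(votes >= gamma * b)

def output_weights(Z, y):
    """Eq. (5): v = Z^+ y, the minimum-norm least-squares solution."""
    return np.linalg.pinv(Z) @ y

# Synthetic usage: only neurons 1 and 4 actually explain y
rng = np.random.default_rng(1)
Z = rng.uniform(size=(200, 9))
y = Z[:, 1] * 1.5 - Z[:, 4] * 2.0 + 0.01 * rng.normal(size=200)
selected = bolasso_select(Z, y)
v = output_weights(np.c_[np.ones(len(Z)), Z[:, selected]], y)  # bias column z0 = 1
```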

4 Tests and Experiments

The learning model of the regularized fuzzy neural networks was evaluated through numerical experiments on time series prediction. A time series x(t) can be defined as a function of an independent time variable t, tied to a process whose mathematical description is unknown. Its most relevant feature is that its future behavior cannot be predicted exactly, as it could be for a deterministic function known at t; however, the behavior of a time series can sometimes be anticipated through stochastic procedures. The database used is the Box-Jenkins gas furnace time series. The gas furnace of Box and Jenkins [5] consists of a furnace where \( u^k \) is the feed rate of methane gas (cubic feet per minute) and the output \( y^k \) is the concentration of carbon dioxide (% CO2) in the gas mixture. A set of 296 samples (pairs of input and output data) is available for identification. The normalized data set represents the CO2 concentration at instant k, obtained from the values \( y^{k-1} \) and \( u^{k-4} \); see [12] for details. The studies in [5] state that a suitable model for this data set has the form:

$$ y'^{k} = f\left( y^{k - 1}, u^{k - 4} \right) $$
(6)
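As an illustration, the sketch below assembles the regression pairs of Eq. (6), assuming arrays u and y hold the 296 methane feed-rate and CO2 samples; the helper name make_dataset is ours, not from the paper.

```python
import numpy as np

def make_dataset(u: np.ndarray, y: np.ndarray):
    """Build inputs [y^{k-1}, u^{k-4}] and targets y^k; first usable index is k = 4."""
    k = np.arange(4, len(y))
    X = np.column_stack([y[k - 1], u[k - 4]])
    t = y[k]
    return X, t

# The resulting pairs are then normalized to zero mean and unit variance and
# split into training and validation sets, as described in Sect. 4.
```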

Figure 4 shows the input and output data of the base used in the experiments.

Fig. 4. Sample of the gas furnace input and output data.

The experiment used 200 samples for the training phase of the neural network and 96 samples for the validation phase of the model. All samples were normalized to zero mean and unit variance. In all experiments, the test assumptions defined in [15] were considered, as well as Gaussian activation functions. The performance of the proposed model was evaluated using the Root Mean Square Error (RMSE), calculated in the same way as in [13]:

$$ RMSE = \left( \frac{1}{N} \sum_{k = 1}^{N} \left( y^{k} - y'^{k} \right)^{2} \right)^{\frac{1}{2}} $$
(7)
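For completeness, a direct transcription of Eq. (7), assuming y_true and y_pred hold the desired and predicted series over the N validation samples:

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Eq. (7): root mean square error over N samples."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```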

In the tests, carried out in Matlab, we sought to verify the ability of the proposed learning model to improve the structure of the network through the proposed structure definition method, and to verify that the method can solve time series problems. Table 1 shows the number of rules used (best value in the test), the best RMSE obtained, and the average RMSE for the model used in this work, together with a comparison with the results of fuzzy neural networks commonly employed in time series problems. When the goal is to evaluate the best network, the best RMSE can be taken as the value for analysis, while the mean RMSE helps in identifying a more stable model. To prevent the choice of model parameters from interfering with the final training accuracy, each algorithm was executed 30 times and the mean RMSE values were used for comparison.

Table 1. Performance evaluation of the algorithms.

The algorithm proposed in this paper was compared with other efficient and widely used methodologies for time series problems. R-ORNEURON denotes the network formed by logical or-neurons. The other fuzzy neural network models used were DENFIS, proposed by [11]; FBeM, proposed by [12]; XUninet, developed by [4]; eRFH, proposed by [17]; the uninorm-based model of [13], called UN-RNN in this paper; FL-RNN, proposed by [14], which takes a rapid learning approach; and eTS, developed by [1]. Table 1 summarizes the results obtained.

The results in Table 1 show that the proposed model uses a smaller number of rules to solve the problem and presents a better RMSE than the models traditionally used in the literature for time series problems. Figure 5 shows the result obtained by the OrNeuron model in the final validation.

Fig. 5. Validation of the model.

5 Conclusion

This paper presented a new way to use regularized fuzzy neural networks based on extreme learning machine concepts for time series forecasting. The method achieved strong numerical results compared with fuzzy neural network models for time series prediction, while using a smaller number of fuzzy rules to solve the problem. The experiments performed suggest that the network can act as a model capable of handling time series, presenting consistent results close to those obtained by models commonly used for this purpose in the literature. In future work, the model will be submitted to other types of time series and to regression problems, with its results assessed by statistical tests.