1 Introduction

A time series is defined as a set of measurements of some phenomenon or experiment recorded sequentially in time. The first step in analyzing a time series is to plot it; this allows us to identify trends, seasonal components and irregular variations. A classic model for a time series can be expressed as a sum or product of three components: trend, seasonality and a random error term.
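For instance, writing the trend as \( T_{t} \), the seasonal component as \( S_{t} \) and the random error as \( \varepsilon_{t} \), the additive and multiplicative forms of this classic model are:

$$ x_{t} = T_{t} + S_{t} + \varepsilon_{t} \quad {\text{or}}\quad x_{t} = T_{t} \cdot S_{t} \cdot \varepsilon_{t} $$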

Time series predictions are very important because, based on them, we can analyze past events to anticipate the possible behavior of future events, and thus make preventive or corrective decisions that help avoid unwanted circumstances.

The contribution of this chapter is the proposed approach for ensemble neural network optimization using particle swarm optimization. The proposed models are also used as a basis for statistical tests [25, 11, 12, 14, 17, 18, 23, 26].

The rest of the chapter is organized as follows: Sect. 2 describes the concepts of optimization, Sect. 3 describes the concepts of particle swarm optimization, Sect. 4 describes the concepts of Fuzzy Systems as Methods of integration, Sect. 5 describes the problem and the proposed method of solution, Sect. 6 describes the simulation results of the proposed method, and Sect. 7 shows the conclusions.

2 Optimization

Regarding optimization, we have the following situation in mind: there exists a search space V, and a function:

$$ g:V \to {\mathbb{R}} $$

and the problem is to find

$$ \mathop {{\mathbf{arg}}\;{\mathbf{min}}}\limits_{v \in V} \;g $$

Here, v is a vector of decision variables, and g is the objective function. In this case we have assumed that the problem is one of minimization, but everything we say can of course be applied mutatis mutandis to a maximization problem. Although specified here in an abstract way, this is nonetheless a problem with a huge number of real-world applications.

In many cases the search space is discrete, so that we have the class of combinatorial optimization problems (COPs). When the domain of the g function is continuous, a different approach may well be required, although even here we note that in practice, optimization problems are usually solved using a computer, so that in the final analysis the solutions are represented by strings of binary digits (bits).
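As a toy illustration of this abstract definition (the objective function and search space below are assumptions for the example, not part of the chapter), a minimal Python sketch of exhaustive search over a small discrete space:

```python
# Minimal illustration of arg min over a small discrete search space V.
# Exhaustive search is only feasible when |V| is small; metaheuristics
# such as PSO are used when the space is too large to enumerate.

def g(v):
    # Example objective function (an assumption for illustration):
    # sum of squared components.
    return sum(x ** 2 for x in v)

# A small discrete search space: all 3-bit vectors, matching the remark
# that solutions are ultimately represented as strings of bits.
V = [(b0, b1, b2) for b0 in (0, 1) for b1 in (0, 1) for b2 in (0, 1)]

best = min(V, key=g)  # arg min over V
print(best, g(best))  # -> (0, 0, 0) 0
```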

There are several optimization techniques that can be applied to neural networks, some of these are: evolutionary algorithms [22], ant colony optimization [6] and particle swarm optimization [8].

3 Particle Swarm Optimization

The Particle Swarm Optimization algorithm maintains a swarm of particles, where each particle represents a potential solution. In analogy with evolutionary computation paradigms, a swarm is similar to a population, while a particle is similar to an individual. In simple terms, the particles are “flown” through a multidimensional search space, where the position of each particle is adjusted according to its own experience and that of its neighbors. Let \( x_{i} \left( t \right) \) denote the position of particle i in the search space at time step t; unless otherwise stated, t denotes discrete time steps. The position of the particle is changed by adding a velocity, \( v_{i} \left( t \right) \), to the current position, i.e.:

$$ x_{i} \left( {t + 1} \right) = x_{i} (t) + v_{i} \left( {t + 1} \right) $$
(1)

with \( x_{i} (0)\sim U \left( {X_{min} , X_{max} } \right). \)

It is the velocity vector that drives the optimization process, and it reflects both the experiential knowledge of the particle and the information exchanged in the particle's neighborhood. The experiential knowledge of a particle is generally known as the cognitive component, which is proportional to the distance of the particle from its own best position found since the first time step (hereinafter, the personal best position). Socially exchanged information is known as the social component of the velocity equation.

For the gbest PSO, the particle velocity is calculated as:

$$ v_{ij} \left( {t + 1} \right) = v_{ij} \left( t \right) + c_{1} r_{1j} \left( t \right)\left[ {y_{ij} \left( t \right) - x_{ij} \left( t \right)} \right] + c_{2} r_{2j} \left( t \right)\left[ {\hat{y}_{j} \left( t \right) - x_{ij} \left( t \right)} \right] $$
(2)

where \( v_{ij} \left( t \right) \) is the velocity of particle i in dimension j at time step t, \( c_{1} \) and \( c_{2} \) are positive acceleration constants used to scale the contribution of the cognitive and social components, respectively, and \( r_{1j} \left( t \right), r_{2j} \left( t \right)\sim U\left( {0,1} \right) \) are random values in the range [0, 1].

The personal best position at the next time step, t + 1, is calculated as:

$$ y_{i} \left( {t + 1} \right) = \left\{ {\begin{array}{*{20}l} {y_{i} \left( t \right)} & {if\;\;f\left( {x_{i} \left( {t + 1} \right)} \right) \ge f\left( {y_{i} \left( t \right)} \right)} \\ {x_{i} \left( {t + 1} \right)} & {if\;\;f\left( {x_{i} \left( {t + 1} \right)} \right) < f\left( {y_{i} \left( t \right)} \right)} \\ \end{array} } \right. $$
(3)

where \( f: {\mathbb{R}}^{n_{x}} \to {\mathbb{R}} \) is the fitness function which, as with EAs, measures how close a particle is to the optimal solution; that is, the fitness function quantifies the performance, or quality, of a particle (or solution).

The overall best position, \( \hat{y}(t) \) at time step t, is defined as:

$$ \hat{y}\left( t \right) \in \left\{ {y_{0} \left( t \right), \ldots ,y_{n_{s}} \left( t \right)} \right\}\;\;{\text{with}}\;\;f\left( {\hat{y}\left( t \right)} \right) = \hbox{min} \left\{ {f\left( {y_{0} \left( t \right)} \right), \ldots ,f\left( {y_{n_{s}} \left( t \right)} \right)} \right\} $$
(4)

where \( n_{s} \) is the total number of particles in the swarm. Importantly, the above equation establishes that \( \hat{y} \) is the best position discovered by any of the particles so far; it is usually calculated from the personal best positions [6, 7, 10].

The overall best position may also be selected from the particles of the current swarm, in which case:

$$ \hat{y}\left( t \right) = \hbox{min} \left\{ {f\left( {x_{0} \left( t \right)} \right), \ldots ,f\left( {x_{n_{s}} \left( t \right)} \right)} \right\} $$
(5)
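A minimal Python sketch, under illustrative assumptions (sphere objective, swarm size, acceleration constants), of how Eqs. (1)–(4) fit together for a minimization problem:

```python
import numpy as np

def gbest_pso(f, dim, n_particles=30, iters=100,
              c1=2.0, c2=2.0, x_min=-5.0, x_max=5.0, seed=0):
    """Minimal gbest PSO sketch implementing Eqs. (1)-(4).

    f maps an array of shape (dim,) to a scalar fitness value.
    All parameter values here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(x_min, x_max, (n_particles, dim))  # x_i(0) ~ U(x_min, x_max)
    v = np.zeros((n_particles, dim))
    y = x.copy()                               # personal best positions y_i
    fy = np.array([f(p) for p in y])           # f(y_i)
    g = y[np.argmin(fy)].copy()                # global best position y-hat

    for _ in range(iters):
        r1 = rng.random((n_particles, dim))    # r_1j(t) ~ U(0, 1)
        r2 = rng.random((n_particles, dim))    # r_2j(t) ~ U(0, 1)
        # Eq. (2): cognitive + social velocity update.
        v = v + c1 * r1 * (y - x) + c2 * r2 * (g - x)
        # Eq. (1): position update.
        x = x + v
        # Eq. (3): update personal bests (minimization).
        fx = np.array([f(p) for p in x])
        improved = fx < fy
        y[improved], fy[improved] = x[improved], fx[improved]
        # Eq. (4): global best taken from the personal bests.
        g = y[np.argmin(fy)].copy()

    return g, f(g)

# Example: minimize the sphere function; the optimum is the zero vector.
best_pos, best_val = gbest_pso(lambda p: float(np.sum(p ** 2)), dim=5)
print(best_pos, best_val)
```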

4 Fuzzy Systems as Methods of Integration

Fuzzy logic was proposed for the first time in the mid-sixties at the University of California, Berkeley by the brilliant engineer Lotfi A. Zadeh, who proposed what is called the principle of incompatibility: “As the complexity of a system increases, our ability to make precise and yet significant statements about its behavior diminishes until a threshold is reached beyond which precision and significance become almost mutually exclusive characteristics.” He then introduced the concept of a fuzzy set, under which lies the idea that the elements on which human thinking is built are not numbers but linguistic labels. Fuzzy logic can represent common knowledge in a form of language that is mostly qualitative and not necessarily expressed in a quantitative mathematical language [29].

Type-1 fuzzy system theory, as described by Castillo and Melin [2], has been applied in many areas such as control, data mining, time series prediction, etc.

The basic structure of a fuzzy inference system consists of three conceptual components: a rule base, which contains a selection of fuzzy rules; a database (or dictionary), which defines the membership functions used in the rules; and a reasoning mechanism, which performs the inference procedure (usually fuzzy reasoning) [17].
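A minimal sketch of these three components as a type-1 fuzzy integrator of two module predictions, assuming Gaussian membership functions labelled low/high and a weighted-average defuzzification; this toy rule base is not the 32-rule system used in this chapter:

```python
import numpy as np

def gaussmf(x, mean, sigma):
    """Gaussian membership function (the 'database' component)."""
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def type1_integrator(pred1, pred2):
    """Toy type-1 fuzzy integrator for two normalized module predictions."""
    # Database: membership functions on the normalized range [0, 1].
    low  = lambda x: gaussmf(x, 0.0, 0.25)
    high = lambda x: gaussmf(x, 1.0, 0.25)

    # Rule base: firing strength (min t-norm) -> output level.
    rules = [
        (min(low(pred1),  low(pred2)),  0.0),   # both low  -> low output
        (min(low(pred1),  high(pred2)), 0.5),   # mixed     -> medium output
        (min(high(pred1), low(pred2)),  0.5),
        (min(high(pred1), high(pred2)), 1.0),   # both high -> high output
    ]

    # Reasoning mechanism: weighted average of the rule outputs.
    w = np.array([r[0] for r in rules])
    z = np.array([r[1] for r in rules])
    return float(np.sum(w * z) / np.sum(w))

print(type1_integrator(0.8, 0.7))  # aggregated prediction
```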

Type-2 fuzzy systems were proposed to overcome the limitations of a type-1 FLS; the concept of type-1 fuzzy sets was extended into type-2 fuzzy sets by Zadeh in 1975. These were designed to mathematically represent the vagueness and uncertainty of linguistic problems, thereby providing formal tools to work with the intrinsic imprecision in different types of problems; type-2 fuzzy set theory is considered a generalization of classic set theory. Type-2 fuzzy sets are used for modeling uncertainty and imprecision in a better way [18–20].
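Formally, in the standard notation (a well-known definition, not reproduced from this chapter), a type-2 fuzzy set \( \tilde{A} \) is characterized by a type-2 membership function \( \mu_{\tilde{A}} \left( {x,u} \right) \):

$$ \tilde{A} = \left\{ {\left( {\left( {x,u} \right),\mu_{\tilde{A}} \left( {x,u} \right)} \right)\;|\;\forall x \in X,\;\forall u \in J_{x} \subseteq \left[ {0,1} \right]} \right\} $$

where \( 0 \le \mu_{\tilde{A}} \left( {x,u} \right) \le 1 \); the union of all primary memberships \( J_{x} \) forms the footprint of uncertainty (FOU) referred to below.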

5 Problem Statement and Proposed Method

The objective of this work is to develop a model based on integrating the responses of an ensemble neural network using type-1 and type-2 fuzzy systems and on optimizing these systems. Figure 1 represents the general architecture of the proposed method, which consists of obtaining the historical data, analyzing the data, creating the ensemble neural network, integrating the responses of the ensemble neural network with a type-2 fuzzy system, and finally obtaining the outputs, as shown. The information can be historical data such as images, time series, etc.; in this case we show the application to the prediction of the Mackey-Glass time series, where we obtain good results.

Fig. 1 General architecture of the proposed method

Figure 2 shows a type-2 fuzzy system consisting of 5 inputs, depending on the number of modules of the neural network ensemble, and one output. Each input and output linguistic variable of the fuzzy system uses 2 Gaussian membership functions, called low prediction and high prediction; with 5 inputs and 2 membership functions each, the system consists of 32 rules. The performance of the type-2 fuzzy integrators is analyzed under different levels of uncertainty to find the best design of the membership functions; we consider 3 sizes for the footprint of uncertainty, 0.3, 0.4 and 0.5, to obtain a better prediction of our time series.
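As a sketch of how such membership functions might be computed, assuming the footprint of uncertainty is introduced as an uncertain mean (one common parameterization; the chapter states only the FOU sizes):

```python
import numpy as np

def gauss(x, mean, sigma):
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2)

def it2_gaussmf(x, mean, sigma, fou):
    """Interval type-2 Gaussian MF with uncertain mean in
    [mean - fou/2, mean + fou/2]; returns (lower, upper) bounds.

    Modelling the FOU as a spread on the mean is an assumption
    for illustration; sigma = 0.2 below is also illustrative.
    """
    m1, m2 = mean - fou / 2.0, mean + fou / 2.0
    upper = np.where(x < m1, gauss(x, m1, sigma),
             np.where(x > m2, gauss(x, m2, sigma), 1.0))
    lower = np.minimum(gauss(x, m1, sigma), gauss(x, m2, sigma))
    return lower, upper

x = np.linspace(0.0, 1.0, 5)
for fou in (0.3, 0.4, 0.5):   # the three FOU sizes considered
    lo, hi = it2_gaussmf(x, mean=0.5, sigma=0.2, fou=fou)
    print(fou, lo.round(3), hi.round(3))
```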

Fig. 2 Type-2 fuzzy system for the Mackey-Glass time series

Figure 3 shows the possible rules of the type-2 fuzzy system.

Fig. 3 Rules of the type-2 fuzzy inference system for the Mackey-Glass time series

Figure 4 represents the particle structure used to optimize the ensemble neural network, where the parameters that are optimized are the number of modules, the number of layers, and the number of neurons per layer.

Fig. 4 Particle structure to optimize the ensemble neural network
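A sketch of how such a particle might be decoded into an architecture; the encoding and the bounds below are illustrative assumptions, since the chapter states only which parameters are optimized:

```python
import numpy as np

# Hypothetical decoding of a PSO particle into an ensemble architecture.
# The bounds are illustrative assumptions, not values from this chapter.
MAX_MODULES, MAX_LAYERS, MAX_NEURONS = 5, 3, 30

def decode_particle(p):
    """p is a real vector in [0, 1]^(2 + MAX_MODULES * MAX_LAYERS)."""
    n_modules = 1 + int(p[0] * (MAX_MODULES - 1) + 0.5)   # number of modules
    n_layers  = 1 + int(p[1] * (MAX_LAYERS - 1) + 0.5)    # layers per module
    neurons = p[2:2 + n_modules * n_layers]               # neurons per layer
    neurons = 1 + (neurons * (MAX_NEURONS - 1) + 0.5).astype(int)
    return n_modules, n_layers, neurons.reshape(n_modules, n_layers)

rng = np.random.default_rng(0)
print(decode_particle(rng.random(2 + MAX_MODULES * MAX_LAYERS)))
```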

Data of the Mackey-Glass time series were generated using Eq. (6). We use 800 points of the time series: 70% of the data for training the ensemble neural network and 30% for testing the network.

The Mackey-Glass Equation is defined as follows:

$$ \dot{x}(t) = \frac{0.2x(t - \tau )}{{1 + x^{10} (t - \tau )}} - 0.1x(t) $$
(6)

where it is assumed that x(0) = 1.2, x(t) = 0 for t < 0, and τ = 17, 34, and 68. Figure 5 shows a plot of the time series for these parameter values.

Fig. 5 Mackey-Glass time series

This time series is chaotic, and there is no clearly defined period. The series does not converge or diverge, and the trajectory is extremely sensitive to the initial conditions. The time series is measured in number of points, and we apply the fourth order Runge–Kutta method to find the numerical solution of the equation [12, 13].
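A minimal sketch of generating the series with a fourth-order Runge-Kutta step, assuming a fixed step size h that divides τ so the delayed value x(t − τ) can be read from the stored history; holding the delayed term fixed within a step is a simplification of this sketch:

```python
import numpy as np

def mackey_glass(n_points=800, tau=17, h=1.0, x0=1.2):
    """Generate the Mackey-Glass series of Eq. (6) with RK4.

    Assumes h divides tau; x(t) = 0 for t < 0 and x(0) = 1.2 as in
    the text. h = 1.0 is an illustrative step size.
    """
    d = int(tau / h)                          # delay expressed in steps
    x = np.zeros(n_points + 1)
    x[0] = x0

    def f(xt, x_tau):
        # Right-hand side of Eq. (6).
        return 0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * xt

    for t in range(n_points):
        x_tau = x[t - d] if t >= d else 0.0   # x(t - tau), zero before t = 0
        # Classical RK4 step; the delayed term is held fixed over the step.
        k1 = f(x[t], x_tau)
        k2 = f(x[t] + 0.5 * h * k1, x_tau)
        k3 = f(x[t] + 0.5 * h * k2, x_tau)
        k4 = f(x[t] + h * k3, x_tau)
        x[t + 1] = x[t] + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)

    return x[1:]

series = mackey_glass(tau=17)                 # also tau = 34 or tau = 68
split = int(0.7 * len(series))                # 70% training / 30% testing
train, test = series[:split], series[split:]
print(len(train), len(test))
```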

6 Simulation Results

In this section we present the simulation results obtained with the integration of the ensemble neural network with type-2 fuzzy integration and its optimization with the particle swarm optimization algorithm for the Mackey-Glass time series.

Table 1 shows the particle swarm optimization results, where the best prediction error is 0.0063313.

Table 1 Particle swarm results for the ensemble neural network for τ = 17

Fuzzy integration is performed initially by implementing a type-1 fuzzy system, in which the best result was obtained in the experiment of row number 5 of Table 2, with an error of 0.1521.

Table 2 Results of type-1 fuzzy integration for τ = 17

Fuzzy integration is then performed by implementing a type-2 fuzzy system, in which the results were as follows: for the best experiment, with a degree of uncertainty of 0.3 a forecast error of 0.1785 was obtained, with a degree of uncertainty of 0.4 a forecast error of 0.1658, and with a degree of uncertainty of 0.5 a forecast error of 0.3134, as shown in Table 3.

Table 3 Results of type-2 fuzzy integration for τ = 17

Table 4 shows the particle swarm optimization results, where the best prediction error is 0.0019726.

Table 4 Particle swarm results for the ensemble neural network for τ = 34

Fuzzy integration is performed by implementing a type-1 fuzzy system, in which the best result was obtained in the experiment of row number 2 of Table 5, with an error of 0.4586.

Table 5 Results of type-1 fuzzy integration for τ = 34

Fuzzy integration is also performed by implementing a type-2 fuzzy system, in which the results were as follows: for the best experiment, with a degree of uncertainty of 0.3 a forecast error of 0.6036 was obtained, with a degree of uncertainty of 0.4 a forecast error of 0.6524, and with a degree of uncertainty of 0.5 a forecast error of 0.3893, as shown in Table 6.

Table 6 Results of type-2 fuzzy integration for τ = 34

Table 7 shows the particle swarm optimization results, where the best prediction error is 0.0019348.

Table 7 Particle swarm results for the ensemble neural network for τ = 68

Fuzzy integration is performed by implementing a type-1 fuzzy system, in which the best result was obtained in the experiment of row number 4 of Table 8, with an error of 0.32546.

Table 8 Results of type-1 fuzzy integration for τ = 68

Fuzzy integration is also performed by implementing a type-2 fuzzy system, in which the results were as follows: for the best experiment, with a degree of uncertainty of 0.3 a forecast error of 0.6825 was obtained, with a degree of uncertainty of 0.4 a forecast error of 0.7652, and with a degree of uncertainty of 0.5 a forecast error of 0.6581, as shown in Table 9.

Table 9 Results of type-2 fuzzy integration for τ = 68

7 Conclusions

Using the PSO technique, we can conclude that this algorithm is good for reducing the execution time compared with other techniques, such as genetic algorithms, and that the resulting ensemble neural network architectures are small and can be applied to time series, in this case the Mackey-Glass time series. The results obtained by integrating the outputs of the ensemble neural network with type-1 and type-2 fuzzy systems are also very good, with type-2 integration yielding the best results [1, 9, 15, 21].