In this chapter, we describe the state of the art of the computational intelligence techniques, which we use as a basis for this work.

2.1 Time Series

A time series is a set of measurements of some phenomenon or experiment recorded sequentially over time. These observations will be denoted by \( \left\{ x\left( t_{1} \right), x\left( t_{2} \right), \ldots, x\left( t_{n} \right) \right\} = \left\{ x(t) : t \in T \subseteq R \right\} \), with \( x\left( t_{i} \right) \) the value of the variable \( x \) at time \( t_{i} \). If \( T = Z \), the time series is said to be discrete, and if \( T = R \), the time series is said to be continuous [1, 2].

A classic model for a time series assumes that a series \( x_{\left( 1 \right)} , \ldots ,x_{\left( n \right)} \) can be expressed as the sum or product of its components: trend, cyclical, seasonal and irregular [3]. There are three time series models that are generally accepted as good approximations to the true relationships between the components of the observed data. These are:

  • Additive.

    $$ X\left( t \right) = T\left( t \right) + C\left( t \right) + S\left( t \right) + I\left( t \right) $$
    (2.1)
  • Multiplicative.

    $$ X\left( t \right) = T\left( t \right) \cdot C\left( t \right) \cdot S\left( t \right) \cdot I\left( t \right) $$
    (2.2)
  • Mixed.

    $$ X\left( t \right) = T\left( t \right) \cdot S\left( t \right) \cdot I\left( t \right) $$
    (2.3)

where:

\( X\left( t \right) \) :

observed series in time t

\( T\left( t \right) \) :

trend component

\( C\left( t \right) \) :

cyclic component

\( S\left( t \right) \) :

seasonal component

\( I\left( t \right) \) :

irregular or random component

A common assumption is that \( I\left( t \right) \) is a random or white-noise component with zero mean and constant variance. An additive model is suitable, for example, when \( S\left( t \right) \) does not depend on other components such as \( T\left( t \right) \); if instead the seasonality varies with the trend, a multiplicative model is more suitable. The multiplicative model can be transformed into an additive one by taking logarithms. The problem that arises is to adequately model the components of the series.
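To make the three models concrete, the following sketch generates a multiplicative series and applies the log transform that turns it into an additive one. The components used here are hypothetical (a linear trend and a sinusoidal seasonal factor); the cyclical component and the noise are omitted so the example stays deterministic.

```python
import math

def multiplicative(t, trend, seasonal, irregular):
    """X(t) = T(t) * S(t) * I(t), the multiplicative model of Eq. (2.2)."""
    return trend(t) * seasonal(t) * irregular(t)

# Hypothetical components: linear trend, 12-period seasonality, no noise.
trend = lambda t: 10.0 + 0.5 * t
seasonal = lambda t: 1.0 + 0.2 * math.sin(2 * math.pi * t / 12)
irregular = lambda t: 1.0  # noise suppressed for a deterministic illustration

x_mult = [multiplicative(t, trend, seasonal, irregular) for t in range(24)]

# Taking logarithms turns the multiplicative model into an additive one:
# log X(t) = log T(t) + log S(t) + log I(t)
x_log = [math.log(v) for v in x_mult]
```

The additive model of Eq. (2.1) would simply replace the products above with sums.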

Temporal phenomena are both complex and important in many real-world problems. Their importance stems from the fact that almost every kind of data contains time-dependent components, either explicitly coming in the form of time values or implicitly in the way that the data is collected from a process that varies with time [4]. A time series is an important class of complex data objects [5] and comes as a sequence of real numbers, each number representing a value reported at a certain time instant [6]. The popular statistical model of Box-Jenkins [7] is considered to be one of the most common choices for the prediction of time series. However, since the Box-Jenkins models are linear and most real world applications involve nonlinear problems, it is difficult for the Box-Jenkins models to capture the phenomenon of nonlinear time series and this brings a limitation to the accuracy of the generated predictions [8].

2.2 Interval Type-2 Fuzzy Neural Network

One way to build an IT2FNN is by fuzzifying a conventional neural network (NN). Each part of an NN (the activation function, the weights, and the inputs and outputs) can be fuzzified. A fuzzy neuron is basically similar to an artificial neuron, except that it has the ability to process fuzzy information.

The IT2FNN system is a kind of IT2-TSK-FIS embedded in an NN structure. An IT2FNN was proposed by Castro et al. in [9], with TSK reasoning and processing elements called IT2FNs for defining the antecedents, and IT1FNs for defining the consequents of the rules \( R_{k} \).

An IT2FN is composed of two adaptive nodes, represented by squares, and two non-adaptive nodes, represented by circles. The outputs of the adaptive nodes depend on their inputs, modifiable parameters and transfer function, while the non-adaptive nodes, on the contrary, depend solely on their inputs, and their outputs represent the lower \( \underline{\mu }_{A} \left( x \right) \) and upper \( \overline{\mu }_{A} \left( x \right) \) membership functions. Parameters of adaptive nodes with uncertain standard deviation are denoted by \( w \in \left[ {w_{1,1} ,w_{2,1} } \right] \), and those with uncertain mean by \( b \in \left[ {b_{1} ,b_{2} } \right] \). The IT2FN (Fig. 2.1) has crisp input signals \( x \), crisp synaptic weights \( \left( {w,b} \right) \) and type-1 fuzzy outputs \( \mu \left( {net_{1} } \right),\mu \left( {net_{2} } \right),\underline{\mu } \left( x \right),\overline{\mu } \left( x \right) \). This kind of neuron is built from two conventional neurons with transfer functions (Gaussian, generalized bell or logistic) that fuzzify the inputs. Each neuron is defined as follows: the function \( \mu \) is often referred to as an activation (or transfer) function; its domain is the set of activation values, \( net \), of the neuron model, so we often write \( \mu \left( {net} \right) \). The variable \( net \) is defined as a scalar product of the weight and input vectors:

Fig. 2.1
figure 1

Interval type-2 fuzzy neuron (IT2FN)

$$ \begin{aligned} net_{1} & = w_{1,1} x + b_{1} ;\quad \mu_{1} = \mu \left( {net_{1} } \right); \\ net_{2} & = w_{2,1} x + b_{2} ;\quad \mu_{2} = \mu \left( {net_{2} } \right) \\ \end{aligned} $$
(2.4)

The non-adaptive t-norm node (T) evaluates the lower membership function \( \underline{\mu } \left( x \right) \) using the t-norm algebraic product, while the non-adaptive s-norm node (S) evaluates the upper membership function \( \overline{\mu } \left( x \right) \) using the s-norm algebraic sum, as shown in Eq. (2.5):

$$ \begin{aligned} \underline{\mu } \left( x \right) & = \mu \left( {net_{1} } \right) \cdot \mu \left( {net_{2} } \right), \\ \overline{\mu } \left( x \right) & = \mu \left( {net_{1} } \right) + \mu \left( {net_{2} } \right) - \underline{\mu } \left( x \right) \\ \end{aligned} $$
(2.5)

Each IT2FN adapts an interval type-2 fuzzy set [10, 11], \( \tilde{A} \), expressed in terms of the output \( \underline{\mu } \left( x \right) \) of the type-1 fuzzy neuron with t-norm and \( \overline{\mu } \left( x \right) \) of the type-1 fuzzy neuron with s-norm. An interval type-2 fuzzy set is denoted as:

$$ \widetilde{A} = \int {_{x \in X} } \left[ {\int {_{{\mu \left( x \right) \in \left[ {\underline{\mu } \left( x \right),\overline{\mu } \left( x \right)} \right]}} 1/\mu } } \right]/x $$
(2.6)
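As an illustration of Eqs. (2.4)-(2.5), the following sketch computes the lower and upper memberships of an IT2FN with a Gaussian activation. The weight and bias values are hypothetical, and the form \( net = wx + b \) is assumed for the scalar product.

```python
import math

def gaussian(net):
    """Gaussian activation: mu(net) = exp(-net**2 / 2)."""
    return math.exp(-net ** 2 / 2.0)

def it2fn(x, w1, w2, b1, b2):
    """Interval type-2 fuzzy neuron: returns (lower, upper) memberships.
    w1, w2 model the uncertain standard deviation and b1, b2 the
    uncertain mean; names follow Eqs. (2.4)-(2.5)."""
    mu1 = gaussian(w1 * x + b1)        # Eq. (2.4), first neuron
    mu2 = gaussian(w2 * x + b2)        # Eq. (2.4), second neuron
    lower = mu1 * mu2                  # Eq. (2.5), t-norm: algebraic product
    upper = mu1 + mu2 - lower          # Eq. (2.5), s-norm: algebraic sum
    return lower, upper

lo, up = it2fn(0.5, w1=1.0, w2=1.2, b1=-0.1, b2=0.1)
```

By construction the algebraic sum dominates the algebraic product, so the pair always satisfies \( \underline{\mu } \left( x \right) \le \overline{\mu } \left( x \right) \).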

An IT1FN (Fig. 2.2) is built from two conventional adaptive linear neurons (ADALINE) [12] that adapt the consequents \( y_{k}^{j} \in \left[ {{}_{l}y_{k}^{j} ,{}_{r}y_{k}^{j} } \right] \) of the rules \( R_{k} \), with the output defined by

Fig. 2.2
figure 2

Interval type-1 fuzzy neuron

$$ \begin{aligned} {}_{l}y_{k}^{j} & = \sum\limits_{i = 1}^{n} {C_{k,i}^{j} } x_{i} + C_{k,0}^{j} - \sum\limits_{i = 1}^{n} {S_{k,i}^{j} } \left| {x_{i} } \right| - S_{k,0}^{j} , \\ {}_{r}y_{k}^{j} & = \sum\limits_{i = 1}^{n} {C_{k,i}^{j} } x_{i} + C_{k,0}^{j} + \sum\limits_{i = 1}^{n} {S_{k,i}^{j} } \left| {x_{i} } \right| + S_{k,0}^{j} . \\ \end{aligned} $$
(2.7)

Thus the consequents can be adapted with linear networks. The network weights are the consequent parameters \( C_{k,i}^{j} ,S_{k,i}^{j} \) of the kth rule. The outputs represent the interval linear MFs of the rule consequents (Fig. 2.3).
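A minimal sketch of Eq. (2.7), computing the left and right endpoints of one rule's interval consequent; the parameter values are hypothetical.

```python
def it1fn_consequent(x, C, S):
    """Left/right endpoints of the interval consequent of one rule k
    (Eq. (2.7)). C = [C_k0, C_k1, ..., C_kn] are the center parameters,
    S = [S_k0, S_k1, ..., S_kn] the spread parameters."""
    center = C[0] + sum(C[i + 1] * xi for i, xi in enumerate(x))
    spread = S[0] + sum(S[i + 1] * abs(xi) for i, xi in enumerate(x))
    return center - spread, center + spread  # ( ly_k, ry_k )

# Hypothetical two-input rule with illustrative parameter values.
yl, yr = it1fn_consequent([1.0, -2.0], C=[0.5, 1.0, 0.3], S=[0.1, 0.2, 0.1])
```

Because the spread term is a sum of non-negative contributions, the left endpoint never exceeds the right one.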

Fig. 2.3
figure 3

Interval type-2 fuzzy neural network

2.3 Ensemble Learning

An ensemble is a learning paradigm in which multiple component learners are trained for the same task and their predictions are combined when dealing with future instances [13]. Since an ensemble is often more accurate than its component learners, this paradigm has become a hot topic in recent years and has already been successfully applied to optical character recognition, face recognition, scientific image analysis, medical diagnosis and time series [14].

In general, a neural network ensemble is constructed in two steps, i.e. training a number of component neural networks and then combining the component predictions.

There are also many other approaches for training the component neural networks. Some examples are as follows. Hampshire and Waibel [15, 16] utilize different objective functions to train distinct component neural networks. Cherkauer [17] trains component networks with different numbers of hidden layers. Maclin and Shavlik [18] initialize component networks at different points in the weight space. Krogh and Vedelsby [19, 20] employ cross-validation to create component networks. Opitz and Shavlik [21, 22] exploit a genetic algorithm to train diverse knowledge-based component networks. Yao and Liu [23] regard all the individuals in an evolved population of neural networks as component networks [24].
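A minimal sketch of the combination step described above, assuming simple averaging of the component predictions (weighted combinations and voting are equally common). The component "models" here are hypothetical stand-ins for trained networks.

```python
def ensemble_predict(models, x):
    """Combine the component learners' predictions by simple averaging,
    one common combination scheme for regression ensembles."""
    preds = [m(x) for m in models]
    return sum(preds) / len(preds)

# Hypothetical components: three regressors trained differently, each
# approximating the same underlying mapping y = 2x.
models = [lambda x: 2.0 * x, lambda x: 2.2 * x, lambda x: 1.8 * x]
y = ensemble_predict(models, 1.0)  # average of 2.0, 2.2 and 1.8
```

Averaging tends to cancel the individual errors of diverse components, which is why the ensemble is often more accurate than any single learner.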

2.4 Interval Type-2 Fuzzy Systems

Type-2 fuzzy sets are used to model uncertainty and imprecision; originally they were proposed by Zadeh [25, 26] and they are essentially “fuzzy–fuzzy” sets in which the membership degrees are type-1 fuzzy sets (Fig. 2.4).

Fig. 2.4
figure 4

Structure of the interval type-2 fuzzy logic system

The structure of a type-2 fuzzy system implements a nonlinear mapping of an input space to an output space. This mapping is achieved through a set of type-2 if-then fuzzy rules, each of which describes the local behavior of the mapping.

The uncertainty is represented by a region called footprint of uncertainty (FOU). When \( \mu_{{\widetilde{A}}} \left( {x,u} \right) = 1,\forall u \in l_{x} \subseteq \left[ {0,1} \right] \) we have an interval type-2 membership function [27,28,29,30] (Fig. 2.5).

Fig. 2.5
figure 5

Interval type-2 membership function

The uniform shading for the FOU represents the entire interval type-2 fuzzy set and it can be described in terms of an upper membership function \( \overline{\mu }_{{\widetilde{A}}} \left( x \right) \) and a lower membership function \( \underline{\mu }_{{\widetilde{A}}} \left( x \right) \).
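The following sketch computes the upper and lower membership functions bounding the FOU for one standard construction: a Gaussian MF with uncertain mean \( m \in \left[ m_{1}, m_{2} \right] \) and fixed standard deviation. The parameter values are hypothetical.

```python
import math

def gauss(x, m, s):
    """Type-1 Gaussian membership value with mean m and std s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2)

def it2_gaussian_mf(x, m1, m2, sigma):
    """Upper/lower MFs of a Gaussian interval type-2 set with uncertain
    mean m in [m1, m2] and fixed sigma (a standard construction)."""
    if x < m1:
        upper = gauss(x, m1, sigma)
    elif x > m2:
        upper = gauss(x, m2, sigma)
    else:
        upper = 1.0  # plateau between the two means
    # The lower MF uses the farther of the two means.
    lower = gauss(x, m2, sigma) if x <= (m1 + m2) / 2 else gauss(x, m1, sigma)
    return lower, upper

lo, up = it2_gaussian_mf(0.3, m1=0.4, m2=0.6, sigma=0.2)
```

At each \( x \), the vertical slice \( \left[ \underline{\mu }_{\widetilde{A}} \left( x \right), \overline{\mu }_{\widetilde{A}} \left( x \right) \right] \) is exactly the shaded FOU interval of Fig. 2.5.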

A fuzzy logic system (FLS) described using at least one type-2 fuzzy set is called a type-2 FLS. Type-1 FLSs are unable to directly handle rule uncertainties, because they use type-1 fuzzy sets that are certain [9, 31]. On the other hand, type-2 FLSs are very useful in circumstances where it is difficult to determine an exact certainty value, and there are measurement uncertainties.

2.5 Genetic Algorithms

Genetic algorithms (GAs) are adaptive methods that can be used to solve search and optimization problems. They are based on the genetic processes of living organisms. Over generations, populations in nature evolve according to the principles of natural selection and survival of the fittest postulated by Darwin. By imitating this process, genetic algorithms are able to create solutions to real-world problems. The evolution of these solutions towards optimal values of the problem depends largely on properly encoding them. The basic principles of genetic algorithms were established by Holland [32, 33] and are well described in the works of Goldberg [34,35,36], Davis [37] and Michalewicz [38]. The large field of application of GAs relates to those problems for which no specialized techniques exist. Even when such techniques exist and work well, improvements can be made by hybridizing them with genetic algorithms.

A GA is a highly parallel mathematical algorithm that transforms a set (population) of individual mathematical objects (typically fixed-length strings modeled on chromosomes), each of which is associated with a fitness, into a new population (i.e. the next generation), using operations modeled on the Darwinian principles of reproduction and survival of the fittest after applying a series of naturally occurring genetic operations [39].

Applying a genetic algorithm requires the following five basic components:

  1. Representation of the potential solutions to the problem.

  2. A way to create an initial population of possible solutions (usually a random process).

  3. An evaluation function playing the role of the environment, ranking solutions in terms of their "fitness".

  4. Genetic operators that alter the composition of the children produced for the next generation.

  5. Values for the different parameters used by the genetic algorithm (population size, crossover probability, mutation probability, maximum number of generations, etc.).

The basic operations of a genetic algorithm [40] are as follows, as illustrated in Fig. 2.6:

Fig. 2.6
figure 6

Steps of the genetic algorithm

  • Step 1: Represent the problem variable domain as a chromosome of a fixed length; choose the size of a chromosome population N, the crossover probability (pc) and the mutation probability (pm).

  • Step 2: Define a fitness function to measure the performance, or fitness, of an individual chromosome in the problem domain. The fitness function establishes the basis for selecting chromosomes that will be mated during reproduction.

  • Step 3: Randomly generate an initial population of chromosomes of size \( N \): \( x_{1} ,x_{2} , \ldots ,x_{N} \).

  • Step 4: Calculate the fitness of each individual chromosome: \( f\left( {x_{1} } \right),f\left( {x_{2} } \right), \ldots ,f\left( {x_{N} } \right). \)

  • Step 5: Select a pair of chromosomes for mating from the current population. Parent chromosomes are selected with a probability related to their fitness.

  • Step 6: Create a pair of offspring chromosomes by applying the genetic operators: crossover and mutation.

  • Step 7: Place the created offspring chromosomes in the new population.

  • Step 8: Repeat Step 5 until the size of the new chromosome population becomes equal to the size of the initial population, N.

  • Step 9: Replace the initial (parent) chromosome population with the new (offspring) population.

  • Step 10: Go to Step 4, and repeat the process until the termination criterion is satisfied.
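The steps above can be sketched as a short program. This is a minimal, real-coded GA for a hypothetical one-dimensional fitness function; for brevity it uses tournament selection and arithmetic crossover, whereas Step 5 describes fitness-proportional selection.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def fitness(x):
    """Toy fitness to maximize: f(x) = -(x - 3)^2, optimum at x = 3."""
    return -(x - 3.0) ** 2

def run_ga(pop_size=20, pc=0.9, pm=0.1, generations=50):
    # Step 3: random initial population of real-coded chromosomes.
    pop = [random.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):          # Step 10: iterate until done
        def select():
            # Steps 4-5: binary tournament (a simpler stand-in for
            # the fitness-proportional selection described above).
            a, b = random.sample(pop, 2)
            return a if fitness(a) > fitness(b) else b
        new_pop = []
        while len(new_pop) < pop_size:    # Step 8: fill the new population
            p1, p2 = select(), select()
            # Step 6: arithmetic crossover with probability pc,
            # then Gaussian mutation with probability pm.
            child = (p1 + p2) / 2.0 if random.random() < pc else p1
            if random.random() < pm:
                child += random.gauss(0.0, 0.5)
            new_pop.append(child)         # Step 7
        pop = new_pop                     # Step 9: replace the population
    return max(pop, key=fitness)

best = run_ga()
```

The chromosome here is a single real number; for the combinatorial problems GAs were originally designed for, a bit-string encoding with one- or two-point crossover would be used instead.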

2.6 Particle Swarm Optimization

Particle Swarm Optimization (PSO) is a bio-inspired optimization method proposed by Eberhart and Kennedy [41,42,43] in 1995. PSO is a metaheuristic search technique based on a population of particles. The main idea of PSO comes from the social behavior of schools of fish and flocks of birds [44, 45]. In PSO, each particle moves in a D-dimensional space based on its own past experience and that of other particles [46, 47]. Each particle has a position and a velocity, represented by the vectors \( x_{i} = \left( {x_{i,1} ,x_{i,2} , \ldots ,x_{i,D} } \right) \) and \( v_{i} = \left( {v_{i,1} ,v_{i,2} , \ldots ,v_{i,D} } \right) \) for the i-th particle. At each iteration, particles are compared with each other to find the best particle [48, 49]. Each particle records its best position as \( pbest_{i} = \left( {pbest_{i,1} ,pbest_{i,2} , \ldots ,pbest_{i,D} } \right) \). The best position of all particles in the swarm is called the global best and is represented as \( G = \left( {G_{1} ,G_{2} , \ldots ,G_{D} } \right) \). The velocity of each particle is given by Eq. (2.13).

$$ v_{id} = wv_{id} + C_{1} \cdot rand_{1} ( )\cdot \left( {pbest_{id} - x_{id} } \right) + C_{2} \cdot rand_{2} ( )\cdot \left( {gbest_{d} - x_{id} } \right) $$
(2.13)

In this equation \( i = 1,2, \ldots ,M \) and \( d = 1,2, \ldots ,D \); \( C_{1} \) and \( C_{2} \) are positive constants (known as acceleration constants), \( rand_{1} ( ) \) and \( rand_{2} ( ) \) are random numbers in [0, 1], and \( w \), introduced by Shi and Eberhart [50], is the inertia weight. The new position of the particle is determined by Eq. (2.14):

$$ x_{id} = x_{id} + v_{id} $$
(2.14)
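Equations (2.13) and (2.14) can be sketched as follows, here minimizing a toy sphere function; the swarm size, iteration count and coefficient values are hypothetical but typical.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def pso(f, dim=2, n_particles=15, iters=60, w=0.7, c1=1.5, c2=1.5):
    """Minimal PSO minimizing f, following Eqs. (2.13)-(2.14)."""
    X = [[random.uniform(-5.0, 5.0) for _ in range(dim)]
         for _ in range(n_particles)]
    V = [[0.0] * dim for _ in range(n_particles)]
    pbest = [x[:] for x in X]                 # personal best positions
    gbest = min(pbest, key=f)[:]              # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Eq. (2.13): velocity update with inertia weight w.
                V[i][d] = (w * V[i][d]
                           + c1 * r1 * (pbest[i][d] - X[i][d])
                           + c2 * r2 * (gbest[d] - X[i][d]))
                # Eq. (2.14): position update.
                X[i][d] += V[i][d]
            if f(X[i]) < f(pbest[i]):
                pbest[i] = X[i][:]
                if f(X[i]) < f(gbest):
                    gbest = X[i][:]
    return gbest

sphere = lambda x: sum(v * v for v in x)  # minimum 0 at the origin
best = pso(sphere)
```

With \( w < 1 \) and moderate acceleration constants the velocities contract over time, so the swarm concentrates around the best position found.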

The basic functionality of PSO is illustrated as follows (Fig. 2.7).

Fig. 2.7
figure 7

Flowchart of the PSO algorithm