
1 Artificial Neural Networks

The Artificial Neural Network (ANN) is a powerful tool for non-linear mapping between inputs and outputs, and is often the method of choice when traditional algorithms fail. It provides a parametric expression for a system whose mathematical model is unknown. Despite the successful application of neural networks to a variety of problems, they still have some limitations. One of the most common is associated with training: the back-propagation learning algorithm cannot guarantee an optimal solution, since it tends to get stuck at a sub-optimal point from which it might never escape, leaving the network unable to find a desirable solution to the problem at hand. Much research has been done to tackle this problem. Evolutionary methods can provide a solution: they maintain multiple candidate solutions at any one time, so if one is stuck at a sub-optimal point, another may lie closer to the global optimum and can lead the network there. To get the benefits of both ANNs and evolutionary methods, researchers have tried hybrids of the two [33]. Another difficulty is related to selecting an optimal topology for the neural network. The right network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more of an art than engineering. Genetic algorithms [25] and genetic programming [28] are effective optimization techniques that can guide both weight and topology selection.

2 Neuro-Evolution

The process of evolving the various parameters of a neural network is termed neuro-evolution. Unlike the back-propagation algorithm, which trains only the weights of the network to obtain the desired optimum characteristics, evolutionary techniques can also evolve the network topology and even the learning rules. To evolve connection weights, we perform the following steps (a minimal sketch follows the list):

  1. Encode the connection weights of each individual neural network into a chromosome.

  2. Calculate the error function and determine the individual's fitness.

  3. Reproduce children based on selection criteria.

  4. Apply genetic operators.
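A minimal sketch of these four steps might look as follows; the 2-2-1 topology, population size, mutation scale and toy XOR task are illustrative assumptions, not from the chapter:

```python
import math
import random

def forward(w, x):
    """Fixed 2-2-1 network; w is a flat chromosome of 9 weights (incl. biases)."""
    h1 = math.tanh(w[0] * x[0] + w[1] * x[1] + w[2])
    h2 = math.tanh(w[3] * x[0] + w[4] * x[1] + w[5])
    return math.tanh(w[6] * h1 + w[7] * h2 + w[8])

def fitness(w, data):
    """Step 2: error function turned into a fitness (negated mean squared error)."""
    return -sum((forward(w, x) - y) ** 2 for x, y in data) / len(data)

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]           # toy XOR task
pop = [[random.uniform(-1, 1) for _ in range(9)] for _ in range(20)]  # step 1

for _ in range(200):
    pop.sort(key=lambda w: fitness(w, data), reverse=True)
    parents = pop[:5]                                   # step 3: selection
    children = [[g + random.gauss(0, 0.2) for g in random.choice(parents)]
                for _ in range(15)]                     # step 4: mutation operator
    pop = parents + children

print(max(fitness(w, data) for w in pop))               # best fitness found
```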

Success or failure of an application is largely determined by the network architecture (i.e., the number of neurons and their interconnections). As the network architecture is usually decided by trial and error, a good algorithm is needed to automatically design an efficient architecture for each particular application. Genetic algorithms may be well suited to this task. The basic idea behind evolving a suitable network architecture is to conduct a genetic search in a population of possible architectures. We must first choose a method of encoding a network's architecture into a chromosome. There are two types of encoding schemes:

Direct Encoding: When all the information of the network is represented directly by genes in the code, the scheme is referred to as direct encoding. In this case there is a one-to-one relationship between genotype and phenotype.

Indirect Encoding: Here the genes do not represent the network directly; instead they encode the generative function responsible for producing the network parameters [5]. This is biologically more plausible: findings in neuroscience indicate that human genetic information cannot specify the whole nervous system directly. It can, however, be computationally more expensive, since the encoding carries no direct clue about the target application for which the network is developed.
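As a toy illustration of the contrast, a direct chromosome lists one gene per connection, whereas an indirect chromosome holds the parameters of a rule that generates the weights; both encodings below are invented for illustration, not taken from [5]:

```python
import math

# Direct encoding: one gene per connection weight, genotype == phenotype weights.
direct_genotype = [0.2, -0.7, 0.5, 0.1, -0.3, 0.9]

# Indirect encoding: a couple of rule parameters generate arbitrarily many weights.
indirect_genotype = (0.8, 1.7)            # (amplitude, frequency) of the rule

def grow_weights(genotype, n):
    """Developmental rule: produce n weights from the indirect genotype."""
    a, f = genotype
    return [a * math.sin(f * i) for i in range(n)]

print(grow_weights(indirect_genotype, len(direct_genotype)))  # 6 weights, 2 genes
```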

3 CGP Evolved Artificial Neural Network (CGPANN)

We have used CGP to introduce four different ways of evolving neural networks, as follows:

  • Feed-forward CGP evolved ANN (FCGPANN)

  • Recurrent CGP evolved ANN (RCGPANN)

  • Plastic CGP evolved ANN (PCGPANN)

  • Plastic Recurrent CGPANN (PRCGPANN)

3.1 Feed-Forward CGP Evolved ANN (FCGPANN)

In the first case, CGP is transformed into a feed-forward neural network by treating each node as a neuron and assigning a weight to each connection. The neurons of such a network are arranged in a Cartesian grid of rows and columns inspired by the original CGP architecture; in practice the grid is mostly restricted to a single row, giving the network the ability to form arbitrary graphs/topologies. Each neuron in the network can take a connection from either a previous neuron or a system input. Not all neurons are necessarily connected with each other or with the system inputs; this gives the network the ability to continuously evolve its complexity along with its weights. All the network parameters are represented by a string of numbers called the genotype. The number of active neurons (connected along a path from inputs to outputs) varies from generation to generation, subject to the genotype selected. The output of any neuron, or a system input, can be a candidate for selection as a system output. The ultimate system functionality is identified by tracing the interconnected neurons from the outputs back to the inputs. Since CGP works best with mutation, only the mutation operator is explored in all the variants of neural networks introduced in this work. The mutation rate specifies the percentage of genes to be mutated during the process of evolution. A gene can be an input connection to a node, a weight, a switch or a neuron function. These genes for a single node are shown in Fig. 1. An FCGPANN genotype of arity 2 (two inputs per node) is represented by the following expression: \(F I_{1} W_{1} C_{1} I_{2} W_{2} C_{2}, F I_{1} W_{1} C_{1} I_{2} W_{2} C_{2} \ldots , O_{1}, O_{2} \ldots O_{n}\), where F is the activation function, chosen randomly from a list of nonlinear functions (the two most popular, also used in this work, are the log-sigmoid and the hyperbolic tangent), I represents the output of a node or a system input connected to the node under consideration, W is the weight multiplying a node input, and C is an optional ON/OFF switch. All these genes can take only certain allowed values, depending on the type of network. The network representation of a genotype is called a phenotype. Figure 2 shows a typical FCGPANN phenotype with its associated genotype. We have the option to evolve all the parameters, or to fix one or two and evolve the others.
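A minimal decoding sketch, under an assumed data layout (the tuple format, function table and example values are illustrative, not the authors' implementation), shows how an arity-2 feed-forward genotype is executed:

```python
import math

# Each node gene is (function, [(input_index, weight, switch), ...]); indices
# 0..n_in-1 refer to system inputs, later indices to earlier node outputs.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

FUNCTIONS = {"sig": sigmoid, "tanh": math.tanh}

def evaluate(genotype, outputs, inputs):
    values = list(inputs)                 # node outputs appended after the inputs
    for func, conns in genotype:
        s = sum(values[i] * w for i, w, on in conns if on)   # C gene gates input
        values.append(FUNCTIONS[func](s))
    return [values[o] for o in outputs]

genotype = [("tanh", [(0, 0.2, 1), (1, 0.7, 1)]),    # node 2 reads both inputs
            ("sig",  [(1, 0.6, 1), (2, 0.2, 1)])]    # node 3 reads input 1, node 2
print(evaluate(genotype, outputs=[3], inputs=[0.5, -0.3]))
```

For simplicity the sketch evaluates every node; an actual implementation would evaluate only the active nodes reached from the selected outputs.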

Fig. 1
figure 1

A CGPANN node containing the inputs, weights, switches, summer and activation function

Fig. 2
figure 2

A CGPANN phenotype with its corresponding genotype

FCGPANN was initially tested for its speed of learning and evaluated against previously introduced neuro-evolutionary techniques on benchmarks such as single and double pole balancing [19]. Table 1 shows the superior performance of FCGPANN and RCGPANN (see next section) relative to the other neuroevolutionary techniques evaluated for speed of learning on the single pole balancing problem. Table 2 shows the performance of FCGPANN and RCGPANN compared to other techniques for the Markovian and non-Markovian cases of the double pole balancing task, where a Markovian process is one in which the conditional probability distribution of the future state depends only on the present state and not on the past history. The figures show the average number of evaluations needed to achieve the target objective of balancing the poles for a specified benchmark time interval. FCGPANN has been explored in a range of applications including breast cancer detection, prediction of foreign currency exchange rates, load forecasting, Internet multimedia traffic management, cloud resource estimation, solar irradiance prediction, wind power forecasting and arrhythmia detection [2, 13, 14, 16, 19, 24, 30]. FCGPANN outperformed all the previously introduced techniques, as highlighted in the literature. Table 3 shows the comparative results for mean accuracy in breast cancer detection with fine needle aspiration (FNA) using different algorithms; it can be seen that FCGPANN outperformed the other methods [19]. Another important area in which FCGPANN has been successfully applied is the prediction of foreign currency exchange rates [24]. Table 4 shows the comparative results for prediction of foreign currency exchange rates using FCGPANN and other methods.

Table 1 Comparison of CGPANN with other neuro-evolutionary algorithms in terms of average number of evaluations required to solve the single pole balancing task
Table 2 Comparison of CGPANN with other neuro-evolutionary algorithms in terms of average number of evaluations required to solve the double pole balancing task
Table 3 Comparison of Mean Absolute Percentage Errors (MAPE) obtained with various classification methods on the processed FNA data from WDBC, which contains 30 features
Table 4 Comparison between the MAPE of other well known methods and that of FCGPANN, for predicting foreign exchange rates

Marketing a product requires good knowledge of customer demand, especially in the case of food products. FCGPANN provides an efficient method for predicting market trends [1].

3.2 Recurrent CGPANN (RCGPANN)

The second type of CGPANN is the Recurrent CGPANN (RCGPANN). These networks are better suited to modeling systems that are dynamic and nonlinear. The network is a modification of one of the earliest recurrent networks, the Jordan network [11].

Fig. 3
figure 3

A typical RCGPANN neuron

In the Jordan network there are state inputs equal in number to the outputs. These inputs are fed by the outputs through unit weights, and are present only at the input layer. Learning in these networks takes place by changing the weights of the connections between the input and hidden layers, and between the hidden and output layers. In RCGPANN, unlike the Jordan network, the state inputs can be connected not only to the first layer but to any layer. These additional inputs also pass through activation functions. Figures 3 and 4 show a typical RCGPANN neuron, and an RCGPANN genotype and phenotype, respectively. Here \(I_{1}\) and \(I_{2}\) are the normal inputs while R is a state input. The initial value of the R input to the system is taken to be zero. The output is taken from node 6, as is evident from the genotype and corresponding phenotype, and node 6 takes inputs from node 3 and input \(I_{2}\) only. Nodes 4 and 5 do not contribute to the output and are termed inactive nodes, while nodes 3 and 6 are active, as they contribute to the output. The outputs of the active nodes are:

$$ \psi _3=\tanh (I_{1}\cdot W_{13}+I_2\cdot W_{23}+R\cdot W_{R3})$$
$$ \psi _6=\tanh (\psi _3\cdot W_{36}+I_2\cdot W_{26}+R\cdot W_{R6}) $$

where \(I_{1}\) and \(I_{2}\) are the normal inputs to the system, R is the state input, \(W_{mn}\) is the weight of the connection between system input/node m and node n, and \(\psi _{m}\) is the output of node m. Figure 5 shows the case in which all the outputs are presented as feedback inputs to the neurons for selection; the outputs are averaged to obtain the desired system output. Like FCGPANN, RCGPANN was initially tested for its speed of learning on both single and double pole balancing, for both Markovian and non-Markovian cases. Its performance relative to other neuroevolutionary techniques on the single pole balancing task is shown in Table 1, and on the double pole task in Table 2. The numbers in the tables represent the average number of evaluations required to find the desired pole balancing behaviour. The results in these two tables clearly show the superiority of FCGPANN and RCGPANN in the Markovian case, and of RCGPANN in the non-Markovian case.
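A toy numeric sketch of this two-node recurrent example (the weight values and input sequence below are assumed for illustration) shows how the state input R carries the previous output forward:

```python
import math

W13, W23, WR3 = 0.4, -0.2, 0.5   # assumed weights into node 3
W36, W26, WR6 = 0.8, 0.3, -0.6   # assumed weights into node 6

def step(i1, i2, r):
    psi3 = math.tanh(i1 * W13 + i2 * W23 + r * WR3)
    psi6 = math.tanh(psi3 * W36 + i2 * W26 + r * WR6)
    return psi6

r = 0.0                                    # initial state input is zero
for i1, i2 in [(0.1, 0.9), (0.5, 0.4), (0.8, 0.2)]:
    r = step(i1, i2, r)                    # output fed back as next state input
    print(r)
```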

Fig. 4
figure 4

a An RCGPANN genotype, b a 2\(\,\times \,\)2 RCGPANN network and c the corresponding phenotype

Fig. 5
figure 5

RCGPANN phenotype with full feedback having all outputs available for feedback

FCGPANN and RCGPANN were also tested for their generalization ability. Table 5 shows the generalization of FCGPANN and RCGPANN: the average number of random cart initializations (out of 625) that can be balanced for the benchmark number of time-steps. Table 6 presents the generalization ability of various neuroevolutionary algorithms for the non-Markovian double pole balancing scenario using the damping fitness function. RCGPANN scored 335.84 out of 625 over 50 independently evolved genotypes, exhibiting greater generalization ability than the other techniques presented to date. Recurrent CGPANN has been successfully applied to a number of applications including load forecasting, foreign currency exchange rates, and bandwidth management and estimation [16, 17, 31]. It has been successfully applied to electrical load prediction for a complete year and also for the different seasons of the year, enabling efficient electricity management [14]. Table 7 compares the performance of RCGPANN with other contemporary methods in terms of Mean Absolute Percentage Error (MAPE) for the load forecasting task. Bandwidth allocation for communication channels has always been a challenge for engineers; Recurrent CGPANN has been successfully applied to predict the size of the next MPEG4 video frame based on an estimate from the last ten frames [16]. Table 8 shows the comparative results for next-frame prediction in terms of overall error for RCGPANN and other models. The electric load prediction discussed earlier was extended to very short term (about half an hour) load prediction [18]; Table 9 shows the performance, in terms of MAPE, of the proposed RCGPANN in comparison to other methods for this task. Table 10 shows the comparative results for predicting foreign currency exchange rates using RCGPANN and other models [31].

Table 5 Generalization of CGPANN: average number of random cart initializations (out of 625) that can be solved
Table 6 Comparison of generalization of CGPANN with other neuroevolutionary algorithms for the double pole balancing scenario for Non-Markovian case
Table 7 Comparison of RCGPANN with other methods for the load forecasting task
Table 8 Best overall error comparison in MPEG4 frame size prediction
Table 9 Comparison of RCGPANN with other methods in terms of MAPE for very short time electric load forecasting
Table 10 Comparison between the accuracy distribution rates and MAPE of RCGPANN and other models for the foreign currency exchange rate prediction

3.3 Plastic CGPANN (PCGPANN)

Plasticity in neural networks has been the characteristic of choice for applications in dynamic systems, owing to its comparatively better performance [4, 26, 32]. The improved performance of plastic neural networks can be attributed to the adaptability of their morphology to environmental stimuli, much as in the natural neural system. In this developmental form of CGPANN, an additional output gene provides extra features to the system. PCGPANN has the same basic structure as the CGPANN presented previously; with the addition of an extra output gene that drives developmental decisions during the evaluation process, the network achieves its plasticity. The decision is made on the basis of the output values: mutation of the genotype is invoked according to a decision function, in this case a threshold function that invokes mutation when the output of the CGPANN exceeds the threshold value. The threshold value is selected based on the performance of the model. In this way an unlimited number of phenotypes may be generated from a single genotype, depending on the system requirements. The network has been tested for its performance in diverse learning domains. The genotypes are modified under a defined set of constraints: the plasticity invokes mutation of the genotype at runtime, modifying its genes and producing complex network structures.
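A minimal sketch of this plasticity trigger, assuming an illustrative threshold value, is:

```python
# The chapter says the threshold is selected based on model performance;
# the value here is assumed for illustration.
THRESHOLD = 0.8

def decision(output):
    """Return 1 to invoke mutation of the genotype, 0 to leave it unchanged."""
    return 1 if output > THRESHOLD else 0

print(decision(0.95), decision(0.30))   # -> 1 0
```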

Fig. 6
figure 6

A generalized approach of PCGPANN depicting the network topology, its attributes (inputs, connections, outputs), the additional output gene and the algorithm

3.3.1 PCGPANN Methodology

The generalized PCGPANN approach is demonstrated in Fig. 6. The figure depicts the network topology, its parameters (inputs, connections, outputs), the additional output gene and the PCGPANN algorithm. The system consists of the original CGPANN as explained before. Output from this network is fed into a running-sum block. Based on the evolutionary requirements, the number of outputs fed into the summation block can vary: either half or all of the CGPANN outputs are fed in, and the network is accordingly named a Half Feedback (HFB) or Full Feedback (FFB) network. The output of the summation block is fed to a decision function that generates either a 0 or a 1; depending on this value, the function either invokes mutation or leaves the genotype unchanged. The network is unique in its ability to invoke mutation at runtime. The mutation may take place randomly in any of the following network genes: node inputs, weights, outputs or activation functions. The genes are mutated under the given set of constraints.

Constraints (see the sketch after the list):

  1. If the gene to be mutated is the function of a node, then a new function is randomly selected from the list of available functions and assigned to the node.

  2. If a weight gene is to be mutated, then it is assigned a random value between \(-1\) and \(+1\).

  3. If the gene to be mutated is a node input, then, depending on the levels-back parameter, the output of a randomly selected node to the left of the node under consideration, or a network input, is connected to it.

  4. If an output gene is to be mutated, then it is connected to the output of a randomly selected node or to a system input.
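The sketch below implements the four constraints under an assumed data layout; the dictionary structure and the unrestricted levels-back are illustrative choices, not the authors' code:

```python
import random

FUNCTIONS = ["sig", "tanh"]

def mutate_gene(net, rng=random):
    nodes, outputs, n_in = net["nodes"], net["outputs"], net["n_inputs"]
    kind = rng.choice(["func", "weight", "input", "output"])
    if kind == "func":                      # constraint 1: new random function
        rng.choice(nodes)["func"] = rng.choice(FUNCTIONS)
    elif kind == "weight":                  # constraint 2: random value in [-1, 1]
        node = rng.choice(nodes)
        node["weights"][rng.randrange(len(node["weights"]))] = rng.uniform(-1, 1)
    elif kind == "input":                   # constraint 3: reconnect to a node on
        j = rng.randrange(len(nodes))       # the left or to a network input
        node = nodes[j]
        node["inputs"][rng.randrange(len(node["inputs"]))] = rng.randrange(n_in + j)
    else:                                   # constraint 4: output gene reconnects
        outputs[rng.randrange(len(outputs))] = rng.randrange(n_in + len(nodes))

net = {"n_inputs": 2,
       "nodes": [{"func": "tanh", "inputs": [0, 1], "weights": [0.2, 0.7]},
                 {"func": "sig",  "inputs": [1, 2], "weights": [0.6, 0.2]}],
       "outputs": [3]}
mutate_gene(net)
print(net)
```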

In each iteration genes are mutated, with the number of mutations determined by the runtime mutation rate (the developmental index). Whether the genotype changes at runtime depends on the output of the activation function, so the decision function is matched to the expected output range of the activation function.

3.3.2 Development in PCGPANN

The plasticity in PCGPANN is based on the decision function. Once the developmental index is invoked by the function, the network initiates a step-by-step process of development under the aforementioned set of constraints. An example of the various possible steps of development in PCGPANN is illustrated in Fig. 7. A change may be invoked in the input of a node, a function or a weight gene, as shown in the figure; the process is highlighted in both the genotype and the phenotype. Figure 7a shows the initial genotype and phenotype. There are two node functions, a sigmoid and a hyperbolic tangent, in the initial genotype; these are the available functions that can be randomly assigned to a node when mutation takes place. Here \(\psi _{0}\) and \(\psi _{1}\) are network inputs, \(\psi _{2}\) is the output of the first node and \(\psi _{3}\) is the system output, while the weights take values in the range \([-1,1]\). The mathematical representation of the initial network is given in Eq. 1.

$$\begin{aligned} \psi _3=Sig\big [0.6\psi _1+0.2\,Tanh(0.2\psi _0+0.7\psi _1)\big ] \end{aligned}$$
(1)

In the first case, the function of the first node, Tanh, is changed to the sigmoid function. The updated genotype and phenotype are presented in Fig. 7b, and the mathematical expression for the updated genotype is given in Eq. 2.

$$\begin{aligned} \psi _3=Sig\big [0.6\psi _1+0.2\,Sig(0.2\psi _0+0.7\psi _1)\big ] \end{aligned}$$
(2)
Fig. 7
figure 7

Demonstration of real time development in PCGPANN

In the second case, the weight at the first input of the second node changes from 0.2 to 0.7, as shown in Fig. 7c. The mathematical expression for the updated genotype is given by Eq. 3.

$$\begin{aligned} \psi _3=Sig\big [0.6\psi _1+0.7\,Sig(0.2\psi _0+0.7\psi _1)\big ] \end{aligned}$$
(3)

In Fig. 7d, the structure is modified when an input to a neuron is altered: the first input to the second node is disconnected from node 1 and connected to a system input. The network attains the final expression given by Eq. 4.

$$\begin{aligned} \psi _3=Sig\big [0.6\psi _1+0.7\psi _0\big ] \end{aligned}$$
(4)
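A short worked check of this development sequence (with assumed input values for \(\psi _0\) and \(\psi _1\)) shows how each mutation shifts the system output:

```python
import math

sig = lambda x: 1.0 / (1.0 + math.exp(-x))
p0, p1 = 0.5, -0.2   # assumed values of psi_0 and psi_1

eq1 = sig(0.6 * p1 + 0.2 * math.tanh(0.2 * p0 + 0.7 * p1))  # initial network
eq2 = sig(0.6 * p1 + 0.2 * sig(0.2 * p0 + 0.7 * p1))        # Tanh -> Sig
eq3 = sig(0.6 * p1 + 0.7 * sig(0.2 * p0 + 0.7 * p1))        # weight 0.2 -> 0.7
eq4 = sig(0.6 * p1 + 0.7 * p0)                               # input reconnected

print(eq1, eq2, eq3, eq4)   # each mutation shifts the system output psi_3
```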

Like FCGPANN and RCGPANN, PCGPANN was first evaluated for its learning ability and then for its generalization in an unknown environment. Table 11 shows the comparative results for PCGPANN and other algorithms on the single and double pole balancing tasks; performance is reported as the average number of network evaluations [20]. Table 12 shows the generalization of PCGPANN genotypes in comparison to FCGPANN for both the single and double pole balancing scenarios. Plastic CGPANN has also been successfully applied to evolve a dynamic and robust computational model for efficiently predicting daily foreign currency exchange rates in advance from past data [15]. Table 13 shows the comparative results for foreign currency exchange rate prediction using PCGPANN and other contemporary methods.

Table 11 Comparison of PCGPANN with other neuroevolutionary algorithms applied on single and double pole balancing task: average number of network evaluations
Table 12 Average number of random cart-pole initializations (out of 625) that can be solved
Table 13 Comparison of PCGPANN with other ANNs for the prediction of foreign currency exchange rates

3.4 Plastic Recurrent Cartesian Genetic Programming Evolved Artificial Neural Network (PRCGPANN)

Plastic Recurrent Cartesian Genetic Programming Evolved Artificial Neural Network (PRCGPANN) is an online learning approach that incorporates developmental plasticity in recurrent neural networks. Recurrent neural networks can process arbitrary sequences of inputs due to their ability to access internal memory. In a plastic RCGPANN the output gene not only forms the system output but also plays a role in the developmental decision: the output of the system is applied to a decision function to invoke development of the network, and development of the phenotype takes place through mutation of the genotype at runtime. The recurrent CGPANN has a feedback mechanism that feeds one or more outputs back to the system. The general approach of PRCGPANN is depicted in Fig. 8. The figure shows the inputs, outputs, connections, recurrent inputs and the output gene that invokes development in the network. The initial network is the original representation of the genotype, which changes in response to the output of the system with the passage of time. The development decision reflects the output of the system fed into the sum block, together with the recurrent paths taken from the CGPANN block with their associated weights, summation and sigmoid function, as shown in the figure. If the value obtained from this function is less than the defined decision value, development is invoked in the network; otherwise the outputs are monitored without any modification. The unique aspect of the approach is that network changes take place in real time according to the data flow in the network, and this provides the network with its plasticity. PRCGPANN invokes changes in the network by changing (mutating) a node function, an input or a weight, or by switching an input, in real time. The learning rules for development in PRCGPANN are acquired during the process of evolution. A special case describing the developmental mechanism of PRCGPANN at runtime is shown in Figs. 9, 10 and 11. The system has three inputs (\(I_{0}, I_{1}\) and \(I_2\)), two recurrent inputs (\(R_0\) and \(R_1\)), five outputs (\(Y_{0}, Y_{1}, Y_{2}, Y_{3}\) and \(Y_{4}\)), weights (\(w_{0}, w_{1}, w_{2}, w_{3}, \ldots , w_{11}\)) and the plastic feedback (PF).
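A minimal sketch of this development decision, with assumed feedback weights and decision value, is:

```python
import math

DECISION_VALUE = 0.5   # assumed threshold

def develop_needed(outputs, weights):
    s = sum(o * w for o, w in zip(outputs, weights))      # weighted sum block
    return 1.0 / (1.0 + math.exp(-s)) < DECISION_VALUE    # sigmoid, then compare

outputs = [0.2, -0.4, 0.7, 0.1, 0.3]    # Y0..Y4 from the example
weights = [0.5, -0.3, 0.8, 0.2, -0.1]   # assumed feedback weights
print(develop_needed(outputs, weights)) # True -> invoke development
```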

Fig. 8
figure 8

Structural view of PRCGPANN

Fig. 9
figure 9

Original PRCGPANN phenotype

Figure 9 shows the original phenotype, which has the following genotype:

$$\begin{aligned} \text{ Genotype } =\,&f_{0}, I_{0}, w_{0}, I_{1}, w_{1}, I_{2}, w_{2}, f_{1},I_{3}, w_{5}, R_{1}, w_{4}, I_{2}, w_{6}, \\&f_{2}, I_{3}, w_{7}, R_{1}, w_{4}, R_{0}, w_{8}, f_{3}, I_{3}, w_{10}, I_{4}, w_{11}, I_{0}, w_{9}, I_{3}, I_{1}, I_{4}, I_{5}, I_{6} \end{aligned}$$
Fig. 10
figure 10

Mutation in the function of the network

Figure 10 illustrates the change in function resulting from a change in a function gene, as highlighted in the genotype below:

$$\begin{aligned} \text{ Genotype } =\,&f_{0}, I_{0}, w_{0}, I_{1}, w_{1}, I_{2}, w_{2}, f_{1},I_{3}, w_{5}, R_{1}, w_{4}, I_{2}, w_{6},\\&f_{0}, I_{3}, w_{7}, R_{1}, w_{4}, R_{0}, w_{8}, f_{3}, I_{3}, w_{10}, I_{4}, w_{11}, I_{0}, w_{9}, I_{3}, I_{1}, I_{4}, I_{5}, I_{6} \end{aligned}$$
Fig. 11
figure 11

Mutation in the output gene

Figure 11 shows the change in output connectivity at runtime, with the corresponding gene change highlighted below:

$$\begin{aligned} \text{ Genotype } =\,&f_{0}, I_{0}, w_{0}, I_{1}, w_{1}, I_{2}, w_{2}, f_{1}, I_{3}, w_{5}, R_{1}, w_{4}, I_{2}, w_{6},\\&f_{0}, I_{3}, w_{7}, R_{1}, w_{4}, R_{0}, w_{8}, f_{3}, I_{3}, w_{10}, I_{4}, w_{11}, I_{0}, w_{9}, I_{0}, I_{1}, I_{4}, I_{5}, I_{6} \end{aligned}$$

The genotype and phenotype diagrams above demonstrate the possible runtime modifications to the recurrent network, adding not only signal feedback but also structural feedback through modification of the system architecture and topology at runtime. This is a very interesting system, because it can transform between feed-forward and feedback structures from time to time during processing, giving it a unique capability. This architecture is yet to be explored on various applications.

4 Concluding Remarks

This chapter has provided a detailed overview of how CGP is used to evolve artificial neural networks by finding a proper set of weights and a topology for the network. CGP-based ANNs provide an effective platform for Markovian and non-Markovian, linear and non-linear problems, whether static or dynamic/plastic. They can help find the unknown mathematical model for the problem at hand. The CGPANN model not only helps in the selection of topology and optimum weights for ANNs, but also helps in identifying the best features amongst the many provided to the network, ignoring unwanted noise. Various models of CGPANN have been tested in diverse fields of application for speed of learning, robustness and accuracy; comparisons with other algorithms on the same sets of problems show encouraging results.