1 Introduction

It has been proven that no optimization method can outperform all others on all types of optimization problems; this result is known as the No-Free-Lunch theorem [1]. Consequently, different meta-heuristic algorithms such as the Genetic Algorithm (GA) [2], Genetic Programming (GP) [3, 4], Particle Swarm Optimization (PSO) [5], Ant Colony Optimization (ACO) [6, 7], Artificial Bee Colony (ABC) [8], the Estimation of Distribution Algorithm (EDA) [9] and many new variants are continuously being proposed and used to solve various optimization problems [10,11,12,13,14,15,16,17,18]. Genetic Network Programming (GNP) [19,20,21], an extension of GP, is one of them. However, instead of the tree structure used in GP, a graph structure is used to represent individuals, which improves the expression ability of GNP. The graph structure of individuals makes GNP suitable for decision making in agent control problems [22]. The graph is composed of judgment and processing nodes, enabling an individual to represent a decision-making process as a flowchart. These nodes are similar to GP's elementary functions: judgment and processing nodes correspond to the non-terminal and terminal nodes of GP, respectively. In GNP, individuals are composed by connecting these nodes.

The first difference between GNP and GP is that in GP the processing (action) nodes are terminal nodes, whereas GNP has no terminal nodes. In other words, in GP the processing nodes are not connected to further nodes, while in GNP they may be connected to other nodes. As a result, decisions in GNP are made according to not only the current condition of the environment but also the actions performed in the past. Structures such as loops, which are essential for generating agent strategies, are therefore easy to implement with GNP, whereas generating this type of strategy in GP is not convenient at all. For example, suppose the following instruction needs to be generated: if condition c1 is true, do action a1, then action a2 and again action a2; then, while condition c2 is true, do actions a3 and a2, respectively. Although generating such instructions may be possible in GP, it would require many modifications to GP's crossover, mutation and individual structures. In addition, GP suffers from the inherent tree-bloat problem [23]. GNP does not have this problem because the number of nodes in each individual does not change during the evolution process. Meanwhile, GNP can generate compact and sophisticated structures by using only the judgment and processing nodes that are actually needed [20].

There are also other evolutionary methods with network-oriented structures, such as Parallel Algorithm Discovery and Orchestration (PADO) [24], Cartesian Genetic Programming (CGP) [25, 26] and Evolutionary Programming (EP) [27]. PADO is an evolutionary computation algorithm on graph-like automata that is closely related to GNP. It is formed by three main components: the main program, Automatically Defined Function (ADF) programs and indexed memory. The main program of PADO contains a start node and an end node; an ADF is a function set that is automatically defined during program runs. PADO is executed from the start node and terminates at the end node of the network. CGP was first proposed about 20 years ago. It explores graph-based GP, motivated by the more general representation that a graph structure offers compared with the tree structure of GP, and it can represent the solutions of computational problems as graphs. Its encoding is an integer string that denotes the list of node connections and functions, and it also includes redundant genes to support effective evolutionary search. EP, first proposed by Fogel, also uses a graph structure. It is an evolutionary computation algorithm like GA and GP; however, it generally uses only mutation as the evolutionary operator. EP has been used to automatically synthesize finite state machines for solving sequence prediction problems.

However, GNP is different from these methods. While GNP can evolve programs in both static and dynamic environments, PADO aims to evolve them only in static environments [28]. Moreover, nodes in PADO have both function (processing) and branching (judgment) behavior and are governed by a stack and indexed memory [29]. Different from CGP, GNP emphasizes the information transition inside the graph: there is no terminal or output node that halts the program explicitly. Consequently, this structure is suitable for producing behavior sequences for agents [30]. There are other notable differences between CGP and GNP. For example, in CGP the individuals are directed acyclic graphs, whereas in GNP having cycles is an important feature of individuals that helps produce behavioral strategies for agents. CGP commonly uses a (1 + λ) evolutionary algorithm and usually does not apply crossover in its evolution process, and it has no explicit notion of time delay. Finally, another important difference between GNP and CGP is that in GNP the judgment nodes provide expert-designed, high-level functionality based on the task, whereas CGP functions are usually standard mathematical functions.

There are also essential differences between GNP and EP. While in EP the transition rule for every combination of states and inputs must be defined, in GNP nodes are connected according to necessity, and in each situation only the essential inputs are used in the network flow. Therefore, the structure of GNP is quite compact [31].

Overall, GNP has some advantages over other evolutionary algorithms: the reusability of nodes, which makes the structure more compact; the creation of connections according to necessity; and decision making based not only on the current state of the environment but also on the actions performed in the past.

Various modifications have been suggested to improve the performance of GNP, and it has been used in various applications. For example, in [19], Q-learning [32] was used to improve the efficiency of GNP. The combination of GNP and Q-learning has also been used in [28, 33] for faster adaptation in dynamic environments. The SARSA algorithm [32] is another reinforcement learning algorithm used in [34, 35]; it was applied to the Khepera robot control problem [36] to improve GNP efficiency. Another combination of GNP with reinforcement learning (RL) was proposed in [31]. In this version, there are several functions in each node, and during the evolution process a function is selected according to its Q value; in addition, crossover and mutation operators are defined differently from standard GNP. Defining more than one start node is another modification, proposed by Mabu and Hirasawa [37], whose goal was to extract several programs from a single individual.

Li et al. [38, 39] used EDA in their proposed method. In each iteration of their algorithm, the structure of elite individuals is used to build a probabilistic model, and the next generation is produced according to this estimated model. In other words, crossover and mutation were replaced by the probabilistic model. This mechanism was used by Li et al. [38] to find association rules in a traffic forecasting system. EDA and RL are also used to produce the next generation in [40]. Finally, these papers are summarized in [41].

In standard GNP, all branches have an equal chance of being modified by the crossover and mutation operators, so inappropriate branches may persist in individuals with high fitness values. To address this problem, non-uniform mutation was introduced by Meng et al. [42]. In evolutionary algorithms, it is also common to start the evolution process from scratch. To avoid this, Li et al. [43] used knowledge transfer, which shortens the evolution process. In their algorithm, knowledge was formulated as rules extracted from individuals and then used as a guideline during evolution; RL was used to transfer this knowledge automatically.

Several studies have used this algorithm or its variants for different applications, especially single/multi-agent decision-making problems. Coordination of the agents in a multi-agent system is one example of GNP applications [44, 45]; in these studies, GNP was used to generate a strategy in the pursuit domain [46]. Automatic creation of a multi-agent system using GNP is another line of research, by Itoh et al. [47]. Building a Learning Classifier System (LCS) using GNP was proposed in [48, 49], where the rules were extracted from the structure of individuals.

Swarm intelligence has also been combined with GNP. ACO, one of the most successful swarm intelligence algorithms, was used in [50,51,52] to improve the exploitation ability of GNP. To achieve a good tradeoff between exploration and exploitation, Lu et al. [50, 51] dedicated one iteration to ACO in every 10 iterations of GNP. ABC is another swarm intelligence algorithm that was used in [53].

In [54, 55], one of the important features of GNP, i.e., transition by necessity, was investigated theoretically and empirically. Standard GNP operators treat all branches equally during evolution, and the fitness of an individual depends only on the nodes used during evaluation; therefore, new genetic operators were proposed in these papers.

Overall, according to [40, 41], breaking useful structures during crossover and mutation is one of the most important weaknesses of GNP. Since an individual in GNP represents a strategy that agents must follow to achieve their goal, the dependency among the nodes of an individual's structure is high, yet crossover and mutation break these connections frequently and completely at random. Moreover, when the agents use the strategy proposed by an individual in a stochastic environment, its fitness estimate is not precise; solving this problem requires evaluating each individual several times. However, since fitness evaluation is commonly the most time-consuming part of evolutionary algorithms, repeatedly evaluating each individual is not a suitable solution. In this research, both of these issues are considered.

In the proposed algorithm, the reproduction probability of useful structures is increased using the experience of promising individuals gathered during the evaluation process, which reduces the destructive effect of crossover and mutation. In addition, the proposed method was applied to both deterministic and stochastic environments. In stochastic environments, the experience of promising individuals helps estimate the fitness of each individual more precisely. Keeping a good balance between exploration and exploitation is another goal of the proposed method.

This paper is organized as follows. The GNP algorithm is reviewed in Sect. 2. Section 3 describes stochastic environments. Our proposed algorithm is presented in detail in Sect. 4. In Sects. 5 and 6, the experimental results and discussion are presented, respectively. Finally, conclusions and future work are discussed in Sect. 7.

2 GNP

As shown in Algorithm 1, GNP consists of three steps. First, a set of directed graphs is produced. They are then evaluated, and finally, offspring are generated from them using crossover and mutation according to their fitness.
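As a minimal illustration of these three steps (not the authors' implementation), the loop of Algorithm 1 can be sketched in Python; the callables passed in (init_population, evaluate, select, crossover, mutate) are placeholders for the components described in the rest of this section.

```python
def gnp_loop(init_population, evaluate, select, crossover, mutate, generations):
    """Minimal sketch of the GNP cycle: create directed graphs, evaluate, reproduce."""
    population = init_population()                               # step 1: random directed graphs
    for _ in range(generations):
        scored = sorted(((evaluate(ind), ind) for ind in population),
                        key=lambda pair: pair[0], reverse=True)  # step 2: fitness evaluation
        offspring = [scored[0][1]]                               # elitism (a common assumption)
        while len(offspring) < len(population):                  # step 3: crossover + mutation
            parent1, parent2 = select(scored), select(scored)
            for child in crossover(parent1, parent2):
                offspring.append(mutate(child))
        population = offspring[:len(population)]
    return population
```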

2.1 Population structure

Unlike GP, in which individuals have a tree structure, individuals in GNP have a graph structure. This structure increases their expressive ability and can describe more complex strategies. As shown in Fig. 1, a directed graph, like a flowchart, can model the strategy that an agent must follow to achieve its goal. The directed graph is made of three types of nodes: the start node, which indicates where the strategy starts; judgment nodes, which investigate conditions in the environment; and processing nodes, which are defined according to the actions the agents can perform. As shown in the genotype structure of an individual, each node has an identification number i and is composed of two sections: the Node Gene and the Connection Gene. The Node Gene consists of three subsections: NTi denotes the type of node i, NFi denotes the function that node i executes, and di is the time delay of executing the function of node i. The Connection Gene, named Bi, is the set of branches of node i and is composed of two subsections: Cij determines the node that the jth branch of node i is connected to, and dij is the transition time delay of the jth branch of node i.

Fig. 1 a Phenotype and b genotype structure of an individual in the GNP algorithm

As shown in Fig. 1b, the NT, NF and d subsections of the start node are set to zero; the start node only denotes the node from which the strategy must be started, so only its Connection Gene is assigned. For judgment nodes, NT is set to one and the Connection Gene contains more than one connection, each branch corresponding to a specific condition in the environment. For processing nodes, NT is set to two and the Connection Gene contains only one connection, because there is no conditional branch in them.
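To make the genotype concrete, the following sketch mirrors the Node Gene (NTi, NFi, di) and Connection Gene (Cij, dij) layout of Fig. 1b in Python; the class and field names are ours, not part of the paper.

```python
from dataclasses import dataclass, field
from typing import List

START, JUDGMENT, PROCESSING = 0, 1, 2        # possible values of NT_i

@dataclass
class Branch:
    target: int                              # C_ij: node that the j-th branch points to
    delay: int = 0                           # d_ij: transition time delay of the branch

@dataclass
class Node:
    node_type: int                           # NT_i: start, judgment or processing
    function: int = 0                        # NF_i: index of the judgment/processing function
    delay: int = 0                           # d_i: time delay of executing the node's function
    branches: List[Branch] = field(default_factory=list)   # B_i: the connection gene

# An individual is a list of such nodes; here the start node is stored at index 0,
# judgment nodes carry one branch per possible judgment result, and
# processing nodes carry exactly one outgoing branch.
Individual = List[Node]
```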

This structure represents a strategy that agents follow in the environment. For example, suppose an agent uses the individual presented in Fig. 1 as its strategy. The agent starts at Node 1, which directs it to Node 2. Node 2 is a judgment node, so the agent executes function J1, which investigates the state of the environment. If, according to the environment state, the agent has to follow the third branch of Node 2, it goes to Node 6 and executes P1, the process specified in that node. It then goes to Node 5 and executes the process of that node, i.e., P2.

In the GNP structure, there are two types of time delay: di is the time delay of executing node i, and dij is the time delay of the transition between nodes. These two types of time delay are introduced to model the delays in the human decision-making process and can be used to define the steps of the decision-making process. The number of steps is used as a terminal condition during decision making.
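Building on the sketch above, one possible way to execute an individual is shown below; the environment interface (judge, act, fitness) and the rule that node and branch delays accumulate toward the step limit are our assumptions for illustration.

```python
def run_individual(individual, env, step_limit):
    """Traverse a GNP individual until the accumulated time delays reach the step limit."""
    node_id = individual[0].branches[0].target      # the start node only points to the first node to execute
    used_time = 0
    while used_time < step_limit:
        node = individual[node_id]
        used_time += node.delay                     # d_i: delay of executing the node's function
        if node.node_type == JUDGMENT:
            result = env.judge(node.function)       # index of the branch matching the judgment result
            branch = node.branches[result]
        else:                                       # processing node: perform the action
            env.act(node.function)
            branch = node.branches[0]
        used_time += branch.delay                   # d_ij: delay of the transition
        node_id = branch.target
    return env.fitness()
```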

2.2 Crossover

As illustrated in Fig. 2, two offspring are generated by the crossover operator applied to two parents selected by an algorithm of choice, such as tournament selection [23]. During crossover, each pair of nodes with the same identification number in the selected parents exchanges its connections with a predefined probability Pc.

Fig. 2 In GNP, the crossover operator exchanges the bold connections between nodes 3 and 5
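Under the list-of-nodes representation sketched in Sect. 2.1, the operator can be written as the following sketch; deep copies keep the parents intact.

```python
import copy
import random

def gnp_crossover(parent1, parent2, pc):
    """GNP crossover sketch: nodes with the same identification number swap
    their connection genes (all branches) with probability pc."""
    child1, child2 = copy.deepcopy(parent1), copy.deepcopy(parent2)
    for i in range(len(child1)):
        if random.random() < pc:
            child1[i].branches, child2[i].branches = child2[i].branches, child1[i].branches
    return child1, child2
```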

2.3 Mutation

For mutation, as illustrated in Fig. 3, each branch in the Connection Gene is redirected to another, randomly selected node identification number with probability Pm.

Fig. 3 In GNP, the mutation operator changes the bold connections of nodes 1 and 2
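A corresponding sketch of the mutation operator, again using the representation from Sect. 2.1 and excluding the start node as a target (an assumption, since its indegree is zero), is given below.

```python
import copy
import random

def gnp_mutation(individual, pm):
    """GNP mutation sketch: each branch is redirected to a randomly chosen node with probability pm."""
    mutant = copy.deepcopy(individual)
    for node in mutant:
        for branch in node.branches:
            if random.random() < pm:
                branch.target = random.randrange(1, len(mutant))   # any node except the start node
    return mutant
```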

3 Deterministic and stochastic environments

According to [56], if the current state of an environment and the executed action of an agent completely determine the next state of the environment, the environment is known as deterministic; otherwise, it is known as stochastic. An example model of a stochastic environment is shown in Fig. 4. In such environments, each action achieves the intended outcome with a specific probability, referred to as the deterministic parameter. Suppose the agent wants to move forward. In the example of Fig. 4, the probability of moving forward is 60%, and there is a 20% chance of moving left and a 20% chance of moving right.

Fig. 4 An example of a stochastic model [56]
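The deterministic parameter can be modelled as a simple stochastic transition; the sketch below reproduces the 60/20/20 example of Fig. 4 when the parameter is 0.6, with the equal left/right split of the remaining probability being an assumption.

```python
import random

DIRECTIONS = ["up", "right", "down", "left"]

def realized_move(intended_index, deterministic_parameter):
    """Return the direction actually taken: the intended one with probability
    `deterministic_parameter`, otherwise a 90-degree drift to the left or right."""
    r = random.random()
    if r < deterministic_parameter:
        return DIRECTIONS[intended_index]                        # e.g. forward with probability 0.6
    if r < deterministic_parameter + (1.0 - deterministic_parameter) / 2.0:
        return DIRECTIONS[(intended_index - 1) % 4]              # drift left
    return DIRECTIONS[(intended_index + 1) % 4]                  # drift right
```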

4 Proposed algorithm

Our proposed algorithm is named Tasks Decomposition Genetic Network Programming (TDGNP). It is composed of two phases: an exploration oriented phase and an exploitation oriented phase. To produce new individuals in the exploration oriented phase of TDGNP, the standard operators of GNP, i.e., crossover and mutation, are used. In addition, during this phase, promising individuals distribute their fitness over the sequences of node connections used by the agents. This distribution is done according to our proposed method, explained below. Before that, we need to define two concepts: (1) task and (2) sequence. A task is defined as some judgment nodes followed by some processing nodes that an agent uses, according to the structure of an individual, when that individual is used as the agent's strategy. A sequence is defined as the trail of such tasks. In Fig. 5, an example of a sequence composed of two tasks is shown.

Fig. 5 There are two tasks in this sequence
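One concrete reading of these definitions is sketched below, using the Node representation from Sect. 2.1: given the trace of node IDs an agent visited, a new task is assumed to begin whenever a judgment node follows processing nodes. The boundary rule and helper names are ours.

```python
def split_into_tasks(trace, individual):
    """Split a visited-node trace into tasks (judgment nodes followed by processing nodes)."""
    tasks, current = [], []
    previous_was_processing = False
    for node_id in trace:
        is_judgment = individual[node_id].node_type == JUDGMENT
        if is_judgment and previous_was_processing and current:
            tasks.append(current)          # a judgment node after processing nodes starts a new task
            current = []
        current.append(node_id)
        previous_was_processing = not is_judgment
    if current:
        tasks.append(current)
    return tasks                           # the ordered list of tasks forms the sequence
```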

In the proposed method, during the exploration oriented phase, a value is assigned to all possible connections, proportional to the fitness of promising individuals. Then, during the exploitation oriented phase, these values, as the accumulated experience of previous generations, are used to produce the next generation. Algorithm 2 describes our proposed method. Like other evolutionary algorithms, TDGNP is an iterative algorithm, and the exploration and exploitation oriented phases take turns every K iterations. Both phases involve exploration and exploitation, but their names reflect the dominant feature. During the exploration phase, the next generation is produced using the standard operators of GNP, i.e., crossover and mutation; during the exploitation phase, individuals are generated according to the accumulated experience of promising individuals.
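The alternation of the two phases in Algorithm 2 can be summarized as a simple dispatch on the generation counter; the two helpers below stand for the reproduction mechanisms described in the rest of this section and are placeholders, not the paper's code.

```python
def produce_next_generation(population, fitness, generation, K,
                            reproduce_by_genetic_operators, reproduce_from_values):
    """TDGNP phase switch sketch: every K generations the reproduction mode alternates."""
    if (generation // K) % 2 == 0:
        # exploration oriented phase: standard crossover and mutation of GNP;
        # promising individuals also deposit their fitness on the used connections (Eqs. 2-5)
        return reproduce_by_genetic_operators(population, fitness)
    # exploitation oriented phase: rebuild individuals from the accumulated connection values (Eq. 1)
    return reproduce_from_values(len(population))
```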

According to [22], the philosophy behind GNP is finding the optimal strategy for the decision making of agents in an environment. As stated in that reference, GNP seeks to find which condition(s) must be investigated and, according to each condition, which action(s) must be performed; then, according to what has been performed so far, the next conditions or actions are selected to be investigated or executed, respectively. Using task decomposition, we try to improve the quality of producing these types of structures. The value assigned to the connections of each task in each sequence is proportional to its importance in achieving the goal. In GP, because of the tree structure of its individuals, applying this mechanism is not straightforward.

In the following, we explain how this accumulated experience is calculated. In each iteration of the exploration oriented phase, the crossover and mutation of standard GNP produce the individuals of the next generation. However, within the iterations of the exploitation oriented phase, branch b of node i, denoted by bi, is connected to node n with probability p(bi, n), calculated according to Eq. 1. This is done for all branches of all nodes in the structure of an individual to produce a new one.

$$p\left( b_{i}, n \right) = \frac{v\left( b_{i}, n \right)}{\sum_{m=2}^{N} v\left( b_{i}, m \right)}\quad \forall b_{i}$$
(1)

In this equation, N is the number of nodes in the structure of an individual. Since the indegree of the first node (the start node) is zero, the variable m starts from 2. Finally, v(bi, n) is the value assigned to bi assuming it is connected to node n.
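In implementation terms, Eq. 1 is a roulette-wheel choice of the target node for each branch; a sketch, assuming values[(branch_key, n)] stores v(bi, n) and the start node sits at index 0, follows.

```python
import random

def sample_target(branch_key, values, num_nodes):
    """Choose the node that branch b_i connects to with probability v(b_i, n) / sum_m v(b_i, m)."""
    candidates = list(range(1, num_nodes))                        # the start node has indegree zero
    weights = [values.get((branch_key, n), 0.0) for n in candidates]
    if sum(weights) == 0.0:
        return random.choice(candidates)                          # assumption: uniform fallback
    return random.choices(candidates, weights=weights, k=1)[0]
```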

To calculate the experience of successive generations, as the first step, the effectiveness of the connections in the tasks of the sequences is assigned according to Eq. 2.

$$e\left( b_{i}, n \right) = \begin{cases} 1 & \text{if } \left( b_{i}, n \right) \in \text{current task} \\ \lambda\, e\left( b_{i}, n \right) & \text{otherwise} \end{cases}\quad \forall \left( b_{i}, n \right)$$
(2)

In this equation, e(bi, n) is the effectiveness of bi when it is connected to node n. The parameter λ (0 < λ ≤ 1) is a discounting factor; connections in later tasks therefore have larger e(bi, n) values than connections in earlier tasks. Then, the fitness of each of the M promising individuals is distributed over the connections according to Eqs. 3, 4 and 5.
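Operationally, Eq. 2 can be read as follows: when a task is processed, all connections seen in earlier tasks are discounted by λ and the connections of the current task are set to one. The sketch below encodes that reading; tasks are lists of (branch, node) pairs in execution order.

```python
def connection_effectiveness(tasks, lam):
    """Eq. 2 sketch: connections of the current task get e = 1, earlier connections decay by lambda."""
    e = {}
    for task_connections in tasks:              # tasks in the order the agent executed them
        for key in e:
            e[key] *= lam                       # "otherwise" branch: discount earlier connections
        for connection in task_connections:     # connection = (branch, target_node)
            e[connection] = 1.0                 # reused connections are reset to one, not accumulated
    return e
```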

$$v\left( b_{i}, n \right)_{g+1} = v\left( b_{i}, n \right)_{g} + \alpha \left( e\left( b_{i}, n \right)\frac{\sum_{k=1}^{M} \Delta v\left( b_{i}, n \right)_{k}}{\sum_{k=1}^{M} \sigma \left( b_{i}, n \right)_{k}} - v\left( b_{i}, n \right)_{g} \right)\quad \forall \left( b_{i}, n \right)$$
(3)
$$\Delta v\left( b_{i}, n \right)_{k} = \begin{cases} \log \left( fitness_{k} \right) & \text{if a self-loop is not created} \\ 0 & \text{otherwise} \end{cases}$$
(4)
$$\sigma \left( b_{i}, n \right)_{k} = \begin{cases} 1 & \text{if it is used during runtime} \\ 0 & \text{otherwise} \end{cases}$$
(5)

In these equations, g and α are the generation number and the updating factor of v(bi, n), respectively. In each iteration, the M promising individuals update the values of the tasks' connections using Eq. 3, where fitnessk is the fitness of the kth promising individual. σ(bi, n)k is set to one whenever bi is used by an agent that executes individual k as its strategy. To prevent the values of the connections from growing rapidly, the log function is used in Eq. 4; this helps prevent premature convergence in the evolution process. What we are looking for is the expected value of bi when it is connected to node n, and these equations approximate this expected value in proportion to its effectiveness in achieving the goal.
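For a single connection (bi, n), one possible reading of Eqs. 3-5 is sketched below; leaving v unchanged when no promising individual used the connection (σ sums to zero) is our assumption to avoid division by zero.

```python
import math

def delta_v(fitness_k, creates_self_loop):
    """Eq. 4: log of the k-th promising individual's fitness unless the connection creates a self-loop."""
    return 0.0 if creates_self_loop else math.log(fitness_k)

def update_value(v_g, e, deltas, sigmas, alpha):
    """Eq. 3 for one connection: move v(b_i, n) toward e * (sum of deltas) / (sum of sigmas).

    v_g    -- current value v(b_i, n)_g
    e      -- effectiveness e(b_i, n) from Eq. 2
    deltas -- delta_v(b_i, n)_k for the M promising individuals (Eq. 4)
    sigmas -- usage indicators sigma(b_i, n)_k for the same individuals (Eq. 5)
    alpha  -- updating factor
    """
    if sum(sigmas) == 0:
        return v_g                                   # assumption: unused connections keep their value
    target = e * sum(deltas) / sum(sigmas)
    return v_g + alpha * (target - v_g)
```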

When an individual is produced, the agents use it as their strategy to interact with the environment, and the fitness of the individual is estimated based on the interaction results. After calculating the individuals' fitness, the sequences generated during the interaction of the M promising individuals with the environment are extracted, and the tasks in each sequence are determined. Finally, for each of the M promising individuals, Eq. 3 is used to distribute its fitness over the connections of its tasks.

As mentioned, if bi, connected to node n, is used during the execution of individual k, σ(bi, n)k is set to one. This indicator not only accounts for the existence of a connection but also reflects how useful it is; consequently, only the values of the connections actually used during an individual's execution are updated, and in this way the usefulness of connections can be learned. In addition, when a connection of an individual is used several times, its effectiveness should not increase repeatedly; we only reset it to one. Using this mechanism, an excessive accumulation of fitness on the connections that participate in loops is avoided.

5 Experimental results

Our proposed method was applied to the Tile-world [57] and Pursuit-domain [46] benchmarks to test its effectiveness. These are two agent control problems that are commonly used in GNP research [20, 21, 28, 41, 44, 45, 49].

5.1 Tile-world

In this benchmark, some agents try to push tiles into holes while there are obstacles in the environment. An example of this problem is shown in Fig. 6.

Fig. 6 Tile-world environment

The interactions of the agents with the environment are based on the judgment and processing functions defined for them, which are listed in Table 1. An example of using the judgment functions is shown in Fig. 7. According to the results of these judgment functions, agents select one or more processing functions to execute in the environment. When a tile is dropped into a hole, the hole is filled by the tile, both of them vanish, and the corresponding cell is converted into floor. The program controlling the agents' behavior can be generated by combining judgment and processing functions.

Table 1 List of Tile-world functions
Fig. 7 a How an agent can sense the directions in Tile-world and b the outputs of using these judgment functions

The behavior of agents is evaluated using the method proposed in [41]. According to that method, the fitness of an individual is calculated based on the number of tiles dropped into the holes, the speed with which the agents drop the tiles and, if the agents cannot drop all tiles into the holes, how much closer they can push the remaining tiles toward the holes. These three factors are taken into account in Eq. 6.

$$\text{Fitness} = \left[ w_{t} \times DT \right] + \left[ w_{s} \times \left( S_{UB} - S_{used} \right) \right] + \left[ w_{d} \times \sum_{t=1}^{T} \left( ID_{\left( t \right)} - FD_{\left( t \right)} \right) \right]$$
(6)

In this equation, SUB is the predefined number of steps the agents are allowed to move in the environment, and DT is the number of tiles dropped into the holes within SUB steps. Sused measures the speed of the agents in dropping all tiles into the holes; it is the number of steps the agents use to achieve their goal. T is the number of tiles not dropped into the holes, and ID and FD are the initial and final distance of each such tile from its nearest hole, respectively. Finally, wt, ws and wd are the weights of these three factors. In this research, SUB, wt, ws and wd are set to 60, 100, 3 and 20, respectively. The goal of this experiment is to find the individual that achieves the highest fitness value when the agents are controlled according to it.
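With the parameter values used in this research (SUB = 60, wt = 100, ws = 3, wd = 20), Eq. 6 translates directly into the following sketch; the argument names are ours.

```python
def tileworld_fitness(dropped_tiles, steps_used, initial_dists, final_dists,
                      step_limit=60, w_t=100, w_s=3, w_d=20):
    """Eq. 6: dropped tiles (DT), leftover steps (S_UB - S_used) and the distance
    gained toward the nearest hole by each tile that was not dropped."""
    progress = sum(initial - final for initial, final in zip(initial_dists, final_dists))
    return w_t * dropped_tiles + w_s * (step_limit - steps_used) + w_d * progress
```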

5.2 Pursuit-domain

Pursuit-domain, also known as the prey and predator problem, is another benchmark used to test the performance of agent control algorithms. Like Tile-world, it consists of adjacent cells; a segment of the environment is shown in Fig. 8a. In this study, the pursuit domain is a 20 × 20 2D toroidal environment [46] with one prey and four predators. Each cell of this environment may contain the prey or a predator; otherwise, it is considered floor. If the prey is put in a situation like that in Fig. 8b, it has been captured by the predators. The judgment and processing functions, which correspond to the sensors and actuators of the predators respectively, are described in Table 2. Meanwhile, the prey moves randomly in the environment; in this research, the speed of the prey is half of the predators' speed.

Fig. 8 a Prey and predators in Pursuit-domain, b the predators capture the prey

Table 2 List of the functions in Pursuit-domain

As in Tile-world, the fitness of individuals in this benchmark is calculated according to three factors: the predators' ability to chase the prey, the number of predators placed in the cells adjacent to the prey, and how quickly the prey is hunted by the predators. These three factors are formulated in Eq. 7.

$$\text{Fitness}_{r,w} = \left[ \sum_{i=1}^{S_{used}} \sum_{j=1}^{NoP} \frac{ES}{DP2P_{i}^{j}} \right] + \left[ \sum_{i=1}^{cPos} ES \right] + \left[ 4 \times ES \times \left( S_{UB} - S_{used} + 1 \right) \right]$$
(7)

In this equation, ES is the environment size, and \(DP2P_{i}^{j}\) is the distance of predator j from the prey at step i. Sused is the number of steps taken to hunt the prey, and NoP is the number of predators in the environment. Agents are allowed to move at most SUB steps in the environment; after this number of steps, cPos is the number of cells immediately adjacent to the prey that are occupied by predators. For world number w and run number r, fitnessr,w is calculated according to Eq. 7.

Pursuit-domain is a dynamic environment, so each individual is run R times on W environments with different positions of the predators and prey. Consequently, the final fitness of each individual is estimated according to Eq. 8.

$$\text{Final fitness} = \left( \sum_{r=1}^{R} \sum_{w=1}^{W} fitness_{r,w} \right) / \left( R \times W \right)$$
(8)
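Equations 7 and 8 can likewise be sketched as follows; distances[i][j] is assumed to hold DP2P (the distance of predator j from the prey at step i, assumed positive), and the remaining arguments mirror the symbols defined above.

```python
def pursuit_fitness(distances, adjacent_predators, steps_used, env_size, step_limit):
    """Eq. 7 for one run in one world: chasing term + adjacency term + capture-speed term."""
    chasing = sum(env_size / d for step in distances for d in step)   # sum of ES / DP2P over steps and predators
    adjacency = env_size * adjacent_predators                         # cPos adjacent cells occupied by predators
    speed = 4 * env_size * (step_limit - steps_used + 1)
    return chasing + adjacency + speed

def final_fitness(per_run_world_fitness):
    """Eq. 8: average fitness over the R runs and W worlds."""
    flat = [f for run in per_run_world_fitness for f in run]
    return sum(flat) / len(flat)
```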

5.3 Experimental analysis

In this section, the performance of TDGNP is compared with GNP [19,20,21] and some other state-of-the-art extensions of GNP, which we call GNP-ACO1 [51], GNP-ACO2 [50] and GNP-ABC [53]. Other algorithms such as SARSA, Q-learning and GP have already been compared with GNP and some of its variants [41, 53]; according to the reported results, their performance almost always ranks below GNP, so we did not implement and investigate them again in this research.

We conducted experiments for different values of the deterministic parameter defined in Sect. 3 on both of the above-mentioned benchmarks. The deterministic parameter values used in the experiments are 0.5, 0.75 and 1.0, which lets us see how the performance of the investigated algorithms varies with the deterministic parameter. Meanwhile, a varying number of instances of each node listed in Tables 1 and 2 is used in each individual; in other words, an individual may contain more than one instance of each judgment and processing node. When there is more than one instance of each node, building different structures within an individual is more flexible. Suppose that the optimal strategy needs two instances of a node. If there is only one instance of that node, it must be used in the position where the individual shows better performance; however, if there are two instances, they can be used in two different situations and the evolutionary algorithm is not forced to choose between the two situations that need that specific node. In this research, this parameter, named program size, is set to the values 1, 3, 5 and 10 to investigate its effect on the algorithms. The other parameters of the algorithms are set to the optimal values suggested in their references (see Table 3). Each algorithm was run 30 times, producing 30 independent solutions whose performance is compared.

Table 3 Different algorithms’ parameters in the pursuit domain (P.D.) and Tile-world (T.W.) problems

5.3.1 Tile world problem experimental results

In this section, the algorithms are applied to the Tile-world shown in Fig. 6 and their performance is compared and analyzed. The termination condition of these algorithms is the maximum number of fitness evaluations, which is set to 300,000 for this benchmark.

Figure 9 illustrates the fitness of the investigated algorithms when the deterministic parameter is set to 1, i.e., the environment is deterministic. The figure shows that in all tests TDGNP surpasses the other algorithms, particularly when the program size is set to 1, while the others show more or less similar performance. GNP-ACO2 performs better than GNP-ACO1 because GNP-ACO2 accumulates its previous experience during the evolution process, while in GNP-ACO1 the accumulated experience is reset after a predefined number of iterations. GNP-ABC achieves results similar to the other methods but at a slower rate because of its weaker exploration ability [53].

Fig. 9 Performance curve in Tile-world when the deterministic parameter is 1 and program size is a 1, b 3, c 5 and d 10

The experimental results are detailed in Tables 4, 5, 6, 7, 8, 9, 10 and 11; for clarity, the results of the best algorithm are marked in boldface. They reveal that TDGNP is better than the other algorithms. However, according to the p-values of the Wilcoxon test at the 0.05 significance level [58], in some cases there is no significant difference between TDGNP and some of the other algorithms when the program size is greater than one. When the program size is one, the strategies generated by individuals lack sufficient representation power. For example, suppose that the optimal strategy requires at least two processing nodes that cause agents to move forward; when only one is available (due to a program size of one), the algorithm cannot create an individual able to achieve the goal. The algorithms must therefore investigate the search space more precisely to find better solutions, which requires an excellent balance between exploration and exploitation. On the other hand, when the program size is large, the dimension of the search space increases drastically, but the representation power of the generated individuals also increases. In both of these cases, TDGNP and GNP-ACO2 rank first and second, respectively, showing that these two algorithms exhibit a better exploration–exploitation balance during the evolution process.

Table 4 Detailed results in Tile-world with program size and deterministic parameter of 1
Table 5 Detailed results in Tile-world with program size 3 and deterministic parameter 1
Table 6 Detailed results in Tile-world with program size 5 and deterministic parameter 1
Table 7 Detailed results in Tile-world with program size 5 and deterministic parameter 0.75
Table 8 Detailed results in Tile-world with program size 5 and deterministic parameter 0.5
Table 9 Experimental results in Tile-world with program size 10 and deterministic parameter 1
Table 10 Detailed results in Tile-world with program size 10 and deterministic parameter 0.75
Table 11 Detailed results in Tile-world with program size 10 and deterministic parameter 0.50

Another topic of interest is the overall mean fitness in these tables. When the deterministic parameter is one, the means are about 449, 574, 593 and 596 for program sizes of one, three, five and ten, respectively. Clearly, when the program size is increased from one to three, the performance of the algorithms improves significantly; however, this notable improvement does not continue, especially when the program size changes from five to ten. Considering the exponential growth of the search space with respect to the program size and the negligible performance improvement beyond a program size of five, increasing the program size above 5 is not reasonable. In this environment, when the deterministic parameter is less than 1, i.e., the environment is stochastic, the calculated fitness is not reliable if an individual is run just once; it is necessary to run each individual several times and calculate the expected fitness. Therefore, when the deterministic parameter is set to 0.50 or 0.75, the fitness of the best individual in each run is averaged over 100 executions to estimate it more precisely. When the environment is stochastic, the fitness of the individuals decreases, because executing the selected actions in a stochastic environment does not necessarily lead to the same outcome. This phenomenon leads the evolution process astray and hurts performance. However, TDGNP still performs better than the others because it uses the accumulated fitness distributed over the connections during the evolution process to generate offspring. Indeed, the accumulated fitness can cope with the stochastic conditions of the environment because it uses the experience of a group of individuals, not just one.

The last two columns of Tables 4, 5, 6, 7, 8, 9, 10 and 11 provide additional criteria for comparing the algorithms: the success rate of final goal achievement and the number of steps taken to achieve it. For example, according to Table 4, TDGNP, as the best algorithm, was successful in 21 out of 30 trials, and in these 21 trials the goal was achieved in 31.14 steps on average. TDGNP is thus not only the most successful algorithm but also the fastest one. In our experiments, each step is defined as using one processing node or at most five judgment nodes in the structure of an individual.

5.3.2 Pursuit-domain experimental results

In this section, the algorithms were applied to Pursuit-domain and their performance was compared. Solving Pursuit-domain seems easier than Tile-world; however, due to its dynamic nature, learning is more challenging. In this test, each algorithm was run 30 times independently, and in each run, because of the dynamic nature of the problem, each individual was applied to 20 environments with different positions of the predators and prey. This means that in Eq. 8, R and W were set to 30 and 20, respectively. The termination condition in this test is the maximum number of fitness evaluations, which is set to 20,000.

Figure 10 shows the fitness curves of the different algorithms when the deterministic parameter is set to one. In these tests, TDGNP is the best algorithm, and GNP, GNP-ACO2, GNP-ACO1 and GNP-ABC occupy the next ranks, respectively.

Fig. 10 Performance curve in Pursuit-domain when the deterministic parameter is set to one and program size is a 1, b 3, c 5 and d 10

The detailed results are shown in Tables 12, 13, 14, 15, 16, 17, 18 and 19, with the results of the best algorithm marked in boldface. As is apparent, in a deterministic environment TDGNP, as the best algorithm, is significantly better than the others according to the p-values. The results also illustrate the superiority of TDGNP under deterministic and dynamic conditions. Unlike in Tile-world, increasing the program size in Pursuit-domain decreases the performance of the algorithms: a larger program size enlarges the search space, so finding solutions becomes very hard, especially when the environments are dynamic. This degradation is also observable in the last two columns of these tables; the number of successful runs in hunting the prey is reduced from 231 for a program size of one to 122 for a program size of ten.

Table 12 Detailed results in Pursuit-domain with program size and deterministic parameter of 1
Table 13 Detailed results in Pursuit-domain with program size 3 and deterministic parameter 1
Table 14 Detailed results in Pursuit-domain with program size 5 and deterministic parameter 1
Table 15 Detailed results in Pursuit-domain with program size 5 and deterministic parameter 0.75
Table 16 Detailed results in Pursuit-domain with program size 5 and deterministic parameter 0.5
Table 17 Detailed results in Pursuit-domain with program size 10 and deterministic parameter 1
Table 18 Detailed results in Pursuit-domain with program size 10 and deterministic parameter 0.75
Table 19 Detailed results in Pursuit-domain with program size 10 and deterministic parameter 0.5

When the environment is stochastic, although TDGNP is significantly better than the others for a program size of five and a deterministic parameter of 0.75, it cannot retain its superiority when the program size changes to ten or the deterministic parameter is reduced to 0.50. This is because learning in an extremely large search space is hard, and the agents' actions do not yield the expected outcomes when working in a stochastic environment; under these conditions, learning takes much more time. In other words, although our proposed method is better than the others in stochastic environments, it cannot cope with a very high level of stochasticity. This does not mean that the other algorithms can: indeed, in this condition learning is extremely difficult and none of the algorithms can handle it, because the actions commonly do not lead to the intended outcomes. The results in Tables 16 and 19 confirm this. According to these tables, almost all algorithms occupy the same rank, there is no significant difference between their final results, and their achieved fitness is not good at all, which shows that almost no learning has occurred. For example, in Table 19, there is no significant difference between GNP-ACO1, the first-ranked algorithm, and the others, except GNP-ABC. As in the other tests, GNP-ABC learns very slowly and is almost always in last place.

In addition, in most experiments the number of successful runs of TDGNP is higher than that of the others. However, in some cases, such as Table 13, GNP overtakes TDGNP: GNP was successful in 305 runs while TDGNP had 301 successful runs. It should be noted, though, that GNP achieved this number of successful runs in 520.78 steps on average, while TDGNP achieved its 301 successful runs in only 120.64 steps on average, which is much faster than GNP.

6 Discussion

The algorithm proposed in this research, TDGNP, consists of two phases. In the exploration oriented phase, the algorithm investigates good individual structures while it is biased toward exploring the search space, and during this search it tries to save its experience. In the exploitation oriented phase, new individuals are generated using the saved experience; in other words, the new population is generated according to the experience of promising individuals gathered during the evolution process. Through the combination of these two phases, our algorithm was able to outperform the others. The concluding remarks are as follows:

  • When the program size is 1, the representation power of an individual is low, so a better balance between exploration and exploitation is needed, and our method achieves this better. In Tile-world, which is simpler than Pursuit-domain, our algorithm shows better performance than the others when the program size is 1; however, it cannot keep its superiority when the program size is increased. When Pursuit-domain is used as the benchmark, as it is a more complex problem (because it is a dynamic environment), our algorithm is significantly better than the others, especially when the deterministic parameter is less than 1 (i.e., the environment is stochastic).

  • As the convergence rate of GNP-ABC is low, its performance is poor, especially in more dynamic and stochastic environments.

  • The performances of GNP-ACO2 and GNP-ACO1 are almost the same in Pursuit-domain, and GNP is better than both of them. This shows that in Pursuit-domain, as a dynamic environment, exploration is more important than exploitation when the deterministic parameter is one. This is not true when the environment is stochastic: in stochastic environments, previous experience clearly has a strong influence on the performance of the algorithms.

  • Unlike GNP-ACO2 and GNP-ACO1, which are combinations of GNP and ACO, in TDGNP the values of the connections are not updated according to their usage frequency. In these methods, the individuals play the role of flowcharts; consequently, some parts of them may form a loop and be used many times by the agents that behave according to them. If the values of the connections of the tasks in a sequence were updated according to their usage frequency, they would increase improperly. Therefore, in our proposed method, the connection values of the tasks in a sequence are updated depending only on whether they are used in the individual, regardless of their usage frequency.

  • In TDGNP, the behavior of the agents is handled more efficiently through the management of the sequence of tasks that they perform. The algorithm helps the agents extract a more suitable sequence of tasks to achieve their goals, which TDGNP accomplishes by distributing the fitness of promising individuals over the more useful sequences of tasks.

7 Conclusion and future work

In this paper, a new algorithm was proposed to adapt GNP for use in more complex, i.e., dynamic and stochastic, environments. In this new algorithm, the more useful and efficient sequences of tasks, according to which agents select their behavior, are extracted, and the values of the connections that make up these tasks are increased in proportion to their usefulness. In addition, a better tradeoff between exploration and exploitation is achieved by defining two different phases during the evolution process. In the exploration oriented phase, standard crossover and mutation are used, biased toward exploration; during this phase, promising individuals are also selected and their experience is saved. This experience is then used in the exploitation oriented phase to generate new individuals. These modifications improve the efficiency of the proposed method in comparison with other versions of GNP in both deterministic and stochastic environments.

Clearly, some tasks achieve better fitness if they are executed consecutively; in such cases, it is better not to decompose them. However, our proposed method lacks the ability to detect and exploit these cases. As a direction for future research, it is therefore worth developing a mechanism that prevents these types of tasks from being decomposed and uses them together when generating new individuals, so that the algorithm can produce promising individuals faster. We could also investigate the performance of the proposed algorithm on more complex benchmarks for a better evaluation. Another important subject for future work is parameter tuning of the algorithms used, which is one of the important factors in their performance [59, 60]; automatic parameter tuning would be an appropriate approach for both tuning and fairness of comparison. We could also consider a multi-objective fitness, since a composite fitness is used in our experiments. In addition, testing and comparing the proposed method on a larger set of problems could better demonstrate its ability.