1 Introduction

Exploration and exploitation are the two main goals of search. Exploration is important for ensuring global reliability: the whole search space needs to be searched to provide a trustworthy estimate of the global optimum. Exploitation is important because it focuses the search effort around the best solutions, searching their neighborhoods to find more accurate solutions [1]. Many search algorithms use a combination of a global search method and a local search method to achieve their goal. These algorithms are known as hybrid methods. Combining a traditional genetic algorithm (GA) with local search methods that incorporate local improvement procedures can improve the performance of GAs. These hybrid methods are commonly known as memetic algorithms (MAs), or as Baldwinian [2] or Lamarckian [3] evolutionary algorithms (EAs). The particular local search method employed is an important aspect of these algorithms. In the Lamarckian approach the local search method is used as a refinement genetic operator that modifies the genetic structure of an individual and places it back in the genetic population [4]. Lamarckian evolution can increase the speed of search processes in genetic algorithms. However, it can damage schema processing by changing the genetic structure of individuals, which may lead to premature convergence [5, 6].

The Baldwinian learning approach improves the fitness of an individual by applying a local search while leaving the individual's genotype unchanged; the improved fitness increases the individual's chances of surviving into subsequent generations. Similar to natural evolution, Baldwinian learning does not modify the genetic structure of an individual, but it does increase that individual's chances of survival. Unlike the Lamarckian learning model, the Baldwinian approach does not allow parents to transfer what they have learned to their children [6]. In the Baldwinian approach the local search method is used as a part of the individual's evaluation process: the local search uses local knowledge to produce a new fitness that the global genetic operators can use to improve the individual's capability. In this method, individuals of a population that are similar in genotype gain similar fitness. Such individuals are probably near one another in the search space and receive equal fitness after the local search is applied. The resulting fitness landscape is therefore a smoothed surface that covers many of the local minima of the original search space. This fitness modification is known as the smoothing effect. The Baldwinian learning approach can be more effective, albeit slower, than Lamarckian approaches, since it does not alter the global search process of GAs [5].

Learning automata (LAs) are based on the general schemes of reinforcement learning algorithms. LAs enable agents to learn from their interactions with an environment. An LA selects actions via a stochastic process and applies them to a random, unknown environment. It learns the best action by iteratively performing actions and receiving stochastic reinforcement signals from this unknown environment. These stochastic responses indicate the favorability of the selected actions, and the LA changes its action-selection mechanism in favor of the most promising actions according to the responses of the environment [7, 8].

GALA is a type of MA first reported by Rezapoor and Meybodi [9]. GALA combines a GA, used for its global search function (exploration), with an LA, used for its local search function (exploitation). Object migration automata (OMAs) represent chromosomes in GALA. Each state in an OMA has two attributes: the value of the gene, and the degree of association between the gene and its value. The degree of association encodes information about the past history of the local search process. GALA performs according to a Lamarckian learning model, because it modifies the genotype and uses only the values of the genes to compute the fitness function.

We present a new version of GALA, called modified GALA (MGALA), in the first part of this paper. MGALA behaves according to a Baldwinian learning model. Unlike GALA, which uses only the values of the genes for fitness computation, MGALA uses all the information in the OMA representation of the chromosome (i.e., the degree of association between genes and their alleles, as well as the values of the genes) to compute the fitness function. In the second part of the paper MGALA is used to solve two optimization problems: object partitioning and graph isomorphism. Computer simulations show that MGALA outperforms GALA, a canonical MA, and an OMA-based method in terms of both solution quality and rate of convergence.

Overall, our paper is organized as follows. After this introduction, Sect. 2 briefly describes learning automata and object migration automata. GALA and its applications are described in Sect. 3. MGALA is introduced in Sect. 4. Two MGALA applications, solving the object partitioning problem and the graph isomorphism problem (GIP), are explained in Sects. 5 and 6, respectively. These two sections include implementation considerations, simulation results, and comparisons with other algorithms, highlighting MGALA's contributions to the field. Section 7 concludes the paper.

2 Learning automata and object migration automata

2.1 Learning automata

A learning automaton (LA) [5] is an adaptive decision-making unit that learns an optimal action from a set of actions through repeated interactions with an unknown random environment. At each instant it selects an action based on a probability distribution and applies it to the random environment. After evaluating the input action, the environment sends a reinforcement signal back to the automaton. The automaton processes the response of the environment and updates its action probability vector. By repeating this process, the automaton learns to choose the optimal action, the one that minimizes the average penalty received from the environment. The environment is represented by a triple \( < \underline{\alpha}, \underline{\beta}, \underline{c} > \), where \( \underline{\alpha} = \{ \alpha_{1}, \ldots, \alpha_{r} \} \) is the finite set of inputs, \( \underline{\beta} = \{0, 1\} \) is the set of values that the reinforcement signal can take, and \( \underline{c} = \{ c_{1}, \ldots, c_{r} \} \) is the set of penalty probabilities, where each element c i of \( \underline{c} \) corresponds to one input action α i . The input α(n) to the environment belongs to \( \underline{\alpha} \) and may be considered to be applied to the environment at discrete time \( t = n\,(n = 0, 1, 2, \ldots) \). The output β(n) of the environment belongs to \( \underline{\beta} \) and can take on one of two values, 0 and 1. β(n) = 1 is identified with a failure or an unfavorable response, and β(n) = 0 with a success or favorable response of the environment. The element c i of \( \underline{c} \), which characterizes the environment, may then be defined by \( pr\left( {\beta(n) = 1 | \alpha(n) = \alpha_{i} } \right) \; = c_{i} \quad (i = 1, 2, \ldots, r) \). When the penalty probabilities are constant, the random environment is said to be a stationary random environment; it is called a non-stationary environment if they vary with time. Figure 1 shows the relationship between the LA and the random environment.

Fig. 1 The relationship between the LA and the random environment

There are two main families of learning automata [6]: fixed structure learning automata and variable structure learning automata. First, we formally define fixed structure learning automata, and then some fixed structure learning automata, such as the Tsetline, Krinsky, and Krylov automata, are described.

A fixed structure LA is represented by a quintuple \( < \underline{\alpha}, \underline{\varPhi}, \underline{\beta}, F, G > \), where:

  • \( \underline{\alpha} = \{ \alpha_{1}, \ldots, \alpha_{r} \} \) is the set of actions that the automaton must choose from.

  • \( \underline{\varPhi} = \{ \varphi_{1}, \ldots, \varphi_{s} \} \) is the set of internal states.

  • \( \underline{\beta} = \{0, 1\} \) is the set of inputs, where 1 represents a penalty and 0 represents a reward.

  • \( F: \underline{\varPhi} \times \underline{\beta} \to \underline{\varPhi} \) is a function that maps the current state and current input into the next state.

  • \( G: \underline{\varPhi} \to \underline{\alpha} \) is a function that maps the current state into the current output. In other words, G determines the action taken by the automaton.

The operation of a fixed structure LA can be described as follows. At the first step, the selected action \( \alpha(n) = G[\varPhi(n)] \) serves as the input to the environment, which in turn emits a stochastic response β(n) at time n. β(n) is an element of \( \underline{\beta} = \{0, 1\} \) and is the feedback response of the environment to the automaton. In the second step, the environment penalizes the automaton (i.e., β(n) = 1) with penalty probability c i , which is action dependent. On the basis of the response β(n), the state of the automaton is updated by \( \varPhi(n + 1) = F[\varPhi(n), \beta(n)] \). This process continues until the desired result is obtained.
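To make this loop concrete, the following minimal Python sketch simulates a fixed structure LA against a stationary environment. The interface names (`initial_state`, and the `F` and `G` methods) are our own assumptions for illustration, not definitions from the paper:

```python
import random

def run_fsla(automaton, penalty_probs, steps=1000):
    """Simulate a fixed structure LA in a stationary random environment.

    `automaton` is assumed to expose G(state) -> action index and
    F(state, beta) -> next state; `penalty_probs[i]` plays the role of c_i.
    """
    state = automaton.initial_state
    for _ in range(steps):
        action = automaton.G(state)           # alpha(n) = G[Phi(n)]
        # the environment emits beta(n) = 1 (penalty) with probability c_i
        beta = 1 if random.random() < penalty_probs[action] else 0
        state = automaton.F(state, beta)      # Phi(n+1) = F[Phi(n), beta(n)]
    return automaton.G(state)                 # action favored at the end
```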

In the following paragraphs, we describe some of the fixed structure learning automata such as Tsetline, Krinsky, and Krylov automata.

2.1.1 The two-state automaton (L 2,2)

This automaton has two states, \( \varphi_{1} \) and \( \varphi_{2} \), and two actions, α 1 and α 2. The automaton accepts input from the set {0,1}; it switches its state upon encountering an input of 1 (unfavorable response) and remains in the same state on receiving an input of 0 (favorable response). An automaton that uses this strategy is referred to as L 2,2, where the first subscript refers to the number of states and the second subscript to the number of actions.

2.1.2 The Tsetline automaton (the two-action automaton with memory L 2N,2)

Tsetline suggested a modification of L 2,2, denoted by L 2N,2. This automaton has 2N states and two actions, and attempts to incorporate the past behavior of the system into its decision rule for choosing the sequence of actions. While the automaton L 2,2 switches from one action to another on receiving a failure response from the environment, L 2N,2 keeps an account of the number of successes and failures received for each action. Only when the number of failures exceeds the number of successes, or reaches some maximum value N, does the automaton switch from one action to another. The procedure described above is one convenient method of keeping track of the performance of the actions α 1 and α 2. As such, N is called the depth of memory associated with each action, and the automaton is said to have a total memory of 2N. For every favorable response, the state of the automaton moves deeper into the memory of the corresponding action; for an unfavorable response, it moves out of it. This automaton can be extended to a multiple-action automaton. The state transition graph of the L 2N,2 automaton is shown in Fig. 2.

Fig. 2 The state transition graph for L 2N,2 (Tsetline automaton)
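Under one common state-numbering convention (states 1,…,N belong to action 1 with state 1 the deepest, states N+1,…,2N belong to action 2 with state N+1 the deepest; this encoding is an assumption on our part), the maps F and G of L 2N,2 can be sketched as:

```python
def tsetline_F(state, beta, N):
    """Transition map of the L_{2N,2} Tsetline automaton.

    States 1..N belong to action 1 (state 1 is the deepest), states
    N+1..2N to action 2 (state N+1 is the deepest); N and 2N are the
    boundary states of their actions.
    """
    if beta == 0:                        # favorable response: move deeper
        if state in (1, N + 1):
            return state                 # already in the deepest state
        return state - 1
    # unfavorable response: move toward the boundary, then switch action
    if state == N:
        return 2 * N                     # cross to action 2's boundary state
    if state == 2 * N:
        return N                         # cross to action 1's boundary state
    return state + 1

def tsetline_G(state, N):
    """Output map: which action the current state belongs to."""
    return 1 if state <= N else 2
```

On a penalty in a boundary state (N or 2N) the automaton crosses over to the boundary state of the other action, which reproduces the switching rule described above.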

2.1.3 The Krinsky automaton

This automaton behaves exactly like the L 2N,2 automaton when the response of the environment is unfavorable; but for a favorable response, any state \( \varphi_{i} \) (for i = 1,…,N) passes to the state \( \varphi_{1} \), and any state \( \varphi_{i} \) (for i = N + 1,…,2N) passes to the state \( \varphi_{N + 1} \). This implies that a string of N consecutive unfavorable responses is needed to change from one action to another. The state transition graph of the Krinsky automaton is shown in Fig. 3.

Fig. 3 The state transition graph for the Krinsky automaton

2.1.4 The Krylov automaton

This automaton has state transitions that are identical to those of the L 2N,2 automaton when the output of the environment is favorable. However, when the response of the environment is unfavorable, a state \( \varphi_{i}\,(i \ne 1, N, N + 1, 2N) \) passes to state \( \varphi_{i + 1} \) with probability 1/2 and to state \( \varphi_{i - 1} \) with probability 1/2, as shown in Fig. 4. When i = 1 or N + 1, \( \varphi_{i} \) stays in the same state with probability 1/2 and moves to \( \varphi_{i + 1} \) with the same probability. When i = N, \( \varphi_{N} \) moves to \( \varphi_{N - 1} \) and \( \varphi_{2N} \) each with probability 1/2; similarly, when i = 2N, \( \varphi_{2N} \) moves to \( \varphi_{2N - 1} \) and \( \varphi_{N} \) each with probability 1/2. The state transition graph of the Krylov automaton is shown in Fig. 4.

Fig. 4 The state transition graph for the Krylov automaton
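The Krinsky and Krylov automata can then be written as small variations on the Tsetline transition function above (this reuses `tsetline_F` from the previous sketch, under the same assumed encoding):

```python
import random

def krinsky_F(state, beta, N):
    """Krinsky: a favorable response jumps straight to the deepest
    state of the current action; a penalty behaves like Tsetline."""
    if beta == 0:
        return 1 if state <= N else N + 1
    return tsetline_F(state, 1, N)

def krylov_F(state, beta, N):
    """Krylov: favorable responses behave like Tsetline; on a penalty
    the automaton moves one step deeper or one step outward, each
    with probability 1/2."""
    if beta == 0:
        return tsetline_F(state, 0, N)
    step = random.choice([0, 1])         # 0: one step deeper, 1: one step outward
    return tsetline_F(state, step, N)
```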

The object migration automaton (OMA), which is an example of a fixed structure learning automaton, is described in the next section. Learning automata have a vast variety of applications in combinatorial optimization problems [8–10], computer networks [10–13], queuing theory [14, 15], signal processing [16, 17], information retrieval [18, 19], adaptive control [20–22], neural network engineering [23, 24] and pattern recognition [25–27].

2.2 Object migration automata

Object migration automata were first proposed by Oommen and Ma [10]. OMAs are a type of fixed structure learning automata, and are defined by a quintuple \( < \underline{\alpha}, \underline{\varPhi}, \underline{\beta}, F, G > \). \( \underline{\alpha} = \{ \alpha_{1}, \ldots, \alpha_{r} \} \) is the set of allowed actions for the automaton. For each action α k , there is a set of states \( \{ \varphi_{(k - 1)N + 1}, \ldots, \varphi_{kN} \} \), where N is the depth of memory. The states \( \varphi_{(k - 1)N + 1} \) and \( \varphi_{kN} \) are the most internal state and the boundary state of action α k , respectively. The set of all states is represented by \( \underline{\varPhi} = \{ \varphi_{1}, \ldots, \varphi_{s} \} \), where \( s = N \times r \). \( \underline{\beta} = \{0, 1\} \) is the set of inputs, where 1 represents an unfavorable response and 0 represents a favorable response. \( F: \underline{\varPhi} \times \underline{\beta} \to \underline{\varPhi} \) is a function that maps the current state and current input into the next state, and \( G: \underline{\varPhi} \to \underline{\alpha} \) is a function that maps the current state into the current output. In other words, G determines the action taken by the automaton. In an OMA, W objects are assigned to actions and moved around the states of the automaton, as opposed to general learning automata, in which the automaton moves from one action to another in response to the environment. The states of objects are changed on the basis of the feedback response from the environment. If the object w i is assigned to action α k (i.e., w i is in state ξ i , where \( \xi_{i} \in \{ \varphi_{(k - 1)N + 1}, \ldots, \varphi_{kN} \} \)), and the feedback response from the environment is 0, then α k is rewarded and w i is moved toward the most internal state \( (\varphi_{(k - 1)N + 1}) \) of that action. If the feedback from the environment is 1, then α k is penalized and w i is moved toward the boundary state (\( \varphi_{kN} \)) of action α k . The variable γ k denotes the inverse of the state number of the object assigned to action α k (i.e., the degree of association between action α k and its assigned object). Rewarding an action increases the degree of association between that action and its assigned object; conversely, penalizing an action decreases it. An object associated with state \( \varphi_{(k - 1)N + 1} \) has the highest degree of association with action α k , and an object associated with state \( \varphi_{kN} \) has the lowest degree of association with action α k .
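The reward and penalty dynamics of an OMA can be sketched as follows. We store, for each action, the within-action state number s in 1,…,N (s = 1 the most internal state, s = N the boundary state) rather than the absolute state index \( \varphi_{(k-1)N+1}, \ldots, \varphi_{kN} \); this encoding, and the class layout, are assumptions for illustration:

```python
class OMA:
    """Minimal object migration automaton sketch (assumed encoding).

    For each of the r actions we track the within-action state number
    s in 1..N of its assigned object: s = 1 is the most internal
    state, s = N the boundary state.
    """
    def __init__(self, r, N, objects):
        self.N = N
        self.objects = list(objects)     # object assigned to each action
        self.state = [N] * r             # all objects start at the boundary

    def reward(self, k):
        """Favorable response: move the object of action k inward."""
        if self.state[k] > 1:
            self.state[k] -= 1

    def penalize(self, k):
        """Unfavorable response: move the object of action k outward."""
        if self.state[k] < self.N:
            self.state[k] += 1

    def gamma(self, k):
        """Degree of association: inverse of the state number, in [1/N, 1]."""
        return 1.0 / self.state[k]
```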

3 GALA

GALA, which is a hybrid model based on a GA and an LA, was introduced for the first time by Rezapoor and Meybodi [9]. Chromosomes are represented by OMAs in this model. In the OMA-based representation, there are n actions in each automaton corresponding to the n genes in each chromosome. Furthermore, for each action there is a fixed number of states N. The value of each gene, as a migratory object in the automaton, is selected from the set W = {w 1,…,w m } and assigned to the states of the corresponding action. After applying a local search, if the assignment of an object to the states of an action is promising, then the action is rewarded and the assigned object moves toward the most internal state of that action; otherwise, the action is penalized and the assigned object moves toward the boundary state of that action. Rewarding and penalizing an action changes the degree of association between an object and its action. Figure 5 shows a representation of the chromosome “dfabec” using the Tsetline automaton-based OMA with six actions and a depth of memory of five.

Fig. 5 The state transition graph of a Tsetline-based OMA

In Fig. 5 there are six actions (genes), denoted by α 1, α 2, α 3, α 4, α 5, and α 6. Genes 1, 2, and 6 have the values ‘d,’ ‘f,’ and ‘c,’ located at internal states 2, 3, and 4 of their actions, respectively. The values of genes 3 and 5 are ‘a’ and ‘e,’ respectively, and both are located at the boundary states of their actions; consequently, there is a minimum degree of association between these actions and their corresponding objects. The remaining gene, gene 4, has the value ‘b’ and is located at the most internal state of its action; that is, it has the maximum degree of association with action 4. Representations of chromosomes based on other fixed structure learning automata are also possible. In a Krinsky-based OMA representation (cf. Fig. 3), a rewarded object jumps to the most internal state (i.e., it gets the highest degree of association with the corresponding action), while a penalized object moves as in the Tsetline automaton-based OMA. In a Krylov-based OMA representation (cf. Fig. 4), a penalized object moves either toward the most internal state or toward the boundary state, each with probability 0.5, while a rewarded object moves as in the Tsetline automaton-based OMA.

3.1 Global search in GALA

The global search in GALA is based on a traditional genetic algorithm, with each chromosome in the population represented by an OMA. Chromosome i is denoted by CR i  = [(CR i .Action(1), CR i .Object(1), CR i .State(1)),…,(CR i .Action(n), CR i .Object(n), CR i .State(n))], where CR i .Action(k) is the kth action of CR i , CR i .Object(k) is the object assigned to the kth action (the value of the kth gene), and CR i .State(k) is the state of the object assigned to the kth action (the degree of association between gene k and its value), with 1 ≤ k ≤ n, 1 ≤ CR i .Object(k) ≤ m, and (k − 1)N + 1 ≤ CR i .State(k) ≤ kN. The initial population is created randomly, and objects are located at the boundary states of their actions. At the beginning of each generation the best chromosome from the previous generation is moved to the population of the current generation. Next, the crossover operator is applied to the parent chromosomes at rate r c (parents are selected according to chromosome fitness using a tournament mechanism), and then the mutation operator is applied at rate r m .
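Using the same within-action state encoding as in the OMA sketch above, a random initial chromosome can be created as follows (the dictionary layout is hypothetical; CR.Action(k) is implicit in the list position):

```python
import random

def init_chromosome(n, m, N):
    """Random GALA chromosome: gene k (list position) holds a random
    object in 1..m as its value, placed at the boundary state, stored
    here as the within-action state number N."""
    return [{"object": random.randint(1, m), "state": N}
            for _ in range(n)]
```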

3.2 Crossover operator

The crossover operator in GALA is applied as follows. Two chromosomes, CR 1 and CR 2, are selected by the selection mechanism as parent chromosomes. Two actions, r 1 and r 2, are also randomly selected from CR 1 and CR 2, respectively. Then, for each action in the range [r 1, r 2] of CR 1, the assigned object is exchanged with the object assigned to the same action in chromosome CR 2. In the crossover operator, the previous states of the selected actions in CR 1 and CR 2 are transferred to the child chromosomes. The pseudo code for the crossover operator is shown in Fig. 6.

Fig. 6 Pseudo code for the crossover operator

Figure 7 illustrates an example of the crossover operator. First, two actions are randomly selected in the parent chromosomes (e.g., actions 2 and 4 here), and then the objects assigned to the actions in the range [2,4] of CR 1 are exchanged with the objects of the corresponding actions in CR 2.

Fig. 7 An example of the crossover operator
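A sketch of this operator, using the gene encoding from the initialization sketch above (in GALA the state travels with its object, which matches the XS-Crossover described later in Sect. 4.3.2):

```python
def gala_crossover(cr1, cr2, r1, r2):
    """GALA crossover sketch: for each action in the 1-based range
    [r1, r2], exchange the assigned object between the parents; each
    object's state travels with it."""
    child1 = [dict(g) for g in cr1]
    child2 = [dict(g) for g in cr2]
    for k in range(r1 - 1, r2):
        for f in ("object", "state"):
            child1[k][f], child2[k][f] = child2[k][f], child1[k][f]
    return child1, child2
```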

3.3 Mutation operator

The mutation operator is similar to that of a traditional genetic algorithm. Two actions are selected at random in the parent chromosome, and their assigned objects are exchanged. The previous states of the selected actions in the parent chromosome are transferred to the child chromosome by this operator. Pseudo code for the mutation operator is shown in Fig. 8.

Fig. 8 Pseudo code for the mutation operator

Figure 9 illustrates an example of the mutation operator. First, two actions in the parent chromosome are randomly selected (e.g., actions 1 and 2 here). The mutation operator then exchanges both the state and the object assigned to action 1 with the state and the object assigned to action 2.

Fig. 9 An example of the mutation operator
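A corresponding sketch of the GALA mutation (same assumed encoding):

```python
import random

def gala_mutation(cr):
    """GALA mutation sketch: pick two actions at random and exchange
    both the objects and the states assigned to them (cf. Fig. 9)."""
    child = [dict(g) for g in cr]
    i, j = random.sample(range(len(child)), 2)
    for f in ("object", "state"):
        child[i][f], child[j][f] = child[j][f], child[i][f]
    return child
```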

3.4 Local learning in GALA

Local learning in GALA is done using the OMA representation of chromosomes. If the object assigned to an action is the same before and after applying a given local search, then that action is rewarded; otherwise, it is penalized. It is worth noting that the local search only changes the states of the actions (according to the OMA connections), not the objects assigned to the actions (i.e., the action associated with an object does not change). By rewarding an action, the state of that action moves toward the most internal state according to the OMA connections. This increases the degree of association between an object and its corresponding action. The state of an action remains unchanged if the object is already located at its most internal state, such as the state of object D in action 4, shown in Fig. 11.

Figure 10 provides pseudo code for rewarding an action. Figure 11 illustrates an example of rewarding an action.

Fig. 10 Pseudo code for the reward function

Fig. 11 An example of the reward function

Penalizing an action decreases the degree of association between an object and its corresponding action. If an object is not in the boundary state of its action, then penalizing moves the object toward the boundary state, meaning that the degree of association between the action and the corresponding object is decreased (Fig. 12). If an object is in the boundary state of its action, then penalizing the action changes the object assigned to that action, resulting in the creation of a new chromosome. How the new chromosome is created depends on the application; it is always created in such a way that its fitness is greater than the fitness of the old chromosome. Figure 13 shows the effect of the penalty function on action 3 of a sample chromosome (assuming that chromosome “cbadfe” has better fitness than chromosome “cbedfa”). Pseudo code for the penalty function is shown in Fig. 14, and pseudo code for GALA is shown in Fig. 15.

Fig. 12 An example of the penalty function

Fig. 13 Another example of the penalty function

Fig. 14 Pseudo code for the penalty function

Fig. 15 Pseudo code for GALA
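The reward and penalty moves of local learning can be sketched as follows (same assumed encoding; the application-dependent replacement of a boundary-state object is left as a comment):

```python
def local_learning(cr, local_search, N):
    """GALA local learning sketch: actions whose object survives the
    local search unchanged are rewarded; the others are penalized."""
    searched = local_search([dict(g) for g in cr])
    for gene, after in zip(cr, searched):
        if after["object"] == gene["object"]:
            if gene["state"] > 1:        # reward: toward the most internal state
                gene["state"] -= 1
        elif gene["state"] < N:          # penalty: toward the boundary state
            gene["state"] += 1
        # else: the object sits at the boundary state, and the penalty
        # replaces it in an application-dependent way (Figs. 13, 14)
    return cr
```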

3.5 Applications of GALA

GALA has been used in a variety of applications, including the GIP [11], join ordering problems in database queries [12–15], the traveling salesman problem [16–18], the Hamiltonian cycle problem [19], sorting problems in graphs [20], the graph bandwidth minimization problem [21–23], software clustering problems [24, 25], the single machine total weighted tardiness scheduling problem [26], data allocation problems in distributed database systems [27, 28], and the task graph scheduling problem [29, 30].

4 Modified GALA (MGALA)

Modified GALA (MGALA) is a new version of GALA. As in GALA, each chromosome is represented by an object migration automaton (OMA) whose states keep information about the past history of the local search process. Each state in the OMA has two attributes: the value of the corresponding gene, and the degree of association of the gene with its value. In MGALA the fitness function is computed using the past history of the local search kept in the OMA states, as well as the chromosome's fitness. Unlike GALA, which only uses the values of the genes for fitness computation, MGALA uses all the information in the OMA representation of the chromosome (i.e., the degree of association between genes and their values, and the values of the genes) to compute the fitness function. Hence, unlike GALA, which behaves according to a Lamarckian learning model, MGALA behaves according to a Baldwinian learning model. MGALA's various components are described in the remainder of this section.

4.1 Fitness function

The fitness function in MGALA depends not only on genotype information but also on phenotype information. We use the fitness function \( f'(CR) = \sum\nolimits_{i = 1}^{n} f_{i} (1 + \gamma_{i}) \) for maximization problems, and \( f'(CR) = \sum\nolimits_{i = 1}^{n} f_{i} (1 - \gamma_{i}) \) for minimization problems, in the selection of chromosomes. In these functions f i is the fitness of the ith gene, and γ i is the degree of association between action α i and its assigned object. The depth of memory is N, so \( \frac{1}{N} \le \gamma_{i} \le 1 \). The parent chromosomes and the chromosomes of the next generation are selected based on this fitness function using a tournament mechanism.
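A sketch of this fitness computation under the same assumed encoding (`gene_fitness` is a hypothetical per-gene fitness callback, since f i is problem specific):

```python
def mgala_fitness(cr, gene_fitness, maximize=True):
    """MGALA fitness sketch: f'(CR) = sum_i f_i (1 + gamma_i) for
    maximization and sum_i f_i (1 - gamma_i) for minimization, with
    gamma_i = 1/state_i in [1/N, 1]."""
    total = 0.0
    for k, gene in enumerate(cr):
        gamma = 1.0 / gene["state"]              # degree of association
        f_i = gene_fitness(k, gene["object"])    # per-gene fitness f_i
        total += f_i * (1 + gamma if maximize else 1 - gamma)
    return total
```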

4.2 Mutation operator

Depending on whether the states of the selected actions change or not, we define two types of mutation operators in MGALA. In the first type the states of the selected actions remain unchanged (i.e., the degrees of association between actions and their assigned objects are saved). In the second type the states of the selected actions are changed (i.e., the degrees of association between actions and their assigned objects are lost).

Three mutation operators are defined for MGALA in this paper: the SS-Mutation, the XS-Mutation, and the LS-Mutation. Specifically:

  • The mutation operator in which the previous states of the selected actions are saved is referred to as the SS-Mutation.

  • The mutation operator in which the previous states of the selected actions are exchanged is referred to as the XS-Mutation.

  • The mutation operator in which the previous states of the selected actions are lost is referred to as the LS-Mutation.

The SS-Mutation and XS-Mutation are examples of the first type of mutation operator described above, and the LS-Mutation is an example of the second type. These mutation operators are described in more detail below.

4.2.1 SS-Mutation

Assuming actions 1 and 2 are the selected actions, the SS-Mutation exchanges the objects assigned to the states of actions 1 and 2. Figure 16 shows an example of an SS-Mutation. Pseudo code for our SS-Mutation is shown in Fig. 17.

Fig. 16 An example of an SS-Mutation

Fig. 17 Pseudo code for the SS-Mutation

4.2.2 XS-Mutation

The XS-Mutation is the same as the mutation that GALA uses. The states of the actions, along with their assigned objects, are exchanged. Figure 18 shows an example of an XS-Mutation. Assuming actions 1 and 2 are the selected actions, the XS-Mutation exchanges the object and the state of action 1 with those of action 2. Pseudo code for our XS-Mutation is shown in Fig. 19.

Fig. 18 An example of an XS-Mutation

Fig. 19 Pseudo code for the XS-Mutation

4.2.3 LS-Mutation

The LS-Mutation is the same as the mutation used in GALA, except that the state of each selected action is changed to its corresponding boundary state. Figure 20 shows an example of the LS-Mutation operator. Assuming that actions 1 and 2 are the selected actions, the LS-Mutation causes: (1) the object assigned to action 1 to be exchanged with the object assigned to action 2; and (2) the state of each selected action to change to its corresponding boundary state. Pseudo code for our LS-Mutation operator is given in Fig. 21.

Fig. 20 An example of an LS-Mutation

Fig. 21 Pseudo code for the LS-Mutation
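The three operators can be sketched side by side (same assumed encoding; N is the depth of memory, so state N is the boundary state):

```python
import random

def ss_mutation(cr):
    """SS-Mutation: exchange only the objects; the states stay put."""
    child = [dict(g) for g in cr]
    i, j = random.sample(range(len(child)), 2)
    child[i]["object"], child[j]["object"] = child[j]["object"], child[i]["object"]
    return child

def xs_mutation(cr):
    """XS-Mutation: exchange the objects together with their states
    (the GALA mutation)."""
    child = [dict(g) for g in cr]
    i, j = random.sample(range(len(child)), 2)
    for f in ("object", "state"):
        child[i][f], child[j][f] = child[j][f], child[i][f]
    return child

def ls_mutation(cr, N):
    """LS-Mutation: exchange the objects and reset both selected
    actions to their boundary states (associations are lost)."""
    child = [dict(g) for g in cr]
    i, j = random.sample(range(len(child)), 2)
    child[i]["object"], child[j]["object"] = child[j]["object"], child[i]["object"]
    child[i]["state"] = child[j]["state"] = N
    return child
```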

4.3 Crossover operator

Similar to the mutation operators, we define two different types of crossover operators for MGALA. In the first type, the states of selected actions remain unchanged, and in the second type the states of the actions change to the boundary states.

Three crossover operators, the SS-Crossover, the XS-Crossover, and the LS-Crossover, are defined in MGALA:

  • The crossover operator in which the previous states of the selected actions are saved is referred to as the SS-Crossover.

  • The crossover operator in which the previous states of the selected actions are exchanged is referred to as the XS-Crossover.

  • The crossover operator in which the previous states of the selected actions are lost is referred to as the LS-Crossover.

The SS-Crossover and XS-Crossover are examples of crossover operators of the first type, and the LS-Crossover is an example of the second type. These crossover operators are described in further detail below.

4.3.1 SS-Crossover

Figure 22 shows an example of an SS-Crossover. Assuming actions 2 and 4 are selected randomly from the parent chromosomes, the SS-Crossover exchanges the assigned object of each action in the range of [2,4] of CR 1, with the assigned object of the same action in the range of [2,4] of CR 2. Note that in the SS-Crossover the states of the actions remain unchanged. Pseudo code for the SS-Crossover is shown in Fig. 23.

Fig. 22 An example of an SS-Crossover

Fig. 23 Pseudo code for the SS-Crossover

4.3.2 XS-Crossover

The XS-Crossover is the same as the crossover used in GALA. Figure 24 shows an example of an XS-Crossover. Assuming actions 2 and 4 are selected randomly from the parent chromosomes, the XS-Crossover exchanges both the assigned object and the state of each action in the range of [2,4] of CR 1, with the assigned object and the state of the same action in the range of [2,4] of CR 2. Figure 25 shows the pseudo code for the XS-Crossover.

Fig. 24 An example of an XS-Crossover

Fig. 25 Pseudo code for the XS-Crossover

4.3.3 LS-Crossover

The LS-Crossover is the same as the crossover used in GALA, except that the state of each action is changed to its corresponding boundary state. Figure 26 shows an example of an LS-Crossover. Assuming that actions 2 and 4 are randomly selected in the parent chromosomes, the LS-Crossover causes: (1) each object assigned to an action in the range [2,4] of CR 1 to be exchanged with the object assigned to the same action in the range [2,4] of CR 2; and (2) the state of each action to change to its boundary state. Figure 27 shows the pseudo code of the LS-Crossover.

Fig. 26 An example of an LS-Crossover

Fig. 27 Pseudo code for the LS-Crossover
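All three crossover variants fit one sketch (same assumed encoding; `mode` selects the variant):

```python
def mgala_crossover(cr1, cr2, r1, r2, mode, N):
    """Crossover variants sketch: 'SS' swaps only the objects, 'XS'
    swaps objects together with their states, and 'LS' swaps the
    objects and resets the affected actions to the boundary state N."""
    c1 = [dict(g) for g in cr1]
    c2 = [dict(g) for g in cr2]
    for k in range(r1 - 1, r2):
        c1[k]["object"], c2[k]["object"] = c2[k]["object"], c1[k]["object"]
        if mode == "XS":
            c1[k]["state"], c2[k]["state"] = c2[k]["state"], c1[k]["state"]
        elif mode == "LS":
            c1[k]["state"] = c2[k]["state"] = N
    return c1, c2
```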

5 The equipartitioning problem

Let \( \underline{A} = \{ A_{1}, \ldots, A_{W} \} \) be a set of W objects. We want to partition \( \underline{A} \) into R classes {P 1,…,P R } such that objects that are frequently used together are located in the same class. We assume that the joint access probabilities of the objects are unknown. This problem is called the object partitioning problem. A special case of the object partitioning problem, referred to as the equal partitioning problem (EPP), requires the objects to be equipartitioned: each class has exactly M = W/R objects. To solve the EPP with MGALA we define a chromosome with W genes (actions), whose values are selected from the set of classes {P 1,…,P R } (as migratory objects in the OMA) such that each class is assigned to W/R genes (actions). Objects are initially assigned to the boundary states of the actions. Figure 28 shows a chromosome based on a Tsetline OMA representation for 6 objects and 2 classes (called class α and class β) with N = 5. In this figure objects 1, 3, and 4 are assigned to class α, and objects 2, 5, and 6 are assigned to class β.

Fig. 28 A chromosome representation of the EPP with W = 6, R = 2, and N = 5

5.1 Local search for EPP

Suppose a query, which is a pair of objects (A i , A j ), has been accessed. If the objects (class labels) assigned to actions α i and α j are the same, then both actions α i and α j are rewarded; if they are different, both are penalized. In either case their states change according to the OMA connections. Pseudo code for our local search of the EPP is shown in Fig. 29.

Fig. 29 Pseudo code for our local search of the EPP
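A sketch of this local search under the assumed encoding, where the "object" of gene k is the class label assigned to object A k+1 (0-based indices):

```python
def epp_local_search(cr, query, N):
    """EPP local search sketch: `query` is a pair (i, j) of 0-based
    object indices.  If actions i and j hold the same class label,
    both are rewarded; otherwise both are penalized."""
    i, j = query
    same = cr[i]["object"] == cr[j]["object"]
    for k in (i, j):
        if same and cr[k]["state"] > 1:
            cr[k]["state"] -= 1          # reward: toward the most internal state
        elif not same and cr[k]["state"] < N:
            cr[k]["state"] += 1          # penalty: toward the boundary state
    return cr
```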

5.2 Experimental results

We studied the efficiency of our MGALA algorithm in solving the EPP by comparing its results with those obtained with the OMA method reported in [10] and with the GALA algorithm. Queries were chosen randomly from a pool of queries in all experiments. The pool of queries was generated in such a way that the sum of the probabilities that object A i in partition π i is jointly accessed with other objects in partition π i is p, and with objects in partition π j (j ≠ i) is 1 − p, that is:

$$ \mathop \sum \limits_{{A_{j} \in \pi_{i} }} \Pr \left[ {A_{i} ,A_{j} \,accessed\,together} \right] = p $$
(1)

Therefore, if p = 1, then queries will only involve objects in the same partition. As the value of p decreases, the queries become less informative about the solution of the EPP [10]. For all experiments an initial population containing a single randomly created chromosome was used, the size of each chromosome was set equal to the number of objects, the mutation rate was 0.05, the selection mechanism was (1,1), p was 0.9, and the depth of memory was 2. The algorithm terminates when all the objects in the only chromosome are located in the most internal states of their actions. For all experiments a Tsetline-based OMA was used for chromosome representation. Each reported result was averaged over 30 runs. We performed a parametric test (T test) and two non-parametric tests (Wilcoxon rank sum test and permutation test) at the 95 % significance level to provide statistical confidence. The T tests were performed after ensuring that the data followed a normal distribution (by using the Kolmogorov–Smirnov test).
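One way to generate such a pool, sketched under the stated access model (the partition layout and helper name are our own illustration):

```python
import random

def draw_query(partitions, p):
    """Draw one query (A_i, A_j): pick A_i at random, then pick its
    mate from A_i's own partition with probability p and from some
    other partition with probability 1 - p (cf. Eq. (1))."""
    part = random.choice(partitions)        # partition of A_i
    a_i = random.choice(part)
    if random.random() < p:
        a_j = random.choice([a for a in part if a != a_i])
    else:
        other = random.choice([q for q in partitions if q is not part])
        a_j = random.choice(other)
    return a_i, a_j
```

For example, with `partitions = [[0, 1], [2, 3]]` and p = 0.9, a drawn query pairs objects from the same partition with probability 0.9.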

5.2.1 Experiment 1

In this experiment we compared the results obtained from MGALA with the results of two other algorithms for the EPP, an OMA-based algorithm reported in [10] and GALA, in terms of the number of iterations (number of accessed queries) required by each algorithm. MGALA was tested with three different mutation operators: the SS-Mutation, the XS-Mutation, and the LS-Mutation.

Table 1 presents the results of the different algorithms for 14 different cases with respect to the average number of iterations and their standard deviations. From the results reported in Table 1 we observe the following:

Table 1 Comparison of the number of iterations required by different algorithms
  • The MGALA algorithm outperforms both the OMA and GALA algorithms.

  • For cases (W = 9 and R = 3), (W = 12 and R = 2) and (W = 8 and R = 2), MGALA using the LS-Mutation performs the best, and for the other cases MGALA with the SS-Mutation displays the best performance.

  • The OMA algorithm displays the worst performance compared with the GALA and MGALA algorithms.

  • As the number of classes (R) decreases, the number of iterations required by all algorithms increases. This is because a low value of R means that a higher number of objects is placed in each class, leading to a situation where more actions have the same class number (migratory object). This decreases the probability that a mutation operator swaps two objects between two actions.

  • MGALA with the XS-Mutation displays the same performance as GALA. This is because: (1) In the MGALA and GALA algorithms, the selection mechanism is considered to be (1,1), that is, MGALA, like GALA, has no selection mechanism in this mode; and (2) the XS-Mutation operator used by MGALA is the same as the mutation operator used by GALA.

Table 2 shows the p values of the two-tailed T test, the two-tailed Wilcoxon rank sum test, and the two-tailed permutation test. From the results reported in Table 2 we observe the following:

Table 2 The results of statistical tests for OMA algorithm and other algorithms
  • For all three statistical tests (Wilcoxon, permutation, and T test), the difference between the performance of the OMA algorithm and that of the other algorithms is statistically significant (p value <0.05) in most cases.

5.2.2 Experiment 2

This experiment's goal was to evaluate the accuracy of the solution produced by MGALA. Before introducing the concept of accuracy for MGALA, we provide some preliminaries. For this purpose we use an equipartitioning example with four objects, A 1, A 2, A 3, and A 4, and two classes, α and β. A Tsetline-based OMA with a depth of memory of 2 is used for the chromosome representation, shown in Fig. 30. We also assume that the initial population is of size one, and that the only chromosome in the initial population is created randomly and has its migratory objects in the boundary states of the actions.

Fig. 30 A representation of the EPP with W = 4, R = 2, and N = 2

For this example there are three possible object equipartitioning schemes, specified below:

  • ααββ: Objects A 1, A 2 are in class α and objects A 3, A 4 are in class β.

  • αβαβ: Objects A 1, A 3 are in class α and objects A 2, A 4 are in class β.

  • αββα: Objects A 1, A 4 are in class α and objects A 2, A 3 are in class β.

The chromosome in Fig. 30 can also be represented by \( (\bar{\alpha}\underline{\alpha}\underline{\beta}\bar{\beta}) \), where a bar denotes an object in the boundary state and an underline denotes an object in an internal state; this corresponds to the following situation:

  • Object A 1 is in partition α and the migratory object is located in the boundary state of action 1.

  • Object A 2 is in partition α and the migratory object is located in the internal state of action 2.

  • Object A 3 is in partition β and the migratory object is located in the internal state of action 3.

  • Object A 4 is in partition β and the migratory object is located in the boundary state of action 4.

Each possible equipartitioning scheme may correspond to any of 16 possible chromosomes, out of 48 possible chromosomes in total. For example, the chromosomes \( (\bar{\alpha}\bar{\alpha}\bar{\beta}\bar{\beta}) \), \( (\bar{\alpha}\bar{\alpha}\bar{\beta}\underline{\beta}) \), \( (\bar{\alpha}\bar{\alpha}\underline{\beta}\bar{\beta}) \), \( (\bar{\alpha}\bar{\alpha}\underline{\beta}\underline{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\bar{\beta}\bar{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\bar{\beta}\underline{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\underline{\beta}\bar{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\underline{\beta}\underline{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\bar{\beta}\bar{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\bar{\beta}\underline{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\underline{\beta}\bar{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\underline{\beta}\underline{\beta}) \), \( (\underline{\alpha}\underline{\alpha}\bar{\beta}\bar{\beta}) \), \( (\underline{\alpha}\underline{\alpha}\bar{\beta}\underline{\beta}) \), \( (\underline{\alpha}\underline{\alpha}\underline{\beta}\bar{\beta}) \), and \( (\underline{\alpha}\underline{\alpha}\underline{\beta}\underline{\beta}) \) all represent the equipartitioning ααββ.

We group the chromosomes corresponding to a given equipartitioning into two sets: a converged chromosome set (CCS) and a non-converged chromosome set (NCCS). The CCS includes all chromosomes in which all objects of class α, or all objects of class β, are in internal states; all other chromosomes are in the NCCS. For example, the chromosomes \( (\bar{\alpha}\bar{\alpha}\bar{\beta}\bar{\beta}) \), \( (\bar{\alpha}\bar{\alpha}\bar{\beta}\underline{\beta}) \), \( (\bar{\alpha}\bar{\alpha}\underline{\beta}\bar{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\bar{\beta}\bar{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\bar{\beta}\underline{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\underline{\beta}\bar{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\bar{\beta}\bar{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\bar{\beta}\underline{\beta}) \), and \( (\underline{\alpha}\bar{\alpha}\underline{\beta}\bar{\beta}) \) are in the NCCS of equipartitioning ααββ, while \( (\bar{\alpha}\bar{\alpha}\underline{\beta}\underline{\beta}) \), \( (\bar{\alpha}\underline{\alpha}\underline{\beta}\underline{\beta}) \), \( (\underline{\alpha}\bar{\alpha}\underline{\beta}\underline{\beta}) \), \( (\underline{\alpha}\underline{\alpha}\bar{\beta}\bar{\beta}) \), \( (\underline{\alpha}\underline{\alpha}\bar{\beta}\underline{\beta}) \), \( (\underline{\alpha}\underline{\alpha}\underline{\beta}\bar{\beta}) \), and \( (\underline{\alpha}\underline{\alpha}\underline{\beta}\underline{\beta}) \) are in the CCS of equipartitioning ααββ. MGALA converges to one of the following six sets: the NCCS of equipartitioning ααββ, the CCS of equipartitioning ααββ, the NCCS of equipartitioning αβαβ, the CCS of equipartitioning αβαβ, the NCCS of equipartitioning αββα, or the CCS of equipartitioning αββα.

Let p 1, p 2, p 3, p 4, p 5, and p 6 be the probabilities of the current chromosome being in each of the above-mentioned sets, respectively. Furthermore, assume that eighty percent of the queries accessed from the pool of queries are (A 1, A 2) or (A 3, A 4). For MGALA to produce the correct solution it must generate a population containing one of the 16 chromosomes of equipartitioning ααββ listed above; that is, MGALA must converge either to the NCCS or to the CCS of equipartitioning ααββ. The algorithm is more accurate if the probability of converging to the NCCS or the CCS of equipartitioning ααββ (that is, p 1  + p 2 ) is higher than 0.8.

Next we undertook the experimentation phase of the study. Input queries were generated in such a way that eighty percent of the queries accessed from the pool were (A 1, A 2) or (A 3, A 4). Note that the initial population is of size one, and that the only chromosome in the initial population is created randomly and has its migratory objects at the boundary states of the actions. That is, chromosome \( (\bar{\alpha}\bar{\alpha}\bar{\beta}\bar{\beta}) \), which belongs to the NCCS of equipartitioning ααββ, chromosome \( (\bar{\alpha}\bar{\beta}\bar{\alpha}\bar{\beta}) \), which belongs to the NCCS of equipartitioning αβαβ, and chromosome \( (\bar{\alpha}\bar{\beta}\bar{\beta}\bar{\alpha}) \), which belongs to the NCCS of equipartitioning αββα, are the only options. Therefore, since the initial all-boundary chromosome always lies in a non-converged set, the initial values of p 1 , p 3 , and p 5 are 1/3, and the initial values of p 2 , p 4 , and p 6 are zero. Figure 31a–c shows the evolution of p 1 , p 2 , p 3 , p 4 , p 5 , and p 6 for the three mutation operators SS-Mutation, XS-Mutation, and LS-Mutation.

Fig. 31 The evolution of p 1, p 2, p 3, p 4, p 5, and p 6 in MGALA with W = 4, R = 2, and N = 2, for the SS-Mutation (a), XS-Mutation (b), and LS-Mutation (c) operators for the EPP

These figures show that p 1  + p 2 approaches a value close to 0.9 for all mutation operators. That is, when eighty percent of the queries accessed from the pool are (A 1, A 2) or (A 3, A 4), MGALA converges to the correct equipartitioning (ααββ) with a probability close to 0.9. The mutation rate was set to 0.05 for this experiment.

Table 3 presents the results of the MGALA algorithm for different mutation operators and different percentages of queries (accessed from the pool) being (A 1, A 2) or (A 3, A 4), with respect to the accuracy of the solution and its standard deviation. The percentage varies from 40 to 100 in increments of 10 in Table 3. From these results we conclude the following:

Table 3 Accuracy of MGALA with respect to percentage of queries (A 1, A 2) or (A 3, A 4) and mutation operators
  • For all mutation operators the accuracy of the solution generated by MGALA increases as the percentage of queries (A 1, A 2) or (A 3, A 4) in the input increases.

  • MGALA has the highest accuracy when the LS-Mutation operator is used.

Table 4 shows the results of the statistical tests. From the results reported in Table 4 we observe the following:

Table 4 The results of statistical tests for MGALA algorithm with LS-Mutation operator vs. MGALA algorithm with SS-Mutation and XS-Mutation operators with respect to percentage of queries (A 1, A 2) or (A 3, A 4)
  • For all three statistical tests (Wilcoxon, permutation, and T test), the difference between the performance of the MGALA algorithm with the LS-Mutation operator and its performance with the other mutation operators is not statistically significant (p value >0.05).

5.2.3 Experiment 3

The goal of this experiment was to study the impact of the parameter N (depth of memory) on the number of iterations required by the MGALA algorithm to find optimal equipartitioning. The depth of memory was varied from 2 to 10 in increments of 2. MGALA was tested with the three different mutation operators. Tables 5, 6 and 7 show the results obtained for the MGALA algorithm with the SS-Mutation, XS-Mutation, and LS-Mutation for 14 different cases, with respect to the average number of iterations required and their standard deviations.

Table 5 Number of iterations required by MGALA with SS-Mutation operator for different cases and different depths of memory
Table 6 Number of iterations required by MGALA with XS-Mutation operator for different cases and different depths of memory
Table 7 Number of iterations required by MGALA with LS-Mutation operator for different cases and different depths of memory
  • For lower values of R MGALA requires a higher number of iterations to converge to optimal equipartitioning.

  • For higher values of W/R MGALA requires a higher number of iterations to converge to optimal equipartitioning.

  • For higher values of W/R MGALA requires a higher depth of memory to converge to optimal equipartitioning.

  • The MGALA algorithm with a depth of memory N = 2 performs better than MGALA with a depth of memory of N ≠ 2.

These results arise because higher values of W/R mean that a higher number of objects is placed in each class. This leads to a situation where more actions have the same class number (migratory objects), and decreases the probability that a mutation operator swaps two objects between two actions.

Tables 8, 9 and 10 show the p values of the two-tailed T test, the two-tailed Wilcoxon rank sum test, and the two-tailed permutation test for the MGALA algorithm with the SS-Mutation, XS-Mutation, and LS-Mutation, respectively. From the results reported in these tables we observe the following:

Table 8 The results of statistical tests for MGALA with SS-Mutation and depth of memory 2 vs. MGALA algorithm with SS-Mutation and other depths of memory
Table 9 The results of statistical tests for MGALA with XS-Mutation and depth of memory 2 vs. MGALA algorithm with XS-Mutation and other depths of memory
Table 10 The results of statistical test for MGALA algorithm with LS-Mutation and depth of memory 2 versus MGALA algorithm with LS-Mutation and other depths of memory
  • For all three statistical tests (Wilcoxon, permutation, and T test), the difference between the performance of the MGALA algorithm with a depth of memory of N = 2 and its performance with a depth of memory of N ≠ 2 is statistically significant (p value <0.05) in most cases, for all three mutation operators.

6 The graph isomorphism problem

A graph is described by G = (E, V), where V is the set of vertices and \( E \subseteq V \times V \) is the set of edges. Two graphs G = (E 1, V 1) and H = (E 2, V 2) are isomorphic if, and only if, their adjacency matrices M(G) and M(H) differ only by permutations of rows and columns, i.e., M(G) and M(H) are related by a permutation σ according to Eq. (2).

$$ M(H) = P\,M(G)\,P^{T} \Rightarrow \left[ M(H) \right]_{i,j} = \left[ M(G) \right]_{\sigma(i),\sigma(j)} , $$
(2)

where P is the permutation matrix of σ. If we define the difference between the two graphs as

$$ J(\sigma) = \left\| M(H) - P\,M(G)\,P^{T} \right\| , $$
(3)

where \( \left\| \cdot \right\| \) is the matrix norm, defined as \( \left\| M \right\| = \sum_{i} \sum_{j} |m_{ij}| \), then the GIP can be formulated as an optimization problem: search for a permutation σ that minimizes J(σ).

The mapping error of vertex k in graph G to vertex σ(k) in graph H is defined as

$$ J_{k} (\sigma ) = \mathop \sum \limits_{m = 1}^{n} \left| {[M(H)]_{k,m} - \left[ {M(G)} \right]_{\sigma (k),\sigma (m)} } \right| + \mathop \sum \limits_{m = 1}^{n} \left| {[M(H)]_{m,k} - [M(G)]_{\sigma (m),\sigma (k) } } \right|, $$
(4)

where n is the number of vertices of G and H. For undirected graphs the mapping error can be computed according to Eq. (5), as given below:

$$ J_{k} \left( \sigma \right) = 2*\mathop \sum \limits_{m = 1}^{n} \left| {[M\left( H \right)]_{k,m} - [M\left( G \right)]_{\sigma \left( k \right),\sigma \left( m \right)} } \right| $$
(5)

Consequently, the mapping error of graphs G and H can be computed using Eq. (6).

$$ J\left( \sigma \right) = \mathop \sum \limits_{k = 1}^{n} J_{k} \left( \sigma \right) $$
(6)

In this paper we use \( f_{g} = C_{max} - J\left( \sigma \right) \) as the genetic fitness, where \( C_{max} \) is the maximum of J(σ) [31].
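The per-vertex error of Eq. (4) and the fitness \( f_{g} \) can be sketched in the same style. This is a hedged illustration, not the authors' code: C_max is passed in as an assumed upper bound rather than computed, and MG, MH are assumed to be NumPy adjacency matrices.

```python
# Sketch of Eqs. (4) and (6) (directed case) and the fitness f_g.
import numpy as np

def Jk(k, sigma, MG, MH):
    """Mapping error of vertex k in G mapped to sigma(k) in H, Eq. (4)."""
    n = len(sigma)
    row = sum(abs(MH[k, m] - MG[sigma[k], sigma[m]]) for m in range(n))
    col = sum(abs(MH[m, k] - MG[sigma[m], sigma[k]]) for m in range(n))
    return row + col

def fitness(sigma, MG, MH, C_max):
    """Genetic fitness f_g = C_max - J(sigma), with J(sigma) = sum_k J_k (Eq. (6))."""
    return C_max - sum(Jk(k, sigma, MG, MH) for k in range(len(sigma)))
```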

6.1 The local search in the graph isomorphism problem

If two graphs are isomorphic, then the weights and the numbers of input and output edges of corresponding vertices must be equal. This is taken into consideration in the design of the local search procedure [31]. Pseudo code for our local search method is given in Fig. 32. The method consists of the following steps (a code sketch follows the list):

Fig. 32 Pseudo code for the local search in the GIP

  1. The vertices are partitioned into a number of subsets of equal weight.

  2. The worst gene of the current chromosome is selected (line 2 of Fig. 32).

  3. The value of the selected gene is swapped with the value of a random gene selected from the same subset (lines 3 and 4 of Fig. 32).
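A hedged Python sketch of these three steps is given below. It assumes a chromosome is a permutation (gene i holds the vertex of H assigned to vertex i of G), that vertex weights are indexed by gene position, and that a per-gene error function such as \( J_{k} \) of Eq. (4) is available; the helper names are ours, not from Fig. 32.

```python
# Sketch of the local search of Fig. 32 under the stated assumptions.
import random
from collections import defaultdict

def local_search(chromosome, weights, gene_error):
    """One local-search step: swap the worst gene within its equal-weight subset.

    weights[i]    -- weight of the vertex at position i (equal weights form a subset)
    gene_error(i) -- mapping error of gene i (e.g., J_k of Eq. (4))
    """
    # Step 1: partition gene positions into subsets of equal weight.
    subsets = defaultdict(list)
    for i in range(len(chromosome)):
        subsets[weights[i]].append(i)

    # Step 2: select the worst gene (largest mapping error).
    worst = max(range(len(chromosome)), key=gene_error)

    # Step 3: swap its value with that of a random gene from the same subset.
    candidates = [i for i in subsets[weights[worst]] if i != worst]
    if candidates:
        j = random.choice(candidates)
        chromosome[worst], chromosome[j] = chromosome[j], chromosome[worst]
    return chromosome
```

Restricting the swap to the worst gene's equal-weight subset respects the necessary condition above: a vertex can only be mapped to a vertex of equal weight.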

6.2 Experimental results

In this section, several experiments are described that study the effect of different MGALA parameters on its performance on the GIP. For this purpose we used a database of 10,000 coupled pairs of isomorphic graphs of different sizes [32]. We classified these graphs into three groups: small graphs (n < 50), medium graphs (50 ≤ n < 100), and large graphs (100 ≤ n < 200). MGALA results are compared with the results obtained from an algorithm based on a GA [31], an algorithm reported by Ullmann [33], and the VF and VF2 algorithms [34]. The source codes for these algorithms are available at http://amalfi.dis.unina.it/graph. Every reported result is the average of 30 runs. For all experiments an initial population of size 100 was created randomly, the chromosome size was set equal to the size of the graph, the mutation and crossover rates were both set to 0.05, and the selection mechanism was (µ + λ). Each algorithm terminates when either a solution has been found or the number of generations exceeds 10,000. A Tsetline-based OMA is used to represent chromosomes in all experiments. We use RT to denote running time, FE the number of fitness evaluations for runs that converged to the solution, and NR the number of runs that did not converge to the solution. All experiments were performed on three classes of graphs: small graphs (SG), medium graphs (MG), and large graphs (LG). To provide statistical confidence we performed a parametric test (T test) and two non-parametric tests (Wilcoxon rank-sum test and permutation test) at the 95 % confidence level. The T tests were performed after verifying that the data followed a normal distribution (using the Kolmogorov–Smirnov test).

6.2.1 Experiment 1

Experiment 1 aimed to find the optimal memory depth of MGALA for different classes of graphs. For this purpose we studied the effect of the parameter N (depth of memory) on the FE, RT, and NR. The MGALA results were then compared with the results obtained by the Canonical Memetic Algorithm (CMA); note that MGALA is equivalent to the CMA when N = 0. For this experiment the graph density was set to 0.5, and weights for vertices and edges were chosen from [0,100]. Table 11 lists the RT, FE, NR, and the standard deviation for the different depths of memory employed. From the results we conclude the following:

Table 11 Performance of MGALA with respect to depth of memory
  • For all classes of graphs the minimum values of RT and FE are obtained when N = 0,

  • For all classes of graphs the maximum value of NR is obtained when N = 0, and

  • For all classes of graphs NR is inversely proportional to the depth of memory.

Table 12 shows that, according to the results of the Wilcoxon test, permutation test, and T test, MGALA with a memory depth of N = 2 performs better than MGALA with N ≠ 2 for large graphs (LG).

Table 12 The results of statistical tests for MGALA algorithm with depth of memory N = 2 versus MGALA algorithm with other depths of memory

Figure 33 shows the impact of the depth of memory on the FE and NR for different classes of graphs. Changes in the FE are minor for depths of memory greater than 4 in all classes of graphs. The figure also shows that, for all classes of graphs, a depth of memory greater than 10 causes all runs to converge (NR = 0).

Fig. 33 Number of fitness evaluations (FE) and number of non-converged runs (NR) vs. depth of memory for different classes of graphs

6.2.2 Experiment 2

This experiment investigated the effect of graph edge and vertex weights on MGALA performance. We studied the effect of the weights on the FE, RT, and NR for different classes of graphs using MGALA and CMA (MGALA with N = 0). For this experiment the density of all graphs was set to 0.5 and N was set to 10. The experiment was repeated for five different weight ranges: [0,20], [0,40], [0,60], [0,80], and [0,100], as well as for unweighted graphs (graphs whose edge weights are chosen from {0,1} and whose nodes have no weights). Table 13 gives the RT, FE, NR, and standard deviation for the different weight parameters.

Table 13 Performance of MGALA with respect to the weight parameter

From these experimental results we conclude the following for MGALA:

  • For all classes of graphs, RT and FE are minimized when the weights of vertices and edges are chosen from [0,100] and are maximized for unweighted graphs. This is because the local search exploits the weights: when weights span a wider range, the vertices are partitioned into more subsets with fewer members each, so only the vertices in the same subset as the worst gene (those with the same weight) are candidates for an exchange. Consequently, the local search selects an alternative vertex more accurately in weighted graphs.

  • For all classes of graphs the NR is inversely proportional to the weights of the vertices and edges.

  • For all classes of graphs the maximum value of NR is obtained when CMA is used.

Table 14 shows that, for all three kinds of statistical tests (Wilcoxon, permutation, and T test), the difference between the performance of MGALA when the weights of vertices and edges are chosen from [0,100] and its performance with the other weight parameter values is statistically significant (p value <0.05) for most graphs.

Table 14 The results of statistical tests for MGALA algorithm with weight parameter [0,100] vs. MGALA algorithm with other values of weight parameter

Figure 34 shows the FE for different graph classes and weights. The FE values for all classes of graphs are almost the same when the weights are chosen from ranges wider than [0,20]. Figure 34 also shows that, for all graph classes, NR equals zero when the weights are chosen from ranges wider than [0,60]. Consequently, the FE and NR are both minimized when the weights are chosen from the ranges [0,80] or [0,100].

Fig. 34 Number of fitness evaluations (FE) and number of non-converged runs (NR) vs. the weight parameter for different classes of graphs

6.2.3 Experiment 3

Experiment 3 studied the effect of graph density (D) on MGALA performance. The density of a graph is defined as \( D = \frac{2|E|}{\left| V \right|(\left| V \right| - 1)} \), which is the probability of the existence of an edge between any two vertices; for example, a complete graph, with \( |E| = |V|(|V| - 1)/2 \) edges, has D = 1. For this experiment the weights of vertices and edges were chosen from [0,100], and N was set to 10. The impact of graph density on the FE, RT, NR, and the standard deviation for different classes of graphs using both MGALA and CMA is reported in Table 15. From these results we conclude the following:

Table 15 Performance of MGALA with respect to graph density
  • For all classes of graphs, RT and FE are minimized when the graph density is 1.

  • For all classes of graphs, NR decreases as the graph density increases.

  • For all classes of graphs, the maximum value of NR is obtained when CMA is used.

Table 16 shows that, for all three kinds of statistical tests (Wilcoxon, permutation, and T test), the difference between the performance of MGALA when the density is 1 and its performance with other density values is statistically significant (p value <0.05) for most graphs.

Table 16 The results of statistical tests for MGALA algorithm with density 1 versus MGALA algorithm with other values of density parameter

Figure 35 shows the impact of graph density on the FE for different classes of graphs. The FE remains almost constant for graph densities >0.5 for all classes of graphs. Figure 35 also shows that, for all classes of graphs, all runs converge (NR = 0) when the graph density is >0.6.

Fig. 35 Number of fitness evaluations (FE) and number of non-converged runs (NR) vs. density of graph for different classes of graphs

6.2.4 Experiment 4

The goal of this experiment was to study the impact of different mutation and crossover operators on MGALA performance. For this experiment the density of all graphs was set to 0.5, the depth of memory was set to 10, and the weights of vertices and edges were selected from [0,100]. Table 17 lists the RT, FE, NR, and the standard deviations for the different mutation and crossover operators. These results lead us to conclude the following:

Table 17 Performance of MGALA for different mutation and crossover operators
  • For all classes of graphs the minimum value of NR is obtained when the LS-Mutation and LS-Crossover operators are used.

  • For large size graphs (LG), with respect to the FE and RT, MGALA with the SS-Mutation and SS-Crossover operators outperforms both MGALA with the XS-Mutation and XS-Crossover operators, as well as MGALA with the LS-Mutation and LS-Crossover operators.

  • For medium size graphs (MG), with respect to FE and RT, MGALA with the LS-Mutation and LS-Crossover operators outperforms both MGALA with the SS-Mutation and SS-Crossover operators, as well as MGALA with the XS-Mutation and XS-Crossover operators.

  • For small size graphs (SG), with respect to FE and RT, MGALA with the XS-Mutation and XS-Crossover operators outperforms both MGALA with the SS-Mutation and SS-Crossover operators, as well as MGALA with the LS-Mutation and LS-Crossover operators.

  • For all classes of graphs the maximum value of NR is obtained when CMA is used.

According to Table 18, MGALA with the SS-Mutation and SS-Crossover operators performs better than MGALA with the other mutation and crossover operators for large graphs.

Table 18 The results of statistical tests for MGALA algorithm with SS-Mutation, SS-Crossover vs. MGALA algorithm with other operators

6.2.5 Experiment 5

In this experiment MGALA was compared with five other algorithms for the GIP (GA [31], Ullmann [33], VF and VF2 [34], and GALA [11]) in terms of the number of fitness evaluations required. The results are shown in Fig. 36; each result is the average of 30 runs. The graph size was varied from 10 to 200 in increments of 10. The results clearly show the superiority of MGALA.

Fig. 36 Number of fitness evaluations vs. graph size for different algorithms

7 Conclusions

A new memetic algorithm called MGALA is proposed in this paper for optimization purposes. MGALA, a revised version of GALA, combines a GA with an LA, where the LA provides the local search. Unlike GALA, which uses Lamarckian learning, MGALA uses a Baldwinian learning model to improve its convergence rate and the quality of its solutions. In this model chromosomes are represented by OMAs, whose states keep information about the history of the local search process. Each state in the OMA has two attributes: the value of the gene (allele) and the degree of association with that value. The local search changes the degree of association between genes and their values. Unlike GALA, which only uses the values of the genes for its fitness computation, MGALA uses all the information recorded in the OMA representation of a chromosome (i.e., the degree of association between genes and their alleles, as well as the values of the genes) to compute the fitness of genes. In other words, MGALA’s fitness function is computed using a chromosome’s fitness (genotype information) and the history of the local search kept in the states of the OMA (phenotype information). The EPP and GIP applications were used to investigate the performance of MGALA, and MGALA was compared with several well-known algorithms for these two applications. Our experimental results showed the superiority of the proposed algorithm in terms of solution quality and rate of convergence. This line of research could be extended in several directions: applying GALA and/or MGALA to optimization problems in dynamic environments, such as the dynamic shortest path problem and the dynamic traveling salesman problem; improving the proposed algorithms by designing new mutation or crossover operators; designing new object migration automata for chromosome representation; and developing a mathematical framework for analyzing the proposed algorithm.