Multifactorial evolutionary algorithm for solving clustered tree problems: competition among Cayley codes

Dinh, Thanh Pham; Thanh, Binh Huynh Thi; Ba, Trung Tran; Binh, Long Nguyen

doi:10.1007/s12293-020-00309-2

Multifactorial evolutionary algorithm for solving clustered tree problems: competition among Cayley codes

Case studies on the clustered shortest-path tree problem and the minimum inter-cluster routing cost clustered tree problem

Regular Research Paper
Published: 05 August 2020

Volume 12, pages 185–217, (2020)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Memetic Computing Aims and scope Submit manuscript

Multifactorial evolutionary algorithm for solving clustered tree problems: competition among Cayley codes

Download PDF

522 Accesses
18 Citations
Explore all metrics

Abstract

The Multifactorial Evolutionary Algorithm (MFEA) has emerged as an effective variant of the evolutionary algorithm. MFEA has been successfully applied to deal with various problems with many different types of solution encodings. Although clustered tree problems play an important role in real life, there haven’t been much research on exploiting the strengths of MFEA to solve these problems. One of the challenges in applying the MFEA is to build specific evolutionary operators of the MFEA algorithm. To exploit the advantages of the Cayley Codes in improving the MFEA’s performance, this paper introduces MFEA with representation scheme based on the Cayley Code to deal with the clustered tree problems. The new evolutionary operators in MFEA have two different levels. The purpose of the first level is to construct a spanning tree which connects to a vertex in each cluster, while the objective of the second one is to determine the spanning tree for each cluster. We focus on evaluating the efficiency of the new MFEA algorithm on known Cayley Codes when solving clustered tree problems. In the aspect of the execution time and the quality of the solutions found, each encoding type of the Cayley Codes is analyzed when performed on both single-task and multi-task to find the solutions of one or two different clustered tree problems respectively. In addition, we also evaluate the effect of those encodings on the convergence speed of the algorithms. Experimental results show the level of effectiveness for each encoding type and prove that the Dandelion Code outperforms the remaining encoding mechanisms when solving clustered tree problems.

Two levels approach based on multifactorial optimization to solve the clustered shortest path tree problem

Article 14 October 2020

Evolutionary dynamic grouping based cooperative co-evolution algorithm for large-scale optimization

Article 01 March 2024

A Two-Level Hybrid Based Genetic Algorithm to Solve the Clustered Shortest-Path Tree Problem Using the Prüfer Code

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Along with the rapid development of technology, the demand for connecting devices through the network is increasing. Therefore, optimization of the network design to save the building cost, but still to allow the devices to communicate with each other quickly and simultaneously has attracted a lot of interests from the research community. The common problem in fiber-optic network design is to ensure communication, the speed of sending, receiving information between devices and to optimize the cost of network construction.

All of the above problems can be modeled as an optimization problem on a fully connected graph. In particular, each device is considered as a vertex of the graph, each pair of devices is assigned with a weight representing the cost of the connection between them. To ensure that there is always a connection between the two devices, designing the network as a spanning tree of the graph is a suitable solution. When the number of devices is large, many research work have the ideas of aggregating similar devices into clusters, which will be used for designing a bigger network system. In reality, numerous types of clustered network systems are designed with different goals in term of optimizing communication costs. Those network systems are then modeled as clustered spanning trees to ensure inter-device connectivity and inter-cluster communication connectivity. In some recent studies on problems with clustered tree structures, D’Emidio et al. [8] defined the Clustered Shortest-Path Tree Problem (CluSPT), and Chen-Wan Lin et al. [14] introduced the Minimum Inter-cluster Routing Cost Clustered Tree Problem (InterCluMRCT). The solution to each of the 2 problems mentioned above is a spanning tree of a graph which satisfies a constrain that the sub-graph in each cluster belonging to that tree is also a spanning tree. Nevertheless, the goals of two problems are different. While the goal of the CluSPT is to find a spanning tree of a graph which minimizes the total length of all paths from a given source vertex to other vertices, the InterCluMRCT tries to minimize the total communication cost among vertices of different clusters. In this study, we aim to design an algorithm to solve simultaneously multiple network problems that can be transformed into the clustered tree problems with different objective functions. The effectiveness of the algorithm is then experimented on two examples of clustered tree problems, which are the CluSPT and the InterCluMRCT. Based on the similarities of the solution structures and constraints, we try to build a mechanism to simultaneously solve those 2 above problems specifically, with the purpose of solving various different types of clustered network design problems in general.

Recently, many studies have proposed multitasking algorithms to find solutions to multiple different combinatorial optimization problems at the same time. In particular, Multifactorial Evolutionary Algorithm (MFEA) [2] is emerging as one of the most effective variants of the Evolutionary Algorithm (EA). MFEA which has been proposed by A. Gupta et al. [11] can solve multiple independent optimization problems simultaneously using a single population of solutions in the unified search space. Since a part of the genetic material of individuals in unified search space is shared among different problems, there is always a process of exchanging information among problems inside MFEA through changing the shared genetic material. Based on these advantages, this research introduces the MFEA algorithm to simultaneously solve 2 problems which are the CluSPT problems and the InterCluMRCT problem. We propose a method of combining those two problems with different dimensions into the same search space. The main reason for doing that is due to the similar properties of the solution structures for those problems, so exchanging the components of the two solutions will assist in searching to find a better solution in the process of exploring the solution space.

There are several ways for representing solution as spanning tree such as edge-sets [20], characteristic vectors [9], etc. in which Cayley Code [19, 24] has been investigated for a long time and attracted a lot of attention from the research community because of its three main advantages. Cayley Code represents each tree with n labeled vertices as a string of \(n-2\) labels, such that each tree corresponds to a unique string. Firstly, Cayley Code can encode a solution into spanning tree easier than other methods. Instead of representing chromosome as a list of edges like edge-set method [20], chromosome can be encoded as a series of numbers. Secondly, it takes full advantage of existing evolutionary operators such as one-point crossover, two-point crossover, swap mutation, etc. Lastly, there are various effective algorithms for solving a lot of problems based on the Cayley Code [19, 24].

With its outstanding advantages, Cayley Code is chosen to be the solution representation for the solution to the 2 above-mentioned problems in our research. We introduce a method to represent a solution of clustered tree problems as a series of integers based on Cayley code. A spanning tree corresponding to each cluster is encoded into a segment of Cayley string which is arranged next to each other to form a solution for clustered tree problem. A valid solution can be constructed by decoding all Cayley strings from encoded solution and combining them altogether. As a result, some operations on the solution will be less complex since performing the operations on a string of numbers is simpler than that of a graph. This is especially useful when applying for EA due to the fact that in each generation such evolutionary operations like crossover operator, mutation operator,... constantly happened, so the simplification of those operations reduces the computational time for the entire algorithm. This paper also investigates the effectiveness of different types of encodings in the Cayley Code to find the most appropriate representations for solving clustered problems.

Consequently, we propose MFEA to solve clustered tree problems based on the Cayley Code (called CC-MFEA). The major contributions of this work are as following:

1.
Propose an effective encoding mechanism for representing many solutions of clustered tree problems into unified representation in Unified Search Space (USS) in order to apply different types of Cayley Codes: Prüfer Code, Dandelion Code, and Blob Code for encoding trees.
2.
Propose an effective decoding method to obtain the solution corresponding to each specific task problem.
3.
Propose evolutionary operations for CC-MFEA.
4.
Analyze the effectiveness of the proposed algorithm in some different clustered tree problems.
5.
Analyze the effectiveness of the Cayley Code for solving the clustered tree problems.

The rest of this paper is organized as follows. Section 2 introduces related works and Sect. 3 provides foundation knowledge of the Cayley Code. Section 4 presents the notations and definitions used for formulating problem. The proposed CC-MFEA for the clustered tree problems is elaborated in Sect. 5. Section 6 explains the setup of our experiments and reports the computed results. The paper concludes in Sect. 7 with discussions on the future extension of this research.

2 Related works

2.1 Multitasking in evolutionary algorithms

Evolutionary Algorithm (EA) [1] is one of the most well-known algorithms for finding global optimization and is a part of evolutionary computation. The mechanisms of EA are inspired by the biological evolution. Being able to solve complex problems, EA has been successfully applied in many fields such as engineering, art, economics, marketing, genetics, operations research, robotics and social sciences. Despite its various strengths, EA has shown several weaknesses, one of them is the ability to do multiple tasks at a time. Therefore, EA is not compatible with some computational model such as cloud computing. To overcome this restriction, Abhishek Gupta et al [11] proposed a new paradigm of EA, called Multifactorial Evolutionary Algorithm. This new model can perform many tasks at a time on a sole search space. R. Chandra et al [5] applied the MFEA to train neural networks. In this model, the required tasks are neural network topologies with different number of hidden neurons. The authors use the encoding and decoding rules in [11] for solution representation whose dimensionality is that of one single task (in this case, the number of hidden neurons). For evaluation of the proposed algorithm, MFEA is compared with evolutionary single-task learning (ESTL) by using the n-bit Parity Problem. The results showed that MFEA is better than ESTL on all of the cases.

In Da et al. [6], the authors used MFEA to solve a single-objective problem by transforming it into a multi-objective one. With this method, MFEA is expected to overcome the local optima and exploit better areas of the search space. To demonstrate the competence of proposed MFEA, the authors compared it with the classic single-objective evolutionary algorithm (SOEA) and the standard multi-objective evolutionary algorithm (MOEA). The symmetric Traveling Salesman Problem is selected as test problem. Order crossover, random swap mutation and 2-opt local search are used. The results show that by overcoming the local optima, the proposed algorithm can get the optimal solutions on 4 out of the 5 runs, while SOEA gets trapped at a local optimum. The proposed algorithm performs at least as well as SOEA in 20 cases and better in 11 cases. In comparison with MOEA, the proposed algorithm performs at least as well on 18 out of the 20 instances and better in 11 cases. Concerning convergence trends, the convergent speeds of the proposed algorithm and SOEA are relatively fast, and equal to each other in the first 25 generations, but in the rest generations, the proposed algorithm continues to successfully explore latent regions of the search space, while SOEA gets stuck.

A. Gupta et al [10] used the characteristics of the Evolutionary multitasking in the bi-level optimization. The authors assume that lower level optimization tasks are corresponding to the neighboring upper level population members. Therefore, the individuals in that upper level population are clustered into groups. For comparing the proposed algorithm with the basic Nested Bi-Level Evolutionary Algorithm (N-BLEA), ten-dimension variants of SMD benchmarks are selected. The representation is random-key encoding scheme. The selected crossover and mutation operator are SBX operator and polynomial mutation respectively. The experimental results show that N-BLEA is worse than the proposed algorithm with respect to the number of function evaluations of the lower level. The authors also select the composites manufacturing problem for evaluating the proposed algorithm. The results show that the convergence trend of the proposed algorithm is considerably hastened compared with that of N-BLEA.

In Gupta et al. [12], A.Gupta et al further extended the concept of multitasking into a Multi-Objective MultiFactorial Optimization (MO-MFO) paradigm, in which each constitutive task is a Multi-Objective Optimization (MOO) problem that contributes a distinct factor influencing the evolutionary search process. One of the main characteristics that makes evolutionary multitasking appealing for multi-objective problems is the implicit parallelism provided by a unified population which may allow for simultaneous convergence toward the Pareto front. Accordingly, the authors proposed the Multi-Objective Multi-Factorial Evolutionary Algorithm (MO-MFEA), and compared the performance of the proposed algorithm with the classical NSGA-II which deals with a single task at a time. To demonstrate the improvement in the convergence trends of the proposed algorithm, experiments on several synthetic benchmark functions was conducted with identical solution encoding, genetic operators, and parameter settings of the proposed MO-MFEA and the NSGA-II counterpart. The results showed that while solving a problem instance separately by NSGA-II, the IGD progressed slowly and had the tendency of getting trapped in a local Pareto front. In contract, the implicit genetic transfer provided a strong motivation to the evolutionary search process of multiple task instances being integrated, which explains the accelerated convergence characteristics achieved by the MO-MFEA.

Y.S Ong et al [17] pointed out from a practical point of view that the potential applications of effective multitask optimization are truly remarkable since by taking advantage of the underlying commonalities between different optimization tasks, multitaking provides the scope for improved performance in real-world problem solving. To prove this claim, the authors compared the performance of the proposed multitasking engine with the traditional methodology on 3 insightful computational studies of the continuous benchmark functions, combinatorial optimization problems, as well as for a more realistic optimization domain involving the path-planning of multiple unmanned aerial vehicles (multi-UAVs). In the first study, three continuous benchmark functions, namely Sphere function, Ackley function, and Rastrigin function, were combined pairwise to create the multitaking instances. Then, in order to highlight the performance of multitasking, the proposed MFEA shall be compared to an elitist single-objective evolutionary algorithm (SOEA) with similar algorithmic specifications and parameter settings. Experimental results showed that the improvement in convergence characteristics achieved during multitasking is substantial in comparison to the SOEA. As a result of intersecting global optimums between different functions which leads to predominantly positive transfer of knowledge, the MFEA can exploit the evolving population of both tasks simultaneously, efficiently overcome obstacles and converge faster, while traditional SOEA causes the population to often stuck at a local optimum. Interestingly, prior familiarity about the relationship between tasks is not a prerequisite for successful evolutionary multitasking, as the proposed MFEA can exploit latent complementarities autonomously even between apparently disparate optimization tasks, thereby successfully accelerating convergence trends. Such claim was successfully demonstrated in the second study about combinatorial optimization problems, when the authors combined a QAP(Sko100a) and a JSP(la27) to be solved together. While the single-tasking method is found to get trapped in local optimum regions of the search space, the diversified search facilitated by multitasking substantially improved performance characteristics due to constant transfer of genetic materials. Finally, by tackling 2 different multi-UAV path-planning missions at once (a pair of UAVs flying through two narrow openings in a barrier, and four UAVs flying around a centrally located geo-fence of circular planform without colliding into one another), the authors had been able to improved the obtained results which showcase the utility of multitasking in realistic domains.

2.2 Cayley code: a representation scheme for spanning tree in genetic algorithms

There are a lot of research works about representation for Trees, one of the earliest is Cayley Code when Palmer et al. [18] introduced Prüfer code. In recent years, many research works demonstrate the effect of Cayley code on Genetic Algorithm (GA), such as Julstrom et al. (Julstrom [13]) deployed the Blob Code as an individual representation in GA. The results showed that the Blob Code has high locality, and could outperform the Prüfer Code in a simple test problem. In 2005, Julstrom el al. (Julstrom [13]) showed that the Blob Code was competitive with edge-sets representation on the minimum routing cost spanning tree problem. Thompson E. B. [24] indicated that the Dandelion Code has even higher locality than the Blob Code, and offers superior performance in a GA. Futher details of the Cayley Code will be presented in Sect. 3.

2.3 Clustered tree optimization problems and several solving approaches

The network optimization problems have many applications in real life [8, 14] and many of those problems have structure being clustered [8]. For that reason, a lot of researchers have shown their keen interests in such problems. One of the most interesting research in clustered problems is Clustered Steiner Tree Problem (CluSteiner). Bang Ye Wu et al. [26] considered a version of the Steiner Minimum Tree whose vertices are allocated into clusters. Firstly, the authors show that the Steiner ratios of problems can get maximum at 4 and minimum at 3. The authors also propose an approximation algorithm for solving CluSteiner. In the new algorithm, the Hamiltonian paths are added to all local tree of the clusters, then the edges belonging to the inter-cluster topology are removed. The authors demonstrate that new algorithm can be approximate in \((2 + \rho )\) for CluSteiner in O\((nlog(n) + f(n))\) time, when local topologies are given, the problem CluSteiner can be \((1 + \rho )\)-approximated in O\((n log(n) + f (n))\) time. Chen-Wan Lin and Bang Ye Wu [14] studied another variant of clustered tree, Minimum Routing Cost Clustered Tree Problem (CluMRCT). The new constraint in CluMRCT problem is that sub-graph in each cluster is a spanning tree. The authors proved that the CluMRCT having more than 2 clusters is NP-Hard and proposed an approximation algorithm for solving the CluMRCT problem. The new approximation algorithm finds the solutions of CluMRCT problem by creating a two-level star-like graph and based on the R-Star graph. The authors demonstrate that the routing cost of R-Star graph is less than two times the cost of the optimal solution of CluMRCT problem.

In Myung et al. [16], the authors studied a generalized version of the Minimum Spanning Tree Problem. The vertices were divided into distinct groups. The goal was to find a minimum-cost tree which contains only one vertex from each group. After proving that Generalized Minimum Spanning Tree Problem (GMSTP) was NP-hard, the authors proposed two mathematical programming formulations and compared them in terms of linear programming relaxations when applied to GMSTP. The test problems were partitioned into two sets: big and small set. The small one were selected from generalized traveling salesman instances whose number of vertices ranged from 0 to 100. The larger one were randomly created through an algorithm. The experimental results showed that the gap between the upper and the lower bounds grew when the number of groups and vertices in each group increased. The authors [8] have proven that the CluSPT is in the NP-hard class and provided an approximation algorithm for CluSPT problem (denoted as ALL). ALL performs well when the number of vertices in all clusters is small. The approximation approaches including EAs are used to solve the CluSPT when the problem has large dimensionality or at least one cluster has a great number of vertices. Recently, new variants of EA, MFEA are applied to solve the CluSPT and improve the resulting solutions.

In Binh et al. [3], the authors proposed MFEA (hereinafter E-MFEA)with new genetic operator algorithm for solving the CluSPT. The major ideas of the novel genetic operators is that first constructing spanning tree for smallest sub-graph then the spanning tree for larger sub-graph are construed based on the spanning tree for smaller sub-graph. This paper also proposed the new individual encoding in unified search space. The number of clusters of individual is equal to maximum number of clusters of all tasks and the number of vertices of cluster i is maximum number of vertex of cluster i of all tasks. The experimental results point out the effect of novel MFEA. However the novel MFEA has disadvantage, for instance, to construct spanning tree for the sub-graph the edges set of the sub-graph must contain edges set of smaller sub-graph; the individual encoding can make no good solutions for each task in the initial population.

In Thanh et al. [23], the authors take the advantage of Cayley code to encode the solution of CluSPT and proposed genetic operators based on Cayley code. The ideas of genetic operators are taken ideas of genetic operator for binary and permutation representation. In this paper, effective MFEA (hereinafter C-MFEA) for CluSPT is proposed. The decoding method processes segments of genes by removing the genes which have invalid values. To present the spanning tree for a graph by Cayley code, the vertex of the graph must has larger than 2 so novel MFEA can only apply to solve CluSPT instances in which the vertex of its clusters must has larger than 2. Another disadvantage of Cayley code is only applying to complete graph therefore novel MFEA is suitable for complete graph.

The authors [4] proposed a new algorithm based on the EA and Dijkstra Algorithm. The proposed algorithm decomposes the CluSPT problem into two sub-problems, the solution of the first sub-problem is found by an EA while that of the second sub-problem is found by Dijkstra Algorithm. The goal of the first sub-problem is to determine a spanning tree which connects among the clusters, while that of the second sub-problem is to determine the best spanning tree for each cluster. In some real-life applications, the communication cost between devices of different clusters may need to be considered more than between those of the same cluster. In Wu and Lin [25], Chen-Wan Lin et al investigated the problem of finding a clustered spanning tree with minimum inter-cluster cost (InterCluMRCT) and denoted the variant of InterCluMRCT with the number of clusters is k as k-InterCluMRCT. After proving that the InterCluMRCT problem with 2 clusters is solvable in polynomial time, the authors have shown that k-InterCluMRCT is NP-hard for any fixed \(k > 2\) and presented a 2-approximation algorithm for 3-InterCluMRCT.

Although there have been many studies on the use of Cayley encoding mechanisms, evaluating the effectiveness of the MFEA algorithm when applying the Cayley Code to solve the clustered tree problems is still a novel topic and attracts a lot of attention in the scientific community. As a result, this paper focuses on analyzing the efficiency of the MFEA algorithm when using the Cayley Code to solve the clustered tree problems.

3 Preliminaries of the Cayley code

Cayley Code is one of the most classical representations for spanning trees. It offers a fast and simple coding of spanning trees whose decoding algorithm is originated from a constructive proof of the Cayley’s formula. This formula identifies the number of possible spanning trees in a complete graph on n nodes is \(n^{n-2}\). From this formula, the Cayley Code has been proposed as a tree coding scheme that represents each tree on n labelled vertices (in which the underlying graph to form the tree is a complete graph) as a string of \((n-2)\) vertex labels. The Cayley Code indicates a bijective mapping between a tree and a corresponding string which ensures that each tree corresponds to a unique string, and each string corresponds to a unique tree.

In the following subsections, this paper introduces 3 typical coding types in the Cayley Code families and demonstrates their corresponding encoding and decoding mechanisms.

3.1 Prüfer code

Prüfer Code is one of the earliest example of the Cayley Code as it was firstly used in 1918 when Prufer described a mapping between a spanning tree with n nodes and a string of \((n-2)\) node labels. Since then, this code had become well-known and been widely considered as a convenient way for representing trees, especially in the application of the genetic algorithms.

Given a tree T on n nodes. The algorithm to convert a tree into a Prüfer string is described as follows:

1.
Let i be the leaf with the smallest label (node of degree 1) in T and j be the neighbour node of i. Set the label of j to be the rightmost digit in the Prüfer string.
2.
Remove node i and the edge (i, j) from T.
3.
Repeat steps (1) and (2) until there are only 2 remaining nodes in T. The final Prüfer string is read from left to right.

It is also possible to convert from a Prüfer string to a unique tree via the following algorithm. Let \(a_{1}a_{2}...a_{n-2}\) be a Prüfer string. In the corresponding spanning tree, each node’s degree is one more than the number of times the node’s label appears. The edges of that tree is identified as follows:

1.
Scan the Prüfer number to identify each node’s degree. Initialize a variable t to 1.
2.
Find a node i of degree 1 with the smallest label. \((i, a_{t})\) is a spanning tree edge.
3.
Decrease the degree of node i and \(a_{t}\) by 1; increase t by 1.
4.
Repeat steps (2) and (3) until all nodes have degree 0, except for two nodes with degree 1 which create the spanning tree’s last edge.

Several researchers have pointed out the advantages of the Prüfer Code such as: quick and simple decoding, easy to generate a random Prüfer string and supportive in the application of conventional evolutionary operators like k-point crossover and position-by-position mutation. However, Prüfer Code actually does not support effective evolutionary search since this coding mechanism lacks the essential properties of locality (small changes in genotypes should result in small changes in the phenotypes) and heritability (solutions generated by evolutionary operators should combine characteristics of their parents).

3.2 Blob code

In 2001, another type of Cayley Code called Blob Code was identified as a GA representation for spanning trees. The Blob mapping appears to be adapted well for evolutionary search when a GA using the Blob Code tends to considerably outperform one using the Prüfer code for a simple test problem. This can be explained by the higher locality and heritability under positional crossover and mutation that the Blob Code possesses.

In this research, we reuse the encoding and decoding algorithms for Blob Code mentioned in Julstrom [13]. These two algorithms which has the time complexity of O(n), can run much faster compared to the traditional ones for the Blob Code, which may require quadratic time in the worst case. For the notation convenience, we define vertex 1 as the root vertex; the vertex i is a descendant of vertex j if there is a directed path from i to j, and successor of vertex i (succ(i)) is the unique neighbour of i on the path from i to 1 on the tree. Two algorithms below help to convert a tree T into a corresponding Blob string \(a_{2}a_{3} \ldots a_{n-1}\) and vice versa.

Encoding algorithm for the Blob Code:

1.
Define succ(i) for each vertex \(i \in [2, n]\) and add the directed edge (\(i\rightarrow succ(i)\)) into a temporary tree \(T'\).
2.
Colour each vertex \(v \in [1, n]\) black if none of the descendants of v in the directed tree \(T'\) has a label exceeding v; and colour white for the remaining vertices.
3.
Label the black vertices as \(x_{1}< x_{2}< \ldots < x_{t}\), where \(t \in [2, n]\) is the total number of black vertices. Noting that \(x_{1}=1\) and \(x_{t}=n\) since vertices 1 and n must be black.
4.
To construct the Blob string, set \(a_{i} = succ(i)\) for every white vertex \(i \in [2, n-1]\), and set \(a_{x_{i}} = succ(x_{i+1})\) for each \(i \in [2, t-1]\).

Decoding algorithm for the Blob Code:

1.
Define a digraph G with the vertex set as [1, n] and the directed edge set as \({(i \rightarrow a_{i}) : i \in [2, n - 1]}\).
2.
Colour each vertex \(v \in [1, n]\) black if none of the descendants of v in the directed tree Gs has a label exceeding v; and colour white for the remaining vertices.
3.
Label the black vertices as \(x_{1}< x_{2}< \ldots < x_{t}\), where \(t \in [2, n]\) is the total number of black vertices. Noting that \(x_{1}=1\) and \(x_{t}=n\) since vertices 1 and n must be black.
4.
To construct the tree T, create the edge \((i,a_{i})\) for each white vertex \(i \in [2, n-1]\), and create the edge \((x_{i},a_{x_{i-1}})\) for each \(i \in [3, t]\) with the final edge is \((x_{2}, 1)\).

However, EAs using the representation of Blob Code still have not performed as well as EAs using other representations (like edge-sets, etc) on most of the problems which have the search spaces of spanning trees.

3.3 Dandelion code

Recently, Dandelion Code has been proposed to be a strong alternative as it offers efficient performance for representing tree in a GA. Researchers show that this coding scheme exhibits even higher locality than the Blob Code and could potentially outperform other current Cayley code representations.

Given a Dandelion string \(a_{2}a_{3}...a_{n-1}\), the algorithm to decode the Dandelion Code into a tree T works as follows:

1.
Define a function \(f: [2, n - 1] \rightarrow [1, n]\) such that \(f(i) = a_{i}\) for each \(i \in [2, n-1]\)
2.
Determine the cycles associated with the function f so that in each cycle, the largest element is rightmost. Arrange the cycles from left to right in the decreasing order of the largest element of each cycle.
3.
Create a list L containing all the elements of every cycle with their corresponding orders after arranging in step 2.
4.
To construct the tree T from a set of n isolated vertices, build a path from 1 to n such that the intermediate nodes on this path are the nodes in the list L in left-to-right order. Then, create the edge \((i, a_{i})\) for every \(i \in [2, n-1]\) not occurring in the list L.

To encode a tree T into its corresponding Dandelion string, the encoding algorithm works simply by reversing the above procedure:

1.
Find a unique path from 1 to n in T, and let L be the ordered list of intermediate nodes
2.
Split this list L into cycles in which each cycle ends when an element that is larger than all elements to its left still can be found.
3.
The Dandelion Code corresponding to the tree T is the unique string \(a_{2}a_{3}...a_{n-1}\) which satisfies: a) for those nodes \(i \in [2, n-1]\) appear in the list L, insert the cycle information of each node i with respect to function \(f(i) = a_{i}\); b) for each \(i \in [2, n-1]\) that does not occur in the list L, the unique neighbour of i on the path from vertex i to vertex n in the tree T is \(a_{i}\).

An example of applying 3 above-mentioned types of Cayley Code for representing a spanning tree is demonstrated in Fig. 1. On the left of this figure is a spanning tree on 8 nodes with its corresponding Prüfer string, Blob string and Dandelion string shown respectively on the right.

4 Problem formulation

In this paper, a graph is a simple, connected and undirected graph. For a graph \(G = (V, E, w)\), V and E are the vertex and the edge sets, respectively, and w is the non-negative edge length function. An edge between 2 vertices u and v is denoted by (u, v), and its weight is denoted by w(u, v).

For a vertex subset U, the sub-graph of G induced by U is denoted by G[U]. For a vertex set V, a collection \(C~=~\{C_i | 1 \le i \le h\}\) of subsets of V is a partition of V if the subsets are mutually disjoint and their union is exactly V. A path of G is simple if no vertex appears more than once on the path. This paper only considers simple paths.

For a given spanning tree T of \(G = (V, E, w)\) and \(u, v \in V\), let \(d_T(u, v)\) denote length of the shortest path between u and v on T.

Local root of a cluster is a vertex which connects to a vertex in each other cluster and denoted as \(L_r\).

The CluSPT [8] is defined as following

Input:	- A weighted undirected graph \(G = (V, E, w)\).
	- Vertex set V is partitioned into h clusters \({C_1, C_2, \ldots , C_h}\).
	- A source vertex \(s \in V\).
Output:	- A spanning tree T of G.
Output:	- Sub-graph \(T[C_i] (i = 1,\ldots , h)\) is a connected graph.
Objective:	\(\displaystyle \sum _{v \in V} d_{T}(s,v) \rightarrow \) min

The InterCluMRCT [14] is defined as following

Input:	- A weighted undirected graph \(G = (V, E, w)\).
	- Vertex set V is partitioned into h clusters \({C_1, C_2, \ldots , C_h}\).
Output:	- A spanning tree T of G.
	- Sub-graph \(T[C_i] (i = 1,\ldots , h)\) is a connected graph.
Objective:	\(\displaystyle \sum _{i=1}^{k} \sum _{j \ne i} d_{T}\left( C_{i}, C_{j}\right) \rightarrow \) min
	where \(d_{T}\left( C_{i}, C_{j}\right) =\sum _{u \in C_{i}} \sum _{v \in C_{j}} d_{T}(u, v)\).

Figure 2 illustrates the cases of valid and invalid solutions of the CluSPT and the InterCluMRCT. Figure 2(a) shows the input graph G with 6 clusters, 18 vertices and vertex 1 as source vertex. Figure 2(b) presents a valid solution of the CluSPT and InterCluMRCT. In Fig. 2(c), the vertex 6 and vertex 7 in cluster 2 are not connected, so this solution violates the second condition of the output of the CluSPT and the InterCluMRCT problems.

5 Proposed algorithm

This section presents a novel approach by using the MFEA to deal with the clustered tree problems. The algorithm includes: a new individual encoding, and new evolutionary operators, i.e. new population initialization, new crossover operator and new mutation operator.

5.1 MFEA scheme to solve clustered tree problems

Multifactorial Evolutionary Algorithm is used to find the solutions of k clustered tree problems \(G_i=(V_i, E_i, w_i, C_i)\), \(i = 1,.., k\) where \(V_i\), \(E_i\), \(w_i\) and \(C_i\) are set of vertices, set of edges, weight matrix and set of clusters, respectively.

\(V_i\) is partitioned into \(m_i\)\((i=1,\ldots , k)\) clusters \(C_i=\{C^1_i, C^2_i,\ldots , C_i^{m_i}\}\).

For each clustered tree problem \(G_i\), we decompose the problem into two subproblems: The first subproblem aims to determine the spanning tree for each cluster (also called local tree), while the second one intents to construct a spanning tree connecting all of the clusters (also called global tree). Global Graph is denoted by a graph \(G'_i = (V'_i, E'_i)\) in which a cluster \(C_i\) is represented as a vertex in \(V'_i\). An edge connects two vertices of \(G'_i\) if there exists at least one edge connecting the corresponding clusters in \(G_i\).

To use CC-MFEA for solving many clustered tree problems simultaneously, a new scheme of individuals encoding based on Cayley code and a two-level approach are proposed. Cayley codes are used for representation of local trees and the global tree in which the position of the Cayley codes for local tree is placed before that of the global tree in an individual. We provide a mechanism for decoding the individual into the solution for each clustered tree problem as well as efficient evolutionary operators.

The pseudocode of our approach is presented in Algorithm 1. It should be noted that to compute the fitness of an individual \(ind_j\) for a given task, both local trees and global tree must be constructed for the corresponding solution of the clustered tree problem \(s_j\) (lines 5 and 27 in the Algorithm 1). Then, the algorithm computes the fitness of solution of the clustered tree problem \(s_j\). Finally, the fitness of individual \(ind_j\) is computed through the cost of the clustered tree problem \(s_j\) (lines 6 and 28 in the Algorithm 1).

5.2 Individual encoding in unified search space

Individuals in the unified search space are encoded through two stages: the first stage will rearrange clusters of input graphs to reduce resource consumption, while the second stage encodes the solutions of all tasks into a unified representation.

5.2.1 Encoding solution of input graphs

In order to combine many clustered tree problems into unified search space, we define an individual IND which is a graph \(G = (V, E, C)\) with properties as follows:

The number of clusters of IND is equal to the maximum number of clusters in all graph \(G_i\), denoted by N.
The number of vertices in tth cluster of an IND is equal to the maximum number of vertices in tth cluster of all tasks, denoted by \(n_t\).

Encoding procedures transform an individual into the string of genes by using Cayley encoding algorithms for representing all of the local trees and the global tree. Moreover, local trees and global tree are combined via the set of vertices which is the root of the local tree. The components of an individual are arranged as follows:

Spanning tree for tth cluster is presented as tth segment, denoted by \(s_t\) (if a task constitutes of the number of clusters which is smaller than t, that task is ignored). Values of genes in each segment are in the comprehensive range of 1 to \(n_t\).
Spanning tree for Global Graph is represented as a segment, called G-Seg.
Genes in the last segment (called M-Seg) is used for representing the set of local roots. The ith gene of this segment represents the vertex in the ith cluster.
Spanning tree are represented by Cayley Code [19].

A chromosome is divided into \(N+2\) segments. For the convenience of notation, the \(N+2\) segments will be indexed from 1 to \(N+2\). Each of the first N segments is a Cayley code corresponding to a unique local tree. The segment \(N+1\) is the Cayley code corresponding to the global tree. The segment \(N+2\) is the M-seg contains all of the root vertices of the local trees.

An example of individual encoding in new MFEA is depicted in Fig. 3. Assume that the vertices of the tasks are partitioned into clusters as Fig. 3a. An individual in the USS is illustrated in Fig. 3b. The representation of individual via Prüfer Code is depicted in Fig. 3c. This chromosome has 6 segments in which the numbers of vertices in cluster 1, 2 3 and 4 are 4, 5, 5 and 4 respectively; the G-Seg has 2 genes; M-Seg has 4 genes which are 1, 1, 3 and 2.

To find labels of vertices in all clusters for a task, the proposed CC-MFEA also builds a mapping function (called M-function) which maps vertices in every cluster of an individual in unified search space to vertices in relative cluster in that task. Take Fig. 3a, b as an example, the M-function maps vertices in cluster 2 of the individual in USS to vertices in cluster 2 of task 1 is 1 matches with 5; 2 matches with 6; 3 matches 7; 4 matches 8. Vertex 5 in cluster 2 in Fig. 3b is not mapped to task 2 because the number of vertices in cluster 2 in task 2 is 4 whereas cluster 2 in Fig. 3b has 5 vertices. The mapping functions are used in proposed decoding method.

5.2.2 Rearrange clusters on input graphs

When solutions of all tasks are encoded as an individual in USS, if the order of clusters of input graphs are kept then it may become resource consuming. For example, in the case shown in Fig. 4, Fig. 4a show as vertices in clusters of two input graphs. If the order of clusters on input graphs are kept, with the solution in USS, the number of vertices in cluster 1 is 6 while the number of vertices needed to encode cluster 2 is 5 (detail in Fig. 4b). However, if clusters of input graphs are sorted in the order of increasing of the number of vertices, then the solution in USS only needs 3 vertices for encoding cluster 1 and 6 vertices for encoding cluster 2 (detail in Fig. 4c). Clearly, this procedure reduces resources for encoding solutions.

Therefore, the clusters in the input graph are sorted in ascending (descending) order of the number of vertices in a cluster, then applied the encoding procedure in Sect. 5.2.1. For the sake of simplicity, this article assumes that the number of vertices in clusters of an input graph is sorted in increasing order.

5.3 Individual initialization method

To generate valid individuals for a problem, different representations require different methods correspondingly. We introduce a method to generate a valid individual, in which each segment corresponding to a Cayley string is generated randomly. In order to generate the valid solution for clustered tree problem, spanning tree for each cluster and G-Graph are constructed based on random Cayley Code. The detail of individual initializing method (IIM) is presented in Algorithm 2.

In Algorithm 2, Generate_Random_Cayley_Code(l) method generates randomly a Cayley code whose length is equal to \(l-2\). Select_Random_Vertices_In_Clusters() method selects a random vertex in each cluster for creating edges which connect among clusters.

5.4 Crossover operator

An advantage of Cayley Code is that it can use many existing genetic operators such as one-point crossover, PMX crossover [1], etc. In the proposed algorithm, the One-point crossover operator [1] is used. Observe that, one point in chromosome is selected then 2 gene segments after that point are swapped for generating two offspring.

Figure 5 illustrates the crossover operator in CC-MFEA in which red vertical line presents crossover point. In our encoding method, the Cayley strings of each local tree and the global one are placed adjacent to each other, and the last segment stores the information of the vertices that will be used to construct the complete solution from the local and global tree. The one-point crossover can keep characteristics of the pair of parents, and the affectivity of that crossover is different depending on the position of the selected point. If the crossover point is in the local tree segment, that segment will generate a different local tree in comparison to the pair of parents. If the point crosses in the G-seg segment then the global tree is constructed differently from the pair of parents. Finally, if that point lies in the last segment, it will produce a new set of the root of the local tree. Exchanging two good partial genes of parents through that crossover point more often could result in generating high quality solution. In the crossover operation, the probability of a crossover point is located in a cluster/G-seg/M-seg is actually proportional to the corresponding length of gene. Search space of each segment is corresponding to its length. Segment with larger length has a larger search space, thus requires more generation iterations to obtain optimal structures. In contrast, the segment with shorter length requires less generation iterations to explore good structures. By making the probability of a crossover point located in a segment proportional to its corresponding length, the larger segment can have greater opportunities to exchange good partial genes in an effort to acquire more optimal structures. As a result, crossover operation reduces the number of generation loops needed in order to obtain high quality solutions.

Because of the properties of the encoding method, the One-point crossover always produces valid offsprings, and increases the diversity of individual.

5.5 Mutation operator

The Swap-Change Mutation operator (SCM) performs two types of variation: The first type of variation generates new Cayley string by swapping two genes in the selected segment, and the effect of the change to the spanning tree depends on the level of locality of which type of Cayley code has been used. If high locality code is used for decoding Cayley codes into the tree, the swapping of two genes in that Cayley string generates the new offspring which is less different from the parent. Otherwise, the low locality decoding methods change the spanning tree significantly different from parents. The second type of variation is executed on M-Segment to change the root of the local tree with the aim of increasing the diversity of individual. The SCM is described in details in Algorithm 3.

Figure 6 provides an example of the SCM. In this example, the 3rd cluster is randomly selected. Two genes in third segment which have the values of 2 and 3 respectively (the values are in red and bold), are swapped. The value on 2nd gene in M-Seg, which is equal to 1, is replaced by the value of 3 (the values are in red and bold).

5.6 Decoding method

In this subsection, we present association between an individual in USS and an individual in task \(T_j\). The main reason for selecting Cayley Code to represent a chromosome is that we can use simple methods for decoding between tasks.

5.6.1 Decoding method for spanning tree problem

Let \(D_j\) be the dimensionality of jth task.

Step 1: Remove genes whose values are greater than \(D_j\) and store them in the corresponding order into a queue Q.
Step 2: If the number of remaining genes R is greater than \((D_j - 2)\), the solution will select the first \(D_j - 2\) genes from the remaining. Otherwise, while the number of remaining genes R is less than \((D_j - 2)\), repeat following procedure: Let a be the first element in Q; b is the result of a modulo \(D_j\); Append b to R; Delete a from front of Q.

Figure 7 presents the use of S-Decoding method for decoding a representing spanning tree into Cayley code of two tasks. The dimensionality of these two tasks are 6 and 8 respectively, and an individual in the unified search space is transformed to a task whose dimensionality is 6.

Figure 7a depicts a case in which after step 1, the number of remaining genes is greater than the dimensionality of the associated task. In step 1, as the value of second gene is 7 which is greater than 6, this gene is removed then stored in Q = \(\{7\}\). In step 2, the number of the remaining genes is 5 which is greater than \(6-2=4\), so first 4 genes are chosen to create an individual for the associated task.

Figure 7b illustrates another case of a decoding method in which the number of remaining genes is less than the dimensionality of the associated task. In step 1, genes 7, 8, 7 are removed from a segment then stored to a queue \(Q=\{7, 8, 7\}\). In step 2, the number of the remaining genes is 3 which is less than \(6-2=4\) is stored in \(R = \{1, 2, 3 \}\), one missing gene are created by getting the remainder of the first element the queue Q is 7 divided by the dimensionality of particular task is 6, and the result of 7 modulo 6 is 1, then append 1 to the end of R.

5.6.2 Decoding method for Clustered Tree Problems

New decoding method for the Clustered Tree Problems (NDM) has two stages. The first stage finds spanning tree for clusters and G-Graph by using S-Decoding method. The second stage performs on last segment for finding edges which connect among clusters.

Detail of new decoding method is described in Algorithm 5.

In Algorithm 5, the S-Decoding method is a proposed method in Sect. 5.6.1; Line 8 and 13 use the mapping functions to relabel for vertices in clusters.

Figure 8 illustrates the proposed decoding method. Figure 8a presents an individual in USS. Representation of individual via Prüfer code is depicted in Fig. 8b. Figure 8c illustrates chromosome via Prüfer code after performing from line 1 to line 12 of Algorithm 5. Figure 8d represents a solution which is constructed from chromosome in Fig. 8c. Figure 8e illustrates the complete solution which receives from Fig. 8d after performing the M-functions in Algorithm 5.

6 Computational results

6.1 Problem instances

To assess the performance of the proposed algorithm, CluSPT instances are selected from Binh et al. [4] and InterCluMRCT instances are selected from Clustered Traveling Salesman Problem (CluTSP) instances [15]. The main reason for selecting those instances is that the instances had been generated with various algorithms to be suitable for clustered problems [15].

The experiments is performed on all small instances of Type 1, Type 5 and Type 6. The instances are selected randomly for implementing CC-MFEA. All tested instances are available via Thanh [22].

The Prüfer Code [18], Blob Code (Julstrom [13]) and Dandelion Code [19] are the well-known example of Cayley codes, so those Codes are chosen to implement experiments.

6.2 Experimental setup

We focus on the following criteria to assess the quality of the output of the algorithms.

Criteria
Average (Avg)	Average function value over all runs
Best-found (BF)	Best function value achieved over all runs

Each problem instance was evaluated 30 times on Intel Core i7 - 4790, 16GB RAM computer using windows 8 64-bit. The source codes were written in the Java language.

The simulation parameters include population size = 100, number of evaluations = 50000, probability of random mating = 0.5, mutation rate = 0.05 and number of tasks = 2.

6.3 Experimental results

In this section, we analyze the results obtained by the proposed algorithm using different types of Cayley codes. The performance of each Cayley Code is evaluated in both single-task and multi-task environment. In addition, the effectiveness of combining 2 types of variation in the proposed SCM is also examined. All of the obtained results are then carefully investigated under some statistical analysis methods such as Wilcoxon Signed-rank Test to show the effectiveness of the proposed algorithm.

In tables of results in this section, the italic, red cells on the Prüfer Code column mean that on those instances, CC-MFEA using Prüfer Code outperforms using Dandelion Code. Similar for other encoding techniques, cells in italic and red color on the Dandenion Code and Blob Code column mean that CC-MFEA using Dandelion Code exceeds the CC-MFEA using the Blob Code and CC-MFEA using Blob Code surpasses CC-MFEA using Prüfer Code, respectively. The values in bold on encoding techniques column means that CC-MFEA using the encoding techniques outperforms using two other encoding techniques.

6.3.1 The efficacy of the proposed variation of mutation operator

To evaluate the effectiveness of each variation to the proposed mutation, Dandelion Code is chosen to conduct the following experiments: In the first experiment, we perform only the first type of variation which is swapping two genes in the selected segments, while the second experiment only uses the second type of variation that changes an element in the M-seg segment. The obtained results from the two experiments above are then compared to those of the proposed SCM which is a combination of both variation types.

Table 1 Results obtained by variations of the mutation operator using Dandelion code on instances in Type 1

Multifactorial evolutionary algorithm for solving clustered tree problems: competition among Cayley codes

Abstract

Similar content being viewed by others

Two levels approach based on multifactorial optimization to solve the clustered shortest path tree problem

Evolutionary dynamic grouping based cooperative co-evolution algorithm for large-scale optimization

A Two-Level Hybrid Based Genetic Algorithm to Solve the Clustered Shortest-Path Tree Problem Using the Prüfer Code

Explore related subjects

1 Introduction

2 Related works

2.1 Multitasking in evolutionary algorithms

2.2 Cayley code: a representation scheme for spanning tree in genetic algorithms

2.3 Clustered tree optimization problems and several solving approaches

3 Preliminaries of the Cayley code

3.1 Prüfer code

3.2 Blob code

3.3 Dandelion code

4 Problem formulation

5 Proposed algorithm

5.1 MFEA scheme to solve clustered tree problems

5.2 Individual encoding in unified search space

5.2.1 Encoding solution of input graphs

5.2.2 Rearrange clusters on input graphs

5.3 Individual initialization method

5.4 Crossover operator

5.5 Mutation operator

5.6 Decoding method

5.6.1 Decoding method for spanning tree problem

5.6.2 Decoding method for Clustered Tree Problems

6 Computational results

6.1 Problem instances

6.2 Experimental setup

6.3 Experimental results

6.3.1 The efficacy of the proposed variation of mutation operator

6.3.2 The effect of encoding techniques in single task

6.3.3 The effect of encoding techniques when CC-MFEA solves two different problems

6.3.4 Comparison of the performances of Single-tasking and Multi-tasking

7 Conclusion

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Ethical approval

Informed consent

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation