1 Introduction

The Traveling Salesman Problem (TSP) [1,2,3,4,5] is a classic NP-hard problem. Many algorithms have been proposed to solve the TSP, among which the ant colony algorithm is one of the most effective because of its good robustness and convergence speed. In addition, ant colony algorithms have been applied in many other fields, such as robot path planning [6], network routing [7] and scheduling problems [8, 9].

In the early 1990s, the Italian scholar Dorigo et al. proposed the Ant System (AS) [10, 11] based on the foraging behavior of ants, but the algorithm converges slowly and is prone to stagnation when solving the TSP. Dorigo later proposed the Ant Colony System (ACS) [12], which greatly improves the convergence speed and solution accuracy on the TSP compared with AS. Stützle et al. proposed the Max-Min Ant System (MMAS) [13], which sets upper and lower bounds on the pheromone to improve the solution efficiency of the algorithm. These classical ant colony algorithms greatly improved the solving ability, but they still suffer from easy stagnation, slow convergence and poor diversity.

Since then, many researchers have proposed their own improvements. Some algorithms improved the diversity of the search: Yue Wu et al. improved the computational efficiency by designing a local search method [14]; Wei Gao proposed a premium-penalty strategy that changes the distribution of pheromone, increasing the diversity on good paths and enlarging the search space [15]; Ye Ke et al. proposed a negative feedback mechanism that helps the algorithm explore more unknown regions by continuously accumulating the experience of ant failures [16]; S. Li et al. proposed a collective action mechanism to improve the collaboration between individual ants [17]. These algorithms enhance diversity in their respective ways, but suffer from slow convergence.

Some algorithms improved the convergence of the algorithm: Qin et al. accelerated convergence by enhancing the pheromone update and strengthening the positive feedback on the optimal path [18]; Sahar used a clustering method to reduce the number of nodes in the TSP, which greatly improves the convergence speed [19]. These algorithms alleviate the slow iteration of the ant colony algorithm to some extent, but they lack good diversity and tend to fall into local optima in the later stages of the algorithm.

There is a contradiction between the convergence speed and the diversity of ant colony algorithms, and many ingenious methods have been proposed to address it. Rafał Skinderowicz used GPU parallel computing to speed up the computation and increased the diversity of the algorithm by modifying the roulette-wheel selection [20]. Zhao D. et al. proposed a horizontal crossover strategy and a vertical crossover strategy to speed up convergence and to expand the search range of the ants, respectively [21]. Liu et al. proposed a random reference mechanism to speed up convergence and used a chaotic reinforcement strategy to improve the accuracy of the solution [22]. Han, ZP et al. combined the ant colony algorithm with symbiotic search, improving the search efficiency by adaptively changing the search behavior of the ants while rapidly selecting optimization parameters [23]. Wei Gao applied two meeting ants to construct the path and a strategy that polarizes the pheromone on the path, improving the computational efficiency and effectiveness of the algorithm, respectively [24].

Other scholars applied multiple swarm algorithms and introduced knowledge from other fields to balance convergence and diversity and to improve the coordination between ants. Xiaoyu Wang et al. evaluated the uncertainty of the pheromone through information entropy and improved population diversity through a disturbance mechanism [25]. Zhou et al. proposed an ant colony algorithm that combines populations with different search ranges and convergence speeds, greatly increasing the diversity of the algorithm [26]. Dehui Zhang et al. proposed a collaborative filtering strategy to improve the efficiency of communication between populations [27]. Han Pan et al. smoothed the pheromone through a dynamic bootstrap mechanism and determined the communication frequency by comparing minimum spanning trees [28].

In this paper, we propose an ant colony algorithm with Stackelberg game and multi-strategy fusion (MSACS) for the TSP. To increase the influence of high-quality populations, the algorithm establishes a Stackelberg game among multiple populations; in addition, a multi-strategy fusion mechanism is proposed to exchange various kinds of information between populations. The main contributions of this paper are as follows:

  1. In order to exploit the strengths of the different populations, we choose the ACS algorithm and the MMAS algorithm to form a heterogeneous multi-population ant colony algorithm.

  2. A Stackelberg game model is established among the multiple populations. Through a comprehensive evaluation of convergence, diversity and the overall state of the algorithm, the current best population is selected as the leader. The leader then acts as a pioneer that explores paths for the other populations and forms a cooperative relationship with them through the exchange of information, so as to maximize the benefit of the whole system.

  3. A multi-strategy fusion mechanism is used to improve the information exchange between populations. The mechanism contains three strategies: the pheromone fusion strategy improves the diversity of populations with low information entropy; the elite ant learning strategy lets populations learn from the elite ants of an excellent population to improve the convergence speed of the algorithm; the pheromone recombination strategy smooths the excessively high pheromone on non-public paths and performs a local search to help the algorithm jump out of local optima.

This paper is organized as follows: Section 2 introduces the background of the traditional ant colony algorithm, information entropy and the Stackelberg game; Section 3 introduces the main innovations and contributions of this paper; Section 4 describes the comparative experiments and parameter settings; Section 5 summarizes our work and some future research directions.

2 Related research

2.1 Ant colony algorithm for solving the TSP

2.1.1 Path Selection

As shown in (1), ant k moves from city i to city j according to the pseudo-random proportional rule:

$$ j = \begin{cases} \arg\max\limits_{l \in N_{i}^{k}} \left\{ \tau_{il}\left[\eta_{il}\right]^{\beta} \right\}, & q \le q_{0}\\ J, & q > q_{0} \end{cases} $$
(1)

where q is a random number uniformly distributed in [0,1] and q0 is a fixed parameter in [0,1]. When q ≤ q0, ant k moves deterministically to the city that maximizes the product in (1); when q > q0, the next city J is chosen probabilistically according to (2).

$$ p_{ij}^{k} = \begin{cases} \dfrac{\left[\tau_{ij}\right]^{\alpha}\left[\eta_{ij}\right]^{\beta}}{\sum\limits_{l \in N_{i}^{k}} \left[\tau_{il}\right]^{\alpha}\left[\eta_{il}\right]^{\beta}}, & j \in N_{i}^{k}\\ 0, & j \notin N_{i}^{k} \end{cases} $$
(2)

where α is the pheromone heuristic factor; β is the heuristic factor of the greedy rule; \({N_{i}^{k}}\) is the set of cities that ant k can still choose from; τij is the pheromone concentration between node i and node j; ηij is the heuristic function, given by (3):

$$ \eta_{ij}=\frac{1}{d_{ij}} $$
(3)

where dij is the distance between node i and node j;
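For concreteness, the following sketch (with illustrative helper names, not taken from the paper) shows one way to implement the pseudo-random proportional rule of (1)-(3), assuming a pheromone matrix tau and a distance matrix dist indexed by city:

```python
import numpy as np

def select_next_city(current, unvisited, tau, dist,
                     alpha=1.0, beta=4.0, q0=0.8, rng=np.random.default_rng()):
    """Pseudo-random proportional rule, eqs. (1)-(3) (illustrative sketch)."""
    unvisited = np.asarray(unvisited)
    eta = 1.0 / dist[current, unvisited]          # heuristic information, eq. (3)
    if rng.random() <= q0:
        # exploitation branch of eq. (1): take the best edge deterministically
        return unvisited[np.argmax(tau[current, unvisited] * eta ** beta)]
    # exploration branch: roulette-wheel selection with the probabilities of eq. (2)
    weights = tau[current, unvisited] ** alpha * eta ** beta
    return rng.choice(unvisited, p=weights / weights.sum())
```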

2.1.2 Pheromone update

Local pheromone update rule: after an ant moves from city i to city j, the local pheromone on that edge is updated as shown in (4):

$$ {\tau_{ij}} \leftarrow \left( {1 - \rho } \right){\tau_{ij}} + \rho {\tau_{0}} $$
(4)

where ρ is the volatility factor of the local pheromone; τ0 is the initial value of the pheromone.

Global pheromone update rule: when all ants have finished constructing their tours, the algorithm performs a global pheromone update, as shown in (5):

$$ {\tau_{ij}} \leftarrow \left( {1 - \xi } \right){\tau_{ij}} + \xi {\Delta} \tau_{ij}^{bs} $$
(5)
$$ {\Delta} \tau_{ij}^{bs} = \frac{1}{{{C^{bs}}}} $$
(6)

where ξ is the volatility factor of the global pheromone; Cbs is the length of the global optimal solution; \({\Delta } \tau _{ij}^{bs}\) is the increment of the global pheromone, given by (6).
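The two ACS update rules of (4)-(6) can be sketched as follows; the tour representation (a list of city indices, closed back to the start) and the parameter values are illustrative assumptions:

```python
def local_update(tau, i, j, rho=0.1, tau0=1e-4):
    """Local pheromone update, eq. (4), applied right after an ant crosses edge (i, j)."""
    tau[i, j] = (1.0 - rho) * tau[i, j] + rho * tau0
    tau[j, i] = tau[i, j]                       # symmetric TSP

def global_update(tau, best_tour, best_length, xi=0.1):
    """Global pheromone update, eqs. (5)-(6), on the edges of the global-best tour."""
    delta = 1.0 / best_length                   # eq. (6)
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i, j] = (1.0 - xi) * tau[i, j] + xi * delta
        tau[j, i] = tau[i, j]
```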

2.2 Max-min ant colony algorithm

In order to improve the search efficiency of the ant colony algorithm and avoid premature convergence to a local optimum, the MMAS algorithm sets upper and lower limits on the pheromone. The bounds \(\tau_{\max}\) and \(\tau_{\min}\) are computed as shown in (7) and (8):

$$ \tau_{\max} = \frac{1}{\rho} \cdot \frac{1}{T^{gb}} $$
(7)
$$ \tau_{\min} = \frac{\tau_{\max}}{2n} $$
(8)

Where \(\tau_{\max}\) is the upper limit set by the algorithm for the pheromone; \(\tau_{\min}\) is the lower limit of the pheromone; \(T^{gb}\) is the length of the current optimal solution of the algorithm; ρ is the volatility factor of the pheromone; n is the number of city nodes.

2.2.1 Pheromone Update

The MMAS algorithm updates only the pheromone on the current optimal solution path; the updating rules are given by (9) and (10).

$$ \tau_{ij}\left( t + 1 \right) = \left( 1 - \rho \right)\tau_{ij}\left( t \right) + {\Delta} \tau_{ij}^{best} $$
(9)
$$ {\Delta} \tau_{ij}^{best} = \frac{1}{f\left( s^{best} \right)} $$
(10)

Where τij is the value of the pheromone between node i and node j in the MMAS algorithm; t is the iteration count; \({\Delta } \tau _{ij}^{best}\) is the pheromone increment on the edges traversed by the current optimal individual, obtained from (10); \(f\left ({{s^{best}}} \right )\) is the length of the current optimal solution.
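A minimal sketch of the MMAS update, combining (7)-(10) and clamping the trails to \([\tau_{\min}, \tau_{\max}]\), is given below; the parameter values are placeholders:

```python
import numpy as np

def mmas_limits(best_length, rho=0.1, n_cities=51):
    """Pheromone trail limits, eqs. (7)-(8)."""
    tau_max = (1.0 / rho) * (1.0 / best_length)
    tau_min = tau_max / (2.0 * n_cities)
    return tau_min, tau_max

def mmas_global_update(tau, best_tour, best_length, rho=0.1):
    """Evaporation plus best-only reinforcement, eqs. (9)-(10), then clamping to the limits."""
    tau_min, tau_max = mmas_limits(best_length, rho, n_cities=tau.shape[0])
    tau *= (1.0 - rho)                                  # evaporation term of eq. (9)
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i, j] += 1.0 / best_length                  # eq. (10)
        tau[j, i] = tau[i, j]
    np.clip(tau, tau_min, tau_max, out=tau)             # enforce eqs. (7)-(8)
```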

2.3 Information entropy

Information entropy was put forward by Shannon [29] to evaluate the degree of disorder of information. It is calculated as shown in (11):

$$ S\left( x \right) = - \sum\limits_{i = 1}^{n} {P\left( {{x_{i}}} \right)} {\log_{b}}\left( {P\left( {{x_{i}}} \right)} \right) $$
(11)

Where b is the base of the logarithm; P(xi) is the probability mass function; n is the number of solutions.

2.4 Stackelberg game

The Stackelberg game is a dynamic game of bounded rationality proposed by Stackelberg in 1953 [30]. The game divides the players into leaders and followers: the leader moves first, and the followers then make the decisions that best serve their own interests given the leader's decision, until a dynamic equilibrium is finally reached. The game model can be described as (12):

$$ H = \left\{ L \cup F,\ {\{ P_{l}\}_{l \in L}},\ {\{ P_{f}\}_{f \in F}},\ R_{l},\ R_{f} \right\} $$
(12)

where L is the selected leader and L ∪ F is the set of all participants; Pl is the leader's strategy set, Pf is the followers' strategy set, Rl is the leader's revenue function, and Rf is the followers' revenue function.

The goal of each player is to maximize its own revenue, so the objective functions of the leader and the followers are (13) and (14):

$$ \max_{{p_{l}} \in {P_{l}}} R\left( {{p_{l}}} \right) $$
(13)
$$ \max_{{p_{f}} \in {P_{f}}} {R_{n}}\left( {{p_{f}}} \right) $$
(14)

where \(R\left ({{p_{l}}} \right )\) and \({R_{n}}\left ({{p_{f}}} \right )\) are the objective functions of the leader and the follower respectively, and the whole leader-follower game aims at maximizing these two values.

In summary, when both sides of the game reach the Nash equilibrium, all participants obtain their maximum gain and no longer change their strategies.

3 Ant colony algorithm with Stackelberg game and multi-strategy fusion

The Stackelberg game is a classic game model in which the leader decides first and the followers decide after observing the leader's decision. Based on this idea, we design a Stackelberg game among multiple ant colonies. While the algorithm is running, the population that is most beneficial to the whole system is selected as the leader, while the others are regarded as followers. The leader then acts as a pioneer that explores paths on behalf of the others: after exchanging information with the followers, it trains for a number of iterations in advance. Once the leader has explored the paths, each follower applies the strategy most beneficial to itself to obtain the information explored by the leader. Finally, after exchanging information, the followers continue to explore paths. Figure 1 shows how the Stackelberg game between multiple populations works. The dynamic Stackelberg game model consists of the following steps:

Step 1::

Select the leader by (18);

Step 2::

The leader makes its decision first;

Step 3::

The followers make their decisions after observing the leader's decision;

Step 4::

Return to Step 1.

Fig. 1 The flow of the Stackelberg game

3.1 The establishment of Stackelberg game among ant colony algorithms

3.1.1 Parameters for evaluating population and algorithm state

Information entropy evaluates diversity

There are many indicators that can evaluate the diversity of an ant colony algorithm, such as the standard deviation and information entropy. For ant colony algorithms, maintaining diversity allows the ants to choose more different paths. We use information entropy to measure the diversity of the algorithm, as shown in (15):

$$ E\left( P \right) = - \sum\limits_{x \in X} {P\left( x \right)} \log \left( {P\left( x \right)} \right) $$
(15)

Where \(P\left (x \right )\) is the proportion of ant x's solution among the solutions of the current population; \(E\left (P \right )\) is the information entropy of all the solutions in the current population. The more dispersed the solutions are, the higher the information entropy, and thus the higher the diversity.
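As an illustration (assuming P(x) is estimated as the relative frequency of each distinct tour length produced by the ants of the current iteration), the diversity measure of (15) could be computed as follows:

```python
import numpy as np
from collections import Counter

def population_entropy(tour_lengths):
    """Information entropy of the population's solutions, eq. (15).

    P(x) is taken here as the relative frequency of each distinct tour length
    among the ants of the current iteration (an illustrative assumption)."""
    counts = np.array(list(Counter(tour_lengths).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())
```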

Convergence evaluation

Convergence indicates the convergence ability of the current population: the better the convergence, the faster the current algorithm converges. In this section, we use (16) to evaluate the convergence of the current population.

$$ conv = \frac{length_{\min }^{i} - length_{\min }}{iter^{j} - iter^{i}} $$
(16)

Where conv is the convergence of the current population; \(length_{\min}\) is the optimal solution of the current population; iterj is the current iteration number; \(length_{\min}^{i}\) is the previous optimal solution; iteri is the iteration at which the previous optimal solution was found.

Assess system status

Similarly, we use the convergence of the optimal solution to measure the current state of the algorithm. If the optimal solution converges rapidly, the system is in a period of rapid convergence, and the algorithm then needs the population with better convergence as the leader; otherwise, it needs the population with better diversity. We use (17) to express the convergence of the overall algorithm.

$$ CO = \frac{{iter_{\min }^{i}}}{{ite{r_{\min }}}} $$
(17)

Where \(iter_{\min}\) is the iteration number at which the current optimal solution was found; \(iter_{\min}^{i}\) is the iteration number of the previous optimal solution.

3.1.2 Multi-factor evaluation mechanism

After the planned number of iterations, the comprehensive evaluation index of each population is calculated by (18), and the population with the highest score is selected as the leader.

$$ {Y_{i}} = \left( {1 + C{O}} \right)con{v_{i}}E{\left( P \right)_{i}} $$
(18)

Where Yi is the comprehensive evaluation index of population i; CO reflects the overall convergence of the algorithm and is obtained from (17); convi is the convergence of population i; \(E{\left (P \right )_{i}}\) is the information entropy of population i, which represents the diversity of the population.
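A compact sketch of the three indicators and the resulting score of (16)-(18) is given below; the function and argument names are illustrative, not the paper's implementation:

```python
def convergence(prev_best, cur_best, prev_iter, cur_iter):
    """Convergence speed of a population, eq. (16): improvement per iteration."""
    return (prev_best - cur_best) / max(cur_iter - prev_iter, 1)   # guard against /0

def system_state(prev_best_iter, cur_best_iter):
    """Overall convergence indicator CO, eq. (17)."""
    return prev_best_iter / cur_best_iter

def comprehensive_score(co, conv_i, entropy_i):
    """Multi-factor evaluation index Y_i, eq. (18); the largest score wins the leader role."""
    return (1.0 + co) * conv_i * entropy_i
```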

3.1.3 The choice of leader

In our system, the leader exists to lead the other populations because of its distinct advantages over them. Equation (18) is used to evaluate the overall performance of a population, and there are several reasons for using these three variables as the measure of a population's comprehensive ability. First, how the leader is chosen depends on the current state of the system: if the system is in a period of rapid convergence, we need the population with better convergence and higher precision as the leader; if the algorithm is currently converging slowly, the whole system needs populations with good diversity as leaders to help find more different paths. Equation (17) lets us judge the convergence of the algorithm as a whole and, from another perspective, the current state of the algorithm. Second, we also need to consider the state of each individual population, so the evaluation indices of convergence and diversity are added to the formula; they reflect the state of a single ant colony to some extent. Therefore, by combining these three parameters, the index considers both the overall performance of the algorithm and the state of a single ant colony, and thus reflects the comprehensive performance of the population.

3.1.4 Number of training iterations for leaders

In order to explore more paths, the leader trains for M iterations in advance. There are many ways to determine M, such as directly specifying M as a fixed value, but this is not the best way. In this section, we use the similarity between populations (measured by the cosine similarity of (20)) to determine whether the leader has finished training. As shown in (19), when L is greater than the threshold l, the leader's training is considered complete, and the number of training iterations it has performed is M. The selection steps for M are shown in Table 1 below.

$$ L = \frac{{\left| {S_{M}^{leader} - S_{M}^{follower\_\min }} \right|}}{{S_{M}^{leader}}} \times 100\% $$
(19)

Where \(S_{M}^{leader}\) is the cosine similarity of the leader; \(S_{M}^{follower\_{\min}}\) is the minimum cosine similarity among the followers; L represents the degree of difference between the leader and the followers. The greater the value of L, the more obvious the difference.

Table 1 The calculation process of M

3.1.5 Cosine similarity

In this section, we use cosine similarity to evaluate the similarity between populations, as shown in (20):

$$ {S_{M}} = \frac{{A \cdot B}}{{\left| A \right| \times \left| B \right|}} $$
(20)

Where A and B are 2-dimensional vectors made up of conv (calculated by (16)) and E (calculated by (15)), which represent the state of the current population to some extent. As can be seen from (20), the higher the value of SM, the more similar the two populations are.
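The following sketch combines (19) and (20); the stopping threshold is a placeholder, since the paper denotes it only as l:

```python
import numpy as np

def cosine_similarity(state_a, state_b):
    """Cosine similarity, eq. (20); each state is the 2-D vector (conv, E)."""
    a, b = np.asarray(state_a, dtype=float), np.asarray(state_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def leader_training_finished(s_leader, s_follower_min, l_threshold=10.0):
    """Difference measure L of eq. (19), in percent; training stops once L > l.

    The value of l_threshold here is illustrative, not taken from the paper."""
    L = abs(s_leader - s_follower_min) / s_leader * 100.0
    return L > l_threshold
```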

3.2 Multi-strategy fusion mechanism

After the populations make their choices, they exchange information, which is achieved by selecting among various mechanisms. If a population chooses to cooperate after the game, it exchanges information with the cooperating population. For a population, there are many kinds of information that can be exchanged, such as pheromone, optimal solutions and common paths; therefore, the choice of which information to exchange is a key issue for the participants.

In this section, we define different types of information. Different kinds of information about an ant colony reflect the current state of the colony in different ways. For example, the pheromone of a population encodes the population's historical path selections, and communicating pheromone among populations can help them increase the choice of different paths, thus improving the diversity of the algorithm. Therefore, populations that choose different information to communicate usually obtain different effects.

We present three strategies with which populations communicate different information in order to improve the performance they need. First, the pheromone fusion strategy is used to improve the exchange of experience between populations. In general, the pheromone matrix of an ant colony algorithm records the paths chosen in the history of the population; it reflects the fuzzy experience of the population to some extent and carries its path dependence. If a population exchanges pheromone, it gains experience from other populations and increases the probability of choosing alternative paths, as shown in Section 3.2.1. Second, the elite ant learning strategy increases the influence of the dominant population. The elite individuals of the dominant population carry the excellent characteristics of that population, and learning from them can improve the convergence of the other populations, as shown in Section 3.2.2. Finally, we propose the pheromone recombination strategy to help the algorithm avoid stagnation when a population is trapped in a local optimum, as shown in Section 3.2.3.

3.2.1 Pheromone fusion strategy

The pheromone fusion strategy is applied when a population has reached the specified communication time, has converged to a certain extent, and its diversity has decreased. First, the information entropy of each population, which represents its diversity, is calculated by (15). Second, the population with the highest information entropy is selected, and a new pheromone matrix is formed by linear fusion according to the pheromone fusion rule. By recombining the population's pheromone in this way, we improve the exploration of other paths, which in turn increases the diversity of the population. Since this strategy reduces the convergence of the algorithm, we use the elite ant learning strategy of Section 3.2.2 to balance the diversity of the population and avoid the disadvantage of slow convergence. The pheromone sharing between ant colonies is given by (21).

$$ P{h^{k}} = \left( {1 - S_{M} } \right)P{h^{k}} + S_{M} P{h^{m}} $$
(21)

Where Phk is the pheromone matrix of the current ant colony; Phm is the pheromone matrix of the population selected for pheromone communication; SM is the similarity of the two populations, obtained from (20). According to (21), a new pheromone matrix is formed by linearly combining the pheromone matrices of the two populations in a certain proportion.
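A one-line sketch of the fusion rule (21), assuming the pheromone matrices are NumPy arrays:

```python
import numpy as np

def fuse_pheromone(ph_k, ph_m, s_m):
    """Pheromone fusion strategy, eq. (21): linear blend weighted by the similarity S_M.

    ph_k is the current colony's pheromone matrix, ph_m the partner colony's,
    and s_m their cosine similarity from eq. (20)."""
    return (1.0 - s_m) * np.asarray(ph_k) + s_m * np.asarray(ph_m)
```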

3.2.2 Elite ant learning strategy

After using the pheromone fusion strategy to increase the population diversity, we need a strategy to increase the convergence rate of the algorithm and reduce the conflict between convergence and diversity. In this paper, (17) is used to evaluate the overall convergence of the algorithm. When the convergence value CO of the algorithm is less than ω1 (ω1 is the convergence threshold), we enable the elite ant learning strategy: we select the population with the highest accuracy as the learning target and learn from the elite ants of this population to accelerate the convergence of the algorithm.

In this section, the ants whose solutions rank in the top K% of the population are considered elite ants. When a population learns from an elite ant, it does so by rewarding the pheromone on the elite ant's path, using (22). The learning process is shown in Fig. 2.

$$ {P_{n}} = \left( {1 + \frac{{CO}}{{n}}} \right)P $$
(22)

Where P is the value of the pheromone on the public path; Pn is the new value of the rewarded pheromone on the common path; n is the number of city nodes; CO is the evaluation of the overall convergence of the algorithm, obtained from (17). At the beginning of the iterations, the algorithm converges quickly and the value of CO is relatively large, so more pheromone is rewarded on the common path and the convergence of the algorithm is accelerated; as the number of iterations increases, the convergence rate slows down, and the pheromone reward on the paths of the elite ants is reduced.

Fig. 2 Schematic diagram of learning strategies of elite ants
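A possible realization of the reward rule (22) is sketched below; elite tours are assumed to be lists of city indices, and the symmetric pheromone matrix is updated in place:

```python
def elite_learning(tau, elite_tours, co, n_cities):
    """Elite ant learning strategy, eq. (22): reward the pheromone on elite ants' edges.

    co is the overall convergence indicator of eq. (17), so the reward factor
    (1 + CO/n) shrinks automatically as the algorithm's convergence slows down."""
    factor = 1.0 + co / n_cities
    for tour in elite_tours:
        for i, j in zip(tour, tour[1:] + tour[:1]):
            tau[i, j] *= factor
            tau[j, i] = tau[i, j]
```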

3.2.3 Pheromone recombination strategy

The classical ant colony algorithm easily falls into a local optimum (when the optimal solution does not change for more than 200 iterations, the algorithm is considered trapped in a local optimum), and the more nodes the problem has, the more easily the algorithm stagnates. Therefore, we need a strategy that helps the ant colony algorithm jump out of local optima. From the perspective of the solving mechanism of the ant colony algorithm, the reason it falls into a local optimum is usually that the pheromone concentration on some edges is too high, which reduces the possibility that the ants find other paths. From this perspective, we can help the algorithm escape a local optimum by changing the distribution of the pheromone. There are many ways to reset the pheromone of an ant colony; the simplest is to re-initialize the pheromone matrix, but its disadvantage is that the accumulated experience of the colony is lost, which leads to low efficiency. Therefore, we need to keep the colony's useful experience in the pheromone matrix while removing the pheromone on paths with excessively high concentrations. The specific operation of this strategy is as follows:

Step 1::

Find the common paths between the populations;

Step 2::

Keep the pheromone concentration on the public paths, and smooth the pheromone on the non-public paths using (23);

Step 3::

Keep the order of the public paths, and carry out a local search on the non-public paths with the ant colony algorithm;

Step 4::

Update the pheromone on the non-public paths;

Step 5::

Search all paths again.

$$ {P_{l}} = \frac{{\left( {{P_{\max }} + {P_{\min }}} \right){P_{best}}}}{2} $$
(23)

Where Pmax is the maximum value in the pheromone matrix; Pmin is the minimum value in the colony's pheromone matrix; Pbest is the pheromone on the optimal solution of the current population when the algorithm stagnates. Through (23), we reduce the excessively high pheromone on the optimal-solution path of the stagnant population, reassigning these edges a value based on the average of the maximum and minimum values of the pheromone matrix. Doing so has two benefits: first, it reduces the pheromone on edges with very high concentrations; second, the population retains most of its original path-finding experience, which avoids reducing the efficiency of finding solutions.
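The following sketch illustrates one way to realize the recombination step under these definitions; the choice of exactly which edges are smoothed is our reading of the strategy, not the paper's code:

```python
import numpy as np

def recombine_pheromone(tau, best_edges, common_edges, p_best):
    """Pheromone recombination strategy, eq. (23).

    best_edges: edges of the stagnant colony's best tour; common_edges: edges shared
    with the other populations' best tours (kept unchanged); p_best: the pheromone
    level on the stagnant colony's best tour."""
    p_l = (tau.max() + tau.min()) * p_best / 2.0        # smoothed value, eq. (23)
    for i, j in best_edges:
        if (i, j) not in common_edges and (j, i) not in common_edges:
            tau[i, j] = tau[j, i] = p_l                 # smooth only non-public edges
```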

3.3 Algorithm framework

First, the ACS algorithm and the MMAS algorithm are chosen as the ant colony algorithms of the different populations. Then, a certain number of iterations are used to train the parameters of each population. After this training, the parameters are fed into the Stackelberg game model, the state of each population is evaluated comprehensively, the highest-rated population becomes the leader and the rest become followers, and the next training period of the populations is calculated. The leader takes the lead and trains for M iterations, after which the followers obtain information from it through the multi-strategy fusion mechanism. The followers then carry out their own search for solutions. Finally, the algorithm keeps looping until the stopping requirement is met. The pseudo-code and flowchart of the MSACS algorithm are shown in Table 2 and Fig. 3.

Table 2 Table of algorithmic framework
Fig. 3 Flowchart of the algorithm
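Since the paper's pseudo-code is given in Table 2, the following is only a simplified sketch of the same loop, reusing the comprehensive_score helper sketched in Section 3.1.2; all Population methods and helper names are illustrative assumptions:

```python
def msacs_main_loop(populations, max_iter, comm_interval):
    """Simplified sketch of the MSACS framework (illustrative method names)."""
    it = 0
    while it < max_iter:
        # 1. every colony searches independently for a while
        for pop in populations:
            pop.run(comm_interval)
        it += comm_interval

        # 2. Stackelberg game: score each colony with eq. (18); the best becomes the leader
        scores = [comprehensive_score(p.co(), p.conv(), p.entropy()) for p in populations]
        leader = populations[scores.index(max(scores))]
        followers = [p for p in populations if p is not leader]

        # 3. the leader explores ahead for M iterations, M decided via eqs. (19)-(20)
        it += leader.train_ahead(followers)

        # 4. followers obtain the leader's information via the multi-strategy fusion mechanism
        for pop in followers:
            pop.apply_fusion_strategies(leader)

    best = min(populations, key=lambda p: p.best_length())
    return best.best_tour(), best.best_length()
```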

Compared with the literature [14] to [19] mentioned in the introduction, the MSACS algorithm balances convergence and diversity very well through the multi-strategy fusion mechanism. In addition, compared with the literature [20] to [24], modelling the leader-follower game between multiple swarms can effectively exploit the characteristics of the different swarms and improve the synergy between ants.

4 Experiment and simulation

In order to verify the performance of the MSACS algorithm, 30 different TSP instances from the TSPLIB library are selected to test the algorithm, and 20 experiments are carried out for each instance. The experiments are run under the Windows 10 operating system, and Matlab 2019a is used for the simulation.

In Section 4.1, the orthogonal experimental method is used to select the parameters of MMAS and ACS, and the best parameter combination is selected through experiments. Since the improved multi-population ant colony algorithm is composed of the ACS algorithm and the MMAS algorithm, the improved algorithm also uses these best parameters.

In Section 4.2, we analyze the validity of the MSACS strategies: Section 4.2.1 analyzes the effectiveness of the multi-strategy fusion mechanism, and Section 4.2.2 analyzes the training time of the leader.

In Section 4.3.1, we compare the performance of MSACS, ACS and MMAS on the 30 TSP instances, and analyze the differences between the results of the three algorithms through the rank-sum test. In Section 4.3.2, the MSACS algorithm is compared with other ant colony algorithms and intelligent algorithms.

4.1 Parameter setting

In this section, we choose the best parameter combinations for the MSACS, ACS and MMAS algorithms through orthogonal tests. Tables 3, 4 and 5 show the experimental results of the parameter selection of MMAS; Tables 6, 7 and 8 show the experimental results of the parameter selection of ACS. Each experiment uses eil51 as the instance, and each set of parameters is run 30 times. Based on these experiments, Table 5 shows that the best parameter combination of the MMAS algorithm is α = 1, β = 4, ρ = 0.1, and Table 8 shows that the best parameter combination of the ACS algorithm is α = 2, β = 4, ρ = 0.4, ζ = 0.2, q0 = 0.8.

Table 3 Experimental factors and levels of MMAS
Table 4 Results of MMAS orthogonal test
Table 5 Test results of MMAS
Table 6 Experimental factors and levels of ACS
Table 7 Results of ACS orthogonal test
Table 8 Test results of ACS

In Tables 5 and 8: T is the sum of the results; t is the average of each level; range is the difference between the maximum and the minimum, and is used to determine which factor is important (a larger range usually means a more important factor); the scheme is the best combination obtained from the orthogonal test of each factor.

4.2 Policy testing and performance analysis

4.2.1 Analysis of the effectiveness of the mechanism

The proposed multi-strategy fusion mechanism includes three strategies: the pheromone fusion strategy, the elite ant learning strategy and the pheromone recombination strategy. First, the pheromone fusion strategy is applied to increase the diversity of the population; second, the elite ant learning strategy helps the algorithm improve the convergence speed by learning from the elite individuals of the dominant population; finally, the pheromone recombination strategy is applied to improve the probability of the algorithm jumping out of a local optimum.

We use MSACS-1 to denote the MSACS algorithm without the pheromone fusion strategy, MSACS-2 the MSACS algorithm without the elite ant learning strategy, and MSACS-3 the MSACS algorithm without the pheromone recombination strategy. Three TSP instances (kroA100, kroB150 and A280) are used to test the effectiveness of the strategies; each instance is run in 20 experiments, each of 2000 generations. The comparison uses the following measures: the optimal solution (best), the worst solution (worst), the average solution (mean) and the average deviation (MeanError(%)). The experimental results are shown in Fig. 4 and Table 9.

Fig. 4 Convergence comparison of MSACS, MSACS-1, MSACS-2 and MSACS-3

Table 9 Comparison of MSACS, MSACS-1, MSACS-2 and MSACS-3

As shown in Table 9 and Fig. 4, the convergence speed and solution quality of MSACS are better than those of MSACS-1, MSACS-2 and MSACS-3. In addition, because it lacks the elite ant learning strategy, MSACS-2 has the slowest convergence among the compared algorithms, which verifies that this mechanism can increase the speed of convergence. Finally, in the comparison on A280, the accuracy of MSACS-3 is the lowest among the compared algorithms, because the missing pheromone recombination strategy causes it to fall into a local optimum in the late stage of most experiments.

4.2.2 Analysis of the frequency of information exchange

In this paper, the leader selected through the Stackelberg game needs to run for a period of M iterations before the other populations. In order to determine the optimal running time, experiments were conducted with M equal to 50, 100, 150, 200, 250, 300 and 350 iterations, and with the dynamic value fq (determined as described in Section 3.1.4). The TSP instances eil51, eil76, kroA100 and ch130 were each run in 20 groups for 2000 iterations. The results are shown in Table 10, which reports the average solution of the 20 experiments. According to the table, the best choice is fq, which is determined by the similarity between the populations.

Table 10 The choice of M value

4.3 Comparative experimental analysis

4.3.1 Contrastive analysis with classical ant colony algorithm

In order to compare the abilities of the ACS, MMAS and MSACS algorithms, we selected 30 different TSP instances for comparative analysis. The experiment is analyzed with the following evaluation measures: the optimal solution (best), the minimum error (PDbest), the worst solution (Worst), the worst error (PDworst), the average solution (Average), the average error (PDavg), the iteration of the optimal solution (iters), the standard deviation (std), and the rank-sum test (P). The results of the experiment are presented in Tables 11, 12 and 13. The minimum error rate is obtained from (24); the average error rate is calculated by (25); the maximum error rate is obtained from (26); the standard deviation is obtained from (27).

$$ PD\_best = \frac{{{L_{B}} - {L_{\min }}}}{{{L_{\min }}}} \times 100\% $$
(24)
$$ PD\_avg = \frac{{{L_{AVG}} - {L_{\min }}}}{{{L_{\min }}}} \times 100\% $$
(25)
$$ PD\_worst = \frac{{{L_{W}} - {L_{\min }}}}{{{L_{\min }}}} \times 100\% $$
(26)
Table 11 Performance comparison of MSACS in different TSP instances
Table 12 Performance comparison of MSACS and ACS in different TSP instances
Table 13 Performance comparison of MSACS and MMAS in different TSP instances

Where LB is the optimal solution found by the algorithm; LAVG is the average of the N optimal solutions; LW is the worst of the N optimal solutions; Lmin is the known optimal solution of the TSP instance.

$$ dev = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {{{\left( {{l_{i}} - {l_{avg}}} \right)}^{2}}} } $$
(27)

Where N is the number of independent runs; li is the best solution of the ith experiment; lavg is the average of the N solutions.
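For reference, the error measures (24)-(27) over the best solutions of N independent runs can be computed as follows (a small illustrative helper, not part of the paper's code):

```python
import numpy as np

def pd_metrics(run_bests, optimum):
    """Error measures of eqs. (24)-(27) over the best solutions of N runs."""
    run_bests = np.asarray(run_bests, dtype=float)
    pd_best  = (run_bests.min()  - optimum) / optimum * 100.0        # eq. (24)
    pd_avg   = (run_bests.mean() - optimum) / optimum * 100.0        # eq. (25)
    pd_worst = (run_bests.max()  - optimum) / optimum * 100.0        # eq. (26)
    std      = np.sqrt(((run_bests - run_bests.mean()) ** 2).mean()) # eq. (27)
    return pd_best, pd_avg, pd_worst, std
```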

As shown in Table 11, the MSACS algorithm found the optimal solution in 21 of the 30 TSP instances. For the other TSP instances, the percentage deviation from the optimal solution is also relatively small: 7 TSP instances have an error of less than 1%, and the remaining three instances have an optimal-solution deviation of less than 2%. In the average error comparison, the value of PDavg is less than 1% in most instances (the exceptions are kroB200, lin318, pr439, att532, p654 and d1291, whose PDavg values are 1.47%, 1.89%, 2.53%, 2.64% and 3.63%); most PDworst values have percentage deviations of less than 4% (except p654 and d1291, whose PDworst values are 6.08% and 7.44%).

As can be seen from Tables 12 and 13, the MSACS algorithm outperforms the traditional MMAS and ACS algorithms in all properties on all TSP instances. In TSP instances with fewer than 200 city nodes, MSACS can find the optimal solution in most cases, unlike the ACS and MMAS algorithms. In addition, due to the elite ant learning strategy, MSACS has the fastest convergence speed of the three algorithms. In the instances with between 200 and 500 city nodes, MSACS still has high accuracy, because the pheromone fusion strategy and the elite ant learning strategy let the algorithm converge while maintaining good diversity. For instances with more than 500 city nodes, the MSACS algorithm is far more accurate than the other two algorithms (especially on d1291, where the error of the optimal solution of MSACS is 1.34%, while the errors of the MMAS and ACS algorithms are 6.98% and 8.47%, respectively). Moreover, in the comparison of the average deviation, the average error of MSACS is lower than that of the other two algorithms, which indicates that the solution stability of the MSACS algorithm is also better.

Figure 5 compares the standard deviation of the MSACS, ACS and MMAS algorithms. It can be seen from the figure that the standard deviation of MSACS is the smallest in most TSP instances. The standard deviation reflects the stability of the algorithm to some extent: the smaller the standard deviation, the better the stability. From the comparison in the figure, the stability of the MSACS algorithm is the best among the three algorithms.

Fig. 5 The stability of the different algorithms

Figure 6 shows the convergence curves of MSACS, ACS and MMAS on the selected TSP instances. As can be seen from the figure, the MSACS algorithm has a higher convergence speed and accuracy than the MMAS and ACS algorithms. Since MSACS uses the pheromone recombination strategy, it alleviates the tendency of the ACS and MMAS algorithms to fall into local optima, and achieves better accuracy on the various TSP instances.

Fig. 6 Convergence comparison of MSACS, ACS and MMAS

Figure 7 shows the optimal paths found by the MSACS algorithm for eight different TSP instances.

Fig. 7 Best tours for each TSP instance found by MSACS

The P values in Tables 12 and 13 are the results of the rank-sum test, which reflects the difference between samples. As can be seen from the tables, the results of MSACS are significantly different from those of MMAS and ACS in most cases.

Overall, the MSACS algorithm enhances the accuracy of the algorithm, speeds up the convergence, and reduces the possibility of algorithm stagnation. The solving ability of MSACS is better than that of MMAS and ACS.

4.3.2 Comparative analysis with other ant colony algorithms and other intelligent algorithms

To further verify the effectiveness of the improved MSACS algorithm, we compare it with other improved ant colony algorithms and other intelligent algorithms in this section; the results are shown in Tables 14, 15, 16, 17, 18 and 19. In Table 14 the MSACS algorithm is compared with the PACO-3OPT algorithm [31]: for instances with fewer than 200 nodes, the MSACS algorithm finds the optimal solution, while the PACO-3OPT algorithm finds it only in part of the instances; in the comparison of the optimal, worst and average solutions, the MSACS algorithm performs better than the PACO-3OPT algorithm in most of the TSP instances. Furthermore, the MSACS algorithm is compared with the HAACO algorithm [32] in detail in Table 15; similarly, MSACS outperforms HAACO in the comparison of all city instances.

Table 14 Comparison of MSACS and PACO-3OPT in TSP instances
Table 15 Comparison of MSACS and HAACO in TSP instances
Table 16 Comparison of MSACS and DSFLA in TSP instances
Table 17 Comparison of MSACS and DJAYA in TSP instances
Table 18 Comparison of MSACS and other algorithms in TSP instances
Table 19 Comparison of MSACS and other algorithms in TSP instances

Furthermore, other intelligent algorithms are compared with MSACS. In Tables 16 and 17, the DSFLA and DJAYA algorithms are compared with the MSACS algorithm, respectively. In addition to being better in the comparison of the optimal and average solutions, the MSACS algorithm also shows a smaller variance than the other two algorithms.

Finally, to better evaluate the capability of the MSACS algorithm, more comparisons are shown in Tables 18 and 19, where DFFO [4], DEACO [19], CCMACO [33], HDACO [34], HGA [35], HDABC [36], GA-MARL+NICH-LS [37], SSABC [38], AG-BSO [39], PSO-ACO-3opt [40], MARL+NICH-LS [37], IBA [41], DWCA [42], HMMA [43], DBACS [44], DCS [45], ABC-3OPT [46], DSMO [47], JCACO [27] and MGACACO [48] are compared with the MSACS algorithm on the optimal solution. In this comparison, the solution accuracy of the MSACS algorithm is better than that of the other algorithms, which shows that MSACS expands the search space by selecting the best strategy through the Stackelberg game and thus achieves better accuracy.

5 Conclusion

In this paper, we propose an ant colony algorithm with Stackelberg game and multi-strategy fusion. First, the MMAS algorithm and the ACS algorithm are used to construct a heterogeneous multi-population ant colony algorithm, so that the advantages of the different algorithms are used to balance the convergence and diversity of the algorithm. Second, a Stackelberg game with a dynamically changing leader is established among the multiple populations; in this game, the overall state of the algorithm and the attributes of each population are considered comprehensively to select the leader. The leader then improves the search efficiency by training for a certain number of iterations in advance to explore more paths. Finally, the multi-strategy fusion mechanism is used to improve the information exchange among the populations, in which three strategies are proposed. Strategy 1 is the pheromone fusion strategy, under which the pheromone matrices of the populations that need to communicate are combined into a new pheromone matrix through a linear combination with certain weights; this strategy improves the ability of the ants to explore different paths and increases the diversity between populations. Strategy 2 is the elite ant learning strategy, under which a population learns from the experience of the elite ants of the dominant population through pheromone updating to improve its convergence rate. Strategy 3 is the pheromone recombination strategy, which helps a population jump out of a local optimum: when the population is in a locally optimal state, the pheromone on the common paths between populations is retained, the pheromone on the non-public paths is smoothed, and then a local search is carried out.

The experimental results show that, compared with the traditional ant colony algorithms, improved ant colony algorithms and other intelligent algorithms, the MSACS algorithm has better convergence speed and precision, and produces higher-quality solutions on large-scale instances.

In addition, the ant colony algorithm presented in this paper can also be applied to practical problems such as robot path planning: in a typical solution, rasterizing the map turns robot path planning into a node optimization problem similar to the TSP. In our algorithm, the Stackelberg game model can improve the collaboration of the ants during path planning, and the multi-strategy fusion mechanism can help the algorithm balance diversity and convergence and increase its probability of jumping out of a local optimum when it stalls.

In future work, we will further investigate the essence of the solution-seeking process of the ant colony algorithm, improve its convergence speed and solving precision on large-scale node problems, and apply it to practical engineering problems.