1 Introduction

The Traveling Salesman Problem (TSP) [1,2,3,4,5] is a classic NP-hard problem. Many algorithms have been proposed to solve the TSP, among which the ant colony algorithm is one of the most effective because of its good robustness and convergence speed. In addition, ant colony algorithms have been applied in many other fields, such as robot path planning [6], network routing [7] and scheduling problems [8, 9].

In the early 1990s, the Italian scholar Dorigo et al. proposed the Ant System (AS) [10, 11] based on the foraging behavior of ants, but the algorithm converges slowly and is prone to stagnation when solving the TSP. Dorigo later proposed the Ant Colony System (ACS) [12], which greatly improves the convergence speed and solution accuracy on the TSP compared with AS. Stützle et al. proposed the Max-Min Ant System (MMAS) [13], which sets upper and lower bounds on the pheromone to improve the solution efficiency of the algorithm. These classical ant colony algorithms greatly improved the solving ability, but they still suffer from easy stagnation, slow convergence and poor diversity.

Since then, many researchers have proposed their own improvements. Some algorithms improved the diversity of the search: Yue Wu et al. improved the computational efficiency by designing a local search method [14]; Wei Gao proposed a premium-penalty strategy that changes the distribution of pheromone, increasing the diversity on good paths and enlarging the search space [15]; Ye Ke et al. proposed a negative feedback mechanism that helps the algorithm explore more unknown regions by continuously accumulating the experience of ant failures [16]; S. Li et al. proposed a collective action mechanism to improve the collaboration between individual ants [17]. These algorithms enhance diversity in their respective ways, but suffer from slow convergence.

Some algorithms improved the convergence of the algorithm: Qin et al. accelerated convergence by enhancing the pheromone update and strengthening the positive feedback on the optimal path [18]; Sahar used a clustering method to reduce the number of nodes in the TSP, which greatly improves the convergence speed [19]. These algorithms alleviate the slow iteration of the ant colony algorithm to some extent, but they lack good diversity and tend to fall into local optima in the later stages of the algorithm.

There is a contradiction between the convergence speed and the diversity of ant colony algorithms, and many ingenious methods have been proposed to address it. Rafał Skinderowicz used GPU parallel computing to speed up the computation and increased the diversity of the algorithm by modifying the roulette-wheel selection [20]. Zhao D. et al. proposed a horizontal crossover strategy and a vertical crossover strategy to speed up convergence and to expand the search range of the ants, respectively [21]. Liu et al. proposed a random reference mechanism to speed up convergence and used a chaotic reinforcement strategy to improve the accuracy of the solution [22]. Han, ZP et al. combined the ant colony algorithm with symbiotic search, improving the search efficiency by adaptively changing the search behavior of the ants while rapidly selecting optimization parameters [23]. Wei Gao applied two meeting ants to construct the path and a strategy that polarizes the pheromone on the path, improving the computational efficiency and effectiveness of the algorithm, respectively [24].

Other scholars applied multiple swarm algorithms and introduced knowledge from other fields to balance convergence and diversity and to improve the coordination between ants. Xiaoyu Wang et al. evaluated the uncertainty of the pheromone through information entropy and improved population diversity through a disturbance mechanism [25]. Zhou et al. proposed an ant colony algorithm that combines populations with different search ranges and convergence speeds, greatly increasing the diversity of the algorithm [26]. Dehui Zhang et al. proposed a collaborative filtering strategy to improve the efficiency of communication between populations [27]. Han Pan et al. smoothed the pheromone through a dynamic bootstrap mechanism and determined the communication frequency by comparing minimum spanning trees [28].

In this paper, we propose an ant colony algorithm with Stackelberg game and multi-strategy fusion (MSACS) for the TSP. To increase the influence of high-quality populations, the algorithm establishes a Stackelberg game among multiple populations; in addition, a multi-strategy fusion mechanism is proposed to exchange various kinds of information between populations. The main contributions of this paper are as follows:

  1. In order to exploit the strengths of the different populations, we choose the ACS algorithm and the MMAS algorithm to form a heterogeneous multi-population ant colony algorithm.

  2. A Stackelberg game model is established among the multiple populations. Through a comprehensive evaluation of convergence, diversity and the overall state of the algorithm, the current best population is selected as the leader. The leader then acts as a pioneer that explores paths for the other populations and forms a cooperative relationship with them through the exchange of information, so as to maximize the benefit of the whole system.

  3. A multi-strategy fusion mechanism is used to improve the information exchange between populations. The mechanism contains three strategies: the pheromone fusion strategy improves the diversity of populations with low information entropy; the elite ant learning strategy lets populations learn from the elite ants of an excellent population to improve the convergence speed of the algorithm; the pheromone recombination strategy smooths the excessively high pheromone on non-public paths and performs a local search to help the algorithm jump out of local optima.

This paper is organized as follows: Section 2 introduces the background of the traditional ant colony algorithm, information entropy and the Stackelberg game; Section 3 introduces the main innovations and contributions of this paper; Section 4 describes the comparative experiments and parameter settings; Section 5 summarizes our work and some future research directions.

2 Related research

2.1 Ant colony algorithm for solving the TSP

2.1.1 Path Selection

As shown in (1), ant k moves from city i to city j according to the pseudo-random proportional rule:

$$ j = \begin{cases} \arg\max\limits_{l \in N_{i}^{k}} \left\{ \tau_{il}\left[\eta_{il}\right]^{\beta} \right\}, & q \le q_{0}\\ J, & q > q_{0} \end{cases} $$
(1)

where q is a random number uniformly distributed in [0,1] and q0 is a fixed parameter in [0,1]. When q ≤ q0, ant k moves deterministically to the city that maximizes the product in (1); when q > q0, the next city J is chosen probabilistically according to (2).

$$ p_{ij}^{k} = \begin{cases} \dfrac{\left[\tau_{ij}\right]^{\alpha}\left[\eta_{ij}\right]^{\beta}}{\sum\limits_{l \in N_{i}^{k}} \left[\tau_{il}\right]^{\alpha}\left[\eta_{il}\right]^{\beta}}, & j \in N_{i}^{k}\\ 0, & j \notin N_{i}^{k} \end{cases} $$
(2)

where α is the pheromone heuristic factor; β is the heuristic factor of the greedy rule; \({N_{i}^{k}}\) is the set of cities that ant k can still choose from; τij is the pheromone concentration between node i and node j; ηij is the heuristic function, given by (3):

$$ \eta_{ij}=\frac{1}{d_{ij}} $$
(3)

where dij is the distance between node i and node j;
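For concreteness, the following sketch (with illustrative helper names, not taken from the paper) shows one way to implement the pseudo-random proportional rule of (1)-(3), assuming a pheromone matrix tau and a distance matrix dist indexed by city:

```python
import numpy as np

def select_next_city(current, unvisited, tau, dist,
                     alpha=1.0, beta=4.0, q0=0.8, rng=np.random.default_rng()):
    """Pseudo-random proportional rule, eqs. (1)-(3) (illustrative sketch)."""
    unvisited = np.asarray(unvisited)
    eta = 1.0 / dist[current, unvisited]          # heuristic information, eq. (3)
    if rng.random() <= q0:
        # exploitation branch of eq. (1): take the best edge deterministically
        return unvisited[np.argmax(tau[current, unvisited] * eta ** beta)]
    # exploration branch: roulette-wheel selection with the probabilities of eq. (2)
    weights = tau[current, unvisited] ** alpha * eta ** beta
    return rng.choice(unvisited, p=weights / weights.sum())
```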

2.1.2 Pheromone update

Local pheromone update rule: after an ant moves from city i to city j, the local pheromone on that edge is updated as shown in (4):

$$ {\tau_{ij}} \leftarrow \left( {1 - \rho } \right){\tau_{ij}} + \rho {\tau_{0}} $$
(4)

where ρ is the volatility factor of the local pheromone; τ0 is the initial value of the pheromone.

Global pheromone update rule: when all ants have finished constructing their tours, the algorithm performs a global pheromone update, as shown in (5):

$$ {\tau_{ij}} \leftarrow \left( {1 - \xi } \right){\tau_{ij}} + \xi {\Delta} \tau_{ij}^{bs} $$
(5)
$$ {\Delta} \tau_{ij}^{bs} = \frac{1}{{{C^{bs}}}} $$
(6)

where ξ is the volatility factor of the global pheromone; Cbs is the length of the global optimal solution; \({\Delta } \tau _{ij}^{bs}\) is the increment of the global pheromone, given by (6).
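The two ACS update rules of (4)-(6) can be sketched as follows; the tour representation (a list of city indices, closed back to the start) and the parameter values are illustrative assumptions:

```python
def local_update(tau, i, j, rho=0.1, tau0=1e-4):
    """Local pheromone update, eq. (4), applied right after an ant crosses edge (i, j)."""
    tau[i, j] = (1.0 - rho) * tau[i, j] + rho * tau0
    tau[j, i] = tau[i, j]                       # symmetric TSP

def global_update(tau, best_tour, best_length, xi=0.1):
    """Global pheromone update, eqs. (5)-(6), on the edges of the global-best tour."""
    delta = 1.0 / best_length                   # eq. (6)
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i, j] = (1.0 - xi) * tau[i, j] + xi * delta
        tau[j, i] = tau[i, j]
```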

2.2 Max-min ant colony algorithm

In order to improve the search efficiency of the ant colony algorithm and avoid premature convergence to a local optimum, the MMAS algorithm sets upper and lower limits on the pheromone. The bounds \(\tau_{\max}\) and \(\tau_{\min}\) are computed as shown in (7) and (8):

$$ \tau_{\max} = \frac{1}{\rho} \cdot \frac{1}{T^{gb}} $$
(7)
$$ \tau_{\min} = \frac{\tau_{\max}}{2n} $$
(8)

Where \(\tau_{\max}\) is the upper limit set by the algorithm for the pheromone; \(\tau_{\min}\) is the lower limit of the pheromone; \(T^{gb}\) is the length of the current optimal solution of the algorithm; ρ is the volatility factor of the pheromone; n is the number of city nodes.

2.2.1 Pheromone Update

The MMAS algorithm updates only the pheromone on the current optimal solution path; the updating rules are given by (9) and (10).

$$ \tau_{ij}\left( t + 1 \right) = \left( 1 - \rho \right)\tau_{ij}\left( t \right) + {\Delta} \tau_{ij}^{best} $$
(9)
$$ {\Delta} \tau_{ij}^{best} = \frac{1}{f\left( s^{best} \right)} $$
(10)

Where τij is the value of the pheromone between node i and node j in the MMAS algorithm; t is the iteration count; \({\Delta } \tau _{ij}^{best}\) is the pheromone increment on the edges traversed by the current optimal individual, obtained from (10); \(f\left ({{s^{best}}} \right )\) is the length of the current optimal solution.
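A minimal sketch of the MMAS update, combining (7)-(10) and clamping the trails to \([\tau_{\min}, \tau_{\max}]\), is given below; the parameter values are placeholders:

```python
import numpy as np

def mmas_limits(best_length, rho=0.1, n_cities=51):
    """Pheromone trail limits, eqs. (7)-(8)."""
    tau_max = (1.0 / rho) * (1.0 / best_length)
    tau_min = tau_max / (2.0 * n_cities)
    return tau_min, tau_max

def mmas_global_update(tau, best_tour, best_length, rho=0.1):
    """Evaporation plus best-only reinforcement, eqs. (9)-(10), then clamping to the limits."""
    tau_min, tau_max = mmas_limits(best_length, rho, n_cities=tau.shape[0])
    tau *= (1.0 - rho)                                  # evaporation term of eq. (9)
    for i, j in zip(best_tour, best_tour[1:] + best_tour[:1]):
        tau[i, j] += 1.0 / best_length                  # eq. (10)
        tau[j, i] = tau[i, j]
    np.clip(tau, tau_min, tau_max, out=tau)             # enforce eqs. (7)-(8)
```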

2.3 Information entropy

Information entropy was put forward by Shannon [29] to evaluate the degree of disorder of information. It is calculated as shown in (11):

$$ S\left( x \right) = - \sum\limits_{i = 1}^{n} {P\left( {{x_{i}}} \right)} {\log_{b}}\left( {P\left( {{x_{i}}} \right)} \right) $$
(11)

Where b is the base of the logarithm; P(xi) is the probability mass function; n is the number of solutions.

2.4 Stackelberg game

The Stackelberg game is a dynamic game of bounded rationality proposed by Stackelberg in 1953 [30]. The game divides the players into leaders and followers: the leader moves first, and the followers then make the decisions that best serve their own interests given the leader's decision, until a dynamic equilibrium is finally reached. The game model can be described as (12):

$$ H = \left\{ L \cup F,\ {\{ P_{l}\}_{l \in L}},\ {\{ P_{f}\}_{f \in F}},\ R_{l},\ R_{f} \right\} $$
(12)

where L is the selected leader and L ∪ F is the set of all participants; Pl is the leader's strategy set, Pf is the followers' strategy set, Rl is the leader's revenue function, and Rf is the followers' revenue function.

The goal of each player is to maximize its own revenue, so the objective functions of the leader and the followers are (13) and (14):

$$ \max_{{p_{l}} \in {P_{l}}} R\left( {{p_{l}}} \right) $$
(13)
$$ \max_{{p_{f}} \in {P_{f}}} {R_{n}}\left( {{p_{f}}} \right) $$
(14)

where \(R\left ({{p_{l}}} \right )\) and \({R_{n}}\left ({{p_{f}}} \right )\) are the objective functions of the leader and the follower respectively, and the whole leader-follower game aims at maximizing these two values.

In summary, when both sides of the game reach the Nash equilibrium, all participants obtain their maximum gain and no longer change their strategies.

3 Ant colony algorithm with Stackelberg game and multi-strategy fusion

The Stackelberg game is a classic game model in which the leader decides first and the followers decide after observing the leader's decision. Based on this idea, we design a Stackelberg game among multiple ant colonies. While the algorithm is running, the population that is most beneficial to the whole system is selected as the leader, while the others are regarded as followers. The leader then acts as a pioneer that explores paths on behalf of the others: after exchanging information with the followers, it trains for a number of iterations in advance. Once the leader has explored the paths, each follower applies the strategy most beneficial to itself to obtain the information explored by the leader. Finally, after exchanging information, the followers continue to explore paths. Figure 1 shows how the Stackelberg game between multiple populations works. The dynamic Stackelberg game model consists of the following steps:

Step 1::

Select the leader by (18);

Step 2::

The leader makes its decision first;

Step 3::

The followers make their decisions after observing the leader's decision;

Step 4::

Return to Step 1.

Fig. 1 The flow of the Stackelberg game

3.1 The establishment of Stackelberg game among ant colony algorithms

3.1.1 Parameters for evaluating population and algorithm state

Information entropy evaluates diversity

There are many indicators that can evaluate the diversity of an ant colony algorithm, such as the standard deviation and information entropy. For ant colony algorithms, maintaining diversity allows the ants to choose more different paths. We use information entropy to measure the diversity of the algorithm, as shown in (15):

$$ E\left( P \right) = - \sum\limits_{x \in X} {P\left( x \right)} \log \left( {P\left( x \right)} \right) $$
(15)

Where \(P\left (x \right )\) is the proportion of ant x's solution among the solutions of the current population; \(E\left (P \right )\) is the information entropy of all the solutions in the current population. The more dispersed the solutions are, the higher the information entropy, and thus the higher the diversity.
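As an illustration (assuming P(x) is estimated as the relative frequency of each distinct tour length produced by the ants of the current iteration), the diversity measure of (15) could be computed as follows:

```python
import numpy as np
from collections import Counter

def population_entropy(tour_lengths):
    """Information entropy of the population's solutions, eq. (15).

    P(x) is taken here as the relative frequency of each distinct tour length
    among the ants of the current iteration (an illustrative assumption)."""
    counts = np.array(list(Counter(tour_lengths).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())
```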

Convergence evaluation

Convergence indicates the convergence ability of the current population: the better the convergence, the faster the current algorithm converges. In this section, we use (16) to evaluate the convergence of the current population.

$$ conv = \frac{length_{\min }^{i} - length_{\min }}{iter^{j} - iter^{i}} $$
(16)

Where conv is the convergence of the current population; \(length_{\min}\) is the optimal solution of the current population; iterj is the current iteration number; \(length_{\min}^{i}\) is the previous optimal solution; iteri is the iteration at which the previous optimal solution was found.

Assess system status

Similarly, we use the convergence of the optimal solution to measure the current state of the algorithm. If the optimal solution converges rapidly, the system is in a period of rapid convergence, and the algorithm then needs the population with better convergence as the leader; otherwise, it needs the population with better diversity. We use (17) to express the convergence of the overall algorithm.

$$ CO = \frac{{iter_{\min }^{i}}}{{ite{r_{\min }}}} $$
(17)

Where \(iter_{\min}\) is the iteration number at which the current optimal solution was found; \(iter_{\min}^{i}\) is the iteration number of the previous optimal solution.

3.1.2 Multi-factor evaluation mechanism

After the planned number of iterations, the comprehensive evaluation index of each population is calculated by (18), and the population with the highest score is selected as the leader.

$$ {Y_{i}} = \left( {1 + C{O}} \right)con{v_{i}}E{\left( P \right)_{i}} $$
(18)

Where Yi is the comprehensive evaluation index of population i; CO reflects the overall convergence of the algorithm and is obtained from (17); convi is the convergence of population i; \(E{\left (P \right )_{i}}\) is the information entropy of population i, which represents the diversity of the population.
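A compact sketch of the three indicators and the resulting score of (16)-(18) is given below; the function and argument names are illustrative, not the paper's implementation:

```python
def convergence(prev_best, cur_best, prev_iter, cur_iter):
    """Convergence speed of a population, eq. (16): improvement per iteration."""
    return (prev_best - cur_best) / max(cur_iter - prev_iter, 1)   # guard against /0

def system_state(prev_best_iter, cur_best_iter):
    """Overall convergence indicator CO, eq. (17)."""
    return prev_best_iter / cur_best_iter

def comprehensive_score(co, conv_i, entropy_i):
    """Multi-factor evaluation index Y_i, eq. (18); the largest score wins the leader role."""
    return (1.0 + co) * conv_i * entropy_i
```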

3.1.3 The choice of leader

In our system, the leader exists to lead the other populations because of its distinct advantages over them. Equation (18) is used to evaluate the overall performance of a population, and there are several reasons for using these three variables as the measure of a population's comprehensive ability. First, how the leader is chosen depends on the current state of the system: if the system is in a period of rapid convergence, we need the population with better convergence and higher precision as the leader; if the algorithm is currently converging slowly, the whole system needs populations with good diversity as leaders to help find more different paths. Equation (17) lets us judge the convergence of the algorithm as a whole and, from another perspective, the current state of the algorithm. Second, we also need to consider the state of each individual population, so the evaluation indices of convergence and diversity are added to the formula; they reflect the state of a single ant colony to some extent. Therefore, by combining these three parameters, the index considers both the overall performance of the algorithm and the state of a single ant colony, and thus reflects the comprehensive performance of the population.

3.1.4 Number of training iterations for leaders

In order to explore more paths, the leader trains for M iterations in advance. There are many ways to determine M, such as directly specifying M as a fixed value, but this is not the best way. In this section, we use the similarity between populations (measured by the cosine similarity of (20)) to determine whether the leader has finished training. As shown in (19), when L is greater than the threshold l, the leader's training is considered complete, and the number of training iterations it has performed is M. The selection steps for M are shown in Table 1 below.

$$ L = \frac{{\left| {S_{M}^{leader} - S_{M}^{follower\_\min }} \right|}}{{S_{M}^{leader}}} \times 100\% $$
(19)

Where \(S_{M}^{leader}\) is the cosine similarity of the leader; \(S_{M}^{follower\_{\min}}\) is the minimum cosine similarity among the followers; L represents the degree of difference between the leader and the followers. The greater the value of L, the more obvious the difference.

Table 1 The calculation process of M

3.1.5 Cosine similarity

In this section, we use cosine similarity to evaluate the similarity between populations, as shown in (20):

$$ {S_{M}} = \frac{{A \cdot B}}{{\left| A \right| \times \left| B \right|}} $$
(20)

Where A and B are 2-dimensional vectors made up of conv (calculated by (16)) and E (calculated by (15)), which represent the state of the current population to some extent. As can be seen from (20), the higher the value of SM, the more similar the two populations are.
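The following sketch combines (19) and (20); the stopping threshold is a placeholder, since the paper denotes it only as l:

```python
import numpy as np

def cosine_similarity(state_a, state_b):
    """Cosine similarity, eq. (20); each state is the 2-D vector (conv, E)."""
    a, b = np.asarray(state_a, dtype=float), np.asarray(state_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def leader_training_finished(s_leader, s_follower_min, l_threshold=10.0):
    """Difference measure L of eq. (19), in percent; training stops once L > l.

    The value of l_threshold here is illustrative, not taken from the paper."""
    L = abs(s_leader - s_follower_min) / s_leader * 100.0
    return L > l_threshold
```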

3.2 Multi-strategy fusion mechanism

After the populations make their choices, they exchange information, which is achieved by selecting among various mechanisms. If a population chooses to cooperate after the game, it exchanges information with the cooperating population. For a population, there are many kinds of information that can be exchanged, such as pheromone, optimal solutions and common paths; therefore, the choice of which information to exchange is a key issue for the participants.

In this section, we define different types of information. Different kinds of information about an ant colony reflect the current state of the colony in different ways. For example, the pheromone of a population encodes the population's historical path selections, and communicating pheromone among populations can help them increase the choice of different paths, thus improving the diversity of the algorithm. Therefore, populations that choose different information to communicate usually obtain different effects.

We present three strategies with which populations communicate different information in order to improve the performance they need. First, the pheromone fusion strategy is used to improve the exchange of experience between populations. In general, the pheromone matrix of an ant colony algorithm records the paths chosen in the history of the population; it reflects the fuzzy experience of the population to some extent and carries its path dependence. If a population exchanges pheromone, it gains experience from other populations and increases the probability of choosing alternative paths, as shown in Section 3.2.1. Second, the elite ant learning strategy increases the influence of the dominant population. The elite individuals of the dominant population carry the excellent characteristics of that population, and learning from them can improve the convergence of the other populations, as shown in Section 3.2.2. Finally, we propose the pheromone recombination strategy to help the algorithm avoid stagnation when a population is trapped in a local optimum, as shown in Section 3.2.3.

3.2.1 Pheromone fusion strategy

The pheromone fusion strategy is applied when a population has reached the specified communication time, has converged to a certain extent, and its diversity has decreased. First, the information entropy of each population, which represents its diversity, is calculated by (15). Second, the population with the highest information entropy is selected, and a new pheromone matrix is formed by linear fusion according to the pheromone fusion rule. By recombining the population's pheromone in this way, we improve the exploration of other paths, which in turn increases the diversity of the population. Since this strategy reduces the convergence of the algorithm, we use the elite ant learning strategy of Section 3.2.2 to balance the diversity of the population and avoid the disadvantage of slow convergence. The pheromone sharing between ant colonies is given by (21).

$$ P{h^{k}} = \left( {1 - S_{M} } \right)P{h^{k}} + S_{M} P{h^{m}} $$
(21)

Where Phk is the pheromone matrix of the current ant colony; Phm is the pheromone matrix of the population selected for pheromone communication; SM is the similarity of the two populations, obtained from (20). According to (21), a new pheromone matrix is formed by linearly combining the pheromone matrices of the two populations in a certain proportion.
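A one-line sketch of the fusion rule (21), assuming the pheromone matrices are NumPy arrays:

```python
import numpy as np

def fuse_pheromone(ph_k, ph_m, s_m):
    """Pheromone fusion strategy, eq. (21): linear blend weighted by the similarity S_M.

    ph_k is the current colony's pheromone matrix, ph_m the partner colony's,
    and s_m their cosine similarity from eq. (20)."""
    return (1.0 - s_m) * np.asarray(ph_k) + s_m * np.asarray(ph_m)
```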

3.2.2 Elite ant learning strategy

After using the pheromone fusion strategy to increase the population diversity, we need a strategy to increase the convergence rate of the algorithm and reduce the conflict between convergence and diversity. In this paper, (17) is used to evaluate the overall convergence of the algorithm. When the convergence value CO of the algorithm is less than ω1 (ω1 is the convergence threshold), we enable the elite ant learning strategy: we select the population with the highest accuracy as the learning target and learn from the elite ants of this population to accelerate the convergence of the algorithm.

In this section, the ants whose solutions rank in the top K% of the population are considered elite ants. When a population learns from an elite ant, it does so by rewarding the pheromone on the elite ant's path, using (22). The learning process is shown in Fig. 2.

$$ {P_{n}} = \left( {1 + \frac{{CO}}{{n}}} \right)P $$
(22)

Where P is the value of the pheromone on the public path; Pn is the new value of the rewarded pheromone on the common path; n is the number of city nodes; CO is the evaluation of the overall convergence of the algorithm, obtained from (17). At the beginning of the iterations, the algorithm converges quickly and the value of CO is relatively large, so more pheromone is rewarded on the common path and the convergence of the algorithm is accelerated; as the number of iterations increases, the convergence rate slows down, and the pheromone reward on the paths of the elite ants is reduced.

Fig. 2 Schematic diagram of learning strategies of elite ants
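A possible realization of the reward rule (22) is sketched below; elite tours are assumed to be lists of city indices, and the symmetric pheromone matrix is updated in place:

```python
def elite_learning(tau, elite_tours, co, n_cities):
    """Elite ant learning strategy, eq. (22): reward the pheromone on elite ants' edges.

    co is the overall convergence indicator of eq. (17), so the reward factor
    (1 + CO/n) shrinks automatically as the algorithm's convergence slows down."""
    factor = 1.0 + co / n_cities
    for tour in elite_tours:
        for i, j in zip(tour, tour[1:] + tour[:1]):
            tau[i, j] *= factor
            tau[j, i] = tau[i, j]
```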

3.2.3 Pheromone recombination strategy

The classical ant colony algorithm easily falls into a local optimum (when the optimal solution does not change for more than 200 iterations, the algorithm is considered trapped in a local optimum), and the more nodes the problem has, the more easily the algorithm stagnates. Therefore, we need a strategy that helps the ant colony algorithm jump out of local optima. From the perspective of the solving mechanism of the ant colony algorithm, the reason it falls into a local optimum is usually that the pheromone concentration on some edges is too high, which reduces the possibility that the ants find other paths. From this perspective, we can help the algorithm escape a local optimum by changing the distribution of the pheromone. There are many ways to reset the pheromone of an ant colony; the simplest is to re-initialize the pheromone matrix, but its disadvantage is that the accumulated experience of the colony is lost, which leads to low efficiency. Therefore, we need to keep the colony's useful experience in the pheromone matrix while removing the pheromone on paths with excessively high concentrations. The specific operation of this strategy is as follows:

Step 1::

Find the common paths between the populations;

Step 2::

Keep the pheromone concentration on the public paths, and smooth the pheromone on the non-public paths using (23);

Step 3::

Keep the order of the public paths, and carry out a local search on the non-public paths with the ant colony algorithm;

Step 4::

Update the pheromone on the non-public paths;

Step 5::

Search all paths again.

$$ {P_{l}} = \frac{{\left( {{P_{\max }} + {P_{\min }}} \right){P_{best}}}}{2} $$
(23)

Where Pmax is the maximum value in the pheromone matrix; Pmin is the minimum value in the colony's pheromone matrix; Pbest is the pheromone on the optimal solution of the current population when the algorithm stagnates. Through (23), we reduce the excessively high pheromone on the optimal-solution path of the stagnant population, reassigning these edges a value based on the average of the maximum and minimum values of the pheromone matrix. Doing so has two benefits: first, it reduces the pheromone on edges with very high concentrations; second, the population retains most of its original path-finding experience, which avoids reducing the efficiency of finding solutions.
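The following sketch illustrates one way to realize the recombination step under these definitions; the choice of exactly which edges are smoothed is our reading of the strategy, not the paper's code:

```python
import numpy as np

def recombine_pheromone(tau, best_edges, common_edges, p_best):
    """Pheromone recombination strategy, eq. (23).

    best_edges: edges of the stagnant colony's best tour; common_edges: edges shared
    with the other populations' best tours (kept unchanged); p_best: the pheromone
    level on the stagnant colony's best tour."""
    p_l = (tau.max() + tau.min()) * p_best / 2.0        # smoothed value, eq. (23)
    for i, j in best_edges:
        if (i, j) not in common_edges and (j, i) not in common_edges:
            tau[i, j] = tau[j, i] = p_l                 # smooth only non-public edges
```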

3.3 Algorithm framework

First, the ACS algorithm and the MMAS algorithm are chosen as the ant colony algorithms of the different populations. Then, a certain number of iterations are used to train the parameters of each population. After this training, the parameters are fed into the Stackelberg game model, the state of each population is evaluated comprehensively, the highest-rated population becomes the leader and the rest become followers, and the next training period of the populations is calculated. The leader takes the lead and trains for M iterations, after which the followers obtain information from it through the multi-strategy fusion mechanism. The followers then carry out their own search for solutions. Finally, the algorithm keeps looping until the stopping requirement is met. The pseudo-code and flowchart of the MSACS algorithm are shown in Table 2 and Fig. 3.

Table 2 Table of algorithmic framework
Fig. 3 Flowchart of the algorithm
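Since the paper's pseudo-code is given in Table 2, the following is only a simplified sketch of the same loop, reusing the comprehensive_score helper sketched in Section 3.1.2; all Population methods and helper names are illustrative assumptions:

```python
def msacs_main_loop(populations, max_iter, comm_interval):
    """Simplified sketch of the MSACS framework (illustrative method names)."""
    it = 0
    while it < max_iter:
        # 1. every colony searches independently for a while
        for pop in populations:
            pop.run(comm_interval)
        it += comm_interval

        # 2. Stackelberg game: score each colony with eq. (18); the best becomes the leader
        scores = [comprehensive_score(p.co(), p.conv(), p.entropy()) for p in populations]
        leader = populations[scores.index(max(scores))]
        followers = [p for p in populations if p is not leader]

        # 3. the leader explores ahead for M iterations, M decided via eqs. (19)-(20)
        it += leader.train_ahead(followers)

        # 4. followers obtain the leader's information via the multi-strategy fusion mechanism
        for pop in followers:
            pop.apply_fusion_strategies(leader)

    best = min(populations, key=lambda p: p.best_length())
    return best.best_tour(), best.best_length()
```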

Compared with the literature [14] to [19] mentioned in the introduction, the MSACS algorithm balances convergence and diversity very well through the multi-strategy fusion mechanism. In addition, compared with the literature [20] to [24], modelling the leader-follower game between multiple swarms can effectively exploit the characteristics of the different swarms and improve the synergy between ants.

4 Experiment and simulation

In order to verify the performance of the MSACS algorithm, 30 different TSP instances from the TSPLIB library are selected to test the algorithm, and 20 experiments are carried out for each instance. The experiments are run under the Windows 10 operating system, and Matlab 2019a is used for the simulation.

In Section 4.1, the orthogonal experimental method is used to select the parameters of MMAS and ACS, and the best parameter combination is selected through experiments. Since the improved multi-population ant colony algorithm is composed of the ACS algorithm and the MMAS algorithm, the improved algorithm also uses these best parameters.

In Section 4.2, we analyze the validity of the MSACS strategies: Section 4.2.1 analyzes the effectiveness of the multi-strategy fusion mechanism, and Section 4.2.2 analyzes the training time of the leader.

In Section 4.3.1, we compare the performance of MSACS, ACS and MMAS on the 30 TSP instances, and analyze the differences between the results of the three algorithms through the rank-sum test. In Section 4.3.2, the MSACS algorithm is compared with other ant colony algorithms and intelligent algorithms.

4.1 Parameter setting

In this section, we choose the best parameter combinations for the MSACS, ACS and MMAS algorithms through orthogonal tests. Tables 3, 4 and 5 show the experimental results of the parameter selection of MMAS; Tables 6, 7 and 8 show the experimental results of the parameter selection of ACS. Each experiment uses eil51 as the instance, and each set of parameters is run 30 times. Based on these experiments, Table 5 shows that the best parameter combination of the MMAS algorithm is α = 1, β = 4, ρ = 0.1, and Table 8 shows that the best parameter combination of the ACS algorithm is α = 2, β = 4, ρ = 0.4, ζ = 0.2, q0 = 0.8.

Table 3 Experimental factors and levels of MMAS
Table 4 Results of MMAS orthogonal test
Table 5 Test results of MMAS
Table 6 Experimental factors and levels of ACS
Table 7 Results of ACS orthogonal test
Table 8 Test results of ACS

In Tables 5 and 8: T is the sum of the results; t is the average of each level; range is the difference between the maximum and the minimum, and is used to determine which factor is important (a larger range usually means a more important factor); the scheme is the best combination obtained from the orthogonal test of each factor.

4.2 Policy testing and performance analysis

4.2.1 Analysis of the effectiveness of the mechanism

The proposed multi-strategy fusion mechanism includes three strategies: the pheromone fusion strategy, the elite ant learning strategy and the pheromone recombination strategy. First, the pheromone fusion strategy is applied to increase the diversity of the population; second, the elite ant learning strategy helps the algorithm improve the convergence speed by learning from the elite individuals of the dominant population; finally, the pheromone recombination strategy is applied to improve the probability of the algorithm jumping out of a local optimum.

We use MSACS-1 to denote the MSACS algorithm without the pheromone fusion strategy, MSACS-2 the MSACS algorithm without the elite ant learning strategy, and MSACS-3 the MSACS algorithm without the pheromone recombination strategy. Three TSP instances (kroA100, kroB150 and A280) are used to test the effectiveness of the strategies; each instance is run in 20 experiments, each of 2000 generations. The comparison uses the following measures: the optimal solution (best), the worst solution (worst), the average solution (mean) and the average deviation (MeanError(%)). The experimental results are shown in Fig. 4 and Table 9.

Fig. 4 Convergence comparison of MSACS, MSACS-1, MSACS-2 and MSACS-3

Table 9 Comparison of MSACS, MSACS-1, MSACS-2 and MSACS-3

As shown in Table 9 and Fig. 4, the convergence speed and solution quality of MSACS are better than those of MSACS-1, MSACS-2 and MSACS-3. In addition, because it lacks the elite ant learning strategy, MSACS-2 has the slowest convergence among the compared algorithms, which verifies that this mechanism can increase the speed of convergence. Finally, in the comparison on A280, the accuracy of MSACS-3 is the lowest among the compared algorithms, because the missing pheromone recombination strategy causes it to fall into a local optimum in the late stage of most experiments.

4.2.2 Analysis of the frequency of information exchange

In this paper, the leader selected through the Stackelberg game needs to run for a period of M iterations before the other populations. In order to determine the optimal running time, experiments were conducted with M equal to 50, 100, 150, 200, 250, 300 and 350 iterations, and with the dynamic value fq (determined as described in Section 3.1.4). The TSP instances eil51, eil76, kroA100 and ch130 were each run in 20 groups for 2000 iterations. The results are shown in Table 10, which reports the average solution of the 20 experiments. According to the table, the best choice is fq, which is determined by the similarity between the populations.

Table 10 The choice of M value

4.3 Comparative experimental analysis

4.3.1 Contrastive analysis with classical ant colony algorithm

In order to compare the abilities of the ACS, MMAS and MSACS algorithms, we selected 30 different TSP instances for comparative analysis. The experiment is analyzed with the following evaluation measures: the optimal solution (best), the minimum error (PDbest), the worst solution (Worst), the worst error (PDworst), the average solution (Average), the average error (PDavg), the iteration of the optimal solution (iters), the standard deviation (std), and the rank-sum test (P). The results of the experiment are presented in Tables 11, 12 and 13. The minimum error rate is obtained from (24); the average error rate is calculated by (25); the maximum error rate is obtained from (26); the standard deviation is obtained from (27).

$$ PD\_best = \frac{{{L_{B}} - {L_{\min }}}}{{{L_{\min }}}} \times 100\% $$
(24)
$$ PD\_avg = \frac{{{L_{AVG}} - {L_{\min }}}}{{{L_{\min }}}} \times 100\% $$
(25)
$$ PD\_worst = \frac{{{L_{W}} - {L_{\min }}}}{{{L_{\min }}}} \times 100\% $$
(26)
Table 11 Performance comparison of MSACS in different TSP instances
Table 12 Performance comparison of MSACS and ACS in different TSP instances
Table 13 Performance comparison of MSACS and MMAS in different TSP instances

Where LB is the optimal solution found by the algorithm; LAVG is the average of the N optimal solutions; LW is the worst of the N optimal solutions; Lmin is the known optimal solution of the TSP instance.

$$ dev = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {{{\left( {{l_{i}} - {l_{avg}}} \right)}^{2}}} } $$
(27)

Where N is the number of independent runs; li is the best solution of the ith experiment; lavg is the average of the N solutions.
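For reference, the error measures (24)-(27) over the best solutions of N independent runs can be computed as follows (a small illustrative helper, not part of the paper's code):

```python
import numpy as np

def pd_metrics(run_bests, optimum):
    """Error measures of eqs. (24)-(27) over the best solutions of N runs."""
    run_bests = np.asarray(run_bests, dtype=float)
    pd_best  = (run_bests.min()  - optimum) / optimum * 100.0        # eq. (24)
    pd_avg   = (run_bests.mean() - optimum) / optimum * 100.0        # eq. (25)
    pd_worst = (run_bests.max()  - optimum) / optimum * 100.0        # eq. (26)
    std      = np.sqrt(((run_bests - run_bests.mean()) ** 2).mean()) # eq. (27)
    return pd_best, pd_avg, pd_worst, std
```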

As shown in Table 11, the MSACS algorithm found the optimal solution in 21 of the 30 TSP instances. For the other TSP instances, the percentage deviation from the optimal solution is also relatively small: 7 TSP instances have an error of less than 1%, and the remaining three instances have an optimal-solution deviation of less than 2%. In the average error comparison, the value of PDavg is less than 1% in most instances (the exceptions are kroB200, lin318, pr439, att532, p654 and d1291, whose PDavg values are 1.47%, 1.89%, 2.53%, 2.64% and 3.63%); most PDworst values have percentage deviations of less than 4% (except p654 and d1291, whose PDworst values are 6.08% and 7.44%).

As can be seen from Tables 12 and 13, the MSACS algorithm outperforms the traditional MMAS and ACS algorithms in all properties on all TSP instances. In TSP instances with fewer than 200 city nodes, MSACS can find the optimal solution in most cases, unlike the ACS and MMAS algorithms. In addition, due to the elite ant learning strategy, MSACS has the fastest convergence speed of the three algorithms. In the instances with between 200 and 500 city nodes, MSACS still has high accuracy, because the pheromone fusion strategy and the elite ant learning strategy let the algorithm converge while maintaining good diversity. For instances with more than 500 city nodes, the MSACS algorithm is far more accurate than the other two algorithms (especially on d1291, where the error of the optimal solution of MSACS is 1.34%, while the errors of the MMAS and ACS algorithms are 6.98% and 8.47%, respectively). Moreover, in the comparison of the average deviation, the average error of MSACS is lower than that of the other two algorithms, which indicates that the solution stability of the MSACS algorithm is also better.

Figure 5 compares the standard deviation of the MSACS, ACS and MMAS algorithms. It can be seen from the figure that the standard deviation of MSACS is the smallest in most TSP instances. The standard deviation reflects the stability of the algorithm to some extent: the smaller the standard deviation, the better the stability. From the comparison in the figure, the stability of the MSACS algorithm is the best among the three algorithms.

Fig. 5 The stability of the different algorithms

Figure 6 shows the convergence curves of MSACS, ACS and MMAS on the selected TSP instances. As can be seen from the figure, the MSACS algorithm has a higher convergence speed and accuracy than the MMAS and ACS algorithms. Since MSACS uses the pheromone recombination strategy, it alleviates the tendency of the ACS and MMAS algorithms to fall into local optima, and achieves better accuracy on the various TSP instances.

Fig. 6 Convergence comparison of MSACS, ACS and MMAS

Figure 7 shows the optimal paths found by the MSACS algorithm for eight different TSP instances.

Fig. 7 Best tours for each TSP instance found by MSACS

The P values in Tables 12 and 13 are the results of the rank-sum test, which reflects the difference between samples. As can be seen from the tables, the results of MSACS are significantly different from those of MMAS and ACS in most cases.

Overall, the MSACS algorithm enhances the accuracy of the algorithm, speeds up the convergence, and reduces the possibility of algorithm stagnation. The solving ability of MSACS is better than that of MMAS and ACS.

4.3.2 Comparative analysis with other ant colony algorithms and other intelligent algorithms

To further verify the effectiveness of the improved MSACS algorithm, we compare it with other improved ant colony algorithms and other intelligent algorithms in this section; the results are shown in Tables 14, 15, 16, 17, 18 and 19. In Table 14 the MSACS algorithm is compared with the PACO-3OPT algorithm [31]: for instances with fewer than 200 nodes, the MSACS algorithm finds the optimal solution, while the PACO-3OPT algorithm finds it only in part of the instances; in the comparison of the optimal, worst and average solutions, the MSACS algorithm performs better than the PACO-3OPT algorithm in most of the TSP instances. Furthermore, the MSACS algorithm is compared with the HAACO algorithm [32] in detail in Table 15; similarly, MSACS outperforms HAACO in the comparison of all city instances.

Table 14 Comparison of MSACS and PACO-3OPT in TSP instances
Table 15 Comparison of MSACS and HAACO in TSP instances
Table 16 Comparison of MSACS and DSFLA in TSP instances
Table 17 Comparison of MSACS and DJAYA in TSP instances
Table 18 Comparison of MSACS and other algorithms in TSP instances
Table 19 Comparison of MSACS and other algorithms in TSP instances

Furthermore, other intelligent algorithms are compared with MSACS. In Tables 16 and 17, the DSFLA and DJAYA algorithms are compared with the MSACS algorithm, respectively. In addition to being better in the comparison of the optimal and average solutions, the MSACS algorithm also shows a smaller variance than the other two algorithms.

Finally, to better evaluate the capability of the MSACS algorithm, more comparisons are shown in Tables 18 and 19, where DFFO [4], DEACO [19], CCMACO [33], HDACO [34], HGA [35], HDABC [36], GA-MARL+NICH-LS [37], SSABC [38], AG-BSO [39], PSO-ACO-3opt [40], MARL+NICH-LS [37], IBA [41], DWCA [42], HMMA [43], DBACS [44], DCS [45], ABC-3OPT [46], DSMO [47], JCACO [27] and MGACACO [48] are compared with the MSACS algorithm on the optimal solution. In this comparison, the solution accuracy of the MSACS algorithm is better than that of the other algorithms, which shows that MSACS expands the search space by selecting the best strategy through the Stackelberg game and thus achieves better accuracy.

5 Conclusion

In this paper, we propose an ant colony algorithm with Stackelberg game and multi-strategy fusion. First, the MMAS algorithm and the ACS algorithm are used to construct a heterogeneous multi-population ant colony algorithm, so that the advantages of the different algorithms are used to balance the convergence and diversity of the algorithm. Second, a Stackelberg game with a dynamically changing leader is established among the multiple populations; in this game, the overall state of the algorithm and the attributes of each population are considered comprehensively to select the leader. The leader then improves the search efficiency by training for a certain number of iterations in advance to explore more paths. Finally, the multi-strategy fusion mechanism is used to improve the information exchange among the populations, in which three strategies are proposed. Strategy 1 is the pheromone fusion strategy, under which the pheromone matrices of the populations that need to communicate are combined into a new pheromone matrix through a linear combination with certain weights; this strategy improves the ability of the ants to explore different paths and increases the diversity between populations. Strategy 2 is the elite ant learning strategy, under which a population learns from the experience of the elite ants of the dominant population through pheromone updating to improve its convergence rate. Strategy 3 is the pheromone recombination strategy, which helps a population jump out of a local optimum: when the population is in a locally optimal state, the pheromone on the common paths between populations is retained, the pheromone on the non-public paths is smoothed, and then a local search is carried out.

The experimental results show that, compared with the traditional ant colony algorithms, improved ant colony algorithms and other intelligent algorithms, the MSACS algorithm has better convergence speed and precision, and produces higher-quality solutions on large-scale instances.

In addition, the ant colony algorithm presented in this paper can also be applied to practical problems such as robot path planning: in a typical solution, rasterizing the map turns robot path planning into a node optimization problem similar to the TSP. In our algorithm, the Stackelberg game model can improve the collaboration of the ants during path planning, and the multi-strategy fusion mechanism can help the algorithm balance diversity and convergence and increase its probability of jumping out of a local optimum when it stalls.

In future work, we will further investigate the essence of the solution-seeking process of the ant colony algorithm, improve its convergence speed and solving precision on large-scale node problems, and apply it to practical engineering problems.