1 Introduction

Global urbanization is resulting in increased traffic congestion in major cities, which not only negatively impacts productivity and causes significant economic losses but also leads to serious environmental pollution and safety accidents that endanger human life and health. How to utilise current traffic and road resources to relieve urban congestion in the event of restricted growth of existing urban area has emerged as a hot topic that requires urgent resolution in the development of modern cities [1]. Currently, the following is the primary strategies for reducing urban traffic congestion:

  • Control the dynamic optimization of traffic signals using technologies like SCOOT (split cycle offset optimization method) [2] and SCATS (sydney coordinated adaptive traffic system) [3]. This kind of approach primarily relieves traffic congestion from the standpoint of macro-traffic flow and increases the rate at which road resources are utilized by accelerating the circulation speed of driving vehicles, but it ignores the various traffic requirements of each microscopic driving vehicle;

  • Deterministic algorithms, such as the A* algorithm [4], the Dijkstra shortest path algorithm [5], the dynamic programming method [6], etc., make up the majority of the path navigation algorithms for a single moving vehicle. PSO algorithm [7], genetic algorithm [8], colony optimization algorithm [9], neural network algorithm [10], and other intelligent algorithms. algorithms for traffic optimization, as those used by TomTom [11], Google Navigation [12], etc. All of the aforementioned techniques are non-negotiation algorithms, and the issue is that: When determining the best route for each vehicle, the interaction between vehicles is not taken into consideration, which may cause many vehicles to congregate in one area of the road, causing new traffic congestion and extending the driving times of vehicles.

  • Optimization method based on vehicle or route data. Some researchers choose a selection of roads with comparable driving times based on past vehicle path data and propose them to driving cars based on probability [13] in order to spread vehicles on various paths to relieve traffic congestion. Alternatively, a charge dependent on the level of path congestion might direct the vehicle’s choice of route [14] or by choosing the vehicle’s route through the interactive game with the road network center [15].

The previously mentioned methods have been effective in distributing traffic and reducing congestion to some extent. However, they either depend on fixed data or lack collaboration when determining a vehicle’s route based on real-time traffic updates. It either lacks computing efficiency and is unable to offer quick and precise path selection [16]. Additionally, in the approaches mentioned above, the vehicle primarily chooses the route based on trip time, without taking additional preference indications into account, and without flexible personalized choices.

This paper introduces a distributed system made up of multiple autonomous agents [17]. Each vehicle agent has the characteristics of self-regulation and cooperation, and can effectively conduct real-time information exchange with other vehicle agents [18]. When the car approaches each intersection, the route must be constantly updated in conjunction with various preferences for the path ahead and real-time traffic data in order to increase the accuracy of routes. The preferred path is selected by each vehicle to maximize its own utility, but in order to decrease the impact of multiple vehicles on path selection, the optional path must be configured by each vehicle and combine real-time traffic data to work together [19]. How can all vehicles in the entire urban road network cooperate to maximize their individual utility while the utility of the entire road network is maximized, which is a random congestion game problem based on incomplete information [20]? Assuming every vehicle is aware of its potential routes and its realized benefits, but is unaware of the benefits of routes not chosen. Players engage in the game over multiple levels to grasp its mechanics. They choose strategies for each level based on past experiences and observations. Over time, with consistent selection behavior, players’ strategies are likely to gravitate towards the Nash equilibrium [21]. This study introduces an adaptive learning algorithm that aims to reach the Nash equilibrium, providing a way to optimize traffic utility while also ensuring the best performance for each vehicle.

In summary, the main contributions of this paper can be summarized as follows:

  • Built an urban traffic network dynamic real-time multi-intersection route selection model. It may efficiently avoid mutual route conflicts and provide more precise, individualized route selection for driving cars by taking into account the route selection methods of many vehicles at various junctions at the same time;

  • The accuracy and efficiency of vehicle routing are greatly enhanced by the newly developed calculation method for the utility value of alternative routes for driving vehicles, which not only takes into account the various preferences of vehicles for route selection and dynamic uncertain real-time road condition information, but also takes into account the impact on road traffic conditions;

  • An adaptive learning algorithm is introduced to implement a dynamic route selection approach for vehicles, aiming to reach the Nash equilibrium and ensure a balanced traffic distribution across available routes. This is done while taking into account the preferences of each vehicle, using information exchange and negotiation between them. The primary goal of this algorithm is to enhance the efficient use of urban road network resources.

The rest of this article is divided into the following sections. The associated study is introduced in Sect. 2 of this publication. The suggested system model is introduced in Sect. 3. The adaptive learning approach for dynamic route selection is described in Sect. 4 along with its convergence. In order to demonstrate the suggested model’s efficacy, Sect. 5 compares it to both the simulation and the actual urban traffic network. Final observations are provided in Sect. 6.

2 Literature review

Reference [22] integrated the PSO algorithm to optimize the traffic management strategy and used the junction control information to carry out dynamic route navigation for cars in order to make maximum use of urban road resources and decrease the average delay time of vehicles. Reference [23] segmented the metropolitan road network into several regions, forecasted the macroscopic traffic flow in each region, and carried out dynamic route planning for vehicles across regions, minimizing traffic jams between neighboring regions and lowering the computation scale of centralized control. A reinforcement learning technique was presented in Reference [24] to enable traffic light signals to cooperate with one another and make adaptive modifications in response to real-time traffic circumstances, therefore reducing traffic delays. Using a non-centralized collaborative agent, reference [25] suggested a multi-agent-based distributed green wave adaptive control system for controlling traffic lights in order to improve traffic flow. The aforementioned techniques, however, primarily undertake route counselling from the standpoint of macro-traffic flow, without taking into account the specific driving requirements of microscopic cars.

Reference [7] suggested a quantum PSO method to plan the vehicle’s course in order to increase the classic PSO algorithm’s global search capability and convergence speed. The parameters that must be taken into account in the traffic process were precisely modelled by reference [26] using the AHP in conjunction with fuzzy reasoning technology, and dynamic route navigation for moving cars was carried out. References [5, 27] suggested a path planning model based on a multi-agent reinforcement learning algorithm that determines the best route for the vehicle by accounting for a variety of influencing elements during vehicle operation and assigning varying weights to each one. Ant colony optimization technique was incorporated in references [9, 28] to design and investigate vehicle pathways. In order to increase vehicle driving economy, reference [29] suggested a route optimization approach based on traffic flow prediction, paired with an effective dynamic enhanced A* route search algorithm. The aforementioned techniques, which are all non-negotiation algorithms, can all help shorten the distance travelled by a single vehicle, but they do not account for the routing effects of other cars when routing simultaneously.

Reference [13] proposed a route recommendation method based on super-path, select a group of routes with similar driving times and recommend them to driving vehicles according to probability, and disperse traffic flow in a group of similar routes in order to avoid new congestion caused by vehicles choosing the same route. However, this approach is based on static historical traffic data rather than dynamic real-time traffic statistics. The reference [30] attempted to spread out the cars to the way with low traffic flow, but did not fully use the capacity of the actual shortest path, which would induce some vehicles to the far path. Reference [31] uses traffic signals to gather real-time traffic data, prevent abrupt traffic congestion, and equally distribute cars along the whole road network. However, it does not take into account the driving requirements of tiny vehicles; instead, it addresses the collaboration between traffic signals and relevant road segments. Reference [32] lowered the average trip time of cars by performing path navigation from the viewpoint of tiny vehicles, sorting and navigating by comparing the impact of vehicles on the total road network traffic, and so on. The collaboration between cars, however, is not taken into consideration by this strategy. To direct cars, references [14, 33] employed pricing strategy and a dynamic toll policy. The extended anti-Stackelberg game was used in reference [15] to help the traffic centre direct the motorist towards the system’s ideal path. Reference [34] looked for an efficient path allocation strategy that collaborates with road network administrators, information centers, and vehicles to fully use the capacity of the road network in space and time. The calculating efficiency of the above strategy, which is mostly a master–slave game akin to centralized agent collaboration, is not very high. Reference [17] is based on virtual agent negotiation for path guidance between cars to reduce traffic jams. By contrasting three negotiating strategies, the link between the individual interests of cars and the interests of the overall metropolitan road network is investigated. The correlation and impact of traffic capacity across road segments when cars at numerous crossroads choose routes simultaneously, as well as the influence of unforeseen traffic occurrences on vehicle route selection, are not taken into consideration.

Through various sensors deployed inside vehicles or outside vehicles, drivers or transportation administrators can clearly know the concrete status of vehicles or the traffic conditions so as to make better transportation scheduling decisions. In [35,36,37,38], the authors analyzed the time-aware traffic flow data and prediction so as to aid more scientific and reasonable transportation decision-makings. Since networks play a key role in data-driven intelligent decision-makings [39, 40], novel technologies have been taken into consideration to improve the transportation scheduling performances, such as pricing mechanism [41], digital twin [42] and blockchain [43].

In conclusion, there are issues with the current study on vehicle routing based on urban road networks, including the simplicity of routing indications, a lack of accuracy, and a lack of dynamic collaboration. In order to adequately capture the dynamic strategy when several cars at various crossings simultaneously pick routes, this research builds a unique dynamic real-time multi-intersection route selection model. The efficiency of route selection is greatly increased by a calculation method for the utility value of optional routes for moving vehicles that not only considers the various preferences of the vehicles for route selection and dynamic and unreliable real-time road condition information, but also takes into account the impact of route traffic conditions. precision and potency. To resolve the challenges faced when multiple vehicles select their routes, an adaptive learning method is proposed. This approach aims to optimize the use of urban road network resources by ensuring an even distribution of traffic across available routes, all while taking into account the preferences of each vehicle.

3 System model

Prior research did not concentrate on how cars at several junction’s impact one another when picking routes. A dynamic real-time multi-intersection routing model is therefore suggested in this article for urban traffic networks like those in Fig. 1. This traffic model, in contrast to previous traffic models, is a model for simultaneous dynamic route navigation for a large number of cars at several junctions in the metropolitan road network. There are many factors that need to be taken into account for route navigation based on real-time dynamic traffic information, including not only the driver’s subjective preferences for the road and objective attributes, but also the cost associated with the route selection, potential emergencies on the road, etc. By sharing information, vehicles on the road identify their optimal paths, optimizing the overall usage rate of the urban road system while also catering to their individual preferences. The model definition is thoroughly explained in Sect. 3.1. The specific method of utility computation is provided in Sect. 3.2.

Fig. 1
figure 1

Proposed system model

3.1 Problem model

Let a directed graph G(LE) denote the traffic network, in which L denote the collection of intersections, and E is a collection of road segments between intersections, as shown in Fig. 2. Suppose the number of intersections in the urban traffic network is m, and \({l_x(x=1,2...,m)}\) represents the xth intersection, then \(L=\left\{ l_1,l_2,...,l_m\right\}\). For any adjacent intersection \({l_x}\) and \({l_x}\), \({(l_x, l_y) \in E}\) represents a road section that can directly reach intersection \({l_y}\) from intersection \({l_x}\). If \({(l_y, l_x) \in E}\), it does not mean that \({(l_x, l_y) \in E}\) must be established. \({ln\left( l_x \right) }\) represents the set of all neighbor which drive to \({l_x}\), and \(Out(l_x)\) represents the set of all neighbor from \({l_x}\). \({Turn\left( l_x, l_y \right) }\) represents the collection of intersections that vehicles driving from intersection \({l_x}\) to \({l_y}\). If it is possible to return to \(l_x\) from intersection \({l_y}\), then \({Turn \left( l_y, l_x \right) =Out(l_y)-\left\{ l_x\right\} }\). If it is not possible to return \({l_x}\) from the intersection \({l_y}\), then \({Turn(l_y,l_x)=Out\left( l_y\right) }\).

Fig. 2
figure 2

Graphical model of traffic intersection

The number of driving vehicles in the above urban traffic network is set to n, and \(N=\left\{ 1,2,...,i,...,n \right\}\) denotes the number of vehicles. Among them, the start and end point of the ith vehicle are denoted as \(o_i\) and \(d_i\), respectively. At the current moment t, the set of intersections the vehicle has passed, the intersection it just left and the intersection it is about to drive to, are represented by \(Q_i^t\), \(f_i^t\) and \(g_i^t\) respectively. When the vehicle is heading towards the intersection \(g_i^t\), according to the combination of the preference and cost of the optional path ahead estimated, combined with the traffic condition information obtained through real-time monitoring, select an optional path to turn, and use \({l_k}\) to represent the intersection that the vehicle chooses to turn in the next time step, that is \({l_k \in Turn(f_i^t,g_i^t)}\).Use \(Action\left( i,t \right)\) to represent the steering action of the vehicle at the current moment t, then:

$$\begin{aligned} {Action(i,t)=\left\{ (g_i^t \rightarrow l_k)|l_k \in Turn(f_i^t, g_i^t) \right\} } \end{aligned}$$
(1)

Use \(R_{g_i^t,d_i}^k\) to represent at the current moment t, the ith vehicle drives to the intersection \(g_i^t\) and passes through the intersection \(l_k\) to reach the set of all optional paths to the destination \(d_i\), where the hth optional path is expressed as \(r_{g_i^t,d_i}^{k,h}\), the road segment number of the path is expressed as \(S_{g_i^t,d_i}^{k,h}\).

Taking into account potential uncertainties on the upcoming route, as well as the driving conditions and their impact, vehicles approaching each intersection must weigh the desirability of available paths before making a choice. Factor value and cost value, and finally the utility value of the vehicle for the optional route can be obtained. Here, using the utility function, \(u_{g_i^t,d_i}^{k,h}\) represents the utility value corresponding to the ith vehicle driving to the intersection i and turning to the intersection \(l_k\) at the current moment t, and arriving at the destination \(d_i\) via the hth optional route, which is defined as follows:

$$\begin{aligned} u_{g_i^t,d_i}^{k,h}&= p \times {P_{g_i^t,d_i}^{k,h}} - b \times {B_{g_i^t,d_i}^{k,h}} \nonumber \\{} & \quad {} - c \times {C_{g_i^t,d_i}^{k,h}},p,c,b \in (0,1) \end{aligned}$$
(2)

Among them, \({P_{g_i^t,d_i}^{k,h}}\), \({B_{g_i^t,d_i}^{k,h}}\), \({C_{g_i^t,d_i}^{k,h}}\) respectively represent the vehicle’s preference value, uncertainty factor value and cost value for the corresponding optional route; p, b, and c are multiplicative factors, respectively representing the preference value, uncertainty factor value, and weight ratio of the cost value when the vehicle to be steered chooses an optional route. According to the different urban traffic network standards where the vehicle is located, for example, the vehicle traffic efficiency, driving safety, etc., p, b, and c take different values respectively. The values of p, b, and c depend on factors such as city size and road network traffic flow. When the city scale is large and the corresponding road network traffic volume is large, vehicles will focus on traffic efficiency, and will first consider avoiding the congested path, the preference for the path is secondly considered. At this time, the values of b and c are too large, and the value of p is too small. When the city scale is relatively small and the corresponding road network traffic flow is relatively small, the vehicles will focus on the preferred route. At this time, the value of p is relatively large, and the values of b and c are relatively small. Based on the above principles, the specific values of p, b, and c can obtain optimal values in multiple experimental explorations. The greater the utility value of the vehicle for the optional route, the greater the possibility of choosing the route.

To achieve a balanced distribution across available routes and prevent traffic congestion, while also taking into account the desirability of each path, vehicles at intersections, just before turning at time t, rely on real-time traffic data from ahead, the congestion game is carried out through information interaction, and the driving route is selected based on the adaptive learning algorithm, so that the strategy can ensure that the respective expenses does not exceed the expected threshold. To achieve Nash equilibrium, while maximizing the utilization efficiency of urban road network road resources. According to the above description, the dynamic multi-intersections approach for traffic network is to solve the following parameter optimization problem.

$$\begin{aligned} \left\{ \begin{array}{l} {u_i^t(r_i^{t^*},r_{-1}^{t^*}) \ge u_i^t(r_i^{t^*},r_{-1}^{t^*}), \forall r_i^t \in R_i^t}\\ {s.t.time_{g_i^t,d_i}^{k,h} \le t\_M_{g_i^t,d_i}^k|\pi ^*(i,t)}\\ {dist_{g_i^t,d^i}^{k,h} \le d\_M_{g_i^t,d_i}^k|\pi ^*(i,t)}\\ {oil_{g_i^t,d^i}^{k,h} \le o\_M_{g_i^t,d_i}^k|\pi ^*(i,t)}\\ i=1,2,...,n;l_k \in Turn(f_i^t,g_i^t);h=1,2,...,R_{g_i^t,d_i}^k \end{array} \right. \end{aligned}$$
(3)

Among them, \(r_i^{t^*}\) represents the optimal path selected by the i-th vehicle at the current moment t; \(r_{-i}^(t^*)\) represents the current moment t, a vector composed of the routing strategies of all vehicles except the ith vehicle; \(u_i^t\) is the utilization factor of the ith route selection at time t; \(R_i^t\) is the set of optional paths for the ith vehicle at the current moment t; \(\pi ^*(i,t)\) represents the optimal strategy of the ith vehicle for other vehicle route selection at the current moment t; \(t\_M_{g_i^t,d_i}^k\), \(d\_M_{g_i^t,d_i}^k\), \(o\_M_{g_i^t,d_i}^k\) and \(time_{g_i^t,d_i}^{k,h}\), \(dist_{g_i^t,d_i}^{k,h}\), \(oil_{g_i^t,d_i}^{k,h}\) represent the time-consuming cost threshold and the cost-distance threshold corresponding to the ith vehicle driving to the intersection \(g_i^t\) and turning to the intersection \(l_k\) to reach the destination \(d_i\) at the current moment t, the fuel consumption cost threshold, and the expected time-consuming cost value, the consumption distance cost value, and the fuel consumption cost value corresponding to the hth path to reach the destination.

3.2 Utility calculation

Future path selection criteria when a vehicle is about to turn while being driven take into account a variety of factors in addition to distance and travel time, such as the number of lanes, the presence of pedestrian crossings, the quality of the lighting, etc., of the road sections included in the path, as well as the various subjective preferences of drivers for the route. These elements work together to determine the vehicle’s preference for a particular road section, and the precise specifications are displayed in Table 1.

Table 1 Parameters for preference determination

According to the relevant parameters described in Table 1, they can be weighted and combined to calculate the route priority P, which is expressed as follows:

$$\begin{aligned} P&= {} alt \times w1 + lan \times w2 + sdw \times w3 + lgt \times w4\nonumber \\{} & \quad + cpl \times w5 + fmy \times w6 \end{aligned}$$
(4)

Among them, \(w1\sim w6\) represent the independent weight multiplication factors corresponding to the corresponding parameters. The larger the factor, the more important the corresponding parameters are, and their sum is 1. In this scenario, the priority is determined by the total of preference values for all segments along the route. A higher calculated preference value indicates a stronger inclination of the vehicle to choose that road, making it more likely for that particular route to be selected.

There may occasionally be some unforeseen incidents on the road ahead while the car is being driven, such as traffic accidents, temporary controls, etc. These occurrences make up the unpredictable aspects that must be taken into account when the vehicle selects a route. In Table 2, the particular parameters are described.

Table 2 Parameters for traffic uncertainty

The severity of the emergency’s effects on the traffic throughout the whole road segment are the key factors taken into account when determining the value of each uncertainty element. For the hazard of unexpected traffic accidents, for instance: If the effect of the collision is so severe that the cars on this road section cannot continue to pass, then this value can be set to \(\infty\), which denotes that the vehicle will not select this road section while turning. If a traffic collision occurs, but the effect on traffic on this stretch of the road is not extremely severe and some cars may still pass without incident, then the value is between [0,1], and the value is chosen based on how serious the accident was. The higher the value, the more significant its effect on traffic, leading to more severe accidents. The computation method remains consistent for uncertainty factors associated with different emergencies, such as temporary controls. The uncertainty factor value B related to this section may be determined using the values provided for various crises, as illustrated below:

$$\begin{aligned} B=accident \times a1 + activity \times a2 \end{aligned}$$
(5)

where a1 and a2 represent the independent weight multiplication factors corresponding to the corresponding parameters. The importance of the related characteristics increases with the size of the factor, and their total is 1. The total of the values for each part of the road denotes the route uncertainty factor. The larger the estimated uncertainty factor value, the less likely it is that the vehicle will turn to pick the road.

The driving vehicle must also take into account the driving circumstances of each route after determining the values of priority and uncertainty. When a vehicle picks a certain path to turn on, it will increase the traffic flow of that path, which will have an effect on the way’s driving conditions. It will also be impacted by the path’s current congestion status. This impact is known as the cost price paid by the vehicle for selecting the route and includes the expected time cost, the expected distance cost, and the expected fuel consumption cost. In Table 3, the particular parameters are explained.

Table 3 Parameters for route cost estimation

Among the aforementioned parameters, the expected distance cost can be calculated relatively easily by the ratio of actual and expected lengths, whereas the expected time cost and expected fuel consumption cost calculations are more challenging because they depend on how congested the corresponding path is. This work introduces the congestion coefficient tau, whose value is connected to the actual number of cars on the road q, D shows the congestion capacity, Y denotes the capacity of threshold parameter. In general, the more congested a road is, the higher the corresponding traffic congestion coefficient of the road is, and the different stages of traffic flow (such as free flow, semi-free flow, congestion flow, etc.) will result in different increases in the traffic congestion coefficient. Calculated specifically as follows:

$$\begin{aligned} \tau = \left\{ \begin{array}{l} 1, \quad q< Y \\ q / Y, \quad Y \le q < D \\ D /Y + e^{q/D}, \quad D \le q \end{array} \right. \end{aligned}$$
(6)

The expected time-consuming cost of a path is the result of adding the expected time-consuming times of all the road sections and the expected time of the path ratio. The expected time-consuming of a road section can be calculated as the product of the average travel time of the road section and the congestion coefficient of the road section. The ratio of the predicted fuel consumption on each road segment to the projected fuel consumption. The cost value C corresponding to a path can be computed by adding the predicted cost values stated above:

$$\begin{aligned} C=time \times t1+dist \times t2+oil \times t3 \end{aligned}$$
(7)

where the independent weight multiplication factors t1 through t3 that correlate to the respective parameters are represented by these numbers. The importance of the related characteristics increases with the size of the factor, and their total is 1. The probability that the turned vehicle will pick the road decreases as the predicted cost value increases.

4 Proposed algorithm

The suggested algorithm refines its dynamic route selection approach, allowing numerous vehicles throughout the road network to navigate through a stochastic congestion game. Leveraging the Markov learning process, it helps each vehicle discover and choose the optimal path, ultimately reaching a mixed Nash equilibrium. The traffic flow and the computation link between utility values are described in Sect. 4.1. The stochastic congestion game among automobiles is described in Sect. 4.2.

4.1 Traffic flow and utility value

In order to express the relationship between the traffic flow in the urban road network and the utility value of a vehicle choosing a certain route, the following symbols and terms are defined in this paper. \(N=\left\{ 1,2,...,n\right\}\) represents n driving vehicle agent sets. \(R_i=\left\{ 1,2,...h,...,M_i\right\} (i \in N)\) represents the optional path set of the ith vehicle agent, and \(R_{-i}\) denotes the agents except the ith vehicle agent gather. If the start and end points of two vehicles are the same, their optional path sets are also the same. \(r_i\in R_i\) denotes ith vehicle agent route, and \(r_{-i}\in R_{-i}\) denotes the routes of other vehicle agents. The routing strategy of all vehicles can be expressed as a vector \(\overrightarrow{r}=(r_1,r_2,...,r_i,...,r_n)\in R\),\(\overrightarrow{r}=(r_i,r_{-i}) \in R\), where \(R=R_1 \times R_2 \times ...\times R_n\). \(u_i(\overrightarrow{r}):R \rightarrow {\mathbb {R}}\) represents the utility value obtained by the ith vehicle agent under the path selection strategy r of all vehicle agents. The utility value for a vehicle selecting a specific route is chiefly tied to the value of each segment on that route. Apart from the vehicle’s preference for the route and the accompanying uncertainties, this can be deduced from the utility value computation method discussed in the prior section regarding vehicle route choices. It has to do with how traffic is moving. This value is specified as follows, however it relies on how each vehicle chooses its routes.

It is known that E represent the number of sections, and \(q_e\) is defined as the traffic flow and \(e \in E\). The optional path set \(R_i \subseteq 2^{|E|}\) of vehicle \(i(i\in N)\), one of which is an optional path \(h \in R_i\), consists of several road sections \(e \in h\). At time \(e \in t\), the number of times vehicle i visits the optional path \(h(h \in R_i)\) is a 0-1 function, defined as follows:

$$\begin{aligned} z_{i,h}^t=1\left\{ r_i^t=h \right\} \end{aligned}$$
(8)

Among them, 1\(\left\{ \bullet \right\}\) is an indicator function: if the judgment in the braces is true, the value is 1; Otherwise, it is 0. Therefore, the traffic flow is expressed as:

$$\begin{aligned} q_e^t = \sum _{i \in N}\sum _{h \in R_i}\delta _{i,h}^eZ_{i,h}^t, \forall e\in E \end{aligned}$$
(9)

Among them, \(\left\{ \delta _{i,h}^e \right\}\) represents the relationship matrix between a road segment and all the optional routes. The factor \(\delta _{i,h}^e\) represents the relationship between the option route and preferred segment e. Assuming that each vehicle selects a route independently according to a certain fixed probability based on past historical experience, and each vehicle can observe the route selection of other vehicles by using the utility calculation formula in Sect. 2, the average utility \(u_i^t(r_i^t,r_{-i}^t)\) of vehicle i to the selected route at time t can be calculated. In order to stabilize the random route selection process of the vehicle, a noise \(\varepsilon _i^t( r_i^t)\) generated by random selection is added to form the known utility \(U_i^t\) of vehicle i at time t, as follows:

$$\begin{aligned} U_i^t=u_i^t(r_i^t,r_{-i}^t)+\varepsilon _i^t(r_i^t) \end{aligned}$$
(10)

Assume that at each time t, all vehicles observe the known utility of choosing a route at each previous time \((0,1,...,t-1)\), and refer to this information to select the route at time t. According to the known utility value at each previous moment, the estimated utility of vehicle i for route h at time t (the utility value is the maximization target of ith vehicle path selection) can be obtained as:

$$\begin{aligned} {\hat{u}}_{i,h}^t=\frac{1}{Z_{i,h}^t}\sum _{s=0}^{t-1}U_i^SZ_{i,h}^S \end{aligned}$$
(11)

Among them, \(Z_{i,h}^t\) represents the number of times ith vehicle visits path h until time t (not including this time), defined as follows:

$$\begin{aligned} Z_{i,h}^t=\sum _{S=0}^{t-1}Z_{i,h}^S \end{aligned}$$
(12)

Combining formula (11) and formula (12), formula (13) can be obtained, that is, according to the estimated utility of the ith vehicle for the path h at the previous moment and the known utility known at the current moment t, the path h at the current moment t can obtain the estimated utility of:

$$\begin{aligned} {\hat{u}}_{i,h}^t={\hat{u}}_{i,h}^{t-1}+\frac{1\left\{ {\hat{r}}_i^{t-1}=h \right\} }{Z_{i,h}^t}\left( U_i^{t-1}-{\hat{u}}_{i,h}^{t-1}\right) \end{aligned}$$
(13)

4.2 Traffic flow and utility value

The negotiation process is known as a random congestion game and occurs when all cars in the urban road network pick the optimum path close to the intersection. This negotiation is based on the projected utility of the other route. Each vehicle in the game chooses an optional path that, according to the adaptive learning algorithm, maximizes the estimated utility and, with a high probability, reaches a mixed Nash equilibrium. The adaptive learning algorithm combines utility learning and policy learning to concurrently update the program along with the parameters of the Logit model. This method relies on the best response function derived from the estimated utility. This response function is progressively refined using stochastic approximation theory, given that the current vehicle cannot access the utility values of the paths not chosen.

4.2.1 Hybrid strategy

In this paper, a hybrid strategy \(\pi _i^t(r_i)\) is used to represent the probability that vehicle i chooses an optional route \(r_i\) at time t, namely:

$$\begin{aligned} \pi _i^t(r_i)=Pr[r_i^t=r_i] \end{aligned}$$
(14)

Among them, for any optional path \(1 \le h \le M\) of ith vehicle, there are \(0\le \pi _i^t (h)\le 1\) and \(\sum _{h=1}^{M_i}\pi _i^t(h)\). The average utility based on the hybrid method is expressed as:

$$\begin{aligned} u_i(\overrightarrow{\pi })=\sum _\pi [u_i(\overrightarrow{r})]=\sum _{\overrightarrow{r} \in R}u_i(\overrightarrow{r})\prod _{j \in N}\pi _j(r_j) \end{aligned}$$
(15)

The equilibrium based on the Nash principle is obtained when the optimal path compared to how other cars choose their paths, which means that the conditions listed below are met.

$$\begin{aligned} u_i(\pi ^*)=\max _{r_i \in R_i}u_i(r_i,\pi _{-i}^*) \end{aligned}$$
(16)

4.2.2 Optimal response function

Because decisions made by one vehicle will have an impact on other decisions made by other vehicles, each vehicle chooses the optimal route according to the estimated utility \({\hat{u}}_i(h)\) of the available routes, making their decisions probabilistic rather than deterministic. Each vehicle chooses its best answer through an optimum response function in accordance with the likelihood of each vehicle’s route selection and the related optional route estimate utility, in order to locate the Nash equilibrium point of route selection. Based on [44], the following definition represents the ideal response function for ith vehicle to select a path:

$$\begin{aligned} r_{i,h}^t=arg\max _{\pi _i \in \sum i}\left[ \sum _{h\in R_i}\pi _{i,h}^t{\hat{u}}_{i,h}^t+\mu _i\Psi _i(\pi _i^t)\right] \end{aligned}$$
(17)

Among them, \(\mu _i>0\) is a user-related smooth parameter, and \(\Psi _i:\bigtriangleup (R_i)\rightarrow {\mathbb {R}}\) is a smooth and strictly differentiable function containing the vehicle i information. It is expressed as:

$$\begin{aligned} \Psi _i(\pi _i)=-\sum _{h\in R_i}\pi _{i,h}\log \pi _{i,h} \end{aligned}$$
(18)

From this, the following optimal response function Logit type can be obtained:

$$\begin{aligned} \gamma _{i,h}^t=\frac{e^{\left( \frac{{\hat{u}}_{i,h}^t}{\mu _i}\right) }}{ \sum _{W\in R_i}e^{\frac{{\hat{u}}_{i,w}^t}{\mu _i}} },h\in R_i, i\in N \end{aligned}$$
(19)

In order for \(\left\{ \gamma _{i,h}^t \right\} _{t \ge 0}\) to effectively respond, the recursive estimated via known utility approximation. The following update formula is used to approximate formula (13) [45]:

$$\begin{aligned} {\hat{u}}_{i,h}^{t+1}={\hat{u}}_{i,h}^t+{\xi ^{t+1}}\frac{1\left\{ r_i^t=h\right\} }{r_{i,h}^t}(U_i^t-{\hat{u}}_{i,h}^t) \end{aligned}$$
(20)

Among them, for each ith vehicle, \(\left\{ {\xi _i^t}\right\} _{t>0}\) is a deterministic sequence, satisfying the following conditions:

$$\begin{aligned} \sum _{t\ge 0}\xi _i^t=\infty ,\sum _{t\ge 0}{(\xi _i^t)}^2<\infty \end{aligned}$$
(21)

Ordinary differential equations (ODE) in [46] can approach the convergence condition of formula (20). The estimated utility will converge to the anticipated value of the known utility. In a more general scenario, the system exhibits an erratic behavior when all cars alter their courses simultaneously. Reference [47, 48] has confirmed that the parameter \(\xi _t\) in formula (20) will be a vehicle-specific parameter if it satisfies the following conditions \(\left\{ \xi _i^t\right\}\):

$$\begin{aligned} \left\{ when \ t \rightarrow then\ {\frac{\xi _i^t}{\xi _i^{t+1}}} \rightarrow 0 \right. \end{aligned}$$
(22)

The following lemma is based on the theory of perturbed differential equations presented in [47, 49]: As long as it stays limited, the value \(hat{u}_i^t\) given by formula (19) together with the vehicle-specific parameter \(\xi ^t\) will converge to a recursive set of interconnected chains of traffic flows that are described by anomalous perturbation differential equations.

4.3 Adaptive learning algorithm

In the adaptive learning process, the Logit model parameters are employed and adjusted iteratively based on the regret value, which is defined by the realized utility. If every vehicle consistently chooses a superior route, the regret value will gradually approach a negligible amount, increasing the probability of each vehicle selecting the best path. Finally, it is demonstrated that the method converges to \(\varepsilon\)-Nash equilibrium using the Markov chain model.

4.3.1 Markov learning process

The specific information of each vehicle and the cost of the chosen road segment rely on the traffic flow of other road segments when several cars travelling to different intersections play a routing game, disrupting the vehicle’s routing strategy. Right now, the better response-rather than the optimal response-is the sensible selection of the vehicle. The state transfer approach used in this study is based on the Markov learning model for state transfer, which views the game’s state as a learning level and specifies the state transfer procedure. At each time step \(t=1,2,...\), use X(t) to represent the game state, the variable v is the number of learners in this state, and the learner is the vehicle that has successfully found the best response path. The remaining \((n-\nu )\) vehicles are non-learners. Mark the non-learner’s path selection behavior as 0. On the other hand, mark the learner’s path selection behavior as 1, and the learner will keep the behavior unchanged in the next time step.

Use \(\sigma _\nu\) to represent the probability that a certain vehicle chooses the best route in the state X(t), and \((1-\sigma _\nu )\) represents the probability that a certain vehicle does not choose the best route. Then as shown in Fig. 3, the state transition probability is expressed as:

$$\begin{aligned} \left\{ \begin{array}{l} Pr[X(t+1)=\nu +1|X(t)=\nu ]=\frac{n-\nu }{n}\sigma _\nu ,0 \le \nu \le n-1 \\ Pr[X(t+1)=\nu -1|X(t)=\nu ]=\frac{\nu }{n}(1-\sigma _\nu ),1 \le \nu \le n \\ Pr[X(t+1)=\nu |X(t)=\nu ]=\frac{n-\nu }{n}(1-\sigma _\nu )+\frac{\nu }{n}\sigma _\nu ,0 \le \nu \le n \\ Pr[X(t+1)=\nu '|X(t)=\nu ]=0,\nu ' \notin \left\{ \nu -1,\nu +1 \right\} \end{array} \right. \end{aligned}$$
(23)
Fig. 3
figure 3

Variation of probability based on state

Each vehicle is unaware of its current situation, despite the information analysis centre knowing it. The Markov chain model will converge to a stable migration probability since it is irreducible and non-periodic, in accordance with the features of the Markov chain. This probability may be discovered by solving the following equation [50,51,52]:

$$\begin{aligned} \frac{\lambda _{\nu +1}}{\lambda _\nu }={\left( \frac{n-\nu }{\nu +1}\right) }{\left( \frac{\sigma _\nu }{1-\sigma _nu}\right) },\sum _{\nu =0}^{n}\lambda _\nu =1 \end{aligned}$$
(24)

4.3.2 Algorithm process

In the process of each vehicle driving to the intersection to choose the forward path, given the initial path selection behavior vector and the corresponding utility value \(\left\{ \overrightarrow{r}^0, \overrightarrow{U}^0\right\}\), the following path selection behavior and utility value \(\left\{ \overrightarrow{r}^t, \overrightarrow{U}^t\right\} (t>0)\) will be generated successively. It is assumed that ith vehicle only knows the realized utility value \(\left\{ U_i^t \right\}\) at each stage \(t>0\), and makes a probability distribution on the selection of alternative paths based on certain rules. The information center computes the utility value of the route and sends it to each vehicle so that the driving vehicle is aware that each time has passed. This is done on the basis of the route selection vector generated by each vehicle agent on the optional route based on the hybrid strategy. The moving vehicle chooses the current course via exploration and the best path at the moment based on the realized path utility value at each instant TBR (tentative best route).

1) Path Initialization The adaptive learning algorithm is started, and this procedure is referred to as the initialization condition, when the vehicle agent is about to arrive at the intersection and sends the preference information of the road ahead to the information center, realizing that it needs to negotiate the route selection. At \(t=0\), each agent selects \(r_i^0\) in a random manner. At the next moment \(t=1\), take this path as the TBR \(h^*=r_i^1=r_i^0\).

2) Learning Stage Each vehicle i explores to select a tentative optimal path TBR \(h^*\), and then includes this path into the optimal path set (i.e., with the goal of maximizing the estimated utility of each vehicle):

$$\begin{aligned} B_i^t=\left\{ h^* \in R_i:h^*=\arg \max _{r_i \in R_i} {\hat{u}}_i(r_i) \right\} \end{aligned}$$
(25)

Based on the following probabilistic assignment rules (choose the path with the highest estimated utility with high probability, and randomly select other paths with low probability), vehicle i selects the heuristic optimal path \(h^*\) by exploration:

$$\begin{aligned} \pi _i^t(r_i) = \left\{ \begin{array}{l} \varepsilon \beta _i^t(r_i)+(1-\varepsilon ), r_i=h^*\\ \frac{\varepsilon (1-\beta (h*))}{|R_i|-1}, r_i \ne h* \end{array} \right. \end{aligned}$$
(26)

Among them, \(\varepsilon\) is a small number with arbitrary outcomes. The selection probability is expressed as:

$$\begin{aligned} \beta _i^t(r_i)=Ke^{\left( \frac{{\hat{u}}_i(r_i)}{\mu _i^t}\right) }, r_i=h* \end{aligned}$$
(27)

Among them, K is a parameter that is deployed for normalization, \(\mu _i^t\) is the Logit parameter. It is automatically updated according to the user’s experience:

$$\begin{aligned} \left\{ \begin{array}{l} \mu _i^t = \max [\varepsilon _\mu ,\mu _i^t]\\ \mu _i^t=\mu _i^{t-1}+\frac{H_i^t-\mu _i^t}{t} \end{array} \right. \end{aligned}$$
(28)

Among them, \(H_i^t\) represents the regret value of the vehicle, defined as \(H_i^t=U_i^t-{\overline{U}}_i^t\), where, \({\overline{U}}_i^t=\sum _{s=1}^tU_t^s/t\). When the time t becomes very large, the regret value \(H_i^t\) is close to 0.

The new path selection vector \(\overrightarrow{r}^t\) is generated according to the mixed strategy \(\overrightarrow{\pi }^i\) of each vehicle.

The first equation in Eq. (26) shows that: Vehicle i assigns probability \(\varepsilon \beta _i(h^*)\) to each optional path at least, and assigns a little more probability to TBR, so as to strengthen TBR. Equations (26) and (27) represent the re-planning (adaptive learning) process of vehicle i to tentatively select TBR.

3) Path Selection Iterative Process

If TBR is frequently chosen, the vehicle will maintain its path choice in the subsequent phases; if the hybrid strategy’s path choice is not TBR, the vehicle will continue to investigate TBR in accordance with formulas (26) and (27).

4) Output Optimal Path Selection

When the route selection of each vehicle reaches the Nash equilibrium or when a certain distance from the connection point (based on GPS positioning) is computed, the algorithm stops if each vehicle agent has located its own TBR within the stipulated number of repetitions. An example of a termination condition is this. The vehicle agent goes to the associated junction to continue driving in accordance with the final route selection scheme after the algorithm has been run.

Figure 4 displays the detailed flowchart of the adaptive learning algorithm. The approach boasts advantages such as high precision, impressive adaptability, and robust resilience. This is attributed to its consideration of the interplay between vehicles navigating multiple intersections in the urban traffic grid while concurrently selecting their paths.

Fig. 4
figure 4

Proposed algorithm flowchart

5) Computational Complexity

The dynamic programming approach is used in the adaptive learning algorithm to solve, and the algorithm’s time complexity is \(O(n^m)\). Compared to conventional path selection methods which have time complexity of \(O(n^2)\) and \(O(n^3)\), respectively. The network environment has significantly lowered vehicle average journey times and dramatically slowed down the congestion of the whole urban traffic network.

4.3.3 Algorithm convergence

Assuming that the congestion game has a pure Nash equilibrium, then the behavior vector \(\overrightarrow{r}^t\) generated by the adaptive learning algorithm will converge to the \(\varepsilon\)-Nash equilibrium with probability \(\rho <1\) when \(\varepsilon\) is very small and the time t is very large.

First, for the stationary probability, there is the following formula:

$$\begin{aligned} \lambda _\nu = \left( \frac{\nu +1}{n-\nu }\right) \left( \frac{1-\sigma _\nu }{\sigma _\nu }\right) \lambda _{\nu +1},0 \le \nu \le n-1 \end{aligned}$$
(29)

Among them, \(\sigma _\nu =1-\varepsilon (1-\beta _\nu ^*)\), \(\beta _\nu ^*\) is the selection probability assigned to the tentative best response path TBR when the number of learners is \(\nu\). Define \(\sigma =\min _{0 \le \nu \le n}\sigma _\nu\) is the lowest probability of selecting TBR, and \(\varepsilon\) can be selected to ensure that \(\delta =(1-\sigma )/\sigma <1\). If \(\varepsilon \le 1/n\) is chosen, then there is the following formula:

$$\begin{aligned} \lambda _0 \le ... \le \lambda _{n-1} \le \lambda _n \end{aligned}$$
(30)

Assuming the last state, all vehicles have found the TBR with probability 1, that is, \(\lambda _n=\rho =1\). This means that \(\lambda _0=\lambda _{n-1}=0\), and all vehicles have almost converged to a pure Nash equilibrium, which may occur in the problem that vehicles know each other’s path choices. However, in the problem that vehicles do not know each other’s route selection, the probability \(\rho\) must be less than 1. According to formula (21), for a given probability \(\lambda _n=\rho\), the sequence equation \(\left\{ \lambda _{n-1},\left\{ \lambda _{n-2},...,\lambda _0\right\} \right\}\), and their sum is:

$$\begin{aligned} \Phi= & {} \sum _{\nu =0}^{n-1}\lambda _\nu =\left( \left( n\frac{n-1}{2}...\frac{1}{n}\right) \delta ^n+... +n\frac{n-1}{2}\delta ^2+n\delta \right) \nonumber \\ \rho= & {} \rho \sum _{\nu =1}^{n}\begin{pmatrix} n \\ \nu \end{pmatrix}\delta ^\nu \end{aligned}$$
(31)

Using the binomial formula, we can get \(\Phi =\rho \left\{ {\left( 1+n\right) }^n-1\right\}\). Thus:

$$\begin{aligned} \rho =\frac{1}{{\left( 1+\delta \right) }^n}=\left\{ 1-\varepsilon \left( 1-\beta \right) \right\} ^n<1 \end{aligned}$$
(32)

If the minimum probability \(\beta ^*\) assigned to TBR is close to 1, then the probability \(\rho\) that all users will successfully find the best path will also be very close to 1. However, using very small values of \(\varepsilon\) will have the opposite effect. At a relatively early stage of an iteration, a vehicle will choose the wrong path with a high probability, and then correct it. If some lower probability is added, meaning that some vehicles are allowed to get better responses, that is, \(\rho =\lambda _n+lambda_{n-1}+...+lambda_{n-s}\), then the \(\varepsilon\)-Nash equilibrium will be achieve.

5 Simulation results

The suggested model is tested in this research using JADE as the agent simulator, VanetMobiSim as the vehicle movement simulator, and tests on a synthetic road network and a real road network, respectively. Using Scenario Extended Markup Language files, several road networks are constructed in VanetMobiSim, and agent behavior (representing the concept of the suggested solution) is simulated in JADE. The artificial road network with several junctions that Sect. 5.1 sets up has varying numbers of intersections and cars. This study contrasts the introduced model with three non-negotiation algorithms to illustrate that the route selection for multiple vehicles in the suggested model, post-negotiation, is more accurate and effective. To further validate the applicability of the recommended model in actual traffic conditions, Sect. 5.2 extends to a larger real-world road network and evaluates under different traffic density scenarios (smooth flow, open flow, mixed flow, and congested flow).

5.1 Scenario 1: artificial road network

In this case, a particular section of the man-made road network is chosen as the experimental object. The road is made up of 3 subsidiary arterial roads and 5 main arterial roads (solid lines). (dotted lines). The roads in the road network cross each other and are bidirectional. The 7 destination nodes have the letters A through G. They serve as either the starting point or the finishing point of the journey in the simulation experiment, as seen in Fig. 5. A primary or a less-than-ideal preferred route is first arbitrarily allocated to each vehicle agent. A road stretch will begin to experience congestion if the vehicle capacity surpasses the threshold capacity. This will result in slower vehicle speeds. The vehicle’s driving speed is further lowered and enters a very slow driving stage when the road’s vehicle capacity surpasses its traffic capacity.

Fig. 5
figure 5

Traffic network scenario using artificial approach

In order to verify the influence of using the proposed model for mutual negotiation between vehicles to reduce the travel time of the entire road network, this paper compares the model with three non-negotiation algorithms by changing the weight coefficients of different parameters in the path utility calculation formula: Shortest path algorithm \((p=0,b=0)\), preference-based algorithm \((c=0,b=0)\), preference-based shortest path algorithm \((p=c\ and\ p!=0,b=0)\). These algorithms lack inter-vehicle communication, making it impossible for cars to mutually negotiate the avoidance of route selection conflicts. It does not take into account the possibility of unpredictable crises on the road either. In the shortest path algorithm, regardless of whether the road attribute of the path is in line with the vehicle preference, all vehicles choose the path that has the shortest driving distance from the source address to the destination. The utility of choosing a path is solely dependent on the path that is chosen. The usefulness of selecting a route solely depends on the related vehicle’s preference value for this route in the preference-based algorithm. Regardless of whether the path exceeds the road capacity, all cars pick the shortest way among their preferences in the preference-based shortest path selection process.

Set the scale of road network negotiation from small to large and choose four (1,2,5,6), nine (1,2,3,5,6,7,9,10,11) and sixteen (16) junctions to observe the performance of the proposed model in various sized road networks. A, B, C, and D are utilized as source addresses and E, F, and G are used as destinations in the simulation experiment. Vehicle agents communicate with one another and negotiate a course before they arrive at each crossing. This procedure assumes that the vehicle is completely aware of the traffic conditions at the intersection up ahead and that (global positioning system (GPS) information is accurate. The suggested model’s percentage trip time decrease in comparison to the three non-negotiation methods in various circumstances is depicted in Fig. 6.

Fig. 6
figure 6

Driving time reduction of the proposed algorithm in artificial traffic configuration under different path allocation methods

It can be seen from the simulation results in Fig. 6 that:

  1. 1)

    The suggested model may minimize driving time for road networks with several junctions compared to the three non-negotiation techniques, particularly compared to the shortest path approach, which can cut driving time by 5% to 37%. To reduce travel time expenses, the suggested model calls for cars to coordinate their dispersion on different routes and communicate with one another. Because all cars only select the shortest way in the shortest path algorithm (regardless of the road capacity restriction), congestion is a result of this behavior. The user’s preference (possibly more than one path is preferred) drives both the preference-based algorithm and the preference-based shortest path algorithm; in fact, this will result in a certain percentage of vehicles being dispersed on some optional paths rather than being crowded in one path on the shortest path. As a result, the suggested model typically outperforms the shortest path algorithm in terms of benefits, as opposed to both the relative preference-based algorithm and the preference-based shortest path algorithm;

  2. 2)

    As more intersections are factored in, the percentage decrease in driving time actually increases, rather than diminishes, when the number of vehicles surpasses the road network’s capacity threshold. However, the percentage of time saved is at its smallest when the number of vehicles goes beyond the congestion capacity. This demonstrates that: When the number of vehicles on the road network is just starting to rise, route selection through vehicle negotiation can improve the distribution of vehicles on the road network, but when the road network is too congested, vehicle negotiation has little impact on the distribution of traffic flow;

  3. 3)

    The reduction in travel time becomes increasingly evident as more junctions participate in negotiation and route optimization. This demonstrates that the proposed model’s proposed model’s more junctions engaging in negotiation, the longer the distance travelled by participating cars, and the share of travel time reduction will be larger. This is due to the fact that there are more intersections encountered by the vehicle in this scenario, giving it more opportunities to negotiate a better route allocation that will allow it to travel more efficiently through the urban road network and maximize the use of its available resources.

5.2 Scenario 2: real road network

The suggested model is tested for stability and scalability in this research via comparison tests on a bigger actual grid road network. As seen in Fig. 7, the suggested model is used in this case to analyses an actual grid network in the Seoul District of Korea. The urban road network has 24 major junctions, of which intersections 1–4, 6, 11, 12, 16, 20, 22, and 24 are utilized as the start and finish locations of the test road network. Each of the 36 road sections that make up the road network’s roadways is a two-way street. The specifications for the real state of the road are established, and a random value is chosen for each driving vehicle’s preference attribute for each road stretch. There is a speed restriction and each portion of the road crosses over the others.

Fig. 7
figure 7

Illustration of real road network

In order to observe the performance of the proposed model in different traffic flow densities, this paper conducts experiments on the road saturation of 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, and 0.8 in the traffic network. The corresponding traffic flow states of 0.1 0.2, 0.3 0.4, 0.5 0.6, 0.7 0.8 respectively represent free flow after 10:00 at night, free flow at 3:00 p.m. For the four common traffic scenarios at 7:00 8:00 am, the number of cars observed is about 9,000 \(\sim\) 90,000.

Choose any intersection in the road network as the source and destination, then contrast the suggested model with the three non-negotiation algorithms shortest path, preference-based, and preference-based shortest path. In these four circumstances, Fig. 8 compares the proposed model’s trip time reduction percentages to those of the three non-negotiation procedures.

Fig. 8
figure 8

Driving time reduction of the proposed algorithm in real traffic configuration under different path allocation methods

It can be seen from the results in Fig. 8 that:

  1. 1)

    The suggested model still reduces travel time by 8% to 36% compared to the three non-negotiation methods, despite the bigger scope of the road network and the presence of more cars;

  2. 2)

    The suggested model’s driving time reduction % in comparison to the three non-negotiation methods constantly rises as road saturation levels out. The driving time of the suggested model has the maximum percentage of time reduction when the road saturation hits 0.5, or in the mixed flow stage when the road is slightly crowded. The negotiating outcome of the suggested model, however, reduces the percentage of travel time reduction when traffic volume rises and the route enters an extremely crowded congestion flow condition. This matches the experimental findings from the simulation on the made-up road network.

    However, there are still several limitations regarding the proposal in this work. First of all, the experiment results are obtained from artificial road network instead of real-world road networks. Second, the traffic conditions are often varied with time critically; therefore, it is necessary to integrate the time factor into transportation route scheduling problems in the future work.

6 Conclusion

This paper proposes a new algorithm in order to efficiently reduce the problem of urban traffic congestion while taking into account the individual traffic demands of driving vehicles and the dynamic uncertainty of traffic conditions in the urban road network. The driving vehicles in the road network are based on the preference of the optional route, combined with the current traffic conditions of the optional route, and they continuously update the forward route using the adaptive learning method to ensure that each driving vehicle’s route selection strategy reaches Nash equilibrium while maximizing the application efficiency of road network resources. Numerous tests were conducted to test the viability of the suggested model, and the following findings were made.

In the following two areas, further study is still required. First of all, the influence of the traffic light cycle at the junction on vehicle route selection will be taken into consideration. Second, further examine the impact of modifying the proportion of cars that do not follow the system’s advised route on the average travel time of vehicles in the overall metropolitan road network. Third, the collaboration and sharing among different parties inevitably disclose partial sensitive information of people [53,54,55,56]. So how do you cope with such an important data sharing problem with privacy protection is still an open problem that requires intensive study.