1 Introduction

In this work, we test solutions based on multiagent reinforcement learning to decrease travel times, modeling the problem as a multiagent Markov Decision Process (MDP). A collection of agents learns to minimize vehicle queuing delays and queue lengths at all junctions. Coordination between agents is achieved in two different ways. In the first approach (Q-VE), agents are modeled as vertices of a coordination graph and the joint action is found with the variable elimination algorithm. The second method (Q-BR) computes the action of each agent as the best response of a two-player game played with each member of its neighborhood.

2 Main Purpose

Multiagent RL for traffic light control allows the global function Q to be split into a linear combination of contributions, one per agent. However, decisions made at the individual level must be optimal for the group. Hence, the coordination problem is to find, at each step, the joint action:

$$\begin{aligned} \mathbf {a}^{*}=\mathop {{{\,\mathrm{argmax}}}}\limits _{\mathbf {a}^{\prime } \in \mathcal {A}}Q(\mathbf {s}^{k},\mathbf {a}^{\prime }) \end{aligned}$$
(1)
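
The decomposition referred to above can be written in standard coordination-graph notation as a sum of edge factors; this is a sketch consistent with Sect. 2.1, not necessarily the paper's exact factorization:

$$\begin{aligned} Q(\mathbf {s}^{k},\mathbf {a})=\sum _{(i,j)\in \mathcal {E}}Q_{ij}(s_{ij}^{k},a_{i},a_{j}) \end{aligned}$$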

2.1 Coordination Graphs - (Q-VE)

In a coordination graph \(G=(\mathcal {V},\mathcal {E})\), agent \(i \in \mathcal {V}\) needs to coordinate its actions with its neighbors \(\varGamma (i)=\{j\,:\, (i,j)\in \mathcal {E}\}\). Given the action \(\mathbf {a}^{*}\) and the joint state \(s_{ij}^{k}\), we update the factor \(Q_{ij}\) of each edge \((i,j)\in \mathcal {E}\) with a temporal-difference rule.
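
A standard edge-factor Q-learning update of this kind (a sketch consistent with the description above, where \(\alpha \) is the learning rate, \(\gamma \) the discount factor, \(r_{ij}\) a reward shared by agents i and j, and \(a_{i}^{*},a_{j}^{*}\) the components of the optimal joint action in the next state; this notation is introduced here for illustration) is:

$$\begin{aligned} Q_{ij}(s_{ij}^{k},a_{i},a_{j}) \leftarrow Q_{ij}(s_{ij}^{k},a_{i},a_{j})+\alpha \left[ r_{ij}+\gamma \, Q_{ij}(s_{ij}^{k+1},a_{i}^{*},a_{j}^{*})-Q_{ij}(s_{ij}^{k},a_{i},a_{j})\right] \end{aligned}$$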

To find the optimal joint action \(\mathbf {a}^{*}\) in (1), we use the variable elimination (VE) algorithm proposed by Guestrin et al. [3].
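
For concreteness, a minimal sketch of max-sum variable elimination over pairwise Q factors is given below. It is an illustration, not the paper's implementation; the function and variable names are ours, and the factors are stored as explicit tables, which assumes small discrete action sets.

```python
import itertools

def variable_elimination(actions, factors, order):
    """Exact joint-action selection by max-sum variable elimination.

    actions : dict mapping each agent to its list of admissible actions
    factors : list of (scope, table) pairs; scope is a tuple of agents and
              table maps an action tuple (aligned with scope) to a value
    order   : elimination order over the agents

    Returns a dict (agent -> action) maximizing the sum of all factors.
    """
    factors = list(factors)
    best_response = {}  # agent -> (scope, conditional best-response table)

    for agent in order:
        involved = [f for f in factors if agent in f[0]]
        factors = [f for f in factors if agent not in f[0]]
        # The new factor is defined over every other agent mentioned by the
        # involved factors (the current neighborhood of `agent`).
        scope = tuple(sorted({v for s, _ in involved for v in s if v != agent}))

        new_table, br_table = {}, {}
        for ctx in itertools.product(*(actions[v] for v in scope)):
            assignment = dict(zip(scope, ctx))
            best_val, best_act = float("-inf"), None
            for a in actions[agent]:
                assignment[agent] = a
                val = sum(tab[tuple(assignment[v] for v in s)]
                          for s, tab in involved)
                if val > best_val:
                    best_val, best_act = val, a
            new_table[ctx], br_table[ctx] = best_val, best_act
        factors.append((scope, new_table))
        best_response[agent] = (scope, br_table)

    # Backward pass: recover the maximizing joint action.
    joint = {}
    for agent in reversed(order):
        scope, br_table = best_response[agent]
        joint[agent] = br_table[tuple(joint[v] for v in scope)]
    return joint


# Toy usage: three signalized junctions in a line (1 -- 2 -- 3), two phases
# each, and edge factors that reward adjacent agents choosing the same phase.
if __name__ == "__main__":
    acts = {i: ["NS", "EW"] for i in (1, 2, 3)}
    q12 = {(a, b): (2.0 if a == b else 0.0) for a in acts[1] for b in acts[2]}
    q23 = {(a, b): (2.0 if a == b else 0.0) for a in acts[2] for b in acts[3]}
    print(variable_elimination(acts, [((1, 2), q12), ((2, 3), q23)], [1, 2, 3]))
```

Eliminating an agent combines all factors that mention it into a new factor over its remaining neighbors, which is why neighborhoods can grow to include nonadjacent agents, as noted in Sect. 4.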

2.2 Best Response from Game Theory - (Q-BR)

We follow the work of El-Tantawy et al. [2], in which each agent participates in a two-player game with its neighborhood \(\varGamma (i)\). Agent i creates and updates a model \(\theta \) that estimates the likelihood of action selection for each neighbor \(j \in \varGamma (i)\).

To find the best joint action \(\mathbf {a}^{*}\), each agent computes its best response, that is, the action that maximizes the Q factors at the level of its neighborhood, regardless of the policies of the other members:

$$\begin{aligned} a_{i}^{*}=\mathop {{{\,\mathrm{argmax}}}}\limits _{a_{i} \in \mathcal {A}_{i}}\sum _{j \in \varGamma (i)}\sum _{a_{j} \in \mathcal {A}_{j}}\theta _{j}(a_{j})\,Q_{ij}(s_{ij}^{k},a_{i},a_{j}) \end{aligned}$$
(2)
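
As a concrete illustration of this computation (a sketch only; the function and argument names are ours, and the Q factors and neighbor models are assumed to be available as plain Python callables and dictionaries), the expected neighborhood value of each candidate action can be evaluated as follows:

```python
def best_response(agent, neighbors, actions, q_factors, theta, state):
    """Best-response action of `agent` given models of its neighbors.

    neighbors : ids in Gamma(agent)
    actions   : dict mapping agent id -> list of admissible actions
    q_factors : dict mapping (i, j) -> callable(state, a_i, a_j) -> float
    theta     : dict mapping j -> {a_j: estimated probability j selects a_j}
    state     : joint state s_ij^k observed by the agent (passed through)
    """
    best_act, best_val = None, float("-inf")
    for a_i in actions[agent]:
        # Expected value of a_i over the neighborhood, under the models theta.
        val = sum(theta[j][a_j] * q_factors[(agent, j)](state, a_i, a_j)
                  for j in neighbors
                  for a_j in actions[j])
        if val > best_val:
            best_act, best_val = a_i, val
    return best_act
```

Each agent can run this computation locally once the models of its neighbors are up to date, which is what keeps the method scalable (Sect. 4).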

2.3 Learning Parameters

The state vector of each agent contains the hour of day, to capture temporal dynamics; the maximum queue length (in vehicles) on each edge; and the queuing delay (in minutes) of the stopped vehicles on each edge. Regarding actions, every agent has two phases. Finally, the reward function encourages short queue lengths and low waiting times for the vehicles along the road:

$$\begin{aligned} r_i=-\sum _{k=1}^{edges}\left( \beta _q(q_k)^{\theta _q} + \beta _w(w_k)^{\theta _w}\right) \quad \forall i \in \mathcal {N} \end{aligned}$$
(3)

Here, edges is the number of approaches of agent i; \(q_k\) and \(w_k\) are the maximum queue length and queuing delay on edge k; \(\beta _q\) and \(\beta _w\) are coefficients that set the relative priority of the two terms; and \(\theta _q\) and \(\theta _w\) balance queue lengths and waiting times across approaches.
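
A direct transcription of Eq. (3) as code is sketched below; the default coefficient values are placeholders, not the tuning used in the experiments.

```python
def reward(queues, delays, beta_q=1.0, beta_w=1.0, theta_q=1.0, theta_w=1.0):
    """Reward of one agent following Eq. (3).

    queues : maximum queue length (vehicles) per approach of the agent
    delays : queuing delay (minutes) of stopped vehicles per approach
    The default coefficients are placeholders, not the paper's values.
    """
    return -sum(beta_q * q ** theta_q + beta_w * w ** theta_w
                for q, w in zip(queues, delays))
```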

3 Demonstration

Both methods were simulated on a road network of Bogotá, shown in Fig. 1, using the SUMO simulator [4] and the TraCI interface. To compare the Q-VE and Q-BR methods, we also implemented independent Q-learning, as proposed by Camponogara et al. [1], and the coordination method proposed by Xu et al. [5].

Fig. 1. Test framework for multiagent traffic control
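
For reference, the interaction with SUMO follows the usual TraCI observe-then-actuate loop, sketched below. The configuration file name, lane and traffic-light IDs, and the placeholder decision rule are illustrative, not those of the Bogotá network.

```python
import traci  # TraCI Python client shipped with SUMO


def placeholder_policy(queue, wait):
    """Stand-in for a learned policy: returns a phase index (0 or 1)."""
    return 0 if queue > 5 else 1


# Illustrative file name and IDs; the real network and agents differ.
traci.start(["sumo", "-c", "bogota.sumocfg"])
try:
    for step in range(3600):
        traci.simulationStep()
        # State features used by the agents: queue lengths and waiting times.
        queue = traci.lane.getLastStepHaltingNumber("approach_lane_0")
        wait = traci.lane.getWaitingTime("approach_lane_0")  # seconds
        # Each agent has two phases; apply the chosen one to its junction.
        traci.trafficlight.setPhase("junction_4", placeholder_policy(queue, wait))
finally:
    traci.close()
```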

Q-VE and Q-BR achieve reductions of at least \(14\%\) and at most \(48\%\) with respect to Fixed Time (FT) control. The largest reductions are obtained with Q-BR. We also note that the Q-BR policy generates green waves along arterials, as shown in Fig. 2.

Fig. 2. Space-time diagram for route 1, Ak7 north to south, under the Q-BR policy. At some intervals, agents 4, 5, and 6 coordinate their actions to generate a platoon.

Figure 3 shows that the reward evolution with independent learning is very similar to that obtained with the coordinated method Q-BR. Nonetheless, the policy learned by Q-BR positively influences other variables that are not included in the reward function.

Fig. 3. Agents' learning curves. Q-VE and Q-BR achieve a better reward than FT control.

4 Conclusions

Distributing the reward function into a contribution per agent simplifies the problem, since the Q factors can be split into dependencies between agents. This structure is represented by the coordination graph, which favors the application of the VE algorithm and yields an exact solution to the joint action selection problem. However, as the algorithm eliminates agents, neighborhoods change and may come to include agents that are not adjacent and thus have no direct communication. The method would then require an estimation of the Q factors for nonadjacent agents.

On the other hand, the coordination strategy based on BR scales well, since the communication between agents is known a priori. However, the policies in the neighborhood are not shared knowledge, so more information must be transmitted to estimate and model the behavior of the neighbors.