Abstract
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. These algorithms can be trained on real or simulated experience, and they focus their computation on the regions of state space actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems.
Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal that appears noisy to each agent because of the actions of the other agents, the stochastic nature of passenger arrivals, and each agent's incomplete observation of the state. Despite these complications, we present simulation results that surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility.
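The collective learning scheme the abstract describes can be illustrated with a minimal sketch: a team of independent tabular Q-learning agents, each updating its own value estimates from a single shared (global) reward signal, so that the other agents' actions appear as noise in each agent's updates. This is a simplified illustration under assumed parameters, not the paper's actual algorithm (which uses neural-network function approximation on a continuous-time elevator simulation); the toy coordination task at the end is hypothetical.

```python
import random

class QAgent:
    """One independent tabular Q-learning agent in a team."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, s, a, global_reward, s_next):
        # Standard one-step Q-learning backup, driven by the *team* reward:
        # from this agent's viewpoint the reward is noisy, since it also
        # depends on what the other agents did.
        target = global_reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

# Toy usage (hypothetical coordination task, not elevator control):
# two agents, two states; the team is rewarded only when both agents
# choose action 0 simultaneously.
random.seed(0)
agents = [QAgent(n_states=2, n_actions=2) for _ in range(2)]
state = 0
for _ in range(2000):
    actions = [ag.act(state) for ag in agents]
    reward = 1.0 if all(a == 0 for a in actions) else 0.0
    next_state = 1 - state
    for ag, a in zip(agents, actions):
        ag.update(state, a, reward, next_state)
    state = next_state
```

Each agent runs an ordinary single-agent algorithm; the collective behavior arises only from the shared reward, which is the structural point the abstract makes about the elevator team.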
Crites, R.H., Barto, A.G. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning 33, 235–262 (1998). https://doi.org/10.1023/A:1007518724497