Abstract
Recent algorithmic and theoretical advances in reinforcement learning (RL) have attracted widespread interest. RL algorithms have appeared that approximate dynamic programming on an incremental basis. These algorithms can be trained on real or simulated experience, and they focus their computation on the regions of state space actually visited during control, making them computationally tractable on very large problems. If each member of a team of agents employs one of these algorithms, a new collective learning algorithm emerges for the team as a whole. In this paper we demonstrate that such collective RL algorithms can be powerful heuristic methods for addressing large-scale control problems.
Elevator group control serves as our testbed. It is a difficult domain posing a combination of challenges not seen in most multi-agent learning research to date. We use a team of RL agents, each of which is responsible for controlling one elevator car. The team receives a global reward signal that appears noisy to each agent because of the actions of the other agents, the stochastic nature of passenger arrivals, and each agent's incomplete observation of the state. Despite these complications, we present simulation results that surpass the best of the heuristic elevator control algorithms of which we are aware. These results demonstrate the power of multi-agent RL on a very large scale stochastic dynamic optimization problem of practical utility.
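The collective learning scheme the abstract describes can be illustrated with a minimal sketch: a team of independent tabular Q-learning agents, each updating its own value estimates from a single shared (global) reward signal, so that the other agents' actions appear as noise in each agent's updates. This is a simplified illustration under assumed parameters, not the paper's actual algorithm (which uses neural-network function approximation on a continuous-time elevator simulation); the toy coordination task at the end is hypothetical.

```python
import random

class QAgent:
    """One independent tabular Q-learning agent in a team."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy action selection.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, s, a, global_reward, s_next):
        # Standard one-step Q-learning backup, driven by the *team* reward:
        # from this agent's viewpoint the reward is noisy, since it also
        # depends on what the other agents did.
        target = global_reward + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])

# Toy usage (hypothetical coordination task, not elevator control):
# two agents, two states; the team is rewarded only when both agents
# choose action 0 simultaneously.
random.seed(0)
agents = [QAgent(n_states=2, n_actions=2) for _ in range(2)]
state = 0
for _ in range(2000):
    actions = [ag.act(state) for ag in agents]
    reward = 1.0 if all(a == 0 for a in actions) else 0.0
    next_state = 1 - state
    for ag, a in zip(agents, actions):
        ag.update(state, a, reward, next_state)
    state = next_state
```

Each agent runs an ordinary single-agent algorithm; the collective behavior arises only from the shared reward, which is the structural point the abstract makes about the elevator team.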
Crites, R.H., Barto, A.G. Elevator Group Control Using Multiple Reinforcement Learning Agents. Machine Learning 33, 235–262 (1998). https://doi.org/10.1023/A:1007518724497