Abstract
The Decentralized Partially Observable Markov Decision Process (DEC-POMDP) model addresses the multiagent planning problem in partially observable environments. Because of its high computational complexity, in general only very small problems can be solved exactly, so most researchers concentrate on approximate solution algorithms to handle more complex cases. However, many approximate techniques can handle large problems only for short horizons, due to the exponential memory required to represent policies and search the policy space. In this study, we offer an approximate solution algorithm called GA-FSC that uses finite state controllers (FSCs) to represent a finite-horizon DEC-POMDP policy and searches the policy space using genetic algorithms. We encode FSCs into chromosomes and use one exact and one approximate technique to calculate the fitness of the chromosomes. The exact calculation yields better quality solutions at the cost of longer processing time than the approximate fitness calculation. Our method replicates the best results reported so far in the literature in most cases, and it extends the reported horizons further in almost all cases when compared to optimal approaches.
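To make the encoding idea concrete, the following is a minimal illustrative sketch (not the authors' implementation): a deterministic FSC for each agent is flattened into a chromosome of integers (one action gene per controller node, plus one successor-node gene per observation), and fitness is approximated by Monte Carlo simulation on a hypothetical toy two-agent problem invented here for illustration. All names, sizes, and the toy reward structure are assumptions.

```python
import random

# Problem and controller sizes (arbitrary illustrative values).
N_NODES, N_ACTIONS, N_OBS = 2, 2, 2
GENES_PER_NODE = 1 + N_OBS  # one action gene + one transition gene per observation

def random_chromosome():
    """Random deterministic FSC encoded as a flat integer list."""
    genes = []
    for _ in range(N_NODES):
        genes.append(random.randrange(N_ACTIONS))                    # action at node
        genes += [random.randrange(N_NODES) for _ in range(N_OBS)]   # transitions
    return genes

def decode(chrom, node, obs=None):
    """Return (action, next_node) for the current node; next_node needs obs."""
    base = node * GENES_PER_NODE
    action = chrom[base]
    next_node = node if obs is None else chrom[base + 1 + obs]
    return action, next_node

def simulate(chroms, horizon, episodes=200):
    """Approximate fitness: average return of the joint controller on a toy
    problem where +1 is earned whenever both agents' actions match a hidden
    coin state, observed correctly with probability 0.8 (assumed setup)."""
    total = 0.0
    for _ in range(episodes):
        state = random.randrange(2)
        nodes = [0, 0]
        for _ in range(horizon):
            acts = [decode(chroms[i], nodes[i])[0] for i in (0, 1)]
            if acts[0] == state and acts[1] == state:
                total += 1.0
            for i in (0, 1):
                obs = state if random.random() < 0.8 else 1 - state
                nodes[i] = decode(chroms[i], nodes[i], obs)[1]
    return total / episodes

def mutate(chrom, rate=0.1):
    """Resample each gene with a small probability, respecting its range."""
    c = chrom[:]
    for g in range(len(c)):
        if random.random() < rate:
            limit = N_ACTIONS if g % GENES_PER_NODE == 0 else N_NODES
            c[g] = random.randrange(limit)
    return c

# A (1+1)-style mutation loop stands in for the full GA with crossover/selection.
best = [random_chromosome(), random_chromosome()]
best_fit = simulate(best, horizon=5)
for _ in range(100):
    cand = [mutate(c) for c in best]
    fit = simulate(cand, horizon=5)
    if fit >= best_fit:
        best, best_fit = cand, fit
print(best_fit)
```

The chromosome length grows only linearly in the number of controller nodes and observations, which is what lets FSC-based policies avoid the exponential blow-up of tree-based policy representations as the horizon grows.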
Cite this article
Eker, B., Akın, H.L. Solving decentralized POMDP problems using genetic algorithms. Auton Agent Multi-Agent Syst 27, 161–196 (2013). https://doi.org/10.1007/s10458-012-9204-y