Abstract
Partially observable Markov decision processes (POMDPs) are a useful model for decision-making under partial observability and stochastic actions. Partially Observable Monte-Carlo Planning (POMCP) is an online algorithm that decides on the next action to perform using a Monte-Carlo tree search approach, based on the UCT algorithm for fully observable Markov decision processes. POMCP develops an action-observation tree and, at the leaves, uses a rollout policy to provide a value estimate for each leaf. As such, POMCP depends heavily on the rollout policy to compute good estimates, and hence to identify good actions. Thus, many practitioners who use POMCP must create strong, domain-specific heuristics. In this paper, we model POMDPs as stochastic contingent planning problems. This allows us to leverage domain-independent heuristics developed in the planning community. We suggest two heuristics: the first is based on the well-known \(h_{add}\) heuristic from classical planning, and the second is computed in belief space, taking the value of information into account.
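The abstract describes the two ingredients of POMCP that the paper's heuristics target: UCB1-based action selection in the search tree (inherited from UCT) and rollout-based value estimates at the leaves. The following is a minimal, hypothetical sketch of those two ingredients only; the `step` simulator interface and the `(value, visits)` node bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import math


def ucb1_select(children, c=1.0):
    """Pick the action maximizing the UCB1 score used by UCT/POMCP.

    `children` maps each action to a (value_estimate, visit_count) pair
    (a hypothetical bookkeeping shape).
    """
    total = sum(n for _, n in children.values())

    def score(item):
        q, n = item[1]
        if n == 0:
            return float("inf")  # try unvisited actions first
        return q + c * math.sqrt(math.log(total) / n)

    return max(children.items(), key=score)[0]


def rollout_value(state, step, policy, gamma=0.95, depth=20):
    """Estimate a leaf's value by simulating a rollout policy.

    `step(state, action) -> (next_state, reward)` is an assumed
    black-box simulator interface, as in POMCP's generative model.
    """
    value, discount = 0.0, 1.0
    for _ in range(depth):
        action = policy(state)
        state, reward = step(state, action)
        value += discount * reward
        discount *= gamma
    return value
```

The quality of `policy` here is exactly the dependence the paper addresses: a random rollout policy yields noisy leaf estimates, which is why domain-specific (or, as proposed here, domain-independent planning) heuristics matter.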
Funding
Open access funding provided by Ben-Gurion University.
Contributions
O.B.: ideas, implementation, experiments. G.S.: ideas, implementation, writing.
Ethics declarations
Competing Interests
The authors declare no competing interests.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Blumenthal, O., Shani, G. Domain independent heuristics for online stochastic contingent planning. Ann Math Artif Intell (2024). https://doi.org/10.1007/s10472-024-09947-5