Abstract
Real-world planning problems frequently involve mixtures of continuous and discrete state variables and actions, and are formulated in environments with an unknown number of objects. In recent years, probabilistic programming has emerged as a natural approach to capturing and characterizing such complex probability distributions with general-purpose inference methods. While a probabilistic programming language can easily be extended to represent Markov Decision Processes (MDPs) for planning tasks, solving such tasks remains challenging. Building on related efforts in reinforcement learning, we introduce a conceptually simple but powerful planning algorithm for MDPs realized as probabilistic programs. The planner constructs approximations to the optimal policy by importance sampling, while exploiting knowledge of the MDP model. Our empirical evaluation shows that this approach applies broadly, to domains ranging from strictly discrete to strictly continuous to hybrid ones, that it handles intricacies such as unknown objects, and that it is competitive given its generality.
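To give a concrete flavor of the core idea, the sketch below illustrates importance sampling for policy evaluation in a tiny discrete MDP: episodes are drawn from a behavior (proposal) policy, and their returns are reweighted by likelihood ratios to estimate the value of a target policy. This is a minimal, self-contained illustration of the general technique, not the authors' algorithm; the toy MDP, the policies, and all function names are hypothetical.

```python
import random

# Toy 2-state MDP (hypothetical; not one of the paper's benchmark domains).
# States: 0 and 1; actions: "a" and "b"; episodes have a fixed horizon H.
def step(state, action):
    """Known transition and reward model, as in model-based planning."""
    if action == "a":
        next_state = 1 if random.random() < 0.8 else 0
    else:
        next_state = 0 if random.random() < 0.8 else 1
    reward = 1.0 if next_state == 1 else 0.0
    return next_state, reward

def behavior_policy(state):
    """Proposal policy q: uniform over both actions."""
    return random.choice(["a", "b"]), 0.5

def target_prob(state, action):
    """Target policy p to be evaluated: strongly prefers action 'a'."""
    return 0.9 if action == "a" else 0.1

def is_estimate(n_episodes=5000, H=5):
    """Estimate the target policy's expected return by importance
    sampling over episodes drawn from the behavior policy."""
    total, weight_sum = 0.0, 0.0
    for _ in range(n_episodes):
        s, ret, w = 0, 0.0, 1.0
        for _ in range(H):
            a, q = behavior_policy(s)
            w *= target_prob(s, a) / q     # accumulate the likelihood ratio
            s, r = step(s, a)
            ret += r
        total += w * ret
        weight_sum += w
    return total / weight_sum              # self-normalized estimator

print(is_estimate())
```

The self-normalized estimator (dividing by the sum of weights rather than the episode count) trades a small bias for much lower variance, a standard choice when the proposal and target policies differ substantially.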
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Nitti, D., Belle, V., De Raedt, L. (2015). Planning in Discrete and Continuous Markov Decision Processes by Probabilistic Programming. In: Appice, A., Rodrigues, P., Santos Costa, V., Gama, J., Jorge, A., Soares, C. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2015. Lecture Notes in Computer Science(), vol 9285. Springer, Cham. https://doi.org/10.1007/978-3-319-23525-7_20
Print ISBN: 978-3-319-23524-0
Online ISBN: 978-3-319-23525-7