Abstract
In this paper, we are interested in optimal decisions in a partially observable universe. Our approach is to directly approximate an optimal strategic tree depending on the observation. This approximation is made by means of a parameterized probabilistic law. A particular family of Hidden Markov Models (HMMs), with input and output, is considered as a policy model. A method for optimizing the parameters of these HMMs is proposed and applied. This optimization is based on the cross-entropy (CE) method for rare-event simulation developed by Rubinstein.
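The CE method the abstract refers to iterates between sampling candidate parameters from a probabilistic law and refitting that law to the best-scoring samples. The following is a minimal, generic sketch of that loop, assuming a Gaussian sampling law and a toy objective; the function names, parameters, and objective are illustrative and not taken from the paper.

```python
import numpy as np

def cross_entropy_optimize(score, dim, n_samples=100, n_elite=10,
                           n_iters=50, seed=0):
    """Generic cross-entropy (CE) optimization loop (illustrative sketch).

    At each iteration: sample parameter vectors from a Gaussian,
    keep the elite fraction with the highest scores, and refit the
    Gaussian mean and standard deviation to those elites.
    """
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    for _ in range(n_iters):
        samples = rng.normal(mu, sigma, size=(n_samples, dim))
        scores = np.array([score(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]
        # Refit the sampling law to the elite samples; the small
        # constant keeps sigma strictly positive.
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy objective: maximize -||x - target||^2, whose optimum is `target`.
target = np.array([1.0, -2.0])
best = cross_entropy_optimize(lambda x: -np.sum((x - target) ** 2), dim=2)
```

In the paper's setting the sampled objects are HMM parameters and the score is the expected reward of the induced policy, but the elite-selection-and-refit structure of the loop is the same.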
References
Bakker, B., Schmidhuber, J.: Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In: Proceedings of the 8th Conference on Intelligent Autonomous Systems, pp. 438–445. Amsterdam, The Netherlands (2004)
Bellman, R.: Dynamic Programming. Princeton University Press, Princeton, New Jersey (1957)
de Boer, P.-T., Kroese, D.P., Mannor, S., Rubinstein, R.Y.: A tutorial on the cross-entropy method. http://www.cs.utwente.nl/~ptdeboer/ce/
Cassandra, A.R.: Exact and approximate algorithms for partially observable Markov decision processes. PhD thesis, Brown University, Providence, Rhode Island (1998)
Fine, S., Singer, Y., Tishby, N.: The hierarchical hidden Markov model: analysis and application. Machine Learning 32(1), 41–62 (1998)
Homem-de-Mello, T., Rubinstein, R.Y.: Rare event estimation for static models via cross-entropy and importance sampling. http://users.iems.nwu.edu/~tito/list.htm
Meuleau, N., Peshkin, L., Kim, K.-E., Kaelbling, L.P.: Learning finite-state controllers for partially observable environments. In: Proceedings of UAI-99, pp. 427–436. Stockholm (1999)
Murphy, K., Paskin, M.: Linear time inference in hierarchical HMMs. In: Proceedings of Neural Information Processing Systems, Vancouver, Canada (2001)
Rubinstein, R.Y., Kroese, D.P.: The Cross-Entropy Method: A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation, and Machine Learning. Information Science & Statistics. Springer, Berlin (2004)
Sondik, E.J.: The optimal control of partially observable Markov processes. PhD thesis, Stanford University, Stanford, California (1971)
Sutton, R.S., Barto, A.G.: Reinforcement Learning. MIT Press, Cambridge, MA (2000)
Theocharous, G.: Hierarchical learning and planning in partially observable Markov decision processes. PhD thesis, Michigan State University (2002)
Cite this article
Dambreville, F. Cross-entropic learning of a machine for the decision in a partially observable universe. J Glob Optim 37, 541–555 (2007). https://doi.org/10.1007/s10898-006-9061-9