Control as Inference?

Watson, Joe

doi:10.1007/978-3-030-41188-6_16

Part of the book series: Studies in Computational Intelligence (SCI, volume 883)

Abstract

The use of probabilistic methods for solving stochastic optimal control and reinforcement learning problems is a burgeoning field. However, because the methodologies were motivated by different fields, there is no unifying view of the various approaches. In this review we examine the two key, and distinct, model-based methods for continuous control: path integrals and linear Gaussian message passing. We show that, while the Bellman equation is at the foundation of each method, the path integral method uses inference to approximate the solution, whereas message passing analytically solves an upper bound. Unifying these methods requires further study of continuous-time likelihood functions and their connection to forward-backward stochastic differential equations.

Author information

Authors and Affiliations

Technische Universität Darmstadt, Darmstadt, Germany
Joe Watson

Corresponding author

Correspondence to Joe Watson.

Editor information

Editors and Affiliations

Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
Boris Belousov
Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
Hany Abdulsamad
Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
Pascal Klink
Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
Simone Parisi
Department of Computer Science, Technische Universität Darmstadt, Darmstadt, Germany
Jan Peters

About this chapter

Cite this chapter

Watson, J. (2021). Control as Inference? In: Belousov, B., Abdulsamad, H., Klink, P., Parisi, S., Peters, J. (eds) Reinforcement Learning Algorithms: Analysis and Applications. Studies in Computational Intelligence, vol 883. Springer, Cham. https://doi.org/10.1007/978-3-030-41188-6_16

DOI: https://doi.org/10.1007/978-3-030-41188-6_16
Published: 03 January 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-41187-9
Online ISBN: 978-3-030-41188-6
eBook Packages: Intelligent Technologies and Robotics (R0)
