Abstract
This paper investigates the optimal control problem with a long-run average cost for unknown linear discrete-time systems with additive noise. The authors propose a value iteration-based stochastic adaptive dynamic programming (VI-based SADP) algorithm, from which the optimal controller is obtained. Unlike existing related work, the algorithm does not need to estimate the expectation or variance (conditional or unconditional) of the states or other relevant variables, and its convergence can be proved rigorously. A simulation example is given to verify the effectiveness of the proposed approach.
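For orientation, the fixed point that a value-iteration scheme targets can be illustrated with a model-based Riccati recursion. The sketch below is not the authors' VI-based SADP algorithm: it assumes the system matrices A and B are known, whereas the paper treats them as unknown. The matrices A, B, Q, R, the noise covariance W, and the function name riccati_value_iteration are hypothetical choices made for illustration. The quadratic stage cost and zero-mean i.i.d. noise are likewise assumptions, since the abstract does not spell them out.

```python
import numpy as np


def riccati_value_iteration(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Model-based value iteration (assumes A and B are known).

    Iterates P <- Q + A'PA - A'PB (R + B'PB)^{-1} B'PA from P = 0
    and returns the converged P together with the feedback gain K.
    """
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iter):
        BtPA = B.T @ P @ A
        P_next = Q + A.T @ P @ A - BtPA.T @ np.linalg.solve(R + B.T @ P @ B, BtPA)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Gain of the linear feedback u_k = -K x_k
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return P, K


# Hypothetical example system (not taken from the paper)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)             # assumed state weight
R = np.array([[1.0]])     # assumed input weight
W = 0.01 * np.eye(2)      # assumed covariance of the additive noise

P, K = riccati_value_iteration(A, B, Q, R)
average_cost = np.trace(P @ W)  # long-run average cost under u_k = -K x_k
print("K =", K)
print("average cost =", average_cost)
```

Under these assumptions, the feedback u_k = -K x_k is optimal and the long-run average cost equals tr(PW). A data-driven method would have to reach the same P and K without access to A and B.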
Additional information
This research was supported by the National Natural Science Foundation of China under Grant No. 61673284 and the Science Development Project of Sichuan University under Grant No. 2020SCUNL201.
Cite this article
Yang, X., Liu, S. Optimal Control of Unknown Discrete-Time Linear Systems with Additive Noise. J Syst Sci Complex 36, 591–612 (2023). https://doi.org/10.1007/s11424-023-1352-4
DOI: https://doi.org/10.1007/s11424-023-1352-4