Advanced value iteration for discrete-time intelligent critic control: A survey

  • Research
  • Journal: Artificial Intelligence Review

Abstract

Optimal control problems, which aim to conserve cost or resources, are ubiquitous in practical engineering applications and in social life. Built on the critic learning scheme, adaptive dynamic programming (ADP) is regarded as a significant avenue for addressing optimal control problems, combining design ideas from adaptive control, reinforcement learning, and intelligent control. This survey introduces recent developments in ADP and related intelligent critic control, with an emphasis on advanced value iteration (VI) schemes for discrete-time nonlinear systems. The theoretical results focus on the convergence and stability properties of general VI, stabilizing VI, integrated VI, evolving VI, and adjustable VI schemes, among others. Several significant applications are also elaborated, covering optimal regulation, optimal tracking, and zero-sum games. We aim to break through the bottlenecks of VI algorithms in realizing evolving control, accelerating learning, and reducing computational expense. In addition, prospects for new theoretical and technical directions for advanced VI schemes are discussed.
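
To make the basic VI recursion concrete, here is a minimal sketch of textbook value iteration. It is not one of the advanced schemes the survey covers, and it rests on simplifying assumptions that are not in the abstract: a scalar linear system x(k+1) = a·x(k) + b·u(k), a quadratic stage cost q·x² + r·u², and a quadratic value function V_i(x) = p_i·x² started from p_0 = 0. The function name and all parameter names are hypothetical.

```python
def value_iteration_scalar_lqr(a, b, q, r, p0=0.0, tol=1e-10, max_iter=1000):
    """Value iteration for x(k+1) = a*x + b*u with stage cost q*x^2 + r*u^2.

    The iterative value function is V_i(x) = p_i * x^2. Each sweep applies
    V_{i+1}(x) = min_u { q*x^2 + r*u^2 + V_i(a*x + b*u) },
    which for this problem reduces to a scalar Riccati-type update of p_i.
    Returns the final p, the greedy feedback gain, and the sequence of p_i.
    """
    p = p0
    history = [p]
    for _ in range(max_iter):
        # Greedy policy w.r.t. V_i: u = -gain * x minimizes the right-hand side.
        gain = a * b * p / (r + b * b * p)
        # Value update: substitute the minimizing u back into the Bellman equation.
        p_next = q + a * a * p - a * b * p * gain
        history.append(p_next)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    # Feedback gain associated with the final value function.
    gain = a * b * p / (r + b * b * p)
    return p, gain, history


if __name__ == "__main__":
    p, gain, history = value_iteration_scalar_lqr(a=1.0, b=1.0, q=1.0, r=1.0)
    print("p =", p, "gain =", gain, "iterations =", len(history) - 1)
```

With a = b = q = r = 1, the fixed point satisfies p² − p − 1 = 0, so the iterates increase from zero toward (1 + √5)/2. The resulting closed-loop factor a − b·gain has magnitude below one, which illustrates on a toy problem the kind of convergence and stability questions the survey studies for nonlinear systems.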

Funding

This work was supported in part by the National Key Research and Development Program of China under Grant 2021ZD0112302 and in part by the National Natural Science Foundation of China under Grants 62222301, 61890930-5, and 62021003. No conflict of interest exists in this manuscript, and it has been approved by all authors for publication.

Author information

Contributions

Individual author contributions. MZ: formal analysis, validation, writing (original draft). DW: investigation, supervision, writing (review and editing). JQ: investigation, supervision, writing (review and editing). MH: investigation, validation, supervision, writing (review and editing). JR: investigation, validation, supervision, writing (review and editing).

Corresponding author

Correspondence to Ding Wang.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhao, M., Wang, D., Qiao, J. et al. Advanced value iteration for discrete-time intelligent critic control: A survey. Artif Intell Rev 56, 12315–12346 (2023). https://doi.org/10.1007/s10462-023-10497-1

  • DOI: https://doi.org/10.1007/s10462-023-10497-1
