Abstract
Adaptive dynamic programming (ADP) has become an active research topic. This paper develops a new local policy iteration ADP algorithm for discrete-time nonlinear systems, which is used to solve infinite-horizon optimal control problems. The characteristic of the new algorithm is that the iterative control law and the iterative value function are updated only within a subset of the state space. The detailed iteration process of the local policy iteration is presented, and a simulation example demonstrates the performance of the developed algorithm.
1 Introduction
Adaptive dynamic programming (ADP) has been an active research area since it was proposed by Werbos [1]. ADP is a useful and significant intelligent method for solving nonlinear optimal control problems. To obtain the optimal control law, iterative methods are commonly employed, and the convergence and optimality properties of the corresponding ADP algorithms have been analyzed in [2,3,4,5,6,7].
In most existing methods, the iterative control laws and iterative value functions must be updated over the whole state space [8,9,10,11,12,13,14,15,16,17,18]; such methods are known as “global policy iteration algorithms”. Global policy iteration suffers from low efficiency in applications: each iteration must pause until a sweep of the whole state space is completed, which reduces computational efficiency. This constraint has hindered the development of the field, so new policy iteration algorithms are needed to improve computational efficiency.
This paper proposes a new “local policy iteration algorithm” for discrete-time nonlinear systems, in which each iteration operates over a small region. The algorithm updates the iterative control laws and iterative value functions only within a given subset of the state space. Although the iterative control laws are updated only within this preset subset, the system remains stable under every iterative control law. Finally, the simulation section demonstrates the performance of the developed method.
2 Problem Statement
Consider the deterministic discrete-time nonlinear system
$$s_{k+1}=F(s_k,c_k),\quad k=0,1,2,\ldots, \qquad (1)$$
where \(s_k \in {\mathbb {R}}^n\) is the state vector and \(c_k\in {\mathbb {R}}^m\) is the control vector. Let \(s_0\) denote the initial state and \(F(s_k,c_k)\) the system function. Let \({\underline{c}}_k=(c_k,c_{k+1},\dots )\) denote an arbitrary sequence of controls. The performance index function is defined as
$$J\big (s_0,{\underline{c}}_0\big )=\sum _{k=0}^{\infty } U(s_k,c_k) \qquad (2)$$
for the state \(s_0\) under the control sequence \({\underline{c}}_0=(c_0,c_1,\dots )\), where the utility function \(U(s_k,c_k)\) is positive definite in \(s_k\) and \(c_k\). Note that the sequence \({\underline{c}}_k\) runs from time k to \(\infty \).
The goal is to find an optimal control scheme that stabilizes system (1) while minimizing the performance index function (2).
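To make the setup concrete, the performance index (2) can be approximated numerically by truncating the infinite sum, assuming the closed loop is stable so the tail is negligible. The dynamics `F`, utility `U`, and feedback law below are illustrative assumptions for this sketch, not the example system used in the paper.

```python
import numpy as np

# Hypothetical system and quadratic utility, for illustration only.
def F(s, c):
    # example nonlinear dynamics s_{k+1} = F(s_k, c_k)
    return np.array([0.5 * s[0] + 0.2 * np.sin(s[1]), 0.4 * s[1] + 0.3 * c[0]])

def U(s, c):
    # positive definite utility U(s_k, c_k) = s's + c'c
    return float(s @ s + c @ c)

def performance_index(s0, control_law, n_steps=200):
    """Truncated evaluation of J(s0) = sum_k U(s_k, c_k) under a feedback law."""
    s, J = np.array(s0, dtype=float), 0.0
    for _ in range(n_steps):
        c = control_law(s)
        J += U(s, c)
        s = F(s, c)
    return J
```

Because the assumed dynamics are contracting, the truncated sum converges and the cutoff `n_steps` only needs to be large enough for the tail to vanish.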
Define the set of control sequences as \(\underline{{\mathfrak {U}}}_k=\big \{{\underline{c}}_k :{\underline{c}}_k=(c_k, c_{k+1}, \ldots ),\, \forall c_{k+i}\in {\mathbb {R}}^m, i=0,1,2,\ldots \big \}\).
Then, for an arbitrary control sequence \({\underline{c}}_k \in \underline{{\mathfrak {U}}}_k\), the optimal performance index function is
$$J^{*}(s_k)=\min _{{\underline{c}}_k\in \underline{{\mathfrak {U}}}_k} J\big (s_k,{\underline{c}}_k\big ).$$
According to Bellman's principle of optimality, \(J^*(s_k)\) satisfies the discrete-time HJB equation
$$J^{*}(s_k)=\min _{c_k}\big \{U(s_k,c_k)+J^{*}\big (F(s_k,c_k)\big )\big \}.$$
The optimal control law is defined as
$$c^{*}(s_k)=\arg \min _{c_k}\big \{U(s_k,c_k)+J^{*}\big (F(s_k,c_k)\big )\big \}.$$
Therefore, the HJB equation can be rewritten as
$$J^{*}(s_k)=U\big (s_k,c^{*}(s_k)\big )+J^{*}\big (F\big (s_k,c^{*}(s_k)\big )\big ).$$
However, because of the curse of dimensionality, it is generally very difficult to obtain numerical solutions with traditional dynamic programming algorithms. This motivates the new ADP algorithm developed below.
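For intuition, the HJB equation can be solved approximately by value iteration on a discretized grid; each Bellman sweep must visit every grid point, and the number of points grows exponentially with the state dimension, which is exactly the curse of dimensionality just mentioned. The scalar dynamics, grid sizes, and cost below are assumptions chosen for this sketch only.

```python
import numpy as np

# Illustrative value iteration for an assumed scalar system
# s_{k+1} = 0.8 s + c with U(s, c) = s^2 + c^2.
states = np.linspace(-2.0, 2.0, 81)
controls = np.linspace(-2.0, 2.0, 81)

J = np.zeros_like(states)  # iterative approximation of J*(s)
for _ in range(200):
    # successor states for every (state, control) pair, mapped to grid indices
    nxt = 0.8 * states[:, None] + controls[None, :]
    idx = np.clip(np.searchsorted(states, nxt), 0, len(states) - 1)
    # Bellman backup: J(s) <- min_c { U(s, c) + J(F(s, c)) }
    cost = states[:, None] ** 2 + controls[None, :] ** 2 + J[idx]
    J = cost.min(axis=1)
```

The sweep over the full `states` grid at every iteration is what the local algorithm in the next section avoids: a d-dimensional grid with 81 points per axis has 81^d entries, so global sweeps quickly become intractable.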
3 Description of the New Local Policy Iteration ADP Algorithm
This section describes the new local policy iteration ADP algorithm, which is designed to obtain the optimal control law for system (1). Let \(\varOmega _s\) denote the state space and let \(\{\varTheta _s^{i }\}\) be state subsets with \(\varTheta _s^{i } \subseteq \varOmega _s\) for all i. The iterative value functions and control laws of the developed algorithm are updated as follows.
For all \( s_k\in \varOmega _s\), let \(v_0(s_k)\) be an admissible control law and let \(V_0(s_k)\) be the initial iterative value function, which satisfies the generalized HJB (GHJB) equation
$$V_0(s_k)=U\big (s_k,v_0(s_k)\big )+V_0(s_{k+1}),$$
where \(s_{k+1}=F(s_k,v_0(s_k))\). Then, for all \( s_k \in \varTheta _s^{0}\), the local iterative control law \(v_1(s_k)\) is computed as
$$v_1(s_k)=\arg \min _{c_k}\big \{U(s_k,c_k)+V_0\big (F(s_k,c_k)\big )\big \},$$
and let \(v_1(s_k)=v_0(s_k)\) for all \( s_k \in \varOmega _s\backslash \varTheta _s^0\).
For all \( s_k \in \varOmega _s\), the iterative value function \(V_1(s_k)\) satisfies the GHJB equation
$$V_1(s_k)=U\big (s_k,v_1(s_k)\big )+V_1\big (F\big (s_k,v_1(s_k)\big )\big ).$$
In general, for \( i=1,2,\ldots \), the iterative value function \(V_i (s_k )\) satisfies the GHJB equation
$$V_i(s_k)=U\big (s_k,v_i(s_k)\big )+V_i\big (F\big (s_k,v_i(s_k)\big )\big ).$$
For all \( s_k \in \varTheta _s^{i}\), the iterative control law \(v_{i+1}(s_k)\) is computed as
$$v_{i+1}(s_k)=\arg \min _{c_k}\big \{U(s_k,c_k)+V_i\big (F(s_k,c_k)\big )\big \},$$
and for all \( s_k \in \varOmega _s\backslash \varTheta _s^i\), let \(v_{i+1}(s_k)=v_{i}(s_k)\).
According to the above update rules, the local policy iteration algorithm updates the iterative control law only within the preset subset of the state space, which is a part of the whole state space. Hence, once the local state data are available, an iteration can be performed immediately, whereas traditional global algorithms must first collect data covering the whole state space. This greatly improves computational efficiency. Moreover, if the preset subset is enlarged to the whole state space, the local policy iteration algorithm reduces to the global policy iteration algorithm.
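The iteration described above can be sketched on a discretized scalar example. The dynamics, utility, grid, and subset \(\varTheta \) below are illustrative assumptions; the key point is that only grid points inside \(\varTheta \) receive a policy-improvement step, while the rest of the state space keeps the previous control law.

```python
import numpy as np

states = np.linspace(-2.0, 2.0, 81)
controls = np.linspace(-2.0, 2.0, 81)

def step(s, c):           # assumed dynamics s_{k+1} = F(s_k, c_k)
    return 0.8 * s + c

def utility(s, c):        # positive definite utility U(s_k, c_k)
    return s ** 2 + c ** 2

def evaluate(v, sweeps=300):
    """Policy evaluation: solve the GHJB equation V(s) = U(s, v(s)) + V(F(s, v(s)))."""
    V = np.zeros_like(states)
    for _ in range(sweeps):
        nxt = np.clip(np.searchsorted(states, step(states, v)), 0, len(states) - 1)
        V = utility(states, v) + V[nxt]
    return V

v = -0.5 * states                      # admissible initial control law v_0
for i in range(10):
    V = evaluate(v)                    # GHJB solution under the current law v_i
    theta = np.abs(states) <= 1.0      # local subset Theta_s^i of the state space
    nxt = np.clip(np.searchsorted(states, step(states[:, None], controls[None, :])),
                  0, len(states) - 1)
    q = utility(states[:, None], controls[None, :]) + V[nxt]
    improved = controls[q.argmin(axis=1)]
    v = np.where(theta, improved, v)   # improve inside Theta; keep v_i elsewhere
```

After the loop, the control law inside \(\varTheta \) has moved toward the optimal feedback, while outside \(\varTheta \) it is still the admissible initial law, so the closed loop stays stable throughout the iterations.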
4 Simulation Example
We choose a discretized nonaffine nonlinear system as follows
The utility function is chosen to be quadratic with \(Q=I_1\) and \(R=I_2\), where \(I_1\) and \(I_2\) denote identity matrices with suitable dimensions, and the state space is chosen as \(\varOmega _s\). Let the initial state be \(s_0=[1,-1]^{\mathsf {T}}\). Following Algorithm 1 in [16], the iterative value functions and iterative control laws are updated accordingly. After 30 iterations, the algorithm reaches the computation precision \(\varepsilon = 0.001\). Figure 1(a) shows that the iterative value function is monotonically nonincreasing and converges to the optimum. Figure 1(b) illustrates the trajectories of the simulation states, and Fig. 1(c) shows the corresponding simulation functions. Figure 1(d) shows the optimal control and state trajectories.
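The stopping rule used in the simulation can be expressed as a simple convergence check: iterate until two successive iterative value functions differ by less than \(\varepsilon = 0.001\), while verifying the monotone nonincreasing property. The numbers in `values` below are made-up placeholders standing in for \(V_i(s_0)\); they are not results from the paper.

```python
# Hypothetical convergence check; `values` is an assumed sequence of V_i(s_0)
# over successive iterations, used only to illustrate the stopping rule.
eps = 0.001
values = [2.00, 1.60, 1.45, 1.40, 1.3950, 1.3945]
# the iterative value function should be monotonically nonincreasing
nonincreasing = all(b <= a for a, b in zip(values, values[1:]))
# stop once successive iterates differ by less than eps
converged = abs(values[-1] - values[-2]) < eps
```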
5 Conclusion
This paper proposed a new local policy iteration ADP algorithm, which greatly improves the computational efficiency of traditional ADP algorithms for discrete-time nonlinear systems. Compared with traditional global policy iteration algorithms, it can reduce computation time significantly. The key characteristic of the developed algorithm is that the iterative control laws and iterative value functions are updated only within a preset subset of the state space. The simulation results verify the effectiveness of the developed algorithm.
References
Werbos, P.: Advanced forecasting methods for global crisis warning and models of intelligence. Gen. Syst. Yearb. 22, 25–38 (1977)
Fu, Y., Fu, J., Chai, T.: Robust adaptive dynamic programming of two-player zero-sum games for continuous-time linear systems. IEEE Trans. Neural Netw. Learn. Syst. 26, 3314–3319 (2015). doi:10.1109/TNNLS.2015.2461452
Abouheaf, M., Lewis, F., Vamvoudakis, K., Haesaert, S., Babuska, R.: Multi-agent discrete-time graphical games and reinforcement learning solutions. Automatica 50(12), 3038–3053 (2014)
Zargarzadeh, H., Dierks, T., Jagannathan, S.: Optimal control of nonlinear continuous-time systems in strict-feedback form. IEEE Trans. Neural Netw. Learn. Syst. 26(10), 2535–2549 (2015)
Wei, Q., Liu, D.: Data-driven neuro-optimal temperature control of water gas shift reaction using stable iterative adaptive dynamic programming. IEEE Trans. Ind. Electron. 61(11), 6399–6408 (2014)
Heydari, A.: Revisiting approximate dynamic programming and its convergence. IEEE Trans. Cybern. 44(12), 2733–2743 (2014)
Lewis, F., Vrabie, D., Vamvoudakis, K.: Reinforcement learning and feedback control: using natural decision methods to design optimal adaptive controllers. IEEE Control Syst. 32(6), 76–105 (2012)
Wei, Q., Liu, D., Lin, H.: Value iteration adaptive dynamic programming for optimal control of discrete-time unknown nonlinear systems with disturbance using ADP. IEEE Trans. Neural Netw. Learn. Syst. 27(2), 444–458 (2016)
Wei, Q., Liu, D., Yang, X.: Infinite horizon self-learning optimal control of nonaffine discrete-time nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 26(4), 879–886 (2015)
Wei, Q., Song, R., Yan, P.: Data-driven zero-sum neuro-optimal control for a class of continuous-time unknown nonlinear systems with disturbance using ADP. IEEE Trans. Neural Netw. Learn. Syst. 27(2), 444–458 (2016)
Wei, Q., Wang, F., Liu, D., Yang, X.: Finite-approximation-error based discrete-time iterative adaptive dynamic programming. IEEE Trans. Cybern. 44(12), 2820–2833 (2014)
Wei, Q., Liu, D., Shi, G., Liu, Y.: Optimal multi-battery coordination control for home energy management systems via distributed iterative adaptive dynamic programming. IEEE Trans. Ind. Electron. 62(7), 4203–4214 (2015)
Wei, Q., Liu, D., Shi, G.: A novel dual iterative Q-learning method for optimal battery management in smart residential environments. IEEE Trans. Ind. Electron. 62(4), 2509–2518 (2015)
Wei, Q., Liu, D.: A novel iterative \(\theta \)-adaptive dynamic programming for discrete-time nonlinear systems. IEEE Trans. Autom. Sci. Eng. 11(4), 1176–1190 (2014)
Wei, Q., Liu, D.: Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification. IEEE Trans. Autom. Sci. Eng. 11(4), 1020–1036 (2014)
Liu, D., Wei, Q.: Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems. IEEE Trans. Neural Netw. Learn. Syst. 25(3), 621–634 (2014)
Xu, X., Hou, Z., Lian, C., He, H.: Online learning control using adaptive critic designs with sparse kernel machines. IEEE Trans. Neural Netw. Learn. Syst. 24(5), 762–775 (2013)
Liu, D., Yang, X., Wang, D., Wei, Q.: Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints. IEEE Trans. Cybern. 45(7), 1372–1385 (2015)
Acknowledgments
This work was supported in part by the National Natural Science Foundation of China under Grants 61233001, 61273140, 61374105, and 61304079.
© 2017 Springer International Publishing AG
Wei, Q., Xu, Y., Lin, Q., Liu, D., Song, R.: Local Policy Iteration Adaptive Dynamic Programming for Discrete-Time Nonlinear Systems. In: Cong, F., Leung, A., Wei, Q. (eds.) Advances in Neural Networks – ISNN 2017. LNCS, vol. 10262. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59081-3_18
Print ISBN: 978-3-319-59080-6
Online ISBN: 978-3-319-59081-3