1 Introduction

Approximate/adaptive dynamic programming (ADP) is a biologically inspired, non-model-based computational method for designing optimal control laws for uncertain systems. It has been used in numerous studies [1], such as the reinforcement learning systems design by Werbos and the neuro-dynamic programming of Bertsekas. According to [1], recent developments in ADP theory follow three directions: operations research, real-time control of dynamical systems, and the application of earlier ADP results to nonlinear uncertain systems.

The non-model-based approach has been used extensively to study adaptive optimal stabilization and the corresponding tracking problems [5, 6]. In [5], the control law was obtained after transforming the robust control problem into an optimal control problem. The corresponding optimal control depends on a discrete-time HJB equation, which was solved by using a neural network. The control law proposed in [5] ensures local asymptotic stability of the closed-loop uncertain nonlinear system under an inequality condition. In [6], Fan et al. proposed a sliding mode controller based on adaptive optimal control theory for partially unknown nonlinear systems with input disturbances. The nearly optimal control design ensures stability of the equivalent sliding-mode dynamics by using a policy iteration algorithm [6]. A critic network is used to approximate the cost function, which overcomes the difficulty at the second step of the policy iteration algorithm. The controller proposed in [6] ensures uniformly ultimately bounded (UUB) stability of the closed-loop uncertain nonlinear system, relying on the boundedness of the signals.

In [7], Jiang and Jiang presented a control design for continuous-time systems based on the equivalent HJB equation. Because an analytical solution of the HJB equation is difficult to obtain, [7] proposed an online policy iteration (PI) technique. The stability analysis of the closed-loop system established the input-to-state stability (ISS) property, with an estimate of the region of attraction that depends on class \( KL \) functions. In [8], Jiang proposed an adaptive optimal control law based on the algebraic Riccati equation for uncertain linear systems without external disturbance. The computational adaptive optimal control algorithm was developed from the result of Kleinman (1968). In [2, 9], a discretized model is used to propose PI-based and value iteration (VI)-based output ADP control designs. However, the results of these control laws depend strongly on the sampling time of the discrete-time systems.

This paper extends [2, 9] to a robust control law for uncertain continuous-time systems with external disturbance. In addition, we develop a new adaptive optimal control law, within the ADP framework, for continuous-time systems with uncertainties and external disturbances.

2 Adaptive Optimal Control Design

In this paper, we study a class of continuous-time systems described by:

$$ \left\{ {\begin{array}{*{20}l} {\dot{z} = g\left( {z;y;v} \right)} \hfill \\ {\dot{x} = Ax + B\left( {u + \Delta \left( {z;y;v} \right)} \right) + Dv} \hfill \\ {\dot{v} = Ev} \hfill \\ {e = Cx + Fv} \hfill \\ \end{array} } \right. $$
(1)

where \( x \in {\mathbf{\mathbb{R}}}^{n} \) is the measured component of the state available for feedback control, \( u \in {\mathbf{\mathbb{R}}}^{m} \) is the input, \( y = Cx \in {\mathbf{\mathbb{R}}}^{r} \) represents the output of plant, \( y_{d} = - Fv \in {\mathbf{\mathbb{R}}}^{r} \) is the reference signal to be tracked, \( e \in {\mathbf{\mathbb{R}}}^{r} \) is tracking error. \( z \in {\mathbf{\mathbb{R}}}^{p} ;v \in {\mathbf{\mathbb{R}}}^{q} \) are the states of the exosystem. The functions \( g:{\mathbf{\mathbb{R}}}^{p} \times {\mathbf{\mathbb{R}}}^{r} \times {\mathbf{\mathbb{R}}}^{q} \to {\mathbf{\mathbb{R}}}^{p} \) and \( \Delta :{\mathbf{\mathbb{R}}}^{p} \times {\mathbf{\mathbb{R}}}^{r} \times {\mathbf{\mathbb{R}}}^{q} \to {\mathbf{\mathbb{R}}}^{m} \) are two locally Lipschitz functions satisfying, \( g\left( {0;0;0} \right) = 0 \) and \( \Delta \left( {0;0;0} \right) = 0 \). Suppose \( A \in {\mathbf{\mathbb{R}}}^{n \times n} ; \) \( B \in {\mathbf{\mathbb{R}}}^{n \times m} ; \) \( C \in {\mathbf{\mathbb{R}}}^{r \times n} ; \) \( D \in {\mathbf{\mathbb{R}}}^{n \times q} ; \) \( E \in {\mathbf{\mathbb{R}}}^{q \times q} ; \) \( F \in {\mathbf{\mathbb{R}}}^{r \times q} ; \) \( g \in {\mathbf{\mathbb{R}}}^{p} ; \) \( \Delta \in {\mathbf{\mathbb{R}}}^{m} \) are unknown and \( z;v;y;y_{d} \) are unmeasurable.

The control objective is to find an adaptive optimal control law, based on an iterative algorithm, that ensures the tracking errors converge to zero, and to establish the convergence properties of this iterative algorithm in the presence of uncertainty and external disturbance in the system. The following assumptions, which are standard in the adaptive optimal control literature, are made for solving the classical adaptive optimal control problem.

Assumption 1:

The pair \( \left( {A;B} \right) \) is controllable.

Assumption 2:

The transmission zeros condition holds, i.e.,

$$ rank\left[ {\begin{array}{*{20}c} {A - \lambda I} & B \\ C & 0 \\ \end{array} } \right] = n + r;\forall \lambda \in \sigma \left( E \right), $$

where \( \sigma \left( E \right) \) denotes the spectrum of \( E \).

Assumption 3:

The minimal polynomial of \( E \) is available, which is:

$$ {\Gamma}_{E} \left( s \right) = \prod\limits_{i = 1}^{M} {\left( {s - \lambda_{i} } \right)^{{a_{i} }} } \prod\limits_{j = 1}^{N} {\left( {s^{2} - 2\mu_{j} s + \mu_{j}^{2} + \omega_{j}^{2} } \right)^{{b_{j} }} } $$

with degree \( q_{E} \le q \) and \( a_{i} ;b_{j} \) are positive integers and \( \lambda_{i} ;\mu_{j} ;\omega_{j} \in {\mathbf{\mathbb{R}}} \) for \( i = \overline{1;M} ;j = \overline{1;N} \).

Assumption 4:

There exist functions \( \beta_{z} ;\beta_{\Delta } \) of class \( KL \) and functions \( \gamma_{z} ;\gamma_{\Delta } \) of class \( K \), all of which are independent of \( v \), such that:

$$ \begin{aligned} & \left\| {z\left( t \right)} \right\| \le \beta_{z} \left( {\left\| {z\left( 0 \right)} \right\|,t} \right) + \gamma_{z} \left( {\left\| e \right\|} \right) \\ & \left\| {\Delta \left( t \right)} \right\| \le \beta_{\Delta } \left( {\left\| {\Delta \left( 0 \right)} \right\|,t} \right) + \gamma_{\Delta } \left( {\left\| e \right\|} \right) \\ \end{aligned} $$

Assumption 5:

There exist a continuously differentiable, positive definite and radially unbounded function \( {\Pi} :{\mathbf{\mathbb{R}}}^{p} \to {\mathbf{\mathbb{R}}} \) and two constants \( c_{1} > 0;c_{2} > 0 \), such that:

$$ \begin{aligned} & \frac{{\partial {\Pi} }}{\partial z}g\left( {z,y,v} \right) \le - c_{1} \left| {\Delta \left( {z,y,v} \right)} \right|^{2} + c_{2} \left| e \right|^{2} ; \\ & \forall z \in {\mathbf{\mathbb{R}}}^{p} ;y \in {\mathbf{\mathbb{R}}}^{r} \\ \end{aligned} $$

Assumption 6:

There exists a known constant \( \xi > 0 \) such that the matrix \( C \) satisfies \( \left\| C \right\| \le \xi \).

Remark 1:

Unlike [2], this paper considers a class of systems in which \( z;v;y;y_{d} \) are unmeasurable and \( x,u,e \) are measurable.

We recall the classical theorem of Kleinman (1968) [4]: Let \( K_{0} \) be any stabilizing feedback gain matrix, denote \( A_{k} = A - BK_{k} \), and repeat the following steps for \( k = 0;1; \ldots \)

  • Step 1: Solve for the real symmetric positive definite solution \( P_{k} \) of the Lyapunov equation:

    $$ A_{k}^{T} P_{k} + P_{k} A_{k} + Q + K_{k}^{T} K_{k} = 0 $$
    (2)
  • Step 2: Update the feedback gain matrix by:

    $$ K_{k + 1} = B^{T} P_{k} $$
    (3)

Then, the following properties hold:

  1. \( A - BK_{k} \) is Hurwitz.

  2. \( P^{*} \le P_{k + 1} \le P_{k} \)

  3. \( \lim\limits_{k \to \infty } K_{k} = K^{*} ; \lim\limits_{k \to \infty } P_{k} = P^{*} \)
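The iteration (2), (3) can be illustrated numerically. The following is a minimal sketch, not the implementation used in this paper: it assumes a known double-integrator model, \( Q = I \), and a hand-picked stabilizing \( K_{0} \) (all hypothetical choices), and solves each Lyapunov equation by vectorization with Kronecker products.

```python
import numpy as np

def solve_lyapunov(Ak, M):
    """Solve Ak^T P + P Ak + M = 0 for P using column-major vectorization."""
    n = Ak.shape[0]
    I = np.eye(n)
    # vec(Ak^T P) = (I kron Ak^T) vec(P),  vec(P Ak) = (Ak^T kron I) vec(P)
    L = np.kron(I, Ak.T) + np.kron(Ak.T, I)
    p = np.linalg.solve(L, -M.flatten(order="F"))
    P = p.reshape((n, n), order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def kleinman(A, B, Q, K0, iters=20):
    """Repeat Step 1 (Lyapunov equation (2)) and Step 2 (gain update (3))."""
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        P = solve_lyapunov(Ak, Q + K.T @ K)
        K = B.T @ P
    return P, K

# Hypothetical example data (not from the paper): a double integrator.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
K0 = np.array([[1.0, 1.0]])  # A - B K0 has characteristic polynomial s^2 + s + 1

P, K = kleinman(A, B, Q, K0)
# Residual of the algebraic Riccati equation P A + A^T P + Q - P B B^T P = 0
residual = P @ A + A.T @ P + Q - P @ B @ B.T @ P
```

For this example the Riccati equation can be solved by hand, giving \( K^{*} = \left[ {1\;\;\sqrt 3 } \right] \), so the iterates can be compared against a known limit.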

The proposed adaptive optimal control law is based on the following theorems.

Theorem 1:

Denote \( \varepsilon = x - Xv \). If the controller is designed as \( u = - K^{*} \varepsilon + Uv \), where \( X;U \) solve the following regulator equations:

$$ \left\{ {\begin{array}{*{20}l} {XE = AX + BU + D} \hfill \\ {0 = CX + F} \hfill \\ \end{array} } \right. $$

and \( K^{*} = B^{T} P^{*} \), where the symmetric matrix \( P^{*} > 0 \) is the unique solution of the well-known algebraic Riccati equation:

$$ P^{*} A + A^{T} P^{*} + Q - P^{*} BB^{T} P^{*} = 0 $$
(4)

and the weighting matrix \( Q \) in (4) satisfies \( \lambda_{\hbox{min} } \left( Q \right) > \gamma \xi^{2} > \frac{{c_{2} }}{{c_{1} }}\xi^{2} \) for some constant \( \gamma \), then the closed-loop system achieves disturbance rejection and asymptotic tracking.

Proof:

Assumption 2 implies that the regulator equations are solvable for any matrices \( D;F \).

$$ \varepsilon = x - Xv $$
(5)
$$ \begin{aligned} & \Rightarrow \dot{\varepsilon } = \dot{x} - X\dot{v} \\ & \Rightarrow \dot{\varepsilon } = Ax + B\left( {u + \Delta \left( {z;y;v} \right)} \right) + Dv - XEv \Rightarrow \dot{\varepsilon } = A\left( {\varepsilon + Xv} \right) + B\left( {u + \Delta \left( {z;y;v} \right)} \right) + Dv - XEv \\ & \Rightarrow \dot{\varepsilon } = A\varepsilon + B\left( {u - Uv + \Delta \left( {z;y;v} \right)} \right) \\ \end{aligned} $$
$$ e = y - y_{d} = Cx + Fv = C\left( {\varepsilon + Xv} \right) + Fv\quad \quad \Rightarrow e = C\varepsilon $$
(6)

Define \( V_{1} = \varepsilon^{T} P^{*} \varepsilon \), we have:

$$ \begin{aligned} \dot{V}_{1} & = \varepsilon^{T} P^{*} \left( {A\varepsilon + B\left( { - K^{*} \varepsilon + \Delta } \right)} \right) + \left( {A\varepsilon + B\left( { - K^{*} \varepsilon + \Delta } \right)} \right)^{T} P^{*} \varepsilon \\ & = \varepsilon^{T} \left( {P^{*} \left( {A - BK^{*} } \right) + \left( {A - BK^{*} } \right)^{T} P^{*} } \right)\varepsilon + \varepsilon^{T} P^{*} B\Delta + \Delta^{T} B^{T} P^{*} \varepsilon \\ & = \varepsilon^{T} \left( { - Q - P^{*} BB^{T} P^{*} } \right)\varepsilon + 2\varepsilon^{T} P^{*} B\Delta = - \varepsilon^{T} Q\varepsilon - \left| {B^{T} P^{*} \varepsilon - \Delta } \right|^{2} + \left| \Delta \right|^{2} \\ \end{aligned} $$

By using assumption 6, we have:

$$ Q \ge \lambda_{\hbox{min} } \left( Q \right)I > \gamma \xi^{2} I \ge \gamma \left\| C \right\|^{2} I \ge \gamma C^{T} C $$
(7)

Then

$$ \dot{V}_{1} \le - \gamma \varepsilon^{T} C^{T} C\varepsilon + \left| \Delta \right|^{2} = - \gamma \left| e \right|^{2} + \left| \Delta \right|^{2} $$

Setting \( V = V_{1} + \frac{1}{{c_{1} }}{\Pi} \left( z \right) \), by using assumption 5, we have:

$$ \dot{V} \le - \gamma \left| e \right|^{2} + \left| \Delta \right|^{2} + \frac{1}{{c_{1} }}\left( { - c_{1} \left| \Delta \right|^{2} + c_{2} \left| e \right|^{2} } \right) \Rightarrow \dot{V} \le - \left( {\gamma - \frac{{c_{2} }}{{c_{1} }}} \right)\left| e \right|^{2} $$

A direct application of LaSalle's invariance principle then yields the global asymptotic stability (GAS) of the closed-loop system.
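When the model is known, the regulator equations of Theorem 1 are linear in \( \left( {X;U} \right) \) and can be solved by vectorization. The sketch below is only an illustration under that assumption (in this paper the matrices are unknown); the function name `solve_regulator_equations` and the example data are hypothetical.

```python
import numpy as np

def solve_regulator_equations(A, B, C, D, E, F):
    """Solve XE = AX + BU + D, 0 = CX + F for (X, U) in the least-squares sense."""
    n, m = B.shape
    r, q = F.shape
    In, Iq = np.eye(n), np.eye(q)
    # vec(XE) - vec(AX) - vec(BU) = vec(D), with column-major vec
    top = np.hstack([np.kron(E.T, In) - np.kron(Iq, A), -np.kron(Iq, B)])
    # vec(CX) = -vec(F)
    bottom = np.hstack([np.kron(Iq, C), np.zeros((r * q, m * q))])
    lhs = np.vstack([top, bottom])
    rhs = np.concatenate([D.flatten(order="F"), -F.flatten(order="F")])
    sol, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    X = sol[: n * q].reshape((n, q), order="F")
    U = sol[n * q:].reshape((m, q), order="F")
    return X, U

# Hypothetical example: double integrator tracking a unit-frequency sinusoid.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.zeros((2, 2))
E = np.array([[0.0, 1.0], [-1.0, 0.0]])  # exosystem with eigenvalues +/- j
F = np.array([[-1.0, 0.0]])              # reference y_d = -F v = v_1

X, U = solve_regulator_equations(A, B, C, D, E, F)
```

In this example the solution can be verified by hand: \( X = I \) and \( U = \left[ { - 1\;\;0} \right] \), that is, the state follows \( v \) and the feedforward input equals \( - v_{1} \).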

Theorem 2:

There exists a small constant \( \alpha > 0 \) such that, for every symmetric matrix \( P > 0 \) satisfying \( \left| {P - P^{*} } \right| < \alpha \), the overall system is GAS under the controller \( u = - B^{T} P\varepsilon + Uv \).

Proof:

From (4), any symmetric matrix \( P > 0 \) satisfies \( A^{T} P + PA + \widehat{Q} - PBB^{T} P = 0 \),

where:

$$ \widehat{Q} = Q + \left( {P^{*} - P} \right)A + A^{T} \left( {P^{*} - P} \right) + PBB^{T} P - P^{*} BB^{T} P^{*} $$

From (7), we have \( Q > \gamma C^{T} C \), so there exists a constant \( \mu > 0 \) such that \( Q - \gamma C^{T} C > \mu I \). Then, by continuity, there exists \( \alpha > 0 \) such that, for every symmetric matrix \( P > 0 \) satisfying \( \left| {P - P^{*} } \right| < \alpha \), we have \( \widehat{Q} > Q - \mu I \), which implies \( \widehat{Q} > \gamma C^{T} C \). Therefore, by Theorem 1, the control \( u = - B^{T} P\varepsilon + Uv \) globally asymptotically stabilizes the system.
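As a check on the identity used at the start of this proof, substituting the definition of \( \widehat{Q} \) and cancelling terms recovers (4). This is only a verification of that step and uses no additional assumption:

$$ \begin{aligned} A^{T} P + PA + \widehat{Q} - PBB^{T} P & = A^{T} P + PA + Q + \left( {P^{*} - P} \right)A + A^{T} \left( {P^{*} - P} \right) + PBB^{T} P - P^{*} BB^{T} P^{*} - PBB^{T} P \\ & = P^{*} A + A^{T} P^{*} + Q - P^{*} BB^{T} P^{*} = 0 \\ \end{aligned} $$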

Remark 2:

The proposed optimal control law guarantees the GAS property of the closed-loop system in presence of uncertain parameters and external disturbances.

Using assumption 3, we can always find a vector \( \bar{v}\left( t \right) \in {\mathbf{\mathbb{R}}}^{{q_{E} }} \) and a matrix \( \bar{E} \in {\mathbf{\mathbb{R}}}^{{q_{E} \times q_{E} }} \) such that:

$$ \begin{aligned} \dot{\bar{v}}\left( t \right) = \bar{E}.\bar{v}\left( t \right) \hfill \\ v\left( t \right) = G.\bar{v}\left( t \right) \, \hfill \\ \end{aligned} $$
(8)

where \( G \in {\mathbf{\mathbb{R}}}^{{q \times q_{E} }} \) is an unknown constant matrix.

From (5); (8) and Theorem 1, the linear optimal output regulation problem (LOORP) is solved if we design the controller \( u = - K^{*} \left( {x - XG\bar{v}\left( t \right)} \right) + UG\bar{v}\left( t \right) \).

PI based output ADP design:

Suppose \( \Delta \) is available during the learning phase, and define \( w = u + \Delta \). Then:

$$ \begin{aligned} & \dot{\varepsilon } = A\varepsilon + B\left( {u - UG\bar{v} + \Delta \left( {z;y;v} \right)} \right) = \left( {A - BK_{k} } \right)\varepsilon + B\left( {u + K_{k} \varepsilon - UG\bar{v} + \Delta } \right) \\ & \dot{\varepsilon } = A_{k} \varepsilon + B\left( {w + K_{k} \varepsilon - UG\bar{v}} \right) \\ \end{aligned} $$

Let \( \bar{P}_{k} \) satisfy \( P_{k} = C^{T} \bar{P}_{k} C \). From (6), we have:

$$ \begin{array}{*{20}l} {e\left( {t + \delta t} \right)^{T} \overline{P}_{k} e\left( {t + \delta t} \right) - e\left( t \right)^{T} \overline{P}_{k} e\left( t \right) = \,\varepsilon \left( {t + \delta t} \right)^{T} P_{k} \varepsilon \left( {t + \delta t} \right) - \varepsilon \left( t \right)^{T} P_{k} \varepsilon \left( t \right)} \hfill \\ {\;\; = \int\limits_{t}^{t + \delta t} {\left[ {\varepsilon^{T} \left( {A_{k}^{T} P_{k} + P_{k} A_{k} } \right)\varepsilon + 2\left( {w + K_{k} \varepsilon - UG\bar{v}} \right)^{T} B^{T} P_{k} \varepsilon } \right]} d\tau } \hfill \\ \end{array} $$
(9)
$$ = - \int\limits_{t}^{t + \delta t} {\varepsilon^{T} \left( {Q + K_{k}^{T} K_{k} } \right)} \varepsilon d\tau + 2\int\limits_{t}^{t + \delta t} {\left( {w + K_{k} \varepsilon } \right)^{T} K_{k + 1} \varepsilon } d\tau - 2\int\limits_{t}^{t + \delta t} {\bar{v}^{T} \left( {UG} \right)^{T} K_{k + 1} \varepsilon } d\tau $$
(10)
$$ {\text{Define}}\;XG = \bar{X};UG = \bar{U};Q + K_{k}^{T} K_{k} = Q_{k} . $$
(11)

From (5); (8); (11), we have:

$$ \varepsilon^{T} \left( {Q + K_{k}^{T} K_{k} } \right)\varepsilon = \left( {x^{T} - \bar{v}^{T} \bar{X}^{T} } \right)Q_{k} \left( {x - \bar{X}\bar{v}} \right) = x^{T} Q_{k} x - \bar{v}^{T} \bar{X}^{T} Q_{k} x - x^{T} Q_{k} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} Q_{k} \bar{X}\bar{v} $$
$$ \begin{aligned} & \left( {w + K_{k} \varepsilon } \right)^{T} K_{k + 1} \varepsilon = w^{T} K_{k + 1} \varepsilon + \varepsilon^{T} K_{k}^{T} K_{k + 1} \varepsilon = w^{T} K_{k + 1} \left( {x - \bar{X}\bar{v}} \right) + \left( {x^{T} - \bar{v}^{T} \bar{X}^{T} } \right)K_{k}^{T} K_{k + 1} \left( {x - \bar{X}\bar{v}} \right) \\ & = w^{T} K_{k + 1} x - w^{T} K_{k + 1} \bar{X}\bar{v} + x^{T} K_{k}^{T} K_{k + 1} x - \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} x - x^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v} \\ \end{aligned} $$
$$ \bar{v}^{T} \left( {UG} \right)^{T} K_{k + 1} \varepsilon = \bar{v}^{T} \bar{U}^{T} K_{k + 1} \left( {x - \bar{X}\bar{v}} \right) = \bar{v}^{T} \bar{U}^{T} K_{k + 1} x - \bar{v}^{T} \bar{U}^{T} K_{k + 1} \bar{X}\bar{v} $$

Then

$$ \begin{aligned}
& e\left( {t + \delta t} \right)^{T} \overline{P}_{k} e\left( {t + \delta t} \right) - e\left( t \right)^{T} \overline{P}_{k} e\left( t \right) \\
& = - \int\limits_{t}^{t + \delta t} {\left[ {x^{T} Q_{k} x - \bar{v}^{T} \bar{X}^{T} Q_{k} x - x^{T} Q_{k} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} Q_{k} \bar{X}\bar{v}} \right]} d\tau \\
& \quad + 2\int\limits_{t}^{t + \delta t} {\left[ {\begin{array}{*{20}l} {w^{T} K_{k + 1} x - w^{T} K_{k + 1} \bar{X}\bar{v}} \hfill \\ { + x^{T} K_{k}^{T} K_{k + 1} x - \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} x} \hfill \\ { - x^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v} + \bar{v}^{T} \bar{X}^{T} K_{k}^{T} K_{k + 1} \bar{X}\bar{v}} \hfill \\ \end{array} } \right]} d\tau \\
& \quad - 2\int\limits_{t}^{t + \delta t} {\left[ {\bar{v}^{T} \bar{U}^{T} K_{k + 1} x - \bar{v}^{T} \bar{U}^{T} K_{k + 1} \bar{X}\bar{v}} \right]} d\tau \\
& = 2\int\limits_{t}^{t + \delta t} {\left( {w^{T} K_{k + 1} x} \right)} d\tau - 2\int\limits_{t}^{t + \delta t} {\left( {w^{T} K_{k + 1} \bar{X}\bar{v}} \right)} d\tau + \int\limits_{t}^{t + \delta t} {x^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right)x} d\tau \\
& \quad + \int\limits_{t}^{t + \delta t} {x^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}\bar{v}} d\tau + \int\limits_{t}^{t + \delta t} {\bar{v}^{T} \bar{X}^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)x} d\tau \\
& \quad + \int\limits_{t}^{t + \delta t} {\bar{v}^{T} \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}\bar{v}} d\tau - 2\int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \bar{U}^{T} K_{k + 1} x} \right)} d\tau \\
& = 2\int\limits_{t}^{t + \delta t} {\left( {w^{T} K_{k + 1} x} \right)} d\tau - 2\int\limits_{t}^{t + \delta t} {\left( {w^{T} K_{k + 1} \bar{X}\bar{v}} \right)} d\tau + 2\int\limits_{t}^{t + \delta t} {x^{T} K_{k}^{T} K_{k + 1} x} d\tau - \int\limits_{t}^{t + \delta t} {x^{T} Q_{k} x} d\tau \\
& \quad + 2\int\limits_{t}^{t + \delta t} {x^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}\bar{v}} d\tau + \int\limits_{t}^{t + \delta t} {\bar{v}^{T} \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}\bar{v}} d\tau \\
& \quad - 2\int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \bar{U}^{T} K_{k + 1} x} \right)} d\tau \\
\end{aligned} $$

Applying Kronecker product representation gives:

$$ e^{T} \overline{P}_{k} e = \left( {e^{T} \otimes e^{T} } \right)vec\left( {\overline{P}_{k} } \right);w^{T} K_{k + 1} x = \left( {x^{T} \otimes w^{T} } \right)vec\left( {K_{k + 1} } \right);w^{T} K_{k + 1} \bar{X}\bar{v} = \left( {\bar{v}^{T} \otimes w^{T} } \right)vec\left( {K_{k + 1} \bar{X}} \right); $$
$$ x^{T} K_{k}^{T} K_{k + 1} x = \left( {x^{T} \otimes \left( {K_{k} x} \right)^{T} } \right)vec\left( {K_{k + 1} } \right);x^{T} \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}\bar{v} = \left( {\bar{v}^{T} \otimes x^{T} } \right)vec\left( {\left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X}} \right); $$
$$ \bar{v}^{T} \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}\bar{v} = \left( {\bar{v}^{T} \otimes \bar{v}^{T} } \right)vec\left[ {\left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X}} \right] $$
$$ \bar{v}^{T} \bar{U}^{T} K_{k + 1} x = \left( {x^{T} \otimes \bar{v}^{T} } \right)vec\left( {\bar{U}^{T} K_{k + 1} } \right); \, $$
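The identities above all follow from \( a^{T} Mb = \left( {b^{T} \otimes a^{T} } \right)vec\left( M \right) \), where \( vec \) stacks the columns of \( M \). A quick numerical check, using random data and hypothetical dimensions chosen only for illustration:

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.flatten(order="F")

rng = np.random.default_rng(0)
x = rng.standard_normal(3)        # plays the role of x (n = 3)
w = rng.standard_normal(2)        # plays the role of w (m = 2)
K = rng.standard_normal((2, 3))   # plays the role of K_{k+1}
e = rng.standard_normal(2)        # plays the role of e (r = 2)
P_bar = rng.standard_normal((2, 2))
P_bar = P_bar + P_bar.T           # symmetric, like \bar{P}_k

# w^T K x = (x^T kron w^T) vec(K)
bilinear_direct = w @ K @ x
bilinear_kron = np.kron(x, w) @ vec(K)

# e^T P_bar e = (e^T kron e^T) vec(P_bar)
quadratic_direct = e @ P_bar @ e
quadratic_kron = np.kron(e, e) @ vec(P_bar)
```

The column-major convention matters: with a row-major `flatten`, the bilinear identity would require the Kronecker factors in the opposite order.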

Define: \( \left( {Q_{k} - 2K_{k}^{T} K_{k + 1} } \right)\bar{X} = G_{1;k} ; \)

$$ \left[ {\bar{X}^{T} \left( {2K_{k}^{T} K_{k + 1} - Q_{k} } \right) + 2\bar{U}^{T} K_{k + 1} } \right]\bar{X} = G_{2;k} ;\bar{U}^{T} K_{k + 1} = G_{3;k} $$
$$ {\Phi}_{k} = \left[ {\begin{array}{*{20}c} {{\Upsilon} \left( {t_{0}^{\left( k \right)} } \right)} & {\vartheta \left( {t_{0}^{\left( k \right)} } \right)} & {\nu \left( {t_{0}^{\left( k \right)} } \right)} & {\chi \left( {t_{0}^{\left( k \right)} } \right)} & {\pi \left( {t_{0}^{\left( k \right)} } \right)} & {\sigma \left( {t_{0}^{\left( k \right)} } \right)} \\ {{\Upsilon} \left( {t_{1}^{\left( k \right)} } \right)} & {\vartheta \left( {t_{1}^{\left( k \right)} } \right)} & {\nu \left( {t_{1}^{\left( k \right)} } \right)} & {\chi \left( {t_{1}^{\left( k \right)} } \right)} & {\pi \left( {t_{1}^{\left( k \right)} } \right)} & {\sigma \left( {t_{1}^{\left( k \right)} } \right)} \\ \ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\ {{\Upsilon} \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\vartheta \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\nu \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\chi \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\pi \left( {t_{l - 1}^{\left( k \right)} } \right)} & {\sigma \left( {t_{l - 1}^{\left( k \right)} } \right)} \\ \end{array} } \right];{\Psi}_{k} = \left[ \begin{aligned} \rho \left( {t_{0}^{\left( k \right)} } \right) \hfill \\ \rho \left( {t_{1}^{\left( k \right)} } \right) \hfill \\ \ldots \hfill \\ \rho \left( {t_{l - 1}^{\left( k \right)} } \right) \hfill \\ \end{aligned} \right] $$
$$ \begin{aligned} & {\Upsilon} \left( t \right) = e^{T} \left( t \right) \otimes e^{T} \left( t \right) - e^{T} \left( {t + \delta t} \right) \otimes e^{T} \left( {t + \delta t} \right);\vartheta \left( t \right) = 2\int\limits_{t}^{t + \delta t} {\left( {x^{T} \otimes \left( {w + K_{k} x} \right)^{T} } \right)d\tau } \\ & \nu \left( t \right) = - 2\int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \otimes w^{T} } \right)d\tau } ;\chi \left( t \right) = 2\int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \otimes x^{T} } \right)d\tau } ;\pi \left( t \right) = \int\limits_{t}^{t + \delta t} {\left( {\bar{v}^{T} \otimes \bar{v}^{T} } \right)d\tau } \\ & \sigma \left( t \right) = - 2\int\limits_{t}^{t + \delta t} {\left( {x^{T} \otimes \bar{v}^{T} } \right)d\tau } ;\rho \left( t \right) = \int\limits_{t}^{t + \delta t} {x^{T} Q_{k} x} d\tau \\ \end{aligned} $$

Consequently, we have:

$$ {\Phi}_{k} .\left[ {\begin{array}{*{20}l} {vec\left( {\bar{P}_{k} } \right)} \hfill \\ {vec\left( {K_{k + 1} } \right)} \hfill \\ {vec\left( {K_{k + 1} \bar{X}} \right)} \hfill \\ {vec\left( {G_{1;k} } \right)} \hfill \\ {vec\left( {G_{2;k} } \right)} \hfill \\ {vec\left( {G_{3;k} } \right)} \hfill \\ \end{array} } \right] = {\Psi}_{k} $$
(12)

Assumption 7:

For each \( k = 1;2; \ldots \) there exists an integer \( N \) such that, when \( k \ge N \), the following rank condition holds:

$$ rank\left( {{\Phi}_{k} } \right) = \frac{{n\left( {n + 1} \right)}}{2} + \left( {m + q} \right)n $$

Under Assumption 7, the vector

\( \left[ {vec\left( {\bar{P}_{k} } \right),vec\left( {K_{k + 1} } \right),vec\left( {K_{k + 1} \bar{X}} \right),vec\left( {G_{1;k} } \right),vec\left( {G_{2;k} } \right),vec\left( {G_{3;k} } \right)} \right]^{T} \) is uniquely determined by the least-squares solution of (12):

$$ \left[ {vec\left( {\bar{P}_{k} } \right),vec\left( {K_{k + 1} } \right),vec\left( {K_{k + 1} \bar{X}} \right),vec\left( {G_{1;k} } \right),vec\left( {G_{2;k} } \right),vec\left( {G_{3;k} } \right)} \right]^{T} = \left( {{\Phi}_{k}^{T} {\Phi}_{k} } \right)^{ - 1} {\Phi}_{k}^{T} {\Psi}_{k} $$
(13)

Assumption 2 implies that \( B \) has full column rank, so \( K_{k + 1} = B^{T} P_{k} \) has full row rank. Then:

$$ \bar{U}^{T} = G_{3;k} .K_{k + 1}^{T} .\left( {K_{k + 1} .K_{k + 1}^{T} } \right)^{ - 1} $$
(14)
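The two algebraic steps (13) and (14) can be checked on synthetic data. The sketch below is an illustration only: `Phi`, `theta_true`, `K`, and `U_bar` are random stand-ins with hypothetical dimensions, not quantities computed from the system (1).

```python
import numpy as np

rng = np.random.default_rng(1)

# Step (13): least-squares solution of Phi theta = Psi.
# Phi is tall with full column rank, so the solution is unique.
Phi = rng.standard_normal((40, 6))
theta_true = rng.standard_normal(6)
Psi = Phi @ theta_true
theta_normal = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ Psi   # formula (13)
theta_lstsq, *_ = np.linalg.lstsq(Phi, Psi, rcond=None)   # better conditioned

# Step (14): recover U_bar^T from G3 = U_bar^T K when K has full row rank.
K = rng.standard_normal((1, 4))       # m = 1, n = 4
U_bar = rng.standard_normal((1, 2))   # m = 1, q_E = 2
G3 = U_bar.T @ K
U_bar_T_hat = G3 @ K.T @ np.linalg.inv(K @ K.T)           # formula (14)
```

In practice `np.linalg.lstsq` is usually preferable to forming \( {\Phi}_{k}^{T} {\Phi}_{k} \) explicitly, since the normal equations square the condition number of \( {\Phi}_{k} \).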

Remark 3:

Unlike [2], we obtain the adaptive optimal control law for continuous-time systems affected by external disturbances.

Now, we are ready to propose the following adaptive optimal control algorithm for practical online implementation.

[Algorithm 1: figure not reproduced]
$$ u_{k} = - K_{{j^{*} }} .\varepsilon + \left( {G_{{3;j^{*} }} .K_{{j^{*} + 1}}^{T} .\left( {K_{{j^{*} + 1}} .K_{{j^{*} + 1}}^{T} } \right)^{ - 1} } \right)^{T} \bar{v} $$

Theorem 3:

Let \( K_{0} \) be any stabilizing feedback gain matrix, and let \( \left( {P_{k} ;K_{k + 1} ;\bar{U}} \right) \) be obtained from Algorithm 1. Then, under Assumption 7, the following properties hold:

  1. \( A - BK_{k} \) is Hurwitz.

  2. \( P^{*} \le P_{k + 1} \le P_{k} \)

  3. \( \lim\limits_{k \to \infty } K_{k} = K^{*} ; \lim\limits_{k \to \infty } P_{k} = P^{*} \)

Proof:

From (9); (10), one sees that the pair \( \left( {P_{k} ;K_{k + 1} } \right) \) obtained from (2); (3) must satisfy condition (12). In addition, by Assumption 7, this solution is unique. Therefore, the solution given by Kleinman's theorem coincides with the solution of (13) for any \( k \ge N \).

3 Simulation Results

$$ \begin{aligned} & \left[ {\begin{array}{*{20}c} {\dot{x}} \\ {\ddot{x}} \\ {\dot{\phi }} \\ {\ddot{\phi }} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 & 1 & 0 & 0 \\ 0 & {\frac{{ - \left( {I + ml^{2} } \right)b}}{{I\left( {M + m} \right) + Mml^{2} }}} & {\frac{{m^{2} gl^{2} }}{{I\left( {M + m} \right) + Mml^{2} }}} & 0 \\ 0 & 0 & 0 & 1 \\ 0 & {\frac{ - mlb}{{I\left( {M + m} \right) + Mml^{2} }}} & {\frac{{mgl\left( {M + m} \right)}}{{I\left( {M + m} \right) + Mml^{2} }}} & 0 \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} x \\ {\dot{x}} \\ \phi \\ {\dot{\phi }} \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 0 \\ {\frac{{I + ml^{2} }}{{I\left( {M + m} \right) + Mml^{2} }}} \\ 0 \\ {\frac{ml}{{I\left( {M + m} \right) + Mml^{2} }}} \\ \end{array} } \right]u \\ & y = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} 1 & 0 & 0 & 0 \\ \end{array} } \\ {\begin{array}{*{20}c} 0 & 0 & 1 & 0 \\ \end{array} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} x \\ {\dot{x}} \\ \phi \\ {\dot{\phi }} \\ \end{array} } \right] + \left[ {\begin{array}{*{20}c} 0 \\ 0 \\ \end{array} } \right]u \\ \end{aligned} $$
(15)

In this section, we apply the proposed adaptive optimal control law to an inverted pendulum on a cart, described by (15) and Table 1. The simulation results in Fig. 1 show that the matrices P and K produced by the proposed algorithm converge and that the tracking errors converge to zero.
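The model (15) can be assembled and checked against Assumption 1 as follows. This is a sketch only: the numerical parameter values below are hypothetical placeholders chosen for illustration and are not claimed to be the values of Table 1.

```python
import numpy as np

# Hypothetical parameters (assumed for illustration, not taken from Table 1)
M = 0.5     # cart mass [kg]
m = 0.2     # pendulum mass [kg]
b = 0.1     # cart friction coefficient [N/(m/s)]
I_p = 0.006 # pendulum moment of inertia [kg m^2]
g = 9.8     # gravitational acceleration [m/s^2]
l = 0.3     # distance to the pendulum centre of mass [m]

den = I_p * (M + m) + M * m * l**2  # common denominator in (15)

A = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [0.0, -(I_p + m * l**2) * b / den, m**2 * g * l**2 / den, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, -m * l * b / den, m * g * l * (M + m) / den, 0.0],
])
B = np.array([[0.0], [(I_p + m * l**2) / den], [0.0], [m * l / den]])
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

# Assumption 1: (A, B) controllable <=> rank [B, AB, A^2 B, A^3 B] = n
n = A.shape[0]
ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])
ctrb_rank = np.linalg.matrix_rank(ctrb)

# The upright equilibrium is open-loop unstable, so a stabilizing K_0 is needed
max_real_eig = np.max(np.linalg.eigvals(A).real)
```

With these placeholder values the pair \( \left( {A;B} \right) \) is controllable and \( A \) has an eigenvalue in the open right half-plane, which is why the iteration must start from a stabilizing \( K_{0} \).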

Table 1. The parameters of the inverted pendulum

Fig. 1. Convergence of matrix P, K and tracking errors

4 Conclusion

This paper presents an adaptive optimal control algorithm for practical online implementation on continuous-time systems with unknown system dynamics and external disturbance. The global asymptotic stability of the closed-loop system and the convergence of the algorithm are established. The theoretical analysis and simulation results illustrate the effectiveness of the proposed algorithm.