1 Introduction

In many practical systems, such as electric circuits [1], power systems [2], and spacecraft systems [3], slow and fast dynamics coexist and are coupled. These systems can be modeled as two-time-scale systems (TTSSs), and the optimal control objective for TTSSs is to minimize a predefined performance index concerning the control energy, state variables, or other indicators. Such problems have long been an active topic in the control of TTSSs. Addressing them usually requires solving Hamilton–Jacobi–Bellman (HJB) equations, which are hard to solve analytically. As a result, methods that seek an approximate solution have become popular for realizing near-optimal control in practice [4, 5]. Another concern is that applying a traditional optimal control method for single-time-scale systems directly to TTSSs may lead to ill-conditioned numerical problems and high dimensionality. Therefore, several effective methods have been proposed [6,7,8,9,10,11,12,13,14]. For instance, when the fast dynamics decay rapidly, the controller can be designed solely based on the slow dynamics [6]. When the fast dynamics are not globally asymptotically stable, the original problem can be decomposed into two reduced subproblems with separate time scales, followed by the design of a composite control strategy [7,8,9,10,11,12,13]. In [14], the authors proposed a feasible alternative based on a fixed-point iteration method.

The optimal control methods for TTSSs mentioned above [7, 8, 10,11,12, 14] assume that the system’s dynamics are completely known. However, precisely parameterized system models are often hard to obtain in real applications. Motivated by the practical background of industrial thickening and flotation processes, model-free optimal operational [15, 16] and optimal tracking [17] control strategies have been suggested. Using prior model identification, the unknown system dynamics can be estimated through a multi-time-scale dynamic neural network before the near-optimal controller is designed [18, 19]; this approach, however, still relies on accurate modeling. To obtain the optimal solution directly with unknown slow dynamics, an adaptive composite near-optimal control strategy has been designed in the framework of adaptive dynamic programming [20]. Nevertheless, the controllers in [15,16,17,18,19,20] are updated continuously in real time, which poses implementation challenges due to the limitations of physical devices and communication resources.

To overcome the difficulty of continuous updating, the event-triggered mechanism has been proposed so that the controller updates only when necessary [21], enhancing the efficient usage of restricted resources. Event-triggered optimal control (ETOC) offers a tradeoff between performance and resources: sub-optimal performance is accepted when a less resource-demanding controller, such as an event-triggered controller, is selected [22]. Furthermore, the upper bound of the performance index [23, 24], unmatched uncertainties [25], and double-channel communication [26] have also been studied within the scope of ETOC, and the case of nonlinear stochastic systems is considered in [27]. However, related works on ETOC mainly concern single-time-scale systems. For example, considering unknown dynamics, an event-triggered adaptive dynamic programming method has been proposed based on policy and value iteration algorithms [28]. Adaptive neural network (NN) control and adaptive fuzzy control are also valid tools for dealing with nonparametric uncertainties [29, 30]. Additionally, a model-free Q-learning-based algorithm has been designed in which the Q-function is parameterized in terms of the state and the input [31]. Finally, some related works concern the event-triggered mechanism (ETM) for TTSSs [6, 9, 32, 33], but optimality and unknown dynamics have not been considered simultaneously.

Based on the above discussions, ETOC for TTSSs with unknown dynamics is still an open issue to be investigated, as methods for single-time-scale systems [23,24,25, 31, 34] do not apply to TTSSs directly. The main challenge is that the errors of the slow and the fast states are always coupled, which is not the case in single-time-scale systems.

This paper considers unknown slow dynamics and proposes a Q-learning ETOC algorithm for TTSSs. The Q-learning method is an adaptive dynamic programming method using the actor-critic structure. An action-dependent function, which estimates the expected performance of a given action in a given state from the collected input/output data, is used to select the optimal action based on previous observations. A significant advantage of such a strategy is that it does not require an accurate model of the system. The main contribution of this paper is twofold: (1) An event-triggered sub-optimal control scheme is designed for TTSSs, whereas existing event-triggered controllers for TTSSs mainly focus on stabilization [6, 9, 32, 33] and neglect performance indicators during the control process, or employ a continuous-time optimal controller without an event-triggering mechanism [18,19,20]. (2) An online approach is proposed that does not require knowledge of the system’s slow dynamics, and hence needs neither prior offline training [20] nor identification of the unknown dynamics [18, 19]; the optimal controller is approximated directly from the state and input information together with the partially known dynamics.

The remainder of the paper is organized as follows. Section 2 presents the background concerning the decomposition of slow and fast modes and the optimal control theory for TTSSs. Section 3 designs the event-triggered sub-optimal control with unknown slow dynamics and analyzes the stability of the original system. Section 4 demonstrates the efficiency of the proposed strategy using numerical examples, and finally, Sect. 5 concludes this work.

Notation: \(I_{n}\) denotes the identity matrix of dimension n, \(\parallel \!\cdot \!\parallel \) represents the Euclidean norm for vectors or the spectral norm for matrices, \({{\mathcal {R}}}_{+}\) is the set of positive real numbers, and \({{\mathcal {N}}}_{+}\) denotes the set of positive integers. For a matrix \(A\in {{\mathcal {R}}}^{n \times m}\), \(vec(A) =[a^T_1, a^T_2, \dots , a^T_m ]^T\), where \(a_i \in {{\mathcal {R}}}^n\) is the ith column of \(A,\ i=1,2,\dots ,m\). \(tr(A)=\sum _{i = 1}^{n}a_{ii}\) indicates the trace, \( \otimes \) indicates the Kronecker product, \({\bar{\lambda }}(R),\, {\underline{\lambda }}(R)\) denote the maximum and minimum eigenvalues of matrix R, respectively, and \(O(\cdot )\) is the order of magnitude defined in [10]: a vector function \(f(t,\varepsilon )\in {{\mathcal {R}}}^n\) is said to be \(O(\varepsilon )\) over an interval \([t_1,t_2]\) if there exist positive constants k and \(\varepsilon ^*\) such that

$$\begin{aligned} \begin{aligned} \Vert f(t,\varepsilon )\Vert \le k\varepsilon \quad \forall \varepsilon \in [0,\varepsilon ^*],\ \forall t\in [t_1,t_2]. \end{aligned} \end{aligned}$$

2 Problem formulation

Consider the following linear continuous-time TTSSs described by

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{x}}(t)=A_{11}x(t)+A_{12}z(t)+B_{1}u(t),\ x(t_{0})=x^{0} \\ \varepsilon {\dot{z}}(t)=A_{21}x(t)+A_{22}z(t)+B_{2}u(t),\ z(t_{0})=z^{0} \\ y(t)=C_1x(t)+C_2z(t) \end{array}\right. } \end{aligned}$$
(1)

where \(x(t) \in {{\mathcal {R}}}^{n_{1}}\) is the slow state vector, \(z(t)\in {{\mathcal {R}}}^{n_{2}}\) is the fast state vector, and \(u(t)\in {{\mathcal {R}}}^{m}\) and \(y(t)\in {{\mathcal {R}}}^{p}\) are the input and output, respectively. \(\varepsilon \) represents a small positive singular perturbation parameter. \(A_{11}\in {{\mathcal {R}}}^{n_{1}\times n_{1}},\ A_{12}\in {{\mathcal {R}}}^{n_{1}\times n_{2}},\ A_{21}\in {{\mathcal {R}}}^{n_{2}\times n_{1}},\ A_{22}\in {{\mathcal {R}}}^{n_{2}\times n_{2}},\ B_{1}\in {{\mathcal {R}}}^{n_{1}\times m},\ B_{2}\in {{\mathcal {R}}}^{n_{2}\times m},\ C_{1}\in {{\mathcal {R}}}^{p\times n_{1}},\ C_{2}\in {{\mathcal {R}}}^{p\times n_{2}}\).

The objective is to adjust u so as to minimize the performance index

$$\begin{aligned} \begin{aligned} J=\frac{1}{2}\int _{0}^{\infty }(y^{T}y+u^{T}Ru)dt, R>0. \end{aligned} \end{aligned}$$
(2)

Let \(A=\begin{bmatrix} A_{11} &{} A_{12}\\ \frac{1}{\varepsilon } A_{21} &{} \frac{1}{\varepsilon }A_{22} \end{bmatrix}, B=\left[ \begin{array}{c} {B_1} \\ \frac{1}{\varepsilon }B_2 \end{array}\right] , C=[C_1, C_2].\) \(X=\left[ \begin{array}{c} x \\ z \end{array}\right] \), and the initial state \(X(t_0)=X_0=\left[ \begin{array}{c} x^0 \\ z^0 \end{array}\right] \). From the optimal control theory, the exact optimal control for the original system (1) with a performance index (2) is

$$\begin{aligned} \begin{aligned} u_{opt}=-R^{-1}B^TPX, \end{aligned} \end{aligned}$$
(3)

where matrix P is the solution of the following Riccati equation

$$\begin{aligned} \begin{aligned} 0=-PA-A^TP+PBR^{-1}B^TP-C^TC, \end{aligned} \end{aligned}$$
(4)

and the corresponding performance index is \(J_{opt}=\frac{1}{2}X_0^TPX_0\).
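Up to a sign, (4) is a standard continuous-time algebraic Riccati equation, so for a moderate \(\varepsilon \) it can in principle be solved with off-the-shelf tools. A minimal sketch (the matrix arguments are placeholders; note that the \(1/\varepsilon \) blocks degrade the conditioning as \(\varepsilon \) decreases):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def full_order_lqr(A11, A12, A21, A22, B1, B2, C1, C2, R, eps):
    """Exact optimal control for (1)-(2): u_opt = K_opt @ X with P from (4).
    The 1/eps blocks make the problem increasingly ill-conditioned as eps -> 0."""
    A = np.block([[A11, A12], [A21 / eps, A22 / eps]])
    B = np.vstack([B1, B2 / eps])
    C = np.hstack([C1, C2])
    # (4) multiplied by -1 is the standard CARE: A^T P + P A - P B R^{-1} B^T P + C^T C = 0
    P = solve_continuous_are(A, B, C.T @ C, R)
    K_opt = -np.linalg.solve(R, B.T @ P)   # u_opt = -R^{-1} B^T P X
    return P, K_opt
```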

However, the traditional method cannot solve (4) reliably because of numerical ill-conditioning. An effective way is to divide the original system into two subsystems with separate time scales and design the controllers separately. Additionally, to avoid wasting communication and computation resources, the controller is only updated at the triggering instants, i.e., \(\{t_x^k\}\) and \(\{t_z^k\}\), \(k \in {{\mathcal {N}}}_{+}\), for the slow and fast subsystems, respectively. The input is written as

$$\begin{aligned} \begin{aligned} u=u_{d}=K_1 {\hat{x}}+K_2 {\hat{z}}, \end{aligned} \end{aligned}$$
(5)

where \(K_1\in {{\mathcal {R}}}^{m\times n_{1}}\), \(K_2\in {{\mathcal {R}}}^{m\times n_{2}}\), and the sampled states \( {\hat{x}},\ {\hat{z}}\) are defined as

$$\begin{aligned} \begin{aligned} {\hat{x}}(t)=x(t_x^{k}), \quad \forall t\in [t_x^{k},t_x^{k+1}) \\ {\hat{z}}(t)=z(t_z^{k}), \quad \forall t\in [t_z^{k},t_z^{k+1}) \end{aligned} \end{aligned}$$

The errors between the sampled states and the real states are defined as

$$\begin{aligned} \begin{aligned} e_1(t)&={\hat{x}}(t)-x(t), \\e_2(t)&={\hat{z}}(t)-z(t). \end{aligned} \end{aligned}$$

For convenience, we denote \(E=(e_1^T \ e_2^T)^T,\ K=(K_1 \ K_2)\).

Assumption 1

\(A_{22}\) is nonsingular.

Under Assumption 1, the following Chang transformation [8] will be used to decouple the slow and the fast modes.

$$\begin{aligned} \begin{aligned} \left[ \begin{array}{c} \xi \\ \eta \end{array}\right] =T \left[ \begin{array}{c} x\\ z \end{array}\right] , \end{aligned} \end{aligned}$$
(6)

where \(T=\begin{bmatrix} I_{n_1}-\varepsilon ML &{} -\varepsilon M\\ L &{} I_{n_2} \end{bmatrix}\), \(\xi \) and \(\eta \) denote the decoupled slow and fast dynamics of the original system, respectively. \(L\in {{\mathcal {R}}}^{n_{2}\times n_{1}}\) and \(M\in {{\mathcal {R}}}^{n_{1}\times n_{2}}\) are the solutions of the following equations,

$$\begin{aligned} \begin{aligned}&\varLambda _{21}+\varepsilon L\varLambda _{11}-\varLambda _{22}L-\varepsilon L\varLambda _{12}L=0,\\&\varepsilon \varLambda _{11}M-\varepsilon \varLambda _{12}LM+\varLambda _{12}-M\varLambda _{22}-\varepsilon ML\varLambda _{12}=0, \end{aligned} \end{aligned}$$

where \(\varLambda _{11}=A_{11}+B_{1}K_{1},\,\varLambda _{12}=A_{12}+B_{1}K_{2},\,\varLambda _{21}=A_{21}+B_{2}K_{1},\,\varLambda _{22}=A_{22}+B_{2}K_{2}\).
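For small \(\varepsilon \), these coupled algebraic equations are commonly solved by a fixed-point iteration seeded with their \(O(\varepsilon ^0)\) approximations. A minimal sketch, assuming Assumption 1 below (nonsingular \(\varLambda _{22}\)-block) and an \(\varepsilon \) small enough for the iteration to converge; the function name and iteration count are illustrative:

```python
import numpy as np

def chang_LM(Lam11, Lam12, Lam21, Lam22, eps, iters=50):
    """Fixed-point iteration for the Chang-transformation matrices L and M.
    Lam_ij are the closed-loop blocks Lambda_ij."""
    L = np.linalg.solve(Lam22, Lam21)            # zeroth-order seed: Lambda_22^{-1} Lambda_21
    M = np.linalg.solve(Lam22.T, Lam12.T).T      # zeroth-order seed: Lambda_12 Lambda_22^{-1}
    for _ in range(iters):
        L = np.linalg.solve(Lam22, Lam21 + eps * L @ Lam11 - eps * L @ Lam12 @ L)
        M = (eps * Lam11 @ M - eps * Lam12 @ L @ M + Lam12
             - eps * M @ L @ Lam12) @ np.linalg.inv(Lam22)
    return L, M
```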

Following the Chang transformation, the dynamics of the original system (1) with the input (5) can be transformed into

$$\begin{aligned} \begin{aligned} \left[ \begin{array}{c} {\dot{\xi }}\\ \varepsilon {\dot{\eta }} \end{array}\right] =&\begin{bmatrix} \varLambda _{11}-\varLambda _{12}L &{} 0\\ 0 &{}\varLambda _{22}+\varepsilon L\varLambda _{12} \end{bmatrix} \left[ \begin{array}{c} \xi \\ \eta \end{array}\right] \\&+ \left[ \begin{array}{c} B_{1}-M(B_2+\varepsilon LB_1)\\ B_{2}+\varepsilon LB_1\end{array}\right] KE. \end{aligned} \end{aligned}$$
(7)

Let

$$\begin{aligned} \begin{aligned} {\hat{\xi }} ={\hat{x}},\ {\hat{\eta }} =L{\hat{x}}+{\hat{z}},\ e_f={\hat{\eta }}-\eta =Le_1+e_2. \end{aligned} \end{aligned}$$
(8)

With transformation (6), the input \(u_d\) can be divided into two parts, \(u_{d}=u_{sd}+u_{fd}\), concerning \({\hat{\xi }}\) and \({\hat{\eta }}\), respectively

$$\begin{aligned} \begin{aligned} u_{sd}=&K_0{\hat{\xi }}=K_0\xi +K_0e_1+O(\varepsilon )\xi +O(\varepsilon )\eta , \\ u_{fd}=&K_2{\hat{\eta }}, \end{aligned} \end{aligned}$$
(9)

where \(K_0\) and \(K_2\) will be designed separately for the slow and fast subsystems, respectively. By substituting \(\xi \) and \(\eta \) with x and z via (6) and comparing the coefficients with (5), we obtain \(K_1=K_0+K_2L\). From [7], \(L=A_{22}^{-1}(A_{21}+B_2K_0)+O(\varepsilon )\). Let \(A_0=A_{11}-A_{12}A_{22}^{-1}A_{21},\ B_0=B_1-A_{12}A_{22}^{-1}B_2,\, \varLambda _{0}=A_0+B_0K_0\). In [9] and [10], it is pointed out that, up to \(O(\varepsilon )\) approximations, \(\varLambda _{11}-\varLambda _{12}L\) can be approximated by \(\varLambda _{0}\) and L can be approximated by \(\varLambda _{22}^{-1}\varLambda _{21}\). Along with (6), the transformed variables \(\xi , \eta \) can be approximated by

$$\begin{aligned} \begin{aligned} \xi (t)&=x(t)+O(\varepsilon )x(t)+O(\varepsilon )z(t), \\ \eta (t)&=z(t)+\varLambda _{22}^{-1}\varLambda _{21}x(t)+O(\varepsilon )x(t). \end{aligned} \end{aligned}$$
(10)

Remark 1

Note that although the off-diagonal blocks of the system matrix in (7) are zero, both errors \(e_1\) and \(e_2\) enter the dynamics of \(\xi \) and of \(\eta \) through the term KE, which poses difficulties in the design of the ETM.

When the inputs are updated continuously, \(E=0\), and if Assumption 1 holds, the transformed system (7) can be approximated by two subsystems. Then, the original optimal control problem of system (1) with performance index (2) can be converted to two subproblems as follows [20].

(1) For the approximated slow subsystem

$$\begin{aligned} \begin{aligned}&{\dot{x}}_s=A_{0}x_s+B_{0}u_s,\\&y_s=C_0x_s+D_0u_s, \end{aligned} \end{aligned}$$
(11)

where \(C_0=C_1-C_2A_{22}^{-1}A_{21}, D_0=-C_2A_{22}^{-1}B_2\). The objective is to adjust \(u_s\) so as to minimize the performance index

$$\begin{aligned} \begin{aligned} J_s&=\int _{t_0}^{\infty }r_s(t)dt\\&=\frac{1}{2}\int _{t_0}^{\infty }(x_s^{T}C_0^TC_0x_s+2x_s^TC_0^TD_0u_s\\&\quad +u_s^TR_0u_s)dt. \end{aligned} \end{aligned}$$

The reward \(r_s(t)\) can also be written in a standard quadratic form

$$\begin{aligned} \begin{aligned} r_s(t) =\frac{1}{2}(x_s^{T}G_sx_s +u_{sc}^{T}R_0u_{sc}), \end{aligned} \end{aligned}$$
(12)

where \(R_0=R+D_0^{T}D_0, \ G_s=C_0^T(I-D_0R_0^{-1}D_0^T) C_0, \ u_{sc}=u_s+R_0^{-1}D_0^TC_0x_s\). Note that \(I-D_0R_0^{-1}D_0^T =(I+D_0R^{-1}D_0^T)^{-1}>0\).

We define the value function as

$$\begin{aligned} \begin{aligned} V_s(x_s(t))=\frac{1}{2}\int _{t}^{\infty }(x_s^{T}G_sx_s +u_{sc}^{T}R_0u_{sc})dt. \end{aligned} \end{aligned}$$
(13)

Let \(A_s=A_0-B_{0}R_0^{-1}D_0^TC_0\). From the optimal control theory [11], if the Riccati equation

$$\begin{aligned} \begin{aligned} A_s^TP_s+P_sA_s-P_sB_0R_0^{-1}B_0^TP_s+G_s=0 \end{aligned} \end{aligned}$$
(14)

has a positive semidefinite stabilizing solution \(P_s\), then the optimal control can be constructed by

$$\begin{aligned} \begin{aligned} u_{sc}^*=-R_0^{-1}B_0^TP_sx_s. \end{aligned} \end{aligned}$$

Accordingly, the optimal control of the slow subsystem (11) is

$$\begin{aligned} \begin{aligned} u_{s}^*=-R_0^{-1}(D_0^TC_0+B_0^TP_s)x_s. \end{aligned} \end{aligned}$$
(15)

Since the system (11) is linear, under the optimal control \(u^*_s\), the optimal value function can be denoted in a quadratic form, that is

$$\begin{aligned} \begin{aligned} V_s^*(x_s(t))=\frac{1}{2}x_s(t)^TP_sx_s(t). \end{aligned} \end{aligned}$$

For the approximated slow subsystem, we define the Hamiltonian function as

$$\begin{aligned} \begin{aligned}&H_s\left( x_s(t),u_{s}(x_s),\frac{\partial V_s^*(x_s)}{\partial x_s(t)}\right) \\&\quad =\frac{\partial {V_s(x_s)}^T}{\partial x_s(t)}{\dot{x}}_s(t)+r_s(t). \end{aligned} \end{aligned}$$
(16)

Under the optimal control \(u^*_{s}(t)\), it holds that

$$\begin{aligned} \begin{aligned} H_s\left( x_s(t),u_{s}^*(x_s),\frac{\partial V_s^*(x_s)}{\partial x_s(t)}\right) =0. \end{aligned} \end{aligned}$$

From the form of \(u_s^*\) in (15), it can be deduced that

$$\begin{aligned} \begin{aligned}&H_s\left( \xi (t),u_{s}^*(\xi ),\frac{\partial V_s^*(\xi )}{\partial \xi (t)}\right) \\&\quad =\frac{1}{2}\xi ^T(P_sA_0+A_0^TP_s+O(\varepsilon ))\xi \\&\qquad +\xi ^TP_sB_0u_{s}^*+\frac{1}{2}{\xi ^T} C_0^TC_0 \xi +\xi ^TC_0^TD_0u_{s}^*\\&\qquad +\frac{1}{2}{u_{s}^*}^T R_0 u_{s}^* =0. \end{aligned} \end{aligned}$$
(17)
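The whole slow design above (model reduction, Riccati equation (14), and gain (15)) can be summarized in the following sketch. It is only an illustration of the computations described in this subsection, assuming Assumptions 1 and 2 below hold; the function name is ours:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def slow_subproblem(A11, A12, A21, A22, B1, B2, C1, C2, R):
    """Reduced slow design (11)-(15): returns the reduced matrices, P_s from (14),
    and the slow gain K_s such that u_s* = K_s @ x_s."""
    A22inv = np.linalg.inv(A22)
    A0 = A11 - A12 @ A22inv @ A21
    B0 = B1 - A12 @ A22inv @ B2
    C0 = C1 - C2 @ A22inv @ A21
    D0 = -C2 @ A22inv @ B2
    R0 = R + D0.T @ D0
    Gs = C0.T @ (np.eye(C0.shape[0]) - D0 @ np.linalg.solve(R0, D0.T)) @ C0
    As = A0 - B0 @ np.linalg.solve(R0, D0.T @ C0)
    Ps = solve_continuous_are(As, B0, Gs, R0)          # Riccati equation (14)
    Ks = -np.linalg.solve(R0, D0.T @ C0 + B0.T @ Ps)   # optimal slow gain, eq. (15)
    return A0, B0, C0, D0, R0, Ps, Ks
```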

(2) For the approximated fast subsystem

$$\begin{aligned} \begin{aligned} \varepsilon {\dot{z}}_f&=A_{22}z_f+B_{2}u_f, \\y_f&=C_{2}z_f. \end{aligned} \end{aligned}$$
(18)

The objective is to adjust \(u_f\) so as to minimize the performance index

$$\begin{aligned} \begin{aligned} J_f=\frac{1}{2}\int _{0}^{\infty }(z_f^{T}G_f z_f+u_f^{T}Ru_f)dt, \end{aligned} \end{aligned}$$

where \(G_f=C_2^TC_2\).

If the Riccati equation

$$\begin{aligned} \begin{aligned} A_{22}^TP_f+P_fA_{22}-P_fB_2R^{-1}B_2^TP_f+G_f=0 \end{aligned} \end{aligned}$$
(19)

has a positive semidefinite stabilizing solution \(P_f\), then the optimal control of the fast subsystem (18) is \(u_{f}^*=-R^{-1}B_2^TP_fz_f\).

Since the system (18) is linear, under the optimal control \(u^*_f\), the optimal value function can be denoted in a quadratic form, that is

$$\begin{aligned} \begin{aligned} V_f^*(z_f(t))=\frac{1}{2}\varepsilon z_f(t)^TP_fz_f(t). \end{aligned} \end{aligned}$$

Similarly, the Hamiltonian function of the fast subsystem is defined as

$$\begin{aligned} \begin{aligned}&H_f\left( z_f(t),u_{f}(z_f),\frac{\partial V_f^*(z_f)}{\partial z_f}\right) \\&\quad =\frac{1}{\varepsilon }\frac{\partial V_f^*}{\partial z_f}{\dot{z}}_f+\frac{1}{2}{u_{f}}^T R u_{f}\\&\qquad +\frac{1}{2}z_f^T G_f z_f. \end{aligned} \end{aligned}$$

Then under the optimal control \(u^*_{f}(\eta )\)

$$\begin{aligned} \begin{aligned}&H_f\left( \eta (t),u_{f}^*(\eta ),\frac{\partial V_f^*(\eta )}{\partial \eta (t)}\right) \\&\quad =\frac{1}{2} \eta ^T(P_fA_{22} +A_{22}^TP_f +G_f\\&\qquad +O(\varepsilon ))\eta +\eta ^T P_f B_2u_f^*+\frac{1}{2}{u_f^*}^TRu_f^* =0. \end{aligned} \end{aligned}$$
(20)

With \(P_s\) and \(P_f\) obtained by solving the above two subproblems, the optimal composite input of the original system (1) with the performance index (2), acting on the actual states x and z, is

$$\begin{aligned} \begin{aligned}&u_{c}^*=-[(I-R^{-1}B_2^TP_fA_{22}^{-1}B_2)R_0^{-1}(D_0^TC_0\\&\quad +B_0^TP_s) +R^{-1}B_2^TP_fA_{22}^{-1}A_{21}]x-R^{-1}B_2^TP_fz. \end{aligned} \end{aligned}$$
(21)
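A sketch of the fast design and of the assembly of the composite gain in (21), reusing the quantities returned by the slow-subproblem routine above (again only illustrative, under the same assumptions):

```python
import numpy as np
from scipy.linalg import solve_continuous_are

def composite_gains(A21, A22, B2, C2, R, B0, C0, D0, R0, Ps):
    """Fast Riccati (19) and composite control (21): u_c* = Kx @ x + Kz @ z."""
    Gf = C2.T @ C2
    Pf = solve_continuous_are(A22, B2, Gf, R)          # Riccati equation (19)
    Kz = -np.linalg.solve(R, B2.T @ Pf)                # fast gain K_2
    A22inv = np.linalg.inv(A22)
    Ks_mag = np.linalg.solve(R0, D0.T @ C0 + B0.T @ Ps)        # R0^{-1}(D0^T C0 + B0^T Ps)
    Kx = -((np.eye(R.shape[0]) - np.linalg.solve(R, B2.T @ Pf @ A22inv @ B2)) @ Ks_mag
           + np.linalg.solve(R, B2.T @ Pf @ A22inv @ A21))
    return Pf, Kx, Kz
```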

Assumption 2

The triples \((A_0,B_0,C_0)\) and \((A_{22},\, B_2, C_2)\) are stabilizable-detectable.

Lemma 1

[7] For the dynamics of the transformed system (7) with \(E=0\) and approximated by the reduced-order subsystems (11) and (18), the approximated states satisfy \(\xi (t)=x_s(t)+O(\varepsilon )x+O(\varepsilon )z, \ \eta (t)=z_f(t)+O(\varepsilon )x\). If Assumptions 1 and 2 hold, then (14) and (19) have unique positive semi-definite solutions \(P_s\) and \(P_f\), respectively. The composite feedback control \(u_c^*\) in (21) is \(O(\varepsilon )\)-close to \(u_{opt}\) defined in (3). Further, \(J_c\), the value of the performance index obtained from (2) with the controller \(u_c^*\), satisfies

$$\begin{aligned} \begin{aligned} J_c=J_{opt}+O(\varepsilon ^2). \end{aligned} \end{aligned}$$

\(\square \)

A brief proof of Lemma 1 is presented in the Appendix.

From Lemma 1 and taking the triggering mechanism into account, the event-triggered optimal control for the slow subsystem \(u_{sd}^*\) is, according to (15),

$$\begin{aligned} \begin{aligned} u_{sd}^*(\xi )=-R_0^{-1}(D_0^TC_0+B_0^TP_s){\hat{\xi }}, \end{aligned} \end{aligned}$$
(22)

and the optimal control for the fast subsystem \( u_{fd}^*\) is

$$\begin{aligned} \begin{aligned} u_{fd}^*(\eta )=-R^{-1}B_2^TP_f{\hat{\eta }}. \end{aligned} \end{aligned}$$
(23)

Hence, the event-triggered sub-optimal composite control is

$$\begin{aligned} \begin{aligned} u_{d}^*=u_{sd}^*(\xi )+u_{fd}^*(\eta ). \end{aligned} \end{aligned}$$
(24)

For consistency with the standard quadratic form, we define \(u_{sdc}\) for the slow subsystem, which is an adaptation of \(u_{sc}\) in (12), as follows

$$\begin{aligned} \begin{aligned} u_{sdc}(\xi )=u_{sd}+R_0^{-1}D_0^TC_0\xi . \end{aligned} \end{aligned}$$

3 Main results

In this part, we consider the situation where the slow dynamics are unknown, i.e., \(A_{11},A_{12}, B_1\) in (1) are unknown. Consequently, the solution of the Riccati equation (14) cannot be obtained directly. To deal with this problem, a Q-learning method based on available data is proposed.

For convenience, denote \(B_0'=B_{1}-M(B_2+\varepsilon LB_1)\). Referring to the time-triggered Hamiltonian function (16), (7) and their approximations, the event-triggered Hamiltonian functions are rewritten as follows.

$$\begin{aligned} \begin{aligned}&H_s\left( \xi (t),u_{sd}(\xi ),\frac{\partial V_s^*(\xi )}{\partial \xi (t)},KE\right) \\&=\frac{\partial {V_s^*}^T(\xi )}{\partial \xi }{\dot{\xi }} +\frac{1}{2}\xi ^{T}C_0^TC_0\xi \\&\qquad +\xi ^TC_0^TD_0u_{sd}+\frac{1}{2}u_{sd}^TR_0u_{sd}\\&\quad =\frac{1}{2}\xi ^T(P_sA_0+A_0^TP_s+O(\varepsilon ))\xi +\xi ^TP_s(B_0K_0\xi \\&\qquad +B_0'KE)+\frac{1}{2}\xi ^{T}C_0^TC_0\xi \\&\qquad +\xi ^TC_0^TD_0u_{sd}+\frac{1}{2}u_{sd}^TR_0u_{sd}. \end{aligned} \end{aligned}$$

Combining (9) leads to

$$\begin{aligned}&H_s\left( \xi (t),u_{sd}(\xi ),\frac{\partial V_s^*(\xi )}{\partial \xi (t)},KE\right) \nonumber \\= & {} \frac{1}{2}\xi ^T(P_sA_0+A_0^TP_s +C_0^TC_0+O(\varepsilon ))\xi \nonumber \\&+\xi ^T(P_sB_0+C_0^TD_0)u_{sd} +\xi ^TP_sB_0'KE\nonumber \\&-\xi ^TP_sB_0K_0e_1+\frac{1}{2}u_{sd}^T R_0 u_{sd}\nonumber \\&+O(\varepsilon )\xi ^TP_sB_0M\eta . \end{aligned}$$
(25)

Similarly,

$$\begin{aligned} \begin{aligned}&H_f\left( \eta (t),u_{fd}(\eta ),\frac{\partial V_f^*(\eta )}{\partial \eta (t)},KE\right) \\&\quad =\frac{\partial V_f^*(\eta )}{\partial \eta }((A_{22} +B_2K_2 +\varepsilon L \varLambda _{12})\eta +(B_2\\&\qquad +\varepsilon LB_1)KE) +\frac{1}{2}{u_{fd}}^T R u_{fd}+\frac{1}{2}\eta ^T G_f \eta \\&\quad =\frac{1}{2} \eta ^T(P_fA_{22} +A_{22}^TP_f+G_f+O(\varepsilon ))\eta \\&\qquad +\eta ^T P_f B_2u_{fd}+\eta ^T P_fB_2K_1e_1\\&\qquad -\eta ^TP_fB_2K_2Le_1 +\frac{1}{2}{u_{fd}}^TRu_{fd}. \end{aligned} \end{aligned}$$

With the derivation results in (25), define a Q function

$$\begin{aligned} \begin{aligned}&Q(\xi ,u_{sd},KE)\\&\quad =V_s^*(\xi )+H_s(\xi ,u_{sd},\frac{\partial V_s^*}{\partial \xi },KE)\\&\qquad -H_s(\xi ,u^*_{s},\frac{\partial V_s^*}{\partial \xi })\\&\quad =\frac{1}{2}\xi ^T(P_s+P_sA_0+A_0^TP_s+C_0^TC_0+O(\varepsilon ))\xi \\&\qquad +\xi ^T(P_sB_0+C_0^TD_0)u_{sd}+\xi ^TP_sB_0'KE \\&\qquad -\xi ^TP_sB_0K_0e_1+\frac{1}{2}{u_{sd}}^T R_0 u_{sd}. \end{aligned}\nonumber \\ \end{aligned}$$
(26)

Let

$$\begin{aligned} \begin{aligned} \varPhi _s=&[\frac{1}{2}\xi ^T\otimes \xi ^T,u_{sd}^T\otimes \xi ^T, KE^T\otimes \xi ^T, \\&-(K_0 e_1)^T\otimes \xi ^T],\\ W_c =&{\left[ \begin{array}{c} vec(P_s+P_sA_0+A_0^TP_s+C_0^TC_0+O(\varepsilon )) \\ vec(P_sB_0+C_0^TD_0)\\ vec(P_sB_0')\\ vec(P_sB_0) \end{array}\right] }. \end{aligned} \end{aligned}$$

Then (26) can be written as

$$\begin{aligned} \begin{aligned} Q(\xi ,u_{sd},KE)=\varPhi _s W_{c}+\frac{1}{2}u_{sd}^TR_0u_{sd}. \end{aligned} \end{aligned}$$
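In implementation, the regressor can be assembled directly with Kronecker products, using the identity \(x^TAy=vec(A)^T(y\otimes x)\). A minimal sketch, assuming all arguments are 1-D numpy arrays and with \(K_0e_1\) passed as a precomputed vector:

```python
import numpy as np

def phi_s(xi, u_sd, KE, K0e1):
    """Regressor Phi_s such that Q = Phi_s @ W_c + 0.5 * u_sd^T R_0 u_sd.
    Uses x^T A y = vec(A)^T (y kron x); K0e1 = K_0 @ e_1."""
    return np.concatenate([0.5 * np.kron(xi, xi),
                           np.kron(u_sd, xi),
                           np.kron(KE, xi),
                           -np.kron(K0e1, xi)])
```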

Since the vector \(W_c\) is an unknown constant vector, an approximate critic weight vector \({\hat{W}}_c\in {{\mathcal {R}}}^{n_{1}^2+3mn_1}\) is utilized. Accordingly, we define the following function \({\hat{Q}}\) to estimate Q.

$$\begin{aligned} \begin{aligned} {\hat{Q}}(x,u_{sd})=\varPhi _s{\hat{W}}_{c}+\frac{1}{2}u_{sd}^TR_0u_{sd}. \end{aligned} \end{aligned}$$

Similarly, the actor weight \({\hat{W}}_a \in {{\mathcal {R}}}^{n_{1}}\) is introduced to construct the controller

$$\begin{aligned} \begin{aligned} u_{sd}={\hat{W}}_{a}^T{\hat{\xi }}(t), \end{aligned} \end{aligned}$$
(27)

The tuning laws of \({\hat{W}}_{c}\) and \({\hat{W}}_{a}\) will be introduced later.

Remark 2

Note that all the unknown parameters are collected in \(W_c\), so \(Q(\xi ,u_{sd},KE)\) separates into two parts, i.e., unknown parameters and available information. Additionally, the second block of \(W_c\), \(vec(P_sB_0+C_0^TD_0)\), contains all the unknown information needed in the optimal controller (22).

The overall controller is

$$\begin{aligned} \begin{aligned} u_d={\hat{W}}_{a}^T{\hat{\xi }}(t)+K_2{\hat{\eta }}. \end{aligned} \end{aligned}$$
(28)

As for the fast subsystem, the optimal controller, i.e., \(K_2=-R^{-1}B_2^TP_f\), is chosen.

According to the performance index defined in (12) and the Hamiltonian function (17), under the optimal control (24) it can be obtained that

$$\begin{aligned} \begin{aligned}&Q(\xi (t),u_{sd}^*)\\&\quad =Q(\xi (t-T),u_{sd}^*)-\frac{1}{2}\int ^t_{t-T}(\xi ^{T}C_0^TC_0\xi \\&\qquad +2\xi ^TC_0^TD_0u_s+u_s^TR_0u_s)d\tau , \end{aligned} \end{aligned}$$

where T denotes a small fixed time interval.

Define the error related to the critic network as

$$\begin{aligned} \begin{aligned} e_{c}&={\hat{Q}}(x(t),{\hat{u}}_{sd}(t))-{\hat{Q}}(x(t-T),{\hat{u}}_{sd}(t-T))\\&\quad +\frac{1}{2}\int ^t_{t-T}(\xi ^TG_s\xi +{u}_{sdc}^TR_0{u}_{sdc})d\tau \\&={\hat{W}}_c^T(\varPhi _s(t)-\varPhi _s(t-T))\\&\quad +\frac{1}{2}\int ^t_{t-T}(\xi ^TG_s\xi +{u}_{sdc}^TR_0{u}_{sdc})d\tau . \end{aligned} \end{aligned}$$

Define the error related to the actor as

$$\begin{aligned} \begin{aligned} e_{a}={\hat{W}}_{a}^T \xi (t_x^k)+R_0^{-1}{\hat{W}}_{cp}^T\xi (t_x^k), \end{aligned} \end{aligned}$$
(29)

where \({\hat{W}}_{cp}={\hat{W}}_c(n_1^2+1:n_1^2+mn_1)\), i.e., the block of \({\hat{W}}_c\) that estimates \(vec(P_sB_0+C_0^TD_0)\). Let \(\sigma =\varPhi _s(t)-\varPhi _s(t-T)\), \(K_c=\frac{1}{2}\Vert e_c\Vert ^2\), \(K_a=\frac{1}{2}\Vert e_a\Vert ^2\).
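In implementation, \(e_c\) and \(e_a\) are computed from sampled data only. A sketch, assuming the running-cost integral over \([t-T,t]\) has been accumulated numerically along the trajectory and that the reshape of \({\hat{W}}_{cp}\) follows the column-stacking convention of \(vec(\cdot )\):

```python
import numpy as np

def critic_actor_errors(Wc_hat, Wa_hat, phi_t, phi_tmT, cost_int, xi_tk, R0, n1, m):
    """Critic error e_c and actor error e_a (eq. (29)).
    cost_int = 0.5*int_{t-T}^{t}(xi^T G_s xi + u_sdc^T R_0 u_sdc) dtau (precomputed);
    Wc_hat is the critic weight vector, Wa_hat the n1 x m actor weight matrix."""
    sigma = phi_t - phi_tmT
    e_c = Wc_hat @ sigma + cost_int
    # block of the critic weights estimating vec(P_s B_0 + C_0^T D_0), reshaped column-wise
    Wcp_hat = Wc_hat[n1**2:n1**2 + m * n1].reshape((n1, m), order='F')
    e_a = Wa_hat.T @ xi_tk + np.linalg.solve(R0, Wcp_hat.T @ xi_tk)
    return sigma, e_c, e_a
```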

The tuning law for the weight of the critic network is designed as

$$\begin{aligned} \begin{aligned} \dot{{\hat{W}}}_c&=-\alpha _{c} \frac{1}{(1+\sigma ^T\sigma )^2}\frac{\partial K_c}{\partial {\hat{W}}_c}\\&=-\alpha _{c} \frac{\sigma }{(1+\sigma ^T\sigma )^2}e_c^T, \end{aligned} \end{aligned}$$
(30)

where the convergence speed can be adjusted through \(\alpha _{c}\in {{\mathcal {R}}}_+\).

Due to the event-triggered mechanism, the controller only updates at triggering instants. The tuning law for the weight of the actor is designed as

$$\begin{aligned} \begin{aligned} \dot{{\hat{W}}}_a=&0, \\{\hat{W}}_a^{+}=&{\hat{W}}_a-\alpha _a \frac{1}{(1+\xi (t)^T\xi (t))}\frac{\partial K_a}{\partial {\hat{W}}_a}, \\=&{\hat{W}}_a-\alpha _a \frac{\xi }{(1+\xi (t)^T\xi (t))}e_a^T \quad t=t_x^k, \end{aligned} \end{aligned}$$
(31)

where \(\alpha _{a}\in {{\mathcal {R}}}_+\) determines the convergence speed.
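A sketch of the two update laws follows: the critic weights evolve along the gradient flow (30), integrated here with a simple Euler step of length dt, while the actor weights jump only at the slow triggering instants according to (31). The step size dt is an implementation choice, not a quantity from the text:

```python
import numpy as np

def critic_step(Wc_hat, sigma, e_c, alpha_c, dt):
    """One Euler step of the continuous critic law (30); e_c is a scalar."""
    return Wc_hat - dt * alpha_c * sigma * e_c / (1.0 + sigma @ sigma) ** 2

def actor_jump(Wa_hat, xi, e_a, alpha_a):
    """Actor update (31), applied only at the slow triggering instants t_x^k."""
    return Wa_hat - alpha_a * np.outer(xi, e_a) / (1.0 + xi @ xi)
```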

Let \(V_1(\xi )=\frac{1}{2}\xi ^{T} P_s \xi , \ V_2(\eta )=\frac{1}{2}\varepsilon \eta ^T P_f \eta \),

$$\begin{aligned} \begin{aligned} {\dot{V}}_1(\xi )=&\frac{1}{2}{\dot{\xi }}^{T}P_s\xi +\frac{1}{2}\xi ^{T} P_s {\dot{\xi }}\\=&\frac{1}{2}\xi ^T(P_sA_0+A_0^TP_s+O(\varepsilon ))\xi \\&+\xi ^TP_sB_0K_0\xi +\xi ^TP_sB_0'KE. \end{aligned} \end{aligned}$$

Applying (17) to replace \(P_sA_0+A_0^TP_s\) yields

$$\begin{aligned} \begin{aligned} {\dot{V}}_1(\xi )&=-\frac{1}{2}\xi ^{T}(C_0^TC_0+O(\varepsilon ))\xi -\xi ^T(P_sB_0\\&\quad +C_0^TD_0 +O(\varepsilon ))u_s^*-\frac{1}{2}{u_s^*}^{T}R_0u_s^*\\&\quad +\xi ^TP_sB_0K_0\xi +\xi ^TP_sB_0'KE. \end{aligned} \end{aligned}$$

From (9), (12) and the equation \(\xi ^TP_sB_0=-{u_{sc}^*}^T R_0\), it can be rewritten as

$$\begin{aligned} \begin{aligned} {\dot{V}}_1(\xi )&=-\frac{1}{2}\xi ^T(G_s+O(\varepsilon ))\xi -\frac{1}{2}{u_{sc}^*}^TR_0u_{sc}^*\\&\quad -\xi ^TP_sB_0u_s^*+\xi ^TP_sB_0(u_{sd}-K_0e_1\\&\quad +O(\varepsilon )M\eta )+\xi ^TP_sB_0'KE\\&=-\frac{1}{2}\xi ^T(G_s+O(\varepsilon ))\xi -\frac{1}{2}{u_{sc}^*}^TR_0u_{sc}^*\\&\quad +{u_{sc}^*}^T R_0 u_{s}^*-{u_{sc}^*}^T R_0 u_{sd}-\xi ^TP_sB_0K_0e_1\\&\quad +\xi ^TP_sB_0'KE+O(\varepsilon )\xi ^TP_sB_0M\eta \\&=-\frac{1}{2}\xi ^T(G_s+O(\varepsilon ))\xi +\frac{1}{2}({u_{sc}^*}-u_{sdc})^T R_0\\&\quad \times ({u_{sc}^*} -u_{sdc}) -\frac{1}{2}u_{sdc}^T R_0 u_{sdc}\\&\quad -\xi ^TP_sB_0K_0e_1+\xi ^TP_sB_0'KE\\&\quad +O(\varepsilon )\xi ^TP_sB_0M\eta . \end{aligned}\nonumber \\ \end{aligned}$$
(32)

The performance index of the fast subsystem satisfies

$$\begin{aligned} \begin{aligned} {\dot{V}}_2(\eta )&=\frac{1}{\varepsilon }\frac{\partial V_2}{\partial \eta } ((A_{22}+O(\varepsilon )) \eta +(B_2+O(\varepsilon )) K_2\eta \\&\quad +B_2KE)\\&=\frac{1}{2}\eta ^T(P_fA_{22}+A_{22}^TP_f+O(\varepsilon ))\eta \\&\quad +\eta ^TP_fB_2K_2\eta +\eta ^TP_fB_2KE. \end{aligned} \end{aligned}$$

Similarly, for \(V_2\), apply (20) to take the place of \(P_fA_{22}+A_{22}^TP_f\).

$$\begin{aligned} \begin{aligned} {\dot{V}}_2&=-\frac{1}{2}\eta ^T(G_f+O(\varepsilon ))\eta \\&\quad +\frac{1}{2}(u_{f}^*-u_{fd})^T R (u_{f}^*-u_{fd})\\&\quad -\frac{1}{2} u_{fd}^T Ru_{fd} +\eta ^TP_fB_2K_0e_1\\&\le -\frac{1}{2}{\underline{\lambda }}(G_f+O(\varepsilon ))\Vert \eta \Vert ^2\\&\quad +\frac{1}{2}{\bar{\lambda }}(R)\Vert R^{-1}B_2^TP_f\Vert \Vert e_f\Vert ^2\\&\quad - \frac{1}{2}{\underline{\lambda }}(R)\Vert u_{fd}\Vert ^2+\eta ^TP_fB_2K_0e_1. \end{aligned} \end{aligned}$$
(33)

Since \(e_f=Le_1+e_2\), then

$$\begin{aligned} \begin{aligned}&{\dot{V}}_1+{\dot{V}}_2 \\&\le -\frac{1}{2}\xi ^T(G_s+O(\varepsilon ))\xi -\frac{1}{2}{\underline{\lambda }}(R_0)\Vert u_{sdc}\Vert ^2\\&+\left( \frac{1}{2}{\bar{\lambda }}(R_0)\Vert R_0^{-1}B_0^TP_s\Vert ^2\right. \\&\left. +{\bar{\lambda }}(R)\Vert L\Vert ^2\Vert R^{-1}B_2^TP_f\Vert ^2\right) \Vert e_1\Vert ^2\\&+\xi ^T(P_sB_0'K_1-P_sB_0K_0)e_1+\eta ^TP_fB_2K_0e_1\\&-\frac{1}{2}\eta ^T(G_f+O(\varepsilon ))\eta -\frac{1}{2}{\underline{\lambda }}(R)\Vert u_{fd}\Vert ^2\\&+{\bar{\lambda }}(R)\Vert R^{-1}B_2^TP_f\Vert ^2\Vert e_2\Vert ^2+\xi ^TP_sB_0'K_2e_2. \end{aligned} \end{aligned}$$
(34)

When \(\varepsilon \) is small enough, there exist matrices \(G_s'>0\) and \(G_f'>0\) such that \(G_s-O(\varepsilon )>G_s'\) and \(G_f-O(\varepsilon )>G_f'\). Let

$$\begin{aligned} \begin{aligned} \left\Vert {\tilde{W}}_a \right\Vert \le l_1, \ \left\Vert -R_0^{-1}(B_0^TP_s+D_0^TC_0) \right\Vert&\le l_2, \\ \left\Vert R_0^{-1}B_0^TP_s \right\Vert \le l_3,\ \left\Vert R_0^{-1}{B_0'}^TP_s \right\Vert&\le l_4, \\ \left\Vert R^{-1}B_2^TP_f \right\Vert \le l_5, \left\Vert L\right\Vert&\le l. \end{aligned} \end{aligned}$$
(35)

By applying the following inequalities to (34),

$$\begin{aligned} \begin{aligned}&\Vert \xi \Vert \Vert e_1\Vert \le \frac{1}{2\alpha _1}\Vert \xi \Vert ^2+\frac{\alpha _1}{2}\Vert e_1\Vert ^2,\\&\Vert \xi \Vert \Vert e_2\Vert \le \frac{1}{2\alpha _2}\Vert \xi \Vert ^2+\frac{\alpha _2}{2}\Vert e_2\Vert ^2,\\&\Vert \eta \Vert \Vert e_1\Vert \le \frac{1}{2\alpha _3}\Vert \eta \Vert ^2+\frac{\alpha _3}{2}\Vert e_1\Vert ^2,\\&\Vert e_f\Vert ^2=\Vert Le_1+e_2\Vert ^2\le 2l^2\Vert e_1\Vert ^2+2\Vert e_2\Vert ^2. \end{aligned} \end{aligned}$$

where \(\alpha _1, \alpha _2\), and \(\alpha _3\) are positive parameters to be designed later.

Choose \(c_1,\ c_2,\ c_3\) and \(c_4\) which satisfy

$$\begin{aligned} \begin{aligned} c_1\le&\frac{1}{2}{\underline{\lambda }}(G_s')-l_1{\bar{\lambda }}(R_0)\\&-\frac{{\bar{\lambda }}(R_0)l_3\sqrt{l_1^2+l_2^2}}{2\alpha _1} -\frac{{\bar{\lambda }}(R_0)l_4l_5}{2\alpha _2}\\&-\frac{{\bar{\lambda }}(R_0)l_4\sqrt{l_1^2+l_2^2+l^2l_5^2}}{\alpha _1},\\ c_2\ge&(l_1^2+l_2^2){\bar{\lambda }}(R_0)+\alpha _3 {\bar{\lambda }}(R)l_5\sqrt{l_1^2+l_2^2}+{\bar{\lambda }}(R)l^2l_5\\&+\alpha _1{\bar{\lambda }}(R_0)l_4\sqrt{l_1^2+l_2^2+l^2l_5^2}\\&+\frac{\alpha _1{\bar{\lambda }}(R_0)l_3\sqrt{l_1^2+l_2^2}}{2}, \\c_3\le&\frac{1}{2}{\underline{\lambda }}(G_f')-\frac{{\bar{\lambda }}(R)l_5\sqrt{l_1^2+l_2^2}}{\alpha _3}, \\c_4\ge&l_5{\bar{\lambda }}(R)+\frac{\alpha _2{\bar{\lambda }}(R_0)l_4l_5}{2}. \end{aligned}\nonumber \\ \end{aligned}$$
(36)

The constants \(c_1,\ c_2,\ c_3,\ c_4\) are assumed positive; \(c_2,\ c_3\), and \(c_4\) can always be made positive by adjusting the values of \(\alpha _1, \alpha _2\), and \(\alpha _3\). The positivity of \(c_1\) is easy to obtain after proving that \(l_1\) tends to zero as \({\tilde{W}}_a\) tends to the origin.

\(\mathbf {Theorem\ 1}.\) Consider the two-time-scale system (1) and suppose that Assumptions 1 and 2 and the bounds (35) hold. Let the network weights be tuned according to (29)-(31) and the controller be designed as (28). If the corresponding controller updates whenever either of the following conditions is satisfied,

$$\begin{aligned} \left\Vert e_1 \right\Vert ^2\ge & {} \frac{2c_1\Vert \xi \Vert ^2+{\underline{\lambda }}(R_0)\Vert u_{sdc}\Vert ^2}{2c_2}, \end{aligned}$$
(37)
$$\begin{aligned} \left\Vert e_2 \right\Vert ^2\ge & {} \frac{2c_3\Vert \eta \Vert ^2+{\underline{\lambda }}(R)\Vert u_{fd}\Vert ^2}{2c_4}, \end{aligned}$$
(38)

then there exists \(\varepsilon ^{*}>0\) such that, for all \(\varepsilon \in (0,\varepsilon ^*]\) and any bounded initial conditions \(x^0,\, z^0\), the states of (1) converge asymptotically to the origin and the controller is sub-optimal. In addition, the resulting cost satisfies

$$\begin{aligned} \begin{aligned} J(X_0,u_{d}^*)&= J(X_0,u_{opt}) +\frac{1}{2}\int _0^\infty (u_{d}^*-u_{opt})^T\\&\quad \times R(u_{d}^*-u_{opt})dt. \end{aligned} \end{aligned}$$
(39)

Figure 1 illustrates a flowchart of the implementation process, where (37) and (38) are the triggering conditions for the slow and fast subsystems, respectively. Once either condition is triggered, the corresponding controller is updated.

Fig. 1 Flowchart of the proposed method ((37) and (38) are the triggering conditions for the slow and the fast dynamics, respectively)
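A sketch of the two triggering tests in the flowchart, assuming the constants \(c_1,\dots ,c_4\) have been chosen according to (36) (function names are ours; vectors are 1-D arrays, weighting matrices are symmetric):

```python
import numpy as np

def slow_trigger(e1, xi, u_sdc, c1, c2, R0):
    """Triggering condition (37) for the slow dynamics."""
    thr = (2.0 * c1 * (xi @ xi) + np.linalg.eigvalsh(R0).min() * (u_sdc @ u_sdc)) / (2.0 * c2)
    return e1 @ e1 >= thr

def fast_trigger(e2, eta, u_fd, c3, c4, R):
    """Triggering condition (38) for the fast dynamics."""
    thr = (2.0 * c3 * (eta @ eta) + np.linalg.eigvalsh(R).min() * (u_fd @ u_fd)) / (2.0 * c4)
    return e2 @ e2 >= thr
```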

Proof

First, we prove that the system is asymptotically stable. According to previous analysis,

$$\begin{aligned}&{\dot{V}}_1(\xi )\\= & {} -\frac{1}{2}\xi ^TG_s'\xi +\frac{1}{2}({u_{sc}^*}-u_{sdc})^T R_0 ({u_{sc}^*}-u_{sdc})\\&-\frac{1}{2}u_{sdc}^T R_0 u_{sdc}-\xi ^TP_sB_0{\hat{W}}_a^Te_1\\&+O(\varepsilon )\xi ^TP_sB_0M\eta +\xi ^TP_sB_0'KE\\\le & {} -\frac{1}{2}{\underline{\lambda }}(G_s')\left\Vert \xi \right\Vert ^2+ \frac{1}{2}{\overline{\lambda }}(R_0)\left\Vert {\tilde{W}}_a^T\xi -{\hat{W}}^T_a e_1 \right\Vert ^2\\&- \frac{1}{2}{\underline{\lambda }}(R_0)\left\Vert u_{sdc} \right\Vert ^2 -\xi ^TP_sB_0{\hat{W}}_a^Te_1\\&+O(\varepsilon )\xi ^TP_sB_0M\eta +\xi ^TP_sB_0'KE. \end{aligned}$$

The performance index of the fast subsystem satisfies

$$\begin{aligned} \begin{aligned} {\dot{V}}_2(\eta )\le&-\frac{1}{2}{\underline{\lambda }}(G_f')\Vert \eta \Vert ^2- \frac{1}{2}{\underline{\lambda }}(R)\Vert u_{fd}\Vert ^2\\&+\frac{1}{2}l_5{\bar{\lambda }}(R)\Vert e_f\Vert ^2+\eta ^TP_fB_2{\hat{W}}_a^Te_1. \end{aligned} \end{aligned}$$

Note that the sum of terms concerning errors in \({\dot{V}}_1,\ {\dot{V}}_2\) satisfies

$$\begin{aligned} \begin{aligned}&-\xi ^TP_sB_0K_0e_1+\xi ^TP_sB_0'KE+\eta ^TP_fB_2{\hat{W}}_a^Te_1 \\&\quad \le \left( {\bar{\lambda }}(R_0)l_3\sqrt{l_1^2+l_2^2}+2{\bar{\lambda }}(R_0)l_4\sqrt{l_1^2+l_2^2+l^2l_5^2}\right) \\&\qquad \Vert \xi \Vert \Vert e_1\Vert \\&\qquad +{\bar{\lambda }}(R_0)l_4l_5\Vert \xi \Vert \Vert e_2\Vert +2{\bar{\lambda }}(R)l_5\sqrt{l_1^2+l_2^2}\Vert \eta \Vert \Vert e_1\Vert . \end{aligned} \end{aligned}$$

Then (34) can be rewritten as

$$\begin{aligned} \begin{aligned} {\dot{V}}_1+{\dot{V}}_2 \le&-c_1\Vert \xi \Vert ^2-\frac{1}{2}{\underline{\lambda }}(R_0)\Vert u_{sdc}\Vert ^2+c_2\Vert e_1\Vert ^2\\&-c_3\Vert \eta \Vert ^2-\frac{1}{2}{\underline{\lambda }}(R)\Vert u_{fd}\Vert ^2+c_4\Vert e_2\Vert ^2. \end{aligned} \end{aligned}$$

If the corresponding controller is updated whenever (37) or (38) is satisfied, the errors are reset before the thresholds can be exceeded, so that \( {\dot{V}}_1+{\dot{V}}_2<0\) between the triggering instants, and thus the states of the original system asymptotically converge to the origin.

Next, we prove that the parameters are convergent.

We define the weight estimation errors as

$$\begin{aligned} \begin{aligned}&{\tilde{W}}_a=-(P_sB_0+C_0^TD_0)R_0^{-1}-{\hat{W}}_a,\\&{\tilde{W}}_c=W_c-{\hat{W}}_c,\\&{\tilde{W}}_{cp}=P_sB_0+C_0^TD_0-{\hat{W}}_{cp}. \end{aligned} \end{aligned}$$

Then

$$\begin{aligned} \dot{{\tilde{W}}}_c= & {} -\alpha _{c} \frac{\sigma \sigma ^T}{(1+\sigma ^T\sigma )^2}{\tilde{W}}_c, \nonumber \\ {\tilde{W}}_a^{+}= & {} {\tilde{W}}_a-\alpha _{a} \frac{\xi (t)\xi ^T(t)}{1+\xi ^T(t)\xi (t)}({\tilde{W}}_a+{\hat{W}}_{cp}R_0^{-1}). \end{aligned}$$
(40)

We set \(V_3({\tilde{W}}_a)=\frac{1}{2}tr\{{\tilde{W}}_a^T{\tilde{W}}_a\}\) and define \( \bigtriangleup V_3=V_3({\tilde{W}}_a^{+})-V_3({\tilde{W}}_a)\). Then, from the dynamics of \({\tilde{W}}_a\) in (40),

$$\begin{aligned} \begin{aligned}&\bigtriangleup V_3\\&=\alpha _a tr\left( -{\tilde{W}}_a^T\frac{\xi \xi ^T}{1+\xi ^T\xi }{\tilde{W}}_a-{\tilde{W}}_a^T\frac{\xi \xi ^T}{1+\xi ^T\xi }{\tilde{W}}_{cp}R_0^{-1} \right. \\&\quad +\frac{\alpha _a}{2}{\tilde{W}}_a^T\frac{(\xi \xi ^T)^2}{(1+\xi ^T\xi )^2}{\tilde{W}}_a\\&\quad +\alpha _a{\tilde{W}}_a^T\frac{(\xi \xi ^T)^2}{(1+\xi ^T\xi )^2}{\tilde{W}}_{cp}R_0^{-1} \\&\quad \left. +\frac{\alpha _a}{2}({\tilde{W}}_{cp}R_0^{-1})^T\frac{(\xi \xi ^T)^2}{(1+\xi ^T\xi )^2}{\tilde{W}}_{cp}R_0^{-1}\right) . \end{aligned} \end{aligned}$$

Since \(tr(AB)=tr(BA)\), we get

$$\begin{aligned} \begin{aligned} \bigtriangleup V_3&= \frac{\alpha _a\xi ^T}{1+\xi ^T\xi } (-{\tilde{W}}_a{\tilde{W}}_a^T-{\tilde{W}}_{cp}R_0^{-1}{\tilde{W}}_a^T\\&\quad +\frac{\alpha _a}{2}\frac{\Vert \xi \Vert ^2}{1+\Vert \xi \Vert ^2}({\tilde{W}}_a{\tilde{W}}_a^T +2{\tilde{W}}_{cp}R_0^{-1}{\tilde{W}}_a\\&\quad +{\tilde{W}}_{cp}R_0^{-1}({\tilde{W}}_{cp}R_0^{-1})^T))\xi . \end{aligned} \end{aligned}$$

By utilizing the following inequalities

$$\begin{aligned} \begin{aligned} \frac{\Vert \xi \Vert ^2}{1+\Vert \xi \Vert ^2}&\le 1, \\ -\xi ^T{\tilde{W}}_{cp}R_0^{-1}{\tilde{W}}_a^T\xi&\le \frac{\xi ^T}{2}\left( \frac{1}{\beta _b}{\tilde{W}}_{cp}{\tilde{W}}_{cp}^T\right. \\&\left. \quad +\beta _b{\tilde{W}}_a(R_0^{-1})^{T}R_0^{-1}{\tilde{W}}_a^T\right) \xi , \end{aligned} \end{aligned}$$

where \(\beta _b\) can be an arbitrary positive real number.

Then, it can be deduced that

$$\begin{aligned} \begin{aligned} \bigtriangleup V_3&\le \frac{\alpha _a}{1+\xi ^T\xi } \xi ^T\left( -\left( 1-\frac{\Vert 1-\alpha _a\Vert \beta _b{\bar{\lambda }}(R_0^{-1})^2}{2} \right. \right. \\&\left. \left. \quad -\frac{\alpha _a}{2}\right) {\tilde{W}}_a{\tilde{W}}_a^T+\left( \frac{\Vert 1-\alpha _a\Vert }{2\beta _b}+\frac{\alpha _a}{2}{\bar{\lambda }}(R_0^{-1})^2\right) {\tilde{W}}_{cp}{\tilde{W}}_{cp}^T \right) \xi . \end{aligned} \end{aligned}$$

Choosing proper \(\alpha _a, \beta _b\) to ensure \(1-\frac{\Vert 1-\alpha _a\Vert \beta _b{\bar{\lambda }}(R_0^{-1})^2}{2} -\frac{\alpha _a}{2}> 0\), we have \(\bigtriangleup V_3< 0\) whenever \(\Vert {\tilde{W}}_{a}\Vert \) is large relative to \(\Vert {\tilde{W}}_{cp}\Vert \). Since \({\tilde{W}}_{cp}\), as part of \({\tilde{W}}_{c}\), converges to the origin, the term \(\xi ^T\left( \frac{\Vert 1-\alpha _a\Vert }{2\beta _b}+\frac{\alpha _a}{2}{\bar{\lambda }}(R_0^{-1})^2\right) {\tilde{W}}_{cp}{\tilde{W}}_{cp}^T\xi \) also converges to zero. So \({\hat{W}}_a\) is near-optimal. In addition, this implies that \(l_1\) becomes smaller, so that \(c_1>0\) in (36) is easier to satisfy.

Third, we prove that Zeno behavior is excluded.

For the slow mode, from (37), the triggering condition satisfies \(\frac{\Vert e_1(t)\Vert }{\Vert \xi (t)\Vert }\ge \frac{c_1}{c_2}\). For convenience, we denote \(s(t)=\frac{\Vert e_1(t)\Vert }{\Vert \xi (t)\Vert }\). From the dynamics of the subsystems (7), since the parameters are bounded, there exists \({{\mathcal {F}}}_1\in {{\mathcal {R}}}_+\) satisfying \( \Vert {\dot{\xi }}(t)\Vert \le {{\mathcal {F}}}_1(\Vert \xi (t)\Vert +\Vert e_1(t)\Vert +\Vert e_2(t)\Vert )\) and \( \Vert {\dot{e}}_1(t)\Vert \le {{\mathcal {F}}}_1(\Vert \xi (t)\Vert +\Vert e_1(t)\Vert +\Vert e_2(t)\Vert )\). Then, it is further deduced that

$$\begin{aligned} \begin{aligned}&\frac{ds}{dt}\le {{\mathcal {F}}}_1(1+s)^2+ {{\mathcal {F}}}_1\left( 1+\frac{\Vert {e}_1(t)\Vert }{\Vert \xi \Vert }\right) \frac{\Vert {e}_2(t)\Vert }{\Vert \xi \Vert }, \\&\quad t \in [t_{x}^k,t_{x}^{k+1}). \end{aligned} \end{aligned}$$

As the states are convergent, there exists \({{\mathcal {C}}}_1\in {{\mathcal {R}}}_+\) satisfying \({{\mathcal {C}}}_1\ge {{\mathcal {F}}}_1(1+\frac{\Vert {e}_1(t)\Vert }{\Vert \xi \Vert })\frac{\Vert {e}_2(t)\Vert }{\Vert \xi \Vert }\), so that \(\frac{dt}{ds}\ge \frac{1}{{{\mathcal {F}}}_1(1+s)^2+{{\mathcal {C}}}_1}\). Since s(t) grows from 0 to a number larger than \(\frac{c_1}{c_2}\) between two consecutive triggering instants, the triggering interval satisfies \(\tau _s=t_{x}^{k+1}-t_{x}^k>\int _0^{\frac{c_1}{c_2}}\frac{1}{{{\mathcal {F}}}_1(1+s)^2+{{\mathcal {C}}}_1}ds>0\).

Similarly, for the fast mode, there exist \({{\mathcal {F}}}_2,{{\mathcal {C}}}_2\in {{\mathcal {R}}}_+\) satisfying \( \Vert {\dot{\eta }}(t)\Vert \le {{\mathcal {F}}}_2(\Vert \eta (t)\Vert +\Vert e_2(t)\Vert )\), \( \Vert {\dot{e}}_2(t)\Vert \le {{\mathcal {F}}}_2(\Vert \xi (t)\Vert +\Vert e_2(t)\Vert )\) and \({{\mathcal {C}}}_2\ge {{\mathcal {F}}}_2(1+\frac{\Vert {e}_2(t)\Vert }{\Vert \eta \Vert })\frac{\Vert {e}_1(t)\Vert }{\Vert \eta \Vert }\), so that the triggering interval satisfies \(\tau _f=t_{z}^{k+1}-t_{z}^k>\int _0^{\frac{c_3}{c_4}} \frac{1}{{{\mathcal {F}}}_2(1+s)^2+{{\mathcal {C}}}_2}ds>0\). Thus, Zeno behavior is excluded.
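The strict positivity of these lower bounds can also be checked numerically. A small sketch, where the constants F, C, and the integration limit are purely illustrative placeholders (not values from the paper):

```python
from scipy.integrate import quad

def min_interevent_time(F, C, ratio):
    """Numerical lower bound on the inter-event time,
    e.g. tau_s > integral_0^{c1/c2} ds / (F1*(1+s)^2 + C1)."""
    lb, _ = quad(lambda s: 1.0 / (F * (1.0 + s) ** 2 + C), 0.0, ratio)
    return lb

# e.g. min_interevent_time(F=5.0, C=1.0, ratio=0.3) returns a strictly positive bound
```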

Regarding the performance index under the proposed event-triggered mechanism, a similar proof is given in [24]. Under the exact optimal control \(u_{opt}\) in (3) and the sub-optimal event-triggered control \(u_{d}^*\), the performance indexes satisfy the relation described in (39).

This completes the proof. \(\square \)

4 Simulation

This section presents three simulation examples: a four-dimensional system, a practical DC-motor example, and a comparison study. The slow-mode dynamics are assumed unknown, which means that \(A_{11}, A_{12}, B_1\) are unavailable.

(1) The effectiveness of the proposed method

Consider a four-dimensional system with parameters \(A_{11}={\left[ \begin{array}{cc} 0 &{} 0.4 \\ 0 &{} 0 \end{array} \right] }, A_{12}= {\left[ \begin{array}{cc} 0 &{} 0 \\ 0.345 &{} 0 \end{array} \right] }, A_{21}={\left[ \begin{array}{cc} 0 &{} -0.524 \\ 0 &{} 0 \end{array} \right] }, A_{22}={\left[ \begin{array}{cc} -0.465 &{} 0.262 \\ 0 &{} -1 \end{array} \right] },\ B_{1}\!=\! {\left[ \begin{array}{cc}0&0\end{array}\right] }^{T}\!, \ B_{2}= {\left[ \begin{array}{cc}0&1\end{array}\right] }^{T}\). We set \(\varepsilon =0.1\), \(C_1=C_2=\left[ 1\ 1\right] \), and \(R=1\). The initial states are \(x(0)={\left[ \begin{array}{cc}6&5\end{array}\right] }^{T},\ z(0)={\left[ \begin{array}{cc}4&2\end{array}\right] }^{T}\), and \(\alpha _a=0.5, \alpha _c=2\). The trajectories of the states, the triggering times, and the inputs are depicted in Fig. 2. Furthermore, to highlight the approximate optimality of the proposed method, the optimal control is also presented in Fig. 2. By solving the Riccati equations (14) and (19) of the slow and fast subsystems, respectively, we obtain \(u_c^*=[-2.0000 -2.5632]x+[-0.5953 -0.5205]z\) from (21). The solid line depicts the case under the proposed event-triggered control with unknown slow dynamics, while the dotted line illustrates the case under optimal control. From Fig. 2, we observe that the states converge asymptotically to the origin. The minimum triggering intervals for the slow and fast dynamics are \(0.6730\ \mathrm {s}\) and \(0.0320\ \mathrm {s}\), respectively, and the average triggering intervals are \(1.0288\ \mathrm {s}\) and \(0.3182\ \mathrm {s}\).

Fig. 2 State trajectories and inputs

(2) A DC-motor example

In this simulation case, a practical DC-motor example is considered, as described in [20], where the electromagnetic transient is regarded as the fast mode, while the torque response is much slower. The dynamics of such systems can be modeled as

$$\begin{aligned} \begin{aligned} J_{m}\frac{d\omega }{dt}=&-b\omega +k_{m}i, \\ L_i\frac{di}{dt}=&-k_b\omega -R_ai+u, \end{aligned} \end{aligned}$$

where \(J_m\) is the equivalent moment of inertia, \(\omega \) is the angular speed, b is the equivalent viscous friction coefficient, and \(k_m\) and \(k_b\) are the torque constant and the back electromotive force constant, respectively. i, u, and \(R_a\) represent the armature current, voltage, and resistance, respectively, and \(L_i\) is a small inductance constant. The system parameters are as follows: \(J_m = 0.093\ \mathrm{{kg\,m^2}}, k_m = 0.7274\ \mathrm{{N\,m}}, k_b = 0.6\ \mathrm{{V\,s/rad}}, L_i = 0.006\ \mathrm {H}, R_a = 0.6\ {\Omega }, b = 0.008\). Rewriting the latter formulas in the form of (1), the parameters are \(\varepsilon =0.06\), \(A_{11}=-0.086, A_{12} =7.82, A_{21} =-0.6, A_{22}=-0.6, B_1=0, B_2=1\). Let \(C_1=2, C_2=1.\) We set \(\alpha _a=0.5, \alpha _c=20\). The calculated optimal control for the slow mode is \(u_{sc}^*(\xi )=-0.4741\xi \), and the initial values of the states are chosen as \(x_1(0)=6, x_2(0)=2\). At \(t=0.75\ \mathrm {s}\), \({\hat{W}}_a=-0.4769\). The error between the actor network weight obtained with the proposed method and the one corresponding to the optimal control evolves as shown in Fig. 3, indicating that \({\tilde{W}}_a\) is close to the origin after \(0.75\ \mathrm {s}\), i.e., the control gain is near-optimal. The triggering condition is also depicted in Fig. 3.

Fig. 3 The error of the actor network weight and the triggering condition

Table 1 Data of the two methods
Fig. 4 Comparison of the states, triggering time, and performance index

(3) Comparison between the ETM in [9] and the proposed method

This simulation compares our work with [9]. The system parameters are the same as in the first case. In [9], the controller also takes the form of (5), where the control gains are \(K_1= [-0.6861 -\!1.1784]\) and \(K_2=-R^{-1}B_2^{T}P_f=[-0.5953 -\!0.5205]\); accordingly, \(K_0=[-0.2\ -0.1]\). The triggering condition is designed as \(\Vert e_{i}(t)\Vert \ge c_0+c_{1}e^{-{\alpha _{0}} t} \), where \(i=1,2\), and \(e_{1}(t)\) and \(e_{2}(t)\) denote the errors between the sampled and the real states of the slow and fast modes, respectively. Here \({\alpha _0}=0.1, c_0=0.05\), and \(c_{1}=0.2\), as in [9].

Regarding the proposed method, to be consistent with the basic parameters in [9], the initial control gains \(K_0, K_1\), and \(K_2\) are chosen as above. Table 1 compares the triggering numbers, the minimum triggering intervals, and the average triggering intervals of the proposed method and of [9] over \(90\ \mathrm {s}\). The trajectories of the states and the triggering times are depicted in Fig. 4, along with the performance index. The latter figure highlights that, over \(90\ \mathrm {s}\), although the triggering numbers of the proposed method are much smaller than those of the ETM in [9], the proposed method achieves a lower cost. In [9], because of the triggering mechanism, the threshold of the triggering condition changes only slightly after some time, while the states still influence the error's rate of change and thus have a critical impact on the triggering frequency (Fig. 3); in the proposed method, by contrast, the thresholds change together with the states. The triggering frequency in [9] is relatively high at the beginning and decreases as time goes to infinity, while for the proposed method the change of the triggering frequency is rather gentle.

5 Conclusion

This paper presents an event-triggered sub-optimal control for TTSSs with unknown slow dynamics. Based on the singular perturbation theory, a composite event-triggered controller is designed. Additionally, we utilize a Q-function as a critic network in the form of the product of available information and unknown parameters. The controller is constructed using an actor network and is updated to be sub-optimal under the proposed event-triggered mechanism. Furthermore, we prove that the triggering time intervals are strictly positive, and thus, Zeno behavior is excluded. Moreover, the system’s global asymptotic stability is guaranteed. Future work will address TTSSs with actuator failure or fully unknown parameters for more general applications.