1 Introduction

Mathematical models of practical plants, in general, can hardly be determined with appropriate accuracy. To design controllers without a mathematical model of the controlled plant in the discrete-time domain, model-free adaptive control schemes have been proposed that use only a set of input–output data [1,2,3]. In general, full-state feedback has been required to provide enough information, as in the works of [4] for linear plants and [5, 6] for nonlinear systems. On the other hand, output feedback control schemes have been studied less than state feedback schemes because output feedback controllers are much more difficult to design in many cases [7, 8]. To handle applications with unknown nonlinear discrete-time systems and without state measurement, model-free adaptive controllers based on output feedback have been developed with closed-loop stability guarantees [9,10,11,12]. Nevertheless, stability is only a bare minimum requirement for controller design; the optimization of a prescribed cost function is preferred for several control applications [13,14,15].

Optimal control schemes based on the concept of action-critic networks have been proposed to determine an estimated solution of the Hamilton–Jacobi–Bellman (HJB) equation [16] in the manner of reinforcement learning (RL) algorithms [17, 18]. In general, both action and critic networks have been built from artificial neural networks (ANN): the unknown cost function is approximated by a critic-ANN, and the control effort is obtained from an action-ANN [19, 20]. Architectures and learning schemes for action-critic networks have been proposed as “neuro dynamic programming” [21], “adaptive critic design” [22] and “adaptive dynamic programming” for discrete-time systems [23] and continuous-time systems [24]. In [25], the controlled plant has been considered as a gray-box system, and an action-critic structure has been proposed to design a nearly optimal adaptive controller based on an RL algorithm. Considering approximation errors, a generalized policy iteration has been developed in [26]. Both value and policy iterations play an important role in solving optimal control problems, but both iterations seem inconvenient for implementation on practical plants. That motivates us to design a learning algorithm for both critic and action networks without inner iteration.

Currently, there are only a few works on the implementation of action-critic networks with RL learning on practical systems, because the standard algorithms cannot be directly applied under the time-varying conditions and uncertainties that are common in application plants [27]. Furthermore, measurement of the full-state variables is generally required to design the controllers and learning algorithms [28, 29]. Together with the economic reason, output feedback control schemes are strongly desired for a large class of practical plants. Recently, output feedback controllers based on RL algorithms have been proposed under a persistent excitation (PE) condition [30]. The PE condition is generally required for adaptive algorithms with stability analysis. In [31], the PE condition is relaxed with an ANN control scheme for nearly optimal regulation, but the controller is limited to a class of affine nonlinear discrete-time systems. For a class of non-affine systems, Q-learning algorithms based on critic-action networks have been proposed in [32,33,34], but the emphasis has been on state-feedback schemes and the regulation problem. From a practical perspective, the output feedback controller in this work is developed with the action-critic structure and an online learning algorithm only.

Fuzzy systems have been successfully utilized to provide robustness against uncertainties in optimal controllers when the mathematical models of the controlled plants are unknown [35]. In [36], a fuzzy hyperbolic model has been developed as an action network tuned by an internal reinforcement signal for a class of unknown discrete-time systems, but only the regulation problem has been discussed. Based on back-stepping adaptive control, uncertainties and unknown systems have been handled in [37, 38], but full-state feedback has been required to design the controllers. The design of an output feedback controller based on fuzzy systems has been proposed in [39], but that controller addresses a class of continuous-time systems with unity control gain. Recently, a controller based on a recurrent-fuzzy neural network with RL has been proposed in [40] for a class of nonlinear discrete-time systems, but only the tracking error has been selected for the reward function of the critic network.

In this article, the controlled plant is considered as a class of non-affine discrete-time systems whose mathematical model is unknown. To design the controller without any model, a model-free adaptive control scheme is established by an action-critic networks architecture with an RL algorithm. The control signal is generated by an action network constructed as a single-input fuzzy-rules emulated network (FREN) [41]. The set of IF–THEN rules for FREN is created from human knowledge of the relation between the control signal and the plant’s output [42], such as

Action IF Higher output is desired, THEN Larger control signal is requested.

Within the manner of optimizing both the tracking error and the energy of the control signal, a critic network is established to estimate the long-term cost function. A multi-input fuzzy-rules emulated network (MiFREN) is implemented to create the critic network with a set of IF–THEN rules such as

Critic IF Error is big and Control energy is large, THEN Reward should be low.

This reward leads to the cost function generated by MiFREN, with the relation that a lower cost is obtained when the tracking error and the control energy are small. The main contributions of this article are listed as follows:

  • Unlike other works such as [17, 25, 29, 30, 34], where action-critic schemes have been designed by ANNs with random weight parameters, in this work both action and critic networks are designed from IF–THEN rules that utilize human knowledge of the controlled plant and the controller’s actuator. This allows the engineer to design the structure and adjustable parameters in an engineering sense rather than at random.

  • The online learning algorithm is developed without inner policy and value iterations, while the convergence of the tracking error and internal signals can be guaranteed. Unlike event-triggered and sampled-time schemes such as [23, 27, 33, 43], the proposed controller can be utilized for more general discrete-time systems.

  • The tracking controller is designed without transforming the original system into an augmented system dynamic, which allows the proposed controller to be implemented directly for a large class of practical plants, such as the prototype DC-motor current control in this work.

The rest of this article is organized as follows. A class of nonlinear discrete-time systems and the problem formulation are presented in Sect. 2. Section 3 introduces the design of the action and critic networks with the concept of IF–THEN rules related to the controlled plant’s characteristics. The learning algorithm is developed in Sect. 4 with convergence analysis for the tracking error and internal signals. A computer simulation is first utilized to demonstrate the design procedure and the performance of the proposed controller with a selected nonlinear plant in Sect. 5.1. Secondly, in Sect. 5.2, an experimental system with DC-motor current control is constructed to demonstrate the effectiveness and the online learning ability against the nonlinearities and uncertainties of practical systems. Section 6 draws the conclusions.

2 Problem statement: a class of nonlinear discrete-time systems

The block diagram in Fig. 1 presents our prototype DC-motor current control system, which has the control effort \(u(k) \in {\mathbb {R}}\) as its input terminal and the measured current \(y(k+1) \in {\mathbb {R}}\) as its output terminal, where k denotes the \(k^{\mathrm {th}}\) sampling time index. The control signal u(k) is a driving voltage generated by a data-acquisition card (CONTEC\(^{\textregistered }\) AIO-160802L-LPE). The motor current \(y(k+1)\) is measured by an instrumentation circuit connected to an analog input of the AIO-160802L-LPE. This plant is considered as an unknown nonlinear system with input u(k) and output \(y(k+1)\). The mathematical model of this system is not required for the controller design or the stability analysis. The nonlinear behavior of this DC-motor driving system is demonstrated in Fig. 2 as a VI curve, where the input voltage and motor current are denoted as the control effort u(k) and the current output \(y(k+1)\), respectively. Without any information about the system’s mathematical model, this controlled plant can be considered as a class of non-affine discrete-time systems whose dynamics can be formulated as

$$\begin{aligned} y(k+1)=f_o(u(k), \ldots , u(k-l_u), y(k), \ldots , y(k-l_y))+d(k), \end{aligned}$$
(1)

where \(f_o(-)\) is an unknown nonlinear function, \(l_u\) and \(l_y\) are unknown system orders and d(k) is a bounded disturbance with \(|d(k)| \le d^o_M\). Let us define \(\chi _i(k)=[u(k-1)\,\ldots \, u(k-l_u)\,y(k)\,\ldots \,y(k-l_y)]^T\); thus, the system dynamic (1) can be rewritten as

$$\begin{aligned} y(k+1)=f_o(u(k),\chi _i(k))+d(k). \end{aligned}$$
(2)

Without loss of generality, the following assumptions are stated for the nonlinear function \(f_o(-)\).

Fig. 1: DC-motor current control configuration

Fig. 2: VI characteristic of DC-motor driver system

Assumption 1

The nonlinear function \(f_o(-)\) is continuous with respect to its first argument u(k), i.e., \(\frac{{\partial f_o(u(k),\chi _i(k))}}{\partial u(k)}\) exists.

Assumption 2

There exist two constants \(g_m\) and \(g_M\) such that

$$\begin{aligned} 0<g_m\le \Big |\frac{{\partial f_o(u(k),\chi _i(k))}}{\partial u(k)}\Big | \le g_M. \end{aligned}$$
(3)

These assumptions are standard requirements for several nonlinear discrete-time control schemes. In this work, the proposed control scheme is designed under the condition that the nonlinear function \(f_o(-)\) and the boundaries in (3) are completely unknown. The boundaries in (3) can be estimated from the VI curve or experimental data. For example, in this application the estimated value of (3) can be obtained from the estimated tangent of the curve in Fig. 2 as

$$\begin{aligned} g_M = \frac{20-0}{1.5-0.5}=20. \end{aligned}$$
(4)
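For illustration, the sketch below computes this slope estimate; the two operating points (0.5 V, 0 mA) and (1.5 V, 20 mA) are approximate values read off the VI curve in Fig. 2, and Python is used here purely as an illustrative language.

```python
# Hedged sketch: estimate the control-gain bound g_M of (3) as the tangent
# of the VI curve, using two approximate operating points from Fig. 2.
u1, y1 = 0.5, 0.0     # [V], [mA]
u2, y2 = 1.5, 20.0    # [V], [mA]
g_M = (y2 - y1) / (u2 - u1)   # estimated tangent: 20 [mA/V], matching (4)
print(g_M)
```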

The proposed control scheme is developed in the next section to handle the tracking problem for the class of systems in (1) with adaptive networks and stability analysis.

3 Action and critic architecture based on FRENs

In this work, the control scheme is proposed with the concept of action and critic networks presented in Fig. 3, where the action network is established by FRENaction (FRENa) and the critic network is created by MiFRENcritic (MiFRENc). The action network FRENa is designed to generate the control effort for the controlled plant, and the parameters inside this network are tuned to minimize the estimated cost function obtained by the critic network MiFRENc. The reward function for MiFRENc is established by IF–THEN rules according to the relation between the tracking error and the control effort. The two sets of IF–THEN rules and the network architectures of FRENa and MiFRENc are introduced in the following subsections.

Fig. 3: FREN: action and critic networks architecture

3.1 Action network: FRENa

According to human knowledge related to the controlled plant, an IF–THEN rule can be defined as

$$\begin{aligned} ``{\hbox {IF}}e(k)\hbox { is Positive Large THEN }u(k)\hbox { is Negative Large''}, \end{aligned}$$

where e(k) denotes the tracking error given by

$$\begin{aligned} e(k)=y(k)-r(k), \end{aligned}$$
(5)

where r(k) is the desired trajectory. That is, when the error determined by (5) is large and positive, the output y(k) should be reduced by a large negative control effort u(k). In this work, the set of IF–THEN rules is defined as

$$\begin{aligned} \begin{array}{ll} \hbox {IF } e(k) \hbox { is }\hbox { NL } &{}\hbox {THEN } u_1(k)=\beta _{\mathrm {PL}}(k)\mu _{\mathrm {NL}}(e_k), \\ \hbox {IF } e(k) \hbox { is }\hbox { NM } &{}\hbox {THEN } u_2(k)=\beta _{\mathrm {PM}}(k)\mu _{\mathrm {NM}}(e_k),\\ \hbox {IF } e(k) \hbox { is }\hbox { NS } &{}\hbox {THEN } u_3(k)=\beta _{\mathrm {PS}}(k)\mu _{\mathrm {NS}}(e_k),\\ \hbox {IF } e(k) \hbox { is }\hbox { Z } &{}\hbox {THEN } u_4(k)=\beta _{\mathrm {Z}}(k)\mu _{\mathrm {Z}}(e_k), \\ \hbox {IF } e(k) \hbox { is }\hbox { PS } &{}\hbox {THEN } u_5(k)=\beta _{\mathrm {NS}}(k)\mu _{\mathrm {PS}}(e_k), \\ \hbox {IF } e(k) \hbox { is }\hbox { PM } &{}\hbox {THEN } u_6(k)=\beta _{\mathrm {NM}}(k)\mu _{\mathrm {PM}}(e_k), \\ \hbox {IF } e(k) \hbox { is }\hbox { PL } &{}\hbox {THEN } u_7(k)=\beta _{\mathrm {NL}}(k)\mu _{\mathrm {PL}}(e_k), \\ \end{array} \end{aligned}$$

The linguistic notations N, P, L, M, S and Z denote negative, positive, large, medium, small and zero, respectively. The nonlinear function \(\mu _{\Box }(e_k)\) is a membership function and \(\beta _{\Box }(k)\) is an adjustable parameter for the linguistic value \(\Box\), where \(\Box\) stands for one of the linguistic values Negative Large (NL), Negative Medium (NM), ..., Zero (Z), ..., Positive Large (PL) of the membership functions in use. Following the computation of FREN [41], the control effort can be obtained by

$$\begin{aligned} u(k)=\sum _{i=1}^{7}u_i(k). \end{aligned}$$
(6)

To simplify, the control effort can be rewritten as

$$\begin{aligned} u(k)=\beta _a^T(k)\phi _a(k), \end{aligned}$$
(7)

where

$$\begin{aligned} \beta _a(k)=[\beta _{\mathrm {PL}}(k)\quad \beta _{\mathrm {PM}}(k)\quad \cdots \quad \beta _{\mathrm {NL}}(k)]^T, \end{aligned}$$
(8)

and

$$\begin{aligned} \phi _a(k)=[\mu _{\mathrm {NL}}(e_k)\quad \mu _{\mathrm {NM}}(e_k)\quad \cdots \quad \mu _{\mathrm {PL}}(e_k)]^T. \end{aligned}$$
(9)
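For concreteness, the following minimal sketch implements the action network computation (6)–(9). The Gaussian shape, the centers and the width of the membership functions are illustrative assumptions, not the actual settings of Figs. 6 and 11.

```python
import numpy as np

# Illustrative membership centers for NL, NM, NS, Z, PS, PM, PL over an
# assumed error range [-5, 5]; Gaussian shape and width are assumptions.
centers = np.linspace(-4.5, 4.5, 7)
width = 1.5

def phi_a(e):
    """Membership vector [mu_NL(e), ..., mu_PL(e)] of (9)."""
    return np.exp(-((e - centers) / width) ** 2)

def control(e, beta_a):
    """Control effort u(k) = beta_a^T(k) phi_a(k) as in (7)."""
    return float(beta_a @ phi_a(e))
```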

The network architecture of FRENa is depicted in Fig. 4. According to the universal function approximation property of FREN [41], there exists an ideal parameter \(\beta _a^{*}\) such that

$$\begin{aligned} u^{*}(k)=\beta _a^{*T}\phi _a(k)+\varepsilon _a(k), \end{aligned}$$
(10)

where \(\varepsilon _a(k)\) is the approximation error of FRENa. By using (2), the error dynamic can be obtained as

$$\begin{aligned} e(k+1)=f_o(u(k),\chi _i(k))+d(k)-r(k+1). \end{aligned}$$
(11)

Defining the ideal control \(u^{*}(k)\) as the effort that achieves \(f_o(u^{*}(k),\chi _i(k))=r(k+1)\), and adding and subtracting \(f_o(u^{*}(k),\chi _i(k))\) in (11), the error dynamic can be rewritten as

$$\begin{aligned} e(k+1)=f_o(u(k),\chi _i(k))-f_o(u^{*}(k),\chi _i(k))+d(k). \end{aligned}$$
(12)

By using the mean value theorem and Assumption 1, the error dynamic (12) can be written as

$$\begin{aligned} e(k+1)=g(u^i(k),\chi _i(k))[u(k)-u^{*}(k)]+d(k), \end{aligned}$$
(13)

where

$$\begin{aligned} g(u^i(k),\chi _i(k))=\frac{{\partial f_o(u^i(k),\chi _i(k))}}{\partial u^i(k)}, \end{aligned}$$
(14)

where \(u^i(k) \in [\min \{u^{*}_k,u_k\},\,\max \{u^{*}_k,u_k\}]\). Substituting \(u^{*}(k)\) from (10) and u(k) from (7), and defining \(g(u^i(k),\chi _i(k))=g(k)\), the error dynamic (13) can be rewritten as

$$\begin{aligned} e(k+1)=g(k)[\beta _a(k)-\beta _a^{*}]^T\phi _a(k)-g(k)\varepsilon _a(k)+d(k). \end{aligned}$$
(15)

Let us define \(\tilde{\beta }_a(k)=\beta _a(k)-\beta _a^{*}\), \(d_a(k)=d(k)-g(k)\varepsilon _a(k)\) and \(\Lambda _a(k)=\tilde{\beta }_a^T(k)\phi _a(k)\), thus, we obtain

$$\begin{aligned} e(k+1)=g(k)\Lambda _a(k)+d_a(k). \end{aligned}$$
(16)

The error dynamic obtained in (16) indicates its relation to the difference between the ideal and adjustable parameters of the action network FRENa and its approximation error.

Fig. 4: FRENa network architecture

3.2 Critic network: MiFRENc

In order to minimize both the tracking error and the control energy, an infinite-horizon cost function is defined as

$$\begin{aligned} L(k)=\sum _{i=k}^{\infty }\gamma _L^{i-k}[pe^2(i)+qu^2(i)], \end{aligned}$$
(17)

where p and q are positive constants and \(0<\gamma _L\le 1\) is a discount factor. Let us rearrange (17) as

$$\begin{aligned} L(k)= &\, {} pe^2(k)+qu^2(k) \nonumber \\&+\,\gamma _L\sum _{i=k+1}^{\infty }\gamma _L^{i-(k+1)}[pe^2(i)+qu^2(i)],\nonumber \\= &\, {} l(k)+\gamma _L L(k+1), \end{aligned}$$
(18)

where l(k) is the local cost function defined by

$$\begin{aligned} l(k)=pe^2(k)+qu^2(k). \end{aligned}$$
(19)
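The recursion (18) can be checked numerically; the sketch below truncates the infinite sum (17) at a finite horizon and verifies \(L(k)=l(k)+\gamma _L L(k+1)\). The discount factor and the local costs are arbitrary illustrative numbers.

```python
import numpy as np

gamma_L = 0.95                 # discount factor, illustrative value
l = np.random.rand(200)        # local costs l(i) = p e^2(i) + q u^2(i), (19)

def L(k):
    """Cost (17) truncated at the horizon len(l)."""
    return sum(gamma_L ** (i - k) * l[i] for i in range(k, len(l)))

# The recursion (18): L(k) = l(k) + gamma_L * L(k+1) holds exactly for the
# truncated sum as well.
assert np.isclose(L(0), l[0] + gamma_L * L(1))
```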

Let us define \(\xi _k=[e^2(k):u^2(k)]\) as the current state including the tracking error and the control effort; thus, we have

$$\begin{aligned} L(k)=l(\xi _k)+\gamma _L L(k+1). \end{aligned}$$
(20)

For the closed-loop system with output feedback, the tracking error at the next time index is a function of the current control effort, and the current control effort is a function of the current tracking error, which leads to

$$\begin{aligned} \xi _{k+1}=[e^2(k+1):u^2(k+1)]=\mathfrak {f}_{\xi }(\xi _k), \end{aligned}$$
(21)

where \(\mathfrak {f}_{\xi }(-)\) is an unknown analytic function. By composition of functions, we have

$$\begin{aligned} \xi _{k+2}=\mathfrak {f}_{\xi }\circ \mathfrak {f}_{\xi }(\xi _k)\triangleq \mathfrak {f}_{\xi }^2(\xi _k). \end{aligned}$$
(22)

Combining (20)–(22) and all future steps leads to

$$\begin{aligned} L(k)=l(\xi _k)+\gamma _L l(F_{\xi }(\xi _k)), \end{aligned}$$
(23)

where \(F_{\xi }(\xi _k)=\mathfrak {f}_{\xi }^{j}(\xi _k)\) for \(j=1\rightarrow \infty\). Regarding (23), the cost function in (17) can be estimated by MiFRENc as \(\hat{L}(k)\). This network has two inputs, \(e^2(k)\) and \(u^2(k)\), and one output \(\hat{L}(k)\), as shown in Fig. 5. The relation between the inputs and the estimated cost function is established by a set of IF–THEN rules such as

$$\begin{aligned} ``\hbox {IF }e^2(k)\hbox { is Large and }u^2(k)\hbox { is Large THEN }\hat{L}(k)\hbox { should be a Large value.''} \end{aligned}$$
(24)

This is a straightforward IF–THEN rule indicating that a good reward is obtained when the control system has a small tracking error with low control effort. Thus, the set of IF–THEN rules is defined as

$$\begin{aligned} \begin{array}{lll} \hbox {IF }e^2(k)\hbox { is L }&{}\hbox {and }u^2(k)\hbox { is L }&{}\hbox {THEN }\hat{L}_1(k)=\beta _{\mathrm {L1}}(k)\phi _1(k), \\ \hbox {IF }e^2(k)\hbox { is L }&{}\hbox {and }u^2(k)\hbox { is S }&{}\hbox {THEN }\hat{L}_2(k)=\beta _{\mathrm {L2}}(k)\phi _2(k), \\ \hbox {IF }e^2(k)\hbox { is L }&{}\hbox {and }u^2(k)\hbox { is Z }&{}\hbox {THEN }\hat{L}_3(k)=\beta _{\mathrm {L3}}(k)\phi _3(k), \\ \hbox {IF }e^2(k)\hbox { is S }&{}\hbox {and }u^2(k)\hbox { is L }&{}\hbox {THEN }\hat{L}_4(k)=\beta _{\mathrm {S1}}(k)\phi _4(k), \\ \hbox {IF }e^2(k)\hbox { is S }&{}\hbox {and }u^2(k)\hbox { is S }&{}\hbox {THEN }\hat{L}_5(k)=\beta _{\mathrm {S2}}(k)\phi _5(k), \\ \hbox {IF }e^2(k)\hbox { is S }&{}\hbox {and }u^2(k)\hbox { is Z }&{}\hbox {THEN }\hat{L}_6(k)=\beta _{\mathrm {S3}}(k)\phi _6(k), \\ \hbox {IF }e^2(k)\hbox { is Z }&{}\hbox {and }u^2(k)\hbox { is L }&{}\hbox {THEN }\hat{L}_7(k)=\beta _{\mathrm {Z1}}(k)\phi _7(k), \\ \hbox {IF }e^2(k)\hbox { is Z }&{}\hbox {and }u^2(k)\hbox { is S }&{}\hbox {THEN }\hat{L}_8(k)=\beta _{\mathrm {Z2}}(k)\phi _8(k), \\ \hbox {IF }e^2(k)\hbox { is Z }&{}\hbox {and }u^2(k)\hbox { is Z }&{}\hbox {THEN }\hat{L}_9(k)=\beta _{\mathrm {Z3}}(k)\phi _9(k), \\ \end{array} \end{aligned}$$

where \(\phi _1(k)=\mu _{\mathrm {L}}(e^2_k)\mu _{\mathrm {L}}(u^2_k)\), \(\phi _2(k)=\mu _{\mathrm {L}}(e^2_k)\mu _{\mathrm {S}}(u^2_k)\) and so on. The estimated cost function can be obtained as

$$\begin{aligned} \hat{L}(k)=\sum _{i=1}^{9}\hat{L}_i(k). \end{aligned}$$
(25)

To simplify, the relation in (25) can be rewritten as

$$\begin{aligned} \hat{L}(k)=\beta _c^T(k)\phi _c(k), \end{aligned}$$
(26)

where

$$\begin{aligned} \beta _c(k)=[\beta _{\mathrm {L1}}(k)\quad \beta _{\mathrm {L2}}(k)\quad \cdots \quad \beta _{\mathrm {Z3}}(k)]^T, \end{aligned}$$
(27)

and

$$\begin{aligned} \phi _c(k)=[\phi _1(k)\quad \phi _2(k)\quad \cdots \quad \phi _9(k)]^T. \end{aligned}$$
(28)
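A minimal sketch of the critic computation (26)–(28) follows; the nine rule strengths \(\phi _1,\ldots ,\phi _9\) are products of the memberships of \(e^2(k)\) and \(u^2(k)\). The Gaussian memberships, centers and width are illustrative assumptions.

```python
import numpy as np

# Illustrative centers for the linguistic values L, S, Z of e^2(k) and u^2(k)
# over an assumed range [0, 10]; Gaussian shape and width are assumptions.
c_e = np.array([10.0, 5.0, 0.0])    # L, S, Z for e^2(k)
c_u = np.array([10.0, 5.0, 0.0])    # L, S, Z for u^2(k)
w_c = 4.0

def phi_c(e2, u2):
    """Rule vector [phi_1(k), ..., phi_9(k)] of (28), ordered as the rule
    table: phi_1 = mu_L(e^2) mu_L(u^2), phi_2 = mu_L(e^2) mu_S(u^2), ..."""
    mu_e = np.exp(-((e2 - c_e) / w_c) ** 2)
    mu_u = np.exp(-((u2 - c_u) / w_c) ** 2)
    return np.outer(mu_e, mu_u).ravel()

def critic(e2, u2, beta_c):
    """Estimated cost L_hat(k) = beta_c^T(k) phi_c(k) as in (26)."""
    return float(beta_c @ phi_c(e2, u2))
```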

The network architecture of MiFRENc is depicted in Fig. 5. Regarding the universal function approximation property of MiFREN, there exists \(\beta _c^{*}\) such that

$$\begin{aligned} L(k)=\beta _c^{*T}\phi _c(k)+\varepsilon _c(k), \end{aligned}$$
(29)

where \(\varepsilon _c(k)\) is the approximation error of MiFRENc. By adding and subtracting \(\beta _c^{*T}\phi _c(k)\) on the right-hand side of (26), we obtain

$$\begin{aligned} \hat{L}(k)=\tilde{\beta }_c^T(k)\phi _c(k)+\beta _c^{*T}\phi _c(k), \end{aligned}$$
(30)

where \(\tilde{\beta }_c(k)=\beta _c(k)-\beta _c^{*}\). Let us define \(\Lambda _c(k)=\tilde{\beta }_c^T(k)\phi _c(k)\); thus, the estimated cost function (30) can be rewritten as

$$\begin{aligned} \hat{L}(k)=\Lambda _c(k)+\beta _c^{*T}\phi _c(k). \end{aligned}$$
(31)

It is clear that the accuracy of the estimated cost function depends on the learning algorithm of the weight parameters \(\beta\). The proposed learning algorithms are developed in the next section to tune all adjustable parameters inside FRENa and MiFRENc, with convergence analysis.

Fig. 5: MiFRENc network architecture

4 Learning algorithms and performance analysis

The learning algorithms are developed for both FRENa and MiFRENc. To reduce the computational complexity from the practical systems point of view, only the parameters \(\beta (k)\) are tuned by the proposed learning laws in this work. The performance analysis of the tracking error and internal signals is established by the Lyapunov direct method.

4.1 Learning algorithm for FRENa

In this subsection, the learning algorithm is developed for the adjustable parameters of FRENa. To avoid the causality problem of \(e(k+1)\) in (16), the error function of FRENa is given in terms of \(\Lambda _a(k)\) and the estimated cost function \(\hat{L}(k)\) as

$$\begin{aligned} e_a(k)=\sqrt{g(k)}\Lambda _a(k)+\frac{{1}}{\sqrt{g(k)}}\hat{L}(k). \end{aligned}$$
(32)

The cost function of FRENa is given as

$$\begin{aligned} E_a(k)=\frac{{1}}{2}e^2_a(k). \end{aligned}$$
(33)

Based on gradient descent, the tuning law for \(\beta _a\) is established as

$$\begin{aligned} \beta _a(k+1)=\beta _a(k)-\eta _a\frac{{\partial E_a(k)}}{\partial \beta _a(k)}, \end{aligned}$$
(34)

where \(\eta _a\) denotes the selected learning rate, which will be specified later in the main theorem. By using the chain rule, the partial derivative term can be determined as

$$\begin{aligned} \frac{{\partial E_a(k)}}{\partial \beta _a(k)}= & {} \frac{{\partial E_a(k)}}{\partial e_a(k)}\frac{{\partial e_a(k)}}{\partial \Lambda _a(k)}\frac{{\partial \Lambda _a(k)}}{\partial \beta _a(k)},\nonumber \\= & {}\, e_a(k)\sqrt{g(k)}\phi _a(k). \end{aligned}$$
(35)

Substituting (35) into (34) and using \(e_a(k)\) in (32), we obtain

$$\begin{aligned} \beta _a(k+1)= &\, {} \beta _a(k)-\eta _a[\sqrt{g(k)}\Lambda _a(k) \nonumber \\&+\,\frac{{1}}{\sqrt{g(k)}}\hat{L}(k)]\sqrt{g(k)}\phi _a(k), \nonumber \\= &\, {} \beta _a(k)-\eta _a[g(k)\Lambda _a(k)+\hat{L}(k)]\phi _a(k). \end{aligned}$$
(36)

Let us recall the error dynamic (16) and neglect the disturbance, i.e., \(d_a(k)=0\); thus, we obtain

$$\begin{aligned} g(k)\Lambda _a(k)= e(k+1). \end{aligned}$$
(37)

Substituting (37) into (36), the learning law of \(\beta _a\) can be rewritten as

$$\begin{aligned} \beta _a(k+1)=\beta _a(k)-\eta _a[e(k+1)+\hat{L}(k)]\phi _a(k). \end{aligned}$$
(38)

The unknown nonlinear function g(k) completely disappears from the learning law (38), which makes this algorithm suitable for the online learning phase of FRENa when the plant’s dynamic equations are unknown.
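As a sketch, one step of the learning law (38) reads as follows; note that it uses only the measured error \(e(k+1)\) and the critic output \(\hat{L}(k)\), never g(k).

```python
def update_beta_a(beta_a, e_next, L_hat, phi_a_k, eta_a):
    """One step of the FRENa learning law (38):
    beta_a(k+1) = beta_a(k) - eta_a [e(k+1) + L_hat(k)] phi_a(k).
    Only measured signals are used; g(k) does not appear."""
    return beta_a - eta_a * (e_next + L_hat) * phi_a_k
```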

4.2 Learning algorithm for MiFRENc

The learning algorithm to tune the parameters inside MiFRENc is developed in this subsection. Let us define the error function of MiFRENc as

$$\begin{aligned} e_c(k)=\delta \hat{L}(k)-\hat{L}(k-1)+l(k), \end{aligned}$$
(39)

where \(\delta\) is a positive constant which will be discussed later in the performance analysis. The cost function to be minimized for tuning \(\beta _c\) is given as

$$\begin{aligned} E_c(k)=\frac{{1}}{2}e^2_c(k). \end{aligned}$$
(40)

The learning dynamic of \(\beta _c\) is obtained as

$$\begin{aligned} \beta _c(k+1)=\beta _c(k)-\eta _c\frac{{\partial E_c(k)}}{\partial \beta _c(k)}, \end{aligned}$$
(41)

where \(\eta _c\) denotes the selected learning rate. By using the chain rule with \(E_c(k)\) in (40), \(e_c(k)\) in (39) and \(\hat{L}(k)\) in (31), the partial derivative term can be obtained as

$$\begin{aligned} \frac{{\partial E_c(k)}}{\partial \beta _c(k)}= &\, {} \frac{{\partial E_c(k)}}{\partial e_c(k)}\frac{{\partial e_c(k)}}{\partial \hat{L}(k)}\frac{{\partial \hat{L}(k)}}{\partial \beta _c(k)},\nonumber \\= &\, {} e_c(k)\delta \phi _c(k). \end{aligned}$$
(42)

The learning dynamic (41) can be obtained as

$$\begin{aligned} \beta _c(k+1)=\beta _c(k)-\eta _ce_c(k)\delta \phi _c(k). \end{aligned}$$
(43)

Recalling \(e_c(k)\) in (39) with (43), the learning algorithm for MiFRENc can be rewritten as

$$\begin{aligned} \beta _c(k+1)=\beta _c(k)-\eta _c\delta [l(k)-\hat{L}(k-1)+\delta \hat{L}(k)]\phi _c(k). \end{aligned}$$
(44)

This is a practical tuning law which is used to adjust the parameter \(\beta _c\) in the online learning phase.
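A corresponding sketch of (39) and (44) is given below; it needs only the local cost l(k) and the two latest critic outputs.

```python
def update_beta_c(beta_c, l_k, L_hat_prev, L_hat, phi_c_k, eta_c, delta):
    """One step of the MiFRENc learning law (44), with the error function
    e_c(k) of (39) computed from the local cost and the two latest
    critic outputs."""
    e_c = delta * L_hat - L_hat_prev + l_k
    return beta_c - eta_c * delta * e_c * phi_c_k
```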

4.3 Performance analysis

The main theorem is proposed to establish the settings of the controller’s parameters and learning rates that ensure the closed-loop performance, in which the tracking error and internal signals are bounded within defined compact sets.

Theorem 4.1

Consider the nonlinear discrete-time system described by (1) and let Assumptions 1 and 2 hold. Assume that the bounds \(d_M\), \(g_M\), \(\varepsilon _{cM}\), \(\beta _{aM}\) and \(L_M\) exist. Under the control law in (7) and the learning algorithms in (38) and (44), the functions \(\Lambda _a(k)\) and \(\Lambda _c(k)\) and the tracking error e(k) are guaranteed to be bounded when the design parameters are appropriately chosen as follows:

$$\begin{aligned}&\frac{{1}}{2}< \delta \le 1, \end{aligned}$$
(45)
$$\begin{aligned}&0< \eta _a \le \frac{{g_m}}{N_a^2g_M^2}, \end{aligned}$$
(46)

and

$$\begin{aligned} 0< \eta _c \le \frac{{1}}{\delta ^2N_c^2}, \end{aligned}$$
(47)

where \(N_a\) and \(N_c\) are the numbers of IF–THEN rules of FRENa and MiFRENc, respectively. The bounds of e(k), \(\Lambda _a(k)\) and \(\Lambda _c(k)\) are obtained as \(\Omega _e\), \(\Omega _a\) and \(\Omega _c\), where

$$\begin{aligned}&\Omega _e \,\doteq\, \sqrt{\frac{{\Xi _M}}{\frac{{\rho _1}}{3}-\frac{{\rho _3}}{4}p}}, \end{aligned}$$
(48)
$$\begin{aligned}&\Omega _a \,\doteq\, \sqrt{\frac{{\Xi _M}}{\rho _2g_m-\rho _1g^2_M-\frac{{\rho _3}}{8}q}}, \end{aligned}$$
(49)

and

$$\begin{aligned} \Omega _c \,\doteq\, \sqrt{\frac{{\Xi _M}}{\rho _3\delta ^2-\rho _4}}, \end{aligned}$$
(50)

where

$$\begin{aligned} \Xi _M\, \doteq \,\rho _1d^2_M+\rho _3\varepsilon ^2_{cM}+\frac{{\rho _3}}{8}\beta ^2_{aM} +\Big [\frac{{\rho _3}}{8}(\delta -1)^2+ \frac{{\rho _2}}{g_m}\Big ]L^2_M. \end{aligned}$$
(51)

The constants \(\rho _1\), \(\rho _2\), \(\ldots\), \(\rho _4\) are given as

$$\begin{aligned}&\rho _1>\frac{{3}}{4}p\rho _3, \end{aligned}$$
(52)
$$\begin{aligned}&\rho _2>\frac{\rho _1g^2_M+\frac{\rho _3}{8}q}{g_m}, \end{aligned}$$
(53)
$$\begin{aligned}&\rho _3>\frac{{\rho _4}}{\delta ^2}, \end{aligned}$$
(54)

and

$$\begin{aligned} \rho _4>\frac{{\rho _3}}{4}. \end{aligned}$$
(55)
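As a numerical illustration (one admissible choice, not unique), the following constants satisfy (52)–(55) for the assumed values \(p=q=1\), \(\delta =0.75\), \(g_m=1\) and \(g_M=2\), and also keep the denominators of (48)–(50) positive.

```python
# One admissible (not unique) choice of constants; all values assumed
# purely for illustration.
p, q, delta = 1.0, 1.0, 0.75
g_m, g_M = 1.0, 2.0
rho_1, rho_2, rho_3, rho_4 = 1.0, 5.0, 1.0, 0.3

assert rho_1 > 0.75 * p * rho_3                          # condition (52)
assert rho_2 > (rho_1 * g_M**2 + rho_3 * q / 8) / g_m    # condition (53)
assert rho_3 > rho_4 / delta**2                          # condition (54)
assert rho_4 > rho_3 / 4                                 # condition (55)
# The denominators of the bounds (48)-(50) are then positive:
assert rho_1 / 3 - rho_3 * p / 4 > 0
assert rho_2 * g_m - rho_1 * g_M**2 - rho_3 * q / 8 > 0
assert rho_3 * delta**2 - rho_4 > 0
```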

Remark

In this work, the numbers of IF–THEN rules are given as 7 for FRENa and 9 for MiFRENc. The number of IF–THEN rules is chosen as a trade-off against the computational complexity; the corresponding simulation and experimental results will be discussed in the next section.

Proof

By using the Lyapunov direct method, in this work, the candidate function is given as

$$\begin{aligned} V(k)=\rho _1e^2(k)+\frac{{\rho _2}}{\eta _a}\tilde{\beta }^T_a(k)\tilde{\beta }_a(k) +\frac{{\rho _3}}{\eta _c}\tilde{\beta }^T_c(k)\tilde{\beta }_c(k) +\rho _4\Lambda _c^2(k-1), \end{aligned}$$
(56)

or

$$\begin{aligned} V(k)=V_1(k)+V_2(k)+V_3(k)+V_4(k), \end{aligned}$$
(57)

where

$$\begin{aligned}&V_1=\rho _1e^2(k), \end{aligned}$$
(58)
$$\begin{aligned}&V_2=\frac{{\rho _2}}{\eta _a}\tilde{\beta }^T_a(k)\tilde{\beta }_a(k), \end{aligned}$$
(59)
$$\begin{aligned}&V_3=\frac{{\rho _3}}{\eta _c}\tilde{\beta }^T_c(k)\tilde{\beta }_c(k), \end{aligned}$$
(60)

and

$$\begin{aligned} V_4=\rho _4\Lambda _c^2(k-1). \end{aligned}$$
(61)

According to the error dynamic in (16), the change of the Lyapunov candidate function \(V_1(k)\) can be obtained by

$$\begin{aligned} \Delta V_1(k)= &\, {} \rho _1\big [e^2(k+1)-e^2(k)\big ], \nonumber \\= &\, {} \rho _1\big [[g(k)\Lambda _a(k)+d_a(k)]^2-e^2(k)\big ], \nonumber \\\le & {}\, \rho _1\big [2g^2(k)\Lambda ^2_a(k)+2d^2_a(k)-e^2(k)\big ]. \end{aligned}$$
(62)

Applying Assumption 2 and the upper bound \(d_M\) of the lumped disturbance and estimation error, i.e., \(|d_a(k)|\le d_M\) for all \(k=1,2, \ldots\), the relation in (62) can be rewritten as

$$\begin{aligned} \Delta V_1(k)\le -\rho _1e^2(k)+2\rho _1g^2_M\Lambda ^2_a(k)+2\rho _1d^2_M. \end{aligned}$$
(63)

By using the tuning law in (36), the change of \(V_2(k)\) can be expressed as

$$\begin{aligned} \Delta V_2(k)= & {} \frac{\rho _2}{\eta _a} \Big [\tilde{\beta }^T_a(k+1)\tilde{\beta }_a(k+1)-\tilde{\beta }^T_a(k)\tilde{\beta }_a(k)\Big ],\nonumber \\= & {} \frac{\rho _2}{\eta _a} \Big [\big [\tilde{\beta }_a(k)-\eta _a [g(k)\Lambda _a(k)+\hat{L}(k)]\phi _a(k)\big ]^T\big [\tilde{\beta }_a(k)\nonumber \\&-\,\eta _a [g(k)\Lambda _a(k)+\hat{L}(k)]\phi _a(k)\big ]-\tilde{\beta }^T_a(k)\tilde{\beta }_a(k)\Big ],\nonumber \\= & {} -2\rho _2[g(k)\Lambda _a(k)+\hat{L}(k)]\tilde{\beta }^T_a(k)\phi _a(k)\nonumber \\&+\,\rho _2\eta _a[g(k)\Lambda _a(k)+\hat{L}(k)]^2\phi ^T_a(k)\phi _a(k),\nonumber \\= & {} -2\rho _2\Lambda _a(k)g(k)\Lambda _a(k)-2\rho _2\Lambda _a(k)\hat{L}(k)\nonumber \\&+\,\rho _2\eta _a||\phi _a(k)||^2[g(k)\Lambda _a(k)+\hat{L}(k)]^2. \end{aligned}$$
(64)

With the lower bound and upper bound of g(k) in (3), the change of \(V_2(k)\) (64) can be rewritten as

$$\begin{aligned} \Delta V_2(k)\le & {} -2\rho _2g_m\Lambda ^2_a(k)-2\rho _2\Lambda _a(k)\hat{L}(k) \nonumber \\&+\,\rho _2\eta _a||\phi _a(k)||^2g^2_M\Lambda ^2_a(k)\nonumber \\&+\,\rho _2\eta _a||\phi _a(k)||^2[\hat{L}^2(k)+2g(k)\Lambda _a(k)\hat{L}(k)],\nonumber \\= &\, {} \rho _2\Big [-g_m\Lambda ^2_a(k)-(g_m-\eta _a||\phi _a(k)||^2g^2_M)\Lambda ^2_a(k) \nonumber \\&-\,2\Lambda _a(k)[1-\eta _a||\phi _a(k)||^2g(k)]\hat{L}(k) \nonumber \\&+\,\eta _a||\phi _a(k)||^2\hat{L}^2(k) \Big ],\nonumber \\= &\, {} \rho _2\Big [-g_m\Lambda ^2_a(k)-(g_m-\eta _a||\phi _a(k)||^2g^2_M)\big [\Lambda ^2_a(k)\nonumber \\&+\,\frac{{2\Lambda _a(k)[1-\eta _a||\phi _a(k)||^2g(k)]\hat{L}(k)}}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\big ]\nonumber \\&+\,\eta _a||\phi _a(k)||^2\hat{L}^2(k) \Big ],\nonumber \\= &\, {} \rho _2\Big [-g_m\Lambda ^2_a(k)-(g_m-\eta _a||\phi _a(k)||^2g^2_M)\nonumber \\&\times \Big |\Big |\Lambda _a(k)+\frac{{[1-\eta _a||\phi _a(k)||^2g(k)]\hat{L}(k)}}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\Big |\Big |^2 \nonumber \\&+\, \frac{{[1-\eta _a||\phi _a(k)||^2g(k)]^2\hat{L}^2(k)}}{g_m-\eta _a||\phi _a(k)||^2g^2_M} \nonumber \\&+\,\eta _a||\phi _a(k)||^2\hat{L}^2(k) \Big ],\nonumber \\= &\, {} -\rho _2g_m\Lambda ^2_a(k)-\rho _2(g_m-\eta _a||\phi _a(k)||^2g^2_M)\nonumber \\&\times \Big |\Big |\Lambda _a(k)+\frac{{[1-\eta _a||\phi _a(k)||^2g(k)]\hat{L}(k)}}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\Big |\Big |^2 \nonumber \\&+\, \rho _2\frac{{1-\eta _a||\phi _a(k)||^2g_m}}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\hat{L}^2(k). \end{aligned}$$
(65)

It can be simplified as

$$\begin{aligned} \Delta V_2(k)\le & {} -\rho _2g_m\Lambda ^2_a(k)+ \frac{\rho _2}{g_m}\hat{L}^2(k)-\rho _2(g_m\nonumber \\&-\,\eta _a||\phi _a(k)||^2g^2_M)\Big |\Big |\Lambda _a(k) \nonumber \\&+\,\frac{[1-\eta _a||\phi _a(k)||^2g(k)]\hat{L}(k)}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\Big |\Big |^2. \end{aligned}$$
(66)

Referring to the learning law of \(\beta _c\) in (43), the change of \(V_3(k)\) can be expressed as

$$\begin{aligned} \Delta V_3(k)= &\, {} \frac{\rho _3}{\eta _c}\Big [\tilde{\beta }^T_c(k+1)\tilde{\beta }_c(k+1)-\tilde{\beta }^T_c(k)\tilde{\beta }_c(k)\Big ],\nonumber \\= &\, {} \frac{\rho _3}{\eta _c}\Big [[\tilde{\beta }_c(k)-\eta _c\delta e_c(k)\phi _c(k)]^T[\tilde{\beta }_c(k) \nonumber \\&-\,\eta _c\delta e_c(k)\phi _c(k)]-\tilde{\beta }^T_c(k)\tilde{\beta }_c(k)\Big ],\nonumber \\= & {} \frac{\rho _3}{\eta _c}\Big [-2\eta _c\delta e_c(k)\tilde{\beta }^T_c(k)\phi _c(k)\nonumber \\&+\,\eta ^2_c\delta ^2e^2_c(k)||\phi _c(k)||^2\Big ],\nonumber \\= & {} -2\rho _3\delta \Lambda _c(k)e_c(k)\nonumber \\&+\,\rho _3\eta _c\delta ^2||\phi _c(k)||^2e^2_c(k). \end{aligned}$$
(67)

By adding and subtracting \(\delta L(k)\) and \(L(k-1)\) on the right-hand side of the error function (39) for MiFRENc, we obtain

$$\begin{aligned} e_c(k)= &\, {} \delta [\hat{L}(k)-L(k)]+\delta L(k)-[\hat{L}(k-1)\nonumber \\&-\,L(k-1)]-L(k-1)+l(k), \nonumber \\= &\, {} \delta [\beta _c^T(k)\phi _c(k)-\beta _c^{*T}\phi _c(k)-\varepsilon _c(k)]+\delta L(k)\nonumber \\&-\,L(k-1)+l(k)-[\beta _c^T(k-1)\phi _c(k-1)\nonumber \\&-\,\beta _c^{*T}\phi _c(k-1)-\varepsilon _c(k-1)], \nonumber \\= &\, {} \delta [\beta _c^T(k)-\beta _c^{*T}]\phi _c(k) -[\beta _c^T(k-1)\nonumber \\&-\,\beta _c^{*T}]\phi _c(k-1)+\delta L(k)-L(k-1)+l(k) \nonumber \\&-\,\delta \varepsilon _c(k)+\varepsilon _c(k-1), \nonumber \\= &\, {} \delta \tilde{\beta }_c^T(k)\phi _c(k) -\tilde{\beta }_c^T(k-1)\phi _c(k-1)+\delta L(k)\nonumber \\&-\,L(k-1)+l(k)-\delta \varepsilon _c(k)+\varepsilon _c(k-1). \end{aligned}$$
(68)

Regarding the definition of \(\Lambda _c(k)\), the relation in (68) can be rewritten as

$$\begin{aligned} e_c(k)=\, & {} \delta \Lambda _c(k)-\Lambda _c(k-1)+\delta L(k)-L(k-1)\nonumber \\&+\,l(k)-\delta \varepsilon _c(k)+\varepsilon _c(k-1). \end{aligned}$$
(69)

Rearranging (69), we obtain

$$\begin{aligned} \delta \Lambda _c(k)=\, & {} e_c(k)-\delta L(k)+\Lambda _c(k-1)+L(k-1)\nonumber \\&-\,l(k)+\delta \varepsilon _c(k)-\varepsilon _c(k-1). \end{aligned}$$
(70)

Substituting (70) into (67), we have

$$\begin{aligned} \Delta V_3(k)= & {} -2\rho _3e_c(k)\Big [e_c(k)-\delta L(k)+\Lambda _c(k-1)\nonumber \\&+\,L(k-1)-l(k)+\delta \varepsilon _c(k)-\varepsilon _c(k-1)\Big ] \nonumber \\&+\,\rho _3\eta _c\delta ^2||\phi _c(k)||^2e^2_c(k),\nonumber \\= & {} -\rho _3\Big [1-\eta _c\delta ^2||\phi _c(k)||^2\Big ]e^2_c(k)-\rho _3e^2_c(k)\nonumber \\&+\,2\rho _3e_c(k)\Big [\delta L(k)-\Lambda _c(k-1)-L(k-1)\nonumber \\&+\,l(k)-\delta \varepsilon _c(k)+\varepsilon _c(k-1)\Big ], \nonumber \\= & {} -\rho _3\Big [1-\eta _c\delta ^2||\phi _c(k)||^2\Big ]e^2_c(k)\nonumber \\&-\,\rho _3\delta ^2\Lambda ^2_c(k)+\rho _3\Big [\delta L(k)-\Lambda _c(k-1)\nonumber \\&-\,L(k-1)+l(k)-\delta \varepsilon _c(k)+\varepsilon _c(k-1)\Big ]^2,\nonumber \\\le & {} -\rho _3\Big [1-\eta _c\delta ^2||\phi _c(k)||^2\Big ]e^2_c(k) \nonumber \\&-\,\rho _3\delta ^2\Lambda ^2_c(k)+\frac{\rho _3}{4}\Lambda ^2_c(k-1) \nonumber \\&+\,\frac{\rho _3}{4}l^2(k)+\frac{\rho _3}{4}[\delta L(k)-L(k-1)]^2 \nonumber \\&+\,\frac{\rho _3}{4}\Big [\delta \varepsilon _c(k)-\varepsilon _c(k-1)\Big ]^2. \end{aligned}$$
(71)

Let us set the design parameter \(\delta\) such that \(0<\delta \le 1\) and recall the local cost function l(k) in (19); thus, the relation in (71) becomes

$$\begin{aligned} \Delta V_3(k)\le & {} -\rho _3\Big [1-\eta _c\delta ^2||\phi _c(k)||^2\Big ]e^2_c(k)-\rho _3\delta ^2\Lambda ^2_c(k)\nonumber \\&+\,\frac{\rho _3}{4}\Lambda ^2_c(k-1)+\frac{\rho _3}{4}pe^2(k) \nonumber \\&+\,\frac{\rho _3}{8}q\Lambda ^2_a(k)+\frac{\rho _3}{8}||\beta ^T_a(k)\phi _a(k)||^2 \nonumber \\&+\,\frac{\rho _3}{4}[\delta L(k)-L(k-1)]^2+\rho _3\varepsilon ^2_{cM}, \end{aligned}$$
(72)

where \(|\varepsilon _c(k)| \le \varepsilon _{cM}\). For \(V_4(k)\), the first difference can be obtained as

$$\begin{aligned} \Delta V_4(k)=\rho _4\Big [\Lambda ^2_c(k)-\Lambda ^2_c(k-1)\Big ]. \end{aligned}$$
(73)

Finally, the change of the Lyapunov function V(k) is obtained as

$$\begin{aligned} \Delta V(k)\le & {} -\frac{\rho _1}{3}e^2(k)+\rho _1g^2_M\Lambda ^2_a(k)+\rho _1d^2_M \nonumber \\&-\,\rho _2g_m\Lambda ^2_a(k)-\rho _2(g_m-\eta _a||\phi _a(k)||^2g^2_M)\nonumber \\&\times \,\Big |\Big |\Lambda _a(k)+\frac{[1-\eta _a||\phi _a(k)||^2g(k)]L(k)}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\Big |\Big |^2 \nonumber \\&+\, \frac{\rho _2}{g_m}L^2(k)-\rho _3\Big [1-\eta _c\delta ^2||\phi _c(k)||^2\Big ]e^2_c(k)\nonumber \\&-\,\rho _3\delta ^2\Lambda ^2_c(k)+\frac{\rho _3}{4}\Lambda ^2_c(k-1)+\frac{\rho _3}{4}pe^2(k) \nonumber \\&+\,\frac{\rho _3}{8}q\Lambda ^2_a(k)+\frac{\rho _3}{8}||\beta ^T_a\phi _a(k)||^2 \nonumber \\&+\,\frac{\rho _3}{4}[\delta L(k)-L(k-1)]^2+\rho _3\varepsilon ^2_{cM}\nonumber \\&+\,\rho _4\Big [\Lambda ^2_c(k)-\Lambda ^2_c(k-1)\Big ],\nonumber \\\le & {} -\Big [\frac{\rho _1}{3}-\frac{\rho _3}{4}p\Big ]e^2(k) \nonumber \\&-\,\Big [\rho _2g_m-\rho _1g^2_M-\frac{\rho _3}{8}q\Big ]\Lambda ^2_a(k)\nonumber \\&-\,\Big [\rho _3\delta ^2-\rho _4 \Big ]\Lambda ^2_c(k)-\Big [\rho _4-\frac{\rho _3}{4} \Big ]\Lambda ^2_c(k-1)\nonumber \\&-\,\rho _2[g_m-\eta _a||\phi _a(k)||^2g^2_M]\Big |\Big |\Lambda _a(k)\nonumber \\&+\,\frac{[1-\eta _a||\phi _a(k)||^2g(k)]L(k)}{g_m-\eta _a||\phi _a(k)||^2g^2_M}\Big |\Big |^2\nonumber \\&-\,\rho _3\Big [1-\eta _c\delta ^2||\phi _c(k)||^2\Big ]e^2_c(k)+\Xi _M. \end{aligned}$$
(74)

The membership functions of FRENa and MiFRENc are given by (9) and (28), respectively. Since every membership grade lies in (0, 1], \(\phi _a(k)\) and \(\phi _c(k)\) satisfy

$$\begin{aligned} 0< ||\phi _a(k)||^2 \le N_a, \end{aligned}$$
(75)

and

$$\begin{aligned} 0< ||\phi _c(k)||^2 \le N_c. \end{aligned}$$
(76)

According to the design parameters given by (45)–(47), the constants \(\rho _{1},\ldots ,\rho _{4}\) satisfying the conditions in (52)–(55) and the relations in (75) and (76), the change of the Lyapunov function is negative semi-definite, i.e., \(\Delta V(k)\le 0\), whenever

$$\begin{aligned}&|e(k)| \ge \sqrt{\frac{\Xi _M}{\frac{\rho _1}{3}-\frac{\rho _3}{4}p}} \doteq \Omega _e, \end{aligned}$$
(77)
$$\begin{aligned}&|\Lambda _a(k)| \ge \sqrt{\frac{\Xi _M}{\rho _2g_m-\rho _1g^2_M-\frac{\rho _3}{8}q}} \doteq \Omega _a, \end{aligned}$$
(78)

and

$$\begin{aligned} |\Lambda _c(k)| \ge \sqrt{\frac{\Xi _M}{\rho _3\delta ^2-\rho _4}} \doteq \Omega _c. \end{aligned}$$
(79)

Thus, the compact sets in (48)–(50) are established by (77)–(79), respectively. This completes the proof by the Lyapunov direct method. \(\square\)

The proposed control scheme is validated in the next section, first on a computer simulation with a non-affine discrete-time system and then on a hardware implementation with the DC-motor current control plant.

5 Validation results

5.1 Simulation results

The following non-affine discrete-time system with output feedback is used for the simulation:

$$\begin{aligned} y(k+1)=\sin (y_k)+[5+\cos (y_ku_k)]u_k. \end{aligned}$$
(80)

The desired trajectory is given as

$$\begin{aligned} r(k+1)=A_r\sin \left(\omega _r\pi \frac{k}{k_M}\right), \end{aligned}$$
(81)

where \(k_M=4000\) is the maximum time index, \(A_r=1.0,\, \omega _r=16\) when \(0<k\le \frac{k_M}{2}\) and \(A_r=2.0,\, \omega _r=8\) when \(\frac{k_M}{2}<k\le k_M\). The design parameter \(\delta\) is selected as \(\delta =0.75\) to satisfy (45). The learning rate of MiFRENc is designed by (47) as

$$\begin{aligned} 0< \eta _c \le \frac{1}{\delta ^2N_c^2}=\frac{1}{0.75^2\times 9^2}=0.0219. \end{aligned}$$
(82)

Thus, we select the learning rate for MiFRENc as \(\eta _c=0.02\). To design the learning rate of FRENa, let us choose the boundaries \(g_m\) and \(g_M\) as 1 and 2, respectively. According to (46), the learning rate of FRENa is designed as

$$\begin{aligned} 0< \eta _a \le \frac{g_m}{N_a^2g_M^2} = \frac{1}{7^2\times 2^2}\approx 0.0051. \end{aligned}$$
(83)

Thus, the learning rate for FRENa is given as \(\eta _a=0.0025\). The membership settings of FRENa and MiFRENc are depicted in Figs. 6 and 7, respectively. The membership functions can be designed from the expected ranges of e(k), \(e^2(k)\) and \(u^2(k)\). In this application, the ranges are given as \([-\,5, 5]\), [0, 10] and [0, 10] for e(k), \(e^2(k)\) and \(u^2(k)\), respectively. The initial setting of the adjustable parameters \(\beta _{\Box }(1)\) for FRENa and MiFRENc is given in Table 1.
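Under these settings, the whole closed loop can be sketched as below, reusing the illustrative helper functions from Sects. 3 and 4 (phi_a, control, phi_c, critic, update_beta_a, update_beta_c). The cost weights \(p=q=1\) and the zero initial weights are assumptions; the actual initial values are those of Table 1.

```python
import numpy as np

k_M = 4000
p, q, delta = 1.0, 1.0, 0.75                 # p, q assumed
eta_a, eta_c = 0.0025, 0.02                  # rates chosen by (82), (83)
beta_a, beta_c = np.zeros(7), np.zeros(9)    # illustrative initial weights

# Desired trajectory (81) with the piecewise amplitude and frequency above.
r = np.array([(1.0 if k <= k_M / 2 else 2.0)
              * np.sin((16.0 if k <= k_M / 2 else 8.0) * np.pi * k / k_M)
              for k in range(k_M + 1)])

y = np.zeros(k_M + 1)
L_hat_prev = 0.0
for k in range(k_M):
    e = y[k] - r[k]                                        # error (5)
    u = control(e, beta_a)                                 # action (7)
    y[k + 1] = np.sin(y[k]) + (5 + np.cos(y[k] * u)) * u   # plant (80)
    l_k = p * e**2 + q * u**2                              # local cost (19)
    L_hat = critic(e**2, u**2, beta_c)                     # critic (26)
    beta_c = update_beta_c(beta_c, l_k, L_hat_prev, L_hat,
                           phi_c(e**2, u**2), eta_c, delta)    # law (44)
    beta_a = update_beta_a(beta_a, y[k + 1] - r[k + 1], L_hat,
                           phi_a(e), eta_a)                # law (38)
    L_hat_prev = L_hat
```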

Table 1: Initial setting \(\beta _{\Box }(1)\): simulation case

Fig. 6: FRENa membership functions: simulation case

Fig. 7: MiFRENc membership functions: simulation case

Fig. 8: Tracking performance y(k) and e(k): simulation case

Fig. 9: Control effort u(k): simulation case

Fig. 10: Estimated cost function \(\hat{L}(k)\): simulation case

The tracking performance is presented in Fig. 8 for both the output y(k) and the tracking error e(k). The maximum absolute tracking error is \(|e(k)|_{\mathrm {max}}=2.4022\), and the average absolute tracking error at steady state (\(k=3000\)–4000) is 0.0074. Figure 9 displays the control effort u(k), and Fig. 10 illustrates the estimated cost function \(\hat{L}(k)\).

5.2 Experimental results

The DC-motor current control system is constructed to validate the performance of the control scheme. The desired trajectory is given as

$$\begin{aligned} r(k+1)=I_r\sin \left(\omega _r\pi \frac{k}{k_M}\right), \end{aligned}$$
(84)

where \(k_M=2000\) is the maximum time index, \(I_r=15\) [mA], \(\omega _r=8\) when \(0<k\le \frac{k_M}{2}\) and \(I_r=30\) [mA], \(\omega _r=4\) when \(\frac{k_M}{2}<k\le k_M\). The design parameter \(\delta\) is selected as \(\delta =0.75\) to satisfy (45). The learning rate of MiFRENc is designed by (47) as

$$\begin{aligned} 0< \eta _c \le \frac{1}{\delta ^2N_c^2}=\frac{1}{0.75^2\times 9^2}=0.0219. \end{aligned}$$
(85)

Thus, we select the learning rate for MiFRENc as \(\eta _c=0.02\).

Fig. 11: FRENa membership functions: experimental system case

Fig. 12: MiFRENc membership functions: experimental system case

Remark

The learning rate \(\eta _c\) is selected the same as in the simulation case because it is related only to the network architecture of MiFRENc, which is unchanged from the previous case.

Regarding the result in (4), let us choose the boundaries \(g_m\) and \(g_M\) as 10 and 20, respectively. According to (46), the learning rate of FRENa is designed as

$$\begin{aligned} 0< \eta _a \le \frac{g_m}{N_a^2g_M^2} = \frac{10}{7^2\times 20^2}=0.00051. \end{aligned}$$
(86)

Thus, we select the learning rate for FRENa as \(\eta _a=0.00025\), around half of the bound computed in (86).

Remark

In this experimental case, the constants \(g_m\) and \(g_M\) are selected 10 times larger than in the simulation case because the ratio between the output range (\(y(k): \pm \,50\) [mA]) and the input range (\(u(k):\pm \,5\) [V]) is around 10 (dimensionless).

The membership settings of FRENa and MiFRENc for this experimental system are illustrated in Figs. 11 and 12, respectively, where the ranges are given as \([-\,50, 50]\) mA, [0, 10] mA\(^2\) and [0, 10] V\(^2\) for e(k), \(e^2(k)\) and \(u^2(k)\), respectively. The initial setting of the adjustable parameters \(\beta _{\Box }(1)\) for FRENa and MiFRENc is given in Table 2.

Table 2: Initial setting \(\beta _{\Box }(1)\): experimental system case

Fig. 13: Tracking performance y(k) and e(k): experimental system

Fig. 14: Control effort u(k): experimental system

Fig. 15: Estimated cost function \(\hat{L}(k)\): experimental system

Fig. 16: u(k) and e(k): experimental system

The tracking performance is presented in Fig. 13 for both the motor current y(k) and the tracking error e(k). The maximum absolute tracking error is \(|e(k)|_{\mathrm {max}}=78.1642\) [mA], and the average absolute tracking error at steady state (\(k=1500\)–2000) is 0.4817 [mA]. Furthermore, the control effort u(k) and the estimated cost function \(\hat{L}(k)\) are depicted in Figs. 14 and 15, respectively. In Fig. 13, a large variation of the tracking error is observed, caused by the instantaneous back-EMF of the motor. To compensate for this, the controller produces a large variation of the control effort, as depicted in Fig. 14; this phenomenon leads to a second peak of \(\hat{L}(k)\) in Fig. 15. The phase plane between u(k) and e(k) is depicted in Fig. 16 to represent this large-variation behavior more clearly. Moreover, when the desired trajectory r(k) changes, the controller provides a higher amplitude of the armature voltage, as depicted in Fig. 14, which increases the cost function (17). Thus, the second ripple in Fig. 15 is caused by the increased control energy.

Fig. 17: Tracking performance y(k) and e(k): second run

Fig. 18: u(k) and e(k): second run

To demonstrate the advantage of the proposed RL learning algorithm, a second run is tested, where the initial parameters of MiFRENc and FRENa are set to the final parameters obtained in the first run. For the second run, the large variation is compensated, as the results in Fig. 17 show. The maximum absolute tracking error is \(|e(k)|_{\mathrm {max}}=7.391\) [mA], and the average absolute tracking error at steady state (\(k\) = 1500–2000) is 0.2197 [mA]. Furthermore, the plot in Fig. 18 indicates the effectiveness of the proposed controller in compensating the large variation that occurs in this plant.

6 Conclusions

An adaptive controller for a class of nonlinear discrete-time systems has been proposed based on action-critic networks (FRENa and MiFRENc). Practically, the controller requires only the parameter \(g_M\), which is estimated directly from experimental data, while the mathematical model of the controlled plant is completely omitted. Two sets of IF–THEN rules have been created according to human knowledge of the controlled plant and the optimization of the tracking error and control energy for FRENa and MiFRENc, respectively. An online learning algorithm has been developed to tune all adjustable parameters of the two networks in an RL manner. The theoretical analysis has been conducted by the Lyapunov method to guarantee the convergence of the tracking error and internal signals. The computer simulation has demonstrated the effectiveness of the proposed controller and the convergence of the error signal. The experimental system with DC-motor current control has been established on our prototype product, and the controller design has been conducted using only the VI characteristic curve obtained by a standard testing process. The results demonstrate the satisfactory performance of the control scheme, namely superior tracking performance and the compensation of large variations caused by unknown nonlinear terms of the controlled plant.

Unlike other RL controllers, the critic network in this work has been designed directly from a set of IF–THEN rules based on human knowledge of the controlled plant. To emphasize this advantage, research on nonholonomic systems with the proposed scheme is our future investigation theme.