1 Introduction

System identification has been a focus of research in various areas in recent decades, for modeling both linear [1, 2] and nonlinear systems [3,4,5]. The main goal of system identification is to use input/output data to obtain a model whose behavior is as close as possible to that of the actual system, according to a predefined criterion. The resulting mathematical model can then be used for analysis or controller design [6,7,8,9].

Mathematical modeling is possible in several ways: analytical modeling (white-box), experimental modeling (black-box) and hybrid modeling (gray-box). In the white-box case, mathematical models relating the input and the output are formed from the nature of the components of the system and the physical laws governing them. In the black-box approach, there is no information about the internal components of the system, and a mathematical relationship between the input and the output is obtained using only input/output data. In the gray-box approach, the physical components of the system are known, but their values are not, and the unknown parameters are determined using input/output data. Several system identification methods have been proposed for linear and nonlinear systems [10, 11]. Since most industrial processes are complex and nonlinear, nonlinear system identification has attracted a lot of attention in the past two decades. However, nonlinear system identification is much more difficult than linear system identification [12]. In addition, due to the complexity of such systems, it is difficult to obtain physical models, since the exact system model is not always available. Therefore, data-driven methods, which rely on historical information about the system, have recently been developed to identify system behavior. These methods do not require complex mathematical tools and are very useful in practice [13, 14].

Bilinear systems are a class of nonlinear systems that are widely studied and used due to their simplicity, and they can be considered a suitable model for many physical systems [15,16,17]. In recent years, bilinear systems have drawn a significant amount of attention due to their intrinsic simplicity [18] and their wide range of applications, not only in engineering but also in biology, economics and chemistry [19]. Such systems can explain many physical phenomena and have been used in many fields, such as air conditioning control [20], the immune system, heart regulators, and the control of carbon dioxide in the lungs and of blood pressure [21,22,23]. Several methods have been proposed for bilinear system identification, such as least-squares methods based on minimizing the sum of squared errors [24, 25], gradient methods [26, 27], maximum likelihood methods [28], iterative methods [29, 30], recursive methods [31,32,33] and error prediction methods [34]. In addition, the interaction matrix approach has been utilized for identification and observer design of bilinear systems [35, 36]. In [35], optimal bilinear observers are designed for bilinear state-space models, and a new method for the identification of bilinear systems is introduced in [36]; there, by using the interaction matrix formulation, the bilinear system state is expressed in terms of input/output measurements. In iterative methods, to improve the estimation of the parameters, the algorithm uses all the data in every iteration together with the results of the previous iteration; such methods are applied once the input–output dataset has been collected. In recursive algorithms, only the current data are used to improve the previous estimate. In [37], recursive extended least squares and maximum likelihood methods have been used to identify the bilinear system parameters. Gibson et al. [28] have provided a maximum likelihood parameter estimation algorithm for bilinear system identification. Also, a Kalman filtering algorithm for parameter estimation of bilinear systems has been proposed in [38]. In [39], the state-space model of bilinear systems is converted to a transfer function by removing the state variables, and the recursive least squares algorithm and multi-innovation theory are used to increase the parameter estimation accuracy. In a multi-innovation identification algorithm, increasing the innovation length can increase the parameter estimation accuracy and reduce the algorithm's sensitivity to noise.

Li et al. [25] have presented a least-squares iterative algorithm to identify bilinear system parameters using the maximum likelihood principle; the maximum likelihood iterative least-squares algorithm provides a more accurate estimate of bilinear systems than the plain iterative least-squares algorithm. In [40], an iterative algorithm based on the hierarchical principle is proposed to alleviate the computational load; it improves the accuracy of parameter estimation while reducing the computational cost. In [41], to achieve better accuracy, an algorithm using the Kalman filter and multi-innovation theory has been proposed; both of its variants work well, and the Kalman filter-based multi-innovation recursive extended least squares algorithm has a higher parameter estimation accuracy than the Kalman filter-based recursive extended least squares algorithm. Using the hierarchical identification principle [40] and the data filtering method, a gradient iterative algorithm and a filtering-based gradient iterative algorithm have been presented in [26]; these algorithms provide very accurate estimates of bilinear systems. The two-step gradient-based iterative algorithm has a lower computational cost than the gradient-based iterative algorithm, and its convergence is faster than that of the other two methods in that paper; overall, the algorithms in [26] provide better parameter estimation accuracy than the methods presented in [27]. In [42], a state filtering-based hierarchical identification algorithm has been proposed. In [43], a multi-innovation-based stochastic gradient algorithm is presented using the decomposition method; the decomposition-based multi-innovation stochastic gradient algorithm has higher accuracy than the decomposition-based stochastic gradient algorithm. Also, based on the hierarchical principle and data filtering, a least squares iterative algorithm has been proposed for the identification of bilinear systems [44]. Ding et al. [45] have provided a stochastic gradient algorithm and a gradient iterative algorithm to estimate the parameters of bilinear systems using an auxiliary model. The auxiliary model-based gradient iterative algorithm uses all the input–output data measured up to each iteration, which makes it more suitable than the auxiliary model-based stochastic gradient algorithm, and it is effective for bilinear systems in a white noise environment. With the help of the subspace identification method, a data-driven design approach has been presented in [46]. In [47], a stochastic gradient algorithm using the hierarchical identification principle and the multi-innovation idea is developed. In addition, in [48], an extended stochastic gradient algorithm based on the data filtering method has been proposed.

Although the recursive least squares method is the most common estimation method among the various previously published methods and has a high convergence rate, it suffers from drawbacks such as a high computational burden. To overcome this problem, identification methods based on the hierarchical identification principle have been proposed, which divide the main system into multiple subsystems of smaller dimensions in order to estimate the unknown parameters. For example, in [25, 48], where the computational load of the identification is high, the hierarchical identification principle has been utilized to obtain parameter estimation methods with higher computational efficiency. It has been shown that the resulting algorithms have a smaller parameter estimation error than other published algorithms.

Motivated by the above concerns, a four-stage hierarchical identification approach is used in this paper to identify a bilinear system based on state-space equations. In this regard, a four-stage recursive least squares (4S-RLS) algorithm and a four-stage stochastic gradient (4S-SG) algorithm are proposed for the identification of bilinear systems. To improve the computational efficiency using the hierarchical identification principle, the identification model is decomposed into four subsystems, and the information vector is decomposed into four subvectors with smaller dimensions. In addition, an ARMA colored noise model is used in the presented model. Since only input/output data of the system are available to these algorithms, a state observer is used to estimate the system states, and the estimated states are then used in the identification algorithm. Finally, the proposed algorithms are simulated for bilinear system identification, and the convergence of the identified parameters is reported. The main contributions of this paper are listed as follows:

  • A four-stage recursive least squares algorithm and a four-stage stochastic gradient algorithm are proposed using the hierarchical identification principle to improve the computational efficiency. The hierarchical identification principle divides the main system into several subsystems with small dimensions; likewise, the information vector is broken down into several information subvectors.

  • A bilinear state observer is presented based on the Kalman filter algorithm for bilinear state-space estimation.

  • To show the high efficiency of the four-stage recursive least squares algorithm, a comparison of the computational efficiency of the recursive least squares and four-stage recursive least squares algorithms is provided.

The rest of this paper is organized as follows: In Sect. 2, the preliminary definitions, the problem statement and the bilinear state-space system are presented. In Sect. 3, a four-stage recursive least squares algorithm is described. Section 4 analyzes the computational efficiency of the 4S-RLS algorithm. Section 5 presents a four-stage stochastic gradient algorithm. A numerical example and a practical example are presented in Sect. 6 to show the effectiveness of the proposed algorithms. Finally, Sect. 7 concludes the paper.

2 Problem statement

In this section, first, a number of notations are explained. The superscript T represents the matrix transpose, \(\widehat{\rho }({t})\) is the estimate of the parameter \(\rho \) at time \(t\), \(I\) (\({I}_{n}\)) represents the \(n\times n\) identity matrix, and \(q\) is the unit shift operator:

$$ qz\left( t \right) = z\left( {t + 1} \right),\quad q^{ - 1} z\left( t \right) = z\left( {t - 1} \right) $$

Figure 1 shows the state-space representation of bilinear systems. According to this figure, the bilinear system state-space model is defined as follows:

Fig. 1 Bilinear state-space system

$${\varvec{z}}\left(t+1\right)=A{\varvec{z}}\left(t\right)+B{\varvec{z}}\left(t\right)\overline{u }\left(t\right)+f\overline{u }\left(t\right)$$
(1)
$$\overline{y }\left(t\right)=h{\varvec{z}}\left(t\right)+\omega (t)$$
(2)

where \({\varvec{z}}(t)=[{{z}_{1}(t),{z}_{2}(t),\cdots ,{z}_{n}(t)]}^{{T}}\in {\mathbb{R}}^{n}\) is the state vector, \(\overline{u }\left(t\right)\) is the system input, \(\overline{y }\left(t\right)\) is the system output, \(\omega \left(t\right)=\frac{D\left(q\right)}{C\left(q\right)}v(t)\) is colored noise and \(v\left(t\right)\in {\mathbb{R}}\) is zero-mean white noise. \(A\in {\mathbb{R}}^{n\times n}\), \(B\in {\mathbb{R}}^{n\times n}\), \(f\in {\mathbb{R}}^{n}\) and \(h\in {\mathbb{R}}^{1\times n}\) are the system matrices and vectors, with appropriate dimensions, as follows:

$$ \begin{aligned} A & = \left[ {\begin{array}{*{20}l} { - a_{1} } \hfill & 1 \hfill & 0 \hfill & \ldots \hfill & 0 \hfill \\ { - a_{2} } \hfill & 0 \hfill & 1 \hfill & \ddots \hfill & 0 \hfill \\ \vdots \hfill & \vdots \hfill & \ddots \hfill & \ddots \hfill & 0 \hfill \\ { - a_{n - 1} } \hfill & 0 \hfill & \cdots \hfill & 0 \hfill & 1 \hfill \\ { - a_{n} } \hfill & 0 \hfill & \ldots \hfill & 0 \hfill & 0 \hfill \\ \end{array} } \right] \in {\mathbb{R}}^{n \times n} \\ B & = \left[ {\begin{array}{*{20}c} {\begin{array}{*{20}c} {{\varvec{b}}_{{\varvec{1}}} } \\ {{\varvec{b}}_{{\varvec{2}}} } \\ \end{array} } \\ \vdots \\ {{\varvec{b}}_{{\varvec{n}}} } \\ \end{array} } \right] \in {\mathbb{R}}^{n \times n} \quad {\varvec{b}}_{{\varvec{i}}} \in {\mathbb{R}}^{1 \times n} \\ f & = \left[ {f_{1} ,f_{2} , \ldots ,f_{n} } \right]^{T} \in {\mathbb{R}}^{n} ,\quad h = \left[ {1,0, \ldots ,0} \right] \in {\mathbb{R}}^{1 \times n} \\ \end{aligned} $$

Using the shift operator, the polynomials \(D\left(q\right)\) and \(C\left(q\right)\) are defined as

$$ \begin{aligned} D\left( q \right) & = 1 + d_{1} q^{ - 1} + d_{2} q^{ - 2} + \cdots + d_{p} q^{ - p} \\ C\left( q \right) & = 1 + c_{1} q^{ - 1} + c_{2} q^{ - 2} + \cdots + c_{m} q^{ - m} \\ \end{aligned} $$
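To make the model concrete, the following sketch simulates (1) and (2) with the ARMA colored noise above. The system parameters are those of the numerical example in Sect. 6.1, while the input signal, noise seed and data length are illustrative assumptions.

```python
import numpy as np

# Minimal simulation sketch of the bilinear model (1)-(2) with colored
# measurement noise w(t) = D(q)/C(q) v(t) (parameter values from Sect. 6.1).
rng = np.random.default_rng(0)

n = 2
A = np.array([[-0.20, 1.0],
              [0.25, 0.0]])            # companion form, -a_i in column 1
B = np.array([[0.08, 0.17],
              [-0.12, -0.20]])
f = np.array([0.4, 2.0])
h = np.array([1.0, 0.0])
c = np.array([-0.3])                   # C(q) = 1 + c_1 q^{-1}
d = np.array([1.0])                    # D(q) = 1 + d_1 q^{-1}

L = 500                                # data length (assumption)
u = rng.standard_normal(L)             # persistently exciting input (assumption)
v = 0.10 * rng.standard_normal(L)      # zero-mean white noise, sigma = 0.10

z = np.zeros(n)
w = np.zeros(L)
y = np.zeros(L)
for t in range(L):
    # ARMA recursion: w(t) = -sum_i c_i w(t-i) + v(t) + sum_i d_i v(t-i)
    w[t] = v[t]
    for i, ci in enumerate(c, start=1):
        if t - i >= 0:
            w[t] -= ci * w[t - i]
    for i, di in enumerate(d, start=1):
        if t - i >= 0:
            w[t] += di * v[t - i]
    y[t] = h @ z + w[t]                          # output equation (2)
    z = A @ z + (B @ z) * u[t] + f * u[t]        # state equation (1)
```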

According to the method presented in [48] and also using Eqs. (1) and (2), it can be concluded that

$$ \begin{aligned} z_{1} \left( t \right) & = - \mathop \sum \limits_{i = 1}^{n} a_{i} z_{1} \left( {t - i} \right) + \mathop \sum \limits_{i = 1}^{n} {\varvec{b}}_{{\varvec{i}}} {\varvec{z}}\left( {t - i} \right)\overline{u}\left( {t - i} \right) \\ { } & \quad + \mathop \sum \limits_{i = 1}^{n} f_{i} \overline{u}\left( {t - i} \right) \\ \end{aligned} $$
(3)

The parameter vector \(\rho \) is defined as:

$$\rho ={\left[{\rho }_{s},{\rho }_{n}\right]}^{{T}}\in {\mathbb{R}}^{{n}^{2}+2n+m+p}$$

where

$$ \begin{aligned} \rho_{s} & = \left[ {\rho_{a}^{{{T}}} ,\rho_{b}^{{{T}}} ,\rho_{f}^{{{T}}} } \right]^{{{T}}} \in {\mathbb{R}}^{{n^{2} + 2n}} \\ \rho_{n} & = \left[ {c_{1} ,c_{2} , \ldots ,c_{m} ,d_{1} ,d_{2} , \ldots ,d_{p} } \right]^{{{T}}} \in {\mathbb{R}}^{m + p} \\ \end{aligned} $$

and

$$ \begin{aligned} \rho_{a} & = \left[ {a_{1} ,a_{2} { }, \ldots ,a_{n} } \right]^{{{T}}} \in {\mathbb{R}}^{n} \\ {\varvec{\rho}}_{{\varvec{b}}} & = \left[ {{\varvec{b}}_{1} ,{\varvec{b}}_{2} , \ldots ,{\varvec{b}}_{{\varvec{n}}} } \right]^{{\varvec{T}}} \in {\mathbb{R}}^{{{\varvec{n}}^{2} }} \\ \rho_{f} & = \left[ {f_{1} ,f_{2} , \ldots ,f_{n} } \right]^{{{T}}} \in {\mathbb{R}}^{n} \\ \end{aligned} $$

In addition, the information vector \(\varphi \left(t\right)\) is defined as follows:

$$ \begin{aligned} \varphi \left( t \right) & = \left[ {\varphi_{z}^{T} \left( t \right),\varphi_{{z\overline{u}}}^{T} \left( t \right),\varphi_{{\overline{u}}}^{T} \left( t \right),\varphi_{n}^{T} \left( t \right)} \right]^{{{T}}} \in {\mathbb{R}}^{{n_{0} }} , \\ n_{0} & : = n^{2} + 2n + m + p \\ \end{aligned} $$
$$ \begin{aligned} \varphi_{s} \left( t \right) & = \left[ {\varphi_{z}^{T} \left( t \right),\varphi_{{z\overline{u}}}^{T} \left( t \right),\varphi_{{\overline{u}}}^{T} \left( t \right)} \right]^{{{T}}} \in {\mathbb{R}}^{{n_{1} }} , \\ n_{1} & : = n^{2} + 2n \\ \end{aligned} $$
$$ \begin{aligned} \varphi_{z} \left( t \right) & = \left[ { - z_{1} \left( {t - 1} \right), - z_{1} \left( {t - 2} \right) , \ldots , - z_{1} \left( {t - n} \right)} \right]^{T} \in {\mathbb{R}}^{{n_{2} }} , \\ n_{2} & : = n \\ \end{aligned} $$
$$ \begin{aligned} \varphi_{{z\overline{u}}} \left( t \right) & = [{\varvec{z}}^{T} \left( {t - 1} \right)\overline{u}\left( {t - 1} \right),{\varvec{z}}^{T} \left( {t - 2} \right)\overline{u}\left( {t - 2} \right), \ldots ,{\varvec{z}}^{T} \left( {t - n} \right)\overline{u}\left( {t - n} \right)]^{T} \in {\mathbb{R}}^{{n_{3} }} , \\ n_{3} & : = n^{2} \\ \end{aligned} $$
$$ \begin{aligned} \varphi_{{\overline{u}}} \left( t \right) & = \left[ {\overline{u}\left( {t - 1} \right),\overline{u}\left( {t - 2} \right) , \ldots , \overline{u}\left( {t - n} \right)} \right]^{T} \in {\mathbb{R}}^{{n_{2} }} , \\ n_{2} & : = n \\ \end{aligned} $$

From (2), the colored noise equation can be written as

$$ \begin{aligned} \omega \left( t \right) & = \left[ {1 - C\left( q \right)} \right]\omega \left( t \right) + D\left( q \right)v\left( t \right) \\ & = - c_{1} \omega \left( {t - 1} \right) - c_{2} \omega \left( {t - 2} \right) - \cdots - c_{m} \omega \left( {t - m} \right) \\ & \quad + v\left( t \right) + d_{1} v\left( {t - 1} \right) + d_{2} v\left( {t - 2} \right) + \cdots \\ & \quad + d_{p} v\left( {t - p} \right) = \varphi_{n}^{T} \left( t \right)\rho_{n} + v\left( t \right) \\ \end{aligned} $$
(4)

where the information vector \({\varphi }_{n}\left(t\right)\) is defined as follows:

$$ \begin{aligned} \varphi_{n} \left( t \right) & = [ - \omega \left( {t - 1} \right), - \omega \left( {t - 2} \right), \ldots , - \omega \left( {t - m} \right), \\ & \quad v\left( {t - 1} \right),v\left( {t - 2} \right), \ldots ,v\left( {t - p} \right)]^{T} \in {\mathbb{R}}^{{n_{4} }} ,\quad n_{4 } : = m + p \\ \end{aligned} $$

Substituting (3) in (2) and according to the definition of information vectors, the bilinear system identification model in (1) and (2) can be expressed as:

$$ \begin{aligned} \overline{y}\left( t \right) & = \varphi_{z}^{T} \left( t \right)\rho_{a} + \varphi_{{z\overline{u}}}^{T} \left( t \right)\rho_{b} + \varphi_{{\overline{u}}}^{T} \left( t \right)\rho_{f} \\ & \quad + \varphi_{n}^{T} \left( t \right)\rho_{n} + v\left( t \right) = \varphi^{T} \left( t \right)\rho + v\left( t \right) \\ \end{aligned} $$
(5)
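For clarity, the sketch below assembles \(\varphi(t)\) in exactly this stacking order. It assumes the true states and noise sequences are available, which holds only in simulation; in Sect. 3.1 the observer estimates replace them. The function name and interface are illustrative.

```python
import numpy as np

def build_phi(z_hist, u_hist, w_hist, v_hist, t, n, m, p):
    """Information vector phi(t) of Eq. (5); requires t >= max(n, m, p).

    z_hist[k] is the state vector z(k); u_hist, w_hist, v_hist hold the
    input and the noise sequences (known here only for illustration).
    """
    phi_z = np.array([-z_hist[t - i][0] for i in range(1, n + 1)])   # -z1(t-i)
    phi_zu = np.concatenate([z_hist[t - i] * u_hist[t - i]
                             for i in range(1, n + 1)])              # z(t-i)u(t-i)
    phi_u = np.array([u_hist[t - i] for i in range(1, n + 1)])
    phi_n = np.array([-w_hist[t - i] for i in range(1, m + 1)]
                     + [v_hist[t - i] for i in range(1, p + 1)])
    return np.concatenate([phi_z, phi_zu, phi_u, phi_n])             # n^2+2n+m+p

# With rho stacked in the same order, y(t) = phi(t).T @ rho + v(t) is (5).
```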

2.1 Input–output model of bilinear system

For identification purposes, the input–output relationship of the bilinear state-space system is obtained by eliminating the state variables. Removing the state vector from Eqs. (1) and (2), the input–output relationship of the bilinear system is expressed as follows:

$$ \left[ {A\left( q \right) + u\left( {t - n} \right)B\left( q \right)} \right]y\left( t \right) = \left[ {C\left( q \right) + u\left( {t - n} \right)D\left( q \right)} \right]u\left( t \right) + v\left( t \right) $$
(6)
$$ \begin{aligned} A\left( q \right) & = 1 + a_{1} q^{ - 1} + a_{2} q^{ - 2} + \cdots + a_{{n_{a} }} q^{{ - n_{a} }} \\ B\left( q \right) & = b_{1} q^{ - 1} + b_{2} q^{ - 2} + \cdots + b_{{n_{b} }} q^{{ - n_{b} }} \\ C\left( q \right) & = c_{1} q^{ - 1} + c_{2} q^{ - 2} + \cdots + c_{m} q^{ - m} \\ D\left( q \right) & = d_{2} q^{ - 2} + d_{3} q^{ - 3} + \cdots + d_{w} q^{ - w} \\ \end{aligned} $$

The following steps have been carried out to obtain (6); note that the polynomials \(A(q)\), \(B(q)\), \(C(q)\) and \(D(q)\) of this input–output form are distinct from the noise polynomials of Sect. 2. First, using (1), one can write

$$\left\{\begin{array}{l}{z}_{1}\left(t+1\right)={z}_{2}\left(t\right)+{f}_{1}u\left(t\right)\\ {z}_{2}\left(t+1\right)={z}_{3}\left(t\right)+{f}_{2}u(t)\\ \vdots \\ {z}_{n-1}\left(t+1\right)={z}_{n}\left(t\right)+{f}_{n-1}u\left(t\right) \\ { z}_{n}\left(t+1\right)=-{a}_{n}{z}_{1}\left(t\right)-{a}_{n-1}{z}_{2}\left(t\right)-{a}_{n-2}{z}_{3}\left(t\right)-\dots -{a}_{1}{z}_{n}\left(t\right) \\ \quad -\left[{b}_{n}{z}_{1}\left(t\right)+{b}_{n-1}{z}_{2}\left(t\right)+{b}_{n-2}{z}_{3}\left(t\right)+\dots +{b}_{1}{z}_{n}\left(t\right)\right]u\left(t\right)+{f}_{n}u\left(t\right).\end{array}\right.$$
(7)

Then, by using (7), the following equations are obtained directly:

$$\left\{\begin{array}{l}{z}_{2}\left(t\right)={z}_{1}\left(t+1\right)-{f}_{1}u\left(t\right) \\ {z}_{3}\left(t\right)={z}_{2}\left(t+1\right)-{f}_{2}u\left(t\right) \\ \quad ={z}_{1}\left(t+2\right)-{f}_{1}u\left(t+1\right)-{f}_{2}u\left(t\right) \\ {z}_{4}\left(t\right)={z}_{3}\left(t+1\right)-{f}_{3}u\left(t\right) \\ \quad ={z}_{1}\left(t+3\right)-{f}_{1}u\left(t+2\right)-{f}_{2}u\left(t+1\right)-{f}_{3}u\left(t\right)\\ \vdots \\ {z}_{n}\left(t\right)={z}_{n-1}\left(t+1\right)-{f}_{n-1}u\left(t\right) \\ ={z}_{1}\left(t+n-1\right)-{f}_{1}u\left(t+n-2\right)-{f}_{2}u\left(t+n-3\right)-\dots {-f}_{n-1}u\left(t\right)\end{array}\right.$$
(8)

Multiplying both sides of the last equation of (8) by the operator \(q\), we have

$$ \begin{aligned} z_{n} \left( {t + 1} \right) & = z_{1} \left( {t + n} \right) - f_{1} u\left( {t + n - 1} \right) \\ & \quad - f_{2} u\left( {t + n - 2} \right) - \cdots - f_{n - 1} u\left( {t + 1} \right) \\ \end{aligned} $$
(9)

Replacing (9) in the last equation of (7) yields

$$ \begin{aligned} & - \left[ {a_{n} ,a_{n - 1} ,a_{n - 2} , \ldots ,a_{1} } \right]\left[ {\begin{array}{c} {z_{1} \left( t \right)} \\ {z_{2} \left( t \right)} \\ {z_{3} \left( t \right)} \\ \vdots \\ {z_{n} \left( t \right)} \\ \end{array} } \right] - \left[ {b_{n} ,b_{n - 1} ,b_{n - 2} , \ldots ,b_{1} } \right]\left[ {\begin{array}{c} {z_{1} \left( t \right)} \\ {z_{2} \left( t \right)} \\ {z_{3} \left( t \right)} \\ \vdots \\ {z_{n} \left( t \right)} \\ \end{array} } \right]u\left( t \right) + f_{n} u\left( t \right) \\ & \qquad = z_{1} \left( {t + n} \right) - f_{1} u\left( {t + n - 1} \right) - f_{2} u\left( {t + n - 2} \right) - \cdots - f_{n - 1} u\left( {t + 1} \right) \\ \end{aligned} $$
(10)

Now, using the matrix representation of (8) and (10), one can write

$$ \begin{aligned} & \left( {1 + a_{1} q^{ - 1} + a_{2} q^{ - 2} + \cdots + a_{n} q^{ - n} } \right)q^{n} z_{1} \left( t \right) \\ & \qquad + \left[ {\left( {b_{1} q^{ - 1} + b_{2} q^{ - 2} + \cdots + b_{n} q^{ - n} } \right)q^{n} z_{1} \left( t \right)} \right]u\left( t \right) \\ & \quad = \left[ {f_{n} + a_{n - 1} f_{1} + a_{n - 2} f_{2} + \cdots + a_{1} f_{n - 1} ,\;f_{n - 1} + a_{n - 2} f_{1} + a_{n - 3} f_{2} + \cdots + a_{1} f_{n - 2} ,\; \ldots ,\;f_{2} + a_{1} f_{1} ,\;f_{1} } \right]\left[ {\begin{array}{c} {u\left( t \right)} \\ {u\left( {t + 1} \right)} \\ \vdots \\ {u\left( {t + n - 1} \right)} \\ \end{array} } \right] \\ & \qquad + \left\{ {\left[ {b_{n - 1} f_{1} + b_{n - 2} f_{2} + \cdots + b_{1} f_{n - 1} ,\;b_{n - 2} f_{1} + b_{n - 3} f_{2} + \cdots + b_{1} f_{n - 2} ,\; \ldots ,\;b_{1} f_{1} ,\;0} \right]\left[ {\begin{array}{c} {u\left( t \right)} \\ {u\left( {t + 1} \right)} \\ \vdots \\ {u\left( {t + n - 1} \right)} \\ \end{array} } \right]} \right\}u\left( t \right) \\ \end{aligned} $$
(11)

In order to simplify (11), we define two vectors as follows:

$$ \begin{aligned} \left[ {c_{n} , \ldots ,c_{2} ,c_{1} } \right] & : = [ f_{n} + a_{n - 1} f_{1} + a_{n - 2} f_{2} + \cdots \\ & \quad + a_{1} f_{n - 1} , \ldots ,f_{2} + a_{1} f_{1} ,f_{1} ] \in {\mathbb{R}}^{1 \times n} \\ \end{aligned} $$
(12)
$$ \begin{aligned} \left[ {d_{n} , \ldots ,d_{3} ,d_{2} } \right] & : = [ b_{n - 1} f_{1} + b_{n - 2} f_{2} + \cdots \\ & \quad + b_{1} f_{n - 1} , \ldots ,b_{1} f_{1} ] \in {\mathbb{R}}^{{1 \times \left( {n - 1} \right)}} \\ \end{aligned} $$
(13)
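For instance, for \(n=2\), the definitions (12) and (13) reduce to

$$ c_{2} = f_{2} + a_{1} f_{1} ,\quad c_{1} = f_{1} ,\quad d_{2} = b_{1} f_{1} $$

so the input–output coefficients are simple bilinear combinations of the state-space parameters.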

Then, (11) can be written as

$$ \begin{aligned} & A\left( q \right)q^{n} z_{1} \left( t \right) + [B(q)q^{n} z_{1} (t)]u\left( t \right) = C\left( q \right)q^{n} u\left( t \right) \\ & \quad + [D(q)]q^{n} u\left( t \right)u\left( t \right). \\ \end{aligned} $$
(14)

where

$$ \begin{aligned} A\left( q \right) & = 1 + a_{1} q^{ - 1} + a_{2} q^{ - 2} + \cdots + a_{n} q^{ - n} \\ B\left( q \right) & = b_{1} q^{ - 1} + b_{2} q^{ - 2} + \cdots + b_{n} q^{ - n} \\ C\left( q \right) & = c_{1} q^{ - 1} + c_{2} q^{ - 2} + \cdots + c_{n} q^{ - n} \\ D\left( q \right) & = d_{2} q^{ - 2} + d_{3} q^{ - 3} + \cdots + d_{n} q^{ - n} \\ \end{aligned} $$

Then, Eq. (14) can be rewritten as

$$ \begin{aligned} & A\left( q \right)q^{n} z_{1} \left( t \right) + u\left( t \right)\left[ {B\left( q \right)q^{n} z_{1} \left( t \right)} \right] = C\left( q \right) q^{n} u\left( t \right) \\ & \quad + u\left( t \right)\left[ {D\left( q \right)q^{n} u\left( t \right)} \right] \\ \end{aligned} $$
(15)

By substituting \(t\) with \(t-n\), we have

$${z}_{1}\left(t\right)=\frac{C\left(q\right)+u(t-n)D\left(q\right)}{A\left(q\right)+u(t-n)B\left(q\right)} u(t)$$

Substituting \({z}_{1}\left(t\right)\) into (2), the input–output relation of the bilinear state-space system in (1) and (2) is obtained as follows:

$$y\left(t\right)= \frac{C\left(q\right)+u(t-n)D\left(q\right)}{A\left(q\right)+u(t-n)B\left(q\right)} u\left(t\right)+v(t)$$

3 Four-stage recursive least squares algorithm

In this section, a four-stage recursive least squares algorithm is proposed to alleviate the computational load, increase the convergence rate of the parameters to their actual values and reduce the error simultaneously. According to the hierarchical principle, the main system is broken down into four subsystems; then, an algorithm is presented to estimate the unknown parameters of the bilinear system. Consider the following performance index:

$$J\left(\rho \right)=\sum_{j=1}^{t}{\left[\overline{y }\left(j\right)-{\varphi }^{T}\left(j\right)\rho \right]}^{2}$$

Using the least-squares principle and minimizing the performance index, the recursive least squares algorithm can be written as

$$\widehat{\rho }\left(t\right)=\widehat{\rho }\left(t-1\right)+{K}\left(t\right)\left[\overline{y }\left(t\right)-{\varphi }^{T}\left(t\right)\widehat{\rho }\left(t-1\right)\right]$$
(16)
$${K}\left(t\right)=R\left(t-1\right)\varphi \left(t\right){[1+{\varphi }^{T}\left(t\right)R\left(t-1\right)\varphi \left(t\right)]}^{-1}$$
(17)
$$R\left(t\right)=\left[ I-{K}\left(t\right){\varphi }^{T}\left(t\right)\right]R\left(t-1\right)$$
(18)

where \(R\left(t\right)\) is the covariance matrix and \({K}\left(t\right)=R\left(t\right)\varphi \left(t\right)\) is the gain vector. The first difficulty of identification is that only the input and output data are available. Since \(\varphi \left(t\right)\) contains unknown state variables and \({\varphi }_{n}\left(t\right)\) consists of the noise variables \((\omega \left(t-i\right),\; i=1,2,\ldots ,m)\), it is not possible to estimate the parameter \(\widehat{\rho }\left(t\right)\) with Eqs. (16)–(18) alone. Therefore, a bilinear state observer must be designed for state estimation.
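For reference, the ideal recursion (16)–(18) takes only a few lines of code; the sketch below is not directly implementable for exactly the reason just stated, since \(\varphi(t)\) cannot be formed from measured data alone.

```python
import numpy as np

def rls_step(rho_hat, R, phi, y):
    """One ideal RLS update per (16)-(18); phi would contain unmeasured
    states and noise terms, hence the observer of Sect. 3.1 is needed."""
    K = R @ phi / (1.0 + phi @ R @ phi)                  # gain vector (17)
    rho_hat = rho_hat + K * (y - phi @ rho_hat)          # update (16)
    R = (np.eye(len(phi)) - np.outer(K, phi)) @ R        # covariance (18)
    return rho_hat, R
```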

3.1 Bilinear state observer algorithm

As is well known, the Kalman filter is commonly used to estimate the states of linear systems. Here, in order to apply the Kalman filter to bilinear systems, Eqs. (1) and (2) should be written in the following form:

$$ \begin{aligned} & {\varvec{z}}\left( {t + 1} \right) = A_{1} \left( {\varvec{t}} \right){\varvec{z}}\left( t \right) + f\overline{u}\left( t \right) \\ & \overline{y}\left( t \right) = h{\varvec{z}}\left( t \right) + \omega \left( t \right) \\ & A_{1} \left( {\varvec{t}} \right) = A + B\overline{u}\left( t \right) \\ \end{aligned} $$

which may be considered a (time-varying) linear state-space model. Therefore, the Kalman filter can be used to design a bilinear state observer [48].

$$ \begin{aligned} \hat{\varvec{z}}\left( {t + 1} \right) & = A\hat{\varvec{z}}\left( t \right) + B\hat{\varvec{z}}\left( t \right)\overline{u}\left( t \right) + f\overline{u}\left( t \right) \\ & \quad + K_{z} \left( t \right)\left[ { \overline{y}\left( t \right) - h\hat{\varvec{z}}\left( t \right) - \varphi_{n}^{T} \left( t \right)\rho_{n} } \right] \\ \end{aligned} $$
(19)
$$ \begin{aligned} K_{z} \left( t \right) & = AR_{z} \left( t \right)h^{T} \left[ {hR_{z} \left( t \right)h^{T} + R_{v} } \right]^{ - 1} + B\overline{u}\left( t \right)R_{z} \left( t \right) \\ & \quad \times h^{T} \left[ {hR_{z} \left( t \right)h^{T} + R_{v} } \right]^{ - 1} \\ \end{aligned} $$
(20)
$$ \begin{aligned} R_{z} \left( {t + 1} \right) & = \left[ {A - K_{z} \left( t \right)h + B\overline{u}\left( t \right)} \right]R_{z} \left( t \right) \\ & \quad \left[ {A^{T} - h^{T} K_{z}^{T} \left( t \right) + B^{T} \overline{u}\left( t \right)} \right] \\ & \quad + K_{z} \left( t \right)R_{v} K_{z}^{T} \left( t \right) \\ \end{aligned} $$
(21)

where \({\widehat{R}}_{v}\left(t\right)=\frac{1}{t}\sum_{j=1}^{t}{[\overline{y }\left(j\right)-h\widehat{{\varvec{z}}}\left(j\right)]}^{2}\) estimates the noise variance, \({K}_{z}\left(t\right)\) is the optimal state observer gain and \({R}_{z}\left(t+1\right)\) is the covariance matrix of the state estimation error.

If the matrices and vectors \(A\), \(B\), \(f\) and \(\rho_n\) are unknown, the bilinear state observer in (19)–(21) cannot be used. Therefore, \(\widehat{{\varvec{z}}}\left(t\right)\) should be computed using the estimated parameters.

Therefore, the parameter estimation vectors are defined as follows:

$$ \begin{aligned} \hat{\rho } & = \left[ {\hat{\rho }_{s} ,\hat{\rho }_{n} } \right]^{T} \in {\mathbb{R}}^{{n_{0} }} \\ \hat{\rho }_{s} & = \left[ {\hat{\rho }_{a}^{T} ,\hat{\rho }_{b}^{T} ,\hat{\rho }_{f}^{T} } \right]^{T} \in {\mathbb{R}}^{{n_{1} }} \\ \hat{\rho }_{n} & = \left[ {\hat{c}_{1} ,\hat{c}_{2} , \ldots ,\hat{c}_{m} ,\hat{d}_{1} ,\hat{d}_{2} , \ldots ,\hat{d}_{p} } \right]^{T} \in {\mathbb{R}}^{{n_{4} }} \\ \hat{\rho }_{a} & = \left[ {\hat{a}_{1} ,\hat{a}_{2} , \ldots ,\hat{a}_{n} } \right]^{T} \in {\mathbb{R}}^{{n_{2} }} \\ \hat{\rho }_{b} & = \left[ {\hat{\varvec{b}}_{1} ,\hat{\varvec{b}}_{2} , \ldots ,\hat{\varvec{b}}_{{\varvec{n}}} } \right]^{T} \in {\mathbb{R}}^{{n_{3} }} \\ \hat{\rho }_{f} & = \left[ {\hat{f}_{1} ,\hat{f}_{2} , \ldots ,\hat{f}_{n} } \right]^{T} \in {\mathbb{R}}^{{n_{2} }} \\ \end{aligned} $$
$$\widehat{A}\left(t\right)=\left[\begin{array}{ccccc} -{\widehat{a}}_{1}\left(t\right) & 1 & 0 & \cdots & 0 \\ -{\widehat{a}}_{2}\left(t\right) & 0 & 1 & \ddots & 0 \\ \vdots & \vdots & \ddots & \ddots & 0 \\ -{\widehat{a}}_{n-1}\left(t\right) & 0 & \cdots & 0 & 1 \\ -{\widehat{a}}_{n}\left(t\right) & 0 & \cdots & 0 & 0 \end{array}\right]$$
(22)
$$ \hat{B}\left( t \right): = \left[ {\begin{array}{c} {\hat{\varvec{b}}_{1} \left( t \right)} \\ {\hat{\varvec{b}}_{2} \left( t \right)} \\ \vdots \\ {\hat{\varvec{b}}_{{\varvec{n}}} \left( t \right)} \\ \end{array} } \right],\quad \hat{\varvec{b}}_{{\varvec{i}}} \left( t \right) \in {\mathbb{R}}^{1 \times n} ,\quad \hat{f}\left( t \right): = \left[ {\begin{array}{c} {\hat{f}_{1} \left( t \right)} \\ {\hat{f}_{2} \left( t \right)} \\ \vdots \\ {\hat{f}_{n} \left( t \right)} \\ \end{array} } \right] $$
(23)

By substituting \(A\), \(B\) and \(f\) in (19)–(21) with the estimated matrices and vectors \(\widehat{A}(t)\), \(\widehat{B}(t)\) and \(\widehat{f}(t)\), we have

$$ \begin{aligned} \hat{\varvec{z}}\left( {t + 1} \right) & = \hat{A}\hat{\varvec{z}}\left( t \right) + \hat{B}\hat{\varvec{z}}\left( t \right)\overline{u}\left( t \right) + \hat{\rho }_{f} \overline{u}\left( t \right) \\ & \quad + K_{z} \left( t \right)\left[ {\overline{y}\left( t \right) - h\hat{\varvec{z}}\left( t \right) - \hat{\varphi }_{n}^{T} \left( t \right)\hat{\rho }_{n} } \right] \\ \end{aligned} $$
(24)
$$ \begin{aligned} K_{z} \left( t \right) & = \hat{A}R_{z} \left( t \right)h^{T} \left[ {hR_{z} \left( t \right)h^{T} + R_{v} } \right]^{ - 1} + \hat{B}\overline{u}\left( t \right)R_{z} \left( t \right) \\ & \quad \times h^{T} \left[ {hR_{z} \left( t \right)h^{T} + R_{v} } \right]^{ - 1} \\ \end{aligned} $$
(25)
$$ \begin{aligned} R_{z} \left( {t + 1} \right) & = \left[ {\hat{A} - K_{z} \left( t \right)h + \hat{B}\overline{u}\left( t \right)} \right]R_{z} \left( t \right) \\ & \quad \left[ {\hat{A}^{T} - h^{T} K_{z}^{T} \left( t \right) + \hat{B}^{T} \overline{u}\left( t \right)} \right] \\ & \quad + K_{z} \left( t \right)R_{v} K_{z}^{T} \left( t \right) \\ \end{aligned} $$
(26)

Thus, based on the bilinear state observer, the estimates \(\widehat{{\varvec{z}}}(t)\) of the unknown states \({\varvec{z}}(t)\) can be calculated.
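Since the gain (25) is the Kalman gain for the time-varying matrix \(A_1(t)=\widehat{A}+\widehat{B}\overline{u}(t)\), one observer step can be coded compactly as below. This is a sketch: the function name is an assumption, and \(R_v\) may be replaced by the running estimate \(\widehat{R}_v(t)\) given earlier.

```python
import numpy as np

def observer_step(z_hat, Rz, u, y, A_hat, B_hat, f_hat, h, Rv, noise_corr=0.0):
    """One step of the bilinear state observer (24)-(26);
    noise_corr stands for the term phi_n(t)^T rho_n."""
    S = h @ Rz @ h + Rv                        # scalar innovation variance
    Kz = (A_hat + B_hat * u) @ Rz @ h / S      # gain (25) with A1 = A + B*u
    innov = y - h @ z_hat - noise_corr
    z_next = (A_hat @ z_hat + (B_hat @ z_hat) * u
              + f_hat * u + Kz * innov)        # state update (24)
    Acl = A_hat + B_hat * u - np.outer(Kz, h)
    Rz_next = Acl @ Rz @ Acl.T + np.outer(Kz, Kz) * Rv   # covariance (26)
    return z_next, Rz_next
```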

Replacing an unknown state \({\varvec{z}}\left(t-i\right)\) with estimated \(\widehat{{\varvec{z}}}\left(t-i\right)\) and an unknown noise term \(\omega \left(t-i\right)\) with estimated \(\widehat{\omega }\left(t-i\right)\), the information estimation vectors are defined as:

$$\widehat{\varphi }\left(t\right)={\left[{{\widehat{\varphi }}_{z}}^{T}\left(t\right),{{\widehat{\varphi }}_{z\overline{u}} }^{T}\left(t\right),{{\varphi }_{\overline{u}} }^{T}\left(t\right),{{\widehat{\varphi }}_{n}}^{T}\left(t\right)\right]}^{T}\in {\mathbb{R}}^{{n}_{0}}$$
$${\widehat{\varphi }}_{s}\left(t\right)={\left[{{\widehat{\varphi }}_{z}}^{T}\left(t\right),{{\widehat{\varphi }}_{z\overline{u}} }^{T}\left(t\right),{{\varphi }_{\overline{u}} }^{T}(t)\right]}^{T}\in {\mathbb{R}}^{{n}_{1}}$$
$${\widehat{\varphi }}_{z}\left(t\right)={\left[-{\widehat{z}}_{1}\left(t-1\right),-{\widehat{z}}_{1}\left(t-2\right) ,\dots ,-{\widehat{z}}_{1}\left(t-n\right)\right]}^{T} \in {\mathbb{R}}^{{n}_{2}}$$
$${\widehat{\varphi }}_{z\overline{u}}\left(t\right)=[{\widehat{{\varvec{z}}}}^{T}\left(t-1\right)\overline{u }\left(t-1\right),{\widehat{{\varvec{z}}}}^{T}\left(t-2\right)\overline{u }\left(t-2\right),\dots { ,\widehat{{\varvec{z}}}}^{T}\left(t-n\right)\overline{u }\left(t-n\right){]}^{T}\in {\mathbb{R}}^{{n}_{3}}$$
$${\widehat{\varphi }}_{n}\left(t\right)=[-\widehat{\omega }\left(t-1\right),-\widehat{\omega }\left(t-2\right),\dots ,-\widehat{\omega }\left(t-m\right), \widehat{v}\left(t-1\right),\widehat{v}\left(t-2\right),\dots ,\widehat{v}\left(t-p\right)]^{T}\in {\mathbb{R}}^{{n}_{4}}$$

It should be noted that the estimations of \(\omega \left(t\right)\) and \(v\left(t\right)\) are defined as:

$$\widehat{\omega }\left(t\right)=\overline{y }\left(t\right)-{\widehat{\varphi }}_{s}^{T}\left(t\right){\widehat{\rho }}_{s}\left(t-1\right)$$
(27)
$$\widehat{v}\left(t\right)=\overline{y }\left(t\right)-{\widehat{\varphi }}^{T}\left(t\right)\widehat{\rho }\left(t-1\right)$$
(28)

The identification model in (4) and (5) can be decomposed into the following four subsystems, where each fictitious output \(\overline{y}_{z}(t)\), \(\overline{y}_{z\overline{u}}(t)\) and \(\overline{y}_{\overline{u}}(t)\) is obtained from \(\overline{y}(t)\) by subtracting the contributions of the other three subvectors:

$$ \begin{aligned} \overline{y}_{z} \left( t \right) & = \varphi_{z}^{T} \left( t \right)\rho_{a} + v\left( t \right) \\ \overline{y}_{{z\overline{u}}} \left( t \right) & = \varphi_{{z\overline{u}}}^{T} \left( t \right)\rho_{b} + v\left( t \right) \\ \overline{y}_{{\overline{u}}} \left( t \right) & = \varphi_{{\overline{u}}}^{T} \left( t \right)\rho_{f} + v\left( t \right) \\ \omega \left( t \right) & = \varphi_{n}^{T} \left( t \right)\rho_{n} + v\left( t \right) \\ \end{aligned} $$

According to the least squares principle, by defining the criterion function, the recursive relationships are as follows:

$$ \begin{aligned} \hat{\rho }_{a} \left( t \right) & = \hat{\rho }_{a} \left( {t - 1} \right) + K_{1} \left( t \right)\left[ {\overline{y}_{z} \left( t \right) - \varphi_{z}^{T} \left( t \right)\hat{\rho }_{a} \left( {t - 1} \right)} \right] \\ & = \hat{\rho }_{a} \left( {t - 1} \right) + K_{1} \left( t \right)[\overline{y}\left( t \right) - \varphi_{{z\overline{u}}}^{T} \left( t \right)\rho_{b} - \varphi_{{\overline{u}}}^{T} \left( t \right)\rho_{f} \\ & \quad - \varphi_{n}^{T} \left( t \right)\rho_{n} - \varphi_{z}^{T} \left( t \right)\hat{\rho }_{a} \left( {t - 1} \right)] \\ \end{aligned} $$
(29)
$${K}_{1}\left(t\right)=\frac{{R}_{1}\left(t-1\right){\varphi }_{z}\left(t\right)}{\beta +{\varphi }_{z}^{T}\left(t\right){R}_{1}\left(t-1\right){\varphi }_{z}\left(t\right)}$$
(30)
$${R}_{1}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{1}\left(t\right){\varphi }_{z}^{T}\left(t\right)\right]{R}_{1}\left(t-1\right)$$
(31)
$$ \begin{aligned} \hat{\rho }_{b} \left( t \right) & = \hat{\rho }_{b} \left( {t - 1} \right) + K_{2} \left( t \right)\left[ {\overline{y}_{{z\overline{u}}} \left( t \right) - \varphi_{{z\overline{u}}}^{T} \left( t \right)\hat{\rho }_{b} \left( {t - 1} \right)} \right] \\ & = \hat{\rho }_{b} \left( {t - 1} \right) + K_{2} \left( t \right)[\overline{y}\left( t \right) - \varphi_{z}^{T} \left( t \right)\rho_{a} - \varphi_{{\overline{u}}}^{T} \left( t \right)\rho_{f} \\ & \quad - \varphi_{n}^{T} \left( t \right)\rho_{n} - \varphi_{{z\overline{u}}}^{T} \left( t \right)\hat{\rho }_{b} \left( {t - 1} \right)] \\ \end{aligned} $$
(32)
$${K}_{2}\left(t\right)=\frac{{R}_{2}\left(t-1\right){\varphi }_{z\overline{u} }\left(t\right)}{\beta +{\varphi }_{z\overline{u} }^{T}\left(t\right){R}_{2}\left(t-1\right){\varphi }_{z\overline{u} }\left(t\right)}$$
(33)
$${R}_{2}\left(t\right)=\frac{1}{\beta }\left[I-{ K}_{2}\left(t\right){\varphi }_{z\overline{u} }^{T}\left(t\right)\right]{R}_{2}\left(t-1\right)$$
(34)
$$ \begin{aligned} \hat{\rho }_{f} \left( t \right) & = \hat{\rho }_{f} \left( {t - 1} \right) + K_{3} \left( t \right)\left[ {\overline{y}_{{\overline{u}}} \left( t \right) - \varphi_{{\overline{u}}}^{T} \left( t \right)\hat{\rho }_{f} \left( {t - 1} \right)} \right] \\ & = \hat{\rho }_{f} \left( {t - 1} \right) + K_{3} \left( t \right)[\overline{y}\left( t \right) - \varphi_{z}^{T} \left( t \right)\rho_{a} - \varphi_{{z\overline{u}}}^{T} \left( t \right)\rho_{b} \\ & \quad - \varphi_{n}^{T} \left( t \right)\rho_{n} - \varphi_{{\overline{u}}}^{T} \left( t \right)\hat{\rho }_{f} \left( {t - 1} \right)] \\ \end{aligned} $$
(35)
$${K}_{3}\left(t\right)=\frac{{R}_{3}\left(t-1\right){\varphi }_{\overline{u} }\left(t\right)}{\beta +{\varphi }_{\overline{u} }^{T}\left(t\right){R}_{3}\left(t-1\right){\varphi }_{\overline{u} }\left(t\right)}$$
(36)
$${R}_{3}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{3}\left(t\right){\varphi }_{\overline{u} }^{T}\left(t\right)\right]{R}_{3}\left(t-1\right)$$
(37)
$${\widehat{\rho }}_{n}\left(t\right)={\widehat{\rho }}_{n}\left(t-1\right)+{K}_{4}\left(t\right)[\omega \left(t\right)-{{\varphi }_{n}}^{T}\left(t\right){\widehat{\rho }}_{n}\left(t-1\right)]$$
(38)
$${K}_{4}\left(t\right)=\frac{{R}_{4}\left(t-1\right){\varphi }_{n}\left(t\right)}{\beta +{\varphi }_{n}^{T}\left(t\right){R}_{4}\left(t-1\right){\varphi }_{n}\left(t\right)}$$
(39)
$${R}_{4}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{4}\left(t\right){\varphi }_{n}^{T}\left(t\right)\right]{R}_{4}\left(t-1\right)$$
(40)

The information vectors \({\varphi }_{z}\) and \({\varphi }_{z\overline{u}}\) contain the unknown states \({\varvec{z}}\left(t\right)\), and \({\varphi }_{n}\) contains the unknown noise terms, while Eqs. (29), (32), (35) and (38) are needed to estimate the unknown parameters. Therefore, the algorithm (29)–(40) cannot estimate the unknown parameters directly. Consequently, by substituting the corresponding estimates, we have the following relations:

$$ \begin{aligned} \hat{\rho }_{a} \left( t \right) & = \hat{\rho }_{a} \left( {t - 1} \right) + K_{1} \left( t \right)[\overline{y}\left( t \right) - \hat{\varphi }_{{z\overline{u}}}^{T} \left( t \right)\hat{\rho }_{b} \left( {t - 1} \right) - \varphi_{{\overline{u}}}^{T} \left( t \right)\hat{\rho }_{f} \left( {t - 1} \right) \\ & \quad - \hat{\varphi }_{n}^{T} \left( t \right)\hat{\rho }_{n} \left( {t - 1} \right) - \hat{\varphi }_{z}^{T} \left( t \right)\hat{\rho }_{a} \left( {t - 1} \right)] \\ \end{aligned} $$
(41)
$${K}_{1}\left(t\right)=\frac{{R}_{1}\left(t-1\right){\widehat{\varphi }}_{z}\left(t\right)}{\beta +{\widehat{\varphi }}_{z}^{T}\left(t\right){R}_{1}\left(t-1\right){\widehat{\varphi }}_{z}\left(t\right)}$$
(42)
$${R}_{1}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{1}\left(t\right){\widehat{\varphi }}_{z}^{T}\left(t\right)\right]{R}_{1}\left(t-1\right)$$
(43)
$$ \begin{aligned} \hat{\rho }_{b} \left( t \right) & = \hat{\rho }_{b} \left( {t - 1} \right) + K_{2} \left( t \right)[\overline{y}\left( t \right) - \hat{\varphi }_{z}^{T} \left( t \right)\hat{\rho }_{a} \left( {t - 1} \right) - \varphi_{{\overline{u}}}^{T} \left( t \right)\hat{\rho }_{f} \left( {t - 1} \right) \\ & \quad - \hat{\varphi }_{n}^{T} \left( t \right)\hat{\rho }_{n} \left( {t - 1} \right) - \hat{\varphi }_{{z\overline{u}}}^{T} \left( t \right)\hat{\rho }_{b} \left( {t - 1} \right)] \\ \end{aligned} $$
(44)
$${K}_{2}\left(t\right)=\frac{{R}_{2}\left(t-1\right){\widehat{\varphi }}_{z\overline{u} }\left(t\right)}{\beta +{\widehat{\varphi }}_{z\overline{u} }^{T}\left(t\right){R}_{2}\left(t-1\right){\widehat{\varphi }}_{z\overline{u} }\left(t\right)}$$
(45)
$${R}_{2}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{2}\left(t\right){\widehat{\varphi }}_{z\overline{u} }^{T}\left(t\right)\right]{R}_{2}\left(t-1\right)$$
(46)
$$ \begin{aligned} \hat{\rho }_{f} \left( t \right) & = \hat{\rho }_{f} \left( {t - 1} \right) + K_{3} \left( t \right)[\overline{y}\left( t \right) - \hat{\varphi }_{z}^{T} \left( t \right)\hat{\rho }_{a} \left( {t - 1} \right) - \hat{\varphi }_{{z\overline{u}}}^{T} \left( t \right)\hat{\rho }_{b} \left( {t - 1} \right) \\ & \quad - \hat{\varphi }_{n}^{T} \left( t \right)\hat{\rho }_{n} \left( {t - 1} \right) - \varphi_{{\overline{u}}}^{T} \left( t \right)\hat{\rho }_{f} \left( {t - 1} \right)] \\ \end{aligned} $$
(47)
$${ K}_{3}\left(t\right)=\frac{{R}_{3}\left(t-1\right){\varphi }_{\overline{u} }\left(t\right)}{\beta +{\varphi }_{\overline{u} }^{T}\left(t\right){R}_{3}\left(t-1\right){\varphi }_{\overline{u} }\left(t\right)}$$
(48)
$${R}_{3}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{3}\left(t\right){\varphi }_{\overline{u} }^{T}\left(t\right)\right]{R}_{3}\left(t-1\right)$$
(49)
$${\widehat{\rho }}_{n}\left(t\right)={\widehat{\rho }}_{n}\left(t-1\right)+{K}_{4}\left(t\right)[\widehat{\omega }\left(t\right)-{\widehat{\varphi }}_{n}^{T}\left(t\right){\widehat{\rho }}_{n}\left(t-1\right)]$$
(50)
$${K}_{4}\left(t\right)=\frac{{R}_{4}\left(t-1\right){\widehat{\varphi }}_{n}\left(t\right)}{\beta +{\widehat{\varphi }}_{n}^{T}\left(t\right){R}_{4}\left(t-1\right){\widehat{\varphi }}_{n}\left(t\right)}$$
(51)
$${R}_{4}\left(t\right)=\frac{1}{\beta }\left[I-{K}_{4}\left(t\right){\widehat{\varphi }}_{n}^{T}\left(t\right)\right]{R}_{4}\left(t-1\right)$$
(52)
$${\widehat{\varphi }}_{z}\left(t\right)={\left[-{\widehat{z}}_{1}\left(t-1\right),-{\widehat{z}}_{1}\left(t-2\right) ,\dots ,-{\widehat{z}}_{1}\left(t-n\right)\right]}^{T} \in {\mathbb{R}}^{n}$$
(53)
$${\widehat{\varphi }}_{z\overline{u}}\left(t\right)=[{\widehat{{\varvec{z}}}}^{T}\left(t-1\right)\overline{u }\left(t-1\right),{\widehat{{\varvec{z}}}}^{T}\left(t-2\right)\overline{u }\left(t-2\right),\dots { ,\widehat{{\varvec{z}}}}^{T}\left(t-n\right)\overline{u }\left(t-n\right){]}^{T}\in {\mathbb{R}}^{{n}^{2}}$$
(54)
$${\varphi }_{\overline{u}}\left(t\right)={\left[\overline{u }\left(t-1\right),\overline{u }\left(t-2\right) ,\dots , \overline{u }\left(t-n\right)\right]}^{T}\in {\mathbb{R}}^{n}$$
(55)
$$ \begin{aligned} \hat{\varphi }_{n} \left( t \right) & = [ - \hat{\omega }\left( {t - 1} \right), - \hat{\omega }\left( {t - 2} \right), \ldots , \\ & \quad - \hat{\omega }\left( {t - m} \right),\hat{v}\left( {t - 1} \right),\hat{v}\left( {t - 2} \right), \ldots ,\hat{v}\left( {t - p} \right)]^{T} \in {\mathbb{R}}^{m + p} \\ \end{aligned} $$
(56)
$$ \begin{aligned} \hat{\varvec{z}}\left( {t + 1} \right) & = \hat{A}\hat{\varvec{z}}\left( t \right) + \hat{B}\hat{\varvec{z}}\left( t \right)\overline{u}\left( t \right) + \hat{\rho }_{f} \overline{u}\left( t \right) \\ & \quad + K_{z} \left( t \right)\left[ { \overline{y}\left( t \right) - h\hat{\varvec{z}}\left( t \right) - \hat{\varphi }_{n}^{T} \left( t \right)\hat{\rho }_{n} } \right] \\ \end{aligned} $$
(57)
$$ \begin{aligned} K_{z} \left( t \right) & = \hat{A}R_{z} \left( t \right)h^{T} \left[ {hR_{z} \left( t \right)h^{T} + R_{v} } \right]^{ - 1} + \hat{B}\overline{u}\left( t \right)R_{z} \left( t \right) \\ & \quad \times h^{T} \left[ {hR_{z} \left( t \right)h^{T} + R_{v} } \right]^{ - 1} \\ \end{aligned} $$
(58)
$$ \begin{aligned} R_{z} \left( {t + 1} \right) & = \left[ {\hat{A} - K_{z} \left( t \right)h + \hat{B}\overline{u}\left( t \right)} \right]R_{z} \left( t \right) \\ & \quad \left[ {\hat{A}^{T} - h^{T} K_{z}^{T} \left( t \right) + \hat{B}^{T} \overline{u}\left( t \right)} \right] \\ & \quad + K_{z} \left( t \right)R_{v} K_{z}^{T} \left( t \right) \\ \end{aligned} $$
(59)
$$\widehat{A}\left(t\right)=\left[\begin{array}{ccccc} -{\widehat{a}}_{1}\left(t\right) & 1 & 0 & \cdots & 0 \\ -{\widehat{a}}_{2}\left(t\right) & 0 & 1 & \ddots & 0 \\ \vdots & \vdots & \ddots & \ddots & 0 \\ -{\widehat{a}}_{n-1}\left(t\right) & 0 & \cdots & 0 & 1 \\ -{\widehat{a}}_{n}\left(t\right) & 0 & \cdots & 0 & 0 \end{array}\right]$$
(60)
$$ \hat{B}\left( t \right): = \left[ {\begin{array}{c} {\hat{\varvec{b}}_{1} \left( t \right)} \\ {\hat{\varvec{b}}_{2} \left( t \right)} \\ \vdots \\ {\hat{\varvec{b}}_{{\varvec{n}}} \left( t \right)} \\ \end{array} } \right],\quad \hat{\varvec{b}}_{{\varvec{i}}} \left( t \right) \in {\mathbb{R}}^{1 \times n} ,\quad \hat{f}\left( t \right): = \left[ {\begin{array}{c} {\hat{f}_{1} \left( t \right)} \\ {\hat{f}_{2} \left( t \right)} \\ \vdots \\ {\hat{f}_{n} \left( t \right)} \\ \end{array} } \right] $$
(61)

Equations (41)–(61) constitute the four-stage recursive least squares (4S-RLS) algorithm for the bilinear system (1) and (2). In summary, the steps of the algorithm are given in Algorithm 1.

Algorithm 1 The four-stage recursive least squares (4S-RLS) algorithm
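A compact sketch of one time step of the parameter-update part of Algorithm 1 is given below; the observer supplies the \(\widehat{\varphi}\) subvectors, and \(\widehat{\omega}(t)\) comes from (27). The function name, the uniform use of the \((t-1)\) estimates in the cross-terms and the value \(\beta = 0.88\) (taken from Sect. 6.1) are assumptions of the sketch.

```python
import numpy as np

def fs_rls_step(est, phi_z, phi_zu, phi_u, phi_n, y, w_hat, beta=0.88):
    """One time step of the four decomposed RLS updates (41)-(52).

    est holds the pairs (rho_a, R1), (rho_b, R2), (rho_f, R3), (rho_n, R4);
    phi_* are the estimated information subvectors; w_hat is (27).
    """
    (ra, R1), (rb, R2), (rf, R3), (rn, R4) = est
    ra0, rb0, rf0, rn0 = ra, rb, rf, rn            # rho(t-1) snapshots

    def sub_update(rho, R, phi, target):
        K = R @ phi / (beta + phi @ R @ phi)       # subsystem gain
        rho = rho + K * (target - phi @ rho)       # parameter update
        R = (np.eye(len(phi)) - np.outer(K, phi)) @ R / beta
        return rho, R

    # each stage regresses on the residual left by the other three stages
    ra, R1 = sub_update(ra0, R1, phi_z,
                        y - phi_zu @ rb0 - phi_u @ rf0 - phi_n @ rn0)
    rb, R2 = sub_update(rb0, R2, phi_zu,
                        y - phi_z @ ra0 - phi_u @ rf0 - phi_n @ rn0)
    rf, R3 = sub_update(rf0, R3, phi_u,
                        y - phi_z @ ra0 - phi_zu @ rb0 - phi_n @ rn0)
    rn, R4 = sub_update(rn0, R4, phi_n, w_hat)
    return (ra, R1), (rb, R2), (rf, R3), (rn, R4)
```

Note that each of the four covariance matrices has only the dimension of its own subvector; this is the source of the flop savings quantified in Sect. 4.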

3.2 Convergence analysis

Assumption 1

Assume that \(\{v\left(t\right)\}\) is white noise with zero mean and a bounded variance \({\sigma }^{2}\):

$$E\left[v\left(t\right)\right]=0$$
(62)
$$E\left[{v}^{2}\left(t\right)\right]={\sigma }^{2}<\infty $$
(63)

Lemma 1

For the four-stage recursive least squares algorithm in (41)–(61), for any c > 1, the following inequalities hold:

$$\sum_{t=1}^{\infty }\frac{{{\widehat{\varphi }}_{z}}^{T}\left(t\right){R}_{1}(t){\widehat{\varphi }}_{z}\left(t\right)}{{\left[\mathrm{ln}\left|{{R}_{1}}^{-1}\left(t\right)\right|\right]}^{c}}<\infty $$
(64)
$$\sum_{t=1}^{\infty }\frac{{{\widehat{\varphi }}_{z\overline{u}} }^{T}\left(t\right){R}_{2}(t){\widehat{\varphi }}_{z\overline{u} }\left(t\right)}{{\left[\mathrm{ln}\left|{{R}_{2}}^{-1}\left(t\right)\right|\right]}^{c}}<\infty $$
(65)
$$\sum_{t=1}^{\infty }\frac{{{\widehat{\varphi }}_{\overline{u}} }^{T}\left(t\right){R}_{3}(t){\widehat{\varphi }}_{\overline{u} }\left(t\right)}{{\left[\mathrm{ln}\left|{{R}_{3}}^{-1}\left(t\right)\right|\right]}^{c}}<\infty $$
(66)
$$\sum_{t=1}^{\infty }\frac{{{\widehat{\varphi }}_{n}}^{T}\left(t\right){R}_{4}(t){\widehat{\varphi }}_{n}\left(t\right)}{{\left[\mathrm{ln}\left|{{R}_{4}}^{-1}\left(t\right)\right|\right]}^{c}}<\infty $$
(67)

The proof of Lemma 1 can be found in [49].

Theorem 1

For the system in (1)–(2) and the four-stage recursive least squares algorithm in (41)–(61), let

$$ \begin{aligned} M\left( t \right) & : = \left[ {\ln \left| {R_{1}^{ - 1} \left( t \right)} \right|} \right]^{c} + \left[ {\ln \left| {R_{2}^{ - 1} \left( t \right)} \right|} \right]^{c} \\ & \quad + \left[ {\ln \left| {R_{3}^{ - 1} \left( t \right)} \right|} \right]^{c} + \left[ {\ln \left| {R_{4}^{ - 1} \left( t \right)} \right|} \right]^{c} \\ \end{aligned} $$
(68)

Assume that conditions (62) and (63) hold. Then, for any c > 1, we have

$$ \begin{aligned} \left\| {\hat{\rho }_{a} \left( t \right) - \rho_{a} } \right\|^{2} & = {{O}}\left( {\frac{M\left( t \right)}{{\lambda_{\min } \left[ {R_{1}^{ - 1} \left( t \right)} \right]}}} \right), \\ \left\| {\hat{\rho }_{b} \left( t \right) - \rho_{b} } \right\|^{2} & = {{O}}\left( {\frac{M\left( t \right)}{{\lambda_{\min } \left[ {R_{2}^{ - 1} \left( t \right)} \right]}}} \right) \\ \end{aligned} $$
(69)
$$ \begin{aligned} \left\| {\hat{\rho }_{f} \left( t \right) - \rho_{f} } \right\|^{2} & = O\left( {\frac{M\left( t \right)}{{\lambda_{\min } \left[ {R_{3}^{ - 1} \left( t \right)} \right]}}} \right), \\ \left\| {\hat{\rho }_{n} \left( t \right) - \rho_{n} } \right\|^{2} & = O\left( {\frac{M\left( t \right)}{{\lambda_{\min } \left[ {R_{4}^{ - 1} \left( t \right)} \right]}}} \right) \\ \end{aligned} $$
(70)

Proof

See the detailed proof in [50]. □

Theorem 2

For the identification model in Eq. (5) and the four-stage recursive least squares algorithm in Eqs. (41)–(61), assume that there exist positive constants \({\alpha }_{1}\), \({\alpha }_{2}\), \({\alpha }_{3}\), \({\alpha }_{4}\), \({\beta }_{1}\), \({\beta }_{2}\), \({\beta }_{3}\), \({\beta }_{4}\) such that, for sufficiently large \(t\), the following persistent excitation conditions hold:

$${\alpha }_{1}{I}_{{n}_{2}}\le \frac{1}{t}\sum_{i=1}^{t}{\widehat{\varphi }}_{z}\left(i\right){{\widehat{\varphi }}_{z}}^{T}\left(i\right)\le {\beta }_{1}{I}_{{n}_{2}}$$
(71)
$${\alpha }_{2}{I}_{{n}_{3}}\le \frac{1}{t}\sum_{i=1}^{t}{\widehat{\varphi }}_{z\overline{u} }\left(i\right){{\widehat{\varphi }}_{z\overline{u}} }^{T}\left(i\right)\le {\beta }_{2}{I}_{{n}_{3}}$$
(72)
$${\alpha }_{3}{I}_{{n}_{2}}\le \frac{1}{t}\sum_{i=1}^{t}{\widehat{\varphi }}_{\overline{u} }\left(i\right){{\widehat{\varphi }}_{\overline{u}} }^{T}\left(i\right)\le {\beta }_{3}{I}_{{n}_{2}}$$
(73)
$${\alpha }_{4}{I}_{{n}_{4}}\le \frac{1}{t}\sum_{i=1}^{t}{\widehat{\varphi }}_{n}\left(i\right){{\widehat{\varphi }}_{n}}^{T}\left(i\right)\le {\beta }_{4}{I}_{{n}_{4}}$$
(74)

Then, the four-stage recursive least squares parameter estimation errors converge to zero as \(t\) goes to infinity:

$$ \left\| {\hat{\rho }_{a} \left( t \right) - \rho_{a} } \right\|^{2} \to 0,\,\,\left\| {\hat{\rho }_{b} \left( t \right) - \rho_{b} } \right\|^{2} \to 0 $$
(75)
$$ \left\| {\hat{\rho }_{f} \left( t \right) - \rho_{f} } \right\|^{2} \to 0,\,\,\left\| {\hat{\rho }_{n} \left( t \right) - \rho_{n} } \right\|^{2} \to 0 $$
(76)

The proof is given in the Appendix.
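On recorded data, the persistent excitation conditions (71)–(74) can be checked empirically by bounding the eigenvalues of the sample information matrices; a short sketch (the function name is an assumption):

```python
import numpy as np

def pe_bounds(phis):
    """Min/max eigenvalues of (1/t) * sum_i phi(i) phi(i)^T, i.e.,
    empirical candidates for the constants alpha_k and beta_k."""
    t = len(phis)
    G = sum(np.outer(p, p) for p in phis) / t
    eig = np.linalg.eigvalsh(G)
    return eig[0], eig[-1]   # condition holds if eig[0] stays away from 0
```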

4 The computational efficiency

Counting flops is a useful way to determine computational efficiency [51]. Here, a flop is a single operation of addition, multiplication, subtraction or division. In general, a division is counted as a multiplication and a subtraction as an addition; therefore, an algorithm's cost can be expressed in terms of additions and multiplications. The numbers of multiplications and additions of the proposed algorithms are listed in Tables 1 and 2. To show the computational efficiency of the 4S-RLS algorithm, an RLS algorithm is included for comparison. Tables 3 and 4 show that the computational load of the proposed algorithm is lower than that of the RLS algorithm.

Table 1 Computational efficiency of the RLS algorithm
Table 2 Computational efficiency of the 4S-RLS algorithm
Table 3 Number of additions and multiplications of the algorithms
Table 4 Comparison of total flops of the algorithms with \({n}_{1}=8\), \({n}_{2}=2\), \({n}_{3}=4\), \({n}_{4}=2\)

The flop difference between the RLS algorithm (\(N_2\)) and the 4S-RLS algorithm (\(N_1\)) is as follows:

$$ \begin{aligned} N_{2} - N_{1} & = 6\left( {2n_{2} + n_{3} + n_{4} } \right)^{2} + 6\left( {2n_{2} + n_{3} + n_{4} } \right) \\ & \quad - \left[ {6\left( {2n_{2}^{2} + n_{3}^{2} + n_{4}^{2} } \right) + 8\left( {n_{2}^{2} + 2n_{2} } \right) + 4\left( {2n_{2} + n_{3} + n_{4} } \right)} \right] \\ & = 4n_{2}^{2} + 24n_{2} n_{3} + 24n_{2} n_{4} + 12n_{3} n_{4} - 12n_{2} + 2n_{3} + 2n_{4} > 0 \\ \end{aligned} $$

Therefore, \({N}_{1}<{N}_{2}\), which means that the 4S-RLS algorithm is more efficient than the RLS algorithm.
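Reading \(N_2\) (RLS) and \(N_1\) (4S-RLS) off the difference above, the counts are easy to reproduce; a small sketch, evaluated at the parameter sizes used in Table 4:

```python
def flops_rls(n2, n3, n4):
    """N2: flops of the full RLS recursion with n0 = 2*n2 + n3 + n4."""
    n0 = 2 * n2 + n3 + n4
    return 6 * n0 ** 2 + 6 * n0

def flops_4s_rls(n2, n3, n4):
    """N1: flops of the four-stage RLS recursion."""
    return (6 * (2 * n2 ** 2 + n3 ** 2 + n4 ** 2)
            + 8 * (n2 ** 2 + 2 * n2)
            + 4 * (2 * n2 + n3 + n4))

# Table 4 setting: n2 = 2, n3 = 4, n4 = 2
print(flops_rls(2, 4, 2), flops_4s_rls(2, 4, 2))   # 660 vs 272 flops
```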

5 Four-stage stochastic gradient algorithm

In this section, a four-stage stochastic gradient algorithm is presented to estimate the unknown parameters and reduce the computational burden.

The second-order criterion function is considered as follows:

$$J\left(\rho \right)=\frac{1}{2}{\left[\overline{y }\left(t\right)-{\varphi }^{T}\left(t\right)\rho \right]}^{2}$$

By computing the gradient of \(J\), we have

$$\nabla \left[J\left(\rho \right)\right]=\frac{\partial \left(J\left(\rho \right)\right)}{\partial \rho }=-\varphi (t)\left[\overline{y }\left(t\right)-{\varphi }^{T}\left(t\right)\rho \right]$$

According to the gradient search principle, minimizing the objective function with step size \(\frac{1}{\mu (t)}\) yields the stochastic gradient algorithm:

$$ \begin{aligned} \hat{\rho }\left( t \right) & = \hat{\rho }\left( {t - 1} \right) + \frac{\varphi \left( t \right)}{{\mu \left( t \right)}}\left[ {\overline{y}\left( t \right) - \varphi^{T} \left( t \right)\hat{\rho }\left( {t - 1} \right) } \right] \\ \mu \left( t \right) & = \alpha \mu \left( {t - 1} \right) + \left\| {\varphi \left( t \right)} \right\|^{2} ,\,\,\mu \left( 0 \right) = 1 \\ \end{aligned} $$

where \(0\le \alpha <1\) is a forgetting factor that can improve the accuracy of parameter estimation.
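As a quick illustration, the recursion above is a one-line update per sample; a sketch (the forgetting-factor value is taken from the simulation settings of Sect. 6.1):

```python
import numpy as np

def sg_step(rho_hat, mu, phi, y, alpha=0.998):
    """One stochastic gradient update with forgetting factor alpha;
    initialize with mu = 1, per mu(0) = 1."""
    mu = alpha * mu + phi @ phi                    # mu(t) recursion
    rho_hat = rho_hat + (phi / mu) * (y - phi @ rho_hat)
    return rho_hat, mu
```

Hence, the four-stage stochastic gradient algorithm is obtained as follows: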

$${\widehat{\rho }}_{a}\left(t\right) ={\widehat{\rho }}_{a}\left(t-1\right)+\frac{{\widehat{\varphi }}_{z}\left(t\right)}{{\mu }_{1}\left(t\right)}[\overline{y }\left(t\right)-{\widehat{\varphi }}_{z\overline{u}}^{T}\left(t\right){\widehat{\rho }}_{b}\left(t-1\right)-{\varphi }_{\overline{u}}^{T}\left(t\right){\widehat{\rho }}_{f}\left(t-1\right)-{\widehat{\varphi }}_{n}^{T}\left(t\right){\widehat{\rho }}_{n}\left(t-1\right)-{\widehat{\varphi }}_{z}^{T}\left(t\right){\widehat{\rho }}_{a}\left(t-1\right)]$$
(77)
$${\mu }_{1}\left(t\right)=\alpha {\mu }_{1}\left(t-1\right)+{\Vert {\widehat{\varphi }}_{z}(t)\Vert }^{2}$$
(78)
$${\widehat{\rho }}_{b}\left(t\right)={\widehat{\rho }}_{b}\left(t-1\right)+\frac{{\widehat{\varphi }}_{z\overline{u}}\left(t\right)}{{\mu }_{2}\left(t\right)}[\overline{y }\left(t\right)-{\widehat{\varphi }}_{z}^{T}\left(t\right){\widehat{\rho }}_{a}\left(t-1\right)-{\varphi }_{\overline{u}}^{T}\left(t\right){\widehat{\rho }}_{f}\left(t-1\right)-{\widehat{\varphi }}_{n}^{T}\left(t\right){\widehat{\rho }}_{n}\left(t-1\right)-{\widehat{\varphi }}_{z\overline{u}}^{T}\left(t\right){\widehat{\rho }}_{b}\left(t-1\right)]$$
(79)
$${\mu }_{2}\left(t\right)=\alpha {\mu }_{2}\left(t-1\right)+{\Vert {\widehat{\varphi }}_{z\overline{u} }(t)\Vert }^{2}$$
(80)
$${\widehat{\rho }}_{f}\left(t\right)={\widehat{\rho }}_{f}\left(t-1\right)+\frac{{\varphi }_{\overline{u}}\left(t\right)}{{\mu }_{3}\left(t\right)}[\overline{y}\left(t\right)-{\widehat{\varphi }}_{z}^{T}\left(t\right){\widehat{\rho }}_{a}\left(t-1\right)-{\widehat{\varphi }}_{z\overline{u}}^{T}\left(t\right){\widehat{\rho }}_{b}\left(t-1\right)-{\widehat{\varphi }}_{n}^{T}\left(t\right){\widehat{\rho }}_{n}\left(t-1\right)-{\varphi }_{\overline{u}}^{T}\left(t\right){\widehat{\rho }}_{f}\left(t-1\right)]$$
(81)
$${\mu }_{3}\left(t\right)=\alpha {\mu }_{3}\left(t-1\right)+{\Vert {\varphi }_{\overline{u} }(t)\Vert }^{2}$$
(82)
$${\widehat{\rho }}_{n}\left(t\right)={\widehat{\rho }}_{n}\left(t-1\right)+\frac{{\widehat{\varphi }}_{n}\left(t\right)}{{\mu }_{4}\left(t\right)}[\widehat{\omega }\left(t\right)- {\widehat{\varphi }}_{n}^{T}\left(t\right){\widehat{\rho }}_{n}\left(t-1\right)]$$
(83)
$${\mu }_{4}\left(t\right)=\alpha {\mu }_{4}\left(t-1\right)+{\Vert {\widehat{\varphi }}_{n}(t)\Vert }^{2}$$
(84)
$${\widehat{\varphi }}_{z}\left(t\right)={\left[-{\widehat{z}}_{1}\left(t-1\right),-{\widehat{z}}_{1}\left(t-2\right),\dots ,-{\widehat{z}}_{1}\left(t-n\right)\right]}^{T}$$
(85)
$${\widehat{\varphi }}_{z\overline{u}}\left(t\right)=[{\widehat{{\varvec{z}}}}^{T}\left(t-1\right)\overline{u }\left(t-1\right),{\widehat{{\varvec{z}}}}^{T}\left(t-2\right)\overline{u }\left(t-2\right),\dots { ,\widehat{{\varvec{z}}}}^{T}\left(t-n\right)\overline{u }\left(t-n\right){]}^{T}\in {\mathbb{R}}^{{n}^{2}}$$
(86)
$${\varphi }_{\overline{u}}\left(t\right)={\left[\overline{u }\left(t-1\right),\overline{u }\left(t-2\right) ,\dots , \overline{u }\left(t-n\right)\right]}^{T}\in {\mathbb{R}}^{n}$$
(87)
$${\widehat{\varphi }}_{n}\left(t\right)=[-\widehat{\omega }\left(t-1\right),-\widehat{\omega }\left(t-2\right),\dots ,-\widehat{\omega }\left(t-m\right),\widehat{v}\left(t-1\right),\widehat{v}\left(t-2\right),\dots ,\widehat{v}\left(t-p\right){]}^{T}\in {\mathbb{R}}^{m+p}$$
(88)
$$\widehat{{\varvec{z}}}\left(t+1\right)=\widehat{A}\widehat{{\varvec{z}}}\left(t\right)+\widehat{B}\widehat{{\varvec{z}}}\left(t\right)\overline{u }\left(t\right)+{\widehat{\rho }}_{f}\overline{u }\left(t\right)+{K}_{z}\left(t\right)\left[\overline{y }\left(t\right)-h\widehat{{\varvec{z}}}\left(t\right)-{{\widehat{\varphi }}_{n}}^{T}\left(t\right){\widehat{\rho }}_{n}\right]$$
(89)
$${K}_{z}\left(t\right)=\widehat{A}{R}_{z}\left(t\right){h}^{T}{\left[h{R}_{z}\left(t\right){h}^{T}+{R}_{v}\right]}^{-1}+\widehat{B}\overline{u }\left(t\right){R}_{z}\left(t\right){h}^{T}{\left[h{R}_{z}\left(t\right){h}^{T}+{R}_{v}\right]}^{-1}$$
(90)
$${R}_{z}\left(t+1\right)=\left[\widehat{A}-{K}_{z}\left(t\right)h+\widehat{B}\overline{u }\left(t\right)\right]{R}_{z}\left(t\right) [{\widehat{A}}^{T}-{h}^{T}{{K}_{z}}^{T}\left(t\right)+{\widehat{B}}^{T}\overline{u }(t)]+{K}_{z}\left(t\right){R}_{v}{{K}_{z}}^{T}\left(t\right)$$
(91)
$$\widehat{A}\left(t\right)=\left[\begin{array}{ccccc} -{\widehat{a}}_{1}\left(t\right) & 1 & 0 & \cdots & 0 \\ -{\widehat{a}}_{2}\left(t\right) & 0 & 1 & \ddots & 0 \\ \vdots & \vdots & \ddots & \ddots & 0 \\ -{\widehat{a}}_{n-1}\left(t\right) & 0 & \cdots & 0 & 1 \\ -{\widehat{a}}_{n}\left(t\right) & 0 & \cdots & 0 & 0 \end{array}\right]$$
(92)
$$ \hat{B}\left( t \right): = \left[ {\begin{array}{c} {\hat{\varvec{b}}_{1} \left( t \right)} \\ {\hat{\varvec{b}}_{2} \left( t \right)} \\ \vdots \\ {\hat{\varvec{b}}_{{\varvec{n}}} \left( t \right)} \\ \end{array} } \right],\quad \hat{f}\left( t \right): = \left[ {\begin{array}{c} {\hat{f}_{1} \left( t \right)} \\ {\hat{f}_{2} \left( t \right)} \\ \vdots \\ {\hat{f}_{n} \left( t \right)} \\ \end{array} } \right] $$
(93)

In summary, the steps of the proposed algorithm are presented in Algorithm 2.

Algorithm 2 The four-stage stochastic gradient (4S-SG) algorithm
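Mirroring the 4S-RLS sketch, one time step of the parameter updates (77)–(84) can be written as follows; the same assumptions apply (uniform \((t-1)\) cross-terms, illustrative names, \(\alpha = 0.998\) from Sect. 6.1).

```python
import numpy as np

def fs_sg_step(est, phi_z, phi_zu, phi_u, phi_n, y, w_hat, alpha=0.998):
    """One time step of the four-stage stochastic gradient algorithm;
    est holds the pairs (rho_a, mu1), ..., (rho_n, mu4)."""
    (ra, m1), (rb, m2), (rf, m3), (rn, m4) = est
    ra0, rb0, rf0, rn0 = ra, rb, rf, rn            # rho(t-1) snapshots

    def sub_update(rho, mu, phi, target):
        mu = alpha * mu + phi @ phi                # step-size recursion
        return rho + (phi / mu) * (target - phi @ rho), mu

    ra, m1 = sub_update(ra0, m1, phi_z,
                        y - phi_zu @ rb0 - phi_u @ rf0 - phi_n @ rn0)
    rb, m2 = sub_update(rb0, m2, phi_zu,
                        y - phi_z @ ra0 - phi_u @ rf0 - phi_n @ rn0)
    rf, m3 = sub_update(rf0, m3, phi_u,
                        y - phi_z @ ra0 - phi_zu @ rb0 - phi_n @ rn0)
    rn, m4 = sub_update(rn0, m4, phi_n, w_hat)
    return (ra, m1), (rb, m2), (rf, m3), (rn, m4)
```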

6 Simulation results

In order to show the efficiency of the proposed methods, two examples are provided: a numerical bilinear state-space system to be identified, and a practical process represented by a bilinear state-space model.

6.1 Numerical example

Consider a bilinear state-space system as follows:

$$ \begin{aligned} {\varvec{z}}\left( {t + 1} \right) & = \left[ {\begin{array}{*{20}c} { - 0.20} & 1 \\ {0.25} & 0 \\ \end{array} } \right]{\varvec{z}}\left( t \right) \\ & \quad + { }\left[ {\begin{array}{*{20}c} {0.08} & {0.17} \\ { - 0.12} & { - 0.2} \\ \end{array} } \right]{\varvec{z}}\left( t \right)\overline{u}\left( t \right) \\ & \quad + \left[ {\begin{array}{*{20}c} {0.4} \\ 2 \\ \end{array} } \right]\overline{u}\left( t \right) \\ \overline{y}\left( t \right) & = \left[ {1,0} \right]{\varvec{z}}\left( t \right) - c\omega \left( {t - 1} \right) + dv\left( {t - 1} \right) + v\left( t \right) \\ \end{aligned} $$

The parameter vector for identification is

$$\rho ={\left[{a}_{1},{a}_{2},{b}_{11},{b}_{12},{b}_{21},{b}_{22},{f}_{1},{f}_{2},c,d\right]}^{T}$$
$$\rho ={\left[0.20 ,-0.25 , 0.08 , 0.17,-0.12 ,-0.2 , 0.40 , 2 ,-0.3 , 1\right]}^{T}$$

For the simulation studies, consider a persistently exciting input signal \(u(t)\); \(v(t)\) is white noise with zero mean and variance \({\sigma }^{2}={0.10}^{2}\), \(\beta =0.88\), \(\alpha =0.998\), and the data length is L = 3000. The parameter estimation error is calculated as \(\delta = \left\| {\hat{\rho }\left( t \right) - \rho } \right\|/\left\| \rho \right\|\). The parameter estimates and the error obtained with the four-stage recursive least squares algorithm are presented in Fig. 2 and Table 5. In addition, the results of the four-stage stochastic gradient algorithm are shown in Fig. 3 and Table 6.
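The error index and the true parameter vector translate directly to code; a sketch of the scoring used for Tables 5 and 6 (the simulation driver itself can follow the earlier 4S-RLS and 4S-SG sketches):

```python
import numpy as np

rho_true = np.array([0.20, -0.25, 0.08, 0.17, -0.12, -0.20,
                     0.40, 2.0, -0.3, 1.0])

def estimation_error(rho_hat):
    """delta = ||rho_hat(t) - rho|| / ||rho||, as defined above."""
    return np.linalg.norm(rho_hat - rho_true) / np.linalg.norm(rho_true)
```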

Fig. 2 4S-RLS estimation errors

Table 5 4S-RLS estimates and errors (\({\sigma }^{2}={0.10}^{2}\))
Fig. 3 4S-SG estimation errors

Table 6 4S-SG estimates and errors (\({\sigma }^{2}={0.10}^{2}\))

From the simulation results presented in the figures and tables, the following conclusions can be drawn. The proposed algorithms are effective, and the parameters estimated by them converge to their true values. Figures 2 and 3 show that the estimation error decreases at a suitable rate, and the results in Tables 5 and 6 confirm the convergence of the parameters to their true values. The 4S-RLS algorithm provides more effective parameter estimation than the 4S-SG algorithm: for the given data length and noise variance, it has a smaller estimation error and a higher parameter estimation accuracy. The system states and their estimates are shown in Figs. 4 and 5, respectively. The estimated states correspond well to the actual system states, indicating that the bilinear state observer is effective.

Fig. 4 States z(1) and z(2) and their estimates using the 4S-RLS algorithm

Fig. 5 States z(1) and z(2) and their estimates using the 4S-SG algorithm

6.2 Practical example

The pH neutralization process is used as a case study to demonstrate the effectiveness of the proposed methods. For this process, the input/output data are gathered using the GMN test signal [32]. In this highly nonlinear process, the acid flow (HNO3), base flow (NaOH) and buffer flow (NaHCO3) are the inputs to the system, denoted u1, u2 and u3, and the pH level is the system output. In this process, the acid flow rate and the tank capacity can be assumed constant. The structure of the pH neutralization process is shown in Fig. 6.

Fig. 6 Structure of the pH neutralization process [17]

In this example, it is assumed that only input/output data are available, and the proposed methods are used to identify the system with a bilinear state-space model. The 4S-RLS and 4S-SG algorithms are used to estimate the parameter vector \(\widehat{\rho }\left(t\right)\) with data length t = N = 1280 and variance \({\sigma }^{2}={0.10}^{2}\). To confirm the results, the estimated and actual outputs for the test dataset are shown in Figs. 7 and 8. The estimation error is calculated as \(e :=\frac{\Vert \widehat{y}\left(t\right)-y\Vert }{\Vert y\Vert }\times 100\). According to the simulation results, the error is 3.9734% for the 4S-RLS algorithm and 5.5564% for the 4S-SG algorithm.
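For completeness, the output-error score used here is simply the relative output norm in percent; a sketch:

```python
import numpy as np

def output_fit_error(y_hat, y):
    """e = ||y_hat - y|| / ||y|| * 100 on the test dataset."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return 100.0 * np.linalg.norm(y_hat - y) / np.linalg.norm(y)
```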

Fig. 7 Estimated output and real output of the pH neutralization process using the 4S-RLS algorithm

Fig. 8 Estimated output and real output of the pH neutralization process using the 4S-SG algorithm

7 Conclusion

In this paper, the parameter identification of bilinear state-space systems with colored noise expressed by an ARMA model was investigated. The proposed methods are based on the hierarchical principle. Since the system states are needed for identification while only the input/output data are available, a bilinear state observer was designed to estimate the states. Using the hierarchical identification principle and the gradient search, a four-stage recursive least squares algorithm and a four-stage stochastic gradient algorithm were presented to reduce the computational burden. The simulation results demonstrated that the 4S-SG algorithm is efficient for identifying bilinear systems, and that the 4S-RLS algorithm outperforms the 4S-SG algorithm, with a smaller estimation error. In addition, the accuracy of the proposed methods increases with the data length, for different noise variances.