1 Introduction

Advanced process control strategies are often based on system models, and system identification hence accounts for a significant proportion of applications in modern control theory and practice [1,2,3]. The basic idea is to compare the system response with that of the identification model in some optimal sense, so as to quantify how closely the model response fits the system response [4,5,6]. In recent years, nonlinear processes have attracted a lot of attention from both the control and the signal processing points of view and have been increasingly deployed [7,8,9]. When modeling nonlinear processes from observed data, one of the main difficulties is managing the trade-off between model flexibility and the risk of overfitting [10, 11].

In engineering practice, nonlinear systems with multi-rate sampling are ubiquitous [12]. Traditional discrete-time systems, in which all variables share the same sampling rate, are generally referred to as single-rate sampled-data systems. However, because of practical difficulties such as hardware restrictions and the instability of communication networks, it is often impossible to acquire all signals involved in the process at the same sampling rate [13]. On the other hand, in the discretization of control systems, the intentional multi-rate sampling of different signals can in fact more accurately capture their potential characteristics and better predict the unknown underlying dynamics. In other words, the application of different sampling rates is desirable and even inevitable in practice [14]. As a result, research on multi-rate sampled-data systems has received considerable and increasing attention in recent years [15, 16]. Li and Zhang designed the maximum likelihood multi-innovation method to identify the Hammerstein system with dual-rate sampling [17]; Yan et al. introduced a mixed-rate model-based least squares estimator for the multi-rate discrete-time closed-loop system [18]; Chen et al. analyzed the multi-rate stabilization problem for the singularly perturbed system by using a time-dependent Lyapunov function [19]. In this paper, dual-rate nonlinear systems with bilinear forms are considered, which can be regarded as a special case of missing output systems. In the dual-rate setting, the fast inputs and slow outputs are measured at different sampling rates, with periodic sampling intervals. The objective of this paper is to reconstruct the inter-sample outputs and to estimate the unknown system parameters based on the inferential control idea.

In the field of discrete-time system identification, many different identification ideas have been proposed in the literature [20,21,22]. Among them, the iterative method is attractive as it leads to good estimates of the unknown terms, even when only a small amount of data is used for the estimation of the statistics [23,24,25,26,27,28,29]. Iterative methods are widely used in gradient-based optimization algorithms, such as the auxiliary model gradient-based iterative identification method for multi-input nonlinear output-error systems [30, 31]. Another popular tool is the particle filter, which can be used to identify nonlinear state-space systems with unknown states [32,33,34]. Based on the Monte Carlo sampling principle, the posterior probability density function of the state is represented by a set of weighted particles [35, 36]. Inspired by this, Chen et al. developed a new particle filter-based approach to identify a dual-rate nonlinear model by treating the unknown outputs as the states [37]. Later in [38], by combining the multi-innovation identification principle, the particle filter was further extended to the estimation of the unknown process outputs and parameters for nonstate-space output-error type models.

The plant model to be studied in this paper has a particular bilinear form called the bilinear-in-parameter model, which has become increasingly popular in modeling nonlinear processes [39, 40]. When the output nonlinearity of a block-oriented system can be expressed as a linear combination of a set of known basis functions of the past input/output data, the system can be transformed into the bilinear-in-parameter system. For example, the bilinear-in-parameter system can be used to describe the Hammerstein–Wiener system [41]. In this paper, we seek to propose identification algorithms for combined output and parameter estimation under low measurement rate constraints. In order to obtain the pseudo-linear regression identification models, two different techniques for tackling the special bilinear term are adopted, namely the over-parameterization technique and the parameter separation technique. Based on the two approximate discrete-time models, the corresponding particle filter-based identification methods are developed for bilinear-in-parameter models in the dual-rate framework. The original parameter estimation problem is thus decomposed into two sub-problems: i) using the particle filters to recover the missing measurements from the available observations and ii) using the negative gradient search principle [42] to perform parameter estimation based on the recovered output signals. To be more specific, a particle filter combined with the Epanechnikov kernel function is introduced to approximately estimate the missing outputs between the measured output samples. On this basis, the modified gradient-based iterative algorithm can then be employed to identify all the model parameters. Compared with the traditional gradient-based iterative (GI) approaches, the proposed methods can improve the accuracy of parameter estimation.

The rest of this paper is organized as follows. Section 2 describes the problem formulation of the dual-rate bilinear-in-parameter models and briefly derives two typical GI algorithms. Section 3 derives the improved GI algorithms by combining the particle filter. In Sect. 4, a representative numerical example is given to illustrate the effectiveness of the proposed algorithms. The paper concludes with the final remarks in Sect. 5.

2 Problem description

The notations used in this paper are fairly standard. Variables in bold represent multi-dimensional vectors or matrices. \(\hat{A}_k\) denotes the iterative estimate of A at the k-th iteration. The superscript “T” denotes the vector/matrix transpose. \(``A=:X''\) or \(``X:=A''\) reads as “A is defined as X.” The Frobenius norm of a matrix \(\varvec{A}\) is defined by \(\Vert \varvec{A}\Vert ^2:=\mathrm{tr}[\varvec{A}\varvec{A}^{{\mathrm{T}}}]\), where \(\mathrm{tr}[\cdot ]\) denotes the trace of \(\varvec{A}\varvec{A}^{{\mathrm{T}}}\). \(\mathbf{1}_n\) represents an n-dimensional column vector whose elements are all unity. In this section, we first present the bilinear-in-parameter models together with the available input–output data. Then, two techniques for handling the bilinear parts of the system are given from the perspective of parameter estimation. On this basis, the corresponding GI identification methods are derived for comparison.

2.1 The dual-rate bilinear-in-parameter system

Consider the following discrete-time output nonlinear systems:

$$\begin{aligned} y(l)=\sum ^n_{j=1}\beta _j\sum ^m_{i=1}\alpha _ig_i(y(l-j))+\sum ^{n_\gamma }_{\iota =1}\gamma _{\iota }u(l-\iota )+v(l),\nonumber \\ \end{aligned}$$
(1)

where l is the discrete-time index, \(u(l)\in {\mathbb {R}}\) is the system input, \(y(l)\in {\mathbb {R}}\) is the system output, \(\{\beta _j\}^n_{j=1}\), \(\{\alpha _i\}^m_{i=1}\) and \(\{\gamma _{\iota }\}^{n_\gamma }_{\iota =1}\) are the unknown parameters, \(v(l)\in {\mathbb {R}}\) is a stochastic white noise with zero mean and variance \(\sigma ^2\), and \(g_i(\cdot )\), \(i=1,2,\cdots ,m\) are the known nonlinear basis functions.

Define the parameter vectors \(\varvec{\alpha }\), \(\varvec{\beta }\), \(\varvec{\gamma }\), the information vector \(\varvec{\varphi }_u(l)\) and the information matrix \(\varvec{G}(l)\) as

$$\begin{aligned}&\varvec{\alpha }:=[\alpha _1,\alpha _2,\cdots ,\alpha _m]^{{\mathrm{T}}}\in {\mathbb {R}}^m,\quad \varvec{\beta }:=[\beta _1,\beta _2,\cdots ,\beta _n]^{{\mathrm{T}}}\in {\mathbb {R}}^n, \\&\varvec{\gamma }:=[\gamma _1,\gamma _2,\cdots ,\gamma _{n_\gamma }]^{{\mathrm{T}}}\in {\mathbb {R}}^{n_\gamma }, \\&\varvec{\varphi }_u(l):=[u(l-1),u(l-2),\cdots ,u(l-{n_\gamma })]^{{\mathrm{T}}}\in {\mathbb {R}}^{n_\gamma }, \\&\varvec{G}(l):= \left[ \begin{array}{cccc} g_1(y(l-1)) &{} g_1(y(l-2)) &{} \cdots &{} g_1(y(l-n)) \\ g_2(y(l-1)) &{} g_2(y(l-2)) &{} \cdots &{} g_2(y(l-n)) \\ \vdots &{} \vdots &{} &{} \vdots \\ g_m(y(l-1)) &{} g_m(y(l-2)) &{} \cdots &{} g_m(y(l-n))\end{array}\right] \in {\mathbb {R}}^{m \times n}. \end{aligned}$$

According to the above definitions, the output nonlinear system in (1) can be converted into the following bilinear-in-parameter models:

$$\begin{aligned} y(l)=\varvec{\alpha }^{{\mathrm{T}}}\varvec{G}(l)\varvec{\beta }+\varvec{\varphi }^{{\mathrm{T}}}_u(l)\varvec{\gamma }+v(l). \end{aligned}$$
(2)
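As a concrete illustration, model (2) can be simulated directly from the definitions above. The following sketch is not from the paper: the helper name and layout are illustrative, and zero initial conditions are assumed for \(y(l)\) and \(u(l)\), \(l\le 0\).

```python
import numpy as np

def simulate_bilinear(alpha, beta, gamma, basis, u, sigma, rng):
    """Simulate y(l) = alpha^T G(l) beta + phi_u^T(l) gamma + v(l) as in (2).

    basis is the list of known functions g_i; G(l)[i, j-1] = g_i(y(l-j)).
    Zero initial conditions are assumed: y(l) = 0 and u(l) = 0 for l <= 0.
    """
    m, n, ng = len(alpha), len(beta), len(gamma)
    L = len(u)
    y = np.zeros(L + n)                                  # y(l) stored at index l + n
    u_pad = np.concatenate([np.zeros(ng), np.asarray(u, dtype=float)])
    for l in range(L):
        t = l + n
        G = np.array([[g(y[t - j]) for j in range(1, n + 1)] for g in basis])
        phi_u = np.array([u_pad[l + ng - k] for k in range(1, ng + 1)])
        y[t] = alpha @ G @ beta + phi_u @ gamma + sigma * rng.standard_normal()
    return y[n:]
```

With `sigma = 0` the routine reproduces the deterministic part of (2), which is convenient for sanity checks.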

Remark 1

The bilinear-in-parameter model can describe the output nonlinear system whose output nonlinearity is represented as a linear combination of a set of known basis functions of the past output data. Likewise, if we define \(g_i(y(l-j)):=g_i(u(l-j))\), the corresponding input nonlinear system can also readily be converted into the bilinear-in-parameter model [43].

Fig. 1 The dual-rate system

The identification approaches pursued in this paper are based on the assumption that the plant output is sampled at a rate slower than the control input; that is, the inputs and outputs are sampled at two different rates, with the former being higher than the latter. Let the measurement (output) sampling period dh be a multiple of the fundamental (input) sampling period \(h>0\), for some integer \(d>1\). Then the input–output data \(\{u(lh),y(ldh)\}\) can be sampled. The detailed configuration is shown in Fig. 1, where \(H_h\) is a zero-order hold with period h and \(S_{dh}\) is a sampler with period dh. The input \(u_c(t)\) is generated by \(H_h\) processing a discrete-time signal u(lh); \(P_c\) is a continuous-time process. The output of \(P_c\) is \(y_c(t)\), which is down-sampled by the sampler \(S_{dh}\), yielding the measured output y(ldh) with period dh. For convenience, letting \(h=1\) and replacing l with ld in (2) result in

$$\begin{aligned} y(ld)=\varvec{\alpha }^{{\mathrm{T}}}\varvec{G}(ld)\varvec{\beta }+\varvec{\varphi }^{{\mathrm{T}}}_u(ld)\varvec{\gamma }+v(ld), \end{aligned}$$
(3)

where

$$\begin{aligned} \varvec{G}(ld)= & {} \left[ \begin{array}{cccc} g_1(y(ld-1)) &{} g_1(y(ld-2)) &{} \cdots &{} g_1(y(ld-n)) \\ g_2(y(ld-1)) &{} g_2(y(ld-2)) &{} \cdots &{} g_2(y(ld-n)) \\ \vdots &{} \vdots &{} &{} \vdots \\ g_m(y(ld-1)) &{} g_m(y(ld-2)) &{} \cdots &{} g_m(y(ld-n))\end{array}\right] , \end{aligned}$$
(4)
$$\begin{aligned} \varvec{\varphi }_u(ld)= & {} [u(ld-1),u(ld-2),\cdots ,u(ld-{n_\gamma })]^{{\mathrm{T}}}. \end{aligned}$$
(5)

The proposed parameter estimation algorithms in this paper are based on the identification model in (3)–(5). Many identification methods are derived from such identification models [44,45,46,47,48,49]; they can be used to estimate the parameters of other linear and nonlinear systems [50,51,52,53,54,55] and can be applied in other fields [56,57,58,59,60,61,62,63,64], such as chemical process control systems and engineering application systems. Throughout this paper, we refer to (3)–(5) as the fast-rate model. The measured input–output data are \(\{u(l), l=1,2,\cdots ,L\}\) at the fast rate and \(\{y(ld), l=1,2,\cdots \}\) at the slow rate. Thus, the inter-sample outputs (missing outputs) \(\{y(ld-i), i=1,2,\cdots ,d-1\}\) are not available. In what follows, we show how to estimate the unknown parameters by using the GI method with the dual-rate measurement data \(\{u(l),y(ld)\}\).
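Under this dual-rate scheme, only every d-th output survives. A small sketch (the function name and data layout are illustrative, not from the paper) of extracting the slow-rate measurements and flagging the missing inter-sample instants:

```python
import numpy as np

def dual_rate_measurements(y_fast, d):
    """Keep the slow-rate samples y(d), y(2d), ... from a fast-rate record.

    y_fast[l-1] holds y(l) for l = 1, ..., L; returns the measured values
    and a boolean mask marking the unmeasured inter-sample instants.
    """
    L = len(y_fast)
    slow_idx = np.arange(d - 1, L, d)     # 0-based positions of y(ld)
    missing = np.ones(L, dtype=bool)
    missing[slow_idx] = False             # y(ld) is measured; the rest are not
    return np.asarray(y_fast)[slow_idx], missing
```

The `missing` mask marks exactly the instants \(ld-i\), \(i=1,\cdots ,d-1\), whose outputs the particle filter will later reconstruct.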

2.2 The GI algorithm using the over-parameterization technique

As we can observe, \(y_0(ld):=\varvec{\alpha }^{{\mathrm{T}}}\varvec{G}(ld)\varvec{\beta }\) is the bilinear function of \(\varvec{\alpha }\) and \(\varvec{\beta }\). A common approach in statistics for estimation problems of this kind is to over-parameterize the model so that it becomes linear. For simplicity and without loss of generality, we let \(\alpha _1=1\) for the rest of the paper. This ensures the uniqueness of the identification results. Then, the bilinear part \(\varvec{\alpha }^{{\mathrm{T}}}\varvec{G}(ld)\varvec{\beta }\) can be written as follows:

$$\begin{aligned} \varvec{\alpha }^{{\mathrm{T}}}\varvec{G}(ld)\varvec{\beta }= & {} \varvec{\varphi }^{{\mathrm{T}}}_1(ld)\varvec{\rho },\quad \varvec{\varphi }_1(ld):=\text {col}[\varvec{G}^{{\mathrm{T}}}(ld)]\in {\mathbb {R}}^{mn}, \\ \varvec{\rho }:= & {} \varvec{\alpha }\otimes \varvec{\beta }\\= & {} [\alpha _1\beta _1,\alpha _1\beta _2,\cdots ,\alpha _1\beta _n,\alpha _2\beta _1,\alpha _2\beta _2,\cdots ,\\&\alpha _2\beta _n,\cdots ,\alpha _m\beta _1,\alpha _m\beta _2,\cdots ,\alpha _m\beta _n]^{{\mathrm{T}}} \\= & {} [\beta _1,\beta _2,\cdots ,\beta _n,\alpha _2\beta _1,\alpha _2\beta _2,\cdots ,\alpha _2\beta _n,\\&\cdots ,\alpha _m\beta _1,\alpha _m\beta _2,\cdots ,\alpha _m\beta _n]^{{\mathrm{T}}}\in {\mathbb {R}}^{mn}, \end{aligned}$$

where \(\otimes \) is the Kronecker product operator and the \(\text {col}[\cdot ]\) operator is one which forms a vector from a matrix by stacking its columns on top of one another.
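The identity behind this over-parameterization can be verified numerically. The following check (random data, for illustration only) confirms that \(\varvec{\alpha }^{{\mathrm{T}}}\varvec{G}\varvec{\beta }=\text {col}[\varvec{G}^{{\mathrm{T}}}]^{{\mathrm{T}}}(\varvec{\alpha }\otimes \varvec{\beta })\); note that stacking the columns of \(\varvec{G}^{{\mathrm{T}}}\) is the row-major flattening of \(\varvec{G}\).

```python
import numpy as np

# Numerical check of the over-parameterization identity used for model (6):
# alpha^T G beta = col[G^T]^T (alpha kron beta).
rng = np.random.default_rng(1)
m, n = 2, 3
alpha, beta = rng.standard_normal(m), rng.standard_normal(n)
G = rng.standard_normal((m, n))

phi1 = G.flatten()               # col[G^T] in R^{mn}: rows of G stacked
rho = np.kron(alpha, beta)       # alpha kron beta in R^{mn}
assert np.isclose(alpha @ G @ beta, phi1 @ rho)
```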

From the above derivation, the model in (3)–(5) can then be linearized as the following regression-like model:

$$\begin{aligned} y(ld)= & {} \varvec{\varphi }^{{\mathrm{T}}}_1(ld)\varvec{\rho }+\varvec{\varphi }^{{\mathrm{T}}}_u(ld)\varvec{\gamma }+v(ld)\nonumber \\= & {} \varvec{\phi }^{{\mathrm{T}}}(ld)\varvec{\theta }+v(ld), \end{aligned}$$
(6)

in which

$$\begin{aligned}&\varvec{\phi }(ld):=[\varvec{\varphi }^{{\mathrm{T}}}_1(ld),\varvec{\varphi }^{{\mathrm{T}}}_u(ld)]^{{\mathrm{T}}}\in {\mathbb {R}}^{mn+n_{\gamma }},\\&\varvec{\theta }:=[\varvec{\rho }^{{\mathrm{T}}},\varvec{\gamma }^{{\mathrm{T}}}]^{{\mathrm{T}}}\in {\mathbb {R}}^{mn+n_{\gamma }}. \end{aligned}$$

Regression-like model (6) cannot be considered as a basis for the direct estimation because it is a pseudo-linear regression equation involving the information vector \(\varvec{\phi }(ld)\) that contains the unmeasured inter-sample output data. Thus the typical GI algorithm cannot be used directly to identify the parameter vector \(\varvec{\theta }\). Here, we apply the gradient-based iterative algorithm through the over-parameterization technique to estimate \(\varvec{\theta }\), which is abbreviated as the O-GI algorithm shown in the following:

$$\begin{aligned}&\hat{\varvec{\theta }}_{k}=\hat{\varvec{\theta }}_{k-1}+\mu _{1,k}\sum ^{L}_{l=1}\hat{\varvec{\phi }}_{k}(ld)[y(ld)-\hat{\varvec{\phi }}^{{\mathrm{T}}}_{k}(ld)\hat{\varvec{\theta }}_{k-1}], \end{aligned}$$
(7)
$$\begin{aligned}&\mu _{1,k}\le 2\lambda _{\max }^{-1}\left[ \sum ^L_{l=1}\hat{\varvec{\phi }}_{k}(ld)\hat{\varvec{\phi }}^{{\mathrm{T}}}_{k}(ld)\right] , \end{aligned}$$
(8)
$$\begin{aligned}&\hat{\varvec{\phi }}_{k}(ld)=[\hat{\varvec{\varphi }}^{{\mathrm{T}}}_{1,k}(ld),\varvec{\varphi }^{{\mathrm{T}}}_{u}(ld)]^{{\mathrm{T}}}, \end{aligned}$$
(9)
$$\begin{aligned}&\hat{\varvec{\varphi }}_{1,k}(ld)=[g_1(\hat{y}_k(ld-1)),\cdots ,g_1(y(ld-d)),\cdots ,\nonumber \\&\quad g_1(\hat{y}_k(ld-n)),g_2(\hat{y}_k(ld-1)),\cdots ,g_2(y(ld-d)),\nonumber \\&\quad \cdots ,g_2(\hat{y}_k(ld-n)),\cdots ,g_m(\hat{y}_k(ld-1)),\nonumber \\&\quad \cdots ,g_m(y(ld-d)),\cdots ,g_m(\hat{y}_k(ld-n))]^{{\mathrm{T}}}, \end{aligned}$$
(10)
$$\begin{aligned}&\varvec{\varphi }_{u}(ld)=[u(ld-1),u(ld-2),\cdots ,u(ld-n_{\gamma })]^{{\mathrm{T}}}, \end{aligned}$$
(11)
$$\begin{aligned}&\hat{y}_{k}(ld-i)=\hat{\varvec{\phi }}^{{\mathrm{T}}}_{k}(ld-i)\hat{\varvec{\theta }}_{k}, \end{aligned}$$
(12)
$$\begin{aligned}&\hat{\varvec{\theta }}_{k}=[\hat{\varvec{\beta }}^{{\mathrm{T}}}_k,\widehat{\alpha _2\varvec{\beta }^{{\mathrm{T}}}}_k,\cdots ,\widehat{\alpha _m\varvec{\beta }^{{\mathrm{T}}}}_k,\hat{\varvec{\gamma }}^{{\mathrm{T}}}_k]^{{\mathrm{T}}}\nonumber \\&\quad =[\hat{\beta }_{1,k},\cdots ,\hat{\beta }_{n,k},\widehat{\alpha _2\beta _{1,k}},\cdots ,\widehat{\alpha _2\beta _{n,k}},\cdots ,\widehat{\alpha _m\beta _{1,k}},\nonumber \\&\quad \cdots ,\widehat{\alpha _m\beta _{n,k}},\hat{\varvec{\gamma }}^{{\mathrm{T}}}_k]^{{\mathrm{T}}}, \end{aligned}$$
(13)
$$\begin{aligned}&\hat{\alpha }_{i,k}=\frac{1}{n}\sum ^{n}_{j=1}\frac{\widehat{\alpha _i\beta _{j,k}}}{\hat{\beta }_{j,k}}, \quad i=2,3,\cdots ,m, \quad j=1,2,\cdots ,n, \end{aligned}$$
(14)
$$\begin{aligned}&\hat{\varvec{\vartheta }}_k=[\hat{\varvec{\beta }}^{{\mathrm{T}}}_k,\hat{\alpha }_{2,k},\hat{\alpha }_{3,k},\cdots ,\hat{\alpha }_{m,k},\hat{\varvec{\gamma }}^{{\mathrm{T}}}_k]^{{\mathrm{T}}}. \end{aligned}$$
(15)

Note that the estimate \(\hat{\varvec{\beta }}_k\) can be obtained directly from the first n terms of \(\hat{\varvec{\theta }}_{k}\) since \(\alpha _1=1\). From (13), we get n values of the parameter estimate \(\hat{\alpha }_{i,k}\) via the ratios in (14) and take their average as the estimate of \(\alpha _i\) [40].
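The recovery steps (13)–(14) can be sketched as follows: with \(\alpha _1=1\), the first n entries of \(\hat{\varvec{\theta }}_k\) give \(\hat{\varvec{\beta }}_k\), and each remaining \(\hat{\alpha }_{i,k}\) is the average of n ratios (the function name and argument layout are illustrative assumptions):

```python
import numpy as np

def recover_alpha_beta(theta_hat, m, n):
    """Extract beta and the averaged alpha estimates (14) from the first
    m*n entries of the over-parameterized estimate theta_hat."""
    beta_hat = np.asarray(theta_hat[:n])                   # alpha_1 = 1 => rho[0:n] = beta
    alpha_hat = [1.0]
    for i in range(1, m):
        prods = np.asarray(theta_hat[i * n:(i + 1) * n])   # estimates of alpha_{i+1} * beta_j
        alpha_hat.append(np.mean(prods / beta_hat))        # average of the n ratios, as in (14)
    return np.array(alpha_hat), beta_hat
```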

2.3 The GI algorithm using the parameter separation technique

Considering the possible problems caused by the product terms \(\alpha _i\beta _j\) in the above O-GI algorithm, a parameter separation technique is developed here to deal with the bilinear part. Let \(\varvec{g}_i(ld)\in {\mathbb {R}}^{1 \times n}\) be the i-th row of the information matrix \(\varvec{G}(ld)\). The details are as follows:

$$\begin{aligned} \varvec{\alpha }^{{\mathrm{T}}}\varvec{G}(ld)\varvec{\beta }= & {} g_1(y(ld-1))\beta _1+\alpha _2g_2(y(ld-1))\beta _1\\&+\cdots +\alpha _{m}g_m(y(ld-1))\beta _1 \\&+g_1(y(ld-2))\beta _2+\alpha _2g_2(y(ld-2))\beta _2\\&+\cdots +\alpha _{m}g_m(y(ld-2))\beta _2+\cdots \\&+g_1(y(ld-n))\beta _n+\alpha _2g_2(y(ld-n))\beta _n\\&+\cdots +\alpha _{m}g_m(y(ld-n))\beta _n \\= & {} [g_1(y(ld-1)),g_1(y(ld-2)),\cdots ,\\&g_1(y(ld-n))]\varvec{\beta }+[g_2(y(ld-1))\beta _1,\\&g_2(y(ld-2))\beta _2,\cdots ,g_2(y(ld-n))\beta _n]\alpha _2 \\&+[g_3(y(ld-1))\beta _1,g_3(y(ld-2))\beta _2,\\&\cdots ,g_3(y(ld-n))\beta _n]\alpha _3+\cdots \\&+[g_m(y(ld-1))\beta _1,g_m(y(ld-2))\beta _2,\\&\cdots ,g_m(y(ld-n))\beta _n]\alpha _m \\= & {} [\varvec{g}_1(ld),\varvec{g}_2(ld)\varvec{\beta },\varvec{g}_3(ld)\varvec{\beta },\cdots ,\\&\varvec{g}_m(ld)\varvec{\beta }] \left[ \begin{array}{c} \varvec{\beta }\\ \alpha _2\\ \alpha _3\\ \vdots \\ \alpha _m\end{array}\right] \nonumber \\= & {} \varvec{\varphi }_2^{{\mathrm{T}}}(ld)\varvec{\eta }, \end{aligned}$$

where

$$\begin{aligned} \varvec{g}_i(ld):= & {} [g_i(y(ld-1)),g_i(y(ld-2)),\cdots ,g_i(y(ld-d)), \\&\cdots ,g_i(y(ld-n))]\in {\mathbb {R}}^{1 \times n}, \\ i= & {} 1,2,\cdots ,m, \\ \varvec{\varphi }_2(ld):= & {} [\varvec{g}_1(ld),\varvec{g}_2(ld)\varvec{\beta },\varvec{g}_3(ld)\varvec{\beta },\cdots ,\\&\varvec{g}_m(ld)\varvec{\beta }]^{{\mathrm{T}}}\in {\mathbb {R}}^{n+m-1}, \\ \varvec{\eta }:= & {} [\varvec{\beta }^{{\mathrm{T}}},\alpha _2,\alpha _3,\cdots ,\alpha _m]^{{\mathrm{T}}}\in {\mathbb {R}}^{n+m-1} . \end{aligned}$$
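This separation can likewise be checked numerically (random data, for illustration only; the identity requires \(\alpha _1=1\)):

```python
import numpy as np

# Numerical check of the parameter separation rewriting (with alpha_1 = 1):
# alpha^T G beta = [g_1, g_2 beta, ..., g_m beta] [beta; alpha_2; ...; alpha_m].
rng = np.random.default_rng(2)
m, n = 3, 4
alpha = np.concatenate([[1.0], rng.standard_normal(m - 1)])
beta = rng.standard_normal(n)
G = rng.standard_normal((m, n))

phi2 = np.concatenate([G[0], G[1:] @ beta])   # phi_2 in R^{n+m-1}
eta = np.concatenate([beta, alpha[1:]])       # eta in R^{n+m-1}
assert np.isclose(alpha @ G @ beta, phi2 @ eta)
```

Compared with the over-parameterization, the unknown vector here has only \(n+m-1\) entries instead of mn, at the cost of \(\varvec{\varphi }_2\) depending on \(\varvec{\beta }\).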

Likewise, the dual-rate bilinear-in-parameter model in (3)–(5) becomes:

$$\begin{aligned} y(ld)= & {} \varvec{\varphi }_2^{{\mathrm{T}}}(ld)\varvec{\eta }+\varvec{\varphi }^{{\mathrm{T}}}_u(ld)\varvec{\gamma }+v(ld) \\= & {} \varvec{\psi }^{{\mathrm{T}}}(ld)\varvec{\vartheta }+v(ld), \end{aligned}$$

where

$$\begin{aligned}&\varvec{\psi }(ld):=\left[ \begin{array}{c} \varvec{\varphi }_2(ld) \\ \varvec{\varphi }_u(ld) \end{array} \right] \in {\mathbb {R}}^{n+m+n_\gamma -1},\\&\varvec{\vartheta }:=\left[ \begin{array}{c} \varvec{\eta } \\ \varvec{\gamma } \end{array} \right] \in {\mathbb {R}}^{n+m+n_\gamma -1}. \end{aligned}$$

Then employing the negative gradient search principle, the corresponding GI algorithm based on the parameter separation technique (the PS-GI algorithm for short) for the identification of system parameters can be written as

$$\begin{aligned}&\hat{\varvec{\vartheta }}_{k}=\hat{\varvec{\vartheta }}_{k-1}+\mu _{2,k}\sum ^{L}_{l=1}\hat{\varvec{\psi }}_{k}(ld)[y(ld)-\hat{\varvec{\psi }}^{{\mathrm{T}}}_{k}(ld)\hat{\varvec{\vartheta }}_{k-1}], \end{aligned}$$
(16)
$$\begin{aligned}&\mu _{2,k}\le 2\lambda _{\max }^{-1}\left[ \sum ^L_{l=1}\hat{\varvec{\psi }}_{k}(ld)\hat{\varvec{\psi }}^{{\mathrm{T}}}_{k}(ld)\right] , \end{aligned}$$
(17)
$$\begin{aligned}&\hat{\varvec{\psi }}_{k}(ld)=[\hat{\varvec{\varphi }}^{{\mathrm{T}}}_{2,k}(ld),\varvec{\varphi }^{{\mathrm{T}}}_{u}(ld)]^{{\mathrm{T}}}, \end{aligned}$$
(18)
$$\begin{aligned}&\hat{\varvec{\varphi }}_{2,k}(ld)=[\hat{\varvec{g}}_{1,k}(ld),\hat{\varvec{g}}_{2,k}(ld)\hat{\varvec{\beta }}_{k-1},\hat{\varvec{g}}_{3,k}(ld)\hat{\varvec{\beta }}_{k-1},\cdots ,\nonumber \\&\quad \hat{\varvec{g}}_{m,k}(ld)\hat{\varvec{\beta }}_{k-1}]^{{\mathrm{T}}}, \end{aligned}$$
(19)
$$\begin{aligned}&\hat{\varvec{g}}_{i,k}(ld)=[g_i(\hat{y}_k(ld-1)),g_i(\hat{y}_k(ld-2)),\cdots ,\nonumber \\&\quad g_i(y(ld-d)),\cdots ,g_i(\hat{y}_k(ld-n))], \end{aligned}$$
(20)
$$\begin{aligned}&\varvec{\varphi }_{u}(ld)=[u(ld-1),u(ld-2),\cdots ,u(ld-n_{\gamma })]^{{\mathrm{T}}}, \end{aligned}$$
(21)
$$\begin{aligned}&\hat{y}_{k}(ld-i)=\hat{\varvec{\psi }}^{{\mathrm{T}}}_{k}(ld-i)\hat{\varvec{\vartheta }}_{k}, \end{aligned}$$
(22)
$$\begin{aligned}&\hat{\varvec{\vartheta }}_k=[\hat{\varvec{\beta }}^{{\mathrm{T}}}_k,\hat{\alpha }_{2,k},\hat{\alpha }_{3,k},\cdots ,\hat{\alpha }_{m,k},\hat{\varvec{\gamma }}^{{\mathrm{T}}}_k]^{{\mathrm{T}}}. \end{aligned}$$
(23)
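A single iteration of (16), with a step size respecting bound (17), can be sketched as below. The stacked information matrix is treated as known here; in the actual PS-GI algorithm its rows \(\hat{\varvec{\psi }}^{{\mathrm{T}}}_{k}(ld)\) are rebuilt from output estimates at every iteration, and we use the conservative choice \(\mu =1/\lambda _{\max }\) rather than the upper bound itself.

```python
import numpy as np

def gi_step(theta, Psi, y):
    """One negative-gradient iteration of (16); Psi stacks the row vectors
    psi^T(ld) and y stacks the slow-rate outputs y(ld)."""
    H = Psi.T @ Psi                            # sum_l psi(ld) psi^T(ld)
    mu = 1.0 / np.linalg.eigvalsh(H).max()     # satisfies mu <= 2 / lambda_max in (17)
    return theta + mu * Psi.T @ (y - Psi @ theta)
```

On noise-free data, iterating `gi_step` drives `theta` to the least-squares solution.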

3 The particle filter-based GI algorithms

Data deficiency is commonly encountered in statistics, and it is important to address this in an appropriate way that does not interfere with the conclusions drawn from the given data. In this section, the objective is to derive the improved GI algorithms based on the particle filter which use the incomplete data to estimate concurrently the unknown system parameters and outputs. The particle filter is employed to calculate the unknown output \(\hat{y}_k(ld-i)\), and the GI method is applied to estimate the value of the parameter vector \(\varvec{\vartheta }\).

To the best of our knowledge, one of the attractive advantages of the particle filter lies in the fact that it combines Monte Carlo sampling methods with Bayesian inference to tackle the nonlinear dynamic estimation problem at a reasonable computational cost [65]. Hence, it is sometimes called the sequential Monte Carlo Bayesian estimator of the probability distribution of unknown variables. Within a Bayesian framework, it is assumed that the distribution of the variables to be estimated is determined by all available variables in the system, which are generally considered to be random. In this regard, the first step is to represent the posterior probability density function \(p(y(ld-i)|Y_o,U,\hat{\varvec{\vartheta }}_{k})\) based on the observed data \(\{U, Y_o\}\) and the known parameter estimation vector \(\hat{\varvec{\vartheta }}_k\). Then, \(\hat{y}_k(ld-i)\) is given by

$$\begin{aligned} \hat{y}_k(ld-i)= & {} \int {y(ld-i)f(y(ld-i))dy(ld-i)}, \nonumber \\ f(y(ld-i)):= & {} p(y(ld-i)|Y_o,U,\hat{\varvec{\vartheta }}_{k}). \end{aligned}$$
(24)

Here, we denote \(Y_o:=\{y(d),y(2d),\cdots ,y(ld)\}\) and \(U:=\{u(1),u(2),\cdots , u(ld)\}\). Note that typically this expression cannot be used to compute \(\hat{y}_k(ld-i)\) analytically; as a result, one has to resort to numerical techniques for its evaluation. The particle filter achieves the task of representing the posterior probability density function by utilizing a bank of independent random variables called particles. According to the Monte Carlo sampling principle, the right-hand side of (24) can be rewritten in the following form:

$$\begin{aligned}&\int {y(ld-i)f(y(ld-i))dy(ld-i)}\\&\quad \approx \frac{1}{S}\sum ^{S}_{s=1}\int {y(ld-i)\delta (y(ld-i)-\hat{y}^{(s)}_k(ld-i))dy(ld-i)}\\&\quad =\frac{1}{S}\sum ^{S}_{s=1}\hat{y}^{(s)}_k(ld-i). \end{aligned}$$

Therefore the posterior probability density function can be approximated as

$$\begin{aligned} f(y(ld-i))\approx \frac{1}{S}\sum ^{S}_{s=1}\delta (y(ld-i)-\hat{y}^{(s)}_k(ld-i)), \end{aligned}$$

where S is the total number of samples, \(\hat{y}^{(s)}_k(ld-i)\) is the s-th particle that approximates the distribution, and \(\delta (\cdot )\) denotes the Dirac delta function. Then

$$\begin{aligned} \hat{y}_k(ld-i)\approx \frac{1}{S}\sum ^{S}_{s=1}\hat{y}^{(s)}_k(ld-i) . \end{aligned}$$

However, it is not generally convenient to sample from the posterior probability density function directly; thus, we need to introduce the importance sampling technique. By defining \(\hat{Y}_k(ld-i-1):=\{\hat{y}_{k}(ld-i-1),\cdots ,y(ld-d),\cdots ,\hat{y}_k(1)\}\), we choose the importance function \(q(\cdot )\) as

$$\begin{aligned}&q(y(ld-i)|Y(d:ld),U,\hat{\varvec{\vartheta }}_{k})\\&\quad =p(y(ld-i)|\hat{Y}_k(ld-i-1),U,\hat{\varvec{\vartheta }}_{k}), \end{aligned}$$

which is in practice easier to sample than the target distribution.

In the importance sampling process, each sample is extracted from the importance distribution \(q(\cdot )\) instead of the posterior \(p(\cdot )\). This discrepancy is generally compensated by appropriately weighting each sample separately. Specifically, the associated weight \(\hat{w}^{(s)}_k(ld)\) corresponding to \(\hat{y}^{(s)}_k(ld)\) is determined by

$$\begin{aligned} \hat{w}^{(s)}_k(ld)\propto p(y(ld)|\hat{y}^{(s)}_k(ld))\hat{w}^{(s)}_k(ld-d). \end{aligned}$$
(25)

The posterior probability density function can then be represented by

$$\begin{aligned} f(y(ld-i))\approx & {} \sum ^{S}_{s=1}\hat{W}^{(s)}_k(ld-i)\delta (y(ld-i)\nonumber \\&-\hat{y}^{(s)}_k(ld-i)),\\ \hat{W}^{(s)}_k(ld-i)= & {} \frac{\hat{w}^{(s)}_k(ld-i)}{\sum ^{S}_{s=1}\hat{w}^{(s)}_k(ld-i)},\quad \sum ^{S}_{s=1}\hat{W}^{(s)}_k(ld-i)=1,\nonumber \end{aligned}$$
(26)

where \(\hat{w}^{(s)}_k(ld-i)\) is the unnormalized importance weight and \(\hat{W}^{(s)}_k(ld-i)\) is the normalized weight associated with the s-th particle at time \(ld-i\).

Nevertheless, the probability density function \(p(y(ld)|\hat{y}^{(s)}_k(ld))\) involved is still unknown. Assuming that the noise is Gaussian, the weight can be represented by

$$\begin{aligned} \hat{w}^{(s)}_k(ld)= & {} \frac{1}{\sqrt{2\pi }\sigma }{\exp }\left[ -\frac{(y(ld)-\hat{y}^{(s)}_k(ld))^2}{2\sigma ^2}\right] \\&\times \hat{w}^{(s)}_k(ld-d). \end{aligned}$$
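Under the Gaussian assumption, the unnormalized weight update is a one-liner (the function name is an illustrative assumption):

```python
import numpy as np

def gaussian_weight(y_meas, y_particle, w_prev, sigma):
    """Unnormalized weight update under Gaussian noise: the likelihood of
    particle s is the normal density of its prediction error."""
    err = y_meas - y_particle
    lik = np.exp(-err ** 2 / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma)
    return lik * w_prev
```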

In what follows, we design a kernel density estimator to update \(\hat{w}^{(s)}_k(ld)\) recursively, which offers a flexible way of modeling the given datasets without imposing a parametric model. The kernel is a weighting function used in nonparametric identification methods. In order to construct the kernel function, we define

$$\begin{aligned} \hat{\xi }^{(s)}_k(ld):= & {} |y(ld)-\hat{y}^{(s)}_k(ld)|>0, \end{aligned}$$
(27)
$$\begin{aligned} \hat{\xi }_k(ld):= & {} \max \{\hat{\xi }^{(s)}_k(ld),\quad s=1,2,\cdots ,S\}+1.\nonumber \\ \end{aligned}$$
(28)

Clearly, we can obtain the following relationship:

$$\begin{aligned} \hat{\pi }^{(s)}_k(ld):=\frac{\hat{\xi }^{(s)}_k(ld)}{\hat{\xi }_k(ld)}<1,\quad s=1,2,\cdots ,S, \end{aligned}$$

in which \(\hat{\pi }^{(s)}_k(ld)\) can be viewed as an independent and identically distributed sample drawn from \(p(y(ld)|\hat{y}^{(s)}_k(ld))\). Here, we choose the Epanechnikov kernel function [66] to estimate \(p(y(ld)|\hat{y}^{(s)}_k(ld))\). Then we obtain

$$\begin{aligned} p(y(ld)|\hat{y}^{(s)}_k(ld))=\frac{3}{4}[1-(\hat{\pi }^{(s)}_k(ld))^2]. \end{aligned}$$

Upon normalizing the density function, one can get

$$\begin{aligned} p(y(ld)|\hat{y}^{(s)}_k(ld))= & {} \frac{\frac{3}{4}[1-(\hat{\pi }^{(s)}_k(ld))^2]}{\sum ^S_{s=1}\frac{3}{4}[1-(\hat{\pi }^{(s)}_k(ld))^2]}\nonumber \\= & {} \frac{1-(\hat{\pi }^{(s)}_k(ld))^2}{S-\sum ^S_{s=1}(\hat{\pi }^{(s)}_k(ld))^2}. \end{aligned}$$
(29)

Normalization guarantees that the resulting estimate is a probability density function. Substituting (29) into (25) yields the weight update law:

$$\begin{aligned} \hat{w}^{(s)}_k(ld)=\frac{1-(\hat{\pi }^{(s)}_k(ld))^2}{S-\sum ^S_{s=1}(\hat{\pi }^{(s)}_k(ld))^2}\hat{w}^{(s)}_k(ld-d). \end{aligned}$$
(30)
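Combining (27)–(30) with the normalization in (26), the kernel-based weight update can be sketched as one vectorized step (the function and argument names are illustrative; `const` stands for the additive term in (28), for which an adaptive value may be substituted):

```python
import numpy as np

def kernel_weight_update(y_meas, y_particles, w_prev, const=1.0):
    """Epanechnikov-kernel weight update: (27)-(30) plus normalization (26)."""
    xi = np.abs(y_meas - y_particles)                   # prediction errors, (27)
    xi_max = xi.max() + const                           # (28)
    pi = xi / xi_max                                    # all entries < 1
    lik = (1.0 - pi ** 2) / (len(y_particles) - np.sum(pi ** 2))  # (29)
    w = lik * w_prev                                    # (30)
    return w / w.sum()                                  # normalized weights, (26)
```

Particles whose predictions lie closer to the measured y(ld) receive larger normalized weights.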

Remark 2

Note that the above nonparametric kernel density estimation method is not flexible because of the definition of \(\hat{\xi }_k(ld)\). In general, the smaller \(\hat{\xi }^{(s)}_k(ld)\) is, the greater the weight of \(\hat{y}^{(s)}_k(ld)\) is in estimating y(ld), which may cause \(\hat{\xi }_k(ld)\approx 1\). Therefore, a new, more flexible definition of \(\hat{\xi }_k(ld)\) is given by

$$\begin{aligned} \hat{\xi }_k(ld):= & {} \max \{\hat{\xi }^{(s)}_k(ld),\quad s=1,2,\cdots ,S\}+\chi ^\epsilon ,\nonumber \\ \end{aligned}$$
(31)

where \(\chi \) and \(\epsilon \) are the adaptive factors. The value of \(\hat{\xi }_k(ld)\) can be modified appropriately according to the adaptive factors. Therefore the new adaptive kernel estimator can be adjusted to improve the accuracy in fitting the density function.

From the previous expression, it can easily be checked that the weights \(\hat{w}^{(s)}_k(ld-i), i\ne 0\) cannot be determined based on (30). Thus, we keep the weights unchanged at the inter-sampled instant \(ld-i\), i.e.,

$$\begin{aligned} \hat{w}^{(s)}_k(ld-i)=\hat{w}^{(s)}_k(ld),\quad i=1,2,\cdots ,d-1. \end{aligned}$$
(32)

Accordingly, the missing outputs sought can now be calculated by a series of the weighted samples as follows:

$$\begin{aligned} \hat{y}_k(ld-i)= & {} \sum ^{S}_{s=1}\hat{W}^{(s)}_k(ld-i)\hat{y}^{(s)}_k(ld-i) \\= & {} \sum ^{S}_{s=1}\hat{W}^{(s)}_k(ld)\hat{y}^{(s)}_k(ld-i). \end{aligned}$$

In the particle filtering approach, resampling procedures are usually applied to reduce the adverse effects caused by the particle degeneracy problem. In this paper, we adopt the approach to filter the particles according to \(\hat{W}^{(s)}_k(ld)\), i.e., discarding the samples with small weights and copying those with large weights, before the weight of each sample is finally assigned as \(\hat{W}^{(s)}_k(ld-i)=1/S\). To be specific, the posterior probability density function and the missing output estimate are given by

$$\begin{aligned} f(y(ld-i))\approx & {} \sum ^{S}_{r=1}\frac{1}{S}\delta (y(ld-i)-\hat{y}^{(r)}_k(ld-i)) \end{aligned}$$
(33)
$$\begin{aligned}= & {} \sum ^{S}_{s=1}\frac{s_i}{S}\delta (y(ld-i)-\hat{y}^{(s)}_k(ld-i)),\nonumber \\ \hat{y}_k(ld-i)\approx & {} \sum ^{S}_{r=1}\frac{1}{S}\hat{y}^{(r)}_k(ld-i) \end{aligned}$$
(34)
$$\begin{aligned}= & {} \sum ^{S}_{s=1}\frac{s_i}{S}\hat{y}^{(s)}_k(ld-i),\nonumber \\ \hat{y}^{(s)}_k(ld-i)\sim & {} p(y(ld-i)|\hat{y}_k(ld-i-1),\cdots ,\nonumber \\&\cdots \hat{y}_k(1),u(1),\cdots ,u(Ld),\hat{\varvec{\vartheta }}_{k}),\nonumber \\ \end{aligned}$$
(35)

where \(\hat{y}^{(r)}_k(ld-i)\) represents the r-th particle in a new set of particles resampled at time \(ld-i\) and \(s_i\) refers to the number of times that \(\hat{y}^{(s)}_k(ld-i)\) is copied in this new particle swarm. Equations (27)–(35) constitute the particle filtering algorithm. In the O-GI and PS-GI algorithms, we use (12) and (22) to calculate the output estimate \(\hat{y}_k(ld-i)\), respectively. In the improved GI algorithms, we use the particle filter to estimate \(y(ld-i)\). To be specific, replacing \(\hat{y}_k(ld-i)\) in (12) and (22) with (34) yields a particle filter gradient-based iterative algorithm by using the over-parameterization technique (the O-PF-GI algorithm for short) and a particle filter gradient-based iterative algorithm by using the parameter separation technique (the PS-PF-GI algorithm for short), respectively. For simplicity, the pseudo-code description of the PS-PF-GI algorithm is provided in Algorithm 1.
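The resampling step described above amounts to multinomial resampling: draw S indices in proportion to the normalized weights, copy the corresponding particles, and reset every weight to 1/S. A minimal sketch with illustrative names:

```python
import numpy as np

def resample(particles, W, rng):
    """Multinomial resampling: particles with large normalized weights W are
    copied, those with small weights tend to be discarded; weights reset to 1/S."""
    S = len(particles)
    idx = rng.choice(S, size=S, p=W)      # draw S indices according to W
    return np.asarray(particles)[idx], np.full(S, 1.0 / S)
```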

Remark 3

The particle filter is an effective estimation tool for nonlinear state-space models with unknown states: it represents the posterior distribution through a set of random particles and their associated weights. Inspired by this idea, a new particle filter-based algorithm for identifying a class of input-output systems within the GI framework is presented. The unknown outputs are treated as unknown states, while the known outputs are treated as measurable outputs. A kernel density estimation method is then developed to update the weights of the particles at each iteration. Compared with the traditional GI method, the advantage of this method is that it estimates the missing outputs more accurately, because the novel particle filter uses the measurable outputs to adjust the weight of each particle. In addition, the algorithms proposed in this paper can find many potential applications in the recovery and reconstruction of measurement signals in noisy environments. The proposed particle filter-based algorithm for simultaneous output and parameter estimation of output nonlinear systems under low measurement rate constraints can be extended to multi-input multi-output bilinear systems, and the engineering techniques in [67,68,69,70] can be applied to other control and scheduling areas such as information processing and transportation communication systems [71,72,73,74,75,76,77].
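A kernel-based weight update of the kind described in this remark can be sketched as follows. This is only an illustration of the idea that particles whose predicted outputs are close to a measured output receive larger weight; the Gaussian kernel, the bandwidth `h`, and all names are our assumptions, not the paper's exact construction.

```python
import numpy as np

def kernel_weights(predicted_outputs, measured_output, h=0.5):
    """Assign each particle a normalized Gaussian-kernel weight based on
    the distance between its predicted output and the measured output."""
    k = np.exp(-0.5 * ((predicted_outputs - measured_output) / h) ** 2)
    return k / k.sum()                     # weights sum to 1

preds = np.array([0.9, 1.0, 1.1, 2.0])     # hypothetical particle predictions
w = kernel_weights(preds, measured_output=1.0)
# the particle whose prediction matches the measurement gets the largest weight
```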

[Algorithm 1: pseudo-code of the PS-PF-GI algorithm]

4 Example

In this section, we provide a detailed numerical experiment to evaluate the parameter estimation performance, in terms of accuracy and efficiency, of the four proposed methods, namely the PS-PF-GI, O-PF-GI, PS-GI and O-GI algorithms. This comparison allows us to see how the proposed particle filter-based approaches perform compared with the traditional solutions. The system under consideration is given by

$$\begin{aligned}&y(l)=[1,\,0.28] \left[ \begin{array}{ccc} 0.30y(l-1) &{} 0.30y(l-2) &{} 0.30y(l-3)\\ 0.50y^2(l-1) &{} 0.50y^2(l-2) &{} 0.50y^2(l-3)\end{array}\right] \left[ \begin{array}{c} 0.45 \\ 0.30 \\ 0.46 \end{array}\right] \\&\quad +0.10u(l-1)+0.90u(l-2)+v(l), \\&\varvec{\alpha }=[1,\,0.28]^{{\mathrm{T}}},\quad \varvec{\beta }=[0.45,\,0.30,\,0.46]^{{\mathrm{T}}},\quad \varvec{\gamma }=[0.10,\,0.90]^{{\mathrm{T}}}, \\&\varvec{\vartheta }=[0.28,\,0.45,\,0.30,\,0.46,\,0.10,\,0.90]^{{\mathrm{T}}}. \end{aligned}$$
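For readers who wish to reproduce the experiment, the example system can be simulated directly from the equation above. This is a sketch with our own variable names; the coefficients are those given in the example, and the input and noise settings follow the simulation description below.

```python
import numpy as np

def simulate(L=600, sigma=0.10, seed=0):
    """Simulate y(l) = alpha^T M(l) beta + 0.10 u(l-1) + 0.90 u(l-2) + v(l),
    where M(l) stacks the delayed outputs and their squares as in the example."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(L + 1)           # zero-mean, unit-variance input
    v = sigma * rng.standard_normal(L + 1)   # zero-mean white noise
    alpha = np.array([1.0, 0.28])
    beta = np.array([0.45, 0.30, 0.46])
    y = np.zeros(L + 1)                      # zero initial conditions
    for l in range(3, L + 1):
        M = np.array([[0.30 * y[l - j] for j in (1, 2, 3)],
                      [0.50 * y[l - j] ** 2 for j in (1, 2, 3)]])
        y[l] = alpha @ M @ beta + 0.10 * u[l - 1] + 0.90 * u[l - 2] + v[l]
    return u, y

u, y = simulate()
```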
Table 1 The parameter estimates and their errors under different algorithms
Table 2 The means and the standard deviations of the output estimation error under different algorithms
Fig. 2 The PS-PF-GI estimation errors \(\delta \) versus k

Fig. 3 The O-PF-GI estimation errors \(\delta \) versus k

Fig. 4 The PS-GI estimation errors \(\delta \) versus k

Fig. 5 The O-GI estimation errors \(\delta \) versus k

Fig. 6 The true outputs and the predicted outputs versus time l

In the simulation, the input u(l) is taken as a persistently exciting signal with zero mean and unit variance, and v(l) as a white noise with zero mean. The number of particles in the particle filter is set to \(S=50\), and the measurement data length is taken as \(L=600\). The first 400 data samples \((l=1,2,3,\cdots ,400)\) are used in the first stage to estimate \(\varvec{\vartheta }\) and \(y(l-i)\), and the remaining data points (samples \(l=401,\cdots ,600\)) are then used to test the estimated models. When taking \(d=2\), the inputs \(\{u(1),u(2),u(3),\cdots ,u(400)\}\) and the outputs \(\{y(2),y(4),\cdots ,y(400)\}\) are available, while \(\{y(1),y(3),\cdots ,y(399)\}\) are unavailable.
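The dual-rate data split just described can be written down explicitly. This is a minimal sketch of the bookkeeping only; the variable names are ours.

```python
import numpy as np

# Dual-rate sampling with d = 2: even-indexed outputs are measured,
# odd-indexed ones are missing; samples 401..600 are held out for testing.
L = 600
l = np.arange(1, L + 1)
train = l[:400]                       # identification data, l = 1..400
test = l[400:]                        # validation data, l = 401..600
measured = train[train % 2 == 0]      # y(2), y(4), ..., y(400) available
missing = train[train % 2 == 1]       # y(1), y(3), ..., y(399) unavailable
```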

The simulation consists of three parts: (i) the unknown output estimation, (ii) the system parameter estimation and (iii) the output prediction. The results are given in Tables 1, 2 and Figs. 2, 3, 4, 5, and 6, respectively. Table 1 shows the parameter estimates and the corresponding parameter estimation errors \(\delta :=\Vert \hat{\varvec{\vartheta }}_k(l)-\varvec{\vartheta }\Vert /\Vert \varvec{\vartheta }\Vert \) obtained by the four algorithms under different noise variances when the number of iterations is \(k=200\). The relationship between \(\delta \) and k is then plotted in Figs. 2, 3, 4 and 5. The means and standard deviations (Stds) of the errors \(\epsilon :=y(l)-\hat{y}_k(l)\) between the true missing outputs and the estimated outputs under the same conditions are shown in Table 2. To test the quality of the estimated models obtained from the different algorithms, we compare the errors between the predicted outputs and the true outputs using \(l_e=200\) samples from \(l=401\) to \(l=600\). The detailed comparison results are shown in Fig. 6, where the iteration number is \(k=200\) and the noise variance is \(\sigma ^2=0.10^2\). Using the root-mean-square error to quantify the errors, we have

$$\begin{aligned} Error_1= & {} \left[ \frac{1}{l_e}\sum ^{600}_{l=401}[\hat{y}_{1,k}(l)-{y}(l)]^2\right] ^{1/2}=0.10288, \\ Error_2= & {} \left[ \frac{1}{l_e}\sum ^{600}_{l=401}[\hat{y}_{2,k}(l)-{y}(l)]^2\right] ^{1/2}=0.10308, \\ Error_3= & {} \left[ \frac{1}{l_e}\sum ^{600}_{l=401}[\hat{y}_{3,k}(l)-{y}(l)]^2\right] ^{1/2}=0.10446, \\ Error_4= & {} \left[ \frac{1}{l_e}\sum ^{600}_{l=401}[\hat{y}_{4,k}(l)-{y}(l)]^2\right] ^{1/2}=0.10467, \end{aligned}$$

where \(\hat{y}_{1,k}(l)\) denotes the predicted output of the PS-PF-GI algorithm, \(\hat{y}_{2,k}(l)\) the predicted output of the O-PF-GI algorithm, \(\hat{y}_{3,k}(l)\) the predicted output of the PS-GI algorithm and \(\hat{y}_{4,k}(l)\) the predicted output of the O-GI algorithm.
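The root-mean-square errors above follow directly from the definition \(Error=[\frac{1}{l_e}\sum (\hat{y}(l)-y(l))^2]^{1/2}\). A minimal sketch, with our own function name and synthetic values rather than the paper's data:

```python
import numpy as np

def rmse(y_hat, y):
    """Root-mean-square prediction error over the validation window."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    return np.sqrt(np.mean((y_hat - y) ** 2))

# hypothetical illustration: a constant offset of 0.1 over l_e = 200 samples
err = rmse(np.full(200, 0.1), np.zeros(200))   # a constant 0.1 offset gives RMSE 0.1
```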

From the results of this experiment, we can draw the following conclusions.

  • It can be observed from Table 1 that the particle filter-based algorithms yield lower parameter estimation errors than the traditional methods under different noise levels; see the last column of Table 1.

  • According to Figs. 2, 3, 4 and 5, the parameter estimation accuracy of the PS-PF-GI, O-PF-GI, PS-GI and O-GI algorithms improves as the iteration index k increases. Under the same conditions, a higher noise level leads to lower parameter estimation accuracy.

  • The estimated outputs of the PS-PF-GI algorithm can track the missing outputs very well, as shown in Table 2. It can be observed from Fig. 6 that the predicted outputs from the PS-PF-GI estimated model fluctuate in a small range around their true values. However, compared with the PS-GI and O-GI algorithms, the PS-PF-GI and O-PF-GI algorithms incur a relatively larger computational burden.

5 Conclusions

This paper focuses on a class of dual-rate nonlinear systems which are widely used in nonlinear system modeling and control. A key issue is to identify such systems effectively with good generalization performance. In this regard, a combined particle filter and gradient-based iterative method is presented to identify a dual-rate bilinear-in-parameter system. It involves two steps: the first estimates the joint probability density of the missing observations based on the initial guesses of the parameters, and the second minimizes the cost function to obtain a new estimate of the parameter vector. These two steps are repeated until the change in the parameter estimates between iterations falls within a specified tolerance. The numerical results of the simulation example indicate that the particle filter-based methods give more reliable output and parameter estimates than the traditional gradient-based iterative algorithms. The methods proposed in this paper can be combined with mathematical methods [78,79,80,81,82,83,84,85] and other identification approaches [86,87,88,89,90,91,92] to study the parameter estimation problems of other linear and nonlinear systems [93,94,95,96,97,98,99] and can be applied to other engineering systems.