1 Introduction

Conventional discrete-time control systems are mainly developed for single-rate systems in which all input and output variables are sampled at a single (uniform) rate [2]. Identification techniques for single-rate systems have been well investigated [1, 16, 25, 26]. However, the single-rate sampling scheme may not always be applicable in practical industrial processes, because the physical variables involved might be sampled at different rates; see the examples in [13, 20]. Systems with more than one sampling rate are called multirate systems. Recently, multirate systems have become an active topic in the control and identification fields, and many related results have been reported [3, 4, 15, 18, 23, 33, 39].

Most of the sampling schemes for multirate systems in the literature are uniform, that is, the sampling interval of each variable is fixed. However, a more general class of multirate systems has the following characteristic: the involved variables are measured at non-uniform intervals, e.g., when manual sampling or laboratory analysis is required. Systems whose sampling intervals for the input and/or output channels are non-equidistant in time are termed non-uniformly sampled multirate systems [8, 21, 24, 28]. Systems with irregularly missing samples can also be characterized within the framework of the non-uniformly sampled system model [10, 14, 17, 31, 32, 38, 40, 41, 43, 45]. In this paper, we consider a class of periodically non-uniformly sampled systems, which are special cases of general non-uniformly sampled systems. It has been shown that periodically non-uniform sampling has some promising properties, such as preserving controllability and observability [8, 28] and allowing the continuous-time system to be uniquely recovered [8]. The related identification and control problems have been investigated for periodically non-uniformly sampled systems [8, 9, 21, 22, 24, 27, 28]. In particular, several identification methods have been proposed for non-uniformly sampled systems, for example, the subspace methods [21, 27], the auxiliary model-based least squares method [24], the multi-innovation gradient method [37] and the partially coupled least squares method [9]. The main objective of this work is to propose a computationally efficient identification algorithm for periodically non-uniformly sampled systems based on the hierarchical identification principle.

It is known that multirate and non-uniformly sampled systems involve more parameters than single-rate systems, which unavoidably results in heavy computational costs for the identification algorithms. This motivates us to develop computationally efficient algorithms for online identification, which is of both theoretical and practical value. The decomposition technique is useful in many areas such as image processing [34–36]. Based on the decomposition technique, the hierarchical identification principle is very effective in reducing the complexity of identification algorithms for large-scale systems [5]. The key is to decompose a system into subsystems and to handle the associated terms between them [6, 11]. For system identification models with different structures, the ways of performing the decomposition differ. The multivariable system in [5] was decomposed into two subsystems, one with a parameter vector and the other with a parameter matrix. The dual-rate state-space model in [6] was also decomposed into two subsystems according to the lifted state-space structure. In this paper, we propose to decompose the system identification model into N (2 ⩽ N ⩽ dim(θ)) submodels, where dim(θ) denotes the dimension of the parameter vector θ. Based on such a decomposition, the proposed algorithm can greatly reduce the computational complexity.

The rest of the paper is organized as follows. Section 2 derives the identification model of the non-uniformly sampled systems and describes the problem formulation. Section 3 recalls the recursive least squares algorithm for the non-uniformly sampled systems for comparison. Section 4 presents a hierarchical least squares algorithm and compares its computational load with that of the recursive least squares algorithm. Section 5 analyzes the performance of the proposed algorithm. Section 6 provides an illustrative example. Finally, we offer some concluding remarks in Sect. 7.

2 Model description and problem formulation

Consider systems with a periodically non-uniform updating and uniform sampling scheme, as illustrated in Fig. 3 of [24]. For such a non-uniformly sampled system, the control input is non-uniformly updated r times with intervals τ_i (i = 1, 2, …, r) at the time instants t = kT + t_i (i = 0, 1, 2, …, r−1). Here, t_i := t_{i−1} + τ_i = τ_1 + τ_2 + ⋯ + τ_i, t_0 = 0, and T := τ_1 + τ_2 + ⋯ + τ_r = t_r is known as the frame period; the output is sampled uniformly with the frame period T. In the following, we establish the discrete-time state-space model and the input–output representation of the non-uniformly sampled systems from the continuous-time state-space model.

Consider a continuous-time process with the controllable and observable state-space representation:

$$ \left\{\begin{array}{l}\dot{\boldsymbol{x}}(t) = \boldsymbol{A}_c \boldsymbol{x}(t) + \boldsymbol{B}_c u(t),\\[4pt] y(t) = \boldsymbol{C} \boldsymbol{x}(t) + Du(t),\end{array}\right.$$
(1)

where x(t) ∈ ℝ^n is the state vector, u(t) ∈ ℝ the control input, and y(t) ∈ ℝ the output; A_c, B_c and C are constant matrices of appropriate dimensions and D is a constant. Because of the non-uniform zero-order hold at the input port, the input u(t) is a piecewise-constant (staircase) signal with the following expression:

$$ u(t)=\left\{\begin{array}{l@{\quad}l}u(kT) & kT\leqslant t<kT+t_1, \\[4pt] u(kT+t_1) & kT+t_1\leqslant t<kT+t_2, \\ \quad\vdots & \quad\vdots\\ u(kT+t_{r-1}) & kT+t_{r-1}\leqslant t<(k+1)T.\end{array}\right.$$
(2)

The output y(t) is sampled by a sampler with the frame period T, yielding a discrete-time signal y(kT).
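As a concrete illustration, the hold in (2) can be evaluated with a few lines of NumPy; the sketch below is ours (the function name and the data layout `u_samples[k][i]` are assumptions for illustration, not from [24]):

```python
import numpy as np

def nonuniform_zoh(u_samples, tau, T, t):
    """Evaluate the piecewise-constant input u(t) of Eq. (2).

    u_samples[k][i] holds u(kT + t_i); tau = [tau_1, ..., tau_r];
    t_i = tau_1 + ... + tau_i with t_0 = 0 and t_r = T.
    """
    t_grid = np.concatenate(([0.0], np.cumsum(tau)))   # t_0, t_1, ..., t_r = T
    k = int(np.floor(t / T))                           # frame index
    s = t - k * T                                      # offset within the frame
    i = int(np.searchsorted(t_grid, s, side="right")) - 1
    return u_samples[k][min(i, len(tau) - 1)]
```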

The solution for the state equation in (1) is given by

$$ \boldsymbol{x}(t)=\mathrm{e}^{\boldsymbol{A}_c (t-t_0)}\boldsymbol{x}(t_0)+\int_{t_0}^t\mathrm{e}^{\boldsymbol{A}_c(t-\tau)}\boldsymbol{B}_c u(\tau)\,\mathrm{d}\tau.$$
(3)

For a non-pathological frame period T, letting t_0 = kT and t = kT + T in (3) and using (2) to discretize the state-space model in (1) yields [8, 24]

$$ \left\{\begin{array}{l}\boldsymbol{x}(kT+T)=\boldsymbol{A}\boldsymbol{x}(kT)+\sum_{i=1}^{r}\boldsymbol{B}_iu(kT+t_{i-1}),\\[4pt] y(kT)=\boldsymbol{C}\boldsymbol{x}(kT)+Du(kT),\end{array}\right.$$
(4)

where

$$\boldsymbol{A}:=\mathrm{e}^{\boldsymbol{A}_cT}\in{\mathbb{R}}^{n\times n}, \quad \quad \boldsymbol{B}_i:={\mathrm{e}}^{\boldsymbol{A}_c(T-t_i)}\int ^{\tau_i}_0\mathrm{e}^{\boldsymbol{A}_ct}\,\mathrm{d}t\,\boldsymbol{B}_c\in {\mathbb{R}}^n.$$
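Numerically, A and each B_i can be obtained from matrix exponentials alone, with the integral computed by the standard augmented-matrix (Van Loan) construction. The following is a minimal sketch (the function name is ours), assuming a single-input system as in (1):

```python
import numpy as np
from scipy.linalg import expm

def discretize_nonuniform(Ac, Bc, tau):
    """A = e^{Ac T} and B_i = e^{Ac (T - t_i)} int_0^{tau_i} e^{Ac t} dt Bc,
    for tau = [tau_1, ..., tau_r], T = sum(tau), t_i = tau_1 + ... + tau_i."""
    n = Ac.shape[0]
    T = float(np.sum(tau))
    A = expm(Ac * T)
    Bs, t_i = [], 0.0
    for tau_i in tau:
        t_i += tau_i
        # Van Loan construction: the upper-right block of
        # expm([[Ac, Bc], [0, 0]] * tau_i) equals int_0^{tau_i} e^{Ac s} ds Bc.
        M = np.zeros((n + 1, n + 1))
        M[:n, :n] = Ac
        M[:n, n:] = Bc.reshape(n, 1)
        G = expm(M * tau_i)[:n, n:]
        Bs.append((expm(Ac * (T - t_i)) @ G).ravel())
    return A, Bs
```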

Transforming the state-space model in (4) into an input–output representation and taking into account the disturbance v(kT), we have

$$ a(z)y(kT)=\sum_{i=1}^rb_i(z)u(kT+t_{i-1})+v(kT),$$
(5)

where {v(kT)} is a white noise sequence with zero mean, and a(z) and b_i(z) are polynomials in the unit backward shift operator z^{−1} [z^{−1}y(kT) = y(kT−T)], whose coefficients are determined by A, B_i, C and D and which take the form [24]

$$a(z)=1+a_1z^{-1}+a_2z^{-2}+\cdots+a_nz^{-n},$$
$$b_1(z)=b_{10}+b_{11}z^{-1}+b_{12}z^{-2}+\cdots+b_{1n}z^{-n},$$
$$b_i(z)=b_{i1}z^{-1}+b_{i2}z^{-2}+\cdots+b_{in}z^{-n},\quad i=2,3,\ldots,r.$$

Remark 1

Equation (5) is the input–output representation of the non-uniformly sampled systems, and it is the model for which we develop the identification algorithm in the following. {u(kT+t_i), y(kT): i = 0, 1, 2, …, r−1; k = 1, 2, …} are the available input–output data, and a_i and b_{ij} are the parameters to be identified. In [24], an auxiliary model-based least squares method was developed for non-uniformly sampled systems with an output error model. In this paper, we aim to develop a new method that reduces the computational burden of the least squares algorithm.

Remark 2

The expression in (5) can be viewed as a multiple-input single-output system model with r fictitious inputs u(kT+t_i), i = 0, 1, 2, …, r−1. When r = 1, it reduces to a discrete-time model for single-rate systems. It can be seen from (5) that the number of parameters of the non-uniformly sampled model is (r+1)n+1, which is much larger than the 2n+1 parameters of the single-rate model. Thus the computational burden of the corresponding identification algorithm for the non-uniformly sampled model is heavier than that for the single-rate model.

3 The recursive least squares algorithm

In order to show the advantages of the proposed algorithm in this paper, we recall the well-established recursive least squares (RLS) algorithm in this section.

Let the superscript T denote the vector transpose and define the parameter vector θ and the information vector φ(kT) as

$$\boldsymbol{\theta}:=[a_1,a_2,\ldots,a_n,b_{10},b_{11},\ldots,b_{1n},b_{21},\ldots,b_{2n},\ldots,b_{r1},\ldots,b_{rn}]^{\mathrm{T}}\in{\mathbb{R}}^{n_0},\quad n_0:=(r+1)n+1,$$
$$\boldsymbol{\varphi}(kT):=\bigl[-y(kT-T),\ldots,-y(kT-nT),u(kT),u(kT-T),\ldots,u(kT-nT),u(kT-T+t_1),\ldots,u(kT-nT+t_1),\ldots,u(kT-T+t_{r-1}),\ldots,u(kT-nT+t_{r-1})\bigr]^{\mathrm{T}}\in{\mathbb{R}}^{n_0}.$$

Equation (5) can then be written in the vector form

$$ y(kT)=\boldsymbol{{\varphi}}^{\mathrm{T}}(kT)\boldsymbol{{\theta}}+v(kT).$$
(6)

The estimate \(\hat{\boldsymbol{{\theta}}}(kT)\) of the parameter vector θ in (6) can be obtained from the following RLS algorithm:

$$ \hat{\boldsymbol{\theta}}(kT)=\hat{\boldsymbol{\theta}}(kT-T)+\boldsymbol{P}_0(kT)\boldsymbol{\varphi}(kT)\bigl[y(kT)-\boldsymbol{\varphi}^{\mathrm{T}}(kT)\hat{\boldsymbol{\theta}}(kT-T)\bigr],$$
(7)
$$ \boldsymbol{P}_0^{-1}(kT)=\boldsymbol{P}_0^{-1}(kT-T)+\boldsymbol{\varphi}(kT)\boldsymbol{\varphi}^{\mathrm{T}}(kT),\qquad \boldsymbol{P}_0(0)=p_0\boldsymbol{I},$$
(8)

where \(\boldsymbol{P}_{0}(kT)\in{\mathbb{R}}^{n_{0}\times n_{0}}\) is the covariance matrix and I is an identity matrix of appropriate size.
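For reference, here is a minimal sketch of (7)–(8) in the usual matrix-inversion-lemma form, which updates P_0(kT) directly rather than its inverse; the class name and the initialization (a large p_0, cf. Sect. 4) are conventions of ours:

```python
import numpy as np

class RLS:
    """Recursive least squares for y(kT) = phi(kT)^T theta + v(kT), Eqs. (7)-(8)."""

    def __init__(self, n0, p0=1e6):
        self.theta = np.ones(n0) / p0      # theta_hat(0) = 1_{n0} / p0
        self.P = p0 * np.eye(n0)           # P_0(0) = p0 * I

    def update(self, phi, y):
        # Matrix-inversion-lemma form of (7)-(8): the gain
        # P_0(kT) phi(kT) = P_0(kT-T) phi(kT) / (1 + phi^T P_0(kT-T) phi),
        # so P_0^{-1}(kT) never has to be formed explicitly.
        Pphi = self.P @ phi
        L = Pphi / (1.0 + phi @ Pphi)
        self.theta = self.theta + L * (y - phi @ self.theta)
        self.P = self.P - np.outer(L, Pphi)
        return self.theta
```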

Remark 3

It is known that the RLS algorithm has the advantage of a fast convergence rate, but its computational burden is relatively heavy. From (7)–(8), we can see that the RLS algorithm for non-uniformly sampled systems requires computing the covariance matrix P_0(kT) of large size n_0 × n_0 at each recursion. Since the number n_0 = (r+1)n+1 is larger than the parameter number 2n+1 of single-rate systems, the RLS algorithm in (7)–(8) entails a heavier computational burden than the RLS algorithm for single-rate systems. This motivates us to present a computationally efficient algorithm for estimating the parameter vector θ in (6).

4 The hierarchical estimation algorithm

The hierarchical identification principle is an effective way of handling large-scale systems with a reduced computational load [5–7, 11]. Here, we use the hierarchical identification principle to derive the hierarchical least squares (HLS) algorithm for the system in (5).

Decompose the information vector φ(kT) into N sub-information vectors and the parameter vector θ into N sub-parameter vectors of dimensions n_i, i.e.,

$$\boldsymbol{\varphi}(kT)=\bigl[\boldsymbol{\varphi}_1^{\mathrm{T}}(kT),\boldsymbol{\varphi}_2^{\mathrm{T}}(kT),\ldots,\boldsymbol{\varphi}_N^{\mathrm{T}}(kT)\bigr]^{\mathrm{T}},\qquad \boldsymbol{\theta}=\bigl[\boldsymbol{\theta}_1^{\mathrm{T}},\boldsymbol{\theta}_2^{\mathrm{T}},\ldots,\boldsymbol{\theta}_N^{\mathrm{T}}\bigr]^{\mathrm{T}},\qquad \boldsymbol{\varphi}_i(kT),\boldsymbol{\theta}_i\in{\mathbb{R}}^{n_i},\quad \sum_{i=1}^Nn_i=n_0.$$

Then the overall identification model in (6) can be written as

$$ y(kT)= \boldsymbol{{\varphi}}_i^{\mathrm{T}}(kT)\boldsymbol{{\theta }}_i+ \sum_{j=1, j\not=i}^N\boldsymbol{{\varphi}}_j^{\mathrm{T}}(kT)\boldsymbol{{\theta}}_j+v(kT).$$
(9)
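In code, this decomposition is just a partition of the stacked vectors; a minimal sketch (names ours):

```python
import numpy as np

def split_blocks(v, dims):
    """Split a vector into the N sub-vectors of (9); dims = [n_1, ..., n_N]."""
    return np.split(np.asarray(v), np.cumsum(dims)[:-1])

# Example: an n_0 = 6 vector split into N = 2 blocks of size 3, as in Sect. 6.
theta_1, theta_2 = split_blocks(np.arange(6.0), [3, 3])
```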

Let \(\hat{\boldsymbol{{\theta}}}_{i}(kT)\) be the estimate of θ i at time t=kT. According to the least squares principle, the estimation algorithm for θ i in (9) can be expressed as

$$ \hat{\boldsymbol{\theta}}_i(kT)=\hat{\boldsymbol{\theta}}_i(kT-T)+\boldsymbol{P}_i(kT)\boldsymbol{\varphi}_i(kT)\Biggl[y(kT)-\sum_{j=1, j\neq i}^N\boldsymbol{\varphi}_j^{\mathrm{T}}(kT)\boldsymbol{\theta}_j-\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\hat{\boldsymbol{\theta}}_i(kT-T)\Biggr],$$
(10)
$$ \boldsymbol{P}_i^{-1}(kT)=\boldsymbol{P}_i^{-1}(kT-T)+\boldsymbol{\varphi}_i(kT)\boldsymbol{\varphi}_i^{\mathrm{T}}(kT),\quad i=1,2,\ldots,N,$$
(11)

where P i (kT) is the covariance matrix of the ith subsystem.

Remark 4

Because the expression on the right-hand side of (10) contains the unknown parameter vectors θ_j, j = 1, 2, …, i−1, i+1, …, N, using the hierarchical identification principle and replacing these unknown vectors θ_j (j ≠ i) in (10) with their estimates \(\hat{\boldsymbol{{\theta}}}_{j}(kT-T)\) at the preceding time t = kT−T yields

$$ \hat{\boldsymbol{\theta}}_i(kT)=\hat{\boldsymbol{\theta}}_i(kT-T)+\boldsymbol{P}_i(kT)\boldsymbol{\varphi}_i(kT)\Biggl[y(kT)-\sum_{j=1, j\neq i}^N\boldsymbol{\varphi}_j^{\mathrm{T}}(kT)\hat{\boldsymbol{\theta}}_j(kT-T)-\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\hat{\boldsymbol{\theta}}_i(kT-T)\Biggr],\quad i=1,2,\ldots,N.$$
(12)

Equations (11) and (12) form the decomposition-based hierarchical least squares (HLS) identification algorithm.

To initialize the algorithm, we take \(\boldsymbol{P}_{i}(0)=p_{0}\boldsymbol{I}_{n_{i}}\), with p_0 normally a large positive number (e.g., p_0 = 10^6), and \(\hat{\boldsymbol{{\theta}}}_{i}(0)=\mathbf{1}_{n_{i}}/p_{0}\), with \(\mathbf{1}_{n_{i}}\) being an n_i-dimensional column vector whose elements are all 1.
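The recursion (11)–(12) then reads as N small RLS-type updates driven by a common innovation; a minimal sketch in the same style as the RLS class above (class name ours):

```python
import numpy as np

class HLS:
    """Hierarchical least squares, Eqs. (11)-(12): the parameter vector is split
    into N blocks and each block runs a small RLS-type update driven by the
    common innovation e(kT) = y(kT) - sum_j phi_j^T(kT) theta_hat_j(kT - T)."""

    def __init__(self, dims, p0=1e6):
        self.thetas = [np.ones(n) / p0 for n in dims]   # theta_hat_i(0) = 1_{n_i}/p0
        self.Ps = [p0 * np.eye(n) for n in dims]        # P_i(0) = p0 * I_{n_i}

    def update(self, phis, y):
        # The unknown theta_j (j != i) in (10) are replaced by their estimates at
        # the preceding time, which is exactly the substitution that yields (12).
        e = y - sum(phi @ th for phi, th in zip(phis, self.thetas))
        for i, phi in enumerate(phis):
            Pphi = self.Ps[i] @ phi
            L = Pphi / (1.0 + phi @ Pphi)               # n_i-dimensional gain
            self.thetas[i] = self.thetas[i] + L * e
            self.Ps[i] = self.Ps[i] - np.outer(L, Pphi)
        return np.concatenate(self.thetas)
```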

Remark 5

The diagram of the HLS algorithm is shown in Fig. 1. From the HLS algorithm in (11)–(12) and Fig. 1, we can see that the parameter estimate \(\hat{\boldsymbol{{\theta}}}_{i}(kT)\) at time t=kT depends not only on \(\hat{\boldsymbol{{\theta}}}_{i}(kT-T)\) at the preceding time t=kTT, but also on the estimates \(\hat{\boldsymbol{{\theta}}}_{j}(kT-T)\) (j=1,2,…,i−1,i+1,…,N) of all the other submodels at time t=kTT.

Fig. 1 The diagram of the HLS algorithm

Remark 6

The proposed HLS algorithm in (11)–(12) has a lower computational burden than the RLS algorithm in (7)–(8). The computational burdens of the two algorithms are compared in Table 1, where the numbers of multiplications and additions are counted per recursion step. Taking a non-uniformly sampled system with n_0 = 6 and N = 2 (i.e., n_1 = n_2 = 3) as an example, the corresponding numbers of multiplications and additions are shown in braces.

Table 1 Comparisons of computational efficiency (numbers in braces: n_0 = 6, N = 2, n_1 = n_2 = 3)

Algorithm            Multiplications per step              Additions per step
RLS in (7)–(8)       2n_0² + 4n_0 {96}                     2n_0² + 2n_0 {84}
HLS in (11)–(12)     Σ_{i=1}^N (2n_i² + 4n_i) {60}         Σ_{i=1}^N (2n_i² + 2n_i) {48}

Remark 7

From Table 1, we can see that the computational cost of the HLS algorithm in (11)–(12) is lower than that of the RLS algorithm, and that the cost of the HLS algorithm depends on the choice of the number N: the larger N is, the lower the computational cost. When N = n_0, i.e., n_i = 1, the computational cost is the smallest.
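Assuming the per-step operation counts listed in Table 1 (which reproduce the bracketed numbers above), this trade-off can be checked directly; a small sketch:

```python
def rls_cost(n0):
    """Per-step (multiplications, additions) of RLS, as listed in Table 1."""
    return 2 * n0**2 + 4 * n0, 2 * n0**2 + 2 * n0

def hls_cost(dims):
    """Per-step (multiplications, additions) of HLS for block sizes dims."""
    return (sum(2 * n**2 + 4 * n for n in dims),
            sum(2 * n**2 + 2 * n for n in dims))

print(rls_cost(6))        # (96, 84)
print(hls_cost([3, 3]))   # (60, 48)
print(hls_cost([1] * 6))  # (36, 24): N = n0 gives the smallest cost
```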

5 Main convergence results

In this section, we establish the main convergence results of the HLS algorithm for non-uniformly sampled systems, using the martingale convergence theorem (Lemma D.5.3 in [16]). Two problems are addressed: (1) Do the parameter estimates given by the HLS algorithm converge to the true parameters? (2) How fast does the algorithm converge?

Let us introduce some notation. The symbols λ_max[X] and λ_min[X] represent the maximum and minimum eigenvalues of the positive definite matrix X, respectively; ‖X‖² := tr[XXᵀ]. The relation f(k) = O(g(k)) means that there exist positive constants δ_1 and k_0 such that |f(k)| ⩽ δ_1 g(k) for g(k) ⩾ 0 and k ⩾ k_0.

Assume that \(\{v(kT), \mathcal{F}_{kT}\}\) is a martingale difference sequence defined on a probability space \(\{\varOmega, \mathcal{F}, P\}\), where \(\{\mathcal{F}_{kT}\}\) is the σ-algebra sequence generated by the observations up to and including time t = kT [16]. The noise sequence {v(kT)} satisfies the following assumptions:

$$(\mathrm{A1})\quad \mathrm{E}\bigl[v(kT)\mid\mathcal{F}_{kT-T}\bigr]=0,\quad \mathrm{a.s.};$$
$$(\mathrm{A2})\quad \mathrm{E}\bigl[v^2(kT)\mid\mathcal{F}_{kT-T}\bigr]=\sigma^2<\infty,\quad \mathrm{a.s.}$$

Lemma 1

For the algorithm in (11)–(12), the following inequalities hold:

$$\sum_{j=1}^\infty\frac{\boldsymbol{{\varphi}}_i^{\mathrm{T}}(jT)\boldsymbol{P}_i(jT)\boldsymbol{{\varphi}}_i(jT)}{[\ln|\boldsymbol{P}_i^{-1}(jT)|]^c}<\infty,\quad i=1,2,\ldots,N,\ \mathrm{a.s.},\ c>1.$$

The proof can be done in a similar way to that of Lemma 1 in [24].

Theorem 1

For the system in (6) and the HLS algorithm in (11)–(12), if

$$\gamma(kT):= \Biggl(1-\sum_{i=1}^N\boldsymbol{{\varphi}}_i^{\mathrm{T}}(kT)\boldsymbol{P}_i(kT)\boldsymbol{{\varphi}}_i(kT) \Biggr)\tilde{y}^2(kT) +2\sum_{1\leqslant i<j\leqslant N}\tilde{y}_i(kT)\tilde{y}_j(kT)\leqslant0,$$

then we take \(\hat{\boldsymbol{{\theta}}}(kT)=\hat{\boldsymbol {{\theta}}}(kT-T)\). Define

$$R(kT):=\sum_{i=1}^N\bigl[\ln\big|\boldsymbol{P}_i^{-1}(kT)\big|\bigr]^c$$

and assume that (A1) and (A2) hold, then for any c>1, we have

$$\big\|\hat{\boldsymbol{{\theta}}}_i(kT)-\boldsymbol{{\theta}}_i\big\|^2=O \biggl( \frac{R(kT)}{\lambda_{\min}[\boldsymbol{P}_i^{-1}(kT)]} \biggr),\quad i=1,2,\ldots,N,\ \mathrm{a.s.}$$

Proof

Define the parameter estimation error vectors and the nonnegative definite functions Q_i(kT) as

$$ \tilde{\boldsymbol{\theta}}_i(kT):=\hat{\boldsymbol{\theta}}_i(kT)-\boldsymbol{\theta}_i,\qquad \tilde{\boldsymbol{\theta}}(kT):=\hat{\boldsymbol{\theta}}(kT)-\boldsymbol{\theta},$$
(13)
$$ Q_i(kT):=\tilde{\boldsymbol{\theta}}_i^{\mathrm{T}}(kT)\boldsymbol{P}_i^{-1}(kT)\tilde{\boldsymbol{\theta}}_i(kT),$$
(14)

and the output residuals

$$\tilde{y}_i(kT):=\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\tilde{\boldsymbol{\theta}}_i(kT-T),\qquad \tilde{y}(kT):=\sum_{i=1}^N\tilde{y}_i(kT)=\boldsymbol{\varphi}^{\mathrm{T}}(kT)\tilde{\boldsymbol{\theta}}(kT-T).$$

Define the innovation

$$ e(kT):=y(kT)-\boldsymbol{\varphi}^{\mathrm{T}}(kT)\hat{\boldsymbol{\theta}}(kT-T).$$
(15)

Using (9), we have

$$ e(kT)=-\boldsymbol{{\varphi}}^{\mathrm{T}}(kT)\tilde{\boldsymbol {{\theta}}}(kT-T)+v(kT)=-\tilde{y}(kT)+v(kT).$$
(16)

Inserting (12) into (14) gives

$$ Q_i(kT)=Q_i(kT-T)+\tilde{y}_i^2(kT)+2\tilde{y}_i(kT)e(kT)+\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\boldsymbol{P}_i(kT)\boldsymbol{\varphi}_i(kT)e^2(kT).$$
(17)

Define the stochastic Lyapunov function

$$Q(kT):=\sum_{i=1}^NQ_i(kT).$$

Using (17) and (16), we have

$$ Q(kT)=Q(kT-T)+\sum_{i=1}^N\tilde{y}_i^2(kT)+2\tilde{y}(kT)e(kT)+\sum_{i=1}^N\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\boldsymbol{P}_i(kT)\boldsymbol{\varphi}_i(kT)e^2(kT),\qquad e(kT)=-\tilde{y}(kT)+v(kT).$$
(18)

Since {v(kT)} is a white noise sequence, taking the conditional expectation of both sides of (18) with respect to \(\mathcal{F}_{kT-T}\) and using (A1)–(A2) gives

$$\mathrm{E}\bigl[Q(kT)\mid\mathcal{F}_{kT-T}\bigr]=Q(kT-T)-\gamma(kT)+\sum _{i=1}^N\boldsymbol{{\varphi}}_i^{\mathrm{T}}(kT)\boldsymbol {P}_i(kT)\boldsymbol{{\varphi}}_i(kT)\sigma^2.$$

If γ(kT)⩾0, we have

$$ \mathrm{E}\bigl[Q(kT)\mid\mathcal{F}_{kT-T}\bigr]\leqslant Q(kT-T)+\sum_{i=1}^N\boldsymbol{{\varphi}}_i^{\mathrm {T}}(kT)\boldsymbol{P}_i(kT)\boldsymbol{{\varphi}}_i(kT)\sigma^2;$$
(19)

otherwise, we have \(\hat{\boldsymbol{{\theta}}}(kT):=\hat {\boldsymbol{{\theta}}}(kT-T)\) and Q(kT):=Q(kTT). Thus, we always have

$$ \mathrm{E}\bigl[Q(kT)\mid\mathcal{F}_{kT-T}\bigr]\leqslant Q(kT-T)+\sum_{i=1}^N\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\boldsymbol{P}_i(kT)\boldsymbol{\varphi}_i(kT)\sigma^2.$$
(20)

Define

$$Z(kT):=\frac{Q(kT)}{R(kT)}.$$

Using (20), we have

$$ \mathrm{E}\bigl[Z(kT)\mid\mathcal{F}_{kT-T}\bigr]\leqslant Z(kT-T)+\sum_{i=1}^N\frac{\boldsymbol{\varphi}_i^{\mathrm{T}}(kT)\boldsymbol{P}_i(kT)\boldsymbol{\varphi}_i(kT)}{[\ln|\boldsymbol{P}_i^{-1}(kT)|]^c}\sigma^2,$$
(21)

since R(kT) is nondecreasing and R(kT) ⩾ [ln|P_i^{-1}(kT)|]^c for each i.

According to Lemma 1, the summation of the last term of the right-hand side of (21) for k from 1 to ∞ is finite. Applying the martingale convergence theorem (Lemma D.5.3 in [16]) to (21), we conclude that Z(kT) converges \(\mathrm{a.s.}\) to a finite random variable, say Z 0, i.e.,

$$Z(kT)=\frac{Q(kT)}{R(kT)}\rightarrow Z_0<\infty.$$

Thus we have

$$Q(kT)=O\bigl(R(kT)\bigr), \quad Q_i(kT)=O\bigl(R(kT)\bigr).$$

From the definition of Q_i(kT) in (14), we have

$$\big\|\tilde{\boldsymbol{\theta}}_i(kT)\big\|^2\leqslant\frac{Q_i(kT)}{\lambda_{\min}[\boldsymbol{P}_i^{-1}(kT)]}=O \biggl(\frac{R(kT)}{\lambda_{\min}[\boldsymbol{P}_i^{-1}(kT)]} \biggr),\quad i=1,2,\ldots,N,\ \mathrm{a.s.}$$

This proves Theorem 1. □

Corollary 1

For the system in (6) and the HLS algorithm in (11)–(12), assume that the conditions of Theorem 1 hold and that there exist positive constants c_1, c_2, c_0, and k_0 such that for k ⩾ k_0, the following general persistent excitation condition holds:

$$(\mathrm{A3})\quad c_1{\boldsymbol{I}}_{n_i}\leqslant \frac{1}{k}\sum_{j=1}^k\boldsymbol{{\varphi}}_i(jT)\boldsymbol {{\varphi}}_i^{\mathrm{T}}(jT)\leqslant c_2k^{c_0}{\boldsymbol{I}}_{n_i},\quad i=1,2,\ldots,N,\ \mathrm{a.s.}$$

Then the parameter estimation errors converge to zero: \(\|\tilde{\boldsymbol{{\theta}}}_{i}(kT)\|^{2}\rightarrow0\), a.s., i = 1, 2, …, N.

Proof

From the definition of P_i(kT) in (11) and the condition (A3), for k ⩾ k_0 we have

$$\lambda_{\min}\bigl[\boldsymbol{P}_i^{-1}(kT)\bigr]\geqslant c_1k,\qquad \ln\big|\boldsymbol{P}_i^{-1}(kT)\big|\leqslant n_i\ln\bigl(c_2k^{c_0+1}+p_0^{-1}\bigr)=O(\ln k),$$

so that R(kT) = O(N[ln k]^c).

Using Theorem 1, we have

$$\big\|\tilde{\boldsymbol{\theta}}_i(kT)\big\|^2=O \biggl(\frac{R(kT)}{\lambda_{\min}[\boldsymbol{P}_i^{-1}(kT)]} \biggr)=O \biggl(\frac{N[\ln k]^c}{c_1k} \biggr)\rightarrow0,\quad \mathrm{a.s.}$$

The proof is completed. □

Remark 8

Corollary 1 shows that the estimation errors \(\|\tilde{\boldsymbol{{\theta}}}_{i}(kT)\|^{2}\) converge to zero at the rate of N[ln k]^c/k as k approaches infinity. This implies that the convergence rate depends on the number N: the larger N is, the slower the convergence. Therefore, the HLS algorithm in (11)–(12) reduces the computational load at the cost of a slower convergence rate. In order to guarantee a certain convergence rate, the number N should not be chosen too large.

6 Example

Consider a stable continuous-time process with the following state-space representation:

$$S_c:\quad \left\{\begin{array}{l}\dot{\boldsymbol{x}}(t)=\left[\begin{array}{c@{\quad}c}-0.1 & -0.5\\ 1 & 0\end{array}\right]\boldsymbol{x}(t)+\left[\begin{array}{c}1\\ 0\end{array}\right]u(t),\\[10pt] y(t)=[0,\ 2]\,\boldsymbol{x}(t).\end{array}\right.$$

Take r = 2, τ_1 = 0.3 s, τ_2 = 0.5 s; then t_1 = τ_1 = 0.3 s and t_2 = τ_1 + τ_2 = T = 0.8 s. Discretizing this system gives

$$\left\{\begin{array}{l}\boldsymbol{x}(kT+T)=\left[\begin{array}{c@{\quad}c}0.7754 & -0.3642\\ 0.7285 & 0.8483\end{array}\right]\boldsymbol{x}(kT)+\left[\begin{array}{c@{\quad}c}0.2509 & 0.4776\\ 0.1818 & 0.1217\end{array}\right]\left[\begin{array}{c}u(kT)\\ u(kT+t_1)\end{array}\right],\\[10pt] y(kT)=[0,\ 2]\,\boldsymbol{x}(kT).\end{array}\right.$$
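These matrices can be reproduced with the discretize_nonuniform sketch from Sect. 2 (values rounded to four decimals):

```python
import numpy as np

# Continuous-time matrices of the example; discretize_nonuniform is the sketch
# given in Sect. 2.
Ac = np.array([[-0.1, -0.5],
               [ 1.0,  0.0]])
Bc = np.array([1.0, 0.0])
A, (B1, B2) = discretize_nonuniform(Ac, Bc, tau=[0.3, 0.5])
# A  ~ [[0.7754, -0.3642], [0.7285, 0.8483]]
# B1 ~ [0.2509, 0.1818],  B2 ~ [0.4776, 0.1217]
```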

The corresponding input–output relationship is

$$a(z)y(kT)=b_1(z)u(kT)+b_2(z)u(kT+t_1)+v(kT),$$
$$a(z)=1-1.6237z^{-1}+0.9231z^{-2},\qquad b_1(z)=0.3636z^{-1}+0.0836z^{-2},\qquad b_2(z)=0.2434z^{-1}+0.5071z^{-2},$$

i.e., θ = [a_1, a_2, b_{11}, b_{12}, b_{21}, b_{22}]ᵀ = [−1.6237, 0.9231, 0.3636, 0.0836, 0.2434, 0.5071]ᵀ; since D = 0, the direct term b_{10} vanishes and the number of parameters is n_0 = 6.

In the simulation, the inputs {u(kT+t_i), i = 0, 1} are taken as persistent excitation signal sequences with zero mean and unit variance, and {v(kT)} is taken as a white noise sequence with zero mean and variance σ² = 0.10²; the corresponding noise-to-signal ratio is δ_ns = 11.82 %. The parameter estimates and their errors δ := ‖θ̂(kT) − θ‖/‖θ‖ given by the RLS algorithm in (7)–(8) are shown in Table 2. With n_0 = 6, let N = 2 and n_1 = n_2 = 3, and apply the HLS algorithm in (11)–(12) to estimate the parameters of this non-uniformly sampled system. The simulation results are shown in Table 3. The estimation errors δ versus k for the two algorithms are shown in Fig. 2.
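A minimal end-to-end sketch of this experiment, reusing the RLS and HLS classes sketched in Sects. 3 and 4; the data generation follows the input–output model above, while the random seed and data length are our choices:

```python
import numpy as np

rng = np.random.default_rng(0)            # seed is ours, for reproducibility
theta_true = np.array([-1.6237, 0.9231, 0.3636, 0.0836, 0.2434, 0.5071])
K = 3000
u = rng.standard_normal((K, 2))           # u[k] = [u(kT), u(kT + t_1)]
v = 0.10 * rng.standard_normal(K)         # sigma^2 = 0.10^2
y = np.zeros(K)
rls, hls = RLS(6), HLS([3, 3])
for k in range(2, K):
    # phi(kT) = [-y(kT-T), -y(kT-2T), u(kT-T), u(kT-2T), u(kT-T+t_1), u(kT-2T+t_1)]
    phi = np.array([-y[k-1], -y[k-2], u[k-1, 0], u[k-2, 0], u[k-1, 1], u[k-2, 1]])
    y[k] = phi @ theta_true + v[k]
    th_rls = rls.update(phi, y[k])
    th_hls = hls.update([phi[:3], phi[3:]], y[k])
delta = np.linalg.norm(th_hls - theta_true) / np.linalg.norm(theta_true)
print(delta)   # should decrease toward zero as K grows
```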

Fig. 2 The parameter estimation error δ versus k

Table 2 The RLS parameter estimates and errors
Table 3 The HLS parameter estimates and errors with N=2

From Tables 1–3 and Fig. 2, we draw the following conclusions:

  • The identification model is a fictitious two-input single-output system because the input is non-uniformly updated twice within each frame period. Table 2 and Fig. 2 show that the RLS algorithm gives highly accurate estimates.

  • In this simulation, r = 2, n_0 = 6 and N = 2. From Table 1, the numbers of multiplications and additions at each step of the HLS algorithm are 60 and 48, respectively, fewer than the 96 and 84 of the RLS algorithm. Compared with the RLS algorithm, the HLS algorithm thus saves 40 % of the computational cost.

  • The parameter estimation errors of the HLS algorithm become smaller and gradually approach zero as the data length k increases.

  • The convergence rate of the HLS algorithm is slightly slower than that of the RLS algorithm.

7 Conclusions

In this paper, an HLS algorithm has been presented for a class of non-uniformly sampled systems based on the hierarchical identification principle. The main advantage of the proposed algorithm is that it reduces the computational load compared with the traditional RLS algorithm. The convergence analysis indicates that the parameter estimates given by the HLS algorithm converge to their true values, and that the number N of submodels should not be chosen too large in the algorithm implementation because it affects the convergence rate. A numerical example shows that the HLS algorithm gives effective parameter estimates at a lower computational cost. The proposed hierarchical method, based on decomposing the parameter vector and the information vector, can be extended to a variety of system models such as multivariable systems with complicated colored noise [12]. Owing to its reduced computational complexity, the proposed identification method could also be applied to resource-constrained networked dynamic systems [19, 29, 30, 42, 44].