1 Introduction

Modern industrial plants have been a source of challenging tasks in dynamical system identification and control [51]. In particular, the design of control systems that achieve the level of quality demanded by current industry standards requires building accurate models of the plant being controlled. Usually, the designer tackles the plant modeling problem by trying to find models capable of faithfully representing the dynamics of the (possibly nonlinear) system of interest based solely on the available data.

Since data are usually available in the form of input and output time series, these can be used for building direct or inverse models of nonlinear systems by means of computational intelligence methods, such as neural networks [8, 18, 46, 48], Takagi-Sugeno fuzzy models [34, 53, 59] or hybrid neuro-fuzzy systems [5, 6, 17, 54], to mention just a few possibilities. Although several techniques for nonlinear dynamical system identification have been proposed [1, 49], they can be categorized into one of the following approaches: global, local and hybrid models.

Global models usually implement a single structure, such as a single feedforward or recurrent neural network model, that approximates the whole mapping between the input and the output of the system being identified. Indeed, global models constitute the mainstream in nonlinear system identification and control [37, 46, 47, 63].

Local models, in turn, have attracted much attention because of their ability to fit the local shape of an arbitrary surface (i.e. mapping), which is particularly difficult to approximate globally when the dynamical system characteristics vary considerably throughout the input space. The input space is usually divided into smaller, localized regions, each one being associated with a simpler (usually linear) model. To estimate the system output at a given time, a single model is chosen from the pool of available local models according to some criterion defined on the current input data.

A common local modeling approach is known as local model network (LMN) [26, 28, 44, 45, 55, 60], which uses basis functions to implement its localized nature, similar to the standard Radial Basis Function network (RBFN). In LMNs, however, the output weights of an RBFN are replaced with local functions of the inputs. As a consequence, due to the better approximation properties of the local functions, the LMN model does not require as many basis functions as the standard RBFN to achieve the desired accuracy and, hence, the number of parameters is reduced dramatically [27].

Finally, hybrid models combine the aforementioned modeling approaches in order to build a suitable approximation of the dynamics of the system of interest [23, 39]. In [39], for example, a technique named Mixture of Support Vector Machine Experts (MSVME) is introduced, whose main purpose is to combine the complementary characteristics of Support Vector Machines (a global approach) and Mixture of Experts models (a local one) and apply them to dynamical system identification.

Local modeling techniques often use the Self-Organizing Map (SOM) to partition the input-output space into smaller regions, over which the local models are built [2, 7, 9, 10, 19, 20, 40, 50, 52]. The SOM [36] is an unsupervised competitive learning algorithm that has been commonly applied to clustering, vector quantization and data visualization tasks [35]. The results reported in those studies are rather appealing, indicating that SOM-based local models can be feasible alternatives to global models based on more traditional supervised neural network architectures, such as the RBFN and the Multilayer Perceptron (MLP), and also to more recent ones, such as the Extreme Learning Machine (ELM) [29, 30].

An important limitation of the aforementioned local linear models is that they were specifically designed to use the SOM algorithm. This means that their performances degrade considerably if another vector quantization (VQ) algorithm is used. The main advantage of using a VQ algorithm other than the SOM to partition the input-output space is related to computational cost. Several VQ algorithms are available [61], such as K-means [22, 42] and the Frequency-Sensitive Competitive Learning (FSCL) [3], which are considerably lighter than the SOM and still achieve equivalent data partitioning results.

More recently, a general methodology for building and evaluating local linear models with respect to their robustness (i.e. insensitivity) to changes in the VQ algorithm was proposed in [58]. This methodology evaluates whether the difference between two distributions of residuals generated by the same local model using two different VQ algorithms is statistically significant or not. For this purpose, the nonparametric Kolmogorov-Smirnov hypothesis test was chosen. By using this methodology, it was shown that the SOM can be successfully replaced by a lighter VQ algorithm (e.g. the FSCL) without performance loss.

Against this background, in this paper we introduce two novel strategies for estimating the parameters of local linear models which are general enough to be used with any VQ algorithm. Although the proposed techniques were initially developed for the SOM, we evaluate the performance of the resulting local linear models with different VQ algorithms by means of the methodology introduced in [58].

The first of the proposed techniques is a prototype-based approach, while the second one is a data-based approach. More specifically, the prototype-based technique uses the weight vectors of a given neuron of the SOM and of its K nearest neighbors to build the corresponding linear model. The data-based technique computes the local linear model associated with a given neuron using the data vectors that are mapped into the Voronoi cell of that neuron and the data vectors mapped into the Voronoi cells of its K nearest neighbors.

A comprehensive evaluation of the proposed approaches is carried out for the task of inverse identification of three benchmark single-input/single-output (SISO) dynamical systems. Their performances are compared to those achieved by linear and nonlinear global models, and by other existing local linear modeling techniques. In addition, we perform a thorough statistical analysis of the residuals for model validation purposes and also evaluate how robust the proposed techniques are with respect to the VQ algorithm used to partition the input-output space.

The remainder of the paper is organized as follows. In Section 2, two previous SOM-based local modeling approaches which serve as building blocks for the proposed approaches are described. In Section 3, the proposed techniques for parameter estimation of linear local models are then presented. Comprehensive computer simulations and performance analysis of the proposed approaches are presented in Section 4. The paper is concluded in Section 5.

2 SOM-based local modeling approaches

Let us assume that the dynamical systems we are dealing with can be described mathematically by the NARX (nonlinear autoregressive with exogenous inputs) discrete-time model [41, 49]:

$$ y(t) = f[ \mathbf{y}_{p}(t-1), \mathbf{u}_{q}(t) ] + \xi(t), $$
(1)

with

$$ \mathbf{y}_{p}(t-1) = [y(t-1) \;\; y(t-2) \;\; {\ldots} \;\; y(t-p)]^{T} \in \mathbb{R}^{p} $$
(2)

and

$$ \mathbf{u}_{q}(t) = [u(t) \;\; u(t-1) \;\; {\ldots} \;\; u(t-q+1)]^{T} \in \mathbb{R}^{q}. $$
(3)

Note that f(⋅) is an unknown nonlinear mapping, \(u(t) \in \mathbb {R}\) and \(y(t) \in \mathbb {R}\) denote, respectively, the input and output of the model at time step t, while q ≥ 1 and p ≥ 1 (with p ≥ q) are the input-memory and output-memory orders, respectively. The variable ξ(t) stands for the error term, assumed to be an additive white Gaussian noise process.

In this paper, we aim at evaluating the proposed local linear models on the inverse system identification task, mainly with a view to their later use in control strategies. For this task, the neural network models are designed to approximate the following inverse mapping:

$$ u(t) = f^{-1}[\mathbf{u}_{q}(t-1); \mathbf{y}_{p}(t-1)] + \xi(t). $$
(4)

This kind of nonlinear inverse model of a system is required, for example, in the implementation of several control system strategies, such as Direct Inverse Control [21], Internal Model Control [33, 38, 43, 49], Generalized Inverse Learning [31, 32] and others [4, 19, 20].
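
For concreteness, the sketch below shows one way of assembling the regressor/target pairs of the inverse mapping (4) from the measured input/output series. It is written in Python/NumPy for illustration only (the simulations reported later were carried out in Matlab), and the function and variable names are ours.

```python
import numpy as np

def build_inverse_regressors(u, y, p, q):
    """Assemble regressor/target pairs of the inverse mapping (4).

    For each admissible t, the regressor is
    x_in(t) = [u(t-1), ..., u(t-q), y(t-1), ..., y(t-p)]
    and the target is x_out(t) = u(t).
    """
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)
    start = max(p, q)                      # first index with a complete regressor
    X_in, x_out = [], []
    for t in range(start, len(u)):
        past_u = u[t - q:t][::-1]          # u(t-1), ..., u(t-q)
        past_y = y[t - p:t][::-1]          # y(t-1), ..., y(t-p)
        X_in.append(np.concatenate([past_u, past_y]))
        x_out.append(u[t])
    return np.array(X_in), np.array(x_out)
```

These pairs are exactly the vectors x in(t) and scalars x out(t) used by the SOM-based models described next.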

2.1 The VQTAM approach

The Vector-Quantized Temporal Associative Memory (VQTAM) [9] approach aims at building an input-output associative mapping using the Self-Organizing Map (SOM). It is a generalization to the temporal domain of a SOM-based associative memory technique that has been used by many authors to learn static (i.e. memoryless) input-output mappings, especially within the domain of robotics (see [11] and references therein).

The SOM learns from examples a mapping from a high-dimensional continuous input space \(\mathcal {X}\) onto a low-dimensional discrete space (lattice) \(\mathcal {A}\) of N neurons which are arranged in fixed topological forms, e.g., a rectangular 2-dimensional array. The map \(i^{*}(\mathbf {x}):\mathcal {X} \rightarrow \mathcal {A}\) is defined by the weight vectors \(\mathcal {W}=\{\mathbf {w}_{1}, \mathbf {w}_{2}, \ldots , \mathbf {w}_{N}\}, \mathbf {w}_{i} \in \mathbb {R}^{p} \subset \mathcal {X}\), and their corresponding coordinates \(\mathbf {r}_{i} \in \mathbb {R}^{2}\) in the lattice \(\mathcal {A}\).

According to the VQTAM framework, the input vector to the SOM at time step t, x(t), is composed of two parts. The first part, denoted \({\mathbf {x}}^{in}(t) \in \mathbb {R}^{p+q}\), carries data about the input of the dynamic mapping to be learned. The second part, denoted \(x^{out}(t) \in \mathbb {R}\), contains data concerning the desired output of this mapping. The weight vector of neuron i, \({\mathbf {w}}_{i}(t)\), has its dimension increased accordingly. These simple definitions are formalized as follows:

$$ {\mathbf{x}}(t) = \left(\begin{array}{c} {\mathbf{x}}^{in}(t) \\ x^{out}(t) \end{array} \right) \quad \text{and} \quad {\mathbf{w}}_{i}(t) = \left(\begin{array}{c} {\mathbf{w}}_{i}^{in}(t) \\ w_{i}^{out}(t) \end{array} \right) $$
(5)

where \({\mathbf {w}}_{i}^{in}(t) \in \mathbb {R}^{p+q}\) and \(w_{i}^{out}(t) \in \mathbb {R}\) are, respectively, the portions of the weight vector (a.k.a. prototype vector) which store information about the inputs and the outputs of the desired mapping. Depending on the variables chosen to build the vector \({\mathbf {x}}^{in}(t)\) and the scalar \(x^{out}(t)\), one can use the SOM algorithm (or any other VQ algorithm) to learn the forward or the inverse mapping of a given dynamic system.

For the inverse identification task we are interested in, we need the following definitions:

$$\begin{array}{@{}rcl@{}} \mathbf{x}^{in}(t)=[\mathbf{u}_{q}(t-1) \;\;\; \mathbf{y}_{p}(t-1)]^{T}, \end{array} $$
(6)
$$\begin{array}{@{}rcl@{}} x^{out}(t) = u(t). \end{array} $$
(7)

The winning neuron at time t is determined based only on \({\mathbf {x}}^{in}(t)\):

$$ i^{*}(t) = \arg \min\limits_{\forall i \in \mathcal{A}} \{ \| {\mathbf{x}}^{in}(t) - {\mathbf{w}}_{i}^{in}(t) \| \}. $$
(8)

For updating the weights, both \({\mathbf {x}}^{in}(t)\) and \(x^{out}(t)\) are used:

$$\begin{array}{@{}rcl@{}} {\mathbf{w}}_{i}^{in}(t+1) = {\mathbf{w}}_{i}^{in}(t) + \alpha(t) h(i^{*},i;t)[{\mathbf{x}}^{in}(t) - {\mathbf{w}}_{i}^{in}(t)], \end{array} $$
(9)
$$\begin{array}{@{}rcl@{}} w_{i}^{out}(t+1) = w_{i}^{out}(t) + \alpha(t)h(i^{*},i;t)[x^{out}(t) - w_{i}^{out}(t)], \end{array} $$
(10)

where 0 < α(t) < 1 is the learning rate, and \(h(i^{*},i;t)\) is a time-varying Gaussian neighborhood function defined as

$$ h(i^{*},i;t)= \exp \left(-\frac{\| {\mathbf{r}}_{i}(t) - {\mathbf{r}}_{i^{*}}(t)\|^{2}}{2\sigma^{2}(t)}\right) $$
(11)

where \({\mathbf {r}}_{i}(t)\) and \({\mathbf {r}}_{i^{*}}(t)\) are, respectively, the coordinates of neurons i and \(i^{*}\) in the output array, and σ(t)>0 defines the radius of the neighborhood function at time t. The variables α(t) and σ(t) should both decay with time to guarantee convergence of the weight vectors to stable steady states. In this paper, we adopt an exponential decay for both variables:

$$ \alpha(t) = \alpha_{0} \left(\frac{\alpha_{T}}{\alpha_{0}} \right)^{(t/T)} \quad \text{and} \quad \sigma(t) = \sigma_{0} \left(\frac{\sigma_{T}}{\sigma_{0}}\right)^{(t/T)} $$
(12)

where \(\alpha_{0}\) (\(\sigma_{0}\)) and \(\alpha_{T}\) (\(\sigma_{T}\)) are the initial and final values of α(t) (σ(t)), respectively.

As training proceeds, the SOM learns to associate input weight vectors \({\mathbf {w}}_{i}^{in}\) with the corresponding output weights \(w_{i}^{out}\). Weight adjustment is performed until a steady state of global ordering of the weight vectors has been achieved. In this case, we say that the map has converged.

Once the SOM has been trained, the predicted input \(\hat {u}(t)\), for a given input vector x in(t), is simply computed as

$$ \hat{u}(t) = w_{i^{*}}^{out}(t), $$
(13)

where the corresponding winning neuron \(i^{*}(t)\) is determined as in (8).

It is worth noting that in MLP and RBF networks, the vector \({\mathbf {x}}^{in}(t)\) is presented to the network input, while \(x^{out}(t)\) is used as a target output for the computation of an error signal that guides learning. The VQTAM method, in turn, allows VQ algorithms, such as the SOM, to include the desired output \(x^{out}(t)\) as part of the network input vector x(t). This allows the interpretation of the term \(e_{i}(t)=x^{out}(t) - w_{i}^{out}(t)\) in (10) as the error term due to the i-th neuron. Thus, using (13), the estimation error due to the winning neuron is given by

$$ e_{i^{*}}(t) = x^{out}(t) - w_{i^{*}}^{out}(t) = u(t) - \hat{u}(t). $$
(14)
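
To make the procedure concrete, the following sketch implements VQTAM training and prediction according to (8)-(13). It is an illustrative Python/NumPy reimplementation, not the original code: the one-dimensional lattice, the per-epoch decay of α(t) and σ(t), and the default hyperparameter values are our own simplifying assumptions.

```python
import numpy as np

def train_vqtam(X_in, x_out, N=20, T=100, a0=0.5, aT=0.01, s0=None, sT=0.001, seed=0):
    """Minimal VQTAM sketch with a 1-D lattice of N neurons (cf. (8)-(12)).

    X_in: (M, p+q) array of regressors x^in(t); x_out: (M,) array of targets.
    """
    rng = np.random.default_rng(seed)
    M, _ = X_in.shape
    s0 = N / 2.0 if s0 is None else s0
    init = rng.choice(M, size=N, replace=False)     # initialize prototypes from data
    W_in = X_in[init].copy()                        # input parts  w_i^in
    w_out = x_out[init].copy()                      # output parts w_i^out
    r = np.arange(N, dtype=float)                   # lattice coordinates r_i
    for epoch in range(T):
        alpha = a0 * (aT / a0) ** (epoch / T)       # learning-rate decay, (12)
        sigma = s0 * (sT / s0) ** (epoch / T)       # neighborhood-radius decay, (12)
        for t in rng.permutation(M):
            i_star = int(np.argmin(np.linalg.norm(X_in[t] - W_in, axis=1)))  # (8)
            h = np.exp(-(r - r[i_star]) ** 2 / (2.0 * sigma ** 2))           # (11)
            W_in += alpha * h[:, None] * (X_in[t] - W_in)                    # (9)
            w_out += alpha * h * (x_out[t] - w_out)                          # (10)
    return W_in, w_out

def vqtam_predict(x_in, W_in, w_out):
    """Eq. (13): the prediction is the output part of the winning neuron."""
    i_star = int(np.argmin(np.linalg.norm(x_in - W_in, axis=1)))
    return w_out[i_star]
```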

2.2 The KSOM model

Souza and Barreto [58] introduced the KSOM model and also evaluated it on inverse system identification tasks. The KSOM model uses a single local linear model whose vector of coefficients is time-varying, i.e. it is recomputed at each iteration from the prototype vectors of the K neurons closest to the current input vector.

To build a KSOM model, we first train the VQTAM as described in Section 2.1 using just a few neurons (usually fewer than 100 units) in order to have a reduced representation of the input-output mapping encoded in the weight vectors of the VQTAM. Then, for a new input vector presented at time t, the coefficients of a time-varying local linear model are estimated using the weight vectors of the K first winning neurons \(\{ i_{1}^{*}, i_{2}^{*}, \dots , i_{K}^{*}\}\). These neurons are selected as follows:

$$\begin{array}{@{}rcl@{}} i_{1}^{*}(t) & = & \arg \min\limits_{\forall i} \left\{ \| {\mathbf{x}}^{in}(t) - {\mathbf{w}}_{i}^{in}(t) \| \right\}\\ i_{2}^{*}(t) & = & \arg \min_{\forall i\neq i_{1}^{*}} \left\{ \| {\mathbf{x}}^{in}(t) - {\mathbf{w}}_{i}^{in}(t) \| \right\} \\ {} & {\vdots} & \qquad \qquad {\vdots} \qquad \qquad {\vdots} \\ i_{K}^{*}(t) & = & \arg \min\limits_{\forall i\neq \{i_{1}^{*}, \ldots, i_{K-1}^{*}\}} \left\{ \| {\mathbf{x}}^{in}(t) - {\mathbf{w}}_{i}^{in}(t) \| \right\} \end{array} $$
(15)

Let the set of K winning weight vectors at time t be denoted by \(\{ \mathbf {w}_{i_{1}^{*}}, \mathbf {w}_{i_{2}^{*}}, \dots ,\mathbf {w}_{i_{K}^{*}} \}\). Recall that, due to the VQTAM training style, each weight vector \(\mathbf {w}_{i}(t)\) has one portion associated with \({\mathbf {x}}^{in}(t)\) and another associated with \(x^{out}(t)\). So, the KSOM uses the corresponding K pairs of prototype vectors \(\{ \mathbf {w}_{i^{*}_{k}}^{in}(t), w_{i^{*}_{k}}^{out}(t) \}_{k=1}^{K}\) to build a local linear mapping at time t:

$$ w_{i^{*}_{k}}^{out} = \mathbf{c}^{T}(t) \mathbf{w}_{i^{*}_{k}}^{in}(t), \qquad k=1, \ldots, K $$
(16)

where \(\mathbf{c}(t)=[b_{1}(t), \ldots, b_{q}(t), a_{1}(t), \ldots, a_{p}(t)]^{T}\) is a time-varying coefficient vector. Equation (16) can be written in matrix form as

$$ \mathbf{w}^{out}(t) = \mathbf{R}(t)\mathbf{c}(t), $$
(17)

where the output vector \(\mathbf {w}^{out}(t) \in \mathbb {R}^{K}\) and the regression matrix \(\mathbf {R}(t) \in \mathbb {R}^{K\times (p+q)}\) at time t are defined as follows

$$ \mathbf{w}^{out}(t) = \left[w_{i_{1}^{*}}^{out}(t) \;\; w_{i_{2}^{*}}^{out}(t) \;\; {\cdots} \;\; w_{i_{K}^{*}}^{out}(t)\right]^{T} $$
(18)

and

$$ \mathbf{R}(t) = \left(\begin{array}{ccccc} w_{i_{1}^{*},1}^{in}(t) & w_{i_{1}^{*},2}^{in}(t) & {\cdots} & w_{i_{1}^{*},p+q}^{in}(t) \\ w_{i_{2}^{*},1}^{in}(t) & w_{i_{2}^{*},2}^{in}(t) & {\cdots} & w_{i_{2}^{*},p+q}^{in}(t) \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ w_{i_{K}^{*},1}^{in}(t) & w_{i_{K}^{*},2}^{in}(t) & {\cdots} & w_{i_{K}^{*},p+q}^{in}(t) \\ \end{array} \right). $$
(19)

In practice, since we usually have p+q > K, the matrix R is non-square. In this case, we estimate the coefficient vector c(t) by means of the regularized least-squares method:

$$ \mathbf{c}(t)=(\mathbf{R}^{T}(t)\mathbf{R}(t)+\lambda \mathbf{I})^{-1}\mathbf{R}^{T}(t)\mathbf{w}^{out}(t), $$
(20)

where I is an identity matrix of order p+q and λ > 0 (e.g. λ = 0.001) is a small constant added to the diagonal of \(\mathbf{R}^{T}(t)\mathbf{R}(t)\) to make sure that this matrix is full rank. Once c(t) is estimated, we can locally approximate the output of the nonlinear mapping by means of the following linear mapping:

$$\begin{array}{@{}rcl@{}} \hat{u}(t) & = & \sum\limits_{k=1}^{q}b_{k}(t)u(t-k) + \sum\limits_{l=1}^{p}a_{l}(t)y(t-l), \\ {} & = & \mathbf{c}^{T}(t) \mathbf{x}^{in}(t). \end{array} $$
(21)
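
The complete KSOM prediction step, from the selection of the K winning prototypes in (15) to the local linear estimate in (21), can be summarized by the following sketch. As before, this is illustrative Python/NumPy code written under our own assumptions (function names, defaults), with a trained VQTAM codebook as input.

```python
import numpy as np

def ksom_predict(x_in, W_in, w_out, K=10, lam=1e-3):
    """KSOM prediction sketch (cf. (15)-(21)): a local linear model is refitted
    at every time step to the K prototypes closest to the current input.
    W_in (N, p+q) and w_out (N,) are the parts of a trained VQTAM codebook."""
    d = np.linalg.norm(x_in - W_in, axis=1)
    winners = np.argsort(d)[:K]                    # K first winning neurons, (15)
    R = W_in[winners]                              # regression matrix, (19)
    b = w_out[winners]                             # output vector, (18)
    A = R.T @ R + lam * np.eye(R.shape[1])         # regularized LS, (20)
    c = np.linalg.solve(A, R.T @ b)                # coefficient vector c(t)
    return float(c @ x_in)                         # local linear prediction, (21)
```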

An approach quite similar to the KSOM was introduced in [52] and used for inverse local modeling purposes in [19, 20]. In that architecture, the required prototype vectors are not selected as the K nearest prototypes to the current input vector, but rather as the winning prototype at time t and its K−1 topological neighbors. If perfect topology preservation is achieved during SOM training, the neurons in the topological neighborhood of the winning prototype are also the nearest ones to the current input vector. However, if topological defects are present, as usually occurs for multidimensional data, this property cannot be guaranteed. Thus, the use of this architecture is limited to topology-preserving VQ algorithms. The KSOM model, in contrast, is general enough to be used with different types of VQ algorithms.

3 The proposed approaches

As mentioned previously, the two new techniques can be understood as multiple-model variants of the original KSOM model. In other words, while the KSOM model requires only a single local linear model, whose vector of coefficients is estimated at each iteration step, the proposed approaches build multiple local linear models, one for each neuron of the SOM.

Another way of understanding the differences between the proposed approaches and the KSOM model is that while in the KSOM approach the single vector of coefficients is time-variant, in the proposed approaches the vector of coefficients of each local model is fixed after a learning phase. The proposed techniques differ basically in the way the vectors of coefficients are estimated. One of them uses the prototype (weight) vectors of the SOM neurons, while the other uses the data vectors mapped to these prototypes.

3.1 Local linear model based on the K nearest prototypes

The first technique to be described will be referred to as Prototype-based Multiple KSOM Model (P-MKSOM). The first step in building the P-MKSOM model requires the VQTAM approach (see Section 2.1). Building the local models of the P-MKSOM starts once VQTAM training is finished.

Firstly, for each neuron i, find the K nearest neighbors of its prototype vector \(\mathbf {w}^{in}_{i}\), denoted \(i_{1}, i_{2}, \ldots, i_{K}\), as follows:

$$\begin{array}{@{}rcl@{}} i_{1} & = & \arg \min\limits_{\forall j\neq i} \left\{ \| {\mathbf{w}}_{i}^{in} - {\mathbf{w}}_{j}^{in} \| \right\}, \\ i_{2} & = & \arg \min\limits_{\forall j\neq \{i,i_{1}\}} \left\{ \| {\mathbf{w}}_{i}^{in} - {\mathbf{w}}_{j}^{in} \| \right\}, \\ {} & {\vdots} & \qquad \qquad {\vdots} \qquad \qquad {\vdots} \\ i_{K} & = & \arg \min\limits_{\forall j\neq \{i,i_{1},\ldots,i_{K-1}\}} \left\{ \| {\mathbf{w}}_{i}^{in} - {\mathbf{w}}_{j}^{in} \| \right\}, \end{array} $$
(22)

where \(\mathcal {J}_{i} = \{i\} \cup \{i_{k}\}_{k=1}^{K}\) is the set containing the index of neuron i and the indexes of the K nearest neighbors of its prototype vector \(\mathbf {w}^{in}_{i}\). Figure 1 illustrates this process.

Fig. 1 P-MKSOM model building: (a) Voronoi cells associated with each prototype vector after training the VQTAM model; (b) the set of K = 6 nearest neighbors of a given prototype vector \({\mathbf {w}}_{i^{*}}^{in}\), whose indices are included in the set \(\mathcal {J}_{i^{*}}\)

Once the set \(\mathcal {J}_{i}\) is determined for each neuron i, we build N local regression models using the prototype vectors whose indexes belong to \(\mathcal {J}_{i}\). Thus, associated with neuron i, we have a coefficient vector \(\mathbf {c}_{i}\in \mathbb {R}^{p+q}\) computed using the regularized least-squares method:

$$ \mathbf{c}_{i}=\left(\mathbf{R}_{i}^{T}\mathbf{R}_{i} + \lambda \mathbf{I} \right)^{-1}\mathbf{R}_{i}^{T}\mathbf{b}_{i}^{out}, $$
(23)

where I is an identity matrix of dimension (p+q)×(p+q) and λ > 0 (e.g. λ = 0.001) is a small regularization constant. The vector \(\mathbf {b}^{out}_{i}\in \mathbb {R}^{K+1}\) is comprised of the output parts of the K+1 prototype vectors whose indexes belong to \(\mathcal {J}_{i}\), i.e.

$$ \mathbf{b}_{i}^{out} = \left[w_{i}^{out} \;\; w_{i_{1}}^{out} \;\; {\cdots} \;\; w_{i_{K}}^{out}\right]^{T}, $$
(24)

and the matrix \(\mathbf {R}_{i} \in \mathbb {R}^{(K+1)\times (p+q)}\) is comprised of the input parts of the same K+1 prototype vectors:

$$ \mathbf{R}_{i} = \left(\begin{array}{c} \left(\mathbf{w}_{i}^{in} \right)^{T} \\ \left(\mathbf{w}_{i_{1}}^{in} \right)^{T} \\ {\vdots} \\ \left(\mathbf{w}_{i_{K}}^{in} \right)^{T} \\ \end{array} \right) = \left(\begin{array}{ccccc} w_{i,1}^{in} & w_{i,2}^{in} & {\cdots} & w_{i,p+q}^{in} \\ w_{i_{1},1}^{in} & w_{i_{1},2}^{in} & {\cdots} & w_{i_{1},p+q}^{in} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ w_{i_{K},1}^{in} & w_{i_{K},2}^{in} & {\cdots} & w_{i_{K},p+q}^{in} \end{array} \right), $$
(25)

where the superscript T denotes the transpose of a vector/matrix.

Once the N local regression models are built, they can be used to approximate the output of the nonlinear mapping of interest. Recall that the P-MKSOM model requires one local model (and hence, one vector of coefficients) per neuron. Which one to use at time t is defined by the index of the winning neuron, determined as shown in (8).

Since we are interested in inverse system identification, the P-MKSOM model estimates the current input u(t) by means of the following equation:

$$ \hat{u}(t) = \mathbf{c}_{i^{*}}^{T} \mathbf{x}^{in}(t), $$
(26)

where the estimation error (residual) at time t is defined as \(e(t) = u(t) - \hat {u}(t)\).
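
The construction of the N local models and the prediction rule (26) are summarized in the sketch below. It is again an illustrative Python/NumPy rendering of (22)-(26), with function names and default values of our own choosing.

```python
import numpy as np

def build_pmksom(W_in, w_out, K=10, lam=1e-3):
    """P-MKSOM sketch (cf. (22)-(25)): one fixed coefficient vector per neuron,
    fitted to its own prototype and its K nearest prototype neighbors."""
    N, d = W_in.shape
    C = np.zeros((N, d))
    for i in range(N):
        dist = np.linalg.norm(W_in[i] - W_in, axis=1)
        dist[i] = np.inf                                 # exclude neuron i itself
        J = np.concatenate(([i], np.argsort(dist)[:K]))  # set J_i, (22)
        R_i = W_in[J]                                    # (K+1) x (p+q) matrix, (25)
        b_i = w_out[J]                                   # output vector, (24)
        A = R_i.T @ R_i + lam * np.eye(d)
        C[i] = np.linalg.solve(A, R_i.T @ b_i)           # regularized LS, (23)
    return C

def pmksom_predict(x_in, W_in, C):
    """Eq. (26): apply the coefficient vector of the winning neuron."""
    i_star = int(np.argmin(np.linalg.norm(x_in - W_in, axis=1)))
    return float(C[i_star] @ x_in)
```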

3.2 Local linear model estimation based on the data vectors mapped to the K nearest prototypes

The second technique to be proposed, henceforth called Data-based Multiple KSOM Model (D-MKSOM), is similar to the P-MKSOM model, differing only in the way the vectors of coefficients \(\mathbf {c}_{i}\), i = 1,...,N, are estimated. Instead of using the prototype vector of neuron i and of its K nearest neighbors, the D-MKSOM model computes the vector of coefficients \(\mathbf {c}_{i}\) of neuron i using the (training) data vectors that are mapped to that neuron and to its K nearest neighbors. In other words, in order to estimate the vector \(\mathbf {c}_{i}\), the D-MKSOM model uses all the (training) data vectors belonging to the region formed by the union of the Voronoi cells of neuron i and of its K nearest neighbors.

The first and second steps in building the D-MKSOM model are the same as those for the P-MKSOM: (i) train the VQTAM model using the available training data; (ii) then, find the set \(\mathcal {J}_{i} = \{i\} \cup \{i_{k}\}_{k=1}^{K}\) containing the index of neuron i and the indexes of the K nearest neighbors of the prototype vector \({\mathbf {w}}_{i}^{in}\), i = 1,...,N, as defined in (22).

A third step is necessary, which consists in finding the set of (training) data vectors that are mapped to the prototypes \({\mathbf {w}}_{i}^{in}\), \({\mathbf {w}}_{i_{1}}^{in}\), \({\mathbf {w}}_{i_{2}}^{in}\), …, \({\mathbf {w}}_{i_{K}}^{in}\), for i = 1,…,N.

Let \(n^{(i)}\) be the number of input vectors \({\mathbf {x}^{in}} \in {\mathbb {R}}^{p+q}\) mapped to the Voronoi cell of neuron i. Similarly, let \(n^{(i_{k})}\) be the number of input vectors \({\mathbf {x}^{in}} \in {\mathbb {R}}^{p+q}\) mapped to the Voronoi cell of the k-th nearest neighbor of neuron i. Hence, the total number of vectors mapped to neuron i and its K nearest neighbors is given by

$$ n_{i} = n^{(i)} + n^{(i_{1})} + n^{(i_{2})} + {\cdots} + n^{(i_{K})}. $$
(27)

Also, let \(\mathbf {X}_{i}^{in}\) be a \((p+q) \times n^{(i)}\) data matrix whose columns are the vectors \({\mathbf {x}}^{in}\) mapped to the Voronoi cell of neuron i. Finally, let \({\mathbf {x}}_{i}^{out} \in \mathbb {R}^{n^{(i)}}\) be the vector containing the target outputs \(x^{out}\) associated with the vectors \({\mathbf {x}}^{in} \in \mathbf {X}_{i}^{in}\).

By the same token, \(\mathbf {X}_{i_{k}}^{in}\) is a \((p+q) \times n^{(i_{k})}\) data matrix whose columns are the vectors \({\mathbf {x}}^{in}\) mapped to the Voronoi cell of neuron \(i_{k}\), k = 1,…,K. Accordingly, \({\mathbf {x}}_{i_{k}}^{out} \in \mathbb {R}^{n^{(i_{k})}}\) is the vector containing the target outputs \(x^{out}\) associated with the vectors \({\mathbf {x}}^{in} \in {\mathbf {X}}_{i_{k}}^{in}\).

Once the pairs \(\{ \mathbf {X}_{i}^{in}, \mathbf {x}_{i}^{out}\}\), \(\{ \mathbf {X}_{i_{1}}^{in}, \mathbf {x}_{i_{1}}^{out}\}\), \(\{ \mathbf {X}_{i_{2}}^{in}, \mathbf {x}_{i_{2}}^{out}\}\), …, \(\{ \mathbf {X}_{i_{K}}^{in}, \mathbf {x}_{i_{K}}^{out}\}\), are determined for neuron i and its K nearest neighbors, we can build the local linear model for neuron i. For this purpose, assuming that the sets \({\mathbf {x}}_{i}^{out}\) and \({\mathbf {x}}_{i_{k}}^{out}\), k = 1,…,K are arranged as column vectors, we build the vector \(\mathbf {b}^{out}_{i} \in \mathbb {R}^{n_{i}}\) as follows:

$$ \mathbf{b}_{i}^{out} = \left[ \begin{array}{c} \mathbf{x}_{i}^{out} \\ \mathbf{x}_{i_{1}}^{out} \\ {\vdots} \\ \mathbf{x}_{i_{K}}^{out} \end{array} \right]_{n_{i} \times 1}. $$
(28)

Similarly, the regression matrix \(\mathbf {R}_{i} \in \mathbb {R}^{n_{i} \times (p+q)}\) is built using the data matrices \(\mathbf {X}_{i}^{in}\) and \(\mathbf {X}_{i_{k}}^{in}\), k = 1,…,K, as follows:

$$\begin{array}{@{}rcl@{}} \mathbf{R}_{i} & = & \left(\begin{array}{c} \left(\mathbf{X}_{i}^{in} \right)^{T} \\ \left(\mathbf{X}_{i_{1}}^{in} \right)^{T} \\ {\vdots} \\ \left(\mathbf{X}_{i_{K}}^{in} \right)^{T} \\ \end{array} \right)_{n_{i} \times (p+q)}, \end{array} $$
(29)

where the superscript T denotes the transpose of a vector/matrix.

Finally, the vector of coefficients of neuron i, \(\mathbf {c}_{i}\in \mathbb {R}^{p+q}\), is estimated using the regularized least-squares method as

$$ \mathbf{c}_{i}=\left(\mathbf{R}_{i}^{T}\mathbf{R}_{i} + \lambda \mathbf{I} \right)^{-1}\mathbf{R}_{i}^{T}\mathbf{b}_{i}^{out}, $$
(30)

where I is an identity matrix of dimension (p+q)×(p+q) and λ > 0 (e.g. λ = 0.001) is a small regularization constant.

Once the N local regression models are built, they can be used to approximate locally the output of the nonlinear mapping of interest. Recall that in this paper we are interested in the inverse identification problem. Thus, the D-MKSOM model estimates u(t) also using (26).
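
A sketch of the D-MKSOM building procedure is given below. As before, it is illustrative Python/NumPy code rather than the original implementation; in particular, the brute-force Voronoi assignment is an implementation choice of ours.

```python
import numpy as np

def build_dmksom(W_in, X_in, x_out, K=10, lam=1e-3):
    """D-MKSOM sketch (cf. (27)-(30)): the coefficients of neuron i are fitted
    to the training pairs mapped to the Voronoi cells of neuron i and of its
    K nearest prototype neighbors. X_in/x_out are the VQTAM training pairs."""
    N, d = W_in.shape
    # Voronoi assignment: index of the nearest prototype of each training vector
    owner = np.argmin(
        np.linalg.norm(X_in[:, None, :] - W_in[None, :, :], axis=2), axis=1)
    C = np.zeros((N, d))
    for i in range(N):
        dist = np.linalg.norm(W_in[i] - W_in, axis=1)
        dist[i] = np.inf
        J = np.concatenate(([i], np.argsort(dist)[:K]))    # set J_i, (22)
        mask = np.isin(owner, J)                            # data in the union of cells
        R_i, b_i = X_in[mask], x_out[mask]                  # (29) and (28)
        A = R_i.T @ R_i + lam * np.eye(d)
        C[i] = np.linalg.solve(A, R_i.T @ b_i)              # regularized LS, (30)
    return C
```

Prediction then proceeds exactly as in the P-MKSOM model, i.e. via (26) with the coefficient vector of the winning neuron.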

4 Computer simulations and discussion

The performances of the proposed SOM-based local linear models are compared to MLP- and ELM-based global models in the task of inverse system identification. We also compare the proposed models with other local modeling approaches, such as the VQTAM, KSOM and the Local Linear Mapping (LLM) [62]. The LLM associates a linear model with each neuron of the SOM and estimates their vectors of coefficients using a variant of the least mean squares (LMS) adaptation rule. In essence, the LLM can be viewed as a recursive variant of the D-MKSOM model with K = 0. All simulations were carried out in Matlab.

All the models are initially evaluated via the statistics of the normalized mean-squared estimation error (NMSE) computed for the testing time series:

$$ NMSE=\frac{{\sum}^{M}_{t=1}e^{2}(t)}{M\cdot\hat{\sigma}_{u}^{2}} = \frac{{\sum}^{M}_{t=1} (u(t) - \hat{u}(t))^{2}}{M\cdot\hat{\sigma}_{u}^{2}}, $$
(31)

where \(\hat {\sigma }_{u}^{2}\) is the sample variance computed over the testing samples of the observed time series of the input variable, \(\{ u(t) \}_{t=1}^{M}\), with M being the number of samples.
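
For reference, the NMSE of (31) can be computed as in the short sketch below (Python/NumPy; the use of the biased sample variance is an assumption of ours, since the text does not specify the estimator).

```python
import numpy as np

def nmse(u, u_hat):
    """NMSE of (31) computed on the test series."""
    u = np.asarray(u, dtype=float)
    u_hat = np.asarray(u_hat, dtype=float)
    return float(np.sum((u - u_hat) ** 2) / (len(u) * np.var(u)))
```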

Then, for the sake of completeness, residual analysis and hypothesis testing are carried out for the best models. More specifically, residual analysis is an important, but usually neglected, step in model validation, while hypothesis testing aims to analyze the influence of the VQ algorithm on the performance of the local models. Hypothesis testing is implemented using the Kolmogorov-Smirnov test [57] over the sequences of residuals generated by a given model.

4.1 A brief description of the datasets

The Hydraulic Actuator Dataset: This dataset consists of input and output time series of a hydraulic actuator of a mechanical structure known as a crane [56]. This equipment actually has four actuators: one for the rotation of the whole structure, one to move the arm, one to move the forearm and one to move a telescopic extension of the forearm. This plant was chosen because it has a long arm and a long forearm with considerable mechanical flexibility, which makes the movement of the whole crane oscillatory and hard to control. The position of the arm is controlled by a hydraulic actuator. The oil pressure in the actuator is controlled by the extension of the valve opening through which the oil flows into the actuator. The position of the robot arm is then a function of the oil pressure.

The Robot Arm Dataset: This dataset was obtained from a flexible robot arm installed on an electric motor. The transfer function of this structure was modeled from the measured reaction torque of the structure on the ground to the acceleration of the flexible arm. The applied input is a periodic sine sweep. The measured values of the reaction torque of the structure define the input time series, while the acceleration of the flexible arm defines the output variable. This dataset is available for download from the SISTA Identification Database (DaISy) website.

The Heat Exchanger Dataset: The third dataset comes from a liquid-saturated steam heat exchanger [16], where water is heated by pressurized saturated steam through a copper tube. The motivation for choosing the heat exchanger as a benchmark is that this plant is characterized by a non-minimum phase behavior, which makes the design of controllers particularly challenging even in a linear context. The measured values of the liquid flow rate (in m³/s) define the input time series, while the outlet liquid temperatures (in degrees Celsius) define the output time series. The sampling period was set to 1 s. This dataset is also available from the DaISy repository.

4.2 Results on the hydraulic actuator dataset

For this dataset, the proposed P-MKSOM and D-MKSOM models were compared with two single-hidden-layer MLPs: one trained by the standard backpropagation algorithm (MLP-1h) and the other trained by the Levenberg-Marquardt algorithm (MLP-LM). We also compared the proposed models with an MLP with two hidden layers trained by the backpropagation algorithm (MLP-2h) and with the ELM network. For all the nonlinear global models, the hidden neurons used hyperbolic tangent activation functions, while the output neuron used a linear one. For the sake of completeness, we also evaluated the performance of a global linear (ARX) model trained by the LMS algorithm.

The models were trained using the first 384 samples of the input/output time series, validated with the following 128 samples, and tested with the remaining 512 samples. Validation was required in order to find suitable values for training hyperparameters (e.g. number of neurons) of all models. Input/output time series were rescaled to the [−1,+1] range. Memory orders were set to p = 5 and q = 4.

A systematic search for the optimal number of hidden neurons that resulted in the smallest NMSE values on the validation dataset was carried out, with values of N ranging from 2 to 30. For each value of N, 100 independent training/validation runs were executed. The best configurations found for the MLP-based models with one hidden layer (MLP-1h and MLP-LM) had 20 hidden neurons. The number of neurons of the first hidden layer of the MLP-2h model was then set to 20, while the number of neurons of the second hidden layer was heuristically set to half the number of neurons of the first hidden layer. The MLP-1h and MLP-2h were trained with a fixed learning rate of 0.1. For the ELM model, the number of hidden neurons was set to the same value used by the MLP-1h and MLP-LM models.

The best number of neurons found for the VQTAM model after experimentation on the validation set was N = 20 (100 independent training/validation runs for each value of N). This number was then used by all the other SOM-based models. The initial and final learning rates were set to \(\alpha_{0} = 0.5\) and \(\alpha_{T} = 0.01\). The initial and final values of the neighborhood function radius were \(\sigma_{0} = N/2\) and \(\sigma_{T} = 0.001\). The learning rate for the LMS rule of the LLM model was set to 0.01. The optimal number of nearest neighbors for the KSOM was found to be around K = 10 (100 independent training/validation runs for each value of K ranging from 1 to 20). This value was then used by the P-MKSOM and D-MKSOM models.

The obtained results are shown in Table 1, which displays the mean, minimum, maximum and standard deviation (std) of the NMSE values collected over 100 training/testing runs, with the weights of the neural models randomly initialized at each run. In this table, the models are sorted in decreasing order of the mean NMSE values. One can note that the performances of the P-MKSOM and D-MKSOM models on this dataset are far better than those of all other models, with the D-MKSOM model performing better than the P-MKSOM model. The ELM and KSOM models also had acceptable performances on this dataset. It is worth mentioning that, among the four best ranked models for this dataset, three are local linear models. A curious fact is that the linear global ARX model performed better than the three global MLP-based models (MLP-LM, MLP-1h, MLP-2h). Among the MLP-based models, the use of second-order information could also explain the better performance of the MLP-LM model.

Table 1 Performance results for the hydraulic actuator data

Figure 2 shows typical sequences of estimated values of the valve position provided by the best local and best global models for the hydraulic actuator dataset. Figure 2a shows the sequence generated by the D-MKSOM model, while Fig. 2b shows the sequence estimated by the ELM model. It is very difficult to visually detect differences between the two figures, because both models achieved good performances.

Fig. 2 Typical estimated sequences of the valve position provided by (a) the D-MKSOM model and (b) the ELM model. Open circles ‘ ∘’ denote actual sample values, while the solid line indicates the estimated sequence

4.3 Results on the flexible robot arm dataset

For this dataset, additional tests were carried out and other models were included in the simulations. The models were trained using the first 615 samples of the input-output time series, validated with the following 205 samples and tested with the remaining 204 samples. Input/output time series were rescaled to the [−1,+1] range. The memory orders were set to p = 5 and q = 4. After some experimentation with the validation dataset, the best configurations found for the MLP-1h and MLP-LM models had 30 hidden neurons. Thus, the numbers of neurons in the first and second hidden layers of the MLP-2h were set to 30 and 15, respectively. The learning rates for the MLP-1h and MLP-2h were set to 0.1. For the ELM model, the number of hidden neurons was set to 30, the same value used by the MLP-1h and MLP-LM models.

The best number of neurons found for the VQTAM model after experimentation on the validation set was N = 35 (100 independent training/validation runs for each value of N ranging from 5 to 50). This number of neurons was then used by all the other SOM-based models. For each SOM-based model, the initial and final learning rates were set to \(\alpha_{0} = 0.5\) and \(\alpha_{T} = 0.01\). The initial and final values of the neighborhood function radius were \(\sigma_{0} = N/2\) and \(\sigma_{T} = 0.001\). The learning rate of the LMS rule used by the LLM model was set to 0.1. The optimal number of nearest neighbors for the KSOM was found to be K = 20 (after 100 independent training/validation runs for each value of K ranging from 1 to 35). This value was then used by the P-MKSOM and D-MKSOM models.

VQTAM variants that include geometric and topological interpolation [24, 25], named VQTAM-G and VQTAM-T, respectively, were included in the experiments with the hope of improving the approximation accuracy of the plain VQTAM model. The KSOM model was also implemented using the Parameterless SOM (PLSOM) architecture [12], which requires no learning rate or neighborhood width parameters.

The results in terms of NMSE values, after 100 training/testing runs, are shown in Table 2. The weights of the neural models were randomly initialized at each run. As in Table 1, the models were sorted in decreasing order of the mean NMSE values.

Table 2 Performance results for the flexible robot arm data

The D-MKSOM model is again the best one, followed closely by the KSOM model. When the SOM was replaced with the PLSOM network, the performance of the KSOM method degraded, but it still remained much better than that of the remaining models. Among the global models, the ELM model achieved the best performance. It is worth pointing out that, among the five best ranked models for this dataset, four are local linear models. The LLM model performed better only than the ARX, VQTAM-G and MLP-2h models, whose performances were very poor. As a final remark for this dataset, the VQTAM-T model performed much better than the VQTAM-G model.

Figure 3 shows typical sequences of estimated values of the reaction torque of the structure provided by the best local and best global models for the flexible robot arm dataset. Figure 3a shows the sequence generated by the D-MKSOM model, while Fig. 3b shows the sequence estimated by the ELM model.

Fig. 3 Typical estimated sequences of the reaction torque of the structure provided by (a) the D-MKSOM model and (b) the ELM model. Dashed lines denote actual sample values, while the solid line indicates the estimated sequence

4.4 Results on the heat exchanger dataset

For this dataset, we included performance results of variants of the P-MKSOM and D-MKSOM models obtained by replacing the SOM with other VQ algorithms in the VQTAM phase of the model. All the models were trained using the first 2200 samples of the input/output time series, validated with the following 1000 samples and tested with the remaining 800 samples. Input/output time series were rescaled to the [−1, +1] range.

The best configuration found for the MLP-1h and MLP-LM models had 20 hidden neurons (100 independent training/validation runs for each value of N in the range of 2 to 50). The number of neurons of the first hidden layer of the MLP-2h model was then set to 20, while that of the second hidden layer was set to half the number of neurons of the first hidden layer. The MLP-1h and MLP-LM were trained with a constant learning rate of 0.1. For the ELM model, the number of hidden neurons was set to 20, the same value used by the MLP-1h and MLP-LM models. The memory orders were set to p = 6 and q = 3.

The best number of neurons found for the VQTAM model after experimentation on the validation set was N = 30 (100 independent training/validation runs for each value of N ranging from 5 to 50). This number of neurons was then used by all the other SOM-based models. The initial and final learning rates were set to \(\alpha_{0} = 0.5\) and \(\alpha_{T} = 0.001\). The initial and final values of the neighborhood function radius were \(\sigma_{0} = N/2\) and \(\sigma_{T} = 0.001\). The learning rate for the LMS rule used by the LLM model was set to 0.1. The optimal number of nearest neighbors for the KSOM was found to be around K = 20 (100 independent training/validation runs for each value of K ranging from 1 to 30). This value was then used by the P-MKSOM and D-MKSOM models. The obtained results are shown in Table 3.

Table 3 Performance results for the heat exchanger data

This time, the best performance was achieved by the ELM network, a global model. Note, however, that the D-MKSOM model achieved better performance than the MLP-1h model and it performed comparably to the ELM model. Again, the VQTAM-T model performed better than the VQTAM-G model. It is worth noting that, this time, among the five best ranked models, three of them are global nonlinear models.

The influence of the VQ algorithm on the performances of the P-MKSOM and D-MKSOM approaches is analyzed in Table 4 for the heat exchanger dataset. The local models in question were implemented using the following VQ algorithms: K-means, WTA, FSCL and FCL. By analyzing this table we can observe that, for both the P-MKSOM and D-MKSOM methods, the best local models were the ones generated by the SOM, a possible indication that topology preservation was an important feature for the proposed models to capture the dynamics of this particular dataset.

Table 4 Performance results for the P-MKSOM and D-MKSOM models using different VQ algorithms (heat exchanger data)

4.5 Residual analysis

Residual analysis is an important but usually neglected issue in model validation. It allows the user to assess how well the model learns the dynamics of the training data and how well it respects the modeling assumptions. The most common modeling assumption is that noise should resemble a Gaussian white noise process [41] (see (4)). By analyzing the sequence of residuals produced by each model (local and global ones) using the testing data, the user can assess the degree of matching between the statistical properties of the sequence of residuals and the theoretic modeling assumptions.

For linear models, the following autocorrelation function (ACF) and the cross-correlation function (XCF) are used for validating the identified model:

$$\begin{array}{@{}rcl@{}} {\Phi}_{ee}(\tau)&=& E\{e(t-\tau)e(t)\}= \delta(\tau) \end{array} $$
(32)
$$\begin{array}{@{}rcl@{}} {\Phi}_{ue}(\tau) &=& E\{u(t-\tau)e(t)\}=0, \;\;\; \forall \tau \end{array} $$
(33)

where E{⋅} denotes the expected value operator, δ(τ) is the Kronecker delta function, e(t) denotes the residual (error) obtained by the model at time t using testing data, u(t) is the corresponding input sample and τ is the lag. However, these functions are insufficient for evaluating nonlinear models [13]. In such cases, one has to resort to nonlinear variants of the ACF and XCF [14, 15], such as those listed below:

$$\begin{array}{@{}rcl@{}} {\Phi}_{u^{2'}e}(\tau) &=& E\{[u^{2}(t)-\overline{u^{2}(t)}]e(t-\tau)\}=0, \quad \forall \tau \end{array} $$
(34)
$$\begin{array}{@{}rcl@{}} {\Phi}_{u^{2'}{e}^{2}}(\tau) &=& E\{[u^{2}(t)-\overline{u^{2}(t)}]{e}^{2}(t-\tau)\}=0, \quad \forall \tau \end{array} $$
(35)

The overbar denotes the time-average operation, while the prime symbol (′) in (34)–(35) indicates that the mean has been removed from the corresponding data sequence. The aforementioned linear and nonlinear correlation functions can be used to check whether the sequence of residuals produced by a given model on testing data resembles a Gaussian white noise process.
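
The sketch below illustrates how sample versions of these correlation checks can be computed. It is illustrative Python/NumPy code: the normalization, the mean removal of both sequences (which implements the prime operation and is also applied to e² in (35)) and the 95% confidence band ±1.96/√M are common conventions we adopt here, not prescriptions of the text.

```python
import numpy as np

def xcorr(x, y, max_lag=20):
    """Normalized correlation between x(t) and y(t - k) for k = 0..max_lag.
    Both sequences are mean-removed before the computation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sqrt(np.sum(x ** 2) * np.sum(y ** 2))
    return np.array([np.sum(x[k:] * y[:len(y) - k]) / denom
                     for k in range(max_lag + 1)])

def residual_tests(u, e, max_lag=20):
    """Sample versions of (32)-(35); values outside the ~95% confidence band
    +/- 1.96/sqrt(M) suggest that the corresponding assumption is violated."""
    u = np.asarray(u, dtype=float)
    e = np.asarray(e, dtype=float)
    return {
        "Phi_ee": xcorr(e, e, max_lag),              # (32): E{e(t - tau) e(t)}
        "Phi_ue": xcorr(e, u, max_lag),              # (33): E{u(t - tau) e(t)}
        "Phi_u2e": xcorr(u ** 2, e, max_lag),        # (34): E{(u^2)'(t) e(t - tau)}
        "Phi_u2e2": xcorr(u ** 2, e ** 2, max_lag),  # (35)
        "band": 1.96 / np.sqrt(len(e)),
    }
```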

Without loss of generality, we performed the residual analysis on the heat exchanger dataset only. The results of the estimation of the linear and nonlinear ACFs/XCFs are shown in Fig. 4. We show only the results for the D-MKSOM, ELM and P-MKSOM models, since they presented the best performances for this dataset. By analyzing these figures, we can note that the assumptions of Gaussianity and uncorrelatedness of the residuals are mostly satisfied by the three evaluated models.

Fig. 4 Residual correlation analysis for the heat exchanger data: (a,d,g,j) D-MKSOM model, (b,e,h,k) ELM model and (c,f,i,l) P-MKSOM model

Additional qualitative evaluation of the residuals generated by the identified models is usually carried out by plotting histograms, boxplots and normal probability plots. By means of the histograms and normal probability plots, we verify how close the distribution of the residuals is to the Gaussianity assumption. If the data are normal, the normal probability plot is approximately linear. Boxplots are useful to identify outliers.

In Fig. 5 we show the histograms, boxplots and normal probability plots of the sequences of residuals generated by the D-MKSOM, ELM and P-MKSOM models. These results, combined with those provided by the correlation tests, suggest that the identified models performed well, since they seem to fulfill the requirements posed by the modeling assumptions.

Fig. 5 Histograms, boxplots and normal probability plots of the residuals for the heat exchanger dataset: (a,d,g) D-MKSOM model, (b,e,h) ELM model and (c,f,i) P-MKSOM model

4.6 Robustness analysis of the proposed models

The final set of experiments aims at evaluating the degree of similarity, from a statistical viewpoint, among the sequences of residuals generated by the P-MKSOM and D-MKSOM methods for different VQ algorithms. For this purpose, we use the Kolmogorov-Smirnov test (KS-test) [57]. The KS-test quantifies a distance between the empirical cumulative distribution functions (CDF) of two sequences of residuals. The null hypothesis to be tested is that the sequences are drawn from the same distribution.

The rationale for the application of the KS-test in the present local modeling context is the following: if two P-MKSOM models are implemented using two different VQ algorithms and they generate statistically equivalent sequences of residuals (according to the KS-test), then the two P-MKSOM models can be considered equivalent to each other. The same reasoning applies to D-MKSOM models implemented via different VQ algorithms.
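
In practice, the test can be carried out with any standard statistical package. The sketch below uses scipy.stats.ks_2samp and a 5% significance level, which is our own choice since the text does not state the level adopted.

```python
import numpy as np
from scipy.stats import ks_2samp

def equivalent_residuals(e_a, e_b, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test on two sequences of residuals.
    Returns True when there is no evidence (at level alpha) that the two
    empirical CDFs differ, i.e. the null hypothesis is not rejected."""
    statistic, p_value = ks_2samp(np.asarray(e_a), np.asarray(e_b))
    return p_value > alpha
```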

Table 5 presents the results for the P-MKSOM model, while the results for the D-MKSOM model are shown in Table 6. A rejection of the null hypothesis indicates that the CDF of the residuals generated by the original (SOM-based) MKSOM model is different from the CDF of the residuals generated by the MKSOM model implemented with a different VQ algorithm. Conversely, acceptance of the null hypothesis indicates that the two CDFs are statistically equivalent.

From Table 5 we can infer that, for the heat exchanger data, the performance of the original P-MKSOM model is statistically equivalent to those obtained by implementing the P-MKSOM with the FSCL, WTA and K-means algorithms. In case of equivalence, we recommend that the user choose the computationally lighter VQ algorithm to build the local linear model of interest (the WTA algorithm, in this case). The NMSE values in Table 4 can then be used to choose, among the WTA, K-means and FSCL variants, the one with the best accuracy.

Table 5 KS-test results on the P-MKSOM performance

From Table 6 we can infer that, for the heat exchanger data, the performance of the original D-MKSOM model is statistically equivalent to those obtained by implementing the D-MKSOM with all the other VQ algorithms. As mentioned before, in case of equivalence, we recommend that the user choose the computationally lighter VQ algorithm.

Table 6 KS-test results on the D-MKSOM performance

5 Conclusions

In this paper we introduced two novel techniques, called the D-MKSOM and P-MKSOM models, for estimating the parameters of local linear models for dynamical system identification. Both proposed models are originally based on the SOM network, but they can easily be extended to use any prototype-based vector quantization algorithm. A comprehensive evaluation of the proposed approaches was carried out for the task of inverse identification of three benchmark dynamical systems. Their performances were compared to those achieved by standard MLP-based global models and by the recently proposed ELM network.

In addition to the evaluation based on NMSE values, the models were validated by means of a thorough residual analysis to check their adequacy to the modeling assumptions. A robustness analysis was also carried out in order to evaluate the dependency of the proposed models on the vector quantization algorithm used to implement the VQTAM approach.

From the results presented, the main general conclusion is that the proposed local linear models (D-MKSOM and P-MKSOM), especially the D-MKSOM model, consistently outperformed standard global models for system identification based on MLP and ELM neural networks. We also verified that the D-MKSOM model is more robust than the P-MKSOM model to changes in the base vector quantization algorithm.

Currently, we are extending the proposed approaches to deal with multi-input/multi-output (MIMO) systems.