Abstract
In this paper we introduce two novel techniques for local linear modeling of dynamical systems. As in the standard approach, we use vector quantization (VQ) algorithms, such as the Self-Organizing Map, to partition the joint input-output space into smaller regions. Then, to each neuron we associate a vector of parameters which must be suitably estimated. The first estimation technique uses the prototypes of the i-th neuron and its K nearest neighbors to build the corresponding local linear model. The second technique builds the i-th local linear model using the data vectors that are mapped into the region formed by the Voronoi cells of the i-th neuron and its K nearest neighbors. A comprehensive evaluation of the proposed techniques is carried out for the task of inverse identification of three benchmark Single Input/Single Output (SISO) dynamical systems. Their performances are compared to those achieved by the Multilayer Perceptron and the Extreme Learning Machine networks. We also evaluate how robust the proposed techniques are with respect to the VQ algorithm used to partition the input-output space. The results show that the proposed techniques consistently outperform standard approaches for all evaluated datasets.
1 Introduction
Modern industrial plants have been the source of challenging tasks in dynamical system identification and control [51]. In particular, the design of control systems to achieve the level of quality demanded by current industry standards requires building accurate models of the plant being controlled. Usually, the designer tackles the plant modeling problem by trying to find models capable of representing faithfully the dynamics of the (possibly nonlinear) system of interest based solely on available data.
Since data are usually available in the form of input and output time series, these can be used for building direct or inverse models of nonlinear systems by means of computational intelligence methods, such as neural networks [8, 18, 46, 48], Takagi-Sugeno fuzzy models [34, 53, 59] or hybrid neuro-fuzzy systems [5, 6, 17, 54], to mention just a few possibilities. Although several techniques for nonlinear dynamical system identification have been proposed [1, 49], they can be categorized into one of the following approaches: global, local and hybrid models.
Global models usually implement a single structure, such as a single feedforward or recurrent neural network model, that approximates the whole mapping between the input and the output of the system being identified. Indeed, global models constitute the mainstream in nonlinear system identification and control [37, 46, 47, 63].
Local models, in their turn, have attracted much attention because of their ability to fit to the local shape of an arbitrary surface (i.e. mapping), which is particularly difficult when the dynamical system characteristics vary considerably throughout the input space. The input space is usually divided into smaller, localized regions, each one being associated with a simpler (usually linear) model. To estimate the system output at a given time, a single model is chosen from the pool of available local models according to some criteria defined on the current input data.
A common local modeling approach is known as local model network (LMN) [26, 28, 44, 45, 55, 60], which uses basis functions to implement its localized nature, similar to the standard Radial Basis Function network (RBFN). In LMNs, however, the output weights of an RBFN are replaced with local functions of the inputs. As a consequence, due to the better approximation properties of the local functions, the LMN model does not require as many basis functions as the standard RBFN to achieve the desired accuracy and, hence, the number of parameters is reduced dramatically [27].
Finally, hybrid models combine the aforementioned modeling approaches in order to build a suitable approximation of the dynamics of the system of interest [23, 39]. In [39], for example, a novel technique named Mixture of Support Vector Machine Experts (MSVME) is introduced. Its main purpose is to combine the complementary characteristics of Support Vector Machines (a global approach) and Mixture of Experts models (a local one) and apply them to dynamical system identification.
Local modeling techniques often use the Self-Organizing Map (SOM) to partition the input-output space into smaller regions, over which the local models are built [2, 7, 9, 10, 19, 20, 40, 50, 52]. The SOM [36] is an unsupervised competitive learning algorithm which has been commonly applied to clustering, vector quantization and data visualization tasks [35]. The results reported on those studies are rather appealing, indicating that SOM-based local models can be feasible alternatives to global models based on more traditional supervised neural network architectures, such as the RBFN and the Multilayer Perceptron (MLP), and also to more recent ones, such as the Extreme Learning Machine (ELM) [29, 30].
An important limitation of the aforementioned local linear models is that they were specifically designed to use the SOM algorithm. This means that their performances degrade considerably if another vector quantization (VQ) algorithm is used. The main advantage of using a VQ algorithm other than the SOM to partition the input-output space is related to computational cost. There are several VQ algorithms available [61], such as the K-Means [22, 42] and the Frequency-Sensitive Competitive Learning (FSCL) [3], which are considerably lighter than the SOM and still achieve equivalent data partitioning results.
More recently, a general methodology for building and evaluating local linear models with respect to their robustness (i.e. insensitiveness) to changes in the VQ algorithm has been proposed in [58]. This methodology evaluates whether the difference between two distributions of residuals generated by the same local model using two different VQ algorithms is statistically significant or not. For this purpose, the nonparametric Kolmogorov-Smirnov hypothesis test was chosen. By using this methodology, it was shown that the SOM can be successfully replaced by a lighter VQ algorithm (e.g. the FSCL) without performance loss.
In view of the above, in this paper we introduce two novel strategies for estimating the parameters of local linear models which are general enough to be used with any VQ algorithm. Although the proposed techniques were initially developed for the SOM, we evaluate the performance of the resulting local linear models for different VQ algorithms by means of the methodology introduced in [58].
The first of the proposed techniques is a prototype-based approach, while the second one is a data-based approach. More specifically, the prototype-based technique uses the weight vectors of a given neuron of the SOM and of its K nearest neighbors to build the corresponding linear model. The data-based technique computes the local linear model associated with a given neuron using the data vectors that are mapped into the Voronoi cell of that neuron and the data vectors mapped into the Voronoi cells of its K nearest neighbors.
A comprehensive evaluation of the proposed approaches is carried out for the task of inverse identification of three benchmarking single-input/single-output (SISO) dynamical systems. Their performances are compared to those achieved by linear and nonlinear global models, and by other existing local linear modeling techniques. In addition, we perform a thorough statistical analysis of the residuals for model validation purposes and also evaluate how robust the proposed techniques are with respect to the VQ algorithm used to partition the input-output space.
The remainder of the paper is organized as follows. In Section 2, two previous SOM-based local modeling approaches which serve as building blocks for the proposed approaches are described. In Section 3, the proposed techniques for parameter estimation of linear local models are then presented. Comprehensive computer simulations and performance analysis of the proposed approaches are presented in Section 4. The paper is concluded in Section 5.
2 SOM-based local modeling approaches
Let us assume that the dynamical systems we are dealing with can be described mathematically by the NARX discrete-time model (see footnote 1) [41, 49]:
with
and
Note that f(⋅) is an unknown nonlinear mapping, \(u(t) \in \mathbb {R}\) and \(y(t) \in \mathbb {R}\) denote, respectively, the input and output of the model at time step t, while p ≥ 1 and q ≥ 1 (p ≥ q) are the input-memory and output-memory orders, respectively. The variable ξ(t) stands for the error term, assumed as an additive white Gaussian noise process.
In this paper, we aim at evaluating the proposed local linear models on the inverse system identification task, mainly for later use in control strategies. Thus, the neural network models are designed to approximate the following inverse mapping:
This kind of nonlinear inverse model of a system is required, for example, in the implementation of several control system strategies, such as Direct Inverse Control [21], Internal Model Control [33, 38, 43, 49], Generalized Inverse Learning [31, 32] and others [4, 19, 20].
2.1 The VQTAM approach
The Vector-Quantized Temporal Associative Memory (VQTAM) [9] approach aims at building an input-output associative mapping using the Self-Organizing Map (SOM). It is a generalization to the temporal domain of a SOM-based associative memory technique that has been used by many authors to learn static (i.e. memoryless) input-output mappings, especially within the domain of robotics (see [11] and references therein).
The SOM learns from examples a mapping from a high-dimensional continuous input space \(\mathcal {X}\) onto a low-dimensional discrete space (lattice) \(\mathcal {A}\) of N neurons which are arranged in fixed topological forms, e.g., a rectangular 2-dimensional array. The map \(i^{*}(\mathbf {x}):\mathcal {X} \rightarrow \mathcal {A}\) is defined by the weight vectors \(\mathcal {W}=\{\mathbf {w}_{1}, \mathbf {w}_{2}, \ldots , \mathbf {w}_{N}\}, \mathbf {w}_{i} \in \mathbb {R}^{p} \subset \mathcal {X}\), and their corresponding coordinates \(\mathbf {r}_{i} \in \mathbb {R}^{2}\) in the lattice \(\mathcal {A}\).
According to the VQTAM framework, the input vector to the SOM at time step t, \(\mathbf{x}(t)\), is composed of two parts. The first part, denoted \({\mathbf {x}}^{in}(t) \in \mathbb {R}^{p+q}\), carries data about the input of the dynamic mapping to be learned. The second part, denoted \(x^{out}(t) \in \mathbb {R}\), contains data concerning the desired output of this mapping. The weight vector of neuron i, \(\mathbf{w}_{i}(t)\), has its dimension increased accordingly. These simple definitions are formalized as follows:
\[
\mathbf{x}(t) = \begin{pmatrix} \mathbf{x}^{in}(t) \\ x^{out}(t) \end{pmatrix}
\quad \text{and} \quad
\mathbf{w}_{i}(t) = \begin{pmatrix} \mathbf{w}_{i}^{in}(t) \\ w_{i}^{out}(t) \end{pmatrix},
\tag{5}
\]
where \({\mathbf {w}}_{i}^{in}(t) \in \mathbb {R}^{p+q}\) and \(w_{i}^{out}(t) \in \mathbb {R}\) are, respectively, the portions of the weight vector (a.k.a. prototype vector) which store information about the inputs and the outputs of the desired mapping. Depending on the variables chosen to build the vector \(\mathbf{x}^{in}(t)\) and the scalar \(x^{out}(t)\), one can use the SOM algorithm (or any other VQ algorithm) to learn the forward or the inverse mapping of a given dynamic system.
For the inverse identification task we are interested in, we need the following definitions:
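The winning neuron at time t is determined based only on \(\mathbf{x}^{in}(t)\):
\[
i^{*}(t) = \arg\min_{\forall i} \left\{ \| \mathbf{x}^{in}(t) - \mathbf{w}_{i}^{in}(t) \| \right\}.
\tag{8}
\]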
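For updating the weights, both \(\mathbf{x}^{in}(t)\) and \(x^{out}(t)\) are used:
\[
\mathbf{w}_{i}^{in}(t+1) = \mathbf{w}_{i}^{in}(t) + \alpha(t)\, h(i^{*}, i; t)\, [\mathbf{x}^{in}(t) - \mathbf{w}_{i}^{in}(t)],
\tag{9}
\]
\[
w_{i}^{out}(t+1) = w_{i}^{out}(t) + \alpha(t)\, h(i^{*}, i; t)\, [x^{out}(t) - w_{i}^{out}(t)],
\tag{10}
\]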
where \(0 < \alpha(t) < 1\) is the learning rate, and \(h(i^{*}, i; t)\) is a time-varying Gaussian neighborhood function defined as
where \({\mathbf {r}}_{i}(t)\) and \({\mathbf {r}}_{i^{*}}(t)\) are, respectively, the coordinates of the neurons i and \(i^{*}\) in the output array, and \(\sigma(t) > 0\) defines the radius of the neighborhood function at time t. The variables \(\alpha(t)\) and \(\sigma(t)\) should both decay with time to guarantee convergence of the weight vectors to stable steady states. In this paper, we adopt an exponential decay for both variables:
where \(\alpha_{0}\) (\(\sigma_{0}\)) and \(\alpha_{T}\) (\(\sigma_{T}\)) are the initial and final values of \(\alpha(t)\) (\(\sigma(t)\)), respectively.
As training proceeds, the SOM learns to associate input weight vectors \({\mathbf {w}}_{i}^{in}\) with the corresponding output weights \(w_{i}^{out}\). Weight adjustment is performed until a steady state of global ordering of the weight vectors has been achieved. In this case, we say that the map has converged.
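The VQTAM training procedure described above can be sketched in a few lines of NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the lattice is taken as one-dimensional, the prototypes are initialized from randomly chosen training vectors, the Gaussian neighborhood uses the common form exp(-d²/(2σ²)), and the exponential decay is taken as α(t) = α₀(α_T/α₀)^(t/T) (likewise for σ). The function name `train_vqtam` and all of its arguments are hypothetical.

```python
import numpy as np

def train_vqtam(X_in, x_out, n_neurons=20, n_epochs=50,
                alpha0=0.5, alphaT=0.01, sigma0=None, sigmaT=0.001, seed=0):
    """Train a 1-D SOM in VQTAM fashion (illustrative sketch).

    X_in  : (M, d) array, rows are the input parts x^in(t)
    x_out : (M,)   array, the output parts x^out(t)
    Returns (W_in, w_out): input parts (N, d) and output parts (N,) of the prototypes.
    """
    rng = np.random.default_rng(seed)
    M, d = X_in.shape
    sigma0 = n_neurons / 2 if sigma0 is None else sigma0

    # Assumption: prototypes start at randomly chosen training vectors.
    idx = rng.choice(M, size=n_neurons, replace=M < n_neurons)
    W_in = X_in[idx].astype(float).copy()
    w_out = x_out[idx].astype(float).copy()

    r = np.arange(n_neurons, dtype=float)   # lattice coordinates (1-D array assumed)
    T = n_epochs * M                        # total number of update steps
    t = 0
    for _ in range(n_epochs):
        for m in rng.permutation(M):
            frac = t / T
            alpha = alpha0 * (alphaT / alpha0) ** frac   # assumed decay schedule
            sigma = sigma0 * (sigmaT / sigma0) ** frac

            # Winner search uses ONLY the input part of the vector.
            winner = np.argmin(np.sum((W_in - X_in[m]) ** 2, axis=1))

            # Gaussian neighborhood around the winner on the lattice.
            h = np.exp(-((r - r[winner]) ** 2) / (2.0 * sigma ** 2))

            # Both parts of every prototype are updated.
            W_in += alpha * h[:, None] * (X_in[m] - W_in)
            w_out += alpha * h * (x_out[m] - w_out)
            t += 1
    return W_in, w_out
```

Because the winner is selected from the input part alone while both parts are updated, each prototype ends up storing an (input, output) association that can later be read out by nearest-prototype lookup.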
Once the SOM has been trained, the predicted input \(\hat {u}(t)\), for a given input vector \(\mathbf{x}^{in}(t)\), is simply computed as
\[
\hat{u}(t) = w_{i^{*}}^{out}(t),
\tag{13}
\]
where the corresponding winning neuron \(i^{*}(t)\) is determined as in (8).
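The recall step amounts to a nearest-prototype lookup followed by reading out the stored output part. A minimal sketch follows, assuming the prototypes are stored as a NumPy array `W_in` of shape (N, p+q) and a vector `w_out` of length N; the function name `vqtam_predict` and the toy codebook are hypothetical.

```python
import numpy as np

def vqtam_predict(W_in, w_out, x_in):
    """Return the VQTAM estimate and the index of the winning neuron."""
    # Winner: prototype whose input part is closest to x_in (Euclidean distance).
    winner = int(np.argmin(np.linalg.norm(W_in - x_in, axis=1)))
    # The estimate is the output part stored in the winning prototype.
    return float(w_out[winner]), winner

# Toy codebook with three prototypes (illustrative values only).
W_in = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
w_out = np.array([0.1, 0.5, 0.9])
u_hat, i_star = vqtam_predict(W_in, w_out, np.array([0.9, 1.2]))
print(u_hat, i_star)  # 0.5 1
```

Note that the estimate is piecewise constant over the Voronoi cells, which is why a plain VQTAM typically needs many neurons to approximate a smooth mapping well.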
It is worth noting that in MLP and RBF networks, the vector \(\mathbf{x}^{in}(t)\) is presented to the network input, while \(x^{out}(t)\) is used as a target output for the computation of an error signal that guides learning. The VQTAM method, in its turn, allows VQ algorithms, such as the SOM, to include the desired output \(x^{out}(t)\) as part of the network input vector \(\mathbf{x}(t)\). This allows the interpretation of the term \(e_{i}(t)=x^{out}(t) - w_{i}^{out}(t)\) in (10) as the error term due to the i-th neuron. Thus, using (13), the estimation error due to the winning neuron is given by
\[
e_{i^{*}}(t) = x^{out}(t) - w_{i^{*}}^{out}(t) = x^{out}(t) - \hat{u}(t).
\tag{14}
\]
2.2 The KSOM model
Souza and Barreto [58] introduced the KSOM model and also evaluated it in inverse system identification tasks. The KSOM model uses a single local linear model whose vector of coefficients is time-varying, i.e. it is computed at each iteration based on the prototype vectors of the neuron nearest to the current input vector and of its K nearest neighbors.
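For building a KSOM model, we first train the VQTAM as described in Section 2.1 using just a few neurons (usually fewer than 100 units) in order to have a reduced representation of the input-output mapping encoded in the weight vectors of the VQTAM. Then, for a new input vector presented at time t, the coefficients of a time-varying linear local model are estimated using the weight vectors of the K first winning neurons \(\{ i_{1}^{*}, i_{2}^{*}, \dots , i_{K}^{*}\}\). These neurons are selected as follows:
\[
\begin{aligned}
i_{1}^{*}(t) &= \arg\min_{\forall i} \left\{ \| \mathbf{x}^{in}(t) - \mathbf{w}_{i}^{in}(t) \| \right\}, \\
i_{2}^{*}(t) &= \arg\min_{\forall i \neq i_{1}^{*}} \left\{ \| \mathbf{x}^{in}(t) - \mathbf{w}_{i}^{in}(t) \| \right\}, \\
&\;\;\vdots \\
i_{K}^{*}(t) &= \arg\min_{\forall i \notin \{ i_{1}^{*}, \ldots, i_{K-1}^{*} \}} \left\{ \| \mathbf{x}^{in}(t) - \mathbf{w}_{i}^{in}(t) \| \right\}.
\end{aligned}
\tag{15}
\]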
Let the set of K winning weight vectors at time t be denoted by \(\{ \mathbf {w}_{i_{1}^{*}}, \mathbf {w}_{i_{2}^{*}}, \dots ,\mathbf {w}_{i_{K}^{*}} \}\). Recall that due to the VQTAM training style, each weight vector \(\mathbf{w}_{i}(t)\) has a portion associated with \(\mathbf{x}^{in}(t)\) and another associated with \(x^{out}(t)\). So, the KSOM uses the corresponding K pairs of prototype vectors \(\{ \mathbf {w}_{i^{*}_{k}}^{in}(t), w_{i^{*}_{k}}^{out}(t) \}_{k=1}^{K}\), with the aim of building a local linear mapping at time t:
where \(\mathbf{c}(t)=[b_{1}(t),\ldots,b_{q}(t),a_{1}(t),\ldots,a_{p}(t)]^{T}\) is a time-varying coefficient vector. Equation (16) can be written in matrix form as
\[
\mathbf{w}^{out}(t) = \mathbf{R}(t)\,\mathbf{c}(t),
\tag{17}
\]
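where the output vector \(\mathbf {w}^{out}(t) \in \mathbb {R}^{K}\) and the regression matrix \(\mathbf {R}(t) \in \mathbb {R}^{K\times (p+q)}\) at time t are defined as follows
\[
\mathbf{w}^{out}(t) = \left[ w_{i_{1}^{*}}^{out}(t), \; w_{i_{2}^{*}}^{out}(t), \; \ldots, \; w_{i_{K}^{*}}^{out}(t) \right]^{T}
\tag{18}
\]
and
\[
\mathbf{R}(t) = \left[ \mathbf{w}_{i_{1}^{*}}^{in}(t), \; \mathbf{w}_{i_{2}^{*}}^{in}(t), \; \ldots, \; \mathbf{w}_{i_{K}^{*}}^{in}(t) \right]^{T}.
\tag{19}
\]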
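In practice, since we usually have \(p+q \neq K\), the matrix \(\mathbf{R}(t)\) is non-square. In this case, we estimate the coefficient vector \(\mathbf{c}(t)\) by means of the regularized least-squares method:
\[
\mathbf{c}(t) = \left( \mathbf{R}^{T}(t)\mathbf{R}(t) + \lambda \mathbf{I} \right)^{-1} \mathbf{R}^{T}(t)\, \mathbf{w}^{out}(t),
\tag{20}
\]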
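where \(\mathbf{I}\) is an identity matrix of order p+q (the order of \(\mathbf{R}^{T}(t)\mathbf{R}(t)\)) and λ > 0 (e.g. λ = 0.001) is a small constant added to the diagonal of \(\mathbf{R}^{T}(t)\mathbf{R}(t)\) to make sure that this matrix is full rank. Once \(\mathbf{c}(t)\) is estimated, we can locally approximate the output of the nonlinear mapping by means of the following linear mapping:
\[
\hat{u}(t) = \mathbf{c}^{T}(t)\, \mathbf{x}^{in}(t).
\tag{21}
\]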
An approach quite similar to KSOM was introduced by [52] and used for inverse local modeling purpose in [19, 20]. In this architecture, the required prototype vectors are not selected as the K nearest prototypes to the current input vector, but rather automatically selected as the winning prototype at time t and its K−1 topological neighbors. If a perfect topology preservation is achieved during SOM training, the neurons in the topological neighborhood of the winning prototype are also the nearest ones to the current input vector. However, if topological defects are present, as usually occurs for multidimensional data, this property cannot be guaranteed. Thus, the use of this architecture is limited to topology-preserving VQ algorithms. The KSOM model, however, is general enough to be used with different types of VQ algorithms.
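The KSOM estimation step of this section can be sketched as follows. This is an illustrative reading of the description, not the reference implementation: the K winners are taken as the K prototypes whose input parts are closest (in Euclidean distance) to the current input vector, and the regularized least-squares problem is solved with `numpy.linalg.solve` rather than an explicit matrix inverse. The name `ksom_predict` is hypothetical.

```python
import numpy as np

def ksom_predict(W_in, w_out, x_in, K=10, lam=1e-3):
    """One KSOM step: fit a local linear model on the K nearest prototypes.

    W_in  : (N, p+q) input parts of the trained prototypes
    w_out : (N,)     output parts of the trained prototypes
    x_in  : (p+q,)   current input vector
    Returns (u_hat, c): the estimate and the coefficient vector at this step.
    """
    # K first winners: prototypes closest to the current input vector.
    d2 = np.sum((W_in - x_in) ** 2, axis=1)
    winners = np.argsort(d2)[:K]

    R = W_in[winners]     # regression matrix, shape (K, p+q)
    b = w_out[winners]    # output vector, shape (K,)

    # Regularized least squares: (R^T R + lam*I) c = R^T b, I of order p+q.
    A = R.T @ R + lam * np.eye(R.shape[1])
    c = np.linalg.solve(A, R.T @ b)

    return float(c @ x_in), c
```

Since the coefficient vector is recomputed for every incoming vector, the cost per prediction is one sort over N distances plus one (p+q)×(p+q) linear solve.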
3 The proposed approaches
As mentioned previously, the two new techniques can be understood as multiple model variants of the original KSOM model. In other words, while the KSOM model requires only a single local linear model, whose vector of coefficients is estimated at each iteration step, the proposed approaches build multiple local linear models, one for each neuron in the SOM.
Another way of understanding the differences between the proposed approaches and the KSOM model is that while in the KSOM approach the single vector of coefficients is time-variant, in the proposed approaches the vector of coefficients of each local model is fixed after a learning phase. The proposed techniques differ basically in the way the vectors of coefficients are estimated. One of them uses the prototype (weight) vectors of the SOM neurons, while the other uses the data vectors mapped to these prototypes.
3.1 Local linear model based on the K nearest prototypes
The first technique to be described will be referred to as Prototype-based Multiple KSOM Model (P-MKSOM). The first step in building the P-MKSOM model requires the VQTAM approach (see Section 2.1). Building the local models of the P-MKSOM starts once VQTAM training is finished.
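First, let \(i_{k}\) denote the index of the k-th nearest neighbor of neuron i. The K nearest neighbors of the prototype vector \(\mathbf {w}^{in}_{i}\) are then found as follows:
\[
\begin{aligned}
i_{1} &= \arg\min_{\forall j \neq i} \left\{ \| \mathbf{w}_{i}^{in} - \mathbf{w}_{j}^{in} \| \right\}, \\
i_{2} &= \arg\min_{\forall j \notin \{ i, i_{1} \}} \left\{ \| \mathbf{w}_{i}^{in} - \mathbf{w}_{j}^{in} \| \right\}, \\
&\;\;\vdots \\
i_{K} &= \arg\min_{\forall j \notin \{ i, i_{1}, \ldots, i_{K-1} \}} \left\{ \| \mathbf{w}_{i}^{in} - \mathbf{w}_{j}^{in} \| \right\},
\end{aligned}
\tag{22}
\]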
where \(\mathcal {J}_{i} = \{i\} \cup \{i_{k}\}_{k=1}^{K}\) is the set containing the index of neuron i and the indexes of the K nearest neighbors of the prototype vector \(\mathbf {w}^{in}_{i}\). Figure 1 illustrates this process.
Once the set \(\mathcal {J}_{i}\) is determined for each neuron i, we build N local regression models using the prototype vectors whose indexes belong to \(\mathcal {J}_{i}\). Thus, associated with neuron i, we have a coefficient vector \(\mathbf {c}_{i}\in \mathbb {R}^{p+q}\) computed using the regularized least-squares method:
\[
\mathbf{c}_{i} = \left( \mathbf{R}_{i}^{T}\mathbf{R}_{i} + \lambda \mathbf{I} \right)^{-1} \mathbf{R}_{i}^{T}\, \mathbf{b}_{i}^{out},
\tag{23}
\]
where \(\mathbf{I}\) is an identity matrix of dimension (p+q)×(p+q) and λ > 0 (e.g. λ = 0.001) is a small regularization constant. The vector \(\mathbf {b}^{out}_{i}\in \mathbb {R}^{K+1}\) is comprised of the output parts of the K+1 prototype vectors whose indexes belong to \(\mathcal {J}_{i}\), i.e.
\[
\mathbf{b}_{i}^{out} = \left[ w_{i}^{out}, \; w_{i_{1}}^{out}, \; \ldots, \; w_{i_{K}}^{out} \right]^{T},
\tag{24}
\]
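and the matrix \(\mathbf {R}_{i} \in \mathbb {R}^{(K+1)\times (p+q)}\) is comprised of the input parts of the same K+1 prototype vectors:
\[
\mathbf{R}_{i} = \left[ \mathbf{w}_{i}^{in}, \; \mathbf{w}_{i_{1}}^{in}, \; \ldots, \; \mathbf{w}_{i_{K}}^{in} \right]^{T},
\tag{25}
\]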
where the superscript T denotes the transpose vector/matrix.
Once the N local regression models are built, they can be used to approximate the output of the nonlinear mapping of interest. Recall that the P-MKSOM model requires one local model (and hence, one vector of coefficients) per neuron. Which one to use at time t is defined by the index of the winning neuron, determined as shown in (8).
Since we are interested in inverse system identification, the P-MKSOM model estimates the current input u(t) by means of the following equation:
\[
\hat{u}(t) = \mathbf{c}_{i^{*}}^{T}\, \mathbf{x}^{in}(t),
\tag{26}
\]
where the estimation error (residual) at time t is defined as \(e(t) = u(t) - \hat {u}(t)\).
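A compact sketch of the P-MKSOM construction described in this subsection is given below, assuming a codebook already trained in VQTAM fashion. It is meant as an illustration only: neighbors are found by Euclidean distance between the input parts of the prototypes, and the names `fit_pmksom` and `pmksom_predict` are hypothetical.

```python
import numpy as np

def fit_pmksom(W_in, w_out, K=10, lam=1e-3):
    """Fit one local linear model per neuron from its K nearest prototypes.

    W_in  : (N, p+q) input parts of the prototypes
    w_out : (N,)     output parts of the prototypes
    Returns C of shape (N, p+q); row i is the coefficient vector of neuron i.
    """
    N, d = W_in.shape
    C = np.empty((N, d))
    for i in range(N):
        d2 = np.sum((W_in - W_in[i]) ** 2, axis=1)
        d2[i] = -1.0                      # force neuron i to be listed first
        J = np.argsort(d2)[:K + 1]        # neuron i plus its K nearest neighbors
        R = W_in[J]                       # (K+1, p+q) regression matrix
        b = w_out[J]                      # (K+1,) output parts
        A = R.T @ R + lam * np.eye(d)     # regularized normal equations
        C[i] = np.linalg.solve(A, R.T @ b)
    return C

def pmksom_predict(W_in, C, x_in):
    """Select the winner's local model and apply it to the input vector."""
    winner = int(np.argmin(np.sum((W_in - x_in) ** 2, axis=1)))
    return float(C[winner] @ x_in)
```

In contrast to the KSOM, all N coefficient vectors are computed once after training, so a prediction requires only a winner search and an inner product.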
3.2 Local linear model estimation based on the data vectors mapped to the K nearest prototypes
The second technique to be proposed, henceforth called Data-based Multiple KSOM Model (D-MKSOM), is similar to the P-MKSOM model, differing only in the way the vectors of coefficients \(\mathbf{c}_{i}\), i = 1,...,N, are estimated. Instead of using the prototype vector of neuron i and of its K nearest neighbors, the D-MKSOM model computes the vector of coefficients \(\mathbf{c}_{i}\) of neuron i using the (training) data vectors that are mapped to that neuron and to its K nearest neighbors. In other words, in order to estimate the vector \(\mathbf{c}_{i}\), the D-MKSOM model uses all the (training) data vectors belonging to the region formed by the union of the Voronoi cells of neuron i and of its K nearest neighbors.
The first and second steps in building the D-MKSOM model are the same as those for the P-MKSOM: (i) train the VQTAM model using the available training data; (ii) find the set \(\mathcal {J}_{i} = \{i\} \cup \{i_{k}\}_{k=1}^{K}\) containing the index of neuron i and the indexes of the K nearest neighbors of the prototype vector \({\mathbf {w}}_{i}^{in}\), i = 1,...,N, as defined in (22).
A third step is necessary and consists of finding the set of (training) data vectors that are mapped to the prototypes \({\mathbf {w}}_{i}^{in}\), \({\mathbf {w}}_{i_{1}}^{in}\), \({\mathbf {w}}_{i_{2}}^{in}\), …, \({\mathbf {w}}_{i_{K}}^{in}\), for i = 1,…,N.
Let \(n^{(i)}\) be the number of input vectors \({\mathbf {x}^{in}} \in {\mathbb {R}}^{p+q}\) mapped to the Voronoi cell of neuron i. Similarly, let \(n^{(i_{k})}\) be the number of input vectors \({\mathbf {x}^{in}} \in {\mathbb {R}}^{p+q}\) mapped to the Voronoi cell of the k-th nearest neighbor of neuron i. Hence, the total number of vectors mapped to neuron i and its K nearest neighbors is given by
\[
n_{i} = n^{(i)} + \sum_{k=1}^{K} n^{(i_{k})}.
\tag{27}
\]
Also, let \(\mathbf {X}_{i}^{in}\) be a \((p+q)\times n^{(i)}\) data matrix whose columns are the vectors \(\mathbf{x}^{in}\) mapped to the Voronoi cell of neuron i. Finally, let \({\mathbf {x}}_{i}^{out} \in \mathbb {R}^{n^{(i)}}\) be the vector containing the target outputs \(x^{out}\) associated with the vectors \({\mathbf {x}}^{in} \in \mathbf {X}_{i}^{in}\).
By the same token, \(\mathbf {X}_{i_{k}}^{in}\) is a \((p+q)\times n^{(i_{k})}\) data matrix whose columns are the vectors \(\mathbf{x}^{in}\) mapped to the Voronoi cell of neuron \(i_{k}\), k = 1,…,K. Accordingly, \({\mathbf {x}}_{i_{k}}^{out} \in \mathbb {R}^{n^{(i_{k})}}\) is the vector containing the target outputs \(x^{out}\) associated with the vectors \({\mathbf {x}}^{in} \in {\mathbf {X}}_{i_{k}}^{in}\).
Once the pairs \(\{ \mathbf {X}_{i}^{in}, \mathbf {x}_{i}^{out}\}\), \(\{ \mathbf {X}_{i_{1}}^{in}, \mathbf {x}_{i_{1}}^{out}\}\), \(\{ \mathbf {X}_{i_{2}}^{in}, \mathbf {x}_{i_{2}}^{out}\}\), …, \(\{ \mathbf {X}_{i_{K}}^{in}, \mathbf {x}_{i_{K}}^{out}\}\) are determined for neuron i and its K nearest neighbors, we can build the local linear model for neuron i. For this purpose, assuming that the sets \({\mathbf {x}}_{i}^{out}\) and \({\mathbf {x}}_{i_{k}}^{out}\), k = 1,…,K, are arranged as column vectors, we build the vector \(\mathbf {b}^{out}_{i} \in \mathbb {R}^{n_{i}}\) as follows:
\[
\mathbf{b}_{i}^{out} = \left[ (\mathbf{x}_{i}^{out})^{T}, \; (\mathbf{x}_{i_{1}}^{out})^{T}, \; \ldots, \; (\mathbf{x}_{i_{K}}^{out})^{T} \right]^{T}.
\tag{28}
\]
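Similarly, the regression matrix \(\mathbf {R}_{i} \in \mathbb {R}^{n_{i} \times (p+q)}\) is built using the data matrices \(\mathbf {X}_{i}^{in}\) and \(\mathbf {X}_{i_{k}}^{in}\), k = 1,…,K, as follows:
\[
\mathbf{R}_{i} = \left[ \mathbf{X}_{i}^{in}, \; \mathbf{X}_{i_{1}}^{in}, \; \ldots, \; \mathbf{X}_{i_{K}}^{in} \right]^{T},
\tag{29}
\]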
where the superscript T denotes the transpose of a vector/matrix.
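Finally, the vector of coefficients of neuron i, \(\mathbf {c}_{i}\in \mathbb {R}^{p+q}\), is estimated using the regularized least-squares method as
\[
\mathbf{c}_{i} = \left( \mathbf{R}_{i}^{T}\mathbf{R}_{i} + \lambda \mathbf{I} \right)^{-1} \mathbf{R}_{i}^{T}\, \mathbf{b}_{i}^{out},
\tag{30}
\]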
where \(\mathbf{I}\) is an identity matrix of dimension (p+q)×(p+q) and λ > 0 (e.g. λ = 0.001) is a small regularization constant.
Once the N local regression models are built, they can be used to approximate locally the output of the nonlinear mapping of interest. Recall that in this paper we are interested in the inverse identification problem. Thus, the D-MKSOM model estimates u(t) also using (26).
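The D-MKSOM steps above can be sketched as follows, again as a hedged illustration rather than the authors' code. Training vectors are assigned to Voronoi cells by nearest input prototype, the cells of neuron i and its K nearest neighbors are pooled, and a regularized least-squares fit is carried out on the pooled data. The name `fit_dmksom` is hypothetical; data are stored row-wise here, so no transposition of the pooled matrix is needed.

```python
import numpy as np

def fit_dmksom(W_in, X_in, x_out, K=10, lam=1e-3):
    """Fit one local linear model per neuron from pooled Voronoi-cell data.

    W_in  : (N, p+q) input parts of the trained prototypes
    X_in  : (M, p+q) training input vectors (one per row)
    x_out : (M,)     training target outputs
    Returns C of shape (N, p+q); row i is the coefficient vector of neuron i.
    """
    N, d = W_in.shape

    # Voronoi assignment: index of the nearest input prototype for each vector.
    dist2 = ((X_in[:, None, :] - W_in[None, :, :]) ** 2).sum(axis=2)  # (M, N)
    labels = dist2.argmin(axis=1)

    C = np.empty((N, d))
    for i in range(N):
        d2 = np.sum((W_in - W_in[i]) ** 2, axis=1)
        d2[i] = -1.0                       # keep neuron i itself in the set
        J = np.argsort(d2)[:K + 1]         # neuron i plus its K nearest neighbors

        mask = np.isin(labels, J)          # data in the union of the K+1 cells
        R = X_in[mask]                     # (n_i, p+q) regression matrix
        b = x_out[mask]                    # (n_i,) target outputs

        # With lam > 0 the system stays solvable even if the pool is empty,
        # in which case the coefficient vector is simply zero.
        A = R.T @ R + lam * np.eye(d)
        C[i] = np.linalg.solve(A, R.T @ b)
    return C
```

Prediction then proceeds exactly as for the P-MKSOM: the winner for the current input vector is found and its coefficient vector is applied to that vector.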
4 Computer simulations and discussion
The performances of the proposed SOM-based local linear models are compared to MLP- and ELM-based global models in the task of inverse system identification. We also compare the proposed models with other local modeling approaches, such as the VQTAM, KSOM and the Local Linear Mapping (LLM) [62]. The LLM associates a linear model with each neuron of the SOM and estimates their vectors of coefficients using a variant of the least mean squares (LMS) adaptation rule. In essence, the LLM can be viewed as a recursive variant (see footnote 2) of the D-MKSOM model when K = 0. All simulations were carried out in Matlab®.
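All the models are initially evaluated via the statistics of the normalized mean-squared estimation error (NMSE) computed for the testing time series:
\[
\mathrm{NMSE} = \frac{\sum_{t=1}^{M} e^{2}(t)}{M \hat{\sigma}_{u}^{2}} = \frac{\sum_{t=1}^{M} \left( u(t) - \hat{u}(t) \right)^{2}}{M \hat{\sigma}_{u}^{2}},
\tag{31}
\]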
where \(\hat {\sigma }_{u}^{2}\) is the sample variance computed over the testing samples of the observed time series of the input variable, \(\{ u(t) \}_{t=1}^{M}\), with M being the number of samples.
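For reference, the NMSE can be computed as in the short sketch below. It assumes the usual definition, i.e. the sum of squared residuals divided by M times the variance of the observed input series, and uses the biased variance estimate (`numpy.var` with its default `ddof=0`); if the unbiased estimate was used in the paper, the values differ by a factor of (M-1)/M. The function name `nmse` is hypothetical.

```python
import numpy as np

def nmse(u, u_hat):
    """Normalized mean-squared error between observed and estimated series."""
    u = np.asarray(u, dtype=float)
    u_hat = np.asarray(u_hat, dtype=float)
    e = u - u_hat                              # residuals e(t) = u(t) - u_hat(t)
    # Assumption: variance computed with ddof=0 over the testing samples.
    return float(np.sum(e ** 2) / (u.size * np.var(u)))
```

Under this definition a perfect estimator yields NMSE = 0, while an estimator that always outputs the mean of the observed series yields NMSE = 1, which gives a convenient baseline when reading the tables.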
Then, for the sake of completeness, residual analysis and hypothesis testing are carried out for the best models. More specifically, residual analysis is an important, but usually neglected, step in model validation, while hypothesis testing aims to analyze the influence of the VQ algorithm on the performance of the local models. Hypothesis testing is implemented using the Kolmogorov-Smirnov test [57] over the sequence of residuals generated by a given model.
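As an illustration of how two residual sequences produced by the same local model under two different VQ algorithms might be compared, the sketch below computes the two-sample Kolmogorov-Smirnov statistic with NumPy only and compares it with the standard large-sample critical value. The exact test configuration used in the paper is not stated in this section, so the significance level and the asymptotic threshold are assumptions; the name `ks_two_sample` is hypothetical. SciPy's `scipy.stats.ks_2samp` offers the same test with p-values.

```python
import numpy as np

def ks_two_sample(res_a, res_b, alpha_level=0.05):
    """Two-sample Kolmogorov-Smirnov test on two residual sequences.

    Returns (D, critical_value, reject), where `reject` is True when the
    hypothesis that both samples share one distribution is rejected.
    """
    a = np.sort(np.asarray(res_a, dtype=float))
    b = np.sort(np.asarray(res_b, dtype=float))

    # Evaluate both empirical CDFs on the pooled sample points.
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    D = float(np.max(np.abs(cdf_a - cdf_b)))

    # Large-sample critical value: c(alpha) * sqrt((n + m) / (n * m)),
    # with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    c_alpha = np.sqrt(-0.5 * np.log(alpha_level / 2.0))
    crit = float(c_alpha * np.sqrt((a.size + b.size) / (a.size * b.size)))
    return D, crit, D > crit
```

In the robustness setting considered here, failing to reject the null hypothesis is the desired outcome: it suggests that swapping the VQ algorithm did not change the distribution of the residuals in a statistically detectable way.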
4.1 A brief description of the datasets
The Hydraulic Actuator Dataset: This dataset consists of input and output time series of a hydraulic actuator of a mechanical structure known as a crane [56]. This equipment actually has four actuators: one for the rotation of the whole structure, one to move the arm, one to move the forearm and one to move a telescopic extension of the forearm. This plant was chosen because it has a long arm and a long forearm with considerable flexibility in its mechanical structure, making the movement of the whole crane oscillatory and hard to control. The position of the arm is controlled by a hydraulic actuator. The oil pressure in the actuator is controlled by the extension of the valve opening through which the oil flows into the actuator. The position of the robot arm is then a function of the oil pressure.
The Robot Arm Dataset: This dataset is obtained from a flexible robot arm installed on an electric motor. The transfer function of this structure was modeled from the measured reaction torque of the structure on the ground to the acceleration of the flexible arm. The applied input is a periodic sine sweep. The measured values of the reaction torque of the structure define the input time series, while the acceleration of the flexible arm denotes the output variable. This dataset is available for download from the SISTA’s Identification Database (DaISy) website (see footnote 3).
The Heat Exchanger Dataset: The third dataset comes from a liquid-saturated steam heat exchanger [16], where water is heated by pressurized saturated steam through a copper tube. The motivation for the choice of the heat exchanger as a benchmark is that this plant is characterized by a non-minimum phase behavior which makes the design of controllers particularly challenging even in a linear context. The measured values of the liquid flow rate (in m³/s) define the input time series, while the outlet liquid temperatures (in degrees Celsius) define the output time series. The sampling period was set to 1 s. This dataset is also available from the DaISy repository.
4.2 Results on the hydraulic actuator dataset
For this data set, the proposed P-MKSOM and D-MKSOM models were compared with two single-hidden-layer MLPs: one trained by the standard backprop algorithm (MLP-1h) and the other trained by the Levenberg-Marquardt (MLP-LM) algorithm. We also compared the proposed models with an MLP with two hidden layers trained by the backprop algorithm and with the ELM network. For all the nonlinear global models, the hidden neurons used hyperbolic tangent activation functions, while the output neuron used a linear one. For the sake of completeness, we also evaluated the performance of a global linear (ARX) model trained by the LMS algorithm.
The models were trained using the first 384 samples of the input/output time series, validated with the following 128 samples, and tested with the remaining 512 samples. Validation was required in order to find suitable values for training hyperparameters (e.g. number of neurons) of all models. Input/output time series were rescaled to the [−1,+1] range. Memory orders were set to p = 5 and q = 4.
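The data preparation described above can be sketched as follows. The split sizes (384/128/remaining) come from the text; how the scaling constants were obtained is not stated, so this sketch assumes that the minimum and maximum of each complete series are used, which guarantees that every subset stays within [−1,+1]. The names `rescale` and `split_and_rescale` are hypothetical.

```python
import numpy as np

def rescale(series, lo, hi):
    """Map values linearly from [lo, hi] to [-1, +1]."""
    return 2.0 * (series - lo) / (hi - lo) - 1.0

def split_and_rescale(u, y, n_train=384, n_val=128):
    """Rescale the input/output series and split them chronologically."""
    u = np.asarray(u, dtype=float)
    y = np.asarray(y, dtype=float)

    # Assumption: scaling constants taken from the whole series.
    u_s = rescale(u, u.min(), u.max())
    y_s = rescale(y, y.min(), y.max())

    parts = {
        "train": slice(0, n_train),
        "val": slice(n_train, n_train + n_val),
        "test": slice(n_train + n_val, None),   # all remaining samples
    }
    return {name: (u_s[s], y_s[s]) for name, s in parts.items()}
```

Keeping the split chronological, rather than shuffling, preserves the temporal order needed to build the lagged regressors with memory orders p and q.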
A systematic search for the optimal number of hidden neurons that resulted in the smallest NMSE values on the validation dataset was carried out, with values of N ranging from 2 to 30. For each value of N, 100 independent training/validation runs were executed. The best configurations found for the MLP-based models with one hidden layer (MLP-1h and MLP-LM) had 20 hidden neurons. The number of neurons of the first hidden layer of the MLP-2h model was then set to 20, while the number of neurons of the second hidden layer was heuristically set to half the number of neurons of the first hidden layer. The MLP-1h and MLP-2h were trained with a fixed learning rate of 0.1. For the ELM model, the number of hidden neurons was set to the same value used by the MLP-1h and MLP-LM models.
The best number of neurons found for the VQTAM model after experimentation on the validation set was N = 20 (100 independent training/validation runs for each value of N). This number was then used by all the other SOM-based models. The initial and final learning rates were set to \(\alpha_{0} = 0.5\) and \(\alpha_{T} = 0.01\). The initial and final values of the neighborhood function radius were \(\sigma_{0} = N/2\) and \(\sigma_{T} = 0.001\). The learning rate for the LMS rule of the LLM model was set to 0.01. The optimal number of nearest neighbors for the KSOM was found to be around K = 10 (100 independent training/validation runs for each value of K ranging from 1 to 20). This value was then used by the P-MKSOM and D-MKSOM models.
The obtained results are shown in Table 1, which displays the mean, minimum, maximum and standard deviation (std) of the NMSE values collected over 100 training/testing runs, with the weights of the neural models randomly initialized at each run. In this table, the models are sorted in decreasing order according to the mean NMSE values. One can note that the performances of the P-MKSOM and D-MKSOM models on this dataset are far better than those of all other models, with the D-MKSOM model performing better than the P-MKSOM model. The ELM and KSOM models also had acceptable performances on this dataset. It is worth mentioning that among the four best ranked models for this dataset, three are local linear models. A curious fact is that the linear global ARX model performed better than the three global MLP-based models (MLP-LM, MLP-1h, MLP-2h). Among the MLP-based models, the use of second-order information could explain the better performance of the MLP-LM model.
Figure 2 shows typical sequences of estimated values of the valve position provided by the best local and best global models for the hydraulic actuator dataset. Figure 2a shows the sequence generated by the D-MKSOM model, while Fig. 2b shows the sequence estimated by the ELM model. It is very difficult to detect differences between the two figures visually, because both models achieved good performances.
4.3 Results on the flexible robot arm dataset
For this dataset, additional tests were carried out and other models were included in the simulations. The models were trained using the first 615 samples of the input-output time series, validated with the following 205 samples and tested with the remaining 204 samples. Input/output time series were rescaled to the [−1,+1] range. The memory orders were set to p = 5 and q = 4. After some experimentation with the validation dataset, the best configurations found for the MLP-1h and MLP-LM models had 30 hidden neurons. Thus, the number of neurons in the first and second hidden layers of the MLP-2h was set to 30 and 15, respectively. The learning rates for the MLP-1h and MLP-2h were set to 0.1. For the ELM model, the number of hidden neurons was set to 30, the same value used by the MLP-1h and MLP-LM models.
The best number of neurons found for the VQTAM model after experimentation on the validation set was N = 35 (100 independent training/validation runs for each value of N ranging from 5 to 50). This number of neurons was then used by all the other SOM-based models. For each SOM-based model, the initial and final learning rates were set to \(\alpha_{0} = 0.5\) and \(\alpha_{T} = 0.01\). The initial and final values of the radius of the neighborhood function were \(\sigma_{0} = N/2\) and \(\sigma_{T} = 0.001\). The learning rate of the LMS rule used by the LLM model was set to 0.1. The optimal number of nearest neighbors for the KSOM was found to be K = 20 (after 100 independent training/validation runs for each value of K ranging from 1 to 35). This value was then used by the P-MKSOM and D-MKSOM models.
VQTAM variants based on geometric and topological interpolations [24, 25], named VQTAM-G and VQTAM-T, respectively, were included in the experiments with the hope of improving the approximation accuracy of the plain VQTAM model. The KSOM model was also implemented using the Parameterless SOM (PLSOM) architecture [12], which requires no learning rate and neighborhood width parameters.
The results in terms of NMSE values are shown in Table 2, after 100 training/testing runs. The weights of the neural models were randomly initialized at each run. As in Table 1, the models were sorted in decreasing order of the mean NMSE values.
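For reference, the following sketch shows one common way of computing the NMSE and summarizing it over repeated runs. It assumes that the NMSE is the mean squared error divided by the variance of the target sequence, which is a usual definition but is taken here as an assumption; the run values are invented for illustration and are not results from Table 2.

```python
import statistics


def nmse(target, estimate):
    """Mean squared error normalized by the (population) variance of the target."""
    if len(target) != len(estimate):
        raise ValueError("sequences must have the same length")
    mse = sum((y - y_hat) ** 2 for y, y_hat in zip(target, estimate)) / len(target)
    return mse / statistics.pvariance(target)


def summarize(nmse_per_run):
    """Summary statistics over independent training/testing runs."""
    return {
        "mean": statistics.mean(nmse_per_run),
        "min": min(nmse_per_run),
        "max": max(nmse_per_run),
        "std": statistics.pstdev(nmse_per_run),
    }


# Hypothetical NMSE values from a handful of runs (illustration only).
runs = [0.012, 0.015, 0.011, 0.014]
print(summarize(runs))
```

Under this definition, a model that always outputs the mean of the target sequence yields NMSE = 1, so values well below 1 indicate that the model explains most of the output variance.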
The D-MKSOM model is again the best one, followed closely by the KSOM model. Replacing the SOM with the PLSOM network degraded the performance of the KSOM method, but it still remained much better than the rest of the models. Among the global models, the ELM model achieved the best performance. It is worth pointing out that, among the five best ranked models for this dataset, four are local linear models. The LLM model performed better only than the ARX, VQTAM-G and MLP-2h models, whose performances were very poor. As a final remark for this dataset, the VQTAM-T model performed much better than the VQTAM-G model.
Figure 3 shows typical sequences of estimated values of the reaction torque of the structure provided by the best local and best global models for the flexible robot arm dataset. Figure 3a shows the sequence generated by the D-MKSOM model, while Fig. 3b shows the sequence estimated by the ELM model.
4.4 Results on the heat exchanger dataset
For this dataset, we included performance results of variants of the P-MKSOM and D-MKSOM models obtained by replacing the SOM with other VQ algorithms in the VQTAM phase of the model. All the models were trained using the first 2200 samples of the input/output time series, validated with the following 1000 samples and tested with the remaining 800 samples. Input/output time series were rescaled to the [−1, +1] range.
The best configuration found for the MLP-1h and MLP-LM models had 20 hidden neurons (100 independent training/validation runs for each value of N in the range of 2 to 50). The number of neurons of the first hidden layer of the MLP-2h model was then set to 20, while for the second hidden layer it was set to half the number of neurons of the first hidden layer. The MLP-1h and MLP-LM were trained with a constant learning rate equal to 0.1. For the ELM model, the number of hidden neurons was set to 20, the same value used by the MLP-1h and MLP-LM models. The memory orders were set to p = 6 and q = 3.
The best number of neurons found for the VQTAM model after experimentation on the validation set was N = 30 (100 independent training/validation runs for each value of N ranging from 5 to 50). This number of neurons was then used by all the other SOM-based models. The initial and final learning rates were set to α 0 = 0.5 and α T = 0.001. The initial and final values of the neighborhood function radius are σ 0 = N/2 and σ T = 0.001. The learning rate for the LMS rule used by the LLM model was set to 0.1. The optimal number of nearest neighbors for the KSOM was found to be around K = 20 (100 independent training/validation runs for each value of K ranging from 1 to 30). This value was then used by the P-MKSOM and D-MKSOM models. The obtained results are shown in Table 3.
This time, the best performance was achieved by the ELM network, a global model. Note, however, that the D-MKSOM model achieved better performance than the MLP-1h model and it performed comparably to the ELM model. Again, the VQTAM-T model performed better than the VQTAM-G model. It is worth noting that, this time, among the five best ranked models, three of them are global nonlinear models.
The influence of the VQ algorithm on the performances of the P-MKSOM and D-MKSOM approaches is analyzed in Table 4 for the heat exchanger dataset. The local models in question were implemented using the following VQ algorithms: K-means, WTA, FSCL and FCL. By analyzing this table we can observe that, for both the P-MKSOM and D-MKSOM methods, the best local models were the ones generated by the SOM, a possible indication that topology preservation was an important feature for the proposed models to capture the dynamics of this particular dataset.
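To illustrate how two of the alternative VQ algorithms differ, the sketch below contrasts plain winner-take-all (WTA) competitive learning with frequency-sensitive competitive learning (FSCL), in which the distance of each prototype is weighted by its number of past wins. This is a simplified illustration under stated assumptions (constant learning rate, win counters initialized to 1, squared Euclidean distance); it is not the implementation used in the experiments, and the FCL and K-means variants are not covered.

```python
def sq_dist(x, w):
    """Squared Euclidean distance between a data vector and a prototype."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))


def train_competitive(data, prototypes, lr=0.1, epochs=10, frequency_sensitive=False):
    """Online competitive learning; FSCL scales distances by each unit's win count."""
    protos = [list(w) for w in prototypes]
    wins = [1] * len(protos)  # starting the counters at 1 is an assumption
    for _ in range(epochs):
        for x in data:
            if frequency_sensitive:
                scores = [wins[i] * sq_dist(x, w) for i, w in enumerate(protos)]
            else:
                scores = [sq_dist(x, w) for w in protos]
            i_star = min(range(len(protos)), key=scores.__getitem__)
            wins[i_star] += 1
            # Move only the winning prototype towards the input vector.
            protos[i_star] = [wi + lr * (xi - wi) for xi, wi in zip(x, protos[i_star])]
    return protos, wins


data = [[0.0], [1.0]]
init = [[0.5], [100.0]]  # the second prototype starts far from the data
_, wins_wta = train_competitive(data, init)
_, wins_fscl = train_competitive(data, init, frequency_sensitive=True)
print("WTA wins: ", wins_wta)
print("FSCL wins:", wins_fscl)
```

In the WTA run the distant prototype never wins (a so-called dead unit). The FSCL weighting penalizes frequent winners, so that such units can eventually be recruited given enough presentations.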
4.5 Residual analysis
Residual analysis is an important but usually neglected issue in model validation. It allows the user to assess how well the model learns the dynamics of the training data and how well it respects the modeling assumptions. The most common modeling assumption is that noise should resemble a Gaussian white noise process [41] (see (4)). By analyzing the sequence of residuals produced by each model (local and global ones) using the testing data, the user can assess the degree of matching between the statistical properties of the sequence of residuals and the theoretic modeling assumptions.
For linear models, the following autocorrelation function (ACF) and the cross-correlation function (XCF) are used for validating the identified model:
where E{⋅} denotes the expected value operator, e(t) denotes the residual (error) obtained by the model at time t using testing data, u(t) is the corresponding input sample and τ > 0 is the lag. However, these functions are insufficient for evaluating nonlinear models [13]. In these cases, one has to resort to nonlinear variants of the ACF and XCF [14, 15], such as those listed below:
where δ(τ) is the Kronecker delta function. The overbar denotes the time average operation, while the prime symbol (′) in (34)–(35) indicates that the mean has been removed from the corresponding data sequence. The aforementioned linear and nonlinear correlation functions can be used to check if the sequence of residuals produced by a given model on testing data resembles a Gaussian white noise process.
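As an illustration of how such correlation tests can be estimated from data, the sketch below computes normalized sample correlations and compares them with the usual large-sample 95 % confidence band of ±1.96/√N. The particular nonlinear tests shown (squared, mean-removed input against the residuals and against the squared residuals) follow the commonly used Billings and Voon form and are an assumption here, as are all function names; the synthetic signals only stand in for real inputs and residuals.

```python
import math
import random


def centered(x):
    """Remove the sample mean from a sequence."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def norm_xcorr(x, y, max_lag):
    """Normalized sample estimate of E{x'(t - tau) y'(t)} for tau = 0..max_lag."""
    xc, yc = centered(x), centered(y)
    scale = math.sqrt(sum(v * v for v in xc) * sum(v * v for v in yc))
    n = len(xc)
    return [sum(xc[t - tau] * yc[t] for t in range(tau, n)) / scale
            for tau in range(max_lag + 1)]


def fraction_outside_band(phi, n, skip_lag0=False):
    """Fraction of lags whose correlation leaves the 95 % band +-1.96/sqrt(n)."""
    band = 1.96 / math.sqrt(n)
    values = phi[1:] if skip_lag0 else phi
    return sum(abs(v) > band for v in values) / len(values)


random.seed(0)
n = 800  # same length as the heat exchanger test set
u = [random.uniform(-1.0, 1.0) for _ in range(n)]   # synthetic input
e = [random.gauss(0.0, 0.05) for _ in range(n)]     # synthetic residuals
u2 = [v * v for v in u]
e2 = [v * v for v in e]

tests = {
    "ACF of e": (norm_xcorr(e, e, 20), True),  # lag 0 equals 1 by construction
    "XCF u vs e": (norm_xcorr(u, e, 20), False),
    "XCF u^2' vs e": (norm_xcorr(u2, e, 20), False),
    "XCF u^2' vs e^2": (norm_xcorr(u2, e2, 20), False),
}
for name, (phi, skip) in tests.items():
    print(f"{name:16s} fraction outside band: {fraction_outside_band(phi, n, skip):.2f}")
```

For residuals that resemble white noise and are unrelated to the input, only a small fraction of lags (roughly 5 %) is expected to fall outside the band.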
For brevity, we performed the residual analysis on the heat exchanger dataset only. The estimates of the linear and nonlinear ACFs and XCFs are shown in Fig. 4. We show only the results for the D-MKSOM, ELM and P-MKSOM models, since they presented the best performances for this dataset. By analyzing these figures, we can note that the assumptions of Gaussianity and uncorrelatedness of the residuals are mostly satisfied by the three evaluated models.
Additional qualitative evaluation of the residuals generated by the identified models is usually carried out by plotting histograms, boxplots and normal probability plots. By means of the histograms and normal probability plots we verify how close the distribution of the residuals is to the assumption of Gaussianity. If the data are normally distributed, the normal probability plot will be approximately linear. Boxplots are useful for identifying outliers.
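The graphical checks described above also have simple numerical counterparts, sketched below. The linearity of the normal probability plot is summarized by the correlation between the sorted residuals and standard normal quantiles, and boxplot outliers are flagged with the conventional 1.5 × IQR rule. The plotting positions (i + 0.5)/n, the whisker factor and the function names are assumptions chosen for illustration, and the residuals are synthetic.

```python
import random
from statistics import NormalDist, quantiles


def normal_plot_points(residuals):
    """Pair standard normal quantiles with the sorted residuals."""
    n = len(residuals)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, sorted(residuals)


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def boxplot_outliers(residuals, whisker=1.5):
    """Values lying beyond the whiskers of a conventional boxplot."""
    q1, _, q3 = quantiles(residuals, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [r for r in residuals if r < low or r > high]


random.seed(1)
residuals = [random.gauss(0.0, 0.05) for _ in range(800)]  # synthetic residuals
q_theory, q_sample = normal_plot_points(residuals)
print("probability plot correlation:", round(pearson(q_theory, q_sample), 4))
print("number of boxplot outliers:  ", len(boxplot_outliers(residuals)))
```

A correlation close to 1 corresponds to a nearly straight normal probability plot; marked curvature in the tails lowers it.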
In Fig. 5 we show the histograms, boxplots and normal probability plots of the sequences of residuals generated by the D-MKSOM, ELM and P-MKSOM models. These results, combined with those provided by the correlation tests, suggest that the identified models performed well, since they seem to fulfill the requirements posed by the modeling assumptions.
4.6 Robustness analysis of the proposed models
The final set of experiments aims at evaluating the degree of similarity, from a statistical viewpoint, among the sequences of residuals generated by the P-MKSOM and D-MKSOM methods for different VQ algorithms. For this purpose, we use the two-sample Kolmogorov-Smirnov test (KS-test) [57]. The KS-test quantifies the distance between the empirical cumulative distribution functions (CDFs) of two sequences of residuals. The null hypothesis to be tested is that the sequences are drawn from the same distribution.
The rationale for the application of the KS-test in the present local modeling context is the following: for example, if two P-MKSOM models are implemented using two different VQ algorithms and they generate statistically equivalent sequences of residuals (according to the KS-test), then the two P-MKSOM models should be equivalent to each other. The same reasoning applies to D-MKSOM models implemented via different VQ algorithms.
Table 5 presents the results for the P-MKSOM model, while the results for the D-MKSOM model are shown in Table 6. A rejection of the null hypothesis indicates that the CDF of the residuals generated by the original MKSOM model (as defined in the Notes) is different from the CDF of the residuals generated by the MKSOM implemented with a different VQ algorithm. Conversely, a failure to reject the null hypothesis indicates that the two CDFs are statistically indistinguishable, in which case we regard the corresponding models as equivalent.
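The decision rule of the two-sample KS-test can be sketched as follows. The statistic is the largest absolute difference between the two empirical CDFs, and the null hypothesis is rejected when it exceeds a critical value. The sketch uses the standard large-sample approximation of the critical value at significance level α; the significance level actually used in the experiments is not stated in this section, so α = 0.05 is an assumption, as are the function name and the synthetic residual sequences.

```python
import math
import random
from bisect import bisect_right


def ks_two_sample(a, b, alpha=0.05):
    """Two-sample KS statistic with an asymptotic critical value.

    Returns (statistic, critical_value, reject_null).
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    # The supremum of |F_a - F_b| is attained at one of the sample points.
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / n
        f_b = bisect_right(b_sorted, x) / m
        d = max(d, abs(f_a - f_b))
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))  # about 1.358 for alpha = 0.05
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d, critical, d > critical


random.seed(2)
# Synthetic residual sequences standing in for two VQ-based variants of a model.
res_som = [random.gauss(0.0, 0.05) for _ in range(800)]
res_other = [random.gauss(0.0, 0.05) for _ in range(800)]
stat, crit, reject = ks_two_sample(res_som, res_other)
print(f"D = {stat:.4f}, critical value = {crit:.4f}, reject H0: {reject}")
```

When the null hypothesis is not rejected for a pair of variants, the two residual distributions are treated as equivalent in the sense used in this section.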
From Table 5 we can infer that, for the heat exchanger data, the performance of the original P-MKSOM model is statistically equivalent to those obtained by implementing the P-MKSOM with the FSCL, WTA and K-means algorithms. In case of equivalence, we recommend that the user choose the computationally lightest VQ algorithm to build the local linear model of interest (the WTA algorithm, in this case). The results shown in Table 4 can further help in choosing, among the WTA, K-means and FSCL models, the one with the lowest NMSE value.
From Table 6 we can infer that, for the heat exchanger data, the performance of the original D-MKSOM model is statistically equivalent to those obtained by implementing the D-MKSOM with all the other VQ algorithms. As mentioned before, in case of equivalence, we recommend that the user choose the computationally lightest VQ algorithm.
5 Conclusions
In this paper we introduced two novel techniques, called D-MKSOM and P-MKSOM models, for estimating the parameters of local linear models for dynamical system identification. Originally, both proposed models are based on the SOM network, but they can be easily extended in order to use any prototype-based vector quantization algorithm. A comprehensive evaluation of the proposed approaches was carried out for the task of inverse identification of three benchmarking dynamical systems. Their performances were compared to those achieved by standard MLP-based global models and by the recently proposed ELM network.
In addition to the evaluation based on NMSE values, the models were validated by means of a thorough residual analysis to check how well they conform to the modeling assumptions. A robustness analysis was also carried out in order to evaluate how strongly the proposed models depend on the vector quantization algorithm used to implement the VQTAM approach.
From the above, the main general conclusion is that the proposed local linear models (D-MKSOM and P-MKSOM), especially the D-MKSOM model, were consistently competitive with standard global models for system identification based on MLP and ELM neural networks: the D-MKSOM model ranked first on the hydraulic actuator and flexible robot arm datasets and performed comparably to the best global model (ELM) on the heat exchanger dataset. We also verified that the D-MKSOM model is more robust than the P-MKSOM model to changes in the base vector quantization algorithm.
Currently, we are extending the proposed approaches to deal with multi-input/multi-output (MIMO) systems.
Notes
NARX stands for Nonlinear Autoregressive model with exogenous inputs.
That is, instead of using the recursive LMS rule, the D-MKSOM model uses the standard (non-recursive) least-squares method.
By the original MKSOM models, we mean the P- and D-MKSOM models that use the VQTAM method with the SOM as the VQ algorithm.
References
Abonyi J (2003) Fuzzy Model Identification for Control. Birkhäuser
Abonyi J, Nemeth S, Vincze C, Arva P (2003) Process analysis and product quality estimation by self-organizing maps with an application to polyethylene production. Comput Ind 52(3):221–234
Ahalt S, Krishnamurthy A, Chen P, Melton D (1990) Competitive learning algorithms for vector quantization. Neural Netw 3(3):277–290
Andrášik A, Mészáros A, de Azevedo S (2004) On-line tuning of a neural PID controller based on plant hybrid modeling. Comput Chem Eng 28(8):1499–1509
Azeem MF, Hanmandlu M, Ahmad N (2000) Generalization of adaptive neuro-fuzzy inference systems. IEEE Transactions on Neural Networks 11(6):1332–1346
Babuška R, Verbruggen H (2003) Neuro-fuzzy methods for nonlinear system identification. Annu Rev Control 27:73–85
Barreto GA, Aguayo L (2009) Time series clustering for anomaly detection using competitive neural networks. In: Príncipe JC, Miikkulainen R (eds) Advances in self-organizing maps. Springer, pp 28–36
Barreto GA, Araújo AFR (2004) Identification and control of dynamical systems using the self-organizing map. IEEE Transactions on Neural Networks 15(5):1244–1259
Barreto GA, Araújo AFR (2004) Identification and control of dynamical systems using the self-organizing map. IEEE Transactions on Neural Networks 15(5):1244–1259
Barreto GA, Souza LGM (2006) Adaptive filtering with the self-organizing maps: A performance comparison. Neural Netw 19(6):785–798
Barreto GA, Araújo AFR, Ritter HJ (2003) Self-organizing feature maps for modeling and control of robotic manipulators. J Intell Robot Syst 36(4):407–450
Berglund E, Sitte J (2006) The parameterless self-organizing map algorithm. IEEE Transactions on Neural Networks 17(2):305–316
Billings SA, Voon WSF (1983) Structure detection and model validity tests in the identification of nonlinear systems. IEE Proceedings, Part D, Control Theory and Applications 130(4):193–199
Billings SA, Voon WSF (1986) Correlation based model validity tests for nonlinear models. Int J Control 44(1):235–244
Billings SA, Zhu QM (1994) Nonlinear model validation using correlation tests. Int J Control 60(6):1107–1120
Bittanti S, Piroddi L (1997) Nonlinear identification and control of a heat exchanger: A neural network approach. J Frankl Inst 334(1):135–153
Chen JQ, Xi YG (1998) Nonlinear system modeling by competitive learning and adaptive fuzzy inference system. IEEE Trans Syst Man Cybern C 28(2):231–238
Chen S, Billings SA, Grant PM (1990) Nonlinear system identification using neural networks. Int J Control 51:1191–1214
Cho J, Principe J, Erdogmus D, Motter M (2006) Modeling and inverse controller design for an unmanned aerial vehicle based on the self-organizing map. IEEE Transactions on Neural Networks 17(2):445–460
Cho J, Principe J, Erdogmus D, Motter M (2007) Quasi-sliding mode control strategy based on multiple linear models. Neurocomputing 70(4-6):962–974
Daosud W, Thitiyasook P, Arpornwichanop A, Kittisupakorn P, Hussain MA (2005) Neural network inverse model-based controller for the control of a steel pickling process. Comput Chem Eng 29(10):2110–2119
Darken C, Moody J (1990) Fast adaptive k-means clustering: Some empirical results. In: Proceedings of the international joint conference on neural networks (IJCNN’90), vol 2, pp 233–238
Gan M, Peng H, Chen L (2012) A global-local optimization approach to parameter estimation of RBF-type models. Inf Sci 197:144–160
Göppert J, Rosenstiel W (1993) Topology preserving interpolation in self-organizing maps. In: Proceedings of the NeuroNIMES’93, pp 425–434
Göppert J, Rosenstiel W (1995) Topological interpolation in SOM by affine transformations. In: Proceedings of the European symposium on artificial neural networks (ESANN’95), pp 15–20
Gregorčič G, Lightbody G (2007) Local model network identification with Gaussian processes. IEEE Transactions on Neural Networks 18(5):1404–1423
Gregorčič G, Lightbody G (2008) Nonlinear system identification: from multiple-model networks to Gaussian processes. Eng Appl Artif Intell 21(7):1035–1055
Hametner C, Jakubek S (2013) Local model network identification for online engine modelling. Inf Sci 220:210–225
Huang G, Huang GB, Song S, You K (2015) Trends in extreme learning machines: A review. Neural Netw 61(1):32–48
Huang GB, Zhu QY, Siew CK (2006) Extreme learning machine: Theory and applications. Neurocomputing 70(1–3):489–501
Hunt KJ, Sbarbaro D, Zbikowski R, Gawthrop P (1992) Neural networks for control systems: a survey. Automatica 28(6):1083–1111
Hussain MA (1996) Inverse model control strategies using neural networks: Analysis, simulation and on-line implementation. PhD thesis, University of London
Hussain MA, Kershenbaum LS (2000) Implementation of an inverse-model-based control strategy using neural networks on a partially simulated exothermic reactor. Chem Eng Res Des 78(2):299–311
Kim E, Lee H, Park M, Park M (1998) A simply identified Sugeno-type fuzzy model via double clustering. Inf Sci 110(1–2):25–39
Kohonen T (2013) Essentials of the self-organizing map. Neural Netw 37:52–65
Kohonen TK, Oja E, Simula O, Visa A, Kangas J (1996) Engineering applications of the self-organizing map. Proc IEEE 84(10):1358–1384
Li X, Yu W (2002) Dynamic system identification via recurrent multilayer perceptrons. Inf Sci 147(1–4):45–63
Lightbody G, Irwin GW (1997) Nonlinear control structures based on embedded neural system models. IEEE Transactions on Neural Networks 8(3):553–567
Lima CAM, Coelho ALV, Von Zuben FJ (2007) Hybridizing mixtures of experts with support vector machines: Investigation into nonlinear dynamic systems identification. Inf Sci 177(10):2049–2074
Liu J, Djurdjanovic D (2008) Topology preservation and cooperative learning in identification of multiple model systems. IEEE Transactions on Neural Networks 19(12):2065–2072
Ljung L (1999) System Identification: Theory for the user, 2nd edn. Prentice-Hall, Englewood Cliffs
MacQueen J (1967) Some methods for classification and analysis of multivariate observations. In: Le Cam LM, Neyman J (eds) Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol 1, pp 281–297. University of California Press, Berkeley
Mu C, Sun C, Yu X (2011) Internal model control based on a novel least square support vector machines for MIMO nonlinear discrete systems. Neural Comput Applic 20(8):1159–1166
Murray-Smith R, Gollee H (1994) A constructive learning algorithm for local model networks. In: IEEE workshop on computer-intensive methods in control and signal processing, pp 21–29
Murray-Smith R, Hunt KJ (1995) Local model architectures for nonlinear modelling and control. In: Hunt KJ, Irwin GR, Warwick K (eds) Neural network engineering in dynamic control systems. Springer, pp 61–82
Narendra KS (1996) Neural networks for control theory and practice. Proc IEEE 84(10):1385–1406
Narendra KS, Lewis FL (2001) Special issue on neural networks feedback control. Automatica 37(8)
Narendra KS, Parthasarathy K (1990) Identification and control of dynamical systems using neural networks. IEEE Transactions on Neural Networks 1(1):4–27
Norgaard M, Ravn O, Poulsen NK, Hansen LK (2000) Neural Networks for Modelling and Control of Dynamic Systems. Springer-Verlag
Papadakis SE, Kaburlasos VG (2010) Piecewise-linear approximation of non-linear models based on probabilistically/possibilistically interpreted intervals numbers (INs). Inf Sci 180(24):5060–5076
Peng H, Nakano K, Shioya H (2007) A comprehensive review for industrial applicability of artificial neural networks. IEEE Trans Control Syst Technol 15(1):130–143
Principe JC, Wang L, Motter MA (1998) Local dynamic modeling with self-organizing maps and applications to nonlinear system identification and control. Proc IEEE 86(11):2240–2258
Rezaee B, Fazel Zarandi M (2010) Data-driven fuzzy modeling for Takagi-Sugeno-Kang fuzzy system. Inf Sci 180(2):241–255
Rubio JJ (2009) SOFMLS: Online self-organizing fuzzy modified least-squares network. IEEE Trans Fuzzy Syst 17(6):1296–1309
Shorten R, Murray-Smith R, Bjørgan R, Gollee H (1999) On the interpretation of local models in blended multiple model structures. Int J Control 72(7–8):620–628
Sjöberg J, Zhang Q, Ljung L, Benveniste A, Delyon B, Glorennec PY, Hjalmarsson H, Juditsky A (1995) Nonlinear black-box modeling in system identification: A unified overview. Automatica 31(12):1691–1724
Soong TT (2004) Fundamentals of Probability and Statistics for Engineers, 1st edn. Wiley, West Sussex
Souza LGM, Barreto GA (2010) On building local models for inverse system identification with vector quantization algorithms. Neurocomputing 73(10–12):1993–2005
Takagi T, Sugeno M (1985) Fuzzy identification of systems and its application to modeling and control. IEEE Trans Syst Man Cybern 15(1):116–132
Teslić L, Hartmann B, Nelles O, Škrjanc I (2011) Nonlinear system identification by gustafson-kessel fuzzy clustering and supervised local model network learning for the drug absorption spectra process. IEEE Transactions on Neural Networks 22(12):1941–1951
Vasuki A, Vanathi PT (2006) A review of vector quantization techniques. IEEE Potentials 25(4):39–47
Walter J, Ritter H, Schulten K (1990) Non-linear prediction with self-organizing map. In: Proceedings of the IEEE international joint conference on neural networks (IJCNN’90), vol 1, pp 587–592
Yu W (2004) Nonlinear system identification using discrete-time recurrent neural networks with stable learning algorithm. Inf Sci 158:131–157
Acknowledgments
The first author thanks CNPq for the financial support through the grant no. 309841/2012-7. The second author thanks FUNCAP and CAPES for the scholarships granted during the development of this research. Both authors thank NUTEC (Fundação Núcleo de Tecnologia Industrial do Ceará) for providing the laboratory infrastructure for the execution of the computer experiments reported in this paper.
Cite this article
Barreto, G.A., Souza, L.G.M. Novel approaches for parameter estimation of local linear models for dynamical system identification. Appl Intell 44, 149–165 (2016). https://doi.org/10.1007/s10489-015-0699-1
DOI: https://doi.org/10.1007/s10489-015-0699-1