1 Introduction

The modeling and identification of dynamic systems are of fundamental importance in engineering. Tasks such as the analysis of existing processes, the design of new processes and controllers, optimization, simulation, and fault detection are based on process models and are directly impacted by the quality of the obtained models. In cases where obtaining a phenomenological model is complex, system identification techniques are typically employed (Schoukens and Ljung 2019; Nelles 2001).

Many processes exhibit static and dynamic nonlinear behaviors, particularly when large changes in operating conditions are considered. Recurrent artificial neural networks (RNN) are an interesting option for data-driven system modeling, since they allow the identification of nonlinear processes for which there is little or no knowledge of the governing physics (Ljung et al. 2020; Yu 2004; Isermann and Münchof 2011).

The application of RNNs to the identification of dynamic systems is a vast topic and has been widely investigated in the literature. Shortly after the proposition of the multilayer perceptron (MLP) network, the first studies using RNNs for system identification emerged, considering recurrent architectures derived from the classic MLP (Fernandez et al. 1990; Ayoubi 1994). Although recurrent MLP networks are still used in system identification problems, such as in Boussaada et al. (2018), which uses them to identify the daily solar radiation dynamics, new RNN architectures have been proposed that yield more easily identified models with good accuracy. One such architecture, which stands out due to its computationally efficient training procedure, is the echo-state network (ESN), proposed in Jaeger (2001).

Some approaches for identifying nonlinear systems using ESNs have been presented in the literature. In Jaeger (2003), the use of ESNs in an online identification approach was first proposed. In Rodan and Tino (2010), some simplified ESN architectures were used for system identification. Recently, the study presented in Yang et al. (2019) proposed methods for online identification using ESN models with sparse recursive algorithms. Applications that use this modeling approach to identify different types of dynamic systems were also reported recently, such as wind generation systems (Chen et al. 2019), oil wells (Antonelo et al. 2017), and cooling systems (Schwedersky et al. 2018).

Even though there are some examples of ESN application to system identification in the literature, this type of application is still incipient, and the existing works present application-specific architectures and model tuning methods. As an approach to unify system identification tasks based on ESNs, in this paper we present a general architecture for identifying multiple-input and multiple-output (MIMO) nonlinear dynamical systems using ESNs. We formalize the best practices in the literature to create the ESN model and to guarantee that its reservoir is stable and has rich dynamics, and we propose a tuning method for the model's main hyperparameters. In Sect. 2, a general simplified architecture is presented, including the procedure to build ESN models and a method for tuning the main hyperparameters. The application of the proposed method is demonstrated through two case studies detailed in Sect. 3: the identification of a pH neutralization process and the identification of a real industrial process, a test rig for hermetic refrigeration compressors. The results of the proposed method are compared with those of traditional system identification techniques, such as linear and nonlinear models, in Sect. 4. The conclusions are presented in Sect. 5.

2 System Identification with Echo-State Networks

The echo-state network (ESN) is a recurrent neural network architecture based on reservoir computing (Lukoševičius and Jaeger 2009), an alternative to the traditional RNN learning paradigm. It relies on a nonlinear dynamic system with randomly generated fixed weights to map the inputs into a high-dimensional space, in which the classification or regression task is easier to perform (Lukoševičius 2012).

The states of this dynamic system make up a structure generally denominated the reservoir. Such a structure can be understood as a temporal kernel, which projects the input into a dynamic nonlinear space. During model operation, the reservoir states evolve along a trajectory that depends on the external stimuli and also on the memory of past stimuli. The network output is obtained using an output layer, which processes the instantaneous states of the reservoir. A general representation of the ESN is presented in Fig. 1.

Fig. 1

ESN model architecture. Solid lines represent fixed connections, while dashed lines represent trainable weights

2.1 Model Structure

ESN models generally have two main structures: the reservoir and its readout mechanism. The reservoir consists of a neural network with recurrent connections, which are responsible for the model's dynamic behavior. The connections between the reservoir neurons are fixed and randomly generated prior to the training phase, so the only trainable elements of the network are the weights and biases that make up the reservoir readout mechanism.

The ESN network can be described by a pair of equations, defining the state update and the model output. The state update equation

$$\begin{aligned} \mathbf{x} _{e}(k)&=(1-\alpha )\mathbf{x} _{e}(k-1) +\alpha \text {f}(\mathbf{W} _\mathrm{{in}}^\mathrm{{res}}\mathbf{u} (k) \nonumber \\&\quad +\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\mathbf{x} _{e}(k-1)+\mathbf{W} _\mathrm{{bias}}^\mathrm{{res}}) \mathrm {,} \end{aligned}$$
(1)

describes the dynamic behavior of the reservoir states, where: \(\mathbf{u} (k) \in \mathbb {R}^{N_u}\) is the input vector at time step k; \(\mathbf{x} _e(k) \in \mathbb {R}^{N_x}\) is the reservoir state vector; \(\alpha \) is the reservoir leak rate; \(\mathrm {f}(\cdot )\) is the neuron activation function; \(\mathbf{W} _\mathrm{{in}}^\mathrm{{res}} \in \mathbb {R}^{N_x \times N_u}\) and \(\mathbf{W} _\mathrm{{bias}}^\mathrm{{res}} \in \mathbb {R}^{N_x \times 1}\) are matrices that connect the input and bias with the reservoir, respectively; and \(\mathbf{W} _{\text {res}}^{\text {res}} \in \mathbb {R}^{N_x \times N_x}\) is the matrix that represents the reservoir recurrent connections.

In this formulation, the output is a linear combination of the states, which have a nonlinear update mechanism, plus the corresponding biases, as

$$\begin{aligned} \mathbf{y} _{e}(k)=\mathbf{W} _\mathrm{{res}}^\mathrm{{out}}\mathbf{x} _{e}(k)+\mathbf{W} _\mathrm{{bias}}^\mathrm{{out}} \mathrm {,} \end{aligned}$$
(2)

where \( \mathbf{y} _e(k) \in \mathbb {R}^{N_y}\) is the network output vector; matrix \(\mathbf{W} _\mathrm{{res}}^\mathrm{{out}} \in \mathbb {R}^{N_y \times N_x}\) represents the connections between reservoir and output; and \(\mathbf{W} _\mathrm{{bias}}^\mathrm{{out}} \in \mathbb {R}^{N_y \times 1}\) is the matrix with output bias connections. However, it is also possible to consider a nonlinear mapping from the states to the outputs, as detailed in Lukoševičius (2012).
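As an illustration of Eqs. (1) and (2), a minimal sketch of the state update and the linear readout in Python/NumPy is given below; the function and variable names (esn_step, esn_output, W_in, W_res, b_res, W_out, b_out) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def esn_step(x, u, W_in, W_res, b_res, alpha):
    """Leaky reservoir update, Eq. (1): x_e(k) from x_e(k-1) and u(k)."""
    pre = W_in @ u + W_res @ x + b_res        # pre-activation of the units
    return (1.0 - alpha) * x + alpha * np.tanh(pre)

def esn_output(x, W_out, b_out):
    """Linear readout, Eq. (2): y_e(k) = W_out x_e(k) + b_out."""
    return W_out @ x + b_out
```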

The usual choice for the activation function, \(\mathrm {f}(\cdot )\), is a sigmoid function, with the hyperbolic tangent being the most common option (Jaeger et al. 2007). Other functions can be used to activate the reservoir states, such as the linear function (Inubushi and Yoshimura 2017; Ganguli et al. 2008) and a combination of sigmoid and linear functions (Lun et al. 2015). Self-normalizing activation functions can also be used, following the architecture proposed in Verzelli et al. (2019).

The reservoir creation is fundamental to the model's success, as the reservoir connection weights are fixed. The non-trainable connections, \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\), \(\mathbf{W} _\mathrm{{in}}^\mathrm{{res}}\), \(\mathbf{W} _\mathrm{{out}}^\mathrm{{res}} \), and \( \mathbf{W} _\mathrm{{bias}}^\mathrm{{res}} \), are generated following a specific distribution, generally uniform. The main design parameters are the connection rate (\( c_\mathrm{{from}}^\mathrm{{to}} \)) and the scaling (\( v_\mathrm{{from}}^\mathrm{{to}} \)), which define the matrix sparsity and the connection strength, respectively (Lukoševičius 2012).

The recurrent weight matrix, \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\), which defines the reservoir neuron connections, deserves special attention. Its main design parameter is the spectral radius, \( \rho (\mathbf{W} _\mathrm{{res}}^\mathrm{{res}})\), which corresponds to the largest absolute value of the eigenvalues of \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\) and is thus directly associated with the stability of the reservoir. Generally, matrix \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\) is created following a predetermined sparsity and is then re-scaled to guarantee that its largest absolute eigenvalue equals the desired spectral radius.

An important condition for the successful creation of an ESN is the design of a reservoir that presents the echo-state property (ESP). This property implies that the reservoir states asymptotically wash out the influence of initial conditions when driven by an input sequence. For ESNs that do not use leaky neurons in the reservoir, i.e., \(\alpha =1\), a sufficient condition for the echo-state property is \( \rho (\vert \mathbf{W} _\mathrm{{res}}^\mathrm{{res}} \vert )<1\), with \( \vert \mathbf{W} _\mathrm{{res}}^\mathrm{{res}} \vert \) representing the matrix formed by the absolute values of each original matrix element. For the case with leaky units, a sufficient condition for the echo-state property is \( \rho (\mathbf{M} )<1 \), with \( \mathbf{M} =\vert \mathbf{W} _\mathrm{{res}}^\mathrm{{res}} \vert +(1-\alpha )\mathbf{I} \), where \( \mathbf{I} \) represents the identity matrix with the same dimension as \( \mathbf{W} _\mathrm{{res}}^\mathrm{{res}} \) (Yildiz et al. 2012). More details about the echo-state property and ESN model initialization are found in Yildiz et al. (2012), Wainrib and Galtier (2016), and Lukoševičius (2012).
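This sufficient condition is straightforward to verify numerically; a minimal sketch, assuming a NumPy array W_res holding \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\):

```python
import numpy as np

def esp_sufficient(W_res, alpha):
    """Sufficient ESP condition of Yildiz et al. (2012):
    rho(|W_res| + (1 - alpha) * I) < 1."""
    M = np.abs(W_res) + (1.0 - alpha) * np.eye(W_res.shape[0])
    return np.max(np.abs(np.linalg.eigvals(M))) < 1.0
```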

The reservoir time scale can be adjusted using the leak rate parameter \( \alpha \). This parameter defines how much of the previous reservoir state is preserved at each update, as the neurons are leaky integrator units. By selecting \( \alpha \in (0,1]\), the reservoir memory is adjusted: lower values result in a reservoir with more memory, while values closer to 1 result in less memory.

2.2 Model Training

The ESN output is formed by a linear combination of the reservoir internal states plus a bias. The reservoir readout mechanism, \(\mathbf{W} _\mathrm{{res}}^\mathrm{{out}}\), and the bias matrix associated with the output, \( \mathbf{W} _\mathrm{{bias}}^\mathrm{{out}} \), are the only trainable matrices in the ESN, and ridge regression is the most commonly used method to obtain them (Lukoševičius 2012). As an alternative for the ESN training, the least absolute shrinkage and selection operator (LASSO) regression can also be applied (Qiao et al. 2018).

As the reservoir readout weights are the only trainable parameters in the ESN model, the ESN can be trained by solving a regular least squares problem, which is computationally efficient in comparison with the training algorithms used for classical RNNs. The recursive estimation of the reservoir readout weights using the recursive least squares method is also possible, as described in Jaeger (2003). Besides being computationally efficient, the ESN training mechanism avoids some drawbacks of RNN training algorithms based on backpropagation, such as the vanishing and exploding gradient problem (Hochreiter 1998; Lukoševičius and Jaeger 2009).
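For the recursive case, a generic recursive least squares update applied to one readout row could look as follows; this is a textbook RLS sketch under a forgetting factor lam, not the exact algorithm of Jaeger (2003):

```python
import numpy as np

def rls_step(w, P, x, d, lam=0.999):
    """One recursive least squares update for a single readout row.
    w: readout weights, P: inverse-correlation matrix,
    x: regressor (bias and reservoir state), d: measured output."""
    Px = P @ x
    k = Px / (lam + x @ Px)          # gain vector
    e = d - w @ x                    # a priori prediction error
    w = w + k * e                    # weight update
    P = (P - np.outer(k, Px)) / lam  # inverse-correlation update
    return w, P
```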

Considering \(\mathbf{W} ^\mathrm{{out}} = \begin{bmatrix} \mathbf{W} _\mathrm{{bias}}^\mathrm{{out}}&\mathbf{W} _\mathrm{{res}}^\mathrm{{out}} \end{bmatrix}\), the ESN output can be written as

$$\begin{aligned} \mathbf {Y}_{e}=\mathbf{W} ^\mathrm{{out}}\mathbf {X}_{e}\mathrm {,} \end{aligned}$$
(3)

with the matrix \( \mathbf{Y} _{e} \in \mathbb {R}^{N_y \times T} \) being defined as

$$\begin{aligned} \mathbf{Y} _{e} = \begin{bmatrix}{} \mathbf{y} _{e}(k)&\mathbf{y} _{e}(k+1)&\cdots&\mathbf{y} _{e}(k+T-1)\end{bmatrix} \mathrm {,} \end{aligned}$$
(4)

where \( \mathbf{X} _{e} \in \mathbb {R}^{(1+N_x)\times T} \) is a matrix formed by the reservoir states \( \mathbf{x} _e \) and a constant, always-active input, written as

$$\begin{aligned} \mathbf{X} _{e} = \begin{bmatrix}1 & 1 & \cdots & 1 \\ \mathbf{x} _{e}(k) & \mathbf{x} _{e}(k+1) & \cdots & \mathbf{x} _{e}(k+T-1) \end{bmatrix} \mathrm {,} \end{aligned}$$
(5)

both generated from a reservoir excited by \( \mathbf{u} _{e}(k) \) during a training period with T samples.
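In practice, \( \mathbf{X} _{e} \) can be harvested by driving the reservoir with the training inputs and stacking the states column by column; a sketch, assuming a NumPy input matrix U of shape \(N_u \times T\) and the state update of Eq. (1):

```python
import numpy as np

def harvest_states(U, W_in, W_res, b_res, alpha):
    """Build the (1+Nx) x T matrix of Eq. (5): a row of ones on top
    of the reservoir state trajectory driven by the inputs U."""
    Nx, T = W_res.shape[0], U.shape[1]
    X = np.ones((1 + Nx, T))
    x = np.zeros(Nx)                  # reservoir starts at rest
    for k in range(T):
        x = (1.0 - alpha) * x + alpha * np.tanh(
            W_in @ U[:, k] + W_res @ x + b_res)
        X[1:, k] = x
    return X
```

In practice, an initial washout period is usually discarded from \( \mathbf{X} _{e} \) so that the arbitrary initial state does not bias the regression.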

The learning procedure consists of finding the optimal value of \( \mathbf{W} ^\mathrm{{out}} \in \mathbb {R}^{N_y \times (1+N_x)}\) which minimizes the root mean-square error between \( \mathbf{y} _{e}(k) \) and \( \mathbf{y} (k) \), for all instants k, \( k+1 \), \( \dots \), \( k+T-1 \).

If the ridge regression method is used, matrix \( \mathbf{W} ^\mathrm{{out}} \) is obtained row-wise as (Lukoševičius 2012)

$$\begin{aligned} \mathbf{w} _i^\mathrm{{out}} = \underset{\mathbf{w} _i^\mathrm{{out}}}{\mathrm{arg\,min}} \left( \frac{1}{T} \sum _{n=1}^{T} \left( y_{e_i}(n)-y_i(n) \right) ^2 + \beta \left\Vert \mathbf{w} _i^\mathrm{{out}} \right\Vert ^2 \right) \mathrm {,} \end{aligned}$$

(6)

with \(y_{e_i}(n)\) and \(y_i(n)\) representing the i-th network and measured outputs, respectively; \(\mathbf{w} _i^\mathrm{{out}}\) being the i-th row of \(\mathbf{W} ^\mathrm{{out}}\), representing the reservoir readout weights and bias associated with the i-th output; and the regularization term, \( \beta \), being used to penalize weights with large absolute values, which contributes to avoiding undesired overfitting. By solving this optimization problem, through the objective function minimization, the reservoir readout weight matrix \( \mathbf{W} ^\mathrm{{out}} \) is obtained. A solution for this optimization problem can be described as

$$\begin{aligned} \mathbf{W} ^\mathrm{{out}} = \mathbf{Y} \mathbf{X} _{e}^{T}(\mathbf{X} _{e}\mathbf{X} _{e}^{T} + \beta \mathbf{I} )^{-1} \mathrm {,} \end{aligned}$$
(7)

where matrix \( \mathbf{Y} \) represents the target values for the model outputs and has the same structure as \( \mathbf{Y} _{e} \), and \( \mathbf{I} \) is the identity matrix of dimension \( (1+N_x) \).
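A minimal sketch of this solution, assuming X is the harvested state matrix of Eq. (5) and Y the \(N_y \times T\) target matrix:

```python
import numpy as np

def train_readout(X, Y, beta):
    """Ridge-regression readout, Eq. (7). Solving the linear system
    avoids forming the matrix inverse explicitly."""
    n = X.shape[0]                         # 1 + Nx
    A = X @ X.T + beta * np.eye(n)
    return np.linalg.solve(A, X @ Y.T).T   # W_out: Ny x (1+Nx)
```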

This regularization approach is advised when there is risk of overfitting or feedback instability. Extremely large \(\mathbf{w} _i^\mathrm{{out}}\) values are an indicator of a very sensitive solution, which can easily become unstable. This is usually a result of overfitting to process noise and can in some cases be avoided with a proper regularization method (Lukoševičius 2012). Other approaches from the literature can also be used to improve the ESN robustness, such as special loss functions (Li et al. 2012; Guo et al. 2017; Han and Xu 2018), or nonlinear reservoir readouts that replace the conventional linear output layer, like the formulations based on the support vector machine (Shi and Han 2007) and kernel adaptive filtering (Zhou et al. 2018). There are also robust ESN alternatives based on the recursive estimation of the output layer weights, using the recursive least M-estimate algorithm (Bessa and Barreto 2019).

2.3 System Identification Procedure

For the identification of nonlinear dynamic systems, the ESN can be applied in a standard system identification scheme, as shown in Fig. 2. The training of the ESN model uses examples of the process dynamics, organized as a data series of process inputs, \(\mathbf{u} (k) \in \mathbb {R}^{N_u}\), and outputs, \(\mathbf{y} (k) \in \mathbb {R}^{N_y}\), at time step k, obtained through an identification test procedure. A perturbation signal, \(\mathbf{d} (k)\), may also affect the system output; if it cannot be measured, it is treated as an unmeasured disturbance, otherwise it can be considered part of \(\mathbf{u} (k)\) for system identification purposes. The ESN input layer can be used to scale the inputs, so that the reservoir is excited with values of proper amplitude. The reservoir states and the recorded process outputs are used to obtain the reservoir readout weights, as presented in Sect. 2.2.

Fig. 2

Block diagram for the nonlinear system identification with ESN. The ESN model is trained using data from an identification test

The procedure to obtain an identified system based on the ESN model is summarized as follows:

  1.

    Data series acquisition: the input and output data series are obtained by applying an excitation signal to the process. For linear systems, a pseudo-random binary signal (PRBS) is a usual choice, and its frequency is chosen based on the process dynamics. For nonlinear systems, in addition to the frequency, the signal amplitude must be carefully chosen to reach all desired operating conditions. To accomplish this, an alternative is the use of an amplitude-modulated PRBS (APRBS), in which each step has a different amplitude (Nelles 2001).

  2.

    Data series division: the data series acquired during the identification test should be divided into three portions, used to train, develop, and test the model. The first portion is used to train the ESN models. The development set is used to verify the performance of the trained model, select model architectures, and tune hyperparameters during the model selection phase. The test set is used to evaluate the final model performance, in order to compare it with other models. Each set is formed by a contiguous series of the identification test data.

  3.

    Model creation: the procedure to create the ESN model is described in detail in Sect. 2.1. Matrices \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\), \(\mathbf{W} _\mathrm{{in}}^\mathrm{{res}}\), and \( \mathbf{W} _\mathrm{{bias}}^\mathrm{{res}} \) are generated using a uniform distribution, with the matrix \(\mathbf{W} _\mathrm{{in}}^\mathrm{{res}}\) being scaled using information about the input range, so that the expected reservoir inputs are roughly mapped into a \([-1,+1]\) range. Matrix \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\) is created considering all of its elements with positive values, is scaled considering the desired spectral radius, and then some connections are transformed into negative values, to achieve a stable reservoir with diverse dynamics. This procedure is sufficient to achieve the ESP when the selected spectral radius is lower than 1, as discussed in Sect. 2.1.

  4.

    Model training: the ESN training procedure is performed using the ridge regression method, described in Sect. 2.2. Matrix \(\mathbf{W} ^\mathrm{{out}}\) is obtained by solving (7).

  5.

    Hyperparameter tuning: the main ESN hyperparameters (reservoir size, connection rate, leak rate, and reservoir spectral radius) are tuned using a grid search procedure. The hyperparameters are initialized with feasible values, and through successive model creation, training, and evaluation using the development set, the influence of each of these parameters is assessed. Details about this step are presented in the case studies, in Sect. 3.

  6.

    Model evaluation: the final model performance is evaluated using the test data series, by feeding the model with the test inputs and comparing the model output with the true system output.

3 Case Studies: Model Development and Tuning

In this section, the proposed nonlinear system identification procedure based on ESN models, presented in Sect. 2, is demonstrated. Two nonlinear system identification problems are presented, for which the data series acquisition, model creation, training, hyperparameter tuning, and performance evaluation are detailed. The first case study, presented in Sect. 3.1, details the identification of a simulated pH neutralization process. The second, presented in Sect. 3.2, consists of the identification of a real MIMO test rig used to test refrigeration compressors. The initial steps of the model development procedure, presented in Sect. 2.3, are detailed in the individual sections of each case study, while the last step, the model evaluation, is presented in Sect. 4.

3.1 pH Neutralization Process

pH neutralization is a process used in many literature sources to benchmark nonlinear system identification and nonlinear control algorithms, both because of its relevance in the chemical industry and because of its challenging nonlinear dynamics. This case study considers the formulation presented in Gomez et al. (2004), which is a simplification of the one detailed in Henson and Seborg (1994).

The neutralization reactor process consists of the mixing, in a tank with constant volume V, of a NaOH base stream \( q_1 \), a \( \mathrm {NaHCO}_3 \) buffer stream \( q_2 \), and an \( \mathrm {HNO}_3 \) acid stream \( q_3 \), as shown in the process diagram presented in Fig. 3. The main challenge consists of identifying the process when it operates with only strong acids and strong bases, in which case the process operates in a highly nonlinear region, near the pH neutral zone.

Fig. 3

P&ID diagram of the pH neutralization process

The process output, y, is the pH of the effluent solution \(q_4\). It is manipulated through the base flow rate \( q_1 \), while the buffer flow rate \( q_2 \) is considered an unmeasured disturbance. In this formulation, the \( q_3 \) stream is assumed to be constant. All flow variables are expressed in milliliters per second.

The first principles modeling of the pH neutralization process is presented in detail in Henson and Seborg (1994). The dynamic model is obtained using conservation equations and equilibrium relations, considering perfect mixing, constant density, and complete solubility of the ions. The chemical reaction is defined as

$$\begin{aligned} \text {H}_2\text {CO}_3&\rightleftharpoons \text {HCO}_3^- + \text {H}^+, \end{aligned}$$
(8)
$$\begin{aligned} \text {HCO}_3^-&\rightleftharpoons \text {CO}_3^{2-} + \text {H}^+, \end{aligned}$$
(9)
$$\begin{aligned} \text {H}_2\text {O}&\rightleftharpoons \text {OH}^- + \text {H}^+, \end{aligned}$$
(10)

with the equilibrium constants corresponding to

$$\begin{aligned} K_{\text {a}_1}&=\frac{[\text {HCO}_3^-][\text {H}^+]}{[\text {H}_2\text {CO}_3]}, \end{aligned}$$
(11)
$$\begin{aligned} K_{\text {a}_2}&=\frac{[\text {CO}_3^{2-}][\text {H}^+]}{[\text {H}\text {CO}_3^-]}, \end{aligned}$$
(12)
$$\begin{aligned} K_\text {w}&=[\text {H}^+][\text {OH}^-] \text {.} \end{aligned}$$
(13)

By defining two reaction invariants, \(W_\text {a}\), which is a charge-related quantity, and \(W_\text {b}\), which represents the total carbonate concentration, the chemical equilibria for each stream \(i \in [1,4]\) are

$$\begin{aligned} W_{\text {a}_i}&=[\text {H}^+]_i - [\text {OH}^-]_i-[\text {HCO}_3^-]_i-2[\text {CO}_3^{2-}]_i, \end{aligned}$$
(14)
$$\begin{aligned} W_{\text {b}_i}&=[\text {H}_2\text {CO}_3]_i+[\text {H}\text {CO}_3^-]_i+[\text {CO}_3^{2-}]_i \text {.} \end{aligned}$$
(15)

From the quantities \(W_\text {a}\) and \(W_\text {b}\), the pH can be obtained as

$$\begin{aligned}&W_\mathrm{{b}}\frac{ \frac{K_{\text {a}_1}}{[\text {H}^+]} + \frac{2K_{\text {a}_1}K_{\text {a}_2}}{[\text {H}^+]^2} }{1 + \frac{K_{\text {a}_1}}{[\text {H}^+]} + \frac{K_{\text {a}_1}K_{\text {a}_2}}{[\text {H}^+]^2}} + W_\text {a} + \frac{K_\text {w}}{[\text {H}^+]} - [\text {H}^+] = 0, \end{aligned}$$
(16)
$$\begin{aligned}&\text {pH}=-\log _{10}([\text {H}^+]) \text {.} \end{aligned}$$
(17)
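Equation (16) is an implicit relation in \([\text {H}^+]\) and can be solved numerically at each simulation step; a sketch using textbook equilibrium-constant values (illustrative, not necessarily the values of Table 1):

```python
import numpy as np
from scipy.optimize import brentq

Ka1, Ka2, Kw = 4.47e-7, 5.62e-11, 1.0e-14   # illustrative constants

def ph_from_invariants(Wa, Wb):
    """Solve Eq. (16) for [H+] by bracketing, then apply Eq. (17)."""
    def residual(h):
        num = Ka1 / h + 2.0 * Ka1 * Ka2 / h**2
        den = 1.0 + Ka1 / h + Ka1 * Ka2 / h**2
        return Wb * num / den + Wa + Kw / h - h
    h = brentq(residual, 1e-14, 1.0)         # [H+] between pH 14 and pH 0
    return -np.log10(h)
```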

Since the tank volume is constant, the mass balance results in

$$\begin{aligned} q_1+q_2+q_3-q_4=0 \text {,} \end{aligned}$$
(18)

which can be combined with the mass balance for each ionic species to obtain the differential equations for the reaction invariants \(W_\mathrm{{a_4}}\) and \(W_\mathrm{{b_4}}\), given by

$$\begin{aligned} \frac{{\text {d}} W_\mathrm{{a_4}}(t)}{{\text {d}} t}&=\frac{q_1(t)(W_\mathrm{{a_1}}-W_\mathrm{{a_4}}(t))}{V} + \frac{q_2(t)(W_\mathrm{{a_2}}-W_\mathrm{{a_4}}(t))}{V}\nonumber \\&\quad + \frac{q_3(t)(W_\mathrm{{a_3}}-W_\mathrm{{a_4}}(t))}{V} \end{aligned}$$
(19)
$$\begin{aligned} \frac{{\text {d}} W_\mathrm{{b_4}}(t)}{{\text {d}} t}&=\frac{q_1(t)(W_\mathrm{{b_1}}-W_\mathrm{{b_4}}(t))}{V} + \frac{q_2(t)(W_\mathrm{{b_2}}-W_\mathrm{{b_4}}(t))}{V}\nonumber \\&\quad + \frac{q_3(t)(W_\mathrm{{b_3}}-W_\mathrm{{b_4}}(t))}{V} \mathrm {.} \end{aligned}$$
(20)

The process dynamics, in a state space formulation, is defined as

$$\begin{aligned}&\dot{\mathbf{x }}=\mathbf{r} (\mathbf{x} )+\mathbf{g} (\mathbf{x} )q_1+\mathbf{p} (\mathbf{x} )q_2, \end{aligned}$$
(21)
$$\begin{aligned}&h(\mathbf{x} ,y)=0, \end{aligned}$$
(22)

where the process states are

$$\begin{aligned} \mathbf{x} =\begin{bmatrix} x_1&x_2 \end{bmatrix}^T = \begin{bmatrix} W_\mathrm{{a_4}}&W_\mathrm{{b_4}} \end{bmatrix}^T, \end{aligned}$$
(23)

and

$$\begin{aligned} \mathbf{r} (\mathbf{x} )&=\begin{bmatrix} \frac{q_3(t)(W_\mathrm{{a_3}}-x_1)}{V}&\frac{q_3(t)(W_\mathrm{{b_3}}-x_2)}{V} \end{bmatrix}^T \text {,} \end{aligned}$$
(24)
$$\begin{aligned} \mathbf{g} (\mathbf{x} )&=\begin{bmatrix}\frac{(W_\mathrm{{a_1}}-x_1)}{V}&\frac{(W_\mathrm{{b_1}}-x_2)}{V} \end{bmatrix}^T \text {,} \nonumber \\ \mathbf{p} (\mathbf{x} )&=\begin{bmatrix}\frac{(W_\mathrm{{a_2}}-x_1)}{V}&\frac{(W_\mathrm{{b_2}}-x_2)}{V}\end{bmatrix}^T \text {,} \end{aligned}$$
(25)
$$\begin{aligned} h(\mathbf{x} ,y)&= x_1+10^{y(t)-14}-10^{-y(t)} \nonumber \\&\quad +x_2\frac{1+2 \times 10^{y(t)-K_2}}{1+10^{K_1-y(t)}+10^{y(t)-K_2}} \mathrm {,} \end{aligned}$$
(26)

with \(K_1\) and \(K_2\) representing the first and second dissociation constants of the weak acid \(\text {H}_2\text {CO}_3\), respectively.

The nominal operating values for this process are presented in Table 1.

Table 1 Nominal operating conditions of the neutralization process phenomenological model

3.1.1 Data Series Acquisition

The phenomenological model of the pH neutralization process was used as the real process. The Runge–Kutta 45 method was used to numerically solve the ordinary differential equations presented in (19) and (20). To represent measurement noise, white noise with variance 0.05 was added to the pH value, which is obtained by solving the process output equation (26) in the simulation.

An open-loop simulation was performed to obtain the data series for the system identification procedure. The sampling time selected for the model is 10 s. For the process excitation, an APRBS was designed to drive the system near all the main desired operating conditions, represented by pH values between 5 and 10.
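A sketch of how such an excitation signal can be generated is given below; the step durations and amplitude bounds are illustrative, not the exact test design:

```python
import numpy as np

def aprbs(n_samples, hold_min, hold_max, amp_min, amp_max, seed=0):
    """Amplitude-modulated PRBS: a piecewise-constant signal whose
    step durations and amplitudes are drawn from uniform ranges."""
    rng = np.random.default_rng(seed)
    sig = np.empty(n_samples)
    k = 0
    while k < n_samples:
        hold = int(rng.integers(hold_min, hold_max + 1))  # samples per step
        sig[k:k + hold] = rng.uniform(amp_min, amp_max)   # step amplitude
        k += hold
    return sig
```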

3.1.2 Model Creation and Hyperparameter Tuning

The selection of the best hyperparameters was performed through a grid search, in which a new model was instantiated for each hyperparameter set, trained by solving Eq. (7), and had its performance assessed using the development set. The search used the mean absolute percentage error (MAPE) as performance metric and evaluated the four main hyperparameters within their applicable ranges: the number of reservoir units, the spectral radius, the leak rate, and the regularization parameter. The number of reservoir units is usually on the order of hundreds or thousands, and the maximum value tested in this work was 5000 units. To obtain reservoirs with the ESP, the spectral radius should be selected as discussed in Sect. 2.1; keeping it at values less than 1 results in reservoirs with the ESP regardless of the leak rate value, when the reservoir matrix is created using the procedure proposed in Yildiz et al. (2012). The leak rate was tested within the (0, 1] range. For the regularization parameter, values in the \([10^{-4},10^{1}]\) range were considered. The reservoir connection rate was selected as \( c_\mathrm{{res}}^\mathrm{{res}}=0.001 \), to create a sparse reservoir. The choice for the reservoir neuron activation function, \(\text {f} (\cdot )\), was the hyperbolic tangent function, given by

$$\begin{aligned} \text {f}(x)=\frac{e^{2x}-1}{e^{2x}+1} \text {.} \end{aligned}$$
(27)
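A sketch of the search loop described above is given below; the grid values and the helper callables (build_esn, train, dev_mape) are illustrative assumptions:

```python
import itertools
import numpy as np

grid = {  # illustrative values within the ranges discussed in the text
    "n_units": [500, 1000, 2000, 4000, 5000],
    "rho": [0.7, 0.8, 0.9, 0.99],
    "alpha": [0.1, 0.3, 0.5, 0.7, 0.9],
    "beta": [1e-4, 1e-3, 1e-2, 1e-1, 1e0, 1e1],
}

def grid_search(build_esn, train, dev_mape, n_seeds=5):
    """Train n_seeds randomly initialized models per configuration and
    keep the one with the lowest mean MAPE on the development set."""
    best_cfg, best_err = None, np.inf
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        errs = [dev_mape(train(build_esn(cfg, seed)))
                for seed in range(n_seeds)]
        if np.mean(errs) < best_err:
            best_cfg, best_err = cfg, float(np.mean(errs))
    return best_cfg, best_err
```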

The grid search considered 5 models trained for each configuration. The tuning that presented the lowest MAPE in the grid search has a 4000-unit reservoir, a leak rate of 0.3, and a spectral radius \(\rho (\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}) = 0.99\). The best training regularization parameter was \( \beta = 10^{-4} \). A graphical representation of the grid search experiments is presented in Fig. 4, which shows the mean MAPE value on the development set, together with the upper and lower limits defined by two standard deviations around the mean. Each plot considers the impact of a specific hyperparameter while the others are kept at their best tuning values. The four main hyperparameters are considered in this analysis. The selected values for the final model are marked with an asterisk in Fig. 4.

Fig. 4

Hyperparameter impact on the ESN model performance for the pH neutralization process. Evaluated hyperparameters were: a reservoir size, b spectral radius, c leak rate, and d regularization. The black lines represent the mean MAPE value, the gray regions represent two standard deviations, and the asterisks represent the final selected values

The hyperparameters whose tuning presented the highest impact on the model performance are the spectral radius and the leak rate, both associated with the reservoir creation. Similar behavior is reported in Lukoševičius (2012). For these parameters, fine-tuning can result in model performance improvements. The ridge regression regularization parameter, which tunes the model bias/variance trade-off by penalizing high readout weights, and the reservoir size also impact the model performance, but not as much as the other two hyperparameters. In this specific case study, tuning the ridge regression parameter with low values presented the best results. In addition, larger reservoirs result in better prediction performance, but the performance gain is small for reservoirs larger than 3000 units.

This model creation considered a conservative tuning, with \(\rho (\mathbf{W} _\mathrm{{res}}^\mathrm{{res}})\) kept below 1, which guarantees the echo-state property for reservoirs created with the procedure described in Sect. 2.3. If a less conservative tuning is considered, with values larger than one, the model performance may improve; however, the resulting reservoir may not have the ESP, which can pose a problem, as the model stability can no longer be guaranteed.

3.2 Refrigeration Compressor Test Rig

The refrigeration compressor test rig considered in this section is used in industry to perform tests on refrigeration compressor samples, subjecting each sample to different operating conditions and thus emulating its use in several types of refrigeration systems. The test rig has valves connected to the compressor suction and discharge lines, in order to manipulate the pressures associated with these lines. These valves are connected to a buffer tank, which partially decouples the suction and discharge pressures. A simplified piping and instrumentation diagram of the process and a picture of the experimental test rig are presented in Fig. 5.

Fig. 5

Refrigeration compressor test rig: a simplified piping and instrumentation diagram and b experimental process picture

The valve associated with the suction line is normally closed, which makes the suction pressure directly proportional to the voltage applied. The valve connected with the discharge line is normally open, making the discharge pressure also directly proportional to the voltage applied. Due to the process nature, there is a coupling between the suction and discharge pressures, so changes in the suction valve position affect not only the suction pressure, but also the discharge pressure. However, the coupling between the discharge valve and the suction pressure is negligible, being almost completely compensated by the buffer tank.

3.2.1 Data Series Acquisition

For this case study, the data series was obtained from a real test rig. In this rig, the valves are manipulated in order to control the pressures, which are acquired using a sampling time of 0.1 s. Based on previous knowledge of the typical process operating conditions and of the duration of the process dynamics, three APRBS signals were generated to reach all the desired process operating conditions. The valves were manipulated one at a time, so at each change in one valve the other was kept at a fixed position (with different but constant values at each change). These identification tests were divided into three datasets, to train, develop, and test the models.

3.2.2 Model Creation and Hyperparameter Tuning

Similarly to the procedure presented for the pH neutralization process, the ESN model for the refrigeration compressor test rig was created by selecting an initial model structure as the starting point and fine-tuning the hyperparameters through a grid search. The same activation function used in the pH neutralization process, presented in (27), was used in this case study. The ESN model was built with a MIMO structure, due to the nature of the process, considering the voltages of the two valves as inputs and returning the compressor suction and discharge pressures as the model outputs. Since this case study considers a MIMO process whose output variables have different magnitude levels, it is important to adopt a performance index that evaluates the model outputs with equal importance during hyperparameter tuning. For this purpose, the MAPE of each model output was used.

The ESN model tuning was performed through a grid search over the reservoir size, spectral radius, leak rate, and regularization parameter, considering the same ranges presented for the pH neutralization process. The tuning considered a performance metric in which the average MAPE of the two variables of interest is minimal. The optimal model, obtained through the grid search, had a reservoir with 2000 units and a leak rate of 0.7. The optimal spectral radius was \(\rho (\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}) = 0.9\), and \( \beta = 0.01 \) was used as the regularization parameter. In this tuning, matrix \(\mathbf{W} _\mathrm{{res}}^\mathrm{{res}}\) was initialized considering a connection rate \( c_\mathrm{{res}}^\mathrm{{res}} = 0.01\). The impact of the hyperparameter tuning on the model performance is summarized in Fig. 6. In this case study, for the sake of brevity, only the two most relevant hyperparameters are presented: the spectral radius and the leak rate.

Fig. 6

Hyperparameter impact on the ESN model performance for the refrigeration compressor test rig. Evaluated hyperparameters were: a spectral radius and b leak rate for the suction pressure, and c spectral radius and d leak rate for the discharge pressure. The black lines represent the mean MAPE value, the gray regions represent two standard deviations, and the asterisks represent the final selected values

The model presented the best results for spectral radius values in the [0.8, 1.0] range, with both outputs presenting the best performance for values closer to 0.9. For the leak rate there is a different error behavior for each model output: \( y_2 \) shows smaller errors for \( \alpha = 0.9 \), while \( y_1 \) has a lower MAPE for \( \alpha = 0.7 \). When considering both outputs, selecting \( \alpha = 0.7\) resulted in the best performance compromise.

4 Case Studies: Results and Discussion

The final selected models were evaluated using the test portions of the data series. Each test portion is contiguous and represents a complete sequence of steps in the input variables. A graphical representation of the results is presented in Fig. 7, with the results for the pH neutralization process presented in Fig. 7a, and the results for the refrigeration compressor test rig shown in Fig. 7b. For both cases, the ESN model outputs are compared with the process outputs.

Fig. 7

Comparison between the process output (black line) and the ESN model output (red line) for: a pH neutralization process and b refrigeration compressor test rig (Color figure online)

The performance indexes of the test results are summarized in a table for each case study. For this analysis, the mean squared error (MSE), the \(R^2\) correlation criterion, and the mean absolute percentage error (MAPE) are used as performance indexes. In Table 2, the results for the pH neutralization process are presented, while in Table 3, the results for both outputs of the refrigeration compressor case study are detailed. The results of the proposed ESN approach are compared with linear and nonlinear baseline models. As linear models, first- and second-order structures were considered. As a nonlinear baseline model, a structure based on an extreme learning machine and a Hammerstein nonlinear model (ELM–Hammerstein) was used (Tang et al. 2014). This model consists of a Hammerstein model with an ELM neural network as its nonlinear static part. This baseline was selected because the ELM neural network presents a learning paradigm similar to that of the ESN, in which a neural network is generated using a random distribution and only a portion of the full model is trained, resulting in a less computationally expensive training procedure. The ELM model was built with the same number of neurons used for the final ESN model considered in each case study. The linear portion of the ELM–Hammerstein structure was implemented as a first-order model in both case studies. The last baseline is a nonlinear model based on a recurrent neural network with long short-term memory (LSTM) units, as detailed in Schwedersky et al. (2019). This model was built using a single fully connected hidden layer, with 10 LSTM units.
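For reference, the three indexes can be computed as in the sketch below, assuming 1-D NumPy arrays y (measured) and y_hat (predicted):

```python
import numpy as np

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

def r2(y, y_hat):
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def mape(y, y_hat):
    # assumes y has no zero samples (true for pH and pressure signals)
    return 100.0 * np.mean(np.abs((y - y_hat) / y))
```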

Table 2 Comparison between the model performance metrics for the pH neutralization process
Table 3 Comparison between the model performance metrics for the refrigeration compressor test rig

The model obtained with the proposed ESN approach presented good results for both case studies. For the pH neutralization process, the ESN model presented high fidelity, which can be verified through inspection of Fig. 7a. This fact is also supported by the performance indexes presented in Table 2, which shows that for this case study both the LSTM and the proposed model are the best choices. Similarly, the ESN model presented the best results for the suction pressure in the refrigeration compressor test rig case study, and performance similar to the other nonlinear baseline models for the discharge pressure. Even though the ELM–Hammerstein model provided the best results for the discharge pressure, the differences obtained for this variable among the three nonlinear model alternatives are quite small. Even though a combination of an ESN and an ELM–Hammerstein model would be the best choice for this process, it is usually not worth the implementation effort required to obtain both models. If a single model is to be chosen, the natural choice would be the ESN model, as it presented better results for the suction pressure and almost the same result as the best model for the discharge pressure. For this process, it is important to note that some errors related to the static characteristics of the process appear, especially in the discharge pressure. These errors are present in all models and are related to unmeasured disturbances, mainly the compressor and ambient temperatures, which can change between tests and affect the compressor operating condition. Temperature changes affect both the density of the refrigerant fluid and the solubility of the refrigerant in the oil used for lubricating the compressor moving parts (Björk and Palm 2006). As a consequence, temperature changes affect the mass flow rate and the charge of refrigerant fluid, which are reflected as changes in the operating pressures. The performance indexes for this case study, summarized in Table 3, show that, even for a real process subject to unmeasured disturbances, the ESN model presented better results than the baseline methods, except for a result equivalent to that of the ELM–Hammerstein for the discharge pressure.

Even though both case studies consider ESN models with large reservoir sizes, the computational cost required to train the proposed models is still reasonable for practical applications. The time required to train the model for the pH neutralization process, considering a data series of 5000 samples, a 3.6 GHz AMD Ryzen 2400G processor, and a MATLAB implementation, is presented in Fig. 8a for different reservoir sizes. The time required to train a model with a reservoir of 500 units was less than one second, while the time required for a reservoir with 5000 units was about 80 s. Since the training process is usually performed offline, using larger reservoir sizes is not an obstacle to the implementation of the proposed method. In addition, the computational cost to perform predictions with the model is also reasonable, as shown in Fig. 8b. For reservoir sizes smaller than 1000 units, the required time is on the order of a few milliseconds, while a reservoir with 5000 units requires computing times smaller than 15 ms, which is reasonable for a wide range of applications. The other case study presented equivalent results, which are not included in this paper for the sake of brevity.

Fig. 8

Comparison between the total computation time required to: a train the ESN model and b perform a one step-ahead prediction considering different reservoir sizes

5 Conclusions

In this paper, an approach to multiple-input and multiple-output nonlinear system identification based on the echo-state network was presented. This approach formalizes some of the best practices in the literature to create an ESN, and proposes a procedure to tune its main hyperparameters. The application of the proposed method was demonstrated in two case studies, a simulated pH neutralization process, as presented in Henson and Seborg (1994), and an experimental case which considers a real refrigeration compressor test rig used in industry.

Based on the results of both case studies, it was possible to show that the ESN model creation and tuning can have a strong impact on the final model performance. Besides the use of good practices to create the ESN reservoir, the acquisition of a suitable dataset for system identification and the choice of the ESN hyperparameters deserve special attention, since input data containing rich information about the system dynamics and careful tuning are necessary to obtain a good model. For both case studies, the spectral radius and the leak rate were the hyperparameters whose tuning presented the greatest impact on the final model performance.

For both case studies, the proposed models based on the ESN approach obtained the best overall results among all the evaluated methods (two linear models, an ELM–Hammerstein model, and an LSTM network model). These results indicate that the proposed ESN-based approach is an alternative to classical techniques for nonlinear system identification. It enables the identification of MIMO nonlinear systems in a data-driven fashion, without much need for defining model structures and orders based on knowledge about the process dynamics and its nonlinearities, as is the case in classical approaches, such as the linear and Hammerstein models.