
1 Introduction

Hybrid systems are heterogeneous dynamical systems that arise from the interaction of continuous and discrete dynamics. The continuous behavior results from the natural evolution of the physical process, whereas the discrete behavior can be due to the presence of switches, operating phases, transitions, computer program code, etc. These hybrid dynamics characterize the behavior of a broad class of physical systems, for example real-time control systems in which physical processes are controlled by embedded controllers. The notion of a hybrid system can also be used to represent complex nonlinear continuous systems. In fact, the operating range of a nonlinear system can be decomposed into a set of operating points, and a simple sub-model (linear or affine) can be associated with each of them. A complex system can thus be modeled as a hybrid system switching between simple sub-models.

This chapter addresses the identification of hybrid systems represented by piecewise autoregressive models with exogenous input (PWARX). The problem consists in building mathematical models of hybrid systems from observed input-output data. PWARX models have attracted considerable attention in recent years, since they provide an efficient solution for modeling a wide range of engineering applications (Roll et al. 2004; Nakada et al. 2005; Wen et al. 2007; Xu et al. 2012). In addition, these models are able to approximate any nonlinear system with arbitrary accuracy (Lin and Unbehauen 1992). Moreover, the PWA model can be considered as a generic representation for other hybrid models such as jump linear models (JL models) (Vidal et al. 2002), Markov jump linear models (MJL models) (Doucet et al. 2001), mixed logic dynamical models (MLD models) (Bemporad et al. 2000), max-min-plus-scaling systems (MMPS models) (De Schutter and Van den Boom 2000), linear complementarity models (LC models) (Vander-Schaft and Schumacher 1998), and extended linear complementarity models (ELC models) (De Schutter and De Moor 1999). The transfer of results on PWARX models to these other classes of hybrid systems is ensured by the equivalence properties of PWARX models (Heemels et al. 2001). PWARX models are obtained by decomposing the regression domain into a finite number of non-overlapping convex polyhedral regions and by associating a simple linear model with each region. Consequently, two main problems must be considered for the identification of PWARX models: the estimation of the parameters of the sub-models, and the determination of the hyperplanes defining the partition of the regression domain. The identification of PWARX models is thus a difficult problem, and one on which considerable research has been done in the last decade.
In fact, numerous solutions have been proposed in the literature for the identification of PWARX models, such as the clustering-based solution (Ferrari-Trecate et al. 2003), the Bayesian solution (Juloski et al. 2005), the bounded-error solution (Bemporad et al. 2005), the greedy solution (Bemporad et al. 2003), and the sparse optimization solution (Bako 2011; Bako and Lecoeuche 2013). The sparse solutions do not smooth out the effect of the measurement noise; they therefore often fail in real applications, where the measurement data are usually contaminated by unknown additive noise. The greedy algorithms are very time consuming since they involve the solution of NP-hard problems. In addition, they can cause a loss of information because they sometimes fail to associate data with the appropriate regressors. The Bayesian approach assumes that the probability density functions of the unknown parameters of the system are known a priori; otherwise, it requires additional sequential processing to improve the identification results. The clustering solution is based on a simple and instructive procedure and does not require a priori knowledge of the system. Therefore, only the clustering approach is considered in this chapter. This solution consists of three main steps: data classification, parameter estimation and region reconstruction. It is easy to remark that the performance of this approach depends on the efficiency of the classification algorithm used (Lassoued and Abderrahim 2013a, b, c, d, 2014a, b). The early methods favored simplicity of implementation and, as a result, present several drawbacks, which can be summarized as follows:

  • Most of them are based on the optimization of nonlinear criteria. Consequently, they may converge to local minima in the case of poor initializations.

  • Their performances degrade in the case of the presence of outliers in the data to be classified.

  • Most of them assume that the number of sub-models is a priori known.

To overcome these problems, we have proposed the use of other clustering algorithms such as Chiu’s method (Chiu 1997) and the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) method (Chaitali 2012; Sander et al. 1998). This choice is justified by the fact that these algorithms automatically generate the number of sub-models. In addition, they are robust when classifying noisy measurements that may also contain outliers.

This chapter is organized as follows. Section 2 presents the assumptions for PWARX model identification. In Sect. 3, we recall the main steps of the identification of PWARX systems based on clustering algorithm and its main drawbacks. Section 4 proposes two solutions to overcome the main problems of the existing methods. In Sect. 5, we present three simulation examples in order to illustrate the performance of the proposed solutions and to compare their efficiency with the modified k-means method. Section 6 proposes an application of the developed approach to an olive oil esterification reactor.

2 Piecewise Affine System Identification

Consider a discrete-time PieceWise Auto-Regressive eXogenous (PWARX) model with input \(u(k) \in {\mathbb{R}}\) and output \(y(k) \in {\mathbb{R}}\), defined on the bounded polyhedral regressor space \(H \subset {\mathbb{R}}^{d}\) (\(d = n_{a} + n_{b}\)). The space is decomposed into s modes \(\left\{ {H_{i} } \right\}_{i = 1}^{s}\), and an ARX model is associated with each of them:

$$y(k) = f(\varphi (k)) + e(k).$$
(1)

f is a piecewise affine function defined by:

$$f(\varphi ) = \left\{ {\begin{array}{*{20}l} {\theta_{1}^{T} \bar{\varphi }\quad if\;\varphi \in H_{1} } \hfill \\ \vdots \hfill \\ {\theta_{s}^{T} \bar{\varphi }\quad if\;\varphi \in H_{s} } \hfill \\ \end{array} } \right.$$
(2)

where

$$\overline{\varphi } = \left[ {\begin{array}{*{20}l} {\varphi^{T} } & 1 \\ \end{array} } \right]^{T}.$$
(3)

e(k) is the additive noise and \(\varphi (k)\) is the regressor vector, containing past input and output observations, defined as:

$$\varphi (k) = \left[ {\begin{array}{*{20}l} {y(k - 1) \ldots y(k - n_{a} )} & {u(k - 1) \ldots u(k - n_{b} )} \\ \end{array} } \right]^{T}.$$
(4)

\(\theta_{i} \in {\mathbb{R}}^{d + 1}\) is the parameter vector, valid in \(H_{i}\), defined as follows:

$$\theta_{i}^{T} = \left[ {\begin{array}{*{20}l} {a_{1} } & {a_{2} } & \ldots & {a_{{n_{a} }} } & {b_{1} } & {b_{2} } & \ldots & {b_{{n_{b} }} } & g \\ \end{array} } \right]$$
(5)

where \(a_{i}\) and \(b_{i}\) are the coefficients of the model related to the output and input data, respectively, while \(n_{a}\) and \(n_{b}\) are the model orders and g is the independent affine coefficient.

Problem statement

Given input-output data generated by a PWARX system, we are interested simultaneously in identifying the number of submodels s, the parameter vectors \(\left\{ {\theta_{i} } \right\}_{i = 1}^{s}\) and the partitions \(\left\{ {H_{i} } \right\}_{i = 1}^{s}\) taking into account the following assumptions:

  • The orders \(n_{a}\) and \(n_{b}\) of the system are known.

  • The noise e(k) is assumed to be a Gaussian process independent and identically distributed with zero mean and finite variance \(\sigma^{2}\).

  • The regions \(\left\{ {H_{i} } \right\}_{i = 1}^{s}\) are the polyhedral partitions of a bounded domain \(H \subset {\mathbb{R}}^{d}\) such that:

$$\left\{ {\begin{array}{*{20}l} {\bigcup\nolimits_{i = 1}^{s} H_{i} = H} \\ {H_{i} \bigcap H_{j} = \emptyset \quad \forall\,i \ne j} \\ \end{array} } \right.$$
(6)
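As an illustration, the PWARX map (1)-(2) can be sketched in a few lines of Python. The two-mode model, its partition and all numerical values below are hypothetical, chosen only to show the mechanics of evaluating \(f\):

```python
import numpy as np

# Hypothetical two-mode PWARX model with n_a = n_b = 1, so that
# phi(k) = [y(k-1), u(k-1)]^T and theta_i^T = [a_1, b_1, g].
THETAS = [np.array([0.5, 1.0, 0.2]),    # theta_1^T, active on H_1
          np.array([-0.3, 0.8, -0.1])]  # theta_2^T, active on H_2

def mode(phi):
    """Assumed partition: H_1 is the half-plane u(k-1) >= 0, H_2 its complement."""
    return 0 if phi[1] >= 0 else 1

def pwarx_step(phi, noise=0.0):
    """Evaluate y(k) = theta_i^T [phi; 1] + e(k) for the active mode i."""
    phi_bar = np.append(phi, 1.0)       # augmented regressor, cf. (3)
    return THETAS[mode(phi)] @ phi_bar + noise

def simulate(u, y0=0.0):
    """Iterate the PWARX map over an input sequence u (noise-free here)."""
    y = [y0]
    for k in range(1, len(u)):
        phi = np.array([y[k - 1], u[k - 1]])
        y.append(pwarx_step(phi))
    return np.array(y)

y = simulate(np.sin(0.3 * np.arange(50)))
```

The switching in `mode` is what distinguishes a PWARX model from a single ARX model: the parameter vector applied at time k depends on which region the regressor lies in.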

3 Clustering Based PWARX Identification

The main steps of the clustering-based approach for the identification of PWARX models can be summarized as follows: construct small local datasets from the initial dataset, estimate a parameter vector for each local dataset, classify the parameter vectors into s clusters, classify the initial dataset accordingly, and estimate the s sub-models together with their partitions.

  1.

    Form \(\left\{ {\varphi (k),y(k)} \right\}_{k = 1}^{N}\) from the given dataset \(S = (u(k),y(k)),\;k = 1, \ldots,N\)

  2.

    Create local datasets C k and identify the local parameter vectors \(\theta_{k}\)

    (a)

      Choose \(n_{\rho }\), the number of data points to be contained in each local dataset \(C_{k}\).

    (b)

      For each data point \((\varphi (k),y(k))\), build \(C_{k}\) containing \(\left\{ {\varphi (k),y(k)} \right\}\) and its \((n_{\rho } - 1)\) nearest neighbors, satisfying:

      $$\left\| {\varphi (k) - \check{\varphi }} \right\|^{2} \le \left\| {\varphi (k) - \hat{\varphi }} \right\|^{2},\quad \forall\,(\check{\varphi },\check{y}) \in C_{k},\;\forall\,(\hat{\varphi },\hat{y}) \notin C_{k}.$$
      (7)
    (c)

      Determine \(\theta_{k}\) for each local dataset \(C_{k},\;k = 1, \ldots,N\), using the least squares method:

      $$\theta_{k} = (\phi_{k}^{T} \phi_{k} )^{ - 1} \phi_{k}^{T} Y_{k}.$$
      (8)

      where

      $$\phi_{k} = \left[ {\overline{\varphi }\, (t_{k}^{1} ) \ldots \overline{\varphi }\, (t_{k}^{{n_{\rho } }} )} \right]^{T},$$
      $$Y_{k} = \left[ {y\,(t_{k}^{1} ) \ldots y\,(t_{k}^{{n_{\rho } }} )} \right]^{T}.$$

      and \((t_{k}^{1}, \ldots,t_{k}^{{n_{\rho } }})\) are the indexes of the elements belonging to \(C_{k}\).

  3.

    Cluster the local parameter vectors (\(\theta_{k},\;k = 1, \ldots,N\)) into s disjoint clusters while determining the value of s by using a suitable classification technique.

  4.

    Identify the final models \(\left\{ {\theta_{i} } \right\}_{i = 1}^{s}.\)

  5.

    Estimate the polyhedral partitions \(\left\{ {H_{i} } \right\}_{i = 1}^{s}\), i.e. the hyperplanes separating \(H_{i}\) from \(H_{j}\), \(i \ne j\). This is a standard pattern recognition/classification problem that can be solved by several established techniques, the most common being Support Vector Machines (SVM) (Wang 2005; Duda et al. 2001).
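Steps 1 and 2 above can be sketched in Python as follows. This is a minimal illustration: the variable names `Phi`, `Y` and `n_rho` are ours, and the nearest-neighbor search of (7) is done by brute force.

```python
import numpy as np

def local_parameters(Phi, Y, n_rho):
    """Phi: (N, d) stacked regressors, Y: (N,) outputs.
    For each k, build the local dataset C_k from phi(k) and its
    (n_rho - 1) nearest neighbours, cf. (7), and estimate a local
    parameter vector by least squares, cf. (8).
    Returns an (N, d+1) array of local parameter vectors."""
    N = Phi.shape[0]
    thetas = np.empty((N, Phi.shape[1] + 1))
    for k in range(N):
        # indexes of phi(k) and its nearest neighbours
        dist = np.linalg.norm(Phi - Phi[k], axis=1)
        idx = np.argsort(dist)[:n_rho]
        # augmented regressor matrix phi_k and output vector Y_k
        phi_k = np.hstack([Phi[idx], np.ones((n_rho, 1))])
        thetas[k], *_ = np.linalg.lstsq(phi_k, Y[idx], rcond=None)
    return thetas
```

When the data come from a single affine model, every local estimate coincides with the true parameter vector; with a PWARX system, the local estimates concentrate around the s true parameter vectors (except for "mixed" datasets straddling a boundary), which is what makes step 3 a clustering problem.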

The classification of data represents the main step for PWARX system identification because a successful identification of models’ parameters and hyperplanes depends on the correct data classification. For the sake of simplicity, the early approaches use classical clustering algorithms for the data classification such as k-means algorithms.

However, these algorithms present several drawbacks. They may converge to local minima in the case of poor initialization, because they are based on the minimization of nonlinear criteria. Furthermore, their performance degrades in the presence of outliers in the data to be classified. In addition, most of them assume that the number of sub-models is a priori known.

4 The Proposed Clustering Techniques

In order to improve the identification results, we propose the use of other classification algorithms, namely Chiu’s algorithm and the DBSCAN algorithm.

4.1 The Chiu’s Clustering Technique

Chiu’s clustering method is a modified form of the Mountain method for cluster estimation (Chiu 1994). Each data point, rather than each grid point, is considered as a potential cluster center. This method has several advantages over the Mountain method:

  • The number of points to be evaluated is equal to the number of data points.

  • It does not require specifying a grid resolution, which trades off accuracy against computational complexity.

  • It improves the computational efficiency and robustness of the original method.

Chiu’s classification method computes a potential value for each point of the dataset based on its distances to the other data points, each data point being considered as a potential cluster center. The point with the highest potential value is chosen as the first cluster center. The key idea is that once the first cluster center is chosen, the potential of every other point is reduced according to its distance from that center; points close to the first cluster center have their potentials greatly reduced. The next cluster center is then the point with the highest remaining potential. This procedure of determining a new center and updating the other potentials is repeated until a predefined stopping condition is reached, based either on a minimum potential value or on a required number of clusters.

For the local parameter vectors \((\theta_{i},\;i = 1, \ldots,N)\), the potential of each point is computed using the following expression:

$$P_{i} = \sum\limits_{j = 1}^{N} e^{{ - \frac{4}{{r_{a}^{2} }}\left\| {\theta_{i} - \theta_{j} } \right\|^{2} }}.$$
(9)

The potential of each local parameter is a function of its distance to all the other local parameters. Thus, a local parameter with many neighboring local parameters will have a high potential value. The constant \(r_{a}\) is the radius defining the neighborhood, which can be determined by the following expression:

$$r_{a} = \frac{\alpha }{N}\sum\limits_{i = 1}^{N} \frac{1}{{n_{\rho } }}\sum\limits_{j = 1}^{{n_{\rho } }} \left\| {\theta_{i} - \theta_{j} } \right\|.$$
(10)

where \(\alpha\) is chosen such that \(0 < \alpha < 1\).

Equation (9) can be exploited to eliminate outliers. Since this equation assigns a low potential to outliers, we can fix a threshold \(\gamma\) below which local parameters are rejected and removed from the dataset. This threshold is given by the following equation:

$$\gamma = \hbox{min} (P) + \beta \left( {\hbox{max} (P) - \hbox{min} (P)} \right).$$
(11)

where P is the vector containing the potentials, \(P = \left[ {P_{1}, \ldots,P_{N} } \right]\), and \(\beta\) is a parameter chosen such that \(0 < \beta < 1\).

The elimination of outliers reduces the parameter vectors to \((\theta_{i},\;i = 1, \ldots,N^{{\prime }})\) with \(N^{{\prime }} < N\). Then, from this new dataset, we select the data point with the highest potential value as the first cluster center.

Let \(\theta_{1}^{*}\) be this first center and \(P_{1}^{*}\) its potential. The other potentials \(P_{i}\), \(i = 1, \ldots,N^{{\prime }}\), are then updated using the expression:

$$P_{i} \Leftarrow P_{i} - P_{1}^{*} e^{{ - \frac{4}{{r_{b}^{2} }}\left\| {\theta_{i} - \theta_{1}^{*} } \right\|^{2} }}.$$
(12)

Expression (12) assigns lower potentials to the local parameters close to the first center. Consequently, this guarantees that these parameters are not selected as cluster centers in the next step. The parameter \(r_{b}\) is a positive constant that must be chosen larger than \(r_{a}\) to avoid obtaining cluster centers that are too close to each other. The constant \(r_{b}\) is computed using this formula:

$$r_{b} = \frac{\alpha }{N}\sum\limits_{i = 1}^{N} \mathop {\hbox{max} }\limits_{{j = 1\;:\;n_{\rho } }} \left( {\left\| {\theta_{i} - \theta_{j} } \right\|} \right).$$
(13)

In general, after obtaining the kth cluster center, the potential of every local parameter is updated by the following formula:

$$P_{i} \Leftarrow P_{i} - P_{k}^{*} e^{{ - \frac{4}{{r_{b}^{2} }}\left\| {\theta_{i} - \theta_{k}^{*} } \right\|^{2} }}.$$
(14)

where \(\theta_{k}^{*}\) is the kth cluster center and \(P_{k}^{*}\) its potential.

The number of sub-models s is a parameter that we would like to determine. Therefore, we have developed some criteria for accepting or rejecting the cluster centers as it is explained in the algorithm of the next section.

To determine the elements belonging to each cluster, we compute the distance between the estimated and the real outputs and assign \(\varphi (k)\) to the cluster achieving the minimum distance:

$$\mathop {\arg \hbox{min} }\limits_{i = 1, \ldots,s} \left| {\theta_{i}^{T} \overline{\varphi }_{k} - y_{k} } \right|.$$
(15)

The Chiu’s clustering technique can be summarized by the following algorithm:

where \(\varepsilon\) is a small parameter characterizing the minimum distance between a new cluster center and the existing ones.
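As a minimal sketch of the procedure, the following Python code implements the potential computation (9), the potential revision (14) and a simple stopping rule. The radii and the stopping ratio are hand-set illustrative constants, not the values prescribed by (10), (13) and \(\varepsilon\):

```python
import numpy as np

def chiu_centers(theta, r_a=1.0, r_b=1.5, stop_ratio=0.15):
    """theta: (N, d) local parameter vectors.
    Returns the selected cluster centers (subtractive clustering sketch)."""
    # initial potentials, cf. (9)
    P = np.array([np.exp(-4.0 / r_a**2 *
                         np.sum((theta - t)**2, axis=1)).sum() for t in theta])
    centers = []
    P_first = P.max()
    while True:
        i = int(np.argmax(P))
        # stop when the best remaining potential is a small fraction
        # of the first one (illustrative stopping condition)
        if P[i] < stop_ratio * P_first:
            break
        centers.append(theta[i])
        # revise all potentials around the new center, cf. (14)
        P = P - P[i] * np.exp(-4.0 / r_b**2 *
                              np.sum((theta - theta[i])**2, axis=1))
    return np.array(centers)
```

On two well-separated groups of local parameter vectors, the first center absorbs the potential of its whole group, so the second selection necessarily falls in the other group; the stopping rule then terminates the search, yielding the number of sub-models automatically.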

4.2 The DBSCAN Clustering Technique

The Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is a pioneering density-based clustering algorithm (Chaitali 2012; Sander et al. 1998). It is based on the concepts of density-reachability and density-connectivity, which depend on two input parameters: epsilon (\(\varepsilon\)) and MinPts.

  • \(\varepsilon\): the radius around an object that defines its \(\varepsilon\)-neighborhood.

  • MinPts: the minimum number of points required in the \(\varepsilon\)-neighborhood of an object for it to be a core object.

For a given object q, when the number of objects within the \(\varepsilon\)-neighborhood is at least MinPts, then q is defined as a core object. All objects within its \(\varepsilon\)-neighborhood are said to be directly density-reachable from q.

In general, an object p is density-reachable from q if it is within the \(\varepsilon\)-neighborhood of an object that is directly density-reachable or density-reachable from q. Two objects p and q are said to be density-connected if there exists an object g from which both p and q are density-reachable.

The DBSCAN algorithm then defines a cluster as the set of objects in a dataset that are density-connected to a particular core object. Any object that is not part of a cluster is categorized as noise. For a given dataset \(S = \left\{ {\theta_{k} } \right\}_{k = 1}^{N}\) and inputs \(\varepsilon\) and MinPts, the \(\varepsilon\)-neighborhood of a point \(\theta_{i}\) is defined as:

$$N_{\varepsilon } (\theta_{i} ) = \left\{ {\theta_{j} \in S;\;\left\| {\theta_{i} - \theta_{j} } \right\| \le \varepsilon } \right\}$$
(17)

DBSCAN constructs clusters by checking the \(\varepsilon\)-neighborhood of each object in the dataset. If the \(\varepsilon\)-neighborhood of an object \(\theta_{k}\) (whose cardinality is denoted \(cN_{\varepsilon }\)) contains at least MinPts objects, a new cluster is created with \(\theta_{k}\) as a core object. DBSCAN then iteratively collects the objects that are directly density-reachable from these core objects. The process terminates when no new object can be added to any cluster. The main steps of this algorithm can be summarized as follows:
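The steps above can be sketched as a short Python implementation. This is a brute-force illustration, not an optimized version; label −1 marks noise:

```python
import numpy as np

def dbscan(S, eps, min_pts):
    """S: (N, d) array of points. Returns an (N,) array of cluster
    labels, with -1 for noise (points never absorbed by any cluster)."""
    N = len(S)
    labels = np.full(N, -1)                       # -1 = unvisited / noise
    dist = np.linalg.norm(S[:, None] - S[None, :], axis=2)
    cluster = 0
    for k in range(N):
        if labels[k] != -1:
            continue
        seeds = list(np.flatnonzero(dist[k] <= eps))
        if len(seeds) < min_pts:
            continue                              # theta_k is not a core object
        labels[k] = cluster                       # start a new cluster
        while seeds:                              # collect density-reachable points
            j = seeds.pop()
            if labels[j] != -1:
                continue
            labels[j] = cluster
            neigh = np.flatnonzero(dist[j] <= eps)
            if len(neigh) >= min_pts:             # j is itself core: keep expanding
                seeds.extend(neigh)
        cluster += 1
    return labels
```

In the identification context, S would contain the local parameter vectors \(\theta_k\); the clusters correspond to the sub-models, their number s falls out of the algorithm, and the noise label discards outlying local estimates.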

5 Simulation Examples

In this section, we illustrate the performance of the proposed methods with three simulation examples. First, we consider an academic PWARX model on which the proposed methods are compared with the well-known k-means method (Ferrari-Trecate et al. 2001, 2003). Then, a nonlinear model is considered to show the efficiency of the proposed methods in approximating nonlinear systems. Finally, a pH neutralization process is simulated in order to demonstrate their ability to model complex systems and to determine the number of sub-models.

5.1 Quality Measures

To achieve the purpose of these simulations, we consider the following quality measures (Juloski et al. 2006):

  • The maximum of relative error of parameter vectors is defined by

    $$\varDelta_{\theta } = \mathop {\hbox{max} }\limits_{i = 1, \ldots,s} \frac{{\left\| {\theta_{i} - \overline{\theta }_{i} } \right\|_{2} }}{{\left\| {\overline{\theta }_{i} } \right\|_{2} }}$$
    (18)

    where \(\overline{\theta }_{i}\) and \(\theta_{i}\) are the true and the estimated parameter vectors for sub-model i, respectively. The identified model is deemed acceptable if \(\varDelta_{\theta }\) is small or close to zero.

  • The averaged sum of the squared residuals is defined by

    $$\sigma_{e}^{2} = \frac{1}{s}\sum\limits_{i = 1}^{s} \frac{{SSR_{i} }}{{\left| {D_{i} } \right|}}$$
    (19)

    where \(SSR_{i} = \sum\limits_{{(y(k),\varphi (k)) \in D_{i} }} (y(k) - \left[ {\varphi (k)^{{\prime }} 1} \right]\theta_{i} )^{2}\) and \(\left| {D_{i} } \right|\) is the cardinality of cluster \(D_{i}\).

    The identified model is considered acceptable if \(\sigma_{e}^{2}\) is small and/or close to the expected noise variance of the true system.

  • The percentage of the output variation that is explained by the model is defined by

    $$FIT = 100 \cdot \left( {1 - \frac{{\left\| {\hat{y} - y} \right\|_{2} }}{{\left\| {y - \overline{y} } \right\|_{2} }}} \right)$$
    (20)

    where \(\hat{y}\) and y are the estimated and the real outputs’ vectors, respectively, and \(\overline{y}\) is the mean value of y.

    The identified model is considered acceptable if FIT is close to 100.

  • The relative error expressed in percentage (%) is given by:

    $$e_{r} \left( k \right) = 100 \cdot \frac{{\left| {y\left( k \right) - \hat{y}\left( k \right)} \right|}}{{\left| {y\left( k \right)} \right|}}$$
    (21)

    where \(\hat{y}(k)\) and y(k) are the estimated and the real outputs at time k.

    The identified model is considered acceptable if \(e_{r}\) is close to 0 %.
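For illustration, measures (18) and (20) translate directly into code (a short sketch; the function names are ours):

```python
import numpy as np

def delta_theta(theta_est, theta_true):
    """Maximum relative parameter error over the s sub-models, eq. (18)."""
    return max(np.linalg.norm(te - tt) / np.linalg.norm(tt)
               for te, tt in zip(theta_est, theta_true))

def fit_percent(y_hat, y):
    """Percentage of output variation explained by the model, eq. (20)."""
    return 100.0 * (1.0 - np.linalg.norm(y_hat - y)
                    / np.linalg.norm(y - np.mean(y)))
```

Note that (18) requires knowledge of the true parameter vectors, so it is only computable on simulated benchmarks, whereas FIT and \(e_r\) apply to experimental data as well.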

5.2 Identification Results of a PWARX Model

Consider the following PWARX model (Boukharouba 2011):

$$y(k) = \left\{ {\begin{array}{*{20}l} {\left[ {\begin{array}{*{20}l} {0.4} & {0.5} & {0.3} \\ \end{array} } \right]\bar{\varphi }(k) + e(k)} & {if\;\varphi (k) \in H_{1} ,} \\ {\left[ {\begin{array}{*{20}l} { - 0.7} & {0.6} & { - 0.5} \\ \end{array} } \right]\bar{\varphi }(k) + e(k)} & {if\;\varphi (k) \in H_{2} ,} \\ {\left[ {\begin{array}{*{20}l} {0.4} & { - 0.2} & { - 0.2} \\ \end{array} } \right]\bar{\varphi }(k) + e(k)} & {if\;\varphi (k) \in H_{3} ,} \\ \end{array} } \right.$$
(22)
$$\begin{aligned} H_{1} & = \left\{ {\varphi \in \Re^{2}:\left[ {\begin{array}{*{20}c} 1 & {0.3} & 0 \\ \end{array} } \right]\overline{\varphi } \ge 0\;{\text{and}}\;\left[ {\begin{array}{*{20}c} 0 & {0.5} & 0 \\ \end{array} } \right]\overline{\varphi } > 0} \right\} \\ H_{2} & = \left\{ {\varphi \in \Re^{2}:\left[ {\begin{array}{*{20}c} 1 & {0.3} & 0 \\ \end{array} } \right]\overline{\varphi } \le 0\;{\text{and}}\;\left[ {\begin{array}{*{20}c} 1 & { - 0.3} & 0 \\ \end{array} } \right]\overline{\varphi } < 0} \right\} \\ H_{3} & = \left\{ {\varphi \in \Re^{2}:\left[ {\begin{array}{*{20}c} 1 & { - 0.3} & 0 \\ \end{array} } \right]\overline{\varphi } \ge 0\;{\text{and}}\;\left[ {\begin{array}{*{20}c} 0 & {0.5} & 0 \\ \end{array} } \right]\overline{\varphi } \le 0} \right\} \\ \end{aligned}$$
(23)

where s = 3, \(n_{a} = 1\), \(n_{b} = 1\), and \(\varphi (k) = \left[ {\begin{array}{*{20}c} {y(k - 1)} & {u(k - 1)} \\ \end{array} } \right]^{T}\) is the regressor vector.

System (22) is simulated using an input signal u(k) and a noise signal e(k), both zero-mean Gaussian sequences with variances 0.5 and 0.05, respectively. The output y(k) is presented in Fig. 1.

Fig. 1
figure 1

The real output of the system (squares: output of sub-model 1, triangles: output of sub-model 2 and dots: output of sub-model 3)

Table 1 presents the estimated parameter vectors obtained with the proposed methods and the k-means one.

Table 1 Estimated parameters

After obtaining the estimated parameter vectors, we apply the SVM algorithm in order to estimate the regions. We can then attribute each parameter vector to the region on which it is valid. The estimated outputs obtained with the three algorithms are presented in Fig. 2.

Fig. 2
figure 2

The estimated outputs a with Chiu, b with DBSCAN, and c with k-means

Table 2 presents the quality measures (18), (19) and (20) for the two proposed methods and the k-means method. The obtained results demonstrate the efficiency of the proposed methods compared with the existing (k-means) method.

Table 2 Validation results

5.3 Identification Results of a Nonlinear Model

Consider the nonlinear system described by the following equation (Lai et al. 2010):

$$\begin{aligned} y(k) & = \frac{1.5y(k - 1)y(k - 2)}{{1 + y^{2} (k - 1) + y^{2} (k - 2)}} + \sin \left( {y(k - 1) + y(k - 2)} \right) \\ &\quad + u(k - 1) + 0.8u(k - 2) \\ \end{aligned}$$
(24)

This nonlinear system can be modeled by a PWARX model of the form (Lai 2011):

$$y(k) = \left\{ {\begin{array}{*{20}l} {\theta_{1}^{T} \overline{\varphi } (k) \quad if\;\varphi \in H_{1} } \\ \vdots \\ {\theta_{s}^{T} \overline{\varphi } (k) \quad if\;\varphi \in H_{s} } \\ \end{array} } \right.$$
(25)

where

$$\varphi (k) = \left[ {y(k - 1),\;y(k - 2),\;u(k - 1),\;u(k - 2)} \right]^{T}$$
(26)
$$\overline{\varphi } = \left[ {\begin{array}{*{20}c} {\varphi^{T} } & 1 \\ \end{array} } \right]^{T}.$$
(27)

\(\theta_{i}\) are the parameter vectors and s is the number of submodels to be determined. u(k) is a random input in the range of [−2, 2].

For the DBSCAN based method, the choice of the synthesis parameters \(n_{\rho }\), MinPts and \(\varepsilon\) is as follows:

$$\left\{ {\begin{array}{*{20}l} {n_{\rho } = 20;} \\ {MinPts = 35;} \\ {\varepsilon = 0.85} \\ \end{array} } \right.$$

For the Chiu clustering algorithm, we have only one synthesis parameter: \(n_{\rho } = 17\).

The number of submodels s depends on the initial parameters chosen. With the parameters described above, we obtain s = 6.

The parameter vectors are presented in Tables 3 and 4.

Table 3 Estimated parameter vectors with the Chiu’s method
Table 4 Estimated parameter vectors with the DBSCAN method

Figures 3 and 4 illustrate the outputs and the relative error signals of the two proposed methods.

Fig. 3
figure 3

The estimated outputs a with Chiu, and b with DBSCAN

Fig. 4
figure 4

The relative error a with Chiu, and b with DBSCAN

In Table 5, the FIT is computed for the identification and the validation with the two proposed methods. The obtained results are very satisfactory and show that the performances of the two methods are close.

Table 5 Quality measures with the two proposed methods

5.4 Identification Results of a pH Neutralization Process

5.4.1 Process Description

The term ‘neutralization’ describes the reaction between an acid and a base, in which the characteristic properties of the \({\text{H}}^{ + }\) and \({\text{OH}}^{ - }\) ions are destroyed or neutralized. In fact, the \({\text{H}}^{ + }\) and \({\text{OH}}^{ - }\) ions combine to form water molecules \({\text{H}}_{2} {\text{O}}\). The solution produced by the reaction is composed of a salt and water. The general formula of acid–base neutralization reactions can be written as:

$$acid + base \to salt + water$$
(28)

The pH neutralization process (see Fig. 5) essentially consists of a treatment tank of cross-sectional area A, a mixer, acid and base injection pipes, a pH probe, a level sensor measuring the level h in the tank, and a discharge valve (Henson and Seborg 1994; Salehi et al. 2009). An acid stream \(q_{1}\), a buffer stream \(q_{2}\) and a base stream \(q_{3}\) are mixed in the tank. The effluent stream \(q_{4}\) exits the tank via the discharge valve with an adjusted pH value \(pH_{m}\). The streams \(\left\{ {q_{i} } \right\}_{i = 1}^{4}\) are characterized by the following parameters:

Fig. 5
figure 5

A pH neutralization process

  • \(\left\{ {W_{ai} } \right\}_{i = 1}^{4}\) are the charge related quantities for \(\left\{ {q_{i} } \right\}_{i = 1}^{4}\).

  • \(\left\{ {W_{bi} } \right\}_{i = 1}^{4}\) are the mass balance quantities for \(\left\{ {q_{i} } \right\}_{i = 1}^{4}\).

The pH probe introduces a delay time \(\tau\) in the measured value \(pH_{m}\), such that \(pH_{m} = pH(t - \tau )\).

The objective of the pH neutralization process is to control the pH value of the effluent by manipulating the base flow rate \(q_{3}\), while considering the acid flow rate \(q_{1}\) and the buffer flow rate \(q_{2}\) as disturbances.

The dynamic model of the neutralization process is developed as follows:

  • The pH value of the obtained solution is derived from the conservation equations and equilibrium reactions as follows:

    $$W_{a4} + \frac{{K_{w} }}{{\left[ {H^{ + } } \right]}} + W_{b4} \frac{{\frac{{K_{a1} }}{{\left[ {H^{ + } } \right]}} + \frac{{2K_{a1} K_{a2} }}{{\left[ {H^{ + } } \right]^{2} }}}}{{1 + \frac{{K_{a1} }}{{\left[ {H^{ + } } \right]}} + \frac{{K_{a1} K_{a2} }}{{\left[ {H^{ + } } \right]^{2} }}}} - \left[ {H^{ + } } \right] = 0.$$
    (29)

    Knowing that

    $$pH_{m} = - \log \left( {\left[ {H^{ + } } \right]} \right)$$
    (30)
    $$K_{w} = \left[ {H^{ + } } \right]\left[ {OH^{ - } } \right],$$
    (31)

    Equation (29) can then be rewritten as:

    $$W_{a4} + 10^{{pH_{m} - 14}} + W_{b4} \frac{{1 + 2\left( {10^{{pH_{m} - pK_{a2} }} } \right)}}{{1 + 10^{{pH_{m} - pK_{a1} }} + 10^{{pH_{m} - pK_{a2} }} }} - 10^{{ - pH_{m} }} = 0$$
    (32)
  • The mass balance yields:

    $$A\frac{dh}{dt} = q_{1} + q_{2} + q_{3} - q_{4}$$
    (33)

    Taking into account that the exit flow rate \(q_{4} = C_{v}.h^{0.5}\), Eq. (33) becomes:

    $$A\frac{dh}{dt} = q_{1} + q_{2} + q_{3} - C_{v} \cdot h^{0.5}$$
    (34)

    where \(C_{v}\) is the constant valve coefficient.

  • The differential equations of the effluent reaction invariants \((W_{a4},W_{b4} )\) can be determined as follows:

    $$A{\kern 1pt} {\kern 1pt} h\frac{{dW_{a4} }}{dt} = q_{1} (W_{a1} - W_{a4} ) + q_{2} (W_{a2} - W_{a4} ) + q_{3} (W_{a3} - W_{a4} )$$
    (35)
    $$A{\kern 1pt} {\kern 1pt} h\frac{{dW_{b4} }}{dt} = q_{1} (W_{b1} - W_{b4} ) + q_{2} (W_{b2} - W_{b4} ) + q_{3} (W_{b3} - W_{b4} )$$
    (36)
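A possible way to simulate this model numerically is to integrate (34)-(36) with an explicit Euler scheme and to solve the implicit relation (32) for \(pH_m\) by bisection, since the left-hand side of (32) is monotone in the pH. The sketch below follows this route; all numerical constants are illustrative placeholders, not the nominal values of Table 6.

```python
import math

PKA1, PKA2 = 6.35, 10.25   # assumed dissociation exponents pK_a1, pK_a2
A, CV = 207.0, 8.75        # tank area and valve coefficient (assumed values)

def ph_from_invariants(wa4, wb4):
    """Solve the implicit titration relation (32) for pH_m by bisection."""
    def f(ph):
        return (wa4 + 10**(ph - 14)
                + wb4 * (1 + 2 * 10**(ph - PKA2))
                / (1 + 10**(ph - PKA1) + 10**(ph - PKA2))
                - 10**(-ph))
    lo, hi = 0.0, 14.0
    for _ in range(60):               # f is increasing in pH on [0, 14]
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def step(state, q, w_in, dt=1.0):
    """One Euler step of (34)-(36).
    state = (h, W_a4, W_b4); q = (q1, q2, q3); w_in = ((W_ai, W_bi)) per stream."""
    h, wa4, wb4 = state
    q4 = CV * math.sqrt(h)            # exit flow rate q4 = Cv * h^0.5
    dh = (sum(q) - q4) / A            # level dynamics, cf. (34)
    dwa = sum(qi * (wa - wa4) for qi, (wa, _) in zip(q, w_in)) / (A * h)
    dwb = sum(qi * (wb - wb4) for qi, (_, wb) in zip(q, w_in)) / (A * h)
    return (h + dt * dh, wa4 + dt * dwa, wb4 + dt * dwb)
```

As a sanity check, with \(W_{a4} = W_{b4} = 0\) relation (32) reduces to \(10^{pH_m - 14} = 10^{-pH_m}\), whose solution is a neutral pH of 7.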

Nominal model parameters and operating conditions (Xiao et al. 2014) are given in Table 6.

Table 6 Operation parameters of the pH neutralization process

The static nonlinearity of this process can be represented by the titration curve shown in Fig. 6 with a beginning pH of 2.7 and an ending pH of 10.7. A brief glance at the curve indicates that the process of pH neutralization is highly nonlinear.

Fig. 6
figure 6

The titration curve

5.4.2 Structure Identification

It was mentioned that early approaches to the identification of the pH neutralization process approximate it around an operating point as a First Order Plus Delay Time model. In addition, the evolution of the pH in Fig. 7, for fixed values of the input \(q_{3}\), is similar to a first-order system response.

Fig. 7
figure 7

The pH evolution with different values of q 3

Therefore, we propose to represent the sub-models by discrete first-order plus dead-time models (\(n_{a} = 1\), \(n_{b} = 2\)) defined by:

$$y(k) = \left\{ {\begin{array}{*{20}l} {a_{1,1} y(k - 1) + b_{1,1} u(k - 1) + b_{1,2} u(k - 2)} \hfill \\ {\quad if\; \varphi (k) \in H_{1} } \hfill \\ \vdots \hfill \\ {a_{s,1} y(k - 1) + b_{s,1} u(k - 1) + b_{s,2} u(k - 2)} \hfill \\ {\quad if\; \varphi (k) \in H_{s} } \hfill \\ \end{array} } \right.$$
(37)

where the regressor vector is defined by:

$$\varphi (k) = \left[ {y(k - 1),\;u(k - 1),\;u(k - 2)} \right]^{T}$$

and the parameter vectors are denoted by:

$$\theta_{i} (k) = \left[ {a_{i,1},b_{i,1},b_{i,2} } \right],\;i = 1, \ldots,s.$$

5.4.3 Input Design

Input design is an important aspect to be considered when carrying out nonlinear system identification experiments. Two main properties must be satisfied by the input in order to generate representative data for identification. First, the input must excite the full range of dynamics present in the system. Second, since these models have nonlinear gains, the input signal must illustrate the response of the system to a range of amplitude changes. For these reasons, we consider a multi-sine sequence as the input for identifying the pH neutralization process, since it satisfies both conditions: it contains several frequencies and exhibits different amplitude changes. The dynamics of this input are chosen according to the dominant time constant range of the process. The amplitudes are selected to cover the whole operating region around the nominal value of the base flow rate \(q_{3} = 15.6\,{\text{ml/s}}\).
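A multi-sine input of this kind can be generated as follows. The frequencies and amplitudes below are illustrative choices, not those used in the experiments:

```python
import numpy as np

def multisine(n, q_nom=15.6, freqs=(0.002, 0.005, 0.011),
              amps=(2.0, 1.5, 1.0)):
    """Sum of sinusoids around the nominal base flow rate q3 = 15.6 ml/s.
    freqs are normalized frequencies (cycles/sample); amps set the
    amplitude changes covering the operating region (assumed values)."""
    k = np.arange(n)
    return q_nom + sum(a * np.sin(2 * np.pi * f * k)
                       for a, f in zip(amps, freqs))

u = multisine(2000)
```

Choosing frequencies that complete an integer number of cycles over the record length keeps the sequence mean at the nominal operating point, so the excitation explores the operating region symmetrically around \(q_3\).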

5.4.4 Results

The nonlinear model of the pH process defined by Eqs. (32), (34), (35) and (36) and the parameters of Table 6 is used to generate the output under a multi-sine excitation sequence. The system output is corrupted by Gaussian white noise with zero mean and standard deviation \(\sigma = 0.001\) in order to simulate industrial situations where measurements are often noisy. The obtained input-output data, illustrated in Fig. 8, are then divided into two parts: the first is used for identification and the second for validation.

Fig. 8
figure 8

The data of the multi-sine input

The number of neighbors is chosen as \(n_{\rho } = 85\) for both methods. The DBSCAN approach uses the following synthesis parameters:

$$\left\{ {\begin{array}{*{20}l} {MinPts = 40;} \\ {\varepsilon = 0.18} \\ \end{array} } \right.$$

With these parameters, the number of sub-models obtained is s = 6. The parameter vectors are given in Table 7.
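For concreteness, the DBSCAN step can be sketched as below. This is a minimal self-contained implementation of the standard algorithm, not the chapter's code; in practice a library implementation would be used, and the clustered points here are synthetic:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: labels -1 marks noise (outliers),
    labels 0..k-1 mark clusters. X has shape (N, d)."""
    n = len(X)
    labels = np.full(n, -1)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dist[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neighbors])
    visited = np.zeros(n, dtype=bool)
    cluster = 0
    for i in range(n):
        if visited[i] or not core[i]:
            continue
        # grow a new cluster from the unvisited core point i
        stack = [i]
        visited[i] = True
        labels[i] = cluster
        while stack:
            j = stack.pop()
            for q in neighbors[j]:
                if labels[q] == -1:
                    labels[q] = cluster
                if not visited[q]:
                    visited[q] = True
                    if core[q]:          # only core points propagate the cluster
                        stack.append(q)
        cluster += 1
    return labels

# Synthetic illustration: two dense groups plus one isolated outlier.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.05, (30, 2)),
               rng.normal(3, 0.05, (30, 2)),
               [[10.0, 10.0]]])
labels = dbscan(X, eps=0.3, min_pts=5)   # outlier ends up labeled -1
```

The synthesis parameters play the same role as above: `eps` corresponds to \(\varepsilon\) and `min_pts` to MinPts; isolated parameter vectors are rejected as noise rather than forced into a sub-model.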

Table 7 Estimated parameter vectors

The validation results and the estimated titration curves are presented in Figs. 9 and 10, respectively, which show that the obtained model reproduces well both the dynamics and the nonlinear gain of the pH process.

Fig. 9
figure 9

The validation outputs a with Chiu, and b with DBSCAN

Fig. 10
figure 10

The validation of the titration curve a with Chiu, and b with DBSCAN

We now compare the performance of the two proposed methods using the quality measures (19), (20) and (21). The obtained results are summarized in Table 8 and Fig. 11.
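Since the measures (19)-(21) are defined earlier in the chapter and not reproduced here, the sketch below shows two generic measures of this kind, a FIT percentage and a pointwise relative error, as commonly used for validating identified models; they are stand-ins, not necessarily the chapter's exact definitions:

```python
import numpy as np

def fit_percent(y, y_hat):
    """Classical FIT measure: 100% means a perfect match between the
    measured output y and the model output y_hat."""
    return 100.0 * (1.0 - np.linalg.norm(y - y_hat)
                    / np.linalg.norm(y - np.mean(y)))

def relative_error(y, y_hat, tol=1e-12):
    """Pointwise relative error |y - y_hat| / |y| (guarded near zero)."""
    return np.abs(y - y_hat) / np.maximum(np.abs(y), tol)
```

Such measures are computed on the validation part of the data set, so that they reflect generalization rather than fit to the identification data.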

Table 8 Quality measures with the two proposed methods
Fig. 11
figure 11

Relative error for the two proposed methods a with Chiu, and b with DBSCAN

6 Experimental Example: A Semi-batch Reactor

6.1 Process Description

The olive oil esterification reactor produces an ester with a very high added value, used in the fine chemical industry for products such as cosmetics. The esterification reaction between vegetable olive oil containing free fatty acid and an alcohol, producing ester, is given by the following equation:

$$Acid + Alcohol \leftrightarrow Ester + Water.$$
(38)

The ratio of alcohol to acid is the main factor of this reaction because esterification is an equilibrium reaction, i.e. the reaction products, water and ester, are formed until equilibrium is reached. In addition, the yield of ester may be increased if water is removed from the reaction. The removal of water is achieved by vaporisation while avoiding boiling of the alcohol. In fact, we have used an alcohol (1-butanol) with a boiling temperature of 118 °C, which is greater than that of water (close to 100 °C). In addition, the boiling temperatures of the fatty acid (oleic acid) and the ester are close to 300 °C. Therefore, water can be boiled off at a temperature slightly above 100 °C.

The block diagram of the process is shown in Fig. 12. It essentially consists of:

Fig. 12
figure 12

Block diagram of the reactor

  • A reactor with a double jacket: it has a cylindrical shape and is manufactured in stainless steel. It is equipped with a bottom valve for emptying the product, an agitator, an orifice for introducing the reactants, a sensor for the reaction mixture temperature, a pressure sensor and an orifice for the condenser. The double jacket ensures the circulation of a coolant fluid intended for heating or cooling the reactor.

  • A heat exchanger: it heats or cools the coolant fluid circulating through the reactor jacket. Heating is carried out by three electrical resistances controlled by a dimmer that varies the heating power; it is used to reach the reaction temperature required for the esterification. Cooling is provided by circulating cold water through the heat exchanger; it is used to cool the reactor when the reaction is complete.

  • A condenser: it condenses the steam generated during the reaction. It plays an important role because it also indicates the end of the reaction, which can be deduced when no more water drips out of the condenser.

  • A data acquisition card between the reactor and the computer.

The ester production by this reactor is based on three main steps as illustrated in Fig. 13.

Fig. 13
figure 13

Specific trajectory of the reactor temperature

6.2 Experimental Results

Considering a PWA map is attractive here because the characteristic of the system can be regarded as piecewise linear over its operating phases: the heating phase, the reacting phase and the cooling phase.

Previous work has shown that adequate estimated orders n a and n b of each sub-model are both equal to two (Talmoudi et al. 2008). Thus, we can adopt the following structure:

$$y(k) = \begin{cases} - a_{1,1} y(k - 1) - a_{1,2} y(k - 2) + b_{1,1} u(k - 1) + b_{1,2} u(k - 2) & \text{if}\;\varphi (k) \in H_{1} \\ \quad\vdots & \\ - a_{s,1} y(k - 1) - a_{s,2} y(k - 2) + b_{s,1} u(k - 1) + b_{s,2} u(k - 2) & \text{if}\;\varphi (k) \in H_{s} \end{cases}$$
(39)

where the regressor vector is defined by:

$$\varphi (k) = \left[ { - y(k - 1),\; - y(k - 2),\;u(k - 1),\;u(k - 2)} \right]^{T}$$

and the parameter vectors are denoted by:

$$\theta_{i} (k) = \left[ {a_{i,1},\;a_{i,2},\;b_{i,1},\;b_{i,2} } \right],\quad i = 1, \ldots,s.$$
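Once the data points have been assigned to clusters, each \(\theta_i\) follows from a least-squares fit on the regressors of that cluster, since \(y(k) = \varphi(k)^{T}\theta_i\) for \(\varphi(k)\in H_i\). The sketch below assumes the classification step has already produced integer labels (with -1 for rejected outliers); the synthetic data in the test are illustrative only:

```python
import numpy as np

def estimate_submodels(Phi, y, labels):
    """Least-squares estimate of theta_i = [a_{i,1}, a_{i,2}, b_{i,1}, b_{i,2}]
    per cluster. Phi stacks the regressors phi(k)^T as rows, so that
    y(k) = phi(k)^T theta_i for the points of cluster i."""
    thetas = {}
    for i in sorted(set(labels.tolist())):
        if i == -1:                   # points flagged as outliers are skipped
            continue
        mask = labels == i
        thetas[i], *_ = np.linalg.lstsq(Phi[mask], y[mask], rcond=None)
    return thetas
```

With the regressor \(\varphi(k) = [-y(k-1), -y(k-2), u(k-1), u(k-2)]^{T}\), the minus signs of the autoregressive terms are absorbed into the regressor, so the estimated vector is directly \([a_{i,1}, a_{i,2}, b_{i,1}, b_{i,2}]\).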

We have collected input-output measurements from the reactor in order to identify a model of this process. We have taken two measurement files: one of length N = 220 for identification and another of length N = 160 for validation.

The measurement file used in this identification is presented in Fig. 14.

Fig. 14
figure 14

The real input-output evolution

We apply the proposed identification procedures in order to represent the reactor by a PWARX model. The number of neighbors is chosen as \(n_{\rho } = 70\) for the two proposed techniques. Our purpose is to estimate the number of sub-models s, the parameter vectors \(\theta_{i} (k),\;i = 1, \ldots,s\) and the hyperplanes defining the partitions \(\left\{ {H_{i} } \right\}_{i = 1}^{s}\).

The obtained results are as follows:

  • The number of sub-models is s = 3.

  • The parameter vectors \(\theta_{i} (k)\), \(i = 1, 2, 3\), are given in Table 9.

    Table 9 Estimated parameter vectors with the proposed clustering techniques

The attribution of every parameter vector to the sub-model that generated it is ensured by the SVM algorithm. The obtained outputs are then computed and represented in Fig. 15.
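The separating step can be illustrated with a minimal linear SVM trained by sub-gradient descent on the hinge loss. This is a two-class sketch on synthetic data (the three-region case is handled one-vs-rest), not the chapter's implementation, for which a library SVM would normally be used:

```python
import numpy as np

def linear_svm(X, y, lam=0.01, epochs=300, lr=0.1, seed=0):
    """Minimal linear SVM: sub-gradient descent on the regularized hinge
    loss. Labels y must be in {-1, +1}. Returns the hyperplane (w, b)
    such that w^T x + b = 0 separates the two regions."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            margin = y[i] * (X[i] @ w + b)
            if margin < 1:                       # point violates the margin
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                # only shrink (regularization)
                w = (1 - lr * lam) * w
    return w, b

# Synthetic illustration: two well-separated groups of regressors.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-2, 0.3, (20, 2)), rng.normal(2, 0.3, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)
w, b = linear_svm(X, y)
```

The estimated hyperplanes \(\{H_i\}\) are then used online: the regressor \(\varphi(k)\) is classified to pick the active sub-model.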

Fig. 15
figure 15

Estimated outputs with two methods a with Chiu, and b with DBSCAN

To validate the obtained models, we consider a new input-output measurement file of length N = 160, shown in Fig. 16.

Fig. 16
figure 16

The validation input

The real and the estimated validation outputs and the errors are presented in Fig. 17.

Fig. 17
figure 17

Estimated validation outputs and the errors with two methods a with Chiu, and b with DBSCAN

7 Conclusion

In this chapter, we have considered only clustering-based procedures for the identification of PWARX systems. We focused on the most challenging step, namely the classification of the data points. In fact, we have proposed the use of two clustering techniques: Chiu's clustering algorithm and the DBSCAN algorithm. These algorithms present several advantages. First, they do not require any initialization, so the problem of convergence towards local minima is avoided. Second, they are able to remove outliers from the data set. Finally, our approaches automatically generate the number of sub-models. Numerical simulation results are presented to demonstrate the performance of the proposed approaches and to compare them with the k-means approach. An experimental validation with an olive oil reactor is also presented to illustrate the efficiency of the developed methods.