1 Introduction

Model predictive control (MPC) refers to a class of control algorithms that employ an explicit dynamic model of the controlled process to predict its future outputs and determine the control actions through optimization [1]. MPC has been one of the most successful control techniques for industrial processes with multivariable coupling, constraints, and time delays [2, 3]. With linear MPC now mature, nonlinear MPC has become the focus of current research, as it can deliver good control performance for highly nonlinear processes.

Many nonlinear models have been integrated into MPC algorithms, such as first-principle models [4], neural networks [5], support vector machines (SVM) [6], and block-oriented models [7,8,9,10]. In particular, the Hammerstein model, which consists of a static nonlinear block followed by a linear dynamic block, has been demonstrated to adequately approximate a large number of processes, such as a binary distillation column [7], a pH neutralization process [11], a continuous stirred tank reactor [12], a solid oxide fuel cell [13], a stretch reflex process [14], a turntable servo system [15], and so on. Compared with other nonlinear models, the Hammerstein model has the advantages of a simple structure and easy identification [7, 8], and is therefore considered here.

In the most typical case, a polynomial is used as the static part of the Hammerstein model. A significant drawback of this choice is that accurately approximating complex nonlinearities is likely to require a very high polynomial order [16], which complicates identification and causes oscillatory interpolation. From the viewpoint of nonlinear approximation, neural networks are excellent approximators [5]. Unfortunately, training a neural network is not easy: nonlinear optimization problems have to be solved to compute the network parameters, and a number of networks must be trained and compared to obtain a good model. Support vector machines (SVM) have also been employed as the static part of a Hammerstein model [17], but they have the disadvantage of lacking sparseness, i.e., they include a large number of support vectors and corresponding parameters. For these reasons, it is necessary to find a suitable nonlinear approximator for Hammerstein models. A piecewise linear (PWL) function is an interesting alternative. Based on the so-called simplicial partition method, the canonical PWL representation is studied in [18, 19]. The parameters of canonical PWL functions can be identified by solving a least-squares problem; they can be easily calculated with the existing PWL toolbox [20]. It has been proven that PWL functions can uniformly approximate any Lipschitz-continuous function defined on a compact domain. Moreover, PWL functions suffer from neither oscillatory interpolation nor lack of sparseness.

Several MPC algorithms have been developed for dynamic systems represented by Hammerstein models. The main focus of these algorithms is how to deal with the static nonlinearity when optimizing control laws. A pioneering study can be found in [21], where the Hammerstein model is directly integrated into the MPC controller and the corresponding control law is calculated by solving a complex non-convex nonlinear optimization problem. By transforming the nonlinearity into a polytopic description, robust MPC (RMPC) has been designed for Hammerstein models [22, 23]. This algorithm requires only the solution of a quadratic programming (QP) problem with linear matrix inequality (LMI) constraints, but its conservativeness and online computational burden must be carefully considered. By treating input nonlinearities as unknown disturbances, linear MPC with an input disturbance model has also been studied for Hammerstein systems with unknown input nonlinearities [24]. The most commonly used control algorithms for Hammerstein models are the nonlinearity-inversion-based MPC algorithms, which consist of a linear controller followed by the inversion of the input nonlinearity [7, 9, 10]. For multivariable systems with complex nonlinearities, such algorithms have to calculate the numerical inversion of the nonlinearities online and require an extra transformation of the input constraints. Another idea is to find a linear approximation of the model or of the predicted process trajectory, which makes the formulation of a QP MPC problem possible [5, 17, 25].

This paper develops a nonlinear MPC algorithm based on PWL Hammerstein models. In contrast to the aforementioned MPC algorithms, the proposed algorithm does not require inversion of the input nonlinear block and can therefore directly incorporate input constraints without any transformation. To reduce the computational burden, at each sampling period the predicted output trajectory is first linearized along an assumed input trajectory, and the optimal control actions are then calculated by solving a QP problem. In particular, because a PWL function reduces to a linear function within each input subregion, the derivatives used in the linearization are obtained in a computationally efficient look-up-table style. The developed control algorithm can incorporate various disturbance models, which makes it possible to achieve better control performance than controllers based only on the traditional output disturbance model. This paper extends the work of [17, 25], where only output disturbance models are considered, SVM and neural networks, respectively, are used as the nonlinear part of the model, and the derivatives of the static nonlinearities are calculated online. Two benchmark systems, a continuous stirred tank reactor (CSTR) and a pH neutralization reactor, are employed to show the advantages of the proposed algorithm. The simulation results illustrate that the algorithm strikes a good balance between control performance and computational efficiency: it achieves almost the same control accuracy as the MPC algorithm with nonlinear optimization, at the cost of only a slightly higher computational burden than linear MPC.

The rest of this paper is organized as follows. Section 2 introduces the PWL Hammerstein model and the canonical representation of PWL functions. Section 3 details the development of the nonlinear MPC algorithm based on the PWL Hammerstein model. Section 4 gives the simulation results for the two benchmark systems. Section 5 concludes the paper.

2 Preliminaries

2.1 Piecewise linear Hammerstein model

The controlled dynamic system is represented by the following discrete-time, nonlinear, state-space model

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{x}_{k+1}=f(\varvec{x}_{k},\varvec{u}_{k})\\ \varvec{y}_{k}=g(\varvec{x}_{k}) \end{array} \right. \end{aligned}$$
(1)

where \(\varvec{u}_{k}\in \mathbb {R}^{n_{u}}\) is the system input vector (the vector of manipulated variables), \(\varvec{x}_{k}\in \mathbb {R}^{n_{x}}\) is the system state vector, \(\varvec{y}_{k}\in \mathbb {R}^{n_{y}}\) is the system output vector (the vector of controlled variables), k is the discrete time index, and \(f:\mathbb {R}^{n_{u}+n_{x}} \rightarrow \mathbb {R}^{n_{x}}\) and \(g:\mathbb {R}^{n_{x}} \rightarrow \mathbb {R}^{n_{y}}\) are smooth nonlinear functions.

Fig. 1

The discrete-time Hammerstein model

A Hammerstein model is employed to approximate the above nonlinear process. As shown in Fig. 1, the Hammerstein model consists of a static nonlinear block followed by a dynamic linear block. It can be described by the following equations

$$\begin{aligned} \left\{ \begin{array}{l} \varvec{v}_{k}=F(\varvec{u}_{k})\\ \varvec{x}_{k+1}=\varvec{A}\varvec{x}_{k}+\varvec{B}\varvec{v}_{k}\\ \varvec{y}_{k}=\varvec{C}\varvec{x}_{k} \end{array} \right. \end{aligned}$$
(2)

where \(\varvec{v}_{k}\in \mathbb {R}^{n_{v}}\) is an intermediate variable vector between model blocks, \(F:\mathbb {R}^{n_{u}} \rightarrow \mathbb {R}^{n_{v}}\) is a static nonlinear function. The model matrices \(\varvec{A}\), \(\varvec{B}\) and \(\varvec{C}\) are of dimensionality \(n_{x} \times n_{x}\), \(n_{x} \times n_{v}\) and \(n_{y} \times n_{x}\), respectively.

In this study, \(F(\cdot )\) is represented by a PWL function. Note that, in contrast to the previous studies on Hammerstein MPC [7, 9, 10], the nonlinear function \(F(\cdot )\) is allowed to be non-invertible and to contain coupled multivariable nonlinearities. Coupled multivariable nonlinearities mean that multiple outputs of \(F(\cdot )\) are nonlinear functions of all inputs, which is often encountered in multivariable industrial processes [3]. In processes with coupled multivariable nonlinearities, transforming input constraints into constraints on the intermediate variables is not easy, which seriously limits the application of the previously proposed inversion-based Hammerstein MPC algorithms. This paper only assumes that \(F(\cdot )\) is Lipschitz continuous. Such an assumption is not restrictive because the static description of numerous technological processes, e.g., chemical reactors or distillation columns, satisfies this condition.

2.2 Canonical piecewise linear function

In its classical form, a PWL function simply consists of a number of linear sub-functions, each used only in its specific subregion [19]. Such a representation requires an excessive number of parameters and may make identification very difficult. For this reason, Chua et al. [26] proposed a canonical expression (with the minimum necessary number of parameters) for PWL functions in \(\mathbb {R}^{1}\) in the 1970s. It was not until 1999 that Julian et al. [18, 19], using the so-called simplicial partition method, developed the canonical PWL representation in \(\mathbb {R}^{n}\). The simplicial partition method is employed to divide the input domain into subregions; it has a wider range of application and is much simpler than other partition methods (e.g., the clustering method [27] and the prior-knowledge-based method [28]). The resulting canonical PWL representation can approximate nonlinear mappings sufficiently closely, and the corresponding parameters can be calculated directly and conveniently with the existing toolbox [20]. In the following, the relevant definitions and concepts of the canonical PWL function are briefly reviewed. More details can be found in [18, 19].

Definition 1

Simplex: Let \(\varvec{u}_{0}, \varvec{u}_{1}, \ldots , \varvec{u}_{n}\) be \(n\,{+}\,1\) affinely independent vectors in \(\mathbb {R}^{n}\). A simplex (polytope) \(\varvec{S}\) is defined by

$$\begin{aligned} \varvec{S}\triangleq \left\{ \varvec{u}\left| \ \varvec{u}=\sum _{i=0}^{n}\mu _{i}\varvec{u}_{i} \right. \right\} \end{aligned}$$
(3)

where \(\varvec{\mu }=[\mu _{0} \ \mu _{1} \ \ldots \ \mu _{n}]^{\mathrm {T}}\) is the simplex parameter vector, \(0\le \mu _{i}\le 1\) and \(\sum _{i=0}^{n}\mu _{i}=1\).

Consider a domain in \(\mathbb {R}^{n}\) of the form

$$\begin{aligned} \varvec{D} = \left\{ \varvec{u}\in \mathbb {R}^{n}\left| \ \underline{u}^{i}\le u^{i} \le \overline{u}^{i}, i = 1,2,\ldots ,n\right. \right\} \end{aligned}$$
(4)

where \(\underline{u}^{i}\) and \(\overline{u}^{i}\) are the lower bound and the upper bound of \(u^{i}\), respectively. The simplicial partition of \(\varvec{D}\) consists of two steps. Firstly, by specifying the number of divisions (\(m_{i}\)) associated with the \(u^i\) axis (i.e., dividing the interval \([\underline{u}^{i}, \overline{u}^{i}]\) into \(m_{i}\) divisions), the domain \(\varvec{D}\) is evenly divided into \(\prod _{i=1}^{n}m_{i}\) hypercubes

$$\begin{aligned} \begin{aligned} \varvec{H_{h}}&\triangleq [\underline{u}^{1}+h^{1}\delta _{1}, \ \underline{u}^{1}+(h^{1}+1)\delta _{1}] \\&\quad \times [\underline{u}^{2}+h^{2}\delta _{2}, \ \underline{u}^{2}+(h^{2}+1)\delta _{2}]\\&\quad \quad \vdots \\&\quad \times [\underline{u}^{n}+h^{n}\delta _{n}, \ \underline{u}^{n}+(h^{n}+1)\delta _{n}] \end{aligned}\end{aligned}$$
(5)

where \(\varvec{h}=[h^{1} \ h^{2} \ \ldots \ h^{n}]^{\mathrm {T}}\), \(h^{i}\) are integers and \(0\le h^{i} \le m_{i}-1\). In the axis \(u^i\), the grid step is \(\delta _{i} = (\overline{u}^{i} - \underline{u}^{i}) / m_{i}\). The corresponding grid step vector is formed as \(\varvec{\delta }=[\delta _{1} \ \delta _{2} \ \ldots \ \delta _{n}]^{\mathrm {T}}\). Each hypercube is characterized by \(2^{n}\) vertices. In particular, the vertex \(\varvec{u_{h}}=[\underline{u}^{1}+h^{1}\delta _{1}, \ \underline{u}^{2}+h^{2}\delta _{2}, \ \ldots , \ \underline{u}^{n}+h^{n}\delta _{n}]^{\mathrm {T}}\) can uniquely characterize \(\varvec{H_{h}}\), and thus, \(\varvec{h}\) is used as the index for \(\varvec{H_{h}}\). In the second stage of the partition procedure, each hypercube is subdivided into n! simplices \(\varvec{S}_{s}, s=1, \ldots , n!\). Each simplex is characterized by \(n+1\) vertices. All the simplices in the same hypercube \(\varvec{H_{h}}\) have the common vertex \(\varvec{u}_{\varvec{h}}\). By using the two-step simplicial partition, the domain \(\varvec{D}\) is finally divided into \(n!\times \prod _{i=1}^{n}m_{i}\) regions (denoted as \(\varvec{H_h}\varvec{S}_s\)). Partitioning of a domain in \(\mathbb {R}^{2}\) is shown in Fig. 2.

Fig. 2

The simplicial partition of a domain in \(\mathbb {R}^2\)

Based on the simplicial partition, a PWL function is represented in the following canonical style

$$\begin{aligned} F(\varvec{u})=\varvec{c}^{\mathrm {T}}\varvec{\Lambda } (\varvec{u}) \end{aligned}$$
(6)

where \(\varvec{c}\) is the parameter vector of length \(\prod _{i=1}^{n}(m_{i}+1)\) and \(\varvec{\Lambda } (\varvec{u})\) is a vector function formed by a set of generating functions (multiple nestings of absolute value functions) [18, 19].
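For intuition, in \(\mathbb {R}^{1}\) the canonical form reduces to a weighted sum of absolute-value generating functions. The sketch below is illustrative only; the function name, coefficients, and breakpoints are invented for the example and do not come from the PWL toolbox [20].

```python
import numpy as np

def pwl_canonical_1d(u, a, b, c, beta):
    """Evaluate a canonical PWL function in R^1:
    F(u) = a + b*u + sum_i c[i] * |u - beta[i]|,
    where beta[i] are the breakpoints of the partition."""
    u = np.asarray(u, dtype=float)
    return a + b * u + sum(ci * np.abs(u - bi) for ci, bi in zip(c, beta))

# A PWL function with breakpoints at u = 0 and u = 1:
y = pwl_canonical_1d(np.array([-1.0, 0.5, 2.0]), a=0.0, b=1.0,
                     c=[0.5, -0.25], beta=[0.0, 1.0])   # [-1.0, 0.625, 2.75]
```

Each \(|u-\beta _i|\) term switches slope at its breakpoint, so the number of parameters grows with the number of breakpoints rather than with the number of linear pieces, which is what makes identification a least-squares problem.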

3 Nonlinear MPC based on the PWL Hammerstein model

To achieve offset-free control, a mechanism to handle potential process disturbances and unavoidable process-model mismatch should be introduced into the designed MPC algorithm. The standard method is to augment the process state with an artificially introduced disturbance [4, 29,30,31,32,33,34]. As a result, the original process model becomes an augmented state-space model, and the resulting MPC algorithm can achieve offset-free control. It is also possible to use a non-minimal state-space model, in which the extended state consists of input increments, output increments, and output signals [35,36,37]. Because the augmented model method has the advantage of relatively low matrix dimensionality, it is used in this study. In the following, based on the PWL Hammerstein model defined by Eq. (2), the MPC algorithm is developed by following the standard flow [29], i.e., in three parts: the augmented model design, the observer design, and the controller design.

3.1 Augmented model design

Considering the PWL Hammerstein model (2), the corresponding augmented model is designed as

$$\begin{aligned} \begin{aligned}&\varvec{v}_{k}=F(\varvec{u}_{k})\\&\begin{bmatrix} \varvec{x}_{k+1} \\ \varvec{d}_{k+1} \end{bmatrix} = \underbrace{\begin{bmatrix} \varvec{A}&\varvec{B}_d \\ \varvec{0}_{n_x \times n_d}&\varvec{I}_{n_d \times n_d} \end{bmatrix}}_{\overline{\varvec{A}}} \begin{bmatrix} \varvec{x}_{k} \\ \varvec{d}_{k} \end{bmatrix} + \underbrace{\begin{bmatrix} \varvec{B} \\ \varvec{0}_{{n_d \times n_v}} \end{bmatrix}}_{\overline{\varvec{B}}}\varvec{v}_{k} +\varvec{\zeta }_{k}\\&\varvec{y}_{k}=\underbrace{\begin{bmatrix} \varvec{C}&\varvec{C}_{d} \end{bmatrix}}_{\overline{\varvec{C}}} \begin{bmatrix} \varvec{x}_{k} \\ \varvec{d}_{k} \end{bmatrix}+\varvec{\xi }_{k} \end{aligned} \end{aligned}$$
(7)

where \(\varvec{d}_{k}\in \mathbb {R}^{n_d}\) is the augmented disturbance, \(\varvec{B}_d\) and \(\varvec{C}_d\) are the disturbance model matrices of dimensionality \(n_x\times n_d\) and \(n_y\times n_d\), respectively; \(\overline{\varvec{A}}\), \(\overline{\varvec{B}}\), and \(\overline{\varvec{C}}\) are the augmented model matrices of dimensionality \((n_x+n_d)\times (n_x+n_d)\), \((n_x+n_d)\times n_v\) and \(n_y\times (n_x+n_d)\), respectively; \(\varvec{\zeta }_{k}=[\varvec{\zeta }^{\mathrm {T}}_{x,k} \ \varvec{\zeta }^{\mathrm {T}}_{d,k} ]^{\mathrm {T}}\in \mathbb {R}^{n_x+n_d}\) and \(\varvec{\xi }_{k}\in \mathbb {R}^{n_y}\) are zero-mean white noise for the augmented state and the output. The variances of \(\varvec{\zeta }_{x,k}\), \(\varvec{\zeta }_{d,k}\), and \(\varvec{\xi }_{k}\) are specified by matrices \(\varvec{Q}_x\), \(\varvec{Q}_d\), and \(\varvec{Q}_y\) of dimensionality \(n_x\times n_x\), \(n_d\times n_d\) and \(n_y\times n_y\), respectively.

The disturbance model is of critical importance for offset-free control. Although the matrices \(\varvec{B}_d\) and \(\varvec{C}_d\) can be estimated from measured data, they are generally selected by hand to reduce the modeling cost while still obtaining good control performance. The most classical existing MPC algorithms for Hammerstein models [7, 9, 10, 17, 25] use only the output disturbance model, in which \(\varvec{B}_d = \varvec{0}_{n_x \times n_d}\) and \(\varvec{C}_d = \varvec{I}_{n_y \times n_d}\) with \(n_d=n_y\). This is because MPC with such a simplified model can effectively compensate for a wide range of typical disturbances and model mismatch. However, the output disturbance model has two serious drawbacks, as detailed in [32, 33, 38]. First, it cannot yield offset-free control if the controlled plant contains integration. Second, it often leads to sluggish closed-loop performance due to its poor ability to model actual disturbance dynamics. As an alternative, the input disturbance model has been recommended in recent research on linear MPC algorithms [32, 33, 38], since it is likely to alleviate the aforementioned disadvantages. Because the nonlinear function \(F(\cdot )\) is static, the input disturbance model is implemented by choosing \(\varvec{B}_d = \varvec{B}\) and \(\varvec{C}_d = \varvec{0}_{n_y \times n_d}\) with \(n_d=n_v\). In this work, both the input and output disturbance models are considered for controller design, and one of them is finally chosen by comparing their control quality.
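The two disturbance-model choices fully determine the augmented matrices of Eq. (7), and can be assembled mechanically. The helper below is an illustrative sketch (the function name is invented); note that \(\varvec{B}_d=\varvec{B}\) makes the disturbance dimension equal to \(n_v\).

```python
import numpy as np

def augment(A, B, C, model="output"):
    """Build the augmented matrices (A_bar, B_bar, C_bar) of Eq. (7)
    for the output or input disturbance model (illustrative sketch)."""
    nx, nv = B.shape
    ny = C.shape[0]
    if model == "output":               # B_d = 0, C_d = I, n_d = n_y
        nd = ny
        Bd, Cd = np.zeros((nx, nd)), np.eye(ny, nd)
    else:                               # input: B_d = B, C_d = 0, n_d = n_v
        nd = nv
        Bd, Cd = B.copy(), np.zeros((ny, nd))
    Abar = np.block([[A, Bd], [np.zeros((nd, nx)), np.eye(nd)]])
    Bbar = np.vstack([B, np.zeros((nd, nv))])
    Cbar = np.hstack([C, Cd])
    return Abar, Bbar, Cbar
```

The integrator block \(\varvec{I}_{n_d \times n_d}\) in \(\overline{\varvec{A}}\) is what allows the estimated disturbance to accumulate to a constant bias, which is the mechanism behind offset-free tracking.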

3.2 Observer design

After designing the augmented model, a proper observer is required to estimate the state \(\varvec{x}_{k}\) and the augmented disturbance \(\varvec{d}_{k}\) from the measured process input and output signals. An Augmented Kalman Filter (AKF) [32, 33] can act as an observer to solve the estimation problem.

The AKF consists of the following two steps: model prediction and measurement correction. The first step is described by the equation

$$\begin{aligned} \begin{aligned}&\varvec{v}_{k}=F(\varvec{u}_{k})\\&\begin{bmatrix} \varvec{x}_{k|k-1} \\ \varvec{d}_{k|k-1} \end{bmatrix} = \overline{\varvec{A}}\begin{bmatrix} \varvec{x}_{k-1|k-1} \\ \varvec{d}_{k-1|k-1} \end{bmatrix} + \overline{\varvec{B}}\varvec{v}_{k-1} \end{aligned} \end{aligned}$$
(8)

whereas the second step is characterized by

$$\begin{aligned} \begin{bmatrix} \varvec{x}_{k|k} \\ \varvec{d}_{k|k} \end{bmatrix} = \begin{bmatrix} \varvec{x}_{k|k-1} \\ \varvec{d}_{k|k-1} \end{bmatrix} + \varvec{L}_\mathrm{KF}\left( \varvec{y}_{k}-\overline{\varvec{C}}\begin{bmatrix} \varvec{x}_{k|k-1} \\ \varvec{d}_{k|k-1} \end{bmatrix}\right) \end{aligned}$$
(9)

where \(\varvec{L}_\mathrm{KF}=[\varvec{L}^{\mathrm {T}}_{x} \ \varvec{L}^{\mathrm {T}}_{d}]^{\mathrm {T}}\) is the observer gain matrix of dimensionality \((n_x+n_d)\times n_y\). \(\varvec{L}_{x}\) and \(\varvec{L}_{d}\) are the state gain matrix and the augmented disturbance gain matrix of dimensionality \(n_x\times n_y\) and \(n_d\times n_y\), respectively. The gain matrix \(\varvec{L}_\mathrm{KF}\) can be computed offline by solving the following algebraic Riccati equation (ARE)

$$\begin{aligned} \varvec{L}_\mathrm{KF}=\varvec{P}\overline{\varvec{C}}^{\mathrm {T}}(\overline{\varvec{C}}\varvec{P}\overline{\varvec{C}}^{\mathrm {T}}+\varvec{Q}_y)^{-1} \end{aligned}$$
(10)

where

$$\begin{aligned} \varvec{P}=\overline{\varvec{A}}\varvec{P}\overline{\varvec{A}}^{\mathrm {T}}-\overline{\varvec{A}}\varvec{P}\overline{\varvec{C}}^{\mathrm {T}} (\overline{\varvec{C}}\varvec{P}\overline{\varvec{C}}^{\mathrm {T}}+\varvec{Q}_y)^{-1}\overline{\varvec{C}}\varvec{P}\overline{\varvec{A}}^{\mathrm {T}}+\overline{\varvec{Q}} \end{aligned}$$
(11)

where \(\overline{\varvec{Q}}=\mathrm {diag}(\varvec{Q}_x,\varvec{Q}_d)\). The variance matrices \(\varvec{Q}_x\), \(\varvec{Q}_d\) and \(\varvec{Q}_y\) are treated as adjustable observer parameters that balance the speed of offset-free control against the sensitivity to noise. Because \(F(\cdot )\) is a nonlinear function of the system input \(\varvec{u}_{k}\) only, rather than of the estimated states, the AKF (8) and (9) is essentially a linear Kalman filter, whose properties (e.g., the convergence of the observation error) have been well studied in the previous literature [32, 33] and are thus omitted here.
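As a sketch, the steady-state solution of Eq. (11) can be approximated by fixed-point iteration, after which the gain follows from Eq. (10). The helper below is illustrative only; in practice a dedicated DARE solver (e.g., scipy.linalg.solve_discrete_are) would be preferred.

```python
import numpy as np

def akf_gain(Abar, Cbar, Qbar, Qy, iters=2000):
    """Iterate the filter ARE of Eq. (11) to a fixed point and return
    the steady-state observer gain L_KF of Eq. (10) (plain sketch)."""
    P = np.eye(Abar.shape[0])
    for _ in range(iters):
        S = Cbar @ P @ Cbar.T + Qy
        K = Abar @ P @ Cbar.T @ np.linalg.inv(S)          # predictor gain
        P = Abar @ P @ Abar.T - K @ Cbar @ P @ Abar.T + Qbar
    return P @ Cbar.T @ np.linalg.inv(Cbar @ P @ Cbar.T + Qy)
```

For the scalar random-walk case (\(\overline{\varvec{A}}=\overline{\varvec{C}}=1\), \(\overline{\varvec{Q}}=\varvec{Q}_y=1\)) the iteration converges to \(P=(1+\sqrt{5})/2\) and a gain of about 0.618, which is a convenient sanity check.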

In addition to the AKF, moving horizon estimation (MHE) [39] can also be used for state estimation. Unlike the AKF, MHE can explicitly handle constraints on the states to be estimated. On the other hand, MHE generally has a higher computational burden than the AKF. If state constraints are available and the computational cost is acceptable, MHE may be used instead.

3.3 MPC optimization problem

The objective of the MPC algorithm is to minimize differences between the predefined reference trajectory and the predicted process outputs and to avoid excessive control actions. Thus, the future sequence of increments of the manipulated variables

$$\begin{aligned} \Delta \varvec{U}_{k} = \left[ \Delta \varvec{u}^{\mathrm {T}}_{k|k} \ \Delta \varvec{u}^{\mathrm {T}}_{k+1|k} \ldots \Delta \varvec{u}^{\mathrm {T}}_{k+Hc-1|k}\right] ^{\mathrm {T}} \end{aligned}$$

is determined online at each sampling instant by solving the following optimization problem with constraints

$$\begin{aligned} \begin{aligned} \min _{\Delta \varvec{U}_{k}}\ {}&\sum _{i=k+1}^{k+Hp}{\Vert \varvec{y}^{r}_{i|k}-\varvec{y}_{i|k} \Vert }_{\varvec{w}_y}^2 + \sum _{i=k}^{k+Hc-1}{\Vert \Delta \varvec{u}_{i|k} \Vert }_{\varvec{w}_{\Delta u}}^2\\ \mathrm {s.t.}\ {}&\varvec{y}_{\min }\le \varvec{y}_{i|k}\le \varvec{y}_{\max },\quad i=k+1,\ldots ,k+Hp\\&\varvec{u}_{\min }\le \varvec{u}_{i|k}\le \varvec{u}_{\max },\quad i=k,\ldots ,k+Hc-1\\&\Delta \varvec{u}_{\min }\le \Delta \varvec{u}_{i|k}\le \Delta \varvec{u}_{\max },\quad i=k,\ldots ,k+Hc-1 \end{aligned} \end{aligned}$$
(12)

where the norm is defined as \({\Vert \varvec{x} \Vert }_{\varvec{M}}^2 \triangleq \varvec{x}^{\mathrm {T}} \varvec{M} \varvec{x}\); \(\varvec{y}^{r}_{i|k}\in \mathbb {R}^{n_y}\) is the assumed reference trajectory for the sampling instant i known at the instant k; Hp is the prediction horizon; Hc is the control horizon; \(\varvec{w}_y\ge \varvec{0}_{n_y \times n_y}\) and \(\varvec{w}_{\Delta u}\ge \varvec{0}_{n_u \times n_u}\) are the weights of the minimized cost function; \(\varvec{y}_{\min }\in \mathbb {R}^{n_y}\) and \(\varvec{y}_{\max }\in \mathbb {R}^{n_y}\) are the constraints imposed on the system outputs; \(\varvec{u}_{\min }\in \mathbb {R}^{n_u}\), \(\varvec{u}_{\max }\in \mathbb {R}^{n_u}\), \(\Delta \varvec{u}_{\min }\in \mathbb {R}^{n_u}\), and \(\Delta \varvec{u}_{\max }\in \mathbb {R}^{n_u}\) are the constraints imposed on the magnitude and on the increments of the system inputs, respectively. The output constraints may lead to infeasibility problems. In such cases, so-called soft constraints can replace the original output constraints, which is a standard approach in MPC. Readers can find more details in the classical textbooks [5, 34, 35].

At each sampling instant, \(n_{u} \mathrm{Hc}\) future control increments are calculated from the MPC optimization problem (12); then only the increments for the instant k are actually applied to the process, namely \(\varvec{u}_{k}=\varvec{u}_{k-1}+{\Delta }\varvec{u}_{k|k}\). At the next sampling instant, the whole optimization procedure is repeated.

It should be noted that, because the mapping from \(\varvec{u}_{k}\) to \(\varvec{v}_{k}\) (i.e., the function \(F(\cdot )\)) is essentially nonlinear, the predicted outputs \(\varvec{y}_{i|k}\) are nonlinear functions of the control increments \(\Delta \varvec{U}_{k}\) calculated online. This means that the future control increments are computed by solving the complex nonlinear optimization problem (12) [40]. In this paper, such a control algorithm is called the MPC algorithm with nonlinear optimization (MPC-NL). Its significant online computational burden seriously limits the application of the MPC-NL algorithm.

Bearing in mind possible computational complexity of the MPC-NL algorithm, this paper develops a computationally efficient MPC algorithm based on a multistep linearization technique (MPC-ML). At each sampling instant, by using an assumed input trajectory (multistep future inputs), the PWL functions are first transformed within the control horizon into multiple linear functions. Consequently, the predicted outputs become the linear functions of the future control increments, and the nonlinear optimization problem (12) is simplified into a QP problem, so that the future control increments can be easily computed (computational burden of quadratic optimization is much lower than that of general nonlinear optimization and the global optimal solution is always found). For achieving good approximation accuracy, the above-mentioned linearization and the resulting QP optimization may be repeated several times in internal iterations at each sampling instant.
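To make the simplification concrete: once the prediction is linear, say \(\varvec{Y}_{k}=\varvec{G}\Delta \varvec{U}_{k}+\varvec{Y}^{0}_{k}\) (where \(\varvec{G}\) and \(\varvec{Y}^{0}_{k}\) denote the linearized prediction matrix and free response, introduced here purely for illustration), the cost of problem (12) becomes quadratic in \(\Delta \varvec{U}_{k}\). Ignoring the inequality constraints, the minimizer has the closed form sketched below; with active constraints, a QP solver is used instead.

```python
import numpy as np

def qp_unconstrained(G, W_y, W_du, Yref, Y0):
    """Closed-form minimizer of
    ||Yref - (G dU + Y0)||^2_{W_y} + ||dU||^2_{W_du},
    i.e., problem (12) with the inequality constraints dropped (a sketch)."""
    H = G.T @ W_y @ G + W_du          # Hessian of the QP
    g = G.T @ W_y @ (Yref - Y0)       # gradient term
    return np.linalg.solve(H, g)
```

With \(\varvec{w}_{\Delta u}>0\) the Hessian is positive definite, so the solution is unique, which is why the QP always yields the global optimum mentioned above.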

3.4 Linearization of predicted trajectory

Taking into account the augmented Hammerstein model defined by Eq. (7), the predicted trajectory \(\varvec{V}_{k}\) of the intermediate variables between the static and dynamic parts of the model (over the control horizon) is

$$\begin{aligned} \underbrace{\begin{bmatrix} \varvec{v}_{k|k} \\ \varvec{v}_{k+1|k} \\ \vdots \\ \varvec{v}_{k+Hc-1|k} \\ \end{bmatrix}}_{\varvec{V}_{k}} = \begin{bmatrix} F(\varvec{u}_{k-1}+\Delta \varvec{u}_{k|k}) \\ F(\varvec{u}_{k-1}+\Delta \varvec{u}_{k|k}+\Delta \varvec{u}_{k+1|k}) \\ \vdots \\ F\left( \varvec{u}_{k-1}+\sum _{i=0}^{Hc-1}\Delta \varvec{u}_{k+i|k}\right) \end{bmatrix}, \end{aligned}$$
(13)

Consequently, the predicted output trajectory \(\varvec{Y}_{k}\) (over the prediction horizon) is

$$\begin{aligned} \begin{aligned} \underbrace{\begin{bmatrix} \varvec{y}_{k+1|k} \\ \varvec{y}_{k+2|k} \\ \vdots \\ \varvec{y}_{k+Hc|k} \\ \vdots \\ \varvec{y}_{k+Hp|k} \\ \end{bmatrix}}_{\varvec{Y}_{k}}&= \underbrace{\begin{bmatrix} \varvec{C}\varvec{B}&0&\cdots&0 \\ \varvec{C}\varvec{A}\varvec{B}&\varvec{C}\varvec{B}&\cdots&0 \\ \vdots&\vdots&\ddots&\vdots \\ \varvec{C}\varvec{A}^{Hc-1}\varvec{B}&\varvec{C}\varvec{A}^{Hc-2}\varvec{B}&\cdots&\varvec{C}\varvec{B} \\ \vdots&\vdots&\ddots&\vdots \\ \varvec{C}\varvec{A}^{Hp-1}\varvec{B}&\varvec{C}\varvec{A}^{Hp-2}\varvec{B}&\cdots&\sum _{i=0}^{Hp-Hc}\varvec{C}\varvec{A}^i\varvec{B} \\ \end{bmatrix}}_{\varvec{\Gamma }_{v}} \varvec{V}_{k}\\&\quad + \underbrace{\begin{bmatrix} \varvec{C}\varvec{A} \\ \varvec{C}\varvec{A}^{2} \\ \vdots \\ \varvec{C}\varvec{A}^{Hc} \\ \vdots \\ \varvec{C}\varvec{A}^{Hp} \\ \end{bmatrix}}_{\varvec{\Gamma }_{x}} \varvec{x}_{k|k} + \underbrace{\begin{bmatrix} \varvec{C}_d+\varvec{C}\varvec{B}_d \\ \varvec{C}_d+\sum _{i=0}^{1}\varvec{C}\varvec{A}^i\varvec{B}_d\\ \vdots \\ \varvec{C}_d+\sum _{i=0}^{Hc-1}\varvec{C}\varvec{A}^i\varvec{B}_d \\ \vdots \\ \varvec{C}_d+\sum _{i=0}^{Hp-1}\varvec{C}\varvec{A}^i\varvec{B}_d \\ \end{bmatrix}}_{\varvec{\Gamma }_{d}} \varvec{d}_{k|k}. \end{aligned}\end{aligned}$$
(14)

From Eq. (13) it is important to notice that the trajectory \(\varvec{V}_{k}\) is a nonlinear function of the calculated future control increments, i.e., \(\Delta \varvec{u}_{k+i|k}\) for \(i=0,\ldots ,\mathrm{Hc}-1\), whereas from Eq. (14) one may notice that the output trajectory \(\varvec{Y}_{k}\) is linear in terms of the trajectory \(\varvec{V}_{k}\). If it were possible to obtain a linear representation of the \(\varvec{V}_{k}\) trajectory with respect to \(\Delta \varvec{u}_{k+i|k}\), the resulting output trajectory would become linear in terms of the calculated control increments. For this purpose, the function \(F(\cdot )\) is linearized and its linear approximation is used for prediction.
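For reference, the matrices \(\varvec{\Gamma }_{v}\), \(\varvec{\Gamma }_{x}\) and \(\varvec{\Gamma }_{d}\) of Eq. (14) can be assembled numerically as sketched below (an illustrative helper; the summed entries in the last block column of \(\varvec{\Gamma }_{v}\) and in \(\varvec{\Gamma }_{d}\) follow from holding the input constant after the control horizon).

```python
import numpy as np

def prediction_matrices(A, B, C, Bd, Cd, Hp, Hc):
    """Assemble Gamma_v, Gamma_x, Gamma_d of Eq. (14) (sketch)."""
    nx, nv = B.shape
    ny = C.shape[0]
    Apow = [np.linalg.matrix_power(A, i) for i in range(Hp + 1)]
    Gv = np.zeros((ny * Hp, nv * Hc))
    for i in range(1, Hp + 1):                 # predicted step k+i
        for j in range(1, min(i, Hc) + 1):     # input block column j
            blk = C @ Apow[i - j] @ B
            if j == Hc and i > Hc:             # input frozen after Hc
                blk = sum(C @ Apow[l] @ B for l in range(i - Hc + 1))
            Gv[(i - 1) * ny:i * ny, (j - 1) * nv:j * nv] = blk
    Gx = np.vstack([C @ Apow[i] for i in range(1, Hp + 1)])
    Gd = np.vstack([Cd + sum(C @ Apow[l] @ Bd for l in range(i))
                    for i in range(1, Hp + 1)])
    return Gv, Gx, Gd
```

All three matrices depend only on the linear block, so they are computed once offline; only \(\varvec{V}_{k}\) changes from one iteration to the next.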

The following three-step procedure is designed to linearize the function \(F(\cdot )\) for a given input vector \(\dot{\varvec{u}}=[\dot{u}^{1}, \ldots , \dot{u}^{n}]^{\mathrm {T}}\) (a point in the domain \(\varvec{D}\)).

Step 1 :

Find the hypercube containing \(\dot{\varvec{u}}\). As indicated in Eq. (5), a vector \(\varvec{h}=[h^1 \ \ldots \ h^n]^{\mathrm {T}}\) can uniquely index a hypercube \(\varvec{H_h}\). Hence, one can find the hypercube by computing the vector \(\varvec{h}\) from \(\dot{\varvec{u}}\). The specific procedure for completing Step 1 is:

(1a) Calculate \(h^i=\mathrm {floor}((\dot{u}^i-\underline{u}^{i})/\delta _i)\).

(1b) If \(h^i=m_i\), let \(h^i=m_i-1\).

(1c) Repeat steps (1a) and (1b) for \(i=1,\ldots ,n\).

In Step 1, one finds a unique hypercube \(\varvec{H_h}\) for the given vector \(\dot{\varvec{u}}\) (which defines the current linearization point), even if \(\dot{\varvec{u}}\) is located on an intersection between hypercubes or on the boundary of the domain \(\varvec{D}\).
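Step 1 amounts to an element-wise floor with clamping at the upper domain boundary; a minimal sketch (names illustrative):

```python
import numpy as np

def hypercube_index(u_dot, u_lo, delta, m):
    """Step 1: index vector h of the hypercube H_h containing u_dot,
    per the indexing of Eq. (5). Points on the upper boundary are
    clamped into the last division (step (1b))."""
    h = np.floor((np.asarray(u_dot) - u_lo) / delta).astype(int)
    return np.minimum(h, np.asarray(m) - 1)
```

For example, on the domain \([0,1]^2\) with \(m_1=m_2=4\) (so \(\delta _i=0.25\)), the point \((0.3, 1.0)\) falls into the hypercube indexed \(\varvec{h}=[1 \ 3]^{\mathrm {T}}\): the second coordinate sits on the upper boundary and is clamped.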

Step 2 :

Find the simplex that \(\dot{\varvec{u}}\) belongs to.

Lemma 1

Assume a simplex \(\varvec{H_h}\varvec{S}_s\) is characterized by the \(n+1\) vectors \(\varvec{u}_0, \ldots , \varvec{u}_n\). The necessary and sufficient condition for \(\dot{\varvec{u}} \in \varvec{H_h}\varvec{S}_s\) is that there exists a unique simplex parameter vector \(\hat{\varvec{\mu }}=[\hat{\mu }_0 \ \hat{\mu }_1 \ \ldots \ \hat{\mu }_n]^{\mathrm {T}}\) subject to the condition

$$\begin{aligned} \left\{ \begin{aligned}&\hat{\varvec{\mu }} = \underbrace{\begin{bmatrix} \varvec{u}_0&\varvec{u}_1&\cdots&\varvec{u}_n\\ 1&1&\cdots&1\\ \end{bmatrix}^{-1}}_{\varvec{S}_u^{-1}} \begin{bmatrix} \dot{\varvec{u}} \\ 1 \end{bmatrix}\\&0\le \hat{\mu }_i \le 1,\quad i=0,\ldots ,n. \end{aligned}\right. \end{aligned}$$
(15)

The proof of Lemma 1 is straightforward from the definition of a simplex (Definition 1).

By using the condition (15), one can check the simplices in the hypercube \(\varvec{H_h}\) one by one. However, the inverse of the matrix \(\varvec{S}_u\) would have to be calculated online, because \(\varvec{S}_u\) changes with the simplex containing \(\dot{\varvec{u}}\), which leads to unnecessary computing cost.

A linear transformation \(T:\mathbb {R}^n\rightarrow \mathbb {R}^{n}\) is defined as

$$\begin{aligned} \varvec{z}=T(\dot{\varvec{u}})=(\dot{\varvec{u}}-\underline{\varvec{u}}) ./\varvec{\delta }-\varvec{h} \end{aligned}$$
(16)

where './' represents element-wise division.

The transformation T "translates" the hypercube \(\varvec{H_h}\) into the unit hypercube \(\varvec{H}_\mathrm{unit}=[0,1]^n\), as illustrated in Fig. 3 for \(\mathbb {R}^2\). Obviously, the hypercube \(\varvec{H}_\mathrm{unit}\) contains the same number of simplices as \(\varvec{H_h}\). Moreover, the relative locations of the simplices within a hypercube do not change under the transformation. For example, both \(\varvec{H_h}\varvec{S}_1\) and \(\varvec{H}_\mathrm{unit}\varvec{S}_1\) are located at the bottom left of their respective hypercubes. Therefore, one can determine the simplex containing \(\dot{\varvec{u}}\) by determining which simplex in the unit hypercube \(\varvec{H}_\mathrm{unit}\) contains \(\varvec{z}\).

Fig. 3

The translation T from \(\varvec{H_h}\) to \(\varvec{H}_\mathrm{unit}\) in \(\mathbb {R}^2\)

Lemma 2

If and only if the following condition is satisfied

$$\begin{aligned} \left\{ \begin{aligned}&\hat{\varvec{\mu }} = \underbrace{\begin{bmatrix} \varvec{z}_0&\varvec{z}_1&\cdots&\varvec{z}_n\\ 1&1&\cdots&1\\ \end{bmatrix}^{-1}}_{\varvec{S}_z^{-1}} \begin{bmatrix} \varvec{z} \\ 1 \end{bmatrix}\\&0\le \hat{\mu }_i \le 1,\quad i=0,\ldots ,n \end{aligned}\right. \end{aligned}$$
(17)

where \(\varvec{z}_i = T(\varvec{u}_i)\), \(i=0,\ldots , n\) and \(\varvec{z} = T(\dot{\varvec{u}})\), then \(\dot{\varvec{u}} \in \varvec{H_h}\varvec{S}_s\).

Unlike \(\varvec{S}_u^{-1}\), the matrices \(\varvec{S}_z^{-1}\) can be calculated offline, stored, and used online in a look-up-table manner.

The specific procedure for completing Step 2 is:

  1. (2a) Compute \(\varvec{z} = T(\dot{\varvec{u}})\).

  2. (2b) For each \(\varvec{H}_\mathrm{unit}\varvec{S}_s\): if the condition (17) is satisfied, then \(\dot{\varvec{u}}\) is in \(\varvec{H}_{h}\varvec{S}_s\); otherwise, keep checking the condition (17) for the remaining simplices in \(\varvec{H}_\mathrm{unit}\) until the simplex including \(\dot{\varvec{u}}\) is found.
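Steps (2a) and (2b) can be sketched in Python. This is a minimal sketch for \(n=2\): the grid parameters (`u_low`, `delta`, `h`) and the two-simplex partition of the unit square are illustrative values, and the matrices \(\varvec{S}_z^{-1}\) are precomputed offline as described above.

```python
import numpy as np

def transform(u_dot, u_low, delta, h):
    """Eq. (16): map u_dot from the hypercube H_h onto the unit hypercube."""
    return (u_dot - u_low) / delta - h

# The two simplices that partition the unit square (n = 2, so n! = 2);
# columns are the vertices z_0, z_1, z_2 of each simplex.
simplices = [
    np.array([[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]]),   # S_1: lower-right triangle
    np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]]),   # S_2: upper-left triangle
]

# Offline: precompute and store S_z^{-1} for every simplex (condition (17)).
Sz_inv = [np.linalg.inv(np.vstack([V, np.ones(3)])) for V in simplices]

def locate_simplex(z):
    """Step (2b): index of the unit-hypercube simplex containing z."""
    for s, Si in enumerate(Sz_inv):
        mu = Si @ np.append(z, 1.0)   # barycentric coordinates of z
        if np.all(mu >= -1e-12):      # with sum(mu) = 1, mu <= 1 follows
            return s, mu
    raise ValueError("z lies outside the unit hypercube")

# Step (2a): an illustrative point in the hypercube H_h = [2, 3] x [5, 6].
u_low = np.array([0.0, 0.0])
delta = np.array([1.0, 1.0])
h = np.array([2.0, 5.0])
z = transform(np.array([2.8, 5.3]), u_low, delta, h)
s, mu = locate_simplex(z)
```

Here \(\dot{\varvec{u}}=(2.8,\,5.3)\) maps to \(\varvec{z}=(0.8,\,0.3)\), which lies in the first simplex of the unit square.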

Step 3 :

Linearize the PWL function F(\(\cdot \)). In each simplex \(\varvec{H_h}\varvec{S}_s\), the PWL function is equivalent to the following linear function

$$\begin{aligned} \begin{aligned} F(\varvec{u})&=F(\tilde{\varvec{u}}_{\varvec{h}})+ \left. \frac{\partial F}{\partial u^1}\right| _{\varvec{h},s}(u^1-\tilde{u}_{\varvec{h}}^1) +\cdots \\&\quad +\left. \frac{\partial F}{\partial u^n}\right| _{\varvec{h},s}(u^n-\tilde{u}_{\varvec{h}}^n) \\&=\varvec{\phi } \varvec{u}+\varvec{\psi } \end{aligned}\end{aligned}$$
(18)

where \(\tilde{\varvec{u}}_{\varvec{h}}=[\tilde{u}_{\varvec{h}}^1 \ \ldots \ \tilde{u}_{\varvec{h}}^n]\) is an arbitrary point in \(\varvec{H_h}\varvec{S}_s\) (typically \(\varvec{u_h}\)); \(\varvec{\phi }=\left[ \left. \frac{\partial F}{\partial u^1}\right| _{\varvec{h},s} \ \cdots \ \left. \frac{\partial F}{\partial u^n}\right| _{\varvec{h},s} \right] \); \(\varvec{\psi }=F(\tilde{\varvec{u}}_{\varvec{h}}) - \varvec{\phi }\tilde{\varvec{u}}_{\varvec{h}}\). Note that given an identified PWL function, \(\varvec{\phi }\) in the different simplices can also be calculated offline and stored for online use. The specific procedure for completing Step 3 is:

  1. (3a) Find \(\varvec{\phi }\).

  2. (3b) Calculate \(\varvec{\psi }\).
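For a hypothetical scalar PWL nonlinearity, Step 3 reduces to looking up the stored slope of the active segment and computing the offset; a minimal sketch (the breakpoints and function values below are illustrative, not an identified model):

```python
import numpy as np

# Hypothetical scalar PWL nonlinearity: breakpoints and function values.
u_grid = np.array([0.0, 1.0, 2.0, 3.0])
F_grid = np.array([0.0, 0.5, 2.0, 2.5])

# Offline: slope phi of every linear piece -- the look-up table of Step (3a).
phi_table = np.diff(F_grid) / np.diff(u_grid)

def linearize(u_dot):
    """Steps (3a)-(3b): return (phi, psi) such that F(u) = phi*u + psi
    on the segment containing u_dot."""
    s = min(int(np.searchsorted(u_grid, u_dot, side="right")) - 1,
            len(phi_table) - 1)
    phi = phi_table[s]                    # (3a): look up the stored slope
    psi = F_grid[s] - phi * u_grid[s]     # (3b): psi = F(u_h) - phi*u_h
    return phi, psi

phi, psi = linearize(1.4)   # a point inside the segment [1, 2]
```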

The above 3-step procedure can be used to linearize the function F(\(\cdot \)) at a single vector \(\dot{\varvec{u}}\). In order to linearize the trajectory \(\varvec{V}_{k}\), and finally the predicted output trajectory \(\varvec{Y}_{k}\), the linearization is performed along a trajectory of the manipulated variables spanning the whole control horizon, defined in the following way

$$\begin{aligned} \dot{\varvec{U}}^{t}_{k}=\left[ (\dot{\varvec{u}}^{t}_{k|k})^{\mathrm {T}}, ~(\dot{\varvec{u}}^{t}_{k+1|k})^{\mathrm {T}} \ \ldots \ (\dot{\varvec{u}}^{t}_{k+Hc-1|k})^{\mathrm {T}} \right] ^{\mathrm {T}} \end{aligned}$$
(19)

The trajectory of future values of the manipulated variables (19) of course depends on the current operating point of the process, i.e., it depends on the sampling instant k. Moreover, at the same sampling instant it is possible to repeat the linearization a few times; the index t indicates the internal iteration (repetition of the linearization).

By repeatedly using the 3-step procedure, the trajectory of the intermediate variables between the static and dynamic parts of the model (over the control horizon), \(\varvec{V}_{k}\), is linearized around the trajectory of the future process inputs (over the control horizon), \(\dot{\varvec{U}}^{t}_{k}\). From Eq. (18), the linearized trajectory is

$$\begin{aligned} \widetilde{\varvec{V}}^{t}_{k} = \underbrace{\begin{bmatrix} \varvec{\phi }^{t}_{k|k}&0&\cdots&0 \\ 0&\varvec{\phi }^{t}_{k+1|k}&\cdots&0 \\ \vdots&\vdots&\ddots&\vdots \\ 0&0&\cdots&\varvec{\phi }^{t}_{k+Hc-1|k}\\ \end{bmatrix}}_{\varvec{\Phi }^{t}_{k}} \varvec{U}^{t}_{k} + \underbrace{\begin{bmatrix} \varvec{\psi }^t_{k|k} \\ \varvec{\psi }^t_{k+1|k} \\ \vdots \\ \varvec{\psi }^t_{k+Hc-1|k} \end{bmatrix}}_{\varvec{\Psi }^{t}_{k}} \end{aligned}$$
(20)

Considering that \(\varvec{U}^t_{k}=\varvec{L}\Delta \varvec{U}^t_{k} +\varvec{U}_{k-1} \) where the matrix \(\varvec{L}\), of dimensionality \(n_u Hc \times n_u Hc\), and the vector \(\varvec{U}_{k-1}\), of length \(n_u Hc\), are

$$\begin{aligned} \varvec{L}=\begin{bmatrix} \varvec{I}_{n_u \times n_u}&\varvec{0}_{n_u \times n_u}&\cdots&\varvec{0}_{n_u \times n_u}\\ \varvec{I}_{n_u \times n_u}&\varvec{I}_{n_u \times n_u}&\cdots&\varvec{0}_{n_u \times n_u}\\ \vdots&\vdots&\ddots&\vdots \\ \varvec{I}_{n_u \times n_u}&\varvec{I}_{n_u \times n_u}&\cdots&\varvec{I}_{n_u \times n_u} \end{bmatrix}, \varvec{U}_{k-1}=\begin{bmatrix}\varvec{u}_{k-1}\\ \varvec{u}_{k-1} \\ \vdots \\ \varvec{u}_{k-1} \end{bmatrix} \end{aligned}$$

the linearized trajectory (20) may be compactly expressed

$$\begin{aligned} \widetilde{\varvec{V}}^{t}_{k} = \varvec{\Phi }^{t}_{k}\varvec{L} \Delta \varvec{U}^{t}_{k} + \varvec{\Phi }^{t}_{k}\varvec{U}_{k-1} +\varvec{\Psi }^{t}_{k} \end{aligned}$$
(21)

Next, taking into account Eq. (14), the linearized trajectory of output predictions is

$$\begin{aligned} \widetilde{\varvec{Y}}^{t}_{k}= \varvec{\Gamma }_v\varvec{\Phi }^{t}_{k}\varvec{L}\Delta \varvec{U}^{t}_{k} +\varvec{Y}^{0,t}_{k} \end{aligned}$$
(22)

where the so-called free trajectory (independent of the currently calculated future control sequence) is defined as

$$\begin{aligned} \varvec{Y}^{0,t}_{k}= \varvec{\Gamma }_v\varvec{\Phi }^{t}_{k}\varvec{U}_{k-1} +\varvec{\Gamma }_v\varvec{\Psi }^{t}_{k} +\varvec{\Gamma }_{x}\varvec{x}_{k|k} +\varvec{\Gamma }_{d}\varvec{d}_{k|k} \end{aligned}$$
(23)

3.5 Efficient MPC algorithm

Taking into account the linear approximation of the predicted output trajectory defined by Eq. (22), the general nonlinear MPC optimization problem (12) becomes the following QP problem

$$\begin{aligned} \begin{aligned}\underset{\Delta \varvec{U}^t_{k}}{\min }&\left\| \varvec{Y}^{r}_{k} -\varvec{\Gamma }_v\varvec{\Phi }^{t}_{k}\varvec{L}\Delta \varvec{U}^{t}_{k} -\varvec{Y}^{0,t}_{k} \right\| _{\varvec{W}_y}^{2} + \left\| \Delta \varvec{U}^t_{k} \right\| _{\varvec{W}_{\Delta u}}^{2} \\ s.t.&\varvec{Y}_l\le \varvec{\Gamma }_v\varvec{\Phi }^{t}_{k}\varvec{L}\Delta \varvec{U}^{t}_{k} +\varvec{Y}^{0,t}_{k}\le \varvec{Y}_{u} \\ &\varvec{U}_{l}\le \varvec{U}_{k-1}+\varvec{L}\Delta \varvec{U}^{t}_{k} \le \varvec{U}_{u}\\ &\Delta \varvec{U}_{l}\le \Delta \varvec{U}^{t}_{k} \le \Delta \varvec{U}_{u}\\ \end{aligned}\end{aligned}$$
(24)

where the vectors of length \(n_y Hp\) are

$$\begin{aligned}\begin{aligned} \varvec{Y}^r_{k}=\begin{bmatrix}\varvec{y}_{k+1|k}^r\\ \varvec{y}_{k+2|k}^r \\ \vdots \\ \varvec{y}_{k+Hp|k}^r \end{bmatrix}, \ \varvec{Y}_{l}=\begin{bmatrix}\varvec{y}_{\min }\\ \varvec{y}_{\min } \\ \vdots \\ \varvec{y}_{\min } \end{bmatrix}, \ \varvec{Y}_{u}=\begin{bmatrix}\varvec{y}_{\max }\\ \varvec{y}_{\max } \\ \vdots \\ \varvec{y}_{\max } \end{bmatrix} \end{aligned}\end{aligned}$$

and the vectors of length \(n_u Hc\) are

$$\begin{aligned}\begin{aligned} \varvec{U}_{l}=\begin{bmatrix}\varvec{u}_{\min }\\ \varvec{u}_{\min } \\ \vdots \\ \varvec{u}_{\min } \end{bmatrix}, \ \varvec{U}_{u}=\begin{bmatrix}\varvec{u}_{\max }\\ \varvec{u}_{\max } \\ \vdots \\ \varvec{u}_{\max } \end{bmatrix}, \ \end{aligned}\end{aligned}$$
$$\begin{aligned}\begin{aligned} \Delta \varvec{U}_{l}=\begin{bmatrix}\Delta \varvec{u}_{\min }\\ \Delta \varvec{u}_{\min } \\ \vdots \\ \Delta \varvec{u}_{\min } \end{bmatrix},\ \Delta \varvec{U}_{u}=\begin{bmatrix}\Delta \varvec{u}_{\max }\\ \Delta \varvec{u}_{\max } \\ \vdots \\ \Delta \varvec{u}_{\max } \end{bmatrix} \end{aligned}\end{aligned}$$
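Assembling the Hessian and gradient of the QP (24) is a matter of a few matrix products; a Python sketch with illustrative matrices (in the algorithm they come from (22) and (23)), where only the unconstrained minimizer is computed as a sanity check and any QP solver would additionally enforce the output, amplitude, and rate constraints:

```python
import numpy as np

def qp_matrices(A, Yr, Y0, Wy, Wdu):
    """Hessian H and gradient term f of the cost in (24), with
    A = Gamma_v Phi^t_k L:
        J = ||Yr - A dU - Y0||_Wy^2 + ||dU||_Wdu^2
          = dU^T H dU - 2 f^T dU + const.
    """
    H = A.T @ Wy @ A + Wdu
    f = A.T @ Wy @ (Yr - Y0)
    return H, f

# Illustrative problem data.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 4))
Yr, Y0 = rng.standard_normal(8), rng.standard_normal(8)
H, f = qp_matrices(A, Yr, Y0, 1000.0 * np.eye(8), np.eye(4))

# Unconstrained minimizer, solving H dU = f.
dU = np.linalg.solve(H, f)
```

Because \(\varvec{W}_{\Delta u}\) is positive definite, the Hessian is positive definite as well, so the QP (24) is strictly convex and has a unique solution.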

To sum up, the proposed MPC-ML algorithm consists of the following steps, which are repeated at each sampling instant k:

  1. The state vector \(\varvec{x}_{k|k}\) and the disturbance vector \(\varvec{d}_{k|k}\) are estimated.

  2. The vector which defines the linearization point is initialized, \(\dot{\varvec{U}}^{1}_{k}=\left[ {\varvec{u}}_{k-1}^{\mathrm {T}} \ {\varvec{u}}_{k-1}^{\mathrm {T}} \ldots {\varvec{u}}_{k-1}^{\mathrm {T}} \right] ^{\mathrm {T}}\), and the index t is set to 1.

  3. The linearized predicted output trajectory \(\widetilde{\varvec{Y}}^{t}_{k}\) is calculated, i.e., the matrices \(\varvec{\Phi }^{t}_{k}\) and \(\varvec{\Psi }^{t}_{k}\) are found.

  4. The MPC QP problem (24) is solved to find \(\Delta \varvec{U}^{t}_{k}\), and the vector \(\dot{\varvec{U}}^{t}_{k}=\varvec{U}_{k-1}+\varvec{L}\Delta \varvec{U}^{t}_{k}\) is updated.

  5. Decide whether internal iterations should be terminated. If the trajectory \(\Delta \varvec{U}^{t}_{k}\) is close to the trajectory at the previous internal iteration, i.e.,

     $$\begin{aligned} \left\| \Delta \varvec{U}^{t}_{k}-\Delta \varvec{U}^{t-1}_{k} \right\| ^{2}<TV_u \end{aligned}$$
     (25)

     or \(t>t_{\max }\), then go to Step 6 (\(TV_u\) is a threshold value to be tuned). Otherwise, update \(t:=t+1\) and go to Step 3.

  6. Apply the first \(n_u\) elements of the vector \(\Delta \varvec{U}^{t}_{k}\) (i.e., the vector \(\Delta \varvec{u}^{t}_{k|k}\)) to the process.

At the next sampling instant, the algorithm starts from Step 1 and the above six-step procedure is repeated.
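The internal-iteration loop can be sketched as a runnable toy in Python. Here `linearize_and_solve` is a hypothetical stand-in for Steps 3 and 4 (trajectory linearization plus the QP solution), and the previous increment trajectory is initialized to zero before the first iteration, a simplification of condition (25):

```python
import numpy as np

def mpc_ml_step(u_prev, linearize_and_solve, TV_u=0.1, t_max=3):
    """Internal iterations (Steps 2-6) at one sampling instant.
    `linearize_and_solve` stands in for Steps 3-4 (model-specific);
    the previous increment trajectory starts at zero."""
    U_lin = u_prev.copy()                      # Step 2: start from U_{k-1}
    dU_old = np.zeros_like(u_prev)
    for t in range(1, t_max + 1):
        dU = linearize_and_solve(U_lin)        # Steps 3-4
        U_lin = u_prev + dU                    # update linearization point
        if np.sum((dU - dU_old) ** 2) < TV_u:  # condition (25)
            break
        dU_old = dU
    return dU[:1]                              # Step 6: first n_u elements

# Toy stand-in: each "QP" pushes the trajectory halfway to a set-point of 1.
solve = lambda U_lin: 0.5 * (1.0 - U_lin)
du_applied = mpc_ml_step(np.zeros(3), solve)
```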

Remark 1

A similar method, which is called MPC with nonlinear prediction and linearization along predicted trajectory (MPC-NPLPT), has been proposed for different types of process models, including Hammerstein [17, 25], Wiener [5], Hammerstein-Wiener [41] as well as Wiener-Hammerstein cascade structures [42]. The MPC-NPLPT algorithm requires the process model to be differentiable and calculates the derivatives online, while the proposed MPC-ML algorithm linearizes the PWL function in the form of a look-up table. Moreover, the proposed MPC-ML can integrate various disturbance models, and thus can contribute to better control performance than MPC-NPLPT.

4 Simulation experiments

In this section, a single-input single-output (SISO) CSTR and a multiple-input multiple-output (MIMO) pH neutralization reactor are studied to show the advantages of the proposed MPC-ML algorithm.

First of all, the Hammerstein models are identified by the linear-nonlinear (L-N) approach [27, 43]. This approach is straightforward and ensures an accurate estimation of the static nonlinearity. In the first step of the L-N approach, the linear block of the Hammerstein model is identified using the dynamic training data (the signals \(\varvec{u}\) and \(\varvec{y}\)) and the ‘n4sid’ algorithm [44]. Then, the intermediate variable vector \(\varvec{v}\) is calculated using the process output of the steady-state training data and the static gain matrix of the linear model (\(\varvec{K}_\mathrm{lin}\)) from the equation \(\varvec{v}= \varvec{K}_\mathrm{lin}^{-1}\times \varvec{y}\). Finally, the nonlinear PWL block of the model is found from the steady-state training data (the signals \(\varvec{v}\) and \(\varvec{y}\)) using the PWL toolbox [20]. Multiple PWL Hammerstein models with different numbers of divisions are first identified, and then evaluated and selected using an independent dynamic test data set.
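The second step of the L-N approach can be sketched as follows; the static gain matrix \(\varvec{K}_\mathrm{lin}\) and the steady-state outputs below are illustrative values, not those identified for the processes studied here.

```python
import numpy as np

# Illustrative static gain matrix of the identified linear block (2x2 MIMO).
K_lin = np.array([[2.0, 0.5],
                  [0.2, 1.5]])

# Steady-state process outputs, one data point per column.
Y_ss = np.array([[1.0, 2.0, 0.5],
                 [0.5, 1.0, 1.5]])

# Second step of the L-N approach: v = K_lin^{-1} y for every data point;
# the resulting pairs (v, y) are then used to fit the static PWL block.
V_ss = np.linalg.solve(K_lin, Y_ss)
```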

Based on the identified Hammerstein models, three types of MPC controllers, i.e., linear MPC (MPC-L), MPC based on a multistep linearization technique (MPC-ML), and MPC with nonlinear optimization (MPC-NL), are designed. In these algorithms, the same Hammerstein models are used, as well as the same input and output disturbance models. For comparison, an MPC algorithm with nonlinear prediction and linearization along predicted trajectory (MPC-NPLPT) [41] is also designed, based on the differentiable semiempirical Hammerstein process models [45]. Because the input disturbance model is not applicable in MPC-NPLPT, only the output disturbance model is used in this algorithm. Finally, the resulting MPC controllers are compared to show the advantages of the proposed MPC-ML algorithm. All the simulations are carried out using MATLAB R2015b on an Intel Core i5-4590 (3.30 GHz) computer with a 64-bit Windows 7 operating system.

4.1 Example 1: an SISO CSTR

Consider an SISO continuous stirred tank process in which the irreversible, exothermic reaction \(A{\rightarrow }B\) takes place in a constant-volume reactor cooled by a single coolant stream (see Fig. 4). It can be modeled by the following continuous-time nonlinear equations [11]

$$\begin{aligned} \left\{ \begin{aligned} \dot{C}_A&=\frac{q}{V}\left( C_{A0}-C_A\right) -k_0C_Ae^{-\frac{E}{\mathrm{RT}}} \\ \dot{T}&=\frac{q}{V}\left( T_{0}-T\right) - \frac{\Delta Hk_0}{\rho C_p}C_Ae^{-\frac{E}{\mathrm{RT}}}\\&\quad +\frac{\rho _c C_{pc}}{\rho C_p V}q_c\left( 1-e^{-\frac{h_A}{\rho _c C_{pc}q_c}}\right) \left( T_{c0}-T\right) \end{aligned} \right. \end{aligned}$$
(26)

The process output variable is the concentration of A, denoted by \(C_A\). The process input variable is the coolant flow rate, denoted by \(q_c\). The variation ranges of the output and the input are \(C_A\in [0.02,0.15]\) and \(q_c\in [60,115]\), respectively. The control objective is to regulate \(C_A\) for set-point tracking by manipulating \(q_c\). The nominal model parameters are listed in Table 1. The sampling period is \(T_s=0.1\) min.
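A minimal fixed-step simulation of the model (26) can be sketched in Python. The parameter values below are illustrative ones commonly used for this benchmark CSTR; the values actually used here are those of Table 1.

```python
import math

# Illustrative benchmark parameter values for the CSTR model (26).
q, V, CA0, T0, Tc0 = 100.0, 100.0, 1.0, 350.0, 350.0
k0, E_R, dH = 7.2e10, 1.0e4, -2.0e5
rho, Cp, rho_c, Cpc, hA = 1.0e3, 1.0, 1.0e3, 1.0, 7.0e5

def cstr_rhs(CA, T, qc):
    """Right-hand side of the model (26)."""
    r = k0 * CA * math.exp(-E_R / T)
    dCA = q / V * (CA0 - CA) - r
    dT = (q / V * (T0 - T) - dH / (rho * Cp) * r
          + rho_c * Cpc / (rho * Cp * V) * qc
          * (1.0 - math.exp(-hA / (rho_c * Cpc * qc))) * (Tc0 - T))
    return dCA, dT

def rk4_step(CA, T, qc, dt):
    """One fixed-step fourth-order Runge-Kutta integration step."""
    k1 = cstr_rhs(CA, T, qc)
    k2 = cstr_rhs(CA + 0.5 * dt * k1[0], T + 0.5 * dt * k1[1], qc)
    k3 = cstr_rhs(CA + 0.5 * dt * k2[0], T + 0.5 * dt * k2[1], qc)
    k4 = cstr_rhs(CA + dt * k3[0], T + dt * k3[1], qc)
    CA += dt / 6.0 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
    T += dt / 6.0 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return CA, T

# One sampling period Ts = 0.1 min at the nominal coolant flow 103.41 L/min,
# starting near the nominal operating point.
CA, T = 0.1, 438.5
for _ in range(10):
    CA, T = rk4_step(CA, T, 103.41, 0.01)
```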

4.1.1 Modeling of the CSTR

At first, two training data sets, i.e., the dynamic training set and the steady-state one, are generated to identify the dynamic linear part and the static PWL part, respectively. The first set consists of 2000 samples around the nominal set-point, whereas the second one consists of 220 equidistant steady-state data points in the range \(q_c\in [60,115]\). Next, a dynamic test set with 1000 samples in the range \(q_c\in [60,115]\) is also generated for model evaluation and model selection. All the training data sets and the test data set are shifted by the nominal process input (103.41 L min\(^{-1}\)) and process output (0.1 mol L\(^{-1}\)): \(u=q_c-103.41\), \(y=C_A-0.1\).

The linear model is identified by the ‘n4sid’ algorithm [44]. The order of the linear model is set to 4 (\(n_x=4\)) as in [40]. Next, the static PWL model is found. It is interesting to consider the influence of the number of divisions (m) used in the static part of the model on the overall accuracy of the whole model. A total of nine PWL functions are calculated, with the numbers of divisions \(m = 2, 3, \ldots , 10\). Figure 5 depicts the steady-state data set versus the outputs of four of these PWL functions.

Fig. 4

The SISO continuous stirred tank reactor

Table 1 Nominal model parameters of CSTR
Fig. 5

Modeling of the CSTR: the training steady-state data set versus the obtained output of the PWL functions with different numbers of divisions

By combining the PWL functions and the linear model, the PWL Hammerstein models are finally formed. The mean squared errors (\(\mathrm {MSE}_\mathrm{test}\)) for the test data set are calculated to evaluate the identified Hammerstein models

$$\begin{aligned} \mathrm {MSE}_\mathrm{test}=\frac{1}{n_\mathrm{test}}\sum _{k=1}^{n_\mathrm{test}} (C_{A,k}-C_{A,k}^\mathrm{mod})^2 \end{aligned}$$
(27)

where \(n_\mathrm{test}\) is the number of samples in the test set, \(C_{A,k}\) denotes the real process output, and \(C_{A,k}^\mathrm{mod}\) is the output of the models. Table 2 shows model test errors for the linear and PWL Hammerstein models with different numbers of divisions. The outputs of the PWL Hammerstein models with different numbers of divisions are depicted in Fig. 6. Additionally, the output of the linear model of the same order of dynamics is also shown. It is easy to notice that in general the PWL Hammerstein models are much more precise than the linear one. Finally, as a compromise between model complexity and accuracy, the models with fewer than six divisions (\(m\le 5\)) are used in the MPC-ML algorithms.

Table 2 Comparison of linear and PWL Hammerstein models of CSTR
Fig. 6

Modeling of the CSTR: the linear model versus the Hammerstein models with different numbers of divisions

Fig. 7

Closed-loop responses of CSTR with the input disturbance model

Fig. 8

Closed-loop responses of CSTR with output disturbance model

Table 3 Control accuracy and computational burden for the CSTR

4.1.2 Predictive control of the CSTR

To evaluate the influence of the number of divisions (m) on control performance, the PWL Hammerstein models with 2, 3, 4, and 5 divisions are used in the MPC-ML controllers. The most accurate PWL Hammerstein model, the one with 5 divisions, is used in the MPC-NL controllers. All MPC controllers have the same parameters \(Hp=100\), \(Hc=10\), \(\varvec{w}_y=1000\) and \(\varvec{w}_{\Delta u}=1\). Additional parameters of the MPC-ML controller are: \(TV_u=0.1\), \(t_\mathrm{max}=3\). Constraints on the manipulated variables are: 60 L min\(^{-1}\) \(\le q_c \le \) 115 L min\(^{-1}\).

Figures 7 and 8 show the closed-loop responses of CSTR for a chosen trajectory of set-point, with input and output disturbance models, respectively. In order to compare the MPC algorithms, the following integral square error (\(\mathrm {ISE}\)) criterion is defined

$$\begin{aligned} \mathrm {ISE}=\sum _{k=1}^{400}(C_{A,k}^r-C_{A,k})^2 \end{aligned}$$
(28)

where \(C_{A,k}^r\) represents the set-points. Table 3 compares the MPC controllers in terms of the ISE indicator and the scaled computation time (in relation to the most computationally demanding MPC-NL scheme with the output disturbance model, whose computation time is 191.53s).

It can be seen that the MPC-ML controllers, the MPC-NPLPT controller (only applicable with the output disturbance model), and the MPC-NL controllers have similar closed-loop performance, all of which perform better than the MPC-L controllers. Because the identified linear model is only effective around the nominal working point, the MPC-L controllers give unsatisfactory sluggish tracking responses when the set-point is far away from the nominal working point (\(C_A = 0.1\)). By comparison, the nonlinear PWL Hammerstein model is effective in the entire working range. Besides, for this CSTR process, the MPC-ML controllers based on more divisions (larger m) achieve higher control accuracy than those with fewer divisions (smaller m). Meanwhile, the computation time of the MPC-ML controllers is far less than that of the MPC-NPLPT controller and the MPC-NL controllers, and is just slightly longer than that of the MPC-L controllers. Therefore, compared with the other MPC controllers, the proposed MPC-ML can achieve a better balance between control performance and computational efficiency.

Moreover, the simulation results show that all the controllers based on the input disturbance model perform better than their counterparts based on the output disturbance model. This illustrates the necessity of trying the input disturbance model and demonstrates another advantage of MPC-ML over the traditional MPC-NPLPT (MPC-ML has the ability to integrate various disturbance models).

Based on the resulting control performance, the MPC-ML based on the input disturbance model and five divisions is recommended for the CSTR.

4.2 Example 2: a pH neutralization reactor

The pH neutralization reactor is shown in Fig. 9, in which the neutralization reaction takes place among acid (HNO\(_3\)), buffer (NaHCO\(_3\)), and base (NaOH). This process can be described by the following three continuous-time nonlinear differential equations [25, 46]

$$\begin{aligned} \left\{ \begin{aligned} \dot{W}_{a_4}&= \frac{1}{Ah}[ (W_{a_1}-W_{a_4})q_1 + (W_{a_2}-W_{a_4})q_2\\&\qquad \qquad + (W_{a_3}-W_{a_4})q_3] \\ \dot{W}_{b_4}&= \frac{1}{Ah}[ (W_{b_1}-W_{b_4})q_1 + (W_{b_2}-W_{b_4})q_2\\&\qquad \qquad + (W_{b_3}-W_{b_4})q_3]\\ \dot{h}&= \frac{1}{A} (q_1+q_2+q_3-c_vh^{0.5})\\ \end{aligned} \right. \end{aligned}$$
(29)

and the nonlinear algebraic output equation

$$\begin{aligned} \begin{aligned} W_{a_4}&+10^{\mathrm {pH}-14}-10^{-\mathrm {pH}}\\&+W_{b_4}\times \frac{1+2\times 10^{\mathrm {pH}-\mathrm {p}K_2}}{1+10^{\mathrm {p}K_1-\mathrm {pH}}+10^{\mathrm {pH}-\mathrm {p}K_2}} =0 \end{aligned}\end{aligned}$$
(30)

where \(q_i\), \(W_{a_i}\), and \(W_{b_i}\) are the flows, the charge balance factors, and the material balance factors, respectively; p\(K_1\) and p\(K_2\) are constant chemical coefficients. The control objective is to regulate the pH value of the effluent stream to predefined values and keep the liquid level h constant at the nominal value, i.e., \(\varvec{y}=[\mathrm {pH} \ h]^{\mathrm {T}}\). The manipulated variables are the acid flow \(q_1\) and the base flow \(q_3\), i.e., \(\varvec{u}=[q_1 \ q_3]^{\mathrm {T}}\). The operating range of the manipulated variables is: \(q_1 \in [0,30]\) and \(q_3 \in [0,30]\). The nominal parameters are listed in Table 4. The sampling period is \(T_s=10\)s.
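Given \(W_{a_4}\) and \(W_{b_4}\), the output equation (30) must be solved numerically for pH; since its left-hand side is strictly increasing in pH (for \(W_{b_4}\ge 0\)), simple bisection suffices. The chemical coefficients and nominal values below are illustrative; the values actually used here are those of Table 4.

```python
# Illustrative chemical coefficients pK1, pK2.
pK1, pK2 = 6.35, 10.25

def charge_balance(pH, Wa4, Wb4):
    """Left-hand side of the output equation (30)."""
    return (Wa4 + 10.0 ** (pH - 14.0) - 10.0 ** (-pH)
            + Wb4 * (1.0 + 2.0 * 10.0 ** (pH - pK2))
            / (1.0 + 10.0 ** (pK1 - pH) + 10.0 ** (pH - pK2)))

def solve_pH(Wa4, Wb4, lo=1.0, hi=13.0, tol=1e-10):
    """Bisection on (30); for Wb4 >= 0 the left-hand side is strictly
    increasing in pH, so the root is unique on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if charge_balance(mid, Wa4, Wb4) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

pH = solve_pH(Wa4=-4.32e-4, Wb4=5.28e-4)   # illustrative nominal values
```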

Fig. 9

The pH neutralization reactor

Table 4 Nominal model parameters of pH reactor
Fig. 10

Modeling of the pH reactor: the training steady-state data \(v_1\) versus the obtained outputs of the PWL functions with different numbers of divisions

Fig. 11

Modeling of the pH reactor: the training steady-state data \(v_2\) versus the obtained outputs of the PWL functions with different numbers of divisions

4.2.1 Modeling of the pH reactor

To model the pH reactor, procedures similar to those in Example 1 are performed. Two training sets, i.e., the dynamic training set and the steady-state one, are generated for the identification. The first set consists of 2000 samples around the nominal set-point, whereas the second one consists of 961 equidistant steady-state data points in the range \(q_1 \in [0,30]\) and \(q_3 \in [0,30]\). A dynamic test set with 2000 samples in the range \(q_1 \in [0,30]\) and \(q_3 \in [0,30]\) is also generated for model evaluation and model selection. All the data sets are shifted by the nominal process input (\(\varvec{u}_\mathrm{norm}=[16.6,~15.6]^{\mathrm {T}}\)) and process output (\(\varvec{y}_\mathrm{norm}=[7.0255,~14.009]^{\mathrm {T}}\)): \(\varvec{u}=\varvec{u}-\varvec{u}_\mathrm{norm}\), \(\varvec{y}=\varvec{y}-\varvec{y}_\mathrm{norm}\).

A fifth-order linear model (\(n_x=5\)) is identified by the ‘n4sid’ algorithm [44]; this order has been proven sufficient to describe the dynamics of the neutralization reaction [25]. Then, to identify the PWL function, the steady-state intermediate variable vector \(\varvec{v}\) is computed as: \(\varvec{v}=[v_1, v_2]^T= \varvec{K}_\mathrm{lin}^{-1}\times \varvec{y}\).

In this case, the influence of different numbers of divisions \(m_1\) (associated with the \(q_1\) variable) and \(m_2\) (associated with the \(q_3\) variable) on the overall accuracy of the final PWL Hammerstein model is studied. For simplicity, \(m_1\) and \(m_2\) are bundled together and represented as \(m_1\times m_2\); the same values of \(m_1\) and \(m_2\) are always used. Moreover, the same configuration of the \(m_1\times m_2\) grid is used for the two submodels, i.e., for the variables \(v_1\) and \(v_2\). A total of nine PWL functions are identified for \(v_1\) and \(v_2\), with the numbers of divisions (\(m_1\times m_2\)) being \(2\times 2\), \(3\times 3\), \(4\times 4\), \(\ldots \), \(10 \times 10\). Figures 10 and 11 depict the steady-state data set versus the outputs of the PWL functions.

Finally, the PWL Hammerstein models are formed by combining the PWL functions and the linear model. The mean squared errors (\(\mathrm {MSE}_\mathrm{test}\)) for the test data set are defined to evaluate the Hammerstein models

$$\begin{aligned} \mathrm {MSE}_\mathrm{test}=\frac{1}{n_\mathrm{test}}\sum _{k=1}^{n_\mathrm{test}} \left[ (\mathrm {pH}_{k}-\mathrm {pH}^\mathrm{mod}_{k})^2 + (h_{k}-h^\mathrm{mod}_{k})^2\right] \end{aligned}$$
(31)
Table 5 Comparison of linear and PWL Hammerstein models of the pH reactor

where \(\mathrm {pH}_{k}\) and \(h_{k}\) denote the real process outputs, and \(\mathrm {pH}^\mathrm{mod}_{k}\) and \(h^\mathrm{mod}_{k}\) are the outputs of the models. Table 5 shows test errors for the linear and PWL Hammerstein models with different numbers of divisions. Figure 12 depicts the outputs of the PWL Hammerstein models with different numbers of divisions and the outputs of the linear model. Apparently, the PWL Hammerstein models can approximate the pH reactor more precisely than the linear one. Since the models with \(2 \times 2\) and \(3 \times 3\) divisions have relatively low model complexity and the models with \(8 \times 8\) and \(9 \times 9\) divisions have high accuracy, these four models are selected to form the MPC-ML controllers.

4.2.2 Predictive control of the pH reactor

In this part, the PWL Hammerstein models with \(2 \times 2\), \(3 \times 3\), \(8 \times 8\) and \(9 \times 9\) divisions are used to form the MPC-ML controllers. The MPC-NL controllers are based on the PWL Hammerstein model with \(9 \times 9\) divisions as this model has the highest modeling precision among the identified models. All MPC controllers use the same parameters: \(Hp=20\), \(Hc=10\), \(\varvec{w}_y = \begin{bmatrix} 1&0 \\ 0&1000 \end{bmatrix} \), \(\varvec{w}_{\Delta u}=0.5\varvec{I}_{2\times 2}\), \(TV_u=1\), and \(t_\mathrm{max}=3\). Constraints on the manipulated variables are: 10 mL/s \(\le q_1 \le \) 20 mL/s and 10 mL/s \(\le q_3 \le \) 20 mL/s.

Fig. 12

Modeling of the pH reactor: the linear model versus the Hammerstein models with different numbers of divisions

Fig. 13

Closed-loop responses of pH reactor with the input disturbance model

Fig. 14

Closed-loop responses of pH reactor with the output disturbance model

Figures 13 and 14 show the closed-loop responses of the pH reactor for a predefined trajectory of set-point, with input and output disturbance models, respectively. To compare the MPC algorithms, the following integral square error (\(\mathrm {ISE}\)) criterion is defined

$$\begin{aligned} \mathrm {ISE}=\sum _{k=1}^{125}\left[ (\mathrm {pH}_{k}^r-\mathrm {pH}_{k})^2 + (h_{k}^r-h_{k})^2\right] \end{aligned}$$
(32)

where \(\mathrm {pH}_{k}^r\) and \(h_{k}^r\) denote the set-points. Table 6 compares the MPC controllers in terms of the ISE indicator and the scaled computation time (in relation to the most computationally demanding MPC-NL scheme with the input disturbance model, whose computation time is 30.46 s).

The simulation results show that, in terms of the ISE indicator, the controllers based on the output disturbance model perform better than the controllers based on the input disturbance model. This means the output disturbance model is more appropriate and should be adopted for the pH neutralization process.

Compared with the MPC-L controllers, the MPC-ML controllers with \(2\times 2\) and \(3\times 3\) divisions give worse control performance. That is because the corresponding PWL Hammerstein models are not sufficiently precise, and thus the resulting MPC-ML controllers frequently switch model static gains between input subregions, leading to output chattering (see Fig. 13) and large oscillations (see Fig. 14). Nevertheless, the proposed MPC-ML controllers with \(8\times 8\) and \(9\times 9\) divisions show much better control performance than the MPC-L controllers, which illustrates the necessity of using PWL Hammerstein models.

Table 6 Control accuracy and computational burden for the pH reactor

With the output disturbance model, the proposed MPC-ML controller with \(9\times 9\) divisions and the MPC-NPLPT controller give similar control performance, both of which perform only slightly worse than the MPC-NL controller. However, the computation time of MPC-ML is significantly less than that of MPC-NPLPT and MPC-NL. All things considered, the MPC-ML controller based on the output disturbance model and \(9\times 9\) divisions is recommended (Table 6).

5 Conclusion

In this study, a computationally efficient nonlinear MPC algorithm (MPC-ML) is developed for PWL Hammerstein models. By using the simplicial partition method, PWL functions are represented in the canonical style. Control laws are efficiently calculated via multistep linearization of the predicted output trajectory. A three-step procedure is designed to linearize PWL functions, where the derivatives of PWL functions are used in a simple look-up table manner. The most attractive advantage of this algorithm is that it can achieve a good balance between control accuracy and computational burden. Besides, it can integrate various disturbance models, which is likely to contribute to better control performance than the existing MPC-NPLPT algorithm. Inversion of input nonlinearity is not required by MPC-ML, and thus, it can directly integrate input constraints without any transformation.

The advantages of the proposed algorithm are demonstrated by two benchmark nonlinear processes: the CSTR and the pH reactor. These two nonlinear processes can be precisely approximated by the PWL Hammerstein models with 5 divisions and \(9\times 9\) divisions, respectively. In both cases, the proposed MPC-ML algorithm gives almost the same control accuracy as the MPC-NL algorithm with nonlinear optimization, whereas its computation time amounts to only around \(5\%\) of that consumed by the MPC-NL algorithm. All things considered, good control accuracy and computational efficiency are the two main advantages of the described MPC-ML algorithm.