1 INTRODUCTION

Due to increasing demands on the quality of the main types of petroleum products, the oil refining and petrochemical industries are forced to continuously improve the economic efficiency of production and the quality of products [1]. Production efficiency can be improved with the help of virtual monitoring systems that check the quality of the output products of mass transfer processes such as distillation and absorption [2].

The development of new predictive modeling methods, i.e., statistical methods for designing models that estimate the quality indicators of a plant's output variable from the current values of its input variables [3], can provide a noticeable real-time increase in production efficiency.

The selection of input variables X = (x1, …, xp) affecting the output value Y, and the choice of the structure of the model, can be based on correlation and regression analysis [4]. However, for nonlinear plants these methods do not allow the structure of the model to be determined. This leads to ambiguity in estimating the unknown parameters of the model B = (β0, β1, …, βp), when the same sample of experimental data is described equally well not by one model but by a whole set of models \(F(X,B)\). This situation indicates that the structure of the model is unidentifiable. Two models \(M(B)\) and \(M(B^*)\) with the same structure \(M( \cdot )\) are called indistinguishable by output (we denote this property \(M(B^*) \approx M(B)\)), \(B,B^* \in \Omega \), if for any admissible input \(x(t)\) the models have the same outputs \(Y(t,B,x) \equiv Y(t,B^*,x)\) for all \(t \geqslant 0\) [5–7]. Structural identifiability means the identifiability of the structure not of a single model but of the whole family of models [8].

Many different methods and algorithms have been proposed for the structural identifiability analysis of dynamic systems. For linear systems, structural identifiability analysis is well understood. There are a number of methods for it, for example, the transfer function method [9], similarity transformations, and approaches based on differential algebra and graph theory [10].

However, for nonlinear plants it is much more difficult to analyze structural identifiability. This is due to the fact that the number of unknown model parameters may exceed the number of equations in the system [11]. For such plants, structural identifiability analysis is carried out using methods such as expanding the output function Y into a Taylor series and studying the eigenvalues of the Fisher information matrix [12].

To analyze the identifiability of large-dimensional models, the probabilistic algorithm method [13] is applied, which computes the parameters for a system with an unknown plant model structure. The algorithm is based on the algebraic calculation of the rank of a certain power series of output functions. The rank is required to calculate the degree of transcendence (the degree of freedom of the field extension related to the parameters). Although this algorithm is widely used, it does not identify the source of nonidentifiability, does not group the parameters according to their functional relationships, and does not provide transformations or reparameterizations that would make the model identifiable.

In the case when it is required to build a model for an industrial mass transfer process with an unknown model structure, the problem of structural identifiability remains relevant [14].

Among the available numerical nonparametric methods for extracting dependences from data for mass transfer processes and estimating the identifiability of plants, the most effective approach is based on the alternating conditional expectation (ACE) algorithm [15].

In this regard, it is proposed to use the ACE algorithm, together with an additional input variable uncorrelated with the response, to analyze the structural identifiability of the studied plant. A characteristic feature of this study is that the structural identifiability of the model for estimating the quality indicator of the output product of a nonlinear mass-transfer plant (MTP) is analyzed from experimental data. In addition, the analysis of structural identifiability is not limited to clarifying the fundamental possibility of unambiguously estimating the parameters of \(F(X,B)\). Considerable attention is paid to identifying the particular transformations \(F(X,B)\) that describe the studied plant and affect the accuracy of the structural identifiability index. For this purpose, the concept of a threshold value of the structural identifiability index of the MTP model is introduced; it is based on taking into account the physicochemical characteristics of the MTP under consideration.

2 DESCRIPTION OF MASS-TRANSFER PLANT AND PROBLEM STATEMENT

The problem of constructing a model for estimating the reagent’s content (%) in the output product (bottom product) of the MTP in the case when the structure of the model is unknown is considered. The investigated MTP is shown in Fig. 1 and consists of two distillation columns (C‑1 and C‑2) and a synthesis reactor located between them.

Fig. 1. Technological scheme of MTP.

The structure of the rigorous model of the plant is rather complicated for practical application. In general form, it can be represented as a system of equations for each kth separation stage and each lth component, which includes the material balance, energy balance, and phase equilibrium equations [16]:

$$\left\{ \begin{aligned}
& L_{k + 1}\tilde{x}_{k + 1,l} + V_{k - 1}\tilde{y}_{k - 1,l} + F_{k}z_{k,l} - L_{k}\tilde{x}_{k,l} - V_{k}\tilde{y}_{k,l} = 0, \\
& L_{k + 1}h_{k + 1} + V_{k - 1}H_{k - 1} + F_{k}H_{F_{k}} - L_{k}h_{k} - V_{k}H_{k} = 0, \\
& \tilde{y}_{k,l}^{*} = \tilde{x}_{k,l}\,\gamma_{k,l}^{L}\left(p_{k,l}^{0}/P\right), \\
& E_{k} = \left(\tilde{y}_{k,l} - \tilde{y}_{k + 1,l}\right)\big/\left(\tilde{y}_{k,l}^{*} - \tilde{y}_{k + 1,l}\right), \\
& \sum_{l = 1}^{c}\tilde{y}_{k,l} - 1 = 0, \\
& \sum_{l = 1}^{c}\tilde{x}_{k,l} - 1 = 0
\end{aligned} \right. \quad \left(\begin{gathered} l = 1,\ldots,c, \\ k = 1,\ldots,N \end{gathered}\right),$$
(1)

where \(\tilde{y}_{k,l}\) is the concentration of the lth component on the kth stage in the vapor phase; \(L_{k+1}\) is the flow of the liquid entering the kth stage; \(\tilde{x}_{k+1,l}\) is the concentration of the lth component arriving at the kth stage in the liquid phase; \(V_{k-1}\) is the flow of the vapor entering the kth stage; \(\tilde{y}_{k-1,l}\) is the concentration of the lth component in the vapor entering the kth stage; \(F_k\) is the flow of raw materials supplied to the kth stage; \(z_{k,l}\) is the amount of the lth component in the raw materials supplied to the kth stage; \(L_k\) is the flow of the liquid on the kth stage; \(\tilde{x}_{k,l}\) is the concentration of the lth component on the kth stage in the liquid phase; \(V_k\) is the flow of the vapor on the kth stage; \(\gamma_{k,l}^{L}\) is the activity coefficient of the lth component in the liquid phase on the kth stage (the UNIQUAC model is used); \(p_{k,l}^{0}\) is the partial pressure of the lth component; \(P\) is the total pressure in the system; \(E_k\) is the Murphree mass-transfer efficiency on the kth stage; \(h_{k+1}\) is the enthalpy of the liquid entering the kth stage; \(H_{k-1}\) is the enthalpy of the vapor leaving the kth stage; \(H_{F_k}\) is the enthalpy of the feed on the kth stage; \(h_k\) is the enthalpy of the liquid on the kth stage; \(H_k\) is the enthalpy of the vapor on the kth stage; c is the total number of components in the system; and N is the total number of stages in the distillation column.

The main problem with using the rigorous model is that \({{E}_{k}}\) is an unknown quantity. The composition of the feed is also unknown; therefore, the analytical model cannot be used directly to estimate the concentration of the reagent in the bottom product. In practice, therefore, linear regression models of the following form are used:

$$\hat {Y} = {{\hat {\beta }}_{0}} + \sum\limits_{i = 1}^p {{{{\hat {\beta }}}_{i}}} {{x}_{i}},$$
(2)

where \({{x}_{i}}\) are the input variables available for measurement at each time period; \(\hat {Y}\) is the estimate of the output variable \(Y\) of the plant; p is the number of input variables; \({{\hat {\beta }}_{0}}\) is the free coefficient of the model; and \({{\hat {\beta }}_{i}}\) are the model coefficients.
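For illustration, the coefficients of model (2) can be estimated by ordinary least squares. The sketch below uses synthetic data with hypothetical coefficients, not the plant data from this study.

```python
import numpy as np

# Hypothetical training data: K observations of p = 2 inputs and output Y.
rng = np.random.default_rng(0)
K, p = 100, 2
X = rng.uniform(-1.0, 1.0, size=(K, p))
# Illustrative "true" coefficients (3.0, 2.0, -1.5) plus small noise.
Y = 3.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0.0, 0.01, K)

# Design matrix with a column of ones for the free coefficient beta_0.
A = np.column_stack([np.ones(K), X])
beta_hat, *_ = np.linalg.lstsq(A, Y, rcond=None)
Y_hat = A @ beta_hat  # estimates of the output variable per Eq. (2)
```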

In the case of multiple regression, the structure of the model must be specified, which reduces the problem to estimating the model coefficients. When the relationship between the response and the predictors is unknown or specified inaccurately, linear parametric regression can lead to erroneous results. The most effective approach to analyzing the structural identifiability of models for assessing quality indicators at the output of nonlinear MTPs under structural uncertainty is a nonparametric approach based on the ACE algorithm. This is justified by the fact that the optimal transformations obtained by ACE require no a priori assumptions about the form of the functions connecting the output and input variables.

Then for p input variables \({{x}_{i}},\,\,i = 1,...,p\), and output \(Y\), the model of a plant has the following form:

$$Y = F(X,B) + \varepsilon ,$$
(3)

where \(X = ({{x}_{1}}, \ldots ,{{x}_{p}})\) is the vector of input controlled technological variables; \(B = ({{\beta }_{0}},{{\beta }_{1}}, \ldots ,{{\beta }_{p}})\) is the vector of coefficients; and \(\varepsilon \) is the measurement error of the output variable.

The task is to determine whether an adequate mathematical model for estimating the reagent concentration in the bottom product of the studied process can be constructed from a training sample. It is proposed to analyze the structural identifiability of the plant by calculating the structural identifiability index using the ACE algorithm and introducing an additional input variable uncorrelated with the output. The structural identifiability index HY characterizes the degree of dependence of the output on the given set of input variables. To assess the degree of structural identifiability of the plant, the calculated values of HY are compared with the threshold structural identifiability index Hlv, which is proposed to be determined from the analytical model (1) of the MTP under consideration, taking into account the physicochemical characteristics of the process.

3 DESCRIPTION OF ALGORITHM OF ALTERNATING CONDITIONAL MATHEMATICAL EXPECTATIONS

The ACE regression model has the following general form:

$$\theta (Y) = \sum\limits_{i = 1}^p {{{\phi }_{i}}} ({{x}_{i}}) + \varepsilon ,$$
(4)

where \(\theta \) is the transformation of the response variable \(Y\) and \({{\phi }_{i}}\) are the transformations of the input variables (predictors) \({{x}_{i}},i = 1,...,p\).

Thus, the ACE model replaces the problem of estimating a linear function of the \(p\)-dimensional variable \(X = ({{x}_{1}},{{x}_{2}},...,{{x}_{p}})\) with the estimation of \(p\) individual one-dimensional functions \({{\phi }_{i}}\) and \(\theta \) using an iterative method. These transformations are found by minimizing the unexplained deviation of the transformed response variable from the sum of the transformed predictors in a linear relationship.

For a given dataset consisting of the response variable \(Y\) and predictors \({{x}_{1}},{{x}_{2}},...,{{x}_{p}}\), the ACE algorithm begins with arbitrary initial transformations \(\theta (Y),{{\phi }_{1}}({{x}_{1}}),...,{{\phi }_{p}}({{x}_{p}})\). The error variance \({{\varepsilon }^{2}}\) that remains unexplained by the regression of the transformed dependent variable on the sum of the transformed independent variables, under the constraint \(E\left[ {{{\theta }^{2}}(Y)} \right] = 1\), is

$$\varepsilon^{2}(\theta,\phi_{1},\ldots,\phi_{p}) = E\left\{\left[\theta(Y) - \sum\limits_{i = 1}^{p}\phi_{i}(x_{i})\right]^{2}\right\}.$$
(5)

The minimization of \({{\varepsilon }^{2}}\) with respect to \({{\phi }_{1}}({{x}_{1}}),...,{{\phi }_{p}}({{x}_{p}})\) and \(\theta (Y)\) is carried out through a series of single-function minimizations, given by the equations

$$\phi_{i}(x_{i}) = E\left[\left.\theta(Y) - \sum\limits_{j \ne i}^{p}\phi_{j}(x_{j})\,\right|\,x_{i}\right],$$
(6)
$$\theta(Y) = E\left[\left.\sum\limits_{i = 1}^{p}\phi_{i}(x_{i})\,\right|\,Y\right] \Bigg/ \left\|E\left[\left.\sum\limits_{i = 1}^{p}\phi_{i}(x_{i})\,\right|\,Y\right]\right\|.$$
(7)

Equations (6) and (7) form the basis of the ACE algorithm [15]. The final \({{\phi }_{i}}({{x}_{i}})\), \(i = 1,...,p\), and \(\theta (Y)\) after minimization are estimates of the optimal transformations \(\phi _{i}^{*}({{x}_{i}})\), \(i = 1,...,p\), and \(\theta^{*}(Y)\). The response and predictors are then related as follows:

$$\theta^{*}(Y) = \sum\limits_{i = 1}^{p}\phi_{i}^{*}(x_{i}) + \varepsilon^{*},$$
(8)

where \(\varepsilon^{*}\) is the error that cannot be removed by the ACE transformations under the assumption of a normal distribution. The minimum regression error \(\varepsilon^{*}\) and the maximum multiple correlation coefficient \(\rho^{*}\) are related by \(\varepsilon^{*2} = 1 - \rho^{*2}\).

The optimal ACE transformations are obtained numerically based on the data of the technological plant and do not require a priori assumptions about the specific functional form that relates the response to the predictors [17].
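As a rough illustration of the iteration in Eqs. (6) and (7), the sketch below implements backfitting with a crude nearest-neighbor running mean standing in for the data smoother (e.g., the supersmoother) used in reference implementations of ACE; it is a minimal sketch on synthetic data, not the authors' implementation.

```python
import numpy as np

def smooth(x, z, frac=0.1):
    """Crude estimate of E[z | x]: a running mean over roughly frac*K
    neighbors in the ordering of x (stand-in for a proper smoother)."""
    K = len(x)
    w = max(3, int(frac * K))
    order = np.argsort(x)
    zs = z[order]
    out = np.empty(K)
    for j in range(K):
        lo, hi = max(0, j - w // 2), min(K, j + w // 2 + 1)
        out[j] = zs[lo:hi].mean()
    res = np.empty(K)
    res[order] = out  # map smoothed values back to the original order
    return res

def ace(X, y, n_iter=30):
    """Minimal sketch of the alternating conditional expectations loop."""
    K, p = X.shape
    theta = (y - y.mean()) / y.std()          # start with E[theta^2] = 1
    phis = np.zeros((K, p))
    for _ in range(n_iter):
        for i in range(p):                    # Eq. (6): update each phi_i
            partial = theta - (phis.sum(axis=1) - phis[:, i])
            phis[:, i] = smooth(X[:, i], partial)
            phis[:, i] -= phis[:, i].mean()
        s = phis.sum(axis=1)                  # Eq. (7): update theta and
        theta = smooth(y, s)                  # renormalize to unit variance
        theta = (theta - theta.mean()) / theta.std()
    return theta, phis
```

On data generated as in the synthetic example of Section 5, plotting the returned \(\phi_i\) against \(x_i\) reveals the sine and cosine shapes of the underlying dependence, which is how figures such as Fig. 2 are read.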

4 ALGORITHM FOR ANALYZING THE STRUCTURAL IDENTIFIABILITY OF A NONLINEAR PROCESS

For the analysis of structural identifiability, the base matrix (the data matrix formed from the sample containing the values of the input and output variables) and the number of perturbed data matrices M obtained from the base matrix by adding small random numbers to its elements are used as the initial information.

Step 1. Transform the base matrix into an extended data matrix of dimension K × (p + 2), where K is the number of observations and \(p\) is the number of predictors; column p + 1 is an additional normally distributed input ξ uncorrelated with the output, with mathematical expectation \(\mu = 0\) and variance \({{\sigma }^{2}} = 2.5\) \(\left( {\xi \in N( - 2.5;2.5)} \right)\), and column p + 2 is the response variable Y.

Step 2. Obtain the base set of vectors of optimal transformations \(\Phi_i = \phi _{i}^{*}({{x}_{i}})\) for each input of the studied plant by applying the ACE algorithm to the extended data matrix,

$$\begin{gathered} \Phi _{i}^{{base}}({{x}_{i}}) = {{(\Phi _{i}^{{base,1}},...,\Phi _{i}^{{base,j}},...,\Phi _{i}^{{base,K}})}^{T}}, \\ i = 1,...,p + 2,\,\,\,{{x}_{{p + 1}}} = \xi ,\,\,\,\,{{x}_{{p + 2}}} = Y, \\ \end{gathered} $$
(9)

and the vector of differences (base matrix of the optimal transformations)

$$\Delta \Phi _{i}^{{base}} = {{(\Delta \Phi _{i}^{{base,1}},...,\Delta \Phi _{i}^{{base,k}},...,\Delta \Phi _{i}^{{base,K - 1}})}^{T}},$$
(10)

where

$$\Delta \Phi _{i}^{{base,k}} = \Phi _{i}^{{base,k + 1}} - \Phi _{i}^{{base,k}},\,\,\,\,k = 1,...,K - 1.$$
(11)

Step 3. From the base matrix we form the set of M matrices of size K × (p + 2) that are used to obtain the vectors of the optimal transformations under disturbances. To do this, we add to the variables \({{x}_{i}},i = 1,2,...,p + 2\), the small random numbers \(\alpha _{k}^{q}\), \(k = 1,...,K\), \(q = 1,...,M\), obtained by scaling \(\varepsilon _{k}^{q} \in N\left( { - 2.5;2.5} \right)\) to 0.02% of the mean of y: \(\alpha _{k}^{q} = \varepsilon _{k}^{q} \times 0.0002 \times \sum\nolimits_{k = 1}^{K} {{{y}_{k}}} /K\). The result is the transformed matrix with the addition of the small random numbers \(\alpha _{k}^{q}\).
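Step 3 might be sketched as follows. Two points are assumptions: the noise notation \(N(-2.5;\,2.5)\) is read here as uniform draws on \([-2.5, 2.5]\), and the scaling is interpreted as 0.02% of the mean of y.

```python
import numpy as np

def perturbed_matrices(D, y, M=25, seed=0):
    """Sketch of Step 3: generate M perturbed copies of the extended data
    matrix D. Each element receives a small random increment, drawn
    uniformly on [-2.5, 2.5] (an assumed reading of N(-2.5; 2.5)) and
    scaled to 0.02% of the mean of the output y."""
    rng = np.random.default_rng(seed)
    scale = 0.0002 * y.mean()
    return [D + rng.uniform(-2.5, 2.5, size=D.shape) * scale
            for _ in range(M)]
```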

Step 4. We find the set of vectors of the optimal transformations and differences:

$$\Phi _{i}^{q}({{x}_{{\alpha ,i}}}) = {{(\Phi _{{\alpha ,i}}^{{q,1}},...,\Phi _{{\alpha ,i}}^{{q,j}},...,\Phi _{{\alpha ,i}}^{{q,K}})}^{{\text{T}}}},\,\,\,\,i = 1,...,p + 2,\,\,\,\,q = 1,...,M,$$
(12)
$$\Delta \Phi _{{\alpha ,i}}^{q} = {{(\Delta \Phi _{{\alpha ,i}}^{{q,1}},...,\Delta \Phi _{{\alpha ,i}}^{{q,k}},...,\Delta \Phi _{{\alpha ,i}}^{{q,K - 1}})}^{T}},$$
(13)

where \({{x}_{{\alpha ,i}}} = x_{i}^{k} + \alpha _{k}^{q}\), \(\Delta \Phi _{{\alpha ,i}}^{{q,k}} = \Phi _{{\alpha ,i}}^{{q,k + 1}} - \Phi _{{\alpha ,i}}^{{q,k}}\), \(k = 1,...,K - 1\), and T denotes transposition.

Step 5. Normalize the vectors \(\Delta \Phi _{i}^{{base}}\) and \(\Delta \Phi _{{\alpha ,i}}^{q}\) by transforming the difference vectors (10) and (13) to the following form:

$$\Delta \Phi _{{m,i}}^{{base}} = {{(\Delta \Phi _{{m,i}}^{{base,1}},...,\Delta \Phi _{{m,i}}^{{base,k}},...,\Delta \Phi _{{m,i}}^{{base,K - 1}})}^{T}},$$
(14)
$$\Delta \Phi _{{m,i}}^{q} = {{(\Delta \Phi _{{m,i}}^{{q,1}},...,\Delta \Phi _{{m,i}}^{{q,k}},...,\Delta \Phi _{{m,i}}^{{q,K - 1}})}^{T}},\,\,\,\,q = 1,...,M,$$
(15)

where \(\Delta \Phi_{m,i}^{base,k} = \Delta \Phi_{i}^{base,k}/S_{i}^{base}\), \(\Delta \Phi_{m,i}^{q,k} = \Delta \Phi_{\alpha,i}^{q,k}/S_{i}^{q}\), the index m denotes normalization of the differences \(\Delta \Phi_{i}^{base,k}\) and \(\Delta \Phi_{\alpha,i}^{q,k}\), and

$$S_{i}^{base} = \left(\sum\limits_{k = 1}^{K - 1}{\left(\Delta \Phi_{i}^{base,k} - \overline{\Delta \Phi}_{i}^{base}\right)^{2}}\Big/(K - 2)\right)^{1/2}, \quad S_{i}^{q} = \left(\sum\limits_{k = 1}^{K - 1}{\left(\Delta \Phi_{\alpha,i}^{q,k} - \overline{\Delta \Phi}_{\alpha,i}^{q}\right)^{2}}\Big/(K - 2)\right)^{1/2},$$
$$\overline{\Delta \Phi}_{i}^{base} = \sum\limits_{k = 1}^{K - 1}{\Delta \Phi_{i}^{base,k}}\Big/(K - 1), \quad \overline{\Delta \Phi}_{\alpha,i}^{q} = \sum\limits_{k = 1}^{K - 1}{\Delta \Phi_{\alpha,i}^{q,k}}\Big/(K - 1).$$

Step 6. Find the deviations of the differences (14) of the base optimal transformations from differences (15) for each \(q = 1,...,M\):

$$\Delta V_{i}^{{q,k}} = \Delta \Phi _{{m,i}}^{{base,k}} - \Delta \Phi _{{m,i}}^{{q,k}},\,\,\,\,i = 1,...,p + 2,\,\,\,\,k = 1,...,K - 1,$$
(16)

from which we form the sequence of vectors

$$\Delta V_{i}^{q} = {{(\Delta V_{i}^{{q,1}},...,\Delta V_{i}^{{q,k}},...,\Delta V_{i}^{{q,K - 1}})}^{{\text{T}}}},\,\,\,\,i = 1,...,p + 2.$$
(17)

Step 7. Obtain a quantitative estimate of the deviations \(\Delta V_{i}^{q}\) from (17):

$$\Delta E_{i}^{q} = \sum\limits_{k = 1}^{K - 1} {\left| {\Delta V_{i}^{{q,k}}} \right|} ,\,\,\,\,i = 1,...,p + 2,\,\,\,\,q = 1,...,M.$$
(18)

Step 8. Determine the structural identifiability index by the ith variable:

$$H_{i} = \Delta E_{m,p + 1}/\Delta E_{m,i}, \quad i = 1,\ldots,p + 2,$$
(19)

where \(\Delta E_{m,i} = \sum\nolimits_{q = 1}^{M} {\Delta E_{i}^{q}}/M\), \(\Delta E_{i}^{q}\) is calculated by (18), and \({{H}_{{p + 2}}} = {{H}_{Y}}\).

Step 9. Compare the resulting structural identifiability index \({{H}_{Y}}\) from (19) with the corresponding threshold value \({{H}_{{lv}}}\). If \({{H}_{Y}} > {{H}_{{lv}}}\), then the plant is identifiable; otherwise, it is not identifiable from the data provided.
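Steps 5–8 can be collected into one vectorized sketch. It assumes the base and perturbed optimal transformations are stored column-wise (columns \(x_1,\ldots,x_p\), then ξ in column p + 1, then Y in column p + 2); this layout is an implementation choice, not something fixed by the text.

```python
import numpy as np

def identifiability_indices(phi_base, phi_pert):
    """Sketch of Steps 5-8: compute H_i from the base optimal
    transformations phi_base (K x (p+2)) and the M perturbed ones
    phi_pert (M x K x (p+2))."""
    dbase = np.diff(phi_base, axis=0)          # differences (10)-(11)
    dpert = np.diff(phi_pert, axis=1)          # differences (13)
    K1 = dbase.shape[0]                        # K - 1 rows of differences
    # Step 5: normalize by the spread with K - 2 in the denominator.
    s_base = np.sqrt(((dbase - dbase.mean(axis=0)) ** 2).sum(axis=0) / (K1 - 1))
    s_pert = np.sqrt(((dpert - dpert.mean(axis=1, keepdims=True)) ** 2).sum(axis=1) / (K1 - 1))
    nbase = dbase / s_base
    npert = dpert / s_pert[:, None, :]
    # Steps 6-7: absolute deviations summed over k, averaged over q.
    dE = np.abs(nbase[None, :, :] - npert).sum(axis=1).mean(axis=0)
    # Step 8: H_i = dE_{m,p+1} / dE_{m,i}; column p + 1 is second to last.
    return dE[-2] / dE
```

Step 9 then reduces to comparing the last entry of the returned array, \(H_Y\), with the threshold \(H_{lv}\).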

5 ANALYSIS OF THE STRUCTURAL IDENTIFIABILITY USING A SYNTHETIC EXAMPLE

To demonstrate the operation of the ACE algorithm, a synthetic example with known functional dependences between the dependent and independent variables was used.

Let the plant be defined by a functional dependence of the following form:

$$y = \sin ({{x}_{1}}) + \sin (1.7{{x}_{2}}) + \sin (3.4{{x}_{3}}) + \cos (2.4{{x}_{4}}) + \varepsilon .$$
(20)

According to Eq. (20), with the input variables \({{x}_{i}},\;i = 1, \ldots ,4\), subject to the restrictions \( - 2.5\, \leqslant \,{{x}_{1}}, \ldots ,{{x}_{4}}\, \leqslant \,2.5\), a sample of size \(K = 1000\), representing a \((K\, \times \,5)\) matrix, is formed. An extended sample is obtained by including in the original sample an additional input \({{x}_{5}}\) uncorrelated with the output Y: \({{x}_{{err}}} \in N( - 2.5;2.5)\), \({{x}_{5}} = {{x}_{{err}}} \times 0.0002\), where \( - 0.0025 \leqslant \varepsilon \leqslant 0.0025\).
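Forming the extended sample can be sketched as follows; as before, reading \(N(-2.5;\,2.5)\) as uniform draws on \([-2.5, 2.5]\) is an assumption, and the noise magnitude is illustrative.

```python
import numpy as np

# Sketch of forming the extended sample for the synthetic plant (20).
rng = np.random.default_rng(42)
K = 1000
X = rng.uniform(-2.5, 2.5, size=(K, 4))
eps = rng.uniform(-0.0025, 0.0025, K)            # small output noise
y = (np.sin(X[:, 0]) + np.sin(1.7 * X[:, 1])
     + np.sin(3.4 * X[:, 2]) + np.cos(2.4 * X[:, 3]) + eps)
# Additional input uncorrelated with the output, scaled as in the text.
x5 = rng.uniform(-2.5, 2.5, K) * 0.0002
extended = np.column_stack([X, x5, y])           # K x 6 extended matrix
```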

The analysis of the pair correlation coefficients and correlational relations obtained on the initial sample (Table 1) does not allow us to draw a conclusion on the possible structure of the model.

Table 1.   Coefficients of correlation

Applying the ACE algorithm to the extended sample, we form the base set of vectors of the optimal transformations \(\Phi _{i}^{{base}}({{x}_{i}})\), presented graphically in Fig. 2 and indicating that the model structure is found fairly accurately.

Fig. 2. The result of applying the ACE algorithm to the elements of the base matrix.

For the analysis of structural identifiability with M = 25 (where M is the number of perturbed matrices), 25 vectors of the optimal transformations were obtained and compared with the base estimates of the model.

Table 2 shows the results of applying the proposed approach.

Table 2.   Parameters of structural identifiability for \({{H}_{i}}\)

The value \(\Delta {{E}_{{m,5}}} = 0.0394\) means that the parameter at input \({{x}_{5}}\) is unidentifiable. The other values of \(\Delta {{E}_{{m,i}}}\) (the average sum of the distances between the points of the model's base estimate and its current estimate for the output and each input) fully confirm the existence of a nonlinear model for the studied plant and can serve as a sign of its identifiability. The quantities \({{H}_{i}}\) reflect the contribution of each variable relative to the unidentifiable auxiliary input. The results obtained correspond to description (20). Thus, the plant is identifiable, since the identifiability index for the output is \({{H}_{Y}} = 224.58\), which significantly exceeds the specified threshold value \({{H}_{{lv}}} = 35.16\), corresponding to a noise level of 15%, at which \({{R}^{2}} < 0.7\).

The threshold value \({{H}_{{lv}}}\) is determined experimentally by computing \({{H}_{Y}}\) as the noise level is varied in the range of 0 to 35% of the mean of each input variable. The results are shown in Fig. 3.

Fig. 3. Dependence of \({{H}_{Y}}\) and \({{R}^{2}}\) on the noisiness of the synthetic data.

If \({{H}_{Y}} < 35.16\), then the model is not identifiable. When \({{H}_{Y}} > 35.16\), the higher \({{H}_{Y}}\) the more accurate the model.

6 ANALYSIS OF STRUCTURAL IDENTIFIABILITY ON THE EXAMPLE OF THE MTP

When building a model for estimating the reagent concentration in the still of distillation column C-1, data from a real technological plant were used. The temperature (x1, TIC, °C) and pressure (x2, PI, MPa) at the bottom of distillation column C-1 were selected as the input parameters of the model.

The analysis of the pair correlation coefficients and correlational relations obtained on the initial sample (Table 3) does not allow us to make a conclusion about the possible structure of the model.

Table 3.   Coefficients of correlation

To assess identifiability with M = 125, 125 vectors of the optimal transformations were obtained and compared with the base estimates of the model (Fig. 4).

Fig. 4. The result of applying the ACE algorithm to the elements of the base matrix of industrial data.

Table 4 presents the results of applying the proposed approach.

Table 4. Parameters of structural identifiability for a real plant

The value \(\Delta {{E}_{{m,3}}} = 0.0236\) for input \({{x}_{3}}\) allows us to conclude that the corresponding transformation ϕ3(x3) in (4) is not identifiable. The other values of \(\Delta {{E}_{{m,i}}}\) confirm the existence of a nonlinear model for the studied plant. According to the values of the index \({{H}_{i}}\) presented in Table 4, the plant is structurally identifiable, since the identifiability index for the output on the test sample is \({{H}_{Y}} = 68.1348\), which significantly exceeds the specified threshold value \({{H}_{{lv}}} = 22.65\).

The threshold value \({{H}_{{lv}}}\) was determined on a data sample generated by a calibrated rigorous model, with the noise level varied in the range of 0 to 25% of the mean value of each input variable (Fig. 5).

Fig. 5. Dependence of \({{H}_{Y}}\) and \({{R}^{2}}\) on the noisiness of the data of the rigorous model.

In this case, \({{R}^{2}} < 0.7\) when the data noise reaches 10%, which corresponds to the threshold value \({{H}_{{lv}}} = 22.65\).

Based on the experimental data of the technological process, using various approximations (linear, logarithmic, exponential, quadratic) of the variables transformed by the ACE algorithm for the output variable Y and the inputs x1 and x2 [17], a model of the following form was obtained:

$$\hat {Y} = 551.78 - 9.95{{x}_{1}} + 0.05x_{1}^{2} + 369.01{{x}_{2}} + 77.24x_{2}^{2} - 3.51{{x}_{1}}{{x}_{2}}.$$
(21)
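Model (21) is straightforward to evaluate; the function below simply transcribes its coefficients, and any temperature and pressure values passed to it here are illustrative rather than taken from the plant data.

```python
def model_21(x1, x2):
    """Parametric model (21): estimate of the output quality indicator
    from the bottom temperature x1 and pressure x2 (coefficients taken
    from the text)."""
    return (551.78 - 9.95 * x1 + 0.05 * x1 ** 2
            + 369.01 * x2 + 77.24 * x2 ** 2 - 3.51 * x1 * x2)
```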

Table 5 presents the coefficients of determination (R2) and the root-mean-square errors (RMSE) of the parametric models obtained by the least squares method (LSM) and robust regression (RR), of model (21), and of a nonparametric model based on the ACE algorithm [18] for the training and test samples.

Table 5. Values R2 and RMSE for the presented models

According to the results presented in Table 5, the nonparametric model constructed based on the ACE algorithm describes the studied MTP more accurately than the other methods. The results of the operation of the nonparametric model based on the ACE algorithm and the experimental data are shown in Fig. 6.

Fig. 6. The results of the operation of a nonparametric model constructed on the ACE algorithm to evaluate the quality indicator of the output product in the test sample.

7 CONCLUSIONS

The article presents a method for analyzing structural identifiability based on the ACE algorithm with an additional input variable uncorrelated with the output, under the conditions of an unknown structure of the MTP model. The calculated value of the structural identifiability index HY on the experimental data should not be less than its threshold value Hlv, which can be found in advance using the rigorous MTP model (which takes the physicochemical laws into account). The proposed ACE-based analysis of structural identifiability avoids an endless enumeration of model structures and makes it possible to determine the limit on the maximum achievable accuracy of the model, as demonstrated on a synthetic example and on real experimental data of the technological process.

Using the example of constructing models for estimating the concentration of a reagent in the bottom product under the conditions of a nonlinear MTP, it is shown that the nonparametric model based on the ACE algorithm reduces the RMSE on the test sample by \(\left( {(0.2221 - 0.0659)/0.2221} \right) \times 100\% \approx 70.3\% \) compared to the nonlinear parametric model (21).