In this paper, the operator approach [1–3] is used, as in [4–6], to improve the accuracy of solutions produced by a certain class of neural networks. Let \(C_1', C_2', \ldots, C_l'\) be classes that entirely cover the space of initial objects \(X = \{x \mid x = (x_1, x_2, \ldots, x_n),\ x \in R^n\}\), where \(x_i\) is a feature of the object \(x\) and \(i = 1, 2, \ldots, n\). Let \(Q_1, Q_2, \ldots, Q_l\) be a system of unary two-valued predicates over \(X\) such that \(Q_j(x) \equiv \langle\langle x \in C_j' \rangle\rangle\), \(x \in X\), \(j = 1, 2, \ldots, l\). A recognition problem \(u \in U\) is an ordered pair \(u = (I_0, X^q)\), where \(I_0 = \langle X^m, \omega \rangle\) is the initial information of the problem \(u\), \(X^m = \{x^1, x^2, \ldots, x^m\}\) is a training sample, and \(\omega = \|\omega_{ij}\|_{m \times l}\) is the classification matrix of the sample \(X^m\) (\(\omega_{ij} = Q_j(x^i)\), \(i = 1, 2, \ldots, m\); \(j = 1, 2, \ldots, l\)). The sample \(X^q = \{\boldsymbol{x}^1, \boldsymbol{x}^2, \ldots, \boldsymbol{x}^q\}\) is a sample of test objects, and the classification matrix \(f = \|f_{ij}\|_{q \times l}\) of the sample \(X^q\) (as required by the problem \(u\)) has to be computed; here, \(f_{ij} = Q_j(\boldsymbol{x}^i)\), \(i = 1, 2, \ldots, q\); \(j = 1, 2, \ldots, l\). A model \(\mathfrak{M}(H, \tilde{\text{x}}, \gamma^m, \theta_1, \theta_2)\) of recognition algorithms was proposed in [1, 2]. Here, \(H\) is a piecewise linear surface in \(R^n\); \(\tilde{\text{x}} = (\text{x}_{00}, \text{x}_{11}, \text{x}_{10}, \text{x}_{01})\) is a binary set of parameters; \(\gamma^m\) is a set of weight-type parameters; and \(\theta_1 = \min_j \theta_{1j}\) and \(\theta_2 = \max_j \theta_{2j}\) are parameters of the decision rule \(R^*\) such that \(0 < \theta_{1j} \leqslant \theta_{2j}\), \(j = 1, 2, \ldots, l\). Correctness conditions for linear and algebraic closures of algorithms of the model \(\mathfrak{M}(H, \tilde{\text{x}}, \gamma^m, \theta_1, \theta_2)\) over sets of problems with standard information were found in [2]. For regular problems, an analytical form of a correct algorithm was found in [3], namely,

$$\mathcal{A}^{*} = \left( (\theta_1 + \theta_2)\sum_{i = 1}^{q} \sum_{j = 1}^{l} f_{ij} \cdot B^{k}(i,j) \right) \circ R^{*},$$
(1)
$$k = \left[ \bigl(\ln q + \ln l + \ln(\theta_1 + \theta_2) - \ln \theta_1\bigr)/\lvert \ln a_0 \rvert \right] + 1,$$
(2)

where \(a_0 = \max_{i,j}\, \max_{(r,h) \ne (i,j)} |\Gamma_{rh}(i,j)|\). Here, \(\|\Gamma_{rh}(i,j)\|_{q \times l}\) is the matrix of the quasi-basis operator \(B(i,j)\) [3], \(i = 1, 2, \ldots, q\); \(j = 1, 2, \ldots, l\). As the initial model, the model of estimate calculation algorithms was used [3]. In what follows, each recognition algorithm \(\mathcal{A}\) is represented as \(\mathcal{A} = A \circ R^*\) [3], where \(A\) is a recognition operator and \(R^*\) is a threshold decision rule. Given a regular problem \(u\) [3], the operator \(A\) computes the matrix \(\varphi = \|\varphi_{ij}\|_{q \times l}\), where \(\varphi_{ij}\) is an estimate determining the membership of the object \(\boldsymbol{x}^i \in X^q\) in the class \(C_j\). In (1), given \(\varphi\) and \(k\) computed according to (2), the decision rule \(R^*(\theta_1, \theta_2)\) yields a matrix \(\delta\) coinciding with the matrix \(f\) of the problem \(u\). In the more general case, given \(\varphi\), the rule \(R^*\) computes a matrix that may not coincide with the matrix \(f = \|f_{ij}\|_{q \times l}\). Let \(U\) be the class of problems with standard information [2]. Our goal is, relying on [1–3] and the neural network paradigm and using algorithms of the model \(\mathfrak{M}(H, \tilde{\text{x}}, \gamma^m, \theta_1, \theta_2)\), to show that it is possible to construct a neural network which, given an arbitrary problem \(u \in U\), outputs its matrix \(f\). A variety of families of recognition algorithms based on the partition principle are known [2, 7, 8], and their study, both in the past and at present, constitutes the core of this classical field.
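To make the use of (1) and (2) concrete, the following minimal Python sketch computes the parameter \(k\) of formula (2) (the brackets are taken as the integer part) and applies a threshold decision rule of the type \(R^*(\theta_1, \theta_2)\); the exact form of \(R^*\) is defined in [2, 3], so the particular thresholding below (1 at or above \(\theta_2\), 0 at or below \(\theta_1\), no decision in between) is only an assumption consistent with the text.

```python
import math

def parameter_k(q, l, theta1, theta2, a0):
    """Exponent k from formula (2); [.] is taken as the integer part.
    It is assumed that 0 < a0 < 1, as for the quasi-basis estimates in [3]."""
    return int((math.log(q) + math.log(l)
                + math.log(theta1 + theta2) - math.log(theta1))
               / abs(math.log(a0))) + 1

def decision_rule(phi, theta1, theta2):
    """Hypothetical threshold rule R*(theta1, theta2): 1 at or above theta2,
    0 at or below theta1, None (no decision) otherwise."""
    return [[1 if v >= theta2 else (0 if v <= theta1 else None) for v in row]
            for row in phi]
```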

In what follows, let \(A', A'', A \in \mathfrak{M}\) be operators. The algebra \(\vartheta\) over \(\mathfrak{M}\) is constructed using the following operations [2, 3]: (a) \((\mathrm{const} \cdot A)u = \mathrm{const} \cdot Au\), (b) \((A' + A'')u = A'u + A''u\), and (c) \((A' \cdot A'')u = A'u \cdot A''u\); operation (c) in particular yields the power operation \(A^k\). Neither the operands of these operations nor the resulting matrices in (a)–(c) are allowed to have elements with large moduli; we assume that these values lie in the interval \((-1, 1]\). Let \(C_j = X^m \cap C_j'\), \(j = 1, 2, \ldots, l\).
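As a minimal sketch (not the authors' construction), the operations (a)–(c) can be read as element-wise operations on the \(q \times l\) estimate matrices that the operators produce; the power operation \(A^k\) is then a \(k\)-fold application of (c).

```python
def scale(c, A):                     # (a) (const * A)u = const * Au
    return [[c * a for a in row] for row in A]

def add(A1, A2):                     # (b) (A' + A'')u = A'u + A''u
    return [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]

def mul(A1, A2):                     # (c) (A' * A'')u = A'u * A''u, element-wise
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]

def power(A, k):                     # A^k as a k-fold application of (c)
    R = A
    for _ in range(k - 1):
        R = mul(R, A)
    return R

# Keeping all entries inside (-1, 1], as required above, is the caller's concern.
```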

Furthermore, relying on the model of algorithms \(\mathfrak{M}(H, \tilde{\text{x}}, \gamma^m, \theta_1, \theta_2)\) [2] and the neural network paradigm, we also take into account the weights \(p_1, p_2, \ldots, p_n\) of the features of objects from \(R^n\). This model is denoted by \(\tilde{\mathfrak{M}}(H, \tilde{p}^n, \tilde{\text{x}}, \gamma^m, \theta_1, \theta_2)\), where \(\tilde{p}^n = (p_1, p_2, \ldots, p_n)\), \(p_i \geqslant 0\). Let \(\alpha, \beta\) be characteristics of an object \(x\) associated with its membership in a fixed class and its position relative to \(H\). The two-valued predicate \(H(x)\) defined on \(X\) determines which of the half-spaces relative to \(H\) contains the object \(x\). We deal with the values \(\alpha = Q_j(x)\) and \(\beta = H(x)\): \(\beta = 1\) if the object \(x\) belongs to the positive half-space determined by \(H\) in \(R^n\), and \(\beta = 0\) otherwise [2]. The values of \(\beta\) for objects from \(X^m\) are denoted by \(\beta_t\), \(t = 1, 2, \ldots, m\), while \(\beta^i\) is used for objects from \(X^q\). The values \(\beta^i\) and \(\bar{\beta}^i\) will be used later as synaptic weights of fourth-layer neurons in the j-block of the network (see Fig. 1). The training objects given as input to the neural network are arranged in the list \(\#, x^m(\alpha_m, \beta_m), \ldots, x^t(\alpha_t, \beta_t), \ldots, x^2(\alpha_2, \beta_2), x^1(\alpha_1, \beta_1)\), which is terminated by a special object, the symbol \(\#\). The notation \(x^t(\alpha_t, \beta_t)\) is formal and means only that, given \(C_j\) and \(H\), the object \(x^t\) is assigned the values \(\alpha_t, \beta_t\); neither the object \(x^t\) nor its weight \(\gamma\) depends on \(\alpha_t, \beta_t\). For objects from \(X^q\), the characteristics are \((\beta, \tilde{\text{x}})\). The following question arises: is it possible to construct a neural network that reproduces the computations executed by a correct algorithm constructed on the basis of the model \(\tilde{\mathfrak{M}}(H, \tilde{p}^n, \tilde{\text{x}}, \gamma^m, \theta_1, \theta_2)\)?
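Before turning to the network itself, the characteristic \(\beta = H(x)\) can be illustrated by a short sketch for a single linear piece of the surface \(H\); the normal vector \(w\) and the offset \(b\) below are hypothetical parameters of such a piece and are not notation from [2].

```python
def beta_of(x, w, b):
    """beta = 1 if x lies in the positive half-space w . x + b > 0 of a linear
    piece of the surface H, and beta = 0 otherwise."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else 0
```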

Assuming that the network is multilayered, we initially define the activation function of the first-layer neurons of the network as

$$F_{\mu\eta}^{j}(x^{t}(\alpha_t,\beta_t),p) = \begin{cases} \dfrac{(x^{t}(\alpha_t,\beta_t),\,p)}{x_{1}^{t}(\alpha_t,\beta_t) + x_{2}^{t}(\alpha_t,\beta_t) + \ldots + x_{n}^{t}(\alpha_t,\beta_t)}, & \text{if } x^{t}(\alpha_t,\beta_t) \ne \bar{0} \text{ and } \alpha_t = \mu,\ \beta_t = \eta, \\ p_0, & \text{if } x^{t}(\alpha_t,\beta_t) = \bar{0} \text{ and } \alpha_t = \mu,\ \beta_t = \eta, \\ 0, & \text{if } \alpha_t \ne \mu \text{ or } \beta_t \ne \eta, \end{cases}$$

where \(\mu, \eta \in \{0, 1\}\), \(p = (p_1, p_2, \ldots, p_n)\) are the synaptic weights of the first-layer neurons of the network, and \(p_0\) is a parameter. For the characteristics \(\alpha_\#, \beta_\#\) of the object \(\#\), we assume that \(\alpha_\# = 0\) and \(\alpha_\# = 1\), as well as \(\beta_\# = 0\) and \(\beta_\# = 1\), hold simultaneously and, additionally, that \(0 \cdot \# = \# \cdot 0 = 0\). The weight is \(\gamma(\#) = 0\). Ignoring the computations on the adder, we set \(F_{\mu\eta}(\#(\alpha_\#, \beta_\#), p) = \#\). For brevity, along with the notation \(x^t(\alpha_t, \beta_t)\), we also use the notation \(x(\alpha, \beta)\) or, as in Fig. 1, the notation \(x^t(\alpha, \beta)\). Define

$$D_{{\mu \eta }}^{j} = \mathop \sum \limits_{x:~\alpha = \mu ,~\beta = \eta ,~x \in {{X}^{m}}} \gamma (x(\alpha ,\beta )) \cdot F_{{\mu \eta }}^{j}(x(\alpha ,\beta ),p),$$

where \(\gamma (x(\alpha ,\beta ))~\) is the weight of the training object \(x(\alpha ,\beta )\). Relying on the method for computing the estimate \({{{\Gamma }}_{{ij}}}\) by an operator of the model \(\mathfrak{M}\) and taking into account the neural network paradigm, we assume that, for the given problem u, an operator \(A\) of the model \(\tilde {\mathfrak{M}}\) calculates an estimate determining the membership of the object \({\boldsymbol{x}^{i}}\) in the class \({{C}_{j}}\), i.e., the element \({{{\Gamma }}_{{ij}}}\) of the matrix \({{\left\| {{{{\Gamma }}_{{ij}}}} \right\|}_{{q \times l}}}\) defined as

$$\Gamma_{ij} = \frac{\text{x}_{00}(x^{i}) \cdot D_{00}^{j} + \text{x}_{11}(x^{i}) \cdot D_{11}^{j}}{\text{x}_{10}(x^{i}) \cdot D_{10}^{j} + \text{x}_{01}(x^{i}) \cdot D_{01}^{j} + 1}\quad \text{if } H(x^{i}) > 0,$$
(3)
$$\Gamma_{ij} = \frac{\text{x}_{10}(x^{i}) \cdot D_{10}^{j} + \text{x}_{01}(x^{i}) \cdot D_{01}^{j}}{\text{x}_{00}(x^{i}) \cdot D_{00}^{j} + \text{x}_{11}(x^{i}) \cdot D_{11}^{j} + 1}\quad \text{if } H(x^{i}) < 0,$$
(4)

where \({{{\Gamma }}_{{ij}}}\) is an intermediate estimate to be processed at internal network layers. The main block of the network is represented in the form of a four-layer plane neural network with the nonstandard part consisting of the second to fourth layers (Fig. 1); this part of the network is called a j-block.
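The following Python sketch assembles the first-layer values, the accumulated sums \(D_{\mu\eta}^j\), and the intermediate estimate \(\Gamma_{ij}\) of (3) and (4); the formulas are transcribed directly, while the data layout (feature lists, the weights \(\gamma\), exclusion of the terminator \(\#\) from the training list) and the assumption of non-negative features with a nonzero sum are conventions of the sketch only.

```python
HASH = "#"   # the special terminator object

def F_first_layer(x, alpha, beta, mu, eta, p, p0):
    """First-layer activation function F_{mu eta}^j from the display above."""
    if x == HASH:                            # F(#) = #, the adder is bypassed
        return HASH
    if alpha != mu or beta != eta:           # third case of the definition
        return 0.0
    if all(xi == 0 for xi in x):             # x is the zero vector
        return p0
    return sum(xi * pi for xi, pi in zip(x, p)) / sum(x)

def D_sum(mu, eta, training, gammas, p, p0):
    """Accumulated second-layer sum D_{mu eta}^j; training is a list of
    (x, alpha, beta) excluding the terminator #, gammas are the matching
    object weights gamma(x).  Objects with (alpha, beta) != (mu, eta)
    contribute 0 through F_first_layer."""
    return sum(g * F_first_layer(x, a, b, mu, eta, p, p0)
               for (x, a, b), g in zip(training, gammas))

def Gamma_ij(kappa, D00, D11, D10, D01, H_sign):
    """Intermediate estimate (3)-(4); kappa = (x00, x11, x10, x01) for x^i,
    H_sign is the sign of H(x^i)."""
    x00, x11, x10, x01 = kappa
    if H_sign > 0:                                                   # formula (3)
        return (x00 * D00 + x11 * D11) / (x10 * D10 + x01 * D01 + 1)
    return (x10 * D10 + x01 * D01) / (x00 * D00 + x11 * D11 + 1)     # formula (4)
```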

Inspecting the first-layer neurons (Fig. 1), we see that the output of each of them that corresponds to the jth class is directed toward the input of a second-layer neuron of the j-block; this input has the synaptic weight \(\gamma(x^t(\alpha_t, \beta_t))\). The output of any other first-layer neuron with the same adder-computed value is directed toward its own class; moreover, for each class, the object \(x^t\) has its own characteristic \(\alpha_t\). A second-layer neuron accumulates the estimate \(D_{\mu\eta}^j\). The activation function of second-layer neurons is partially defined on \(X^m \cup \{\#\}\): \(f_{\mu\eta}^j(x^t, \tilde{\text{x}}) = \text{x}_{\mu\eta} \cdot D_{\mu\eta}^j\) if \(x^t = \#\), whereas \(f_{\mu\eta}^j(x^t, \tilde{\text{x}}) = \Delta\) (not defined) if \(x^t \ne \#\) and means only that \(x^t\) is updated; i.e., the value \(f_{\mu\eta}^j(x^t, \tilde{\text{x}})\) is not defined as long as the inner loop is not completed (\(x^t \ne \#\)). For each \(x \in X^q\), the objects \(x^t\) from \(X^m\) are sequentially given to the input of the network. The inner loop for the given \(x^i \in X^q\) is completed at \(x^t = \#\) with the computation of the ith row of the matrix \(\delta = \|\delta_{ij}\|_{q \times l}\) at the output of the neural network. The outer loop is completed with the exhaustion of \(X^q\). Each neuron in the second layer has one more input, fed from its own output, in order to accumulate the estimate \(D_{\mu\eta}^j\); its synaptic weight is \(d_0 = 1\) as long as \(x^t \ne \#\). For \(x^t = \#\), the synaptic weight \(d_0\) becomes “0” and recovers the value “1” when a new \(x^i \in X^q\) is given to the input of the network. The event \(x^t = \#\) (exhaustion of \(X^m\)) takes place for all \(j = 1, 2, \ldots, l\) simultaneously, which leads to a change of the object from \(X^q\). Note that each neuron in the network preserves the adder-computed value, say \(s\), only for a short time: until \(s\) is transmitted as an output value to the neuron's activation function or, as in the case of a second-layer neuron operating according to the feedback principle, until \(s\) is sent as an intermediate value to the neuron's own input for the next iteration with an updated \(x^t\). The activation function \(R\) of third-layer neurons of the j-block is a diagonal one: the value at the adder is the output of \(R\). The fourth layer of the j-block consists of a nonstandard neuron with two synchronously functioning adders, \(\Sigma_1\) and \(\Sigma_2\), and the two-argument activation function \(F_c(a, b) = c\frac{a}{b + 1}\) with a parameter \(c\). Depending on the value of the synaptic weight \(\beta^i\) (see Fig. 1), the value of this activation function is the value computed by formula (3) or (4) multiplied by \(c\).

Let \(U^0\) be the class of problems with standard information that do not contain isolated classes [2], and let \(f = \|f_{ij}\|_{q \times l}\) be the classification matrix of the problem \(u\). Relying on what was described above and on the model \(\tilde{\mathfrak{M}}(H, \tilde{\text{x}}, \tilde{p}^n, \gamma^m, \theta_1, \theta_2)\), in view of Theorems 3 and 4 from [2], we formulate the following result.

Theorem 1. Let \(u = (I_0, X^q)\) be an arbitrary problem from \(U^0\). Then, for the problem \(u\), a six-layer plane neural network with \(l\) blocks (j-blocks) can be constructed such that, given the classification matrix \(f = \|f_{ij}\|_{q \times l}\) of the problem \(u\) and the parameter \(k\), the matrix \(\delta = \|\delta_{ij}\|_{q \times l}\) output by the network coincides with the matrix \(f\) of the problem \(u\).

To expand the class of problems for which a neural network outputting the classification matrix of a problem can be constructed, the network is supplemented with an additional intermediate layer consisting of a nonstandard neuron whose activation function depends on two variables and a parameter \(c\). This function is denoted by \(F_c(a, b)\). Recall that a neuron of this kind was used previously as a fourth-layer neuron in the j-block.
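For reference, a one-line sketch of this two-argument activation function, in the same form \(F_c(a, b) = c\,a/(b + 1)\) as for the fourth-layer neuron of the j-block.

```python
def F_c(a, b, c):
    """Two-argument activation function F_c(a, b) = c * a / (b + 1)."""
    return c * a / (b + 1)
```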

The shaded rectangular areas in Fig. 2 correspond to the first layer of neurons of the network shown in Fig. 1. The constructed network fragment, which consists of two j-blocks j', j'', is supplemented with a fifth-layer neuron (Fig. 2). By analogy with the above-mentioned structure, the resulting one is called a J-block and the network itself is called a J-network (Fig. 3). Assume that \(J = 1, 2, \ldots, l\) and each J-block computes its own estimate. In fact, for each \(r = 1, 2, \ldots, q\), the rows of the estimate matrix \(\|\Gamma_{rh}(i,j)\|_{q \times l}\) of the quasi-basis operator \(B(i,j)\) are sequentially calculated in the fifth layer. Eventually, we obtain a seven-layer plane neural network (Fig. 3), where the fifth layer contains \(l\) blocks (J-blocks).

Figure 3 shows an adder of the fifth layer as \(\Sigma_5^J\), which, as already noted, represents the pair \(\Sigma_5^J = (\Sigma_5^1, \Sigma_5^2)\) (see Fig. 2). The sixth layer of the network has to implement the power operation for each quasi-basis operator: \(B^k(i,j)\), \(1 \leqslant i \leqslant q\); \(1 \leqslant j \leqslant l\), for which \(f_{ij} \ne 0\). The neuron performing this operation has two inputs, and the synaptic weight of one of them is \(d_1\). Initially, \(d_1 = 1\) in order to send the previously computed estimate to the adder, while the synaptic weight \(d_2\) of the second input is equal to 0. Next, the value of \(d_1\) is replaced by 0 and becomes the synaptic weight of the second input of this neuron. This input receives the feedback from the neuron output and, by means of the activation function, given \(k\) and \(a_0\) (see (2)), the neuron computes an element of the matrix of the operator \(B^k(i,j)\) [9]. The output of this neuron is “strengthened” by the value \(\theta = \theta_1 + \theta_2\) for the pairs with \(f_{i_v j_v} = 1\) (Fig. 3). Here, the activation function is a partially defined diagonal one. The adders of the seventh-layer neurons sequentially compute the rows of the matrix \(\varphi = \|\varphi_{ij}\|_{q \times l}\), which are used by the activation function to compute the corresponding rows of the matrix \(\delta = \|\delta_{ij}\|_{q \times l}\).
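A hedged end-to-end sketch of what the sixth and seventh layers compute, following formula (1): element-wise \(k\)-th powers of the quasi-basis estimates, strengthening by \(\theta = \theta_1 + \theta_2\) for the pairs with \(f_{ij} = 1\), row-wise accumulation into \(\varphi\), and the threshold rule from the earlier sketch yielding \(\delta\); the dictionary layout of the estimate matrices and the 0-based indices are assumptions of the sketch.

```python
def phi_matrix(gamma_matrices, f, k, theta1, theta2):
    """gamma_matrices[(i, j)] is the q x l estimate matrix of B(i, j)."""
    q, l = len(f), len(f[0])
    phi = [[0.0] * l for _ in range(q)]
    theta = theta1 + theta2
    for (i, j), G in gamma_matrices.items():
        if f[i][j] != 1:                      # only pairs with f_ij = 1 contribute
            continue
        for r in range(q):
            for h in range(l):
                phi[r][h] += theta * G[r][h] ** k     # "strengthened" B^k(i, j)
    return phi

# delta = decision_rule(phi_matrix(gamma_matrices, f, k, theta1, theta2),
#                       theta1, theta2)
```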

Let \(U\) be the class of problems [2] that satisfy the following conditions: (c) \(X^q \cap X^m = \emptyset\) and (d) \(C_{j'} \ne C_{j''}\) for \(j' \ne j''\), \(1 \leqslant j', j'' \leqslant l\).

Theorem 2. Let \(u = (I_0, X^q)\) be an arbitrary problem from \(U\). Then, for the problem \(u\), a seven-layer plane neural network with \(l\) blocks (J-blocks) can be constructed such that, given the classification matrix \(f = \|f_{ij}\|_{q \times l}\) of the problem \(u\) and the parameter \(k\), the matrix \(\delta = \|\delta_{ij}\|_{q \times l}\) output by the network coincides with the matrix \(f\) of the problem \(u\).

Note that the construction of the j-blocks of the network does not require the construction of a piecewise linear surface \(H\); instead, it is sufficient to specify the characteristics of the objects from \(X^q \cup X^m\) for all cases from [2]. The parameters \(\gamma_1, \gamma_2, \ldots, \gamma_m\) and \(\tilde{\text{x}} = (\text{x}_{00}, \text{x}_{11}, \text{x}_{10}, \text{x}_{01})\) are determined according to the cases of [2]; the parameter \(c\) is specified according to [2, 3]; the feature weights are \(p_1 = p_2 = \ldots = p_n = p_0 = 1\); and the parameters \(\theta_1, \theta_2\) are such that \(0 < \theta_1 \leqslant \theta_2 < 1\) and \(\theta_1 \leqslant (1 - \theta_2)/2\). Taken together, these parameters ensure the construction of a neural J-network for the problem \(u \in U\).
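A small check of the admissibility of \(\theta_1, \theta_2\) under the constraints just stated (the feature weights are simply \(p_1 = \ldots = p_n = p_0 = 1\)).

```python
def thetas_admissible(theta1, theta2):
    """Constraints from the text: 0 < theta1 <= theta2 < 1 and theta1 <= (1 - theta2) / 2."""
    return 0 < theta1 <= theta2 < 1 and theta1 <= (1 - theta2) / 2

assert thetas_admissible(0.2, 0.5)   # e.g., theta1 = 0.2, theta2 = 0.5 is admissible
```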

Fig. 1.

Fig. 2. J-block.

Fig. 3. Seven-layer J-network.