
1 Introduction

In practice, problems often arise concerning the compatibility of the functional part of a device with its hardware and software. These problems are complex and require an integrated approach; solving them changes the qualitative and quantitative characteristics of the device in accordance with the specified requirements.

Artificial intelligence algorithms require the combined use of hardware and software. Owing to the specific nature of this research, the basic requirements remain the same: performance, compactness, and low economic costs of producing and maintaining the devices under development. The approach based on modeling artificial neural networks is versatile and flexible, but it has limitations related to its field of application. Among the disadvantages inherent in computers of the von Neumann architecture, the following can be distinguished:

  • virtualization of computing units, architecture, and physical processes;

  • the dependence of the processing time on the size of the program;

  • unjustified growth of hardware costs when increasing productivity;

  • low energy efficiency, etc.

At present, an increasing number of specialized intelligent architectures aim to overcome the described drawbacks [1,2,3,4,5,6,7,8]. Such devices have a wide application range and are compatible with the environment of the computer system, but they also have some disadvantages. In general, mathematical models based on “continuous mathematics” dominate the construction of modern digital devices, while the discrete basis receives little attention. However, when constructing effective computing devices, it is impossible to ignore the level of compatibility between the mathematical apparatus and the computing platform used for its implementation. In the field of artificial intelligence, this problem becomes urgent in the development of specialized computers based on the neural network paradigm.

Existing mathematical models of a neuron operate with continuous quantities and are realized on the basis of analog elements, which leads to their poor compatibility with digital equipment. At the same time, most neural networks are implemented using the principles of digital logic [2,3,4,6,7,8]. As a result, promising computing devices under development implement multi-level systems of models, and these systems introduce certain disadvantages into the final implementation of the solution [9, 10].

In this paper, a method for constructing a neural-like architecture based on discrete trainable structures is proposed to improve the compatibility of artificial neural network models with the digital basis of programmable logic chips and general-purpose processors.

2 Model of the Gate Neural Network

The trainable gate network is a representative of Boolean networks [5, 11,12,13,14,15,16] with the ability to specify the mapping of the input signal vector to the output vector by means of a learning algorithm. Such a network can be considered an attempt to combine certain features of neural network technology and combinational logic gates in order to achieve a synergistic effect in the implementation of high-performance embedded systems.

Let us obtain a formalized representation of this type of network. It is known from discrete mathematics that the full disjunctive normal form (FDNF) can be represented as follows:

$$ f\left( {x_{1} , \ldots ,x_{P} } \right) = \mathop \vee \limits_{\begin{subarray}{l} \left( {\sigma_{1} , \ldots ,\sigma_{P} } \right) \\ f\left( {\sigma_{1} , \ldots ,\sigma_{P} } \right) = 1 \end{subarray} } x_{1}^{{\sigma_{1} }} \wedge \ldots \wedge x_{P}^{{\sigma_{P} }} , $$
(1)

where the disjunction is taken over all sets of arguments for which:

$$ y = f\left( {\sigma_{1} , \ldots ,\sigma_{P} } \right) = 1 $$
(2)

Rule (2) can be reformulated as a disjunction over all full product terms (FPT) of P variables:

$$ \mathop \vee \limits_{n = 1}^{{2^{P} }} \psi_{n} \left( {\mathbf{x}} \right) = 1 $$
(3)

Then the minterm (full product term) can be written in the following way:

$$ \psi_{n} \left( {\mathbf{x}} \right) = \mathop \wedge \limits_{p = 1}^{P} x_{p}^{{M_{p} \left( {n - 1} \right)}} $$
(4)

Next, we define the function \( M_{p} \left( \alpha \right) \):

$$ M_{p} \left( \alpha \right) = \left\{ {\begin{array}{*{20}l} {0,} \hfill & {\alpha \in \, \left[ {i \cdot T;\,\left( {i + 0.5} \right) \cdot T} \right),} \hfill \\ {1,} \hfill & {{\text{otherwise}},} \hfill \\ \end{array} } \right. $$
(5)

where the period \( T = 2^{p} ,\,i = 0,\,1,\,2, \ldots ,\frac{{2^{P} }}{T} - 1. \)

Function (5) is a square-wave logical basis similar to the Rademacher function [17]. Figure 1 shows the form of this function for p ≤ 3.

Fig. 1. View of the square-wave function for p ≤ 3
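
For clarity, a minimal Python sketch of the square-wave mask (5) and of the minterm construction (4) is given below; it assumes the standard convention \( x^{1} = x \), \( x^{0} = \bar{x} \), and the names square_wave and minterm are illustrative rather than taken from the paper.

```python
def square_wave(p: int, alpha: int) -> int:
    """Square-wave mask M_p(alpha) from Eq. (5): the period is T = 2**p,
    the value is 0 on the first half of each period and 1 on the second."""
    T = 2 ** p
    return 0 if (alpha % T) < T // 2 else 1


def minterm(n: int, x: list) -> int:
    """Minterm psi_n(x) from Eq. (4): the conjunction over p of x_p raised to
    M_p(n - 1), where x**1 denotes the variable and x**0 its negation."""
    result = 1
    for p in range(1, len(x) + 1):
        literal = x[p - 1] if square_wave(p, n - 1) else 1 - x[p - 1]
        result &= literal
    return result


# For P = 2 inputs, minterm(1, x) = ~x1 & ~x2, ..., minterm(4, x) = x1 & x2.
assert [minterm(n, [1, 0]) for n in range(1, 5)] == [0, 1, 0, 0]
```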

The square-wave function masks each variable in Eq. (4) so that all FPTs are specified. Next, we represent the FPTs (3) in vector form:

$$ {\mathbf{\uppsi }} = \left[ {{\mathbf{\uppsi }}_{1} \left( {\mathbf{x}} \right),\;{\mathbf{\uppsi }}_{2} \left( {\mathbf{x}} \right),\; \ldots ,\;{\mathbf{\uppsi }}_{N} \left( {\mathbf{x}} \right)} \right], $$
(6)

where x is the column vector of input signals:

$$ {\mathbf{x}} = \left[ {x_{1} ,\;x_{2} ,\; \ldots ,\;x_{P} } \right]^{T} . $$
(7)

Next, we weight the functions of the input signals in vector form, as is known from the theory of neural networks [1, 18, 19]:

$$ {\mathbf{w}}^{T} \wedge \mathbf{\uppsi } = {\mathbf{y}}, $$
(8)

where w is the column vector (9) and y is the column vector (10):

$$ {\mathbf{w}} = \left[ {w_{1} ,\;w_{2} ,\; \ldots ,\;w_{N} } \right]^{T} , $$
(9)
$$ {\mathbf{y}} = \left[ {y_{1} ,\;y_{2} ,\; \ldots ,\;y_{S} } \right]^{T} . $$
(10)

The matrix equation (8) has a form similar to the equations describing the formal neuron, the radial basis function network [18, 19], and the sigma-pi network [20], but here the multiplication operation is replaced by conjunction, since the matrices are binary.

For a network containing one element in the output layer, we get the following expression:

$$ {\mathbf{w}}^{T} \wedge \mathbf{\uppsi } = \left[ {w_{1} ,\;w_{2} ,\; \ldots ,\;w_{N} } \right] \wedge \left[ {\begin{array}{*{20}c} {\mathbf{\uppsi }_{1} \left( {\mathbf{x}} \right)} \\ {\mathbf{\uppsi }_{2} \left( {\mathbf{x}} \right)} \\ \cdots \\ {\mathbf{\uppsi }_{N} \left( {\mathbf{x}} \right)} \\ \end{array} } \right] = \mathop \vee \limits_{n = 1}^{N} w_{n} \wedge \mathbf{\uppsi }_{n} \left( {\mathbf{x}} \right). $$
(11)

Next, we substitute (4) into (11), and obtain the following relation in the general form:

$$ y = \mathop \vee \limits_{n = 1}^{N} w_{n} \wedge \mathop \wedge \limits_{p = 1}^{P} x_{p}^{{M_{p} \left( {n - 1} \right)}} $$
(12)

Equation (12) is the model of a trainable Boolean (gate) network. It follows from expression (12) that the model contains no operators inherent to conventional neural networks, since all operations are bit-oriented. The weights are Boolean variables rather than real numbers. The model describes a two-layer network in which the first layer is a set of N minterm units (4); this layer does not require training. The output layer is represented by a single disjunctive element, which sums the minterms enabled by the weight coefficients.
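
For illustration, a minimal Python sketch of the single-output model (12) might look as follows; it relies on the observation that, for integer arguments, \( M_{p} \left( {n - 1} \right) \) in (5) is simply bit \( p - 1 \) of \( n - 1 \), and the names gate_output and w_xor are illustrative rather than taken from the paper.

```python
def gate_output(w: list, x: list) -> int:
    """Single-output trainable gate network, Eq. (12):
    y = OR over n of ( w_n AND psi_n(x) ),
    where psi_n(x) is the n-th minterm of the inputs."""
    P = len(x)
    y = 0
    for n in range(1, len(w) + 1):
        # psi_n(x): conjunction of literals selected by M_p(n-1),
        # which equals bit (p-1) of (n-1) for integer arguments
        psi = 1
        for p in range(1, P + 1):
            m = (n - 1) >> (p - 1) & 1
            psi &= x[p - 1] if m else 1 - x[p - 1]
        y |= w[n - 1] & psi
    return y


# Usage: with P = 2 inputs and N = 4 minterms, enabling the weights of the
# minterms x1 & ~x2 and ~x1 & x2 realizes XOR.
w_xor = [0, 1, 1, 0]
assert [gate_output(w_xor, [a, b]) for a, b in
        [(0, 0), (0, 1), (1, 0), (1, 1)]] == [0, 1, 1, 0]
```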

A similar dependence can be obtained for a network with several elements in the output layer:

$$ {\mathbf{w}}^{T} \wedge \mathbf{\uppsi } = \left[ {\begin{array}{*{20}c} {w_{11} ,\;w_{12} ,\; \ldots ,\;w_{1N} } \\ {w_{21} ,\;w_{22} ,\; \ldots ,\;w_{2N} } \\ \cdots \\ {w_{S1} ,\;w_{S2} ,\; \ldots ,\;w_{SN} } \\ \end{array} } \right] \wedge \left[ {\begin{array}{*{20}c} {\mathbf{\uppsi }_{1} \left( {\mathbf{x}} \right)} \\ {\mathbf{\uppsi }_{2} \left( {\mathbf{x}} \right)} \\ \cdots \\ {\mathbf{\uppsi }_{N} \left( {\mathbf{x}} \right)} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {\mathop \vee \limits_{n = 1}^{N} w_{1n} \wedge \mathbf{\uppsi }_{n} \left( {\mathbf{x}} \right)} \\ {\mathop \vee \limits_{n = 1}^{N} w_{2n} \wedge \mathbf{\uppsi }_{n} \left( {\mathbf{x}} \right)} \\ \cdots \\ {\mathop \vee \limits_{n = 1}^{N} w_{Sn} \wedge \mathbf{\uppsi }_{n} \left( {\mathbf{x}} \right)} \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} {y_{1} } \\ {y_{2} } \\ \cdots \\ {y_{S} } \\ \end{array} } \right]. $$
(13)

Then, from (13), the equation for each output can be written in general form:

$$ y_{s} = \mathop \vee \limits_{n = 1}^{N} w_{sn} \wedge \mathop \wedge \limits_{p = 1}^{P} x_{p}^{{M_{p} \left( {n - 1} \right)}} $$
(14)

The analysis of dependences (13) and (14) shows that on their basis it is possible to synthesize an arbitrary combinational device with P inputs and S outputs, which has two levels of gates and offers increased speed in a hardware implementation. These formulas represent a trainable logical basis. Figure 2 shows a graph of the network.

Fig. 2. Trainable gate neural network
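
As an illustration of (13) and (14), a minimal multi-output sketch is given below; it realizes a half-adder (sum and carry) with P = 2 inputs and S = 2 outputs in two gate levels. The names gate_layer and W_half_adder are illustrative.

```python
def gate_layer(W, x):
    """Multi-output gate network, Eq. (14): y_s = OR over n of (w_sn AND psi_n(x))."""
    P = len(x)
    # First layer: all N = 2**P minterms of the inputs (this layer is not trained)
    psi = []
    for alpha in range(2 ** P):                      # alpha = n - 1
        term = 1
        for p in range(P):
            term &= x[p] if (alpha >> p) & 1 else 1 - x[p]
        psi.append(term)
    # Output layer: each row of W enables the minterms of one output
    return [int(any(w & t for w, t in zip(row, psi))) for row in W]


# Half-adder: sum = x1 XOR x2 (minterms 2 and 3), carry = x1 AND x2 (minterm 4)
W_half_adder = [[0, 1, 1, 0],
                [0, 0, 0, 1]]
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    s, c = gate_layer(W_half_adder, [a, b])
    assert (s, c) == (a ^ b, a & b)
```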

It is known that the maximum number of combinations of P variables is \( 2^{P} \), and the number of Boolean functions of P variables is \( 2^{{2^{P} }} \). It follows that the number of neurons in the first layer is at most \( 2^{P} \):

$$ N \le 2^{P} . $$
(15)

In turn, the number of neurons in the output layer does not exceed \( 2^{{2^{P} }} \):

$$ S \le 2^{{2^{P} }} . $$
(16)

Thus, the maximum values in (15) and (16) describe the largest network without repeated elements. However, duplication of elements can be used to increase the reliability of the network.
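
For example, for \( P = 3 \) inputs the first layer contains at most \( N = 2^{3} = 8 \) minterm units, while an output layer without repeated elements contains at most \( S = 2^{{2^{3} }} = 256 \) elements, one for each distinct Boolean function of three variables.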

It is not difficult to show that the obtained model can be realized in the form of a full conjunctive normal form (FCNF). On the basis of de Morgan’s laws for several variables [21], we can show:

$$ \mathop \vee \limits_{n = 1}^{N} a_{n} = \overline{{\mathop \wedge \limits_{n = 1}^{N} \bar{a}_{n} }} . $$
(17)

Applying rule (17) to expression (14), we obtain:

$$ y_{s} = \overline{{\mathop \wedge \limits_{n = 1}^{N} \left( {\bar{w}_{sn} \vee \mathop \vee \limits_{p = 1}^{P} x_{p}^{{\bar{M}_{p} \left( {n - 1} \right)}} } \right)}} . $$
(18)

Next, replacing the variables, we get the FCNF:

$$ \lambda_{s} = \mathop \wedge \limits_{n = 1}^{N} \left( {m_{sn} \vee \mathop \vee \limits_{p = 1}^{P} x_{p}^{{W_{p} \left( {n - 1} \right)}} } \right) . $$
(19)

Equations (12) and (19) are equivalent in essence, just as the FCNF and the FDNF are equivalent. It can be seen from (19) that the weighting is performed by the disjunction operation, in contrast to (12).
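
As a plausibility check of this equivalence, a brute-force sketch is given below: with the complemented masks \( W_{p} = \bar{M}_{p} \), setting the weight of term n to the target value \( f(n - 1) \) enables the needed minterms in the sum-of-products form (12) and keeps the needed maxterms in the product-of-sums form (19), so both forms reproduce an arbitrary truth table. The names fdnf_out and fcnf_out are illustrative.

```python
from itertools import product
import random


def fdnf_out(w, x):
    """Eq. (12): y = OR_n ( w_n AND AND_p x_p^{M_p(n-1)} )."""
    P, y = len(x), 0
    for n, wn in enumerate(w, start=1):
        psi = all(x[p] if (n - 1) >> p & 1 else 1 - x[p] for p in range(P))
        y |= wn & psi
    return y


def fcnf_out(m, x):
    """Eq. (19): lambda = AND_n ( m_n OR OR_p x_p^{W_p(n-1)} ), with W_p = NOT M_p."""
    P, lam = len(x), 1
    for n, mn in enumerate(m, start=1):
        maxterm = any(x[p] if not ((n - 1) >> p & 1) else 1 - x[p] for p in range(P))
        lam &= mn | maxterm
    return lam


# Both forms realize the same arbitrary truth table f when the weight of term n
# equals f(n-1): a weight of 1 enables a minterm in (12) and disables the
# corresponding maxterm factor in (19).
P = 3
f = [random.randint(0, 1) for _ in range(2 ** P)]
for x in product([0, 1], repeat=P):
    idx = sum(bit << p for p, bit in enumerate(x))
    assert fdnf_out(f, list(x)) == fcnf_out(f, list(x)) == f[idx]
```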

3 Network Learning Algorithm

The learning algorithm of the perceptron according to the Widrow-Hoff rule is known from the theory of neural networks [18, 19]:

$$ w_{sn} \left( {t + 1} \right) = w_{sn} \left( t \right) + \Delta w_{sn} \left( t \right) , $$
(20)
$$ \Delta w_{sn} \left( t \right) = x_{n} \left( t \right) \cdot \left( {d_{s} - y_{s} \left( t \right)} \right) . $$
(21)

On the basis of (20) and (21), it is easy to see the following:

  • weight \( w_{sn} \) can increase or decrease depending on the sign of the weight increment \( \Delta w_{sn} \);

  • the weight changes when the output signal \( y_{s} \) deviates from the reference \( d_{s} \), and only for the input \( x_{n} \) that causes this deviation.

Using these statements, we can derive the training algorithm for a binary network. We convert these formulas into a system of residual classes, in which additive operations and multiplication take the following form [22]:

$$ \left( {a \pm b} \right)\bmod c = \left( {\left( {a\bmod c} \right) \pm \left( {b\bmod c} \right)} \right)\bmod c, $$
(22)
$$ \left( {a \cdot b} \right)\bmod c = \left( {\left( {a\bmod c} \right) \cdot \left( {b\bmod c} \right)} \right)\bmod c. $$
(23)

We rewrite (20) and (21) using (22) and (23). The Widrow-Hoff rule then takes the form typical of operations performed by digital devices:

$$ \begin{aligned} w_{sn} \left( {t + 1} \right)\bmod q = \left( {w_{sn} \left( t \right) + \Delta w_{sn} \left( t \right)} \right)\bmod q \hfill \\ = \left( {w_{sn} \left( t \right)\bmod q + \Delta w_{sn} \left( t \right)\bmod q} \right)\bmod q, \hfill \\ \end{aligned} $$
(24)
$$ \begin{aligned} \Delta w_{sn} \left( t \right)\bmod q = \left( {x_{n} \left( t \right) \cdot \left( {d_{s} - y_{s} \left( t \right)} \right)} \right)\bmod q \hfill \\ = \left( {\left( {x_{n} \left( t \right)\bmod q} \right) \cdot \left( {\left( {d_{s} } \right)\bmod q - y_{s} \left( t \right)\bmod q} \right)} \right)\bmod q, \hfill \\ \end{aligned} $$
(25)

where q is a positive integer.

We require that all variables in (24) and (25) take only two states, that is, that the modulus equals 2. Considering that modulo-2 additive operations can be replaced by the exclusive-OR operation and multiplication by conjunction, the Widrow-Hoff rule can be written in the following form:

$$ w_{sn} \left( {t + 1} \right) = w_{sn} \left( t \right) \oplus x_{n} \left( t \right) \wedge \left( {d_{s} \oplus y_{s} \left( t \right)} \right). $$
(26)
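
As a quick sanity check of this reduction (not part of the original derivation), the following lines verify by enumeration that modulo-2 addition and subtraction coincide with exclusive OR and that modulo-2 multiplication coincides with conjunction.

```python
for a in (0, 1):
    for b in (0, 1):
        assert (a + b) % 2 == (a - b) % 2 == a ^ b   # mod-2 add/sub is XOR
        assert (a * b) % 2 == a & b                  # mod-2 product is AND
```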

We apply rule (26) to the obtained network model (12). Taking into account the influence of the minterms (4) on the trained element, we obtain the learning rule for the Boolean network:

$$ w_{sn} \left( {t + 1} \right) = w_{sn} \left( t \right) \oplus \left( {d_{s} \oplus y_{s} \left( t \right)} \right) \wedge \mathop \wedge \limits_{p = 1}^{P} \left( {x_{p} \left( t \right)} \right)^{{M_{p} \left( {n - 1} \right)}} . $$
(27)
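
To make rule (27) concrete, a minimal training sketch for the single-output network (12) is given below; the target function (XOR of two inputs), the number of epochs, and the names train_gate_network and minterms are illustrative choices, not taken from the paper.

```python
from itertools import product


def minterms(x):
    """All N = 2**P minterms of the input vector x; element k corresponds to
    psi_{k+1}(x) in Eq. (4)."""
    P = len(x)
    return [int(all(x[p] if (alpha >> p) & 1 else 1 - x[p] for p in range(P)))
            for alpha in range(2 ** P)]


def train_gate_network(truth_table, P, epochs=4):
    """Train the single-output gate network (12) with rule (27):
    w_n(t+1) = w_n(t) XOR ( (d XOR y(t)) AND psi_n(x(t)) )."""
    w = [0] * (2 ** P)
    for _ in range(epochs):
        for x, d in truth_table:
            psi = minterms(x)
            y = int(any(wn & pn for wn, pn in zip(w, psi)))        # Eq. (12)
            w = [wn ^ ((d ^ y) & pn) for wn, pn in zip(w, psi)]    # Eq. (27)
    return w


# Example: learning XOR of two inputs; the learned weights enable exactly
# the two minterms x1 & ~x2 and ~x1 & x2.
table = [((a, b), a ^ b) for a, b in product((0, 1), repeat=2)]
w = train_gate_network(table, P=2)
for x, d in table:
    assert int(any(wn & pn for wn, pn in zip(w, minterms(x)))) == d
```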

4 Analysis of the Results

On the basis of the dependence (12), the following features of the model can be noted:

  • the model is a network;

  • the first and second layers are specialized;

  • signals can be either excitatory or inhibitory;

  • the type of generalization is different for FDNF and FCNF networks;

  • there is no influence of minterms (maxterms) on each other.

Unlike formal models of neural networks, the Boolean network operates with the concepts of discrete mathematics. From the standpoint of an intelligent approach, processing only binary input signals may seem insufficient when working with higher-order sets, but the essential feature of formulas (12) and (19) is that they can be applied as a logical basis controlled by weight coefficients. It is known that arbitrary combinational devices can be constructed from a Boolean basis. Furthermore, in an actual implementation, the trainable gate network is characterized by greater performance and reliability owing to the fixed depth of the gates and the simplicity of the individual processing elements. For more complicated tasks, a cascade of gate networks can be used. In this case, the topology of the device is more homogeneous, which makes its individual elements interchangeable.

The developed network can be considered as a basis for constructing feedforward neural networks with a flexible topology that can be adapted to a specific task, up to the level of logical elements.

The proposed approach has the following advantages:

  1. Greater homogeneity of the topology of the device, in contrast to the formal neuron, which contains adders, multipliers, and activation functions.

  2. Increase of the applied component at the hardware level for solving specific problems.

  3. Reduction of the chip area required for the hardware implementation of the network.

  4. Parallelization of the processing and learning of the network at the level of logic elements.

  5. A flexible, trainable architecture in place of the formal neuron.

5 Conclusion

Work in the field of discrete trainable networks is aimed at optimizing the hardware and software costs of building neural networks and digital equipment in general. The trainable gate network is not intended to replace a feedforward neural network, but it can serve as a basis for constructing any digital network. The possibilities of gate networks are quite varied: they can find application in associative memory devices, cryptography, high-performance combinational devices, solvers of Boolean functions, and other areas.