1 Introduction

The growing amount of data requires the development of powerful and reliable tools for processing it. Artificial neural networks (ANNs) are biologically inspired architectures that outperform most conventional methods in many data processing tasks. For instance, feedforward neural networks are quite popular for classification problems [1, 2]. However, they are ineffective at dealing with sequential data that contain long-term dependencies. Unlike feedforward neural networks, recurrent neural networks (RNNs) possess internal memory and are capable of retaining the order of information and sharing parameters across the sequence.

Fig. 1 Unfolded recurrent neural network

Early RNN architectures were introduced by Hopfield and Jordan in the 1980s [3, 4]. An RNN can be represented as a chain of neural networks, each passing information to a successor network (Fig. 1). The current state cell \(S_{1}\) receives a new input \(X_{1}\) along with the hidden-layer information of the previous cell \(S_{0}\) and produces an output \(Y_{1}\). The algorithm used to train RNNs is called backpropagation through time (BPTT). It computes the derivatives of the loss at each timestep and sums them across time for each parameter. As the gap between timesteps grows, the vanishing gradient problem arises [5]. Long short-term memory (LSTM) is a special configuration of RNN introduced to overcome this vanishing gradient problem [6].
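To make the recurrence concrete, consider a minimal behavioral sketch in Python (assuming NumPy; the names rnn_forward, W_x, W_h, and b are illustrative placeholders, not taken from the cited works):

```python
import numpy as np

def rnn_forward(x_seq, W_x, W_h, b):
    """Unrolled RNN: every timestep applies the same parameters and
    passes its hidden state to the successor cell (cf. Fig. 1)."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for x_t in x_seq:
        # S_t is computed from the new input X_t and the previous state S_{t-1}
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        outputs.append(h)
    return outputs
```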

Similar to an RNN, an LSTM also has a chain-like structure (Fig. 2), but each unit of the chain has a gated structure. A traditional LSTM cell consists of a memory cell that stores state information and three gate layers that control the flow of information within the cell and across the network.

Fig. 2 LSTM network

2 Long short-term memory circuit architecture

2.1 LSTM structure

Figure 3 shows the basic LSTM cell structure. The core of the unit is the internal state storage. The information in the cell state is updated by the forget, input, and output layers.

Input gate First, the new input data and the output data from the previous cell are concatenated into a single vector. The vector then passes through the input gate, whose behavior is described by the equation:

$$\begin{aligned} i_{t}=\sigma (W^{(i)}x_{t}+U^{(i)}h_{t-1}+b^{(i)}) \end{aligned}$$

At the same time, the concatenated vector is squashed between −1 and 1 by applying the hyperbolic tangent activation function:

$$\begin{aligned} g_{t}=\tanh (W^{(g)}x_{t}+U^{(g)}h_{t-1}+b^{(g)}) \end{aligned}$$

The obtained output \(g_{t}\) is elementwise multiplied with the output of the input gate \(i_{t}\). Since the value of \(i_{t}\) lies between 0 and 1, the product \(i_{t}\bigodot g_{t}\) acts as a filter that forms the intermediate cell state \({\tilde{C}}_{t}\).
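As a behavioral sketch of this filtering step (a NumPy illustration; the parameter names simply mirror the symbols in the equations above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def intermediate_state(x_t, h_prev, W_i, U_i, b_i, W_g, U_g, b_g):
    """Input-gate filtering of the candidate values."""
    i_t = sigmoid(W_i @ x_t + U_i @ h_prev + b_i)  # filter values in (0, 1)
    g_t = np.tanh(W_g @ x_t + U_g @ h_prev + b_g)  # candidate values in (-1, 1)
    return i_t * g_t                               # intermediate cell state C~_t
```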

Fig. 3 LSTM unit structure

Forget gate The decision whether to pass information from the previous cell state to the current cell or to block it completely is made by the output of the forget gate, which is given by the following equation:

$$\begin{aligned} f_{t}=\sigma (W^{(f)}x_{t}+U^{(f)}h_{t-1}+b^{(f)}) \end{aligned}$$

Its value also lies between 0 and 1.

Cell state The internal state of an LSTM cell is the sum of two components: the intermediate cell state \({\tilde{C}}_{t}\) produced by the input gate, and the previous cell state scaled by the forget gate. Since the forget gate decides whether to keep information or remove it, the LSTM does not suffer from the vanishing gradient problem.

$$\begin{aligned} c_{t}={\tilde{C}}_{t}+f_{t}\bigodot c_{t-1} \end{aligned}$$

Output gate The output of an LSTM cell is a vector \(h_{t}\). It is the pointwise multiplication of the output gate's sigmoid layer and the cell state squashed between −1 and 1 by the hyperbolic tangent activation function:

$$\begin{aligned} h_{t}=o_{t}\bigodot \tanh (c_{t}), \end{aligned}$$

where

$$\begin{aligned} o_{t}=\sigma (W^{(o)}x_{t}+U^{(o)}h_{t-1}+b^{(o)}). \end{aligned}$$
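Putting the four equations together, one LSTM timestep can be sketched behaviorally as follows (a minimal NumPy model; the parameter dictionary p is an illustrative convention, not part of the proposed hardware):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM timestep; p maps gate names to weights W, U and biases b."""
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])  # input gate
    g_t = np.tanh(p["W_g"] @ x_t + p["U_g"] @ h_prev + p["b_g"])  # candidate
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])  # forget gate
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])  # output gate
    c_t = i_t * g_t + f_t * c_prev   # cell state: C~_t plus gated previous state
    h_t = o_t * np.tanh(c_t)         # cell output
    return h_t, c_t
```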

2.2 Matrix-vector multiplication

Matrix-vector multiplication is a significant operation in the LSTM gating system, and its accuracy plays a vital role. The proposed architecture for implementing matrix-vector multiplication is based on a crossbar array built from a novel device called the "memristor".

Fig. 4 A memristor crossbar array with 4 inputs and 4 outputs

Memristor and memristor crossbar array A memristor is a non-volatile two-terminal device with variable resistance. Its existence was first postulated by Leon Chua in 1971: he predicted a device that maintains a relationship between charge and magnetic flux [7]. In 2008, HP Labs announced the discovery of a device possessing these characteristics [8, 9].

A memristive crossbar array (Fig. 4) consists of a large number of intersecting rows and columns with memristors at the junctions. An input voltage vector is applied to the rows of the crossbar and multiplied by the conductances of the memristors. The output of the crossbar is the sum of the currents along each column [10].
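Behaviorally, an ideal crossbar read reduces to a single vector-matrix product (a NumPy sketch with illustrative conductance values; non-idealities such as wire resistance and sneak paths are ignored):

```python
import numpy as np

V = np.array([0.3, -0.1, 0.2, 0.4])  # input voltages applied to the 4 rows
G = np.full((4, 4), 5e-5)            # memristor conductances in siemens (illustrative)

# Each column current is the sum of the row voltages weighted by the
# conductances at its junctions: I_j = sum_i V_i * G_ij
I = G.T @ V
```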

Memristor neuron circuit Figure 5 shows a memristor neuron circuit with three inputs and one output. Each input is connected to a pair of memristors with conductances \(\sigma ^{+}\) and \(\sigma ^{-}\). If \(\sigma ^{+} > \sigma ^{-}\), the resulting memristor conductance gives a positive weight; otherwise, a negative weight [11].

$$\begin{aligned} (V_{1}\sigma ^{(1)+} + V_{2}\sigma ^{(2)+}) - (V_{1}\sigma ^{(1)-} + V_{2}\sigma ^{(2)-}) = V_{1}(\sigma ^{(1)+} - \sigma ^{(1)-}) + V_{2}(\sigma ^{(2)+} - \sigma ^{(2)-}) \end{aligned}$$

The idea of this circuit can be further extended to implement the vector-matrix multiplication of the gating layers in the LSTM. The difference of the two memristor conductances gives the required resulting weight, which can take both positive and negative values.
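A short sketch of this differential encoding (illustrative conductance values, ideal devices assumed): each weight is realized by a conductance pair, so the same crossbar read yields signed weighted sums.

```python
import numpy as np

V = np.array([0.3, -0.1])                         # input voltages
G_plus = np.array([[6e-5, 2e-5], [4e-5, 7e-5]])   # sigma+ conductances
G_minus = np.array([[3e-5, 5e-5], [4e-5, 1e-5]])  # sigma- conductances

# Effective weight of each connection: sigma+ - sigma-, positive when
# sigma+ exceeds sigma- and negative otherwise.
y = (G_plus - G_minus).T @ V
```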

2.3 Activation layer circuit

The activation function in a traditional LSTM cell squashes each element of the output of the vector-matrix multiplier either between 0 and 1 or between −1 and 1. To realize the sigmoid and hyperbolic tangent functions, a CMOS-memristive thresholding circuit is used; it is depicted in Fig. 6. The memristor-inverter combination sets the threshold level, whereas the breakdown voltage of a Zener diode determines the maximum height of the output voltage.
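Behaviorally, the thresholding circuit approximates the activations by a clipped linear response (a sketch only; the threshold, gain, and saturation values below are illustrative placeholders, not fitted to the circuit of Fig. 6):

```python
import numpy as np

def sigmoid_like(v_in, v_th=0.0, v_max=1.0, gain=5.0):
    """Threshold set by the memristor-inverter pair; the Zener
    breakdown voltage caps the output height at v_max."""
    return np.clip(gain * (v_in - v_th), 0.0, v_max)

def tanh_like(v_in, v_th=0.0, v_max=1.0, gain=5.0):
    """Symmetric variant approximating the hyperbolic tangent response."""
    return np.clip(gain * (v_in - v_th), -v_max, v_max)
```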

Fig. 5 Memristor neuron circuit diagram

Fig. 6 Activation circuit

Fig. 7 Activation circuit response: a sigmoid; b hyperbolic tangent

2.4 Voltage multiplier circuit

The implementation of the pointwise multiplication (also known as the Hadamard or Schur product) is presented in Fig. 8. The circuit consists of one NMOS transistor T1, two differential amplifiers, two inverters, a buffer, and an IV converter circuit.

Fig. 8 Two-voltage multiplication circuit

Fig. 9 Voltage multiplier inputs: a multiplier input 1; b multiplier input 2

3 Results

The responses of the circuit in Fig. 6 for the sigmoid and hyperbolic tangent functions are provided in Fig. 7. The circuit is implemented in 0.18 \(\upmu\)m CMOS technology. To obtain hyperbolic tangent values, the corresponding voltages should be set to \(V_{dd1} = 1.3\,V; V_{ss1} = -\,0.5\,V; V_{p1}=0.4\,V; V_{n1} = -\,1.2\,V; V_{dd2} = 1.3\,V; V_{ss2} = -1.1\,V; V_{p2}=0.5\,V\) and \(V_{n2} =-\,0.4\,V\). The MOSFET transistor sizes are \(M_{p1}=M_{p2}=0.18\,\upmu \mathrm{m}/3\,\upmu \mathrm{m}\), \(M_{n1}=0.18\,\upmu \mathrm{m}/4\,\upmu \mathrm{m}\), and \(M_{n2}=0.18\,\upmu \mathrm{m}/4.5\,\upmu \mathrm{m}\).

Voltage multiplication (Fig. 8) is performed by transistor T1 (0.18 \(\upmu\)m CMOS technology, \(T_{1}=2\,\upmu \mathrm{m}/2\,\upmu \mathrm{m}\)). The voltages to be multiplied, \(V_{in1}\) and \(V_{in2}\) (see Fig. 9), are applied to the drain and gate of transistor T1, respectively. The resulting output current is converted back to a voltage by the IV converter. The obtained voltage has the opposite sign and is therefore inverted again by Inverter 2. Since the voltage applied to the drain of transistor T1 must be negative and stay within the range (−0.45:0) V, differential amplifier 1 is placed between input 1 and T1. Similarly, as the gate voltage of transistor T1 must always be positive and lie in the range (0:0.45) V, differential amplifier 2 is placed between input 2 and transistor T1.

In our LSTM unit, the Hadamard product is used to multiply LSTM gate outputs, which take values in (0:1) when the activation function is a sigmoid and in (−1:1) when it is the hyperbolic tangent. Considering the requirements on the input values of transistor T1, \(V_{in1}\) is set to be the output of the hyperbolic tangent function and \(V_{in2}\) the output of the sigmoid function. The sigmoid thereby fulfills the requirement of a positive voltage entering the gate of T1, as its values lie between 0 V and 1 V. As the hyperbolic tangent values lie in (−1:1) V and the drain input can only take negative voltages, switch 1 and switch 2 are utilized to control the polarity of the multiplier output voltage. If the hyperbolic tangent output is in the range (0:1) V, switch 1 passes the signal through Inverter 1 and the buffer. The inverted, now negative, voltage then enters the drain of T1 and is multiplied with \(V_{in2}\). Switch 2 is open and the resulting \(V_{out}\) is taken from the IV converter output. Figures 9 and 10 illustrate the operation of the voltage multiplier circuit. The total area of one voltage multiplier circuit is 2871.00 \(\upmu \mathrm{m}^2\) and its power consumption is 8.517 mW (Fig. 11).
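The switching logic can be summarized behaviorally as follows (a sketch under the voltage ranges stated above; the linear scaling of the inputs into the operating ranges of T1 and the final magnitude recovery are illustrative assumptions, not a transistor-level model):

```python
def voltage_multiplier(v_tanh, v_sigmoid, v_range=0.45):
    """Behavioral sketch of the circuit in Fig. 8."""
    # Differential amplifier 2: sigmoid output (0:1) V -> gate range (0:0.45) V
    v_gate = v_sigmoid * v_range
    # Switch 1 / Inverter 1: a positive tanh output is inverted so that
    # the drain always sees a negative voltage in (-0.45:0) V
    v_drain = -abs(v_tanh) * v_range
    product = v_drain * v_gate            # formed by T1 acting as a multiplier
    # IV converter, Inverter 2, and switch 2 restore the output polarity
    # to match the sign of the tanh input
    return abs(product) if v_tanh >= 0 else -abs(product)
```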

Fig. 10 Transistor T1 voltages: a transistor T1 drain voltage; b transistor T1 gate voltage

Fig. 11 Voltage multiplier output

4 Conclusion

This work proposes a hardware architecture for implementing the LSTM algorithm for processing and storing sequential data. The architecture is designed in 0.18 \(\upmu\)m CMOS technology together with novel devices called memristors. Utilizing a memristor crossbar array to realize the vector-matrix multiplication within the gate layers allows high scalability along with compatibility with CMOS technology, owing to its nanoscale size and absence of leakage. The simulation results of the circuits realizing the basic computational operations of the traditional LSTM show that they can be used to design other LSTM configurations.