1 Introduction

With the rapid development of artificial intelligence, neural networks are attracting more and more attention and play an important role in fields such as image recognition, speech recognition [1] and automatic control. Up to now, most neural networks have been realized in software, which cannot exploit the inherent parallelism of neural networks. Circuit operations, by contrast, are inherently parallel and capable of providing high-speed computation. Therefore, it is necessary to research hardware implementations of neural networks.

Regarding hardware implementations of neural networks, many studies have already been conducted [2,3,4,5,6,7,8], most of which implemented neural networks in CMOS technology. However, the implementation of nonvolatile weight storage is a major bottleneck for these approaches. Moreover, when area and power consumption are taken into account, it is difficult to efficiently realize circuit-based neural networks using CMOS technology alone.

Recently, a novel device, the memristor [9,10,11,12,13], has provided a brand-new approach for hardware implementations of neural networks. The memristor, a nonvolatile programmable resistor, was first physically realized by HP Labs in 2008 [14]. Its resistance can be tuned by the current through it or the voltage across it. When the excitation is turned off, the device keeps its most recent resistance until the excitation is turned on again. Memristors are widely applied in chaotic circuits [15, 16], memory and neural networks. In memristor-based neural networks, the memristor's nonvolatility closely resembles the behavior of a biological synapse, which makes it a promising candidate for weight storage. Furthermore, as a nanodevice, the memristor can easily be integrated into analog circuits. For these reasons, memristors have been widely used in hardware implementations of neural networks.

Many memristor-based neural networks have previously been trained by spike-timing-dependent plasticity (STDP) [17,18,19,20,21], which was originally proposed to explain how the synaptic junction strength of biological neurons is tuned. For example, Sheri et al. realized a neuromorphic character recognition system with two PCMO memristors as a synapse [22]. Nishitani et al. modified an existing three-terminal ferroelectric memristor (3T-FeMEM) model and exploited STDP to perform supervised learning [19]. However, the convergence of STDP-based learning is not guaranteed for general inputs [23].

Other memristor-based learning systems have recently attracted attention. For example, memristor-based single-layer neural networks (SNNs) trained by the perceptron algorithm were designed to complete linearly separable tasks [24, 25]. However, the single-layer structure and binary outputs limit their range of applications. Therefore, constructing memristor-based multilayer neural networks (MNNs) [26,27,28,29], which are more practical and efficient than SNNs, is necessary. In 2012, Adhikari et al. put forward a modified chip-in-the-loop learning rule for MNNs with memristor bridge synapses, each consisting of three transistors and four identical memristors and realizing zero, negative and positive synaptic weights [30]. Unlike conventional chip-in-the-loop learning, the complex multilayer network learning was decomposed into multiple simple single-layer learning problems, and the gradient descent algorithm was used to train each single-layer network [30]. Their design simplifies the training process and reduces the communication overhead and the circuit complexity. However, the initial training and the weight update calculation were performed in software. In 2015, Soudry et al. designed an artificial synapse consisting of a single memristor and two transistors, and the BP algorithm was used to train the whole network [31]. Their design is notable in that the errors are encoded as corresponding weight update times, and the errors and inputs enter the weight updating process through the state variables of the memristors. However, the BP algorithm needs to propagate the error backward from the output layer to the hidden layers, and the weight update requires calculating the derivative of the activation function. These complex calculations are not easy to implement at the circuit level.

Focusing on the difficulty of efficiently implementing learning rules such as the BP algorithm in hardware, several hardware-friendly algorithms have been proposed and applied to the circuit implementation of neural networks. For example, in 2015, the random weight change (RWC) algorithm was used to train a memristor-based neural network in which the synaptic weights were randomly updated by a small constant in each iteration [32]. This design simplifies the circuit structure and does not involve complex circuit operations. However, in RWC's weight updating process, only the sign, not the value, of the error variation is considered, and the direction of the weight change is random at every step, so more iterations are needed.

In order to solve the problems of memristor-based neural networks trained by the RWC and BP algorithms, this paper proposes a new memristor-based neural network circuit trained by the weight simultaneous perturbation (WSP) algorithm. The WSP algorithm was introduced in [33]. Perturbations whose signs are randomly selected with equal probability are simultaneously added to all weights, and the difference between the error function without perturbations and the error function with perturbations is used to control the change of the weights. Unlike the RWC algorithm, both the sign and the value of the variation of the error function are exploited to update all weights; hence, fewer iterations are needed than with RWC. Additionally, unlike the BP algorithm, WSP involves no complex calculations such as differentiation and error back propagation. Although the algorithm is relatively efficient and simple, a memristor-based neural network circuit trained by the WSP algorithm has not yet been reported. The hardware architecture of a neural network trained by the WSP algorithm is therefore proposed in this paper. In addition, to solve the problem of storing and updating weights, a novel synaptic unit based on two identical memristors is designed. Taking full advantage of the characteristics of the WSP algorithm, this design needs only two forward computations per pattern to update all weight matrices, which greatly simplifies the circuit structure. This paper indicates that neural networks can be realized in a simpler, more compact and more reliable form at the circuit level.

The remainder of this paper is organized as follows. In Sect. 2, the basic information about the memristor, the RWC algorithm and the WSP algorithm is introduced. In Sect. 3, the circuit implementations of SNNs and MNNs are described in detail. In Sect. 4, the proposed circuit architectures are numerically evaluated on supervised tasks to demonstrate their practicability. Section 5 discusses the results, and Sect. 6 presents conclusions.

2 Preliminaries

In this section, the basic information about the memristor, the RWC algorithm and the WSP algorithm is described.

2.1 Memristor

In 1971, Chua proposed the missing fourth fundamental circuit element, the memristor, a nonlinear passive two-terminal electrical component relating electric charge and magnetic flux linkage [11]. The memristor's electrical resistance can be tuned by its current or voltage. Its physical model [34] is represented by the following formulas:

$$\begin{aligned}&i(t) = G(s(t))v(t) \end{aligned}$$
(1)
$$\begin{aligned}&\frac{{\mathrm{d}s(t)}}{{\mathrm{d}t}} = v(t) \end{aligned}$$
(2)

where G(s(t)) is the memductance. v(t) and i(t) represent voltage and current, respectively. s(t) denotes the state variable.

Assume that G(s(t)) is n-times differentiable near \({s^ * }\). Then G(s(t)) can be expanded in a Taylor series at \(s = {s^ * }\) as follows:

$$\begin{aligned} G(s)= & {} \frac{{G({s^\mathrm{{*}}})}}{{0!}} + \frac{{G'({s^\mathrm{{*}}})}}{{1!}}(s - {s^\mathrm{{*}}}) + \frac{{G''({s^\mathrm{{*}}})}}{{2!}}{(s - {s^\mathrm{{*}}})^2} \nonumber \\&+\cdots \, + \frac{{{G^{(n)}}({s^\mathrm{{*}}})}}{{n!}}{(s - {s^\mathrm{{*}}})^n} + \mathrm{O}\left[ {{{(s - {s^\mathrm{{*}}})}^n}} \right] \nonumber \\ \end{aligned}$$
(3)

where n is a positive integer, \({G^{(n)}}({s^ * })\) represents the nth-order derivative of the function G(s) at \(s = {s^*}\), and \(\mathrm{O}[{(s - {s^ * })^n}]\) represents a higher-order infinitesimal of \({(s - {s^ * })^n}\).

Soudry et al. demonstrated that if the variations in the value of s(t) are restricted to be small, G(s(t)) can be approximated by a first-order Taylor series around a certain point \({s^ * }\) [31]. The memductance is hence given by

$$\begin{aligned} G(s) = \frac{{G({s^\mathrm{{*}}})}}{{0!}} + \frac{{G'({s^\mathrm{{*}}})}}{{1!}}(s - {s^\mathrm{{*}}}) \end{aligned}$$
(4)

or, in simplified form,

$$\begin{aligned} G(s(t)) = {g^\mathrm{{*}}} + {\hat{g}}s(t) \end{aligned}$$
(5)

where \({\hat{g}} = {[\mathrm{d}G(s)/\mathrm{d}s]_{s = {s^ * }}}\) and \({g^ * } = G({s^ * }) - {\hat{g}}{s^ * }\).

Based on this memristor model, SNNs and MNNs can be efficiently implemented in hardware.
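To make the linearized model of Eqs. (1)-(5) concrete, the following minimal Python sketch integrates the state equation for a single memristor driven by a voltage pulse; the parameter values (g_star, g_hat, the pulse amplitude and the time step) are arbitrary placeholders and not device data.

```python
# Minimal sketch of the linearized memristor model, Eqs. (1)-(5).
# All numerical values below are illustrative assumptions, not device data.

def simulate_memristor(voltage, dt, g_star=1e-4, g_hat=1e-3, s0=0.0):
    """Integrate ds/dt = v(t) with forward Euler and return (state, current) traces."""
    s, states, currents = s0, [], []
    for v in voltage:
        g = g_star + g_hat * s          # linearized memductance, Eq. (5)
        currents.append(g * v)          # i(t) = G(s(t)) v(t), Eq. (1)
        s += v * dt                     # ds/dt = v(t), Eq. (2)
        states.append(s)
    return states, currents

if __name__ == "__main__":
    dt = 1e-6                                  # 1 us time step (assumed)
    pulse = [0.05] * 1000 + [-0.05] * 1000     # +50 mV then -50 mV
    s_trace, _ = simulate_memristor(pulse, dt)
    # A balanced bipolar pulse returns the state (and the memductance) to its initial value.
    print(f"final state s = {s_trace[-1]:.3e}")
```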

2.2 Random weight change

In the random weight change (RWC) algorithm, synaptic weights are randomly tuned by a small constant, and the learning rule is given as

$$\begin{aligned} \left\{ \begin{aligned}&{\omega _{ij}}(n + 1) = {\omega _{ij}}(n) + \Delta {\omega _{ij}}(n + 1)\\&\Delta {\omega _{ij}}(n + 1) = \Delta {\omega _{ij}}(n)\qquad \qquad \mathrm{if}\;E(n + 1) < E(n)\\&\Delta {\omega _{ij}}(n + 1) = \delta \cdot Rand(n)\qquad \;\;\mathrm{if}\;E(n + 1) \ge E(n) \end{aligned} \right. \end{aligned}$$
(6)

where \(E({\varvec{\cdot }})\) represents the squared error, \(\delta \) is a small constant, and Rand(n) is a random function that takes only two values, namely \(\pm \,1\). The weights of the network are changed randomly from the initial state with a small increment of \( +\,\delta \) or \( -\,\delta \). If \(E({\varvec{\cdot }})\) decreases after a weight change, the same weight change is repeated until the error increases. If \(E({\varvec{\cdot }})\) increases, the weights are again updated randomly [32]. The RWC algorithm is similar to Brownian weight movement [35]; both belong to the class of stochastic learning algorithms [36]. In RWC, only the sign of the variation of the error function is considered; its value is not utilized. Since the absolute value of the weight change is constant in RWC, this value is generally kept small in each weight update in order to ensure convergence. Moreover, the direction of the weight change is random at every step. Hence, the trajectory of the weights only follows the direction of energy decline on average, as shown in [32, Fig. 1]. Therefore, more iterations are needed in RWC.
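The update rule of Eq. (6) can be illustrated by the following Python sketch, which applies RWC to a toy quadratic error surface; the error function, the weight dimension and the constant \(\delta \) are hypothetical choices made only for this illustration.

```python
import random

# Illustrative RWC training loop implementing the rule of Eq. (6).
# The quadratic error surface and all constants are assumptions for this sketch.

def rwc_train(error_fn, w, delta=0.01, iterations=5000):
    e_prev = error_fn(w)
    dw = [delta * random.choice((-1, 1)) for _ in w]   # initial random change
    for _ in range(iterations):
        w = [wi + di for wi, di in zip(w, dw)]
        e = error_fn(w)
        if e >= e_prev:                                # error grew: draw a new random change
            dw = [delta * random.choice((-1, 1)) for _ in w]
        # otherwise the same change is repeated on the next iteration
        e_prev = e
    return w, e_prev

if __name__ == "__main__":
    target = [0.3, -0.7, 1.1]

    def error(w):
        return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))

    w_final, e_final = rwc_train(error, [0.0, 0.0, 0.0])
    print(f"final error = {e_final:.4f}")
```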

The WSP algorithm differs from the RWC algorithm. In WSP, both the sign and the value of the variation of the error function are used to tune the weights, so the available information is fully utilized, as described in Sect. 2.3. WSP also does not require complicated operations such as differentiation and error back propagation.

2.3 Weight simultaneous perturbation

For convenience, the following notation is used in this paper.

\({\omega _{ij}}(i = 1,2, \ldots ,m,j = 1,2, \ldots ,n)\): A weight.

\(\Delta {\omega _{ij}}(i = 1,2, \ldots ,m,j = 1,2, \ldots ,n)\): Variation of \({\omega _{ij}}\).

\({h_{ij}}(i = 1,2, \ldots ,m,j = 1,2, \ldots ,n)\): A sign. \({h_{ij}} = \pm \, 1\). The sign of \({h_{ij}}\) is randomly selected with equal probability.

\(\mathbf{{W}}\): An \(m \times n\) weight matrix consisting of \({\omega _{ij}}\).

\(\Delta \mathbf{{W}}\): An \(m \times n\) weight update matrix consisting of \(\Delta {\omega _{ij}}\).

\(\mathbf{{H}}\): An \(m \times n\) sign matrix consisting of \({h_{ij}}\).

\(\mathbf{{x}}\): A vector representing a pattern.

\(\mathbf{{d}}\): A vector representing a target output.

\(\mathbf{{o}}\): A vector representing an actual output of the network without perturbations for the input \(\mathbf{{x}}\).

\({\mathbf{{o}}_\mathrm{per}}\): A vector representing an actual output of the network with simultaneous perturbations for the input \(\mathbf{{x}}\).

Hence, the relationship between \(\mathbf{{x}}\) and \(\mathbf{{o}}\) is given by

$$\begin{aligned} \mathbf{{o}} = f(\mathbf{{Wx}}) \end{aligned}$$
(7)

where \(f({\varvec{\cdot }})\) represents the activation function. The error function is defined as

$$\begin{aligned} E = \frac{1}{2}{\left\| {\mathbf{{d}} - \mathbf{{o}}} \right\| ^2} \end{aligned}$$
(8)

In the BP algorithm, the weight update rule is

$$\begin{aligned} \left\{ \begin{aligned}&\Delta {\omega _{ij}} = - \eta \frac{{\partial E}}{{\partial {\omega _{ij}}}}\\&{\omega _{ij}}(n + 1) = {\omega _{ij}}(n) + \Delta {\omega _{ij}} \end{aligned} \right. \end{aligned}$$
(9)

The execution of the WSP algorithm proceeds as follows. First, the normal error function, without simultaneous perturbations, is calculated. Then, perturbations whose signs are randomly selected are simultaneously applied to all weights, and the error function is calculated again. Finally, the weight update matrices are calculated from the difference between the first and second error functions. In 1993, Alspector et al. [33] demonstrated that in the WSP algorithm, the weight change can be approximated by the following formulas for a sufficiently small perturbation:

$$\begin{aligned} \left\{ \begin{array}{l} \begin{aligned} \Delta {\omega _{ij}} &{}= - \eta \frac{{\partial E}}{{\partial {\omega _{ij}}}} \\ &{} \approx - \eta \frac{{E({\omega _{ij}} + {\omega _\mathrm{per}}{h_{ij}}) - E({\omega _{ij}})}}{{{\omega _\mathrm{per}}}}{h_{ij}}\\ \end{aligned}\\ {\omega _{ij}}(n + 1) = {\omega _{ij}}(n) + \Delta {\omega _{ij}} \end{array} \right. \end{aligned}$$
(10)

The learning rule is expressed in matrix form

$$\begin{aligned} \left\{ \begin{array}{l} \begin{aligned} \Delta \mathbf{{W}} &{}= - \eta \frac{{E(\mathbf{{W}} + {\omega _\mathrm{per}}{} \mathbf{{H}}) - E(\mathbf{{W}})}}{{{\omega _\mathrm{per}}}}{} \mathbf{{H}} \\ &{}= - \eta \frac{{\Delta E}}{{{\omega _\mathrm{per}}}}{} \mathbf{{H}}\\ \end{aligned} \\ \mathbf{{W}}(n + 1) = \mathbf{{W}}(n) + \Delta \mathbf{{W}} \end{array} \right. \end{aligned}$$
(11)

where \({\omega _\mathrm{per}}\), a positive (usually small) constant, is the value of the simultaneous perturbation injected into each weight, and \(\eta \) is the learning rate, a positive coefficient.

According to Eq. (11), the WSP algorithm requires only two feedforward operations of the neural network. Compared with the BP algorithm, neither the derivative of the activation function nor error back propagation, both of which are difficult to implement in hardware, is involved. Apart from simple sign circuits that control the direction of the weight perturbation and weight adjustment, no additional multiplication or transposition operations are required.
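For reference, the following Python sketch implements the two-forward-pass update of Eq. (11) for a small single-layer sigmoid network; the network size, learning rate and perturbation amplitude are illustrative assumptions rather than values prescribed by the algorithm.

```python
import numpy as np

# Illustrative WSP update, Eq. (11), for a single-layer sigmoid network.
# The network size, eta and w_per are assumptions made only for this sketch.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(W, x, d):
    o = sigmoid(W @ x)                    # Eq. (7)
    return 0.5 * np.sum((d - o) ** 2)     # Eq. (8)

def wsp_step(W, x, d, rng, eta=0.2, w_per=0.002):
    H = rng.choice((-1.0, 1.0), size=W.shape)     # random sign matrix
    e_first = error(W, x, d)                      # first forward pass
    e_second = error(W + w_per * H, x, d)         # second forward pass, perturbed
    return W - eta * ((e_second - e_first) / w_per) * H   # Eq. (11)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    W = rng.normal(scale=0.1, size=(2, 3))
    x, d = np.array([0.5, -0.3, 0.8]), np.array([1.0, 0.0])
    for _ in range(2000):
        W = wsp_step(W, x, d, rng)
    print(f"final error = {error(W, x, d):.4f}")
```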

3 Circuit architecture design

In this section, the implementation of simple SNN and general MNN architectures based on memristor arrays with the WSP algorithm is described. MNNs are generally constructed by cascading multiple SNNs. Therefore, attention is mainly paid to the design of SNNs.

3.1 Artificial synapse

Regarding the circuit architecture of the artificial synapse in memristor crossbars, three possible approaches are shown in Fig. 1. In Fig. 1a, a single memristor at each cross point of the circuit is regarded as a synapse for weight storage, where the adjustable range of each weight is relatively small. In order to expand this range, the architectures in Fig. 1b, c are widely employed in memristor crossbars. In Fig. 1b, each input signal and its opposite-polarity counterpart are fed to a column of memristors, and in Fig. 1c each input signal is applied to a pair of memristors [27]. With simple auxiliary circuits, two memristors are then treated as one synapse. In this paper, the architecture in Fig. 1b is selected.

Fig. 1
figure 1

Three circuit architectures of memristor-based artificial synapse

An architecture consisting of two MOSFETs (one p-type and one n-type) and a single ideal memristor was proposed by Soudry et al. [31, Fig. 3(a)]. Building on their architecture, a novel artificial synapse based on two identical memristors is designed, as shown in Fig. 2.

Fig. 2
figure 2

A synaptic unit based on two identical memristors

The current of the n-type transistor in the linear region is

$$\begin{aligned} {i_n} = {k_n}\left[ {({v_\mathrm{gs}} - {v_\mathrm{th}}){v_\mathrm{ds}} - \frac{1}{2}v_\mathrm{ds}^2} \right] \end{aligned}$$
(12)

where \({v_\mathrm{gs}}\) and \({v_\mathrm{ds}}\) are the gate–source voltage and the drain–source voltage, respectively. \({v_\mathrm{th}}\) is the threshold voltage. \({k_n}\) is the conductance parameter of the n-type transistor. When \({v_\mathrm{gs}} < {v_\mathrm{th}}\), the current of the n-type transistor is equal to zero, namely \({i_n} = 0\). Similarly, for the p-type transistor working in the linear region, its current is

$$\begin{aligned} {i_p} = - {k_p}\left[ {({v_\mathrm{gs}} - {v_\mathrm{th}}){v_\mathrm{ds}} - \frac{1}{2}v_\mathrm{ds}^2} \right] \end{aligned}$$
(13)

where \({k_p}\) is the conductance parameter of the p-type transistor. When \({v_\mathrm{gs}} > - {v_\mathrm{th}}\), the current of the p-type transistor is equal to zero, namely \({i_p} = 0\). Assume that \({k_n} = {k_p} = k\) and that the absolute value of the threshold voltage, namely \(\left| {{v_\mathrm{th}}} \right| \), is the same for the two transistors.

The synaptic unit is fed with three voltage signals: u and \({\bar{u}} = -\,u\) are connected to the terminals of the n-type and p-type transistors, respectively, and an enable signal e, whose value is 0, \({V_\mathrm{DD}}\) or \( - {V_\mathrm{DD}}\) (with \({V_\mathrm{DD}} > {v_\mathrm{th}}\)), is connected to the gates of both transistors.

We assume that

$$\begin{aligned} - {v_\mathrm{th}}< u < {v_\mathrm{th}} \end{aligned}$$
(14)

According to [31, 37], in order to ensure that the input voltage is mainly applied to the memristor, assume that

$$\begin{aligned} k({V_\mathrm{DD}} - 2{v_\mathrm{th}}) \gg G(s(t)) \end{aligned}$$
(15)

Consequently, there are three possible situations:

  1. 1.

    When \(e = 0\), all four transistors are off. In other words, the output current of the synaptic unit is equal to zero, namely \(I = 0\). In this case, the state variables are not changed.

  2. 2.

    When \(e = {V_\mathrm{DD}}\), \({T_{r2}}\) and \({T_{r3}}\) both work in the linear region, while \({T_{r1}}\) and \({T_{r4}}\) are off. The currents \({I_1}\) and \({I_2}\) through both memristors, respectively, are

    $$\begin{aligned} \left\{ \begin{array}{l} {I_\mathrm{{1}}} = \mathrm{{(}}{g^*} + {\hat{g}}{s_1}\mathrm{{)}}u\\ {I_\mathrm{{2}}} = - \mathrm{{(}}{g^*} + {\hat{g}}{s_2}\mathrm{{)}}u \end{array} \right. \end{aligned}$$
    (16)

    The output current I of the synaptic unit is

    $$\begin{aligned} I = {I_1} + {I_2} = ({s_1} - {s_2}){\hat{g}}u = \Delta s{\hat{g}}u \end{aligned}$$
    (17)
  3. 3.

    When \(e = - {V_\mathrm{DD}}\), \({T_{r1}}\) and \({T_{r4}}\) both work in the linear region, while \({T_{r2}}\) and \({T_{r3}}\) are off. The currents \({I_1}\) and \({I_2}\) through both memristors, respectively, are

    $$\begin{aligned} \left\{ \begin{array}{l} {I_1} = - ({g^*} + {\hat{g}}{s_1})u\\ {I_2} = ({g^*} + {\hat{g}}{s_2})u \end{array} \right. \end{aligned}$$
    (18)

    The output current I of the synaptic unit is

    $$\begin{aligned} I = {I_1} + {I_2} = ({s_2} - {s_1}){\hat{g}}u = \Delta s{\hat{g}}u \end{aligned}$$
    (19)

where, for notational simplicity, \(\Delta s\) carries the sign of the difference between the two state variables.

According to the relationship between the input signal and the output signal given by Eqs. (17) and (19), we can define the synaptic weight as

$$\begin{aligned} \omega = \Delta s{\hat{g}} \end{aligned}$$
(20)

The above analysis indicates that the design of the synaptic unit is feasible.
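The behavior described by Eqs. (16)-(20) can be summarized by a short behavioral sketch of the synaptic unit; it reproduces only the output current for each value of the enable signal e and does not model the transistor physics, and the memristor parameters g_star and g_hat are placeholders.

```python
# Behavioral sketch of the two-memristor synaptic unit, Eqs. (16)-(20).
# g_star and g_hat are placeholder memristor parameters, not measured values.

def synapse_current(u, e, s1, s2, v_dd=1.8, g_star=1e-4, g_hat=1e-3):
    """Output current I for an enable value e in {0, +v_dd, -v_dd}."""
    if e == 0:                       # all four transistors are off, so I = 0
        return 0.0
    if e == v_dd:                    # Tr2/Tr3 conduct, Eqs. (16)-(17)
        i1 = (g_star + g_hat * s1) * u
        i2 = -(g_star + g_hat * s2) * u
    else:                            # e == -v_dd, Tr1/Tr4 conduct, Eqs. (18)-(19)
        i1 = -(g_star + g_hat * s1) * u
        i2 = (g_star + g_hat * s2) * u
    return i1 + i2                   # equals +/-(s1 - s2) * g_hat * u depending on e

if __name__ == "__main__":
    s1, s2, u = 0.30, 0.10, 0.05     # arbitrary state variables and input voltage
    print(synapse_current(u, 1.8, s1, s2))    # (s1 - s2) * g_hat * u
    print(synapse_current(u, -1.8, s1, s2))   # (s2 - s1) * g_hat * u
    print(synapse_current(u, 0.0, s1, s2))    # 0.0
```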

3.2 SNN circuit architecture

Based on the proposed synaptic unit in Fig. 2, a \(3 \times 3\) SNN circuit architecture is designed in Fig. 3, where each synaptic unit is indexed by (ij), with \(i \in \left\{ {1, \ldots ,3} \right\} \) and \(j \in \left\{ {1, \ldots ,3} \right\} \). Each synaptic unit receives two inputs \({u_j}\) , \({{\bar{u}}_j}\) and an enable signal \({e_{ij}}\), and produces an output current \({I_{ij}}\). Each column of synaptic units in the arrays shares two inputs \({u_j}\) and \({{\bar{u}}_j}\). The current summing operation, \(\sum \nolimits _j {{I_{ij}}} \), is conducted at the terminal of each row. The row output interfaces convert the total current to the dimensionless output signal \({r_i}\).

Fig. 3
figure 3

A \(3 \times 3\) SNN circuit architecture. \({\omega _{ij}}\) represents a synaptic unit which receives two voltage input signals \({u_j}\), \({{\bar{u}}_j}\) and an enable signal \({e_{ij}}\), and produces an output current \({I_{ij}}\), wherein the terminal \({e_{ij}}\) controls not only the sign of input signal but also the duration and direction of weight adjustment signal. And the brown box represents the row output interface for data conversion. \(\left( {i \in \{ 1, \ldots ,3\} ,j \in \{ 1, \ldots ,3\} } \right) \). (Color figure online)

3.3 Circuit operation

While the WSP algorithm is running, the circuit's operation in each trial consists of five simple phases. First, in the computing phase, the current summation and the calculation of the error function are performed. Second, in the perturbing phase, perturbations with random signs are added to all weights simultaneously. Third, a new computing phase is needed in order to obtain the new error function. Fourth, in the restoring phase, restoring signals are applied to the memristors to eliminate the effect that the perturbation of the perturbing phase had on the state variables. Fifth, in the updating phase, all weight matrices are adjusted according to Eq. (11). The signals and their effect on the state variable are shown in Fig. 4.

(1) First computing phase A vector \(\mathbf{{x}}\) from the dataset is encoded by multiplication with a positive constant a, which converts a dimensionless number \({x_j}\) into the corresponding voltage \({u_j}\), namely \(\mathbf{{u}} = a\mathbf{{x}}\). So that the state of the memristors is left unchanged at the end of the phase, when the input signal is greater than zero, namely \({x_j} > 0\), the enable signal is designed as follows:

$$\begin{aligned} {e_{ij}}(t) = \left\{ \begin{array}{l} {V_\mathrm{DD}},\;\;\;\;\;\;\;\;\;\;\mathrm{if}\;\;0 \le t< 0.5{T_1}\\ - {V_\mathrm{DD}},\;\;\;\;\;\;\;\mathrm{if}\;\;0.5{T_1} \le t < {T_1} \end{array} \right. \end{aligned}$$
(21)

When the input signal is less than zero, namely \({x_j} < 0\), the enable signal is designed as follows:

$$\begin{aligned} {e_{ij}}(t) = \left\{ \begin{array}{l} - {V_\mathrm{DD}},\;\;\;\;\;\;\;\mathrm{if}\;\;0 \le t< 0.5{T_1}\\ {V_\mathrm{DD}},\;\;\;\;\;\;\;\;\;\;\mathrm{if}\;\;0.5{T_1} \le t < {T_1} \end{array} \right. \end{aligned}$$
(22)

According to Eq. (2), the total changes of the state variables for both memristors are, respectively, given by

$$\begin{aligned} \left\{ \begin{array}{l} \Delta {s_{ij1}} = \int _0^{0.5{T_1}} {(a{x_j})\mathrm{d}t + \int _{0.5{T_1}}^{{T_1}} {( - a{x_j})} } \mathrm{d}t = 0\\ \Delta {s_{ij2}} = \int _0^{0.5{T_1}} {( - a{x_j})\mathrm{d}t + \int _{0.5{T_1}}^{{T_1}} {(a{x_j})} } \mathrm{d}t = 0\\ \end{array} \right. \end{aligned}$$
(23)

From Eq. (23), the net changes of both state variables are zero, which indicates that the resistance of the two memristors is ultimately unchanged. When \(t \in \left[ {\left. {0,\;0.5{T_1}} \right) } \right. \), the transistors \({T_{r2}}\) and \({T_{r3}}\) both work in the linear region, while \({T_{r1}}\) and \({T_{r4}}\) are in the cutoff region. The currents \({I_1}\) and \({I_2}\) in Fig. 2 are calculated according to Eq. (16):

$$\begin{aligned} \left\{ \begin{array}{l} {I_{ij1}} = a({g^*} + {\hat{g}}{s_{ij1}}){x_j}\\ {I_{ij2}} = - a({g^\mathrm{{*}}} + {\hat{g}}{s_{ij2}}){x_j} \end{array} \right. \end{aligned}$$
(24)

And the output current I of each synaptic unit from Eq. (17) is

$$\begin{aligned} {I_{ij}} = {I_{ij1}} + {I_{ij2}} = a{\hat{g}}({s_{ij1}} - {s_{ij2}}){x_j} = a{\hat{g}}{s_{ij}}{x_j} \end{aligned}$$
(25)

Therefore, the total current at each row terminal \({\varPhi _i}\) is

$$\begin{aligned} {\varPhi _i} = \sum \limits _j {{I_{ij}}} = a{\hat{g}}\sum \limits _j {{s_{ij}}} {x_j} \end{aligned}$$
(26)

The target output vector \(\mathbf{{d}}\) from the dataset is typically dimensionless. Consequently, a constant c is needed in the row output interface to convert the current into a dimensionless number, namely

$$\begin{aligned} {o_i} = c{\varPhi _i} = ac{\hat{g}}\sum \limits _j {{s_{ij}}} {x_j} \end{aligned}$$
(27)

According to Eq. (20), we obtain

$$\begin{aligned} {\omega _{ij}} = ac{\hat{g}}{s_{ij}} \end{aligned}$$
(28)

On the basis of Ohm's law, the arithmetic operation \(\mathbf{{Wx}}\) is thus realized in this circuit. Finally, from the target output \(\mathbf{{d}}\) and the actual output \(\mathbf{{o}}\), the first error function (also called the cost function in machine learning) is calculated and stored in a buffer circuit. The error function is given by

$$\begin{aligned} {E_\mathrm{fir}} = \frac{1}{2}{\left\| {\mathbf{{d}} - \mathbf{{o}}} \right\| ^2} \end{aligned}$$
(29)
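As a numerical check of Eqs. (26)-(29), the short sketch below computes the row currents, the dimensionless outputs and the first error function for a small state-variable matrix; the scaling constants a, c and \({\hat{g}}\), as well as all numerical values, are illustrative.

```python
import numpy as np

# Numerical check of the first computing phase, Eqs. (26)-(29).
# a, c, g_hat and the state matrix S are illustrative assumptions.

a, c, g_hat = 0.1, 1.0e4, 1e-3          # input scaling, current-to-number scaling, dG/ds
S = np.array([[0.2, -0.1, 0.3],         # s_ij = s_ij1 - s_ij2 of each synaptic unit
              [0.0,  0.4, -0.2],
              [0.1, -0.3, 0.5]])
x = np.array([0.5, -0.8, 0.2])          # dimensionless input pattern
d = np.array([1.0, 0.0, 0.0])           # target output

phi = a * g_hat * (S @ x)               # row currents, Eq. (26)
o = c * phi                             # dimensionless outputs, Eq. (27)
W = a * c * g_hat * S                   # equivalent weight matrix, Eq. (28)
E_fir = 0.5 * np.sum((d - o) ** 2)      # first error function, Eq. (29)

print(np.allclose(o, W @ x))            # True: the crossbar realizes Wx
print(f"E_fir = {E_fir:.4f}")
```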

(2) Perturbing phase During the perturbing phase, a pair of voltage signals \({u_\mathrm{per}}\) and \({{\bar{u}}_\mathrm{per}}\) with fixed amplitude and duration are applied to the terminals u and \(\bar{u}\) in order to alter the state of the memristors and introduce the simultaneous perturbation. To generate the random perturbation, the enable signal \(e = {h_{ij}}{V_\mathrm{DD}}\) controls the sign of each perturbation, as described in Sect. 3.1, and the duration of \({{\bar{u}}_\mathrm{per}}\) is the same as that of e:

$$\begin{aligned} {e_{ij}} = {h_{ij}}{V_\mathrm{DD}}\;\;\;\;\mathrm{if}\;\;{T_1} \le t < {T_2} \end{aligned}$$
(30)

From Eq. (28), when \(t \in \left[ {{T_1},{T_2}} \right) \), the state perturbation \({s_\mathrm{per}}\) corresponding to the weight perturbation \({\omega _\mathrm{per}}\) is given by

$$\begin{aligned} {s_\mathrm{per}} = \frac{{{\omega _\mathrm{per}}}}{{ac{\hat{g}}}} \end{aligned}$$
(31)

From the definition of the state variable, the total change in the internal state of the synaptic unit is

$$\begin{aligned} \left\{ \begin{aligned} {u_\mathrm{per}} &= a{x_\mathrm{per}}\\ \Delta {s_\mathrm{per}} &= \Delta {s_1} - \Delta {s_2} = \int _{{T_1}}^{{T_2}} {\left[ {{u_\mathrm{per}} - ( - {u_\mathrm{per}})} \right] } \mathrm{d}t = 2a{x_\mathrm{per}}({T_2} - {T_1}) \end{aligned} \right. \quad \mathrm{if}\;\;{T_1} \le t < {T_2} \end{aligned}$$
(32)

where the absolute values of \(\Delta {s_1}\) and \(\Delta {s_2}\) are equal because two memristors of the synaptic unit are identical. According to Eqs. (31) and (32), the duration of the perturbing signal with a constant amplitude is

$$\begin{aligned} {T_\mathrm{per}} = {T_2} - {T_1} = \left| {\frac{{{\omega _\mathrm{per}}}}{{2{a^2}c{\hat{g}}{x_\mathrm{per}}}}} \right| \end{aligned}$$
(33)

where \({T_\mathrm{per}}\) is a positive constant. In other words, the duration of the perturbing phase is unchanged throughout the training process, which also explains why the magnitude and duration of the voltage signals \({u_\mathrm{per}}\) and \({{\bar{u}}_\mathrm{per}}\) are fixed.
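The mapping of the weight perturbation \({\omega _\mathrm{per}}\) to a pulse duration, Eqs. (31)-(33), can be verified with the following few lines; all constants are placeholders chosen only for the check and are not the circuit parameters of Table 1.

```python
# Check of Eqs. (31)-(33): mapping the weight perturbation to a pulse duration.
# All constants are illustrative placeholders, not the circuit values of Table 1.

a, c, g_hat = 0.1, 1.0e4, 1e-3
w_per = 0.002                      # weight perturbation amplitude
x_per = 0.4                        # dimensionless perturbation input (u_per = a * x_per)

s_per = w_per / (a * c * g_hat)                      # required state change, Eq. (31)
T_per = abs(w_per / (2 * a**2 * c * g_hat * x_per))  # pulse duration, Eq. (33)

# A pulse of amplitude a*x_per on u and -a*x_per on u_bar applied for T_per
# seconds changes the differential state by exactly s_per, Eq. (32).
delta_s_per = 2 * a * x_per * T_per
print(abs(delta_s_per - s_per) < 1e-12)              # True
```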

Fig. 4
figure 4

Operation protocol of a single synaptic unit (note that for the curve of S in the figure, we intentionally enlarge the variation of S in the perturbing phase and the restoring phase, and expand the duration of the perturbing phase and the restoring phase in order to better observe its change)

(3) Second computing phase During each computing phase, the operations of the entire network are the same as in the first computing phase. In this phase, the enable signal is (for \({x_j} > 0\))

$$\begin{aligned} {e_{ij}}(t) = \left\{ \begin{array}{l@{\quad }l} {V_\mathrm{DD}},&{}\mathrm{if}\;{T_2} \le t< \frac{{{T_2} + {T_3}}}{2}\\ - {V_\mathrm{DD}},&{}\mathrm{if}\;\frac{{{T_2} + {T_3}}}{2}\; \le t < {T_3} \end{array} \right. \end{aligned}$$
(34)

And the input signals are the same as in the first computing phase. Hence, the total changes in the internal state variables for the synaptic unit are

$$\begin{aligned} \left\{ \begin{aligned} \Delta {s_{ij1}} &= \int _{{T_2}}^{\frac{{{T_2} + {T_3}}}{2}} {(a{x_j})\,\mathrm{d}t} + \int _{\frac{{{T_2} + {T_3}}}{2}}^{{T_3}} {( - a{x_j})\,\mathrm{d}t} = 0\\ \Delta {s_{ij2}} &= \int _{{T_2}}^{\frac{{{T_2} + {T_3}}}{2}} {( - a{x_j})\,\mathrm{d}t} + \int _{\frac{{{T_2} + {T_3}}}{2}}^{{T_3}} {(a{x_j})\,\mathrm{d}t} = 0 \end{aligned} \right. \end{aligned}$$
(35)

And the error function is given by

$$\begin{aligned} {E_\mathrm{sec}} = \frac{1}{2}{\left\| {\mathbf{{d}} - {\mathbf{{o}}_\mathrm{per}}} \right\| ^2} \end{aligned}$$
(36)

(4) Restoring phase During the perturbing phase, simultaneous perturbations are added and change the state of the memristors. Restoring signals are hence needed to return the memristors to the state they were in before the simultaneous perturbations were applied, which is critical for an accurate update of all weights. Compared with the perturbing phase, the sign of the enable signal is reversed. The enable signal and input signal are, respectively,

$$\begin{aligned} \left\{ \begin{aligned} {u_\mathrm{per}} &= a{x_\mathrm{per}}\\ {e_{ij}} &= - {h_{ij}}{V_\mathrm{DD}} \end{aligned} \right. \quad \mathrm{if}\;{T_3} \le t < {T_4} \end{aligned}$$
(37)

And the duration of e and u must meet the following requirement:

$$\begin{aligned} {T_\mathrm{per}}={T_4} - {T_3} \end{aligned}$$
(38)

(5) Updating phase During the updating phase, updating voltage signals with a constant amplitude, namely \({u_\mathrm{upd}}\) and \({{\bar{u}}_\mathrm{upd}}\), are connected to the terminals u and \({\bar{u}}\), respectively. The difference between the two error functions is given by

$$\begin{aligned} \Delta E = E(\mathbf{{W}} + {\omega _\mathrm{per}}{} \mathbf{{H}}) - E(\mathbf{{W}}) = {E_\mathrm{sec}} - {E_\mathrm{fir}} \end{aligned}$$
(39)

According to the learning rule described by Eq. (11), the variation of each weight is

$$\begin{aligned} \Delta {\omega _{ij}} = - \eta \frac{{\Delta E}}{{{\omega _\mathrm{per}}}}{h_{ij}} \end{aligned}$$
(40)

From Eq. (28), we obtain

$$\begin{aligned} \Delta {s_{ij}} = - \eta \frac{{\Delta E}}{{ac{\hat{g}}{\omega _\mathrm{per}}}}{h_{ij}} \end{aligned}$$
(41)

The direction of weight adjustment is controlled by the enable signal e.

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} {e_{ij}} = {V_\mathrm{DD}}&{} \mathrm{if}\;\Delta {\omega _{ij}} > 0\\ {e_{ij}} = 0&{}\mathrm{if}\;\Delta {\omega _{ij}} = 0\\ {e_{ij}} = - {V_\mathrm{DD}}&{} \mathrm{if}\;\Delta {\omega _{ij}} < 0 \end{array} \right. \end{aligned}$$
(42)

From Eq. (2), the variation of the state variable for the synaptic unit is, hence, (for \(\Delta {\omega _{ij}} > 0\))

$$\begin{aligned} \left\{ \begin{aligned} {u_\mathrm{upd}} &= a{x_\mathrm{upd}}\\ \Delta {s_{ij}} &= \int _{{T_4}}^{{T_5}} {\left[ {{u_\mathrm{upd}} - ( - {u_\mathrm{upd}})} \right] \mathrm{d}t} = \int _{{T_4}}^{{T_5}} {2a{x_\mathrm{upd}}\,\mathrm{d}t} = 2a({T_5} - {T_4}){x_\mathrm{upd}} \end{aligned} \right. \quad \mathrm{if}\;\;{T_4} \le t < {T_5} \end{aligned}$$
(43)

According to Eqs. (41) and (43), the duration of the updating signal with a constant amplitude \(a{x_\mathrm{upd}}\) is

$$\begin{aligned} {T_\mathrm{upd}} = {T_5} - {T_4} = \left| {\eta \frac{{\Delta E}}{{2{a^2}c{\hat{g}}{\omega _\mathrm{per}}{x_\mathrm{upd}}}}} \right| \end{aligned}$$
(44)

Therefore, the sign and duration of the input and update signals can be easily controlled by the enable signal e. As an example, assume that \(\Delta {\omega _{11}}\) and \(\Delta {\omega _{12}}\) are positive and the other weight changes in Fig. 3 are negative; the corresponding weight update circuit is then designed as shown in Fig. 5.
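The updating phase, Eqs. (40)-(44), can likewise be summarized by a behavioral sketch that converts a measured \(\Delta E\) into the enable-signal levels and the shared pulse duration; \(\eta \), \({\omega _\mathrm{per}}\), \({x_\mathrm{upd}}\) and the scaling constants are illustrative assumptions.

```python
import numpy as np

# Behavioral sketch of the updating phase, Eqs. (40)-(44).
# eta, w_per, x_upd and the scaling constants are illustrative assumptions.

a, c, g_hat = 0.1, 1.0e4, 1e-3
eta, w_per, x_upd = 0.2, 0.002, 0.6

def update_pulses(delta_E, H, v_dd=1.8):
    """Return the enable levels and the common pulse duration for one update."""
    dW = -eta * (delta_E / w_per) * H                 # weight changes, Eq. (40)
    e = np.sign(dW) * v_dd                            # enable levels, Eq. (42)
    T_upd = abs(eta * delta_E / (2 * a**2 * c * g_hat * w_per * x_upd))  # Eq. (44)
    return e, T_upd

if __name__ == "__main__":
    H = np.array([[1.0, -1.0], [-1.0, 1.0]])   # sign matrix of the current trial
    e, T_upd = update_pulses(delta_E=-0.01, H=H)
    print(e)        # +/- V_DD per synaptic unit, following the sign of dW
    print(T_upd)    # a single duration shared by all units in this update
```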

From the above analysis, three points are noteworthy:

  1. 1.

    In the whole training process, the duration and amplitude of the perturbation signal, namely \({T_\mathrm{per}}\) and \(a{x_\mathrm{per}}\), are constants, and the random sign of the perturbation signal is controlled by the terminal e.

  2. 2.

    The duration of computing and restoring phase for each pattern is also constant throughout the training process.

  3. 3.

    In the kth weight update process, the duration and amplitude of the updating signal, namely \({T_\mathrm{upd}}\) and \(a{x_\mathrm{upd}}\) , are constants for all synaptic units, and the sign of the updating signal is controlled by the terminal e. (k, a positive integer, is the index of weight update.)

Fig. 5
figure 5

Circuit structure of weight updating when \(\Delta {\omega _{11}}\) and \(\Delta {\omega _{12}}\) both are positive and others are negative. The duration of the enable signal is \({T_\mathrm{upd}}\)

3.4 MNN circuit architecture

Sections 3.2 and 3.3 focused on the SNN circuit architecture and its operation under WSP training. Since WSP involves neither the derivative of the activation function nor error back propagation, even for multilayer networks, the circuit structure is greatly simplified. Therefore, this section discusses the MNN circuit architecture obtained by cascading multiple SNNs, mainly a two-layer neural network. The schematic of the MNN circuit architecture is shown in Fig. 6.

For a two-layer neural network, its mathematical model is expressed as

$$\begin{aligned} \mathbf{{o}} = {\mathbf{{W}}_2}f({\mathbf{{W}}_1}{} \mathbf{{x}}) \end{aligned}$$
(45)

where \(f({\varvec{\cdot }})\) denotes the activation function, and \({\mathbf{{W}}_1}\) and \({\mathbf{{W}}_2}\) are the input-hidden and hidden-output weight matrices, respectively. From Eq. (11), the weight update rule for a two-layer network is given by

$$\begin{aligned} \left\{ \begin{aligned} \Delta {\mathbf{{W}}_1} = - \eta \frac{{\Delta E}}{{{\omega _\mathrm{per}}}}{\mathbf{{H}}_1}\\ \Delta {\mathbf{{W}}_2} = - \eta \frac{{\Delta E}}{{{\omega _\mathrm{per}}}}{\mathbf{{H}}_2} \end{aligned} \right. \end{aligned}$$
(46)

where \({\mathbf{{H}}_1}\) and \({\mathbf{{H}}_2}\) are the random sign matrices of \({\mathbf{{W}}_1}\) and \({\mathbf{{W}}_2}\), respectively. The learning rule for a general MNN is derived from Eq. (11):

$$\begin{aligned} \Delta {\mathbf{{W}}_p} = - \eta \frac{{\Delta E}}{{{\omega _\mathrm{per}}}}{\mathbf{{H}}_p} \end{aligned}$$
(47)

where the subscript p, a positive integer, is the index of the network layer, not counting the input layer.

The circuit operation of an MNN is basically the same as that of an SNN. Likewise, only two feedforward operations are needed to synchronously update all weight matrices of the MNN when the WSP algorithm is used to train the network according to Eq. (47).
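At the algorithm level, the simultaneous update of both layers, Eqs. (45)-(47), can be sketched as follows; the layer sizes, learning rate and perturbation amplitude are assumptions, and all circuit-level details (pulse generation, perturbing and restoring phases) are abstracted away.

```python
import numpy as np

# Software sketch of WSP applied to the two-layer model of Eqs. (45)-(47).
# Layer sizes, eta and w_per are assumptions; all circuit details are omitted.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def error(W1, W2, x, d):
    return 0.5 * np.sum((d - W2 @ sigmoid(W1 @ x)) ** 2)   # Eqs. (45) and (8)

def wsp_mnn_step(W1, W2, x, d, rng, eta=0.1, w_per=0.002):
    H1 = rng.choice((-1.0, 1.0), size=W1.shape)   # one sign matrix per layer
    H2 = rng.choice((-1.0, 1.0), size=W2.shape)
    dE = error(W1 + w_per * H1, W2 + w_per * H2, x, d) - error(W1, W2, x, d)
    W1 = W1 - eta * (dE / w_per) * H1             # both layers share the same dE,
    W2 = W2 - eta * (dE / w_per) * H2             # Eqs. (46)-(47)
    return W1, W2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1 = rng.normal(scale=0.1, size=(5, 3))       # input-hidden weights
    W2 = rng.normal(scale=0.1, size=(1, 5))       # hidden-output weights
    x, d = np.array([1.0, 0.0, 1.0]), np.array([0.0])
    for _ in range(3000):
        W1, W2 = wsp_mnn_step(W1, W2, x, d, rng)
    print(f"error = {error(W1, W2, x, d):.4f}")
```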

Fig. 6
figure 6

Architecture of a general MNN trained by WSP algorithm. \(f({\varvec{\cdot }})\) represents the activation function. The green box represents a layer in a multilayer network. (Color figure online)

4 Circuit simulation

In this section, the neural network circuit is modeled in SimElectronics, part of the Simscape toolbox in MATLAB, to evaluate the circuit performance. The memristor model proposed by Soudry et al. [31] is used. SimElectronics provides basic electrical component models, including the MOSFET. The other circuit operations required to realize the WSP algorithm, such as random sign generation with equal probability and calculation of the error function, are implemented in MATLAB code. For the weight update, a voltage signal with a constant magnitude, namely \(a{x_\mathrm{upd}}\) in Fig. 4, is needed, and its polarity is controlled by the enable signal e. The weight update value is therefore converted into a pulse of corresponding duration according to Eq. (44). The other parameters used in this paper are listed in Table 1.

Table 1 Circuit parameters
Fig. 7
figure 7

A \(2 \times 2\) synaptic unit circuit simulation during ten operation cycles. Top and middle: voltage (solid blue) and weight (solid red) change for each synaptic unit. Bottom: corresponding output result (Output1, Output2). \({v_{ij}}\) is the input signal connected to the terminal u of the synaptic unit \(\left( {i \in \left[ {1,\;2} \right] ,\;j \in \left[ {1,\;2} \right] } \right) \). (Color figure online)

4.1 Basic operations of synaptic unit

To verify the basic operations of the proposed synaptic unit, a numerical simulation is carried out. A relatively simple \(2 \times 2\) synaptic unit circuit, whose input signals \(x_1\) and \(x_2\) are set to 0.1 and 0.8, respectively, is simulated for ten operation cycles, namely \(t \in [0,10{T_5}]\). The simultaneous perturbation \({\omega _\mathrm{per}}\) is 0.004, and its sign is randomly chosen with equal probability. The magnitude of the updating signal, namely \(a{x_\mathrm{upd}}\), is 60 mV. With regard to the weight adjustments of the four synaptic units, assume that

$$\begin{aligned} \left\{ \begin{array}{l} \Delta {\omega _{11}} = 0.008 \times \textit{sign}(5{T_5} - t)\\ \Delta {\omega _{12}} = 0.008 \times \textit{sign}(5{T_5} - t)\\ \Delta {\omega _{21}} = 0.008 \times \textit{sign}(t - 5{T_5})\\ \Delta {\omega _{22}} = 0.008 \times \textit{sign}(t - 5{T_5}) \end{array} \right. \end{aligned}$$
(48)

Because the synaptic unit is built from two identical memristors, the absolute value of the corresponding weight change contributed by each memristor is 0.004.

Based on the above hypothetical data, the circuit simulation results are shown in Fig. 7, together with the input voltages of the synaptic units and the corresponding weights. The feasibility of the basic circuit operations is thereby verified.

4.2 Three input odd parity function

A three-input odd parity task is conducted using a two-layer network with five neurons in the hidden layer and one neuron in the output layer. Table 2 shows the truth table of the three-input odd parity function. In this task, the activation function is the sigmoid function. The learning rate \(\eta \) is 0.2, and the important parameter \({\omega _\mathrm{per}}\), the amount of weight simultaneous perturbation, is set to 0.002. If \({\omega _\mathrm{per}}\) is too large, the whole system may not converge, as can be seen from Eq. (11); conversely, if it is too small, the number of iterations becomes very large. The magnitude of the perturbation voltage, namely \(a{x_\mathrm{per}}\), is 40 mV. The training error versus the number of iterations is shown in Fig. 8. Eight test samples, covering all possible inputs of the three-input odd parity function, are then used to test the performance of the network. The target output and actual output are shown in Fig. 9, where the mean squared error (MSE) is 0.0016. As shown in Fig. 8, about 1000 iterations are needed with WSP training. For the same task, approximately 10,000 iterations are needed with the RWC algorithm and about 1000 with BP [38, Fig. 12], which indicates that the proposed memristor-based neural network with WSP training performs successfully.
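For reproducibility at the algorithm level, the following software-only sketch trains a 3-5-1 sigmoid network with WSP on the parity task using the stated \(\eta = 0.2\) and \({\omega _\mathrm{per}} = 0.002\); the error is summed over all eight patterns for simplicity, whereas the circuit processes one pattern per trial, so the convergence behavior need not match the reported circuit results.

```python
import numpy as np

# Algorithm-level (software-only) sketch of the parity experiment: a 3-5-1
# network trained with WSP using eta = 0.2 and w_per = 0.002 as in the text.
# The batch error over all eight patterns is used here for simplicity.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# All eight patterns of the three-input odd parity function (Table 2).
X = np.array([[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)], float)
D = (X.sum(axis=1) % 2).reshape(-1, 1)            # odd-parity targets

def total_error(W1, W2):
    return 0.5 * np.sum((D - sigmoid(X @ W1.T) @ W2.T) ** 2)   # Eq. (45)

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(5, 3))           # hidden layer, 5 neurons
W2 = rng.normal(scale=0.5, size=(1, 5))           # output layer, 1 neuron
eta, w_per = 0.2, 0.002

for _ in range(5000):
    H1 = rng.choice((-1.0, 1.0), size=W1.shape)
    H2 = rng.choice((-1.0, 1.0), size=W2.shape)
    dE = total_error(W1 + w_per * H1, W2 + w_per * H2) - total_error(W1, W2)
    W1 -= eta * (dE / w_per) * H1                 # simultaneous update, Eq. (47)
    W2 -= eta * (dE / w_per) * H2

print(f"final summed squared error = {total_error(W1, W2):.4f}")
```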

Table 2 Truth table of three input parity function
Fig. 8
figure 8

Training error versus iterations curve. a The training error of the algorithm simulation. b The training error of the circuit simulation

4.3 Iris classification

Another, more complex classification task, Iris classification, is conducted to test the accuracy of the proposed memristor-based architecture trained by the WSP algorithm. The Iris dataset [39] contains 150 patterns divided into three classes (Setosa, Versicolor, Virginica), and each pattern consists of four features, which are normalized in this task. The simulation is executed using the designed two-layer network with four neurons in the hidden layer and three neurons in the output layer. In this task, the activation function is the sigmoid function. The three classes of the dataset are encoded with three bits, as detailed in Table 3, in which Output-X1, Output-X2 and Output-X3 represent the outputs of the three neurons in the output layer, respectively. In terms of network parameters, the learning rate \(\eta \) is 0.02, and the important parameter \({\omega _\mathrm{per}}\) is set to 0.001. The training set size is 120. The training error is shown in Fig. 10, and 30 test samples are selected to test the network performance. In Fig. 11, the target output and actual output are depicted. The MSEs of the individual bits in Fig. 11a–c are 0.0004, 0.0012 and 0.0006, respectively. Approximately 20,000 iterations are needed with the RWC algorithm (Fig. 10c), whereas approximately 2000 are needed with WSP (Fig. 10b). About 1200 iterations are needed with BP in [31, Fig. 8(b)]. Compared with the BP algorithm, WSP therefore requires more iterations. However, comparing [31, Fig. 4(b)] with Fig. 6, the circuit structure trained by the WSP algorithm is simpler, because it does not involve complicated circuit operations such as the derivative of the activation function and error back propagation. These results indicate that the proposed scheme is practical.

Table 3 Coding list for three classes
Fig. 9
figure 9

The target output and the actual output of circuit simulation for eight samples. The red and blue dots represent the actual output and the target output, respectively. (Color figure online)

Fig. 10
figure 10

Training error versus iterations curve. a The training error of the algorithm simulation. b The training error of the circuit simulation. c The training error with the RWC algorithm

Fig. 11
figure 11

The target output and the actual output of circuit simulation for 30 test samples. ac, respectively, are results of the output neurons Output-X1, Output-X2 and Output-X3. The red and blue dots represent the actual output and the target output, respectively. (Color figure online)

5 Discussion

The numbers of iterations of the WSP, RWC and BP algorithms for the three-input odd parity function and Iris classification are shown in Table 4. The WSP algorithm, whose update rule is given by Eq. (10), is an approximation of the gradient descent algorithm, so it converges more slowly than the BP algorithm. The RWC algorithm, whose update rule is given by Eq. (6), is based on Brownian random motion, so it converges more slowly than the WSP algorithm. Although it converges faster than RWC and WSP, the BP algorithm involves complex operations such as the derivative of the activation function and error back propagation. Moreover, in BP the weight changes depend on the gradient of the error function; when the error function has a great number of local minima, the neural network easily converges to a local optimum. Many proposed solutions insert a perturbation into the network in order to escape from local minima. The introduction of noise during learning allows the solution to approach the global optimum. Furthermore, it has been shown that training with noise may improve the generalization capability of the network and reduce the negative effects of weight variations after learning [40]. The advantages of WSP with respect to BP are its ability to avoid local minima of the error function and its low computational complexity, which leads to a simpler hardware implementation.

Table 4 The iterations of RWC, BP and WSP algorithms

6 Conclusion

A memristor-based hardware implementation of a neural network with on-chip learning is proposed. The hardware-friendly WSP algorithm is adopted to train the network; it requires neither error back propagation nor calculation of the derivative of the activation function, and only two feedforward operations are needed to update all weights. The circuit architecture is therefore simpler and more compact than a BP-based one. By combining nonvolatile memristors with the WSP algorithm, a simple, compact and reliable hardware implementation of neural networks becomes practical. Two supervised learning tasks, the three-input odd parity function and Iris classification, are conducted to test the accuracy of the network; the accuracy of the proposed architecture is similar to that of its software counterpart, which opens an opportunity for large-scale online learning in MNNs.