
1 Summary

Neural activity is modulated through learning, i.e., long-term adaptation of synaptic weights. However, it remains unresolved how weights are adapted across the cortex to effectively solve a given task. A key question is how to assign credit to synapses that are situated deep within a hierarchical network. In deep learning, backpropagation (BP) is the current state of the art for solving this issue and may potentially serve as an inspiration for neuroscience. Applying BP to cortical processing is, however, non-trivial, as it entails several biologically implausible requirements. For example, it requires information to be buffered for use at different stages of processing. Additionally, errors are propagated through weights that must be mirrored at synapses in different layers, giving rise to the weight transport problem. Furthermore, artificial neural networks (ANNs) operate in separate forward and backward phases, with inference and learning alternating strictly.

We introduce Phaseless Alignment Learning (PAL) [4], a biologically plausible technique for learning effective top-down weights across layers in cortical hierarchies. We propose that cortical networks can learn useful backward weights by exploiting a ubiquitous resource of the brain: noise. Although usually treated as a disruptive factor, noise can be leveraged by the feedback pathway as an additional carrier of information for synaptic plasticity.

PAL describes a fully dynamic system that effectively addresses all of the aforementioned problems: it models the dynamics of biophysical substrates, all computations are carried out using information locally available at the synapses, learning occurs in a completely phase-less manner, and plasticity is always on for all synapses, both forward and backward. Our approach is consistent with biological observations and facilitates efficient learning without the need for wake-sleep phases or other forms of phased plasticity found in many other models of cortical learning.

PAL can be applied to a broad range of models and represents an improvement over previously known biologically plausible methods of credit assignment. For instance, when compared to feedback alignment (FA), PAL can solve complex tasks with fewer neurons and more effectively learn useful latent representations. We illustrate this by conducting experiments on various classification tasks using a cortical dendrite microcircuit model [7], which leverages the complexity of neuronal morphology and is capable of prospective coding [2].

2 Theory

PAL utilises the noise present in physical neurons as information is sent across the cortical hierarchy (see Fig. 1 (a)). Neuronal dynamics are described in a rate-based coding scheme of a network with \(\ell ~=~1 \, \ldots \, N\) layers,

$$\begin{aligned} \tau \, \boldsymbol{\dot{u}}_\ell = - \boldsymbol{u}_\ell + \boldsymbol{W}_{\ell ,\ell -1} \boldsymbol{r}_{\ell -1} + \boldsymbol{e}_\ell + \boldsymbol{\xi }_\ell \,, \end{aligned}$$
(1)

with bottom-up input \(\boldsymbol{W}_{\ell ,\ell -1} \boldsymbol{r}_{\ell -1}\), and noise \(\boldsymbol{\xi }_\ell \); the local error signal \(\boldsymbol{e}_\ell \) is used to update forward weights through \(\boldsymbol{\dot{W}}_{\ell ,\ell -1} \propto \boldsymbol{e}_\ell \, \boldsymbol{r}_{\ell -1}^T\). Errors are passed down from higher layers through top-down synapses \(\boldsymbol{B}_{\ell ,\ell +1} \) via \(\boldsymbol{e}_\ell = \varphi ' \cdot \, \boldsymbol{B}_{\ell ,\ell +1} \, \boldsymbol{e}_{\ell +1} \).

As suggested in [7], the different terms in Eq. (1) correspond to the different compartments of a pyramidal neuron, and the error is transported as the difference in firing rates of pairs of pyramidal and interneurons.
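To make the dynamics concrete, the following minimal sketch integrates Eq. (1) with an explicit Euler step and applies the forward plasticity rule at every time step. The noise model (Gaussian white noise), the rate function (tanh), the layer sizes, and the learning rates are illustrative assumptions, and the compartmental structure of the microcircuit model in [7] is abstracted away here.

```python
# Minimal sketch of Eq. (1) and the forward plasticity rule (illustrative constants).
import numpy as np

rng = np.random.default_rng(0)
tau, dt, eta_w, sigma = 10.0, 0.1, 1e-3, 0.1   # membrane time constant, Euler step, learning rate, noise amplitude
phi = np.tanh                                   # rate function r = phi(u) (assumption)

sizes = [5, 20, 10]                             # example layer sizes
W = [rng.normal(0.0, 0.1, size=(sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
u = [np.zeros(n) for n in sizes]                # membrane potentials per layer

def euler_step(r_in, err):
    """One Euler step of Eq. (1) plus the forward weight update.

    r_in -- input rates r_0
    err  -- list of per-layer error signals e_l (err[0] is unused)
    """
    r = [r_in] + [phi(u[l]) for l in range(1, len(sizes))]
    for l in range(1, len(sizes)):
        xi = sigma * rng.standard_normal(sizes[l])         # intrinsic noise xi_l
        du = -u[l] + W[l - 1] @ r[l - 1] + err[l] + xi     # Eq. (1)
        u[l] += (dt / tau) * du
        W[l - 1] += eta_w * np.outer(err[l], r[l - 1])     # dW ∝ e_l r_{l-1}^T

# example call: euler_step(np.ones(5), [None, np.zeros(20), np.zeros(10)])
```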

Fig. 1.

PAL aligns weight updates with backpropagation in hierarchical cortical networks. (a) Cortical pyramidal cells as functional units of sensory processing and credit assignment. Bottom-up (\(\boldsymbol{W}_{\ell +1,\ell }\)) and top-down (\(\boldsymbol{B}_{\ell ,\ell +1}\)) projections preferentially target different dendrites. Due to the stochastic dynamics of individual neurons, noise is added to the signal. (b) We train the backward projections in a deep, dendritic microcircuit network of multi-compartment neurons with layer sizes [5-20-10-20-5] using our method PAL. All backward weights \(\boldsymbol{B}_{\ell ,\ell +1}\) are learned simultaneously, while forward weights are fixed. Forward weights are initialised s.t. neurons are activated in their linear regime. (c) Same as (b), but with weights initialised in the non-linear regime. (d) In a simple teacher-student task with a neuron chain [1-1-1] of dendritic microcircuits, PAL is able to flip the sign of backward weights, which is crucial for successful reproduction of the teaching signal. (e) PAL solves a teacher-student task where feedback alignment fails. The teaching signal (red dashed) requires positive forward weights, whereas all student networks are initialised with negative \(\boldsymbol{W}_{1,0}\). Note that PAL only learns the correct forward weights once the backward weights have flipped sign (at epoch \(\sim \) 500). (f-h) PAL learns useful latent representations on the MNIST autoencoder task, whereas FA leads to poor feature separation. We train a network [784-200-2-200-784] using leaky-integrator neurons on the MNIST autoencoder task: (f) Shown are the activations after training in the two-neuron layer for all samples in the test set; colors encode the corresponding label. BP and PAL show improved feature separation compared to FA. (g) Linear separability of latent activations. (h) Alignment angle of top-down weights to all layers for networks trained with PAL. PAL is able to adapt top-down weights while forward weights are learned at the same time. All curves show mean and standard deviation over 5 seeds.

PAL learns from the noise \(\boldsymbol{\xi }_\ell \) accumulated on top of a stimulus signal as it passes through the network. Backprojections are learned using high-pass-filtered rates \(\widehat{\boldsymbol{r}}_{\ell +1}\) through the rule

$$\begin{aligned} {\boldsymbol{\dot{B}}}_{\ell ,\ell +1} \propto \boldsymbol{\xi }_\ell \; \big ( {\widehat{\boldsymbol{r}}}_{\ell +1} \big )^T - \alpha \, {\boldsymbol{B}}_{\ell ,\ell +1} \;. \end{aligned}$$
(2)

By exploiting the autocorrelation properties of neuronal noise, this learning rule dynamically achieves approximate alignment \(\boldsymbol{B}_{\ell ,\ell +1} \, || \, \boldsymbol{W}_{\ell +1,\ell }^T\) for all layers simultaneously, without interrupting the learning of forward weights (see Fig. 1 (b,c)). This allows networks that implement PAL to efficiently learn all weights (feedforward and feedback) without phases, in contrast to many bio-inspired learning rules found in the literature (e.g., Difference Target Propagation and variants [1, 3], AGREL [5, 6], Equilibrium Propagation [8]).
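A corresponding sketch of the backprojection update in Eq. (2) is given below. Here the high-pass-filtered rates \(\widehat{\boldsymbol{r}}_{\ell +1}\) are approximated by subtracting a running low-pass from the instantaneous rates, and the locally injected noise sample \(\boldsymbol{\xi }_\ell \) is assumed to be available at the synapse; names and constants are illustrative, not the exact filter used in [4].

```python
# Minimal sketch of the backward-weight rule in Eq. (2) (illustrative constants).
import numpy as np

eta_b, alpha = 1e-3, 1e-4      # learning rate and weight decay (assumptions)
dt, tau_hp = 0.1, 10.0         # time step and high-pass time constant (assumptions)

def pal_backward_step(B, xi_l, r_up, r_up_lp):
    """One plasticity step for B_{l,l+1} following Eq. (2).

    B       -- backward weights, shape (n_l, n_{l+1})
    xi_l    -- noise injected into layer l at this time step, shape (n_l,)
    r_up    -- current rates of layer l+1, shape (n_{l+1},)
    r_up_lp -- running low-pass of r_{l+1}, carried between calls
    """
    r_up_lp = r_up_lp + (dt / tau_hp) * (r_up - r_up_lp)   # track slow component of the rate
    r_hat = r_up - r_up_lp                                  # high-pass-filtered rate r_hat_{l+1}
    B = B + eta_b * (np.outer(xi_l, r_hat) - alpha * B)     # Eq. (2)
    return B, r_up_lp
```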

3 Results

We have evaluated PAL on various tasks; for an excerpt of the results, see Fig. 1 (b-h). Additionally, we benchmark PAL on standard tests such as the MNIST digit classification task, where the dendritic microcircuit model (network size [784-100-10]) achieves a final test error of \(3.9 \pm 0.2\) % with PAL and \(4.7 \pm 0.1\) % with FA. We emphasize that our results were achieved through simulation of a fully dynamic, recurrent system that is biologically plausible. Weight and voltage updates were applied at every time step, and populations of multi-compartment neurons were used as a bio-plausible error transport mechanism. Our findings demonstrate that PAL can efficiently learn all weights and outperforms FA on tasks involving classification and latent space separation.

We argue that PAL can be realized in biological and, more generally, physical components. Specifically, it capitalizes on the inherent noise present in physical systems and leverages simple filtering techniques to distinguish between signal and noise where necessary. A realization of PAL (or a variant thereof) in physical form, whether in the cortex or on neuromorphic systems, constitutes an elegant solution to the weight transport problem while enabling efficient learning with purely local computations.