
1 Introduction

Much of our previous work [13] concerns the search for neuronal network mechanisms whose emergent states underlie behavioral aspects traditionally studied by psychiatry, psychoanalysis and neuroscience [49]. A working hypothesis in neuroscience is that human memory is encoded in the neural net of the brain, and associativity is frequently used to describe mental processes, both in normal and pathological functioning. Neuronal models of associative memory [10] have therefore formed a central component of our descriptions.

In traditional neural network models of memory, such as the Hopfield model [10], the symmetry of synaptic connections is a necessary mathematical requirement for reaching stationary states (memory) [10, 11]. This is the case both when using the Boltzmann Machine (BM) procedure and when employing more recent approaches based on the Generalized Simulated Annealing (GSA) algorithm [12]. Real biological neural networks, however, do not seem to comply with the synaptic symmetry condition. We thus face the curious situation that the main mathematical-mechanistic neural models for memory rest on an assumption that is at odds with biological reality. There have been efforts to model stationary memory attractor states with asymmetric weights, but they are mathematically complex and usually applicable only to restricted situations [13, 14]. In spite of these interesting attempts, and even though memory modeling with neural networks has been an active field of research for decades, the (a)symmetry issue remains a largely unexplored (and almost forgotten) open problem. This indicates the need to consider new alternative approaches to the subject. Our main aim in the present exploratory work is to point out basic similarities between the synaptic symmetry problem and some aspects of nonlinear Fokker-Planck (NLFP) dynamics. These connections may open a new way to address the symmetry problem in neural networks. Here we take the first steps in the development of a formalism based on the nonlinear Fokker-Planck equation, which we summarize in this paper along with some preliminary results. A more detailed discussion is being prepared for an extended publication.

In previous work, we have used the Boltzmann Machine [10] and Generalized Simulated Annealing [12] to simulate memory. In the BM and GSA, pattern retrieval on the net is achieved by a simulated annealing (SA) process, in which the temperature T is gradually lowered according to an annealing schedule \(\alpha \). For a BM or GSA network with N nodes, where each node i has a discrete state \(S_i\) in \(\{-1, 1\}\), a necessary condition for the network to have stable states is that the synaptic weights between nodes i and j obey \(w_{ij} = w_{ji}\). One can then define an energy function, representing the potential energy corresponding to the interactions between neurons,

$$\begin{aligned} E(\{S_i\}) = -\frac{1}{2} \sum \limits _{ij} w_{ij} S_iS_j, \end{aligned}$$
(1)

and stored memories correspond to minimum energy (stable) states, which are attractors in the memory retrieval mechanism (SA process).

In the SA process, the energy surface is sampled according to the following transition probabilities. For the Boltzmann Machine (BM)

$$\begin{aligned} P_{BG}(S_i\rightarrow - S_i) = \frac{1 }{1 + \exp \left[ \frac{ E(\{-S_i\}) - E(\{S_i\}) }{T} \right] }, \end{aligned}$$
(2)

and for the Generalized Simulated Annealing or Tsallis Machine (GSA) [12]

$$\begin{aligned} P_{GSA}(S_i\rightarrow - S_i) = \frac{1 }{ \left[ 1 + (q-1) \frac{ E(\{-S_i\}) - E(\{S_i\}) }{T} \right] ^{\frac{1}{q-1}} }. \end{aligned}$$
(3)

These transition probabilities tend to take the system from a current state towards a final, more favorable minimum energy state (although energy may increase at intermediate steps).
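
As a concrete illustration of the retrieval mechanism, the following minimal sketch implements both acceptance rules, Eqs. (2) and (3), inside a simulated annealing loop over a network with a symmetric weight matrix. It is not the exact code behind [10, 12]: the geometric cooling schedule, the default value q = 1.5, and the convention of accepting the flip when the bracket in Eq. (3) becomes non-positive are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(S, w):
    """Eq. (1): E = -(1/2) * sum_ij w_ij S_i S_j."""
    return -0.5 * S @ w @ S

def p_flip_bm(dE, T):
    """Eq. (2): Boltzmann Machine flip probability."""
    return 1.0 / (1.0 + np.exp(dE / T))

def p_flip_gsa(dE, T, q=1.5):
    """Eq. (3): GSA (Tsallis Machine) flip probability. A non-positive
    bracket (possible for q > 1 and dE < 0) is treated as certain
    acceptance -- an assumed convention."""
    base = 1.0 + (q - 1.0) * dE / T
    return 1.0 if base <= 0.0 else base ** (-1.0 / (q - 1.0))

def anneal(S, w, T0=2.0, alpha=0.95, sweeps=100, p_flip=p_flip_bm):
    """Lower T by the schedule alpha while sweeping over the spins,
    so the state drifts toward a minimum-energy (memory) attractor."""
    S, T = S.copy(), T0
    for _ in range(sweeps):
        for i in rng.permutation(len(S)):
            S_new = S.copy()
            S_new[i] = -S_new[i]
            dE = energy(S_new, w) - energy(S, w)
            if rng.random() < p_flip(dE, T):
                S = S_new
        T *= alpha
    return S
```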

In Sect. 2, we briefly review neural network models in relation to the basic theory of dynamical systems. We then introduce basic aspects of the Fokker-Planck formalism. In Subsect. 4.2, we show that it is possible to introduce a drift or force term that does not arise from the gradient of a potential, and is related to asymmetric couplings, and still achieve stationary states for the probability density function in the phase space describing the system. We mention further developments and present our conclusions in the last section.

2 Dynamical Systems and Neural Networks

For a continuous, deterministic (noiseless) dynamical system with phase space variables \( \{ X_1, X_2, \cdots , X_N\} \), the equations of motion can be expressed as

$$\begin{aligned} \frac{dX_1}{dt}&= G_1(X_1, X_2, \cdots X_N) \\ &\;\; \vdots \\ \frac{dX_N}{dt}&= G_N(X_1, X_2, \cdots X_N), \end{aligned}$$
(4)

which in self-explanatory vector notation reads \( \frac{d\varvec{X}}{dt} = \varvec{G}(\varvec{X})\), with \(\varvec{X}, \varvec{G} \in \mathfrak {R}^N \). That is, the time evolution of the system’s state \(\varvec{X} \) is described by a phase space flow given by the vector field \(\varvec{G}\). Neural networks have been widely studied within this framework [10]. In neural network models, the synaptic weight \(w_{ij}\) expresses the intensity of the influence of neuron j on neuron i (the coupling). The net signal input to neuron i is then given by

$$\begin{aligned} u_i = \sum \limits _j w_{ij} V_{O_j}, \end{aligned}$$
(5)

where \( V_{O_j}\) is the output signal of neuron j.

The McCulloch-Pitts (discrete activation) neural model can be generalized to continuous state variables [10, 11], so that \(V_{O_i}\) (in equilibrium) is updated by a continuous function of \(u_i\),

$$\begin{aligned} V_{O_i}(t + \Delta t) = g (u_i(t)). \end{aligned}$$
(6)

In Eq. (6), the activation function g(u), typically a sigmoid or \(\tanh (u)\), is nonlinear and saturates for large values of |u|. One possible continuous-time rule for updating the \(V_{O_i}\) [11, 15] is the set of differential equations

$$\begin{aligned} \frac{dV_{O_i}}{dt} = \frac{-V_{O_i} + g(u_i)}{\tau _i} = G_i(V_{O_1},V_{O_2}, \ldots ), \end{aligned}$$
(7)

where \(\tau _i\) are suitable time constants.
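
The dynamics of Eqs. (5)-(7) is straightforward to integrate numerically. The sketch below is a minimal illustration using forward Euler steps and a \(\tanh\) activation, with a single time constant tau in place of the per-neuron \(\tau _i\); these simplifications are ours. With symmetric weights the trajectory settles into a fixed point; with asymmetric weights such convergence is no longer guaranteed, which is precisely the problem at issue here.

```python
import numpy as np

def evolve(w, V0, tau=1.0, dt=0.01, steps=10_000, g=np.tanh):
    """Forward-Euler integration of Eq. (7):
    dV_i/dt = (-V_i + g(u_i)) / tau, with u_i = sum_j w_ij V_j (Eq. (5))."""
    V = np.array(V0, dtype=float)
    for _ in range(steps):
        u = w @ V                         # net input, Eq. (5)
        V = V + dt * (-V + g(u)) / tau    # update rule, Eq. (7)
    return V
```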

In traditional neural network models of memory, such as the Hopfield model, BM and GSA, \(w_{ij} = w_{ji}\) is a necessary condition for reaching stationary states (memory). This symmetry restriction seems to be biologically unrealistic. In this contribution we comment on the similarities between the synaptic symmetry problem and some features of the nonlinear Fokker-Planck dynamics, which may shed new light on this problem and suggest possible ways to tackle it.

3 Fokker-Planck Equation

We now consider an ensemble of identical systems, each consisting of N elements, that evolve from different initial conditions. This ensemble is described by a time-dependent probability density in phase space \({\mathcal P}(X_1, \cdots , X_N, t)\) obeying the Liouville equation

$$\begin{aligned} \frac{\partial {\mathcal P}}{\partial t} + \sum _{i=1}^N \frac{\partial ( {\mathcal P} G_i )}{\partial X_i} = 0. \end{aligned}$$
(8)

If the system presents noisy behavior, it is necessary to add a new diffusion-like term in Eq. (8), which results in the Fokker-Planck equation (FPE)

$$\begin{aligned} \frac{\partial {\mathcal P}}{\partial t} = D \left( \sum _{i=1}^N \frac{\partial ^2 {\mathcal P}}{\partial X_i^2} \right) - \sum _{i=1}^N \frac{\partial ( {\mathcal P} G_i )}{\partial X_i}, \end{aligned}$$
(9)

where D is the diffusion coefficient and the second term on the right, involving the field \(\varvec{G}\), is referred to as the drift term. We shall call \(\varvec{G}\) the drift field. If

$$\begin{aligned} G_1&= - \partial V/\partial X_1, \\ &\;\; \vdots \\ G_N&= - \partial V/\partial X_N, \end{aligned}$$
(10)

for some potential function \(V(\varvec{X})\), there is a Boltzmann-Gibbs-like stationary solution to Eq. (9),

$$\begin{aligned} {\mathcal P}_{BG} = \frac{1}{Z} \exp \left[ -\frac{1}{D} V(\varvec{X}) \right] , \end{aligned}$$
(11)

where Z is an appropriate normalization constant. That is, \({\mathcal P}_{BG}\) satisfies (9) with \( \frac{\partial {\mathcal P}_{BG}}{\partial t} = 0 \). The distribution \({\mathcal P}_{BG}\) maximizes the Boltzmann-Gibbs entropy \(S_{BG}\) under the constraints of normalization and the mean value \(\langle V \rangle \) of the potential V.
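
This stationary solution can be checked numerically through the associated trajectory picture: the Langevin equation \(dX = \varvec{G}\,dt + \sqrt{2D}\,dW\) has Eq. (9) as its density evolution, so long-run samples should reproduce \({\mathcal P}_{BG}\). The one-dimensional sketch below does this for an illustrative double-well potential; the choices of potential, D, and step sizes are our own.

```python
import numpy as np

def langevin(grad_V, D=0.5, dt=1e-3, steps=500_000, x0=0.0, seed=1):
    """Euler-Maruyama integration of dX = -V'(X) dt + sqrt(2 D) dW,
    whose probability density obeys the FPE (9) with G = -V'."""
    rng = np.random.default_rng(seed)
    x, out = x0, np.empty(steps)
    noise = np.sqrt(2.0 * D * dt)
    for k in range(steps):
        x += -grad_V(x) * dt + noise * rng.standard_normal()
        out[k] = x
    return out

# Illustrative double-well potential V(x) = x**4/4 - x**2/2 (our choice)
D = 0.5
xs = langevin(lambda x: x**3 - x, D=D)[50_000:]    # discard the transient
hist, edges = np.histogram(xs, bins=80, density=True)
c = 0.5 * (edges[:-1] + edges[1:])
p_bg = np.exp(-(c**4 / 4 - c**2 / 2) / D)          # Eq. (11), up to 1/Z
p_bg /= (p_bg * np.diff(edges)).sum()              # normalize numerically
print(np.abs(hist - p_bg).max())                   # small once equilibrated
```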

Note that a dynamical system with a flux in phase space of the form (10) (gradient form) corresponds to a system that evolves so as to minimize V, i.e. down-hill along the potential energy surface. For a field \(\varvec{G} \) of the form (10) one has,

$$\begin{aligned} \frac{\partial G_i}{\partial X_j} = \frac{\partial G_j}{\partial X_i} = \frac{\partial ^2 V}{\partial X_i\partial X_j}. \end{aligned}$$
(12)

In the case of a Hopfield neural network with a linear activation g(u) in Eq. (7), one has \(G_i \propto \sum \limits _j w_{ij} X_j \) (corresponding to linear forces); absorbing the proportionality constant into the weights,

$$\begin{aligned} \frac{\partial G_i}{\partial X_j} = w_{ij}, \end{aligned}$$
(13)

and therefore, by Eq. (12), \(w_{ij} = w_{ji}\). We see that the general condition (12), which guarantees that the Fokker-Planck dynamics evolves towards a stationary Boltzmann-Gibbs distribution (11), is very similar to the synaptic symmetry requirement necessary for a neural network to evolve towards minima of an energy surface. This similarity is, of course, also closely related to the fact that the simulated annealing technique provides a useful algorithm to find the minima of the network’s energy landscape. In the Fokker-Planck case, however, it is possible to relax the condition (12) by considering more general drift fields, and still have a dynamics that leads to a stationary Boltzmann-Gibbs distribution. This suggests that the Fokker-Planck scenario with non-gradient drift fields may be relevant to the synaptic symmetry problem. In the following sections we explore some basic aspects of this scenario, within the more general context of the nonlinear Fokker-Planck equation.
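
For a two-neuron network, the correspondence between condition (12) and synaptic symmetry can be made explicit with a few lines of symbolic computation (sympy is our tooling choice here):

```python
import sympy as sp

X1, X2 = sp.symbols('X1 X2')
w11, w12, w21, w22 = sp.symbols('w11 w12 w21 w22')

# Linear drift field G_i = sum_j w_ij X_j, as in Eq. (13)
G1 = w11 * X1 + w12 * X2
G2 = w21 * X1 + w22 * X2

# Condition (12) requires dG1/dX2 == dG2/dX1
print(sp.diff(G1, X2), sp.diff(G2, X1))  # w12 and w21: equal iff w12 == w21
```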

4 Nonlinear Fokker-Planck Equation

In [16], Ribeiro, Nobre and Curado state: “The linear differential equations in physics are, in many cases, valid for media characterized by specific conditions, like homogeneity, isotropy, and translational invariance, with particles interacting through short-range forces and with a dynamical behavior characterized by short-time memories”. It is possible to introduce a nonlinear diffusion term into the FPE to describe a physical ensemble of interacting particles, so that the nonlinearity is an effective description of the interactions [16–20]. Physical systems characterized by spatial disorder and/or long-range interactions seem to be natural candidates for this formalism, which has recently attracted considerable attention from the complex systems research community.

We thus use the nonlinear Fokker-Planck equation (NLFP)

$$\begin{aligned} \frac{\partial {\mathcal P}}{\partial t} = D \left[ \sum _{i=1}^N \frac{\partial ^2 }{\partial X_i^2}\left( {\mathcal P}^{2 - q}\right) \right] - \sum _{i=1}^N \frac{\partial ( {\mathcal P} G_i )}{\partial X_i}, \end{aligned}$$
(14)

to study systems that may deviate from the linear description. Since we need to model stable properties of the physical systems of interest, such as the stored memory states in a neural network, we search for possible stationary solutions of Eq. (14).

4.1 Stationary Solution - \(\varvec{G}\) of Gradient Form

In the most frequently studied case, where the field \(\varvec{G}\) is of the gradient form (10), the stationary solution of the NLFP is found by solving,

$$\begin{aligned} D \left[ \sum _{i=1}^N \frac{\partial ^2 }{\partial X_i^2}\left( {\mathcal P}^{2 - q}\right) \right] - \sum _{i=1}^N \frac{\partial ( {\mathcal P} G_i )}{\partial X_i} = 0, \end{aligned}$$
(15)

considering the Tsallis ansatz [20]

$$\begin{aligned} {\mathcal P}_q = A [1 - (1-q)\beta V(\varvec{X})]^\frac{1}{1-q}\, , \end{aligned}$$
(16)

where A and \(\beta \) are constants to be determined. One finds that the ansatz given by Eq. (16) is a stationary solution of the NLFP equation, if

$$\begin{aligned} A = [(2-q) \beta D]^\frac{1}{q-1}. \end{aligned}$$
(17)

We call Eq. (16) the q-exponential ansatz. As already mentioned, it constitutes a stationary solution of the NLFP equation, when \(\varvec{G} \) is minus the gradient of a potential V (Eq. (10)), and A and \(\beta \) satisfy Eq. (17). The distribution \({\mathcal P}_{q}\) is also called a q-maxent distribution because it optimizes the nonextensive q-entropy \(S_{q}\), under the constraints of normalization and the mean value of the potential V [17, 20]. In the limit \(q \rightarrow 1\), the q-maxent stationary distribution (16) reduces to the Boltzmann-Gibbs one (11), with \(\beta = 1/D\).
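
The claim that Eqs. (16)-(17) solve the stationary NLFP equation can be verified symbolically. The following one-dimensional sketch uses illustrative numerical values (q = 3/2, D = 1/2, \(\beta = 2\)) and a harmonic potential, all our own choices; the identity holds for general V and q.

```python
import sympy as sp

x = sp.symbols('x')
# Illustrative choices (assumptions): q = 3/2, D = 1/2, beta = 2, V harmonic
q, D, beta = sp.Rational(3, 2), sp.Rational(1, 2), sp.Integer(2)
V = x**2 / 2
A = ((2 - q) * beta * D) ** (1 / (q - 1))              # Eq. (17)
Pq = A * (1 - (1 - q) * beta * V) ** (1 / (1 - q))     # Eq. (16)
G = -sp.diff(V, x)                                     # gradient drift, Eq. (10)

# One-dimensional stationary NLFP equation, Eq. (15)
residual = D * sp.diff(Pq ** (2 - q), x, 2) - sp.diff(Pq * G, x)
print(sp.simplify(residual))   # prints 0
```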

4.2 Stationary Solution - \(\varvec{G}\) Not of the Gradient Form

Now we consider the NLFP equation, with a drift term not arising from the gradient of a potential and with the form

$$\begin{aligned} \varvec{G} = \varvec{F} + \varvec{E}, \end{aligned}$$
(18)

where \(\varvec{F}\) is equal to minus the gradient of some potential \(V(\varvec{X})\), while \(\varvec{E}\) does not come from a potential function (that is, \(\partial E_i /\partial X_j \ne \partial E_j /\partial X_i\) for some pair i, j). We then substitute this \(\varvec{G}\) and \({\mathcal P}_q\) (Eq. (16)) in the stationary NLFP Eq. (15) and obtain

$$\begin{aligned} D \left[ \sum _{i=1}^N \frac{\partial ^2 }{\partial X_i^2}\left( {\mathcal P}_q^{2 - q}\right) \right] - \left[ \sum _{i=1}^N \frac{\partial ( {\mathcal P}_q F_i )}{\partial X_i} \right] - \left[ \sum _{i=1}^N \frac{\partial ( {\mathcal P}_q E_i )}{\partial X_i} \right] = 0. \end{aligned}$$
(19)

The sum of the first two terms in Eq. (19) vanishes, because we know that \({\mathcal P}_q\) is a stationary solution of Eq. (15) when only the gradient field \(\varvec{F}\) is present. In order for \({\mathcal P}_q\) to satisfy (19), we then require \( \sum _{i=1}^N \partial ( {\mathcal P}_q E_i )/\partial X_i = 0. \) If this relation is satisfied, then \({\mathcal P}_q\) is also a stationary solution of the full NLFP equation, including the non-gradient term corresponding to \(\varvec{E}\). We therefore require

$$\begin{aligned} \sum _{i=1}^N \frac{\partial }{\partial X_i} \left( E_i [1 - (1-q)\beta V]^\frac{1}{1-q} \right) = 0. \end{aligned}$$
(20)

This equation constitutes a consistency requirement that the potential function V, the non-gradient field \(\varvec{E}\), the inverse temperature \(\beta \), and the entropic parameter q have to satisfy in order that the nonlinear Fokker-Planck equation admits a stationary solution of the q-maxent form. In the most general \(\beta \)-dependent situation, the condition given by Eq. (20) leads to a rather complicated relation between the non-gradient component \(\varvec{E}\) and the potential V. However, there are cases where a \(\beta \)-independent set of constraints can be obtained. We illustrate this with a two-dimensional example. Consider two-dimensional fields \(\varvec{F}\) and \(\varvec{E}\) with components of the form,

$$\begin{aligned} (F_1; \,\, F_2)&= (-w_{11}X_1 - w_{12} X_2; \,\, -w_{22}X_2 - w_{12} X_1 ) \\ (E_1; \,\, E_2)&= (h_{11}X_1 + h_{12} X_2; \,\,\,\, h_{21}X_1 + h_{22} X_2 ), \end{aligned}$$
(21)

where the \(w_{ij}\) and \(h_{ij}\) are constant real parameters. The field \(\varvec{F}\) is minus the gradient of the potential

$$\begin{aligned} V(X_1,X_2) = \frac{1}{2}\left( w_{11}X_1^2 + 2w_{12} X_1 X_2 + w_{22}X_2^2 \right) . \end{aligned}$$
(22)

It can be verified after some algebra that the q-maxent distribution (16) is a stationary solution of the NLFP equation, if the parameters characterizing the potential V and the (non-gradient) drift term E satisfy,

$$\begin{aligned} h_{11} = h_{22} = w_{12}&= 0, \\ w_{11} h_{12} + w_{22} h_{21}&= 0. \end{aligned}$$
(23)

We thus have a family of noisy dynamical systems (described by NLFP equations), characterized by three independent parameters, that have q-maxent stationary solutions in spite of having drift fields not necessarily arising from a potential. This can be appreciated from the fact that the constraints (23) are compatible with \(h_{12} \ne h_{21}\). Therefore, the drift field given by Eq. (21) does not necessarily comply with the symmetry restriction described by Eq. (12), which is akin to the standard symmetry condition in neural networks.
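
Condition (20) for this example can be checked symbolically. The sketch below satisfies the constraints (23) through a one-parameter choice \(h_{12} = c\, w_{22}\), \(h_{21} = -c\, w_{11}\) (our parameterization, which gives \(h_{12} \ne h_{21}\) whenever \(w_{11} \ne w_{22}\)) and verifies that the divergence of \({\mathcal P}_q \varvec{E}\) vanishes identically:

```python
import sympy as sp

X1, X2 = sp.symbols('X1 X2')
beta, q, w11, w22, c = sp.symbols('beta q w11 w22 c', positive=True)

# Constraints (23): w12 = h11 = h22 = 0 and w11*h12 + w22*h21 = 0,
# satisfied here by the one-parameter choice below (an assumption)
h12, h21 = c * w22, -c * w11

V = (w11 * X1**2 + w22 * X2**2) / 2               # Eq. (22) with w12 = 0
E1, E2 = h12 * X2, h21 * X1                       # non-gradient field, Eq. (21)
Pq = (1 - (1 - q) * beta * V) ** (1 / (1 - q))    # Eq. (16); constant A drops out

# Condition (20): the divergence of Pq * E must vanish
div = sp.diff(Pq * E1, X1) + sp.diff(Pq * E2, X2)
print(sp.simplify(div))   # prints 0
```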

Due to space limitations, we have presented these ideas only briefly; we are preparing an extended manuscript with a more detailed and general discussion. There we plan to explore systematically the conditions for having a q-maxent stationary state in more general scenarios described by NLFP equations, in which the deterministic part of the dynamics involves a phase space flow not having the gradient form.

5 Conclusions

Inspired by the symmetry problem in neural networks, we explored properties of multi-dimensional NLFP equations endowed with drift fields not arising from a potential. We considered drift fields having both a gradient term and a non-gradient contribution. The non-gradient component of the drift field exhibits asymmetric features akin to those associated with the dynamics of neural networks with asymmetric synaptic weights. We identified cases where an NLFP equation having a non-gradient drift field still has a stationary solution of the q-maxent form (i.e. a q-exponential of the potential associated with the gradient part of the drift field). In future contributions, we plan to continue exploring the connections between the NLFP equation and the synaptic symmetry problem in the dynamics of neural networks, in order to apply the formalism of a non-potential drift term to account for attractor states in neuronal circuits with asymmetric synaptic interactions.