Introduction

Numerous studies have addressed the neural origins of the generation, organization, and adaptability of behavioral actions. Two general mechanisms can be recognized to underlie the production of behavior: reflexive processes, whereby the expression of an act is the reactionary motor response to a triggering sensory stimulus arising from the periphery (Fig. 1a), and autonomous mechanisms, in which the basic determinants of a behavior arise from motor output that is produced by pattern-generating circuitry within the central nervous system (Fig. 1b). From these studies on a variety of animal model systems, particularly the simpler and more accessible behaviors and nervous systems of invertebrates, significant progress has been made in understanding the synaptic and cellular properties of the underlying neural pathways and processes of learning and memory by which reflexively driven acts can be dynamically adjusted in a history-dependent manner in accordance with an animal’s changing external environment. For example, much is known about the neural control of reflexes such as those involved in defensive gill, siphon, and tail withdrawal in the mollusc Aplysia, and the cellular and molecular basis of the ability of the underlying sensori-motor pathways to adapt through simple forms of non-associative (habituation and sensitization) and associative learning (classical and operant conditioning) [for reviews see 1–3].

Fig. 1

Different functional configurations of sensori-motor circuitry in Aplysia. a The neuronal pathways underlying reflexive behavior are composed of sensory input neurons which directly, and/or indirectly via relay interneurons, connect to motor output neurons (upper schematic). Neural activity in these otherwise silent circuits requires triggering by peripheral sensory stimuli (lower schematic) and propagates uni-directionally (arrows in upper schematic) to the motor neurons to elicit a transient behavioral response. b The neuronal circuits generating autonomous behaviors, including feeding behavior, consist of reciprocally interconnected groups of neurons (boxes and large circle in upper schematic) which interact (arrows) to produce patterned motor output. This is generated spontaneously and repetitively due to the regenerative membrane properties and synaptic connections of neurons within a central pattern-generating (CPG) network (encircled). Although sensory inputs are not required for actual motor pattern genesis, they can trigger or regulate ongoing CPG operation (lower schematic)

Unlike the sensori-motor transformations underlying reflexive behavior in which the rate of occurrence is a function of that of the eliciting stimulus, the internally driven motor commands responsible for motivated or goal-directed acts, such as feeding or sexual behavior, which are emitted spontaneously at a rate set by the animal itself [4], depend on processes of decision-making in the selection and initiation of appropriate actions. Such autonomous behaviors can also be modified by sensory information through non-associative and associative learning, although the functional interaction between external stimuli and internal decision-making circuitry remains poorly understood. A relevant learning paradigm to address this issue is operant conditioning in which an animal learns about the consequences of its goal-directed behavior. In operant conditioning, a contingent association is made between a specific act (the operant) and a reinforcing (rewarding) or an aversive (punishing) stimulus [5, 6]. As a result of positive reinforcement, operant conditioning reduces variability in internally driven behavior not only by favoring appropriate (rewarded) action but also by regulating its spontaneous recurrence in time [7–11]. This can result in a habitual and ultimately compulsive expression of the rewarded action that becomes initiated in a stereotyped rhythmic manner at a relatively high frequency, even after the rewarding stimulus has been removed [11–16]. Conversely, in an operant-negative reinforcement relationship, an animal learns through unfavorable behavioral consequences to diminish the spontaneous expression of a particular act, thereby decreasing the likelihood of a future confrontation with the unfavorable stimulus [4]. Such behavioral changes in turn necessitate alterations in the central neuronal processes engaged in motor pattern selection and initiation [17]. However, the cellular mechanisms that underlie the operant reward-induced acquisition of compulsive behavior from an otherwise impulsively expressed act, or the opposing relationship between reward versus aversive associative learning, are still largely unknown.

Among a variety of invertebrates that express memory formation through operant conditioning [7, 8, 18–20, for review see 10], the feeding behavior of Aplysia has provided important insights into the underlying cellular and network mechanisms, due in large part to parallel studies in which the key neuronal components of the pattern-generating networks that drive feeding-related movements have been identified and characterized [21–28, for review see 29]. Thus, by using in vivo operant conditioning paradigms in combination with in vitro approaches, it has been possible in some cases to pinpoint the sites and actions of reinforcing sensory signaling on known target neurons and networks and in turn relate these specific influences to the outcome for the actual behavior that the circuits produce. The purpose of this review is to highlight recent investigations on Aplysia in which learning-induced plasticity in its feeding behavior, particularly that related to appetitive operant conditioning, has been traced to the biophysical properties and synaptic interactions of identified components within the central pattern-generating circuitry that produces the behavior. The data obtained demonstrate not only the involvement of operant learning processes in exploratory and goal-directed behaviors in which decision-making about action selection and initiation is pivotal, but also that memory storage resulting from the convergence of operant behavior and reinforcement can occur at diverse cell- and circuit-wide loci.

Operant reward learning in Aplysia’s feeding behavior

The herbivorous Aplysia (Fig. 1a) performs a variety of actions during its search for food, including locomotion, head-waving and cycles of protraction and retraction of its rasp-like radula [30]. Repeated exploratory and consummatory radula biting movements are expressed in an all-or-none manner, rendering them easily observable and quantifiable. Moreover, this goal-directed behavior, which in the absence of ingestible food is expressed relatively infrequently with highly variable and unpredictable inter-bite intervals (Fig. 2ai), can be durably modified by action-reward contingency in appetitive operant conditioning [31]. After 40 min of contingent reinforcement training during which an animal received an ingestible food reward (either a piece of algae or a bolus of seaweed extract injected directly into the buccal cavity) in association with each spontaneous bite (Fig. 2aii), the variability and cycle period of bite occurrences were both drastically reduced, leading to prolonged (for several hours following training) bouts of intense and seemingly automatic radula movement cycles with an elevated frequency and a stereotyped rhythmic organization (Fig. 2b, middle). A strict contingency of the appetitive reward with the intrinsically driven motor action was found to be both necessary and sufficient for the induction of this behavioral plasticity, since changes in radula movements did not develop with the presentation of an unpalatable food stimulus or when repetitive delivery of the seaweed reward was uncorrelated (i.e., non-contingent) with biting behavior (Fig. 2aiii, b, right). Thus, the long-lasting plasticity in the cycle-by-cycle initiation and temporal organization of a goal-directed component of Aplysia’s feeding behavior emerged from a specific associative and operant process through which the animal was able to “memorize” the positive consequences (access to food) of its own actions.
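The difference between the three training groups can be made concrete with a minimal sketch of the reward schedule. The function name, the group labels and the 60-s delivery interval for the non-contingent group are assumptions introduced for illustration only; the text specifies the 40-min training period and the pairing of reward with each spontaneous bite, but not the interval used for non-contingent delivery.

```python
TRAINING_S = 40 * 60  # 40-min training period stated in the text


def reward_times(bite_times, group, interval_s=60.0):
    """Return reward delivery times (s) for one training session.

    bite_times: times (s) of spontaneous radula bites.
    group: "control", "contingent" or "non_contingent".
    interval_s: hypothetical fixed delivery interval for the
        non-contingent group (not specified in the text).
    """
    bites = sorted(t for t in bite_times if 0 <= t <= TRAINING_S)
    if group == "control":
        return []        # lip stimulus only, no reward
    if group == "contingent":
        return bites     # one reward paired with each spontaneous bite
    if group == "non_contingent":
        # Regular delivery that ignores when the animal actually bites.
        n = int(TRAINING_S // interval_s)
        return [interval_s * (k + 1) for k in range(n)]
    raise ValueError(f"unknown group: {group}")
```

In this sketch, the only quantity that differs between the contingent and non-contingent groups is whether reward timing depends on `bite_times`, which is the variable the experiments isolate.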

Fig. 2

Operant reward learning in Aplysia’s feeding behavior. a Protocol for appetitive operant conditioning. Three groups of animals (Control, Contingent reward, Non-contingent reward) were subjected to a sustained food stimulus to the lips (top of frames) to incite the maintenance of feeding posture (left) and the generation of repeating cycles of radula biting movements (blue circles at right). Control: no additional food reward stimulus was provided. Contingent reward: a small volume of seaweed juice was additionally injected into the buccal cavity (at green rectangles) in association with each spontaneous radula bite during a 40-min training period. Non-contingent: the food reward was delivered at regular intervals and independently of bite occurrences. b Behavioral changes induced by operant reward training. In control animals (left), radula bites occurred erratically (blue circles) and in a random temporal succession as indicated by the corresponding autocorrelation histograms (below). In contrast, contingently rewarded animals (middle) generated radula movement cycles at a higher frequency and in stereotyped rhythmic succession (as indicated by the autocorrelation histogram fit with a Gabor sinusoidal function). This intensified rhythmic behavior was a specific consequence of the contingent operant reward association, because it was not expressed by non-contingently rewarded animals (right) which continued to express erratic, irregular biting behavior similar to control animals
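The autocorrelation analysis mentioned in this legend can be sketched as follows. This is an illustrative reconstruction, not the analysis code of the cited studies: the bin width, the maximum lag and the exact parameterization of the Gabor function (a Gaussian-damped cosine plus an offset) are assumptions, and the function names are hypothetical.

```python
import math


def autocorr_histogram(event_times, max_lag, bin_width):
    """Count forward time lags between all pairs of events.

    Returns a list of counts, one per bin of width bin_width,
    covering lags in the interval [0, max_lag).
    """
    n_bins = int(round(max_lag / bin_width))
    counts = [0] * n_bins
    times = sorted(event_times)
    for i, t0 in enumerate(times):
        for t1 in times[i + 1:]:
            lag = t1 - t0
            if lag >= max_lag:
                break  # times are sorted, so later lags are larger still
            counts[min(int(lag // bin_width), n_bins - 1)] += 1
    return counts


def gabor(lag, amplitude, sigma, freq, offset):
    """Gaussian-damped cosine; one common form of a Gabor function."""
    envelope = math.exp(-lag ** 2 / (2.0 * sigma ** 2))
    return offset + amplitude * envelope * math.cos(2.0 * math.pi * freq * lag)


# A perfectly regular bite train gives peaks at multiples of the cycle period.
regular_bites = [10.0 * k for k in range(10)]
histogram = autocorr_histogram(regular_bites, max_lag=30.0, bin_width=1.0)
```

For a rhythmic bite train the histogram shows periodic peaks that a damped cosine can follow, whereas erratic biting yields a flat histogram with no preferred lag.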

It is interesting, moreover, that this operant learning-dependent and persistent increase in the motivational state of the animal, in which an appetitive action switches from erratic, impulsive and irregular recurrences to an accelerated and stereotyped yet ultimately unsuccessful behavior (when the reward is withdrawn), bears a striking resemblance to the induction of non-pathological compulsive actions in more complex organisms, including humans. Here also, the internally driven impulse to produce a specific goal-directed act, such as feeding, sexual activity or drug taking, can be regulated by external rewarding stimuli that through contingent reinforcement in operant conditioning can lead to accelerated and quasi-automatic, compulsive-like recurrences of the action which persevere after reward stimulus withdrawal [11, 13–16]. Thus, the contingent reward-induced modifications observed in the simpler feeding behavior of Aplysia could provide relevant insights into the neuronal basis of non-pathological compulsive actions resulting from rewarding feedback in other more complex organisms, as well as in drug addiction and compulsive eating disorders in humans.

Operant learning that food is inedible

In direct contrast to the positive reinforcement of spontaneous ingestive behavior through appetitive operant conditioning, an Aplysia can also learn to modify its responsiveness to food on the basis of whether it succeeds in actually swallowing it [8]. In an associative learning paradigm of aversive operant conditioning, animals can be trained to attempt to swallow a food enclosed in plastic netting. The food is tasted through holes in the net, inducing ingestive biting, food entry into the buccal cavity, and repetitive but failed attempts to swallow. The inedible food is eventually ejected through radula movements in a sequence that pushes the net-enclosed bolus out of the mouth, rather than pulling it in (see below). The food continues to stimulate the lips, eliciting ingestive radula bites that again lead to failed swallows. The ineffective attempts to swallow act as negative reinforcement so that as training proceeds, the netted food remains in the buccal cavity for shorter periods, inducing fewer swallows and more rejections until eventually the animal stops responding as it learns that the food is inedible [8]. Inedible food training for periods as short as 5–15 min causes a subsequent diminished behavioral responsiveness that persists for >24 h [32], and as for long-term memory formation in other systems [33, 34], this learning-induced response decrement was found to be correlated with an increased expression of the transcription factor CCAAT/enhancer-binding protein (C/EBP) that was specifically localized to the nervous centers controlling feeding [35].

Cellular and network mechanisms of operant learning

The central pattern generator (CPG) network that drives Aplysia’s feeding-related behavior, including radula biting movements, is distributed between the bilaterally paired buccal ganglia, and many of the key network components, together with their bioelectrical properties and synaptic connectivity, have been previously identified and characterized ([21–28, for review see 29]; Fig. 3a, b). Two operational features of the buccal CPG have been crucial to pinpointing the neuronal substrates of behavioral plasticity in this system. First, the feeding neural circuitry, consisting of sensory input pathways, interneurons and motoneurons, continues to produce the motor output patterns for radula behavioral actions when the buccal ganglia are removed from the animal (Fig. 3a, upper panel). Secondly, this so-called “fictive” biting expressed by the motor pattern-generating circuitry of isolated buccal ganglia conserves essential features of the behavioral modifications induced in vivo by food-related classical or operant conditioning [8, 31, 36, 37] or via analogous paradigms of sensory nerve stimulation in vitro [38–45].

Fig. 3

Neuronal substrates for radula motor pattern selection and initiation. a Schematic of isolated bilateral buccal ganglia (B.g.) and sites for extracellular recordings of radula motor patterns (below) from peripheral motor nerves conveying protraction (Protr.), retraction (Retrac.) and closure (Clos.) phase motor discharge. During tonic electrical stimulation of sensory nerve 2,3 (Stim.) to mimic the inciting non-ingestible food stimulus applied in vivo (see Fig. 2a), two distinct motor patterns corresponding to ingestion (left) and egestion (right) can be generated. The patterns are distinguished by the relative durations of the protraction/retraction phases and the occurrence (or absence) of closure motor activity during the retraction phase. b Simplified representation of the buccal CPG that drives radula motor patterns. Protraction phase activity (Protr.) is triggered by electrically coupled neurons, including B30/B63/B65. Retraction activity (Retr.) is triggered by a different CPG subset that includes B64/B4/5. Closure motor activity (Clos.) is driven by neurons such as B30/B51, which discharge either uniquely during protraction (egestion) or during both the protraction and retraction phases of activity (ingestion). Resistor symbol: electrical coupling; small black circles: chemical inhibition; triangles: chemical excitation. c Radula motor pattern selection. Ingestive (left) and egestive (right) motor patterns are generated by the same multifunctional CPG. In both patterns, bursting activity in B63 and B64 neurons generates the protraction (P) and retraction (R) phases, respectively. Bursting in neuron B51 occurs uniquely during ingestive motor pattern genesis (left) and elicits closure (C) activity during the prolonged retraction phase (R) of this pattern. In contrast, when B51 remains inactive it enables egestion motor pattern production (right). d, e Motor pattern initiation. The two electrically coupled B63 neurons in the bilateral B.g. are necessary and sufficient to initiate individual radula motor patterns. d Experimental depolarization of one B63 (horizontal bar) triggers bursts in both B63 neurons and elicits a fictive ingestive bite (left). However, motor pattern genesis fails to occur when the contralateral B63 is simultaneously hyperpolarized to prevent it from bursting (right). Note that burst discharge in the B30 and B65 neurons can activate the B63 cells through their electrical coupling (and thereby trigger a motor pattern), but neither alone, nor the two together, is sufficient to trigger a bite cycle without B63 activation. Thus, any one of these cell types can be the first active cell in a radula bite cycle (e), although the actual pattern-initiating process is dependent on the activation of the bilateral B63 neurons

The motor output patterns that drive radula movements during feeding consist of all-or-none sequences of burst activity in motor nerves controlling radula protraction, retraction, opening and closure (Fig. 3a, lower panel). In the intact animal, depending on the quality of sampled food, the buccal CPG is able to alter the phase relationship between the two subsets of alternating protraction/retraction and opening/closing activity so that radula movements switch from normal ingestive biting that transports food through the buccal mass into the foregut to egestive actions that remove inedible objects from the foregut [8, 46]. During ingestion, the two radula halves are protracted out of the buccal cavity to close around food which is then withdrawn into the cavity during retraction. Alternatively, the radula can close on an inedible element in the buccal cavity and eject it by concomitant radula protraction. Thus, for an ingestive bite, radula closure occurs during the retraction phase, whereas in egestion, closure movement now occurs during radula protraction. On this basis, therefore, the feeding-related actions of Aplysia correspond to two fundamental components of behavioral choice and decision-making that are confronted by simpler [47, 48] and complex animals alike [49]: the decision as to whether or not to repeat a given act (e.g., an ingestive radula bite) and the selection between alternative behaviors (e.g., ingestion vs. egestion movements).

Cell-wide plasticity in operant conditioning of action selection

A key “decision-maker” for action selection in Aplysia’s feeding behavior is an identified element (neuron B51) of the buccal CPG (Fig. 3b) which, depending upon its inherent levels of excitability, can transfer the network between functional states of ingestive or egestive motor program production (Fig. 3b, lower; [39]). Specifically, the intrinsic membrane properties of B51, which include an ability to produce sustained, regenerative plateau potentials ([23]; see Fig. 5a), allow it to switch in an all-or-none manner between an inactive and active state during which the cell fires an intense burst of impulses. B51, which is not essential for the actual genesis of buccal motor patterns, exerts its effects through diverse synaptic connections, including mixed electrical and chemical excitatory synapses with radula closure motoneurons and with the B64 neuron, a major component of the CPG subcircuit that drives retraction phase motor output (see Fig. 3b). Thus, when B51 is in an active state (Fig. 3c, left), it reinforces and prolongs discharge in the B64 retraction generator, which additionally inhibits the protraction generator subcircuit (see Fig. 3b), and thereby ensures that closure motor activity occurs mainly during retraction. When B51 remains inactive (Fig. 3c, right), closure motoneurons are activated principally by central neurons belonging to the protraction generator (e.g., B65; see Fig. 3b), which in turn inhibits B64 and now leads to closure and protraction motor discharge occurring concomitantly. Therefore, although B51 is not involved in the cycle-by-cycle initiation of buccal motor programs, a dynamic switching between the cell’s two functional states is critical for determining the motor output phenotype that the buccal CPG produces: plateau-generated bursts in B51 promote the occurrence of ingestion-related motor patterns, whereas an absence of B51 activation represses ingestion patterns and promotes the expression of egestion-related patterns (Fig. 4ai) [40].

Fig. 4

Learning-induced regularization of buccal CPG operation. a Schematic representation of bursting activity in the B51 neuron during the generation of successive motor patterns in isolated B.g. preparations from the three experimental groups of animals (ingestion, I; egestion, E; boxes indicate phases of protraction (filled) and retraction (unfilled) activity, respectively). The probability of burst recurrences in B51 and the expression of ingestion-type motor patterns increases after contingent reinforcement (a ii), as compared to control (a i) and non-contingent preparations (a iii). b Bursting activity in the pattern-initiating neurons B30, B63 and B65 under the same experimental conditions as in a. The frequency, regularity and coordination of bursts in the three cells increase after contingent reinforcement. c Summary diagram of the functional configurations of the buccal CPG corresponding to the burst patterns shown in a and b. In the control and non-contingent activity sequences, the network configuration varied considerably from one pattern to another due to spontaneous changes in processes of pattern initiation (filled circles indicate the first active CPG neuron in each case) and selection (dashed circles indicate inactive cells (notably B51) which contribute to motor pattern selection). After contingent reinforcement, CPG operation becomes stabilized so that the pattern-initiating B63 neuron is systematically the instigator of each cycle, and B51 is always active

The excitability-dependent ability of the single B51 neuron to “decide” between the two feeding-related motor programs can be modified by operant conditioning and associative learning processes. In initial in vitro and subsequent in vivo conditioning experiments, direct electrical stimulation of an esophageal nerve branch (En2), which is presumed to convey sensory information about the presence of ingested food, was used to substitute for contingent food reward reinforcement during actual feeding behavior [36, 39, 40]. Both operant training paradigms, in which En2 stimulation was made conditional upon each spontaneous radula motor pattern produced by the isolated buccal CPG [39, 40] or an actual radula bite in the freely behaving animal (see Fig. 2aii; [36]), produced a long-lasting (several hours) rate enhancement of ingestion-related biting activity (Fig. 4aii) that was associated with an increased input resistance and a decrease in burst plateau threshold of neuron B51 (Fig. 5aii). Stimulation of En2 in a non-contingent manner, either in vivo or in vitro, did not alter the membrane properties of B51 (Fig. 4aiii, 5aiii). The operant conditioning-induced cellular changes in B51, which together increased its excitability and hence the probability of the cell becoming active, thus increased the likelihood of the buccal CPG producing ingestive rather than egestive motor patterns during its reinforced operation (Fig. 4aii). Moreover, the learning-induced excitability changes were intrinsic to B51 (rather than originating at a presynaptic locus) since they were also expressed in an isolated cell analogue of appetitive operant conditioning in which spontaneous plateau potentials in cultured B51 cells were contingently reinforced with iontophoretically applied pulses of dopamine [36, 45], the transmitter that is likely to mediate food signals from the esophageal input nerve in vivo [50]. Furthermore, injection of cAMP into naive isolated B51 neurons mimicked the excitability changes induced by single cell operant conditioning [44, 45], and consistent with this finding, cAMP injected into B51 in situ also increased bursting activity in the cell and in turn the number of ingestion-type motor patterns produced by the buccal CPG [45]. Such findings, together with evidence for an associative convergence (upstream of cAMP production and acting through adenylyl cyclase) of activity- and dopamine receptor-induced signaling cascades involving PKC and PKA, respectively [44], are therefore beginning to pinpoint the molecular processes by which memory storage for operant learning can be inscribed within the bioelectrical properties of key decision-making components of behavior-generating central circuitry [51].
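The link between B51 excitability and motor pattern selection can be illustrated with a deliberately simplified sketch. It assumes that, on each cycle, B51 receives a random depolarizing drive (arbitrary units, Gaussian distributed) and generates a plateau-driven burst only when that drive exceeds a plateau threshold. The drive distribution, the threshold values and all function names are hypothetical; the source reports only that conditioning lowers the plateau threshold and raises input resistance.

```python
import random


def select_pattern(b51_active):
    """B51 active during a cycle -> ingestion; B51 silent -> egestion."""
    return "ingestion" if b51_active else "egestion"


def simulate_patterns(n_cycles, plateau_threshold, seed=0):
    """Simulate successive motor patterns for a given B51 threshold."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    patterns = []
    for _ in range(n_cycles):
        drive = rng.gauss(0.0, 1.0)  # hypothetical drive to B51
        patterns.append(select_pattern(drive > plateau_threshold))
    return patterns


def ingestion_fraction(patterns):
    """Fraction of cycles expressed as ingestion-type patterns."""
    return patterns.count("ingestion") / len(patterns)


# Hypothetical thresholds: higher before, lower after contingent training.
naive = ingestion_fraction(simulate_patterns(1000, plateau_threshold=1.0))
trained = ingestion_fraction(simulate_patterns(1000, plateau_threshold=0.0))
```

Under these assumptions, lowering the threshold alone shifts the output toward ingestion without any change in the drive that B51 receives, which is the sense in which the plasticity is intrinsic to the cell.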

Fig. 5

Plasticity in regenerative membrane properties. a After contingent reinforcement, the capacity of B51 to produce burst-generating plateau potentials (as is evident from brief depolarizing current pulse injections) is increased as compared to the same cell type in control (left) and non-contingent (right) preparations. b An intrinsic oscillatory capability of a single B63 neuron can be revealed by continuous membrane depolarization during coincident hyperpolarization of the contralateral B63 (B63c) to prevent the activation of the other pattern-initiating neurons and CPG circuitry (see also Fig. 3d). In control (left) and non-contingent (right) ganglia, such a functionally isolated B63 generates infrequent and irregular bursts, as indicated by the corresponding autocorrelation histograms (lower). In contrast, after contingent reinforcement, B63 produces accelerated and regularized repetitive bursting as indicated by the sinusoidal Gabor function fit (solid line) to the corresponding autocorrelation histogram

Finally, it remains to be established whether B51, in light of its capacity to determine the type of motor output that the buccal CPG produces, also serves as a pivotal decision-maker in the opposing behavioral response to aversive operant learning that food is inedible [8; also see above]. In this case, the pairing of a punishing esophageal sensory nerve signal to failed attempts to consume food contained in the buccal cavity [52] could actively lead to a decreased excitability of B51, thereby promoting the production of egestive rather than ingestive motor programs as buccal network output declines until the animal eventually ceases responding to the inedible food.

Cell- and circuit-wide plasticity in operant conditioning of action initiation

Although considerable data are now available on the impact and consequences of operant conditioning in terms of the pattern-switching B51’s ability to determine how the buccal CPG should act once a feeding cycle has started, until recently, much less was known about learning-induced changes at the level of pattern-initiating neurons, the buccal circuit cells whose activity determines when a behavioral cycle is actually instigated. Since a natural food stimulus in operant conditioning increased the regularity and frequency of ingestive bite occurrences in vivo (see above), and since the neuronal correlates of this change continued to be expressed by feeding circuitry in isolated buccal ganglia [38], a first step towards a cellular analysis of this behavioral plasticity was to identify the buccal network elements that are primarily responsible for triggering the initial protraction phase of each radula bite cycle. Intracellular recordings made from a variety of previously identified protraction-initiating neurons [22, 25–28] revealed that spontaneous bursts in three cell types, the bilaterally paired and electrically coupled B63, B30 and B65 interneurons (see Fig. 3b), always started before each spontaneous radula motor pattern occurrence (see Fig. 4b), and were therefore considered to be the primary instigators in the bite-generating process. Consistent with this action-initiating role, burst activity elicited experimentally in any one of these three cells was found to be capable of initiating complete cycles of fictive biting in otherwise quiescent buccal ganglia. However, although a burst evoked by depolarizing current in a B63 cell could elicit a complete bite without the involvement of the electrically coupled B30 and B65 neurons (Fig. 3d, left), this initiating ability was dependent upon the co-activation of both bilateral B63 cells, since holding one B63 neuron silent (with negative current injection) while activating the contralateral partner blocked radula motor pattern production (Fig. 3d, right). Similarly, experimentally induced bursts in either B30 or B65 could trigger fictive bites, but again these only occurred when associated with burst activation in B63 as a result of their electrical coupling. Therefore, burst onset in any one of these three cell types is sufficient to instigate radula motor patterns, which can thereby have different cellular origins according to the arbitrary order in which these neurons become active in a given bite cycle (Fig. 3e), although in all cases, the actual pattern-initiating process is critically dependent on the intervention of the bilateral pair of B63 neurons. Such randomly distributed burst activity in this kernel of decision-making neurons thus underlies the erratic and irregular expression of motor output commands that subserve the hungry Aplysia’s “trial and error” sampling of its environment in search for food [53].
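The initiation rule described above can be restated as a small Boolean sketch. It reduces each cell to a binary bursting state and treats electrical coupling as all-or-none recruitment, both of which are simplifying assumptions; the cell labels `B63_left` and `B63_right` and the function name are introduced here for illustration.

```python
INITIATORS = frozenset({"B63_left", "B63_right", "B30", "B65"})
B63_PAIR = frozenset({"B63_left", "B63_right"})


def bite_initiated(depolarized, held_silent=()):
    """Return True if a fictive bite cycle is triggered.

    depolarized: cells driven to burst by current injection (or that
        burst spontaneously first in a given cycle).
    held_silent: cells held hyperpolarized so that they cannot burst.
    """
    held = set(held_silent)
    unknown = (set(depolarized) | held) - INITIATORS
    if unknown:
        raise ValueError(f"unknown cells: {sorted(unknown)}")
    active = set(depolarized) - held
    if active:
        # Assumed all-or-none coupling: one bursting initiator recruits
        # every other initiator that is not held silent.
        active |= INITIATORS - held
    # A motor pattern requires bursting in both bilateral B63 neurons.
    return B63_PAIR <= active
```

With this rule, `bite_initiated({"B30"})` is true because B30 recruits both B63 cells, whereas `bite_initiated({"B63_left"}, held_silent={"B63_right"})` is false, matching the experiment in Fig. 3d.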

Importantly, the decision-making B63/B30/B65 subset in vitro continues to express the memory trace of the previous associative experience of the intact animal. Accordingly, in ganglia taken from naive control or non-contingently trained animals (i.e., animals that had received food reward in a sequence that was uncorrelated with spontaneous bite occurrences, Fig. 2aiii) which generated biting behavior at irregular intervals and relatively low cycle rates (Fig. 2b, left and right panels), the three cell types produced impulse bursts that were variable, erratic and weakly coordinated, with burst onset in any one cell type being capable of initiating a given bite cycle (left and right panels in Fig. 4b, c). In contrast, in buccal ganglia from contingently rewarded animals in which the feeding CPG network produced accelerated and regularized biting motor patterns, all three cell types expressed faster and stereotyped rhythmic bursts that were now tightly coordinated, with B63 bursts almost always leading (middle panels in Fig. 4b, c). Thus, the cellular correlates of the rate increase and regularization of radula biting behavior induced by in vivo operant conditioning were expressed within a module of electrically connected buccal circuit neurons that are necessary (B63) and/or sufficient (B63, B30, B65) for initiating each bite cycle. Moreover, pharmacological [38, 54] and electrophysiological evidence [55] has indicated that, as for pattern-switching B51, the reinforcing food reward signal to this motor pattern-initiating subset is mediated by dopaminergic inputs from the En esophageal nerve. Together these findings have wider implications for understanding decision-making processes and identifying the convergence points of operant behavior and dopamine-mediated reward [15], since they show that learning-induced neural plasticity underlying the selection of an appropriate motor action (e.g., ingestive biting by B51) and its actual initiation (by B63/B30/B65) may be encoded at different cellular loci.

It is also instructive to compare reward and decision-making in Aplysia’s simple feeding behavior to more complex goal-directed motor systems such as the corticostriatal brain circuitry of rodents and primate mammals. Here, decision-making processes involving behavioral actions that are flexible, goal-directed and sensitive to rewarding feedback (i.e., analogous to the Aplysia’s search for food) are mediated by striatal pathways that are distinct from those mediating habitual, and relatively automatic decision-making actions resulting from associative learning [15, 17, 56]. This is in direct contrast to Aplysia where the action-outcome dependent transition from sporadic food reward-driven actions to a stereotyped, compulsive-like expression of the behavior is accomplished by functional plasticity within the same single subcircuit of central neurons. In this case, moreover, experience-related variability in action initiation is inscribed within a discrete neuronal subset of the feeding CPG network itself, rather than arising from distributed excitability changes among the wider CPG circuitry [57] or among specialized command-like neurons that are upstream to the CPG [48].

Operant learning-induced changes in neuronal burst-generating properties

Further electrophysiological exploration of buccal ganglia from animals with different behavioral training histories has enabled identification of the cellular sites at which the induction and maintenance of operant memory in the pattern-initiating B63/B30/B65 subset occur. In principle, the learning-induced changes in the rate and regularity of burst activity in these neurons could be the result of intrinsic biophysical changes or of modifications in the properties of other presynaptic buccal network neurons that in turn modulate the activity of B63/B30/B65. To distinguish between these two possibilities, the burst-generating capability of individual B63/B30/B65 neurons was examined by depolarizing current injection under conditions of in situ isolation, achieved by continuously hyperpolarizing one of the critical B63 cell pair to block activation of the other pattern-initiating cells and thereby of the remaining feeding CPG circuitry (Fig. 5b, see also Fig. 3d) [58]. This functional isolation procedure revealed not only that each of the pattern-initiating cell types possesses an endogenous oscillatory mechanism that underlies its bursting, but also that the expression of this intrinsic property varied in close accordance with learning-induced changes exhibited by the functionally intact feeding network. First, individually isolated B63, B30 and B65 neurons in buccal ganglia from operantly conditioned animals were found to be significantly more excitable (indicated by an increase in input resistance and a decrease in threshold for depolarization-induced burst activation) than in control and non-contingent ganglia. This enhanced excitability, which led to significantly higher rates of endogenous bursting in the three neuron types in response to similar levels of suprathreshold depolarizing current injection (Fig. 5b), therefore suggested a cellular mechanism by which the learning-induced rate increase in radula motor pattern production occurred.

Second, in situ isolated pattern-initiating neurons also expressed cell-wide correlates of the learning-induced increase in regularity of radula biting behavior and underlying buccal network operation. Depolarization-activated bursting occurred erratically with highly variable durations in all three cell types in control and non-contingent buccal ganglia, whereas in contingent ganglia, bursts elicited by similar current magnitudes were now rhythmically recurring with stereotyped durations and inter-burst intervals (Fig. 5b, lower). Moreover, in direct contrast to burst frequency, neither the rhythmic nor erratic forms of single cell activity, associated, respectively, with the occurrence or absence of associative learning, were affected by experimental depolarization, therefore indicating that the regularization of bursting by operant conditioning was not a consequence of the concomitant increase in neuronal excitability. This finding in turn suggested that the acquisition of stereotyped rhythmic bursting in these neurons arose from learning-induced plasticity in a voltage-independent process that was separate from the voltage-sensitive component of their intrinsic oscillatory mechanism. A plausible substrate for this second modulatory process derived from earlier experimental and modeling studies suggesting that the non-linear dynamics of calcium exchange between the endoplasmic reticulum and cytoplasm, which occurs independently of membrane voltage [59], may be at the origin of chaotic bursting activity in endogenous oscillatory neurons [60, 61]. Moreover, theoretical evidence suggested that an extrinsic stimulus-driven damping of lumenal calcium exchange could lead to a regularization of the otherwise erratic membrane voltage oscillations [61, 62], although whether such a subcellular process underlies the transition between irregular and regular forms of endogenous bursting in B63/B30/B65 as a function of dopamine-mediated operant experience remains to be determined.
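
The distinction drawn above between the rate and the regularity of bursting can be made concrete with a simple measure. The sketch below is illustrative only and is not taken from the studies reviewed here: it assumes that regularity is summarized by the coefficient of variation (CV) of the intervals between successive burst onsets, and the function name and the onset times are hypothetical values invented for the example.

```python
from statistics import mean, pstdev


def interburst_cv(burst_onsets_s):
    """Coefficient of variation (SD / mean) of the intervals between
    successive burst onsets. Lower values mean more regular bursting.
    This metric is an assumption of the sketch, not the reviewed method."""
    intervals = [b - a for a, b in zip(burst_onsets_s, burst_onsets_s[1:])]
    if len(intervals) < 2:
        raise ValueError("need at least three burst onsets")
    return pstdev(intervals) / mean(intervals)


# Hypothetical burst onset times in seconds, invented for illustration.
erratic_onsets = [0.0, 4.0, 19.0, 25.0, 47.0, 52.0]    # mean interval 10.4 s
regular_onsets = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]   # mean interval 10.0 s

print(interburst_cv(erratic_onsets))
print(interburst_cv(regular_onsets))
```

The two invented trains have nearly the same mean interval, and hence a similar burst rate, but very different CVs, which is the sense in which regularity and rate can vary independently.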

Operant learning-induced changes in network properties

In addition to independently enhancing the excitability and regularity of intrinsic bursting within individual pattern-initiating neurons, operant conditioning also led to a widespread increase in the functional coupling between cells and thereby the coherence with which bursts were produced. Experimental evidence for this further learning-induced plasticity and its likely restriction to the pattern-initiating cell subset derived from examining the effects of contingent training on the degree of coordination between the bursting in cell pairs in functional isolation from the rest of the buccal CPG network. For this, a B63 neuron was again continuously hyperpolarized to repress CPG activation, while burst-generating oscillations at approximately similar frequencies were elicited simultaneously in the second B63 and one of its B30 or B65 partners (Fig. 6a). In such independently activated cell pairs from control and non-contingent preparations, and as seen in the functionally intact CPG network (see Fig. 4b), burst onsets in B30 or B65 relative to B63 occurred at highly variable phase intervals that could be advanced or delayed relative to B63 in successive burst cycles (Fig. 6a, left and right panels). However, in the regularized activity of contingent preparations, burst onsets in B30 and B65 were closely coordinated with those of neuron B63 (Fig. 6a, middle panel).

Fig. 6

Plasticity in electrical coupling. a Bursting in pairs of pattern-initiating neurons (here B63/B65) elicited by simultaneous depolarization while the contralateral B63 (B63c) was hyperpolarized to prevent wider CPG activity. In control (left) and non-contingent (right) preparations, burst onsets in the two activated neurons are weakly coordinated and the leading cell varies between cycles. However, in contingent preparations (middle), the onsets of now regularized bursts in the pattern-initiating cell pairs are strongly coordinated, with B63 bursts always slightly leading its partner. b The increased coordination of the pattern-initiating cell bursts is associated with a learning-induced enhancement of their electrical coupling. These connections (resistor symbols in schematics) were tested by injecting a fixed negative current (−10 nA) into a B30 or B65 to alter its membrane potential (VB30/B65) and that of the electrically coupled B63 (VB63). After contingent reinforcement, the electrical coupling of both B30 and B65 with B63 increased significantly as compared to control and non-contingent conditions

This transition from weakly coordinated to coincident bursting as a consequence of operant conditioning was associated with corresponding changes in the strength of electrical coupling that was previously known to be widespread amongst ipsilateral and bilateral pattern-initiating neurons [29, 30, 32]. Measurements of coupling strength between cell pairs, made by injecting hyperpolarizing current pulses into pre-junctional B30 or B65, revealed significantly stronger voltage responses in post-junctional B63 neurons of contingent buccal ganglia (Fig. 6b, middle panel) than in B63 of either control or non-contingent ganglia (Fig. 6b, left and right panels). An equivalent increase in electrical coupling strength, which was likely to underpin the coordinated bursting as a result of in vivo operant training, was found between ipsilateral components of the pattern-initiating subset as well as their contralateral partners. Whether this enhanced electrical connectivity results from direct alterations in the actual junctional resistance between pattern-initiating cell pairs is presently unknown, although the finding that their membrane input resistances increase with operant learning [58] suggested that the changes in coupling were likely, at least in part, to be attributable to non-junctional resistance changes.
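
The argument that a rise in input resistance alone can strengthen measured coupling follows from a standard passive two-cell model. The sketch below is a textbook simplification, not the analysis used in the studies cited: it assumes steady state, a single junctional resistance, and a post-junctional cell represented by one non-junctional (membrane) resistance, so that the coupling coefficient is R_post / (R_post + R_junction). The function name and the resistance values are hypothetical.

```python
def coupling_coefficient(r_post_mohm, r_junction_mohm):
    """Steady-state ratio of post- to pre-junctional voltage change in a
    passive two-cell model: the junctional and post-junctional membrane
    resistances act as a voltage divider. Both arguments are in megohms."""
    return r_post_mohm / (r_post_mohm + r_junction_mohm)


# Hypothetical values, invented for illustration only.
R_JUNCTION = 100.0  # junctional resistance, held constant

before = coupling_coefficient(10.0, R_JUNCTION)  # lower membrane resistance
after = coupling_coefficient(20.0, R_JUNCTION)   # higher membrane resistance

print(f"before: {before:.3f}, after: {after:.3f}")
```

In this simplified model, doubling the post-junctional membrane resistance raises the coupling coefficient even though the junctional resistance is unchanged, which is why a change in the junction itself cannot be inferred from the coupling measurement alone.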

Therefore, as a consequence of this network-wide plasticity acting in concert with a cell-wide transition to stereotyped endogenous bursting, individual radula motor patterns that are otherwise instigated by sporadic burst onsets in any one of the three cell types are transformed by operant conditioning to more frequent and regular occurrences in which the essential B63 neurons become the cycle-to-cycle leaders. The increase in electrical coupling between these cells can be seen to further strengthen the regularity and intensity of bursting in the bilateral B63 cell pair and thereby ensure motor pattern initiation in every burst cycle. Although electrical connectivity plays a major role in rhythmogenic circuit function [e.g., 63, 64] and can be modified by sensory and modulatory transmitter actions [65–67], its involvement in network plasticity associated with learning processes and the expression of memory had not, until very recently [68], been reported.

Ubiquity of dopamine in mediating associative learning

The transmitter dopamine (DA) is known to play a critical role in reward reinforcement and associative learning processes, including operant conditioning, in both invertebrates and mammals. Reinforcing stimuli activate dopaminergic neurons in the mammalian brain [for review see 69–75], and electrical or pharmacological stimulation of DA systems can elicit behavioral responses that mimic the effects of primary reward reinforcement [69, 76, 77]. Conversely, the pharmacological blockade of DA receptors or the destruction of dopaminergic pathways leads to deficits in the acquisition of conditioned behaviors produced by associative learning procedures [69, 78–81].

Equivalent lines of evidence indicate that DA plays a similarly important role in reward-induced learning in Aplysia. For example, the esophageal sensory nerve En, the reinforcement pathway that likely conveys food reward signals in vivo [36, 42, 52] and which mediates electrical stimulation-induced operant learning in vitro [38–40], is rich in DA-containing fibers [26, 82] that provide synaptic input to buccal network elements, including the B51 [50] and B63 neurons [55]. Moreover, exogenous DA reproduces the effects of contingent reinforcement when applied to isolated B51 neurons in culture in a single cell analogue of operant conditioning [36, 45]. Conversely, blockade of endogenous dopamine’s actions with methylergonovine, a DA receptor antagonist [54, 83–86], blocks the contingent-dependent alterations in buccal CPG activity produced by an in vitro analogue of operant conditioning [50] and, correlatively, in the bursting and synaptic properties of in situ isolated pattern-initiating neurons [55]. Such findings therefore not only emphasize the extent to which dopamine-mediated reward is conserved in the animal kingdom, but also point to the utility of Aplysia’s feeding behavior as a model system for studying the cellular and molecular mechanisms [44] that underlie the transmitter’s actions in associative learning.

Conclusions

Much of our current understanding of the operation of the neuronal circuits that generate behaviors has derived from the study of stimulus-driven motor reflexes, or stereotyped rhythmic behaviors (such as locomotion and respiration) in which the underlying motor programs recur spontaneously in a relatively predictable manner. Unlike such stereotyped behaviors, however, much less is known about the neural basis of other intrinsically driven motor acts that are irregularly expressed, such as feeding or sexual activity, and in which the incentive (or impulse) to act, and how to act, are to a large extent governed by a dynamical interplay between the motivational state of the organism and learning processes. Moreover, in contrast to reflexive behaviors in which learning-induced adaptability essentially modifies the through-pathway transmission of input to output signals at sensori-motor connections (Fig. 7a), in internally driven goal-directed actions, behavioral plasticity emerges from an association between reinforcing sensory information and the functional dynamics of separate autoactive circuitry intercalated between the input–output stages of the system (Fig. 7b). This in turn adds a further level of complexity to learning and memory processes since here, activity-dependent changes in the intrinsic properties of this central circuitry and its constituent neurons (non-linear membrane properties and synaptic interconnectivity) provide the essential substrates through which learning-induced behavioral changes occur.

Fig. 7

Summary of learning-induced plasticity in behavioral circuits of Aplysia. a In reflexive pathways, the functioning of which is strictly dependent on extrinsic activation by sensory stimuli, learning intensifies (or reduces) otherwise fixed motor responses by modifying the linear through-flow of impulse activity from the sensory input to motor output stages via changes in synaptic connectivity and/or membrane excitability in chains of passively conducting neurons. b In neuronal pathways generating autonomous (motivated) behaviors, which partly depend on the intrinsically driven impulse to act, learning regulates the initiation of, and selection between, different motor patterns by rigidifying the underlying multifunctional CPG network into a fixed functional configuration through changes in the active membrane properties and reciprocal connections of constituent neurons

Moreover, unlike the CPG networks responsible for most stereotyped rhythmic behaviors, in which cycle-to-cycle automatism is conferred by the regular pacemaking properties of component neurons and their synaptic connectivity, the central circuitry that drives Aplysia’s erratic and irregularly repeating biting actions in its search for food incorporates decision-making elements that are inherently capable of choosing whether or not to instigate individual cycles of motor activity or of selecting the output pattern that the CPG network produces.

The lack of a predesignated leader amongst these weakly coupled decision-makers further contributes to the variability (or “impulsiveness”) with which radula biting behavior is expressed. However, once the animal learns the positive consequences of its biting actions upon detecting food, the compulsive-like stereotypy that is now expressed in its predominantly ingestive behavior (selected by an increased excitability of B51) arises from the acquisition of accelerated, regularized and coincident bursting within an action-triggering neuronal subset (B63, B30, B65), which effectively stabilizes buccal network output into a phenotype resembling that of a more typical CPG.

Such observations on operant reward learning-induced changes in feeding behavior and its underlying circuitry in the simpler Aplysia model have thus begun to provide mechanistic insights into how animals, including humans, might learn to anticipate future appetitive or aversive events on the basis of past experience gained from the positive or negative consequences of their own behavior. Moreover, a greater understanding of the cellular and subcellular mechanisms underlying the switch in behavioral plasticity resulting from opposing positive and negative reinforcing learning processes may help guide future treatments for addictive behavioral disorders.