1 Introduction

Many animals, from insects to mammals, exhibit complex collections of spatial behaviors for survival, including foraging for food, remembering where home is, remembering safe routes between home and various known food sources, improvising new routes back home after an exploratory outbound path to a previously unvisited location, learning maps of new environments, and setting goal locations. To be successful, these spatial behaviors must be robust and able to deal with obstacles that hamper straight paths and occlude views, with novel environments, and with dynamically changing cues within familiar environments.

Navigation involves localizing oneself in space, modeling or mapping the external environment, setting goals, and selecting routes. Self-localization and mapping have a chicken-and-egg relationship. If a “map” of the environment is available (i.e., the spatial coordinates for all landmarks are known), then localization is not difficult, through reference to a sufficiently dense set of landmarks. Conversely, if self-motion estimates are precisely integrated to determine location, then building a map of the environment involves simply visiting and attaching a coordinate to each landmark. Typically, however, neither is known reliably, and the problem is to simultaneously localize oneself within an environment while acquiring a map of it, a computationally challenging problem in sequential probabilistic inference. Goal selection brings into play decision-theoretic questions involving exploration–exploitation trade-offs (Dayan and Daw 2008). Finding routes home involves not only knowing where one is and where home is but also the conversion of those coordinates into the vector pointing to home, as well as deciding when to follow landmarks and when to follow beeline paths based on internal estimates of location.

In this chapter, we focus on a small subset of questions related to animal navigation. We discuss how animals estimate their locations within environments while building internal models of these environments. In other words, our central aim here is to describe potential neural substrates for localization and mapping and to discuss computational efforts to understand the mechanisms underlying how the brain solves these problems. Other important aspects of goal-directed spatial behavior, such as selecting spatial goals and routes, are not directly addressed here (however, such behaviors will depend on the animal’s ability to solve the problems of localization and mapping, which we discuss here). In what follows, when we refer to the entorhinal–hippocampal circuit for spatial navigation, we intend to convey the spatial functions subserved by this circuit without implying that this is the sole function of the circuit.

We begin with a historical perspective on advances in the field, then summarize the current state of electrophysiological findings on activity in different parts of the circuit, briefly describing progress in modeling the mechanisms underlying such responses, and finally highlight several models of the overall entorhinal–hippocampal spatial navigation circuit that show how the various subcomponents might contribute to simultaneous self-localization and mapping in the presence of noisy and ambiguous sensory cues.

2 Early Ideas on Animal Navigation: Stimulus-Response Learning Vs. Map-Based Learning

Experimental work on spatial navigation in the animal literature began in the early twentieth century. This work was largely confined to observing how rats learned spatial mazes. Behaviorists believed that spatial learning, like other types of learning, could only be based on the association of actions and stimuli with rewards. For spatial learning, this implied that only locations and location-relations associated with rewards could be learned. In this view, chaining stimulus-reward associations could subserve more sophisticated behaviors like navigation along a route.

Tolman and some of his contemporaries (Tolman 1948; Hebb 1949) observed that there was much more to spatial learning. A notable study that challenged the behaviorist point of view was the sunburst maze experiment. In this task, after preliminary training in a spatially restricted, roughly L-shaped enclosure, rats were given the choice of many novel radial arms, the sunburst, to move directly to the end point (Tolman et al. 1946). Interestingly, some rats were observed to take the shortest radial route corresponding to the beeline path between the starting and end points. This suggested that rats could improvise paths and shortcuts through regions they had not previously traversed and thus which had no reward associations. Tolman reasoned that animals are capable of constructing mental representations of spatial locations in the environment and relations between them, independent of reward. He called these representations “cognitive maps.” By construction, cognitive maps could hypothetically provide route information between any two points in the environment and thus be used to navigate the sunburst maze.

Tolman’s conception of cognitive maps (Tolman 1948) was perceptive, but the lack of any known neurophysiological basis for such hypothesized computations contributed to a decline in popularity of the cognitive map hypothesis in the decades after its proposal (O’Keefe and Nadel 1978). It was not until the 1970s that the cognitive map theory was dusted off and elaborated upon (O’Keefe and Nadel 1978), in light of the exciting experimental discovery of place cells (O’Keefe and Dostrovsky 1971).

3 Map-Based Navigation Theory Spurred by Discovery of the Place Cell

Place cells, discovered by O’Keefe and Dostrovsky in the early 1970s (O’Keefe and Dostrovsky 1971), are neurons in the hippocampus that fire if and only if the animal is in the immediate neighborhood of a particular location (the place field) in an environment. Many place cells possess place fields within any given environment. Because the recording environments (typically 0.5–1 m per dimension) tended to be covered by different place fields, place cells were hypothesized to form the basis for spatial mapping. This pointed to the hippocampus as the locus of the cognitive map for space.

The groundbreaking discovery of place cells prompted the renewal and further development of the cognitive map hypothesis. O’Keefe and Nadel extensively reviewed the psychological, behavioral, anatomical, and physiological evidence for the existence of abstract spatial maps in the brain in their comprehensive and prescient book on the topic (O’Keefe and Nadel 1978). O’Keefe and Nadel provided a clear definition of a spatial map as an abstract representation of locations in an environment, the relationships between them, and the sensory inputs related to the locations. An important contribution to the cognitive map theory by O’Keefe and Nadel (1978) was to elaborate on the problem of ongoing location identification. Reasoning that it would be difficult to estimate location purely from the observation of the shifting angles of visible landmarks relative to the animal, they hypothesized that another system, sensitive to the movements of the animal, would be required, Fig. 14.1. This second system was hypothesized to follow the self-motion of the animal through space, shifting the hippocampal place representation accordingly. The self-motion drive was suggested to supplement purely external sensory inputs, which provided cues originating from viewing the world from different locations and angles. The “internal” system was seen as providing predictions about what to expect at a particular place that were compared with the actual sensory input provided by the “external” system. Discrepancies between expectation and actual input might be conveyed via misplace units (O’Keefe 1976), whose hypothesized role was to signal mismatches between the two systems, Fig. 14.1. In the theory, active misplace units would trigger further exploration of the environment until enough information was acquired to fix the incongruities between the two inputs and silence the misplace units. Thus, in a way, each system was seen as providing partially accurate representations of the animal’s location within the environment, with the interplay between the two suggested as leading to the formation of a consistent map.

Fig. 14.1

High-level schematic of cognitive map hypothesis of O’Keefe and Nadel. Place cells in the CA1/CA3 region can be activated by two independent means: through external sensory (allothetic) cues, which are preprocessed in the DG, and through movement-related (idiothetic) cues. Mismatches in the two inputs propel the animal toward further exploration

O’Keefe and Nadel (1978) stimulated the rediscovery of the cognitive map theory by the rest of the scientific community and, at the same time, developed it into a far more complete framework for understanding the neural substrates of navigation. However, two key elements were lacking. First, although place cells offered the first glimpse of the neural substrates for high-level cognitive representations, all other critical components of the spatial navigation circuit remained poorly characterized at the time. Second, there were no computational models of how the appropriate navigational operations could be performed in the hippocampal circuit. In the next two sections, we describe how empirical work and theoretical and computational studies have begun to fill in these elements.

4 Neural Substrates for Navigation

A series of discoveries of cells beautifully tuned to specific spatial and navigational variables followed the discovery of place cells, laying a more solid neurophysiological foundation for our understanding of a cognitive map in the brain. In parallel, computational and theoretical studies have helped provide mechanistic explanations for how such neurophysiological responses might arise in the brain.

4.1 Head-Direction Cells

In 1983, Ranck discovered head direction (HD) cells (Ranck 1984). HD cells fire when the animal’s head points in a particular direction, independent of the actual location of the animal within the environment, Fig. 14.2a. HD cells were found in many regions of the brain, including the postsubiculum and thalamus (Taube et al. 1990a, b; Taube 1995). The HD signal is carried to the entorhinal cortex (EC) through the postsubiculum (van Groen and Wyss 1990; Caballero-Bleda and Witter 1994; Taube 2007). The specific set of HD cells firing along one direction in an environment is not fixed by magnetic or compass cues but by a local orienting stimulus: a white stripe or cue-card placed on an otherwise featureless cylindrical wall provides a reference angle for the HD cells. If the cue is rotated clockwise, the preferred firing orientation of all the HD cells rotates clockwise, robustly and coherently, by a very similar amount (Dudchenko and Taube 1997). HD cells continue to fire if the lights are switched off, spiking at the appropriate orientations as the animal moves about the room for an extended period of time (several seconds) (Mizumori and Williams 1993). Moreover, HD cells can become decoupled from external cues, while maintaining their tuning curve shapes and relationships, highlighting the influence of idiothetic cues on HD activity (Knierim et al. 1995; Yoganarasimha and Knierim 2005). These observations, together with lesion studies on the vestibular inputs to HD cells (Stackman and Taube 1997), suggest that HD cells integrate the angular velocity of the animal’s head, as signaled by the vestibular system, to arrive at updated estimates of the animal’s head direction. Head direction estimation is a critical element of any navigational circuit; however, it is important to note that instantaneous head direction need not directly correspond to the instantaneous direction of movement of the animal through space (termed “heading direction”), because the animal can turn its head relative to the forward direction, to smell, view, or touch peripheral objects.

Fig. 14.2

Neural models of grid cells, HD cells, and place cells. (a) Schematic of spatial responses of four basic cell types of the hippocampal navigation circuit: Place cell (top), grid cell (second panel), border cell (third panel), and HD cell (bottom). The place, grid, and border cell panels show the animal’s trajectory (gray trace) as it explores a box environment. Red dots represent locations at which the cell emitted a spike. The last panel is a polar plot of the firing rate of an HD cell as a function of the animal’s heading direction. (b) 1D continuous attractor model of head direction cells: Two populations of conjunctive HD cells (red and green circles), arranged according to their head direction preferences. Red (green) cells receive vestibular inputs (red or green boxes) coding for clockwise (counterclockwise) angular head velocity. The conjunctive HD populations project to a population of pure HD cells (black circles). Weights between and across layers are described in boxes below: Curves represent the weights of one neuron to its postsynaptic targets. Conjunctive layers receive uniform excitatory input (not shown), causing formation of stable bump in conjunctive (not shown) and pure HD layers (black curve). The bump state can be moved around the network by biasing input into the conjunctive layer. (c) 2D continuous attractor model of grid cells: Four populations of conjunctive grid cells (colored balls), arranged according to the preferred phases of their grid fields, receive input from speed-modulated HD cells (colored boxes) that encode animal velocity and project to a population of pure grid cells (black balls). The conjunctive grid cells inherit these directional preferences, as indicated by the colored tuning curves (as a function of animal’s heading direction, Θ) in upper right. Weights between and across layers are shown in the boxes below [same as in (b), only 1d projections are shown]. If the conjunctive layers are supplied with a uniform excitatory drive, a stable, multimodal pattern of activity arises (black surface with yellow spots, shown only for the pure GC layer). The activity pattern can be translated in any direction by biasing input into the conjunctive layers (encircled colored arrows indicate direction the activity pattern moves if that conjunctive layer receives biased input). (d) OI model of grid cells: Three velocity-coupled oscillators (red, green, and blue; the VCOs could be networks as pictured here, or single neurons, or dendrites of a single neuron) receive vestibular input whose amplitude is given by the projection of the animal’s velocity vector onto a specific preferred direction (red, green, blue arrows). These inputs modulate the oscillation frequency of the VCOs: this is the actual path integration stage in the OI model. The oscillatory signals are summed, along with a fourth, velocity-insensitive baseline oscillatory signal (gray), within a single grid cell (black circle). (e) Place fields driven by sensitivity to distance, direction, and/or angle subtended by landmarks in the environment (yellow blob represents the active ensemble of place cells, topographically arranged according to place preference). (f) Place fields formed by summation of multiple grid cells (black sheets in the MEC indicate different grid cell modules; yellow blobs indicate active neurons in each module). (g) Place fields formed by summation of BVC inputs, which convey information about distance and direction to geometric boundaries

The remarkable HD cells, when discovered, constituted the best evidence available then that the brain might compute using continuous attractor dynamics (Skaggs et al. 1995; Zhang 1996; Seung 1996; Redish et al. 1996): a continuum of stable—or attracting—neural activity states that can be used, because of their stability, to store short-term memories of continuous variables and integrate these variables over time based on motion input (integration is the operation of summing external inputs, and in the absence of changes in the external input, holding the state obtained from summing past inputs; therefore, integration requires memory). Because single-neuron activity states are transient, typically decaying within a membrane time constant of about 100 ms, models of continuous attractor dynamics in the brain involve strong recurrent connections that may be excitatory or inhibitory, whose function is to provide a net positive feedback drive that cancels the tendency of individual neural activities to decay over time (Skaggs et al. 1995; Ben-Yishai et al. 1995; Zhang 1996; Seung 1996).

The HD system is well modeled by a specific continuous attractor network, with neurons arranged (conceptually, not necessarily anatomically) along a ring, that excite or disinhibit each other locally and inhibit each other globally (Ben-Yishai et al. 1995; Zhang 1996). If the connectivity is the same across the ring, a bump-like activity state and all translations of the bump are stable states, Fig. 14.2b (Skaggs et al. 1995; Ben-Yishai et al. 1995; Zhang 1996; Stringer et al. 2002; Boucheny et al. 2005). Each bump location along the ring represents a specific head orientation. The network is further hypothesized to possess asymmetrical recurrent connectivity and feedforward inputs signaling angular head velocity, in such a way that the inputs drive the bump along the ring, with speed and direction proportional to the angular head velocity. In this way, these bump states can be used to maintain a representation of the animal’s current head direction. A specific hallmark of the continuous attractor network theory is that the preferred orientations of pairs of neurons should maintain a fixed angular separation relative to each other, even when the overall orientation preferences rotate or are otherwise changed due to mismatched angular cues from the external world (Yoganarasimha et al. 2006). Experiments involving inconsistently rotated external cues cause rotations in the preferred directions of HD cells, but as predicted by attractor models, their relative preferred directions remain fixed (Taube et al. 1990b; Yoganarasimha et al. 2006).
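
To make the integration mechanism concrete, the sketch below implements a minimal single-ring variant of such a model (in the spirit of Zhang 1996), in which a velocity-scaled asymmetric weight component moves the bump, rather than the two-population conjunctive scheme of Fig. 14.2b; all parameter values are illustrative and not fitted to data.

```python
import numpy as np

# Minimal ring attractor that integrates angular velocity (after Zhang 1996).
# N units with preferred angles theta_i; a symmetric weight profile (local
# excitation, global inhibition) sustains an activity bump, and an asymmetric
# component scaled by angular velocity translates the bump at that velocity.
N = 128
theta = np.linspace(0.0, 2*np.pi, N, endpoint=False)
dtheta = theta[:, None] - theta[None, :]

J0, J1 = -10.0, 8.0                       # global inhibition, local excitation
W_sym = (J0 + J1*np.cos(dtheta)) / N      # symmetric part: sustains the bump
W_vel = (J1*np.sin(dtheta)) / N           # asymmetric part: moves the bump

dt, tau = 0.001, 0.020                    # time step and neural time constant (s)
u = 0.1*np.cos(theta)                     # small initial cue nucleates a bump at 0 rad
I_ext = 1.0                               # spatially uniform excitatory drive
omega = 0.6                               # angular head velocity (rad/s)

decoded = []
for step in range(3000):                  # 3 s of simulated turning
    W = W_sym + tau*omega*W_vel           # velocity input sets the weight asymmetry
    r = np.maximum(u, 0.0)                # threshold-linear firing rates
    u += (dt/tau)*(-u + W @ r + I_ext)
    decoded.append(np.angle(np.sum(r*np.exp(1j*theta))))   # population-vector angle

# decoded[-1] should approximate the integrated velocity, omega * 3 s = 1.8 rad.
```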

4.2 The Entorhinal Cortex as Gateway to the Hippocampus

In addition to the hippocampus and the distributed regions where HD cells are found (Taube 2007), the EC is a key brain area involved in spatial navigation. Lesion studies have implicated the EC in spatial computation (Ferbinteanu et al. 1999; Parron and Save 2004a, b; Steffenach et al. 2005; Van Cauter et al. 2012). The EC is the cortical gateway of inputs to the hippocampus: the medial and lateral portions of EC (MEC and LEC, respectively) project to the proximal and distal portions of CA1, respectively, but have overlapping projections at the dentate gyrus (DG) and CA3 (Witter and Amaral 2004; Witter et al. 2006; McNaughton and Barnes 1977). Electrophysiological studies in the EC of the freely moving rat reveal a dissociation in the nature of the LEC and MEC representations (Deshmukh and Knierim 2011): LEC cells tend to respond to objects in the animal’s immediate environment (Zhu et al. 1995; Young et al. 1997; Wan et al. 1999; Deshmukh and Knierim 2011), while cells in MEC ignore object locations and instead fire at multiple locations in the open field (Barnes et al. 1990; Quirk et al. 1992; Frank et al. 2000; Fyhn et al. 2004; Wills et al. 2005). Thus, the LEC and MEC might form the two parallel streams postulated in the cognitive map hypothesis, carrying external sensory and internal motion-based cues, respectively, to be synthesized in the hippocampus.

4.3 Grid Cells

Less than a decade ago, the MEC was found to contain a class of cells—grid cells—with astonishing spatial firing characteristics (Hafting et al. 2005): each cell fires at multiple locations in an environment, and the locations are arranged on the vertices of an essentially equilateral triangular lattice, Fig. 14.2a. The period of the lattice is typically determined intrinsically by the cell network and not by the size and shape of the enclosure (Hafting et al. 2005) [but see Barry et al. (2007)]. Nearby cells share common grid periods and orientation, and there is a range of distinct periods, hypothesized to be discretely spaced (Fuhs and Touretzky 2006; McNaughton et al. 2006; Fiete et al. 2008; Burak and Fiete 2009) and later shown to be so in Stensola et al. (2012). Grid cells are most commonly found in layer II of MEC. The postsubiculum, a major source of input to the MEC, terminates in the deep layers (van Groen and Wyss 1990). Thus, layers III–V of the MEC contain cells responsive to the animal’s head direction, either in the form of pure head direction tuning or combined head direction and grid-like tuning (the latter are known as “conjunctive” grid cells) (Sargolini et al. 2006). In contrast, grid cells in layer II of the MEC tend to be insensitive to heading and head direction (“pure” or “non-conjunctive” grid cells) (Sargolini et al. 2006).

The spatial fields of MEC grid cells can rotate when salient external cues are rotated (Hafting et al. 2005), and the periods of their spatial tuning can rescale in response to a rescaling of a familiar environment (Barry et al. 2007), but other than firing-rate modulations (Savelli et al. 2008), the spatial locations of grid cell firing are relatively insensitive to the particulars of the environment. This is in contrast to the spatial responses in LEC and the hippocampus, which exhibit more detailed and complex changes in response to environmental manipulations (Zhu et al. 1995; Young et al. 1997; Wan et al. 1999; Deshmukh and Knierim 2011; Muller and Kubie 1987; Leutgeb et al. 2004; Wills et al. 2005; Leutgeb and Leutgeb 2007; Colgin et al. 2008). The relative insensitivity of grid cells to external cues and the stability of their fields in cue-poor environments and darkness (Hafting et al. 2005) suggest that self-motion is the primary determinant of grid cell firing. For these reasons, it is widely hypothesized that the grid cell system computes, or at least is responsive to, a path-integrated estimate of the animal’s position (see Chap. 8). However, direct evidence of the role of grid cells in path integration is lacking.

Most grid cell models rely on the conversion of motion inputs into spatial representations (Welinder et al. 2008; Giocomo et al. 2011; Zilli 2012). On the one hand, continuous attractor models of grid cells (Fuhs and Touretzky 2006; Burak and Fiete 2006, 2009; Guanella et al. 2007; McNaughton et al. 2006), Fig. 14.2c, posit that strong local recurrent connectivity destabilizes the uniform activity state in the population and stabilizes a state which displays regular triangular lattice patterning within the population. The recurrent connections may be purely inhibitory, as first proposed in Burak and Fiete (2009) and supported by Pastoll et al. (2013) and Couey et al. (2013), or excitatory with a local inhibitory surround (McNaughton et al. 2006; Guanella et al. 2007; Burak and Fiete 2009). Translation invariance of such connectivity stabilizes all translations of this pattern through the population, and an asymmetric component of the connectivity allows external inputs signaling animal velocity to drive the pattern in direct proportion to the direction and speed of the animal’s movements. These models are straightforward 2D extensions of the 1D continuous attractor models for HD cells (Welinder et al. 2008; Zilli 2012).

All cells in the continuous attractor network model share the same grid period and orientation, because their responses are generated by translations of the same pattern, and all spatial phases are exactly uniformly distributed, consistent with the data. Disjoint network copies (modules) are required to produce different grid periods and, because each network is large (consisting of 4,000–40,000 neurons), this leads to the prediction of a few, discrete grid periods within each animal (Fuhs and Touretzky 2006; McNaughton et al. 2006; Burak and Fiete 2009). This prediction was recently experimentally verified in Stensola et al. (2012). The fundamental prediction of continuous attractor models is that the differences in preferred spatial activation phase between pairs of grid cells will remain stable over time, regardless of environmental manipulations that induce sizeable distortions in the grid fields, if the network architecture remains unchanged. Recent analysis of simultaneously recorded grid cells with similar period and orientation across experiments involving grid cell distortion (including the environmental stretching experiments of Barry et al. (2007)) establishes the stability of these predicted relationships and shows that the grid cell population responses within a single grid network (module) are confined to a 2D manifold within the high-dimensional state space (Yoon et al. 2013).
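
This relative-phase prediction can be checked directly from pairs of simultaneously recorded rate maps. The sketch below, applied here to synthetic rate maps with arbitrarily chosen parameters, estimates the relative spatial phase of a cell pair as the near-zero peak of the spatial cross-correlogram; under the attractor hypothesis the estimate should be reproduced across session halves or across environmental manipulations.

```python
import numpy as np
from scipy.signal import correlate2d

def relative_phase(map_a, map_b, max_lag=15):
    """Offset (in spatial bins) of the cross-correlogram peak closest to zero lag.
    Attractor models predict that this offset is preserved across conditions."""
    a = map_a - map_a.mean()
    b = map_b - map_b.mean()
    xc = correlate2d(a, b, mode='full')
    cy, cx = np.array(xc.shape) // 2
    sub = xc[cy-max_lag:cy+max_lag+1, cx-max_lag:cx+max_lag+1]
    dy, dx = np.unravel_index(np.argmax(sub), sub.shape)
    return dy - max_lag, dx - max_lag

# Synthetic grid maps sharing one lattice but offset by a fixed spatial phase.
x, y = np.meshgrid(np.arange(60), np.arange(60))          # 60 x 60 spatial bins

def grid_map(x0, y0, spacing=20.0):
    g = sum(np.cos(4*np.pi/(np.sqrt(3)*spacing) *
                   ((x - x0)*np.cos(p) + (y - y0)*np.sin(p)))
            for p in np.deg2rad([0, 60, 120]))
    return np.maximum(g, 0.0)

rng = np.random.default_rng(0)
cell_a  = grid_map(0, 0) + 0.2*rng.random((60, 60))       # "first half of session"
cell_b  = grid_map(5, 8) + 0.2*rng.random((60, 60))
cell_a2 = grid_map(0, 0) + 0.2*rng.random((60, 60))       # "second half of session"
cell_b2 = grid_map(5, 8) + 0.2*rng.random((60, 60))

print(relative_phase(cell_a, cell_b), relative_phase(cell_a2, cell_b2))
# The two estimates should agree, reflecting the fixed phase offset built into the maps.
```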

On the other hand are oscillatory interference (OI) models in which interfering temporal oscillations produce periodically amplitude-modulated activity outputs (O’Keefe and Burgess 2005; Burgess et al. 2007; Hasselmo et al. 2007; Hasselmo 2008; Blair et al. 2008; Zilli et al. 2009) (see Chap. 12). These amplitude modulations can be mapped to spatial grid patterns that are invariant to animal speed if the frequency of the temporal oscillations is based on animal speed, Fig. 14.2d. The elementary oscillators are called velocity-coupled oscillators (VCOs). Whereas in the continuous attractor models, the grid cell network integrates animal velocity by shifting the phase of the periodic population-level grid pattern (made possible by asymmetric recurrent connections), integration in the OI models is performed by the VCOs, because the velocity inputs change their frequency and thus also increment their phase.
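
Ignoring the temporal carrier and spike generation, the spatial envelope produced by such interference can be sketched in a few lines; the preferred directions, the displacement-to-phase gain, and the threshold below are illustrative assumptions rather than fitted parameters.

```python
import numpy as np

# Spatial envelope of a directional oscillatory-interference model: each VCO's
# phase leads the baseline oscillator by an amount proportional to the animal's
# displacement along that VCO's preferred direction, so the summed interference
# of three VCOs with preferred directions 60 degrees apart is periodic on a
# triangular lattice with spacing 2/(sqrt(3)*beta).
beta = 2.5                                     # cycles of relative phase per metre of displacement
dirs = np.deg2rad([0, 60, 120])                # preferred directions of the three VCOs
x, y = np.meshgrid(np.linspace(0, 2, 200), np.linspace(0, 2, 200))   # a 2 m x 2 m box

envelope = np.zeros_like(x)
for phi in dirs:
    d = x*np.cos(phi) + y*np.sin(phi)          # displacement along the preferred direction
    envelope += np.cos(2*np.pi*beta*d)         # VCO phase relative to the baseline oscillator

rate = np.maximum(envelope - 1.0, 0.0)         # thresholded sum: a grid-like firing map
```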

To account for the systematic variation in grid period along the dorsoventral axis of MEC (Hafting et al. 2005), OI models predict a gradient in the baseline oscillation frequencies of the VCOs. Empirical study (Giocomo et al. 2007) shows that, intriguingly, the intrinsic resonance frequency and the frequency of subthreshold membrane oscillations of stellate cells in MEC decrease systematically toward the ventral end of the dorsoventral axis, qualitatively matching the prediction, if the hypothesized VCOs correspond to subthreshold oscillations in the membrane potential. However, there are biophysical arguments against identifying membrane potential oscillations with VCOs (Remme et al. 2010; Fiete 2010). Identifying the local field potential (LFP) oscillation at theta frequency with the baseline (not velocity-modulated) oscillator also has some problems, including a mismatch between the phase fidelity of spatial patterning in grid cells across multiple periods vs. the tendency of the LFP oscillation to decohere or lose all phase information after five to six cycles (Welinder et al. 2008). The theta frequency is not linearly related to grid period across modules (Stensola et al. 2012), as would be predicted by OI models if baseline oscillations were reflected in the LFP. Bats, which exhibit grid cell activity, do not display sustained theta oscillations, suggesting a dissociation between the grid-like spatial tuning and theta oscillations or at least between the hypothesized VCOs and theta oscillations (Yartsev et al. 2011). Together, these studies suggest that the LFP theta is not related to VCOs or that the OI models require revision. Studies showing that grid fields degrade when theta oscillations are abolished (via lesion to the medial septum) (Brandon et al. 2011; Koenig et al. 2011) are more consistent with a possible role for the LFP in grid formation. However, these results are also consistent with continuous attractor models in which the recurrent circuitry amongst grid cells is predominantly inhibitory (Burak and Fiete 2009; Couey et al. 2013) and grid patterning requires a spatially nonspecific excitatory input to drive cells above threshold (Burak and Fiete 2009; Pastoll et al. 2013; Bonnevie et al. 2013).

Recent intracellular recordings of grid cells (Schmidt-Hieber and Hausser 2013; Domnisoru et al. 2013) show that grid cell firing fields are clearly correlated with slow depolarizing voltage ramps that last the duration of the field and that the firing fields are better predicted by the voltage ramps than by the smaller superimposed oscillations that are also present during the field. These findings suggest either a feedforward or feedback synaptic contribution to the spatial patterning of grid cells. Synaptic contributions to the grid cell response are consistent with continuous attractor models of grid cells (Fuhs and Touretzky 2006; Burak and Fiete 2006, 2009; McNaughton et al. 2006; Guanella et al. 2007; Welday et al. 2011; Zilli 2012) as well as other models that involve cell–cell coupling (Kropff and Treves 2008; Mhatre et al. 2010) and may also be consistent with versions of OI models that incorporate lateral network connections (Zilli and Hasselmo 2010; Yoon et al. 2013). More specifically, a recent analysis of simultaneously recorded neurons shows that grid cells with similar spatial period and orientation (i.e., from a single network or module) exhibit key signatures of two dimensional continuous attractor dynamics, as predicted by the continuous attractor models (Fuhs and Touretzky 2006; Burak and Fiete 2006, 2009; McNaughton et al. 2006; Guanella et al. 2007). Thus, all models of a grid cell population should, to be consistent with the data, display continuous attractor dynamics across the population.

At the same time, the intracellular studies of grid cells show that individual spike timings within one firing field are strongly correlated with theta oscillation peaks in the intracellular voltage. Thus, while network mechanisms determine the spatial locations of firing fields, the OI mechanisms might be responsible for a more fine-grained temporal code of spike timing within field (Schmidt-Hieber and Hausser 2013; Domnisoru et al. 2013), including the phenomenon of phase precession (O’Keefe and Recce 1993; O’Keefe and Burgess 2005; Kamondi et al. 1998; Magee 2001; Lengyel et al. 2003). Finally, at present both classes of grid cell models (Welday et al. 2011; O’Keefe and Burgess 2005; Burgess et al. 2007; Hasselmo et al. 2007; Fuhs and Touretzky 2006; Burak and Fiete 2006, 2009; Guanella et al. 2007; McNaughton et al. 2006) are subject to the criticism of complexity in the wiring required to generate grid fields and to the question of how these architectures may form during development or through experience-dependent synaptic plasticity, although some progress has recently been made in understanding how the continuous attractor networks capable of generating grid cell activity may arise from relatively simple plasticity rules (Widloski and Fiete, unpublished observations).

There are a number of questions one may ask about grid cells and their downstream readouts in the brain. Are homing vectors computed from the grid cell representation, and if so, where and how is this done? Animals can compute and execute beeline paths home, without the help of visual beacons or landmarks, after executing tortuous outgoing trajectories, a behavior known as homing. Some studies suggest that this sort of behavior is EC-dependent (Parron and Save 2004a; Steffenach et al. 2005). However, while the grid cell output contains the information necessary for specifying vector displacement relative to a starting point (e.g., home) (Fiete et al. 2008; Sreenivasan and Fiete 2011), it is not coded in a straightforward or linearly decodable way: the grid cell activity patterns for nearby locations are nearly maximally distinct from each other, while the activity patterns for remote locations can be very similar (Fiete et al. 2008; Sreenivasan and Fiete 2011). This fact implies the need for separate nonlinear computations to recover the metric distance and direction information from the grid code, but it is not clear which downstream readouts of the MEC might perform these computations. It is possible that the subiculum, with its relatively uncharacterized role in the spatial circuit (Sharp 2006), may play a role.

Second, why, if grid cells are producing a path-integrated estimate of location, do they represent it in a single 2D network response, rather than in two 1D activity patterns that increment like a Cartesian basis for 2D space? The number of neurons required to represent a 2D space in a combined representation scales like N², whereas the number required in two independent 1D representations scales only as 2N (Fiete et al. 2008). One problem with representing 2D space with a pair of 1D representations, each conveying information about one Cartesian coordinate, is that one coordinate and thus one representation remains unchanged for all movements parallel to the corresponding coordinate axis (Fiete et al. 2008). Thus, the representation is not “whitened” or decorrelated across locations. A single 2D representation allows for a more whitened representation of different locations, without inducing correlations along two specific coordinate axes (Fiete et al. 2008). Whether this gain is enough to offset the neuron number costs of constructing a single 2D representation depends on the importance of achieving such decorrelation, another question that remains to be answered.

Closely related is the question of how the grid cell system represents 1D environments. Often during real-world navigation, the animal might run along a wire or along the wall of a hallway and may treat these trajectory segments as navigation in an inherently 1D environment, even though they are merely 1D paths embedded in a higher-dimensional world. Intriguing grid cell recordings from 1D environments (narrow elevated tracks) appear to show irregular, or at least non-periodic, responses (Brun et al. 2008; Derdikman et al. 2009; Domnisoru et al. 2013) that in some circumstances appear to be more closely tied to external cues than are the 2D responses (Brun et al. 2008; Derdikman et al. 2009). Moreover, the spatial periodicity of the 1D responses appears to be on a much larger scale than the periods seen in 2D. These observations raise the question of whether the 1D response is generated under the same dynamical mechanisms as the 2D response: for example, whether the 1D response is simply a 1D slice through a regular 2D grid (Yoganarasimha et al. 2011; Domnisoru et al. 2013; unpublished observations by KJ Yoon, S Lewallen, A Kinkhabwalla, DW Tank, and IR Fiete), or whether 1D dynamics, and by extension the grid cell code in 1D environments, is in a distinct dynamical regime and follows very different rules.
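
The slice idea is easy to visualize: evaluating an idealized 2D grid response along a line that is not aligned with a lattice axis yields irregularly spaced 1D fields, as in the sketch below (the grid model, spacing, and track angle are arbitrary choices for illustration).

```python
import numpy as np

# Evaluate an idealized triangular-lattice grid response along a linear track.
# Unless the track is aligned with a lattice axis, the resulting 1D profile is
# not periodic: field spacings and amplitudes vary irregularly along the track,
# qualitatively resembling 1D recordings under the "slice" interpretation.
spacing = 0.5                                          # grid period in metres (assumed)
k = 4*np.pi/(np.sqrt(3)*spacing)
dirs = np.deg2rad([0, 60, 120])

def grid_rate(px, py):
    g = sum(np.cos(k*(px*np.cos(p) + py*np.sin(p))) for p in dirs)
    return np.maximum(g - 1.0, 0.0)

s = np.linspace(0.0, 6.0, 3000)                        # position along a 6 m track
track_angle = np.deg2rad(17.0)                         # arbitrary track orientation
profile = grid_rate(s*np.cos(track_angle), s*np.sin(track_angle))
```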

4.4 Place Cells

Layers II and III of the EC (both the LEC and the MEC, where grid cells and conjunctive grid cells reside) project to the hippocampus: layer II to DG/CA3 and layer III to CA1 (Anderson et al. 2007). All of these target areas contain cells with place-like fields. Place cells, unlike grid cells, are sensitive to many aspects of the external environment, including features that animals are known to use for navigation [see Redish (1999) for a review]. This includes sensitivity to proximal and distal landmarks (Siegel et al. 2008; Yoganarasimha et al. 2006; Renaudineau et al. 2007), contextual cues (including nonspatial ones like color and lighting) (Muller and Kubie 1987; Hampson et al. 1999; Wood et al. 1999, 2000; Hayman et al. 2003; Komorowski et al. 2009; Manns and Eichenbaum 2009), geometric boundaries (Lever et al. 2002), and reward associations (Wikenheiser and Redish 2011). Place cells continue to fire in the absence of visual cues, suggesting that their activities can be updated through idiothetic cues (Fuhs et al. 2005; Gothard et al. 1996; Knierim et al. 1996; Taube et al. 1996; Jeffery et al. 1997; Quirk et al. 1990). These results support the hypothesis by O’Keefe and Nadel that the hippocampus contains the brain’s spatial map and that this map derives from both idiothetic and allothetic cues.

Early models of place cells hypothesized that they were driven primarily by visuo-spatial cues (Zipser 1985; Sharp 1991; Schmajuk 1990; Schmajuk and Blair 1993; Burgess et al. 1994; Benhamou et al. 1995; Prescott 1996). For instance, each cell might be particularly sensitive to the constellation of cues as seen from some particular location in the environment and would fire whenever that constellation was in view, Fig. 14.2e. In accordance with these models, external cues do seem to play a role in driving place cell activity: some place cells seem to fire based on landmark location (Deshmukh and Knierim 2013) and are sensitive to external sensory cues in general, as described above. These external sensory cues might arrive at the hippocampus through the LEC, given the presence of object/landmark related cell types found there (Zhu et al. 1995; Young et al. 1997; Wan et al. 1999), or possibly through the MEC itself.

To account for the continued expression of place fields in darkness and for the omnidirectionality of place cells in two-dimensional environments, which fire when the animal approaches a location from diverse angles with diverse views, a number of models (described in more detail in the following section) invoked the possibility that place cell activity was at least partially based on path integrated estimates of location (Touretzky and Redish 1996; Samsonovich and McNaughton 1997; Balakrishnan et al. 1999; Arleo and Gerstner 2000). Several models placed the locus of path integration within the CA1/CA3 network itself, suggesting how the hippocampus might integrate velocity inputs, for example, through the use of sinusoidal arrays (Touretzky et al. 1993) or continuous attractor networks (Tsodyks and Sejnowski 1995; Samsonovich and McNaughton 1997). However, lesion studies (Wan et al. 1993; Van Cauter et al. 2012), the discovery of grid cells (Hafting et al. 2005), theoretical considerations about the limited spatial range and resolution of the hippocampal code (Fiete et al. 2008), and models of path integration by grid cells (Fuhs and Touretzky 2006; Guanella et al. 2007; Burak and Fiete 2009; Burgess et al. 2007; Hasselmo 2008), point instead to the MEC as the locus of path integration, leaving to the hippocampus the still-formidable function of synthesizing information from multiple sensory streams and constructing associations between them.

Models of place cells that have followed the discovery of grid cells suggest that place fields are formed by summing and then thresholding the activity of multiple grid cells with different spacings and orientations, Fig. 14.2f (O’Keefe and Burgess 2005; Rolls et al. 2006; Solstad et al. 2006; Franzius et al. 2007; Hayman and Jeffery 2008; Savelli and Knierim 2010; Monaco et al. 2011). These models swing in the opposite direction, seeming to suggest that the primary input to and determinant of place cell firing is feedforward, idiothetically derived grid cell activity, rather than external sensory cues or structured lateral connectivity. The reality of place cells is likely somewhere in between: if place cells are indeed the basis of the brain’s cognitive map of space, they must derive their activity by combining idiothetic and allothetic cues, as in the more comprehensive, functionally motivated models of place cell activity summarized in the next section.
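
A minimal sketch of this feedforward summation, in the spirit of the models cited above, is given below; the module spacings, the assumption that the selected grid inputs share a common peak location, and the threshold are illustrative choices rather than constrained parameters.

```python
import numpy as np

# Place field from a thresholded sum of grid inputs: grid cells drawn from
# several modules (different spacings) are chosen so that their firing peaks
# coincide at one location; the summed drive is largest there, and thresholding
# leaves a dominant place field (any weaker secondary fields are assumed to be
# suppressed by competition or inhibition in the full models).
dirs = np.deg2rad([0, 60, 120])
spacings = [0.40, 0.56, 0.78]                          # module periods in metres (illustrative)
x, y = np.meshgrid(np.linspace(0, 2, 200), np.linspace(0, 2, 200))
x0, y0 = 1.2, 0.7                                      # common peak of the selected grid inputs

drive = np.zeros_like(x)
for lam in spacings:
    k = 4*np.pi/(np.sqrt(3)*lam)
    for p in dirs:
        drive += np.cos(k*((x - x0)*np.cos(p) + (y - y0)*np.sin(p)))

place_field = np.maximum(drive - 0.8*drive.max(), 0.0)  # a single bump centred near (x0, y0)
```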

4.5 Other Cells with Strong Spatial Correlates

The spatial circuit contains several other cell types that respond selectively to external spatial cues. Some cells in the subiculum fire at a fixed perpendicular distance from environmental borders, even when the environment is resized (Lever et al. 2009). These cells were predicted to exist by the boundary vector cell (BVC) model of place cell firing, Fig. 14.2g (O’Keefe and Burgess 1996; Burgess et al. 2000; Hartley et al. 2000; Barry et al. 2006). According to the BVC model, place cells are formed by summing multiple BVCs with intersecting firing fields; BVC activation is clearly related to external features in the environment, and these cells are hypothesized to be largely driven by external sensory cues. However, it remains unclear whether in the hippocampus the BVC cells drive place cells or if place cells dominated by external inputs sum to drive BVCs (Derdikman 2009) (in a way similar to the Hubel–Wiesel model for V1 orientation tuning from selective feedforward summation of LGN neurons). Similar to BVCs, the MEC contains border cells (Solstad et al. 2008; Savelli et al. 2008), which respond by firing whenever the animal is directly at an environmental boundary. In contrast to BVCs, these cells do not tend to fire at a finite perpendicular distance away from the boundary, Fig. 14.2a.
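
A toy version of the BVC account, for a square box and two hand-picked boundary vector cells, is sketched below; the tuned distances, tuning widths, and threshold are assumptions made only for illustration.

```python
import numpy as np

# Place field from boundary vector cells (BVCs) in a 1 m square box: each BVC
# responds as a Gaussian function of the distance to the wall lying in its
# preferred allocentric direction; a thresholded sum of a few BVCs is active
# only where their tuned distances intersect, yielding a place-like field.
x, y = np.meshgrid(np.linspace(0, 1, 200), np.linspace(0, 1, 200))

def bvc(dist_to_wall, d_pref, sigma=0.08):
    return np.exp(-(dist_to_wall - d_pref)**2 / (2*sigma**2))

drive = bvc(x, 0.30) + bvc(y, 0.50)          # one BVC tuned to the west wall at 0.30 m,
                                             # one tuned to the south wall at 0.50 m
place_field = np.maximum(drive - 1.5, 0.0)   # active only near the point (0.30 m, 0.50 m)
```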

4.6 The Responses of Spatial Cells to Changes in the Environment

The neural representations of space described above involved static representations of familiar, unchanging environments. How do these representations change when the environment changes? Real-world navigation involves the representation of novel environments and of modified familiar environments. Thus, it is critical to understand spatial representation under changing conditions.

When an environment is rotated, grid cells (Hafting et al. 2005) and HD cells (Taube et al. 1990b) continue to be active and coherently rotate their field centers as a group. Across different familiar environments, the firing fields of grid cells may additionally display coherent shifts in spatial phase while maintaining regular periodic tuning (Hafting et al. 2005; Fyhn et al. 2007; Yoon et al. 2013). These findings suggest that HD and grid cells track angular displacements relative to a starting angle or linear displacements relative to a starting location, respectively, roughly independent of context or specific location within the environment. Under mismatched cue rotations, HD cells rotate coherently (Yoganarasimha et al. 2006), suggesting that they represent a single best estimate of the external orientation of the world. Across diverse familiar environments, grid cells maintain the specific periodicity of their spatial responses, which suggests that they use a fixed internal scale to measure displacement.

Place cells, on the other hand, can display a response known as global remapping (Muller and Kubie 1987; Leutgeb et al. 2004; Wills et al. 2005; Leutgeb and Leutgeb 2007; Colgin et al. 2008) across environments: when an animal is moved to a clearly different environment, or the contextual cues in the environment are made sufficiently different (e.g., change in both the wall color and boundary shape or both boundary shape and texture), the ensemble of active place cells changes, and the relationships between their firing fields also change (see Chap. 9). For example, one of two cells with overlapping fields might stop firing in the new environment, while the other continues to fire. In this way, place cells generate largely independent representations across sufficiently different environments. Place cells can alternatively display a less dramatic change in their representation, through rate remapping, in which the peak firing rates of the fields are differentially modulated, by as much as a factor of 10, in response to more subtle contextual changes (Leutgeb et al. 2005). In rate remapping, the centers of place fields and their spatial relationships do not change.

What are the mechanisms underlying global and rate remapping? Global remapping is accompanied by shifts and rotations in the activity patterns of grid cells in the EC (Fyhn et al. 2007), but under rate remapping such changes are undetectable (Fyhn et al. 2007; Leutgeb et al. 2007). Theoretical (Fiete et al. 2008) and modeling studies (Monaco et al. 2011) suggest that if shifts or rotations (either alone is sufficient) of grid cell responses differ across the different grid networks or modules, then the spatial representation undergoes discontinuous changes, and a procedure for constructing place cells from grid cells by summing the activities of grid cells from different grid networks will result in globally remapped place cell responses. This suggests that the orthogonalization of CA3 representations across environments can be attributed to changes at the level of the MEC (Leutgeb and Leutgeb 2007). What remains to be tested is whether global remapping can be observed in CA3 in the absence of such global remapping in the MEC, which would suggest that perhaps the LEC, or perhaps the DG, participates in hippocampal global remapping.
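
The following sketch illustrates this argument using the same kind of grid-to-place summation as above: shifting all modules by a common offset merely translates the resulting place map, whereas independent, module-specific shifts produce an essentially unrelated map, i.e., global remapping. Spacings, shifts, and the threshold are illustrative.

```python
import numpy as np

# Place maps modelled as thresholded sums of three grid modules. Shifting every
# module by the same offset translates the place map; shifting each module by an
# independent offset produces a spatially unrelated map, i.e., global remapping.
dirs = np.deg2rad([0, 60, 120])
spacings = [0.40, 0.56, 0.78]                          # module periods in metres (illustrative)
x, y = np.meshgrid(np.linspace(0, 2, 200), np.linspace(0, 2, 200))

def place_map(shifts):
    drive = np.zeros_like(x)
    for lam, (sx, sy) in zip(spacings, shifts):
        k = 4*np.pi/(np.sqrt(3)*lam)
        for p in dirs:
            drive += np.cos(k*((x - sx)*np.cos(p) + (y - sy)*np.sin(p)))
    return np.maximum(drive - 0.7*drive.max(), 0.0)

rng = np.random.default_rng(0)
map_A        = place_map([(1.00, 1.00)]*3)             # familiar environment
map_coherent = place_map([(1.15, 0.90)]*3)             # all modules shifted together
map_remapped = place_map([tuple(rng.uniform(0, s, 2)) for s in spacings])  # independent shifts

corr = lambda a, b: np.corrcoef(a.ravel(), b.ravel())[0, 1]
# corr(map_A, map_coherent) is high once the common translation is taken into
# account; corr(map_A, map_remapped) is close to zero, as in global remapping.
```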

Rate remapping, then, might involve a separate, non-grid-cell source, since consistent shifts or zero shifts in the grid input will produce no differential modulation of place fields under the model where place fields are driven only by grid cells. There are two likely candidate sources for rate remapping: the DG and the LEC. The DG is sensitive to subtle changes in environmental context, as revealed by the morph-box paradigm (Leutgeb et al. 2007), and several studies have shown that animals with DG lesions are impaired when making place discriminations (Gilbert et al. 2001; Goodrich-Hunsaker et al. 2008; Morris et al. 2012; Kesner 2013) and context discriminations (Lee et al. 2004b; McHugh et al. 2007; Tronel et al. 2012; Kheirbek et al. 2012; Nakashiba et al. 2012). For example, in a fear conditioning paradigm, animals with DG lesions failed to distinguish between ambiguous environments (one box in which they were fear conditioned and a second similar but unfamiliar box that differed from the first only in some nonspatial cue, like color) (McHugh et al. 2007). At the same time, the DG is not required for distinguishing between unambiguously different environments (McHugh et al. 2007) and in some circumstances does not appear to participate in global remapping (Leutgeb et al. 2007). These findings point to a role for DG in discriminating between subtle differences in environmental context based on external sensory cues, through some form of pattern separation, and support the idea that DG might provide the drive for rate remapping in response to environmental changes that do not invoke global remapping (Treves et al. 2008). The LEC is also likely involved in rate remapping, as a recent study has shown that lesioning the LEC impairs the expression of rate remapping in CA3, even though spatial tuning remains intact (Lu et al. 2013).

To summarize, it is possible that parahippocampal remapping (grid field shifts and rotations and head direction rotations) elicits the near-orthogonal global remappings in hippocampus, whereas rate remapping is due to an altogether different mechanism. This mechanism could be intra-hippocampal (and DG-dependent) in origin (Leutgeb et al. 2006; Leutgeb and Leutgeb 2007), or alternatively, might depend on external sensory cues arriving via the LEC (Deshmukh and Knierim 2011).

What is the computational role of these different types of remapping? Clearly, if the brain uses cognitive maps of space to navigate, a new or sufficiently different environment calls for the construction of a new map. On the other hand, a given map should be capable of modification by smaller or incremental changes to a familiar environment, without losing the information already built into the present map or being rewritten by an entirely new map for the environment. Rate and global remapping may be the hippocampal solutions for these two scenarios, respectively. Important unanswered questions include what determines the threshold of similarity at which rate remapping gives way to global remapping, how flexible or adaptable such thresholds are as a function of animal experience in stable and unstable worlds, which computations and areas are responsible for setting the threshold, and by what mechanisms global and rate remapping trigger map plasticity and learning.

4.7 Differential Roles of the Hippocampal Subfields in Localization and Mapping

The data recounted thus far indicate that the hippocampus receives both allothetic and idiothetic cues and is in a position to encode associations between the two to generate a map-like representation. These data support the cognitive map hypothesis of O’Keefe and Nadel (1978). But what differentiates the different hippocampal subfields, and where might the spatial map reside?

It has long been proposed that the DG performs pattern separation to allow, as discussed above, the disambiguation of relatively similar environments based on subtle differences (McNaughton and Morris 1987; Treves and Rolls 1992; O’Reilly and McClelland 1994; Kesner 2007). This hypothesis is supported by electrophysiological work (Leutgeb et al. 2007; Marrone et al. 2011; Satvat et al. 2011) and behavioral studies (McHugh et al. 2007; Tronel et al. 2012; Kheirbek et al. 2012; Nakashiba et al. 2012; Gilbert et al. 2001; Creer et al. 2010; Clelland et al. 2009; Sahay et al. 2011; Goodrich-Hunsaker et al. 2008; Morris et al. 2012; Kesner 2013; Lee et al. 2004b). A number of developmental, physiological, and anatomical factors, including neurogenesis (Nakashiba et al. 2012; Piatti et al. 2013), sparse firing (Barnes et al. 1990; Jung and McNaughton 1993; Chawla et al. 2005; Neunuebel and Knierim 2012), large efficacious mossy terminals (McNaughton and Morris 1987; Henze et al. 2002), and the anatomical divergence of inputs from EC onto DG (Amaral et al. 2007), are likely to play a mechanistic role in the pattern separation functionality of this layer, possibly for the formation of independent (if not completely separated) representations of relatively similar places and environments [see recent review articles Aimone et al. (2011), Yassa and Stark (2011), Schmidt et al. (2012), Piatti et al. (2013), Kesner (2013)].

The functional advantage of such pattern separation is that downstream areas, in particular CA3, can easily recognize these places as distinct in forming a map and in forming episodic memories involving these places. Thus, DG may be viewed as a preprocessor of external sensory inputs that are used downstream in map building.
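
A much-reduced illustration of how an expansion onto many sparsely active units can separate similar input patterns is sketched below; the layer sizes, the fixed random projection, and the winner-take-all sparsification are stand-ins, chosen only for illustration, for the biological factors listed above.

```python
import numpy as np

# Toy pattern separation by expansion and sparsification: two similar "EC"
# input patterns are projected through a fixed random matrix onto a much larger
# population of "granule cells", of which only the most strongly driven few
# percent fire. The resulting sparse codes overlap less than the inputs do.
rng = np.random.default_rng(1)
n_ec, n_dg, active_frac = 200, 4000, 0.02

ec_a = rng.normal(size=n_ec)
ec_b = ec_a + 0.5*rng.normal(size=n_ec)              # a similar but altered input pattern

W = rng.normal(size=(n_dg, n_ec)) / np.sqrt(n_ec)    # fixed random EC-to-DG projection

def dg_code(ec_pattern):
    drive = W @ ec_pattern
    winners = np.argsort(drive)[-int(active_frac*n_dg):]   # k-winners-take-all
    code = np.zeros(n_dg)
    code[winners] = 1.0
    return code

overlap = lambda u, v: np.corrcoef(u, v)[0, 1]
print(overlap(ec_a, ec_b), overlap(dg_code(ec_a), dg_code(ec_b)))
# The DG codes for the two inputs are less correlated than the EC inputs are.
```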

Computational models of the spatial circuit (some of which are described in the following section) suggest that the overall spatial map resides in CA3, in CA1, or in both.

According to a number of studies that involve simultaneous recordings in both areas (Lee et al. 2004a; Leutgeb et al. 2004, 2006), CA3 can respond rapidly to environmental changes by exhibiting immediate (global and rate) remapping, in contrast to CA1, whose responses often tend to remain, at least initially, relatively stable and independent of such contextual changes. These findings are consistent with earlier studies that showed a lagging response in CA1 to environmental changes (Bostock et al. 1991; Lever et al. 2002), as well as behavioral studies showing that NMDARs in CA3, and not CA1, are necessary for rapid memory acquisition (Lee and Kesner 2002; Nakazawa et al. 2003). In addition to these differences in the time course of their responses to environmental change, CA3 and CA1 exhibit differences in their spatial representations: representations across environments in CA3 are more orthogonalized than those in CA1 (Vazdarjanova and Guzowski 2004; Leutgeb et al. 2004; Colgin et al. 2010). Moreover, CA3 representations tend to shift coherently when proximal and distal cues are put into conflict, in contrast to CA1, which shows more variable changes (Lee et al. 2004b).

CA3 and CA1 also differ in their internal anatomy and anatomical inputs: CA3 receives overlapping projections from LEC and MEC, while in CA1 these projections are well separated (Witter et al. 2006). Since LEC and MEC are believed to code for complementary aspects of the world [object vs. place; external sensory information vs. internal sensory information; non-self vs. self (Knierim et al. 2006; Lisman 2007)], this suggests an associative role for CA3, in this case binding together different kinds of cues to build episodic or conjunctive representations of place, context, reward contingency, etc. This is consistent with the primate literature on the role of the hippocampus in forming associative and episodic memories (Eichenbaum and Lipton 2008; Buzsáki and Moser 2013). As variously noted, this associative role is consistent with the extensive recurrent excitatory collaterals in CA3 (Marr 1971; Hopfield 1982; Lansner 2009). A map of an environment is commonly understood to mean a representation that encodes relationships between pairs of locations and the relationships between landmarks in the environment and their locations. By this definition, and based on the associational role of CA3, it is likely that some version of a map of space resides in CA3. The map could in principle either encode detailed metric (distance) and geometric (angle) information relating different locations or encode more qualitative topological information that preserves relative distances and other topological features (Muller et al. 1996; Balakrishnan et al. 1999). Recent mathematical analysis of the CA3 code suggests the latter: the CA3 map appears to be more topological than geometric and metric (Dabaghian et al. 2012).

A large fraction of the lateral connections in CA3 are directed rather than reciprocal (Muller et al. 1996; Buzsáki 2006), suggesting the possibility that CA3 is further involved in the associative learning of location sequences between place cells. These place cell sequences would correspond to routes or trajectories between locations in the external environment. Indeed, studies report the existence of various sequence-playback events, including replay and preplay of place field sequences when animals are quiescent, sleeping, or about to start running down a path (Foster and Wilson 2006; Johnson and Redish 2007; Diba and Buzsáki 2007; Davidson et al. 2009; Karlsson and Frank 2009; Dragoi and Tonegawa 2011). The function of such playback events may be related to route memorization, recall, and planning (Hasselmo 2012).

O’Keefe and Nadel hypothesized that CA1 functions as a mismatch detector, or comparator, comparing predictions derived from the map with direct observations (O’Keefe and Nadel 1978). Mismatch detection is a form of novelty detection, and there is some empirical support for this hypothesis. CA1, but not CA3 or DG, showed marked increases in expression of the immediate early gene Fos, a marker for recent neural activity and plasticity, after animals were exposed to environmental novelty (VanElzakker et al. 2008). In addition, CA1 cells appear to respond primarily to combinations of input from CA3 and EC, not to either input alone: only when inputs from CA3 and EC arrive concurrently at the proximal and distal portions of a CA1 pyramidal cell dendrite, respectively, is a dendritic plateau potential, necessary for burst firing and plasticity, triggered (Takahashi and Magee 2009). In the comparator view, CA1 compares the learned associations or predictions from CA3 with the sensory cue-driven outputs of EC to decide whether to fire. On the other hand, with inputs from MEC and LEC terminating on different cells of CA1 (Witter et al. 2006), it is unclear whether and how the MEC and LEC inputs may be combined and integrated within CA1.

Thus, spatial computation within the hippocampus might function as follows: In a familiar environment, sensory cues, through the perforant path, retrieve a learned topological map in CA3 that contains relative spatial information about different locations, together with “handles” to other variables like context, salience, and reward contingencies. This map may be compressed in the sense that it lacks geometric and metric information about the environment (angles and distances between locations and landmarks) (Dabaghian et al. 2012). This retrieved map generates predictions about location based on learned knowledge of commonly taken past routes and relative locations, which may then be compared in CA1 against the sensory-based inputs arriving from the EC (Hasselmo and Wyble 1997), to perform self-localization and possibly influence, via feedback, the map in CA3 (Sik et al. 1994), Fig. 14.3. The role of CA1 in this view is as a user of the spatial map to perform localization during navigation. Finally, the more mysterious of the hippocampal subfields, the subiculum, contains cells of diverse spatial tuning, including place cells (Barnes et al. 1990; Sharp and Green 1994), with some cells whose field locations appear to be invariant to environmental context (Sharp 2006; Kim et al. 2012). From a computational point of view, many important functions, including the computation and incorporation of metric information into navigation calculations (for instance, as needed in homing and map building), have yet to be assigned neural loci, and some of them may be performed by the subiculum.

Fig. 14.3

Self-localization and mapping in the hippocampal circuit. Spatial representations and computations in the hippocampal circuit: External sensory information and idiothetic cues are relayed to the hippocampus through the LEC and MEC, respectively. These areas convey information about location given the sensory data. The perforant path, via layer II of the EC, projects to both DG and CA3, where allothetic and idiothetic cues from LEC and MEC, respectively, are mixed (indicated by the shading of the cells) and used, together with the internal recurrent connectivity of CA3, which encodes a topological map of the environment, to generate a prediction of the animal’s current location. For self-localization, the prediction from CA3 is compared in CA1 against the direct sensory input conveyed from EC. Output from CA1 may be used to correct the PI in the MEC via the subiculum (Sub) or to alter the map in CA3 (gray arrow with question-mark, where the question mark highlights the lack of a direct connection from CA1 back to CA3), possibly through EC

In the following sections, we explore computational models that seek to explain how the different areas combine into an entorhinal–hippocampal circuit that is capable of solving the navigational tasks of localization and mapping.

5 Computational Models of the Cortical-Hippocampal Circuit for Spatial Navigation

Empirical findings do not yet provide a complete answer to how the components of the brain’s spatial navigation circuit work together to perform the computations necessary for localization and mapping. In this section we review three computational models, chosen from among many such models, that incorporate, to a greater or lesser extent, the neurophysiological findings on codes for space in the brain to obtain a functioning circuit for localization and mapping. These models help drive a better understanding of how the circuit might work, while highlighting the gaps in our knowledge.

A notable early model that incorporated both allothetic and idiothetic cues, and identified the brain areas likely to be involved in map building and self-localization, was presented in Redish and Touretzky (1998). Redish and Touretzky reasoned that the path integrator (PI) resides outside the hippocampus (Touretzky and Redish 1996). In Redish and Touretzky (1998), which forms one of a series of models involving both allothetic and idiothetic drive to place cells (Wan et al. 1993; Touretzky and Redish 1996; Redish and Touretzky 1997), a spatial map is constructed in a composite CA1/CA3 network. Local view and PI inputs are associated at the level of the EC layer, which then projects to CA1/CA3, Fig. 14.4a (see caption for details of model). The CA1/CA3 network is endowed with recurrent connectivity and learns a topological representation of the environment through the formation, by associative plasticity, of lateral connections between pairs of place cells with nearby field centers. Learning takes place under idealized circumstances, in which there is no ambiguity in the visual input, and the PI is error-free. After map formation, the system is capable of self-localization with noisy cues (including noise in the PI): the topological map arrives at a single estimate of location from ambiguous and possibly conflicting sensory cues (both PI cues and visual cues) through winner-take-all (WTA) dynamics in the CA1/CA3 network. The field center of the winner place cells represents a guess of the animal’s location, which can be used to reset the PI. The neural dynamics of this model are relatively realistic, incorporating rate-based neurons with biophysical time constants that support attractor dynamics in the PI, the local view network, and CA1/CA3. Associations between the allothetic and idiothetic (PI) inputs are formed in a high-level sensory area that is distinct from the CA1/CA3 network, but it is not entirely clear if performance would be hurt by shifting these associations to the CA1/CA3 network, as would seem more consistent with a modern understanding of the circuit. It is also unclear how the system would perform if the inputs during map learning were noisy and unreliable, as is the case in the real world. Indeed, the problems of localization and mapping become particularly difficult when both must be solved simultaneously in a noisy world: map development without accurate location coordinates and localization without an accurate map.
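As a rough illustration of the winner-take-all localization step in this class of models (a minimal sketch under our own simplifying assumptions, not the authors’ implementation), the snippet below sums a PI-driven bump and a view-driven bump over a sheet of place cells, selects the single most active cell, and uses its field center to reset the PI. Field centers, tuning widths, and noise levels are illustrative choices.

```python
import numpy as np

# Minimal WTA localization sketch in the spirit of Redish and Touretzky (1998):
# each CA1/CA3 place cell sums a path-integration-driven input and a local-view-
# driven input; global inhibition leaves one winner; the winner's field center
# becomes the location estimate and is used to reset the path integrator (PI).

rng = np.random.default_rng(1)

field_centers = np.linspace(0.0, 10.0, 51)   # 1-D track, one cell per field center
sigma = 1.0                                  # tuning width of each input bump

def gaussian_drive(estimate, centers, sigma):
    """Unimodal bump of feedforward drive centered on a location estimate."""
    return np.exp(-(centers - estimate) ** 2 / (2 * sigma ** 2))

true_location = 4.0
pi_estimate = true_location + rng.normal(0.0, 0.8)    # drifted PI estimate
view_estimate = true_location + rng.normal(0.0, 0.3)  # noisy view-based estimate

# Combined feedforward drive to the CA1/CA3 layer.
drive = gaussian_drive(pi_estimate, field_centers, sigma) \
      + gaussian_drive(view_estimate, field_centers, sigma)

winner = np.argmax(drive)                    # WTA: a single winning place cell
location_estimate = field_centers[winner]

pi_estimate = location_estimate              # reset the PI to the winner's field center
print(f"true {true_location:.2f}, corrected PI {pi_estimate:.2f}")
```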

Fig. 14.4

Hippocampal circuit models. (a) Model of (Redish and Touretzky 1998): Sub/PaS (subiculum/parasubiculum) and HLS (high-level sensory areas), respectively, represent path integration-based (unimodal bump on gray sheet; in this and other panels, neurons are arranged topographically according to place preference) and local view-based location estimates (the view-based estimate may be unimodal or multimodal, depending on the ambiguity of the visual cues). These regions project to EC. Sub/PaS projections to EC are fixed (solid line), whereas HLS projections to EC are learned (dotted line). This enables a mapping of the HLS representation into PI coordinates so that inputs from the HLS and PI representation match. EC projects one-to-one to CA1/CA3. During exploration, coactive CA1/CA3 cells become recurrently coupled through Hebbian plasticity (dotted lines). The learned weights in CA1/CA3 are excitatory and, coupled with global inhibition, facilitate winner-take-all (WTA) dynamics (red cell is winning cell), which can be used to reset the PI (red arrow) through one-to-one projections back to Sub/PaS. (b) Model of (Arleo and Gerstner 2000): Allothetic input drives a collection of snapshot cells, each connected to a random assortment of visual filters. Snapshot cells project to superficial EC cells (sEC) through plastic synapses (dotted line), creating a sparse representation of visual location signatures in sEC. Idiothetic cues drive a unimodal bump in the medial EC (mEC). Together, mEC and sEC drive CA1/CA3. During learning, when too few CA1/CA3 cells are active at a location, a new cell is added to the sEC and CA1/CA3 layers, with random initial weights to and from the EC layers. The weights between the CA1/CA3 and EC layers undergo Hebbian plasticity. Inset, when the path integration error exceeds a certain threshold, the animal moves toward familiar territory (e.g., toward home) to recalibrate the PI. Once the territory is sufficiently familiar, the position estimate in CA1/CA3 resets the PI

Arleo and Gerstner proposed a multilayer model (Arleo and Gerstner 2000), similar to that of Redish and Touretzky (Touretzky and Redish 1996), with a focus on how maps might be built over the course of exploration from noisy and ambiguous multimodal (allothetic and idiothetic) sensory cues. Here, the PI (a set of neurons with Gaussian tuning curves, whose firing is determined based on an integrated estimate of animal location, without a neural network model of the integration process; this area is referred to by the authors as mEC) is subject to error accumulation. The visually derived (allothetic) input to the system is nonmetric, meaning that it does not directly encode animal position in allocentric coordinates, Fig. 14.4b. This allothetic sensory input is computed [in a network referred to as the superficial EC (sEC)] from snapshot cells, which encode ego-centric pictures of the world. Snapshot cell firing rates are determined by the sum of the projections of the visual input scene onto a select group of visual filters. Unlike previous models, this model does away with the need for assuming explicit landmark identification from visual inputs, because the snapshot cells use simple linear filters to generate visual input-determined fingerprints for different locations.
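The snapshot-cell idea can be sketched as follows (a toy version with arbitrary filter counts and image sizes; no claim is made about the original model’s exact filter bank or wiring): each snapshot cell sums the projections of the current view onto its own random assortment of linear filters, so the population rate vector serves as a nonmetric fingerprint of the view at that location.

```python
import numpy as np

# Toy snapshot-cell fingerprinting in the spirit of Arleo and Gerstner (2000):
# each snapshot cell is wired to a random subset of linear visual filters, and
# its rate is the sum of the current view's projections onto those filters.
# The resulting rate vector identifies a view without explicit landmark
# identification and without encoding allocentric position.

rng = np.random.default_rng(2)

n_pixels = 16 * 16        # flattened egocentric view
n_filters = 40            # visual filters available to the system
n_snapshot_cells = 20     # each cell samples a random subset of filters

filters = rng.normal(size=(n_filters, n_pixels))
assignments = [rng.choice(n_filters, size=8, replace=False)
               for _ in range(n_snapshot_cells)]

def snapshot_rates(view):
    """Firing rates: sum of the view's projections onto each cell's filters."""
    projections = filters @ view
    return np.array([projections[idx].sum() for idx in assignments])

view_at_location_A = rng.random(n_pixels)
fingerprint_A = snapshot_rates(view_at_location_A)
print("snapshot fingerprint (first 5 cells):", np.round(fingerprint_A[:5], 2))
```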

A downstream layer, the CA1/CA3 network, receives input from sEC and mEC and constructs a map of the environment in the form of place cells that are activated by the combination of the external sensory cues (from sEC) and the corresponding PI inputs (from mEC) for each location. This is achieved by updating the sEC and mEC input strengths to active cells in the CA1/CA3 layer through Hebbian learning. New place cells are added when a location is sufficiently unfamiliar (i.e., when an insufficient number of the existing CA1/CA3 cells are activated). At any given time, the preferred locations of the active ensemble of CA1/CA3 cells, as driven by the mEC and sEC, are averaged to represent the animal’s location in the environment. The PI, which accumulates error over time by design, passes its inconsistencies to the map being learned if left uncorrected. The model mitigates the likelihood of any resulting discontinuities in the map by assuming that the trajectory during map learning in unfamiliar environments consists of short exploratory excursions that loop back quickly to a familiar location, a hypothesis supported by behavioral evidence (Eilam and Golani 1989; Golani et al. 1993; Tchernichovski et al. 1998; Whishaw et al. 2006; Wallace et al. 2006). At familiar locations (determined by the number of active CA1/CA3 cells), the PI coordinates are reset, Fig. 14.4b, resulting in a PI whose error is effectively bounded.
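The recruit-or-recalibrate logic can be caricatured as follows (a hypothetical sketch of the loop-back strategy, with an abstract sensory signature standing in for the sEC input and illustrative noise levels): place cells recruited on the outbound leg bind drifting PI coordinates to locations, and reactivating them on the return leg through familiar territory resets the PI, bounding its error.

```python
import numpy as np

# Hypothetical sketch of the learn-then-recalibrate loop described above: on a
# short outbound excursion, place cells are recruited that bind the current
# (possibly drifted) PI coordinate to a sensory signature of the location; on
# the return leg through now-familiar territory, reactivating those cells
# resets the PI to the stored coordinate, bounding the accumulated error.
# Here the sensory signature is just a discrete location id, an illustrative
# simplification rather than part of the original model.

rng = np.random.default_rng(3)

stored = {}                    # sensory id -> PI coordinate bound at recruitment
pi, true = 0.0, 0.0

# Outbound leg: the PI drifts, and each new location recruits a place cell.
for loc_id in range(1, 11):
    true += 1.0
    pi += 1.0 + rng.normal(0.0, 0.1)          # noisy self-motion update
    stored[loc_id] = pi                        # unfamiliar location: recruit

print(f"PI error at the far end of the excursion: {abs(pi - true):.2f}")

# Return leg: at each familiar location, the recalled coordinate resets the PI.
for loc_id in range(9, -1, -1):
    true -= 1.0
    pi -= 1.0 + rng.normal(0.0, 0.1)
    if loc_id in stored:                       # familiar: reset PI to stored value
        pi = stored[loc_id]

print(f"PI error after returning home: {abs(pi - true):.2f}")
```

Note that the stored coordinates themselves inherit some outbound drift, echoing the point above that the PI passes its inconsistencies to the map if left uncorrected.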

A further development of this line of models, triggered by the discovery of grid cells (Hafting et al. 2005; Fiete et al. 2008), was provided by Sreenivasan and Fiete (2011). Theoretical considerations show that the grid cell code makes possible the correction of noise-driven errors resulting from the neural path integration process, even without the help of external landmarks (Sreenivasan and Fiete 2011). The readout layer for error correction, equated with CA1, receives feedforward path-integrated inputs from the multiple-scaled grid cell networks and performs WTA dynamics. The winner place cell represents the estimated location, and the estimate thus formed is approximately a maximum-likelihood estimate of location given the noisy PI inputs of all spatial periods. The specific multi-period grid cell code ensures that this location estimate is highly accurate compared to the case in which the PI inputs are coded simply as a unimodal, more place cell-like representation [this is because of the specific, highly perturbation-sensitive representation of different locations by the collective grid cell code; see Sreenivasan and Fiete (2011) for details]. Return projections from CA1 to grid cells (via intervening areas) would then reset the PI. The same return projections can correct the PI if the CA1 WTA dynamics are run on visual or other allothetic inputs instead of grid cell inputs. A notable feature of the model is its separation of the roles of CA1 and CA3: CA3 inputs are hypothesized to provide internal guesses or predictions to CA1 that constrain the set of possible locations from which CA1 selects the winner for the current time-step. CA3 predictions are based on the last estimate of location, combined with learned knowledge about physical boundaries in the environment, about commonly taken past routes in the environment, and about physical constraints of the world, such as the impossibility of spontaneously tunneling between remote locations. CA3’s constraints on CA1 are enforced by a coincidence rule, so that EC feedforward input can allow a CA1 place cell to win the WTA dynamics only if the cell simultaneously receives a CA3 input signifying that it represents a possible location for the present time-step. Thus, CA3 inputs are “enablers” of CA1 firing. External sensory cues, when present, are assumed to enter CA3, thus contributing to the prediction of possible locations.
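The flavor of the readout can be conveyed with a small decoding sketch (our illustration; the spatial periods, noise levels, and likelihood form are arbitrary choices rather than those of the original model): each grid module reports a noisy phase, and a WTA readout over candidate positions selects the location whose predicted phases best agree with all modules simultaneously, approximating a maximum-likelihood estimate.

```python
import numpy as np

# Rough sketch of a CA1-like WTA readout of a multi-period grid code: several
# grid modules with different spatial periods each provide a noisy, periodic
# phase estimate of position; the candidate location whose predicted phases
# best match all modules at once wins, approximating maximum-likelihood
# decoding of position from the population.

rng = np.random.default_rng(4)

periods = np.array([3.0, 4.1, 5.7])          # spatial periods of the grid modules
kappa = 20.0                                  # phase-agreement weight (arbitrary)

true_x = 12.3
true_phases = (true_x % periods) / periods * 2 * np.pi
noisy_phases = true_phases + rng.normal(0.0, 0.15, size=periods.size)

candidates = np.linspace(0.0, 60.0, 6001)     # CA1 cells tiling candidate positions

def score(x):
    """Circular agreement between predicted and observed phases, summed over modules."""
    pred = (x % periods) / periods * 2 * np.pi
    return kappa * np.sum(np.cos(pred - noisy_phases))

scores = np.array([score(x) for x in candidates])
winner = candidates[np.argmax(scores)]        # WTA readout = approximate ML estimate
print(f"true position {true_x:.2f}, decoded position {winner:.2f}")
```

In the full model, CA3 inputs would additionally gate which candidate cells are allowed to win, restricting the readout to locations consistent with the previous estimate and learned constraints.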

The three models described above have several features in common; a notable shared element is that they all view location estimation as a process of computing, at each time-step, a single best guess and then updating that single guess in the next time-step. A contrasting approach, widely used in robotic SLAM (Simultaneous Localization and Mapping) systems (Meyer and Filliat 2003; Durrant-Whyte and Bailey 2006), is to represent a full probability distribution over possible locations and to update that distribution over time (Balakrishnan et al. 1999). Representing and updating probabilities can allow for far more robust and accurate location estimation in a noisy world than is possible by storing and updating only a single best guess.
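For concreteness, the following minimal histogram (grid-based) Bayes filter illustrates what it means to carry a full distribution over locations rather than a single guess (a generic textbook construction, not drawn from any of the models above): the belief is pushed forward by a motion model and then reweighted by an observation likelihood. Grid resolution, motion noise, and the likelihood model are illustrative choices.

```python
import numpy as np

# Minimal 1-D histogram Bayes filter of the kind used in probabilistic robot
# localization: the belief over position is predicted forward with a noisy
# motion model (idiothetic step) and reweighted by the likelihood of a noisy
# position observation (allothetic step).

positions = np.linspace(0.0, 10.0, 101)
belief = np.full(positions.size, 1.0 / positions.size)   # uniform prior

def predict(belief, move, motion_sigma=0.3):
    """Convolve the belief with a Gaussian motion kernel shifted by `move`."""
    new = np.zeros_like(belief)
    for i, x in enumerate(positions):
        kernel = np.exp(-(positions - (x + move)) ** 2 / (2 * motion_sigma ** 2))
        new += belief[i] * kernel / kernel.sum()
    return new

def update(belief, observation, obs_sigma=0.5):
    """Reweight the belief by the likelihood of a noisy position observation."""
    likelihood = np.exp(-(positions - observation) ** 2 / (2 * obs_sigma ** 2))
    posterior = belief * likelihood
    return posterior / posterior.sum()

belief = predict(belief, move=1.0)            # self-motion (idiothetic) step
belief = update(belief, observation=1.2)      # landmark-based (allothetic) step
print("posterior mean position:", round(float(np.sum(positions * belief)), 2))
```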

The models covered here represent a small sample of the dozens of hippocampal spatial models introduced over the last three decades. A notable omission from the computational perspective in this review is the RatSLAM family of algorithms (Milford and Wyeth 2010), which meld insights from robotic SLAM with the physiology of the hippocampus. These algorithms achieve impressive performance in localization and mapping, but many of their complex computational steps lack a plausible biological implementation. We have highlighted a particular set of models because they illustrate some of the problems involved in mapping and self-localization in a transparent way. What conclusions do we draw from these models, and what gaps remain?

One important insight from these models is that combining information from multiple sources—external landmark-based sensory cues, self-motion-based cues, and information about previously visited locations—can greatly enhance one’s ability to self-localize, both by adding precision to one’s estimate and by compensating when a subset of cues is absent. At the same time, the models highlight the incompleteness of our knowledge of how the spatial circuit performs this information fusion: they lack the rich dynamics that the hippocampus expresses across environments and behaviors, and at best provide caricatures of the different components of the circuit and their roles in navigation.

Conclusions

Our knowledge of how the brain performs localization and mapping is rapidly growing, thanks in large part to the discovery of various cell types that represent different pieces of spatial information in decipherable ways. In concert, our mechanistic understanding of the neural circuits underlying each of these cell types is also rapidly expanding, as evidenced by the agreement between the data and computational predictions about neural activity from circuit models of these cells (e.g., HD cells and grid cells). However, at least three major areas deserve more attention from the communities of theorists and experimentalists. The first is to determine the role and mechanisms of cell types whose codes are not easy to decipher but which make up a large, possibly even majority, fraction of principal cells in areas like the MEC (Zhang et al. 2013; Sargolini et al. 2006; Boccara et al. 2010; Mizuseki et al. 2009); some of these cells have firing patterns that correlate with spatial location and associated spatial variables, and others do not. The second is to resolve the mechanisms through which temporal oscillation dynamics in the theta and gamma bands obtain and represent spatial information. It is clear that neural firing rates convey spatial information through the tuning curves of grid cells, HD cells, and place cells; it is also clear that oscillations strongly influence spike timing and convey spatial information (Brown et al. 1998; Mizuseki et al. 2009; Domnisoru et al. 2013; Schmidt-Hieber and Hausser 2013; Royer et al. 2012; Jadhav et al. 2012; Reifenstein et al. 2012). However, are these spike-time representations fundamental to the spatial circuit, in the sense that computations within the circuit are based on detailed spike timing and coincidences, or are the spike-time outputs merely readouts, translated into a phase code, of the rate-based dynamics of the network? The third is the question of how the different cell types and areas of the spatial circuit work together to combine incomplete data and predictions about spatial location in order to arrive at the high-quality spatial inference that is a hallmark of navigating animals. Related to this third question, in a separate review we will discuss spatial navigation in the robotics field of SLAM (Durrant-Whyte and Bailey 2006) as it relates to the problems the brain faces in solving the same challenges. We seek to understand how the brain solves the sequential probabilistic inference problems of navigation that have been identified as computationally difficult in robotic SLAM; in this way, we may gain insight into some of the more mysterious aspects of neural representation in the spatial circuit.

In this chapter we have focused on exploring how the brain’s navigational circuit solves the problems of map building and self-localization in novel and familiar environments. Despite this focus, it bears emphasizing that the hippocampus does not likely exist solely or even primarily to serve this function. Even in seeking to learn the role of the hippocampal circuit in navigation, it might be profitable to take the bigger view of the hippocampus’ general computational role (Buzsáki and Moser 2013), because its spatial role may be understood as a special case of the general functions it performs. Given its intrinsic organization and anatomical relationship with the cortex, the hippocampus appears to organize, index, and enable access to the brain’s contents for fast, efficient retrieval (Teyler and DiScenna 1986; McNaughton et al. 1996; Leutgeb and Leutgeb 2007; Teyler and Rudy 2007). In this view, the role of the hippocampus in the brain is akin to the role of a librarian (Buzsáki 2006): given vague or partial information about a book (or an event or thing), the librarian (hippocampus) can retrieve the full record and return a pointer to the book (or the full memory of the event or thing). This is consistent with the autoassociative pattern-completion role usually ascribed to CA3 (Grossberg 1969, 1971; Hopfield 1982; Amit 1994; McClelland and Rumelhart 1985; McNaughton and Morris 1987; Treves and Rolls 1992). To understand the elevation of the spatial variable, we might build on the analogy. The full record kept by the librarian includes a title, author names, a summary, a publication date, a publisher, number of copies in the library, and importantly, a call number. The call number is a privileged indexing variable: one author can have multiple books and multiple books may share a title, etc., but each book has a unique call number, and this number further specifies where on the shelves to find the book. On the shelves, books placed near each other address related topics, and thus the call number conveys semantic meaning that goes beyond simply providing a unique identifier. Similarly, whereas the full record of an episode consists of a place, a time, context, valence, reward contingency, and landmarks, the place or location index is privileged. It is an efficient locator of a memory, and, in general, records with similar spatial labels will tend to have important relationships to each other because of the spatiotemporal continuity of the world.

The spatial roles of the hippocampus in mapping and self-localization are perfectly consistent with the view of hippocampus as a general indexing machine for episodic memory (Teyler and DiScenna 1986): mapping, which involves the acquisition of information about external sensory landmarks and contexts and their association with internal sensory inputs at specific locations and with each other, is a spatial form of memory storage. Self-localization, which involves the recall of spatial associations and memories when some semblance of the sensory inputs is reencountered in noisy or ambiguous form, is a spatial form of retrieval.