Introduction

The ability to navigate successfully through familiar and unfamiliar environments is essential in daily life. Rather than considering human navigation as a single cognitive process, we can describe it as the result of the interaction among different abilities, such as the ability to retain the spatial layout of an environment, to find a route connecting two locations or to create an interconnected network among different paths. Different frames of reference can be used to navigate and orient oneself in the environment, yielding different types of spatial knowledge (O’Keefe and Nadel 1978). On one hand, environmental objects (i.e., landmarks) might be located by referring to one’s position, thus representing the environment from a route perspective (e.g., the traffic light is on my left, the restaurant is behind me on the right, etc.) based on an egocentric representation of the path. Route knowledge has typically been described as sequences or “chains” of landmarks linked by the experienced paths of movement that connect them (Montello 1998). Thus, route knowledge consists of information about the order of landmarks and minimal information about the appropriate action to perform at “choice-point” landmarks, such as “turn right” or “continue forward” (Montello 1998). On the other hand, we might refer to the geometrical configuration of each landmark (typically absent in route knowledge) and of the environment itself and, thus, represent the environment using a survey (map-like, allocentric) representation (e.g., the traffic light is northwest, the restaurant is 5° south, etc.) (O’Keefe and Nadel 1978). Survey knowledge has been hypothesized to derive from accumulated route knowledge, encompassing knowledge about routes and landmarks organized within a common frame of reference (Montello 1998). In either case, human navigation depends greatly on the degree of familiarity, which has been hypothesized to have a crucial effect on the format of representation of the same environment (Siegel and White 1975). Specifically, well-known environments are more likely to be represented in a survey format (similar to a cognitive map), and the format of representation, which in turn influences the style of navigation, changes as familiarity increases (Siegel and White 1975). Montello (1998) proposed that both types of knowledge might be acquired from the first exposure to the environment. Regarding familiarity, he proposed that, rather than a qualitative shift between different representations (i.e., route and survey), increasing familiarity with the environment produces a quantitative, continuous increase in the accuracy and completeness of spatial knowledge (Montello 1998). Taylor and Tversky studied in depth the memory of environments described from a route or a survey perspective and found that individuals could proficiently shift between the two perspectives (Taylor and Tversky 1992). They developed a classical paradigm in which subjects learned environments described from a route or a survey perspective and were then asked to retrieve the environments both from the perspective used during learning and from the opposite one. They found that, regardless of the format used during learning, participants formed the same spatial mental models and captured the same spatial relationships between landmarks from both survey and route descriptions (Taylor and Tversky 1992).

These cognitive models and classical studies about human navigation suggest that familiarity and task type, as well as their interaction, can influence the cognitive processes involved in spatial navigation. Although Shelton and Gabrieli (2002) assessed the neural substrates of encoding space from a route and a survey perspective, it is still unclear whether the learning strategy adopted during familiarization with a novel environment affects the retrieval of navigational knowledge when it is assessed from a specific perspective (i.e., a route or a survey perspective).

Over the past decade, fMRI studies of the neural correlates of human navigation have provided evidence of complex neural networks subtending the human ability to orient and navigate, which support the idea of navigation as an ensemble of different and, to a certain extent, dissociable abilities (for a review see Boccia et al. 2014). Nonetheless, how the format of spatial representation modulates the activity of these brain regions is still unclear. This topic was directly assessed in recent studies by means of virtual reality paradigms, such as the cognitive map test (Iaria et al. 2007, 2008) and similar paradigms (Hartley et al. 2003; Latini-Corazzini et al. 2010), or by means of mental navigation tasks (Hirshhorn et al. 2012; Rosenbaum et al. 2004, 2007). Many studies have shown that egocentric navigation involves an ensemble of areas including the parahippocampal place area (PPA), precuneus and cuneus, inferior parietal lobe and retrosplenial cortex (RSC) (see also Byrne et al. 2007). In contrast, allocentric navigation seems mainly related to the hippocampal complex (Maguire et al. 1998; O’Keefe and Nadel 1978; Tolman 1948) and, more specifically, to a network of areas containing place cells (hippocampus) and grid cells (entorhinal cortex) (Byrne et al. 2007). Shelton and Gabrieli (2002) compared brain activation during route and survey encoding and found that a similar network underpins the encoding of space from the two perspectives, but with important differences. In particular, they found that survey encoding recruits a subset of the areas also recruited by route encoding, but with greater activation in some areas, including the inferior temporal cortex and posterior superior parietal cortex. Moreover, route encoding recruited regions that were not activated by survey encoding, including medial temporal lobe structures, the anterior superior parietal cortex and the postcentral gyrus (Shelton and Gabrieli 2002). Although several studies have focused on the neural correlates of learning or retrieving spatial representations in a given format, studies directly addressing the interaction between learning format and retrieval format are lacking.

In the present study, we aimed to explore the effect of “familiarization” with a new environment on brain areas involved in human navigation and on their context-dependent connectivity with other regions of the spatial navigation functional network. In particular, we set out to investigate whether learning strategies (i.e., acquiring spatial knowledge from a route perspective or from a map-like perspective) modulate brain activity associated with the retrieval of spatial information from a survey or a route perspective. Following previous models of spatial knowledge (Montello 1998; Siegel and White 1975), in the present study route knowledge was operationally defined as knowledge of the order of landmarks and minimal information about the appropriate action to perform at “choice-point” landmarks, while survey knowledge was defined as knowledge of the geometrical, map-like configuration of each landmark in the environment. To pursue our aim, we used a 5-day intensive learning procedure during which participants had to encode two paths in the same real city, one from a route perspective (route learning, RL) and the other from a map-like perspective (survey learning, SL). Then, we asked each participant to retrieve each of these paths using a survey (survey task, ST) or route (route task, RT) frame of reference while undergoing an fMRI scan. The tasks were performed twice, on the first and the fifth day of training. fMRI acquisition at the beginning and at the end of the intensive learning allowed us to evaluate the effect of training on the neural responses of areas engaged in the navigational tasks. We also investigated learning-dependent connectivity using a psychophysiological interaction (PPI) method.

On the basis of the findings in the literature, we hypothesized that in retrieval tasks different networks would be associated with the two types of representation (i.e., route and survey) required during the task. Furthermore, we hypothesized that these differences would be observed not only in retrieval but also when different types of representation (route vs. survey) were used in learning.

Materials and methods

Participants

Fifteen right-handed healthy subjects (mean age 25.33 years, SD 4.78; 7 women) without a history of neurological or psychiatric disorders took part in this study. All subjects gave their written informed consent. This study was approved by the local ethics committee of the Santa Lucia Foundation in Rome, in agreement with the Declaration of Helsinki. Subjects were unfamiliar with the city in which the experiment was carried out, so the two paths administered during the intensive learning were completely unknown to them at the beginning of the training.

Stimuli

For the intensive learning, we used a set of stimuli developed in the city of Latina, which is located about 90 km from Rome. This city was built during the 1930s in a rationalist architectural style. Thus, it has a simple radial-concentric plan. The set of stimuli included both video clips and maps of two paths through the city (paths A and B). Each path was about 9 km long, with 23 crossroads. In each path, the number of turns was balanced across three possible directions (i.e., straight, left and right).

A professional cameraman recorded video clips of the two paths from a first-person perspective. The frame rate of the video clips was adjusted to make each video 7 min long (without taking into account the time spent by participants at the different crossings), resulting in an average perceived speed of 1.2 km per minute.

The maps of these paths were created using Google Maps© 2011 (satellite view). In each map (60 cm × 90 cm), a blue line indicated the path. A set of 23 postcards was derived from 23 screenshots taken from the video clips of each path. The postcards (8.5 cm × 14.5 cm) showed the crossroads from a first-person perspective. The video clips were used to encourage participants to develop a route representation (RL) of the city and the maps to encourage them to develop a survey representation (SL) (Fig. 1a).

Fig. 1

Method. a Intensive spatial training materials: video clips from route learning are on the left and maps from survey learning are on the right. b Experimental task materials used during the fMRI scans in the first and second sessions: from the left, survey task, route task and control task. c Experimental timelines during fMRI. RL route learning, SL survey learning, ST survey task, RT route task, CT control task

For the retrieval tasks, 23 screenshots were taken from the video clips of each path. The screenshots depicted the 23 crossroads along each path. Starting from these screenshots, we created two sets of stimuli (42 stimuli for path A and 42 for path B) to test the spatial representations acquired through learning. Each stimulus had a red arrow at the bottom center that pointed in one of three possible directions (i.e., left, straight or right) and a little box at the bottom right of the picture that indicated (1) the next step on the path in the route task (RT); (2) either the starting or the goal point in the survey task (ST); or (3) a detail of the current visual scene in the control task (CT) (Fig. 1b).
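To make the trial composition explicit, the information defining a single retrieval stimulus can be summarized as in the sketch below. The structure, field names and file names are purely illustrative (they are not taken from the original experiment code).

```matlab
% Illustrative sketch of the information defining one retrieval stimulus.
% Field names and file names are hypothetical, not those of the original experiment code.
trial.path       = 'A';                   % path A or B
trial.screenshot = 'crossroad_07.jpg';    % first-person view of one of the 23 crossroads
trial.arrow      = 'left';                % red arrow direction: 'left' | 'straight' | 'right'
trial.task       = 'RT';                  % 'RT' (route), 'ST' (survey) or 'CT' (control)
trial.boxContent = 'next_crossroad.jpg';  % RT: next step; ST: start/goal point; CT: scene detail
trial.isTrue     = true;                  % whether the arrow points in the correct direction
```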

Procedure

The experiment took place on 5 consecutive days. On the first day (pre-learning), participants were shown the two paths in both the route and the survey perspective, in a total of four path presentations (path A-route, path A-survey, path B-route, path B-survey) administered in random order. After the presentation, participants underwent an fMRI scan during which they performed the retrieval and control tasks (described below).

Then the participants engaged in intensive route (RL) and survey (SL) learning over four consecutive days, during which half of them learned path A from a route perspective and path B from a survey perspective, and the other half learned path A from a survey perspective and path B from a route perspective (Fig. 1a).

RL forced participants to encode the path in a route format, from an egocentric perspective, by having them watch the first-person perspective video clips in a trial-and-error learning procedure. Each video clip lasted 7 min. Presentation of the video clips, implemented in Matlab using Cogent 2000, stopped at each crossroad, and participants were asked to indicate the direction in which the path proceeded by pressing the directional buttons on a keyboard. The video clip presentation resumed only after they had indicated the correct direction. If they made an error, the presentation did not resume and they were allowed to choose another direction by pressing the corresponding button. The time spent at each crossroad between the video clip stop and the correct response was recorded and used as a measure of learning across pre-learning and the 4 days of learning.
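A minimal sketch of this trial-and-error loop is shown below, assuming a list of correct turns for the path. The helpers play_video_until_next_crossroad and get_direction_response are hypothetical stand-ins for the Cogent 2000 presentation and keyboard routines actually used; only the logic (stop at each crossroad, resume only after the correct response, record the decision time) follows the procedure described above.

```matlab
% Sketch of the RL trial-and-error procedure (hypothetical helper functions).
nCross = 23;                                        % crossroads per path
correctDirection = repmat({'straight'}, 1, nCross); % placeholder for the actual turn sequence
decisionTime = zeros(1, nCross);                    % time spent at each crossroad (learning measure)

for c = 1:nCross
    play_video_until_next_crossroad(c);             % hypothetical: plays the clip, stops at crossroad c
    t0 = tic;
    while true
        resp = get_direction_response();            % hypothetical: returns 'left' | 'straight' | 'right'
        if strcmp(resp, correctDirection{c})
            decisionTime(c) = toc(t0);              % time between clip stop and correct response
            break;                                  % the video resumes only after the correct choice
        end                                         % after an error, the participant chooses again
    end
end
```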

Unlike RL, SL forced the participants to form a survey representation of space, similar to a cognitive map of the environment. The survey representation was elicited by presenting the real city map (60 cm × 90 cm) on a table. The starting point and goal of the path were represented on the map by means of postcards; a blue line connecting the starting-postcard and the goal-postcard was drawn to represent the path. Participants were asked to learn the map by observing the experimenter, who consecutively placed the postcards representing each crossroad along the path on the map. After the experimenter had positioned all of the cards, the participants were asked to place the shuffled postcards in the correct positions on the map. Verbal feedback (correct/incorrect) was given for each postcard. If they made an error, participants were allowed to continue placing the postcard until they found the correct location. Accuracy was measured as the number of postcards correctly placed on the first attempt and was used as a measure of learning during pre-learning and the 4 days of intensive learning.
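The daily SL learning score (postcards placed correctly on the first attempt) amounts to a simple count; the sketch below illustrates it with dummy data and illustrative variable names.

```matlab
% Sketch of SL scoring: accuracy = number of postcards placed correctly on the first attempt.
nCards          = 23;                              % postcards (crossroads) per path
correctLocation = 1:nCards;                        % correct map position of each postcard
firstPlacement  = correctLocation;                 % dummy data for a participant's first attempts...
firstPlacement([3 11]) = [5 2];                    % ...with two cards misplaced on the first try
accuracy = sum(firstPlacement == correctLocation); % daily learning score (here 21 out of 23)
```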

The order of the learning presentation (SL and RL) was balanced across participants.

On the first (pre-learning) and the last day of learning, participants underwent an fMRI scan during which they were engaged in two retrieval tasks and one control forced-choice task (Fig. 1b), presented in three separate runs. In the retrieval tasks, the acquired representations of both paths had to be retrieved in both a survey and a route perspective. In the survey task (ST), participants had to retrieve a survey representation (similar to a cognitive map) of the two learned paths. In each trial of the ST, participants watched a stimulus that included: (a) a screenshot depicting a crossroad of one of the two paths, (b) a little box depicting the starting or the goal point of the same path and (c) a central red arrow (see Fig. 1b); participants were asked to judge whether or not the direction of the red arrow corresponded to the location of the crossroad with respect to the starting/goal point depicted in the trial. In half of the trials, the red arrow pointed in the correct direction and in the other half in a wrong direction.

In the route task (RT), participants had to retrieve an egocentric, first-person perspective representation of the paths. In each RT trial, they watched: (a) a screenshot depicting a crossroad of one of the two paths, (b) a little box depicting another crossroad on the same path, and (c) a red arrow; they were asked to indicate whether or not the red arrow pointed in the direction (i.e., left, straight, right) they would have to follow to reach the crossroad in the little box starting from the screenshot (see Fig. 1b). As in the ST, in half of the trials the red arrow pointed in the correct direction and in the other half in a wrong direction.

Unlike the ST and RT, in the control task (CT) participants did not have to retrieve any spatial information about the familiarized paths; they simply had to visually explore the scenes. In each CT trial, participants watched: (a) a screenshot of a crossroad of one of the two paths, (b) a detail of the same crossroad in the bottom box, and (c) a red arrow; they were asked to indicate whether or not the central red arrow pointed toward the area of the picture where the detail depicted in the little box was located (see Fig. 1b). Also in the CT, the red arrow pointed toward the correct direction in half of the trials and toward a wrong direction in the other half.

Tasks were administered in balanced order across participants in three different fMRI runs. In each fMRI run, participants were presented with 84 trials, divided into 12 blocks of 7 trials each. Trials from the different paths (A or B) were presented in different blocks. The block order was balanced across participants, and trial presentation was randomized within each block. Each block began with written instructions about the path (path A or B), which remained on the screen for 2 s. Each trial remained on the screen for 6 s and was followed by a fixation point, which lasted 500 ms (Fig. 1c). The interval between blocks lasted 20 s, during which a fixation point was presented. Subjects pressed the right or left button of a hand pad to indicate whether they thought the trial was true or false, on the basis of the information acquired during the learning sessions before the scan. For each task, we calculated accuracy and response time (RT) and considered the type of learning (RL or SL). Accuracy was computed for each participant as the sum of correctly identified and correctly rejected trials. The time needed to carry out the task was computed for each participant as the median RT of the correct responses (including both correct identifications and correct rejections). The experiment was implemented in Matlab, using Cogent 2000 (Wellcome Laboratory of Neurobiology, UCL, London, http://www.vislab.ucl.ac.uk/cogent.php).
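A minimal sketch of how these two behavioral measures can be computed for one run is given below, with simulated responses standing in for a participant's data.

```matlab
% Sketch of the behavioral measures for one task in one run (simulated data).
nTrials  = 84;                              % 12 blocks x 7 trials
truth    = rand(1, nTrials) > 0.5;          % ground truth: does the red arrow point correctly?
response = truth;                           % simulated true/false judgments...
flip     = rand(1, nTrials) < 0.10;         % ...with roughly 10% errors
response(flip) = ~response(flip);
rt       = 1.5 + rand(1, nTrials);          % simulated response times (s)

correct  = (response == truth);             % correct identifications + correct rejections
accuracy = sum(correct);                    % accuracy score for the task
medianRT = median(rt(correct));             % median RT over correct responses only
```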

Statistical analyses were performed using Statistica™ (StatSoft Italia, http://www.statsoft.com). On the behavioral data, we performed (1) a one-way repeated-measures ANOVA (5 levels) on accuracy at SL and (2) a one-way repeated-measures ANOVA (5 levels) on the response time (RT) at RL, to estimate the rate of learning across the 5 consecutive days of intensive learning. Regarding performance during the retrieval tasks (i.e., ST and RT) and the CT, we computed two 2 × 3 ANOVAs, with session (first vs. fifth day of training) and task (ST vs. RT vs. CT) as factors, on participants’ (3) accuracy and (4) RTs. Then, we performed two 3 × 2 ANOVAs with task (ST vs. RT vs. CT) and learning (RL vs. SL) as factors, on participants’ (5) accuracy and (6) RTs at the last fMRI session. Post hoc comparisons were performed using Duncan’s test.
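As an illustration, the one-way repeated-measures ANOVA on the five daily SL accuracy scores could be set up as follows; the study itself used Statistica, so this MATLAB sketch (with placeholder data) is only an equivalent formulation.

```matlab
% Equivalent one-way repeated-measures ANOVA (factor: day, 5 levels) on SL accuracy.
% Placeholder data: 15 subjects x 5 days of postcard-placement scores (0-23).
acc = randi([15 23], 15, 5);
tbl = array2table(acc, 'VariableNames', {'d1','d2','d3','d4','d5'});
rm  = fitrm(tbl, 'd1-d5 ~ 1');        % within-subject design, no between-subject factor
res = ranova(rm);                     % F test for the effect of day of training
disp(res)
```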

Image acquisition

A Siemens Allegra scanner (Siemens Medical System, Erlangen, Germany), operating at 3 T and equipped for echo planar imaging, was used to acquire the functional magnetic resonance images. Head movements were minimized by mild restraint and cushioning. We acquired 38 slices of functional MR images using BOLD imaging (in-plane resolution 3 × 3 mm, slice thickness 2.5 mm, inter-slice distance 1.25 mm, repetition time 2.47 s, echo time 30 ms) covering the entire cortex.
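For reference, the block timing described in the Procedure is consistent with the number of volumes analyzed per run (see “Image analysis”), assuming one 20 s fixation period for each of the 12 blocks; the check below is a back-of-the-envelope calculation, not part of the original analysis.

```matlab
% Back-of-the-envelope check of run length against the acquisition parameters.
TR       = 2.47;                    % repetition time (s)
blockDur = 2 + 7 * (6 + 0.5);       % 2 s instruction + 7 trials x (6 s + 0.5 s fixation) = 47.5 s
runDur   = 12 * (blockDur + 20);    % 12 blocks, each followed by a 20 s fixation interval = 810 s
nVolumes = runDur / TR;             % ~328 volumes, i.e., the 332 acquired minus the 4 discarded
```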

Image analysis

Image analyses were performed using SPM8 (http://www.fil.ion.ucl.ac.uk/SPM) running in MATLAB 7.1 (The MathWorks Inc., Natick, MA, USA). For each participant, 332 fMRI volumes were acquired in each of the three runs. The first four volumes of each run were discarded to allow for T1 equilibration. All images were corrected for head movements (realignment) using the first volume as a reference. Corrected images were normalized to the standard Montreal Neurological Institute (MNI) Echo Planar Imaging (EPI) template, using the mean realigned image as source, and then spatially smoothed using an 8-mm full-width half-maximum isotropic Gaussian kernel. Functional images were analyzed for each subject separately on a voxel-by-voxel basis according to the general linear model (GLM). Neural activation during the blocks was modeled as a box-car function spanning the whole duration of the blocks and convolved with a canonical hemodynamic response function, which was chosen to represent the relationship between neuronal activation and blood oxygenation (Friston et al. 1997). Separate regressors were included for each combination of session (first or fifth day of training), task (ST, RT, CT) and learning (RL or SL). Inter-block intervals were also modeled in relation to the nature of the previous block (RL-rest or SL-rest) and treated as baseline.

Group analysis was performed on the estimated images that resulted from the individual models of each condition compared with its baseline, treating subject as a random factor. At the group level, we first performed a voxel-wise analysis across the whole brain by means of a full factorial design, including session and task as factors. We then performed a full factorial design over the second-session parameters, including both task (RT, ST and CT) and learning (RL and SL) as factors. We computed an omnibus F contrast of all conditions, masked by a t-contrast obtained by contrasting the navigational tasks (ST and RT) with the non-navigational one (CT), in order to observe all the cerebral areas whose activity was modulated by the experimental tasks above the control task. The resulting statistical parametric maps were thresholded at p < 0.05 FWE and cluster size k > 20 voxels. For each subject and region, we computed a regional estimate of the amplitude of the hemodynamic response in each experimental condition by entering a spatial average (across all voxels in the region) of the pre-processed time series into the individual general linear models. The regional hemodynamic response was then analyzed with two repeated-measures analyses of variance (ANOVA), with the same factorial structures as the above-mentioned full factorial designs. The first one, which had a 2 × 3 factorial design [session (first vs. fifth day) by task (ST vs. RT vs. CT)], was aimed at exploring the effect of task in the two experimental sessions of the intensive spatial training. The second one, which had a 3 × 2 factorial design [task (ST vs. RT vs. CT) by learning (RL vs. SL)], focused on the last session and was aimed at exploring the effect of learning on tasks.
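Conceptually, each block regressor corresponds to a boxcar convolved with the canonical HRF; the sketch below illustrates this for one condition using SPM's spm_hrf function, with made-up onsets. It is a simplified illustration of the modeling logic rather than the actual SPM8 first-level specification.

```matlab
% Conceptual sketch of one block regressor (boxcar convolved with the canonical HRF).
% Illustrative onsets only; the real design was specified through SPM8's first-level GLM.
TR       = 2.47;                   % repetition time (s)
nScans   = 328;                    % volumes per run after discarding the first four
onsets   = [10 80 150];            % made-up block onsets, in scans
durScans = round(47.5 / TR);       % block duration (47.5 s) expressed in scans

boxcar = zeros(nScans, 1);
for o = onsets
    boxcar(o:o + durScans - 1) = 1;        % mark the scans belonging to each block
end

hrf       = spm_hrf(TR);                   % canonical hemodynamic response function (SPM)
regressor = conv(boxcar, hrf);
regressor = regressor(1:nScans);           % one column of the design matrix for this condition
```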

We also performed a psychophysiological interaction (PPI) analysis (Friston et al. 1997) on the fMRI data from the fifth day of training across regions whose hemodynamic response was found to be modulated by navigational task and learning. A generalized PPI (gPPI) approach (McLaren et al. 2012) was used to evaluate context-dependent connectivity using the regions whose signal appeared to be modulated by task and learning (RSC-RH, PHG-RH and AG-RH, see “Results”) as seed regions. This approach was implemented in SPM8 using the automated gPPI toolbox with SPM.mat files (http://www.nitrc.org/projects/gppi). Using the gPPI approach, we estimated four different PPI regressors, for the RL, SL, RL-rest and SL-rest conditions, in the ST and RT designs. This allowed us to test for condition-specific functional integration relative to the baseline (RL-rest and SL-rest). PPI regressors were then included in a standard GLM analysis for each subject and region. Group analysis for each region was performed on the estimated images that resulted from the individual models of each condition compared with its baseline, using a standard one-sample t test. We also performed one-sample t tests to directly compare the individual models of the two learning conditions with each other, for each navigational task. The resulting statistical parametric maps were thresholded at p < 0.05, corrected for multiple comparisons using FDR at the cluster level (Genovese et al. 2002).
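Conceptually, each PPI regressor combines the seed region's time course with the corresponding condition (psychological) regressor; the sketch below is a simplified illustration with placeholder data. Note that the gPPI toolbox actually forms the interaction at the estimated neural level (via deconvolution of the seed signal) before re-convolving with the HRF, a step omitted here for clarity.

```matlab
% Simplified illustration of a PPI interaction regressor for one seed and one condition.
nScans = 328;
seed   = randn(nScans, 1);                  % placeholder seed time course (e.g., right RSC)
psych  = zeros(nScans, 1);
psych(10:28) = 1; psych(150:168) = 1;       % placeholder condition blocks (e.g., ST on the SL path)
ppi    = (seed - mean(seed)) .* psych;      % psychophysiological interaction term
% The subject-level GLM then includes seed, psych and ppi as regressors;
% context-dependent connectivity is inferred from the ppi parameter estimates.
```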

Results

Behavioral results

In the RL paradigm, we observed a significant effect of day of learning [F(4,52) = 15.84; p < 0.001] (Fig. 2a). Significant differences were observed between the first and the second recording, as well as between the second and the third recording, as shown by Duncan’s post hoc analysis. RTs decreased significantly with familiarization, especially during the earlier stages of learning (first, second and third day), demonstrating that the pre-training exposure (day 1) plus 2 days of intensive learning (days 2 and 3) were sufficient to allow participants to successfully acquire navigational knowledge.

Fig. 2

Behavioral results. a Learning performances across the 5 days of intensive spatial training (mean and SD). b Performance accuracy and response times (RTs) on the three experimental tasks performed in the first and second sessions during fMRI scans (mean and SD). c Performance accuracy and response times (RTs) on the three experimental tasks performed in the two learning conditions in the second session during fMRI scans (mean and SD). RL route learning, SL survey learning, ST survey task, RT route task, CT control task

A similar pattern of results was observed for SL, with a significant effect of day of training [F(4,56) = 22.59; p < 0.001] (Fig. 2a). Also in this case, significant differences were observed between the first and the second day, as well as between the second and the third day, as shown by Duncan’s post hoc analysis. Accuracy increased from the first to the last day of training.

Concerning participants’ performance during the fMRI tasks on the first and fifth days of intensive learning, we observed a significant effect of session on accuracy [F(1,13) = 124.98; p < 0.001]. We also observed a significant effect of task [F(2,26) = 161.19; p < 0.001]: in both fMRI sessions, participants’ accuracy was greater in the CT than in the RT and ST, and accuracy in the RT was in turn greater than in the ST (Fig. 2b). Regarding RTs, the ANOVA revealed significant effects of session [F(1,13) = 70.27; p < 0.001] and task [F(2,26) = 152.64; p < 0.001], as well as a significant session by task interaction [F(2,26) = 9.42; p < 0.01]. Participants were faster at the end of training. On the first day, they were faster in the CT than in the ST and RT, and faster in the RT than in the ST. On the fifth day of training, there were no significant differences between the CT and the RT, but participants were still significantly slower in the ST.

Concerning the ANOVAs on participants’ accuracy and RTs at the last fMRI session (last day of learning), we also modeled the dependent variables according to the type of learning (RL and SL) (Fig. 2c). Regarding accuracy, we observed a significant effect of task [F(2,26) = 79.217; p < 0.001], as performance was better in the CT than in the RT, which in turn was better than in the ST, regardless of the perspective format (route/survey) adopted during learning (no significant effect of type of learning). Once again, the ANOVA revealed a significant effect of task on RTs [F(2,26) = 189.94; p < 0.001], as responses in the ST were significantly slower than those in the RT and the CT, which did not differ from each other.

Imaging

Neural network of survey and route representation

The first step of the analysis provided a general picture of the cerebral regions involved in recalling a spatial representation from a route or a survey perspective on the first and the fifth day. We performed a full factorial design, as described before. In our experimental paradigm, a task-related activation was operationally defined as an increase of BOLD signal in the retrieval tasks (ST and RT) relative to the CT.

Thus, to identify regions involved in retrieving survey and route spatial representations without considering differences between them we performed an omnibus F test. The resulting image, masked with a t-contrast [(ST and RT) vs. CT], showed all the voxels that were differentially activated by the retrieval tasks relative to the CT. This contrast (corrected for FWE, p < 0.05, k > 20) revealed a network of areas encompassing the bilateral retrosplenial cortex (RSC) as well as the bilateral parahippocampal (PHG) and fusiform gyri (FG), bilateral insula (Ins) and a set of frontal regions, including the superior frontal gyrus (SFG), middle frontal gyrus (MFG) and supplementary motor area (SMA) in the left hemisphere and the inferior frontal gyrus (IFG) and SFG in the right hemisphere. We also observed activations of the bilateral angular gyrus (AG) and left postcentral gyrus. Finally, we observed bilateral activations of the caudate nuclei (CN) (Fig. 1A in Supplementary materials).

The second step of the analysis explored the neural networks involved in retrieving, in a route or a survey format, a spatial representation acquired from a route or a survey perspective on the fifth day of training. Again, we performed a full factorial design by taking into account the learning format (RL or SL) and the format required by the retrieval tasks (ST and RT) on the fifth day of training. A task-related activation was operationally defined within our experimental paradigm as an increase of BOLD signal in the navigational tasks (ST and RT) relative to the CT in the two different learning formats (RL and SL). Thus, to identify regions involved in a survey or route spatial representation, without considering differences between them, we performed an omnibus F test. The resulting image, masked with a t-contrast [(ST and RT) vs. CT], showed all the voxels that were differentially activated by the navigational tasks relative to the CT.

The resulting thresholded map (corrected for FWE, p < 0.05, k > 20) partially overlapped that of the previous analysis. Indeed, we observed bilateral activations at the level of the RSC and PHG, as well as activations in the parieto-occipital sulcus (POS). We also observed activations at the level of the bilateral insula, superior frontal lobe (SFL), middle frontal lobe (MFL), right inferior frontal lobe (IFL) and CN. A set of parietal areas, encompassing the bilateral AG and the superior parietal lobe (SPL) and postcentral gyrus in the left hemisphere, was also found, together with activation at the level of the left precuneus (pCU) and inferior temporal lobe (ITL) (Fig. 1B in Supplementary materials).

Main effect of session

Analysis of the regional hemodynamic response on the first and the fifth day (p < 0.05, Bonferroni correction) showed that a set of frontal areas (Fig. 3a), including the left SFG, SMA and MFG and the right SFG, together with the bilateral insula and the left postcentral gyrus, was more activated on the first than on the fifth day of learning (Fig. 3b).

Fig. 3

Effect of session. a Regions showing higher activation in the first than in the second session. b Percent BOLD signal changes in the first and the second session (mean and SD)

Effect of session by task

Analysis of the regional hemodynamic response showed an interaction effect between session and task at the level of the bilateral PHG and right RSC (Fig. 4a). In the bilateral PHG, we observed a different pattern of activation on the first and the fifth day: on the fifth day, the ST produced greater activation than the CT, whereas on the first day we observed greater activation of the same areas in the CT compared with the RT and ST (p < 0.05, Bonferroni correction) (Fig. 4b). We also observed an interaction effect in the right RSC, where the ST produced greater activation than the CT on the fifth day (p < 0.05, Bonferroni correction) (Fig. 4b).

Fig. 4

Session by task interaction. a Regions showing a session by task interaction. b Percent BOLD signal changes in the first and second sessions in the three experimental tasks during fMRI scans (mean and SD). ST survey task, RT route task, CT control task

Main effect of task after intensive spatial training

We analyzed the effect of the tasks on hemodynamic responses after 5 days of learning. The SFL in the left hemisphere and the MFL bilaterally were more activated by the ST than by the RT and the CT (Fig. 5a, b). The hemodynamic response in the left SFL during the RT was significantly greater than during the CT, whereas that in the bilateral MFL showed no difference between these two tasks. In contrast, the right IFL was more activated by the RT than by the CT, but there were no differences between the ST and the CT (Fig. 5b) (p < 0.05, Bonferroni correction). Both the ST and the RT produced greater activation than the CT in the bilateral PHG, POS and RSC, right AG, left pCU, bilateral insula, left IFL and right SFL.

Fig. 5

Effect of task during second session. a Regions showing effect of task in the second session. b Percent BOLD signal changes in the second session in performance of the three experimental tasks in the two learning conditions during fMRI scans (mean and SD). RL route learning, SL survey learning, ST survey task, RT route task, CT control task

Main effect of learning after intensive spatial training

We analyzed the effect of learning on the brain areas that were involved in the navigational tasks. The hemodynamic response of the right AG and left ITL (Fig. 6a) was greater during recall of the SL than the RL (Fig. 6b), with no differences due to task (i.e., RT or ST) (p < 0.05, Bonferroni correction), as shown by the absence of an interaction effect between task and learning.

Fig. 6

Effect of learning. a Regions showing effect of the learning strategy. b Percent BOLD signal changes in the second session in the two learning conditions (mean and SD). RL route learning, SL survey learning

Psychophysiological interaction analysis

We then analyzed the effective connectivity of three seed regions, the right PHG, RSC and AG, in the context of the different retrieval tasks (i.e., ST and RT) and learning formats (RL and SL) after intensive learning (Fig. 7).

Fig. 7

Effective connectivity, PPI results. a Effective connectivity of right PHG in the three experimental conditions showing suprathreshold clusters. b Effective connectivity of right RSC in the four experimental conditions. c Effective connectivity of right AG in the two experimental conditions showing suprathreshold clusters. RL route learning, SL survey learning, ST survey task, RT route task, CT control task

Effective connectivity of right PHG

The neural activity in the right inferior frontal lobe was affected by activity in the right PHG when individuals performed a route retrieval of the path learned in a route frame (RT/RL). By contrast, activity in the right fusiform and lingual gyri was affected by activity in the right PHG when the route task was performed on the path learned in the survey frame (RT/SL). Activity in the right inferior occipital lobe was affected by activity in the right PHG when the survey task involving the path learned in the route frame (ST/RL) was required. No suprathreshold cluster was influenced by the right PHG when the ST was performed on the survey-learned path (ST/SL) (Fig. 7a; Table S1 in Supplementary materials).

Effective connectivity of right RSC

During route recall of a path learned in the route format (RT/RL), neural activity in the right inferior frontal lobe, fusiform gyrus and inferior parietal lobule, as well as in the left middle occipital lobe, precentral gyrus and fusiform gyrus, was modulated by neural activity in the right RSC. By contrast, during route recall of a survey-learned path (RT/SL), the neural activity of the right inferior frontal lobe, superior parietal lobule, middle occipital gyrus, middle frontal gyrus, fusiform and lingual gyri, as well as that of the left superior parietal lobule, supramarginal gyrus and calcarine cortex, was affected by neural activity in the right RSC.

Neural activity in the bilateral fusiform gyri was influenced by that in the right RSC during survey recall of the path learned in the route frame (ST/RL). By contrast, neural activity in the right fusiform gyrus and superior occipital lobe (SOL), as well as in the left inferior frontal lobe, inferior and superior parietal lobules and lingual gyrus, was affected by activity in the right RSC when a path learned in a survey format was recalled in the survey task (ST/SL). The left cerebellum (crus 2) was influenced by neural activity in the right RSC during the ST (Fig. 7b; Table S2 in Supplementary materials) and was significantly more affected when the ST referred to a path learned in the survey format (SL) than when it referred to a path learned in the route format (RL).

Effective connectivity of right AG

Activity in the right AG significantly affected neural activity in the right hemisphere when the RT was performed, that is, it affected neural activity in the right lingual gyrus when a path learned in the route format was retrieved from a route perspective (RT/RL) and affected activity in the right superior parietal lobule when the path was learned in the survey format (RT/SL).

No suprathreshold cluster was affected by right AG activity during the ST, regardless of the format (SL or RL) used during learning. However, the right AG shaped activity in the right posterior cingulum when a survey format was used for retrieval, its influence being greater when the path had been learned in the route format (ST/RL) than when it had been learned in the survey format (ST/SL) (Fig. 7c; Table S3 in Supplementary materials).

Discussion

In the present study, we investigated whether the format of representation (i.e., route vs. survey) used during familiarization with a novel environment affects neural activity in areas known to subtend the recognition of navigational scenes. We also analyzed the areas involved in retrieving environments from a survey or a route perspective (ST and RT) when the environments had been learned in the same or in a different perspective. Finally, we investigated how navigation-related areas interact to allow successful navigation in a real environment. For ease of exposition, the discussion is divided into subsections.

Behavioral results

We explored the effects of learning across the 5 days of training and then the effects due to the type of representation used for learning and the type of representation required by the retrieval tasks. Participants’ performance improved significantly on both the RL and the SL across the 5 days of intensive learning. The greatest improvement was observed during the first 3 days (Fig. 2a); a plateau then followed, due to a ceiling effect. This was also confirmed by the behavioral results of the retrieval tasks performed during the fMRI scans. Indeed, accuracy increased significantly between the first and the second fMRI scan, and response times decreased significantly (Fig. 2b): participants were both more accurate and faster in the second session (i.e., on the fifth day of training).

Some comments must be made about performance on the three different tasks during each fMRI session. Indeed, participants were more accurate on the CT than on the RT, and on the RT than on the ST, at both the first and the second fMRI session. Differences in response time were also observed: at the first fMRI session, participants were significantly slower on the ST than on the RT, and on the RT than on the CT. At the second fMRI session, their performance was equally fast in the RT and the CT but significantly slower in the ST (Fig. 2b, c). No other significant difference was observed (Fig. 2c). These results support the hypothesis that retrieval of navigational information from a route or a survey representation requires cognitive processes that are at least partially different, as shown by the fact that participants showed different degrees of improvement between the first and the fifth day across tasks. This difference was not due to the learning level or the learning strategies, because we did not find any differences between RL and SL.

Overall, our behavioral results confirm the findings of a previous study which showed that individuals can proficiently shift from one acquired format of representation to another (Taylor and Tversky 1992). However, they also suggest that the two formats are not entirely equivalent in use, because it is more difficult to retrieve spatial information from a survey perspective than from a route perspective.

Whole-brain analysis

Regarding the neuroimaging results, a preliminary whole-brain analysis of the fMRI data showed a network of areas that included regions well known to be involved in the recognition and recall of navigational representations, such as the bilateral RSC and PHG, the bilateral AG, the left postcentral gyrus and the bilateral CN. This network of areas is frequently found to be activated in fMRI studies of human spatial navigation (see, for example, Boccia et al. 2014; Kravitz et al. 2011; Nemmi et al. 2013b) and has also been found to be involved in learning sequences of spatial positions beyond reaching space (Nemmi et al. 2013a). As a whole, these findings support the validity of our fMRI paradigm for assessing the neural correlates of human navigation.

Neural correlates of familiarization with the environment

Concerning the effect of familiarization, we found that a set of frontal areas (i.e., the SFG, SMA, MFG and postcentral gyrus in the left hemisphere, the SFG in the right hemisphere and the bilateral insula) was more activated at the beginning of learning than when learning had been completed (Fig. 3a, b). The involvement of frontal areas in human navigation is not unusual. In fact, a recent meta-analysis of fMRI studies on human navigation (Boccia et al. 2014) found that the frontal areas were consistently reported to be involved in navigation. Although the role of the frontal lobes in human navigation has not yet been established, the fact that the frontal areas were more activated during the first session suggests that they have a significant role in the first steps of acquisition of a novel environment. We suggest that their role could be related to navigational planning or to the encoding of navigational information for memory purposes. The latter hypothesis is supported by a recent fMRI study in which improved performance in a virtual maze (due to prolonged spatial training) corresponded to reduced activity in the frontal areas (Hotting et al. 2013).

Moreover, as the insula is involved in the first stages of acquisition of a novel environment, it may have a specific role in encoding directions along paths. In fact, the insula has been hypothesized to be part of a brain network involved in so-called topokinetic memory (Berthoz 1997), which refers to the dynamic aspects of spatial memory and includes the storage of directions and distances based particularly on the processing of movement patterns. Furthermore, experimental evidence from fMRI studies of mental simulation of routes demonstrates that the insula is part of a specific navigational network together with the left precuneus and the medial part of the hippocampal regions (Ghaem et al. 1997). In addition, the insula has been found to be specifically involved in mental repositioning of the body with respect to external objects (Bonda et al. 1995). Of course, this interpretation is only speculative because no movement takes place in the scanner. It has also been noted that the mental processes mediating imagined self-motion and perceived real self-motion are different (Seemungal 2014).

Regarding the possible modulatory role of the type of representation, it seems to affect the activity of the bilateral PHG and the right RSC (Fig. 4a, b). These regions showed a reversed pattern of activation at the beginning and at the end of learning. At the beginning, the PHG was activated more during the CT than during the RT and the ST, whereas the right RSC showed no difference in activation across the tasks. When learning was fully established, the PHG pattern of activity changed, because activity during the CT did not differ from that during the RT, but both the bilateral PHG and the right RSC showed greater activation during the ST. Because the PHG (in particular a more caudal sub-region called the parahippocampal place area, PPA) is activated by the passive viewing of real-world scenes or landmarks, it has repeatedly been associated with perception and the encoding of environmental scenes (Epstein 2008; Epstein et al. 2007). On the other hand, as it is activated by both scene viewing and scene imagery as well as by mental navigation through familiar environments (Boccia et al. 2015; Ino et al. 2002), the RSC is thought to be involved in the recovery of long-term spatial knowledge about familiar environments (Epstein 2008). It has also been hypothesized that the RSC is the core of a brain network involved in different cognitive functions, such as episodic memory, navigation, imagination and planning for the future (Boccia et al. 2015; Hassabis et al. 2007; Vann et al. 2009). A recent study that investigated whether the PPA and RSC represent scenes in terms of general categories or as specific exemplars found that both of these regions encoded both environmental categories (e.g., mountain landscapes, city skylines, etc.) and landmarks (Epstein and Morgan 2012). Our results can be interpreted on the basis of the above-mentioned studies. On the first day of learning, the PHG was greatly involved in perceptual analysis and encoding of the as yet unlearned stimuli. It could have been more active during the CT because this task required localizing details in the perceived picture and, as the stimuli had not yet been learned, it required a greater perceptual analysis effort. Instead, on the fifth day, localizing details in the CT stimuli required less effort and consequently less PHG activation, because the stimuli had already been well learned. Thus, the PHG was more involved in processing features in the survey format of representation required to perform the ST. Future studies directly assessing the activity of the PHG during scene viewing at different levels of stimulus familiarity are needed to confirm this hypothesis.

The interaction effect we found in the RSC is in line with previous studies reporting that the RSC is involved in the recovery of long-term spatial knowledge about familiar environments (Epstein 2008). More interestingly, the fact that the ST activates the RSC to a greater extent suggests that this area is especially involved in retrieving spatial knowledge from a survey perspective. Alternatively, the pattern of higher PHG and RSC activation during the ST at the end of learning could be related to the greater effort required by the survey format (as suggested by the behavioral results). This latter hypothesis is supported by the observation that, in both the PHG and the RSC, activity during the ST significantly differed from that during the CT but not from that during the RT, which in turn did not differ from that in the CT, suggesting that the activity was primarily due to task difficulty, not type of representation. Another finding supports this hypothesis. A set of areas, spanning from the occipital to the frontal lobe and including the bilateral POS, left pCU, right AG, bilateral RSC and PHG, together with the bilateral insula, left IFL and right medial SFL, was found to be equally engaged in processing a route and a survey representation of space, with significantly higher levels of activity than those recorded during the CT (Fig. 5a, b). These areas are frequently reported in fMRI studies of human navigation (Iaria et al. 2007; Nemmi et al. 2013b; Spiers and Maguire 2006) and our results demonstrate that they are equally involved in the ST and RT for the same navigational environment.

Effect of the representation format during retrieving

We found a set of areas whose level of activity was affected by the representational format in which the learned path was retrieved. Namely, the MFL bilaterally and the left SFL were more highly activated by the ST than by the RT. Since RT activity was greater than CT activity in both of these areas, we hypothesize that these areas process a survey representation of space and, to a lesser extent, a route representation. We also suggest that the bilateral MFL and left SFL might be part of a network that subtends the processing of allocentric information, which is required to develop and use a survey representation. The lower activity recorded in this network when a route perspective is processed can be explained according to Montello (1998), who hypothesized that environmental features can be processed initially in both a route and a survey format and that the prevalence of one format over the other is due to several factors, such as task demands or cognitive style. Thus, even when a route representation is required, environmental features are also processed in the corresponding survey format. Of course, this is a post hoc interpretation and additional studies directly assessing this particular hypothesis are needed.

Effect of the representation format used during learning

We observed an effect of the perspective used for learning on the left ITL and the right AG, with greater activation in both of these areas during SL, suggesting that they process environmental information to develop map-like survey representations.

Effective connectivity of PHG, RSC and AG

An effective connectivity analysis was performed to further investigate the role of these different areas in such a complex neural network. It showed that the above-reported areas, identified as the human brain system of spatial navigation (Boccia et al. 2014; Byrne et al. 2007; Nemmi et al. 2013b), interact with other brain regions only occasionally mentioned as involved in navigation and shape their activity according to task demands. The connectivity analysis was carried out on three main regions of the right hemisphere that were selected for their known core role in the human navigation network. Specifically, we chose the areas we found to be involved in the processing of spatial information acquired during intensive learning (i.e., the right PHG and RSC) and differentially activated by the learning format (i.e., the right AG). A recent model of the dorsal visual stream (Kravitz et al. 2011) supports the choice of these areas. This model hypothesizes the existence of three main sub-networks of the so-called where pathway. One of these, namely the parieto-medial temporal lobe pathway, is thought to be critical in processing navigational information. The RSC, as well as the AG and PHG, is an important station in this neural network. When a route representation of a route-learned path was required, we found that the PHG affected activity in the right IFL, possibly because the right IFL is critical in updating first-person perspective frames of reference in solving the RT. Partially supporting this hypothesis, Shelton and Gabrieli (2004) found that activity in frontal areas was related to perspective-shifting abilities during a navigational task, using both route and survey formats. At the same time, when a route representation of a survey-learned path was required, the PHG affected activity in the right LG and FG. This suggests that the right FG and LG are crucial in transforming a survey representation of space to solve a route task. Indeed, Nemmi et al. (2013b) found that activity in the LG and FG was related to recalling a particular sequence of landmarks (i.e., a route representation) in a highly familiar environment that could have been memorized in a survey format. On the other hand, when a survey representation of a route-learned path was required, the PHG affected activity in the right inferior occipital lobe. Here, we hypothesized that when this structure interacts with the PHG it is possible to build the mental map-like representation necessary to solve the ST. A corollary of this interpretation is the hypothesis that the influence of the PHG on the LG and FG, on one hand, and on the inferior occipital lobe, on the other, signals an interaction in the visuo-spatial system that is necessary to change the frame of reference (egocentric vs. allocentric or vice versa) or the representation format (route vs. survey or vice versa) of a given spatial perception. Although this hypothesis is tempting, it needs direct assessment in future studies.

The modulatory role played by the RSC seems to be more complex. When a route representation of a route-learned path was required, the RSC affected activity in the right IFL, IPL and FG and in the left MOL and precentral gyrus. These results suggest the existence of a specific neural network involved in retrieving a route representation from a route perspective. They also suggest that this network includes brain regions whose activity is shaped by the RSC. When a route representation of a survey-learned path was required, the RSC modulated the activity of the right IFL (pars opercularis), SPL, MOL, MFL, FG and LG and that of the left SPL, supramarginal gyrus and calcarine cortex. These results confirm some previous hypotheses about the neural network of human navigation and shed more light on the role of some other brain regions whose contribution to human navigation has not yet been established. Indeed, the RSC has been hypothesized to play a critical role in transforming environmental information from the egocentric format, stored in the posterior parietal lobe, into the allocentric format, stored in the medial temporal lobe, and vice versa (Byrne et al. 2007; Vann et al. 2009). Our results suggest that, by interacting with both the posterior parietal areas (i.e., SPL and supramarginal gyrus) and the medial temporal areas (i.e., FG and LG), this structure allows retrieval in a route perspective of a path stored in a survey perspective.

In contrast with the former results, when a survey representation of a route-learned path was required, the RSC affected the activity of the bilateral FG. We hypothesize that in this case the RSC has a fundamental role because the transformation into a survey representation arises from the RSC interactions with the bilateral FG in the medial temporal lobe. When a survey representation of a survey-learned path was required, the RSC affected activity in the right FG and SOL and in the left IFL, SPL, IPL and LG. These results suggest that a neural network subtends the storage and retrieval of survey representations of the environment.

In addition, we explored the effective connectivity of the right AG. When a route representation was required, the AG affected activity in the right LG if the path had been stored in a route format and in the right SPL if the path had been stored in a survey format, confirming that this part of the parietal lobe plays a critical role in coding egocentric information and that its contribution is required when a route representation must be derived from an allocentric representation (Byrne et al. 2007). Instead, when a survey representation was required, the AG affected activity in the right posterior cingulum, with its influence being greater when the format of the path had to be shifted (i.e., greater for a route-learned than for a survey-learned path). The posterior cingulate cortex is the core of the so-called retrosplenial complex (Epstein 2008) and is hypothesized to allow the transformation from an egocentric representation of space (typically coded in the parietal lobe) to an allocentric representation (typically coded at the level of the medial temporal lobe) and vice versa (Vann et al. 2009). Our results suggest that this structure, which is influenced by activity in the right AG, makes it possible to use a stored route representation (acquired by means of RL) to solve a task requiring a survey perspective. This confirms the role of this structure in switching representation formats (allocentric vs. egocentric and vice versa; route vs. survey and vice versa).

Conclusion

In conclusion, our study shows that familiarization has a great impact on the network of cerebral areas related to spatial navigation. This impact takes different forms, depending on the kind of spatial representation used in the familiarization phase. Learning an environment in a route or a survey format leads to slightly different activations in the navigational network of the human brain. The type of representation used in retrieving environments may also correspond to slightly different brain activations. As shown by the connectivity analysis, the interaction between the kind of spatial representation used in the learning phase (i.e., the format in which the representation is stored) and the kind of spatial representation required during the tasks does not seem to rely on the activation of distinct sub-networks within the human brain navigational system, but rather on the involvement of additional brain areas that are functionally connected to the navigational network.

The results of the present study raise some questions that cannot be resolved with the current state of knowledge. For instance, it is unclear if and how the visuo-spatial system is engaged in shifting frames of reference or representation formats. Further investigations are also needed to clarify and fully explain the contribution of the frontal areas to human navigation.

Another issue that remains open concerns the way in which gender may affect the psychophysiological mechanisms underlying human navigation. It is well known that gender affects spatial cognition both behaviorally (Piccardi et al. 2008, 2011) and physiologically (Gron et al. 2000). Although this topic is of clear relevance and we do not exclude a possible influence of gender on our tasks, the sample size of this study is too small to directly address this question. Further studies are needed to characterize the three-way interaction among learning format, retrieval format and gender.

Overall, the above-reported neuroimaging results provide evidence for activity in a set of areas that several previous studies (see the review and meta-analysis in Boccia et al. 2014) have identified as the human brain system of spatial navigation. Although most of these areas seem to be equally involved in survey and route representations, we also identified some areas that are primarily involved in survey representations. The present results confirm Montello’s hypothesis that environmental features are processed in parallel in different formats, as well as Taylor and Tversky’s hypothesis about the feasibility of proficiently shifting from one format to another.