Introduction

A key process thought to underlie memory consolidation is memory reactivation, also referred to as replay. Classical animal studies of replay have demonstrated that within hippocampal place cells, neuronal firing patterns that are present during learning are mimicked, or reactivated, during periods of post-learning sleep (e.g., Lee and Wilson 2002; Ji and Wilson 2007). In other words, these place cells “replay” the spatio-temporal sequence of neuronal firing that previously occurred during learning, thereby reinforcing or consolidating the memory traces. More recent studies have also found evidence of replay during periods of wakeful rest in rodents (Foster and Wilson 2006; Diba and Buzsáki 2007; Davidson et al. 2009; Karlsson and Frank 2009; Carr et al. 2011; Jadhav et al. 2012).

The rodent literature has delineated four hallmarks of memory reactivation: (1) it occurs throughout the brain, but especially in the hippocampus; (2) it correlates with later memory performance; (3) the temporal order of events is preserved in reactivation activity; and (4) the neural recapitulation is for specific memories. The human literature, using sophisticated fMRI techniques, has shown some, but not all, of these signatures, and typically not within the same study. Several studies have shown that encoding influences activity during a later brief rest period and that this rest activity correlates with subsequent memory performance (e.g., Tambini et al. 2010; Tambini and Davachi 2013; Schlichting and Preston 2014; Tompary et al. 2015; Gruber et al. 2016). However, only two studies have isolated recapitulation of specific memory traces (Deuker et al. 2013; Staresina et al. 2013), and neither showed item-specific reinstatement in the hippocampus. To our knowledge, no human studies have shown that the temporal order of events is preserved in neural signatures of memory reinstatement.

Here, we examined whether hippocampal signatures exhibited during post-encoding wakeful rest periods demonstrate the four hallmarks established in the rodent literature. We designed a novel paradigm in which participants encoded sequences of stimuli paired with semantically associated sound cues. The use of sequences provided leverage in measuring whether a human homologue exists for one of the hallmarks of rodent reinstatement, recapitulation of order. The sound cues provided both a bias signal and a timestamp that allowed us to directly compare portions of encoding to portions of post-encoding rest. Multi-voxel patterns of activation within the hippocampus were then extracted for specific sequences during the encoding and rest periods, and representational similarity analysis (RSA; Kriegeskorte et al. 2008) was used to measure pattern similarity. As in prior studies, we took higher pattern similarity scores to reflect stronger evidence of reactivation (e.g., Staresina et al. 2013; Tambini and Davachi 2013).

We predicted that the sound cues would bias which sequences were reactivated during rest and that hippocampal signatures of reactivation would correlate with subsequent memory performance. In addition, we predicted that participants would exhibit category-specific reactivation, depending on the type of stimulus sequence (faces or scenes). Finally, we predicted that when different sequences were compared, pattern similarity between distinct sequences would decrease across presentations, and this decrease in similarity would correlate with subsequent order memory, providing evidence that unique temporal sequences were reactivated within the hippocampus.

Materials and methods

Participants

A total of 30 healthy young adults (17 female, 13 male) between the ages of 18 and 32 (M = 21.53, SD = 3.13) participated in the present study. Participants were recruited from the Temple University community. All participants were right-handed with normal or corrected-to-normal vision. Participants had no history of psychological or neurological disorders and no MRI contraindications, as ascertained by self-report. Informed consent was obtained according to the guidelines of the Institutional Review Board of Temple University, and participants were compensated for participation.

Overview of experimental design

Our paradigm was adapted from behavioral studies investigating targeted memory reactivation (e.g., Rudoy et al. 2009; Oudiette et al. 2013), whereby sound cues are paired with encoded information and later used to bias reactivation towards some, but not all, of the memory traces. The study comprised two experimental sessions separated by 24 h. The fMRI scan took place during the first session, and the primary experimental task consisted of five blocks split across seven fMRI runs. First, a baseline rest scan was acquired, followed by two runs of one study set, a post-encoding rest scan, two runs of a second study set, and a final post-encoding rest scan. A schematic of the experimental design is depicted in Fig. 1. After the main task, two runs of functional localizer scans were acquired to localize brain regions sensitive to faces and scenes. Once participants were removed from the scanner, they completed a surprise memory test. Together, the complete first session lasted approximately 1.5 h. During the second experimental session, participants completed a set of delayed memory tasks, as well as a survey regarding the sounds played in the scanner. The second session lasted approximately 30 min.

Fig. 1

Experimental design. MRI scanning occurred during the tasks within the shaded gray box. Order of the study sets (Study Set A vs. Study Set B) and rest sound types (cued vs. novel sounds) were counterbalanced across participants. Each encoding trial began with a fixation cross, followed by serial presentation of the three items in a triplet with a concurrent sound cue, followed by an odd/even judgment task. To index reactivation, multi-voxel patterns of activation were extracted from the hippocampus, and pattern similarity was computed between hippocampal activation elicited by a particular triplet at encoding and rest

Encoding task

Stimuli for the encoding task consisted of a total of 36 scene and 36 face (half male, half female) images. Face stimuli were color images of male or female faces with neutral expressions on a white background. Each person depicted in the images was wearing the same gray t-shirt, and images were cropped below the neckline. Scene stimuli consisted of color images of naturalistic, outdoor scenes, and none contained any people or faces. All images were sized 300 pixels by 300 pixels and projected to be 4 inches by 4 inches on the screen. The images were grouped into triplets, each comprising either three unique faces or three unique scenes. There were a total of 24 triplets divided into two study sets (Study Set A and Study Set B), each containing 12 non-overlapping triplets. There were an equal number of face and scene triplets in each study set. All encoding took place in the scanner.

Triplets were presented according to a slow event-related design. Participants were instructed to learn the three items within a triplet, as well as the order in which they were presented. Each trial began with a fixation cross, presented in the center of the screen, for 2 s. Next, each item in a given triplet was presented sequentially in the center of the screen for 2 s (stimulus presentation = 6 s total). A semantically associated sound (e.g., an image of an open road presented with the sound of an engine revving for a scene sequence, or a face image presented with a sneezing sound for a face sequence) was presented with the first image in each triplet and played throughout the 6 s duration of the stimulus presentation period. Participants were instructed to try to use the sound cues to help them remember the triplets. Following each triplet presentation, a baseline task was presented for 10 s to allow the BOLD signal to rise and fall completely over the course of each trial. As is typical in slow event-related designs, this served as a “washout” period at the end of the trial to allow the hemodynamic response functions of different trials to be disentangled. During the baseline task, a series of integers was presented in succession, and participants were required to respond to whether each integer was even or odd. After the baseline task, the next trial began. During each study set, triplets were presented in a randomized order, with four repetitions of each triplet. The order of study set presentation (Study Set A vs. Study Set B) was counterbalanced across participants. In the rest of this paper, we refer to these study sets as cued encoding and uncued encoding, based on which type of rest period was subsequently presented (see below). Procedures were identical for both encoding periods.

Rest periods

Each rest period began with a black fixation cross in the center of the screen. A simple task, orthogonal to the encoding task, was used during the rest periods to ensure that participants maintained their alertness during these scans. Periodically, the fixation cross changed color from black to red, and participants were required to respond any time a color change occurred. In addition, during the rest periods, a set of 12 sounds was presented in a randomized order throughout the run, with two repetitions of each sound. Each sound was presented for 6 s, followed by a jittered inter-stimulus interval (ISI) of either 10, 12, or 14 s. The color change detection task occurred pseudorandomly after either 2, 3, or 4 sound presentations, yielding a total of 8 color change detections per rest period.

During one of the post-encoding rest periods (cued sound rest), the sound cues associated with the triplets encoded in the preceding study set were presented. During the other rest periods (baseline rest and uncued rest), 12 novel sounds, unrelated to the studied triplets, were presented. A different set of novel sounds was presented during baseline rest and uncued rest. All other aspects of these rest periods were identical to the cued sound rest. Both the presentation order of the post-encoding rest periods (cued sounds vs. novel sounds) and the novel sound set used for the baseline and uncued rest periods (novel sounds 1 vs. novel sounds 2) were counterbalanced across participants.

Functional localizer task

Two runs of a functional localizer task were acquired after the experimental runs. This allowed us to localize the fusiform face area (FFA), a region sensitive to faces, as well as the parahippocampal place area (PPA), a region sensitive to scenes. The localizer task was a modified version of the task created by Troiani et al. (2016), and consisted of a block design with three types of stimuli: faces, scenes, and scrambled images. There were an equal number of male and female faces, famous and non-famous faces, and famous and non-famous places. The stimuli were randomly selected from lists of 64 images per stimulus category. Importantly, the images were drawn from a different image pool than the studied triplet items. Images were presented sequentially for 750 ms each, with 250 ms ISIs. For each stimulus category, seven images were randomly presented with one stimulus repeat (i.e., the same image appeared twice in a row), yielding eight total trials. A run was broken into four “superblocks”, each comprising two repetitions of each stimulus category block presented in a randomized order, interspersed with two 10 s fixation blocks. Participants were instructed to make a response any time the same image appeared twice in a row.

Immediate memory test

After participants were removed from the scanner, they completed a surprise serial position test. On each trial, the three items from a given triplet were presented on the screen side-by-side, with the labels a, b, and c displayed above them. Participants were required to indicate the correct order in which the items were originally presented by typing the letters corresponding to the labels for the first, second, and third items into a response window. One of the six possible order permutations was randomly chosen to display for each test trial. Study sets were tested separately, consistent with the order in which they were encoded. The functional localizer was placed after the second post-encoding rest period to add a delay between encoding and test of the second study set. Pilot testing did not reveal any differences in memory performance for triplets encoded during the first vs. second study set, even when the memory test immediately followed encoding of the second study set.

Timing of this test was self-paced. Two dependent measures were assessed: (1) episodic memory recall, which is the percentage of trials answered correctly for the cued and uncued study sets and (2) a cueing benefit index, which represents the direction and magnitude of potential targeted memory reactivation by subtracting accuracy on the uncued study set from accuracy on the cued study set. Positive values reflect enhanced memory for the triplets that were cued during post-encoding rest, a score of zero reflects no difference in performance across conditions, and negative values reflect that uncued triplets were better remembered than cued triplets.
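The two dependent measures can be illustrated with a minimal Python sketch. The function names and the response data below are hypothetical and serve only to make the arithmetic concrete; they are not taken from the study's analysis scripts.

```python
def percent_correct(responses, answers):
    """Percentage of trials on which the typed order matches the studied order."""
    if len(responses) != len(answers):
        raise ValueError("responses and answers must have the same length")
    hits = sum(r == a for r, a in zip(responses, answers))
    return 100.0 * hits / len(answers)


def cueing_benefit_index(cued_accuracy, uncued_accuracy):
    """Cued minus uncued accuracy; positive values favor the cued triplets."""
    return cued_accuracy - uncued_accuracy


# Hypothetical data: 12 triplets per study set, one order string per trial.
cued_answers = ["abc"] * 12
cued_responses = ["abc"] * 9 + ["acb"] * 3      # 9 of 12 correct
uncued_answers = ["abc"] * 12
uncued_responses = ["abc"] * 6 + ["bac"] * 6    # 6 of 12 correct

cued_acc = percent_correct(cued_responses, cued_answers)
uncued_acc = percent_correct(uncued_responses, uncued_answers)
benefit = cueing_benefit_index(cued_acc, uncued_acc)
print(cued_acc, uncued_acc, benefit)
```

In this made-up example the index is positive, which under the definition above would indicate better memory for the cued study set.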

Delayed memory tests

Participants came to the laboratory 24 h after the fMRI scan to complete a series of behavioral tasks. Participants completed two delayed memory tests. First, they completed a recognition task to test long-term memory of the items within each triplet. On this task, participants made a series of yes/no judgments in response to intact, rearranged, or novel triplet configurations. Twelve of the studied triplets were presented as intact triplet trials, 12 were triplets presented in a rearranged order, and 24 trials consisted of images that were presented during initial encoding, but were never presented clustered together as a triplet (novel trials). Participants first judged whether the three items presented on a given test trial had been clustered together and studied as a triplet during initial encoding. On trials in which participants indicated that the items on the screen had been studied as a triplet, they were prompted with a follow-up judgment asking them to decide whether the items were presented in the order that they were originally studied. Timing was self-paced.

For the initial recognition judgment (“Were these items studied together as a triplet?”), a measure of discrimination sensitivity, d′ (normalized hit rate − normalized false alarm rate), was used to index recognition performance. For the order judgments (“Are these items presented in the same order in which they were studied?”), the percentage of correct responses was used to assess order memory performance. This measure was chosen over d′, since the order judgments were secondary to the triplet recognition judgments, meaning that participants only made a decision about the triplet’s order if they endorsed the items as being studied together as a triplet. Therefore, it is possible that false alarms may have also arisen on trials where participants first endorsed recognizing a false lure triplet, complicating the calculation of false alarm rate. Thus, we instead assessed percent correct.
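As a rough illustration of the d′ measure described above, the following Python sketch computes the z-transformed hit rate minus the z-transformed false alarm rate. The 0.5 cell correction for extreme rates and the trial counts are assumptions made for this example; the text does not state how rates of 0 or 1 were handled.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false alarm rate).

    Adds 0.5 to each count (an assumed correction, not taken from the
    source) so that rates of exactly 0 or 1 do not give infinite z-scores.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant.
sensitivity = d_prime(hits=20, misses=4, false_alarms=6, correct_rejections=18)
print(round(sensitivity, 2))
```

A participant who responds "yes" equally often to studied and novel configurations would obtain a d′ of zero under this definition.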

After the recognition test, participants once again completed the serial position test. The procedure was identical to the immediate memory test from session 1.

Sound cue survey

Finally, participants were given a survey to assess their levels of awareness regarding the sounds played during the rest periods. They were asked to answer questions about whether they paid attention to the sounds during the rest periods, whether they noticed anything about how the sounds during the rest periods related to other sounds during the scanner tasks, and whether they thought the sounds impacted their memory. This survey was administered to 22 of the 30 participants.

Image acquisition and preprocessing

MRI scanning was conducted at Thomas Jefferson University Hospital on a 3T Philips scanner with a 32-channel phased-array head coil. High-resolution T1-weighted anatomical images were collected using a three-dimensional magnetization prepared rapid acquisition gradient-echo pulse sequence (3D-MPRAGE). Salient imaging parameters were as follows: TR = 7.02 ms, TE = 3.19 ms, FOV = 256 mm², flip angle = 9°, 1 mm slice thickness. Functional T2*-weighted images were collected using a gradient-echo planar pulse sequence with the following parameters: TR = 2000 ms, TE = 25 ms, FOV = 240 mm², flip angle = 90°, 3 mm slice thickness. Each study set was presented across two runs, with each run consisting of two repetitions of the study list. This yielded encoding runs lasting 7.3 min (219 volumes per run). Each rest run lasted 8.5 min (255 volumes per run). After the experimental task, two 4.7 min runs (142 volumes per run) of functional localizer data were acquired using the same T2*-weighted imaging parameters. Total scan time was approximately 70 min.
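The reported run durations follow from the volume counts and the 2 s TR, which the short Python check below makes explicit. The trial-level arithmetic uses the 2 s fixation, 6 s stimulus, and 10 s baseline periods described for the encoding task; the variable names are illustrative only.

```python
TR_S = 2.0  # functional repetition time in seconds (TR = 2000 ms)


def run_minutes(n_volumes, tr_s=TR_S):
    """Run duration in minutes for a given number of volumes."""
    return n_volumes * tr_s / 60.0


encoding_min = run_minutes(219)
rest_min = run_minutes(255)
localizer_min = run_minutes(142)

# One encoding trial: fixation + triplet presentation + odd/even baseline task.
trial_s = 2 + 6 + 10
trials_per_run = 12 * 2          # 12 triplets, two repetitions per run
task_s = trial_s * trials_per_run

print(encoding_min, rest_min, round(localizer_min, 1), task_s)
```

The 24 trials account for 432 s of each 438 s encoding run; the text does not say how the remaining 6 s were used.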

Imaging data were preprocessed using FSL (Smith et al. 2004). The same preprocessing methods were applied to the encoding and rest runs. First, the data were corrected for subject motion using a six-parameter rigid body affine transformation, and then corrected for differences in slice timing acquisition using an interpolation algorithm. Next, participants’ functional images were co-registered to their respective T1-weighted anatomical images using boundary-based registration and normalized to MNI space for region of interest (ROI) definition and group-level analyses (see below) using a 12-parameter affine transformation. No spatial smoothing was applied for the pattern similarity analyses; however, the group-level contrasts for the functional localizer data were smoothed using a 5 mm FWHM isotropic Gaussian kernel.

Regions of interest definition

The primary ROI for this study was the hippocampus. Bilateral hippocampal ROIs were defined individually for each participant using FSL’s FIRST (Patenaude et al. 2011) segmentation algorithms. This process uses the high-resolution anatomical image registered in 1 mm standard MNI space to delineate the hippocampus based on known anatomical landmarks. Then, the hippocampal ROIs were transformed into subject-native space to extract patterns of activation during encoding and rest periods. Each ROI was visually inspected to ensure that there were no errors in the segmentation algorithm or native space transformation.

In addition, the FFA and PPA were defined bilaterally based on the functional localizer data. The FFA was defined by running a conjunction analysis to identify regions that showed greater activation to faces than to scrambled images and greater activation to faces than to scenes (faces > scrambled images ∩ faces > scenes). Likewise, the PPA was identified by the conjunction of scenes > scrambled images and scenes > faces. To ensure identification of the ROIs in every participant, the ROIs were first created in standardized space from conjunction analyses at the group level. Each contrast image (e.g., faces > places) was thresholded with a cluster-forming threshold of z > 2.50 and a cluster probability of p < 0.05.

Furthermore, to ensure that we isolated the FFA, and not other face-selective regions (e.g., the occipital face area), only activation that fell within the boundaries of the fusiform gyrus was included in the FFA ROI. Likewise, to ensure that we isolated the PPA, only activation that fell within the parahippocampal gyrus was included in the PPA ROI. These ROIs were then transformed into subject-native space. Given the proximity of the PPA to the hippocampus, overlapping voxels were excluded. Since there are known laterality differences with respect to function in each of the ROIs, we did not collapse across hemispheres in our analyses. For all ROIs, left and right hemispheres were analyzed separately.

Pattern similarity analyses

A slow event-related design was chosen for our experimental paradigm to give us the ability to model each encoding trial separately. The same general processing procedures were applied to both the encoding and rest runs. For every run, a separate GLM was constructed for each triplet (or sound cue) presentation to individually model activation associated with that specific triplet. For encoding runs, each triplet presentation was modeled, beginning at the onset of stimulus presentation, by convolving a 6 s boxcar function (representing the full duration of stimulus presentation) with a canonical hemodynamic response function. For rest runs, each sound cue presentation was modeled in the same fashion, beginning at the onset of the sound cue presentation and lasting for the 6 s during which the sound cue was presented. For both encoding and rest runs, the activation associated with all other stimulus presentations in that run was modeled in a single regressor by convolving the presentations of each stimulus with the canonical hemodynamic response. This regressor, along with the six motion parameters derived from FSL’s motion correction, was included in the model as a nuisance regressor. This method has been used previously in an fMRI study of memory reactivation (Staresina et al. 2013) to test for reinstatement of specific memory traces. Finally, the z-scored parameter estimates associated with each individual triplet (or sound cue) presentation were extracted from the ROIs, yielding a vector representing the pattern of activation in that particular ROI, where each row reflects activation in a different voxel within the ROI.
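To make the single-presentation modeling concrete, the Python sketch below builds the two task regressors for one such GLM: a 6 s boxcar for the presentation of interest and a second regressor for all other presentations in the run, each convolved with a hemodynamic response function. The double-gamma shape, its parameters, the 0.1 s sampling grid, and the onset times are all assumptions chosen for illustration; the analysis itself was run in FSL, whose canonical response may differ.

```python
import math

TR = 2.0  # seconds per volume, from the acquisition parameters


def double_gamma_hrf(t):
    """Generic double-gamma HRF (assumed shape: peak ~5 s, undershoot ~15 s)."""
    if t <= 0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0


def trial_regressor(onset, duration, n_volumes, dt=0.1, hrf_length=32.0):
    """Convolve a boxcar with the HRF on a fine grid, then sample once per TR."""
    n_fine = int(round(n_volumes * TR / dt))
    kernel = [double_gamma_hrf(k * dt) for k in range(int(round(hrf_length / dt)))]
    # Indices of fine-grid samples during which the stimulus is on.
    active = [i for i in range(n_fine) if onset <= i * dt < onset + duration]
    fine = [0.0] * n_fine
    for i in active:
        for k, h in enumerate(kernel):
            if i + k < n_fine:
                fine[i + k] += h * dt
    step = int(round(TR / dt))
    return fine[::step][:n_volumes]


# Hypothetical encoding run: 24 trials spaced 18 s apart, stimuli 2 s into each.
onsets = [2.0 + 18.0 * i for i in range(24)]
n_volumes = 219

target = trial_regressor(onsets[0], 6.0, n_volumes)
others = [
    sum(values)
    for values in zip(*(trial_regressor(o, 6.0, n_volumes) for o in onsets[1:]))
]
print(len(target), len(others), target.index(max(target)))
```

Repeating this construction with each presentation in turn as the target yields one model, and therefore one parameter estimate per voxel, for every presentation.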

Analysis of cued reactivation

To examine similarities in hippocampal patterns of activation across encoding and rest periods, RSA was performed on the z-scored vectors of hippocampal activation elicited by each individual triplet presentation. To examine presumed reactivation of a particular triplet during the cued sound rest period, the four presentations of the triplet that occurred within the cued study set were averaged together to form one cued encoding activation vector. Similarly, the two presentations of the triplet’s associated sound cue during the cued sound rest period were averaged together to yield a single cued rest vector. Then, a Pearson’s correlation was computed between the cued encoding and cued rest vectors to derive a measure of pattern similarity between the two patterns of activation. This procedure was repeated for each triplet, and then the pattern similarity measures were averaged across triplets to yield one cued encoding–cued rest pattern similarity measure for each participant. The same procedure was repeated, correlating patterns from each study set with the baseline rest and uncued rest patterns. To compute pattern similarity measures between the activation vector associated with a studied triplet and the activation vector of an irrelevant novel sound cue (i.e., from either the baseline rest or uncued rest periods), the sound presentations in these rest periods were assigned numbers one through 12, and since the sounds were presented in a randomized order, correlations were computed between corresponding number labels (e.g., encoding triplet 1 with novel sound 1).
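The averaging and correlation steps described above can be sketched in a few lines of Python. This is a simplified illustration under the assumption that each pattern is a plain list of voxel values; the function names and toy data are hypothetical.

```python
import math


def mean_vector(vectors):
    """Average several equal-length voxel vectors element by element."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def encoding_rest_similarity(encoding_patterns, rest_patterns):
    """Mean encoding-rest pattern similarity across triplets for one participant.

    encoding_patterns[t] holds the four encoding vectors for triplet t;
    rest_patterns[t] holds the two rest vectors for the sound paired with it.
    """
    scores = [
        pearson_r(mean_vector(enc), mean_vector(rest))
        for enc, rest in zip(encoding_patterns, rest_patterns)
    ]
    return sum(scores) / len(scores)


# Toy data: two triplets, four voxels each.
encoding = [[[1, 2, 3, 4]] * 4, [[4, 3, 2, 1]] * 4]
rest = [[[2, 4, 6, 8]] * 2, [[1, 2, 3, 4]] * 2]
similarity = encoding_rest_similarity(encoding, rest)
print(similarity)
```

For the baseline and uncued rest comparisons, the same function would apply once each novel sound has been paired with a triplet by its arbitrary number label.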

The same procedure was used within the ROIs of the FFA and PPA. Here, we were particularly interested in potential category-specific reactivation in these regions. First, pattern similarity scores between the various encoding and rest periods were computed within the FFA and PPA ROIs in the same manner as the hippocampal ROIs. Pattern similarity scores for each triplet were then sorted based on encoding category, i.e., face triplet or scene triplet. This allowed us to test for evidence of targeted reactivation as a function of stimulus category in the FFA and PPA.

Sequence identity analysis

To examine evidence of the preservation of temporal sequences across periods of targeted reactivation, we applied methods used by Kalm et al. (2013) that have previously demonstrated the hippocampus’ role in the encoding of sequence identity. This analysis was restricted to triplets that were subsequently cued with their associated sounds during rest.

We computed between-triplet pattern similarity by correlating the activity patterns between the first presentation of every triplet, the second presentation of every triplet, and so on (e.g., triplet 1, presentation 1–triplet 2, presentation 1; triplet 1, presentation 1–triplet 3, presentation 1; yielding 66 total pairwise comparisons per presentation; see Fig. 2). These measures were then averaged across presentations to derive a single set of between-triplet similarity scores per subject. Change in pattern similarity across repetitions was computed by calculating the slope of the similarity measures averaged across participants using least-squares linear regression. Evidence of temporal sequence identity preservation would be characterized by a decrease in between-triplet similarity, meaning that hippocampal patterns associated with any given triplet became less similar to the other triplets over time. In addition, we fit slopes individually to each participant’s data to yield an individual differences measure reflecting the degree to which unique triplet sequence representations were preserved across individual participants.
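A minimal Python sketch of the between-triplet computation and the slope fit is given below. It assumes that the 66 pairwise correlations are averaged within each presentation before a line is fit across presentations, which is one reading of the procedure; the function names are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def between_triplet_similarity(patterns):
    """Mean pairwise correlation among all triplet patterns at one presentation."""
    pairs = list(combinations(patterns, 2))
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


def least_squares_slope(y):
    """Slope of y regressed on presentation number 1..n."""
    n = len(y)
    x = list(range(1, n + 1))
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Twelve triplets give 12 * 11 / 2 pairwise comparisons per presentation.
n_pairs = len(list(combinations(range(12), 2)))

# Hypothetical mean between-triplet similarity at successive presentations.
slope = least_squares_slope([0.40, 0.30, 0.20, 0.10])
print(n_pairs, slope)
```

A negative slope, as in this made-up series, corresponds to triplet patterns becoming less similar to one another across presentations.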

Fig. 2

Schematic of between-triplet pattern similarity computations. Unique triplet sequences were encoded across repeated presentations (left). Measures of between-triplet pattern similarity were computed for triplet presentations occurring across the cued encoding and cued rest periods. Between-triplet pattern similarity (right) was measured by computing pairwise correlations between hippocampal activation patterns associated with each unique triplet sequence across a given presentation timepoint

Statistical analyses

Statistical analyses were performed using SPSS (Version 23). Paired-samples t tests were used to compare differences in behavioral performance for cued vs. uncued triplets across memory tasks. Repeated-measures ANOVAs were employed to examine differences in evidence of targeted memory reactivation across encoding and rest conditions. Correlation analyses were used to investigate whether the neural signatures of targeted reactivation were related to subsequent memory performance.

Results

Behavioral results

Memory for cued and uncued information was tested immediately after the fMRI scan and again after a 24 h delay (see Table 1). While there was variability in performance across participants, all scores fell within 2.5 standard deviations of the mean for each behavioral task. Given that most tasks were presented after a full 24 h delay period, the variability in performance was not surprising. In general, memory performance did not differ as a function of stimulus category (i.e., face vs. scene sequences). Only one significant difference emerged. For uncued triplets in the delayed serial position test, participants remembered an average of one more scene triplet than face triplet [t(29) = 3.62, p < 0.001]. None of the other memory assessments revealed performance differences as a function of stimulus category (p > 0.20). In addition, sound cues did not significantly affect behavioral memory performance when examined at the group level. One’s ability to remember the serial position of stimuli after a brief delay or 24 h delay was unaffected by cueing [t(29) = 0.76, p = 0.46; t(29) = 0.35, p = 0.73, respectively]. Recognition memory after a 24 h delay was also unaffected by cueing [t(29) = 1.02, p = 0.32]. Individual differences in memory enhancement for cued information were examined by creating a cueing benefit index, which quantified the magnitude of potential memory benefits due to the cueing paradigm (discussed later).

Table 1 Mean memory performance

The sound cue survey was administered to 22 out of the 30 participants. Specifically, we examined differences in the cueing benefit indices as a function of whether or not participants reported paying attention to the sounds presented during the rest periods. Participants who endorsed paying little to no attention to the sounds during rest exhibited significantly higher cueing benefit indices for the recognition order memory test than individuals who endorsed paying a lot of attention [t(20) = 2.45, p < 0.02]. None of the other cueing benefit indices for the memory measures differed significantly based on whether or not participants paid attention to the rest sounds (p > 0.22).

Imaging results

Neural signatures of targeted reactivation

We first tested the hypothesis that sound cues would target preferential reactivation of the cued triplet sequences, such that pattern similarity in the hippocampus would be selectively enhanced between the cued encoding and rest set, relative to the uncued encoding and rest set, as well as relative to a baseline rest period. RSA methods were used to search for evidence of this in the hippocampus by comparing patterns of hippocampal activation during episodic memory encoding and periods of post-encoding rest. We performed this analysis on the level of individual triplets (see Fig. 1).

Pattern similarity scores were calculated between patterns of hippocampal activation associated with each triplet during the cued encoding period and the patterns representing the associated sound cue presentations during the cued rest period (cued encoding–cued rest similarity), between the cued encoding triplets and the patterns associated with the novel sounds presented during the baseline rest period (cued encoding–baseline rest similarity), between the triplet presentations during the uncued encoding period and the novel sounds subsequently played during uncued rest (uncued encoding–uncued rest similarity), and finally between the uncued encoding triplets and the baseline rest sounds (uncued encoding–baseline rest similarity).

Results are presented in Fig. 3. There was strong evidence of targeted memory reactivation in the left hippocampus [F(3,87) = 5.13, p = 0.003], such that enhanced pattern similarity was exhibited between triplets during the cued study and rest periods (i.e., cued encoding–cued rest pattern similarity; see Fig. 3a). Pairwise comparisons revealed that cued encoding–cued rest similarity was significantly greater than cued encoding–baseline rest similarity [t(29) = 3.97, p < 0.001], uncued encoding–uncued rest similarity [t(29) = 2.61, p = 0.01], and uncued encoding–baseline rest similarity [t(29) = 2.51, p = 0.02]. No other pairwise comparisons were significant (p > 0.22). Thus, our hypothesis was confirmed: hippocampal pattern similarity was heightened specifically between the cued encoding and cued rest periods, and not in any other comparison. The selectivity of the finding demonstrates that presentations of learned sound cues during a period of post-encoding rest were, in fact, able to bias reactivation towards the associated memory traces.

Fig. 3

Neural signatures of targeted memory reactivation. Pattern similarity scores computed between multi-voxel patterns of hippocampal activation associated with specific triplet presentations across the various encoding and rest periods. a Left hippocampus. b Right hippocampus. ***p < 0.001, **p < 0.01, *p < 0.05, p < 0.07, n.s. not significant. Error bars represent standard error of the mean

In the right hippocampus, the overall ANOVA was not significant [F(3,87) = 1.71, p = 0.17; see Fig. 3b]. Given our a priori hypotheses, we followed up with pairwise comparisons that revealed marginally enhanced pattern similarity between cued encoding and cued rest, compared to cued encoding–baseline rest similarity [t(29) = 1.86, p = 0.07]. No other comparisons reached significance (p > 0.15).

Next, we investigated whether the neural signature of targeted reactivation correlated with behavioral performance outside of the scanner. To test this, we examined whether the cueing benefit index calculated for each of the memory measures was differentially related to presumed targeted memory reactivation (i.e., cued encoding–cued rest pattern similarity) vs. no targeted memory reactivation (i.e., uncued encoding–uncued rest pattern similarity). With regard to the immediate tests of memory, there were no significant effects (p = 0.07; see Fig. 4a). For the delayed tests, however, a positive relationship emerged between the cueing benefit index for order memory judgments on the recognition task and cued encoding–cued rest similarity in the right hippocampus [r(27) = 0.43, p = 0.02]. No such relationship was found in the control contrast [i.e., uncued encoding–uncued rest similarity in the right hippocampus; r(27) = −0.04, p = 0.83; see Fig. 4b]. No other significant brain–behavior relationships were found (all p > 0.14). Taken together, these correlations suggest that targeted hippocampal reactivation has a positive impact on longer-term order memory.
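The brain–behavior analysis can be sketched as follows. The cueing benefit index is taken as memory for cued triplets minus memory for uncued triplets, and it is related to the reactivation index with a Pearson correlation on n − 2 degrees of freedom, matching the r(27) notation. The function names and all values are hypothetical.

```python
import math

def cueing_benefit(cued_scores, uncued_scores):
    """Per-participant cueing benefit: cued minus uncued memory score."""
    return [c - u for c, u in zip(cued_scores, uncued_scores)]

def pearson_r(x, y):
    """Return (r, df) for a Pearson correlation across participants."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy), n - 2

# Hypothetical data for five participants (illustrative only).
cued_memory = [0.80, 0.65, 0.90, 0.70, 0.75]
uncued_memory = [0.70, 0.66, 0.72, 0.68, 0.70]
cued_similarity = [0.10, 0.02, 0.14, 0.05, 0.06]

benefit = cueing_benefit(cued_memory, uncued_memory)
r, df = pearson_r(benefit, cued_similarity)
print(f"r({df}) = {r:.2f}")
```

Running the same correlation against the uncued encoding–uncued rest index gives the control contrast.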

Fig. 4

Relationships between hippocampal reactivation indices and order memory cueing benefit. a Scatterplots depicting the relationship between presumed cued reactivation (cued encoding–cued rest similarity) in the left hippocampus and the cueing benefit index (memory for uncued triplets subtracted from memory for cued triplets) derived from performance on the immediate serial position test (left) and the relationship between the uncued reactivation index (uncued encoding–uncued rest similarity) and the immediate serial position test cueing benefit index (right). b Scatterplots depicting the relationship between presumed cued reactivation in the right hippocampus and the cueing benefit index derived from performance on delayed order memory recognition judgments (left) and the relationship between the uncued reactivation index and the delayed order memory recognition judgments (right)

Given that our strongest evidence of cued reactivation was found in the left hippocampus, but the positive correlation with delayed memory behavior was found in the right hippocampus, we conducted a post hoc follow-up analysis to investigate potential cued reactivation differences based on memory performance. We performed a median split based on participants’ overall performance on the delayed recognition memory judgments and found that top performers demonstrated some evidence of bilateral cued reactivation. Specifically, in the left hippocampus, top performers exhibited selectively enhanced cued encoding–cued rest pattern similarity compared to cued encoding–baseline rest similarity [t(14) = 3.80, p = 0.002], uncued encoding–uncued rest similarity [t(14) = 3.18, p = 0.007], and uncued encoding–baseline rest similarity [t(14) = 2.57, p = 0.02]. In the right hippocampus, top performers showed enhanced cued encoding–cued rest similarity relative to cued encoding–baseline rest similarity [t(14) = 2.29, p = 0.04] and uncued encoding–uncued rest similarity [t(14) = 2.39, p = 0.03]. Interestingly, the bottom performers did not show significant evidence of cued reactivation in either the left or right hippocampus (p > 0.08). Therefore, it is possible that the top performers could be driving the brain–behavior relationship found in the right hippocampus.
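The median split itself is simple to illustrate. In the sketch below, participants scoring above the median form the top group and the rest form the bottom group; how ties at the median were handled is not stated, so assigning them to the bottom group is an assumption, as are the function name and the scores.

```python
import statistics

def median_split(scores):
    """Split participant indices into top and bottom groups by the median."""
    cutoff = statistics.median(scores)
    top = [i for i, s in enumerate(scores) if s > cutoff]
    # Assumption: scores equal to the median go to the bottom group.
    bottom = [i for i, s in enumerate(scores) if s <= cutoff]
    return top, bottom

# Hypothetical delayed recognition scores for six participants.
recognition = [0.90, 0.55, 0.70, 0.60, 0.85, 0.65]
top, bottom = median_split(recognition)
print(top, bottom)
```

With 30 participants and no ties, each group contains 15 participants, which matches the 14 degrees of freedom reported for the within-group comparisons.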

Many models of memory consolidation propose that the hippocampus interacts with sensory cortex to support systems consolidation (e.g., Marr 1971; Teyler and DiScenna 1985; McClelland et al. 1995; Squire and Alvarez 1995; Nadel and Moscovitch 1997; for a recent review, see Squire et al. 2015). Based on this, we predicted that we would observe evidence of targeted reactivation in the FFA in response to face sequences, but not scene sequences, while in the PPA, we expected to observe targeted reactivation in response to scene sequences, but not face sequences. Results are presented in Fig. 5.

Fig. 5

Category-specific reactivation. Pattern similarity scores computed between multi-voxel patterns of FFA activation associated with face sequences (left) and scene sequences (right) across the various encoding and rest periods. a Left FFA. b Right FFA. **p < 0.01, *p < 0.05

Within the FFA, we found no evidence of targeted reactivation in response to face triplets in the left FFA [F(3,87) = 1.50, p = 0.22], but some evidence in the right FFA [F(3,87) = 2.96, p = 0.04]. Follow-up pairwise comparisons revealed that specifically in response to face stimuli, the cued reactivation index in the right FFA, measuring cued encoding–cued rest similarity, was significantly enhanced relative to cued encoding–baseline rest similarity [t(29) = 2.97, p = 0.006] and uncued encoding–baseline rest similarity [t(29) = 2.20, p = 0.04]. There was, however, no significant difference between cued encoding–cued rest similarity and uncued encoding–uncued rest similarity [t(29) = 0.95, p = 0.35]. No other differences emerged (p > 0.10). In response to scene triplets, there was no evidence of targeted reactivation in either the left [F(3,87) = 1.48, p = 0.23] or right FFA [F(3,87) = 0.29, p = 0.83]. Together, these findings provide some evidence of category-selective reactivation of face stimuli, but not scene stimuli, in the right FFA.

In the PPA, no significant evidence was found for selective targeted reactivation in response to scene sequences in either the left [F(3,87) = 1.85, p = 0.14] or right PPA [F(3,87) = 0.83, p = 0.48]. In response to face triplets, there was no significant evidence of targeted reactivation in the left PPA [F(3,87) = 1.19, p = 0.32]; however, there was weak evidence in the right PPA [F(3,87) = 2.69, p = 0.05]. Follow-up pairwise comparisons revealed that this effect in the right PPA was driven by selectively heightened cued encoding–cued rest similarity, compared to cued encoding–baseline rest similarity [t(29) = 2.15, p = 0.04]. No other significant differences emerged (p > 0.08). These findings do not support the prediction of scene-selective reactivation in the PPA.

Together, this set of results partially confirms our hypothesis regarding category-specific reactivation, since we found some evidence of face-selective reactivation in the FFA, but no evidence of scene-selective reactivation in the PPA.

Preservation of temporal sequences

Across the animal literature, one of the core components of offline reactivation is the sequentially ordered recapitulation of memory traces. Therefore, we predicted that when temporally ordered sequences were repeated, hippocampal patterns associated with a particular sequence would become more unique, i.e., more dissimilar to other sequences, over time. Thus, we calculated a measure of between-triplet similarity by computing pairwise comparisons between every pair of triplet sequences for each presentation timepoint during the cued encoding and cued rest periods (see “Materials and methods” for further details). Then, we assessed changes in between-triplet pattern similarity over time by fitting a regression slope to the group data. Results are presented in Fig. 6a. In the left hippocampus, a significant decrease in pattern similarity was observed across presentations [t(29) = 7.25, p < 0.0001]. Similar results were obtained in the right hippocampus [significant decreasing slope: t(29) = 7.52, p < 0.0001]. These results suggest that the hippocampal pattern associated with each unique triplet sequence did, in fact, become less similar to the patterns of the other triplets across presentations, providing some evidence of preserved temporal sequences within the hippocampus.
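The between-triplet measure and its slope can be sketched as below. The example assumes that similarity is a Pearson correlation averaged over every pair of distinct triplets at one presentation, and that the slope is an ordinary least-squares fit against presentation number. All names and values are hypothetical, and the group-level test on the slope is not reproduced.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation between two equal-length voxel patterns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def between_triplet_similarity(patterns):
    """Mean correlation over every pair of distinct triplets at one presentation."""
    pairs = list(combinations(patterns, 2))
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)

def slope(ys):
    """Least-squares slope of ys against presentation number 1..len(ys)."""
    xs = list(range(1, len(ys) + 1))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy patterns for three triplets at a single presentation.
one_presentation = [[1, 2, 3], [2, 4, 6], [3, 2, 1]]
sim = between_triplet_similarity(one_presentation)

# Hypothetical between-triplet similarity at presentations 1-6
# (1-4 = cued encoding, 5-6 = cued rest).
similarity_by_presentation = [0.30, 0.26, 0.21, 0.19, 0.15, 0.12]
fitted_slope = slope(similarity_by_presentation)
print(sim, fitted_slope)
```

A negative slope indicates that the triplet representations become less similar to one another across presentations.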

Fig. 6

Preservation of temporal sequences. a Between-triplet pattern similarity measured across triplet presentations during the cued encoding and cued rest periods. Between-triplet similarity indexes representational similarity between the different triplets at each repetition. Presentation numbers 1–4 represent the cued encoding presentations, while presentation numbers 5 and 6 represent the cued rest presentations. Each data point reflects a particular participant. Left: left hippocampus. Right: right hippocampus. b Scatterplot depicting the relationship between the slopes fit to each participant’s between-triplet similarity data, reflecting the magnitude of decreasing between-triplet similarity across triplet presentations during the cued encoding and cued rest periods, and the cueing benefit index derived from the delayed serial position test

Next, separate slopes for between-triplet similarity were fit to each participant’s data individually, and these slopes were then correlated with the behavioral cueing benefit indices. The only brain–behavior relationship that emerged was between the cueing benefit derived from the delayed serial position test and between-triplet similarity in the right hippocampus [r(27) = −0.46, p = 0.01; see Fig. 6b]. This suggests that participants who exhibited a greater memory boost for cued triplets also demonstrated a steeper decrease in similarity between individual triplet sequences over time. Therefore, the more unique particular triplet sequences became over time, the more participants’ order memory benefited, selectively for cued triplets.

Univariate analysis

To rule out the possibility that our results were simply a representation of increased activation in response to familiar sounds in the cued rest period relative to the uncued rest period, we conducted a follow-up univariate analysis to assess hippocampal BOLD activation in each of these conditions. During rest, there was no difference in univariate activation in response to cued sounds compared to novel sounds in either the left [t(29) = 0.37, p = 0.71] or right [t(29) = 0.80, p = 0.43] hippocampus. Therefore, our findings do not seem to be merely the result of the familiarity of the cued sounds played during rest.

Discussion

A key mechanism supporting systems consolidation is memory replay, or reactivation, first observed in rodents during post-learning sleep (e.g., Wilson and McNaughton 1994; Ji and Wilson 2007) and later, during wakeful rest (e.g., Foster and Wilson 2006). In humans, fMRI has been harnessed to demonstrate that activity during post-encoding rest or quasi-rest periods, measured by either functional connectivity or multi-voxel patterns of activation, is influenced by processing that occurs before rest (Tambini et al. 2010; Deuker et al. 2013; Staresina et al. 2013; Tambini and Davachi 2013; Vilberg and Davachi 2013; Schlichting and Preston 2014; Tompary et al. 2015; Gruber et al. 2016). However, only two studies have shown something that approaches the granularity of reactivation as measured in rodents. In these two instances, item-specific recapitulation was demonstrated (Deuker et al. 2013; Staresina et al. 2013). Here, we took this a step further by testing sequential memory, thereby allowing us to make parallel linkages with the rodent literature that has shown ordered reinstatement of specific sequences.

Within the hippocampus, we found evidence of biased reactivation, which was selectively enhanced for memory traces that were targeted for reactivation during offline rest, both compared to uncued traces, and compared to a baseline rest period. Evidence of targeted reactivation was particularly strong in the left hippocampus, while weaker, albeit similar, evidence was found in the right hippocampus. Importantly, there was a positive relationship between hippocampal signatures of reactivation in the right hippocampus and enhanced accuracy for the temporal order of cued sequences on a delayed recognition task, corroborating previous findings in which increased post-encoding reactivation was related to better memory performance (Deuker et al. 2013; Staresina et al. 2013; Tambini and Davachi 2013; Schlichting and Preston 2014; Gruber et al. 2016). Given that no significant relationship was found with immediate memory, it is possible that the full benefits of enhanced targeted reactivation may not manifest until after a 24 h delay, supporting claims that offline reinstatement of newly learned information may be an important initial stage of memory consolidation. However, it is worth noting that the test formats differed between immediate and delayed memory tasks, and the two formats may not have been equally sensitive to memory for temporal order. We chose to only present the recognition test after the 24 h delay to introduce an independent memory assessment that would not be influenced by potential test–retest practice effects. It is possible we would have observed a correlation with immediate memory if the same test had been administered at both sessions.

We further hypothesized that we would find neuronal evidence of category-specific reactivation. This hypothesis was driven by predictions made by several models of systems consolidation (e.g., Marr 1971; Teyler and DiScenna 1985; McClelland et al. 1995; Squire and Alvarez 1995; Nadel and Moscovitch 1997; for recent reviews, see Squire et al. 2015; Kumaran et al. 2016) and by prior studies showing reactivation in category-specific cortex (e.g., Tambini et al. 2010; Wimmer and Shohamy 2012; Schlichting and Preston 2014). Using the same RSA techniques for ROIs in the FFA and PPA, we observed face-selective reactivation in the FFA. However, no evidence of scene-selective reactivation was observed in the PPA, thus providing partial support for our hypothesis.

Finally, we examined potential effects of temporal sequencing and expected to see preservation of unique temporal sequences within the multi-voxel patterns of hippocampal activation. As predicted, over time, the patterns associated with a given triplet sequence became more distinctive from patterns associated with the other sequences, suggesting that increased exposure leads to increasingly unique representations in the hippocampus. Furthermore, an inverse relationship was found between the magnitude of each participant’s decreasing between-triplet similarity in the right hippocampus and enhanced recognition for the temporal order of cued triplets. Thus, the more unique particular sequence representations were within the hippocampus, the more accurately participants were able to remember the temporal order of the sequences. Together, these findings lend support to the claims that reactivation of specific temporal sequences occurs in the hippocampus during offline rest periods.

Limitations

A few limitations should be noted regarding the present study. The first limitation has to do with the use of sound cues in our paradigm. Sound cues have previously been used to bias the content of presumed memory reactivation (e.g., Bendor and Wilson 2012; Rothschild et al. 2017; Rudoy et al. 2009; Oudiette et al. 2013). Biasing reactivation through these techniques may function to tag particular traces for enhanced reinstatement. In our paradigm, for example, hippocampal signatures of targeted reactivation were associated with the relative memory enhancement of targeted items, not overall memory performance. This biasing may serve to “reweight” particular memory representations, therefore influencing what is and is not subsequently remembered (Kumaran et al. 2016). However, we cannot completely rule out the possibility that our measure of targeted reactivation was detecting something other than replay. While we did carefully isolate our analysis windows to time periods associated with stimulus presentations, it is possible that the neural signatures could be reflecting something other than memory processing, such as the simple tracking of sound information. Since the targeted reactivation index was composed of sound cues that were repeated across encoding and cued rest, compared to the uncued reactivation index that reflected studied sounds compared to novel sounds, it is possible that our finding of enhanced reactivation could reflect encoding of familiar sounds. Several additional analyses mitigate this possibility. First, if the enhanced reactivation was solely an index of sound encoding, we would not expect to see any relationship to subsequent memory performance, especially after a 24 h delay on a memory test devoid of the familiar sound cues. Second, the evidence of category-specific reactivation in the FFA further suggests that our findings are not due to sound encoding in general, as differences were found for specific categories of to-be-remembered triplets. Finally, no univariate differences were found with respect to hippocampal activation in response to familiar vs. novel sounds during rest; therefore, our results do not seem to reflect a general increase in activation in response to familiar sounds. Together, our pattern of results argues against this possibility.

Related to this, it is important to consider whether the presentation of sound cues during rest triggered active memory rehearsal. We were concerned about this possibility, so we prevented participants from rehearsing by having them perform a color change detection task during rest periods. In addition, participants took a post-scan survey that assessed their awareness and meta-knowledge of the sound cues. Most participants surveyed reported that they paid little or no attention to the sounds presented during rest. Thus, it is unlikely that the observed effects were due to overt rehearsal.

Additionally, the observed behavioral effects (e.g., differences in overall memory performance based on cueing) did not reach statistical significance. It is important to remember that cues, whether presented during sleep or wakefulness, are simply the starting point for the reactivation of particular memory traces. Once reactivated, a given memory can be strengthened, weakened, or obliterated (Oudiette and Paller 2013). It is likely that the context in which the cue occurs influences effects on subsequent retrieval. For instance, when the cues occur in concert with a mentally demanding task, memory may be harmed instead of bolstered (e.g., Diekelmann et al. 2011; James et al. 2015). The nature of our design may have unintentionally blunted potential cueing effects, since cues occurred while subjects were engaged in a task that imposed modest perceptual and cognitive demands. It is also possible that the within-subjects study design decreased our power to detect more subtle behavioral effects. Similarly, brain–behavior relationships were observed in some, but not all, memory measures. Our sample size of 30, although standard in the neuroimaging literature and based on a priori power calculations, was potentially too small to detect multiple brain–behavior correlations.

Finally, we would like to note that the nature of post-encoding rest periods varies across studies of memory reactivation, ranging from passive rest (e.g., Tambini et al. 2010; Tambini and Davachi 2013; Schlichting and Preston 2014; Gruber et al. 2016), to orthogonalized tasks, such as odd/even judgments or arithmetic tasks (e.g., Staresina et al. 2013; Tompary et al. 2015). Future research should examine how the nature of the rest periods themselves influences reinstatement during post-encoding rest periods.

Conclusions

Daily life is filled with sounds and sights that have the potential to reactivate remote memories, thereby rendering them labile and open for modification. The instability of engrams allows our mental landscape to evolve with the changing statistics of our environment. The mechanism underlying this process may rely on the ability of cues to trigger the completion of an entire memory trace. Our findings show that a trivial event—a sound cue—can trigger the reactivation of specific memory sequences in the hippocampus. This enhanced reactivation was related to differences in subsequent memory, suggesting that preferential reactivation may be a mechanism by which specific memory traces can be strengthened for enhanced subsequent memory retrieval and consolidation.