
1 Introduction

In the study of explicit memory, episodic memory refers to the system that localizes events in specific time epochs. In contrast to semantic memory, which is knowledge not tied to one’s experience of time, it involves “remembering” rather than merely “knowing” (Tulving, 1985). The double dissociation observed between episodic and semantic memory supports the distinctiveness of episodic memory and has been widely evidenced in both clinical cases (De Renzi et al., 1987; Kitchener et al., 1998; Rosenbaum et al., 2005) and brain-imaging studies, including positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) (Buckner et al., 2000; Tulving, 2002).

Because episodic memory carries narratives of autonoetic experiences, there is a gap between all the information received and what is encoded in episodic narratives (Tulving, 2002). Concerning this gap, a substantial literature has investigated what determines whether a given kind of external information is encoded in and recalled from episodic memory. Multiple variables of the memorized content probably contribute to varied levels of episodic recollection (Vogt & Magnussen, 2007). There is also rich evidence for the intuitive idea that memory fades with time, including examinations of the regularity of forgetting over time ever since Ebbinghaus (1885) modeled his classic forgetting curve (Andermane & Bowers, 2015; Furman et al., 2007; Hu et al., 2013; Murre & Dros, 2015). Some participant features, such as gender and age, have been investigated as well (Grysman, 2017). Methods previously adopted to examine these variables range from tests with static images, separate words, and sequential visual narratives to transcranial magnetic stimulation (TMS) and neuropsychological observations of clinical cases (Bayley et al., 2003; Hebscher & Voss, 2020; Magliano et al., 2017; Rosenbaum et al., 2008; Tang et al., 2016).

The present study is based on a previous study by Tang et al. (2016) titled “Predicting episodic memory formation for movie events”. Their experiment investigated the relationship between near-real-life scenarios and episodic memory performance by evaluating how well participants recognized movie frames under different manipulations. They manipulated variables of the memorized content as well as the retention time between encoding and testing. The memory material was movie footage with several variables manipulated, such as with sound removed, temporal sequence reversed, or shots reduced to static frames. Their findings classified sound removal, temporal order reversal, and occlusion of 75% of the image area as manipulations that cause a significant decrease in memory performance, and these were labeled “high-level manipulations”. Horizontal flipping of the frames and color removal led to no significant difference in performance and were categorized as “low-level manipulations”. They also attempted to model episodic memory formation as a function of these variables.

Based on the data collected in their experiment, the present study aims to replicate the previous result on the manipulations’ effects on recall performance and to further investigate the effect of time. First, as the previous result was analyzed with a two-sided non-parametric permutation test, we seek to obtain more solid evidence for the findings. Thus, we replicate the data analysis with Bayes Factor t tests and ANOVA, and add the variable of motion, which was not analyzed in the previous research. We hypothesize that each manipulation will show an effect on memory performance similar to that in the original study, and that the additional manipulation of motion will also have a significant effect. Second, the present research differs from the previous study in investigating the effect of retention time. Although Tang et al. (2016) collected experimental data at varied retention times, they did not analyze the effect of time across testing sessions. Nor has existing research on the effect of time examined the interaction between time and material manipulations on episodic memory performance (Andermane & Bowers, 2015; Furman et al., 2007). Therefore, the present study examines the performance difference across retention times in each manipulated condition and the interaction of time with each manipulation. Given the previous literature supporting memory decline with time, we hypothesize that performance will be worse in later sessions than in earlier ones (Andermane & Bowers, 2015; Furman et al., 2007; Hu et al., 2013; Murre & Dros, 2015).

2 Methods

2.1 Experiment Procedure: The Original Study

The original experiment by Tang et al. (2016) included 161 participants in total. The material adopted as the near-real-life movie stimuli was Episode 1 and Episode 2 from Season 6 of the TV series “24”. In their experiment, no participants had watched any episode prior to the encoding stage.

All participants went through the following procedure. In the encoding stage, the participants watched a clip of the chosen movie. After a retention period, they were tested on their memory of the movie content. They were shown movie frames that were either studied frames, which they had seen in the encoding stage, or foil frames, which were new to them. The frames were presented in pseudo-random order, and for each frame, participants were asked to decide whether it was a frame they had previously been exposed to or a new, foil frame. Equal numbers of studied frames and foil frames were presented, so the two types were equally likely to appear and chance-level performance was 50%.
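To see why equal numbers of studied and foil frames put chance at 50% regardless of response bias, consider a hypothetical participant who guesses “old” with probability p on every trial. The symbol p is introduced here only for illustration and is not a quantity from the original study.

```latex
P(\text{correct})
  = \underbrace{\tfrac{1}{2}\,p}_{\text{studied frame, guess ``old''}}
  + \underbrace{\tfrac{1}{2}\,(1-p)}_{\text{foil frame, guess ``new''}}
  = \tfrac{1}{2}
```

Under this simple guessing model, any bias toward “old” or “new” cancels out, so accuracy above 50% reflects memory rather than response tendency.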

The original study conducted four experiments in total. First, to examine the validity of the experiment design, a main experiment tested the general level of memorability of the stimuli, which were movie shots and frames. This experiment showed a high rate of memorization even though the targets and foils were similar frames from two sequential episodes, Episode 1 and Episode 2, respectively. Participants performed above chance and below ceiling, leaving an ample range for investigating which variables contribute to varied levels of memory performance. Then, two variant experiments, Variant 1 and Variant 2, were carried out to test the generalizability of the conclusions from the main experiment. As the main experiment used frames from Episode 1 as studied frames and frames from Episode 2 as foil frames, Variant 1 focused on the effect of the difference between episodes. It swapped the roles of Episode 1 and Episode 2 and successfully replicated the findings of the main experiment, finding little influence of the episode used. Variant 2 targeted the interference from the repeated testing that each participant experienced across multiple sessions. Although the test frames in the six test sessions were different, the same subjects went through repeated tests with frames from the same movie episodes. As repeated exposure may lead to better performance in later sessions and act as a confound, Variant 2 tested each subject in only one session. Performance was slightly but significantly lower than in the former two experiments (79.2 ± 5.9% vs. 85.6 ± 5.3% in the main experiment). However, this result is still consistent with the qualitative conclusions of the previous experiments: even in Variant 2, performance remained above chance level (50%) and below ceiling level. Hence, they decided not to alter the experiment design and proceeded to Variant 3 to assess the effect of manipulations.

Based on these three tests, Variant 3 was conducted to test all the manipulated features of the memory stimuli. The manipulations made to the movie stimuli were (1) removing the sound; (2) presenting static, single frames instead of video shots with motion; (3) flipping the frames horizontally; (4) replacing the color with grayscale shades; (5) reversing the temporal sequence in which frames appear; and (6) occluding three randomly selected quadrants of the frame. They recorded the participants’ performance in the old/new recognition test across six sequential sessions, each with a longer retention time since the encoding stage. The six sessions took place right after encoding and around 24 h, 7 days, 30 days, 90 days, and one year later. Subjects participated in different numbers of sessions. The experimenters obtained the percentage of correct responses from 52 subjects in each combination of manipulation and session. The present study analyzes the data collected in this experiment.

2.2 Data Analysis: The Present Study

The data from experiment Variant 3 by Tang et al. (2016) came from 52 subjects and six sessions. Among them, we removed the data from six subjects who participated in two or fewer sessions. Only eight subjects participated in Session 6, which is few compared with the 16 participants in Session 5 and the larger numbers in earlier sessions. To keep a larger sample for the comparison across sessions, the observations from Session 6 were also excluded from the analysis. After these removals, the present analysis is based on a total of 165,860 responses from 46 subjects across five sessions. The later a session is, the longer the retention time between encoding and testing. We divided the number of correct answers by the total number of test trials to obtain the accuracy of participants’ performance in each condition, covering six kinds of manipulations: sound removal, static frames, horizontal flipping, color removal, temporal reversal, and occlusion of a random three-quarters of the image. Then, we analyzed the data with Bayes Factor t tests and a Bayes Factor analysis of variance (ANOVA) in RStudio (Morey & Rouder, 2015; R Core Team, 2020).
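The exclusion and accuracy steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the reported analysis: the record layout `(subject, session, condition, correct)`, the function names, and the toy data are all hypothetical, and the sketch assumes that the "two or fewer sessions" rule is applied before Session 6 is dropped.

```python
from collections import defaultdict

def filter_responses(responses, min_sessions=3, last_session=5):
    """Drop subjects with fewer than `min_sessions` sessions, then drop late sessions.

    `responses` is a list of (subject, session, condition, correct) tuples.
    Assumption: sessions are counted per subject BEFORE Session 6 is removed.
    """
    sessions_per_subject = defaultdict(set)
    for subject, session, _condition, _correct in responses:
        sessions_per_subject[subject].add(session)
    kept_subjects = {s for s, sess in sessions_per_subject.items()
                     if len(sess) >= min_sessions}
    return [r for r in responses
            if r[0] in kept_subjects and r[1] <= last_session]

def accuracy_by_cell(responses):
    """Accuracy = correct answers / total trials per (subject, session, condition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, session, condition, is_correct in responses:
        key = (subject, session, condition)
        total[key] += 1
        correct[key] += int(is_correct)
    return {key: correct[key] / total[key] for key in total}

# Hypothetical toy data: subject "s2" attended only two sessions and is removed.
toy = [
    ("s1", 1, "sound_off", True), ("s1", 1, "sound_off", False),
    ("s1", 2, "sound_off", True), ("s1", 3, "sound_off", True),
    ("s1", 6, "sound_off", True),
    ("s2", 1, "sound_off", True), ("s2", 2, "sound_off", False),
]
cells = accuracy_by_cell(filter_responses(toy))
for key in sorted(cells):
    print(key, cells[key])
```

Keeping one accuracy value per subject, session, and condition gives the per-cell scores that the t tests and the ANOVA then compare.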

First, we attempted to replicate the original study with Bayes Factor t tests. We compared the participants’ recognition accuracy between manipulated and not manipulated conditions, such as with or without sound, to examine the null hypotheses that the participants’ accuracies are the same between manipulated and not manipulated conditions. If the null hypotheses are rejected, the alternative hypotheses would be that there are significant effects of the manipulations on the participants’ episodic memory recalling performance.
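As a rough illustration of how a Bayes factor can be obtained from a t statistic, the sketch below uses the BIC approximation for a paired comparison. This is a swapped-in technique chosen because it needs only the standard library; it does not reproduce the default-prior Bayes factors computed in R for this study, it is two-sided rather than directional, and it assumes a within-subject (paired) comparison. The function names and the accuracy values are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and sample size for two equal-length lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n

def bic_bf10(t, n):
    """BIC approximation to the Bayes factor BF10 for a paired t test.

    BF01 is approximated by sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2),
    and BF10 = 1 / BF01. Computed on the log scale for stability.
    """
    log_bf01 = 0.5 * math.log(n) - (n / 2) * math.log1p(t * t / (n - 1))
    return math.exp(-log_bf01)

# Hypothetical per-subject accuracies with and without a manipulation.
unmanipulated = [0.72, 0.70, 0.75, 0.68, 0.74, 0.71]
manipulated = [0.66, 0.65, 0.70, 0.66, 0.67, 0.66]

t_value, n_subjects = paired_t(unmanipulated, manipulated)
print("t =", t_value, "BF10 (BIC approx.) =", bic_bf10(t_value, n_subjects))
```

A value above 1 favors a difference between conditions and a value below 1 favors the null; with t = 0 the approximation reduces to 1/sqrt(n), so larger samples give stronger support for the null when no difference is observed.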

Second, to test the effect of time in relation to the different manipulations, we used the Bayes Factor ANOVA to compare accuracy across the five sessions, with the different conditions combined. The data were examined against the null hypothesis that participants’ accuracies are the same in the five sessions. The alternative hypothesis is that their accuracies are not the same over time, suggesting a significant effect of time on memory performance. In addition to the overall effect of time, we tested the interaction between time and each variable with the Bayes Factor ANOVA. We compared accuracy across sessions in each of the 12 conditions (with/without manipulation × six types of manipulation). We examined the data against the null hypothesis that, in each condition, participants’ accuracies are the same across sessions. If the null hypothesis were rejected, we would accept the alternative hypothesis that retention time has a significant interaction with the influence of the material-related variables.

3 Results

In the six Bayes Factor t tests on the six types of manipulations, the null hypothesis (H0) is that there is no difference in recognition accuracy between the manipulated and unmanipulated conditions for each manipulation, and the alternative hypothesis (H+) is that there is a difference. For the manipulation of sound removal, the Bayes factor indicates evidence for H+. As shown in Table 1, BF+0 = 4.27E+23, meaning that the response data are approximately 4.27 × 10^23 times more likely to occur under H+ than under H0. This indicates strong evidence in favor of H+. The result is similar for the variables of motion (BF+0 = 2.64E+44), temporal reversal (BF+0 = 6.19E+17), and 75% image occlusion (BF+0 = 7.69E+48). These tests all provide strong evidence for H+, namely that these manipulations have significant effects on the participants’ accuracy in the recognition test. For the manipulation of horizontal flipping, BF+0 = 6.53, meaning that the data are approximately 6.5 times more likely under H+ than under H0. This result indicates moderate evidence for H+. The test on color removal indicates strong evidence in favor of H0: BF+0 = 0.09, which means that the data are about 11.1 times more likely to occur under H0 than under H+.
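The verbal readings of the Bayes factors above follow a simple rule: a value of BF+0 below 1 is inverted to give the odds in favor of H0, and the size of the ratio determines the label. The sketch below makes that rule explicit. The function name is hypothetical, and the cut-offs (3 and 10) are an assumption based on commonly used conventions that appear consistent with the labels in this section; finer-grained schemes also exist.

```python
def describe_bf(bf_plus0):
    """Return (favored hypothesis, likelihood ratio, verbal label) for BF+0.

    Assumed cut-offs: ratio < 3 anecdotal, 3 to < 10 moderate, >= 10 strong.
    """
    if bf_plus0 <= 0:
        raise ValueError("A Bayes factor must be positive.")
    if bf_plus0 == 1:
        return "neither", 1.0, "no evidence"
    favored = "H+" if bf_plus0 > 1 else "H0"
    # Express the evidence as "how many times more likely" under the favored side.
    ratio = bf_plus0 if bf_plus0 > 1 else 1.0 / bf_plus0
    if ratio < 3:
        label = "anecdotal"
    elif ratio < 10:
        label = "moderate"
    else:
        label = "strong"
    return favored, ratio, label

# Values reported above for horizontal flipping and color removal.
for name, bf in [("flip", 6.53), ("color", 0.09)]:
    favored, ratio, label = describe_bf(bf)
    print(f"{name}: {label} evidence for {favored}, ratio about {ratio:.1f}")
```

Applied to the reported values, this reproduces the moderate evidence for H+ in the flipping test (6.53) and the strong evidence for H0 in the color test (1 / 0.09, about 11.1).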

Table 1 Bayes Factor t test and ANOVA

In the Bayes Factor ANOVA tests comparing the recognition accuracies across the five testing sessions, the null hypothesis (H0) is that the recognition accuracy is the same across all the time sessions. As illustrated in Fig. 1, this test indicates no evidence for H+. Specifically, BF+0 = 0.08, which means that the data are approximately 12.5 times more likely to occur under H0 than under H+, so it indicates strong evidence for H0.

Fig. 1
An error-dot graph plots the mean and SE of accuracy across sessions. Estimated minimum, median, and maximum values by session: Session 1: 0.680, 0.690, 0.750; Session 2: 0.679, 0.699, 0.770; Session 3: 0.680, 0.700, 0.750; Session 4: 0.665, 0.685, 0.720; Session 5: 0.630, 0.670, 0.700.

The recognition accuracy across each time session

As for the Bayesian ANOVA on the interaction between time and the 12 manipulation conditions, the null hypothesis (H0) is that, in each condition, the recognition accuracy is the same across all time sessions. As displayed in Fig. 2, the tests in most of the 12 conditions indicate no evidence for H+, the exception being the condition with 75% frame-area occlusion. In particular, without sound removal (“sound_on”), BF+0 = 0.42, meaning that the data are approximately 2.4 times more likely under H0, which is anecdotal evidence in favor of H0. With sound removal (“sound_off”), BF+0 = 0.08, meaning that the data are around 12.5 times more likely under H0, which is strong evidence for H0. Similarly, for motion removal, the tests indicate strong evidence for H0 with the manipulation (“static_on”, BF+0 = 0.07) and moderate evidence for H0 without it (“static_off”, BF+0 = 0.19). For horizontal flipping, there is strong evidence for H0 with the manipulation (“flip_on”, BF+0 = 0.04) and anecdotal evidence for H0 without it (“flip_off”, BF+0 = 0.37). There is moderate evidence for H0 both with color removal (“color_on”, BF+0 = 0.22) and without it (“color_off”, BF+0 = 0.28). There is anecdotal evidence for H0 both with and without temporal reversal (“reverse_on”: BF+0 = 0.45; “reverse_off”: BF+0 = 0.6). Without 75% area occlusion (“occlude_off”), BF+0 = 1.92; because this value is above 1, it leans toward H+ rather than H0, but only anecdotally, so it remains inconclusive. However, in the condition with 75% occlusion (“occlude_on”), BF+0 = 370,393.2, meaning that the data are over 370,000 times more likely to occur under H+ than under H0, indicating strong evidence for H+.

According to a post-hoc Tukey test comparing the recognition accuracies among the five sessions in the “occlude_on” condition, the accuracy in Session 3 was significantly higher than that in each of the other four sessions (p < 0.001), while there was no significant difference between any other pair of sessions (p > 0.5) (see Figs. 2, 3, and 4).

Fig. 2
Six error-dot graphs plot the mean and SE of accuracy for different manipulations: sound on and off, static off and on, and flip off and on. The median value eventually decreases in all graphs. With sound on, the median value gradually decreases. For the others, the accuracy increases and finally decreases.

The recognition accuracy across each time session in each manipulation condition

Fig. 3
Six scatter plots of correlations. With sound and static manipulations, R = 0.79. With flip and gray manipulations, R = 0.82 and 0.73. With reverse and occlude manipulations, R = 0.79 and 0.68, respectively. All plots have an increasing trend.

Correlation between manipulations for each variable

Fig. 4
Ten scatter plots of the correlation between sound off and on, and static off and on manipulations in session IDs 1 to 5. All plots have an increasing trend.

Correlation between manipulations for each variable in each session

4 Discussion

Regarding the six manipulations of movie content, our test results support effects of sound, motion, sequence reversal, and occlusion on episodic memory performance as reflected in recognition tests. The results also provide evidence for an effect of horizontal flipping, but this evidence is weaker. The results indicate no effect of color removal on episodic memory performance. The replication attempt validates the significant effect on memory performance of the elements rated as “high-level manipulations” in the original study and adds to them the manipulation of motion removal (Tang et al., 2016). Our analyses also support the finding that the effect of the two “low-level manipulations” was less prominent than that of the other elements in interfering with episodic memory formation and recollection. Hence, the present result is consistent with the original study, and the evaluation of the manipulations of the memory materials was successfully replicated.

Concerning the effect of time, the present study found no evidence for a change in episodic memory performance over varied retention time. As for the interaction effect between time and manipulations, the results indicate no evidence for varied performance across time in most conditions except for the condition with the occlusion manipulation. In this condition, the third session showed prominently better performance than the other sessions. Nonetheless, the results provided no evidence for a worsening performance over time: in the only condition in which the performance altered significantly with time, the fluctuation of performance did not exhibit a simple decrease with longer retention time, but one peak level in the middle of the sessions. In other words, there was an increase followed by a decrease. Moreover, the level of performance in the last session was not significantly different from the level in the earliest two sessions. Therefore, the results indicate no significant decrease across time sessions, against our hypothesis based on memory’s fading with time.

4.1 The Effect of Manipulations on Stimuli

Based on an experiment conducted with movie stimuli, these findings add to the growing body of literature, acquired through different measures, on how features of memorized materials influence episodic memory encoding. For instance, the experiment of Vogt and Magnussen (2007) exemplified another feature of memory content that potentially influences the level of memory formation. In that study, researchers asked participants to discriminate studied pictures from distractors that were new to them, using images of different doors. The experimenters compared discrimination accuracy between original pictures and pictures with some extraneous details removed, and found that the group tested without the details performed 20% worse. Hence, it is inferred that object details facilitate pictorial memory encoding of scenes by providing richer recognizable information.

Vogt and Magnussen (2007) selected images that carry the same motif, doors, thus controlling the variable of image content. They could also edit the pictures to include or exclude irrelevant details, thus directly manipulating the exact variables under evaluation. Individual, static stimuli such as words and images of faces, objects, or scenes are effective and straightforward choices as memory test stimuli. With this kind of stimulus, the variables involved are relatively easy to control. Each unit of material requires relatively little time or cost for encoding and testing, so the recall process can be repeated many times to reach more reliable conclusions. However, it has been argued that conclusions drawn from separate words or static images may not generalize convincingly to real-life events that involve sequential narratives, because temporal and spatial information can be critical in natural contexts (Lee et al., 2020; Tang et al., 2016).

On the other hand, compared with individual stimuli in a laboratory setting, inferring knowledge from the recall of real-life memories offers a natural context but makes controlling variables more challenging. Multiple studies focusing on the autobiographical memories of amnesic patients have contributed to knowledge about the brain areas related to episodic memory operations (Bayley et al., 2003; Rosenbaum et al., 2008). Nevertheless, because people’s experiences in daily life vary so widely, studying actual autobiographical memory makes it hard to appropriately control variables such as the participants’ practice, exposure to cues, or reproduction (Tang et al., 2016). It is therefore difficult to study these stimuli systematically or to include a large number of tests.

Apart from empirical experiments and reference to neuropsychological cases, a more sophisticated measure is transcranial magnetic stimulation (TMS), which stimulates neurons in different areas of the cerebral cortex with magnetic pulses that induce electric currents, allowing the affected brain functions to be observed. It has recently been argued that TMS makes it possible to test accounts of the episodic memory network non-invasively (Hebscher & Voss, 2020). Nevertheless, it may still be some time before this technology can be applied widely to episodic memory (Pascual-Leone et al., 2000).

Synthesizing the above-mentioned characteristics, an alternative approach that balances representativeness of natural situations with feasibility of manipulation and repeated application would be preferable for investigating episodic memory properties in empirical experiments. Hence, we propose movies as an accessible and effective source of material. Researchers can obtain narrative memory materials from movies, ranging from audiovisual video clips to sequential frames and story plots.

The use of moving images to investigate memory recall can be traced back to Boring (1916), who suggested that witnessing criminal events is like encoding movies into memory. Thus, by testing witnesses’ capacity to report moving pictures, one can infer the reliability of their reports of crimes.

Although the approach based on repetition of randomized, separate stimuli has long prevailed in episodic memory studies, the use of images containing narratives has recently been reported to be increasing (Lee et al., 2020; Magliano et al., 2017). For instance, one experiment asked participants to fill in gaps among sequential narrative images to explore the implications for memory recall and inferential processing (Magliano et al., 2017). Compared with individual words or images, such narrative materials carry information about temporal sequence, spatial environment, engaging plots, emotions, and so on (Tang et al., 2016). Apart from the “ecological validity” that Lee et al. (2020, p. 111) summarized, narratives arguably engage brain structures and processes beyond those engaged by content without narratives. For instance, regarding sequential images, a recent paper pointed to the complexity of the bidirectional information transfer between visual narratives and episodic memory (Cohn, 2020). Accordingly, despite the ubiquity of visual narratives, such as in instructional pictures and comic strips for children, their creation and understanding require a proficiency that is obtained through complicated learning and brain functions. Hence, we believe that movie cuts, a form of narrative containing both visual and auditory information, can be a kind of material that both simulates real-life events and makes variables convenient to control.

4.2 The Effect of Time Sessions

As for the second part, the results of the present study do not support the hypothesis that episodic memory performance worsens with longer time after encoding. Much previous empirical evidence has favored this hypothesis, including studies using similar stimuli, namely movies, and studies using different materials as memory content. As an example with a similar choice of stimuli, Furman et al. (2007) asked participants to watch a 27-min movie and tested their performance in recall and recognition tasks and their metamemory confidence about events in the movie. They compared the test responses after delays ranging from three hours to nine months, and the results indicated worse performance with longer delays (Furman et al., 2007). As for experiments using different materials, Andermane and Bowers (2015) compared participants’ accuracy in distinguishing studied images of static objects from foil images after one week’s retention with their performance right after studying the images. Using thousands of images of different objects, they found a significant reduction in visual long-term memory recall following the week’s retention, thus supporting the hypothesis. In comparison, the present analysis failed to obtain evidence for memory decline with time.

One probable contributor to this result lies in the methodological limitation of repeated exposure. As explained in the Methods, the participants were repeatedly exposed to testing stimuli, which were movie frames from the same two episodes. Even though no frame was presented to them more than once and no feedback was provided following any response, they may have obtained cues from the events in the movie plot or from other features of the narratives, or become more proficient in making similar judgments over repeated test sessions. They were likely to perform better over time due to the repetition alone, thus counteracting the effect of time that worsens performance. Variant 2 verified a small but significant effect of repeated exposure: performance was lower under a design that tested each participant in only one session than under the design with repeated testing. Although the performance level under this limitation is still valid for examining the manipulated variables of the material, it may cause a quantitative difference that interferes with the investigation of the effect of retention time. Therefore, to obtain more valid measurements of the effect of time, a follow-up study could adopt the alternative experiment design of Variant 2 to eliminate the interference of repeated exposure to the stimuli.

We acknowledge that, apart from the variables of material manipulation and retention time examined here, another group of factors thought to affect episodic memory recall consists of features inherent in the tested subjects. Factors of this type include gender, age, cognitive abilities, and the participants’ knowledge prior to the memory encoding concerned. For instance, the experiment by Grysman (2017) illustrates the difference between genders. His investigation indicates that females perform better than males in recalling details of autobiographical information. It specified that females’ advantage lies at an early stage after encoding. After that period, the rates of memory decline do not differ between females and males. Therefore, another limitation of the present study is that it did not evaluate the effect of these potential confounders on episodic memory performance.

5 Conclusion

Based on the response data collected by Tang et al. (2016), the present study aimed to replicate the previous result on the manipulations’ effects on recall performance and to further investigate the element of time. First, as the previous result was analyzed with a two-sided non-parametric permutation test, we replicated the previous analysis with Bayes Factor t tests and ANOVA, and added one variable, static versus motion, which was not analyzed in the previous research. The variable evaluation of the original experiment was successfully replicated, indicating significant effects of sound, reversal, occlusion, and the additional manipulation, motion. Second, the present research differs from the previous study in investigating the effect of retention time. Although Tang et al. (2016) collected experimental data at varied retention times, they did not analyze the effect of time, and existing research on the effect of time has not examined its interaction with material manipulations (Andermane & Bowers, 2015; Furman et al., 2007). Therefore, the present study examined the performance difference across retention times in each manipulated condition, as well as the interaction effect. The results did not show an effect of time as strong as that reported in former research.