Introduction

Embodied and grounded cognition theories (Barsalou, 1999, 2008) state that sensorimotor information plays a central role in cognitive functioning. Over recent decades, an increasing number of studies have been dedicated to this hypothesis and have demonstrated the influence of the sensorimotor system on high-level cognitive functions such as language, reasoning and memory (see, for example, Danker & Anderson, 2010; Gallese & Lakoff, 2005; Tomasino & Gremese, 2016). Memory in general is undoubtedly linked to action and sensorimotor activity. To describe human motor control and learning based on perceptual experience, numerous models have regarded action as the beginning of higher-level cognitive processes, from William James’s Ideomotor Theory (1890) to the modern Theory of Event Coding (Hommel et al., 2001). More directly related to declarative knowledge, functionalist models of memory (also referred to as multiple-trace models, e.g. Briglia et al., 2018; Versace et al., 2014; see also Hintzman, 1984, 1986) conceptualize memory functioning as the integration and reactivation of emotional, sensory and motor experiences. Thus, contrary to the classic structuralist approach (e.g. Tulving, 2001), functionalist models do not focus on distinctions between various long-term memory sub-systems but rather on the gradual construction of knowledge from elementary stimulations. In this view, sensorimotor primitives are the basis from which declarative knowledge is made and reenacted. Memory encoding corresponds to the integration of a particular sensorimotor pattern, while retrieval corresponds to its online reenactment. Many experimental works support this hypothesis and have shown how the recollection of memories entails the reenactment of past sensory experiences (see, for example, Brunel et al., 2009, 2013; Cortese et al., 2019; Rey et al., 2015).

Regarding the specific contribution of the motor system to declarative memory, several results can be considered. First, the positive role of motor activity in action-related memory tasks was understood before the emergence of embodied cognition theories. Indeed, motor enactment is a well-known feature of memory functioning that was originally described in the early 1980s by several independent research teams (Cohen, 1981; Engelkamp & Krumnacker, 1980; Saltz & Donnewerth-Nolan, 1981). Traditionally, it involves comparing memory performances for short action sentences (e.g., “Cross your fingers”, “Sharpen the pencil”) learned either simply through reading or through mimicking the action gestures (i.e., a “Subject Performed Task” condition; for a review, see Engelkamp, 2001; Nilsson, 2000). Better memory performance is usually observed following the Subject Performed Task condition than following the control-reading condition. The enactment effect is regarded as reliable and occurs in various experimental designs (e.g., recognition and free recall, with intentional or incidental learning). However, how exactly it enhances memory performance is still under debate. In particular, the extent to which the motor system contributes to the enactment effect is unclear. On the one hand, performing an action rather than merely reading it involves a greater number of encoding modalities (Bäckman et al., 1986; Engelkamp & Zimmer, 1984). Thus, sentence reading activates only verbal and visual encoding, whereas physically performing a task also fosters motor encoding. On the other hand, the Subject Performed Task condition enhances the relation processing of the action’s items. Hence, conceptual and semantic processing is improved in the Subject Performed Task condition compared to reading (Kormi-Nouri & Nilsson, 2001), perhaps so that the participant can correctly plan and execute the activity (Koriat & Pearlman-Avnion, 2003).
For example, performing the action “Take the fork and put it left of the plate” involves deep processing of the relationship between “fork” and “plate” to comprehend their interaction in the present context and therefore correctly execute the action. In contrast, more superficial processing could be sufficient for merely reading a sentence, as this can be completed through a verbatim transformation of written information to speech and does not require deep processing of how “fork” and “plate” interact in the given situation.

To assess whether motor coding or items relation processing is decisive for the enactment effect, some studies have compared performances obtained following a Subject Performed Task condition and action observation. Notably, learning can be realized following an “Experimenter Performed Task” condition, i.e., asking the participants to learn action sentences performed in front of them by the experimenter. Most of the time, memory performance in an Experimenter Performed Task condition is better than that in a control-reading condition and similar to that found in a Subject Performed Task condition (Engelkamp & Dehn, 2000; Feyereisen, 2009; Hainselin et al., 2017). Quite surprisingly, such results are often interpreted as evidence that motor coding is not responsible for the enactment effect. For example, Steffens (2007) proposed that observing complex goal-directed actions improves encoding compared to written information because it allows participants to better focus on items relation processing (for a similar account, see Kubik et al., 2014; Schult et al., 2014; Steffens et al., 2015). An underlying assumption is that, contrary to the Subject Performed Task, the Experimenter Performed Task does not involve the motor system and therefore should not lead to similar performances if motor coding contributes to memorization.

In our view, this conceptualization is problematic because it does not account for considerable evidence showing that action observation involves the activity of the motor system (see, for example, Avenanti et al., 2013; Cracco et al., 2018; Grèzes & Decety, 2001). From a neurophysiological perspective, action observation clearly elicits activation of the mirror neuron system (Rizzolatti & Sinigaglia, 2016). Recently, a large-scale meta-analysis (Hardwick et al., 2018) confirmed overlapping activation for executed and observed actions in a network including premotor, parietal, somatosensory and subcortical brain areas. Accordingly, Jeannerod (2001) proposed that the understanding of perceived actions is based on motor simulation (i.e., a covert activity of the motor system), as opposed to overt activity, which corresponds to actual movement realization. Thus, action observation involves processing motor information that is likely to be encoded for memory purposes. Considering the close relationship between executed and observed actions, it is not surprising that both lead to rather similar performances. Even if it does not rely on overt motor activity, the Experimenter Performed Task condition cannot be regarded as a condition free of motor coding. Therefore, it does not rule out the motor coding account of the enactment effect in classical studies about the Subject Performed Task. More directly related to this work, there is also no evidence that motor simulation is not, at least partly, responsible for memory improvement when the Experimenter Performed Task is compared to reading.

The present work aimed to determine how action observation improves action-related memory performance. While we do not question that items relation processing has a role in certain contexts, we do wonder if other processes might come into play. Arguably, Experimenter Performed Task studies not only favor items relation processing but also elicit the encoding of covert motor activation. Indeed, action observation comes with activation of premotor and motor areas (Hardwick et al., 2018) and elicits implicit motor imagery (Jeannerod, 2001). Compared to a read action, an observed action would therefore contain another coding modality likely to improve memory performance. Such reasoning is indeed quite similar to the original account of the enactment effect proposed by Engelkamp and Zimmer (1984), except that it concerns the coding of covert rather than overt motor activity. To assess whether such motor simulation contributes to improved memory performance, it is crucial to avoid any influence of items relation processing at encoding. To do so, we used action observation as an encoding condition for isolated action verbs rather than complete action sentences or complex goal-directed actions. Indeed, relation processing is only possible when an action involves several items on which the participants can be more or less focused. In such situations, it is assumed that observation leads to better encoding than reading because it is easier to process the items’ relationship (Steffens, 2007; Steffens et al., 2015). However, when the action depicts only one isolated verb, no relation processing can be improved. As a result, better memory performance for isolated action verbs in the observation condition entails that other processes come into play. In contrast, if isolated actions and written definitions lead to similar performances, this would suggest that only complex actions with several items improve memory encoding and would argue for an account based on items relation processing only.

Another specificity of the present work is the material used to depict observed actions. Contrary to Experimenter Performed Task studies, actions were not realized in front of the participant by an experimenter but rather displayed with short video clips of point-light human movement. Movement depictions using point-light displays provide various advantages over the Experimenter Performed Task. Indeed, they are quickly and easily identified (Johansson, 1973; for a review, see Blake & Shiffrar, 2007; Pavlova, 2012) and have been successfully used to investigate the close relationship between action observation and language processing (Beauprez & Bidet-Ildei, 2018, 2019; Bidet-Ildei & Toussaint, 2015). From a methodological point of view, point-light display video clips allow for better standardization of the observed actions than the Experimenter Performed Task. Moreover, point-light movements can be modified to investigate the role of specific features of action. For example, Beauprez and Bidet-Ildei (2018) used modified point-light displays in a semantic decision task on action verbs and found that biological point-light movements influence performance, whereas nonbiological point-light movements (i.e., dots with similar spatial trajectories but modified kinematics) do not. Modified kinematics present an innovative way to investigate how actions are processed and encoded for memory purposes. Indeed, biological point-light displays represent normal movements and are therefore processed through the motor system. In contrast, it has been proposed that nonbiological point-light displays do not elicit such processing (Beauprez & Bidet-Ildei, 2018; Bouquet et al., 2007; Martel et al., 2011). As a consequence, if the encoding of sensorimotor information is involved in memory, then it is expected that performance will be better when observing biological movements (i.e. normal kinematics) than when observing nonbiological movements.

In Experiment 1, we assessed whether the observation of point-light human actions improves memory performance for congruent action verbs. We hypothesized that recall would be better when action verbs were learned with a congruent point-light action rather than with a congruent written definition. In Experiment 2, we directly addressed the role of a particular feature of observed action in memory. The kinematics were modified (i.e. inverted velocity) in half of the point-light displays. Better recall performance was expected for verbs encoded with a biological action (i.e., point-light display with normal kinematics) than for those encoded with a nonbiological action (i.e., point-light display with modified kinematics).

Experiment 1

Methods

Participants

A priori calculation of the sample size was made with G*Power (Faul et al., 2007) based on Cohen’s (1988) recommendation for a medium effect size (f = 0.25). G*Power indicated that 54 participants should be sufficient to detect such an effect. Eventually, 62 French-speaking participants (age range 20–40 years; mean = 24 years; 45 women) participated in the online experiment. Most of them were university students or young workers. All reported French as their native language. None declared psychiatric or neurologic history, drug use, or learning disorder. They were recruited through social networks or mailing lists. The study conformed to the Declaration of Helsinki and was approved by the University of Poitiers Ethics Committee. The presentation messages sent to request participation mentioned that participants would rate how point-light videos or written descriptions fit with different verbs. No information about the following memory task was provided. All participants gave their informed consent before the experiment commenced.

Materials

We used thirty-two action verbs as stimuli for the learning task. All were in the French infinitive. Their lexical frequency and length were assessed using the LEXIQUE online database (New et al., 2001). For all verbs, we selected a point-light display that depicted the equivalent action (e.g., for the verb “to walk,” a point-light display of a walking man) and a written definition from the dictionary (e.g., for the verb “to walk,” the written definition “to move at a regular pace by lifting and setting down each foot in turn”). The point-light displays came from the PLAViMoP online database and are freely available at the following website: https://plavimop.prd.fr/en/motions. Dictionary definitions were taken from the LAROUSSE online French dictionary at https://www.larousse.fr/. When several definitions were given, we retained the one that fit, as closely as possible, the action depicted by the equivalent point-light display. Finally, we created two versions of an online experiment on Limesurvey (https://www.limesurvey.org/fr/). Both contained 16 verbs associated with their corresponding point-light display and 16 associated with their written definition. The verb-modality association was counterbalanced, i.e., each verb was associated with a different modality depending on the questionnaire version (see the supplementary material). Moreover, 8 distractors were added to the task to ensure that the participants were focused and did not provide random answers. Contrary to the normal items, the distractors were associated with incongruent point-light displays (e.g., the verb “to lie down” with a point-light display of a man bouncing a ball) or incongruent written definitions (e.g., the verb “to laugh” with the written definition “to move from a higher to a lower level, typically rapidly and without control”—a definition that normally corresponds to the verb “to fall”). 
Half of the distractors were incongruent verb-point-light display items, and the other half were incongruent verb-written definition items. They were not incorporated into the free recall analysis but were used as exclusion criteria (see the data analysis).
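The counterbalancing described above can be illustrated with a minimal sketch. It assumes a random split of the verb list into two halves; the actual verb-modality assignment used in the study is given in the supplementary material, and the function name `build_versions`, the labels "point-light" and "definition", and the `seed` parameter are hypothetical.

```python
import random


def build_versions(verbs, seed=0):
    """Return two questionnaire versions with swapped verb-modality pairings.

    Assumption: the split into halves is random; the study's real assignment
    is listed in its supplementary material.
    """
    if len(verbs) % 2 != 0:
        raise ValueError("an even number of verbs is needed for equal halves")
    rng = random.Random(seed)
    shuffled = list(verbs)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]

    # Version A: first half with point-light displays, second half with definitions.
    version_a = {v: "point-light" for v in first}
    version_a.update({v: "definition" for v in second})
    # Version B: the same verbs with the modalities swapped.
    version_b = {v: "definition" for v in first}
    version_b.update({v: "point-light" for v in second})
    return version_a, version_b
```

With 32 verbs, each version contains 16 verbs per modality, and every verb appears with a different modality in the two versions.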

Task and procedure

When participants clicked the link for the survey, they were randomly assigned to one of the two versions. The instruction related to only the categorization task, which was in fact also an incidental learning phase. The participants believed that they would need to rate only the fit between verbs and their depictions through point-light displays or written definitions. They were also instructed to complete the tasks alone, in sequence and far from any possible noise or disturbance. During the learning phase, action verbs were randomly displayed. Half of them were simultaneously shown with a written definition, and the other half came with a point-light display. The participants rated the extent to which a written definition or point-light displays accurately described a verb with a 4-point Likert scale. A rating of 1 corresponded to a very bad description, a rating of 4 corresponded to a very good description. Subsequently, personal data related to the participants were collected. This phase also acted as a distractive task. The times spent in the learning and distractive phases were collected and used as an exclusion criterion (see the data analysis). As soon as these phases ended, the participants then began an unexpected free recall task. They were instructed to write as many verbs as they could remember from the previous rating task they had completed, without time limitation. The data are available at https://osf.io/894uh/.

Results

Data analysis

Two types of data were used as exclusion criteria. First, it was necessary to exclude participants who likely completed the categorization task with random answers. For this purpose, we analyzed the responses provided to the eight distractor items, which contained incongruent verb-modality descriptions. Participants who rated at least one distractor as “very good” or “rather good” were excluded from the subsequent analysis (N = 2). Second, we analyzed the times spent in the learning and distractive phases to control for their durations as much as possible. Mean learning time by item (16 s, SD = 14) and distractive task duration (127 s, SD = 69) were analyzed. All participants for whom the duration was above M + 1 SD or below M − 1 SD for at least one of these measurements were excluded from the dataset (1 exclusion for a longer learning time and 5 exclusions for a longer distractive task). Eventually, 54 participants were included in the data analysis, which consisted of one-way analysis of variance (ANOVA) with learning modality (point-light display/definition) as a within-subjects factor.
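As an illustration only, the two exclusion criteria can be sketched as follows. The sketch assumes that ratings of 3 and 4 on the 4-point scale correspond to “rather good” and “very good”, and that the SD is the sample standard deviation; neither detail is stated in the text. The dictionary keys and function names are hypothetical.

```python
from statistics import mean, stdev

# Assumption: on the 4-point scale, 3 = "rather good" and 4 = "very good".
GOOD_RATING_THRESHOLD = 3


def outside_one_sd(values):
    """Flag each value lying above M + 1 SD or below M - 1 SD."""
    m, s = mean(values), stdev(values)  # sample SD (assumption)
    return [abs(v - m) > s for v in values]


def included_ids(participants):
    """Return the ids of participants who pass both exclusion criteria."""
    learn_flags = outside_one_sd([p["learning_time"] for p in participants])
    task_flags = outside_one_sd([p["distractive_time"] for p in participants])
    kept = []
    for p, bad_learning, bad_task in zip(participants, learn_flags, task_flags):
        # Criterion 1: at least one incongruent distractor rated as a good fit.
        rated_distractor_good = any(
            r >= GOOD_RATING_THRESHOLD for r in p["distractor_ratings"]
        )
        # Criterion 2: learning or distractive duration outside M +/- 1 SD.
        if not (rated_distractor_good or bad_learning or bad_task):
            kept.append(p["id"])
    return kept
```

Both duration criteria are computed over the full sample before any exclusion, which is one possible reading of the procedure described above.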

Categorization answers

We used the paired samples t test to assess whether the mean ratings given to the point-light displays and written definitions were equivalent. The analysis showed no difference between the two types of depictions (point-light displays mean rating = 3.38, SD = 0.34; written definition mean rating = 3.43, SD = 0.28; t(53) = 0.87; p = 0.39).
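For readers unfamiliar with the test, the paired-samples t statistic can be sketched with the standard library alone. This is a generic illustration of the textbook formula, not the software used in the study; the function name `paired_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for a paired-samples t test on two equal-length lists.

    t is the mean of the pairwise differences divided by its standard error.
    Raises ZeroDivisionError if all differences are identical.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1
```

With 54 participants the test has 53 degrees of freedom. For a within-subjects factor with only two levels, the one-way ANOVA F(1, n − 1) equals the square of this t value.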

Study time

Analysis of the study time for the point-light display and written definition modalities indicated no significant difference [F(1, 53) = 0.15, p = 0.69]. The participants did not spend more time studying verb-point-light items (mean = 15 s, SD = 4 s) than verb-written definition items (mean = 14 s, SD = 5 s).

Free recall

Analysis revealed a main effect of learning modality [F(1, 53) = 14.84, p < 0.01, \(\eta_{p}^{2}\) = 0.22]. The rate of correctly recalled verbs was higher in the point-light displays modality (mean = 44%, SD = 16%) than in the written definition modality (mean = 36%, SD = 15%) (Table 1).
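The reported effect size can be recovered from the F statistic and its degrees of freedom with the standard identity \(\eta_{p}^{2} = F \cdot df_{effect} / (F \cdot df_{effect} + df_{error})\). The short sketch below applies it; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Free recall effect of Experiment 1: F = 14.84 with 1 and 53 degrees of freedom.
effect_size = partial_eta_squared(14.84, 1, 53)
```

Rounded to two decimals, this gives 0.22, in line with the value reported for the free recall analysis.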

Table 1 Mean categorization rating, study time and percentage of correct recall for verbs of the point-light displays and written definition conditions

Discussion of experiment 1

The finding from Experiment 1 indicates better free recall performance for the verbs encoded with point-light displays compared to written definitions. The participants rated both conditions as equally good descriptions of the verbs (i.e., categorization answers), and no difference in the study times was observed. Consequently, the results in the free recall task do not originate from the different qualities of verb depictions or variations in study times.

Because observing and understanding one’s actions rely on a mirror neuron system (Rizzolatti & Sinigaglia, 2016) and motor simulation (Jeannerod, 2001), the point-light condition involves activation of the motor system. Therefore, improved free recall indicates that the encoding of covert motor activity occurs and is beneficial. In addition, the positive effect of action observation for isolated verbs shows that items relation processing is not required to improve memory performance. Even if relation processing does contribute when complex actions are learned (Steffens, 2007; Steffens et al., 2015), implicit motor simulation is encoded and strengthens the memory trace compared to simple reading.

However, Experiment 1 had two main limitations. First, the typicality of the stimuli (i.e. definitions and point-light displays) may have been unequal for the participants. Written definitions are often encountered in daily life, while point-light human actions are original and possibly intriguing. Thus, the enhanced distinctiveness of point-light displays relative to written definitions could have influenced the results for the free recall task (Hunt et al., 2006; Schmidt, 1991). Second, the number of encoding modalities involved in each condition could explain our results. On the one hand, encoding a verb with a dictionary definition entails that only verbal written information is encoded. On the other hand, encoding a verb with a point-light display could rely on verbal written information when reading the verb and on visual action processing while observing the point-lights. Indeed, one could argue that the point-light displays condition corresponds to dual coding, which will likely improve memory performance (Paivio, 1971). A similar limitation is in fact always present in studies about the Experimenter Performed Task. However, comparing biological and nonbiological point-light display video clips overcomes this issue. Because the action is always learned through observation, verbal and visual coding are possible in both conditions. The only change concerns the biological or nonbiological (i.e. kinematic inversion) nature of the depicted actions, and therefore their processing through the motor system.

Experiment 2

Experiment 1 indicated that point-light displays lead to better encoding than written definitions. The aims of Experiment 2 were twofold. First, we assessed whether biological kinematics contribute to increased memory performance. Classically, the positive effects observed for point-light display stimuli are directly related to kinematics. In contrast to biological kinematics, point-light displays with nonbiological kinematics do not improve action identification (Martel et al., 2011) or action verb processing (Beauprez & Bidet-Ildei, 2019). The level of implicit motor imagery they elicit can be regarded as lower. Second, we addressed the methodological limitations of Experiment 1. Depictions of the verbs were given only with point-light display video clips. Moreover, nonbiological point-light displays are arguably more unusual and distinct than biological ones (see the Supplementary material). The free recall of action verbs encoded with a normal point-light display (biological movement condition) or a kinematic-modified point-light display (nonbiological movement condition) was compared. Better free recall in the biological point-light displays condition would suggest that action observation enhances memory because motor information is encoded. In contrast, if only dual coding drove the beneficial effect of the point-light condition in Experiment 1, then no difference should be observed. Finally, if the effect was due to distinctiveness, nonbiological movements should lead to better free recall performance.

Methods

Participants

As biological and nonbiological point-light displays are more similar than point-light displays and written definitions, we expected a lower effect size in Experiment 2 than in Experiment 1. We decided to increase the sample size and to recruit 100 new French-speaking participants (age range 20–40 years; mean = 24 years; 45 women) for the online experiment. Most of them were university students or young workers. They reported French as their native language and declared no psychiatric or neurologic history, drug use, or learning disorder. They were recruited through social networks or mailing lists. The study conformed to the Declaration of Helsinki and was approved by the University of Poitiers Ethics Committee. The presentation messages sent to request their participation mentioned that the participants would rate how videos fit with different verbs. Similar to Experiment 1, no information about the subsequent memory task was offered. All participants gave their informed consent before the beginning of the experiment.

Materials

The verbs and point-lights depicting biological movements were identical to those used in Experiment 1. Nonbiological point-light displays were modifications of the original ones. To make the point-lights depict nonbiological movements, we used the PLAViMoP software (Decatoire et al., 2019). The norm of the velocity of each point was inverted with respect to the mean norm of the original velocity. As a consequence, the tangential velocity of the different dots depicting the movement was modified (Fig. 1), but the original dot paths and durations were not changed (Fig. 2).

Fig. 1 Velocity (cm/s) for one dot of biological (a) and nonbiological (b) walking movement from the beginning (0 s) to the end of the action (5.1 s)

Fig. 2 Spatial trajectory of one dot of biological (a) and nonbiological (b) walking movement on a two-dimensional scale

Two versions of an online experiment were created on Limesurvey (https://www.limesurvey.org/fr/). Both contained 16 verbs associated with biological movements and 16 verbs associated with nonbiological movements. Similar to Experiment 1, the verb-modality association was counterbalanced, i.e., each verb was associated with a different modality depending on the questionnaire version (see the supplementary material). Eight distractors were also added to ensure that the participants paid attention to the task. Contrary to the normal items, the distractors were incongruent verb-movement associations (e.g., the verb “to lie down” with the point-light display of a man bouncing a ball). Half of the distractors were incongruent verb-biological movement items, and the other half were incongruent verb-nonbiological movement items. Distractors were not integrated into the free recall analysis but were used as exclusion criteria (see the data analysis).

Task and procedure

The task was similar to that of Experiment 1. The participants received identical instructions but were asked to rate only verb and video clip associations. No information regarding the biological or nonbiological movement conditions was included. The data are available at https://osf.io/894uh/.

Results

Data analysis

Two participants were removed because they did not understand the instructions for the free recall task. The exclusion criteria were similar to those used for Experiment 1. First, we excluded participants likely to have completed the categorization task with random answers. We analyzed the 8 distractor items that contained incongruent verb-modality descriptions. Participants who rated at least one distractor as “very good” or “rather good” were excluded from subsequent analysis (N = 3). Second, we analyzed the times spent in the learning and distractive phases to control for their duration as much as possible. The mean learning time by item (17 s, SD = 7) and distractive task duration (224 s, SD = 220) were analyzed. All participants for whom at least one of these measurements was above M + 1 SD or below M − 1 SD were excluded from the dataset (4 exclusions for longer learning times and 5 exclusions for longer distractive task durations). Eventually, 86 participants were included in the data analysis, which consisted of a one-way ANOVA with learning modality (biological/nonbiological point-light display) as a within-subjects factor.

Categorization answers

We used the paired-samples t test to assess whether the mean ratings given to the biological and nonbiological point-light displays were equivalent. The analysis revealed that biological movements were rated as better depictions of the verbs (mean rating = 3.7, SD = 0.22) than nonbiological movements (mean rating = 2.80, SD = 0.39; t(85) = 26; p < 0.01).

Study time

Analysis of the study times for the verbs of the biological and nonbiological modalities indicated that the participants spent more time studying nonbiological point-light displays (mean = 17 s, SD = 3.5 s) than biological point-light displays [mean = 14 s, SD = 3.5 s; F(1, 85) = 30.3, p < 0.01].

Free recall

The analysis revealed a main effect of learning modality [F(1, 85) = 5.88, p = 0.02, \(\eta_{p}^{2}\) = 0.06]. The rate at which verbs were correctly recalled was higher for the biological point-light display modality (mean = 50%, SD = 15.5%) than for the nonbiological point-light display modality (mean = 46%, SD = 15.5%) (Table 2).

Table 2 Mean categorization rating, study time and percentage of correct recall for verbs of the biological and nonbiological movement conditions

Discussion of Experiment 2

Experiment 2 was designed to assess the role of biological kinematics in action verb memory and to address the methodological limitations of Experiment 1. Better free recall was observed for verbs learned together with normal biological actions than for those learned with abnormal nonbiological actions. In addition, the participants spent more time rating nonbiological actions than biological ones. They also indicated that biological actions more closely corresponded to verbs than nonbiological actions. These results are certainly not surprising: nonbiological point-light displays are more difficult to identify and do not provide action representations equivalent to those of biological point-light displays. Moreover, our finding is consistent with Beauprez and Bidet-Ildei (2018), who reported the identification of nonbiological point-light displays to be more difficult than that of biological point-light displays.

In contrast to Experiment 1, Experiment 2 involved learning only verb-point-light items. Consequently, the effect on free recall cannot come from a mere difference in the number of encoding modalities. Indeed, one could argue that Experiment 1 compared a condition with one learning modality (i.e., only written information for verb-written definition items) to a condition with two learning modalities (i.e., written information and action observation for the verb-point-light items). Using two learning modalities in the point-light condition could be regarded as a dual-coding condition and therefore improve encoding (Paivio, 1971). This limitation was not present in Experiment 2. In Experiment 1, another alternative explanation originated from the distinctiveness of point-light displays. Human point-lights are uncommon and peculiar compared to written definitions. This distinctiveness may have led to improved memory for the point-lights (Hunt et al., 2006). However, the results of Experiment 2 argue against such an interpretation. Nonbiological point-lights are far more original, unusual, and even bizarre than biological ones. Accordingly, the distinctiveness effect alone cannot fully account for free recall improvement following action observation. Indeed, based on the literature, which demonstrates the crucial role of biological kinematics in action processing (Badets et al., 2015; Beauprez & Bidet-Ildei, 2019; Martel et al., 2011), our results clearly suggest a link between kinematics and improved memory following observation. Biological action observation elicits implicit motor simulation, which can strengthen the memory trace, while nonbiological action arguably elicits limited or no simulation.
At the very least, the fact that biological point-light displays led to better memory performance than nonbiological point-light displays confirms that action observation is beneficial on its own, not simply because of dual coding or the peculiar features of the point-lights.

General discussion

The present work aimed to understand how action observation enhances memory encoding. In two experiments, we compared free recall performance for isolated action verbs incidentally encoded with various depictions. Some action depictions strongly involved the sensorimotor system (i.e., biological point-light displays), while others did not (i.e., reading; nonbiological point-light displays). Both studies indicated better memory after the observation of normal human actions. Unlike Experimenter Performed Task studies, we displayed isolated verbs rather than phrases (Feyereisen, 2009; Hainselin et al., 2017; Schult et al., 2014). The aim was to explore a more basic level of action observation memory. When an action sentence is learned, observation enhances encoding by improvement of items relation processing (Steffens, 2007; Steffens et al., 2015). However, it does not mean that coding of implicit motor activity does not occur. Our results suggest that complex or goal-directed actions are not required to get an observation-based improvement of memory. Action observation is beneficial even without the influence of items relation processing. Moreover, kinematics, an intrinsic characteristic of perceived action, contributes to the effect. Consequently, we propose that encoding is easier when action depictions elicit implicit motor simulation.

The results of Experiment 1 align with those of most Experimenter Performed Task studies. However, to the best of our knowledge, this is the first attempt to generalize the effect of action observation on memory without using the classical Experimenter Performed Task paradigm (i.e., multi-item actions performed by an experimenter). Our design has major advantages over Experimenter Performed Task studies. First, complete standardization of action depictions was accomplished. Second, our study demonstrates that memory is improved not only for large-scale, goal-directed action sentences but also for isolated verbs (Steffens, 2007; Steffens et al., 2015). As a result, processing the relationship between items is not essential. Although such a mechanism can occur, it should not be regarded as the only cause. At a more basic level, implicit motor simulation is encoded for memory purposes and improves performance in free recall, a rather complex memory task. Nonetheless, two main limitations had to be addressed. First, comparing written definitions with point-light display definitions entailed comparing single-modality encoding with dual-modality encoding. Specifically, the encoding of verb-written definition items relied only on reading, while the encoding of verb-point-light display items relied on both reading and action observation. Second, compared to written definitions, point-light human movements are original and rare stimuli. Although the participants did not spend more time studying point-light human movements, their distinctiveness could contribute to better free recall performance.

To address these limitations and to extend our findings, Experiment 2 used biological (i.e., similar to Experiment 1) and nonbiological (i.e., with inverted kinematics) point-light human movements. Once again, improved free recall was found in the biological point-light display condition. Unlike in Experiment 1, the participants always learned the verbs with point-lights. Therefore, no difference existed in the number of encoding modalities. Moreover, the results argue against the idea that the improved memory performance elicited by point-light displays comes from distinctiveness or attentional focus. Indeed, the participants spent more time studying nonbiological than biological movements, and nonbiological point-lights are arguably far more unusual and intriguing than biological ones (see the supplementary material). By contrast, explicit ratings of the fit between verbs and their depictions indicated that nonbiological movements did not describe the verbs as accurately as biological movements. One could argue that this weakens our conclusion. If the movements depicted by nonbiological point-lights are not correctly identified, the verbs viewed with such stimuli do not provide additional congruent information at encoding. However, we believe that this is highly unlikely: the results showed higher free recall rates for the nonbiological point-light displays in Experiment 2 (i.e., 46%) than for the written definitions in Experiment 1 (i.e., 36%). If nonbiological point-light displays were not recognized at all, how could they lead to better encoding than the written dictionary definitions?

Functionalist models of memory (Briglia et al., 2018; Versace et al., 2014) propose that the basic level of memory functioning is sensorimotor information. Accordingly, it is not surprising to observe better performance when item encoding relies on the perception of sensorimotor activity rather than on symbolic description. In our view, covert motor activity elicited by action observation could enhance memory because it allows the encoding of an additional sensorimotor characteristic (i.e., implicit motor simulation). This account is, in fact, very similar to the original explanation of the enactment effect. According to Engelkamp and Zimmer (1984), classical Subject Performed Task studies lead to better memory performance because the encoding of overt motor activity corresponds to a supplementary encoding. Although other hypotheses based on item relation processing have become increasingly influential over the years (Koriat & Pearlman-Avnion, 2003; Kormi-Nouri & Nilson, 2001), the role of an additional motor coding has never been totally discarded in Subject Performed Task. Regarding Experimenter Performed Task, we propose that both item relation-based accounts and the encoding of implicit motor simulation can be of use. When complex actions are considered, various empirical findings suggest a role for item relation processing (Schult et al., 2014; Steffens, 2007; Steffens et al., 2015). Nonetheless, a more basic consequence of action observation would be the elicitation of implicit motor simulation, which strengthens memory traces as soon as isolated action verbs are encoded.

Functionalist models of memory provide a new and meaningful background to investigate both Experimenter Performed Task and Subject Performed Task. Indeed, a long-standing theoretical concern about the enactment effect was its meaning in the context of structuralist memory models. The title of Zimmer and colleagues’ book (2001) questioned the enactment effect in a straightforward manner: does it mean that a distinct form of episodic memory exists for the motor component? A distinct episodic memory subsystem for motor activity seems rather unlikely and is at odds with most structuralist memory models (e.g., Tulving, 2001). However, in the context of functionalist models, there is no need to hypothesize such an additional subsystem. Episodic knowledge only corresponds to a particular set of sensorimotor encoding and reenactment. If item learning involves overt motor activity, then an additional coding exists compared to mere sentence reading, and the memory trace is likely to be strengthened. Similarly, if action observation occurs, covert motor activity is elicited to understand action meaning and acts as an additional coding, which can improve performance. Thus, additional coding does not imply a particular memory system for action. Moreover, as we stated above, it is not opposed to the item relation processing account of Subject Performed Task and Experimenter Performed Task, which is arguably involved when the actions to be learned are more complex.

Nonetheless, it is clear that concepts are built from sensorimotor experiences (Barsalou, 1999, 2008). In the present study, the conceptual processing elicited by the various encoding modalities may differ in the level of semantic resonance involved. Indeed, in a recent account of the link between action observation and action verb understanding, Bidet-Ildei et al. (2020) proposed that both are processed through common semantic representations. For example, understanding the action “to walk” when seeing a human being performing it or when reading the verb on paper is achieved through a shared semantic action representation. In this view, the verb-biological point-light display condition would elicit high semantic resonance because both the verb and the biological point-lights are identified through their shared representation. In contrast, the verb-nonbiological point-light display and verb-written definition conditions could have elicited lower levels of semantic resonance. In turn, this difference in resonance level would likely influence memory encoding. Higher semantic resonance at encoding could improve the memory trace. Further studies should address this question and assess to what extent semantic resonance contributes to our findings. Indeed, many works on language understanding posit that it is linked to the sensorimotor system (Gallese & Lakoff, 2005; Mazzuca et al., 2021; Meteyard et al., 2012; Pulvermüller, 1999). Various neuroimaging (e.g., Hauk et al., 2004; Van Elk et al., 2010) and behavioral findings (e.g., Masson et al., 2008; for a review, see Jirak et al., 2010) support the hypothesis that language processing involves some type of sensorimotor simulation (but see also Mahon & Caramazza, 2008; Morey et al., 2021). Therefore, both the understanding of observed actions and the understanding of action sentences are associated with motor simulation processes.
Nonetheless, because observed actions and action sentences are not identical, at least slightly different types of motor simulation are necessarily involved. In the present work, the written dictionary definitions used in Experiment 1 involved processing both very concrete components (e.g., action verbs) and more abstract or general components, in particular grammatical components (e.g., coordination, auxiliary verbs). Arguably, although grammar can be embodied in the brain (Pulvermüller, 2010), its relationship to perception is not as straightforward as that of semantics. Therefore, we can reasonably assume that action observation is likely to elicit easier or stronger motor simulation than sentence reading because it relies more directly on perception.

Nevertheless, investigating brain activity during biological, nonbiological and sentence action encoding would be of major interest. It is crucial to link behavioral data to direct evidence of variation in the motor system. It is noteworthy that other behavioral findings suggesting a role for motor simulation in memory processes (Dutriaux & Gyselinck, 2016; Dutriaux et al., 2019) have recently been linked with variation in event-related potentials (De Vega et al., 2021). In addition, deeper investigation of Experimenter Performed Task and Subject Performed Task in reference to functionalist models of memory would also be relevant. As we previously emphasized, those models could be a means to go beyond former limitations rooted in classic models of memory. To conclude, the present work indicates that memory improves when the format of encoded information is sensorimotor. Action observation is beneficial on its own because it is processed through the motor system and therefore provides an additional encoding modality.