From the classroom to the vocational training course, the ability to retain and subsequently implement sequences of instructions is critical for successful performance. Although considerable research has examined this ability in long-term memory tasks, very few studies have explored how this might operate over shorter time periods. This is surprising, given that instructions can often require temporary storage and immediate implementation, and given the close links that have been emphasized between working memory and action (e.g., Baddeley, 2012). Indeed, working memory itself can be defined as a limited-capacity system that “supports human thought processes by providing an interface between perception, long-term memory and action” (Baddeley, 2003, p. 829). In the present study, we therefore aimed to explore the cognitive underpinnings involved in retaining and implementing instructional sequences, with a particular focus on how enactment and action planning relate to working memory.

The vast majority of research on working memory has examined simple verbal recall or recognition tests for verbal and visuospatial information. However, the few studies that have been carried out on enactment and working memory suggest that storing instructions for subsequent physical implementation involves factors additional to those involved in verbal repetition. Koriat, Ben-Zur, and Nussbaum (1990) presented short sequences of action–object pairs involving real objects (e.g., “touch the stone, lift the ashtray, move the pencil”) and manipulated whether participants recalled via physical enactment or verbal repetition. They found that enacted recall was more accurate than verbal recall. In addition, when participants encoded the sequence in anticipation of enacted recall, performance on a surprise verbal test was improved, indicating that the enacted recall advantage at least partly reflects beneficial impacts of action planning during encoding. More recently, Gathercole, Durling, Evans, Jeffcock, and Stone (2008) examined the ability of 5- to 6-year-old children to either perform or verbally repeat sequences such as “Pick up the blue ruler and put it in the red folder then touch the green box,” and they observed a substantial advantage for enacted over verbal recall. These findings suggest that an imaginal-enactive action plan is constructed when instructional sequences are encountered for later implementation, which can be used to underpin actual performance of the instructions but may also benefit verbal recall (Koriat et al., 1990).

Working memory resources are likely to have an important role in instruction storage and action planning (Logie, Engelkamp, Dehn, & Rudkin, 2001; Smyth, Pearson, & Pendleton, 1988). For example, Gathercole et al. (2008) found that performance on their instructions task significantly correlated with children’s working memory ability. In order to explore this further, Yang, Gathercole, and Allen (2014) examined the involvement of different subcomponents of the tripartite working memory model (e.g., Baddeley, 1986) in enacted recall performance and verbal repetition, testing the contributions of phonological short-term memory (using articulatory suppression during instruction presentation), spatial processing (based on simple spatial tapping), and executive control (based on backward counting) to memory for visually presented (written) instructions. Each of these manipulations negatively impacted on task performance, indicating roles for the proposed underlying subcomponents in the ability to follow instructions. However, none of these tasks had any impact on the magnitude of the action advantage that was consistently observed. Thus, although working memory is critical to the storage and processing of verbal sequences (see also Baddeley, Hitch, & Allen, 2009), the executive control and modality-specific subcomponents of the tripartite model (Baddeley, 1986) do not appear to fully capture action planning as indexed by the enacted recall advantage.

Therefore, if these components of working memory are not the source of the action advantage, what other factors might be important in action planning? One potential way to explore this might be provided by encoding enactment, or the “subject-performed task” (SPT) manipulation, in which participants enact each of the instructions during encoding. This should be distinguished from recall enactment, and indeed, it is open to debate whether the cognitive processes involved in actual enactment are exactly equivalent to, or differ in some respects from, those arising from planning for later performance (as studied by Koriat et al., 1990, and others). Enactment at encoding has primarily been explored in the context of long-term memory tasks (involving large numbers of actions and delayed recall), with several studies showing beneficial effects on subsequent verbal recall (e.g., Cohen, 1981; Engelkamp, 1998). Although this effect is claimed to be automatic and nonstrategic in nature (Cohen, 1981), the form of representation that it provides has been the cause of some debate. Bäckman and Nilsson (1984) suggested that enactment at encoding drives the construction of a visual code that can be used to supplement verbal memory and support later recall. Alternatively, performing each action during instruction might lead to the construction of a motoric (or kinesthetic) code (e.g., Engelkamp & Zimmer, 1989). In line with this, concurrent performance of unrelated motor tasks has larger effects on later recall than does analogous visual interference (e.g., Cohen, 1989; Saltz & Donnenwerth-Nolan, 1981), and motoric similarity between actions disrupts later recognition (e.g., Engelkamp & Zimmer, 1995).

Given the effects of self-enactment on long-term verbal recall and recognition tasks, this methodology may provide a productive means of further exploring instruction memory and action planning in working memory. Few studies have examined the effects of enactment during encoding on working memory performance. Wojcik, Allen, Brown, and Souchay (2011) found that, whereas children with autism spectrum disorder were impaired on the Gathercole et al. (2008) working memory action task (relative to healthy controls), both groups showed improvement in enacted recall performance as a result of earlier enactment during encoding. A similar beneficial effect of encoding enactment was observed on the immediate verbal recall of instructions in groups of healthy older adults and patients with mild Alzheimer’s disease (Charlesworth, Allen, Morson, Burn, & Souchay, 2014). Although this work suggests that an SPT manipulation can boost working memory performance, it does not indicate how enactment at encoding and at recall may interact. To date, no previous research has examined the effects of enactment during encoding on enacted versus verbal recall in working memory. By orthogonally manipulating whether healthy young adult participants physically enacted during encoding or during recall, we could obtain novel insights into action planning, motoric processing, and working memory, while also possibly highlighting optimal ways in which instructions should be presented.

In this experiment, therefore, we examined the impact of enactment during encoding on immediate enacted recall or verbal repetition of short instructional sequences. During the auditory–verbal presentation of instructions made up of simple action–object pairs, participants either did nothing or enacted each pair in turn, before then attempting to physically enact or verbally repeat the entire sequence. In line with previous findings, we predicted positive effects of enactment at both encoding (Charlesworth et al., 2014; Wojcik et al., 2011) and recall (Gathercole et al., 2008; Yang et al., 2014) in groups of young adult participants. The novel central question was whether and how these forms of enacted processing might interact; would the beneficial effects of enactment during encoding vary in magnitude, depending on whether enactment was also required at recall? One possibility was that enactment at encoding could provide a larger boost to subsequent action performance than to verbal recall, in keeping with the principle of transfer-appropriate processing. Such a prediction assumes that action planning is not fully automatic and would benefit from the development and strengthening of visual and/or motoric coding that encoding enactment provides. Alternatively, encoding enactment might improve verbal recall to a greater extent. For the enacted-recall condition, participants may actively construct an action plan incorporating visuospatial and motoric coding, and thus would not substantially benefit further from actual enactment during encoding; for verbal recall, however, participants might not effectively construct such representations unless they were “forced” to through enactment at encoding, therefore showing a larger beneficial effect of this manipulation on recall performance.
Such findings would provide insight into the source of the established enacted-recall advantage (Gathercole et al., 2008; Yang et al., 2014) and into the cognitive processes underpinning action planning and verbal memory.

Method

Participants

In all, 28 participants (22 females, six males; 18–22 years of age) took part in this experiment. All were undergraduate students at the University of Leeds, had English as their first language, and participated in exchange for course credit.

Materials

On the basis of pilot work, each sequence consisted of five action–object pairs. A pool of eight abstract shapes (star, square, moon, diamond, triangle, heart, circle, and hexagon) was used. Each of the five actions in a sequence was drawn from an experimental pool of six (flip, push, drag, spin, touch, and lift). Objects and actions were selected without replacement within a sequence (e.g., drag the hexagon, flip the circle, push the moon, lift the square, and touch the diamond), with 44 sequences being used across the experiment. The abstract nature of the stimulus set meant that any prior association in semantic memory between a particular object and action was unlikely, thus rendering these pairings truly arbitrary and likely to rely on temporary creation in working memory for their successful recall. The shapes were presented in neutral-colored laminated card form, each measuring approximately 5 × 5 cm.
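The sampling constraint described above (five actions and five objects, each drawn without replacement within a sequence) can be sketched as follows. This is a minimal illustration only: the source does not say how the 44 sequences were actually generated, so the random sampling, the fixed seed, and the function names `make_sequence` and `make_sequences` are assumptions introduced here.

```python
import random

# Stimulus pools as listed in the Materials section.
OBJECTS = ["star", "square", "moon", "diamond",
           "triangle", "heart", "circle", "hexagon"]
ACTIONS = ["flip", "push", "drag", "spin", "touch", "lift"]
SEQUENCE_LENGTH = 5  # five action-object pairs per sequence


def make_sequence(rng):
    """Return one sequence of (action, object) pairs.

    Actions and objects are each sampled without replacement, so no
    action and no object repeats within a sequence.
    """
    actions = rng.sample(ACTIONS, SEQUENCE_LENGTH)
    objects = rng.sample(OBJECTS, SEQUENCE_LENGTH)
    return list(zip(actions, objects))


def make_sequences(n_sequences=44, seed=0):
    """Hypothetical helper: build the full set of sequences.

    The seed is an assumption, used only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return [make_sequence(rng) for _ in range(n_sequences)]


if __name__ == "__main__":
    for action, obj in make_sequences()[0]:
        print(f"{action} the {obj}")
```

Sampling the two pools independently means that any action can be paired with any object, which is in keeping with the arbitrary pairings the design relies on.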

Design and procedure

The experiment was implemented according to a 2 × 2 repeated measures design, manipulating encoding condition (no enactment vs. enactment) and response type (verbal vs. action recall). Each condition was performed in a separate block, in counterbalanced order across participants. One practice trial and ten test trials were presented in each condition.
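One way in which the four blocks could be counterbalanced is sketched below. The source states only that block order was counterbalanced, so the cyclic Latin-square rotation used here is an assumption rather than the scheme actually used, and `block_order` is a hypothetical helper.

```python
# The four cells of the 2 x 2 design: (encoding condition, response type).
CONDITIONS = [
    ("no enactment", "verbal"),
    ("no enactment", "action"),
    ("enactment", "verbal"),
    ("enactment", "action"),
]


def block_order(participant_index):
    """Return the block order for one participant.

    Assumed scheme: rotate the condition list by the participant index,
    so that across every four participants each condition occupies each
    block position exactly once (a cyclic Latin square).
    """
    shift = participant_index % len(CONDITIONS)
    return CONDITIONS[shift:] + CONDITIONS[:shift]


if __name__ == "__main__":
    for participant in range(4):
        print(participant, block_order(participant))
```

With 28 participants, this rotation would place each condition in each block position seven times. It balances position only, not the order in which conditions follow one another.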

Each session started by familiarizing participants with the shapes and their labels, and with what each physical action involved. The shapes were placed on the table in front of the participant, in a pseudorandom spatial configuration that remained constant for each participant. Each condition involved auditory–verbal presentation of the instructional sequence. Each sequence was orally presented to the participant at a steady rate, with a pause of approximately 3 s between each action–object pair. For the encoding-enactment conditions, participants carried out each instructional segment during the interstimulus interval. For the no-enactment-during-encoding conditions, participants simply listened to the instructional sequence. The presentation rates and durations were equivalent for all conditions, and shapes remained visible throughout all phases.

The response phase started immediately following the end of the instructional sequence and the final 3-s delay. For the verbal recall condition, participants attempted to verbally recall the entire set of action–object pairs, in their original order. For the enacted-recall condition, participants physically carried out each of the action–object pairs in turn.

Results

Responses were scored correct if the actions and objects were recalled in their original pairings and in the correct position in the sequence, with accuracy being reported as the mean proportions of action–object pairs correctly recalled. Mean proportions correct in the encoding and recall conditions are displayed in Fig. 1, and as a function of serial position in Fig. 2. The data were analyzed using a 2 × 2 × 5 (Encoding Condition × Response Type × Serial Position) repeated measures analysis of variance. This revealed a significant effect of encoding condition, F(1, 28) = 6.99, MSE = .06, p < .05, ηp² = .20, with a positive effect of enactment during encoding on later recall. The effect of response type was also significant, F(1, 28) = 32.45, MSE = .06, p < .001, ηp² = .54, with enacted recall being more accurate than verbal recall.
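The scoring rule stated at the start of this section (a pair counts only if action and object are recalled together and in the correct serial position) can be sketched as below. The source does not describe how omissions were handled, so this sketch assumes that the response is aligned by output position and that an omitted pair is recorded as `None`; `score_trial` is a hypothetical name.

```python
def score_trial(presented, recalled):
    """Proportion of action-object pairs recalled correctly.

    presented: list of (action, object) tuples in presentation order.
    recalled:  list of (action, object) tuples in output order, with
               None for an omitted position (an assumption of this sketch).
    A pair is correct only if both action and object match the pair
    presented at the same serial position.
    """
    correct = 0
    for position, target in enumerate(presented):
        if position < len(recalled) and recalled[position] == target:
            correct += 1
    return correct / len(presented)


if __name__ == "__main__":
    presented = [("drag", "hexagon"), ("flip", "circle"), ("push", "moon"),
                 ("lift", "square"), ("touch", "diamond")]
    # Pairs 3 and 4 are transposed, so only three positions are correct.
    recalled = [("drag", "hexagon"), ("flip", "circle"), ("lift", "square"),
                ("push", "moon"), ("touch", "diamond")]
    print(score_trial(presented, recalled))
```

Under this strict criterion, a correctly paired action and object recalled in the wrong position earns no credit, and neither does a correct action recalled with the wrong object.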

Fig. 1. Mean proportions of action–object pairs correctly recalled in each encoding and recall condition. Error bars denote standard errors.

Fig. 2. Mean proportions of action–object pairs correctly recalled as a function of serial position in each encoding condition, through (a) verbal recall and (b) enacted recall. Error bars denote standard errors.

The interaction between encoding condition and response type was significant, F(1, 28) = 15.47, MSE = .04, p = .001, ηp² = .37. Planned comparisons revealed a significant effect of enactment on verbal recall, t(27) = 5.41, p < .001, Cohen’s d = 0.86, but not on action performance, t(27) = 0.40, p = .70, d = 0.07. Comparing action and verbal recall, accuracy was higher for action recall, both without enactment during encoding, t(27) = 6.63, p < .001, d = 1.14, and with enactment, t(27) = 3.10, p < .01, d = 0.39, although the response type effect size was clearly reduced in the latter condition.
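The planned comparisons above are within-participant contrasts, which can be sketched as a paired t test. The source does not state which variant of Cohen's d was computed, so the standardizer used here (the standard deviation of the difference scores, often written d_z) is an assumption and need not reproduce the reported d values. The data in the usage example are invented purely for illustration.

```python
import math
import statistics


def paired_comparison(cond_a, cond_b):
    """Paired t test for two within-participant conditions.

    cond_a, cond_b: per-participant accuracy scores, in the same order.
    Returns (t, df, d_z), where d_z = mean difference / SD of differences
    (an assumed effect-size variant; see the lead-in).
    """
    if len(cond_a) != len(cond_b):
        raise ValueError("conditions must have one score per participant")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t, n - 1, d_z


if __name__ == "__main__":
    # Hypothetical scores for four participants, not data from the study.
    with_enactment = [0.6, 0.7, 0.8, 0.5]
    without_enactment = [0.4, 0.6, 0.5, 0.5]
    t, df, d_z = paired_comparison(with_enactment, without_enactment)
    print(f"t({df}) = {t:.2f}, d_z = {d_z:.2f}")
```

With 28 participants, the degrees of freedom for each comparison are 27, matching the t(27) values reported above.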

Returning to the omnibus analysis, the effect of serial position was significant, F(4, 112) = 90.04, MSE = .03, p < .001, ηp² = .76. A post-hoc Tukey test indicated that the accuracy scores at all positions differed from each other (p < .05), apart from between Positions 3 and 5 and Positions 4 and 5, suggesting a strong primacy effect and the absence of a significant recency effect. In addition, a significant Encoding Condition × Serial Position interaction was observed, F(4, 112) = 3.95, MSE = .02, p < .01, ηp² = .12. Post-hoc Tukey tests indicated that the beneficial effect of enactment during encoding was only significant (at p < .05) at Sequence Positions 4 and 5. We also found a significant interaction between response type and serial position, F(4, 112) = 6.57, MSE = .02, p < .001, ηp² = .19. Post-hoc Tukey tests indicated significantly higher accuracy for action over verbal recall at each of the five serial positions, though the effect was somewhat larger at the final positions. Finally, the three-way interaction between encoding condition, response type, and serial position was not significant, F(4, 112) = 1.34, MSE = .02, p = .26, ηp² = .05.
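The partial eta squared values reported with each F ratio can be recovered from the F value and its degrees of freedom, using the standard identity partial eta squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to two of the reported effects; the function name is hypothetical, and the calculation is offered as a reader's check rather than as the authors' own procedure.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom.

    Follows from SS_effect / (SS_effect + SS_error) with
    F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Encoding condition, F(1, 28) = 6.99: rounds to .20 as reported.
    print(round(partial_eta_squared(6.99, 1, 28), 2))
    # Serial position, F(4, 112) = 90.04: rounds to .76 as reported.
    print(round(partial_eta_squared(90.04, 4, 112), 2))
```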

Discussion

Although considerable research has explored how long-term memory performance is impacted by enactment (e.g., Cohen, 1981; Engelkamp, 1998), few studies have applied this to the immediate recall of short sequences that is typically used in measures of working memory. Such studies (e.g., Gathercole et al., 2008; Koriat et al., 1990; Yang et al., 2014) have focused on enactment during the response phase, and typically have used real objects that may have preexisting associations with movements. In contrast, the present study was the first to explore the impact of physical enactment, performed during both encoding and recall, on the ability to follow instructions in working memory. We also used arbitrary pairings of abstract shapes and actions that were repeatedly used in different combinations on every trial, in order to emphasize temporary storage and working memory processing and to minimize contributions from long-term memory. We observed a significant recall advantage for enacted over verbal responses, adding to a developing body of evidence indicating that planning for and implementing a set of physical actions facilitates working memory performance (Gathercole et al., 2008; Koriat et al., 1990; Yang et al., 2014). In contrast, the effects of enactment during encoding were dependent on the type of recall required. A beneficial effect of encoding enactment emerged, but this effect was much larger, and indeed only significant, for verbal rather than enacted recall, a finding that runs contrary to the predictions based on transfer-appropriate processing. Thus, physical enactment during encoding particularly facilitates working memory when participants are preparing for verbal repetition rather than planning for enacted recall. This pattern of findings somewhat resembles those observed in long-term memory recognition (Freeman & Ellis, 2003), suggesting that actual enactment during encoding and planning for enactment at a later point in time are nonadditive in nature. 
Our analysis of performance across serial positions indicated that enactment effects were larger at later positions in the sequence and, indeed, that enactment at encoding only facilitated verbal recall at later positions. This would indicate that, whereas retention of the initial parts of a sequence may have a stronger verbal element (via storage in the phonological loop), this capacity is soon exceeded. Our study suggests that the availability of additional forms of coding, as provided by enactment, would then become particularly useful in supporting ongoing performance.

What additional processes or representational formats might contribute to these effects? One possibility is that sensorimotor information is incorporated into the memorial representation, increasing its distinctiveness and accessibility (Freeman & Ellis, 2003; Logie et al., 2001; Zimmer, Helstrup, & Engelkamp, 2000). Thus, when planning for enactment-based recall, participants may actively build a representation that includes visuospatial and motoric information, possibly incorporating representations of the intended actions (see Frith, Blakemore, & Wolpert, 2000). This would result in richer forms of coding that produce superior recall relative to verbal repetition. Actual enactment during initial instruction would not particularly add to this. However, it does boost verbal recall, by forcing participants to generate these additional forms of coding that might not otherwise be constructed. This account would also imply that spatial–motoric codes are not generated spontaneously whenever instructions are encountered, even though they can be beneficial to memory performance. Our study supports this idea, but further experimentation will be required to establish the conditions under which this becomes automatic.

Although considerable work has been done in the long-term memory domain, the impacts of enactment, in particular, and motoric processing, more generally, remain a relatively underexplored topic in working memory. How might models of working memory capture the outcomes of the present study? Baddeley, Allen, and Hitch (2011; Baddeley, 2012) suggested a multicomponent model in which a range of information, including tactile–kinesthetic input, might enter working memory, with processing and initial storage being attributed to the visuospatial sketchpad. The latter component has been further subdivided, with Logie (1995) distinguishing a passive store for visual information (the visual cache) from a system for representing space and movement (the inner scribe). In line with a relationship between movement and space, concurrent motor movement has been shown to disrupt performance on tasks requiring either memory or mental imagery for spatial paths (e.g., Baddeley & Lieberman, 1980; Quinn, 1994; Quinn & Ralston, 1986; Smyth & Pendleton, 1989). Similarly, Bo and Seidler (2009) have argued that the learning of new motor sequences may critically rely on spatial working memory. Thus, movement representations created by either enacting or planning to enact during encoding may be temporarily stored and rehearsed within the inner scribe spatial component of working memory (Logie et al., 2001). This would be consistent with our findings, with maintenance within the inner scribe supplementing other forms of storage (e.g., within the phonological loop), particularly when their capacities are exceeded (i.e., toward the end of each sequence). Thus, it is possible that the present findings might be accommodated within existing multicomponent models of working memory (e.g., Baddeley, 2012; Logie, 1995). 
However, it is worth noting that when performed concurrently with sequence presentation in following-instructions tasks, spatial tapping (an activity assumed to particularly disrupt the inner scribe component of working memory) does not appear to interact with the enacted-recall advantage (Yang et al., 2014). This in turn would indicate that spatial working memory may not be solely responsible for this effect, and that a separable motoric representation (e.g., Smyth & Pendleton, 1989; Sternberg, Monsell, Knoll, & Wright, 1978) might also contribute to enactment effects at both encoding and retrieval.

This interpretation of enactment effects in working memory as reflecting spatial–motoric processing is at least partly drawn from work on long-term memory (e.g., Engelkamp & Zimmer, 1989), and it is possible that other forms of coding may also play an important role. As with any working memory task, participants might draw on a range of information to support performance, including visual, spatial, verbal, and semantic, as well as motoric and kinesthetic, information (Baddeley & Logie, 1999; Logie et al., 2001). In the case of visuospatial information, stimuli were constantly present in all conditions in the present experiment, ensuring equivalent opportunities to utilize these forms of coding. However, the requirement to enact at encoding or recall may have encouraged participants to engage with this information, and thus have facilitated performance. More broadly, enactment may affect semantic processing (Zimmer, 2001) or help participants to maintain task focus. Therefore, representations specific to motoric and movement processing may not be the only causal factor underlying enactment effects. Regardless of the precise nature of the specific contributory representations involved, the enactment effects observed in the present study clearly indicate that multiple forms of processing can be utilized in order to support task performance. Although multicomponent approaches to working memory may capture how these separate processing streams could operate independently, effectively combining initially disparate forms of information into a holistic representation might require a modality-general capacity such as the episodic buffer (Baddeley, 2000; Baddeley et al., 2011). 
Explicit exploration of this possible component has focused on the binding of visual and auditory information (see Baddeley et al., 2011, for a review) and has indicated that this may develop relatively automatically; it may be that the integration of information (possibly including verbal, visual, spatial, and motoric information, depending on the condition) in the following-instructions task presently under exploration also proceeds in this way. Thus, different forms of coding, including those emphasized by actual or intended enactment, would be incorporated into a bound representation at little or no additional cost to executive control processes (Yang et al., 2014).

The present findings can be differentiated from previous work concerning the benefits of multimodal encoding on working memory (e.g., Delogu, Raffone, & Belardinelli, 2009), which has suggested that multiple input streams can benefit verbal recall, provided that these are nonredundant. In the present study, enactment at encoding was always additional to the verbal presentation of instructions, irrespective of response format. Thus, an account based on simple multimodal presentation would predict similar effects of encoding enactment across verbal and action recall. Instead, our findings suggest that it is the participants’ own anticipation and planning for future action that renders enactment during encoding relatively redundant.

Subsequent research will be required to more clearly ascertain the source of the various enactment effects observed in the present study, and the extent to which they emerge through effortful or automatic processing. However, in line with claims from the long-term memory literature (e.g., Engelkamp & Zimmer, 1989; Freeman & Ellis, 2003), we suggest that such effects may have an important motoric component. It would be fruitful for future work to consider how such processing may be incorporated into working memory, across different tasks and groups, and how performance in these tasks may correlate with other measures of working memory and wider cognitive function.