Introduction

Traditionally, grasping the meaning of a linguistic stimulus has been regarded as a process that is separate from other meaning-related processes in the brain. According to this view, linguistic meaning is represented in an amodal propositional manner. However, more recently it has been proposed that linguistic meaning is grounded in the sensorimotor system, that is, it is regarded as being closely related to other processes such as perception and action (also known as the “embodiment account of language processing”; e.g., Barsalou, 1999, 2008). According to this account, word meanings are stored in the brain in the form of experiential traces that are generated during interactions with the world and resemble the action or perception processes that created them (Zwaan & Madden, 2005). Language comprehension can, therefore, be understood as a reactivation of the experiential traces that were created when the word was learned. In other words, understanding language is a process of experientially simulating the described events or situations by reactivating those traces.

A large body of research has been concerned with the demonstration that this kind of mental simulation process takes place during language comprehension, by showing effects of language processing on action or perception. For example, it has been shown that pictured objects were more easily identified as having been mentioned in a sentence when the implied shape of the object described in the sentence matched the shape of the pictured object (e.g., an egg in its shell vs. an egg sunny-side up; Zwaan, Stanfield, & Yaxley, 2002) or when the response direction described in a sentence matched the required response direction (Glenberg & Kaschak, 2002). Similar compatibility effects have also been found for words referring to objects with a typical location in the upper or lower part of the world (e.g., Dudschig, Lachmair, de la Vega, De Filippis, & Kaup, 2012; Dudschig & Kaup, 2017; Dudschig, Souman, Lachmair, de la Vega, & Kaup, 2013; Dunn, Kamide, & Scheepers, 2014; Lachmair, Dudschig, De Filippis, de la Vega, & Kaup, 2011; Thornton, Loetscher, Yates, & Nicholls, 2013). Taken together, these findings show that language processing affects action and perception, which has been interpreted as showing that participants reactivated experiential traces during comprehension. However, it is possible that these experiential simulations are not necessary for comprehension and just constitute an optional by-product of language processing. In order to conclude that simulations are functionally relevant for comprehension, an influence of action or perception on language processing needs to be shown. In this study, we addressed this question by investigating the effects of a secondary motor task (finger tapping in Experiment 1 and foot tapping in Experiment 2) on lexical processing. 
In the remainder of this introduction, we will first review studies using the same kind of stimulus material as we did (namely effector-related language), before describing previous studies investigating the functional relevance of experiential simulation.

Experiential simulation of effector-related words

One type of stimulus material that has been repeatedly used to investigate experiential simulations during language comprehension is effector-related language, that is words and sentences related to the hands/arms or the feet/legs, and language related to other specific body parts. One reason for this is that the embodiment account predicts distinct activations in the motor cortex for these word categories. Indeed, Hauk, Johnsrude, and Pulvermüller (2004) found that reading action verbs referring to actions related to the arms, the legs, or the face (e.g., pick, kick, and lick, respectively) activated areas in the motor cortex that were overlapping with those areas that were activated during the actual movement of the fingers, the feet, or the tongue, respectively (somatotopic activation). These findings confirmed earlier similar results that had been obtained using current source density measures in an electroencephalography study (Pulvermüller, Härle, & Hummel, 2001). Comparable results were also found using magnetoencephalography during action verb processing (Klepp et al., 2014). Furthermore, Carota, Moseley, and Pulvermüller (2012) extended the somatotopy finding to the processing of hand- and mouth-related nouns (tool words and food words, respectively). Tool words elicited a stronger activation than food words in a region that was associated with finger movements. The reverse was true for the motor region associated with tongue movements.

In addition to the somatotopy findings described above, effector-related words have also been shown to affect behavior differentially. For example, Boulenger et al. (2006) showed that reading action verbs describing actions that could be performed with the arms, the legs, or the mouth interfered with a grasping movement when presented during grasping, and assisted grasping when presented before movement onset. This effect was more pronounced for words that referred to arm actions than for words that referred to other kinds of actions. Furthermore, Marino, Gough, Gallese, Riggio, and Buccino (2013) showed differential effects of hand- and foot-related nouns on hand responses in a go/nogo word categorization task. Participants were instructed to respond to concrete but not to abstract words after a go signal. When the go signal was presented 150 ms after word onset, participants who responded with their right hand reacted faster to foot than to hand words. The opposite pattern, faster responses to hand than to foot words, was found for left-hand responses. The authors explain their data by speculating that hand word processing and right-hand response execution both involve the hand area of the left motor cortex and, therefore, interfere with each other. However, this account cannot explain the facilitation effect found for left-hand responses, since it would predict no overlap in the associated brain areas.

While the aforementioned studies have focused on the effects of effector-specific language on one kind of response effector (i.e., hand responses), other studies have investigated whether the processing of hand-, foot-, and mouth-related language has differential effects on different effectors. For example, Scorolli and Borghi (2007) investigated sensibility judgments to short phrases describing hand, foot, and mouth actions (e.g., to suck the sweet). Participants had to respond either vocally (mouth response) or with their foot. As predicted, foot responses were faster to foot-related phrases than to hand-related phrases, whereas vocal responses were faster to mouth-related phrases than to hand-related phrases. The results are not fully conclusive though, since vocal responses were also faster to foot- than to hand-related phrases, leaving open the possibility that hand-related phrases were overall harder to process than the other two. Ahlberg, Dudschig, and Kaup (2013) also investigated response effector compatibility during language processing but with individual words instead of phrases. In addition to hand- and foot-related action verbs, the authors also used nouns describing objects that were associated with the hands or the feet. These nouns either explicitly contained the lexeme hand or foot (e.g., handbag or football) or described objects that are usually manipulated by the hands or the feet (e.g., cup or shoe). Participants were required to respond by pressing a button or a pedal using their hand or foot, respectively, depending on the print color of the words. For both noun categories, participants responded faster when the word effector and the response effector matched (e.g., responding with the foot to shoe) than when they mismatched (e.g., responding with the hand to shoe). 
There was, however, no significant difference between matching and mismatching action verbs, although this should have been expected considering the results of the other studies described above. Nonetheless, this study shows clear effector-specific activation during noun processing in a setting in which participants did not have to actively process the words’ meaning.

Functional relevance of experiential simulations

All of the findings discussed so far are taken as evidence that participants reactivated experiential traces during language processing, which then influenced the responses on the tasks. However, just showing that there is somatotopic brain activation during language processing and that language processing affects subsequent action and perception does not imply functional relevance of simulations. It remains possible that experiential simulation is just an optional by-product of language processing (see Goldinger, Papesh, Barnhart, Hansen, & Hout, 2016). In order to conclude that simulations are relevant, an influence of action and perception on language processing has to be shown. First evidence for an effect in this direction comes from studies showing that perceived motion affects motion language processing. For example, Kaschak et al. (2005) demonstrated that visual motion interfered with sensibility judgments to sentences describing motion in the same direction (see also Kaschak, Zwaan, Aveyard, & Yaxley, 2006, and Meteyard, Zokaei, Bahrami, & Vigliocco, 2008, for similar investigations). Furthermore, it has been shown that action-related neural activation can influence language processing. Pulvermüller, Hauk, Nikulin, and Ilmoniemi (2005) reported evidence that a transcranial magnetic stimulation (TMS) pulse over the hand motor area of the left hemisphere led to faster lexical decision times for arm-related compared to leg-related action verbs (e.g., pick vs. kick). Stimulation of the leg area, on the other hand, facilitated the processing of leg-related words. Similarly, Willems, Labruna, D’Esposito, Ivry, and Casasanto (2011) showed that lexical decision times for manual, but not for non-manual, action verbs were faster after left hemisphere hand area TMS than after the same stimulation over the right hemisphere.

There also are a few studies investigating effects of action on language processing. For instance, in a study by Glenberg, Sato, and Cattaneo (2008), participants were moving beans either towards or away from their body from a wide-mouthed to a narrow-mouthed container for about 20 min, before making sensibility judgments to sentences describing motions towards or away from the body. Responses were slower when the direction of the bean moving task and the direction described in the sentence matched than when they mismatched. The authors concluded that moving beans caused plastic changes in the motor system, which in turn affected language processing. It is unclear, however, why these changes in the motor system would be affected by the direction of the bean moving task, since participants were moving the arm back and forth and not just in one direction. One could argue that the movement towards the target container had to be planned and executed more accurately due to the narrower opening and that this might have made the movement in this direction more salient.

Other studies have investigated the effects of a simultaneous motor task on language processing. For example, Yee, Chrysikou, Hoffman, and Thompson-Schill (2013) asked their participants to classify nouns as describing concrete objects vs. abstract concepts while they were performing a patty cake task that was engaging the hands, a mental rotation task, or no concurrent task at all. Additionally, the words referring to concrete objects were rated for the amount of experience that each participant had with handling these objects. Both the mental rotation and the patty cake task interfered with the performance in the noun classification task. However, only for the patty cake condition, object touching experience caused differences in performance, with larger interference for words referring to frequently touched objects. A possible explanation for these findings is that activity in the motor areas interfered with the ability to grasp the meaning of nouns referring to objects that are frequently manipulated by the hands, and thus, this activity must be essential to accessing the meaning of these words. Furthermore, Shebani and Pulvermüller (2013) investigated short-term memory for arm- and leg-related action verbs, while participants were performing a tapping task with either their hands or their feet. In the foot tapping condition participants committed significantly fewer errors for arm- than for leg-related words, whereas they committed fewer errors for leg- than for arm-related words in the hand tapping condition, although the latter effect was only marginally significant. This result shows that engaging the motor system in a secondary task impedes the memory for words related to actions usually executed with the occupied effector. Finally, Pecher (2013, Experiment 4) used a similar approach as Shebani and Pulvermüller (2013) and Yee et al. (2013) to investigate the effects of a secondary motor task on working memory. 
Participants had to remember object names, whose referents could be manipulable (e.g., binoculars) or non-manipulable (e.g., chimney), while performing a concurrent manual motor task (making fists and individually stretching out the fingers of both hands simultaneously one by one). This motor task should have interfered more with memory for manipulable than for non-manipulable words if mental simulation were necessary to process these words; however, this was not the case. These results contradict the findings of Shebani and Pulvermüller (2013). Furthermore, one should take into account that Shebani and Pulvermüller, just like Pecher, did not investigate language comprehension but memory for linguistic items. Even if the results had been in agreement, it would be premature to conclude that simulations are functionally relevant for comprehension. It is conceivable that simulations are helpful for keeping a list of objects in working memory but that they are not needed for understanding these words in the first place.

In the current study, we investigate this question by using an approach similar to that of Shebani and Pulvermüller (2013). However, instead of a memory task we used a lexical decision task. This task measures a more basic aspect of language processing, namely lexical access. If experiential simulation is necessary for comprehension, we should be able to see effects similar to those of Shebani and Pulvermüller for this task. More specifically, we measured participants’ lexical decision times to the hand- and foot-related nouns that were used in the study by Ahlberg et al. (2013). Responses had to be executed by either the hand or the foot depending on the font color of the words. Furthermore, in one half of the experimental trials, participants were required to perform a simultaneous tapping task either with their hand (Experiment 1) or with their foot (Experiment 2). These tapping tasks were expected to occupy the hand and foot system, respectively, thereby impeding simulations related to the same effector. According to the embodiment account of language processing, this should interfere with the processing of words that rely on those simulations (i.e., hand-related words in Experiment 1 and foot-related words in Experiment 2) and, therefore, slow down lexical decision times to these words to a larger degree than lexical decision times to words associated with the other effector. Furthermore, in accordance with the results of Ahlberg et al. (2013), we expected faster response times on trials in which the effector associated with the word and the response effector matched than on trials in which they mismatched. Finding this effect would confirm that participants indeed engaged in experiential simulation, whether or not it is functionally relevant.

Experiment 1: hand system occupied

In this experiment, we occupied the hand system in one half of the experiment with a finger tapping task on the number pad of a standard computer keyboard. We expected stronger interference of this task with the processing of hand words in comparison to foot words.

Methods

Participants

Participants were tested in a single session of approximately 1 h and 15 min duration. We aimed at a final sample size of 32 participants (see Footnote 1) for the analyses and, therefore, replaced participants of the original sample if they had to be excluded for any of the reasons listed below until we reached that target. Overall, 16 participants had to be replaced. Five participants committed errors on more than 20% of the trials. Eight participants did not follow the instructions of the tapping task, which resulted in faster response times in the dual- than in the single-task condition. Three further participants had to be excluded due to technical failure of the recording device. The final sample consisted of 32 participants (aged 18–53 years, average age 23.8 (7.0) years, 27 women). All participants reported being German natives and right-handed. The scores in a German translation of the Edinburgh handedness inventory (Oldfield, 1971) ranged from 60 to 100 (M = 84.9, SD = 12.9). Participants were reimbursed with either course credit or 8 € per hour.

Experimental setup and materials

Participants were seated in front of a CRT monitor with their left index finger resting on the middle button of a PST Serial Response Box (Model No. 200A) and their left foot resting on a response pedal on the floor. In the dual-task condition, the participants’ right hand rested on the number pad of a computer keyboard. We chose to use the participants’ dominant hand for the tapping task because we expected interference effects to be stronger than for tapping with the non-dominant hand.

For the lexical decision task, we used the explicit (e.g., handbag and football) and implicit (e.g., cup and shoe) hand and foot nouns of the study by Ahlberg et al. (2013), with the exception of the word nail polish, which was replaced by the word faucet, since nail polish is not necessarily only associated with the hands but possibly also with the feet. There were 16 words in each category, resulting in 64 words overall. There were no significant differences in word length or frequency (retrieved from the Wortschatzportal of the University of Leipzig, http://wortschatz.uni-leipzig.de) between the categories (all ps > .10; see Table 1). All nouns were German. In addition, we created 64 pronounceable pseudowords that were matched in length and number of syllables to the experimental words. Of those 64 pseudowords, 16 were compounds, starting with either hand or foot, that do not exist in the German language. This type of pseudoword was included in order to force participants to read the whole word before they could make a lexical decision. The remaining pseudowords were created using the pseudoword generator Wuggy 0.2.2b3 (Keuleers & Brysbaert, 2010) whenever possible. All stimuli were presented in Arial, 18 pt, bold, at the center of the computer screen. Half of the stimuli in each block were presented in orange (rgb 255, 128, 0) and the other half in blue (rgb 0, 0, 255) font color. The assignment of color to word was counterbalanced across blocks, such that each word and pseudoword was presented equally often in each color. The background color was white (rgb 255, 255, 255).

Table 1 Mean word length and frequency for explicit and implicit hand and foot words, with standard errors in parentheses

Procedure and design

The experiment consisted of two parts. In one half of the experiment, participants were performing a lexical decision task and a tapping task at the same time (dual-task condition), whereas in the other half they were only performing the lexical decision task (single-task condition). The order of these two task conditions was counterbalanced across participants.

For the lexical decision task, participants were instructed to respond to words by either pressing the button on the table or the pedal on the floor with their left hand or foot, respectively, depending on the font color of the word. Assignment of font color to response effector was counterbalanced across participants. When a pseudoword appeared, participants were supposed to withhold their response. Each trial started with the presentation of a fixation cross at the center of the screen for 1500 ms. This was followed by a 300-ms blank interval, after which the word or pseudoword appeared. The word stayed on the screen until the participant responded or for a maximum of 2000 ms. The trial ended with a 300-ms blank interval.
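The trial structure described above can be summarized in a short sketch. It only encodes the durations stated in the text; the names `TrialPhase`, `trial_timeline`, and `trial_duration_ms` are hypothetical and are not taken from the experiment software, which is not described here.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_STIMULUS_MS = 2000  # stimulus stays on screen for at most 2000 ms


@dataclass(frozen=True)
class TrialPhase:
    name: str
    duration_ms: int


def trial_timeline(response_time_ms: Optional[int] = None) -> List[TrialPhase]:
    """Phases of one lexical decision trial.

    response_time_ms is a hypothetical reaction time; None means that no
    response was given (e.g., a correctly withheld response to a pseudoword),
    so the stimulus stays on screen for the full 2000 ms.
    """
    if response_time_ms is None or response_time_ms > MAX_STIMULUS_MS:
        stimulus_ms = MAX_STIMULUS_MS
    else:
        stimulus_ms = response_time_ms
    return [
        TrialPhase("fixation cross", 1500),
        TrialPhase("blank interval", 300),
        TrialPhase("word or pseudoword", stimulus_ms),
        TrialPhase("blank interval", 300),
    ]


def trial_duration_ms(response_time_ms: Optional[int] = None) -> int:
    """Total duration of one trial in milliseconds."""
    return sum(phase.duration_ms for phase in trial_timeline(response_time_ms))
```

Under these durations, a trial without a response lasts 4100 ms, and a trial with a response after 900 ms lasts 3000 ms.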

In the dual-task condition, participants simultaneously performed a tapping task on the number pad of a standard computer keyboard with the fingers of their right hand. The thumb rested on the 0-key, the index finger on the 4-key, the middle finger on the 5-key, and the ring finger on the 6-key. Participants were instructed to press the keys 0-4-6-5 repeatedly in quick succession. This task was practiced before the dual-task condition until it was fluent. Tapping responses were recorded while the fixation cross was presented and trials without any responses during this interval were excluded from the analysis.

In total, participants completed eight experimental blocks of 128 trials each, four in the single- and four in the dual-task condition. The order of blocks within the single- and the dual-task conditions was counterbalanced across participants using a Latin square procedure. This was done because the assignment of color to word (see “Experimental setup and materials”) differed in each block. In addition to the eight experimental blocks, there were three practice blocks of 20 trials each. The single-task condition always started with one practice block. The dual-task condition always started with two practice blocks to allow participants some additional practice time to adjust to the complicated task requirements.
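As an illustration of Latin square counterbalancing, the following sketch assigns an order of four blocks to each participant. The text does not state which Latin square was used, so the cyclic square below is an assumption, and the function names `cyclic_latin_square` and `block_order` are hypothetical.

```python
from typing import List


def cyclic_latin_square(n: int) -> List[List[int]]:
    """Build an n x n cyclic Latin square.

    Each block index 0..n-1 appears exactly once in every row (order for one
    participant group) and exactly once in every column (serial position).
    """
    return [[(row + col) % n for col in range(n)] for row in range(n)]


def block_order(participant_index: int, n_blocks: int = 4) -> List[int]:
    """Order of the blocks within one task condition for a participant.

    Participants cycle through the rows of the square, so every block
    occupies every serial position equally often across each set of
    n_blocks participants.
    """
    square = cyclic_latin_square(n_blocks)
    return square[participant_index % n_blocks]
```

With four blocks per task condition, participants 0 to 3 receive the four rows of the square, and participant 4 starts again with the first row.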

The data were analyzed using 2 × 2 × 2 × 2 ANOVAs with the factors word effector (hand word, foot word), response effector (hand response, foot response), explicity (explicit word, implicit word), and task condition (single task, dual task). In the by-participant analysis (F1), all factors were within-participant factors (repeated-measures ANOVA). In the by-item analysis (F2), response effector and task condition were within-items factors; word effector and explicity were between-items factors (mixed ANOVA).

Results and discussion

Participants committed an error on 5.8% of all trials. Those trials were excluded from the analysis. In the following, we will first present the hypotheses-relevant interactions before reporting the remaining results of the ANOVAs. Bonferroni corrected p values are reported for all post hoc tests.
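For readers unfamiliar with the correction, a Bonferroni corrected p value is the uncorrected p value multiplied by the number of comparisons in the family, capped at 1. The sketch below shows the general procedure only; the function name `bonferroni` is hypothetical, and how comparisons were grouped into families for each post hoc test is not specified in the text.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Bonferroni-correct a family of p values.

    Each p value is multiplied by the number of comparisons in the family
    and capped at 1.0, since a probability cannot exceed 1.
    """
    n_comparisons = len(p_values)
    return [min(1.0, p * n_comparisons) for p in p_values]
```

For example, with two comparisons an uncorrected p of .02 becomes .04, and an uncorrected p of .60 is capped at 1.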

Hypotheses-relevant results

With regard to the hypothesis that participants should respond faster when word and response effector match, we found the expected compatibility effect, that is, a significant interaction between the factors word effector and response effector, F1(1, 31) = 138.10, p < .001, ηp² = .82, F2(1, 60) = 154.44, p < .001, ηp² = .72. When using their hands, participants responded faster to hand words (822 ms) than to foot words (913 ms), F1(1, 31) = 139.10, p < .001, ηp² = .82, F2(1, 60) = 52.65, p < .001, ηp² = .47. The opposite was true for responses with the feet. In that case, reaction times were faster to foot words (931 ms) than to hand words (983 ms), F1(1, 31) = 48.18, p < .001, ηp² = .61, F2(1, 60) = 29.50, p < .001, ηp² = .33. This finding replicates the findings of Ahlberg et al. (2013) and shows that participants indeed engaged in experiential simulation during word processing.
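The partial eta squared values reported alongside each F test can be recovered from the F value and its degrees of freedom using the standard identity: partial eta squared = (F × df effect) / (F × df effect + df error). The sketch below applies this identity; the function name `partial_eta_squared` is hypothetical, and the computation is offered as a reading aid, not as the authors' analysis code.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error),
    which follows from F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Word effector x response effector interaction in Experiment 1:
by_participant = round(partial_eta_squared(138.10, 1, 31), 2)  # 0.82
by_item = round(partial_eta_squared(154.44, 1, 60), 2)         # 0.72
```

Because the by-item analysis has more error degrees of freedom, a similar F value corresponds to a smaller partial eta squared there than in the by-participant analysis.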

There also was a significant interaction between the factors word effector, response effector, and explicity, F1(1, 31) = 24.22, p < .001, ηp² = .44, F2(1, 60) = 7.95, p = .007, ηp² = .12. However, as can be seen in Fig. 1, the expected pattern of results was present for both explicit and implicit words. Pairwise comparisons (see Table 2) showed significant differences between hand and foot words for hand and for foot responses for both explicit and implicit words in the by-participant analysis. However, these differences were not always significant after Bonferroni correction in the by-item analysis. Most likely the differences in the effect size of the hand and foot word comparisons are due to differences in the word material.

Fig. 1 Mean reaction times for hand and foot words depending on the response effector for explicit (a) and implicit (b) words in Experiment 1. The error bars represent 95%-confidence intervals as per Loftus and Masson (1994)

Table 2 Pairwise comparisons between reaction times to hand and foot words separately for explicit and implicit words and for hand and foot responses in Experiment 1

Most importantly, there was a significant interaction between the factors word effector and task condition, F1(1, 31) = 6.27, p = .018, ηp² = .17, F2(1, 60) = 5.19, p = .026, ηp² = .08. As predicted, the difference between the dual- and the single-task condition was larger for hand than for foot words (135 vs. 118 ms; see Fig. 2), although it was significant for both hand words, F1(1, 31) = 61.97, p < .001, ηp² = .67, F2(1, 30) = 450.06, p < .001, ηp² = .94, and foot words, F1(1, 31) = 47.90, p < .001, ηp² = .61, F2(1, 30) = 338.96, p < .001, ηp² = .92 (see Footnote 2). This finding is in line with the interpretation that experiential simulations might be functionally relevant for comprehension, since impeding hand-related simulations seems to have affected the processing of hand-related words to a larger degree than the processing of foot-related words. However, in principle this difference could also reflect other differences between the hand- and the foot-related words. Thus, before drawing strong conclusions from the results of the present experiment, we need to see whether the opposite effect emerges for a foot tapping task, which was employed in Experiment 2.

Fig. 2 Mean reaction times for hand and foot words depending on the task condition (single or dual task) in Experiment 1. The error bars represent 95%-confidence intervals as per Loftus and Masson (1994)

Additional results

As expected, participants reacted faster in the single- (849 ms) than in the dual-task condition (976 ms), F1(1, 31) = 56.87, p < .001, ηp² = .65, F2(1, 60) = 786.18, p < .001, ηp² = .93. Unsurprisingly, hand responses (868 ms) were faster than foot responses (957 ms), F1(1, 31) = 61.17, p < .001, ηp² = .664, F2(1, 60) = 227.63, p < .001, ηp² = .79. The interaction between those two factors was significant in the by-item analysis, F2(1, 60) = 6.15, p = .016, ηp² = .09, and marginally significant in the by-participant analysis, F1(1, 31) = 3.74, p = .062, ηp² = .11. Although post hoc tests showed that there was a significant effect of task condition for both hand responses, F1(1, 31) = 60.39, p < .001, ηp² = .67, F2(1, 60) = 642.25, p < .001, ηp² = .94, and foot responses, F1(1, 31) = 45.30, p < .001, ηp² = .61, F2(1, 60) = 361.52, p < .001, ηp² = .92, the difference was numerically larger for hand responses (136 ms) than for foot responses (118 ms). This stronger interference effect for hand than foot responses might be related to the fact that the tapping task and hand responses were executed with the same kind of response effector (i.e., the hands), whereas foot responses and the tapping task were done with a different type of effector (hand vs. foot). The greater similarity between the tapping task and pressing a key with a finger than between the tapping task and pressing a pedal with the foot might have caused this effect.

Furthermore, responses to implicit words (902 ms) were faster than responses to explicit words (923 ms), F1(1, 31) = 18.15, p < .001, ηp² = .37, F2(1, 60) = 4.66, p = .035, ηp² = .07, possibly because some of the pseudowords included the lexemes hand or foot (see “Methods”) just like the explicit words, which might have made the task slightly harder for explicit than implicit words. However, as shown by a significant interaction between explicity and task condition, F1(1, 31) = 6.97, p = .013, ηp² = .18, F2(1, 60) = 7.45, p = .008, ηp² = .11, this was only true for the single-task condition, F1(1, 31) = 34.72, p < .001, ηp² = .53, F2(1, 60) = 8.12, p = .012, ηp² = .12, but not for the dual-task condition, F1(1, 31) = 1.48, p = .468, ηp² = .05, F2 < 1. Participants also responded faster to hand words (903 ms) than to foot words (922 ms). This effect was significant in the by-participant analysis, F1(1, 31) = 17.56, p < .001, ηp² = .36, but only marginally significant in the by-item analysis, F2(1, 60) = 3.19, p = .079, ηp² = .05. A significant interaction between the factors word effector and explicity, F1(1, 31) = 75.53, p < .001, ηp² = .71, F2(1, 60) = 8.77, p = .004, ηp² = .13, indicated that the effect of word effector was due to the explicit words, F1(1, 31) = 77.91, p < .001, ηp² = .72, F2(1, 30) = 11.15, p = .004, ηp² = .27, whereas there was no significant reaction time difference between hand and foot words for implicit words, F1(1, 31) = 3.09, p = .177, ηp² = .09, F2 < 1. These effects might be due to differences between the word stimuli other than frequency and length, since the latter were controlled across word categories (see “Methods”). However, since our hypotheses concern interactions and not main effects comparing different word groups, this should not pose any problems. None of the other interactions reached significance (all ps > .10, with the exception of p = .072 for the four-way interaction in the by-participant analysis).

Experiment 2: foot system occupied

The second experiment was identical to Experiment 1 with the exception that we occupied the foot system instead of the hand system in one half of the experiment. For this purpose, participants performed a tapping task with their right foot using a response device lying on the floor. If experiential simulations are functionally relevant for comprehension, we should find stronger interference of this task with the processing of foot words in comparison to hand words.

Methods

Participants

Participants were tested in a single session of approximately 1 h and 15 min duration. As in Experiment 1, we aimed at a final sample size of 32 participants and replaced participants that had to be excluded until we reached that target. Twenty-six participants did not follow the instructions (e.g., they mixed up the assignment of font color to response effector or they did not perform the foot tapping task as instructed, resulting in slower responses in the single-task than in the dual-task condition) and one participant committed errors on more than 20% of the trials. The high number of participants unable to follow the instructions appears to be due to the rather complicated task. Participants had to perform two motor tasks in parallel, using three effectors in total (both feet and one of their hands). For many participants it was a particular challenge not to interrupt the tapping task when responding to stimuli with the other foot. Previous studies investigating the effects of a secondary motor task on language processing usually used verbal responses in the first task (e.g., Yee et al., 2013), making the task much easier for the participants. However, this was not possible in the current study, since we were also interested in replicating the compatibility effect of the study by Ahlberg et al. (2013), which required hand and foot responses. It seems that the coordination of the involved effectors was not possible for everyone. The dual-task requirements in Experiment 1 seemed to have been slightly easier than in this experiment, possibly because in everyday life people are more often performing separate tasks with their hands than with their feet.

The final sample consisted of 32 participants (aged 19–41 years, mean age 23.9 years, SD = 4.8; 23 women). None of them had participated in Experiment 1. All of them reported being native speakers of German and right-handed. The scores in the Edinburgh handedness inventory (Oldfield, 1971) ranged from 41.2 to 100 (M = 81.8, SD = 15.6; see Footnote 3). Participants were reimbursed with either course credit or 8 € per hour.

Experimental setup and materials

The experimental setup was the same as in Experiment 1 with one exception. Instead of resting their right hand on the number pad of a computer keyboard in the dual-task condition, participants rested their right foot on a locally constructed response key box (see Fig. 3) on the floor. Since this response key box is usually used for manual responses, participants were asked to take off their right shoe and wear a disposable shoe cover instead.

Fig. 3 Response key box used for the foot tapping task. A locally constructed overlay with four response buttons was placed over a standard German keyboard. For the current experiment, only the two central buttons were used

Procedure and design

In the dual-task condition, participants performed a tapping task with their right foot instead of their right hand. They were instructed to alternately press the two central buttons of the response key box on the floor at a speed of about two taps per second. Except for this change, procedure and design were identical to Experiment 1.

Results and discussion

Just as in Experiment 1, we first present the interactions that are relevant for our hypotheses before turning to the other results of the ANOVAs. Participants committed an error on 6.5% of the trials. These trials were excluded from the analyses. For post hoc tests, we report Bonferroni-corrected p values.
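As an illustration of what a Bonferroni-corrected p value is, the following minimal Python sketch multiplies each raw p value by the number of comparisons in the family and caps the result at 1. The function name `bonferroni` and the example values are hypothetical; the text does not state the size of the comparison family, so the family of four used here is an assumption.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p values: raw p times family size, capped at 1."""
    m = len(p_values)  # number of comparisons in the family
    return [min(1.0, p * m) for p in p_values]


# Made-up raw p values for a hypothetical family of four pairwise comparisons.
raw_p = [0.001, 0.012, 0.030, 0.400]
for raw, adjusted in zip(raw_p, bonferroni(raw_p)):
    print(f"raw p = {raw:.3f}, corrected p = {adjusted:.3f}")
```

A corrected value is compared against the nominal alpha level (e.g., .05) in the usual way.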

Hypotheses-relevant results

As in Experiment 1, we found a significant interaction between the factors word effector and response effector, F1(1, 31) = 111.90, p < .001, ηp² = .78, F2(1, 60) = 136.76, p < .001, ηp² = .70. Participants responded faster to hand words (790 ms) than to foot words (876 ms) when using their hands to respond, F1(1, 31) = 110.10, p < .001, ηp² = .78, F2(1, 60) = 47.69, p < .001, ηp² = .11, and faster to foot words (906 ms) than to hand words (945 ms) when using their feet, F1(1, 31) = 33.72, p < .001, ηp² = .52, F2(1, 60) = 18.25, p < .001, ηp² = .23. Again there was a significant interaction between the factors word effector, response effector, and explicity, F1(1, 31) = 30.31, p < .001, ηp² = .49, F2(1, 60) = 15.61, p < .001, ηp² = .21. However, as can be seen in Fig. 4, the expected pattern of the interaction between word effector and response effector was present for both explicit and implicit words. Pairwise comparisons revealed significant differences between hand and foot words for hand and for foot responses for both explicit and implicit words in the by-participant analysis. However, these differences were not always significant after Bonferroni correction in the by-item analysis (see Table 3). The pattern of results is remarkably similar to the one in Experiment 1 (see Figs. 1, 4). This confirms the interpretation that this pattern is most likely due to the word material itself. These findings indicate that participants in Experiment 2 engaged in experiential simulation during word processing, just as participants in Experiment 1.
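For readers who want to relate the reported effect sizes to the F statistics, partial eta squared can be recovered from an F value and its degrees of freedom with the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies this identity to the word effector by response effector interaction reported above. The function name is hypothetical, and the text does not state how the effect sizes were actually computed, so this is a consistency check under that assumption only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Word effector x response effector interaction as reported in the text.
by_participant = partial_eta_squared(111.90, 1, 31)  # F1(1, 31) = 111.90
by_item = partial_eta_squared(136.76, 1, 60)         # F2(1, 60) = 136.76

print(f"{by_participant:.2f}")  # prints 0.78
print(f"{by_item:.2f}")         # prints 0.70
```

Both values agree with the effect sizes reported for this interaction.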

Fig. 4 Mean reaction times for hand and foot words depending on the response effector for explicit (a) and implicit (b) words in Experiment 2. The error bars represent 95% confidence intervals as per Loftus and Masson (1994)

Table 3 Pairwise comparisons between reaction times to hand and foot words separately for explicit and implicit words and for hand and foot responses in Experiment 2

Most importantly, there was a significant interaction between task condition and word effector in the by-participant analysis, F1(1, 31) = 5.48, p = .026, ηp² = .15, but not in the by-item analysis, F2(1, 60) = 2.76, p = .102, ηp² = .04. Although the difference between the dual- and the single-task condition was significant for both hand words, F1(1, 31) = 62.40, p < .001, ηp² = .67, F2(1, 30) = 336.93, p < .001, ηp² = .92, and foot words, F1(1, 31) = 55.43, p < .001, ηp² = .64, F2(1, 30) = 204.89, p < .001, ηp² = .87, it was numerically larger for hand words (95 ms) than for foot words (82 ms; see also Fig. 5 and Footnote 4). Thus, contrary to the predictions, the foot tapping task had a slightly larger effect on hand words than on foot words, just as the hand tapping task did in Experiment 1. It thus seems that impeding foot-related simulations and impeding hand-related simulations had similar effects on the processing of hand- and foot-related words. This was confirmed by an analysis across experiments: the interaction between experiment, task condition, and word effector did not reach significance (F1 < 1, F2 < 1).

Fig. 5 Mean reaction times for hand and foot words depending on the task condition (single or dual task) in Experiment 2. The error bars represent 95% confidence intervals as per Loftus and Masson (1994)

Additional results

Overall, the additional results were quite similar in Experiments 1 and 2. In Experiment 2, participants also responded faster in the single-task (835 ms) than in the dual-task condition (924 ms), F1(1, 31) = 62.35, p < .001, ηp² = .67, F2(1, 60) = 525.43, p < .001, ηp² = .90. Hand responses (833 ms) were faster than foot responses (925 ms), F1(1, 31) = 73.64, p < .001, ηp² = .70, F2(1, 60) = 264.08, p < .001, ηp² = .81. However, in Experiment 2 the interaction between those two factors was not significant, F1 < 1, F2 < 1. If the explanation for this interaction in Experiment 1 (i.e., stronger interference of hand tapping with hand than foot responses because these two actions involve the same effector) is correct, we would have expected stronger interference of foot tapping with foot than hand responses. However, the foot tapping task is somewhat easier than the hand tapping task, since it only involves two instead of four buttons. Possibly it did not interfere that much with foot responses for that reason.

Furthermore, responses to implicit words (868 ms) were again faster than responses to explicit words (891 ms), F1(1, 31) = 20.29, p < .001, ηp² = .40, F2(1, 60) = 4.89, p = .031, ηp² = .08. Unlike in Experiment 1, this difference was not restricted to the single-task condition, since the interaction between task condition and explicity was not significant, F1 < 1, F2 < 1. Participants also responded faster to hand words (868 ms) than to foot words (891 ms), F1(1, 31) = 24.38, p < .001, ηp² = .44, F2(1, 60) = 4.94, p = .030, ηp² = .08. Just as in Experiment 1, a significant interaction between word effector and explicity, F1(1, 31) = 23.93, p < .001, ηp² = .44, F2(1, 60) = 6.67, p = .012, ηp² = .10, indicated that this was only true for explicit words, F1(1, 31) = 34.97, p < .001, ηp² = .53, F2(1, 30) = 9.98, p = .007, ηp² = .25, whereas there was no significant difference between implicit hand and foot words, F1 < 1, F2 < 1.

Contrary to Experiment 1, the interaction between explicity and response effector did reach significance, F1(1, 31) = 10.31, p = .003, ηp² = .25, F2(1, 60) = 5.42, p = .023, ηp² = .08, and in turn interacted with the factor task condition, F1(1, 31) = 5.26, p = .029, ηp² = .15, F2(1, 60) = 4.72, p = .034, ηp² = .07. Post hoc tests showed that the interaction between explicity and response effector was only significant in the dual-task condition (the difference between explicit and implicit words for hand responses was 41 ms, and for foot responses it was 3 ms), F1(1, 31) = 12.64, p = .001, ηp² = .29, F2(1, 60) = 8.48, p = .005, ηp² = .12, but not in the single-task condition (the difference between explicit and implicit words for hand responses was 28 ms and for foot responses it was 22 ms), F1 < 1, F2 < 1. We currently do not have an explanation for this finding. None of the other interactions reached significance (all ps > .10).

First block analyses for Experiments 1 and 2

In both experiments, each word was presented eight times—once per block. Therefore, one could argue that the repetition of the words may have watered down the effects. That is, one could expect effects to be stronger at the first occurrence of the word and to become weaker during the course of the experiment. For this reason, we performed additional analyses using only the first block of the experiments and treating task condition as a between-participants variable.
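The data preparation for such a first-block analysis can be sketched as follows: trials are restricted to the first block, error trials are dropped (as in the main analyses), and mean reaction times are aggregated once per participant (for F1) and once per item (for F2). This is a minimal sketch under assumptions; the record fields (`participant`, `item`, `block`, `task`, `word_effector`, `rt`, `correct`), the function name, and the toy values are all hypothetical and not taken from the experiments.

```python
from collections import defaultdict
from statistics import mean


def first_block_cell_means(trials, unit):
    """Mean RT per (unit, task, word_effector) cell, using correct first-block trials.

    `unit` is "participant" for by-participant (F1) aggregation
    or "item" for by-item (F2) aggregation.
    """
    cells = defaultdict(list)
    for trial in trials:
        # Keep only the first presentation of each word and drop error trials.
        if trial["block"] != 1 or not trial["correct"]:
            continue
        key = (trial[unit], trial["task"], trial["word_effector"])
        cells[key].append(trial["rt"])
    return {key: mean(rts) for key, rts in cells.items()}


# Made-up trials for illustration only; these are not data from the study.
trials = [
    {"participant": 1, "item": "item_a", "block": 1, "task": "single",
     "word_effector": "hand", "rt": 800, "correct": True},
    {"participant": 1, "item": "item_b", "block": 1, "task": "single",
     "word_effector": "hand", "rt": 900, "correct": True},
    {"participant": 1, "item": "item_c", "block": 1, "task": "single",
     "word_effector": "foot", "rt": 850, "correct": False},
    {"participant": 1, "item": "item_a", "block": 2, "task": "dual",
     "word_effector": "hand", "rt": 700, "correct": True},
    {"participant": 2, "item": "item_a", "block": 1, "task": "dual",
     "word_effector": "hand", "rt": 950, "correct": True},
]

by_participant = first_block_cell_means(trials, "participant")
by_item = first_block_cell_means(trials, "item")
print(by_participant)
print(by_item)
```

Because each participant contributes only one task condition in the first block, task condition varies between participants in the by-participant means, which is why it has to be treated as a between-participants variable in this analysis.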

In Experiment 1, the interaction between word effector and response effector remained significant, F1(1, 30) = 17.81, p < .001, ηp² = .37, F2(1, 60) = 20.19, p < .001, ηp² = .25; however, it no longer interacted significantly with the factor explicity, F1(1, 30) = 2.09, p = .159, ηp² = .07, F2(1, 60) = 2.96, p = .091, ηp² = .05. The interaction between task condition and word effector also was no longer significant, F1 < 1, F2 < 1. Of the additional results, the main effects for task condition, F1(1, 30) = 8.49, p = .007, ηp² = .22, F2(1, 60) = 82.09, p < .001, ηp² = .58, and response effector, F1(1, 30) = 28.28, p < .001, ηp² = .49, F2(1, 60) = 37.04, p < .001, ηp² = .38, stayed significant, as well as the interaction between word effector and explicity, F1(1, 30) = 15.40, p < .001, ηp² = .34, F2(1, 60) = 6.05, p = .017, ηp² = .09. As before, the interaction between task condition and response effector was marginally significant in the by-participant analysis, F1(1, 30) = 3.59, p = .068, ηp² = .11, and significant in the by-item analysis, F2(1, 60) = 5.66, p = .021, ηp² = .09. All other effects did not reach significance (all ps > .10).

The overall data pattern in Experiment 2 was quite similar. The interaction between word effector and response effector remained significant, F1(1, 30) = 17.06, p < .001, ηp² = .36, F2(1, 60) = 21.80, p < .001, ηp² = .27. Unlike in Experiment 1, the three-way interaction between explicity, word effector, and response effector remained significant in the by-participant analysis, F1(1, 30) = 6.98, p = .013, ηp² = .19, and was marginally significant in the by-item analysis, F2(1, 60) = 3.92, p = .052, ηp² = .06. As in Experiment 1, the interaction between task condition and word effector was no longer significant, F1 < 1, F2 < 1. Of the additional results, the main effects for task condition, F1(1, 30) = 9.28, p = .005, ηp² = .24, F2(1, 60) = 196.87, p < .001, ηp² = .77, and response effector, F1(1, 30) = 48.34, p < .001, ηp² = .62, F2(1, 60) = 38.26, p < .001, ηp² = .39, stayed significant, as well as the interaction between word effector and explicity, F1(1, 30) = 18.40, p < .001, ηp² = .38, F2(1, 60) = 10.30, p = .002, ηp² = .15. The other effects did not reach significance (all ps > .10, with the exception of p = .069 for the interaction between task condition, explicity, and response effector in the by-participant analysis).

As the results of these analyses show, the effects did not get weaker over the course of the experiment. On the contrary, the effects appear to be overall weaker or not even present in the first block. Of course, the overall smaller power of the first block analyses might have contributed to this. We can, however, be sure that the repetition of the stimulus material did not obscure any effects that were present at the first occurrence of the words.

General discussion

In this study, we replicated and extended the previous finding of faster response times when response effector and word effector match than when they mismatch (Ahlberg et al., 2013). Whereas Ahlberg et al. (2013) analyzed their data collapsed for matching and mismatching response and word effectors, we kept the initial separation into hand- and foot-related words and responses. This way we were able to show that the match effect occurred for both effectors. In both experiments, participants responded faster to hand words than to foot words when responding with their hand and faster to foot words than to hand words when responding with their foot. Additionally, we showed that this effect does not only occur in the Stroop-like task that Ahlberg et al. used, but extends to another task (i.e., lexical decision). Furthermore, despite a significant interaction between word effector, response effector, and explicity, the effect was present for explicit as well as implicit words. It seems that the significant three-way interaction was due to different effect sizes for different words (see also Figs. 1, 4). These findings support the claim that participants reactivated experiential traces during word processing that were generated when they interacted with the referents of these words. Reading a word referring to an object that is typically manipulated with one of the respective effectors seems to have primed the associated effector, thereby facilitating responses with this effector.

More importantly, we found that the tapping task in both experiments increased lexical decision times to hand-related words to a slightly larger degree than to foot-related words. That is, moving the fingers of the right hand and moving the right foot impeded performance in the lexical decision task in a similar way. This was confirmed by an additional analysis comparing the effect across experiments. Thus, the effect of the secondary task was not effector-specific, which contradicts predictions of the embodiment account of language processing. Furthermore, despite the significant interaction, the difference between the effects of the secondary tasks on hand-related and on foot-related words was quite small in both experiments (see Figs. 2, 5) and was overshadowed by the overall large effects that both finger tapping and foot tapping had on the processing of both hand- and foot-related words. The effect disappeared altogether when only the first block of the experiments was analyzed. Nevertheless, the finding of a slightly stronger effect of the foot tapping task on the processing of hand- compared to foot-related words was unexpected and needs to be explored further. One possible explanation is that the referents of the hand words were more action-related than those of the foot words. Since humans perform most actions with their hands, it appears possible that there are more hand- than foot-related objects with a high action association. In order to test this assumption, we conducted a short online study in which 20 participants rated each item according to its action-relatedness on a scale from 1 to 7. Hand items (4.1) were indeed rated as being more action-related than foot items (3.4), F1(1, 19) = 67.86, p < .001, ηp² = .78, F2(1, 60) = 4.23, p = .044, ηp² = .07. It is thus possible that the slightly larger effect of the two secondary tasks on hand words is due to this difference in the item material. Action-relatedness is also a possible confounding variable in previous studies (e.g., Pecher, 2013; Yee et al., 2013) and should be taken into consideration when creating item material for future research.

The finding of a larger effect of action on action-related language compared to non-action language could also be interpreted as support for functional relevance of experiential simulation if one assumes that action-related simulation is impeded by the secondary task. However, in this case an action condition is compared with a no-action condition (i.e., hand and foot tapping vs. no tapping). Any differences found could be due not only to the action itself but also to other differences between these conditions, such as higher attentional demands in the action condition. Investigating differential effects of two separate motor tasks on language processing, such as in the current study, is a much stronger test of the theory. Crucially, these differential effects were not found; therefore, we can conclude that detailed experiential simulations are not necessary for lexical access in the lexical decision task. This does not mean that participants did not engage in experiential simulation at all. In fact, the interaction between word effector and response effector implies that they did. The lack of a differential effect of the secondary task on hand and foot word processing just shows that these simulations were not needed to perform the task. The hand tapping task should have made hand-related simulations harder than foot-related simulations, though not impossible. If these simulations had been needed to perform the task, reaction times to hand words should have been affected more than reaction times to foot words. The opposite is true for the foot tapping task.

The conclusion that experiential simulations are not functionally relevant in a lexical decision task does not, of course, rule out that they might be relevant for other types of tasks, which would be in line with predictions from hybrid models that postulate different systems of knowledge representation (e.g., Barsalou, Santos, Simmons, & Wilson, 2008; Connell & Lynott, 2013; Louwerse, 2007, 2008; see also Borghi & Cangelosi, 2014). According to these models, it may depend on the task characteristics which system is used. If it is possible to perform a task based on linguistic surface information, participants do not engage in experiential simulations but use the linguistic system as a shortcut instead. For example, this would be the case in a lexical decision task in which the pseudowords do not follow the rules of the phonology and orthography of a language. In that case, pseudowords can easily be identified relying on linguistic features alone. If the pseudowords follow these rules, as in the current study, the mental lexicon has to be accessed to make a decision (Barsalou et al., 2008). In the latter case, it is more likely that participants will engage in experiential simulation. Experiential simulations become even more important for tasks involving a larger degree of conceptual processing (e.g., interpretation generation, see Connell & Lynott, 2013). They might also be relevant for working memory tasks, since Shebani and Pulvermüller (2013) found differential effects of hand and foot movements on the memory for arm- and leg-related action verbs. The reason that Pecher (2013) did not find the predicted larger interference effect of a secondary manual task on memory for manipulable than non-manipulable objects might lie within the specifics of the manual task that was chosen in Pecher’s study. Participants had to make fists and stretch out the fingers of both hands simultaneously one by one, then make fists again, etc. Although the author picked this task to be maximally incompatible with the motor actions necessary to grasp an object, it still bears resemblance to grasping movements at the point when the participants are required to close their hands to make fists. It appears possible that this movement might have facilitated simulations of words referring to manipulable objects, thereby counteracting the general interference effect that was caused by having to perform a secondary task and leading to a smaller interference effect for this word category. This explanation could account for the numerically larger interference effect for non-manipulable compared to manipulable words; however, it is highly speculative and further research is needed to test the effects of different manual tasks on language processing.

In conclusion, in this study we have shown that occupying the hand system and the foot system, respectively, does not have any differential effects on lexical access to hand- and foot-related nouns. This finding is not in line with predictions of the experiential simulations view and implies that simulation might be an optional by-product of language processing, at least in a lexical decision task. It remains possible that experiential simulation might be relevant for more demanding tasks such as keeping several linguistic stimuli in working memory or understanding longer narratives such as sentences and texts, which would also be in line with predictions of the hybrid models discussed above. Under which circumstances exactly simulation does aid language processing remains to be established; however, lexical access appears to be possible without simulation.