Introduction

Dual-task research paradigms are used to explore the role of executive functions in the performance of complex motor tasks such as locomotion (Al-Yahya et al. 2011; Bayot et al. 2018; Yogev et al. 2008). Performance of a concurrent cognitive and locomotor task, i.e. dual-tasking, can result in observable performance declines in one or both tasks (Ellmers et al. 2016; Hegeman et al. 2012; Kimura and van Deursen 2020). This impact on performance is thought to occur at the central response selection level of cognitive processing and deficits are often attributed to a general demand on executive functions being higher than available cognitive resources (Tombu and Jolicœur 2003) or to a ’bottleneck’ at specific processing junctions within the brain (Pashler 1994). Regardless of the mechanism, the increased demand on the central nervous system can result in longer verbal response times (RT) for cognitive tasks, such as an auditory Stroop task, when performing simultaneous locomotor tasks (e.g. Siu et al. 2008) and/or altered motor performance (e.g., Pitman et al. 2021; Weerdesteyn et al. 2003). This longer response time may suggest that more cognitive resources are allocated to the motor task (Kelly et al. 2010) while an altered motor performance, including reduced walking velocity (Raffegeau et al. 2018) may be a strategy used to reduce the demand of a locomotor task and free up the cognitive resources needed to complete a challenging cognitive task during these dual-tasking scenarios (Patel et al. 2014).

Our executive functions include response inhibition, selection of new responses, and the execution of a response to a stimulus (Banich 2009; Yogev et al. 2008). The Stroop paradigm is a classic stimulus-response paradigm which challenges both inhibition and response selection. One version of the traditional visual Stroop test requires individuals to identify the color of a word written (e.g., red or blue) rather than the word itself (e.g., “blue” or “red”) via a verbal response or a button press (e.g., Barbarotto et al. 1998). Presentation of an incongruent stimulus then creates a conflict in response selection as the word written (thought to be the more easily integrated stimulus characteristic) does not match the colour of the word (considered the less easily integrated characteristic). This conflict is measured through increased response times for incongruent stimuli compared to response times recorded for congruent stimuli, where the word and the color match (Stroop 1935). Cue congruency results in an observable facilitation effect via faster RT, although facilitation effects are considerably less robust than the inhibitory conflict demonstrated for incongruent Stroop responses (MacLeod and MacDonald 2000).

A drawback of the visual Stroop paradigm in dual-task locomotion research is that the visual task requires an individual to direct their gaze and visual attention to a stimulus word on a screen in front of them. This creates an inherent methodological challenge as visual attention is also required for the guidance of ongoing locomotion including online control of safe foot placement and the negotiation of obstacles in the travel path (Raffegeau et al. 2018). Directing gaze away from the path of travel effectively creates a structural interference at a sensory integration level (an individual cannot look in two places at the same time) rather than an interference at the central response selection level (Worden et al. 2016). This is particularly disruptive in complex locomotor tasks, where directing gaze away from objects has been found to decrease dual-task performance and increase risks of contact with the obstacle (Cho et al. 2019). A common workaround for locomotor research in this area is to use the auditory version of the Stroop task (e.g. Knight and Heinrich 2017; Morgan 1989). In this version of the Stroop test, the participant must identify the pitch (either high or low) of a recorded voice saying the word ‘high’ or ‘low’. Prior research has demonstrated similar executive function demands via response selection/inhibition as the visual Stroop task (Morgan 1989) without incurring structural interference that may unintentionally alter motor task performance during dual-task locomotor paradigms (Worden et al. 2016).

Neutral tasks have been utilized in visual Stroop paradigms as a tool to measure the general central processing time required to read and verbalize a stimulus irrelevant to the characteristics of the Stroop task stimuli, for example the word “stage” written in black ink (Parris 2014). The measured response time of the neutral task can then be compared to response times for the incongruent (inhibitory) and congruent (facilitative) Stroop task conditions, providing a comparator to measure both inhibition and facilitation effects of the Stroop paradigm. However, many experimental protocols employing the auditory Stroop task do not include a neutral task stimulus (e.g. Siu et al. 2008; Worden et al. 2016). Logistical constraints regarding protocol length may affect this choice, as the addition of a neutral cue will increase the number of trials, which may in turn fatigue study participants; this can be of particular concern when conducting research with populations experiencing mobility concerns. In the current study, we were interested in ascertaining how a pitch-irrelevant neutral word (that requires participants to repeat the word) within an auditory Stroop paradigm may impact cognitive and motor responses, as this has been less explored in the dual-task locomotor literature. Additionally, we know that the more alike one auditory cue word is to another word, the harder it is to identify correctly (Gaskell and Marslen-Wilson 2002). Thus, it may be possible that neutral task stimulus words that differ in structure from the auditory Stroop stimuli, such as words containing two syllables compared to the single syllable “high” or “low” words presented in most auditory Stroop paradigms, may present a different cognitive challenge.

We also know that the Stroop task is sensitive to learning due to participant familiarity with cues following repetition (MacLeod 1991); individuals can ‘prime’ their cognitive system in preparation for an anticipated stimulus (Monsell et al. 2001; Koch et al. 2018). Given these past observations, some researchers have suggested that individuals could use preparatory behavioral strategies to disregard irrelevant information about a visual Stroop task (Goldfarb and Henik 2007); for example by prioritizing attention on the color of the word, which has been found to be one of the first characteristics processed in an object identification task (Zinni et al. 2014). If similar preparatory strategies are adopted for an auditory task, for example attending to the pitch of the stimuli prior to the word spoken, this could present a challenge for researchers as many locomotor dual-task paradigms present repeated auditory Stroop cues to participants without a neutral task present. Thus, when considering the experimental design for the current study we aimed to make it disadvantageous for participants to disregard the primary characteristic of the auditory Stroop stimulus (word spoken) by presenting cognitive cues in ‘mixed’ trial blocks thereby encouraging them to prioritize/focus their attention to the secondary characteristic (pitch of word).

The primary aim of this study was to explore the impact of neutral and auditory Stroop cues and a priori knowledge on cognitive and locomotor task performance. The secondary purpose of this study was to examine if the similarity of a neutral task word to the Stroop task word stimuli resulted in altered verbal response times. For the primary purpose, we hypothesized that response times would be fastest for congruent Stroop stimuli, followed by the neutral task stimuli, and finally the incongruent Stroop stimuli, confirming that our auditory Stroop protocol produced facilitation and interference for cognitive response processing. Further, we expected to observe the greatest disruption in dual-task performance (e.g. greater response times to all cognitive tasks and slower locomotor center of mass (CoM) velocity) during trial blocks where cognitive cues were provided in a mixed trial block, compared to trial blocks where cognitive cues were known ahead of time. These findings would provide evidence that having a priori knowledge of the experimental condition reduced on-line demand for cognitive resources in young adults. Finally, with respect to our secondary purpose, we anticipated that within our mixed cognitive trial blocks, participants would experience greater central processing delays when presented with a neutral task stimulus word with one syllable compared to two syllables, as single syllable words more closely resemble the auditory Stroop stimuli. Collectively, results from this study will provide important methodological information for researchers in this field which may help guide the selection of neutral task cues in future dual-task locomotor and auditory Stroop paradigms.

Methods and statistical analyses

Participants

Sixteen young adults (8 females; mean age 22.4 years, range 19–28 years) provided written consent to take part in this study. Individuals were excluded from participating if they disclosed any diagnosed auditory (e.g. hearing loss), visual (e.g. uncorrected visual acuity), musculoskeletal (i.e., muscle strain or tendon injury) or neurological (i.e., post-concussion symptoms) conditions which could have affected their ability to perform a walking, standing, or cognitive task. This study was approved by the University’s research ethics committee. Once consent was obtained, participants were instrumented with 43 retroreflective markers (OptiTrack, Corvallis USA; 120 Hz) placed on anatomical landmarks (e.g. toe, heel, right/left anterior superior iliac spine, xiphoid process, left/right acromion) (Leardini et al. 2007, 2011). A custom microphone affixed to each participant’s left shoulder was used to collect verbal responses to the cognitive task as analog waveform signals, digitally sampled at 1200 Hz.

Experimental protocol

Cognitive task

An auditory cue was presented randomly to participants during each trial via a custom-built software program which translated text inputs to voice (Arduino IDE v. 1.8.12). In half of these trials (50%), participants would hear an auditory Stroop cue (Knight and Heinrich 2017; Worden and Vallis 2016). The words “high” or “low,” spoken in a high or low pitch, would be played through a speaker at the center of the walkway, requiring participants to verbally identify the pitch of the word (high or low), and not the word itself. The response could either be congruent with the word spoken (i.e., a high-pitched voice saying the word “high”) or incongruent (i.e., a high-pitched voice saying the word “low”). In other trials, participants would hear a neutral cue; the specific neutral words used were “lab,” “lemon,” “home,” or “hello,” spoken in a moderate tone. The neutral task words were partially selected based on words used in prior published protocols from our laboratory (Pitman et al. 2021; Pitman and Vallis 2021), while careful consideration was also given so that the neutral task words would initially sound similar to our chosen Stroop task words: they started with either the letter “H” or “L”, and had either one syllable (Home, Lab) or two syllables (Hello, Lemon). A linguistic software tool was used to obtain the frequency of our verbal cues (“Praat”; https://praat.en.softonic.com). Neutral pitch cues (~ 121 Hz) were within the range of a typical male voice (~ 90–155 Hz) while our low pitch was slightly lower (~ 79 Hz). The high pitch cues (~ 295 Hz) were slightly higher than the range of a typical female voice (165–255 Hz; Baken and Orlikoff 2000).

After receiving instruction on the cognitive tasks, participants were provided with 24 seated familiarization trials of all cognitive tasks (neutral and Stroop stimuli presented in a mixed format), where accuracy but not response time was collected. Upon hearing a word over the speaker, participants were instructed to “Respond to the auditory cue as soon as you hear it, ensuring to the best of your ability that you are responding correctly.” Familiarization trials were included to ensure exposure to the cognitive tasks was consistent across participants, and that participants understood the cognitive task instructions; feedback on accuracy of responses was provided during the familiarization trials. Participants were required to answer at least 80% of the familiarization trials correctly before moving forward in the protocol.

Walking trials

Participants were asked to complete a block of 64 walking trials (7-metre straight walkway). All participants were provided with the same scripted set of instructions. Participants were provided general instructions for all trials: “Walk from the start line to the end line, at a brisk pace, but do not run and provide a verbal response to the auditory cue” and to “Please complete both the cognitive task and motor task to the best of your ability; do not prioritize one over the other”. Trials were completed in a blocked, randomized design (see Fig. 1) where specific instructions to participants were manipulated. In all trial blocks, an equal number of neutral words (One- and Two-syllable) and the standard auditory Stroop cues were presented (both congruent and incongruent cues).

Fig. 1

Experimental protocol design. Familiarization trials were first completed in a seated position (80% accuracy required to continue). Walking trials were then completed by both groups. Group A completed the Known block of trials first followed by the Mixed block of trials; Group B completed the same number of walking trials, but in the opposite order (Mixed block first, followed by Known). In the Known block of trials, participants were given instructions about the type of cognitive task in advance; a set of Neutral cue trials was completed first, followed by a set of Auditory Stroop cue trials. Once completed, participants took a short break and then commenced the Mixed block. In these trials, the type of auditory cue presented was unknown to the participant and changed for each trial, i.e. neutral cues were presented randomly with incongruent and congruent Stroop cues.

For 50% of trials (Known condition block), participants were given instructions a priori that they would hear a specific cue prior to starting the walking trials. A set of neutral cue trials was completed first; participants were informed that each cue would be a neutral cue and that, similar to the familiarization trials, they should repeat the word out loud as quickly and accurately as possible. Once complete, the participants were told that they were starting a set of Stroop trials and were reminded that their response should identify the pitch of the word (high or low) as quickly and accurately as possible.

In the other 50% of trials (Mixed condition block), we randomly presented neutral stimulus cues (“lab,” “lemon,” “home,” or “hello”) or Stroop cues (“high” and “low” spoken in high or low pitch). Prior to commencing this block of mixed trials, participants were told that they would hear either a Stroop or a Neutral task cue and were reminded of the appropriate responses for each cue (identify the correct pitch of the word for Stroop cues; repeat the word out loud if it was a neutral cue). Importantly, participants were not aware before each Mixed condition block trial of the type of cue they were about to hear.

The trial order within each block was randomized prior to data collection; however, each participant was presented with the same randomized order. Participants were randomly pre-assigned to one of two groups. Group A (n = 8 participants) completed the Known condition block of trials prior to the Mixed condition, while Group B (n = 8) completed the Mixed condition first.
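The blocked, counterbalanced design described above can be sketched in a few lines of Python. This is an illustration only: the split of eight trials per cue type within a 32-trial Mixed block, the seed value, and the function names are assumptions and are not reported in the protocol.

```python
import random

# The four cognitive conditions analyzed in this study.
CUE_TYPES = [
    "congruent",
    "incongruent",
    "neutral_one_syllable",
    "neutral_two_syllable",
]


def build_mixed_block(trials_per_type=8, seed=2021):
    """Return a shuffled list of cue types for one Mixed block.

    trials_per_type and seed are illustrative assumptions, not values
    taken from the protocol.
    """
    trials = [cue for cue in CUE_TYPES for _ in range(trials_per_type)]
    # A fixed seed reproduces the same order for every participant,
    # mirroring "each participant was presented with the same randomized order".
    rng = random.Random(seed)
    rng.shuffle(trials)
    return trials


def block_order(group):
    """Group A completes Known then Mixed; Group B completes Mixed then Known."""
    if group == "A":
        return ["Known", "Mixed"]
    if group == "B":
        return ["Mixed", "Known"]
    raise ValueError(f"unknown group: {group!r}")
```

Calling build_mixed_block() twice returns the same sequence, which is the property that matters here: the order is random with respect to cue type but identical across participants.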

Data analyses

All data were collected via Motive Software (v. 2.2; NaturalPoint Inc., USA); data processing and analyses were performed within Visual3D (v. 2021.10.2; C-Motion, USA). A zero-lag 2nd order Butterworth filter (10 Hz low-pass cutoff) was applied to all kinematic data. Estimated position of the CoM was calculated using the weighted position of the upper trunk (bilateral acromion processes + xiphoid) and pelvis body segments (bilateral iliac crests and anterior superior iliac spine; adapted from: Winter et al. 1998). Velocity of the weighted average of this center of mass (CoMv) along the path of travel (anterior direction) was calculated as a vector based on the estimated CoM position for all walking trials.
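To make the CoMv calculation concrete, the sketch below combines trunk and pelvis positions into a weighted CoM estimate and differentiates it along the anterior direction. It assumes the positions have already been low-pass filtered. The equal segment weighting and the central-difference derivative are placeholders chosen for illustration; the actual weighting follows the adapted Winter et al. (1998) model, and the differentiation method used within Visual3D is not stated here.

```python
def estimate_com(trunk_pos, pelvis_pos, trunk_weight=0.5):
    """Weighted average of trunk and pelvis anterior positions (m).

    trunk_weight is a hypothetical placeholder, not the weighting
    used in the study.
    """
    if len(trunk_pos) != len(pelvis_pos):
        raise ValueError("trunk and pelvis series must have equal length")
    w = trunk_weight
    return [w * t + (1.0 - w) * p for t, p in zip(trunk_pos, pelvis_pos)]


def central_difference_velocity(position, sample_rate_hz=120.0):
    """Velocity (m/s) at interior samples using a central difference.

    sample_rate_hz defaults to the 120 Hz motion capture rate; the
    first and last samples are dropped because they lack a neighbor.
    """
    dt = 1.0 / sample_rate_hz
    return [
        (position[i + 1] - position[i - 1]) / (2.0 * dt)
        for i in range(1, len(position) - 1)
    ]
```

For a participant moving at a constant 1.39 m/s, every interior sample returned by central_difference_velocity would equal 1.39 up to floating-point error.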

Cognitive task accuracy during all non-familiarization trials was measured and only accurate trials were analyzed; any inaccurate cognitive trials were removed from further analysis to ensure that the participant had dedicated appropriate executive resources to the cognitive task performance (Pitman et al. 2021; Pitman and Vallis 2021). Cognitive task response times (RT) were then processed using previously described methods (Pitman et al. 2021). In brief, verbal responses were collected via a custom microphone affixed to each participant’s left shoulder which relayed these responses as an analog waveform signal. These signals were rectified, a root mean square calculation was performed, and a 2nd order low-pass zero-lag Butterworth filter was applied. A square-wave rising edge identified the start of each auditory cue presentation to participants. RT was defined as the time between onset of the auditory stimulus and onset of response verbalization. Verbalization onset was defined as the point at which the verbal signal surpassed 1.5 times the mean signal amplitude, which was recorded and averaged prior to the initiation of each trial to account for any ambient noise in the room.
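The threshold rule for verbalization onset can be illustrated with a short Python sketch. It assumes the microphone signal has already been rectified and smoothed into an envelope, and that a separate list of pre-trial baseline samples is available; the function name, argument layout, and the choice to return None when no crossing is found are assumptions made for this example rather than details of the published processing pipeline.

```python
def response_time(envelope, baseline, cue_onset_idx,
                  sample_rate_hz=1200.0, threshold_factor=1.5):
    """Return RT in seconds, or None if the threshold is never crossed.

    envelope: rectified, smoothed microphone signal for one trial
    baseline: pre-trial samples used to estimate ambient noise
    cue_onset_idx: sample index of the cue's square-wave rising edge
    """
    if not baseline:
        raise ValueError("baseline samples are required")
    # Onset threshold: 1.5 times the mean pre-trial amplitude.
    threshold = threshold_factor * (sum(baseline) / len(baseline))
    # Search forward from cue onset for the first supra-threshold sample.
    for i in range(cue_onset_idx, len(envelope)):
        if envelope[i] > threshold:
            return (i - cue_onset_idx) / sample_rate_hz
    return None
```

With a 1200 Hz sampling rate, a threshold crossing 600 samples after the cue onset corresponds to an RT of 0.5 s.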

Statistical analyses

Prior to conducting our statistical analyses, we first calculated the average value for each of the two dependent variables (RT and CoMv) for each experimental trial condition, for each participant. Statistical outliers (± 2 standard deviations of the mean) were then removed (2 trials were removed across all participants; 0.19% of all trials completed). Aggregate (average) values were then recalculated for each instruction set, cognitive task and group condition for each participant. We then performed two three-way mixed measures ANOVAs, one on each dependent variable (cognitive task RT and CoMv), to analyze the effects of trial Instruction Set (Known, Mixed), Cognitive task (Incongruent Stroop, Congruent Stroop, One-syllable Neutral task, Two-syllable Neutral task), and Group (Group A, Group B). Bonferroni-corrected post-hoc pairwise comparisons were performed to determine specific cognitive task effects, when appropriate. All statistical analyses were performed using SPSS (SPSS Inc., USA, Version 28). Significance was set at p < 0.05.
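A minimal Python sketch of the outlier screening and re-averaging step is given below. It assumes that outliers are screened within one participant's trials for a single condition and that the sample standard deviation (n − 1 denominator) is used; neither detail is specified above, and the function names are hypothetical.

```python
from statistics import mean, stdev


def remove_outliers(values, n_sd=2.0):
    """Drop values more than n_sd standard deviations from the mean.

    Uses the sample standard deviation, which is an assumption.
    """
    if len(values) < 3:
        return list(values)  # too few trials to screen meaningfully
    m = mean(values)
    sd = stdev(values)
    return [v for v in values if abs(v - m) <= n_sd * sd]


def condition_mean(values, n_sd=2.0):
    """Recalculate the aggregate value after outlier removal."""
    return mean(remove_outliers(values, n_sd))
```

For example, in a hypothetical set of RTs [0.60, 0.62, 0.61, 0.59, 0.63, 0.60, 0.62, 1.50], the 1.50 s trial lies more than two standard deviations above the mean and is dropped, leaving seven trials with a mean of 0.61 s.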

Results

Cognitive task accuracy was very high; only 9 of the 1024 trials performed across all participants were removed from further analysis due to an incorrect cognitive task response, representing 0.87% of all trials. Seven of the 16 participants had at least one incorrect trial; 2 of these 7 answered two trials incorrectly.

No significant interaction or main effects for Group were observed for either dependent variable (p > 0.05); thus, data from both groups of participants were pooled for all subsequent statistical analyses.

No significant interaction effects were found; however, a main effect of Cognitive task [F(3,42) = 17.53, p < 0.001, ηp² = 0.556] was observed for cognitive task RT (Fig. 2). Post-hoc analyses revealed that the incongruent Stroop RT was on average 0.068 ± 0.016 s (mean ± standard error) greater than congruent Stroop RT (p < 0.05); 0.097 ± 0.023 s greater than One-syllable neutral RT (p < 0.05); and 0.116 ± 0.025 s greater than Two-syllable neutral RT (p < 0.05). Congruent Stroop RT was not significantly different from One-syllable neutral RT but was on average 0.047 ± 0.014 s greater than Two-syllable neutral RT (p < 0.05). Similarly, One-syllable neutral RT was significantly greater than Two-syllable neutral RT by a mean of 0.018 ± 0.005 s.

An additional main effect of Instructional set was observed for cognitive task RT [F(1,14) = 46.73, p < 0.001, ηp² = 0.769]. In general, participants’ RT was 0.087 ± 0.13 s greater for trials where the cognitive task set was Mixed compared to trial blocks where the cognitive task set was Known (p < 0.001).
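As a consistency check on the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies this identity; the function name is illustrative only.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Main effect of Cognitive task: F(3,42) = 17.53
cognitive_task_effect = round(partial_eta_squared(17.53, 3, 42), 3)

# Main effect of Instructional set: F(1,14) = 46.73
instruction_set_effect = round(partial_eta_squared(46.73, 1, 14), 3)
```

These evaluate to 0.556 and 0.769 respectively, matching the values reported for the two main effects on cognitive task RT.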

Fig. 2

Cognitive task response time (s) across Instruction Set and Cognitive Task conditions. No significant interactions were observed; main effects for both Instruction Set and Cognitive Task were found (*, p < 0.001). Not surprisingly, participants responded faster in all cognitive cue conditions when given instructions a priori (Known) compared to trials randomly presented in a Mixed block. Regarding the Cognitive Task RT main effect, post-hoc analyses revealed that the fastest RT was observed for Two-syllable neutral cue words while the longest RT was observed for incongruent Stroop cues. No significant difference between the One-syllable neutral and congruent Stroop RT was observed, though Two-syllable neutral task RTs were faster than congruent Stroop RTs; see text for details.

No significant effects were observed for CoMv across any condition, with participants averaging a gait speed of 1.39 ± 0.04 m/s across all test conditions (Instruction Set, Cognitive task; see Table 1).

Table 1 CoMv values (m/s) across cognitive condition and trial block. No significant effects were observed (p > 0.05)

Discussion

The primary goal of the current work was to explore the impact of neutral cues and a priori knowledge on cognitive and locomotor task performance, while the secondary purpose of this study was to examine if the similarity of a neutral cue word to the Stroop task word stimuli affected the central cognitive challenge presented by the neutral task. Our results partially supported our hypotheses. No change in CoMv was observed in any condition; however, we did observe significant differences between congruent and incongruent Stroop RT, with incongruent Stroop RT being significantly longer than congruent Stroop RT. Of interest, we also found that the Stroop task accuracy was comparable to our earlier work and that neutral task response time was faster than either Stroop condition, similar to our previous findings which used a single neutral task stimulus word (Pitman et al. 2021; Pitman and Vallis 2021).

Our exploration of the effects of differing phonetic structures of neutral stimulus words revealed that the choice of neutral word impacted cognitive response time. We believe we are the first to explore this in an auditory Stroop paradigm. Specifically, we observed a significantly longer RT following presentation of One-syllable neutral stimulus words compared to Two-syllable neutral stimulus words in the Mixed condition trial blocks during walking trials. In addition, we observed faster RTs for all cognitive cue conditions when instructions were given a priori (Known presentation block) compared to trials presented in a Mixed presentation block.

Mixed cue presentation during locomotion affects cognitive but not motor performance in young adults

We observed a main effect of Instructional set for cognitive RT; young adults responded significantly slower for the Mixed instruction set condition. This is interesting, as it demonstrates that when information is not provided in advance, participants must listen carefully to the auditory cues and allocate cognitive resources to ensure appropriate cognitive and motor responses. In contrast, participants are able to prepare for, and execute, faster responses when instruction set information is available in advance. This methodological context is important when considering auditory Stroop dual-task research design, and when comparing results between studies that used different instructional sets for auditory Stroop cues. There is prior evidence to demonstrate that central processing load is increased when visual Stroop and neutral task condition trials are presented randomly to participants without foreknowledge provided, as in our Mixed trial conditions. This past work demonstrated that activity in the anterior cingulate cortex was greater when performing blocks of trials with a mixed presentation of neutral and congruent visual Stroop tasks compared to congruent Stroop tasks alone (Carter et al. 1995). The authors interpreted this observation as evidence that when switching between task demands is required, there is an increase in the cognitive resources required to perform the task. Our results build on this knowledge and suggest that a similar demand on cognitive resources is present for auditory Stroop cues during dual-task locomotion.

We initially expected to observe decreased performance in both cognitive task RT and CoM velocity during our Mixed block of trials; however, this was not the case. Rather, we only observed changes in cognitive RTs for Mixed Instructional Set conditions. It is likely that our locomotor task, unobstructed walking, was a fairly simple task for our young adult participants. We anticipated that the Mixed Instruction trials might be more challenging for our participants; however, we observed a high degree of response accuracy for the auditory Stroop task. To our knowledge, previous dual-tasking literature has not directly compared the impact of mixing congruent, incongruent and neutral cues during dual-task locomotor performance. Our results suggest that a more challenging locomotor task may be necessary to observe the impact of instructional set on motor performance in a young adult population. Future work should explore this further using complex locomotor tasks, such as avoiding an obstacle (da Silva Costa et al. 2018; Siu et al. 2008; Worden et al. 2016) or changing travel direction (Ellmers et al. 2016).

Cognitive task performance is dependent on the complexity of the auditory task

As illustrated in Fig. 2, our participants required greater time to respond to the incongruent versus congruent Stroop cues, the so-called “Stroop effect”, for both Instructional conditions (Known and Mixed). This finding is similar to previous reports for dual-task locomotion (Siu et al. 2008) and suggests that incongruent cues created the highest cognitive load in our dual-task walking paradigm, regardless of the Instructional set trial condition. Interestingly, we did not observe an effect of Group on cognitive task performance; so even though Stroop paradigms are found to be sensitive to learning through exposure, being exposed to a mixed block of trials prior to a known block of trials, or vice versa, did not impact our participants’ motor or cognitive performance.

When comparing our congruent Stroop RT to our neutral task RTs, we did not observe a facilitation effect. This was not entirely surprising, as facilitation is known to be a relatively fragile effect compared to inhibition within visual Stroop paradigms (MacLeod and MacDonald 2000). Specifically, compared to our Two-syllable neutral task RT, the congruent Stroop required significantly longer RT, which would traditionally demonstrate greater processing conflict. The notion that a neutral task response may be faster than a congruent Stroop response is not unheard of. This “reverse facilitation” occurs when conflict in task requirements (i.e., identifying the color of a word rather than reading the word itself) increases the central processing required to respond to the congruent Stroop relative to a neutral task (Littman et al. 2019). Previous work showed that the anterior cingulate cortex, a brain area which contributes to conflict monitoring and task switching, showed greater activation when responding to the congruent visual Stroop than to a neutral task when performed without any secondary tasks (Aarts et al. 2009), suggesting different central processing mechanisms are utilized to respond to an identified visual Stroop task. Despite the slightly different methodology used in our study compared to this previous work (i.e. visual vs. auditory Stroop; single vs. dual task; single neutral task vs. four different neutral task conditions), our finding that the congruent auditory Stroop resulted in significantly longer RT when compared to the Two-syllable neutral task condition may be further evidence of this different processing pathway when responding to two different tasks.

This study is the first, to our knowledge, to directly compare One- and Two-syllable neutral stimulus words within an auditory Stroop dual-task locomotor paradigm. We anticipated that because the Two-syllable neutral words were less similar in structure to the single syllable auditory Stroop stimulus words, they would be more easily identified by participants (Gaskell and Marslen-Wilson 2002). Our analysis confirmed this hypothesis and suggests that the nature of the neutral stimulus word, and its structural similarity to the Stroop cues, impacts cognitive resource allocation as observable through increased RT.

Main takeaways

The auditory Stroop is a useful tool to present a discrete cognitive challenge to participants in dual-task locomotor research, while avoiding the structural interference presented by visual Stroop tasks. Our results make some incremental and important contributions to the literature. First, our results demonstrate an increase in cognitive challenge when all cognitive task conditions were presented in a mixed instructional set compared to when conditions were known a priori. From the perspective of dual-task locomotor research design, this has important implications when using the auditory Stroop. Secondly, as we demonstrated in our exploration of the neutral word syllable structure, cognitive response times will vary for One-syllable compared to Two-syllable neutral words. It appears that One-syllable neutral words present a greater challenge, likely due to overlap in syllable structure with incongruent and congruent Stroop task cues, indicating that the choice of neutral task word is of importance in auditory Stroop paradigms. As such, we recommend that researchers who plan on using an auditory Stroop task in their dual-task locomotor paradigms consider use of a neutral task stimulus, and carefully consider their choice of neutral task stimulus words, as these methodological choices can significantly affect cognitive task performance results.