Introduction

Emotional stimuli are thought to hold a privileged status during perceptual processing because affective cues often indicate significant external events such as threats or rewards (Pourtois et al. 2012). This prioritization of emotional cues has been demonstrated in the visual modality by, for instance, faster detection of emotional than neutral stimuli in visual search tasks (e.g. Öhman et al. 2001), and in the auditory modality by faster detection of auditory targets following presentation of an emotional cue on the same side as the target, compared to the opposite side (Bertels et al. 2010). Emotional events in everyday life often convey highly correlated information along multiple sensory channels (e.g. seeing a flash and hearing the bang of an explosion), and there is a growing body of evidence showing that emotional cues in one modality can influence processing in a second input modality (for reviews, see Brosch and Grandjean 2013; Gerdes et al. 2014). For example, emotional pictures can facilitate categorization of auditory cues (Tartar et al. 2012), and auditory processing is boosted by visual emotion (Selinger et al. 2013).

In addition to general improvements in performance, affective visual cues can elicit exogenous shifts of attention which can boost processing in a different modality at the visually cued location. For example, using a temporal order judgement task, it was shown that visual threat cues can bias the distribution of spatial attention to targets subsequently presented in a different modality (Van Damme et al. 2009). However, attentional bias to the location of the threatening visual images and the subsequent boosting of auditory processing at that location, as reported by Van Damme et al. (2009), could have been caused by processes relating to different subcomponents of attention.

In Posner’s model of attention (e.g. Posner and Peterson 1990), attending to a new cue involves three processes: firstly, an initial shift of attention to the cue; secondly, attentional engagement with the cue; and thirdly, attentional disengagement from the cue. In the study by Van Damme et al. (2009), once attention was oriented to a particular visual image, either enhanced attentional engagement with the image and/or delayed disengagement from the image could have contributed to attentional bias to the image. The roles of the engagement and disengagement attentional components in the crossmodal modulation of attention by emotion can, in principle, be empirically distinguished using a spatial cueing task (Posner 1980). In the current study, we used a modified (i.e. crossmodal) version of the so-called emotional spatial cueing task (e.g. Mulckhuyser and Crombez 2014). In this task, participants are required to indicate whether a non-emotional auditory target appeared on either the left or the right, after seeing a spatially non-predictive peripheral visual cue (a pleasant, unpleasant, or emotionally neutral natural scene). On a ‘valid’ trial, the visual cue precedes the auditory target at the same spatial location; on an ‘invalid’ trial, the target appears on the opposite side to the visual cue. The large number of purely unimodal studies (i.e. visual cue and visual target) that have used the emotional spatial cueing design has generally reported that the cueing effect (i.e. faster responses to validly cued vs. invalidly cued targets) is enhanced for emotional versus non-emotional cues (e.g. Koster et al. 2006; Mulckhuyser and Crombez 2014; Yiend and Mathews 2001), reflecting bottom-up attentional capture by the affective nature of the cue. Moreover, by comparing reaction times to validly cued emotional and neutral trials, the role of attentional engagement with the emotional cue can be identified (Yiend and Mathews 2001). 
For example, a decrease in response times to validly cued targets preceded by an emotional cue, compared to validly cued targets preceded by a neutral cue, would indicate facilitated attentional engagement with the emotional cue. Conversely, by comparing reaction times to invalidly cued emotional and neutral trials, the role of attentional disengagement from the emotional cue can be indexed. For example, an increase in response times to invalidly cued targets preceded by an emotional cue, compared to invalidly cued targets preceded by a neutral cue, would indicate delayed attentional disengagement from the emotional cue.
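The two comparisons described above can be written as simple difference scores. The following is a minimal sketch rather than the analysis code used in the study: the function names and the reaction times are hypothetical, chosen only to show the direction of each contrast.

```python
from statistics import mean


def engagement_index(valid_neutral_rts, valid_emotional_rts):
    """Neutral-valid minus emotional-valid mean RT (ms).

    A positive value means faster responses after valid emotional cues,
    i.e. facilitated attentional engagement.
    """
    return mean(valid_neutral_rts) - mean(valid_emotional_rts)


def disengagement_index(invalid_neutral_rts, invalid_emotional_rts):
    """Emotional-invalid minus neutral-invalid mean RT (ms).

    A positive value means slower responses after invalid emotional cues,
    i.e. delayed attentional disengagement.
    """
    return mean(invalid_emotional_rts) - mean(invalid_neutral_rts)


# Hypothetical reaction times in ms (illustrative values, not study data).
valid_neutral = [480, 490, 470]
valid_emotional = [455, 465, 460]
invalid_neutral = [530, 525, 535]
invalid_emotional = [550, 560, 555]

engagement = engagement_index(valid_neutral, valid_emotional)
disengagement = disengagement_index(invalid_neutral, invalid_emotional)
```

Both indices are signed so that a positive value corresponds to the effect named in the text.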

Little is known about how emotion-related asymmetries in hemispheric processing affect the modulation of auditory spatial attention by affective visual stimuli. Lateralized asymmetries have, though, been reported for the modulation of visual spatial attention by peripheral auditory emotional cues (Brosch et al. 2008a, 2009; Harrison and Davies 2013; Schock et al. 2013), where the crossmodal attentional effects were greatest on the right side. For this reason and in the light of the hemispheric specialization theory of emotional processing (e.g. Demaree et al. 2005), we predicted that auditory spatial attention would be modulated by visual emotional cues more strongly on the right compared to the left side.

The present study used a crossmodal emotional spatial cueing paradigm to investigate the effects of affective (pleasant and unpleasant) visual cues on auditory spatial attention. To ensure that participants attended to the pictorial images, a secondary task required participants to detect an infrequent target in the visual cue. Auditory targets were presented via loudspeakers placed adjacent to the location of the visual images, to ensure approximate spatial alignment of the visual and auditory stimuli. Based on prior crossmodal studies (e.g. Brosch et al. 2008a), we expected to find a larger cueing effect (i.e. faster responses to validly cued vs. invalidly cued targets) for auditory targets preceded by unpleasant visual cues, compared to neutral visual cues. We also expected to find an enhanced cueing effect for targets preceded by pleasant visual cues, compared to neutral visual cues, as previous research has shown that pleasant scenes can capture (visual) attention (Nummenmaa et al. 2006), although, to our knowledge, previous crossmodal studies investigating attentional capture by pleasant images are lacking. We also aimed to distinguish the roles of the engagement and disengagement attentional components in the crossmodal modulatory effect. We did not have a specific prediction about attentional engagement and disengagement due to mixed findings in previous studies (cf. Mulckhuyser and Crombez 2014), but we expected to observe enhanced engagement with, and/or delayed disengagement from, the pleasant and unpleasant visual cues. Lastly, due to habituation to the affective content of the cues (e.g. Bradley et al. 1993) and based on previous studies of crossmodal attentional modulation (Brosch et al. 2008a), we expected that the influence of visual emotional cues on auditory spatial attention would be attenuated in the second half of the experiment compared to the first half.

Methods

Participants

Twenty-eight participants took part in the experiment. All reported normal hearing and normal or corrected-to-normal vision. Data from one participant were excluded due to equipment failure. For the remaining participants (N = 27), the mean age was 29.5 years (SD = 12.2), 27 were right-handed, and 18 were female. The experiment was approved by the Ethics Committee of the Psychology Department at Liverpool Hope University.

Stimuli and apparatus

Visual stimuli consisted of 60 images, selected on the basis of valence and arousal norms from the International Affective Picture System (IAPS; Lang et al. 2008). Twenty images were unpleasant (e.g. garbage), 20 images were pleasant (e.g. kittens), and 20 images were emotionally neutral (e.g. mushroom; see Footnote 1). Mean valence ratings for the selected unpleasant, pleasant, and neutral images were 3.21 ± .66, 7.14 ± .51, and 5.22 ± .39, and the mean arousal ratings were 5.26 ± .60, 5.27 ± .51, and 3.94 ± .55, respectively [based on IAPS norms (Lang et al. 2008)]. The 60 pictures were also rated by 17 of the participants who completed the main experiment on two dimensions (valence and arousal) using 9-point rating scales (valence: 1 = very unpleasant, 9 = very pleasant; arousal: 1 = not at all arousing, 9 = very arousing). Results showed that the unpleasant pictures (mean valence = 2.76 ± 1.38) were rated as less pleasant (p < .001) than the neutral pictures (mean valence = 4.73 ± 1.10) and that the pleasant pictures (mean valence = 7.09 ± 1.16) were rated as more pleasant (p < .001) than the neutral pictures. Both the unpleasant (mean arousal = 5.45 ± 1.65) and the pleasant pictures (mean arousal = 4.77 ± 1.39) were rated as more arousing (p < .05) than the neutral pictures (mean arousal = 3.54 ± 1.00).

Images were projected at eye height onto a white wall using a Hitachi CP-X328 Multimedia LCD projector. Projected picture dimensions were approximately 30.0 cm long × 22.5 cm high, and the centre of each image was located 26.5 cm to the left or right of centre. At a viewing distance of 1 m, each image spanned approximately 17° of visual angle (i.e. from 15° to 32° to the left or right of fixation).

Auditory stimuli were presented using Creative GigaWorks T20 Series II loudspeakers, and the mean sound level for all sounds was 65 dB, measured at the participant’s ear. The left speaker was located directly below the middle of the bottom edge of the left projected image, and the right speaker was positioned correspondingly below the right image. The auditory target was a 30 ms sine wave (1 kHz) produced using MATLAB. E-Prime 2.0 was used to control the experiment.

Procedure

Participants were seated 1 m from the projected images. The experiment began with a practice block of 10 trials, using visual images not included in the main experiment. Each trial began with a central fixation cross lasting 500 ms, immediately followed by presentation of a single pictorial image for 250 ms. The image was unpleasant (p = .33), pleasant (p = .33), or neutral (p = .33), and was either presented on the left (p = .5) or on the right (p = .5) side, and was immediately followed by an auditory target in either the same location (‘valid’ trial; p = .5) or the opposite location (‘invalid’ trial; p = .5). The order of presentation was randomized. Participants were required to press the ‘Z’ or ‘M’ key if the auditory target appeared on the left or right, respectively. Participants had 1500 ms to respond after onset of the target. After response or after 1500 ms in the event of no response, there followed a random inter-trial interval of between 1250 and 1500 ms.

In a concurrent secondary task, participants had to detect an infrequent (p = .09) change in the visual images consisting of two vertical black lines 1 cm in length on the top and bottom edges of both pictures. The lines were displayed simultaneously with the onset of the pictures for 500 ms. Participants were required to press the space bar when they detected the lines. In total, participants completed 480 trials in the primary task (160 trials for each picture category) and 24 trials of the secondary task, divided equally into three experimental blocks (168 trials per block).
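The structure of the primary-task trials can be sketched as a shuffled factorial list. This is an illustrative reconstruction, not the E-Prime script used in the experiment: it assumes that the 480 primary trials were fully balanced across emotion category, cue side, and validity (40 per cell), which the text does not state explicitly. It omits the 24 secondary-task trials and the division into blocks, and all names are hypothetical.

```python
import itertools
import random

EMOTIONS = ("unpleasant", "pleasant", "neutral")
SIDES = ("left", "right")
VALIDITIES = ("valid", "invalid")
N_PRIMARY_TRIALS = 480  # number of primary-task trials reported in the text


def build_primary_trials(seed=0):
    """Return a shuffled list of primary-task trials as dictionaries.

    Assumption: equal numbers of trials per emotion x cue side x validity
    cell (480 / 12 = 40), which is consistent with 160 trials per category.
    """
    cells = list(itertools.product(EMOTIONS, SIDES, VALIDITIES))
    repetitions = N_PRIMARY_TRIALS // len(cells)
    trials = []
    for emotion, cue_side, validity in cells * repetitions:
        if validity == "valid":
            target_side = cue_side  # target at the cued location
        else:
            target_side = "left" if cue_side == "right" else "right"
        trials.append({
            "emotion": emotion,
            "cue_side": cue_side,
            "validity": validity,
            "target_side": target_side,
        })
    random.Random(seed).shuffle(trials)  # randomized presentation order
    return trials


trials = build_primary_trials()
```

Because validity is crossed with cue side, the cue carries no information about the target location, which is what makes it spatially non-predictive.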

Results

The mean accuracy rate on the primary task was 83.7 %, and accuracy on the secondary task was 68.5 %; these data were not subjected to further analysis. Trials in which an error was made were excluded, as were responses faster than 150 ms or slower than 800 ms; responses more than 2.5 SDs above each participant’s mean were then removed to reduce the influence of outliers (6 % of the data). Reaction times were divided into the first half (i.e. first 240 trials; see Footnote 2) and the second half of the experiment, as previous research found a reduction in attentional modulation over the course of the experiment (e.g. Brosch et al. 2008a). Mean reaction times are presented in Table 1.
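The exclusion steps can be illustrated with a short sketch for a single participant. Several details are assumptions, because the text does not specify them: the mean and SD are computed after the error and absolute cutoffs have been applied, the sample SD is used, and trials are pooled across conditions. The function name and the data are hypothetical.

```python
from statistics import mean, stdev


def trim_rts(trials, low_ms=150, high_ms=800, sd_cutoff=2.5):
    """Apply the exclusion steps to one participant's trials.

    trials: list of (rt_ms, correct) tuples.
    Step 1 drops error trials and RTs below low_ms or above high_ms.
    Step 2 drops RTs more than sd_cutoff SDs above the participant's mean
    (assumed here to be computed on the trials surviving step 1).
    """
    kept = [rt for rt, correct in trials if correct and low_ms <= rt <= high_ms]
    if len(kept) < 2:
        return kept  # an SD cannot be computed from fewer than two values
    upper_limit = mean(kept) + sd_cutoff * stdev(kept)
    return [rt for rt in kept if rt <= upper_limit]


# Hypothetical data: 20 typical responses, one slow response, one
# anticipation, one timeout-range response, and one error trial.
example_trials = [(400, True)] * 20 + [
    (790, True),   # within 150-800 ms but far above the mean
    (120, True),   # faster than 150 ms
    (900, True),   # slower than 800 ms
    (450, False),  # incorrect response
]
trimmed = trim_rts(example_trials)
```

In this example the 790 ms response survives the absolute cutoffs but is removed by the SD criterion, since the upper limit is roughly 631 ms.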

Table 1 Means (and standard deviations) of the reaction times to left- and right-sided valid and invalid targets in the crossmodal emotional spatial cueing task, for each emotional category (neutral, unpleasant, pleasant)

Overall effects

A 2 × 2 × 3 × 2 repeated measures ANOVA (see Footnote 3) with the factors of Cue Validity (valid, invalid), Target Side (left, right), Emotion Category (unpleasant, pleasant, neutral), and Experimental Half (first half, second half) found a significant main effect of Cue Validity [F(1,26) = 58.23, p < .001], where RTs were faster following valid (M = 455.27, SD = 90.91) compared to invalid cues (M = 515.25, SD = 104.17). There was also a main effect of Emotion Category [F(1,37) = 8.35, p = .001], where responses to pleasant pictures (M = 481.10, SD = 83.30 ms) were faster than responses to unpleasant pictures (M = 492.10, SD = 82.70) [t(26) = 3.224, p = .003], and responses to neutral pictures (M = 483.85) were faster than responses to unpleasant pictures [t(26) = 2.772, p = .010]. We also found a main effect of Experimental Half [F(1,26) = 13.99, p = .001], where RTs were faster in the second half of the experiment (M = 466.83, SD = 87.17) compared to the first half (M = 503.62, SD = 99.33). In addition, the four-way interaction between Cue Validity, Target Side, Emotion Category, and Experimental Half was significant [F(2,48) = 3.65, p = .036]. We wanted to test the specific prediction that there would be differences in the magnitude of the cue validity effect between left and right targets as a function of Emotion Category and Experimental Half. To assess this prediction, a cue validity index was calculated for each condition as RTs on invalid trials minus RTs on valid trials (Koster et al. 2006); a positive cue validity index indicates attention towards a cue. The cue validity index for each condition is shown in Fig. 1. To interpret the four-way interaction, a simple interaction effects analysis was conducted using two-way ANOVAs (with the cue validity index as the dependent variable) with the factors Emotion Category and Side at each level of the factor Experimental Half.
There was a significant interaction between Emotion Category and Side in the first Experimental Half [F(2,44) = 4.44, p = .023], but not in the second [F(2,48) = 1.34, p = .271]. In the first Experimental Half, cue validity index differed between emotions when the target was presented on the right [F(2,52) = 4.53, p = .015], but not when presented on the left [F(2,52) = .40, p = .675]. Post hoc t tests revealed that unpleasant cues were associated with a larger cue validity index (M = 80.66, SD = 72.71) compared to neutral cues (M = 48.17, SD = 59.02) [t(26) = 2.29, p = .030, d = .447] and that the cue validity index for pleasant cues (M = 75.27, SD = 57.53) was larger than for neutral cues [t(26) = 3.88, p = .001, d = .747].
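The cue validity index used in this analysis (mean RT on invalid trials minus mean RT on valid trials, computed per condition) can be sketched as follows. The function name, the tuple layout, and the reaction times are hypothetical and serve only to illustrate the calculation for one participant.

```python
from collections import defaultdict
from statistics import mean


def cue_validity_indices(trials):
    """Compute the cue validity index for each emotion x target side cell.

    trials: iterable of (emotion, target_side, validity, rt_ms) tuples.
    Returns {(emotion, target_side): mean invalid RT - mean valid RT}.
    A positive index indicates attention towards the cue.
    """
    rts = defaultdict(list)
    for emotion, side, validity, rt in trials:
        rts[(emotion, side, validity)].append(rt)
    conditions = {(emotion, side) for emotion, side, _ in rts}
    return {
        (emotion, side): mean(rts[(emotion, side, "invalid")])
        - mean(rts[(emotion, side, "valid")])
        for emotion, side in conditions
        # Only cells with both valid and invalid trials yield an index.
        if rts.get((emotion, side, "valid")) and rts.get((emotion, side, "invalid"))
    }


# Hypothetical reaction times in ms (illustrative values, not study data).
example = [
    ("unpleasant", "right", "valid", 470),
    ("unpleasant", "right", "valid", 474),
    ("unpleasant", "right", "invalid", 552),
    ("unpleasant", "right", "invalid", 554),
    ("neutral", "right", "valid", 482),
    ("neutral", "right", "invalid", 530),
]
indices = cue_validity_indices(example)
```

A larger index for an emotional category than for the neutral category corresponds to the emotional cue validity effect reported for right-sided targets.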

Fig. 1 Mean cue validity effects (RT invalid minus RT valid) for auditory targets presented on the left and right side, in the first half of the experiment. There was no difference in cue validity between emotional categories for targets on the left, but for targets on the right both the unpleasant and the pleasant cues resulted in an increased cue validity effect. Error bars represent SEM

Attentional engagement and disengagement

The previous analyses revealed that the cue validity index was larger for pleasant and unpleasant cues compared to neutral cues (i.e. an emotional cue validity effect), but only for targets presented on the right in the first half of the experiment. Next we wanted to examine which subcomponents of attention were involved in the emotional cue validity effect for right-sided targets following emotional visual cues. To assess attentional engagement, we analysed responses on valid trials (Yiend and Mathews 2001) using a one-way repeated measures ANOVA with the factor Emotion Category (unpleasant, pleasant, neutral) and found a significant main effect [F(2,39) = 3.631, p = .047]. Post hoc t tests revealed that RTs for valid cues in the pleasant condition (M = 460.84, SD = 95.62) were faster than RTs for valid cues in the neutral condition (M = 483.12, SD = 101.74) [t(26) = 3.838, p = .001, d = .753], revealing that pleasant cues elicited enhanced attentional engagement. There was no difference between valid cues in the unpleasant condition (M = 471.55, SD = 83.97) and valid cues in the neutral condition [t(26) = 1.16, p = .259].

To assess attention disengagement, we analysed responses on invalid trials for right-sided targets in the first half of the experiment using a one-way repeated measures ANOVA with the factor Emotion Category (unpleasant, pleasant, neutral) and found a significant main effect [F(2,40) = 3.99, p = .036]. Post hoc t tests revealed that invalid trials were slower on unpleasant (M = 552.21, SD = 99.64) compared to neutral trials (M = 531.29, SD = 90.10) [t(26) = 2.50, p = .019, d = .493], but there was no difference between invalid pleasant trials (M = 536.11, SD = 96.69) compared to invalid neutral trials [t(26) = .90, p = .377].

Discussion

We tested whether visual emotional cues modulated auditory spatial attention, using a modified exogenous spatial cueing design. Participants were required to indicate the spatial location of a non-emotional auditory target, after seeing a spatially non-predictive peripheral visual cue that was pleasant, unpleasant, or emotionally neutral. Compared to neutral cues, pleasant as well as unpleasant visual cues elicited automatic shifts of attention to the cued location. This led to facilitated processing of a subsequent auditory stimulus presented at the cued location, whereas processing of auditory targets presented at the opposite location was not facilitated. This effect was observed only for targets presented on the right. Further, we showed that the crossmodal modulatory effect for right-sided targets was due to delayed attentional disengagement in the case of unpleasant cues and due to enhanced attention engagement in the case of pleasant cues.

We found an overall cue validity effect (where responses to targets were faster on validly cued trials compared to invalidly cued trials), replicating the typical findings in spatial cueing paradigms (Posner 1980). More importantly, the cue validity effect was stronger for pleasant and unpleasant emotional cues, compared to neutral cues; in other words, we observed a crossmodal emotional cue validity effect. An emotional cue validity effect has been reported in previous unimodal studies where processing of visual targets was facilitated at the location of an emotional visual cue (e.g. Koster et al. 2006; Mulckhuyser and Crombez 2014; Yiend and Mathews 2001), and in crossmodal studies showing facilitated processing of auditory cues at the location of threatening visual cues (Van Damme et al. 2009). The enhancement of the cueing effect for affective stimuli likely reflects automatic bottom-up driven attentional capture by the emotional cue (Pourtois et al. 2012).

To our knowledge, a crossmodal emotional cue validity effect for images with a positive valence has not previously been demonstrated. While it has been shown that positive emotional visual cues can enhance processing of subsequently presented visual targets at the same location (Brosch et al. 2008b), the current study demonstrates that pleasant visual cues can also influence the allocation of spatial attention in a different (i.e. auditory) modality. Our novel finding is in line with previous (unimodal) behavioural (Ferrari et al. 2008) and electrophysiological (Simola et al. 2015) studies demonstrating that images of pleasant natural scenes can engage attention in a bottom-up manner, likely due to their intrinsic motivational properties as appetitive stimuli.

Going beyond previous studies that have shown that (threatening) visual emotional cues can modulate the distribution of spatial attention to subsequently presented auditory targets (e.g. Van Damme et al. 2009), the current experiment is, to our knowledge, the first to investigate which subcomponents of attention are involved in the crossmodal effects on auditory spatial attention by visual emotional cues. Our results provided evidence that the facilitated processing of auditory targets at the location of affective visual cues was due to different attentional components depending on the valence of the visual cue.

For unpleasant cues, the cue validity effect appeared to result from slower attentional disengagement; i.e., participants had difficulty disengaging attention from the unpleasant emotional cues in order to shift attention to an auditory target when it was presented to the opposite side. On the other hand, participants had no difficulty disengaging attention from the unpleasant cues when the auditory target was presented at the same spatial location. Slower disengagement from an unpleasant (or threatening) visual cue has previously been reported for visual cues followed by visual targets, indicating greater dwell duration on the unpleasant cue (Koster et al. 2006; Yiend and Mathews 2001). Here we extend the findings of previous unimodal studies to show that slower disengagement impacted the ability to shift attention crossmodally to a different spatial location. Presumably, this process is designed to prioritize sensory processing and information gathering at the location of a potentially unpleasant stimulus and to prevent attention from being attracted to competing information in either the same modality or a different modality, at another position in space.

For pleasant cues, we showed that the cue validity effect likely resulted from enhanced attentional engagement with the pleasant images, as responses were faster in the valid condition for pleasant cues, compared to neutral cues (cf. Yiend and Mathews 2001). This is in general agreement with eye-tracking studies that have shown early attentional capture by pleasant visual scenes (Nummenmaa et al. 2006) and demonstrates that attentional engagement at a location in space elicited by a potentially beneficial stimulus can enhance processing in a separate modality at the same location.

The current study also investigated whether the emotional cue validity effect was lateralized. Lateralized asymmetries in response to targets following emotional cues have been demonstrated for both unimodal tasks (auditory cues with auditory targets: Bertels et al. 2010) and crossmodal tasks (auditory cues with visual targets: Brosch et al. 2008a; Harrison and Davies 2013), but we are unaware of any studies that have tested for a lateralized effect using task-irrelevant visual emotional cues followed by auditory targets. We found that the emotional cue validity effect was evident only for targets presented on the right, and we argue that this pattern of results can be most readily explained by the valence hypothesis of emotion processing, where the right cerebral hemisphere is specialized for processing negative valence, and the left hemisphere is dedicated to processing cues of a positive valence (Demaree et al. 2005). The processing of negatively valenced visual cues in the invalid condition (i.e. presented on the left, followed by a target on the right) may have impaired the network subserving crossmodal shifting of attention, which is thought to be lateralized to the right hemisphere (Corbetta and Schulman 2002). On the other hand, auditory targets on the right following pleasant visual cues on the right (i.e. valid trials) were processed faster than auditory targets following neutral cues. In these valid trials, the pleasant cues on the right would be processed preferentially by the left hemisphere, which is specialized for representation of positive affective cues, thus facilitating shifts of attention to auditory stimuli presented subsequently at the same location. As this interpretation remains necessarily speculative, further research is needed to fully disentangle the effects of hemispheric lateralization on the crossmodal shifting of attention following visual affective cues.

Recently, it has been argued that emotional cue validity effects may result not from emotional modulation of attention, but rather that (threatening) emotional cues could elicit faster response times in spatial cueing tasks due to enhanced response priming (Mulckhuyser and Crombez 2014). It is important to note that the emotional modulation of attention reported in the current study is very unlikely to be explained by the response priming account, for two reasons. Firstly, we found increased reaction times for targets invalidly cued by the unpleasant cues, compared to neutral cues, only for targets presented on the right. Presumably, the response priming account should produce slower RTs for invalidly cued targets following emotional cues on both sides. Secondly, we observed an emotional cue validity effect not just for unpleasant cues, but also for pleasant cues, whereas the response priming account deals only with motor priming following aversive, or threatening, cues. Furthermore, it is important to mention that in the current study the visual emotional cue did not predict the target location, thus ensuring that endogenous attention was not elicited during cue presentation.

The crossmodal effects on auditory spatial attention by visual affective cues reported in the current study were found only for stimuli presented in the first half of the experiment. This is in agreement with findings from previous studies (e.g. Brosch et al. 2008a) and most likely represents habituation to the emotive content of the stimuli with repeated exposure. The process of affective habituation has been shown to lead to decreased physiological reactions to emotional stimuli (Bradley et al. 1993); therefore, in the second half of the experiment, the emotional impact of the cues may be reduced, leading to a reduced influence on attention processes.

It should be noted that the pictorial images in the present study were natural scenes, as we were interested in understanding the effects of these stimuli on crossmodal attention in a situation that resembles the processing of intrinsically relevant stimuli encountered in everyday life. Future studies should investigate the effect of specific categories of visual emotional cues (e.g. faces) on the allocation of auditory spatial attention.

In the current study, the auditory targets were presented directly following offset of the visual images, but it is likely that crossmodal effects on attention by emotion differ depending on the cue–target asynchrony, as it is thought that crossmodal emotional facilitation operates in a phasic manner (Selinger et al. 2013). An interesting avenue for future research, therefore, would be to investigate the temporal characteristics of crossmodal facilitation by emotional cues, for example by varying the stimulus-onset asynchrony between visual cue and auditory target. Additionally, the auditory stimuli in the present investigation were neutral (non-emotional), but future studies could usefully examine the effect of visual emotion on the allocation of attention to emotional auditory cues, as it is known that responses to affective auditory stimuli (as measured, for example, by the acoustic startle reflex) are strongly modulated by emotional visual cues (e.g. Lang et al. 1990).

In summary, we demonstrated that task-irrelevant positively and negatively valenced visual cues modulated the distribution of spatial attention to a subsequently presented auditory target. Additionally, we provided evidence that the crossmodal facilitatory effect on spatial attention by emotion resulted from different mechanisms depending on the valence of the visual cues—speeded engagement in the case of pleasant cues and delayed disengagement for negative cues.