Introduction

The application of the information processing approach (IPA) (Massaro & Cowan, 1993) to the study of cognitive processes in sexuality has consistently found that sexual information is processed differently than non-sexual information (Geer & Manguno-Mire, 1996). The first information processing stage at which this difference has been evidenced is encoding, or attention. Attentional factors in sexuality were first studied in relation to the interfering effect of distraction tasks on sexual arousal and to the role of misdirected attention in the development and maintenance of sexual dysfunction (Cranston-Cuebas & Barlow, 1990; Farkas, Sine, & Evans, 1979; Geer & Fuhr, 1976; Przybyla & Byrne, 1984). This research led to the investigation of the processing of sexual stimuli in a series of studies using a variety of methodologies, including lexical decision, dot-probe, and priming paradigms. Findings supported the potentially interfering effect of explicit and consciously accessed erotic content on information processing time (Geer & Bellard, 1996; Geer, Judice, & Jackson, 1994; Geer & McGlone, 1990; Geer & Melton, 1997; Janssen, Everaerd, Spiering, & Janssen, 2000; Spiering, Everaerd, & Janssen, 2003). In other words, sexual content possibly captured attention and thereby required additional processing time.

Visual attention has been largely under-investigated with regard to sexuality, despite the long-standing interest in the role of attentional variables, the centrality of vision to the encoding process in general, and the existence of technology to track eye movements reliably. Eye-tracking methodology is considered a reliable and valid measure of visual attention in reading and scene perception (Rayner, 1995), and has recently been used to investigate the attentional biases of individuals with anxiety disorders (Mogg, Millar, & Bradley, 2000) and dispositional traits, such as optimism (Isaacowitz, 2005). Its application to the study of the processing of sexual information (both words and images) has the potential to further inform us about this preliminary step in cognitive processing.

Eye-trackers vary in design and specifications, but all are designed to measure and record the eye movements of participants presented with visual stimuli. The data they yield provide a continuous and unobtrusive measure of cognitive and visual information processing, although they are limited in what they reveal about higher-order processes. Eye movements during scene viewing have typically been divided into two distinct temporal phases: fixations and saccades (Henderson & Hollingworth, 1998). Fixations refer to periods of time when the point of regard is relatively unmoving, and saccades refer to when the eyes are rotating at a relatively rapid rate as they reorient from one visual target to another. Visual attention has generally been defined as the selective orienting to information from one region of the visual field at the expense of other regions in the same field (Henderson, 1992). Thus, the focus in studies of visual attention has been primarily on fixations.

The important conceptual question in the eye-tracking literature has been the extent to which fixations (overt and measurable eye movements) denote attention (a covert cognitive process). Although it has been demonstrated that individuals can and do attend to targets outside of their foveal fixation (e.g., Posner, 1980), 70 years of eye movement studies support Buswell's (1935) original finding that fixation positions cluster in a non-random fashion on scene regions that are either visually informative (stimulus features such as texture, color, luminance, depth, or complexity) or semantically informative (for a review, see Henderson & Hollingworth, 1998). The consensus interpretation in that literature has been that fixations are related to cognitive processing, suggesting that people look longer at regions that take longer to process, for whatever reason.

Building on these findings, Henderson and Hollingworth (1998, 1999) proposed the saliency map framework, whereby they posited that (1) visual-spatial attention is allocated to the scene region with the highest saliency weight, and (2) the eyes attempt to stay fixated on the attended scene region. The length of the fixation is determined by the amount of time needed to complete cognitive analysis of that region. Once processing is complete, the saliency weight for that region is reduced and attention is relocated to the region that now has the highest saliency weight. Initial movements of the eye are determined by saliency weights emanating primarily from visual features of the region. Over the course of viewing, however, the source of the saliency weight for a given scene region shifts from visual to semantic interest, eventually leading to a greater fixation density and total fixation time on semantically interesting objects and scene regions.
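The allocation cycle described in this framework (attend to the region with the highest weight, fixate for as long as processing takes, reduce that weight, move on) can be illustrated with a minimal sketch. The function name, the starting weights, the processing times, and the halving rule are all hypothetical choices made for illustration; the framework does not specify numerical values, and the gradual shift from visual to semantic sources of saliency is not modeled here.

```python
def simulate_scanpath(saliency, processing_time_ms, decay=0.5, n_fixations=6):
    """Return a list of (region, fixation_duration_ms) tuples.

    saliency: dict mapping region name -> starting saliency weight
    processing_time_ms: dict mapping region name -> time needed to process it
    decay: factor applied to a region's weight once it has been processed
           (an assumed value, not taken from the framework)
    """
    weights = dict(saliency)  # copy so the caller's dict is not modified
    scanpath = []
    for _ in range(n_fixations):
        # (1) attention goes to the region with the highest saliency weight
        region = max(weights, key=weights.get)
        # (2) the fixation lasts as long as processing of that region takes
        scanpath.append((region, processing_time_ms[region]))
        # once processing is complete, the region's weight is reduced
        weights[region] *= decay
    return scanpath


# Hypothetical weights and processing times for the three scene regions
saliency = {"face": 0.9, "body": 0.8, "context": 0.3}
processing_time_ms = {"face": 300, "body": 450, "context": 200}
path = simulate_scanpath(saliency, processing_time_ms)
for region, duration in path:
    print(region, duration)
```

Under these assumed values, the simulated eye alternates between the two highest-weighted regions and reaches the context only after their weights have decayed, which is the qualitative pattern the framework predicts for semantically uninformative regions.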

Applied to sex research, eye-tracking methodology has the potential to inform us in an objective and continuous way about what individuals attend to when exposed to visually erotic situations. Much like priming, dot-probe, lexical decision, and Stroop paradigms, eye tracking can provide a non-invasive window into the attentional processes at play in sexuality, with the added value of greater ecological validity. After all, visual attention is central to the processing of most naturally occurring sexual situations. Eye movements could signal the arousal and/or aversion value of certain stimulus components for different individuals and groups, as well as elucidate the cognitive interference and distractibility associated with certain sexual dysfunctions. However, we first need to know whether and how erotic scenes are processed differently from non-erotic ones.

To this end, we presented both men and women with erotic and non-erotic scenes and tested for differences in eye movements between these two stimulus conditions. We had no reason to expect overall gestalt-type differences in visual attention to erotic vs. non-erotic images; thus, we did not hypothesize a stimulus main effect. Rather, we expected different regions of the images to draw attention to greater or lesser degrees depending on whether the image was erotic or not. For this reason, we hypothesized a stimulus (erotic, non-erotic) × scene region (face, body, context) interaction, such that the body would be attended to preferentially in the erotic stimulus condition by both men and women.

Method

Participants

All participants were 21 years of age or older and had normal or corrected-to-normal vision. The sample consisted of 20 men and 20 women. All participants were right-handed, and all identified as heterosexual. Results indicated comparable sociodemographics for men and women (e.g., ethnic and religious distributions). Participants remained naïve with respect to the purpose of the study until debriefing.

Measures and Design

The stimuli consisted of 10 erotic and 10 non-erotic digital photographic color scenes. Of the 10 scenes in each category, five were of individual men and five were of individual women. Images were collected from Playboy and Falcon Studio internet websites (see Footnote 1). They depicted scenes consisting of men or women in various states of undress, positioned provocatively, with facial expressions communicating high sexual receptivity. The matched non-erotic images were photographs taken by the primary investigator, and their composition was guided by an attempt to maximize the parallelism between erotic and non-erotic images. Backgrounds approximating those of the erotic images were chosen, models were positioned similarly to the models in the erotic images, and zoom distances similar to those in the erotic images were employed. The models, however, were fully clothed, had neutral facial expressions, and their bodily positions were adjusted to divest the photo of erotic inference. All images were 800×600 pixels and were viewed at a distance of 82 cm.

In order to test whether the manipulation of eroticism in the images was successful, we asked participants to rate how arousing they found each set of images. Participants endorsed one of five response options, ranging from “very unarousing” to “very arousing.” On average, the women rated the erotic images as somewhat arousing (M=3.4, SD=.68), and the non-erotic images as generally somewhat unarousing (M=2.4, SD=1.05). A dependent samples t-test revealed that the erotic images were rated as significantly more arousing than the non-erotic images, t(19)=4.16, p < .001, providing evidence for the successful manipulation of eroticism. Men rated the erotic images as somewhat arousing (M=3.95, SD=.39), and the non-erotic images as neither arousing nor unarousing (M=2.8, SD=.83). A dependent samples t-test revealed that the erotic images were rated as significantly more arousing than the non-erotic images, t(19)=6.33, p < .001, providing evidence for the successful manipulation of eroticism.
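The dependent samples t-test used for this manipulation check compares each participant's two ratings directly, which is why the reported tests have 19 degrees of freedom for 20 participants per group. A minimal sketch of the computation follows; the ratings are hypothetical values on the five-point scale and are not data from the study, and the function name is chosen for illustration only.

```python
import math
import statistics


def paired_t(first, second):
    """Dependent samples t statistic and degrees of freedom.

    first, second: equal-length lists holding each participant's
    two ratings, in the same participant order.
    """
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)        # sample SD of the differences
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical ratings from six participants (1 = very unarousing,
# 5 = very arousing); illustration only
erotic_ratings = [4, 3, 4, 5, 3, 4]
non_erotic_ratings = [2, 3, 3, 3, 2, 3]
t, df = paired_t(erotic_ratings, non_erotic_ratings)
print(f"t({df}) = {t:.2f}")
```

The p value would then be read from the t distribution with the returned degrees of freedom; a statistics package would normally handle that step.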

There were two phases of the experiment for each participant. In the practice phase, participants viewed images of people in various non-erotic situations in order to acclimate to the equipment and procedure. In the test phase, the participant was shown the erotic and non-erotic images. Due to concern for the potentially interfering demand characteristics created by showing heterosexual individuals same-sex images, male participants were only shown images of women and female participants were only shown images of men.

In order to analyze visual attention to the different aspects of the scene, we divided each image into three scene regions: face, body, and context. The face scene region included the face and the hair (head) of the individual in the image. The body included everything below the head, including the torso, arms, and legs. The context of the scene was defined as everything in the rest of the image, such as the background and all of the objects included in it. There were no significant differences between erotic and non-erotic images in terms of the proportion of the image accounted for by the face, the body, or the context in either the male or female photographs.
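To make the region coding concrete, the sketch below assigns a fixation position to face, body, or context. It assumes rectangular regions given in image pixel coordinates; the rectangles, coordinates, and names are hypothetical, and the regions in the study may well have been drawn along the outline of the model rather than as boxes.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in image pixel coordinates (hypothetical)."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, x, y):
        return self.left <= x < self.right and self.top <= y < self.bottom


def classify_fixation(x, y, face, body):
    """Label a fixation as 'face', 'body', or 'context'.

    Anything that falls in neither the face nor the body region is
    treated as context, mirroring the definition in the text.
    """
    if face.contains(x, y):
        return "face"
    if body.contains(x, y):
        return "body"
    return "context"


# Hypothetical regions for one 800x600 image
face_box = Box(left=350, top=60, right=450, bottom=180)
body_box = Box(left=300, top=180, right=500, bottom=580)

positions = [(400, 100), (400, 300), (50, 50)]
labels = [classify_fixation(x, y, face_box, body_box) for x, y in positions]
print(labels)
```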

Apparatus

The stimuli were displayed at a resolution of 1024×768 pixels × 256 colors on a True Color monitor using an ATI Radeon VE graphics card operating at a refresh rate of 85 Hz. Eye movements were recorded by an SMI Eyelink headband-mounted eye-tracker, which was carefully balanced to be comfortable even with extended use. The system used infra-red (940 nm) video-based technology to track eye position and head position simultaneously. Eye positions were sampled at 250 Hz. Viewing was binocular, although only the position of the right eye was tracked, as is common in eye-tracking research.

Procedure

All stimuli and procedures were approved by the University of Nevada, Las Vegas Institutional Review Board, and participants received course research credit for their participation. Participants were briefed about the procedure of the experiment before it began and were encouraged to ask questions at any time. A male research assistant tested the male participants and the primary investigator (who is female) tested the female participants. Once the eye-tracker was placed upon the participant's head, the equipment was calibrated. Calibration consisted of having the participant fixate, or focus upon, nine markers on the display area, and the calibration was checked by having the participant perform the same task again. The Eyelink system was calibrated to each individual until the average error in gaze position was 0.5°. Once the eye-tracker was successfully calibrated, the practice session began.

In the practice session, the participants were presented with three images of people in non-erotic situations and were instructed to “look at the pictures as you normally would.” Once the practice session was completed, the experimental session began. In the experimental session, each participant was presented with 10 images of individuals (5 erotic and 5 non-erotic). Again, men only viewed images of women, and women viewed only images of men. Each scene was presented for 15 s. The presentation of erotic and non-erotic image sets was counterbalanced across all subjects, so that an equal number of participants saw erotic images first versus non-erotic images first.
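One simple way to obtain equal numbers of participants for each presentation order is to alternate the order as participants are enrolled. The sketch below illustrates this; the participant identifiers are hypothetical, strict alternation is an assumption (the assignment rule actually used is not described), and the example is applied to one group of 20 for illustration.

```python
from collections import Counter


def assign_block_order(participant_ids):
    """Alternate block order so both orders are used equally often."""
    orders = {}
    for index, pid in enumerate(participant_ids):
        if index % 2 == 0:
            orders[pid] = ("erotic", "non-erotic")
        else:
            orders[pid] = ("non-erotic", "erotic")
    return orders


# Hypothetical identifiers for a group of 20 participants
group = [f"P{n:02d}" for n in range(1, 21)]
orders = assign_block_order(group)

# Tally which image set each participant sees first
tally = Counter(first for first, _ in orders.values())
print(tally)

# Total viewing time in the experimental session: 10 images at 15 s each
viewing_time_s = 10 * 15
print(viewing_time_s, "seconds")
```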

Upon completion of the eye-tracking portion of the study, participants completed a short questionnaire including demographic variables and potential prior exposure to images or individuals in the images used in the study. The experiment lasted approximately 20 min.

Table 1 Means and SDs for female participants: Number of fixations, first gaze duration, and total time as a function of stimulus type and scene region

Data Analyses

The three dependent measures were total number of fixations, first gaze duration, and total time. These three eye-tracking measures are the most commonly reported dependent variables in the cognitive literature. Total number of fixations was a count of the times the eye landed on any given scene region; it is often theorized that total number of fixations is a measure of drawing attention, one indication of overall interest in that particular scene region. First gaze duration measured the total number of milliseconds the eye remained in a given scene region the very first time it landed on that particular scene region before moving away; it is thought to be a measure of attentional capture. Total time was a measure of the total number of milliseconds the individual attended to a particular scene region across the entire stimulus presentation time (in this case, 15 s); total time is also thought to be an indication of overall interest in a given scene region. For each dependent variable (number of fixations, first gaze duration and total time), results were analyzed in 2 (Stimulus: Erotic vs. Non-erotic) × 3 (Scene Region: Face, Body, Context) repeated measures ANOVAs. Greenhouse-Geisser corrections were applied when sphericity was violated and are clearly indicated throughout.
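The three dependent measures can be derived from a chronological list of fixations once each fixation has been labeled with its scene region. The sketch below is one possible implementation under stated assumptions: every fixation inside a region is counted toward the number of fixations, and first gaze duration sums all consecutive fixations in a region from the first entry until the eye leaves it. The function name and the example trial are hypothetical and do not reproduce the study's data or software.

```python
from collections import defaultdict


def region_measures(fixations):
    """Summarize a chronological list of (region, duration_ms) fixations.

    Returns three dicts keyed by region: number of fixations,
    first gaze duration (ms), and total time (ms).
    """
    n_fixations = defaultdict(int)
    total_time = defaultdict(int)
    first_gaze = {}
    left_regions = set()  # regions the eye has already entered and left
    previous = None
    for region, duration in fixations:
        n_fixations[region] += 1
        total_time[region] += duration
        if previous is not None and region != previous:
            left_regions.add(previous)
        # Only fixations from the first uninterrupted visit are summed
        if region not in left_regions:
            first_gaze[region] = first_gaze.get(region, 0) + duration
        previous = region
    return dict(n_fixations), first_gaze, dict(total_time)


# Hypothetical trial, not data from the study
trial = [("face", 200), ("face", 300), ("body", 400),
         ("face", 250), ("context", 150), ("body", 100)]
counts, first_gaze, total_time = region_measures(trial)
print(counts)
print(first_gaze)
print(total_time)
```

In this example the return to the face after the body adds to the face's fixation count and total time but not to its first gaze duration, which is the distinction between attentional capture and overall interest drawn in the text.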

Table 2 Means and SDs for male participants: Number of fixations, first gaze duration, and total time as a function of stimulus type and scene region

Results

Viewing patterns of women

Means and SDs for women gazing at male images are shown in Table 1. For total number of fixations, there was a significant main effect for Scene Region, F(2,38)=25.20, p < .001, η²=.62, and a significant Scene Region × Stimulus interaction, F(2,38)=15.64, p < .001, η²=.67. The Scene Region × Stimulus interaction was analyzed using simple effects. There was a simple main effect for Stimulus (erotic vs. non-erotic) whereby women looked at bodies significantly more times, p < .001, and context significantly fewer times, p=.002, in the erotic stimuli than in the non-erotic stimuli. There was also a simple main effect for Scene Region (face, body, context) in the erotic stimuli, F(1.49,28.22)=37.84, p < .001 (with Greenhouse-Geisser adjustment). Pairwise comparisons using Bonferroni correction for multiple comparisons indicated that women looked at bodies significantly more times than at faces, p < .001, and context, p < .001; and had significantly more fixations on faces than on context, p=.005. There was also a simple main effect for Scene Region in the non-erotic stimuli, F(2,38)=5.00, p=.012. Pairwise comparisons using Bonferroni correction indicated that women looked significantly more times at bodies than at context, p=.028.

For first gaze duration, there was a significant main effect for Scene Region, F(2,38)=11.71, p < .001, η²=.37, and a significant Stimulus × Scene Region interaction, F(2,38)=3.76, p=.032, η²=.12. The Scene Region × Stimulus interaction was analyzed using simple effects analyses. There was a simple main effect for Stimulus, whereby first gaze duration on context was significantly longer, p=.014, in the erotic stimuli than in the non-erotic stimuli. There was no simple main effect for Scene Region in the erotic stimuli. However, there was a simple main effect for Scene Region in the non-erotic stimuli, F(1.53,28.99)=24.99, p < .001 (Greenhouse-Geisser adjustment). Pairwise comparisons using Bonferroni correction indicated that women had significantly longer first gaze durations on both faces, p < .001, and bodies, p < .001, than on context.

For total time, there was a significant main effect for Scene Region, F(2,38)=19.29, p < .001, η²=.63, and a significant Scene Region × Stimulus interaction, F(2,38)=14.28, p < .001, η²=.41. The Scene Region × Stimulus interaction was analyzed using simple effects analyses. There was a simple main effect for Stimulus whereby women spent significantly less time looking at faces, p=.024, and context, p=.001, and more time looking at bodies, p < .001, in the erotic stimuli than in the non-erotic stimuli. There was also a simple main effect for Scene Region in the erotic stimuli, F(2,38)=29.09, p < .001. Pairwise comparisons using Bonferroni correction indicated that women looked significantly longer at bodies than at faces, p=.033, and context, p < .001; and they looked significantly longer at faces than at context, p < .001. There was also a simple main effect for Scene Region in the non-erotic stimuli, F(2,38)=7.32, p=.002. Pairwise comparisons using Bonferroni correction indicated that women looked at faces significantly longer than at context, p=.002.

Viewing patterns of men

Means and SDs for men gazing at female images are shown in Table 2. Greenhouse-Geisser results are reported when appropriate. For total number of fixations, there was a significant main effect for Scene Region, F(1.51,28.72)=31.51, p < .001, η²=.57 (Greenhouse-Geisser adjustment), and a Scene Region × Stimulus interaction, F(2,38)=38.53, p < .001, η²=.45. The Scene Region × Stimulus interaction was analyzed using simple effects analyses. There was a simple main effect for Stimulus whereby men evidenced significantly more fixations on bodies, p < .001, and significantly fewer fixations on faces, p < .001, in the erotic stimuli than in the non-erotic stimuli. There was also a simple main effect for Scene Region in the erotic stimuli, F(1.50,28.57)=54.38, p < .001 (Greenhouse-Geisser adjustment). Pairwise comparisons using Bonferroni correction indicated that men looked significantly more times at bodies than at faces, p < .001, and context, p < .001. There was also a simple main effect for Scene Region in the non-erotic stimuli, F(1.52,28.96)=13.93, p < .001 (Greenhouse-Geisser adjustment). Pairwise comparisons using Bonferroni correction indicated that men looked at bodies significantly more times than at faces, p=.023, and context, p < .001; they also looked at faces significantly more times than at context, p=.038.

For first gaze duration, there was only a main effect for Scene Region, F(1.26,23.92)=11.06, p=.004, η²=.38 (Greenhouse-Geisser adjustment). Pairwise comparisons using Bonferroni correction indicated that first gaze duration was significantly longer on bodies (M=1198.65, SD=1114.97) than on context (M=428.68, SD=182.77, p < .001) and significantly longer on faces (M=969.91, SD=539.70, p < .001) than on context.

For total time, there was a significant main effect for Scene Region, F(2,38)=32.14, p < .001, η²=.50, and a Scene Region × Stimulus interaction, F(2,38)=13.28, p < .001, η²=.43. The Scene Region × Stimulus interaction was analyzed using simple effects analyses. There was a simple main effect for Stimulus whereby men spent significantly more time looking at bodies, p=.006, and significantly less time looking at faces, p < .001, in the erotic stimuli than in the non-erotic stimuli. There was also a simple main effect for Scene Region in the erotic stimuli, F(2,38)=49.25, p < .001. Pairwise comparisons using Bonferroni correction indicated that men looked for significantly longer periods at bodies than at faces, p < .001, and context, p < .001; they also looked longer at faces than at context, p=.002. There was also a simple main effect for Scene Region in the non-erotic stimuli, F(2,38)=16.85, p < .001. Pairwise comparisons using Bonferroni correction indicated that men looked at faces, p < .001, and bodies, p < .001, significantly longer than at the context.

Discussion

As hypothesized, tracked eye movements during scene presentation showed differential viewing patterns to erotic and non-erotic images, and the difference was primarily in the preferential visual attention to bodies in the erotic stimuli. These findings provide further evidence that sexual information may be processed in a different manner than non-sexual information, as has been found in past research using other experimental paradigms (for a review, see Geer & Manguno-Mire, 1996). More importantly, eye-tracking methodology has the ability to capture this difference in sexual information processing at the level of visual attention.

Of interest was the variance in results contingent on the dependent variable under examination. Both total number of fixations and total time are relatively gross measures of overall attention to any given scene region, and it was on these two measures that we found consistent differences in visual attention to erotic vs. non-erotic stimuli; this was not the case with first gaze duration. A more subtle measure of attention capture, first gaze duration is the duration of the very first fixation on a region; this measure of visual attention was thus unaffected by the possibility that individuals may return to that region and fixate therein for long periods of time. Judging from our results, it did not appear that any specific scene region was preferentially attended to in erotic stimuli in terms of how long the first fixation in a region lasted. This may indicate that, during initial scene processing, individuals devoted similar amounts of time to the various scene regions, regardless of erotic content. Perhaps it was only after the gestalt of the image was understood that individuals attended to scene regions that interested them, thus resulting in the significant Stimulus × Scene Region interactions found on total number of fixations and total time, but not on first gaze duration. Another possibility is that first gaze duration was determined by the visual features (e.g., luminance, contrast, etc.) of regions rather than by their semantic content (meaning), the latter generally considered more likely to affect total time and number of fixations on a region (Henderson, Weeks, & Hollingworth, 1999).

We cannot comment specifically on gender differences, as men and women were not shown the same images. It does, however, seem worth commenting on some surface similarities and potential differences. Although both men and women exhibited a very similar pattern of preferential visual attention to the body in the erotic stimuli in comparison to the non-erotic stimuli, some interesting patterns emerged that may be worthy of future investigation. For example, men looked at bodies 22% longer in the erotic stimuli than in the non-erotic stimuli, while women looked at the bodies 38% longer in the erotic stimuli than in the non-erotic stimuli. Women looked at the context 33% less time in the erotic stimuli, while men looked at the context 12% less in the erotic stimuli than in the non-erotic stimuli. In terms of visual attention to faces, men focused on them 25% less time in the erotic condition, while the equivalent female decrease was 15%. Based on this cursory look, one could hypothesize that women's visual attention patterns may be more dramatically altered by erotic content than are those of men (with the exception of fixations on the face). This may be related to the gender differences found in other aspects of the cognitive processing of sexual information, in which women consistently showed evidence of additional cognitive processing as a function of erotic content (for a review, see Geer & Manguno-Mire, 1996). Whether these implied differences will be found statistically when men and women are shown the identical stimuli remains to be seen and deserves further study.

Perhaps the most central question in the interpretation of our findings relates to the mechanism underlying the fixation and total time differences we found between erotic and non-erotic stimuli. Although our manipulation check suggests that we successfully manipulated the subjective sexual arousal value of the images, the extent to which we may have also unwittingly manipulated visual features, novelty, emotional valence, or other features that might reasonably affect viewing patterns is unclear. The visual features of the images did not appear to be a serious confound, as eye-tracking research suggests that visual features draw initial fixations but that semantic features are the ones that hold them (Rayner & Pollatsek, 1992). Our dependent measures of total time and number of fixations are generally accepted to be measures of visual attentional hold (Henderson, 1996). We did not attempt to tease apart eroticism from novelty and emotion by controlling for the latter, and that could certainly be useful in elucidating basic mechanisms driving visual attention. On the other hand, novelty and emotion could arguably be considered essential features of erotic content to the extent that sexual content is not, for most of us, a prosaic part of our existence (it usually stands out when it appears), and sex has been considered equivalent to an emotion insofar as it is characterized by subjective, physiological, and neurological correlates common to other emotions (Everaerd, 1988; Geer, Lapour, & Jackson, 1993; Hamann, Herman, Nolan, & Wallen, 2004).

More fundamentally, however, we now have evidence that eye-tracking methodology can detect differences in visual attention to erotic vs. non-erotic stimuli. The experimental research on eye-tracking and visual attention in other areas supports our contention that these fixations are indicative of covert attentional processes engaged in response to informative, emotional, appetitive, or aversive stimuli, all of which appear to draw attention (Calvo & Lang, 2004; Henderson & Hollingworth, 1999; Lundqvist & Öhman, 2005). Although we can attest to consistent differences in attention to aspects of erotic scenes as evidenced through eye movements, we do not know the concurrent higher order cognitions (e.g., appraisal, interest) taking place during scene perception. Considering that a number of studies have found no differences in visual attention to positive and negative stimuli (for a review, see Calvo & Lang, 2004), future investigations of these higher order processes will likely necessitate concurrent self-report. Additionally, it would be interesting to explore potential gender differences in visual attention to erotic material to elucidate gender differences found in self-reported sexual fantasy content (Leitenberg & Henning, 1995) and memory for erotic material (Geer & McGlone, 1990). Using paradigms common to eye-tracking studies to examine distractibility and memory in clinical and non-clinical populations may illuminate differences in the cognitive processes that accompany various sexual dysfunctions. Future studies employing eye-tracking methodology may also benefit from investigating how sexual arousal (both subjective and physiological) relates to eye movement patterns.

Future studies may also want to improve on the current one by targeting its limitations. As previously mentioned, we did not attempt to isolate the novelty or emotional valence of the stimuli from their erotic content. Adding conditions that manipulate these elements would be useful. We were not able to test for gender differences in visual attention to erotic stimuli, as we showed men and women different images in an attempt to avoid the demand characteristics that might have come into play had individuals been shown same-sex images. Showing male and female participants the same erotic images is worth pursuing. In addition, our study broke down the images into only three general scene regions (face, body, context). A finer breakdown into smaller regions would yield more detailed information about the precise elements of the face, body, or background that are being attended to. Finally, although we strove for parallelism of the erotic and non-erotic images using real people in natural settings, this could be improved with standardized computer-generated images that can be more precisely designed to control for image features.