How body postures affect gaze control in scene viewing under specific task conditions

Backhaus, Daniel; Engbert, Ralf

doi:10.1007/s00221-023-06771-x

Research Article
Open access
Published: 01 February 2024

Volume 242, pages 745–756, (2024)

Experimental Brain Research

Abstract

Gaze movements during visual exploration of natural scenes are typically investigated with the static picture viewing paradigm in the laboratory. While this paradigm is attractive for its highly controlled conditions, concerns about the generalizability of its findings to more natural viewing behavior have frequently been raised. Here, we address the combined influences of body posture and viewing task on gaze behavior in the static picture viewing paradigm, with free viewing as a baseline condition. We recorded gaze data using mobile eye tracking during postural manipulations in scene viewing. Specifically, in Experiment 1, we compared gaze behavior during head-supported sitting and quiet standing under two task conditions. We found that task affects temporal and spatial gaze parameters, while posture has no effect on temporal and only small effects on spatial parameters. In Experiment 2, we further investigated body posture by introducing four conditions (sitting with chin rest, head-free sitting, quiet standing, standing on an unstable platform). Again, we found no effects on temporal and small effects on spatial gaze parameters. In our experiments, gaze behavior is largely unaffected by body posture, while task conditions readily produce effects. We conclude that results from static picture viewing may allow predictions of gaze statistics under more natural viewing conditions; however, viewing tasks should be chosen carefully because of their potential effects on gaze characteristics.

Introduction

The area of high-acuity vision, the fovea centralis, corresponds to about $2^\circ$ of visual angle. As a consequence, we have to shift our gaze frequently to process detailed visual information from the environment. In everyday tasks, human observers combine eye, head, and trunk movements for gaze shifts (Land et al. 1999) to keep eye movements within a comfortable range of up to about $25^\circ$ (Stahl 1999), while the maximum range of saccade amplitudes is approximately $\pm 55^\circ$ (Guitton and Volle 1987). However, even for smaller gaze shifts ($<15^\circ$), we produce coordinated eye and head movements under natural conditions (Franchak et al. 2021; ’t Hart and Einhäuser 2012). The physiological basis for coordinated eye, head, and postural movements is provided by the neural coding of gaze positions (Paré et al. 1994). The extent to which a subject uses eye or head movements for gaze shifts, however, varies greatly between individuals (Pelz et al. 2001).

The ubiquity of eye, head, and trunk coordination in everyday situations contrasts with the static picture viewing paradigm, the well-established method for studying visual scene exploration through gaze shifts. In this paradigm, gaze behavior on real-world scenes is investigated in darkened laboratory setups (Henderson 2003; Rayner 2009) with a stationary eye tracker; participants are seated, typically with a head-supporting chin rest, so gaze shifts are produced by eye movements only and are practically restricted to the size of the computer screen.

While the static picture viewing paradigm has yielded many insightful results, its restrictions have always been criticized (e.g., Tatler et al. 2011), particularly since technological progress has made it possible to obtain high-resolution eye-tracking data in real-world situations. For a good overview of the critique, see Tatler et al. (2011) and Henderson (2003, 2006, 2007). The critical papers not only question the generalizability of laboratory studies to real-world behavior but also point to the frequent lack of a concrete task, the sudden onset of the scene, the relatively short viewing time, the limited field of view, the lack of depth and motion cues, the limited dynamic range, and the photographer bias. A growing literature investigates the gap between the laboratory and the real world (e.g., Foulsham et al. 2011; Gert et al. 2022).

Well-established effects on gaze statistics were discovered with the static picture viewing paradigm. A prominent example is the central fixation bias (Tatler 2007; ’t Hart et al. 2009), which is strongest for sudden image onsets (Rothkegel et al. 2017). The participant’s gaze is biased toward the center of a given image, particularly at the beginning of scene exploration, i.e., for the first few saccades. But fixations at central locations remain disproportionately frequent later in the trial as well, independent of the positioning of the image on the monitor (Bindemann 2010) and independent of the distribution of salient locations on the image. Even the starting position (central vs. non-central) has little influence on the central fixation bias as long as images appear with a sudden onset (Tatler 2007; Rothkegel et al. 2017). Rothkegel et al. (2017) were able to reduce the strength of the central fixation bias by introducing a short preview of the scene.

The interactions of eye, head, and trunk movements have been studied extensively (e.g., Stahl 1999; Imai et al. 2001; Pelz et al. 2001; Land 2004; Franchak et al. 2021). A key question is whether adding head and body movements is merely compensatory or whether gaze positions and fixation times are modulated when observers are permitted to move their head and body. The results are inconclusive. For example, Smith et al. (2019) found shorter search times in a visual search task when standing than when sitting, but this result was obtained without eye tracking, and the effect occurred only in the easier of two search conditions. Studies of body posture effects on cognitive processes have likewise produced ambiguous results. For example, the color Stroop effect (Stroop 1935) was reduced in some studies (Rosenbaum et al. 2017, 2018; Smith et al. 2019; Caron et al. 2020), but a meta-analysis and a replication could not confirm these findings (Straub et al. 2022).

In addition to postural influences, gaze movements also depend on the viewing task (Schwetlick et al. 2023). Early anecdotal findings date back to Buswell and Yarbus, who observed differences in gaze movements during picture viewing when the observers' viewing instructions were varied (Yarbus 1967; Buswell 1935). More recent research has investigated the influence of instructions under controlled experimental procedures (e.g., Backhaus et al. 2020; Castelhano et al. 2009; Torralba et al. 2006). Additionally, the influence of knowledge about scenes and targets on gaze control has been analyzed (Mills et al. 2011; Kaspar and König 2011; Trukenbrod et al. 2019).

The present work investigates the generalizability of results from the static picture viewing paradigm to less restricted postures under different tasks. In Experiment 1, a within-subject design is applied to analyze viewing behavior during sitting and standing in a free viewing task and a more specific viewing task. In Experiment 2, we investigate viewing behavior under four different postural manipulations, ranging from highly restricted to more flexible postures. Effects on gaze behavior are analyzed separately for temporal and spatial viewing characteristics. Since we are interested in whether there are any differences between conditions at all, we do not formulate directed hypotheses.

Experiment 1

In our first experiment, we investigate the influence of two body postures on gaze behavior under two task conditions. The static picture viewing paradigm is typically implemented with participants sitting with chin rest support (Chin_Rest) and without a concrete viewing task (Free_Viewing). We contrast this setup with the postural condition of quiet standing (Standing) and with a more specific task condition, in which participants were required to guess the time of day at which the image was taken (Guess_Time), a task we used with the same image material in an earlier study (Backhaus et al. 2020). We chose this task because it is slightly more concrete than free viewing (subjects can develop their own strategy for extracting the time from the picture) and implies no clear presumption about which locations in the picture will be attended. As a result, we apply a $2\times 2$ within-subject design.

Methods

Participants

Thirty-one students (26 females, 5 males; age range from 19 to 49 years, mean age $=$ 25.2 years) with normal or corrected-to-normal vision participated in this experiment. Eight additional students were excluded from the analyses: for seven of them, the recording had to be stopped because of persistent calibration failures ($n=5$) or reported uneasiness ($n=2$), and one further participant ($n=1$) was excluded because of abnormal fixation patterns during the experiment. Participants were recruited via an internal departmental portal and received credit points or monetary compensation (€ 9.00). To increase engagement with the task, we offered participants an additional incentive of up to € 1.50 for correctly answering questions after 30 of the 60 images. The study was carried out in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants prior to testing.

Apparatus and saccade detection

Stimulus images were presented with a luminance-calibrated projector (JVC DLA-X9500B; Victor Company of Japan Ltd., Yokohama, Japan) at a refresh rate of 60 Hz and a resolution of 1920$\times$1080 pixels. In all experimental conditions, i.e., during sitting and standing, participants were placed at a distance of 270 cm from the projector screen. Eye movements were recorded with infrared video-based mobile eye-tracking glasses (SMI-ETG 2W, SensoMotoric Instruments, Teltow, Germany). Gaze positions were obtained binocularly in scene camera coordinates on a sub-pixel level at a sampling rate of 120 Hz, using the binocular stream provided by the hardware. The scene camera had a resolution of 960$\times$720 pixels ($60^\circ \times 46^\circ$ of visual angle) and a frame rate of 30 Hz. Figure S1 in the supplement shows the experimental setup in our laboratory. For saccade detection, we first transformed the data from scene camera coordinates to stimulus image coordinates (cf. Backhaus et al. 2020). Next, we applied a velocity-based algorithm (Engbert and Kliegl 2003; Engbert and Mergenthaler 2006) to both binocular gaze trajectories, with the same set of parameters as reported in Backhaus et al. (2020). After saccade detection, fixations were defined as the time intervals between subsequent saccades. Saccade metrics were defined from gaze shifts on the stimulus images, irrespective of whether they resulted from eye-in-head or head-in-space movements. Blinks were labeled by the eye tracker's built-in detection; both blinks and the preceding and succeeding events (i.e., fixations or saccades) were excluded from further analysis.
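The velocity-based detection step can be illustrated with a minimal Python sketch of the general approach of Engbert and Kliegl (2003): velocities from a five-sample moving window, a median-based velocity threshold scaled by a factor λ, a minimum duration, and a binocular criterion that keeps only saccades overlapping in time in both eyes. The parameter values (`lam=6.0`, `min_samples=3`) and all function names are hypothetical placeholders, not the parameters reported in Backhaus et al. (2020), and the sketch omits the coordinate transformation and blink handling described above.

```python
import numpy as np

def velocities(xy: np.ndarray, fs: float) -> np.ndarray:
    """Velocity (deg/s) from a 5-sample moving window; two edge samples per side stay 0."""
    v = np.zeros(xy.shape, dtype=float)
    v[2:-2] = (xy[4:] + xy[3:-1] - xy[1:-3] - xy[:-4]) * (fs / 6.0)
    return v

def detect_saccades(xy, fs, lam=6.0, min_samples=3):
    """Return (start, end) sample indices of monocular saccade candidates."""
    v = velocities(xy, fs)
    # Median-based estimate of the velocity noise per component (x, y).
    sd = np.sqrt(np.maximum(np.median(v**2, axis=0) - np.median(v, axis=0) ** 2, 0.0))
    eta = lam * np.maximum(sd, 1e-12)              # guard against noise-free input
    above = ((v / eta) ** 2).sum(axis=1) > 1.0     # outside the elliptic threshold
    saccades, start = [], None
    for i, flag in enumerate(above):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_samples:           # enforce minimum duration
                saccades.append((start, i - 1))
            start = None
    if start is not None and len(above) - start >= min_samples:
        saccades.append((start, len(above) - 1))
    return saccades

def binocular_saccades(left, right, fs, **kw):
    """Keep saccades whose left- and right-eye intervals overlap in time."""
    merged = []
    for a0, a1 in detect_saccades(left, fs, **kw):
        for b0, b1 in detect_saccades(right, fs, **kw):
            if a0 <= b1 and b0 <= a1:              # temporal overlap
                merged.append((min(a0, b0), max(a1, b1)))
                break
    return merged

def fixations_between(saccades, n_samples):
    """Fixations as the sample intervals between subsequent saccades."""
    fixations, prev_end = [], -1
    for s0, s1 in saccades:
        if s0 - prev_end > 1:
            fixations.append((prev_end + 1, s0 - 1))
        prev_end = s1
    if n_samples - 1 - prev_end >= 1:
        fixations.append((prev_end + 1, n_samples - 1))
    return fixations
```

For a 2-s trace at 120 Hz with a single 5° gaze shift, `binocular_saccades` returns one interval around the shift and `fixations_between` returns the fixation before and after it. Requiring binocular overlap is one way to suppress noise-driven detections in a single eye.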

Materials and procedure

Natural photographs with a resolution of 1668$\times$828 pixels were presented in the center of the screen. The stimulus images covered $40.6^\circ$ of visual angle horizontally and $20.1^\circ$ vertically. For later screen detection, the stimulus images were embedded in a gray frame that included 12 unique QR markers (126$\times$126 pixels each). The colored photographs were taken from Backhaus et al. (2020). They contained varying numbers of humans and animals (between 0 and 10), were sharp overall, contained no prominent text, and were taken in different countries and at different times of day.
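The relation between stimulus size in pixels, visual angle, and the 270 cm viewing distance can be checked with the standard formula $\theta = 2\arctan(s / 2d)$. The short Python sketch below derives the implied physical width of the projected image and the approximate degrees per pixel; these physical sizes are computed from the reported values rather than measured, and the helper names are hypothetical.

```python
import math

VIEW_DIST_CM = 270.0        # viewing distance reported above
IMG_PX = (1668, 828)        # stimulus resolution (width, height)
IMG_DEG = (40.6, 20.1)      # reported visual angle (horizontal, vertical)

def size_from_angle(angle_deg, dist_cm):
    """Physical extent (cm) that subtends angle_deg at dist_cm, centred on the line of sight."""
    return 2.0 * dist_cm * math.tan(math.radians(angle_deg) / 2.0)

def angle_from_size(size_cm, dist_cm):
    """Visual angle (deg) of an extent size_cm viewed at dist_cm."""
    return math.degrees(2.0 * math.atan(size_cm / (2.0 * dist_cm)))

width_cm = size_from_angle(IMG_DEG[0], VIEW_DIST_CM)   # roughly 2 m
deg_per_px_h = IMG_DEG[0] / IMG_PX[0]
deg_per_px_v = IMG_DEG[1] / IMG_PX[1]
```

Both axes give about 0.024° per pixel, so pixel distances on the stimulus can be converted to degrees with a single factor; this is a linear approximation that ignores the slight nonlinearity of the arctangent toward the image edges.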

Experiment 1 consisted of four blocks of 15 images, each presented for 8 s. In the first two blocks, participants viewed 30 images in randomized order under the task condition Free_Viewing, in which subjects received no specific task instruction. The second manipulated factor was body posture, with the levels sitting with a chin rest (Chin_Rest) and standing quietly (Standing). Note that the screen height was adjusted to the vertical position of the participant's eyes in space. Body posture conditions were counterbalanced and assigned to Block A and Block B. In Block C and Block D, participants viewed the same 30 images a second time in randomized order, but under the specific task condition, in which they were asked to guess the time of day at which the image was taken (Guess_Time). Body posture conditions were again counterbalanced and assigned to Block C and Block D. Every session started with detailed instructions for the upcoming task, followed by a calibration. Each trial consisted of a screen with a task reminder (1 s), a fixation check (3 s), and the image presentation (8 s). In Blocks C and D, three alternative answers to the guessing task were presented. Participants answered verbally (condition Standing) or by knocking on the table while fixating the selected answer (condition Chin_Rest). We chose this modality because speaking is not possible while the head and chin are fixed. The experimenter entered the answers into the computer. Each correctly answered question was rewarded with an incentive of € 0.05 (i.e., up to € 1.50 across the 30 guessing trials). The specific query and the reward served only to maintain participants' motivation; therefore, we did not analyze the actual answers. Participants guessed the time correctly in 65 % of the guessing trials. A schematic sequence of an experimental trial is shown in Fig. 1.

Participants' eye movements were recorded throughout the experiment. For calibration, we used the SMI built-in 3-point calibration routine at least after every fifth trial or whenever the experimenter decided to recalibrate. For fixation checks, a black cross ($0.73^\circ \times 0.73^\circ$) on a medium gray background appeared at a randomly selected position, drawn from 15 possible positions defined by three vertical positions between 25% and 75% of the projector screen's height and five horizontal positions between 20% and 80% of its width. The experimenter started a calibration whenever the eye position deviated by more than about $1^\circ$ of visual angle from the initial fixation target at the beginning of a trial.
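The layout of the 15 fixation-check positions and the roughly 1° recalibration criterion can be sketched as follows. The text specifies only the ranges (25% to 75% vertically, 20% to 80% horizontally); equal spacing within those ranges, the pixel-to-degree factor taken from the stimulus geometry, and the function names are assumptions for illustration.

```python
import math
import random

SCREEN_W, SCREEN_H = 1920, 1080     # projector resolution
# Assumption: about 0.0243 deg per pixel, derived from 1668 px <-> 40.6 deg.
DEG_PER_PX = 40.6 / 1668

def fixation_targets():
    """3 rows x 5 columns of candidate positions, assumed equally spaced."""
    rows = [SCREEN_H * f for f in (0.25, 0.50, 0.75)]
    cols = [SCREEN_W * f for f in (0.20, 0.35, 0.50, 0.65, 0.80)]
    return [(x, y) for y in rows for x in cols]

def needs_recalibration(gaze_px, target_px, tol_deg=1.0):
    """True if the gaze deviates from the target by more than tol_deg."""
    return math.dist(gaze_px, target_px) * DEG_PER_PX > tol_deg

target = random.choice(fixation_targets())   # random target for one trial
```

Under this conversion, the 0.73° cross spans about 30 pixels, and a gaze offset of about 41 pixels or more would trigger recalibration.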

Data preprocessing

Because mobile eye-tracking signals are typically noisier than signals recorded with desktop devices, we applied a set of exclusion criteria to remove unreliable events during preprocessing. We flagged blinks detected by the eye tracker, fixations shorter than 33 ms (equivalent to four samples at 120 Hz), fixations with durations of 1000 ms or longer, fixations with jittering signals (which exceeded the 2D median standard deviation of all fixations by a factor of 15), and saccades with amplitudes greater than $25^\circ$. In all these cases, we removed the critical event as well as its neighboring events before and after it. Finally, we discarded all trials in which gaze positions deviated by more than $2^\circ$ from the fixation target during the last 200 ms before image presentation.
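The rule of removing each flagged event together with its immediate neighbors can be sketched in Python as below. The `Event` record and its field names, the reading of the jitter criterion (a fixation's 2D positional SD compared with 15 times the median over all fixations), and the trial check over the last 24 samples (200 ms at 120 Hz, failing if any single sample exceeds 2°) are our assumptions, not the authors' code.

```python
import statistics
from dataclasses import dataclass

@dataclass
class Event:
    kind: str                    # "fixation", "saccade", or "blink"
    duration_ms: float
    amplitude_deg: float = 0.0   # saccades only
    sd_2d: float = 0.0           # fixations only: 2D positional SD (hypothetical field)

def is_critical(ev, median_sd):
    """Apply the exclusion criteria to a single event."""
    if ev.kind == "blink":
        return True
    if ev.kind == "fixation":
        return (ev.duration_ms < 33 or ev.duration_ms >= 1000
                or ev.sd_2d > 15 * median_sd)
    if ev.kind == "saccade":
        return ev.amplitude_deg > 25
    return False

def clean(events):
    """Drop every critical event plus the events directly before and after it."""
    sds = [e.sd_2d for e in events if e.kind == "fixation"]
    median_sd = statistics.median(sds) if sds else 0.0
    drop = set()
    for i, ev in enumerate(events):
        if is_critical(ev, median_sd):
            drop.update({i - 1, i, i + 1})
    return [ev for i, ev in enumerate(events) if i not in drop]

def trial_valid(deviation_deg, fs=120.0, window_ms=200.0, max_dev=2.0):
    """deviation_deg: gaze-to-target distances (deg) up to image onset."""
    n = int(round(window_ms * fs / 1000.0))   # 24 samples at 120 Hz
    return max(deviation_deg[-n:]) <= max_dev
```

Because neighbors are removed as well, one short fixation between two saccades eliminates three events, which is a deliberately conservative choice for noisy mobile recordings.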

Statistical analyses

Our statistical analysis is based on linear mixed models (LMMs). This method allows us to include experimentally varied factors (fixed effects), covariates, and grouping factors of the design (random effects) in one model. Orthogonal contrasts of the fixed factors reflect our hypotheses about the varied body postures and tasks. Whenever possible, subjects and presented images are included as grouping factors. The complexity of the random-effect structure is chosen according to the recommendations of Bates et al. (2018) and Matuschek et al. (2017). For some dependent variables, a simpler random-effect structure with only by-subject and by-image intercepts is chosen for better comparability between models. The analysis was performed with R (v.4.2.1, R Core Team 2022) and the lme4 package (v.1.1-30, Bates et al. 2015). Models were estimated by maximum likelihood using the BOBYQA optimizer; p-values were calculated with the lmerTest package (v.3.1-3, Kuznetsova et al. 2017). In the presentation of results, we focus on the experimentally varied fixed effects, which are controlled for between-subject and between-image variance through the random effects. Details about the random-effect variances can be found in the provided code. The resulting model for each dependent variable is listed in Table 1.
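To make the contrast coding concrete, the Python sketch below builds one plausible orthogonal coding for the 2×2 design of Experiment 1 and recovers the effects from four cell means. The ±0.5 coding, the contrast directions (Guess_Time minus Free_Viewing, Standing minus Chin_Rest), and the product term for the interaction are assumptions: the directions are inferred from the signs of the estimates reported in the Results, and the coding is consistent with the interaction standard errors there being twice the body-posture standard errors, but we have not confirmed it against the authors' code. This is a least-squares illustration on cell means, not an LMM fit.

```python
import numpy as np

# Cell order: (task, body posture)
CELLS = [("Free_Viewing", "Chin_Rest"), ("Free_Viewing", "Standing"),
         ("Guess_Time", "Chin_Rest"), ("Guess_Time", "Standing")]

def contrast_row(task, body):
    """Intercept, Task, Body, and Interaction codes for one cell (assumed coding)."""
    t = 0.5 if task == "Guess_Time" else -0.5
    b = 0.5 if body == "Standing" else -0.5
    return [1.0, t, b, t * b]

X = np.array([contrast_row(task, body) for task, body in CELLS])

def effects_from_cell_means(means):
    """Solve X @ beta = means; returns (grand mean, Task, Body, Interaction)."""
    return np.linalg.solve(X, np.asarray(means, dtype=float))
```

With this coding, Task is the difference between the two task means and Interaction is the difference of differences; for example, cell means of 300, 310, 280, and 270 (in the order listed) give a grand mean of 290, Task = -30, Body = 0, and Interaction = -20.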

Table 1 Linear mixed-effects model structure

Full size table

Results

We investigated effects of body posture and task on different gaze parameters. First, we report effects on temporal parameters (fixation durations); second, we report effects on spatial parameters (saccade amplitudes, gaze distribution via entropy, central fixation bias).

Temporal parameters

We analyzed fixation duration, the key temporal parameter of gaze behavior, across the experimental conditions (see Table 2). In a linear mixed-effects analysis, we considered as fixed effects (A) the difference between the two task conditions (Task), (B) the difference between the two body posture conditions (Body), and (C) the interaction of predictors A and B (Interaction). In addition, we controlled for the influence of subjects and images by including by-subject and by-image intercepts as random components in the model (see Methods, Table 1). We limited our analysis to the first 2 s of image presentation, since later effects were not expected (Table 2). Furthermore, the first fixations (on the fixation cross) were excluded from our analyses. Figure 2 visualizes the mean fixation durations over the initial 2 s of image viewing time. Table S1 in the supplement shows the results of the linear mixed-effects model (LMM) analysis. Fixation durations were log-transformed so that the residuals better conform to the normality assumption. Note that absolute t-values above 2 are considered significant. We found a significant effect of the task contrast, with longer fixation durations for the free viewing task [Task: $M=-0.03$; $SE=0.01$; $t=-3.34$]. No other contrast reached significance.
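Because the models are fitted to log-transformed durations, a fixed-effect estimate corresponds to a multiplicative change on the original scale. The Python sketch below shows the data selection described above (dropping the first fixation and keeping fixations that start within the first 2 s) and the back-transformation. It assumes natural logarithms, selection by fixation onset, and the contrast direction Guess_Time minus Free_Viewing, none of which is stated explicitly in the text; the function names are hypothetical.

```python
import math

def early_log_durations(fixations, window_ms=2000.0):
    """fixations: (onset_ms, duration_ms) pairs relative to image onset, in temporal order.
    Drops the first fixation (on the fixation cross) and keeps fixations starting in the window."""
    return [math.log(dur) for onset, dur in fixations[1:] if onset < window_ms]

def percent_change(log_estimate):
    """Percent change on the original scale implied by a log-scale estimate."""
    return 100.0 * (math.exp(log_estimate) - 1.0)

# Task estimate M = -0.03: about 3 % shorter fixations under Guess_Time (assumed direction).
task_change = percent_change(-0.03)
```

Applied to the saccade-amplitude estimate of 0.05 reported below, the same conversion gives about 5 % larger amplitudes under Guess_Time.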

Table 2 Experiment 1: Descriptive means

Full size table

Spatial parameters

To evaluate spatial gaze characteristics, we examined the following metrics: saccade amplitudes, entropy of the fixation location density, and mean distance to the image center. For all analyses, we fitted LMMs with the same fixed-effect structure as for the temporal parameters (see Methods, Table 1). The variance components differ for entropy: since entropy can only be calculated per image, subject variance components are excluded. Instead, the image variance components include a slope for the task contrast in addition to the intercept, as suggested by the model selection procedure (Bates et al. 2018). For the analysis of the distance to the image center, we added two covariates, which significantly improved the log-likelihood of the LMMs.

Saccade amplitudes were log transformed to better conform to the normal distribution assumptions of the residuals. Over the total viewing time of 8 s, we find a significant difference in log saccade amplitudes for the Task contrast (Fig. 3). The free viewing task induces shorter log amplitudes than the guessing task [Task: $M=0.05$; $SE=0.01$; $t=5.18$]. No other contrast reached the significance level. Looking only at the first 2 s of image viewing, the same pattern emerged (see Table S2 in the supplement).

Entropy is an information measure (Shannon and Weaver 1963), used here to quantify the distribution of fixation locations on an image. First, we transformed the fixation location density into probabilities $p_i$ over a grid of $n = 128\times 128$ cells, with $\sum _i p_i=1$. The entropy is computed as

$$S = -\sum_{i=1}^{n} p_i \log_2 p_i. \tag{1}$$

Thus, entropy is measured in bits and ranges from 0 to $\log _2(128^2)=14$ bits, where the maximum corresponds to a uniform distribution over the cells. Finally, we exponentially transformed the entropy values so that the residuals better conform to the normality assumption.
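Equation (1) can be sketched in Python as follows. As a simplification, a plain 2D histogram over the 128×128 grid stands in for the fixation location density (the text does not say how the density was estimated, e.g., whether kernel smoothing was used), and the function name is hypothetical. The checks illustrate the bounds: all fixations in one cell give 0 bits, and one fixation per cell gives the maximum of 14 bits.

```python
import numpy as np

def fixation_entropy(x, y, width, height, bins=128):
    """Shannon entropy (bits) of fixation locations on a bins x bins grid."""
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=[[0, width], [0, height]])
    p = counts.ravel() / counts.sum()      # probabilities p_i with sum 1
    p = p[p > 0]                           # cells with p_i = 0 contribute 0
    return float(-np.sum(p * np.log2(p)))
```

With only a few dozen fixations per image, a raw histogram leaves most of the 16,384 cells empty, which is one reason a smoothed density estimate is usually preferred in practice.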

We find significant differences between tasks [Task: $M=-43129.69$; $SE=15391.02$; $t=-2.80$]. The free viewing task produces larger entropy and thus a wider distribution of fixation locations across the image. Furthermore, we found a significant influence of body posture [Body: $M=-31005.46$; $SE=10550.38$; $t=-2.94$]: when sitting with chin rest, subjects spread their gaze further over the image than when standing (see Table S3 in the supplement). No interaction was found [Interaction: $M=7607.86$; $SE=21100.76$; $t=0.36$]. Figure 4 shows the differences in the original metric.

The distance to the image center is a measure of the central fixation bias (CFB; Rothkegel et al. 2017; Tatler 2007). To control for the influence of the starting position (fixation cross), we sampled the data such that all 15 starting positions were present equally often in all four conditions. The fixation on the fixation cross itself was excluded from the analysis. To control for any remaining influence of the starting position, we included its distance from the image center as a covariate in the LMM. As a further covariate, we added the logarithm of the sample number to linearize the change of the CFB over time.
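The balancing of starting positions and the distance measure can be sketched in Python as below. We assume balancing at the level of trials, random downsampling to the smallest (condition, start position) count, image coordinates in pixels with the center at (834, 414), and a single conversion factor of 40.6/1668 degrees per pixel; the trial record layout and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict

IMG_W, IMG_H = 1668, 828
DEG_PER_PX = 40.6 / IMG_W      # assumed linear pixel-to-degree conversion

def distance_to_center_deg(x_px, y_px):
    """Distance of a fixation from the image center, in degrees of visual angle."""
    return math.hypot(x_px - IMG_W / 2, y_px - IMG_H / 2) * DEG_PER_PX

def balance_by_start(trials, seed=0):
    """Downsample trials so each start position occurs equally often in each condition."""
    groups = defaultdict(list)
    for t in trials:
        groups[(t["condition"], t["start_pos"])].append(t)
    conditions = {c for c, _ in groups}
    positions = {p for _, p in groups}
    # Smallest cell count; a missing (condition, position) pair yields 0.
    n = min(len(groups.get((c, p), [])) for c in conditions for p in positions)
    rng = random.Random(seed)
    balanced = []
    for key in sorted(groups):
        balanced.extend(rng.sample(groups[key], n))
    return balanced
```

Downsampling to the smallest cell discards data but keeps the start-position distribution identical across conditions, so that condition differences in center distance cannot arise from unequal starting positions.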

Over the whole viewing time, we find a significant effect of task [Task: $M=-0.43^\circ$; $SE=0.05^\circ$; $t=-8.65$]: free viewing produces a less pronounced bias toward the image center. We also find an effect of body posture, but no interaction of task and body posture [Body: $M=-0.28^\circ$; $SE=0.05^\circ$; $t=-5.51$; Interaction: $M=-0.01^\circ$; $SE=0.10^\circ$; $t=-0.13$]. The standing posture produces a stronger bias toward the image center. We also examined the evolution of the central fixation bias in 400-ms steps during the early phase up to 1200 ms, and separately for the later phase from 1200 to 8000 ms (Fig. 5).

For the later phase (1200 ms to 8000 ms), we find the same effects as for the whole viewing time, even more pronounced. In the early viewing phase (0 ms to 1200 ms), we find no influence of viewing task or body posture, except for the interval from 400 ms to 800 ms, in which the effect of viewing task already appears in the same direction as in the later viewing phase. Note that in all analyses the residuals are not normally distributed because of a floor effect. The results can be found in Table S4 in the supplement.
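The windowing used for the time course of the central fixation bias can be sketched as a simple binning function. The window edges are the ones given above; assigning each fixation by its onset time and using half-open intervals are our assumptions, and the names are hypothetical.

```python
import bisect

EDGES_MS = [0, 400, 800, 1200, 8000]
LABELS = ["0-400 ms", "400-800 ms", "800-1200 ms", "1200-8000 ms"]

def window_label(onset_ms):
    """Return the analysis window [lo, hi) containing onset_ms, or None outside 0-8000 ms."""
    if not 0 <= onset_ms < EDGES_MS[-1]:
        return None
    return LABELS[bisect.bisect_right(EDGES_MS, onset_ms) - 1]
```

Half-open windows assign a fixation starting exactly at a boundary (e.g., 400 ms) to the later window, so that every onset falls into exactly one window.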

Experiment 2

In the second experiment, we investigate the possible influence of body posture in more detail. We used the same setup as in Experiment 1 but varied posture over four levels, ranging from strongly restricted sitting with chin rest support, through the more natural postures of normal sitting and normal standing, to standing on a balance board. As viewing task, participants were required to count the animals in each image across all postural conditions. This task clearly requires active gaze behavior, minimizes between-subject variation in how the task is understood, and yet does not strongly restrict fixation locations, as animals can appear anywhere in the image. We applied this task in an earlier study (Backhaus et al. 2020). Note that the task was not varied in this experiment; the selected task ensures that participants do not choose their own task, thus minimizing task-related variability between subjects.