Introduction

In 2015, there were 841,100 nonfatal victimizations at school among students aged 12–18 [1]. Research indicates that students who are aggressive in the school environment are at greater risk for academic failure, social maladjustment, and long-lasting destructive behaviors [2, 3]. Evidence-based methods of violence prediction and prevention are therefore needed to reduce the impact of school violence. While there has been progress in school-based violence prevention and toward a more comprehensive understanding of the risk factors related to school violence and aggression [4, 5], work still remains in predicting potential aggressive and violent behaviors in school-aged children.

The social information processing (SIP) theory of social adjustment may shed light on predictors of aggressive behaviors. SIP models of children’s social adjustment describe a multi-step process that unfolds when a child confronts situational social cues. Children first encode pertinent social information, then create a perceptual representation of that information, select a goal corresponding to the most desired outcome, evaluate and select a behavioral and emotional response, and finally enact the selected response. Reacting appropriately to social situations therefore requires accurate processing at each stage, and inaccurate or irregular processing increases the likelihood of reacting inappropriately. Irregularities in processing at earlier stages would theoretically also affect processing at later stages. Many studies have demonstrated that aggressive behaviors are associated with deviations in the later steps of representation, response selection and enactment [6,7,8]. The initial step of encoding has been less studied, and consequently less is understood about how deviations in processing at the earliest stage may contribute to aggressive behaviors.

It is hypothesized that the initial step of the SIP process, encoding of situational cues, is selective and automatic in order to efficiently process all relevant information, with the encoding of cues acting in a bottom-up manner that leads to cognitive representations of intent. The SIP model proposed by Crick and Dodge [6] hypothesized that more aggressive children pay measurably more attention to hostile versus non-hostile environmental cues; biased attention towards hostile cues during encoding would increase the likelihood of interpreting social situations as hostile, thereby increasing the probability of aggression. One study found that boys diagnosed with oppositional defiant disorder demonstrated less precise encoding of social information, but did not differ from typically functioning boys in the subsequent interpretation of information once it had been encoded [9]. Children who had been physically maltreated paid more attention to hostile cues in the environment and less attention to other pertinent social cues, with poor encoding relating to higher levels of subsequent aggression [10]. Consequently, children with deviations in encoding may have a higher likelihood of hostile attribution biases, and be more likely to behave aggressively [11]. A meta-analytic review established a strong relationship between hostile intent attribution and aggressive behavior; that is, individuals who attributed more hostile intentions to others were themselves more likely to be aggressive [12]. However, the authors of the review also noted that the ability to assess hostile intent attribution was confounded by the measurement technique’s inability to distinguish encoding, i.e., what social information is attended to, from representation, i.e., how the social information is represented.

We wanted to better understand and delineate the relationship between aggressive behavior and the early processing that occurs during encoding by employing a direct empirical measure of encoding based on eye-tracking methodologies. We hypothesized that adolescents who paid more attention to hostile versions of ambiguous social interactions, as indexed by eye-tracking measures of the encoding of hostile cues, would have higher scores on measures of aggression, and that such eye-tracking measures could have utility as predictors of aggression.

Methods

This brief report expanded upon prior child and adolescent violence research from clinical settings into schools, with the addition of a neurophysiological measure, eye-tracking [13]. The study design was approved by the institutional review board at our pediatric hospital (Study ID 2014–5033). The outline of the study is shown in Fig. 1.

Fig. 1 Study progression outline

We explored whether visual attention to and encoding of hostile interactions, quantified from eye-tracking measurements during an SIP paradigm, could predict aggressive tendencies as scored on a standardized rapid aggression assessment measure.

Participants

We recruited 10 adolescent high school and middle school students ranging in age from 13 to 18 years (Mean = 15.8, SD = 1.53) from a local school district and a pediatric hospital. Referrals were made to the research team if a student exhibited any behavioral changes, physical aggression and/or threats towards others, or property damage at school. Referrals for subtle changes in behavior, such as being more withdrawn at school, were also included. We provided our findings and recommendations to the guardians. If any help was needed, such as counseling or medication management, the research team was able to provide families with additional resources or referrals.

Eye Tracking

Eye gaze patterns were recorded with the Tobii TX 300 eye tracker (Tobii, Stockholm, Sweden), which records eye movements from pupil locations and corneal reflections at a sampling rate of 300 Hz.

Social Information Processing Eye Tracking Assessment

Tobii Pro Studio eye tracking software was used to present cartoon illustrations from a previous experiment assessing social information processing in aggressive youths [14]. The cartoon illustrations were based upon previous peer provocation vignettes [15,16,17], and had previously been piloted with focus groups of children and pediatric clinical staff. The paradigm consisted of black and white cartoons of hypothetical real-life scenarios while eye movements were recorded. Vignettes described interactions between an active character (character A) and a passive character (character B), where character A initiated a behavior (hostile, non-hostile (accidental) or ambiguous) that affected character B and resulted in a negative outcome, as well as character A’s emotional response (mean, neutral, sad/apologetic) to that outcome. Each vignette had five gender-specific (boy or girl) versions. See Fig. 2 for three of the five vignettes. The five vignette combination possibilities (behavior type-emotional response type) presented were as follows: hostile behavior-mean emotion, hostile behavior-neutral emotion, non-hostile behavior-sad emotion, non-hostile behavior-neutral emotion, ambiguous behavior-neutral emotion.

Fig. 2 Examples of three of the five different versions of cartoon vignettes

The first cartoon was the same across all vignettes and set the context within a specific social setting for the two characters. In the second cartoon, character A (counterbalanced to either the left or right side) behaved in a hostile, ambiguous or non-hostile (accidental) manner towards character B, initiating the interaction between the two characters. The third cartoon presented a negative outcome that resulted from the behavior in the second cartoon, as well as character A’s emotional reaction to the outcome (e.g., character A had either a sad/apologetic facial expression after a non-hostile (accidental) behavior, or a mean facial expression after a hostile behavior). Importantly, the first and second cartoons in the sequence were each presented alone, while the third cartoon showing the outcome was presented simultaneously with the second cartoon to conclude the sequence. Figure 3 visually depicts the vignette sequence.

Fig. 3 Example of sequential frames during presentation of cartoon vignettes

Behavior

The Brief Rating of Aggression by Children and Adolescents (BRACHA) is a 14-point scaling system used to provide a brief measure of risk of aggression [13].

Procedure

The participants and guardians came to an onsite location at the pediatric hospital. After obtaining informed assent and consent, the participant relocated to another room for the interview portion of the study visit. The participant’s guardian remained in the room and was also asked demographic and BRACHA interview questions. The participant was then relocated to a final room where they sat in front of the eye tracker while the eye tracker was calibrated using a five-point calibration grid. The participant was then asked to watch the social information processing paradigm described previously, and asked to relate to the emotion of character B (indicated by an arrow) that was being presented in the paradigm. The eye tracker recorded in real time the location of the participant’s gaze as well as the duration of time each participant spent focusing their respective gazes on specific areas of interest embedded within the vignettes. After the eye tracking portion was completed, participants were reunited with guardians and debriefed on the experiment.

Encoding

Continuous eye-movement data were used to calculate two specific eye-tracking indices of encoding: first-pass fixation duration and second-pass fixation duration [18]. First-pass time was defined as the sum of all eye fixation durations on the behavior cue in the second cartoon before the third cartoon, showing the outcome, was presented directly below it. Second-pass time was defined as the sum of all eye fixation durations on the behavior cue in the second cartoon after viewing the negative outcome and the emotion cue of character A (sad, mean or neutral) in the third cartoon. Second-pass time reflects verification or reconsideration of the behavioral intent cues presented earlier in the second cartoon. First-pass fixation duration is related to lower-level automatic (bottom-up) encoding processes, while second-pass fixation duration involves higher-order (top-down) processes, in other words, higher-order global integration [19, 20].

Eye gaze that remained in the same area for at least 100 ms was classified as a fixation, and the time spent there as its fixation duration. Predefined areas of interest (AOIs) embedded within the vignette cartoons were selected for analysis of fixation durations. The behavior AOI was a square area (200 × 200 pixels) that encompassed character A’s behavior; it was defined both when the second cartoon was presented by itself and when the second cartoon (behavioral cue) and third cartoon (outcome and emotional cue) were presented simultaneously. In addition, an emotion AOI (100 × 100 pixels) encompassed character A’s emotional expression in the third cartoon. Refer to Fig. 4 for a visual representation.

Fig. 4 Example of fixation duration AOIs during sequential vignette presentation. a First-pass behavior AOI. b Second-pass behavior AOI. c Second-pass emotion AOI
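The fixation and AOI definitions above can be illustrated with a minimal sketch. It is not the Tobii Pro Studio fixation filter used in the study: it assumes gaze samples arrive as (x, y) pixel coordinates at 300 Hz, and it simplifies "same area" to mean "inside the AOI square", so a run of consecutive in-AOI samples counts as a fixation once it lasts at least 100 ms. The names `AOI` and `fixation_time_ms` are hypothetical.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 300   # sampling rate stated for the eye tracker
MIN_FIXATION_MS = 100  # fixation threshold stated in the text
# Number of consecutive samples that spans 100 ms at 300 Hz (30 samples).
MIN_SAMPLES = MIN_FIXATION_MS * SAMPLE_RATE_HZ // 1000


@dataclass(frozen=True)
class AOI:
    """Square area of interest; left/top/size are in pixels (assumed layout)."""
    left: float
    top: float
    size: float

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x < self.left + self.size
                and self.top <= y < self.top + self.size)


def fixation_time_ms(samples, aoi: AOI) -> float:
    """Sum the durations of in-AOI runs that last at least 100 ms.

    Simplifying assumption: a run of consecutive samples inside the AOI
    is treated as one fixation.
    """
    total_samples = 0
    run = 0
    for x, y in samples:
        if aoi.contains(x, y):
            run += 1
        else:
            if run >= MIN_SAMPLES:
                total_samples += run
            run = 0
    if run >= MIN_SAMPLES:  # close a run that reaches the end of the frame
        total_samples += run
    return total_samples * 1000 / SAMPLE_RATE_HZ
```

Under this sketch, first-pass time would be `fixation_time_ms` applied to the samples recorded while the second cartoon was shown alone, and second-pass time would be the same function applied to the behavior AOI for the samples recorded while the second and third cartoons were shown together.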

Statistical Methods

We assessed the relationships between aggression scores and eye gaze fixation time corresponding to each AOI in our adapted ambiguous social vignette paradigm. A simple linear regression was calculated to predict BRACHA score from fixation duration for each AOI type (hostile, neutral or non-hostile) and pass (first-pass or second-pass). A two-sided alpha of 0.05 was used to determine statistical significance. Because this study was an exploratory pilot analysis and measurements derived from multiple AOIs were not completely independent of one another, no formal multiple-testing correction was applied to the primary significance tests; false discovery rate (FDR)-adjusted p-values were additionally calculated to partially account for multiple comparisons.
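As an illustration of the analysis described above, the following is a minimal sketch of an ordinary least-squares simple linear regression that returns the slope, intercept, R² and F statistic. It is not the authors' analysis code; the function name and the idea of passing fixation durations as `x` and BRACHA scores as `y` are assumptions made for the example. The p-value is omitted because it requires the F distribution, which the Python standard library does not provide.

```python
def simple_linear_regression(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns slope, intercept, R^2, the F statistic, and its degrees of
    freedom (1, n - 2). Assumes x is not constant and the fit is not
    perfect (otherwise a division by zero occurs).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)  # total sum of squares

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_reg = slope * sxy      # variation explained by the regression
    ss_res = syy - ss_reg     # residual variation
    df_res = n - 2

    return {
        "slope": slope,
        "intercept": intercept,
        "r2": ss_reg / syy,
        "f": ss_reg / (ss_res / df_res),
        "df": (1, df_res),
    }
```

With 10 participants, the degrees of freedom returned by this sketch are (1, 8), which is the form in which F statistics are reported for such a model.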

Results

A simple linear regression was calculated to test whether eye-tracking indices of attention during an ambiguous social vignette predicted scores on a brief assessment of aggression in school-aged children. Fixation duration for two specific vignette-type AOIs was found to significantly predict higher aggression scores on the BRACHA: the hostile behavior-mean emotion second-pass AOI and the hostile behavior-neutral emotion second-pass AOI.

For the hostile behavior-mean emotion second-pass AOI, a significant regression was found (F(1,8) = 10.86, p = .010), with an R² of .575. For the hostile behavior-neutral emotion second-pass AOI, a significant regression was also found (F(1,8) = 6.98, p = .029), with an R² of .465. Children with higher BRACHA scores tended to look back longer (second-pass AOI fixation duration) at the hostile cues presented in the second cartoon, for either of the resultant emotional cues presented in the third cartoon (mean or neutral). Further, the more congruent hostile behavior-mean emotion pairing predicted higher BRACHA scores more strongly than the less congruent hostile behavior-neutral emotion pairing; that is, for the hostile behavior vignette type, children looked back longest (second-pass AOI fixation duration) after seeing the mean emotion displayed by character A.
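As a quick consistency check on the reported statistics, note that in a simple linear regression with one predictor, R² and F are linked through the residual degrees of freedom. The derivation below uses standard sum-of-squares notation (the symbols are introduced here for illustration and do not appear in the original report):

```latex
R^2 = \frac{SS_{\text{reg}}}{SS_{\text{reg}} + SS_{\text{res}}},
\qquad
F = \frac{SS_{\text{reg}} / 1}{SS_{\text{res}} / df_{\text{res}}}
\;\Longrightarrow\;
R^2 = \frac{F}{F + df_{\text{res}}}

\frac{10.86}{10.86 + 8} \approx 0.576,
\qquad
\frac{6.98}{6.98 + 8} \approx 0.466
```

Both values match the reported R² of .575 and .465 to two decimal places, with the small difference in the third decimal attributable to rounding of the reported F statistics.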

To partially account for multiple comparisons, we also calculated p-values adjusted to control the false discovery rate (FDR) at 0.05: for the hostile behavior-mean emotion second-pass AOI, the FDR-adjusted value was p = .043, while for the hostile behavior-neutral emotion second-pass AOI, the FDR-adjusted value was p = .059. Our results are consistent with the SIP model proposed by Crick and Dodge and the hypothesis that more aggressive children pay measurably more attention to hostile versus non-hostile environmental cues.
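The report does not name the specific FDR procedure. A common choice is the Benjamini-Hochberg step-up method, and the sketch below shows how adjusted p-values are computed under that assumption. The function name and the input p-values are hypothetical and are not the study's data.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest p), and the
    results are made monotone by taking a running minimum from the
    largest p-value downward.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values from four regressions (illustrative only).
adjusted = bh_adjust([0.01, 0.04, 0.03, 0.20])
significant = [p <= 0.05 for p in adjusted]
```

A test is declared significant at an FDR of 0.05 when its adjusted p-value is at most 0.05, which is how an unadjusted p-value below .05 can fall just above the threshold after adjustment.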

Discussion

The results of our exploratory study differ from those of Horsley et al., whose cartoon illustrations we employed in our modified vignette paradigm [14]. Horsley’s results indicate that encoding of hostile cues in aggressive children might differ from the traditional hypothesis of the SIP model proposed by Crick and Dodge [6], which proposes that more aggressive children pay appreciably more attention to hostile versus non-hostile environmental cues. Instead, Horsley’s results indicate that more aggressive children from non-clinical samples look longer at non-hostile cues. The authors attributed their results to an alternative hypothesis termed the ‘schema inconsistency’ hypothesis, which proposes that individuals look longer at schema-inconsistent information, which for more aggressive individuals would be non-hostile cues. Similarly, a recent study by Lin et al. reports that mentally healthy adults with higher aggressive tendencies tended to avoid eye contact with potential violence perpetrators [21], providing support for the ‘schema inconsistency’ hypothesis. Even in studies whose results supported the traditional SIP model, effect size differed based upon the manner in which stimuli were presented, with video and picture stimuli of social interactions having smaller effect sizes than audio stimuli of interactions, and with actual staging of social interactions associated with the largest effects [12]. Such differing results, and the varying effect sizes even among studies with congruent results, indicate that variables such as age, mental health status, and stimulus type may all confound the relationship between aggression and visual attention to hostile cues.

Furthermore, our results indicate the need for a more comprehensive understanding of the role that deviations during the initial step of encoding may play in increasing the probability of aggressive behavior. Downstream effects of such deviations at the earliest stage may be compounded by further deviations at subsequent steps, and may moderate hostile intent attributions and consequently make it easier to interpret social information as hostile.

One primary limitation of our study was the small sample size. In addition, a major limitation is that our experimental design did not directly measure temporal information with regard to the sequential nature of eye gaze fixation durations. Specifically, our eye-tracking setup could not explicitly measure how an individual might shift her/his gaze between different images as they were presented continuously. We attempted to account for this limitation through the design of the cartoon vignette paradigm and the manner in which it presented each image. We formatted the vignette so that the sequential nature of first-pass and second-pass fixation durations could be inferred, with first-pass defined by the first presentation of the second cartoon (behavior cue) alone, and second-pass defined by the ensuing presentation of the second cartoon (behavior cue) simultaneously with the third cartoon (consequential outcome and resultant emotional cue) (Fig. 3). However, we could not directly confirm that the subject first paid attention to the behavioral cue in the second cartoon when it was presented alone in the second frame, then focused on the outcome and emotional cue in the third cartoon in the third frame, and then returned their gaze to the second cartoon presented in the third frame.

Despite the limitation of a small sample size, our findings are informative. Social information processing theories suggest that aggressive behavior is in part caused by a hypersensitivity to hostile cues in others’ behavior, leading to a bias to interpret others’ behavior as having hostile intent, characterized as hostile intent attribution. The results of our study support this theory, providing evidence that aggression is related to irregular encoding of visual information, with more aggressive adolescents attending more to the vignettes with hostile cues and congruent emotional reactions. The eye-tracking methodologies and the nature of the paradigm our study employed also provided a more nuanced delineation of SIP’s initial step of encoding from its subsequent steps, clarifying how the automatic bottom-up cognitive processes of encoding may moderate the higher-level successive stages in SIP theory and increase the potential for aggressive behavior. Future studies with larger samples are needed to confirm our results. Once these findings are validated, the collective evidence may suggest that eye gaze patterns could facilitate violence risk assessment.