Introduction

Over the last four decades, empirical research on school bullying and its relation to youth outcomes has grown substantially (Hymel & Swearer, 2015). Reviews consistently indicate that bullying involvement is a substantial predictor of mental health problems and poor school outcomes for students around the world (Arseneault, Bowes, & Shakoor, 2010; Jimerson, Nickerson, Mayer, & Furlong, 2012; Jimerson, Swearer, & Espelage, 2010). Definitions of bullying can vary somewhat, but most scholars agree that it is a multidimensional behavioral construct characterized by three key features: (1) harmfulness, (2) frequency, and (3) peer power differential (Olweus, 2010). For a behavior to be considered true “bullying,” it must (a) injure another person in some way (e.g., physically, psychologically, socially), (b) be exhibited repeatedly over time, and (c) occur within a relational context in which the perpetrator has a distinct advantage (e.g., physically, psychologically, socially) over the victim. To date, empirical work regarding definitional issues has been overshadowed by investigations aiming to classify bullying involvement, determine prevalence rates, and then explore the relation of these classifications with valued youth outcomes. The overarching purpose of the present study was to extend this line of classification research by exploring the functionality of four competing schemas for categorizing youths’ bullying involvement.

Classifying Bullying Involvement

Research regarding students’ bullying involvement typically proceeds by, first, using youths’ self-reports to classify them into one of four mutually exclusive categories of involvement and then, second, investigating the prevalence rates and differential psychosocial outcomes associated with these classifications. These four classifications are commonly referred to as uninvolved, victim, bully, and bully-victim (e.g., Veenstra, Lindenberg, Oldehinkel, De Winter, & Ormel, 2005) and make up the only major schema, to date, for categorizing bullying involvement. For the purposes of the present study, we slightly renamed these four classifications as follows: uninvolved, victim only, perpetrator only, and perpetrating victim. By adding “only” to two categories and replacing the term “bully” with “perpetrator,” we intended to clarify the distinguishing features of the classifications and avoid common confusions in interpretation. Previous studies not utilizing this four-group classification schema have typically assessed only one domain of bullying behavior (victimization or perpetration) and therefore used a partial, more limited schema (i.e., victims vs. non-victims, perpetrators vs. non-perpetrators; see Arseneault et al., 2010). Although all items used to measure youths’ bullying involvement are arranged along response scales that are characterized by relative frequency, it is noteworthy that categorical approaches to data analysis are the norm in this line of research. While using a continuous analytic approach seems warranted given the nature of the data, most bullying involvement measures are single Likert-type items, not more robust Likert-type scales composed of multiple internally consistent items, and have therefore been treated as ordinal data as opposed to interval or ratio data when used for classification purposes (for more on this methodological issue, see Carifio & Perla, 2008; Gadermann, Guhn, & Zumbo, 2012; Norman, 2010).
Adopting this methodological rationale, the present study extends this categorical analytic tradition by exploring the functionality of four competing classification schemas, all derived from single Likert-type items for measuring youths’ bullying involvement at school.

Research regarding the characteristics of youth involved in bullying has focused primarily on the association between particular classifications of involvement and the presence or absence of mental health symptoms and other undesirable life outcomes (e.g., Arseneault et al., 2010). For instance, compared to uninvolved students, those classified as perpetrator only have been shown to be more antagonistic and more aggressive and to exhibit a higher degree of power dominance over their peers, indicating elevated levels of externalizing behaviors and lower levels of prosocial behaviors (Cook, Williams, Guerra, Kim, & Sadek, 2010; Veenstra et al., 2005). On the other hand, youth identified as victim only have been found to exhibit higher levels of insecurity, anxiety, and depression and are less likely to attend school and more likely to perform poorly at school, indicating higher levels of internalizing behaviors (Cook et al., 2010; Haynie et al., 2001; Kochenderfer & Ladd, 1996; Swearer, Espelage, Vaillancourt, & Hymel, 2010). Perpetrator-victim students appear to be the most at-risk group, as they exhibit the least prosocial behavior, are the most aggressive, are likely to experience the highest rates of internalizing symptoms, have the poorest academic performance, and are considered to be the most socially “dislikable” (Cook et al., 2010; Farmer et al., 2010; Veenstra et al., 2005). Taken together, the upshot of these studies is that students’ involvement in bullying at any level is associated with poorer mental health and school functioning outcomes than being uninvolved in bullying and that greater levels of involvement (i.e., perpetrating victim) are associated with the poorest overall outcomes.
The present study further explores this outcome trend by proposing classification schemas that account for greater frequencies of bullying involvement (as both a victim and perpetrator) and then investigating the concurrent validity of these enhanced schemas compared to the original four-group schema.

Measuring Bullying Involvement

Although research suggests meaningful differences between the mental health and school functioning of students involved in bullying at different levels, there is currently no “gold standard” measure or procedure for assessing bullying involvement, and several critical issues have been raised regarding the available methods. Many scholars, including Felix, Sharkey, Green, Furlong, and Tanigawa (2011), have raised this issue in the past, concluding that it is wise for researchers interested in assessing bullying involvement to familiarize themselves with the strengths and weaknesses of available methods. For example, Cornell and Cole (2012) note that whether and how a definition of bullying is included in a measure is likely to influence youths’ response rates, offering support from Vaillancourt et al. (2010), who found fewer endorsements of bullying involvement when definitions of bullying were explicitly stated prior to the items assessing involvement. Moreover, Sharkey, Dowdy, Twyford, and Furlong (2012) suggest that the timeframe provided for endorsing bullying involvement (e.g., within the past month, past few months, or past year) is likely to influence the accuracy of prevalence rates and that the anonymity (or lack thereof) of the assessment processes might affect youths’ endorsements of bullying involvement. All of these issues are essentially concerned with the self-report method of assessing bullying, which is, to date, the primary method for measuring involvement levels, deriving prevalence rates, and investigating concurrent outcomes. These critiques further suggest that different measures that purport to have the same function are likely to yield different data regarding youths’ bullying involvement (see Swearer, Siebecker, Johnson-Frerichs, & Wang, 2010).

The most widely used self-report instruments for gauging youths’ bullying involvement in US schools are the Youth Risk Behavior Surveillance Survey (YRBSS; Centers for Disease Control and Prevention, 2014), the National Crime Victimization Survey (NCVS; Bureau of Justice Statistics, 2014), the Olweus Bully/Victim Questionnaire (OBVQ; Olweus, 1995), and the Health Behavior in School-Aged Children Survey (HBSC; World Health Organization, 2014). The YRBSS includes two items targeting victimization, asking youth to report how often they have been “bullied on school property” as well as cyber-bullied during the past year. The NCVS, on the other hand, includes only a single-item targeting general victimization at school, which is often interpreted as representing bullying-related victimization. The OBVQ is a much more robust measure than both the YRBSS and NCVS, as it consists of several items targeting victimization and a few items targeting perpetration behavior, specifying various forms of bullying-related behavior (i.e., verbal, cyber, physical, relational, racial, and overall). Yet the HBSC’s measure of bullying, which builds upon and extends the OBVQ, appears to be the most comprehensive measure to date, as it includes equal numbers of items targeting both victimization and perpetration across ten forms of bullying behavior: verbal, social exclusion, physical, social sabotage, racial, religious, sexual, computer, cell phone, and overall. Given its balanced focus on both victimization and perpetration as well as its administrative scope as a survey sponsored by the World Health Organization (2014), we suggest that the HBSC is currently the most useful measure available for assessing youths’ bullying involvement in schools.

As mentioned above, some attention has been paid to critical issues surrounding item wording, prompts, and response scales of these common self-report measures, but little attention has been paid to the classification schemas used to operationalize and interpret the data resulting from such measures. So far, the majority of available research classifying students into different levels of bullying involvement has used the same four-group classification schema outlined earlier (i.e., uninvolved, victim only, perpetrator only, and perpetrator-victim; e.g., Greif Green, Felix, Sharkey, Furlong, & Kras, 2013; Nansel, Overpeck, Haynie, Ruan, & Scheidt, 2003), which places youth into mutually exclusive groups based on their categorical endorsement (or lack thereof) of bullying perpetration and victimization. Moreover, our review of the contemporary scholarship failed to yield any conceptual or empirical literature regarding competing classification schemas. Given that the original bullying involvement classification schema (which we will refer to hereafter as the standard four-group model) dichotomizes the endorsement of bullying behaviors (i.e., completely uninvolved vs. involved at any level) and ignores the relative frequency that characterizes such behavior (i.e., lower vs. higher rates of involvement), we suggest that a logical next step for progressing research in this area is to evaluate competing classification schemas that vary in how finely they differentiate the frequency of youths’ bullying involvement as victims and perpetrators.
Although the present study uses data obtained via the HBSC, it is plausible that such competing bullying involvement schemas could be investigated with any survey that yields measures of both victimization and perpetration behaviors (e.g., OBVQ; Olweus, 1995), but not with measures that only target victimization or perpetration behavior (e.g., California Bullying Victimization Survey; Felix, Sharkey, Green, Furlong, & Tanigawa, 2011).

Purpose of the Present Study

Given the context sketched above, the purpose of the present study was to initiate a line of research exploring the functionality of competing bullying involvement classification schemas, which differ according to where the conceptual boundaries are drawn regarding students’ relative rates of endorsing victimization and perpetration behaviors at school. Although classifications in the present study were defined by relative frequency of bullying involvement, suggesting interval or ratio data, the measures of victimization and perpetration were conceptualized as ordinal data because they were derived from single Likert-type items with limited response ranges, not from internally consistent scales of such items (see Carifio & Perla, 2008, for more on this issue). More specifically, the aim of this study was to investigate the differential prevalence rates and concurrent validity of the standard four-group model for classifying bullying involvement in comparison with three competing models—the alternative four-group model, the nine-group model, and the sixteen-group model (see the “Method” section, below, for operationalizations of these classification variables)—in relation to indicators of student mental health and school functioning. Considering that these competing classification schemas differed primarily according to the rate or frequency of bullying victimization and perpetration behaviors endorsed by students, we assumed that the prevalence rates yielded by all schemas would differ substantially, with the nine-group and sixteen-group schemas yielding fewer students at higher involvement levels than the four-group schemas. 
Furthermore, given that, theoretically, increased levels of bullying involvement should be associated with increased levels of student risk, we further hypothesized that compared to the standard and alternative four-group models, both the nine-group and the sixteen-group classification schemas would demonstrate incremental validity across two broad indicators of mental health (i.e., psychological distress and well-being) and two general indicators of school functioning (i.e., academic performance and feelings toward school).

Method

Participants

This study was conducted using the 2009–2010 sample of the Health Behavior in School-Aged Children Survey (HBSC), which is publicly available via the Inter-university Consortium for Political and Social Research (www.icpsr.umich.edu). This nationally representative sample of US students consists of 12,642 youth in Grades 5–10. Randomized samplings of school districts were used to collect data from balanced cross sections of the country. The present sample consisted of roughly equal numbers of males (51.4 %) and females (48.6 %), while the distribution of participants by grade was also approximately equivalent (5 = 13.6 %, 6 = 16.2 %, 7 = 19.2 %, 8 = 19.6 %, 9 = 16.4 %, and 10 = 15.1 %). Youths’ self-reported racial/ethnic identities were predominately White (48.8 %), followed by Hispanic (19.8 %) and Black/African American (17.9 %), with smaller proportions of Multiracial (6.8 %), Asian (3.9 %), American Indian/Alaska Native (1.8 %), and Native Hawaiian/Pacific Islander (0.9 %). Further details regarding the sampling and surveying procedures as well as more details regarding participant demographics for this particular sample are available via the 2009–2010 HBSC codebook (Iannotti, 2013).

Measures

HBSC Self-Report

The youth self-report form of the HBSC is used as the primary instrument for a cross-national research study sponsored by the World Health Organization (2014). This study originated in the 1985–1986 school year and is based on independent surveys conducted every four years in over 40 countries, including the USA. The main goals for the HBSC are to facilitate the development of longitudinal data that represents youth’s health and to deliver these findings to the research community to provide further examination of risk behaviors and attitudes related to these behaviors. Further information regarding the HBSC self-report is available via www.hbsc.org and Iannotti (2013). As mentioned above, the present study used the dataset from the 2009–2010 version of the HBSC, which consists of various self-report items regarding youths’ mental health, school functioning, nutrition and physical health-related behaviors, and various other aspects of psychosocial functioning. For the purposes of this study, however, only a limited number of items related to youths’ bullying involvement, mental health, and school functioning were analyzed.

Bullying Involvement

The HBSC survey includes two self-report items assessing general bullying involvement at school: one item targeting victimization and the other targeting perpetration. Prior to answering these items, youth are presented with a response prompt that consists of a definition of bullying derived from the Olweus Bully/Victim Questionnaire (Olweus, 1995):

We say a student is BEING BULLIED when another student, or a group of students, say or do nasty or unpleasant things to him or her. It is also bullying when a student is teased repeatedly in a way he or she does not like or when they are deliberately left out of things. But it is NOT BULLYING when students of about the same strength or power argue or fight. It is also not bullying when a student is teased in a friendly and playful way.

Following this prompt, the items assessing general victimization and perpetration, respectively, read: “How often have you been bullied at school in the past couple of months?” and “How often have you taken part in bullying another student(s) at school in the past couple of months?” Both of these global items were arranged along a five-point, relative frequency-based response scale (i.e., 1 = I haven’t been bullied at school the past couple of months, 2 = It has only happened once or twice, 3 = 2 or 3 times a month, 4 = About once a week, 5 = Several times a week). For the purposes of the present study, and based on preliminary analyses of differentiations among the response options indicated in the response distributions, the five original response options were collapsed to four and were renamed using the following shorthand labels: 1 = never, 2 = rarely, 3/4 = sometimes, and 5 = often. All analyses were conducted and interpreted using this revised response scale. Preliminary analyses also indicated that participants’ responses to both bullying involvement items were substantially non-normally distributed (skewness and kurtosis >2). However, given that these responses were conceptualized categorically and were used to create the bullying classification schemas (described below), instead of functioning as continuous outcome variables, such non-normality was deemed to be inconsequential.
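The collapsing of the five response options into four can be illustrated with a minimal sketch. The names COLLAPSED_LABELS and collapse_response are hypothetical and do not come from the HBSC codebook; the treatment of a missing answer as None is likewise an assumption made only for illustration.

```python
# Hypothetical recoding table: raw HBSC response code -> shorthand label.
# The 3/4 merge follows the revised scale described in the text.
COLLAPSED_LABELS = {
    1: "never",      # no involvement in the past couple of months
    2: "rarely",     # once or twice
    3: "sometimes",  # 2 or 3 times a month
    4: "sometimes",  # about once a week
    5: "often",      # several times a week
}


def collapse_response(raw):
    """Map a raw 1-5 response to its collapsed label.

    None is passed through to mark a missing answer (an assumption;
    the source does not describe missing-data handling here).
    """
    if raw is None:
        return None
    if raw not in COLLAPSED_LABELS:
        raise ValueError(f"unexpected response code: {raw!r}")
    return COLLAPSED_LABELS[raw]


responses = [1, 2, 3, 4, 5, None]
collapsed = [collapse_response(r) for r in responses]
```

The same table would apply to both the victimization and the perpetration item, since both use the same five-point scale.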

Bullying Involvement Classification Schemas

The two bullying involvement items described above were used to classify youth under four competing schemas of bullying involvement, which varied primarily as a function of the relative frequency of behavior endorsed in each response. The classification variable for the standard four-group model was created by categorizing youth endorsing responses of “1” (indicating no involvement) for both the victimization and perpetration items as uninvolved, those reporting “2” or higher (indicating some level of involvement) on only the victimization item as victim only, those reporting “2” or higher on only the perpetration item as perpetrator only, and those reporting “2” or higher on both items as perpetrator-victim. The alternative four-group model used similar decision rules, classifying youth endorsing responses of “1” or “2” into the never/rarely involved categories and those endorsing responses of “3/4” or “5” into the sometimes/often involved categories for both victimization and perpetration. The nine-group model also classified students endorsing responses of “1” or “2” into never/rarely involved categories, but it categorized those endorsing responses of “3/4” as sometimes involved and those endorsing “5” as often involved. Finally, the sixteen-group model separated out the never involved (endorsing “1”) and rarely involved (endorsing “2”) categories, while maintaining the same decision rules as the nine-group model for the sometimes involved (endorsing “3/4”) and often involved (endorsing “5”) categories. A visual representation of the decision rules and resulting categories for each classification schema is presented in Tables 1, 2, 3, and 4, respectively.

Table 1 Standard four-group classification schema of bullying involvement
Table 2 Alternative four-group classification schema of bullying involvement
Table 3 Nine-group classification schema of bullying involvement
Table 4 Sixteen-group classification schema of bullying involvement
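The decision rules for the four schemas can be sketched as follows. Each schema is treated as a mapping from a raw response (1 to 5) to a frequency band, and a student's category is the pair of bands for victimization and perpetration. The names BANDS, classify, n_categories, and standard_four_label are hypothetical, and the band label "involved" for the standard model is a shorthand introduced here, not a term from the tables.

```python
# Hypothetical band tables: raw response code -> frequency band, per schema.
BANDS = {
    "standard_four": {1: "never", 2: "involved", 3: "involved",
                      4: "involved", 5: "involved"},
    "alternative_four": {1: "never/rarely", 2: "never/rarely",
                         3: "sometimes/often", 4: "sometimes/often",
                         5: "sometimes/often"},
    "nine": {1: "never/rarely", 2: "never/rarely",
             3: "sometimes", 4: "sometimes", 5: "often"},
    "sixteen": {1: "never", 2: "rarely",
                3: "sometimes", 4: "sometimes", 5: "often"},
}


def classify(schema, victimization, perpetration):
    """Return the (victim band, perpetrator band) cell for one student."""
    bands = BANDS[schema]
    return bands[victimization], bands[perpetration]


def n_categories(schema):
    """Number of cells: distinct bands squared (victim x perpetrator)."""
    return len(set(BANDS[schema].values())) ** 2


def standard_four_label(victimization, perpetration):
    """Name the standard four-group category using the renamed labels."""
    is_victim = victimization >= 2
    is_perpetrator = perpetration >= 2
    if is_victim and is_perpetrator:
        return "perpetrating victim"
    if is_victim:
        return "victim only"
    if is_perpetrator:
        return "perpetrator only"
    return "uninvolved"


sizes = {schema: n_categories(schema) for schema in BANDS}
```

Under these assumptions, a student answering "5" on victimization and "2" on perpetration falls in the (often, never/rarely) cell of the nine-group model and the (often, rarely) cell of the sixteen-group model.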

Mental Health

Youths’ mental health was assessed using the Psychological Wellbeing and Distress Scale (PWDS), which is a 10-item measure embedded within the self-report HBSC survey (Renshaw & Bolognino, 2015). The PWDS has five items measuring psychological well-being and five items measuring psychological distress. The psychological well-being items use the stem, “Thinking about last week…,” followed by questions regarding how often one had a desirable or positive experience (e.g., “… have you felt full of energy?”). Participants endorse these items using a five-point, relative frequency-based response scale (1 = never, 2 = seldom, 3 = quite often, 4 = very often, 5 = always). Two items measuring psychological distress use a similar stem and response scaling, while three items use the stem, “In the past 6 months how often have you had the following…,” followed by statements regarding how often one experienced a particular mental health symptom (e.g., “… feeling low”). Response options for this stem are also arranged along a five-point, relative frequency-based response scale (1 = about every day, 2 = more than once a week, 3 = about every week, 4 = about every month, 5 = rarely or never). Exploratory and confirmatory factor analyses have indicated that the two subscales of the PWDS represent two distinct mental health constructs that are strongly negatively correlated (ϕ = –.59; Renshaw & Bolognino, 2015). For the present sample, preliminary analyses indicated that both scales were relatively normally distributed (skewness and kurtosis < |1|) and had adequate internal reliability (α > .70).
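As an illustration of the internal reliability criterion mentioned above, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k - 1) * (1 - sum of item variances / variance of total scores). The source does not describe how alpha was computed, so the function name cronbach_alpha and the example responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    items: one list of scores per item, aligned by respondent.
    Uses sample variances throughout.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses from six students to five items (1-5 scale).
wellbeing_items = [
    [4, 3, 5, 2, 4, 3],
    [4, 2, 5, 2, 3, 3],
    [5, 3, 4, 1, 4, 2],
    [3, 3, 5, 2, 4, 3],
    [4, 2, 4, 2, 5, 3],
]
alpha = cronbach_alpha(wellbeing_items)
```

Any items scored in the opposite direction from the rest of a subscale would need to be aligned before computing alpha; the sketch assumes this has already been done.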

School Functioning

Youths’ school functioning was assessed using two single items embedded within the self-report HBSC survey: one measuring academic performance and the other feelings (or attitudes) toward school. The item assessing academic performance read as follows: “In your opinion, what does your class teacher(s) think about your school performance compared to your classmates?” Four response options followed this question, all of which were anchored to generic qualitative descriptors of academic performance (1 = very good, 2 = good, 3 = average, 4 = below average). The item assessing feelings toward school read as follows: “How do you feel about school at present?” This item was also followed by a four-point response scale, which was characterized by generic attitudinal statements about school (1 = I like it a lot, 2 = I like it a bit, 3 = I don’t like it very much, 4 = I don’t like it at all). For the present sample, preliminary analyses indicated that both school functioning variables were relatively normally distributed (skewness and kurtosis < |1|).

Data Analyses

After creating the classification variables for the four competing schemas, prevalence rates for bullying involvement within each category were calculated using frequency analyses, which were then transformed into percentages for ease of interpretation. Trend analysis was then used to explore differences in the prevalence rates among the subgroups within each classification schema as well as across schemas. Next, the comparative concurrent validity of the four competing classification schemas was investigated by analyzing the relation between each of the classification variables and students’ risk status across the two mental health and two school functioning outcomes. Prior to conducting this analysis, dichotomous risk-status variables were created for each concurrent outcome by transforming all variable scores into standardized values (z scores) and then using 1 SD cutoff points, which are commonly used in standardized testing and screening processes, for sorting students into “at-risk” and “typical” groups. Specifically, for the psychological distress variable, participants with z scores >1 SD were designated as “at-risk,” while those with standardized scores ≤1 SD were deemed “typical.” Conversely, for the psychological well-being, academic performance, and feelings toward school variables, participants with z scores <−1 SD were designated as “at-risk,” while those with scores ≥−1 SD were deemed “typical.” These dichotomous risk-status variables were then used as the criterion variables in a series of binary logistic regressions wherein the classification schema variables functioned as the predictor variables. To interpret the logistic regression results, both statistical significance levels (p values) and effect sizes (odds ratios) were considered. Visual analysis was then used to explore the patterns of resulting odds ratios among the subgroups within each classification schema as well as across schemas.
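The risk-status step and the odds ratio effect size can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the lower cutoff is taken to be 1 SD below the mean, and the sample SD is used for standardization. The odds ratio is computed directly from counts; for a single categorical predictor compared against a referent group, this cross-product ratio equals the exponentiated coefficient a binary logistic regression would return, although it provides no p value.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize scores using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def at_risk_flags(values, high_is_risk):
    """Flag scores beyond the 1 SD cutoff in the risky direction.

    high_is_risk=True  -> at-risk when z > 1 (e.g., distress)
    high_is_risk=False -> at-risk when z < -1 (assumed lower cutoff)
    """
    zs = z_scores(values)
    if high_is_risk:
        return [z > 1 for z in zs]
    return [z < -1 for z in zs]


def odds_ratio(group_flags, referent_flags):
    """Odds of at-risk status in a group relative to the referent group."""
    a = sum(group_flags)            # at-risk, comparison group
    b = len(group_flags) - a        # typical, comparison group
    c = sum(referent_flags)         # at-risk, referent group
    d = len(referent_flags) - c     # typical, referent group
    if b == 0 or c == 0:
        raise ZeroDivisionError("odds ratio undefined with an empty cell")
    return (a * d) / (b * c)


# Hypothetical distress scores: only the extreme score exceeds +1 SD.
distress = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
flags = at_risk_flags(distress, high_is_risk=True)

# Hypothetical groups: 4 of 10 at-risk vs. 2 of 10 in the referent group.
involved = [True] * 4 + [False] * 6
uninvolved = [True] * 2 + [False] * 8
or_value = odds_ratio(involved, uninvolved)
```

In practice, each involvement category in a schema would be compared against the uninvolved referent group in this way, once per concurrent outcome.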

Results

Prevalence Rates

Prevalence rates for the four competing bullying involvement classification schemas are presented in Table 5. Results for the standard four-group model show that approximately 43.2 % of students report being involved in bullying of some kind and to some degree in the recent past. That is, they indicated a response >“1” (or greater than “never”) on either one or both of the victimization and perpetration items. Comparatively, the alternative four-group model yielded a dramatic decrease in the proportion of students endorsing substantive bullying involvement (16.4 %), assuming that there are no meaningful differences between being “never” and “rarely” involved across victimization and perpetration domains. The nine-group model, on the other hand, showed the same proportion of students reporting involvement with bullying as was found in the alternative four-group model (16.4 %), as the decision rules for this subcategory were held constant, and also indicated generally smaller proportions of students endorsing involvement at the “often” levels compared to the “sometimes” levels. Finally, the sixteen-group model further subdivided students’ endorsements across four response categories for both victimization and perpetration, yielding the same number of uninvolved students as the standard four-group model yet varying levels and patterns of involvement across the remaining categories. Although there are some exceptions, the general trend of findings observed in Table 5 indicates that higher prevalence rates of bullying involvement are observed for categories characterized by lower levels of frequency of endorsement.

Table 5 Prevalence rates for classification schemas of bullying involvement
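The conversion of category frequencies into prevalence percentages can be sketched briefly. The function name prevalence and the example category list are hypothetical and are not drawn from Table 5.

```python
from collections import Counter


def prevalence(categories):
    """Return the percentage of students falling in each category."""
    counts = Counter(categories)
    n = len(categories)
    return {category: 100 * count / n for category, count in counts.items()}


# Hypothetical classifications for ten students (standard four-group model).
sample = (["uninvolved"] * 6 + ["victim only"] * 2
          + ["perpetrator only"] + ["perpetrating victim"])
rates = prevalence(sample)
```

Because every student falls in exactly one category within a schema, the percentages for a schema sum to 100.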

Concurrent Validity

Odds ratios (OR) resulting from the logistic regressions for the standard four-group model are presented in Table 6 and show predominantly statistically significant yet small effect sizes (excepting the highest and lowest OR) across all levels of involvement (compared to the uninvolved referent group) and all four concurrent outcomes, ranging from 1.29 (a negligible effect) to 2.67 (a medium effect). Findings for the logistic regressions for the alternative four-group model are presented in Table 7 and indicate statistically significant and slightly stronger OR effect sizes overall, ranging from 1.42 (a negligible effect) to 2.95 (a medium effect). OR trends for the logistic regressions for the nine-group model are presented in Table 8 and again show generally statistically significant and larger effect sizes overall than for both the standard and alternative four-group models, ranging from 1.00 (no effect compared to the referent group) to 4.58 (a large effect). Finally, results for the logistic regressions for the sixteen-group model are presented in Table 9 and indicate a trend of statistically significant and highest-overall OR effect sizes across concurrent outcomes in comparison with the three other classification schemas, ranging from 1.12 (a negligible effect) to 5.39 (a large effect). Although there are some exceptions, the general trend observed among the patterns of resulting OR within and between the four competing schemas indicates that higher levels of bullying involvement are associated with greater odds of being “at-risk” of mental health problems and poorer school functioning.

Table 6 Standard four-group classification schema: concurrent validity with student risk indicators
Table 7 Alternative four-group classification schema: concurrent validity with student risk indicators
Table 8 Nine-group classification schema: concurrent validity with student risk indicators
Table 9 Sixteen-group classification schema: concurrent validity with student risk indicators

Discussion

Interpretation of Results

The purpose of the present study was to initiate a line of research exploring the functionality of competing classification schemas for categorizing students’ bullying involvement, which differ according to where the conceptual boundaries are drawn regarding endorsements of relative rates of victimization and perpetration behaviors at school. More specifically, the aim of this study was to investigate the differential prevalence rates and concurrent validity of the standard four-group model for classifying bullying involvement in comparison with three other models (an alternative four-group model, a nine-group model, and a sixteen-group model) in relation to two mental health indicators (i.e., psychological distress and well-being) and two school functioning outcomes (i.e., academic performance and feelings about school). We hypothesized that the prevalence rates yielded by all schemas would differ substantially, with the nine-group and sixteen-group schemas yielding fewer students at higher involvement levels than the four-group schemas. Furthermore, we hypothesized that compared to the standard and alternative four-group models, both the nine-group and the sixteen-group classification schemas would demonstrate incremental validity across all concurrent indicators of mental health and school functioning. Generally speaking, the results from the primary analyses supported our hypotheses, suggesting that there is value added in extending the current four-group bullying involvement classification schema to account for greater frequencies of victimization and perpetration behaviors.

As hypothesized, prevalence rates of bullying involvement were substantively attenuated when the standard four-group dichotomous schema was changed to the alternative four-group schema (see Table 5). The prevalence rate of the perpetrator only category, for example, dropped from 15.5 to 5.3 %. This trend of attenuated prevalence rates continued when the relative frequencies of endorsement were further separated into the nine-group (3.9 % for “sometimes” and 1.4 % for “often” perpetrator only) and sixteen-group schemas (2.6 % for “sometimes” and 1.1 % for “often” perpetrator only). Moreover, a broad trend was indicated when comparing the odds ratios of the classification schemas, showing that schemas containing more categories yielded relatively stronger effect sizes in comparison with the referent group than schemas with fewer categories. We attribute this trend to the differentiation of classifications based on their relative frequency of bullying involvement in the expanded schemas, suggesting that greater levels of bullying involvement are associated with greater likelihoods of concurrent risk across mental health and school functioning indicators. The crucial categorization element these findings highlight is that when the “sometimes” and “often” responses are separated into distinct categories (compared to when they are combined into a single category), much larger odds ratios are indicated. For instance, a comparison of the OR for the perpetrator-victim category across the four competing classification schemas shows steady increases in risk potential in comparison with the referent group as the schemas become progressively refined according to frequency of involvement, resulting in the largest observed effect sizes at the greatest levels of involvement (see Tables 6, 7, 8, and 9).

Taken together, these findings indicate that progressively higher levels of bullying involvement, at both the victimization and perpetration levels, predict significantly poorer concurrent mental health and school functioning. Although this study was conceptualized from the categorical analytic tradition, the results suggest that it may be useful to classify bullying involvement using continuous data and associated norm-referenced cutoff scores, which function to place individuals with higher and lower levels of victimization and perpetration into SD-derived categories common to psychological assessment (e.g., lower extreme, below average, low average, high average, above average, and upper extreme). Although we intended to explore the viability of this continuous analytic approach post hoc, preliminary findings indicated that classifications derived using a continuous approach were identical to those derived using the original categorical approach. Specifically, post hoc analyses indicated that each possible score for both single-item measures was observed to represent a different SD-derived category for both victimization and perpetration (i.e., 1 = low average, 2 = high average, 3/4 = above average, 5 = upper extreme), and thus classifications derived from using a continuous approach were redundant with those derived from the categorical approach, indicating that additional analyses would only duplicate the findings obtained previously. We suggest that this finding is primarily an artifact of using single items with limited response ranges as the sole measure for both victimization and perpetration behaviors. Future research is therefore warranted to investigate the viability of a continuous analytic approach using more robust measures that consist of multiple self-report items instead of single items, which would allow for a range of scores (as opposed to a single score) to represent each SD-derived category. For example, the nine-item victimization and perpetration scales inherent within the HBSC may be useful for this particular purpose. Thus, we have initiated efforts to validate the utility of these scales for this purpose as well as to explore their incremental validity compared to the two single-item measures used in the present study.
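As a rough illustration of how SD-derived categories might be assigned from continuous scores, the following Python sketch labels a score by its distance from the sample mean in SD units. The band labels come from the text above, but the cutoffs (whole-SD boundaries at -2, -1, 0, 1, and 2) are an assumption for illustration, as are the function and variable names. The study's actual cutoffs are not stated here.

```python
from statistics import mean, pstdev

# Assumed upper bounds (in SD units from the mean) for each band, lowest first.
# Scores at or above the last bound are labeled "upper extreme".
BANDS = [
    (-2.0, "lower extreme"),
    (-1.0, "below average"),
    (0.0, "low average"),
    (1.0, "high average"),
    (2.0, "above average"),
]


def sd_category(score, scores):
    """Label a score by its z-score relative to the sample in `scores`."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population SD of the reference sample
    if sigma == 0:
        raise ValueError("reference scores have no variance")
    z = (score - mu) / sigma
    for upper_bound, label in BANDS:
        if z < upper_bound:
            return label
    return "upper extreme"
```

With a single five-point item, only five distinct z-scores can occur, so each response option maps to exactly one band and the resulting classification cannot differ from the categorical one. A summed multiple-item scale would allow a range of scores to fall within each band.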

Implications for Practice

Findings from the present study have potential implications for school mental health practice. Given that the overarching aim of the school mental health practitioner is to identify and provide services to students who are at the greatest risk of adverse outcomes, these results indicate that simply refining the classification schema used for screening bullying involvement (as a function of the relative frequency of response endorsements) may help identify students who are experiencing more mental health problems and poorer school functioning than those identified by the standard four-group schema, which is currently the most common approach used in practice. Beyond identifying students with greater needs for mental health services, findings from the present study also have implications for the feasibility of screening for bullying involvement in schools and how the resulting data might be used to inform mental health services within multitiered systems of support (see Stoiber, 2014). To make these implications more concrete, we can extrapolate the present findings to a hypothetical 500-student middle school, which is a moderate-sized school population. If a school mental health practitioner working in a school this size conducted a universal screening for bullying involvement, the results from this study suggest that the alternative four-group model is a more practically useful schema than the standard four-group model, as it identifies fewer students as substantially involved in bullying and thus warranting follow-up support services, and the students it does identify have generally greater risk levels than those identified by the standard model (see Tables 6, 7). From a practical point of view, most schools simply have neither the staff nor the infrastructure to follow up with the nearly half of the school population (231 students) that the standard model would identify as involved in bullying in a 500-student school; however, following up with the 82 students identified by the alternative model is markedly more reasonable.
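The arithmetic behind this extrapolation can be sketched in a few lines of Python. The enrollment and the two head counts are taken from the text above; the percentages printed are simply back-calculated from those counts and are therefore approximate. The function names are hypothetical.

```python
def expected_count(prevalence, enrollment):
    """Expected number of students flagged, rounded to a whole student."""
    if not 0 <= prevalence <= 1:
        raise ValueError("prevalence must be a proportion between 0 and 1")
    return round(prevalence * enrollment)


def implied_prevalence(count, enrollment):
    """Proportion of the school implied by a flagged head count."""
    return count / enrollment


ENROLLMENT = 500  # hypothetical middle school from the text

# Head counts quoted in the text for the two four-group schemas.
flagged = {"standard four-group": 231, "alternative four-group": 82}

for schema, count in flagged.items():
    share = implied_prevalence(count, ENROLLMENT)
    print(f"{schema}: {count} of {ENROLLMENT} students ({share:.1%})")
```

The same `expected_count` helper could be applied to a school of a different size by changing the enrollment, on the assumption that the sample prevalence rates carry over to that school.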

Furthermore, when screening for bullying involvement and associated risk at the schoolwide level, the nine-group model identifies the same number of students as substantially involved in bullying as the alternative four-group model (see Table 5), yet it also shows incremental validity for identifying students with higher frequencies of involvement who are at greater risk of poorer mental health and school functioning outcomes. The sixteen-group model, on the other hand, identifies far more students than would be practical to follow up with, similar to the standard four-group model. If the goal of the school mental health practitioner is to home in on the students who are at greatest risk and to do so in the most feasible manner, the results of the present study suggest that the nine-group schema may be the most functional for this purpose. Both of the four-group schemas can be considered too simplistic and blunt to identify students with the greatest risk, while the sixteen-group schema has greater incremental validity related to risk identification but is too impractical from a follow-up service delivery perspective. In the context of the hypothetical 500-student middle school, we can extrapolate that use of the nine-group schema would identify 29 students who endorsed “often” for perpetration or victimization and who would consequently be at the highest risk of poor mental health and school functioning outcomes. It seems plausible that a school mental health practitioner could provide a follow-up assessment as a second gate of screening (see Glover & Albers, 2007), potentially a more robust self-report assessment of bullying involvement, with this number of students and, if need be, provide targeted tier-two support services (e.g., social–emotional skill building groups).

Limitations and Future Directions

Although the findings from the present study are encouraging, they should be considered within the context of a few methodological limitations. First, given that the items assessing bullying involvement included in the HBSC consisted of five response options and the present study collapsed these to four response options, this investigation did not explore the comparative concurrent validity of a full 25-group schema (i.e., separating responses of “3” and “4” into distinct categories for another layer of analyses), which may have yielded more nuanced findings. We chose not to explore this maximized schema because the sixteen-group model already appeared to offer more information than is practically useful in school mental health practice, yet we recognize that this is a subjective decision grounded in feasibility considerations and not necessarily empirical decision rules. We should also mention that our preliminary consideration of the wording of the items (see the “Method” section) was key in deciding to combine response options, as discriminating between the relative frequency represented by responses of “3” and “4” did not appear necessary. Future research might replicate the current study using the full response options and a 25-group schema for the purposes of investigating more nuanced differences in bullying involvement categories and relations between mental health and school functioning outcomes. However, considering that findings suggest that it may be useful to classify bullying involvement using continuous data and associated norm-referenced cutoff scores, we suggest that a more fruitful direction for future research is to validate multiple-item scales that might be analyzed as continuous data and used to create competing classification schemas. Considered in this light, we suggest that the limitation of the categorical approach assumed in the present study (and the practical similarity observed between classifications derived via categorical and continuous approaches post hoc) is best understood as a methodological artifact that can be remedied in future research using more robust measures of bullying involvement.

Given that the present study used a nationally representative sample of youth from the HBSC, the typical limitations of sample demographics and the generalizability of findings to youth in schools do not apply. However, considering that data used from the HBSC were wholly derived from youths’ self-reports, it is noteworthy that the findings may be biased by common method variance (i.e., the variance attributed to the measurement method rather than to the constructs represented by the measures; Podsakoff, MacKenzie, Lee, & Podsakoff, 2003). To guard against this potential bias in future studies, we recommend that follow-up investigations use variables derived from other measurement methods, including informant-report (teacher or parent) behavior rating scales of students’ mental health and performance-based indicators of youths’ well-being (e.g., report card grades, standardized test scores, and school discipline data). In addition, we suggest that productive next steps for progressing this line of research may include using higher-information models for data analysis that investigate the interaction effects between victimization classifications and perpetration classifications (as opposed to combining both metrics within a single variable), using continuous (as opposed to categorical) mental health and school functioning outcome variables, and testing whether similar trends in findings are observed when using other, non-HBSC measures of bullying involvement. Ultimately, however, we emphasize that future research in this line of inquiry should be carefully guided by concerns for incremental validity and treatment utility (see Hayes, Nelson, & Jarrett, 1987), as the intention of investigating competing classification schemas is to inform preferences for assessment methods that, in turn, inform interventions targeted to decrease youths’ involvement in bullying and improve their mental health and school functioning.