Introduction

Tragic stories of relational aggression (RA) and related outcomes have gained growing media attention in recent years (FOXNews.com 2008; Goldstein et al. 2008). Being relationally aggressive or the victim of RA is associated with several serious negative outcomes including increased levels of depression, impulsivity, defiance, substance abuse, delinquency, and risk-taking behaviors (Spieker et al. 2012; Wang et al. 2012). Mounting concern amongst helping professionals has resulted in a range of books (e.g., Odd Girl Out: The Hidden Culture of Aggression in Girls by Rachel Simmons (Simmons 2002), Reviving Ophelia: Saving the Selves of Adolescent Girls by Mary Pipher (Pipher 1994)), websites (e.g., The Ophelia Project www.opheliaproject.org), movies (e.g., Paramount Pictures’ Mean Girls produced by Lorne Michaels and directed by Mark Waters (Michaels and Waters 2004)) and interventions (e.g., Friend to Friend, Leff et al. 2009; Owning Up, Talbot 2002). There is a clear need for effective interventions to address relational aggression among girls.

As a nonphysical form of aggression, RA is characterized by behaviors that harm others through direct or indirect attacks on relationships rather than direct attacks on physical objects or a person’s well-being (Crick 1995). Examples of RA include manipulative behaviors such as gossiping, rumor spreading, social alienation, exclusion and rejection (Crick et al. 2002). More than 50 % of youth report being the victims of RA, with victimization and bullying peaking in grades 7 and 8 (Wang et al. 2012; Wang et al. 2009). While findings on the prevalence of RA across boys and girls have been mixed, most research suggests that girls are more likely to be relationally aggressive than boys or that a larger proportion of girls’ aggressive activities is relational in nature compared with boys, who tend to be more physically aggressive (Wang et al. 2012). Further, while boys tend to demonstrate less RA as they age, Spieker et al. (2012) found no difference across time for girls in grades 3–6. Relationally aggressive behaviors have been reported from early childhood to high school and college (Roecker Phelps 2001; Werner and Crick 1999). However, it is possible that the saliency of peer relationships and growth in cognitive and social skills during late middle childhood and adolescence make the middle school years particularly troublesome and in need of intervention.

In early childhood expressions of RA are generally more direct and less covert (e.g., explicitly telling a girl that she cannot play a recess game unless she does it the way the peer wants her to), while RA in middle childhood is generally characterized by more covert and sophisticated behaviors (Crick et al. 2001). Examples may include spreading more sophisticated rumors, more subtle exclusionary behaviors, writing nasty notes, and/or talking behind the target’s back. In adolescence relationally aggressive behaviors continue to increase in sophistication and covertness, while also expanding to target both male and female peers as a result of increasing importance of opposite-sex relationships (Bjorkqvist et al. 1994; Crick et al. 1999a, b). Examples may include using opposite sex peers to hurt same-sex relationships and/or victimization within romantic relationships to hurt romantic partners. These changes reflect developmental growth in cognitive skills (e.g., social cognition; Sutton and Smith 1999) and social skills (e.g., use of subtle, nonverbal behaviors; use of negotiation and bargaining; Laursen 1993; Selman 1980). Additionally, peer relationships and social status become very important developmental tasks during later middle childhood and adolescence as children seek independence from their parents (Savin-Williams and Berndt 1990). The increased saliency of the peer group and need for social acceptance likely makes the threat of relationally aggressive behaviors that harm an intimate friendship or reputation more significant to children in middle school than elementary school (Yoon et al. 2004).

The development and maintenance of relationally aggressive behaviors has been empirically linked to child and family risk factors, including children’s attribution style and social information processing skills, as well as family relationships and parenting styles (Yoon et al. 2004). Crick and Dodge’s (1994) social information processing (SIP) theory of aggression posits that the ability of a person to process a sequence of social cues impacts their behavioral response. In comparison to their prosocial peers, relationally aggressive youth have been found to have distorted and impaired social information processing skills. Relationally aggressive youth tend to make hostile attributions of others’ intentions in relationally provocative and ambiguous situations, possess normative beliefs that RA is acceptable and frequent, and become more emotionally aroused than nonaggressive or physically aggressive youth in response to relationally provocative situations (Crick 1995; Leff et al. 2003). Furthermore, socialization theories provided by Dix (1993) and Dodge (2006) suggest that attachment relationships, parenting styles and other environmental risk factors provide socialization messages to children through modeling behaviors and disciplinary responses. It appears children may learn normative beliefs about RA and hostile attributions from adult models, such as their parents (Dodge 2006; Spieker et al. 2012). Parents who have normative beliefs about RA are also less likely to respond to such behaviors in the same punishing way they respond to physically aggressive behaviors (Goldstein and Boxer 2012; Werner and Grant 2009). These findings suggest that the development of deficits in children’s social information processing skills may be learned from social models (i.e., parents) and resulting relationally aggressive behaviors may be developed and maintained due to parents’ poor disciplinary responses.

Interventions that target deficits in the SIP sequence of aggressive children through a cognitive behavioral approach, including attribution re-training, have been proven effective for physical and overt forms of aggression (Brain Power Program, Hudley and Graham 1993; Anger Coping Program, Lochman 1992). Emerging evidence supports similar interventions that target RA, and similar group-based interventions may also be effective in reducing relationally aggressive behaviors (i.e., Making Choices Program, Fraser et al. 2004; Friend 2 Friend Program, Leff et al. 2007). Research also suggests RA begins to emerge at heightened levels during preadolescence (e.g., age 11; Bjorkqvist et al. 1992) and cognitive behavioral interventions are more effective with children assumed to be in the formal operational stage of cognitive development (e.g., age 11 and beyond; Durlak et al. 1991; Wang et al. 2012). Further, research suggests interventions only targeting child processes and deficits may not be effective (Werner and Grant 2009). Caregiver training programs that target parents’ and caregivers’ cognitions and behavioral responses to RA may be necessary in effectively reducing girls’ relationally aggressive behaviors across multiple settings and over time. While some literature suggests that targeting parents’ disciplinary practices in relation to RA may be a component of an effective intervention (Goldstein and Boxer 2012), additional research indicates the need to also include efforts that alter parents’ normative beliefs and cognitions about RA and their attributions of responsibility in relationally aggressive situations involving their children (Werner and Grant 2009; Werner et al. 2006).

Based on the review of literature identifying the risk factors and developmental pathways of RA, as well as findings from related interventions and recommendations of developmental researchers, we hypothesize an effective intervention for reducing at-risk girls’ demonstration of relationally aggressive behaviors should be multisystemic, including both youth and their caregivers. Consideration of developmental changes indicates that intervention is most likely needed in middle school as relationally aggressive behaviors become more sophisticated and impactful during this developmental period (Yoon et al. 2004). Further, interventions that are simple, inexpensive, theoretically and empirically supported, and acceptable to youth, families, and mental health providers are needed to increase sustained implementation and fidelity (Embry 2004). The majority of previously evaluated interventions targeting RA have been either one or a combination of universal prevention efforts (Leff et al. 2010), group counseling only (Cappella and Weinstein 2006), or targeting elementary-age girls only (Fraser et al. 2004; Leff et al. 2009). We iteratively developed GIRLSS (Growing Interpersonal Relationships through Learning and Systemic Supports) to address the paucity of multisystemic, secondary interventions targeting middle school girls available to mental health professionals. In the current study, we aimed to conduct a pilot test of GIRLSS and examine clinically and statistically significant differences in treatment participants’ level of relationally aggressive behavior compared to participants in a waitlist control condition. We hypothesized that participants in the treatment condition would demonstrate clinically and statistically significantly more change, from high levels at pretest to low levels at posttest, in self-, teacher-, and school counselor-reported relational aggression than waitlist control group participants.

Method

Participants

Participants were recruited from a pool of eligible female students from two middle schools in a Midwestern state. Eligible participants were identified through a teacher-school counselor nomination procedure informed by empirical research (Leff et al. 1999) and developed in consultation with the natural implementers and university supervisors to maximize external validity and social acceptability. First, all sixth through eighth grade teachers in these two schools reviewed a description of RA adapted from an existing measure of RA (Children’s Social Behavior Scale-Teacher Report; Crick 1996) and identified female students who met the description from a class roster. We provided a list of exclusionary criteria that included suspected drug use, involvement with juvenile authorities, behavioral patterns of significant physical aggression, and significant social-emotional and/or health concerns beyond the scope of the intervention. Due to the high number of students referred and the limited space available for enrollment in the study (i.e., approximately 35 participants or fewer), the school counselors in each school were asked to review the list of identified students and rank them on levels of RA and family involvement using a three-point scale. In each school, the school counselor was identified as the person most likely to be contacted if concerns of relational aggression were raised. School counselors based their rankings on frequency of referrals for relationally aggressive concerns and frequency of contact with the student’s family.

This referral procedure yielded a total of 92 female students (45 students in School A and 47 students in School B) who were eligible for recruitment and enrollment into the current study. Participants were recruited by the school counselor using the ranked referral list of eligible students to maximize the number of participants who were considered to be highly relationally aggressive and have involved caregivers. Previous prevention and treatment research has demonstrated several barriers to securing adequate family involvement (Kazdin et al. 1997). Thus, we decided to prioritize caregiver involvement, in addition to high levels of relationally aggressive behaviors, in this pilot study in order to fully evaluate the intervention’s potential outcomes when implemented as intended. The school counselor used flyers and phone calls to recruit participants, and interested caregivers provided verbal assent to be contacted regarding the study. The primary author contacted verbally assenting caregivers by phone, provided additional information about the study, and scheduled a home visit for interested families to obtain written informed consent and youth assent forms, as well as to collect additional data (see “Procedures” section). Due to limitations in time and resources (e.g., only two iterations of intervention feasible at each school, funding), recruitment and enrollment in the intervention were limited to approximately 35 student participants or fewer.

Thirty-three participants consented to being in the study and were randomly assigned to the intervention or waitlist control condition. At pretest, 9 % of participants were in grade 6, 70 % in grade 7, and 21 % in grade 8. The mean age of participants at the time of intervention was 13. Regarding race, free or reduced lunch status, and household composition, data are missing for six of the 33 participants (18.2 %). Of the remaining 27 participants, nineteen reported their race as white, non-Hispanic (57.5 %), five as African American (15.2 %), and three as multiracial (9 %). Twenty-one caregivers reported their race as white, non-Hispanic (63.6 %), five as African American (15.2 %), and one as Hispanic (3 %). Fourteen participants (42.4 %) paid full price for lunch at school while 13 (39.4 %) qualified for free or reduced lunch price based on income level. Eight (24.2 %) participants lived with both biological parents while 7 (21.2 %) lived with their biological mother only, 8 (24.2 %) lived in a blended family with one biological parent, 2 (6.1 %) in a shared custody situation, and 2 (6.1 %) lived with adoptive parents. Due to scheduling conflicts and discomfort in the group setting, five intervention participants withdrew from the study following completion of pretest measures and one to three group counseling sessions. Thus, a total of 28 female youths participated in the study, including 12 waitlist control participants and 22 intervention participants. Six participants randomly assigned to the waitlist control group during the first iteration participated in the intervention during the second iteration and are thus counted in the total of each randomly assigned group.

Procedures

All study procedures were approved by the university’s Institutional Review Board. Participants who were assigned to the intervention group in School A participated in the intervention in Spring 2010, whereas participants in the wait-list control group in School A received the intervention in Fall 2010. Participants who were in the intervention group in School B participated in the intervention in Fall 2010. Participants who were assigned to the wait-list control group in School B participated in the intervention in Spring 2011 but, due to time and financial constraints, the intervention data for the waitlist control group in School B were not collected for the purposes of the current study.

As part of a larger study, data on relationally aggressive behaviors, internalizing problems, externalizing problems, emotional symptoms, and mechanisms of change (attribution styles, normative beliefs about RA, parenting practices and styles, and participants’ and caregivers’ knowledge of RA and the GIRLSS intervention) were collected pretest and posttest. Pretest data were collected within 4 weeks of the beginning of the intervention and posttest data were collected 2–6 weeks following conclusion of the intervention. Data were collected at school, during home visits, or at other locations selected by caregivers (e.g., place of employment, public library). Caregivers and student participants received a five dollar gift certificate to a local restaurant at each data collection time point. Teacher reporters were identified during the referral process by asking grade-level teaching teams to designate one teacher on their team as a reporter for each student referred. Identified teachers were those who had the most contact with the student and purported to have a neutral perspective of the student. Teachers and school counselors were not monetarily compensated for their time completing the measures.

Intervention: Growing Interpersonal Relationships Through Learning and Systemic Supports

GIRLSS is a 10-week intervention including school-based group counseling (Learning), and parent training and consultation (Systemic Supports) designed to target the empirically identified risk factors related to RA described previously and illustrated in Fig. 1. It was originally adapted from Relational Aggression in Girls (Kupovits 2008) and was further developed over multiple iterations in collaboration with school counselors and based on feedback from student participants and group leaders. These steps led to the development of a manualized intervention curriculum with student and caregiver components. Research suggests cognitive-behavioral, structured groups are more effective in reducing the potential for deviancy training and long-term outcomes for antisocial youth than process groups, and the GIRLSS group counseling component included a clearly outlined curriculum with process discussions that were structured and closely monitored by group co-leaders (Handwerk et al. 2000; Rhule 2005).

Fig. 1 Multi-modal intervention model to reduce relational aggression based on combined risk model (Dix 1993; Dodge 2006)

Students participated in one 70 min group session per week for 10 weeks. The group intervention was delivered during the school day but not during core classes by trained graduate clinicians. Each group session focused on a specific topic taught through the use of interactive discussions, media-based examples, role-plays, journaling, and weekly goal setting. The topics were designed to target steps of the SIP sequence (Crick and Dodge 1994). For example, movie and book examples of ambiguous and relationally aggressive scenarios were shared during session two to help participants identify provocative and ambiguous situations (i.e., step one). In subsequent sessions, we asked participants to identify the roles of each character in the relationally aggressive scenarios, as well as their likely thoughts, feelings and actions to introduce the concept of attribution biases and hostile thoughts (i.e., step two). Personal examples, interactive discussion, and journaling were used to help participants identify physiological signs of emotional arousal, and role plays were used to help teach strategies for regulation (i.e., step three). We focused on applying these topics to participants’ personal situations through individual worksheets completed with group leaders, small group discussions facilitated by group leaders, and journaling. Self-talk strategies were taught to help participants reframe hostile thoughts and negative statements, and assertiveness skills were practiced to help participants increase the number of appropriate behavioral response options identified (i.e., step four). Finally, to increase participants’ awareness of relationally aggressive behaviors and immediate negative outcomes (i.e., step five), we used psychoeducational strategies to share data-based and anecdotal stories of negative outcomes, conducted a perspective taking activity using a fictitious scenario, and used a motivational interviewing technique to discuss and compare the pros and cons of continuing to be relationally aggressive versus changing.

To promote prosocial behaviors and prevent peer reinforcement of aggressive behaviors and rumination, group rules were established and a behavior management program was employed based on the recommendation of previous research (Dishion et al. 1999; Rose 2002). Participants received rewards for appropriate behavior, including following the rules, contributing positively to the group (e.g., sharing, asking questions), and completing assigned work. Clear and consistent behavior management strategies have been employed successfully in previous interventions with deviant youth (Fisher and Chamberlain 2000; Handwerk et al. 2000). In a group setting with adolescent girls who are not only relationally aggressive but also likely have low self-esteem and internalizing behavior problems, it is important to prevent both peer reinforcement of aggressive behaviors and rumination (Dishion et al. 1999; Rose 2002). Previous research suggests several strategies for preventing deviancy training and co-rumination, including involving parents, providing continuous training, supervision and evaluation of intervention staff, and decreasing opportunities for peer reinforcement of problematic behaviors during group (Fisher and Chamberlain 2000; Handwerk et al. 2000; Henggeler and Sheidow 2003; Rhule 2005).

The caregiver component of the intervention included two workshops and biweekly phone consultations. The workshops included didactic lectures; interactive process discussions; media-examples; self-evaluation of knowledge, beliefs and disciplinary responses; and role-plays. Topics of the workshops included (1) prevalence of RA and negative outcomes, as well as research supporting the program’s risk model (e.g., SIP and the influence of parents’ normative beliefs; Werner and Grant 2009); (2) appropriate disciplinary responses to instances of RA (Werner et al. 2006); (3) positive and appropriate communication, monitoring and supervision strategies (Kawabata et al. 2011); and (4) generalization strategies to help caregivers support their child participant’s learning at home. Consultants called each family biweekly to introduce and provide information about the intervention and group counseling curriculum, remind caregivers about the upcoming workshops, process discussions from previous workshops, and problem solve any issues identified by the caregiver.

Both intervention components were delivered by trained graduate clinicians. Graduate clinicians participated in a 3–4 h initial training prior to the beginning of each intervention group, as well as 1.5 h of weekly training and supervision during implementation of the intervention. Trainings and weekly supervision meetings were conducted by the primary researcher under the supervision of a licensed psychologist.

Intervention Dosage

Attendance for each component of the intervention was documented by the graduate clinicians implementing the intervention. Regarding the group counseling component, 41 % of participants, or nine of 22, attended all 10 group counseling sessions. Most participants attended at least 9 of 10 group sessions (82 %). Regarding the caregiver component, 64 % of student participants were represented by at least one caregiver at one or more workshops and 41 % attended both. Four phone calls were scheduled and outlined for each caregiver resulting in a total of 88 expected phone calls. Co-leaders called caregivers no more than two times for each scheduled phone call and left a message with a return phone number following the second attempt. Sixty-three of the 88 (71.6 %) scheduled phone calls were completed.

Intervention Integrity

The degree to which each component of the intervention was implemented as intended was monitored through self-reported fidelity checks completed by graduate clinicians following the delivery of each intervention component and supervisor review of videotaped group counseling and caregiver workshop sessions. The supervisor reviewed and completed fidelity checklists for 60 % of group sessions and 100 % of caregiver workshops in order to ensure reliable self-reporting by the graduate clinicians. Fidelity checklists were comprised of core activities and processes of each intervention component. For example, the group counseling fidelity checklist consisted of a list of activities (e.g., reviewing homework, setting a marble goal, handing out and reviewing journal homework for the next session) and processes (e.g., asking questions that elicited change talk, reflecting connections between thoughts, feelings and actions) that were consistent across sessions, as well as space to check activities specific to a given session. Fidelity checklists for the parent workshops and phone calls also included activities and processes consistent across each component and specific to each session. Approximately 92 % of group sessions, 88 % of caregiver workshops, and 89 % of caregiver consultation phone calls were implemented with fidelity.

Measures

As part of the larger study, additional measures were administered to assess secondary outcomes such as internalizing behavior symptoms and mechanisms of change proposed in the intervention’s conceptual model (see Fig. 1). However, only measures utilized in the analyses of statistical and clinical significance, as well as social validity, are described here.

Relationally Aggressive Behaviors

Participants’ levels of relationally aggressive behaviors were measured by the Children’s Social Behavior Scale-Self (CSBS-S; Crick and Grotpeter 1995) and Teacher report (CSBS-T; Crick 1996). The CSBS-S includes 15 items distributed across six subscales (RA, physical aggression, verbal aggression, prosocial behaviors, inclusion, and loneliness). Items were rated on a five-point scale (1 = never to 5 = all the time). The CSBS-S took 10–15 min for participants to complete and has shown acceptable to high levels of internal consistency (i.e., Cronbach’s alpha ranging from 0.66 to 0.82 for each subscale, 0.73 for the RA subscale; Crick and Grotpeter 1995). The CSBS-T includes 13 items distributed across three subscales (RA, physical aggression, prosocial behavior) rated on a five-point scale (1 = never true to 5 = almost always true). It took approximately 10 min for teachers and school-based staff to complete and has been shown to be internally reliable (i.e., Cronbach’s alpha equal to 0.94, 0.94, and 0.93 for the RA, overt aggression, and prosocial behavior subscales, respectively). The CSBS-T has also been shown to correlate significantly with peer and/or teacher nomination of relationally aggressive boys (r = 0.57, p < .001) and girls (r = 0.63, p < .001) (Crick 1996). Crick and Grotpeter (1995) report that all of the CSBS scales have been found to be internally reliable (i.e., Cronbach’s alpha ranging from 0.82 to 0.89 for RA and 0.94 to 0.97 for overt aggression) and have high test–retest reliability over a 4-week period (i.e., r = 0.82 for the RA scale and r = 0.90 for the overt aggression scale). The CSBS-T was completed by a teacher and school counselor for each child participant.
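For readers less familiar with the internal consistency statistic cited above, the following is a minimal sketch of how Cronbach’s alpha is conventionally computed from item-level ratings. The function name and the example ratings are hypothetical and are not drawn from the CSBS validation samples or from the current study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents' item ratings.

    `responses` is a list of rows, one per respondent; each row holds that
    respondent's rating on every item of a single subscale.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical ratings on a three-item subscale (1 = never true to
# 5 = almost always true); illustrative only, not study data.
example = [
    [1, 2, 1],
    [3, 3, 4],
    [4, 5, 4],
    [2, 2, 3],
]
alpha = cronbach_alpha(example)
```

Values closer to 1 indicate that items on the subscale vary together across respondents.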

Social Validity: Formative

Feedback from child participants regarding the usefulness and quality of group counseling sessions and caregiver participants regarding parent workshops was collected anonymously following each session/workshop using four multiple choice questions, including: (1) I think what the group leaders were teaching us was… (3 = easy to figure out, 2 = pretty easy, 1 = hard), (2) I think the group leaders did a good job leading our group (3 = totally agree, 2 = agree somewhat, 1 = disagree), (3) I think what I learned today was… (3 = very helpful, 2 = helpful, 1 = not helpful), and (4) I think that what we talked about as a group was… (3 = important and worth talking about, 1 = Not worth talking about).

Social Validity: Summative

Summative evaluation feedback regarding the acceptability of all 10 weeks was also collected. Questions included (1) How would you rate the overall group?, (2) How would you rate the group content (topics that were discussed)?, (3) How would you rate the activities (discussions, movie clips, role plays, scenarios)?, (4) How useful did you find the journal prompts?, (5) How useful did you find the weekly goal sheets?, and (6) How well did the group coordinators lead discussions and the overall group? Each question was rated on a five-point scale with 5 being excellent and 1 being poor.

Data Analyses

Analyses included statistical tests for pretest group differences and differences in treatment and control participants’ change from pretest to posttest (i.e., independent-groups t tests), tests of clinical significance (Reliable Change Index; RCI, Jacobson and Truax 1991), and evaluation of social validity. Independent-groups t tests examined whether or not the amount of change from pretest to posttest differed significantly (p < .05) between treatment and control group participants. Change scores were computed by subtracting pretest scores from posttest scores on the RA subscale of the teacher and school counselor CSBS-Ts and the CSBS-S for each participant. Given school counselors had some knowledge of intervention status, an average teacher/school counselor report change score was also computed by averaging the teacher and school counselor reports of RA on the CSBS-T at pretest and at posttest for each participant and then subtracting the averaged pretest score from the averaged posttest score.
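The change-score computations described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the pooled-variance (Student) form of the independent-groups t statistic is an assumption because the text does not state which variant was used, and no study data are reproduced.

```python
from math import sqrt
from statistics import mean, variance


def change_scores(pre, post):
    """Posttest minus pretest per participant (negative = less reported RA)."""
    return [b - a for a, b in zip(pre, post)]


def averaged_change_scores(teacher_pre, counselor_pre, teacher_post, counselor_post):
    """Average the two informants at each time point, then take the difference."""
    pre = [(t + c) / 2 for t, c in zip(teacher_pre, counselor_pre)]
    post = [(t + c) / 2 for t, c in zip(teacher_post, counselor_post)]
    return change_scores(pre, post)


def pooled_t(group_a, group_b):
    """Independent-groups t statistic and degrees of freedom (pooled variance).

    The sign of t depends on the order in which the groups are passed.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df
```

The resulting t would then be compared against the t distribution with the returned degrees of freedom to obtain a p value, for example with a statistics package.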

Analyses of clinically significant change are based on procedures proposed by Jacobson and Truax (1991) and include classifying each participant as recovered, improved but not recovered, unchanged, or deteriorated based on the following three criteria. First, the participant or client must begin the intervention with elevated symptoms (e.g., at or above established cutoffs for clinical significance) and end treatment below the clinical threshold. In the current study, the CSBS measure does not have pre-established cutoffs for clinical significance. As a result, we chose to use the midpoint of the measure’s Likert scale as the threshold for clinical significance (i.e., 3 = Sometimes). Second, the change from pretest to posttest must be clinically significant. Jacobson and Truax (1991) suggest using normative data from functional and dysfunctional samples to determine clinically significant change. However, in cases where such data are not available they suggest using two posttest, control group standard deviations from the pretest mean as the cutoff for determining clinically significant change. The third criterion requires each participant to demonstrate statistically significant change, or reliable change. To test this, a Reliable Change Index is computed for each participant based on the difference between his or her pretest and posttest score divided by the standard error of the difference between the scores (Jacobson and Truax 1991). The RCI is a standardized metric and expressed as a z score. Thus, Jacobson and Truax (1991) suggest using ±1.96 as the cutoff if the desired level of significance is p < .05. However, given the current study’s pilot nature a more lenient RCI of ±1.64 (p < .10, two tailed) was used. This is common and acceptable when the reliability of the measure is high, which is the case for the CSBS-S and CSBS-T relational aggression subscale (Crick and Grotpeter 1995; Iverson 2012).
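As a worked illustration of the third criterion, the sketch below computes a Reliable Change Index using the standard error of the difference as defined by Jacobson and Truax (1991): the standard error of measurement is the standard deviation multiplied by the square root of one minus the reliability, and the standard error of the difference is the square root of twice its square. The standard deviation of 0.8 and the example scores are hypothetical; the reliability of 0.82 is the test–retest value reported for the RA scale in the Measures section, used here only as an assumption about which reliability estimate would be entered.

```python
from math import sqrt


def standard_error_of_difference(sd, reliability):
    """S_diff = sqrt(2 * SE^2), where SE = sd * sqrt(1 - reliability)."""
    se_measurement = sd * sqrt(1 - reliability)
    return sqrt(2 * se_measurement ** 2)


def reliable_change_index(pre, post, sd, reliability):
    """RCI = (post - pre) / S_diff; negative values indicate reduced RA."""
    return (post - pre) / standard_error_of_difference(sd, reliability)


def is_reliable(rci, cutoff=1.64):
    """Apply the lenient cutoff used in this pilot (1.96 for p < .05)."""
    return abs(rci) >= cutoff


# Hypothetical participant: pretest 3.6, posttest 2.4, sd = 0.8 (assumed).
rci = reliable_change_index(pre=3.6, post=2.4, sd=0.8, reliability=0.82)
```

With these assumed inputs the standard error of the difference is 0.48, so a 1.2-point drop corresponds to an RCI of about −2.5, which exceeds the ±1.64 cutoff.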

Participants are considered recovered if they meet all three criteria, including beginning the intervention above the clinical threshold and ending below, demonstrating clinically significant change bringing them closer to the functional range than dysfunctional, and demonstrating statistically reliable change (Jacobson and Truax 1991). Participants are considered improved but not recovered if they begin treatment above the clinical threshold and demonstrated clinically significant and statistically reliable change, but do not end treatment below the clinical threshold. No change indicates that participants did not demonstrate clinically significant or statistically reliable change, regardless of whether or not they started or ended treatment above or below the clinical threshold. Finally, deterioration indicates that participants demonstrated clinically significant and statistically reliable change in the negative direction, regardless of whether or not they started or ended treatment above or below the clinical threshold.
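The classification rules above can be summarized as a small decision function. This is a sketch under stated assumptions rather than the authors’ scoring procedure: the function name is hypothetical, "at or above" the scale midpoint of 3 is treated as above the clinical threshold, a participant who fails either the clinical or the reliable change criterion is treated as unchanged, and the case of reliable improvement that begins below the threshold (which the rules above do not address) is returned as "unclassified".

```python
def classify_change(pre, post, rci, clinically_significant,
                    threshold=3.0, rci_cutoff=1.64):
    """Classify one participant; lower scores mean less relational aggression.

    `clinically_significant` is a flag for the second criterion, supplied by
    the caller because its cutoff depends on sample statistics.
    """
    reliable = abs(rci) >= rci_cutoff
    if not (reliable and clinically_significant):
        return "unchanged"
    if rci > 0:
        # Reliable, clinically significant increase in RA, regardless of
        # where the participant started or ended.
        return "deteriorated"
    started_above = pre >= threshold
    ended_below = post < threshold
    if started_above and ended_below:
        return "recovered"
    if started_above:
        return "improved but not recovered"
    return "unclassified"  # assumption: not covered by the rules above


# Hypothetical participants (not study data).
examples = [
    classify_change(3.6, 2.4, -2.5, True),   # recovered
    classify_change(4.5, 3.4, -2.3, True),   # improved but not recovered
    classify_change(2.0, 3.5, 3.1, True),    # deteriorated
    classify_change(3.2, 3.0, -0.4, False),  # unchanged
]
```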

Results

Independent-groups t tests were used to examine pretest group differences on the major demographic variables described in the Participants section above and on the study variables reported in Table 1. No significant differences were found on study variables. However, intervention participants were significantly older and in a higher grade than control participants, t(37) = −2.055, p = .047 and t(37) = −3.274, p = .002, respectively.

Table 1 Descriptive statistics and independent-groups t tests

The amount of change from pretest to posttest reported by school counselors on the CSBS-T RA subscale for intervention participants (M = −0.83) was significantly different from that reported for control participants (M = −0.10), t(28) = 2.17, p = .038 (see Table 1). As reported by the school counselors, intervention participants demonstrated significantly greater change in the desired direction than control participants. No significant differences were found in the independent-groups t tests for the self-report CSBS-S, t(32) = 0.01, p = .991, or the teacher-report CSBS-T, t(32) = 1.09, p = .283. The averaged teacher and school counselor report of change from pretest to posttest differed significantly between intervention participants (M = −0.66) and control group participants (M = 0.01), t(28) = 2.18, p = .038, with intervention participants demonstrating greater change in the desired direction than control group participants.

Clinically significant and reliable change was assessed on two variables, including the self-report (CSBS-S) RA subscale and the averaged teacher and school counselor report of RA on the CSBS-T for each participant (see Table 1). At pretest, only two intervention and two control group participants self-reported relationally aggressive behaviors above the determined cutoff for clinical significance (i.e., 3 = Sometimes). This equates to 9 % of intervention participants, 16.7 % of control group participants, and 11.7 % of the overall sample. Of these four participants, both intervention participants recovered while both control participants demonstrated no clinically or statistically reliable change. Two additional intervention participants actually deteriorated. These two moved from below the clinical threshold at pretest to above it at posttest and the amount of change demonstrated was clinically significant and statistically reliable.

At pretest, the adult raters reported that 13 intervention and three control group participants demonstrated relationally aggressive behaviors above the clinical threshold. Four of the 13 intervention participants above the clinical threshold at pretest recovered (30.8 %), three improved (23.1 %), and six were unchanged (46.1 %). One of the three control group participants above the clinical threshold at pretest recovered (33 %) while the other two were unchanged (67 %). Overall, fifteen of the intervention participants were unchanged (68.2 %) and none deteriorated (0 %). Seven of the control group participants were unchanged (87.5 %) and none deteriorated (0 %). Figure 2 illustrates the amount of change reported from pretest to posttest by adult reporters for each participant according to their classification as recovered, improved, unchanged or deteriorated.

Fig. 2 Averaged teacher and school counselor-reported RA change scores by clinical change classification

Results of each question on the formative group counseling session measure are reported for each session in Fig. 3. Overall, participants thought the session topics were fairly easy to figure out and important. Participants also felt the group leaders did a good job during most sessions. However, participants’ perception of the group topic dropped during sessions four and eight, indicating that participants felt the topics were harder to understand, less helpful and less important than other session topics. Session four included a lengthy discussion structured by a motivational interviewing technique that elicited the pros and cons of being relationally aggressive and the pros and cons of changing relationally aggressive behaviors. The data presented in Fig. 3 indicate that this session may have been particularly problematic and unacceptable to participants. Further, participants’ perception of the group leaders dropped during sessions two through four. From the data, it seems that participants began and ended the group with positive feelings towards the group leaders, but following the first session their perception dropped as they continued to build rapport and establish norms in the group. As the group built trust and moved into the working stages, the group leaders were again viewed more positively by participants in sessions five through 10. Session eight was mostly discussion and some individual work that may have been difficult for some participants. The individual work asked participants to apply several strategies related to the thoughts, feelings and actions sequence to a personal situation. The handout and activity seemed confusing to many participants, likely leading some to feel the session’s topic was less helpful or less accessible than other sessions.

Fig. 3 Average session social validity ratings

Participants’ responses on the summative evaluation indicated good to very good perceptions of the overall group (M = 3.85), content (M = 4.10), activities (M = 3.88) and group leaders (M = 4.2). Interestingly, participants rated the journal prompts (M = 3) and goal sheets (M = 3.55) lower than other areas. However, group leaders reported anecdotally that few participants completed these homework tasks prior to the following session.

Results of ratings from caregivers following each workshop indicate that they thought the material was easy to figure out and important. Caregivers also thought the group leaders did a good job leading each workshop. With regards to the helpfulness of the workshop topics, caregivers rated workshop one in the very helpful range and workshop two in the helpful range.

Discussion

In this study we evaluated the effectiveness of an empirically informed intervention designed to reduce relationally aggressive behaviors enacted by middle school girls. As a secondary intervention, GIRLSS seeks to fill a gap in the continuum of available practices to prevent and reduce RA and addresses both youth and parent/caregiver risk factors (see Fig. 1; Goldstein and Boxer 2012; Werner and Grant 2009). Findings from this pilot study provide initial support of the potential of GIRLSS to reduce relationally aggressive behaviors amongst participants and suggest potential modifications for future iterations. In the discussion that follows, we describe the study’s findings and limitations within the context of prior research and offer implications for future research and practice.

It was hypothesized that participants in the intervention group would demonstrate significantly more change (i.e., reduced RA) than participants in the control group. Results from the school counselor report and averaged teacher and school counselor reports supported our hypothesis and the initial effectiveness of GIRLSS. While school counselors reported very little change in control group participants’ relationally aggressive behaviors from pretest to posttest (M = −0.10), they reported nearly a full point decrease in intervention participants’ relationally aggressive behaviors (M = −0.83). Likewise, when averaged, teachers and school counselors reported little to no change in control participants’ relationally aggressive behaviors from pretest to posttest (M = 0.01) but reported more than a half-point decrease in intervention participants’ relationally aggressive behaviors (M = −0.66). Although similar levels of significant difference were not found in the reports of RA by teachers alone, the trajectory of intervention participants’ mean levels of teacher-reported RA from pretest to posttest was in the desired direction (Change Score M = −0.50) while ratings for control participants remained unchanged (Change Score M = −0.05). On average, participants self-reported low levels of relationally aggressive behaviors at pretest (Control M = 1.80 and Intervention M = 1.94 on a five-point scale), thus limiting the amount of change that could be observed at posttest.

Clinically, more than half of the intervention participants whom adult raters reported to be at or above clinical levels at pretest were recovered or improved at posttest, while two of the three control participants at or above clinical levels at pretest remained unchanged. Results from social validity surveys evaluating the group counseling and parent workshop sessions suggest that the overall groups and workshops were good/helpful, participants enjoyed the activities, and most topics were considered important. However, group counseling participants reported lower ratings of the ease, usefulness and importance of some sessions and overall group activities (i.e., journal prompts and goal setting). Parents also found the second workshop less helpful than the first. In light of these findings, future iterations of GIRLSS should focus on improving sessions four and eight of the group counseling curriculum, revising procedures related to the journal prompts and goal setting homework, and reviewing the content of the second parent workshop.

While these results are promising, several limitations warrant caution when considering the current study. To begin, capacity constraints limited the sample size in the current study, thus reducing power and increasing the risk of a Type II error. The current study was designed based on similar studies with small sample sizes (see Leff et al.’s 2009 evaluation of the Friend2Friend Program, which included 21 intervention and 11 control participants). However, an important recommendation for future research is to recruit and include a larger sample that sufficiently meets sample size standards for more sophisticated analyses. In addition, the study’s referral and recruitment procedures prioritized the approval of the teachers and school counselors in the research setting to increase external validity and social acceptability. However, it is possible that the internal validity of the referral measures was compromised given previous findings. For example, Leff et al. (1999) compared teacher nominations to peer nominations of bullies and victims and found significantly lower agreement in middle school than in elementary school. When multiple teacher nominations were combined to predict peer nominations, agreement across levels of schooling improved (Leff et al. 1999). These findings suggest that teacher nomination procedures are promising, especially when provided by multiple teachers as was done in the current study, but more research to establish internal validity is needed. Further, following referral, participants were recruited based on school counselors’ rankings of the severity of the relationally aggressive behaviors and caregiver involvement. Thus, the study may have benefited from the inclusion of more highly involved caregivers who may not be reflective of the true population. Given the study’s pilot nature, recruiting participants and caregivers who were most likely to attend program events was important.

Second, significant findings resulting from school counselors’ report of RA should be considered with caution because of the school counselors’ investment in the intervention, their knowledge of participants’ treatment status, and lack of independence among student participants. One school counselor participated in the iterative development of the intervention. As a result, she was invested in the intervention and its potential to produce positive outcomes. Both school counselors helped coordinate logistics of delivering the intervention (e.g., providing a list of students to be released during class for participation in the group) and therefore were aware of participants’ treatment status. Although it was impossible to maintain a blinded study, it could also be argued that the school counselors provided a more informed report of relationally aggressive behaviors than teachers, given that most relationally aggressive situations are referred to them. A middle school teacher who only sees a student for 1 h daily may not be as aware of all the relationally aggressive situations in which a student has been involved as a school counselor who hears about relationally aggressive behaviors from other students and/or teachers. Further research is needed to compare the reliability and correlation of teacher and school counselor reports of RA, as well as methods to maintain blindness for all reporters.

Finally, the study’s outcome measures may have been further limited by reliance on perceptual data rather than inclusion of direct or naturalistic observations. In more traditional aggression research, observational methodologies have been used to circumvent biases inherent in perceptual and self-report data. However, given the covert nature of RA, reliable and valid observations are extremely difficult, intensive and scientifically problematic (Crick et al. 1999; Young et al. 2006). Given the problems presented by perceptual and observational methodologies, many researchers have recommended using a multi-informant, multi-method approach to identifying and measuring RA (McEvoy et al. 2003; Tackett and Ostrov 2010). They suggest future research should examine the reliability and validity of a combined approach, such as using teacher rating scales to identify students perceived to be relationally aggressive and then confirming teachers’ opinions through direct observation. Future research should seek to develop methodologies that rely on multiple informants and methods while balancing issues of feasibility and acceptability.

Implications for Research and Practice

This study represents a first step in helping school mental health practitioners, such as school counselors, meet the intervention needs of a diverse group of relationally aggressive girls. Study participants represented diverse backgrounds in relation to race, free/reduced lunch status and household composition. While this expands the generalizability of the study’s findings, caution is noted due to the small sample size and prioritization of high caregiver involvement during the recruitment process. Strengths of the study include its strong theoretical basis, inclusion of multiple contexts as points of intervention, iterative development with significant contributions from natural implementers, and evaluation of social validity. Participants’ perceptions regarding the importance, usefulness, and ease of topics are important to collect during iterative development of an intervention. Social validity findings from the current study suggest that overall participants and caregivers found the intervention good/helpful, but also provide specific areas on which to focus iterative revisions. Combined with the statistical findings, these strengths suggest that GIRLSS may provide an effective, acceptable, generalizable, and translational strategy for practitioners with the need to reduce RA in their setting. Further, the current study demonstrates that GIRLSS can be implemented with high fidelity and minimal costs. However, future research using natural implementers as intervention providers should be considered in order to better understand the intervention’s feasibility.

An important strength of GIRLSS is its basis in sound psychological theory, including the SIP paradigm (Crick and Dodge 1994) and contributions of caregivers’ normative beliefs and disciplinary practices (Goldstein and Boxer 2012; Werner and Grant 2009). Development of intervention strategies to target SIP skills and caregiver beliefs and behaviors represents an important contribution to practice and suggests the need to measure changes in intervention targets from pretest to posttest. Future research should evaluate the validity of the intervention’s risk factor model and potential mediators of change. Further development of the intervention should also be prioritized based on feedback from participants and providers to ensure the various aspects and components of the intervention are acceptable.

The effects of RA have become popular topics in research and media over the past several years with a proliferation of developmental and epidemiological studies as well as sensationalized books and news stories. However, despite increased popularity, the effectiveness of efforts to reduce RA during preadolescence—a time of heightened concern—has received little attention (Leff et al. 2009). In response, this study provided a first step in developing and evaluating a theoretically sound, multisystemic intervention. Despite limitations, some important statistical and clinical findings emerged. These promising findings suggest the need for additional research and development to improve and further evaluate GIRLSS. Given the topic’s popularity across professionals and stakeholders, the current study makes an important contribution to the field and future research is warranted.