Introduction

Children and adolescents with autism spectrum disorder (ASD) frequently present for treatment with behavioral difficulties including non-compliance, temper tantrums, and self- and other-directed aggression (Kanne & Mazurek, 2011). Emotional concerns are also a common reason for seeking treatment. Strikingly, > 80% of youth with ASD have co-occurring irritability (Mayes et al., 2017), > 40% meet criteria for anxiety disorders (Leyfer et al., 2006; Simonoff et al., 2008), > 45% have co-occurring attention-deficit/hyperactivity disorder (Kerns et al., 2020), and as many as 70% are diagnosed with a mood disorder (Kim et al., 2000; Lainhart, 1999; Leyfer et al., 2006; Mazzone et al., 2012). Moreover, a recent study in youth with ASD found that emotional difficulties at age 12 years predicted similar difficulties in young adulthood (Stringer et al., 2020). Emotional impairments in childhood and adolescence have lasting, broad impacts across the lifespan, including increased loneliness, greater social impairment, and lower quality of life (Gotham et al., 2015; White & Roberson-Nay, 2009).

Emotion dysregulation (ED), an impairment in modulating the experience, expression, and intensity of emotions in an adaptive and contextually appropriate manner, is hypothesized to be one underlying transdiagnostic mechanism related to clinically significant behavioral and emotional difficulties (Cai et al., 2021; Eisenberg & Spinrad, 2004; Weiss, 2014). Emotion regulation can occur at both conscious and unconscious levels, and it can occur reactively, after a trigger, or proactively, before a triggering experience. For most individuals, an emotion regulation style emerges that is either adaptive or maladaptive. ED often presents as increased reactivity, irritability, mood swings, and difficulty calming down once upset, and it is associated with higher rates of anxiety, depression, aggression, self-injury, and temper tantrums (collectively termed irritability; Mazefsky et al., 2013; Pouw et al., 2013; Rieffe et al., 2011; Samson et al., 2015; Ting & Weiss, 2017; White et al., 2014). Additionally, in a sample of typically developing controls and individuals with psychiatric diagnoses (e.g., generalized anxiety disorder, ADHD, and ASD), ED made a unique contribution to both internalizing and externalizing symptoms above and beyond categorical diagnosis (Cai et al., 2021). ASD-specific predispositions may increase risk for ED, including alexithymia and difficulty identifying others’ affect (Rieffe et al., 2007), reduced flexibility in behavior and thought, intolerance of uncertainty (Cai et al., 2018a, 2018b), executive functioning difficulties (Murray, 2010), poor problem solving (Cai et al., 2018a, 2018b), limited coping skills (Jahromi et al., 2012), sensory sensitivities (Rogers & Ozonoff, 2005), and biological risk (Cai et al., 2018a, 2018b; Mazefsky et al., 2008).

In individuals with ASD, ED has been linked to significantly higher rates of crisis intervention utilization (i.e., hospitalizations and police contacts; Conner et al., 2021), suicidal ideation (Conner et al., 2020), maladaptive behaviors (i.e., aggression, self-injury, and destructive behaviors; Samson et al., 2015), worsening of social deficits (Goldsmith & Kelley, 2018; Nader-Grosbois & Mazzone, 2014), comorbid psychiatric diagnoses, and psychotropic medication use compared to peers without ED (Mazefsky & White, 2014; Mazefsky et al., 2013). Youth with ASD and ED (ASD + ED) also frequently experience psychosocial difficulties, especially in interpersonal relationships and academic performance (Hill et al., 2006). Specifically, youth exhibiting emotional outbursts may be less accepted by peers and miss key learning opportunities in social and academic environments, exacerbating existing challenges in attention, problem-solving, communication, and social interaction (Samson et al., 2012). Conversely, strong emotion regulation skills are likely a protective factor for social development in ASD, as youth with ASD who demonstrate better emotion regulation show higher rates of prosocial behavior (Goldsmith & Kelley, 2018). Thus, addressing ED during childhood and adolescence is critical to improving a wide range of outcomes necessary for a successful transition to adulthood.

Interventions for ED have been developed and evaluated in other psychiatric disorders with positive results. For example, in borderline personality disorder, remediation of emotion regulation difficulties is the primary focus of evidence-based interventions such as Dialectical Behavior Therapy (Koerner & Linehan, 2000). Likewise, ED-focused interventions have been successful in youth with eating disorders (Corstorphine, 2006), childhood separation anxiety (Afshari et al., 2014), depression (Kumar et al., 2008), childhood irritability (Derella et al., 2017), and anxiety (Mennin, 2006), and early evidence of benefit has emerged in ADHD (Vacher et al., 2020).

Few randomized controlled trials (RCTs) have specifically examined treatment for ED in ASD. Instead, ED is often a small component of a larger treatment program or is not directly addressed at all (Beck et al., 2020). Recently, several pilot studies and open trials have demonstrated success with ED treatment for ASD, specifically the Stress and Anger Management Plan (STAMP; Factor et al., 2019; Scarpa & Reyes, 2011), Secret Agent Society: Operation Regulation (SAS:OR; Weiss et al., 2018), and Emotional Awareness and Skills Enhancement (EASE; Conner et al., 2019) programs.

Factor et al. (2019) adapted the Exploring Feelings program into STAMP for young children with ASD ages 5–7 years to decrease anger and increase emotion regulation. STAMP is a group-based intervention with a concurrent caregiver group that meets for nine 1-hour sessions. In their waitlist control design (n = 23; treatment = 12, waitlist = 11), STAMP produced moderate to large effects, including increased caregiver confidence in their child’s ability to manage anger (d = 0.63), decreased caregiver-reported child anxiety (d = 0.84), and decreased caregiver-reported child lability and negativity (d = 0.80). STAMP also decreased the frequency and length of outbursts in children with ASD, and 67% of STAMP participants responded positively to treatment. SAS:OR (Weiss et al., 2018) is a manualized individual therapy program for youth with ASD ages 8–12 years. The intervention includes caregivers in sessions and focuses on emotion awareness, mindfulness, acceptance, and generalization of skills to school and home. In a waitlist control RCT (n = 68; treatment = 35, waitlist = 33), SAS:OR demonstrated moderate to large effects in increasing emotion regulation skills (d = 0.79) and decreasing both lability/negativity (d = 0.58) and problem behaviors (d = 0.71). EASE, an individual therapy for adolescents with ASD + ED that uses mindfulness and acceptance-based strategies, is being evaluated in an ongoing RCT. In an open trial (n = 20), EASE reduced both reactivity and specific psychiatric symptom domains, such as anxiety and depression, in adolescents with ASD (Conner et al., 2019). Despite these promising results, there is a strong need for a group intervention with a caregiver component that spans a broader age range, delivering treatment in an ecologically valid social setting and including caregiver education to support ongoing skill use.

To address the critical need for efficient, effective, and scalable intervention strategies for youth with ASD + ED, our group developed Regulating Together (RT; formerly called IO-PERT; Shaffer et al., 2018), an intensive outpatient group program addressing ED for youth ages 8–18 years. The program takes a comprehensive approach, involving both caregivers and youth and drawing on evidence-based intervention techniques including cognitive behavioral therapy (CBT), parent management training, and mindfulness- and acceptance and commitment-based therapies (Fig. 1). To our knowledge, no other intervention has taken such a comprehensive approach to addressing ASD + ED. To meet the developmental needs of this subpopulation, groups are separated by age into Child (8–12 years) and Teen (13–18 years) groups that use the same curriculum content with developmentally modified delivery (Table 1). The caregiver group directly teaches crisis management, reward systems, and coaching strategies for each concept targeted in the youth group. The program meets twice weekly in 90-min sessions for 5 weeks with concurrent caregiver groups. In addition, it is conducted in an ecologically valid group setting that provides opportunities for in vivo practice of skills with same-aged peers (Table 2).

Fig. 1 Evidence-based components that comprise Regulating Together

Table 1 Details of caregiver and child (8–12 years old) sessions for Regulating Together
Table 2 Details of caregiver and adolescent (13–18 years old) sessions for Regulating Together

Evaluation of RT via retrospective chart review has demonstrated initial feasibility and acceptability of the intervention (Shaffer et al., 2018, 2019b, 2020). Previous examinations of RT demonstrated improvements in caregiver-rated irritability and clinician-rated global impressions for youth ages 8–18 years, and in lethargy/social withdrawal for youth ages 13–18 years (Shaffer et al., 2018, 2019a, 2019b, 2019c).

To further explore the impact of RT, a within-subjects trial was conducted. The trial included a 5-week control lead-in period for all participants, pre- and post-intervention assessments, and follow-up assessments at 5 and 10 weeks post-intervention. We hypothesized that statistically significant improvements in irritability and emotional reactivity would be found after the treatment period and that these improvements would be maintained or would continue at each follow-up visit. To further explore treatment outcomes, additional measures were examined, including hospitalization rates, executive functioning, and cognitive flexibility. The current study expands upon our prior work by using a within-subjects design, with both control lead-in and follow-up periods, to identify changes on outcome measures associated directly and indirectly with ED.

Methods

Fifty-two participants with ASD between the ages of 8 and 18 years were enrolled in a within-subjects design (4 Child groups, N = 25; 4 Teen groups, N = 27). Because of COVID-19, the last Teen group of eight participants was interrupted mid-intervention and switched to a virtual format; that round was therefore excluded from analysis. For the rounds not affected by the COVID-19 pandemic, 51 participants were screened, with 7 screen failures (did not have ASD = 2, language level too low = 3, irritability score too low = 2; see Fig. 2), leaving 44 participants (4 Child groups, N = 25, age M = 10.48, SD = 1.53; 3 Teen groups, N = 19, age M = 14.80, SD = 1.38). Demographics for these 44 participants are presented in Table 3. One participant completed only the 5-week control lead-in period and was not included in analyses. All analyses are based on the 43 participants who completed treatment and the post-treatment time point.

Fig. 2 CONSORT diagram of study procedures

Table 3 Participant demographics

Youth were recruited from multiple clinics within our hospital as well as from community partners such as schools and autism agencies. Inclusion criteria were: age 8–18 years; a documented diagnosis of ASD; IQ > 65 (one youth with an IQ of 63 was included because verbal IQ was much higher and the FSIQ was deemed an underestimate); English as the primary language; verbal functional communication, determined by appropriateness to receive ADOS-2 Module 3 or 4 (Lord et al., 2012); stable medications and other interventions; and at least one caregiver willing to participate in treatment. ASD diagnosis was confirmed via medical and behavioral history, the ADOS-2, and expert clinical diagnosis. IQ was confirmed at screen via the Wechsler Abbreviated Scale of Intelligence, Second Edition (WASI-II; Wechsler, 2011). Youth were included if they had Irritability scores ≥ 10 on the Aberrant Behavior Checklist, Second Edition (ABC-2; Aman & Singh, 2017). If the ABC-Irritability score was below 10 but families reported that the child had increased ED, youth with Hyperactivity scores ≥ 10 were also included. This inclusion process was established in our initial chart review study (Shaffer et al., 2018). The ABC-Irritability scale is a composite of items related to ED, including temper outbursts, depressed mood, mood changes, crying/screaming, self-injury, and aggression; it was used to ensure baseline impairment related to ED. To ensure safety for all participants, youth were excluded if they had any incident of aggression toward other youth that resulted in injury in the past 2 weeks. Study participants completed a 5-week control lead-in period, a 5-week active treatment period, and follow-up assessments at 5 and 10 weeks after treatment completion. IRB approval was obtained from Cincinnati Children’s Hospital Medical Center, and all participants provided consent or assent, with caregivers providing consent for all participants younger than 18 years.
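The ABC-2 severity criterion above combines two routes with an OR. A minimal Python sketch of that rule follows; it encodes only the ABC-2 part of eligibility (not IQ, language, diagnosis, or the other criteria), and the function and constant names are hypothetical rather than taken from the study protocol.

```python
# Hypothetical sketch of the ABC-2 severity criterion only; IQ, language,
# diagnosis, and the other eligibility criteria are not modeled here.
IRRITABILITY_CUTOFF = 10   # ABC-2 Irritability raw score
HYPERACTIVITY_CUTOFF = 10  # ABC-2 Hyperactivity raw score


def meets_abc2_criterion(irritability: int, hyperactivity: int,
                         family_reports_increased_ed: bool) -> bool:
    """Return True if a youth meets the ABC-2 severity criterion described above."""
    # Primary route: Irritability of 10 or more.
    if irritability >= IRRITABILITY_CUTOFF:
        return True
    # Alternate route: Irritability below 10, family-reported ED,
    # and Hyperactivity of 10 or more.
    return family_reports_increased_ed and hyperactivity >= HYPERACTIVITY_CUTOFF


print(meets_abc2_criterion(12, 4, False))  # primary route
print(meets_abc2_criterion(7, 15, True))   # alternate route
print(meets_abc2_criterion(7, 15, False))  # not eligible on ABC-2
```

Writing the alternate route as a separate branch keeps the rule readable: the Hyperactivity score is only consulted when Irritability falls below the cutoff.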
Families received compensation commensurate with the time commitment for each study visit.

Community Involvement

The Cincinnati Children’s Hospital’s Family Advisory Committee served as an advisory team throughout the study, from creation through execution. The Committee is made up of caregivers of youth with a wide range of ages and developmental disabilities, including ASD. They provided stakeholder feedback about the appropriateness of study procedures.

Intervention

The RT program met twice weekly for 90 min per session over 5 weeks (10 sessions), with concurrent Child/Teen and caregiver groups. Each program was staffed by two psychologists or master’s-level therapists (one for caregivers and one for youth) and one behavior assistant (bachelor’s degree) who worked with the youth. Each RT session focused on teaching participants new CBT and mindfulness skills and strengthening those skills through repeated practice, both within group and as part of weekly homework (Tables 1 and 2). Group topics varied slightly by age group, with the Child group spending more time on key topics and the Teen group covering additional developmentally appropriate material. The RT caregiver curriculum was structured similarly to the Child/Teen groups, with its foundations in CBT and mindfulness principles. It also provided direct instruction in behavior management strategies commonly included in evidence-based programs (e.g., reward plans, prevention strategies; Shaffer & Minshawi, 2014; Sofronoff & Farbotko, 2002). The group format of RT gives youth with ASD implicit and explicit opportunities to learn and practice skills among peers, normalize experiences, model behavior for one another, share and validate feelings, reduce stigma, and engage in shared problem-solving. This was deemed especially important because youth with ASD tend to struggle most with regulating emotions in social settings (Maddox et al., 2017). Behaviorally based reinforcement plans were used at both the individual and group level to reinforce appropriate behavior, and the group plans were presented to caregivers as examples of how to reinforce similar behaviors at home.

Treatment Fidelity

Each group therapy session for both youth and caregivers was recorded for fidelity purposes. Fidelity was coded based on adherence to the key teaching points of each section of the session (learning, activity, break, learning, activity, and caregiver wrap-up). An independent rater who was not involved in treatment provision rated fidelity for a randomly selected 70% of the youth and 50% of the caregiver session videos. An additional 30% of youth and 25% of caregiver videos were randomly selected and double-coded by a postdoctoral fellow to assess coding reliability. Across the videos coded, therapists showed 100% fidelity to the treatment manual in both youth and caregiver sessions, and coding reliability between the two coders was 100%.

Feasibility and Acceptability

Feasibility was assessed by collecting data on attendance and retention across the study. Acceptability was assessed through satisfaction data collected from caregivers post-treatment. The Caregiver Readiness and Satisfaction Survey (CRS) was developed for RT to measure caregiver readiness for treatment and confidence in managing their child before treatment, as well as confidence and satisfaction post-treatment. Items are rated on a 6-point Likert scale ranging from 0 (not at all or none) to 5 (very much). The survey has been used throughout pilot testing of RT and was expanded for this trial to include readiness for treatment (Shaffer et al., 2018).

Youth acceptability ratings were collected for the Teen group only. Teens rated how much they learned in each session on a 5-point Likert scale from 1 (nothing) to 5 (very much). Beginning in session 2, once the scale had been taught in group, they also rated how they felt on a 5-point scale (I feel… 1 = Calm and in Control, 2 = Uncomfortable, 3 = Triggered, 4 = Mad, Sad, or Anxious, or 5 = Out of Control).

Adverse Events

At each assessment and treatment visit, families were asked if any adverse events or changes in behavior had occurred since the previous visit.

Measures

A multi-method, multi-informant assessment battery was used to characterize the sample, examine the efficacy of RT and the maintenance of change, and explore potential predictors of treatment response. Demographics of study participants and their families were collected through interviews and surveys, including information about co-occurring diagnoses, household income, caregiver education, participant education, and family history of diagnoses. We did not directly assess co-occurring diagnoses; rates are based on caregiver report. Psychiatric hospitalization rates were collected via chart review for the 12 months before and the 12 months after group participation to assess crisis hospitalization use. All measures are described below.

Characterization Measures

The ADOS-2 (Lord et al., 2012) is a well-established, clinician-administered diagnostic assessment. Module 3 or 4 of the ADOS-2 was administered to all participants at the screen assessment to confirm ASD diagnosis. All administrations were performed by research-reliable evaluators.

The Clinical Global Impressions Scale-Severity (CGI-S; Guy, 1976) was used as a clinician-rated measure of the overall severity of impairment related to ASD. A trained, independent clinician rated the CGI-S at Baseline. The CGI has been used widely in ASD pharmacology and behavioral trials (Bearss et al., 2015; King et al., 2009; McDougle et al., 2005; Minshawi et al., 2016). The CGI-S provides a qualitative measure of global severity through a rating from 1 to 7 (1 = normal, not at all ill; 2 = borderline ill; 3 = mildly ill; 4 = moderately ill; 5 = markedly ill; 6 = severely ill; 7 = among the most extremely ill patients). Raters were trained with gold-standard vignettes, and regular reliability training was conducted for all raters.

The Vineland Adaptive Behavior Scales, Third Edition (Vineland-3; Sparrow et al., 2016) is a well-established standardized measure of adaptive behavior that assesses skills in the Communication, Daily Living Skills, and Socialization domains and is widely used in ASD studies. The Vineland-3 was used at screening to characterize participants’ overall functioning. Caregivers completed the Parent/Caregiver form.

The WASI-II (Wechsler, 2011) provides a brief and reliable measure of cognitive ability for ages 6–90 years. The measure has been used extensively in the 8–18-year age range and in a variety of neurodevelopmental disorder clinical studies. The WASI-II was administered by a trained research assistant and was used to determine eligibility (FSIQ > 65).

Primary Outcome Measures

The Emotion Dysregulation Inventory (EDI; Mazefsky et al., 2018; Mazefsky et al., 2018) consists of two scales: Reactivity (EDI-R), which captures poorly regulated negative emotional responses, and Dysphoria (EDI-D), characterized by poor uptake of positive emotions and lack of motivation. Raw scores are converted to theta scores based on item response theory calibration in an autism sample (n = 1755; Mazefsky et al., 2018; Mazefsky et al., 2018; Mazefsky et al., 2018). Theta scores have a mean of 0 and an SD of 1 (equivalent to a T-score mean of 50 and SD of 10). The EDI-R was selected a priori as the primary outcome measure in this study. EDI scores have been found to be stable in a non-treatment group (N = 1333) and sensitive to change in an inpatient group receiving treatment (N = 432; Mazefsky et al., 2018; Mazefsky et al., 2018). The Reactivity subscale has an internal consistency of 0.97, and the Dysphoria subscale has an internal consistency of 0.90 (Mazefsky et al., 2020). Scores greater than 1 SD above general population norms (based on a sample of 1000 youth matched to the U.S. Census) are considered clinically elevated.
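Because the text equates the theta metric (M = 0, SD = 1) with the T-score metric (M = 50, SD = 10), moving between them is a linear rescaling, T = 50 + 10θ. The sketch below assumes exactly that mapping. The general-population mean and SD used in the clinical-elevation flag are hypothetical inputs that a reader would need to take from the published EDI norms, since they are not reported here; the function names are likewise illustrative.

```python
# Sketch: place EDI theta scores on the T-score metric and flag elevation.
# Assumption: T = 50 + 10 * theta, following the stated equivalence of
# theta (M = 0, SD = 1) and T (M = 50, SD = 10).


def theta_to_t(theta: float) -> float:
    """Linearly rescale a theta score to the T-score metric."""
    return 50.0 + 10.0 * theta


def is_clinically_elevated(theta: float, norm_mean: float, norm_sd: float) -> bool:
    """True if a score is more than 1 SD above a general-population reference.

    norm_mean and norm_sd are hypothetical inputs: the general-population
    mean and SD on the theta metric, taken from the published EDI norms.
    """
    return (theta - norm_mean) / norm_sd > 1.0


print(theta_to_t(0.0), theta_to_t(1.5))  # 50.0 65.0
```

The elevation check is kept separate from the conversion because the theta calibration sample (autistic youth) and the general-population norm sample are different reference groups.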

The ABC-2 (Aman & Singh, 2017) is a 58-item caregiver-report questionnaire on behavior difficulties commonly seen in individuals with developmental disabilities. It comprises five subscales derived by factor analysis: Irritability, Social Withdrawal/Lethargy, Stereotypy, Hyperactivity, and Inappropriate Speech. The ABC-2 has been used extensively in psychopharmacological studies of ASD and was used in our previous RT chart review (Aman et al., 2009; Bearss et al., 2015; McDougle et al., 2005; Minshawi et al., 2016; Shaffer et al., 2018; Wink et al., 2018). Caregivers rate the severity of behaviors (e.g., temper tantrums/outbursts) on a 4-point scale ranging from 0 (not a problem) to 3 (the problem is severe in degree). The ABC-2 was used for inclusion criteria, and its Irritability subscale served as a primary outcome measure. The Irritability subscale has demonstrated internal consistency of 0.92 (Kaat et al., 2014).

Secondary Outcome Measures

Hospitalization rates were obtained from the medical records of all participants. The rate was calculated as total hospitalizations divided by n for the 12 months before treatment and the 12 months after treatment.

The Emotion Regulation Skills Test (ERST) was created for this pilot of RT to measure knowledge acquisition of the ED management skills taught in the program. It consists of 14 multiple-choice questions reflecting key ideas or skills, with 1–2 questions for each session that presents new material. Each correct item receives a score of 1, and items are summed to a total score between 0 and 14. Youth completed the ERST at all time points, and it was used as a secondary outcome measure. Test–retest reliability was acceptable for this pilot (r(42) = 0.70, p < 0.0001).
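The ERST scoring rule and the test–retest statistic can be illustrated with a short, standard-library Python sketch. It assumes each answer is recorded as a single option letter; the function names and the scores below are hypothetical and invented for illustration, not study data. The degrees of freedom printed alongside r follow the usual n − 2 convention, matching the r(df) notation above.

```python
# Minimal sketch of ERST scoring and a test-retest (Pearson) correlation.
# Assumptions: answers are single option letters; the scores below are
# invented for illustration and are not study data.
import math
import statistics

N_ITEMS = 14  # ERST length stated in the text


def score_erst(responses: list[str], key: list[str]) -> int:
    """Return the number of correct items (1 point each), a total from 0 to 14."""
    if len(responses) != N_ITEMS or len(key) != N_ITEMS:
        raise ValueError("the ERST has exactly 14 items")
    return sum(int(given == correct) for given, correct in zip(responses, key))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ERST totals for the same eight youth at two administrations.
time1 = [6, 9, 11, 4, 8, 12, 7, 10]
time2 = [7, 9, 12, 5, 7, 13, 8, 9]

r = pearson_r(time1, time2)
df = len(time1) - 2  # degrees of freedom reported as r(df)
print(f"r({df}) = {r:.2f}")
```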

The Behavior Rating Inventory of Executive Function, Second Edition (BRIEF-2; Gioia et al., 2015) is an 86-item caregiver-report inventory that measures executive functioning (EF) skills in youth ages 5–18 years. Items are rated on a Likert scale from 1 (Never) to 3 (Often). Index scores include a Behavior Regulation Index (BRI), Emotional Regulation Index (ERI), and Cognitive Regulation Index (CRI), with an overall Global Executive Composite (GEC). Raw scores are converted to T-scores for the individual scales and standard scores for the Indexes. The BRIEF-2 is commonly used among youth with ASD, and its subscales show strong internal consistency, ranging from 0.76 to 0.96, with all but one subscale above 0.82. The BRIEF-2 ERI T-score was selected a priori as a secondary outcome measure.

The Flexibility Scale (FS; Strang et al., 2017) is a caregiver-report measure developed to assess the multidimensionality of flexibility in youth with ASD, including cognitive aspects of flexibility in daily life, routines/rituals, transitions/changes, special interests, social flexibility, and generativity. Items are rated on a 4-point Likert scale ranging from 0 (no) to 3 (always). Internal consistency within scales is adequate, ranging from 0.75 to 0.91 (Strang et al., 2017). Mean scores on the FS total and the Social Flexibility and Transitions/Change subscales were selected a priori as secondary outcome measures.

The Clinical Global Impressions-Improvement scale (CGI-I; Guy, 1976) was used as a clinician-rated outcome measure to assess response to treatment, specifically as it relates to ED. A trained, independent clinician rated the CGI-I at the end of the control lead-in, at the end of treatment, and at all follow-up visits. The CGI-I provides a qualitative measure of treatment response through a rating from 1 to 7 (1 = very much improved; 2 = much improved; 3 = minimally improved; 4 = no change; 5 = minimally worse; 6 = much worse; 7 = very much worse). The same training and reliability procedures described above for the CGI-S were followed for the CGI-I. Final CGI-I scores were dichotomized into “responders” (1 or 2) and “non-responders” (3–7) for analysis.

Statistical Analysis

Repeated measures ANOVAs were conducted to assess changes between time points for all measures, covarying for age, sex, and number of sessions attended. The overall sample was first analyzed as a whole, and secondary analyses separated the Child and Teen groups to explore outcomes in the two developmental groups separately. For analyses demonstrating a significant change by time point, individual time periods were examined: the 5-week control lead-in period (T1–T5), start of treatment to end of treatment (T5–T10), start of treatment to 5 weeks post-treatment (T5–T15), and start of treatment to 10 weeks post-treatment (T5–T20). Conventional effect sizes do not accompany the F values because these values come from repeated measures mixed models with covariates, and reporting partial η² is not recommended when covariates are involved (Lakens, 2013). Instead, we computed “pseudo” effect sizes, defined in the spirit of Cohen’s d as the square root of the F-statistic multiplied by the ratio of numerator to denominator degrees of freedom: d = √(F × df_num / df_den). We report these pseudo d’s, but they represent overall change, including the control period. Cohen’s d effect sizes based on the raw data represent between-time-point effects and are therefore of primary interest. Positive effect sizes indicate improvement, and negative effect sizes indicate worsening. Finally, a false discovery rate correction was applied to all comparisons of interest in Tables 4 and 5 to account for multiple hypothesis tests (Benjamini & Hochberg, 1995). All statistical analyses were conducted using SAS® version 9.4 (SAS Institute Inc., Cary, NC).
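The effect-size and multiple-testing steps described above can be illustrated with a short Python sketch; it is not the authors’ SAS code. `pseudo_d` follows the formula stated in the text. `cohens_d_improvement` assumes a pooled-SD denominator and that lower scores mean improvement (as on the EDI-R and ABC-I); the paper does not specify which d variant was used. `benjamini_hochberg` is the standard step-up adjustment of Benjamini and Hochberg (1995). All function names are hypothetical.

```python
# Illustrative Python re-implementation of the effect-size and FDR steps
# described above (the study itself used SAS 9.4). Assumptions are flagged.
import math
import statistics


def pseudo_d(f_stat: float, df_num: float, df_den: float) -> float:
    """'Pseudo' d as defined in the text: sqrt(F * df_num / df_den)."""
    return math.sqrt(f_stat * df_num / df_den)


def cohens_d_improvement(start: list[float], later: list[float]) -> float:
    """Cohen's d between two time points, signed so that positive = improvement.

    Assumptions (not specified in the paper): lower scores mean improvement,
    and the denominator is sqrt((sd_start**2 + sd_later**2) / 2).
    """
    sd_pooled = math.sqrt((statistics.stdev(start) ** 2 + statistics.stdev(later) ** 2) / 2)
    return (statistics.mean(start) - statistics.mean(later)) / sd_pooled


def benjamini_hochberg(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Applying the stated formula to the overall EDI-R test, F(4, 100.1) = 7.05.
print(round(pseudo_d(7.05, 4, 100.1), 2))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Computing the pseudo d from the reported F and degrees of freedom shows why it summarizes overall change across all five time points, whereas the raw-data Cohen’s d compares two specific time points.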

Table 4 Mixed model analyses covarying for age, sex and number of sessions attended
Table 5 Mixed model analyses covarying for age, sex and number of sessions attended by time period

To assess improvement across assessments, CGI-I ratings were dichotomized, with ratings of 1 and 2 forming the responder group and ratings of 3, 4, and 5 forming the non-responder group. The association between the dichotomized CGI-I and the four assessment points (T5, T10, T15, and T20) was analyzed using the chi-square test.
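A dependency-free Python sketch of the CGI-I dichotomization and a Pearson chi-square test of independence on the resulting 4 × 2 table (assessment point × responder status) is shown below. The ratings are invented for illustration, the helper names are hypothetical, and the p-value uses the closed-form chi-square survival function for integer degrees of freedom. For tables larger than 2 × 2, `scipy.stats.chi2_contingency` computes the same uncorrected Pearson statistic.

```python
import math


def is_responder(cgi_i: int) -> bool:
    """CGI-I of 1 (very much improved) or 2 (much improved) counts as a response."""
    if not 1 <= cgi_i <= 7:
        raise ValueError("CGI-I ratings range from 1 to 7")
    return cgi_i <= 2


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability via the closed form for integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(2x/pi) * exp(-x/2).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def pearson_chi_square(table: list[list[int]]) -> tuple[float, int, float]:
    """Return (statistic, degrees of freedom, p-value) for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_tot in zip(table, row_totals):
        for observed, c_tot in zip(row, col_totals):
            expected = r_tot * c_tot / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


# Hypothetical CGI-I ratings at each assessment point (not study data).
ratings = {
    "T5": [4, 4, 3, 4, 5, 4],
    "T10": [2, 3, 2, 4, 3, 2],
    "T15": [2, 2, 3, 1, 2, 4],
    "T20": [1, 2, 2, 2, 3, 2],
}
# Rows: assessment points; columns: [responders, non-responders].
table = [[sum(is_responder(x) for x in v), sum(not is_responder(x) for x in v)]
         for v in ratings.values()]
stat, dof, p = pearson_chi_square(table)
print(table, round(stat, 2), dof, round(p, 3))
```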

Pre- and post-treatment hospitalizations were defined as dichotomous variables, regardless of the number of hospitalizations. An exact McNemar’s test was used to test for a difference in hospitalization rates between the two periods. To protect against the possible impact of COVID-19 on hospitalization rates, separate analyses were conducted for participants who had 6 months of post-treatment data before the onset of COVID-19. These groups were then compared using Fisher’s exact test.
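The hospitalization rate and the paired, dichotomized comparison can be sketched in Python using only the standard library. In the exact McNemar test only discordant pairs carry information: the smaller discordant count is referred to a Binomial(b + c, 0.5) distribution, and the two-sided p-value is twice the lower tail, capped at 1. The counts and function names below are hypothetical, not study data, and the Fisher’s exact comparison is not shown.

```python
import math


def hospitalization_rate(counts: list[int]) -> float:
    """Total hospitalizations divided by n for one 12-month window."""
    return sum(counts) / len(counts)


def exact_mcnemar(before: list[bool], after: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired yes/no outcomes."""
    # Discordant pairs: b = hospitalized before only, c = after only.
    b = sum(1 for x, y in zip(before, after) if x and not y)
    c = sum(1 for x, y in zip(before, after) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    lower_tail = sum(math.comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical hospitalization counts per participant (12 months pre vs. post).
pre_counts = [0, 2, 0, 1, 0, 0, 3, 0, 1, 0]
post_counts = [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]

print(hospitalization_rate(pre_counts), hospitalization_rate(post_counts))
# Dichotomize: any hospitalization in the window vs. none.
p = exact_mcnemar([c > 0 for c in pre_counts], [c > 0 for c in post_counts])
print(p)
```

Dichotomizing before testing matches the description above: a participant with three hospitalizations counts the same as one with a single hospitalization.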

Results

Feasibility

Of the youth initially enrolled, one dropped out during the control lead-in phase, six during the treatment phase, and four during the follow-up phase, yielding a retention rate of 87% during treatment and 75% across the entire 20-week study. The overall attendance rate was 82%. Examination of the six youth who dropped out during treatment (five Child, one Teen) showed better retention in the Teen group (95%) than in the Child group (80%), although Child group retention was still acceptable. Scores on key measures were compared between youth who dropped out and those who completed the program. Children who dropped out during the intervention phase had IQ scores similar to, and EDI-Reactivity scores lower than, those of the remaining participants, suggesting that they were likely not appropriate candidates for RT from the outset. In the Teen group, the one teen who dropped out during the intervention had a very high EDI-R score.

Acceptability

Readiness for treatment was assessed through a caregiver survey pre-treatment, and acceptability was assessed post-treatment with a similar caregiver survey. Results of both surveys, along with teen ratings of learning and emotion, are presented in Fig. 3. Teens rated session 9 (Review, M = 5) as the session in which they learned the most, followed by session 1 (M = 4.8) and sessions 3 and 8 (both M = 4.13). The lowest learning rating was for session 4 (Distress Tolerance, M = 3). Average emotion ratings ranged from 1.87 (session 8) to 3 (session 4). Session 4 thus appears to have been both the most triggering session for teens and the one in which they felt they learned the least.

Fig. 3 Caregiver measures, including (a) caregiver confidence pre- and post-treatment, (b) caregiver readiness pre-treatment, and (c) caregiver learning, satisfaction, and child skill use post-treatment; (d) adolescent ratings of learning and emotions are also presented

Adverse Events

Caregivers of two children, and of no teens, reported changes in behavior during the intervention; they attributed both children’s changes to events outside the program (start of school and vacation). One teen had two hospitalizations for medical reasons over the course of the group intervention, and one child had a psychiatric hospitalization during the control period before the group began. This child had several hospitalizations before the study began, and this hospitalization was deemed in line with his baseline behavior. No additional adverse events were reported by families. Intervention leaders reported three emotional outbursts during intervention sessions, involving physical aggression toward a leader (one teen), elopement (one child), or verbal aggression toward another child (one child). Each of these instances was typical behavior for the child or teen and related to common triggers, such as being given a direction or disagreeing with a peer. All instances were managed by the group leader with the assistance of the caregiver, and the child or teen successfully returned to the group session after each occurrence.

Primary Outcome Measures

Results from the full model for the Overall sample (Child and Teen combined) and separately for each age group are presented in Table 4. Results comparing individual time periods for the Overall sample and by age group are presented in Table 5. The effect sizes (d) are pseudo effect sizes for results in Table 4 and Cohen’s d values based on raw data for results in Table 5. All analyses are listed first for the Overall sample, then for the Child group, and then for the Teen group. Of those enrolled, the Child and Teen groups did not differ significantly in cognitive ability, adaptive skills, or baseline scores on outcome measures (ps > 0.430).

Emotion Dysregulation Inventory-Reactivity (EDI-R) and Dysphoria (EDI-D)

For the Overall sample, a statistically significant time point difference was found for the EDI-R (F(4,100.1) = 7.05, p < 0.001). There was no significant change in scores during the 5-week control lead-in period (p = 0.26, Cohen’s d = 0.20), but we found significant improvements on the EDI-R from treatment start to post-treatment (p = 0.02, d = 0.47) and from treatment start to the 5-week (p = 0.005, d = 0.52) and 10-week follow-ups (p < 0.001, d = 0.77). There was no statistically significant time point difference on the EDI-D (F(4,100.6) = 1.83, p = 0.130).

Results for the Child group on the EDI-R were similar, with a statistically significant difference by time point (F(4,54.22) = 3.80, p = 0.009). Again, we found no significant change in scores during the 5-week control lead-in period (p = 0.81, d = -0.06) and an improvement on the EDI-R from treatment start to post-treatment (p = 0.034, d = 0.43), although this improvement was no longer statistically significant after false discovery rate correction. Significant improvement was found from treatment start to both the 5-week (p = 0.01, d = 0.55) and 10-week follow-ups (p < 0.001, d = 1.00), and both remained significant after false discovery rate correction. Of note, the mean EDI-R score at the 10-week follow-up was no longer in the clinically significant range (1.0 SD or more above the mean). On the EDI-D, the Child group showed a statistically significant time point difference (F(4,57.68) = 3.40, p = 0.015); further examination revealed a significant difference specifically from treatment start to the 10-week follow-up (p = 0.0015, d = 0.98). For the Teen group, there was a statistically significant difference on the EDI-R by time point (F(4,56.66) = 3.17, p = 0.020); details are presented in Tables 4 and 5. On the EDI-D, the Teen group showed no statistically significant difference by time point (F(4,40.27) = 1.11, p = 0.36).
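The paragraph above reports that one comparison lost significance once the false discovery rate was controlled. The specific procedure is not named here, so the minimal sketch below assumes the widely used Benjamini-Hochberg step-up adjustment. The p-values are hypothetical, not the study's, and were chosen only to show how a nominal p below 0.05 can exceed 0.05 after adjustment.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Assumption: the study's FDR procedure is Benjamini-Hochberg; other
    FDR methods (e.g., Benjamini-Yekutieli) give more conservative values.
    """
    m = len(p_values)
    # Positions of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step down from the largest p-value so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four time-point comparisons within one measure.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [q < 0.05 for q in adjusted]
print([round(q, 4) for q in adjusted], significant)
```

With these hypothetical values, only the smallest p-value remains below 0.05 after adjustment, while the nominally significant 0.03 and 0.04 do not.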

Aberrant Behavior Checklist-Irritability (ABC-I)

We observed statistically significant changes on the ABC-I at all time points, including the control lead-in period (F(4, 97.09) = 14.48, p < 0.001). The largest effect sizes were seen from treatment start to the 10-week follow-up. Full details, including Child- and Teen-specific results, are presented in Tables 4 and 5.

Rates of Hospitalization

An exact McNemar’s test showed no significant difference between pre- and post-treatment hospitalization rates for the overall sample (p = 0.41). Although the pre-COVID-19 subgroup had a smaller p value (0.25) than the post-COVID-19 subgroup (1.0), the difference in proportions between the pre- and post-COVID-19 periods was not statistically significant (Fisher’s exact test, p = 0.40).
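The two exact tests named above can be sketched with the Python standard library alone. The exact McNemar test uses only the discordant pairs (participants hospitalized in one period but not the other) and is a two-sided binomial test with probability 0.5. Fisher’s exact test sums the hypergeometric probabilities of all 2 x 2 tables with the observed margins that are no more likely than the observed table. This is a minimal sketch under stated assumptions: each participant is assumed to be coded simply as hospitalized or not in each period, the function names are hypothetical, and all counts are hypothetical rather than the study's data.

```python
from math import comb


def mcnemar_exact(b, c):
    """Hypothetical helper: two-sided exact McNemar p-value.

    b: participants hospitalized before treatment only (hypothetical coding);
    c: participants hospitalized after treatment only.
    """
    n = b + c
    # Under H0 each discordant pair is equally likely to fall either way,
    # so the smaller count follows Binomial(n, 0.5); double the lower tail.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def fisher_exact_two_sided(a, b, c, d):
    """Hypothetical helper: two-sided Fisher exact p for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table no more likely than the observed one (a small relative
    # tolerance keeps tables with tied probabilities from being lost).
    p = sum(table_prob(x) for x in range(lo, hi + 1)
            if table_prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


# Hypothetical counts only.
print(mcnemar_exact(4, 1))                  # 4 pre-only vs 1 post-only
print(fisher_exact_two_sided(1, 9, 11, 3))  # an arbitrary 2 x 2 table
```

Because the McNemar test ignores participants hospitalized in both periods or in neither, small numbers of discordant pairs yield large p values even when the raw rates differ noticeably, which is consistent with the power concern raised in the text.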

Given the small sample size, and particularly the limited number of hospitalizations, it is clinically informative to report the rates directly in addition to the statistical analyses. Psychiatric hospitalization rates (total hospitalizations/n) in our sample were 24% during the 12 months before RT treatment and 8% in the 12 months after participation. Moreover, all of the post-treatment hospitalizations occurred during the first 6 months, with none in the 6 to 12 months after group completion. The outcome period for many participants was affected by the COVID-19 pandemic. We therefore examined hospitalization rates in a smaller subsample, enrolled earlier in the study, whose follow-up was not affected by the pandemic. In this subsample, the hospitalization rate was 21.7% before the intervention and 4.3% in the 6 months after the intervention. This pattern is comparable to that of the overall sample, as also reflected in the statistical analyses.

Secondary Outcome Measures

Mixed model analyses for the Overall sample are presented below; more detailed analyses by time point, as well as Child- and Teen-specific analyses, are presented in Tables 4 and 5.

Emotion Regulation Skills Test (ERST)

Statistically significant improvements on the ERST were demonstrated across time points for the Overall sample (F(4,107.4) = 17.54, p < 0.001), with no significant change during the 5-week control lead-in period (p = 0.07, d = 0.23) but significant change from the start of treatment to post-treatment (p < 0.001, d = 0.91) and to the 5- and 10-week follow-ups (both p < 0.001; d = 0.72 and 0.75, respectively).

Behavior Rating Inventory of Executive Functioning-2 (BRIEF-2)

For the Overall sample, the BRIEF-2 Emotion Regulation Index (ERI) showed statistically significant change by time point (F(4, 100.7) = 8.19, p < 0.001). Further analyses are presented in Table 5; treatment response on the ERI is indicated by stability during the control lead-in period and change after treatment.

Flexibility Scale (FS)

The FS Total score indicated an overall change by time point (F(4,100.4) = 7.60, p < 0.001) for the Overall sample. Significant changes by time point were also found on the Flexibility subscales of interest including Social Flexibility (F(4,89.33) = 4.49, p = 0.002) and Transitions/Change (F(4,99.29) = 6.77, p < 0.001).

Clinical Global Impression—Improvement (CGI-I)

The chi-square analysis showed a statistically significant change by time point for the Overall sample, Child group, and Teen group (see Table 4). In all cases, the proportion of responders was greater at the post-treatment time points (T10, T15, T20) than at the end of the control period (T5). For the Overall sample, 8% of participants showed improvement after the 5-week control lead-in period. In contrast, improvement was seen in 87.6% of participants post-treatment (31.3% much improved, 6.3% very much improved), 80% at 5 weeks post-treatment (40% much improved, 3% very much improved), and 87.8% at the 10-week follow-up (45.45% much improved).

On the CGI-I for the Child group, we saw improvement in 8% of participants after the 5-week control lead-in period. However, improvement was seen in 87.5% immediately following treatment (12.5% Very Much Improved, 37.5% Much Improved, 37.5% Minimally Improved), 75% at 5 weeks follow-up (37.5% Much Improved and 37.5% Minimally Improved), and 87.5% at 10 weeks post-treatment (37.5% Much Improved and 50% Minimally Improved).

On the CGI-I for the Teen group, a statistically significant change in responders was found in the post-treatment period (Table 4). More specifically, only 6% of participants showed improvement after the 5-week control lead-in period, compared with 73% following treatment (23% Much Improved and 50% Minimally Improved), 85.7% at the 5-week follow-up (7.2% Very Much Improved, 42.8% Much Improved, and 35.7% Minimally Improved), and 90% at 10 weeks post-treatment (45.5% Much Improved and 45.5% Minimally Improved).

Discussion

Emotion dysregulation (ED) is a growing concern in ASD and, therefore, an area of increasing interest for research and clinical services. The suggested mechanistic role of ED in interfering behaviors and co-occurring mental health conditions in ASD makes ED an important target for treatment. Despite this, few treatments directly target ED in this population. To address this need, we developed Regulating Together (RT), a comprehensive treatment approach that includes both youth and caregiver intervention, each delivered in a group setting. The current within-subjects trial of RT demonstrated high feasibility and acceptability of the treatment, consistent with our previous findings (Shaffer et al., 2018, 2019b, c). We replicated and expanded our initial retrospective results by demonstrating minimal, nonsignificant change during a control period for all participants and statistically significant RT-associated improvement in both emotional reactivity and knowledge of emotion regulation skills, as well as promising outcomes on secondary measures of emotion regulation, flexibility, and global improvement. Importantly, participants showed a notable reduction in psychiatric hospitalizations following RT; this change is clinically meaningful, although it was not statistically significant, potentially because of insufficient power. Overall, in youth with ASD + ED, RT was associated with broad improvements in symptoms of ED. These findings support a randomized controlled trial (RCT) as the next step to further evaluate efficacy.

RT-associated improvement in emotion regulation was demonstrated on caregiver-reported measures, direct child assessment (specifically of emotion regulation knowledge), and independent-rater outcome measures. Specifically, we found caregiver-reported reductions in reactivity on the EDI-R and improved emotion regulation on the BRIEF-2 ERI, as well as youth-reported increases in emotion regulation skills knowledge on the ERST. This suggests not only that youth with ASD showed enhanced knowledge of emotion regulation skills but also that caregivers observed improvements in their youth’s emotion dysregulation at home. Further, improvements in ED were found immediately post-intervention in both age groups. Extending previous studies, we also demonstrated that participants maintained, and continued to build on, their improvements in ED symptoms at both 5 and 10 weeks post-intervention, suggesting that improvements in emotion regulation following RT are stable over time. The mean EDI-R score for the Child group was no longer in the clinically significant range at the 10-week follow-up, and the mean for the Teen group was very close to the clinical cut-off. Together, these findings support RT as a feasible and promising intervention for ASD + ED and indicate that an RCT is needed to determine whether RT is an effective and durable treatment option for this population.

Despite the positive results on the EDI-R, results were more variable on two other measures, the ABC-I and the CGI-I. The ABC-I showed change at all time points, including during the control period. Although this is concerning, it is consistent with previous research that found changes on the ABC-2 in the absence of treatment (Jones et al., 2017). Given this concern, and because the ABC-I measures broader irritability, including self-injury, rather than ED specifically, we consider the EDI-R, a more direct assessment of ED, the most suitable outcome measure for this intervention. Additionally, the CGI-I showed improvements across outcome time points, but CGI-I scores increased at the 5-week post-treatment time point before returning to improvement at 10 weeks post-treatment. One possible explanation is that caregivers and youth stopped using skills consistently during this period, leading to a reemergence of behavior, and then resumed using them, leading to the final improvement. However, we did not obtain direct information from caregivers or youth about use of skills at home at each study visit and thus cannot test this explanation. Still, this finding should inform future data collection and the timing of booster sessions that may benefit families in future studies.

We also found improvements in several areas of flexibility, a key difficulty for youth with ASD. Improvements were seen for both the Child and Teen groups, especially in social flexibility and transitions. Previous studies targeting ED in other psychiatric disorders report improved cognitive flexibility following intervention (Afshari et al., 2020; Painter et al., 2019), highlighting an important link between flexibility and ED. Indeed, in a study conducted as part of this within-subjects trial, we reported improved behavioral performance on a reversal learning task of cognitive flexibility following RT (Schmitt et al., 2021). Although this connection has not yet been fully explored, recent evidence suggests that increased flexibility (or reduced rigidity) is an important factor in building and maintaining emotion regulation skills (Cai et al., 2018a, 2018b; Mazefsky et al., 2013; Mazefsky et al., 2018; Mazefsky et al., 2018; Miyake et al., 2000; Schreiter & Beste, 2020; Schultz & Searleman, 2002). Future work is needed to determine the extent to which these factors moderate primary outcomes in ASD.

Maintenance and Longer-Term Impact of Improvements

Although immediate change is promising, it is also important to demonstrate that improvements are maintained, or continue, after treatment is completed. Although evidence supports many interventions for school-age and adolescent youth with ASD, longer-term follow-up data are less readily available (Weitlauf et al., 2014). The current study addressed this need by collecting follow-up data at 5 and 10 weeks post-treatment. Across all outcome measures (reactivity, irritability, flexibility), the largest effects were observed at 10 weeks post-treatment, suggesting continued improvement even after completion of the intervention. There are several potential explanations for this finding.

First, changes in emotion regulation may be a gradual process that takes time to fully unfold. This is consistent with previous research demonstrating that youth with ASD have difficulty generalizing skills and often require longer periods of time to implement them (de Marchena et al., 2015). For example, the largest effect size for Child and Teen emotion regulation knowledge, as measured by the ERST, was observed at treatment end, whereas caregiver-reported measures showed the largest effect sizes at the 10-week follow-up. Thus, applying newly acquired emotion regulation knowledge to emotional processing and behavior may take longer to manifest. This fits with anecdotal reports from caregivers that youth often know what to do but initially struggle to implement that knowledge. The finding of knowledge gains at post-treatment, followed by further significant improvement in ED, is notable and warrants further investigation in future studies.

Second, caregivers’ ongoing use of coaching strategies may extend the intervention beyond the treatment period, supporting generalization of skills outside the traditional treatment setting and increasing the dosage of treatment beyond the ten formal sessions. This is consistent with previous research on an anxiety intervention for ASD that found significantly more improvement for participants who had a caregiver group than for those who did not or those in a waitlist control group (Sofronoff et al., 2005). Despite anecdotal reports from families that they continued to use skills after the group ended, no concrete data were collected on this point; future studies will need to test this hypothesis directly by measuring caregiver coaching and child use of skills after group completion. Additionally, outcomes may be strengthened by more formal generalization support for caregivers, such as individualized planning sessions after group completion or treatment aids that remind families to use the skills.

It would be remiss not to consider other explanations for the observed changes over time, such as measurement, maturation, or parent expectation effects. In terms of measurement effects, the EDI demonstrates strong test–retest reliability, stability in a non-treatment population, and sensitivity to change, all of which make measurement effects unlikely, particularly given that the Child group mean was no longer in the clinically significant range at the 10-week follow-up (Mazefsky et al., 2016, 2020). During development, EDI scores were found to be stable in a non-treatment group (N = 1333) and sensitive to change in an inpatient group receiving treatment (N = 432; Mazefsky et al., 2018; Mazefsky et al., 2018). In terms of maturation, the general emotion regulation literature suggests that emotion regulation develops in a curvilinear fashion through childhood: it stabilizes between ages 8 and 11 years; adaptive skill use sharply decreases and maladaptive skill use sharply increases between ages 12 and 15 years; and it then improves again between ages 16 and 18 years. Although this research has not been conducted in youth with ASD specifically, a similar pattern, with stability in the child age range and worsening in the adolescent age range, would make it unlikely that the observed changes resulted from maturation alone (Cracco et al., 2017). Finally, with any parent-report measure, parent expectations may affect the results. It will be important for future studies to include more objective and youth-reported measures to guard against this possibility. Ultimately, multiple processes are likely operating simultaneously, and future work is needed to clarify which specific factors contribute to longer-term maintenance of improvements.

Further support for the long-term impact of RT on ASD + ED came from an examination of psychiatric hospitalization rates for the 12 months before and the 12 months after study participation. Before treatment, the psychiatric inpatient hospitalization rate was 24%, compared with only 8% in the year after participation. Because the COVID-19 pandemic overlapped with the 12-month follow-up period, we examined youth who finished the program and had 6 months of follow-up data before the pandemic began and found results very similar to those of the larger sample. Although these results are promising, the changes were not statistically significant, likely because of insufficient power. Even so, this result is clinically important given the significant disruption that inpatient hospitalization causes in the lives of youth and their families, as well as its financial burden on families, insurance companies, and hospitals.

Age-Related Differences

Despite similar curricula and baseline characteristics in the Child and Teen groups, we found different patterns of results across the two age groups. The Child group demonstrated greater changes on our primary outcome measures (emotion regulation knowledge and reactivity), whereas the Teen group demonstrated greater changes on secondary outcome measures (flexibility and executive functioning). However, both groups demonstrated change on the BRIEF-2 ERI and in overall global change on the CGI-I. The different patterns of results between the two age groups may point to a developmental difference in the process of changing ED. Additionally, retention rates differed slightly between the two age groups, with lower retention in the Child groups, although retention was still within an acceptable range. Youth who dropped out of the Child intervention had very low EDI-R scores, suggesting that ED may not have been a primary concern and that the intervention was likely not appropriate for them. Future studies should adapt the inclusion criteria to better reflect the group that was retained.

The younger group may be more prone to changes in emotion regulation simply because of developmental brain maturation, such that younger children show greater malleability (Ahmed et al., 2015; Brown et al., 2012; Raznahan et al., 2011). As noted above, research has shown age-related differences in the use of emotion regulation strategies, with stability in late childhood and a dramatic change in strategies during early adolescence. This developmental shift may have affected our adolescent outcomes. Because the teens showed stability and later improvement, rather than the worsening that could occur from this developmental perspective, the results are still promising. For the children, this stable period may have allowed them to learn and use strategies more easily (Cracco et al., 2017). Additionally, examination of the intervention content for the two age groups showed a greater emphasis on cognitive processes and flexibility in the Teen curriculum, which may partly explain the significant results in these areas. Based on the results of the current trial, future studies should evaluate the two curricula separately, as distinct interventions, and use outcome measures that best reflect treatment-related change.

Anecdotally, group leaders often commented on better caregiver engagement in the Child group and more skepticism about trying skills in both the Teen group and the teen caregiver group. Buy-in for the treatment model may be more difficult to achieve in the teen age range overall. Younger children and their caregivers may be more open to trying new skills in treatment, whereas teens and their caregivers reported feeling that they had already tried many solutions and were less hopeful that anything would work. In addition, caregivers may be disheartened if their child developed skills early and then experienced worsening emotion dysregulation in adolescence. If so, this would support building motivational interviewing techniques into the curriculum, especially in earlier sessions.

Despite concerns about buy-in, teens reported that they were learning during each session and that they were calm overall (a rating of 1 on the 5-point scale) throughout the intervention, with the exception of session four, which focused on distress tolerance. This material was likely triggering for the teens, and additional calming techniques may be needed to help youth stay calm during the session. Finally, teens and their families may have more ingrained patterns of behavior that require more time or more individualized approaches to change. The Teen curriculum may need to be adapted to address these needs, for example by incorporating motivational interviewing techniques and gathering more feedback from families about their experiences. Lower engagement, higher skepticism, and caregiver dissatisfaction with past treatments could all have contributed to less improvement on the outcome measures, and all are areas to examine in future research. Of particular interest is whether caregiver-rated readiness at the beginning of treatment predicts treatment response. Examination of responder characteristics was beyond the scope of this manuscript but will be conducted in future analyses.

Limitations

This study had limitations, the greatest being that it was a within-subjects trial rather than an RCT; however, within-subjects trials are a critical and necessary step in evaluating any intervention and gathering adequate pilot data. Including a control period for all participants strengthened the pre-post design by providing a comparison for every participant, but a gold-standard RCT is the next step in validating RT, further examining its efficacy, and improving outcomes for youth with ASD + ED.

Future studies should include additional outcome measures, such as a quality-of-life measure, to assess whether changes in ED are reflected in the overall quality of life of individuals with ASD. It may also be beneficial to examine internalizing and externalizing symptoms separately to explore whether RT is better at addressing one than the other, although given the connection between the two in ED (Cai et al., 2021), symptom reduction may overlap. Further evaluation of co-occurring diagnoses would also strengthen future studies by assessing whether addressing ED affects specific co-occurring conditions. The reliance primarily on caregiver-report measures to evaluate the intervention is a critical limitation. Future analyses of additional child self-report measures and functional outcome measures, such as rates of emotional outbursts at school and at home, may provide more information about real-life outcomes for youth. Objective, quantitative measures such as autonomic reactivity may also provide a biological, mechanistic means of evaluating treatment response. Finally, the ABC-I may not be the strongest measure for determining inclusion, as it may miss youth whose ED is related to anxiety. Future studies should explore other inclusion measures; one option is the EDI-R.

We included a wide age range of participants to increase access to emotion regulation intervention for youth with ASD. We addressed the wide age range by analyzing the results both for the combined sample and separately by age group. Given the clear overlap of material, a combined sample was deemed appropriate, and the outcome measures suggested improvement in all participants; however, the patterns of improvement differed notably between the two age groups. This supports a plan to split the intervention by age group and examine the two versions as distinct interventions in the future.

Conclusions

Emotion regulation is an important area of research in ASD given its connection to increased rates of challenging behavior, psychopathology, suicidality, and hospitalization. Our within-subjects trial of RT suggests promise for improving ED symptoms as well as related areas of flexibility and overall global improvement. We also found a clinically meaningful reduction in hospitalization rates following RT. The comprehensive nature of this intervention, including the group format, caregiver mediation, and use of a range of evidence-based techniques, is novel in comparison with available interventions for ASD. RT addresses a clear need and a gap in the literature on ASD + ED treatment for children and teens. Future directions include a specific measure of mindfulness to further evaluate youths’ mastery of that skill, more youth-reported measures, and examination of responder characteristics. Given the promising results of this pilot study, we believe the next logical step for RT is an RCT to determine whether the improvements in ED and related factors are specific to our intervention.