Longitudinal data has identified the emergence of persistent emotional distress in children as young as preschoolers. This often takes the form of an internalizing disorder such as depression or anxiety (Tandon et al. 2009). Early prevention and intervention are important as emotional disturbance has been found to worsen over time, with preschool internalizing symptoms predictive of symptoms 8 years later (Mesman and Koot 2001) and sub-syndromal presentation in adolescence predictive of later disorder (Thapar et al. 2012). Longitudinal studies of depression indicate that males have more depression than females at age 11, with females catching up by age 13, and both groups experience a steep increase from age 15 to 18 (Hankin et al. 1998). Findings in developmental neuroscience on brain plasticity in childhood and adolescence suggest that these are opportune times for school-based preventive interventions (Bradshaw et al. 2012).

The prevalence of anxiety and depression among Chinese children is estimated at over 10% (Zgambo et al. 2012), comparable to or higher than for Western samples (Avenevoli et al. 2008; Silverman and Kurtines 2001). Hong Kong children and adolescents aged 6 to 16 rank 6th highest on emotional and behavioral problem scores of 31 countries surveyed (Rescorla et al. 2007). Internalizing problems can interfere with adaptive family, social and school functioning, and increase risks of drug use and suicide (Clarke et al. 2003; Costello et al. 2002; Fanti and Henrich 2010). Suicide is the primary cause of unnatural death among Hong Kong children below age 18 (Child Fatality Review Panel 2015; Zhao 2015). A 2016 report on increasing Hong Kong youth suicides noted that 60% in an approximately 3-year period involved educational adjustment (e.g., school transition) (Education Bureau 2016). Recommendations have included universal preventive programs targeting well-being, including “bridging programs” during primary to secondary school transition (Education Bureau 2016). School programs targeting emotional well-being are increasingly important given youth exposure to 2019 political turmoil in Hong Kong, with its safety risks (Chui 2019; Master 2019) and potential for activist trauma (Matthies-Boon 2017).

According to the cognitive model of psychopathology, emotional distress can be mediated by dysfunctional thinking (Beck 2008). Cognition has the aspects of content and process. Cognitive content refers to representational meanings, while cognitive process refers to variables such as the frequency and duration of thinking. When a thought is focused upon repeatedly or for a prolonged period, this is known as ruminating or obsessing. Dysfunctional cognitive process in depression is labeled “depressive rumination” or “brooding,” in generalized anxiety disorder is called “worry,” and in obsessive compulsive disorder is termed “obsessing.” Ruminating, worrying, and obsessing may exacerbate distress. People with negative cognitive styles tend to engage in brooding rumination which is predictive of depressive symptoms (Lo et al. 2008).

Whereas cognitive therapy for internalizing disorders emphasizes challenging the irrational content of cognitions, mindfulness involves a shift in cognitive process involving a reduction in “overthinking.” Mindfulness, as intentional open awareness of the ongoing stream of sensorimotoric and cognitive experience (including, in some traditions, awareness of awareness itself), is one form of the perennial practice of meditation, defined by Goleman (1976) as “a consistent attempt to reach a specific attention position.” Meditation has over a half century of research support as a psychological intervention (Walsh 1979), including medium to large effect sizes for improvement in emotional and social functioning (Sedlmeier et al. 2012). Siegel et al. (2009) defined “therapeutic mindfulness” as “awareness, of present experience, with acceptance” (p.19). Mindfulness training can facilitate emotional regulation by activating the prefrontal cortex, which mediates reflection, and the limbic system, which mediates emotions, as well as enhance neural connectivity between these regions (Zelazo and Lyons 2012).

The logical place to train children’s minds is in the school. As mindfulness training promotes enhanced awareness of emotions, it may be considered complementary to social and emotional learning (SEL) interventions. A multitier intervention model of public health has been adapted for the US educational system, including interventions targeting mental health emphasizing internalizing problems (Merrell and Gueldner 2010). Universal interventions (or Tier 1) are used for primary prevention and offered to all students, whereas targeted (or Tier 2) and indicated (or Tier 3) interventions vary in intensity for students with early signs of problems or very problematic behaviors, respectively. A review by the Collaborative for Academic, Social, and Emotional Learning (CASEL) of 317 studies encompassing universal, indicated, and after-school SEL programs with 324,303 students identified cognitive, emotional, behavioral, social and academic gains (Payton et al. 2008). A meta-analysis of 213 school-based SEL programs serving students in Grades K through 12 found improved social and emotional skills; emotional and behavioral resilience; and an 11 percentile point increase in academic achievement (Durlak et al. 2011). Both universal and indicated SEL programs have been implemented and researched in Hong Kong schools. Kam et al. (2011) found that first graders improved in emotion regulation and prosocial behaviors after a universal-level classroom-led shortened form of the PATHS program (Greenberg and Kusché 1993). Wong et al. (2014) found a decrease in internalizing problems and hyperactivity when six sessions modified from Strong Kids (Merrell 2010) were conducted as an indicated intervention with first, second and third graders identified with social or emotional difficulties.

Felver et al. (2013) proposed that mindfulness be integrated into existing school psychology service provision based on the above three-tiered educational support model and the Response to Intervention (RTI) approach. At Tier 1, mindfulness is incorporated into an existing SEL framework or offered as a stand-alone prevention practice. Renshaw (2012) proposed a Multitier Mindfulness-Based Intervention Service Model integrating mindful awareness practices (MAPs) with school crisis prevention and intervention. Students learn “to recognize and respond to stressors and stress reactions in more constructive ways” (p. 417), with Tier 1 offered to all students in general education classrooms by their teachers, who lead them in breathing meditation and mindful check-ins.

Although only 8% of the 3350 articles on mindfulness indexed in PsycINFO through November 2015 involved youth, and only 1% were related to youth in educational settings (Felver and Jennings 2016), youth mindfulness research has proven promising (Felver et al. 2016). Mindfulness programs with youth have benefited executive function and self-regulation (Flook et al. 2010). A meta-analysis (Zoogman et al. 2015) found a small to moderate omnibus effect size of del = 0.23, with significantly larger effect sizes for measures of psychological symptoms (del = 0.37) and clinical samples (del = 0.50). A meta-analysis on mindfulness-based interventions in schools (Zenner et al. 2014) found overall small to medium effect sizes for controlled (Hedges’ g = 0.40) and pre-post studies (g = 0.41). High and significant effects were reported for improving cognitive performance (g = 0.80), with small to medium effects for stress and resilience (g = 0.36–0.39).

Little mindfulness research has been conducted with Asian children. To our knowledge only three school-based mindfulness studies have been published in Hong Kong. Lau and Hue (2011) offered a modified after-school MBSR program for non-clinical 14- to 16-year-old youth. From pre- to posttest, the intervention group significantly improved and the control group significantly deteriorated on “personal growth,” one of six well-being domains assessed. The control group also significantly worsened on a combined depression, anxiety and stress score whereas in the mindfulness group there was nonsignificant change in the expected direction. Lam et al. (2015) conducted a modified MBSR program with ninth- and tenth-grade students and noted behavioral problems during sessions, as well as scheduling conflict, low attendance, high dropout, and low student engagement. Some students reported that the program was too passive, slow or repetitive, and even “strange and weird” (p. 382). Lam (2016) administered an after-school, mindfulness-based cognitive intervention to third- through fifth-grade Hong Kong children who were identified with subclinical internalizing difficulties. In single-trial analysis, there were significant decreases in both worry and in symptoms of panic disorder, obsessive compulsive disorder, generalized anxiety and overall internalizing problems. In order to allow more students to benefit from mindfulness practice, service delivery could be improved by offering the programs during regular classes to eliminate scheduling conflict, and accommodating students’ interests by incorporating engaging activities.

Learning to BREATHE (L2B; Broderick and Metz 2009) is a mindfulness-based curriculum for middle and high school students that has shown promise as “a potentially effective universal program to promote the development of key social-emotional learning skills during adolescence” (Metz et al. 2013, p. 269). It integrates themes from Mindfulness-Based Stress Reduction (MBSR; Kabat-Zinn 1990), and is informed by therapies targeting emotion-regulation skills such as Acceptance and Commitment Therapy (ACT; Hayes et al. 2016), Mindfulness-Based Cognitive Therapy (Segal et al. 2013), and Dialectical Behavior Therapy (Linehan 2015). It targets two major SEL domains: self-awareness and self-management (Broderick 2013). It was specifically created as a curricular supplement for classrooms, which allows youth to benefit from training without risk of overloading either their academic or after-school extracurricular schedules. L2B covers 6 themes corresponding with the acronym BREATHE: Body, Reflections (thoughts), Emotions, Attention, Tenderness/Take it like it is, and Habits for a healthy mind, with the final E representing the overall program goal of Empowerment/gaining an inner Edge. Two versions of the program cover the same themes, with six 45-min sessions or eighteen shorter sessions, to accommodate schools’ administrative demands and unique student needs. It also allows for flexibility in adapting the program for an alternative number of sessions to accommodate idiosyncratic time constraints.

Pilot studies have been conducted on L2B in school settings. Broderick and Metz (2009) carried out an L2B study with female high school seniors as part of their health curriculum. Although treatment gains were not significant when compared with high school junior controls, within-group pre-to-post intervention change included improvements in emotional regulation, with increased calmness and relaxation and decreased negative affect, tiredness, aches and pains. Metz et al. (2013) implemented an 18-session L2B program with tenth to twelfth graders during the first 15–25 min of concert-choir classes. As compared with an instruction-as-usual nonrandomized comparison group from a different school, participants experienced higher efficacy in emotion regulation, and less perceived stress and psychosomatic complaints. A small group of seventh- to eighth-grade ethnic minority students with elevated depression showed significant within-group improvement in youth-reported internalizing and parent-reported externalizing problems after twelve 60-min L2B sessions (Fung et al. 2016). Bluth et al. (2016) added restorative yoga to the L2B curriculum for a small group of at-risk ninth through twelfth graders during a special period. Depression significantly increased in the control group and decreased in the intervention group, with a similar trend for anxiety. In a 6-week L2B intervention, Eva and Thayer (2017) reported significant pre- to post-intervention gains on stress items and on a single-item self-esteem indicator in a small predominantly male group aged over 17 and at risk of school failure. In a randomized controlled study with a small group of ninth through twelfth graders, the participants in a six-session L2B program reported significant improvement in anxiety, but not on stress, emotional regulation or mindfulness, as compared with controls (Potek 2011).

While the preliminary data on the L2B program are positive, generalizability is limited by a number of design factors: most of the age groups studied were 9th grade or above; sample sizes were small (20–30) or only female; few were conducted in students’ regular classrooms or as part of the class schedule; significant results were mostly found for within-group pre-post comparisons rather than for treatment versus control groups; and when control groups were used they were not always matched in age or on school attended. It has yet to be shown that L2B is an effective school-based universal program as it was intended to be when led in students’ regular classrooms with a full class size of mixed-gender students, or when compared to a similar-age control group in a similar educational context.

The present controlled trial was designed to extend the mindfulness treatment literature by investigating whether an adapted six-session L2B curriculum would lead to improvement in executive functions, emotion regulation, internalizing difficulties, and ruminative cognitions in seventh graders when conducted in the classroom during the regular class schedule. We hypothesized that experimental-group students would show significantly higher gains from pretest to posttest on all outcome measures when compared with students in the instruction-as-usual (IAU) condition. We also explored their subjective experience of the mindfulness-based activities as well as perceived benefits.

Method

Participants

We recruited a convenience sample of all four seventh-grade classrooms at a Band 3 (the lowest of 3 academic tiers), government-subsidized secondary school in a predominantly low to middle SES neighborhood in Hong Kong. We approached 132 students, of whom 119 (90%) consented to participate. Four were excluded from the analysis due to excessive missing data. The final sample consisted of 115 students (75 male, 40 female; 96 with complete data), aged 11 to 15 years. The mean age was 12.4 years, with 90% aged 12 or 13. In the intervention group, 66% were male vs. 64.5% in the IAU group. All completed the program with no more than two missed sessions.

Procedures

With the school’s permission, we invited parents/guardians of all seventh graders via letter to participate, obtaining active parent/guardian consent for all participants. In Hong Kong, secondary school students typically attend all lessons in one classroom with the same classmates. Intact classes of students with class sizes of 32–35, rather than individuals, were randomized to either the intervention or control group. The study was conducted during religion/social studies lessons, as the intervention content was theoretically consistent with some lesson topics, such as contemplative practice (e.g., prayer or meditation), spirituality, morality and relationships. To minimize interference with the teaching schedule, given that the religion/social studies schedule was irregular, we asked the teacher-in-charge to randomly assign one academically stronger and one weaker class to each of the conditions. Based on scheduling convenience, the teacher assigned two classes (one academically stronger and one weaker) to receive the intervention (Intervention Group) and left the remaining two classes (one stronger and one weaker) to attend religion/social studies lessons as usual (IAU Group). The teacher was blind to the content of the measures.

The study was conducted during the second term of the school year. The six intervention sessions were scheduled about once a month over five months as double lessons (70 min). Aside from the 2 assessment sessions, students in the IAU group attended religion/social studies lessons as usual (70 min) conducted by the religion/social studies teacher. A simple behavioral management plan awarded book coupons for participation and home practice. The pre- and post-intervention assessments were conducted by a graduate-level research assistant, and only students’ class numbers were used for identification to protect anonymity and reduce response bias.

The program included six 70-min class sessions mainly adapted from the six-session curriculum of the L2B program, with some activities from the eighteen-session curriculum (Broderick 2013). Session content is depicted in Table 1. Each session included a presentation of the lesson theme, activities that facilitate understanding of the lesson theme, and in-class mindfulness practices (Broderick 2013). Each began by reviewing ground rules and a brief mindfulness practice such as attention to sounds or breath awareness. To maximize generalization, students received home practice handouts at the end of each session and could download audio files for guided practice via the school intranet. The program was delivered by a clinical and school psychologist (the first author) who is a Diplomate of the Academy of Cognitive Therapy with training in MBSR and Mindfulness-Based Cognitive Therapy (MBCT), with logistical assistance from a graduate-level research assistant. To maintain consistency in program delivery, the two intervention classes used the same structured session plans, PowerPoint slides, video/audio clips, games, activities, and handouts, with the approximately monthly lessons delivered to each class within a 2-week period.

Table 1 Content of Six Intervention Sessions

Adaptation of evidence-based interventions is important in school settings in response to contextual constraints (e.g., time, space, and client cultures) and does not necessarily diminish protocol effectiveness (Long et al. 2015). Developmentally tailored accommodations of mindfulness-based youth interventions, such as using multiple sensory modalities and metaphors, are recommended over simply creating “child friendly” adaptations of adult materials (Felver et al. 2013). Minor adaptations due to classroom space and time constraints included a shortened in-seat body scan. Much effort was made to engage students by modifying some activities with games, videos, local newspaper clips, and examples derived from local Hong Kong materials while maintaining core content. For example, for the activity “A Stressed-Out Case” in the Attention theme, a video of youth confronting various stressors replaced a written story as a prompt to identify stressors. Some cultural adaptations were made as well. Prior research in Hong Kong (Lam 2016) had found that some students feel resistant or embarrassed if asked to close their eyes in class. Accordingly, “Mindfulness of Thoughts” in the Reflection theme, which is usually presented as a closed-eye guided meditation, was replaced by the verbal “Leaves on a Stream” exercise of Acceptance and Commitment Therapy (Harris 2009). This was adapted such that students were invited to observe video clips of leaves floating on a stream while placing each thought that entered their mind on a leaf and letting it float by. The instruction maintained the content of the original guided meditation.

Measures

Outcome variables were assessed pre- and post-intervention in the classes. Chinese versions of the Youth Self-Report (YSR; Achenbach and Rescorla 2001) and Behavior Rating Inventory of Executive Function - Self-Report version (BRIEF-SR; Guy et al. 2004) were obtained from the publisher. The other two measures were back-translated (Brislin 1980) by two bachelor-level bilinguals. Another English-speaking American psychologist verified equivalent meaning. Cronbach’s alpha reliability based on the imputed dataset for each major scale and subscale at pre- and posttest was acceptable (Kline 1999), differing by less than 0.01 from the complete dataset.
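As an illustration of the reliability index reported for each scale below, the following minimal sketch computes Cronbach’s alpha from item-level responses. The function name and the response matrix are hypothetical and are not study data; the sketch uses the standard formula based on item variances and total-score variance, and is not a reproduction of the SPSS procedure used in the study.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha; rows holds one list of item scores per respondent."""
    k = len(rows[0])                                    # number of items
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    total_var = variance([sum(r) for r in rows])        # variance of scale totals
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 students x 4 items on a 1-5 scale (not study data)
responses = [
    [1, 2, 1, 2],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [4, 5, 4, 4],
    [5, 4, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Values closer to 1 indicate that the items vary together, which is the sense in which the alphas reported for each scale and subscale are read as acceptable.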

Perceived Stress

A single-item measure of perceived stress level developed by the program developer (Dr. Broderick) was back-translated to evaluate effectiveness of the L2B program (Metz et al. 2013). In response to the instruction “sometimes people feel really stressed out and sometimes they don’t feel stressed out,” students were asked to rate how stressed they had been feeling the preceding week on a scale of 1 (no stress) to 10 (a lot of stress).

Emotion Regulation

The Difficulties in Emotion Regulation Scale (DERS; Gratz and Roemer 2004) is a 36-item comprehensive measure assessing emotion regulation ability across six domains. Good psychometric properties have been reported for adolescents (Weinberg and Klonsky 2009) and college students (Gratz and Roemer 2004; Rugancı and Gençöz 2010). It has been validated (Li et al. 2018) and used with Chinese adolescents (Yu et al. 2013) and adults (Liu et al. 2017). The internal consistency (Cronbach’s α) of a 30-item version used with Hong Kong youth was 0.96 (Mo et al. 2018). Given that the items measuring lack of emotional awareness were questionable (Van Lissa et al. 2017), this study adopted a revised 30-item version with five factors which was validated (Bardeen et al. 2012) and supported by Chinese samples (Li et al. 2018). In addition to a total DERS score (30 items) [α = 0.93 at pretest, α = 0.92 at posttest], there are five DERS factor scores: Nonacceptance of emotional response (6 items) [α = 0.71 at pretest, α = 0.74 at posttest], difficulties in engaging in goal-directed activity (5 items) [α = 0.83 at pretest, α = 0.79 at posttest], impulse control difficulties (6 items) [α = 0.84 at pretest, α = 0.84 at posttest], limited access to emotion regulation strategies (8 items) [α = 0.80 at pretest, α = 0.78 at posttest], and lack of emotional clarity (5 items) [α = 0.67 at pretest, α = 0.70 at posttest]. Items were measured on a 5-point Likert scale from 1 (almost never) to 5 (almost always). Ratings on the items were averaged, with higher scores indicating increased emotional dysregulation.

Rumination

The Ruminative Responses Scale (RRS; Nolen-Hoeksema and Morrow 1991) was used to measure the tendency to ruminate in response to depressed mood. The present study used a revised 10-item version [α = 0.74 at pretest, α = 0.72 at posttest] reflecting 2 distinct factors, reflective pondering (e.g., how often they “Write down what you are thinking and analyze it”) and brooding (e.g., how often they “Think about a recent situation, wishing it had gone better”). The 10-item version has shown satisfactory psychometric properties when used with Hong Kong college students and clinical samples (Lo et al. 2008) and has been validated for Chinese high school students (Yang et al. 2009). Items were measured on a 4-point Likert scale with higher averaged scores indicating a more ruminative response style.

Internalizing and Attention Problems

The Youth Self-Report (YSR; Achenbach and Rescorla 2001) is a widely-researched self-report instrument assessing emotional and behavioral problems in 11- to 18-year-old youth. The eight-syndrome structure has been empirically supported in Hong Kong and 23 societies, with reliability and validity of the Chinese version established for use with Hong Kong youth (Ivanova et al. 2007). Respondents are asked to rate the frequency of symptoms on a 3-point Likert scale (0 = absent, 1 = occurs sometimes, 2 = occurs often). The present study measured internalizing and attention problems by using the 9-item Attention Syndrome subscale [α = 0.76 at pretest, α = 0.81 at posttest], 9-item Somatic Complaints Syndrome subscale [α = 0.68 at pretest, α = 0.72 at posttest], and 14-item Anxious/Depressed Syndrome subscale [α = 0.80 at pretest, α = 0.82 at posttest]. Two items on the Anxious/Depressed Syndrome subscale relating to suicidal ideation or attempt were omitted. A total YSR score was derived based on the sum of the three syndromes (31 items) [α = 0.88 at pretest, α = 0.88 at posttest]. Item scores were averaged, with higher scores indicating higher psychopathology.

Executive Functions

The Behavior Rating Inventory of Executive Function - Self-Report version (BRIEF-SR; Guy et al. 2004) is a standardized neuropsychological measure assessing 11- to 18-year-old adolescents’ views of their own purposeful, goal-directed, problem-solving behavior via eight clinical scales related to problems with organization, planning, and attention. Due to class period time constraints, four scales (37 items) of primary interest were chosen for the present study and used to generate a total score [α = 0.89 at pretest, α = 0.91 at posttest]. The 10-item Shift scale measures one’s ability to transition among situations and activities, use flexibility in problem solving, and shift attention [α = 0.72 at pretest, α = 0.68 at posttest]. The 10-item Emotional Control scale measures mood stability and emotional modulation [α = 0.85 at pretest, α = 0.86 at posttest]. The 5-item Monitor scale measures self-monitoring such as checking work and behavioral awareness [α = 0.66 at pretest, α = 0.69 at posttest]. The 12-item Working Memory scale measures the ability to hold information in the mind to follow instructions and sustain attention [α = 0.72 at pretest, α = 0.76 at posttest]. The latter two scales are components of the 5-scale Metacognition Index which taps cognitive self-management of tasks and self-monitoring of performance (Guy et al. 2004). Respondents rate their experience in the past 6 months on a 3-point scale (“Never” = 1, “Sometimes” = 2, and “Often” = 3). Higher scores indicate more impairment of executive control functions. The Shift and Emotional Control scales are components of the 3-scale Behavioral Regulation Index which has been renamed the Emotion Regulation Index in a revision (BRIEF2; Gioia et al. 2015). The BRIEF parent form demonstrated satisfactory reliability and validity, and has been widely used as a clinical tool in Hong Kong (Zhang et al. 2017) to assess executive functioning in children.

Process Evaluation of Acceptability, Benefits and Utility

We employed a two-part participant survey to evaluate students’ overall experience. The first part followed the format of a survey developed by the program developer (Dr. Broderick) to evaluate L2B’s acceptability and perceived social validity (Metz et al. 2013). It contained 13 closed-ended items on perceived benefits of the program and its major components (e.g., mindful eating/breathing; body scan). Students rated each item from 1 (not useful) to 10 (very useful). Another 11 items were included to evaluate the frequency of homework practice (e.g., mindful eating/breathing; stretching) throughout the program by asking students to choose 1 (0 times), 2 (1–2 times), 3 (3–6 times), 4 (once a week), or 5 (more than once per week). The mean amount of practice was derived by averaging the ratings across nine specific skills taught and included in homework handouts. The second part was adapted from mindfulness research with children (Semple and Lee 2011) and has been used in a local school-based mindfulness study (Lam 2016). It consisted of 9 closed-ended items scored on a 5-point Likert scale from 1 (strongly disagree) to 5 (strongly agree), such as “This group has been helpful to my school life” and “I feel better able to handle my emotions since I participated in the group.”

Data Analyses

This study employed a quasi-experimental control group pretest-posttest design to examine intervention effects. The primary intervention outcomes were stress, emotion regulation (DERS), rumination (RRS), executive functions (EF), and internalizing and attention problems (YSR). A one-way multivariate analysis of variance (one-way MANOVA) was performed to determine whether there were statistical differences between the intervention and comparison groups on the pretest measures at baseline. Since the primary interest of this study was to examine whether the groups, on average, differed in change from pretest to posttest, mean gain scores were computed (pretest minus posttest) and treated as dependent variables in one-way MANOVA to examine intervention effects. Conducting an analysis of gain scores in a pretest-posttest control group design has been recommended (Dimitrov and Rumrill 2003) as it is more powerful than ANCOVA for small studies (Oakes and Feldman 2001) and is identical to the mixed factorial ANOVA or ANCOVA (Anderson et al. 1980). Improvement from pretest to posttest (a higher problem score minus a lower score) is indicated by a positive change score, with deterioration from pretest to posttest (a lower problem score minus a higher score) indicated by a negative change score.
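The sign convention for gain scores described above can be sketched as follows. The scores and variable names are hypothetical and are not study data; the sketch shows only how pretest-minus-posttest differences are formed before they enter the group comparison.

```python
from statistics import mean

def gain_scores(pre, post):
    """Pretest minus posttest; a positive value means a lower problem score at posttest."""
    return [a - b for a, b in zip(pre, post)]

# Hypothetical averaged problem scores (not study data)
intervention_pre, intervention_post = [2.5, 3.0, 2.0, 3.5], [2.0, 2.5, 2.0, 3.0]
iau_pre, iau_post = [2.5, 3.0, 2.5, 3.0], [2.5, 3.5, 2.5, 3.0]

intervention_gain = gain_scores(intervention_pre, intervention_post)
iau_gain = gain_scores(iau_pre, iau_post)

# A positive mean gain indicates improvement, a negative one deterioration
print(mean(intervention_gain), mean(iau_gain))
```

In this made-up example the intervention group has a positive mean gain (improvement) and the IAU group a negative one (deterioration).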

Pillai’s Trace was used for MANOVA (Field 2009). For follow-up univariate ANOVAs, Brown-Forsythe F and Welch F were used when the assumption of homogeneity of variance as indicated by Levene’s test was violated (Field 2009). The Sidak correction was used for multiple comparisons due to concern about loss of power (Field 2009). Spearman’s correlation was performed to test whether homework practice was associated with intervention effects. The intervention effect size was reported as partial eta squared (ηp2), which for a one-way between-subjects ANOVA is the same as eta squared (Bakeman 2005). Cohen’s (1988) guidelines (small = 0.01; medium = 0.06; large = 0.14) were used to interpret eta squared for ANOVA (Gray and Kinnear 2012; Richardson 2011).
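To make the effect size and multiple-comparison conventions above concrete, the sketch below computes eta squared for a one-way between-subjects design, labels it using the Cohen (1988) benchmarks cited in the text, and derives a Sidak-adjusted per-comparison alpha. The function names, the example data, and the choice of five comparisons are assumptions for illustration only; the source does not state how many comparisons made up each family.

```python
from statistics import mean

def eta_squared(groups):
    """SS_between / SS_total for a one-way between-subjects design."""
    scores = [x for g in groups for x in g]
    grand = mean(scores)
    ss_total = sum((x - grand) ** 2 for x in scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    return ss_between / ss_total

def cohen_label(eta2):
    """Benchmarks cited in the text: small = 0.01, medium = 0.06, large = 0.14."""
    if eta2 >= 0.14:
        return "large"
    if eta2 >= 0.06:
        return "medium"
    if eta2 >= 0.01:
        return "small"
    return "below small"   # hypothetical label for values under 0.01

def sidak_alpha(alpha, m):
    """Per-comparison alpha keeping the familywise rate at alpha over m independent tests."""
    return 1 - (1 - alpha) ** (1 / m)

# Hypothetical gain scores for two groups (not study data)
example = [[1, 2, 3], [4, 5, 6]]
print(round(eta_squared(example), 3), cohen_label(eta_squared(example)))
print(round(sidak_alpha(0.05, 5), 4))   # m = 5 is an assumed family size
```

With only two groups, eta squared and partial eta squared coincide because there is a single between-subjects effect, which is the point made in the text.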

Missing Data

Multiple imputation methods in SPSS Statistics 23.0 (IBM Corp. 2015) via MCMC algorithm were used to handle missing data because of their advantages over the listwise method (van Ginkel and Kroonenberg 2014). Multiple imputation is the benchmark for other missing data methods (van Buuren 2012), and is increasingly used in healthcare (Houchens 2015). All variables were included in the imputation model and five imputed datasets were created (Schafer and Olsen 1998). The pooled results take into account variation across imputations. The primary analysis was based on imputed datasets, but results of complete-case analysis (i.e., of the complete dataset) with pairwise deletion were also reported (Manly and Wells 2015) when there were inconsistencies in statistical interpretations. SPSS 23.0 does not support pooling of MANOVA and ANOVA results; there are no explicit rules for pooling F-tests, and even those proposed are not easy to follow (Manly and Wells 2015; van Ginkel and Kroonenberg 2014). To solve this problem, this study derived a pooled p value by using Finch’s (2016) Z and T methods, which combine p values from MANOVA conducted with multiple imputed datasets. The median and range of the relevant statistics were presented for the imputed datasets (Manly and Wells 2015).
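The idea that pooled results reflect variation across imputations can be illustrated with Rubin’s rules for a single scalar estimate, alongside the median-and-range summary described above. This is a hedged sketch only: it is not Finch’s (2016) Z or T method for combining p values, whose details are not given here, and the function names and numbers are hypothetical.

```python
from statistics import mean, median, variance

def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate and total variance over m imputed datasets."""
    m = len(estimates)
    pooled = mean(estimates)          # average of the per-dataset estimates
    within = mean(variances)          # average sampling variance
    between = variance(estimates)     # variance of the estimates across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total

def median_and_range(values):
    """Summary used when a statistic (e.g., F) has no agreed pooling rule."""
    return median(values), min(values), max(values)

# Hypothetical results from five imputed datasets (not study data)
mean_gains = [0.20, 0.22, 0.19, 0.21, 0.23]
sampling_vars = [0.010, 0.011, 0.010, 0.012, 0.011]
f_values = [3.11, 3.13, 3.15, 3.16, 3.18]

print(pool_rubin(mean_gains, sampling_vars))
print(median_and_range(f_values))
```

The between-imputation term is what makes the pooled variance larger than the average within-dataset variance, so uncertainty due to the missing values is carried into the pooled result.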

Results

Preliminary Analyses

Screening of outliers and missing values indicated that in the final sample (n = 115), only 0.30% of the total data points/values for inferential testing (or group comparison) were missing. No item had missing data greater than 3.5%. Little’s MCAR test was not significant, which is consistent with the data being missing completely at random and indicates that complete-case analysis based on listwise or pairwise deletion was appropriate. A dataset for complete-case analysis was created (Complete; n = 96) for analyzing pre- and posttest differences, including only participants with no missing data on any of the 5 outcome variables (stress, emotion regulation, rumination, executive functions, and internalizing and attention problems). The 19 students excluded from the complete-case analysis did not differ from others in age or on scales or subscales which they completed without missing data, including rumination, executive functions, and the DERS subscales of nonacceptance of emotional response, difficulties engaging in goal-directed activity, and lack of emotional clarity. Analyses were not performed on other subscales with missing data.

To establish that the groups were equivalent prior to the intervention, their pretest conditions at baseline were compared. The intervention and IAU children did not differ in gender, χ² (1) = 0.03, ns, or age, t (113) = 0.74, ns. The groups also did not differ in current or past 6-month practice of yoga, t (113) = 0.23, ns, or meditation, t (113) = − 0.34, ns, though the IAU group reported more frequent meditation than the intervention group. MANOVA was used to examine possible baseline differences between groups on all pretest measures (Stress, DERS, RSS, EF, and YSR). Results indicated that the effect of group across all pretest measures at baseline was non-significant across the imputed datasets, Pillai’s Trace = 0.07, F(5, 109) = 1.53–1.60, p = 0.16–0.18. Complete-case analyses also yielded non-significant results on all pretest measures, Pillai’s Trace = 0.08, F(5, 90) = 1.63, p = 0.16, supporting the conclusion that the two groups were comparable in baseline characteristics. No cohort effect was found on any outcome in the complete dataset when tested with systems of equations (Zellner’s method; Kakarantza and Symeonides 2017).

Outcomes

Table 2 details complete and imputed dataset pretest, posttest and change scores for each measure and its subscales by condition. A one-way MANOVA was conducted on between-group differences in change scores, with all cognitive, behavioral, and emotion outcome measures analyzed simultaneously. Using Finch’s (2016) T and Z methods for combining p values across multiple imputed datasets, the pooled p value across the five imputed datasets was 0.01, which suggests a statistically significant overall intervention effect, Pillai’s Trace = 0.13, F(5, 109) = 3.11–3.18, p = 0.01, ηp² = 0.13, with a large effect size. This indicates that the groups differed on the linear composite of stress, emotion regulation, rumination, executive functions, and internalizing and attention problems.
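
The reported effect size can be cross-checked from the F statistic and its degrees of freedom. The sketch below assumes the standard conversion from F to partial eta squared; for a two-group MANOVA, which has a single hypothesis degree of freedom, Pillai's Trace takes the same value. The function name is hypothetical; the F values and degrees of freedom are the ends of the range reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Convert an F statistic to partial eta squared.

    Standard identity: eta_p^2 = (F * df1) / (F * df1 + df2).
    With only two groups, Pillai's Trace equals this quantity.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Lower and upper ends of the reported range, F(5, 109) = 3.11-3.18.
for f_value in (3.11, 3.18):
    eta = partial_eta_squared(f_value, df_effect=5, df_error=109)
    print(f"F(5, 109) = {f_value:.2f} -> partial eta squared = {eta:.3f}")
```

Both ends of the F range give values between 0.12 and 0.13, in line with the reported Pillai's Trace and ηp² of 0.13.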

Table 2 Descriptive Statistics (Means and Standard Deviations) for Outcome Measures by Intervention and IAU Conditions Based on Complete (n = 96) and Imputed Datasets (n = 115)

In separate follow-up univariate ANOVAs on the outcome variables for imputed datasets (see Table 3), there were significant intervention effects on rumination (RSS), median p value = 0.03, and executive functions (EF) with all p values uniformly = 0.002, corresponding with a small and medium effect size for RSS and EF, respectively. For executive functions, the intervention group showed slight improvement at posttest while the IAU group reported increased problems. Although both groups showed an increase in rumination at posttest indicating increased difficulties, the intervention group showed significantly less deterioration than the IAU group. The intervention effect on internalizing and attention problems (YSR) was in the expected direction but not significant, median p value = 0.065, with the intervention group improving in functioning and the control group deteriorating. The results in the complete case analysis were similar, with the univariate ANOVAs yielding significant intervention effects for internalizing and attention problems (YSR), F(1, 94) = 5.28, p = 0.02, with a small effect size, but not for rumination.

Table 3 Univariate Analysis of Variance Results for Intervention Versus IAUa Change Scores on Each Measure for Complete (n = 96) and Imputed Datasets (n = 115)

Given that there was a significant treatment effect on executive functions (for both the imputed and complete datasets), rumination (for the imputed dataset) and internalizing and attention problems (for the complete dataset), it was of interest to explore whether there were significant group differences on all or a subset of the dimensions of executive functions, rumination and internalizing and attention problems. Three separate MANOVA tests were therefore conducted on the three measures.

Across the five imputed datasets, the overall MANOVA test on the four subscales of executive functions (EF), namely Shift, Emotional Control, Monitor, and Working Memory, was significant, Pillai’s Trace = 0.11, F(4, 110) = 3.32–3.49, p = 0.010–0.013, ηp² = 0.11. The median p value was 0.011, which was consistent with the pooled p value (Finch 2016) of 0.011. As seen in Table 3, follow-up univariate ANOVAs indicated that for the imputed datasets there were significant intervention effects on Emotional Control, with a median p value of 0.004; Monitor, with a median p value of 0.010; and Working Memory, with a median p value of 0.030, with medium effect sizes for Emotional Control and Monitor and a small effect size for Working Memory. The intervention effect on Shift was not significant. Across the five imputed datasets, the overall MANOVA test on the two factors of rumination (RSS), namely reflective pondering and brooding, was not significant (pooled p value = 0.08), Pillai’s Trace = 0.045, F(2, 112) = 2.58–2.67, p = 0.074–0.081, ηp² = 0.05. The nonsignificant trend was, however, in the expected direction, such that the increase in rumination in the control group was greater than that in the intervention group. For the five imputed datasets, a nonsignificant result (pooled p value = 0.12) was found for the overall MANOVA test on the three syndrome subscales of YSR, namely Attention, Anxious/Depressed, and Somatic. For the complete-case analysis, however, the overall MANOVA test on the three YSR subscales was significant, Pillai’s Trace = 0.103, F(3, 92) = 3.51, p = 0.018, ηp² = 0.10. Follow-up univariate ANOVAs showed significant differences on the Anxious/Depressed subscale, with a medium effect size. As shown in Table 3, this was consistent with the imputed data analysis, which also yielded a significant intervention effect for the Anxious/Depressed subscale.

As illustrated in Tables 2 and 3, the significant between-group differences on change scores were such that the L2B group improved on three of the executive functions subscales (Emotional Control, Monitor and Working Memory) and on the Anxious/Depressed subscale whereas the IAU group deteriorated. This reflects a pattern of deterioration in the IAU group in executive functions, rumination, and internalizing and attention problems. To further examine the issue of deterioration, secondary analyses were conducted to examine pre- to post-intervention change for each condition. Paired t tests on the imputed datasets yielded significant pre- and posttest deterioration in the IAU group on the following: rumination (t = − 3.93, p < 0.001) and its subscales of brooding (t = − 3.34, p = 0.001) and reflective pondering (t = − 2.13, p = 0.03); and executive functions (t = − 3.90, p < 0.001) and its subscales of Shift (t = − 2.28, p = 0.02), Emotional Control (t = − 3.71, p < 0.001), Monitor (t = − 3.61, p < 0.001), and Working Memory (t = − 2.06, p = 0.04). The complete-case analysis reached the same conclusion except that the deterioration in reflective pondering for the IAU group was not significant (t = − 1.42, p = 0.16) and deterioration on the Anxious/Depressed syndrome subscale was marginally significant (t = − 2.01, p = 0.05). As expected, the L2B group did not show any significant deterioration except on the DERS subscale on impulse control difficulties (t = − 2.43, p = 0.02), and instead showed significant improvement on the DERS subscale on lack of emotional clarity (t = 2.39, p = 0.02). The complete-case analysis also showed a significant improvement for the L2B group on the Anxious/Depressed syndrome subscale (t = 2.28, p = 0.03). Figure 1 shows subscales for which there were significant pre- to posttest deteriorations in the IAU group but not in the L2B group using both imputed and complete-case datasets. 
Mean change scores by condition for variables with significant between-group differences and/or significant pre- to posttest deterioration within the IAU group can be found in the online Supplement (Figure S1).
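
The within-group tests above can be illustrated with a minimal paired t computation. The sketch assumes that change scores are defined as pretest minus posttest on problem scales where higher scores mean more difficulties, so that a negative t value indicates deterioration, matching the sign of the statistics reported above. The function name and the scores are hypothetical and are not the study's data; the p value, which requires the t distribution, is not computed here.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Return (t, df) for paired scores, with change = pre - post.

    Assumption: higher scores mean more problems, so a negative
    change (and a negative t) indicates deterioration.
    """
    changes = [a - b for a, b in zip(pre, post)]
    n = len(changes)
    standard_error = stdev(changes) / sqrt(n)
    return mean(changes) / standard_error, n - 1


# Hypothetical problem-scale scores for six students (illustration only).
pre = [10, 12, 9, 14, 11, 13]
post = [12, 13, 11, 15, 11, 16]

t_value, df = paired_t(pre, post)
print(f"t({df}) = {t_value:.2f}")
```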

Fig. 1

Mean change scores from pre- to posttest by condition for variables with significant between-group differences and/or significant pre- to posttest deterioration within the IAU group. I (Imputed) = intervention condition, imputed data; I (Complete) = intervention condition, complete-case analysis. IAU (Imputed) = instruction as usual condition, imputed data; IAU (Complete) = instruction as usual condition, complete-case analysis. I (Imputed): (n = 53); IAU (Imputed): (n = 62). Only means were available for the imputed data which were pooled from five imputed datasets (n = 115). I (Complete): (n = 45); IAU (Complete): (n = 51). The complete-case analysis was based on the complete dataset for the 5 major outcome variables (n = 96). RSS-Ponder, reflective pondering; RSS-Brood, brooding; EF-Shift, shift attention; EF-Emotion, emotional control; EF-Monitor, self-monitoring; EF-Memory, working memory; YSR-Anx/Dep, Anxious/Depressed Syndrome subscale. An asterisk indicates significant between-group differences and significant within-group pre- to posttest deterioration in IAU (in both the imputed and complete-case analyses). An asterisk with a number sign indicates significant between-group differences but no significant within-group deterioration in IAU (in both the imputed and complete-case analyses). Positive change scores indicate improvement and negative change scores indicate deterioration.

Process Evaluation

Qualitative process evaluation of the program and activities indicated marginally positive overall usefulness (M = 5.77, SD = 2.72) and satisfaction ratings (M = 6.62, SD = 2.50), and mean activity ratings ranging from 4.58 (SD = 2.25) to 5.70 (SD = 2.71) on a scale of 1 (not useful/not satisfied) to 10 (extremely useful/extremely satisfied). The highest activity rating was for “gratitude” (M = 5.70, SD = 2.71) and the lowest was for the body scan (M = 4.58, SD = 2.25). Process evaluation showed that 52.8% of participants subjectively perceived the program as useful (a rating of 6 or above on a 10-point scale), with 66% satisfied with the program (a rating of 6 or above). Approximately 15% (8 students) indicated dissatisfaction with the program (a rating below 5). Upon completion of the program, approximately 40–45% of participants agreed or strongly agreed that they had improved in managing emotions, interpersonal relationships, patience, and attention control, and 30–40% found the program helpful for their school or family life. When asked whether the program should be offered in secondary school, nearly half (41.5%) of the students agreed or strongly agreed, with another 45% not indicating a preference.

Spearman correlations between change scores in outcome variables and the mean amount of homework practice were nonsignificant. Eighty percent of students reported that they practiced the learned skills at least once or twice during the program (indicated by a rating of at least 2). Between 45.3 and 75.5% of the students in the intervention group practiced the following at least once or twice during the program: awareness of emotions or stress (75.5%); kindness to self (71.7%); breathing exercise (69.8%); gratitude (64%); awareness of cognitions (64%); stretching (60.4%); attention on senses (60.4%); observing emotions (57.7%); and the body scan (45.3%). The percentages of those who practiced specific skills at least once per week throughout the program were as follows: observing emotions (21.1%), gratitude (17%), breathing exercise (17%), kindness to self (15.1%), attention to senses (11.4%), body scan (9.4%), awareness of cognitions (8%), and stretching (7.6%). More than 73.6% reported that they applied what they learnt to handling difficulties at least once.
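
For readers unfamiliar with the statistic, a Spearman correlation is a Pearson correlation computed on ranks, with tied values sharing their average rank. The following sketch shows that computation under the assumption that each student has one practice rating and one change score; the function names and the data are hypothetical and are not the study's data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    covariance = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    spread_x = sum((a - mx) ** 2 for a in rx)
    spread_y = sum((b - my) ** 2 for b in ry)
    return covariance / (spread_x * spread_y) ** 0.5


# Hypothetical practice ratings and change scores (illustration only).
practice = [1, 2, 2, 3, 1, 4]
change = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
print(f"rho = {spearman_rho(practice, change):.2f}")
```

A rank-based coefficient is a reasonable choice for ordinal practice ratings, because it depends only on the ordering of the ratings and not on the spacing between scale points.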

Discussion

This controlled-trial study extended empirical support of the L2B program to seventh-grade youth who had transitioned at the beginning of the academic year from primary to secondary school. The intervention group received a 6-session adapted form of the program in a regular classroom as part of the curriculum and experienced superior improvements as compared with an instruction-as-usual control in a linear composite of executive functioning, a cognitive process/style (rumination) associated with emotional distress, internalizing problems (depression and anxiety) and attention difficulty. Most between-group differences reflected improvement in the intervention group on variables on which the control group deteriorated. Results from the imputed datasets were similar to those from complete-data analysis (using listwise deletion).

The overall between-group differences in pre- to posttest change scores on all outcome measures when analyzed simultaneously were characterized by medium to large effect sizes. Analysis of between-group differences in change scores on specific subscales revealed significant differences favoring the L2B group and characterized by (1) medium effect sizes for the executive functioning components of emotional control and self-monitoring, with a small effect size for working memory; (2) a small effect size for the internalizing problem component reflecting anxiety/depression; and (3) a small effect size for the rumination component of brooding. The effect sizes in the present study were consistent with the small and medium effect sizes identified by meta-analytical studies of mindfulness-based interventions with youth under 18 years of age (Zenner et al. 2014; Zoogman et al. 2015), and larger than for 11- to 14-year-olds in a meta-analysis of school-based mindfulness programs that found no significant effects on mental health or well-being (Carsley et al. 2018).

The significant between-group differences on executive functioning and anxiety/depression corresponded with pre- to posttest improvement on these variables in the intervention group and deterioration in the control group. While both groups manifested increased rumination in the form of brooding at posttest, the intervention group did not experience as much of an increase in this maladaptive cognitive process as did the control group. There was therefore a pattern in which adolescents in the L2B group either improved or remained at approximately pretest levels, while the IAU group deteriorated on all outcome variables for which significant between-group differences were found, as well as on all subscales of rumination and executive functions. Lau and Hue (2011) similarly found that Hong Kong adolescents in a control group significantly worsened on personal growth and on a combined depression, anxiety and stress measure over time, whereas youth in an after-school mindfulness program significantly improved in personal growth and experienced nonsignificant positive change in combined depression, anxiety and stress. This pattern was also seen in Bluth et al.’s (2016) L2B study in which depression and anxiety scores worsened for youth in the control group over time but improved for the program participants. The data were consistent with previous findings that mindfulness training has a positive preventive impact on the executive functioning of children (Schonert-Reichl et al. 2015), and may foster resilience to the challenges of maturation. This is in keeping with findings in developmental neuroscience on brain plasticity in childhood and adolescence, making these opportune times for school-based preventive interventions (Bradshaw et al. 2012). 
The contribution of control group deterioration to effects observed in this and other SEL programs suggests that, without intervention, youth may deteriorate in functioning due to vast adolescent developmental changes (Blakemore et al. 2010).

From a consumer perspective, the program received some positive feedback from participants. At least half perceived it as helpful and were satisfied. Although approximately half rated it affirmatively on various indicators, approximately one third took a neutral or uncertain stance. The overall low weekly practice rate was consistent with some students’ ambivalence about the program. The low weekly practice rate for awareness of cognitions (8%) warrants greater emphasis in the program, as it is important for the cognitive therapy component (“dealing with troubling thoughts”). As some students may not favor mindfulness practice, caution should be exercised and further adaptations explored when mindfulness training is integrated into Hong Kong curriculums as a universal program.

The minimal between-session practice may have attenuated potential program effectiveness. Practice was not closely monitored or immediately reinforced for logistical reasons. Phone/text reminders were not implemented as some students did not own a phone or were unwilling to be contacted. Training teachers to integrate mindfulness into their curricula may provide opportunities for in-school practice and is a promising trend in research on mindfulness in youth education (Jennings 2016). Positive results were found despite nonsignificant correlations between outcome change scores and mean homework practice. This is inconsistent with findings in the literature that inter-session mindfulness practice is related to intervention effectiveness (Zenner et al. 2014). On the one hand, there may have been a floor effect due to limited variability in home practice. On the other hand, Potek (2011) found that L2B reduced self-reported anxiety in the absence of a practice effect, and a meta-analysis of youth mindfulness interventions similarly found no moderating effect of practice (Zoogman et al. 2015).

At a general theoretical level, the present study contributes to the literature on mindfulness in a number of ways: (1) The present universal school-based approach to intervention during a transitional life event (beginning secondary school) for Hong Kong youth is consistent with the trend toward the use of mindfulness programs targeting increasingly younger populations, and research on age-relevant outcomes. The school, as an environment where youth spend much of their time learning cognitive, academic, emotional, physical and social skills, has become a natural venue for both the practice of and research on youth mindfulness training as a universal (Tier 1), targeted (Tier 2) or indicated (Tier 3) mind-body health intervention. Cultural considerations are also seen as an important area for investigation (Bray and Maykel 2016); (2) Mindfulness research now extends downward in age to preschool (Thierry et al. 2016). It is interesting that a pattern of improvement in executive functioning in the mindfulness group and deterioration in the control group similar to that observed in the present study with early adolescents has been found with students as young as prekindergarten and kindergarten, and that such a pattern was especially characteristic of students learning English as a second language (Thierry et al. 2016). This suggests that mindfulness training may have preventive benefits with regard to age-related and culture-related declines in functioning at different stages of development. Identification of age-specific and culture-related risks of deterioration on cognitive, emotional and academic variables, and the impact of mindfulness training on these, could therefore be an important area for future research. 
As adaptive functioning with children as young as preschoolers aged 2–3 is predictive of later functioning at age 10–11 (Mesman and Koot 2001), preventing decrements at an early developmental stage could potentially reduce the risk of decreases on similar or different types of functioning at a later stage; (3) The consistency of the present findings with the smaller effect sizes observed in the literature for early adolescents as compared with younger and older youth (Carsley et al. 2018) raises the question of reasons for observed age-related differences in the effects of mindfulness training. Proposals include developmental differences in self-concept, neurocognitive maturity and self-awareness relevant to the experience and learning of mindfulness (McKeering and Hwang 2019). It is important to note that the impact of such differences on the learning of mindfulness is not necessarily related in a linear way to age. For example, despite the increased self-awareness and cognitive maturity of early adolescence as compared with late childhood, the substantial developmental changes occurring in early adolescence may account for the smaller effect sizes that have been observed among early adolescent participants in mindfulness programs as compared with youth of other ages (Carsley et al. 2018); (4) This study responded to calls in the youth mindfulness literature for more research on preadolescents, more qualitative feedback from participants regarding their experience of the training (McKeering and Hwang 2019), more information on specific program details and more active and engaging content (Tan 2016). We correspondingly utilized games, activities and videos; participants provided qualitative feedback; and we have specified the details of program content, themes and modifications in this article.

Limitations and Future Research

Since adaptations are warranted to address cultural differences, unique needs of early adolescents, and logistics such as space and time constraints, this study did not strictly follow the L2B manual. The sample was limited to only four classes within one school and the intervention was implemented by one psychologist. While the effect of environment can be ruled out as participants shared a similar school context, there may have been a diffusion of treatment if students discussed experiences. Such an occurrence could, however, lend ecological validity to the findings, as communication among participants would be expected for any school-based intervention.

Entire classrooms were randomized rather than individuals so that the study could be conducted as part of the regular class schedule without interrupting learning, mirroring the intended integration of the program into the school day. In Hong Kong, students in the same grade attend lessons together in one classroom all day. However, a selection bias was possible, because although the teacher was asked to choose at random one strong and one weak class for each condition, her judgment and therefore assignment to conditions was potentially subjective.

The present study was mainly interested in whether the integration of L2B into the existing curriculum would improve outcomes as compared with the existing curriculum. Instruction as usual, taught by the social/religious studies teacher, was selected as the comparison condition to serve as a naturalistic active control group to increase external validity. The intervention group participated in the mindfulness program during the same scheduled lesson time in the same classroom context as the IAU group. This controlled for some aspects of staff attention, time, educational content and psychosocial/spiritual content. There were, however, some differences in delivery, as the social studies and religion classes used as the control condition did not include some types of instructional activities used in the intervention condition (e.g., games and videos). To match the expectation of improvement across conditions, and to minimize the likelihood that students in the intervention group would expect improvement of a clinical nature, the mindfulness program was framed as an enrichment of the regular curriculum rather than as a treatment of problems.

All outcome measures were self-report, and some were based on one item, which lacks psychometric validity, or only on some subscales of the measures (i.e., BRIEF, YSR) due to time constraints. Use of self-report as the sole type of outcome measure may have led to common method variance or bias, with observed results being due to individual differences in participant reactions to self-report measurement. Behavioral measures of executive functions could be used in future studies.

A final issue is who should conduct the intervention. Ideally, more than one trainer should be involved, or a treatment fidelity check should be done using live ratings or ratings based on video recordings. Renshaw (2012) proposed a Multitier Mindfulness-Based Intervention Service Model for crisis prevention in which mindfulness awareness practices are taught by classroom teachers as part of a universal program. Recent research has shown that SEL programs are more effective when implemented by teachers than by outside experts (Durlak et al. 2011), and that mindfulness training conducted by teachers is effective even if they themselves have received only brief mindfulness instruction (Schonert-Reichl and Lawlor 2010). Future research on L2B can explore whether delivery of the program by teachers would facilitate integration of L2B into the school curriculum.