Although universities and colleges are designed to foster the transition from adolescence to productive emerging adulthood, many stressors accompany this move to independence, including academic pressures and financial burdens along with new social networks and lifestyle choices, all of which may lead to stress, anxiety, and depression (Unwin et al., 2013). Mental health disorders have become more common among undergraduates, with a one standard deviation increase in clinical psychopathology scale scores for depression in the United States between 1938 and 2007 (Twenge et al., 2010). A 2021 report including 602 United States college and university counseling centers (N = 185,440) showed steady annual increases in generalized anxiety, social anxiety, depression, suicidal ideation, non-suicidal self-injury, traumatic experiences, and eating concerns in students seeking counseling between 2010 and 2020 (Center for Collegiate Mental Health, 2021).

In the domain of physical health, obesity more than doubled among young adults in the past 30 years (Nelson et al., 2008; Ogden et al., 2016; Sparling, 2007). Physical activity levels have remained low across colleges in the United States, with about half of students below nationally recommended guidelines (Keating et al., 2005; Weinstock, 2010). According to the 2017 National Survey on Drug Use and Health, binge drinking among college students remains high at 34.7% (SAMHSA, 2018). Sleep quality and quantity are also low (Gaultney, 2010). Undergraduates often live independently from their families for the first time, with strong peer effects (Nelson et al., 2008). The transition to college presents novel opportunities to experiment with excessive alcohol, illicit drugs, or sexual relationships, and to establish new health behaviors involving physical activity, diet, and sleep (Heller & Sarmiento, 2016; Luo et al., 2015). Mindfulness training may help foster well-being both at this time in history and at this sensitive time in the young adult life course.

Jon Kabat-Zinn (1990) defined mindfulness as “paying attention, on purpose, in the present moment, non-judgmentally,” with a similar consensus definition elsewhere (Bishop et al., 2004). Mindfulness interventions have been increasingly used to address physical (Levine et al., 2017b; Loucks et al., 2015) and mental health (Goyal et al., 2014; Kuyken et al., 2016). The college undergraduate experience is a potentially important and sensitive period in the life course to participate in mindfulness interventions. Students often attend college to prepare themselves for a successful future, so they are at a particular time in their life course when motivation is high to find effective strategies, both within their careers and elsewhere in their lives. Furthermore, physiologically, the brain is maturing in regions targeted by mindfulness training: The prefrontal cortex develops during adolescence and is involved in self-regulation. The frontal lobe, which handles executive function, attention, and motor coordination, is one of the last areas to mature during early adulthood (Diamond, 2002; Gogtay et al., 2004). By engaging these developing regions, mindfulness interventions may increase connectivity among them, potentially enabling better self-regulation (Hölzel et al., 2011; Tang et al., 2015). Early evidence suggests that mindfulness interventions may improve self-regulation (Hölzel et al., 2011; Loucks et al., 2015; Tang et al., 2015) along with mental and physical health factors. Indeed, numerous universities and colleges around the United States and worldwide are actively incorporating mindfulness interventions to improve student well-being (Rogers & Maytan, 2012).

Mindfulness research shows promising preliminary effects among higher education students, including both undergraduate and graduate students. A systematic review and meta-analysis of 51 randomized controlled trials (RCTs) in undergraduate and graduate students demonstrated that mindfulness programs significantly improved distress, depressive symptoms, and state anxiety symptoms compared to inactive controls; yet, effects of mindfulness programs were weaker when compared to active controls (Dawson et al., 2020). This high-quality meta-analysis included individual studies published through March 2017, but it did not report specifically on undergraduates and it lacked analyses on plausible effect modifiers other than the control group type. A systematic review of studies published up to 2010, comparing the effects of mental health promotion programs in college students, found that mindfulness training was the most effective program at reducing emotional distress, improving social and emotional skills, and enhancing self-perception among this population (Conley et al., 2013). Other reviews echoed these results for populations of students in health-related disciplines such as psychology, nursing, and physical therapy (Chiodelli et al., 2022; McConville et al., 2017; O’Driscoll et al., 2017). Our meta-analysis of mindfulness interventions focused specifically on undergraduate student health. Furthermore, mindfulness interventions may have differential effects based on a variety of factors, such as gender, race, teacher training, in-person versus online delivery of the mindfulness program, amount of assigned mindfulness practice, and baseline clinical symptom severity, which have received limited attention in extant meta-analyses (e.g., Dawson et al., 2020; O’Driscoll et al., 2017). Crane et al. (2017) described a mindfulness-based program (MBP) as an intervention that meets a minimum set of essential elements (e.g., its teacher has appropriate training and commits to ongoing good practice), which can be considered a quality standard for mindfulness interventions and which was applied here. Finally, relevant meta-analyses have lacked statistical power for moderator analyses other than to examine whether the type of control group (active versus inactive) matters, but enough RCTs are now available to enable more complex and better-specified models to be tested in undergraduate students.

Thus, the first objective of this systematic review and meta-analysis was to perform a comprehensive analysis of the effects of mindfulness interventions on physical, mental, and behavioral health outcomes in college undergraduate students. This quest is facilitated by recent developments in meta-analytic statistics, especially the ability to analyze all measured outcomes simultaneously; prior meta-analyses have yet to use this strategy. The second objective was to examine moderators of intervention effects to identify factors that may help improve existing university mindfulness programs and guide the design of new programs.

Method

This review was reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Moher et al., 2009).

Eligibility Criteria

The PICOS (participants, interventions, comparisons, outcomes, and study design) framework was applied to identify relevant studies for the current systematic review. Studies were included if the entire sample was composed of college students with mean age no older than 24 years; this value matches global standards of undergraduate age (OECD Indicators, 2017). If the report did not include mean age, the study was included if it reported that at least 50% of the participants were undergraduate students. Mindfulness interventions were included when mindfulness training was an explicit component of the intervention as defined by the authors or the curriculum guide. The mindfulness intervention needed to address both components of the consensus definition that Bishop et al. (2004) proposed, specifically (a) “…involves the self-regulation of attention so that it is maintained on immediate experience...” and (b) “…involves adopting a particular orientation toward one’s experiences in the present moment, an orientation that is characterized by curiosity, openness, and acceptance” (p. 232). Brief mindfulness induction studies (i.e., those shorter than 60 min) were excluded because the effects of such interventions may be transient, whereas the emphasis in the current review was on longer-term mental and physical health outcomes. Online delivery of mindfulness training was included, and there were no restrictions on delivery method (e.g., synchronous, asynchronous). Outcomes of interest were mental health (e.g., depressive symptoms, anxiety symptoms), physical health (e.g., blood pressure, lipid levels), general health and well-being (e.g., quality of life), and health behaviors (e.g., alcohol use, diet). This review was restricted to RCTs published in peer-reviewed journals or published as theses and dissertations. Any type of control group was eligible and there were no language or publication period restrictions.

Literature Search Strategies

Literature searches were performed in PubMed, EMBASE, APA PsycInfo, CINAHL, the Cochrane Library (Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials (CENTRAL), Cochrane Methodology Register), and ProQuest Dissertations & Theses databases. Reference lists from retrieved articles were further searched to identify additional relevant studies. The most recent full search was implemented in November 2019. The electronic search strategy in PubMed was as follows: (adolescen* OR juvenil* OR youth* OR teen* OR under*age* OR underage* OR undergrad* OR college*[tw] OR “Universities”[Mesh] OR student*[tiab] OR “Students”[Mesh:NoExp] OR “Adolescent”[Mesh] OR “Young Adult”[Mesh]) AND (mindful[tw] OR mindfulness[tw] OR mindfully[tw] OR “mindfulness”[MeSH Terms]) AND (randomized controlled trial[Publication Type] OR (randomized[Title/Abstract] AND controlled[Title/Abstract] AND trial[Title/Abstract]) OR (randomised controlled trial)). This search strategy was adapted slightly for each successive database/host due to the distinct filters in each database. Supplement 1 provides search strategies for other databases. The search strategy was created and implemented with consultation from a medical librarian.

Study Selection and Data Extraction

Two reviewers independently screened the study titles and abstracts using Abstrackr software (Wallace et al., 2012). Disagreement was resolved through consensus, and a senior investigator was consulted to resolve conflicts. Information was extracted independently and in duplicate by two investigators, and all disagreements were resolved in meetings facilitated by a senior investigator.

Study information was extracted using a standardized data extraction form created using a spreadsheet, enabling data entry in each pre-defined category. Extracted data included the following variables: study population characteristics (type of educational degree; enrolled and analyzed sample sizes; age; gender and race/ethnicity distribution; geographic location); details of the intervention setting and delivery method; intervention characteristics (types of mindfulness training practices included in the intervention such as body scan, awareness of breath exercises, walking meditation, and sitting meditation); number and duration of sessions and homework assignments; overall intervention duration (total weeks of intervention, including the time during which participants were explicitly instructed to practice as part of the intervention; hours in mindfulness practice, if in person); provider qualifications (mindfulness-based training qualified/certified/expert; other); adverse effects monitoring and reporting; and type of comparison condition (active or inactive control). Inactive controls included both waitlist-control and no-treatment groups. If the study included an active control group, then the nature of the comparator was also extracted. Outcomes included the type of measured health outcomes, measurement tools, duration of follow-up periods, and data to calculate effect sizes such as means and standard deviations.

Risk of Bias in Individual Studies and Across Studies

Two evaluators used the Cochrane Collaboration’s risk of bias tool for randomized trials to evaluate methodological quality in individual studies at the outcome level (Higgins et al., 2011); coding took place independently and discrepancies were resolved by discussion, and with a senior investigator, as necessary. This tool addresses specific domains that relate to bias, with each study rated as high risk, low risk, or unclear risk. Studies were not excluded based on risk of bias ratings; instead, the ratings were used to evaluate the quality of included primary studies and how these quality ratings may influence outcomes. Funding sources and potential conflicts of interest were also noted in “other biases” (Figure S1 in Supplement 2 offers full study-level details for each risk of bias item).

Quantitative Strategies

Effect Sizes

The outcomes of interest were operationalized with standardized mean difference effect sizes (d); outcomes were coded so that positive ds represent improved outcomes (e.g., reduced symptoms of depression or anxiety, increased mindfulness) in the mindfulness treatment group relative to the control group. Effect sizes for all health outcomes were calculated when studies provided any type of statistical result that allowed effect size calculation (e.g., means and standard deviations, or conversions of equivalent test statistics using the Campbell Collaboration Effect Size Calculator, http://www.campbellcollaboration.org/escalc/html/EffectSizeCalculator-SMD5.php). Before analyses, the d-values were adjusted for small-sample bias (Hedges, 1981). We also calculated d-values for those studies that reported follow-up measurements of 1 month or more.
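The effect size computations described above can be sketched in a few lines; the following Python functions are an illustrative translation (not the review's actual analysis code; function names are our own), covering the standardized mean difference, the Hedges (1981) small-sample correction, and the usual approximate sampling variance:

```python
import math

def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference (treatment minus control),
    using the pooled standard deviation."""
    sp = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    return (m1 - m2) / sp

def hedges_g(d, n1, n2):
    """Hedges' (1981) small-sample bias correction: multiply d by
    J = 1 - 3 / (4*df - 1), where df = n1 + n2 - 2."""
    return d * (1 - 3 / (4 * (n1 + n2 - 2) - 1))

def smd_variance(g, n1, n2):
    """Approximate sampling variance of the corrected SMD."""
    return (n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2))
```

For example, a trial reporting means of 10 versus 8 (SD = 2, n = 20 per arm) yields d = 1.0, which the correction shrinks slightly toward zero.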

Analytic Strategies

Overall effect size estimates and their 95% confidence intervals (CI) were pooled across studies using robust variance estimation (RVE); this strategy allows all possible contrasts to be included in the analysis. RVE controls for the dependence of effect sizes from the same study due to multiple simultaneous measures (Hedges et al., 2010). It was implemented through the robumeta macro (Tanner-Smith & Tipton, 2014) in Stata 17 (StataCorp). RVE was performed between groups (i.e., treatment vs. control) separately for studies with active controls and inactive controls, for (a) an overall comparison of all outcomes, (b) within each of the five outcome categories, (c) within specific outcomes with sufficient data, and (d) in exploratory subgroup analysis. Because RVE moderator tests are recommended only when at least 4 degrees of freedom are available (Tanner-Smith & Tipton, 2014), additional RVE models were conducted as data permitted, by combining active and inactive control groups for the remaining outcomes. Risk behavior was created as a combined outcome, including outcomes related to smoking, substance use, and alcohol. Rho was set to 0.80 for the correlated-effects weight, and sensitivity analyses varied rho from 0 to 1 to ensure consistent results. Significance for moderator tests was set at p<.01, consistent with Tanner-Smith et al. (2016) recommendations for databases with relatively small k. Moderators were left in their observed units except total hours of mindfulness training, which was logarithmically transformed. Heterogeneity for each model is reported using τ2 (either overall, in the case of continuous moderators; or within each category for dichotomous predictors).
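The correlated-effects weighting and sandwich variance behind RVE can be illustrated with a minimal sketch. This is a simplified Python rendition of the Hedges et al. (2010) approach, not the robumeta macro the review actually used; in particular, it assumes the between-study variance τ² is supplied rather than estimated:

```python
def rve_pool(studies, rho=0.80, tau2=0.0):
    """Minimal correlated-effects RVE pooling (after Hedges et al., 2010).
    `studies` is a list of studies, each a list of (effect, variance) pairs;
    all effects within a study share one approximate inverse-variance weight."""
    per_study = []
    for es in studies:
        k = len(es)
        vbar = sum(v for _, v in es) / k
        # Correlated-effects weight: effects within a study are assumed to
        # correlate at `rho`; tau2 is the between-study variance (supplied
        # here, whereas robumeta estimates it from the data).
        w = 1.0 / (k * (vbar * (1 + (k - 1) * rho) + tau2))
        per_study.append((w, es))
    total_w = sum(w * len(es) for w, es in per_study)
    pooled = sum(w * t for w, es in per_study for t, _ in es) / total_w
    # Robust (sandwich) variance: squared weighted residual sums per study,
    # so dependent effects from the same study are handled at the study level.
    var = sum((w * sum(t - pooled for t, _ in es)) ** 2
              for w, es in per_study) / total_w**2
    return pooled, var ** 0.5
```

Because the robust variance aggregates residuals within each study before squaring, a study contributing many correlated outcomes does not spuriously shrink the standard error.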

Moderator Testing

Bivariate moderator analysis was performed on predictor variables, as well as control variables such as demographic and quality features, to ensure that results were consistent across these features. Each moderator was evaluated on a bivariate basis using the same RVE assumptions. Subgroup analysis was conducted to compare the categories of each variable, using RVE. There is no current convention about the number of moderators that may be evaluated simultaneously, but one reasonable procedure is to require at least 10 studies for every moderator added (Lipsey & Wilson, 2001). Therefore, moderators that reached at least marginal significance (p<0.10) were evaluated in simultaneous multiple-moderator models, building a cumulative model by starting with those with the smallest p values and adding those with larger values; each moderator was evaluated in sequence, one at a time, while controlling for the moderators already entered, and moderators whose p values rose to 0.10 or above in the cumulative model were trimmed while the rest were retained. The final models included some moderators of interest that did not attain significance at an overall bivariate level, such as standard error and training type (MBSR/MBCT versus other), allowing the possibility that they might attain significance when controlling for other moderators.
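The cumulative model-building procedure amounts to a forward-selection loop. The following Python sketch is illustrative only: it uses naive model-based standard errors from weighted least squares rather than the RVE-adjusted errors used in the review, and all function and variable names are our own:

```python
import numpy as np
from scipy import stats

def wls_pvalues(y, X, w):
    """Weighted least squares: returns coefficients and two-sided p-values.
    (Naive model-based SEs for illustration; the review used RVE-based SEs.)"""
    Wsq = np.sqrt(w)
    Xw, yw = X * Wsq[:, None], y * Wsq
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    df = len(y) - X.shape[1]
    cov = (resid @ resid / df) * np.linalg.inv(Xw.T @ Xw)
    t = beta / np.sqrt(np.diag(cov))
    return beta, 2 * stats.t.sf(np.abs(t), df)

def forward_select(y, candidates, w, alpha=0.10):
    """Build the cumulative model: enter moderators in order of bivariate
    p-value; retain each only if its coefficient stays below `alpha` once
    entered alongside the moderators already kept."""
    n = len(y)
    order = sorted(candidates, key=lambda name: wls_pvalues(
        y, np.column_stack([np.ones(n), candidates[name]]), w)[1][1])
    kept = []
    for name in order:
        cols = [np.ones(n)] + [candidates[c] for c in kept] + [candidates[name]]
        _, p = wls_pvalues(y, np.column_stack(cols), w)
        if p[-1] < alpha:
            kept.append(name)
    return kept
```

In this sketch, a moderator that looked promising bivariately is dropped as soon as a previously entered moderator absorbs its explanatory power, mirroring the trimming rule described above.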

The moving constant technique was used to show predicted effects (i.e., a mean effect size, \({\hat{d}}_{+}\), and its 95% CIs) at values of interest for continuous moderators (Johnson & Huedo-Medina, 2011); for categorical moderators, mean effect sizes for each category are instead shown. In the multiple moderator versions, contrast codes were used for categorical variables, and continuous variables were centered at their means so that predicted values are adjusted for all moderators in the model. Effect magnitude used standard benchmarks of small=0.20, medium=0.50, and large=0.80 (Cohen, 1988), with the caveat that improvements in the control group render effect size magnitude more conservative; that is, the true amount of improvement is larger to the extent that the control group also improved. The final multiple moderator models were run using RVE for all outcomes combined, then separately for the three specific outcomes with sufficient data (e.g., anxiety symptoms, depressive symptoms, and mindfulness).
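The moving constant technique works by re-centering the moderator so that the regression intercept becomes the predicted mean effect size at the value of interest, with its own confidence interval. A minimal weighted-least-squares sketch (illustrative only, not the review's code):

```python
import numpy as np
from scipy import stats

def predicted_effect(y, x, w, x0, level=0.95):
    """Moving constant technique (Johnson & Huedo-Medina, 2011): center the
    moderator at x0 so the intercept of the weighted regression equals the
    predicted mean effect size at x0, together with its confidence interval."""
    X = np.column_stack([np.ones(len(y)), np.asarray(x, float) - x0])
    Wsq = np.sqrt(np.asarray(w, float))
    Xw, yw = X * Wsq[:, None], np.asarray(y, float) * Wsq
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    df = len(yw) - 2
    cov = (resid @ resid / df) * np.linalg.inv(Xw.T @ Xw)
    se0 = np.sqrt(cov[0, 0])  # SE of the intercept, i.e., of d-hat at x0
    tcrit = stats.t.ppf(0.5 + level / 2, df)
    return beta[0], (beta[0] - tcrit * se0, beta[0] + tcrit * se0)
```

Sliding x0 across the moderator's observed range traces out the predicted effect sizes and their widening confidence bands toward the extremes.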

To inspect for publication bias across studies, funnel plots and statistical analyses were used to assess the possibility of bias stemming from the underrepresentation of small sample size studies with null or negative findings (Borenstein et al., 2009). The funnel plots were contoured, which permits a determination of whether unpublished (thesis) versus published study effects differ in terms of reaching statistical significance (Johnson & Hennessy, 2019); if published effects routinely fail to reach significance, the inference is that there is little concerted pressure selecting for favorable results. Sampling error was used as a predictor of effect sizes (Stanley & Doucouliagos, 2014). Specifically, studies with large standard errors tend to be small sample studies and those with small standard errors are typically large sample size studies; hence, the use of the effect size standard error as a moderator provides a test of small-study bias.
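The standard-error-as-moderator test can be sketched as an Egger-style weighted regression. The following Python function is an illustrative approximation of the Stanley and Doucouliagos (2014) precision-effect test, not the analysis code used in the review:

```python
import numpy as np
from scipy import stats

def small_study_test(effects, ses):
    """Egger-style precision-effect test (after Stanley & Doucouliagos, 2014):
    regress effect sizes on their standard errors with inverse-variance
    weights; a significant positive slope signals small-study bias."""
    y = np.asarray(effects, float)
    se = np.asarray(ses, float)
    X = np.column_stack([np.ones_like(y), se])
    Wsq = 1.0 / se                      # sqrt of inverse-variance weights 1/se^2
    Xw, yw = X * Wsq[:, None], y * Wsq
    beta, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    resid = yw - Xw @ beta
    df = len(y) - 2
    cov = (resid @ resid / df) * np.linalg.inv(Xw.T @ Xw)
    t = beta[1] / np.sqrt(cov[1, 1])
    return beta[1], 2 * stats.t.sf(abs(t), df)   # slope and its p-value
```

When small studies (large standard errors) report systematically larger effects, the slope is positive and significant; the intercept can be read as a bias-adjusted estimate of the effect at infinite precision.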

Transparency and Openness

The study protocol was preregistered (PROSPERO CRD42017052459) with the intention to perform a meta-analysis if there were sufficient numbers of studies. There were enough studies and the full meta-analytic plan was developed subsequently without further preregistration, guided by the pre-registered mental and physical health outcomes.

Results

Study Selection

As Fig. 1 shows, searches yielded 1625 articles. After de-duplication, 990 articles were screened at the abstract level, 542 were excluded, and 448 articles were assessed for eligibility at the full-text level. This review reports on the final eligible set of 57 unique RCTs with 3746 study participants, plus one report that followed up on another included RCT (de Vibe et al., 2018). Therefore, in total, there were 58 reports (Ajilchi et al., 2019; Alsaraireh, 2017; Askari et al., 2018; Astin, 1997; Baker, 2019; Beerse, 2018; Cairncross, 2019; Chen et al., 2013; Danitz & Orsillo, 2014; de la Fuente et al., 2018; Delgado et al., 2010; Delgado-Pastor et al., 2015; Dvořáková et al., 2017; Falsafi, 2016; Fleming et al., 2015; Forman et al., 2016; Galante et al., 2018; Gallego et al., 2014; Greer, 2015; Gross et al., 2018; Gu et al., 2018; Hazlett-Stevens & Oren, 2017; Johnson-Waddell, 2018; Jones et al., 2019; Kang et al., 2009; Kar et al., 2015a, b; Kaviani et al., 2012; Kuhlmann et al., 2016; Levin et al., 2017a; Lyzwinski et al., 2019; Marx, 2016; McClain, 2017; McClintock et al., 2015; McIndoo et al., 2016; McMorran, 2018; Mermelstein & Garske, 2015; Mrazek et al., 2016; Noone & Hogan, 2018; Park, 2014; Ratanasiripong et al., 2015; Shapiro et al., 2008; Shapiro et al., 2011; Shearer et al., 2016; Siembor, 2017; Song & Lindquist, 2015; Ștefan et al., 2018; Symons, 2014; Tang et al., 2007; Tang et al., 2013; Thomas et al., 2016; Yamaji, 2016; Ye, 2017).

Fig. 1

Flow of Records and Studies into the Qualitative and Quantitative Synthesis

Study Characteristics

The majority (k=55, 94.8%) of studies were published in 2009 through 2019, with the most reports appearing in 2015 (k=10, 17.2%), 2016 (k=9, 15.5%), and 2019 (k=9, 15.5%) (see Table 1). Most studies were conducted in North America (k=31, 53.4%) followed by Asia (k=16, 27.6%) and Europe (k=10, 17.2%). Within individual studies, sample sizes ranged from 14 to 612 participants (M=96). As Fig. 2 shows, studies have been appearing with increasing frequency, with thesis (Panel A) and online studies (Panel B) being relatively recent phenomena.

Table 1 Features of included trials
Fig. 2

Temporal trends of studies, overlaying (A) counts of theses in orange, or (B) online studies in turquoise on the total counts (including each type of study). The best-fitting quadratic line for all included studies by year appears in both figures

Of the 3746 total participants, approximately 71% were female (range, 23 to 100%). Mean age ranged from 18 to 24 years (M=20.6). Six studies did not report on age, but at least 50% of participants were undergraduate students. Among trials conducted in North America that reported on race and/or ethnicity (k=26), participants predominantly identified as either White or Caucasian (i.e., mean=70% Caucasian, range: 18 to 91%). The majority (74%) of studies were published journal articles and 15 theses were included in the dataset. All but two theses were conducted in the United States (US). The majority of studies were conducted using the general student population (74%), with 15 studies specifying a clinical or subclinical population (i.e., students with a clinical mental health disorder diagnosis, students with a threshold score on a mental health rating scale to select those at greater risk of mental health symptoms).

There was substantial variability in the format of the mindfulness programs (Table 1). The majority of studies used in-person delivery (72%), while 24% utilized online delivery, and two studies used a combination of in-person and online. Notably, the majority of online studies were conducted in the US (11 out of 15). Intervention length averaged 6.0 weeks (range: 1–12 weeks) for in-person delivery, and 5.2 weeks for online delivery (range: 3–10 weeks). Interventions conducted in the US were approximately 1 week shorter in duration compared to studies conducted elsewhere.

Fifteen studies conducted short-term follow-up assessments (<6 months) and only three studies conducted long-term follow-up assessments (≥6 months) to measure whether the effects of the intervention persisted over time. Across such studies, the duration between post-treatment assessment and follow-up assessment ranged from 1 month to 6 years.

The majority (70%) of studies used an inactive control arm (e.g., waitlist, usual or routine care, no treatment), and 45% of studies used an active control arm, including progressive muscle relaxation, Hatha yoga, self-help skills handouts, nutrition psychoeducation, lecture on stress and coping, inhibitory control training, inhibitory control training combined with mindful decision training, written research assignments, behavioral activation, heart rate variability biofeedback, and animal therapy. Seven studies monitored adverse effects; of these, two reported adverse effects. Specifically, one study reported increased alcohol consumption in the control group after a Quit-Day Retreat intended to help participants stop smoking and drinking alcohol for 1 month (Davis et al., 2013). A second study reported that some (number not specified) students experienced unpleasant emotional, mental, or bodily states during mindfulness practice; these findings were considered by authors to be an expected result of becoming more mindful of inner experiences (de Vibe et al., 2013).

Forty-seven studies satisfied the criteria by Crane et al. (2017) as MBPs; the remaining 11 lacked sustained intensive mindfulness training, included a significant amount of training in a discipline other than mindfulness, lacked participatory learning between the mindfulness instructor and the participants, and/or did not report the instructor credentials. Thirty-eight percent of studies reported that the instructor training was advanced or at an expert level. Most studies (79%) assigned homework (e.g., home meditation practice). Only 5% included a retreat. Fourteen studies used one of the two standardized protocols, Mindfulness-Based Stress Reduction (MBSR) or Mindfulness-Based Cognitive Therapy (MBCT).

Evidence Map

The evidence map in Fig. 3 illustrates how frequently specific health outcomes were reported across all studies. This map identified areas of sizeable research, and gaps in the literature, on mindfulness interventions for college student health (Fig. 3). Health outcomes were categorized as mental health, physical health, general health and well-being, health behaviors, and mechanistic processes. The mental health domain included 16 specific outcomes; of these, the most frequently measured outcomes were anxiety symptoms (k=27), depressive symptoms (k=25), perceived stress (k=11), psychological distress (k=9), and eating-related outcomes (k=6). Physical health included 11 specific outcomes (e.g., blood pressure, lipid levels), but each outcome was only measured in one or two studies. General health and well-being encompassed measures ranging from averages of combination scores of depression, anxiety, and stress to measures that directly gauge quality of life, such as the World Health Organization Quality of Life inventory; there were also items that gauge mental toughness, life satisfaction, and emotional intelligence. Health behaviors included seven specific outcomes; alcohol use (k=4) was measured most frequently. Eight potential mechanistic processes were measured across studies, but the only one that was measured robustly was mindfulness (k=33). Overall, seven specific outcomes were measured in more than five studies; 32 specific outcomes were measured in only a single study. The evidence map shows that mental health outcomes and mindfulness are frequently measured while outcomes related to other aspects of health (e.g., physical, social, spiritual health) or potential mechanisms (e.g., self-regulation, self-compassion) are not included as often. See Tables S2 and S3 in Supplement 2 for lists of included studies for each specific outcome, and study characteristics, respectively.

Fig. 3

Evidence map counting outcomes within general categories of outcomes. Note: Only studies with sufficient statistical information to calculate effect sizes are included. Radius of circles is equal to the log of the total sample size of the study. Abbreviations: ADHD: attention deficit hyperactivity disorder, ERSB: excessive reassurance-seeking behavior, General Health: General health and Well-Being, MID: maladaptive interpersonal dependency, MT: mental toughness, P/B: psychological/behavioral, P/N: positive/negative, PMS: pre-menstrual syndrome, QoL: quality of life, Sx: symptoms

Risk of Bias

As Fig. 4 shows, risk of bias varied across the literature but was generally “low” or “unclear” for the domains of random sequence generation and allocation concealment (Supplement 2, Figure S1 for details on individual studies). The majority of studies had high risk of bias due to lack of blinding for participants, personnel, or both (k=52; 90%), as participants are typically aware when they are doing mindfulness activities. Studies that successfully achieved “low risk” for blinding of participants utilized active controls in such a way that most participants were likely unaware of the proposed effectiveness of their treatment. There was also a high risk of detection bias due to a large reliance on self-report outcomes; blinding of outcome assessors was a concern in 97% of studies (k=56).

Fig. 4

Risk of bias findings for included studies

The majority of studies either had low attrition and/or handled missing data appropriately and were labeled as having low risk of bias (k=35; 60%); nine studies (16%) were rated as unclear and 14 (24%) were labeled as high, primarily due to greater than 20% attrition (k=11; 19%). For selective reporting bias, studies were primarily rated as low (k=30; 52%) or unclear (k=28; 48%). Studies rated as low risk of selective reporting bias were primarily dissertations (k=15) or had a published protocol (k=7). Two studies were rated as high risk of selective reporting bias due to a lack of a published protocol coupled with limited details of study oversight.

Other biases were identified in six (10%) of the studies and included unclear follow-up periods, the use of financial compensation for completing additional intervention tasks, contamination of control groups by engagement with intervention group participants (e.g., being on the same sports team), and assessments taking place in classroom settings where the interventionist was also the PI. Finally, one study received a “high risk” rating for “other” bias because the lead author was a research associate at a company that develops commercialized web-based programs for mental health issues (Levin et al., 2016).

Overall Quantitative Results

Mindfulness Training vs. Active Controls

As Table 2 shows, overall, pooling effects across all health outcomes, mindfulness interventions outperformed active controls, with the 95% CI excluding the null (d=0.21; 95% CI (0.03, 0.40); k=24). Note that these and subsequent analyses use all of the mindfulness trials and are not restricted only to those that satisfied Crane et al.’s (2017) MBP criteria. The 95% confidence interval excluded the null at short-term follow-up time points (<6 months) but not at longer intervals (≥6 months post-intervention), although the effect size was of the same magnitude (Table 2). When evaluating effect sizes by the five outcome categories, only mental health, health behaviors, and mechanistic outcomes had sufficient data for moderator analyses. Further evaluation by specific outcome revealed three specific outcomes (anxiety, depression, and mindfulness) with sufficient effect sizes and studies for analyses. Mindfulness training did not significantly outperform active controls for anxiety symptoms, depressive symptoms, or mindfulness (95% confidence intervals encompassed the null), although effect sizes trended in favor of mindfulness interventions (Table 2).

Table 2 Mean effect sizes for trials investigating mindfulness training with college students, grouped by whether there were active or inactive controls

Mindfulness Training vs. Inactive Controls

When comparing mindfulness interventions to inactive controls and pooling effects across health outcomes, the mindfulness interventions had larger benefits, with an effect size of d=0.47 (95% CI (0.36, 0.59); k=37). This effect was greater for short-term follow-up assessments (d=0.78; k=8), but was smaller for long-term follow-ups (d=0.20; 95% CI (−0.15, 0.55); k=6). Mindfulness training led to significant and consistent effects on mental health (d=0.53, 95% CI (0.43, 0.64)), which was the most assessed outcome category (k=31). Statistical power was lower for physical health, with an effect size of d=0.45 (95% CI (−0.22, 1.11); k=6). Significant effects in favor of mindfulness trainings were observed for general health and well-being, health behaviors, and mechanistic outcomes, with effect sizes ranging from 0.24 to 0.41. For specific outcomes, the largest benefits were seen for anxiety symptoms (d=0.77, 95% CI (0.51, 1.03); k=18), followed by depressive symptoms (d=0.53, 95% CI (0.38, 0.67); k=16). Mindfulness, which was examined as a mechanistic outcome, also showed robust improvements (d=0.41, 95% CI (0.25, 0.56); k=23).

Pooled effects on additional outcomes with insufficient data to be analyzed separately by control type appear in Supplement 2, Table S4. Psychological distress and well-being showed small, significant improvements in the mindfulness intervention arm, but perceived stress did not. The largest effect size was seen for symptoms of attention deficit hyperactivity disorder (ADHD), but the 95% CI encompassed the null (d=0.71, 95% CI (−0.99, 2.41); k=3). All remaining specific-outcome effect sizes had 95% CIs encompassing the null and trended in favor of mindfulness training, except for emotion regulation (k=3) and worry (k=5), which trended in favor of the controls (active and inactive combined).

Models of Effect Size Variability

Bivariate Models of Intervention Efficacy

In bivariate analyses (Table 3), mindfulness interventions had significantly larger benefits on combined outcomes for clinical (versus general) populations, for in-person (versus online) delivery, and for trials reported in journal articles (versus theses). Interventions lasting less than 3 weeks did not lead to significant improvements on average, and interventions of 4 to 7 weeks led to smaller effects than those lasting 8 weeks or longer. Non-significant trends toward larger effects were seen in studies with inactive controls (versus active controls) and in studies with high (versus low) standard error. Other moderators in Table 3 did not reach statistical significance, including risk of bias, type of intervention (MBSR/MBCT versus others), retreat format (versus not), intervention duration, and study location (Asia versus other regions).

Table 3 Bivariate analyses of moderators for effects of mindfulness training

Multiple-Moderator Models of Intervention Efficacy

Although bivariate analyses indicated several possible effect modifiers, these findings may have been driven by covarying effect modifiers. For example, all studies in clinical populations were delivered in person; none was delivered online. Consequently, the effect of in-person versus online delivery may be confounded with the clinical severity of participants at baseline. Multiple-moderator analyses adjust for such potential confounders. These models revealed that four factors achieved at least marginal significance as predictors of overall effects on combined outcomes, all in patterns parallel to the bivariate models: population (i.e., clinical versus non-clinical), control group type (i.e., active versus inactive), publication status (i.e., published versus thesis), and standard error (i.e., small study effects) (Table 4, Model 1, Column 1). Specifically, on average, when controlling for other moderators, in-person interventions worked as well as their online counterparts. Similarly, MBSR/MBCT forms of mindfulness intervention worked as well as other forms. Furthermore, for combined outcomes, the length of the intervention was not a significant moderator.

Table 4 Summary of multiple-moderator model results for combined outcomes and for three specific outcomes

Moderator analyses to this point did not distinguish among particular outcomes. Yet, as a post hoc result, patterns differed when results were divided among anxiety symptoms, depressive symptoms, and mindfulness, the most-reported outcomes in the literature. For anxiety symptoms (Column 2, Model 1), benefits of mindfulness interventions were especially evident when compared against inactive controls, a statistically significant moderator; no other moderator was significant. For depressive symptoms (Column 3, Model 4), trial duration was the only significant moderator, such that longer trials succeeded progressively better (with no significant improvement evidenced until around the 8-week mark). A further analysis (not tabled) considered whether, in in-person trials, total time in mindfulness training explained away this effect; although total time was positively associated with effects, this trend did not reach significance, and the effect of intervention length remained significant. Finally, for mindfulness outcomes (Column 4, Model 1), mindfulness training succeeded especially well in clinical samples, in published studies, and in smaller trials, although none of these factors reached formal statistical significance.

Small Study Bias

Small study bias, which often reflects reporting or publication bias, was considered from multiple angles. First, the standard error of the effect size was inspected as a moderator and was statistically significant in the combined moderator model (Table 4, Model 1, Column 1): studies with the largest standard errors (i.e., the smallest studies) showed the largest effects. Second, publication status was a significant moderator even after controlling for other predictors, with theses showing smaller effects than published studies (Table 4, Model 1, Column 1). This finding suggests that theses and, in general, studies with non-significant results may not be published, or, conversely, that published studies selectively report significant findings. With regard to specific outcomes, small study bias emerged most markedly for mindfulness outcomes (Table 4, Model 1, Column 4); it was unrelated to anxiety symptoms and depressive symptoms. Finally, funnel plots were created for each outcome category and for specific health outcomes with sufficient data (Supplement 2, Figures S2–S9). For mental health, general health and well-being, and the specific outcome of anxiety symptoms, unpublished studies tended to have smaller effects and more variability than published results. These patterns are consistent with the moderator analysis noted above, which showed that standard error was a significant effect modifier. Together, these results suggest the presence of publication bias in mindfulness trials for college students, especially for anxiety symptoms and mindfulness outcomes, although significant heterogeneity plus significant moderators complicates interpretation.
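The logic of treating standard error as a moderator parallels an Egger-type regression: if smaller studies (larger SEs) systematically report larger effects, the regression slope of effect size on standard error is positive. The following is a minimal sketch using simulated effect sizes; the trial data and the inflation coefficient are hypothetical, not drawn from this review:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate k trials: true effect 0.4, plus built-in small-study inflation
k = 40
se = rng.uniform(0.05, 0.5, k)          # per-study standard errors
d = 0.4 + 1.2 * se + rng.normal(0, se)  # larger SE -> larger reported d

# Egger-type check: weighted least-squares regression of d on its SE;
# a positive slope on SE indicates funnel-plot asymmetry (small-study bias)
w = 1.0 / se**2
X = np.column_stack([np.ones(k), se])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * d))
intercept, slope = beta
print(f"bias slope on SE: {slope:.2f}")
```

A slope near zero would be consistent with a symmetric funnel; here the simulation builds asymmetry in, so the fitted slope is positive, mirroring the pattern the moderator analysis detected.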

Exploratory Analyses

Tables S4 through S7 in Supplement 2 summarize several other potential moderators that either proved not to be significant when pitted against the moderators reported in Table 4 or were based on pools of studies too small for the models to work robustly. Mindfulness training showed significant reductions in psychological distress and improvements in well-being. Results showed slightly larger effects for in-person than online interventions in the general population; yet, no online clinical interventions were included in the dataset, and clinical populations had larger effects in person (d=0.75). Thus, the bivariate advantage of in-person over online delivery, which was attenuated after controlling for other factors, was likely driven by confounding with population type. Moreover, mindfulness training led to much larger effects on anxiety symptoms and mindfulness in clinical populations than in non-clinical populations. In the opposite direction, reductions in depressive symptoms were slightly larger in the general student body than in clinical populations. Studies from Asia, although showing larger effects than those from other regions, were more likely to use inactive controls, to be performed in person (as opposed to online), and to use standardized protocols such as MBSR or MBCT. Finally, we examined whether the publication date of each trial was associated with effect size; it was not.

Sensitivity analyses evaluated the robustness of results by (a) systematically altering the value of rho in each RVE analysis, (b) repeating primary analyses with outliers excluded, and (c) testing whether results depended on the comparator (active vs. inactive controls). These analyses yielded results similar to those of the main analyses.
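Point (a) can be illustrated in miniature: in RVE, rho is the assumed correlation among effect sizes contributed by the same study, and the analysis is rerun across a grid of rho values to confirm that the pooled estimate barely moves. The sketch below uses hypothetical toy numbers and a simple inverse-variance pool of within-study averages, not the full RVE estimator; the helper `pooled_d` is invented for the illustration:

```python
import numpy as np

def pooled_d(study_effects, v, rho):
    """Inverse-variance pool after averaging correlated effects within studies.

    study_effects: list of arrays, one array of effect sizes per study
    v: assumed sampling variance of each individual effect size
    rho: assumed within-study correlation among effect sizes
    """
    means, variances = [], []
    for es in study_effects:
        m = len(es)
        means.append(es.mean())
        # variance of the mean of m equi-correlated effect sizes
        variances.append(v / m * (1 + (m - 1) * rho))
    w = 1 / np.asarray(variances)
    return float(np.sum(w * np.asarray(means)) / np.sum(w))

# Three hypothetical studies, two of which report multiple outcomes
studies = [np.array([0.5, 0.3]), np.array([0.2]), np.array([0.6, 0.4, 0.5])]
for rho in (0.0, 0.4, 0.8):  # sensitivity grid over the assumed correlation
    print(rho, round(pooled_d(studies, v=0.04, rho=rho), 3))
```

If the printed estimates remain close across the rho grid, as in the trials pooled here, the choice of rho is inconsequential, which is what the reported sensitivity analyses verified.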

Discussion

Around the globe, universities and colleges often consider whether and how to offer mindfulness training to undergraduate students. This systematic review and meta-analysis focuses particularly on the undergraduate population. Our findings suggest that, when examining overall effects across all health outcomes, mindfulness interventions significantly outperformed active controls with a small effect size and outperformed inactive controls with a small-to-medium effect size. These findings suggest that mindfulness programs are at least incrementally better than other available interventions and may represent a valid and attractive alternative to existing interventions. For example, mindfulness programs are often offered in non-clinical settings, which may be more appealing to certain student populations, as many students avoid psychological treatment due to cultural and stigma-related biases (Wu et al., 2017). Mindfulness interventions in non-clinical settings may allow students to overcome feelings of stigma that can arise when accessing formal counseling or other psychological services (Eisenberg et al., 2009).

Based on its relatively large number of studies (k=58) and large total sample size (N = 3746), the present meta-analysis was able to provide evidence for whom and under what conditions mindfulness programs may be most effective. Bivariate moderator analyses suggested two robust patterns: the mindfulness effect was larger (a) in clinical populations and (b) in published (versus unpublished) studies (Table 3). Multiple-moderator analyses qualified these patterns (Table 4). First, the advantages for clinical samples and for published studies appeared most pronounced for mindfulness outcomes per se and did not reach even marginal significance for anxiety or depressive symptom outcomes. For anxiety symptoms, mindfulness interventions achieved a significant advantage in trials with inactive control groups (versus active ones); the effect size was large. For depressive symptoms, a single moderator emerged, trial duration, such that no significant benefit emerged until 8 weeks; the average effect at 12 weeks was medium to large in magnitude. Finally, multiple-moderator findings suggest that online versus in-person delivery, instructor expertise, and MBSR/MBCT status are not strong drivers of differential effects. These patterns may change with improved measurement and reporting on these outcomes, along with replication, but at present, these variables do not have the strongest evidence for differential effects of mindfulness interventions in college undergraduates.

The current systematic review’s findings are in general agreement with previous systematic reviews on outcomes such as depressive symptoms, anxiety symptoms, and stress (Bamber & Schneider, 2016; Dawson et al., 2020; de Vibe et al., 2017; O’Driscoll et al., 2017). In addition, this review advances past systematic review findings by presenting an evidence map of all measured health-related outcomes to highlight gaps in the literature (Fig. 3). Findings are most extensive for mental health outcomes, while limited findings are available for health behaviors (e.g., alcohol and drug use, physical activity, diet) and physical health outcomes (e.g., weight, sleep issues). Additionally, although certain studies have found promising benefits for ADHD symptoms, more studies are needed in the undergraduate population (Cairncross & Miller, 2020). Moving forward, our coding of risk of bias suggests that improving methodological rigor is of paramount importance for the field, following best practices recommendations that have appeared elsewhere (Loucks et al., 2021).

Limitations and Future Research

Strengths of this systematic review and meta-analysis include its thorough search for trials, data extraction performed in duplicate, adherence to PRISMA reporting guidelines, systematic assessment of risk of bias in primary studies, and meta-analyses that explained existing heterogeneity where appropriate. The review benefited from the sizeable number of studies on depression and anxiety symptoms available for meta-analysis. In turn, this larger database permitted more advanced meta-analytic modeling featuring multiple moderators (see especially Table 4).

Limitations of the literature include the potential for bias in several included studies, reflecting lapses in adherence to CONSORT guidelines as assessed with the Cochrane Collaboration’s risk of bias tool (Higgins et al., 2011) (Fig. 4). A limited number of studies reported follow-up periods of at least 1 month, and even fewer examined long-term impact; consequently, the sustained effects of mindfulness interventions on health-related outcomes for college undergraduates remain unclear. Only five studies monitored adverse effects, highlighting the need for active monitoring of adverse events in mindfulness studies (van Dam & Vugt, 2018). The majority of study participants were White and female, which limits generalizability to other gender and racial/ethnic groups and demonstrates the need for further research in more diverse populations.

Some mindfulness programs, particularly in the US and Europe, have been tailored to encompass elements of various cultures and focus on non-White populations (Proulx et al., 2018); effect sizes may change as this process unfolds. A recent systematic review and meta-analysis showed small but significant effects of mindfulness-based programs in people of color in the US (Sun et al., 2021), suggesting effects are promising although potentially smaller in this population to date. The somewhat smaller effect sizes found in theses compared to published studies suggest publication bias or selective reporting of outcomes, although this tendency did not reach formal significance in multiple-moderator models; thus, publication bias does not appear to be a strong threat to the trends we have documented.

Some attention to our systematic review methods is also in order. Our inclusion criteria were broader than the MBP standard that Crane et al. (2017) introduced, and some reports may have lacked the detail necessary to determine whether they fully matched these criteria. Yet, interventions that explicitly and fully satisfied the criteria were on average superior in their effects (e.g., Table 3), supporting the validity of the criteria by Crane et al., although this difference did not achieve formal statistical significance. Future meta-analyses might more thoroughly vet the quality of the interventions, for example, by contacting trialists to obtain instructor credentials and treatment manuals, a step the current systematic review did not take. Furthermore, with the literature review covering studies available through November 2019, future reviews of more recent studies will continue to improve the precision of estimated effects of mindfulness interventions in young adults, particularly when utilizing data from high-quality studies with minimal bias.

As we noted in the “Transparency and Openness” section, the decision to conduct a meta-analysis of specific outcomes was guided by the mental and physical health outcomes pre-registered on PROSPERO. The full meta-analytic plan was made without further preregistration, which could increase risk for inadvertent selection bias. Our work documented the rapidly growing pace of relevant trials appearing in the literature (Fig. 2). There is no doubt that these trends continue at the time of writing; our online database repository lists trials that may qualify for a new meta-analytic review by a future team sufficiently resourced to conduct it.

Future meta-analyses should incorporate all available evidence. The fact that comparators sometimes appear to affect outcomes implies that a more accurate gauge of effects might result from using change over time as the primary effect size within all trial arms and including uncontrolled mindfulness trials as well, similar to the recent meta-analysis by Tran et al. (2022) of 146 mindfulness RCTs, which showed that changes in mindfulness were related to improvements in mental health. Results from uncontrolled trials should be compared against the mindfulness arms in RCTs (and against effects in control arms), and relevant markers of methodological quality should be coded (e.g., preregistration; conflict of interest). Future inclusion of brief dose/induction mindfulness interventions would allow dose-response functions to be further assessed, considering that fully 38 such trials were omitted from the current meta-analysis. Such databases promise to be markedly larger than those focused purely on RCTs and would thus have certain quantitative advantages.

In conclusion, this systematic review and meta-analysis found evidence that mindfulness interventions may have pragmatic utility for improving the mental health of college undergraduate students, particularly for anxiety and depressive symptoms. As higher education institutions consider whether to implement mindfulness interventions for their students, these findings provide supportive evidence that mindfulness training may be one effective method in managing the deleterious health effects often associated with the pressures of college life. Given the current unprecedented mental and behavioral health concerns in college students, mindfulness interventions may represent a path forward to a happier and healthier college student population.