Introduction

Treatment fidelity refers to the methodology used to monitor and enhance the accuracy and consistency of a behavioral intervention (Bellg et al. 2004). Adequately monitoring and reporting treatment fidelity when publishing efficacy outcomes allows for assessment of whether observed treatment effects in an empirical trial are attributable to the intervention delivered (Carroll et al. 2007; Leeuw et al. 2009). Higher levels of treatment fidelity are associated with stronger, more interpretable treatment outcomes and grant more confidence in specifying the underlying mechanisms of change (Carroll et al. 2007). Furthermore, only when an intervention is delivered with high fidelity can researchers conclude that participant outcomes are due to the intervention curriculum rather than to random events. For example, if a program is not executed as intended, researchers run the risk of making a type III error (i.e., concluding that a program is ineffective when null findings are actually attributable to poor treatment fidelity or implementation failure), which incurs an economic and scientific burden (Gould et al. 2016; Breitenstein et al. 2010; Borrelli 2011). Reporting treatment fidelity in empirical articles that present behavioral intervention outcomes is not yet standard practice, an important weakness for behavioral science to address.

Guidelines for the assessment and practice of treatment fidelity in behavioral interventions have evolved over time, and various frameworks have been offered (e.g., Gould et al. 2016; Lichstein et al. 1994). In 2004, the Treatment Fidelity Workgroup of the National Institutes of Health Behavior Change Consortium (BCC) synthesized existing definitions, methodologies, and measurements across frameworks and put forth new recommendations for treatment fidelity in behavior change interventions (Bellg et al. 2004). NIH-sponsored research must follow extensive policies and procedures, including those related to treatment fidelity, to maintain scientific integrity and rigor (NIH 2012). Thus, for the current review, we selected the NIH-developed BCC guidelines over other frameworks because they appear to be the most comprehensive to date and were compiled by expert consensus.

The BCC recommendations detail five components of treatment fidelity to be considered when monitoring and reporting efficacy of behavioral interventions: design, training, delivery, receipt, and enactment (Resnick et al. 2005; Bellg et al. 2004). Design, training, and delivery focus on the treatment provider, while receipt and enactment focus on the participant. Design involves methodological strategies to ensure that a study can test its hypotheses in relation to underlying theories and clinical processes (i.e., the intervention is delivered in the same dose within and across conditions, and a plan is developed for implementation setbacks). Training involves strategies to ensure interventionists have been properly trained to deliver the intervention to the target population (i.e., standardize training using curriculum manuals, ensure provider skill acquisition, minimize drift in provider skills, and accommodate provider differences). Delivery involves strategies to monitor the intervention to ensure it is delivered as intended (i.e., control for provider differences, reduce differences in treatment delivery, ensure adherence to the treatment protocol, and minimize contamination between conditions). Receipt involves strategies to monitor and enhance participants’ understanding and performance of intervention-related skills and strategies during the period of intervention delivery (i.e., ensure participant comprehension, including cognitive capacity and behavioral performance). Enactment involves strategies to monitor and enhance participants’ performance of intervention-related skills and strategies in daily life outside of the intervention setting (i.e., ensure participants’ use of behavioral and cognitive skills). These fidelity components serve as guidelines that can be adapted for a variety of behavioral interventions. Newer behavioral interventions gaining rapid scientific acceptance and public appeal are in particular need of treatment fidelity monitoring, both to strengthen confidence that findings are due to the active ingredients of the intervention (Borrelli 2011) and to reduce the risk and costs of a type III error.

Mindfulness training is a relatively new behavioral approach that shows increasing promise for improving stress-related ailments, psychiatric disorders, and disease symptoms (Black and Slavich 2016; Goyal et al. 2014; Hofmann et al. 2010; van der Velden et al. 2015; Black 2012, 2014; O'Reilly et al. 2014). Mindfulness-based interventions (MBIs) represent a family of programs developed with the goal of helping people cultivate an ongoing daily practice of mindfulness, operationalized as “the awareness that emerges through paying attention on purpose, in the present moment, and nonjudgmentally to the unfolding of experience moment by moment” (Black 2012, 2014; Kabat-Zinn 2003). MBIs are part of a third wave of empirically tested psychotherapeutics. The first two waves—behavioral therapy and then cognitive behavioral therapy—focus on modification of thoughts, feelings, and behaviors, whereas MBIs focus on developing metacognitive awareness, acceptance, and a non-reactive stance toward those same experiential processes (Crane et al. 2017). Given the complex nature of delivering MBIs (e.g., interventionist mastery of concepts, embodied skills, and personal practice) and that high levels of treatment fidelity are associated with stronger program effects (Borrelli 2011), evaluating MBI treatment fidelity and how its methods and measures account for participant outcomes is a logical next step in improving scientific rigor and the interpretability of findings.

To validly and reliably test if and how variation in implementation relates to participant outcomes across trials, we first need to describe how treatment fidelity components and subcomponents are currently monitored and reported in the mindfulness literature. This information can foster an understanding of where implementation gaps exist and offer recommendations for improvement. For example, a study may have monitored treatment fidelity to the highest standard using the BCC framework yet not have published that information, or published it separately from the participant outcomes (e.g., Zgierska et al. 2017). Two recent systematic reviews of treatment fidelity have highlighted this gap in the fields of school-based mindfulness and yoga interventions (Gould et al. 2016) and pediatric obesity intervention trials (JaKa et al. 2016). Both reviews identified low and inconsistent reporting across all treatment fidelity components and offered recommendations for researchers and clinicians to enhance the development and publication of their multicomponent treatment fidelity methods. Yet, neither developed a standardized protocol for practicing and reporting on treatment fidelity. Furthermore, Gould et al. (2016) found that fewer than 20% of MBI programs delivered to youth reported any component of fidelity implementation beyond participant dosage. However, dosage is only one element of treatment fidelity, and no similar examination has been conducted for adult samples, where the majority of empirical evidence in the mindfulness field has accumulated. One way to approach this implementation limitation is to establish measurable criteria and offer a reporting tool.

Our current review describes how randomized controlled trials (RCTs) testing MBIs report on the five treatment fidelity components outlined by the BCC guidelines. Specifically, we (1) identify efficacy trials of established MBIs that report on study treatment fidelity within a published main outcomes article, (2) describe treatment fidelity methods and measures in these articles using the BCC guidelines, (3) determine whether identified components of treatment fidelity were integrated in an analysis with participant outcomes, and (4) provide a treatment fidelity tool adapted from the BCC guidelines and tailored for MBIs. Our proposed Treatment Fidelity Tool for Mindfulness-Based Interventions is intended to help researchers and program developers monitor and report treatment fidelity using common methods and measures.

Method

Literature Search and Study Selection

To identify published articles for inclusion in this methodology review, the authors determined the parameters for the search, and the first author searched PubMed articles from 1966 (date of the first mindfulness publication) to February 27, 2017, using the following combined key words: clinical trial OR controlled trial AND mindfulness-based intervention. This was followed by a more specific search for Mindfulness-Based Stress Reduction, OR Mindfulness Based Cognitive Therapy, OR Mindfulness Based Relapse Prevention, OR Mindful Awareness Practices, OR Mindfulness-Oriented Recovery Enhancement, OR Mindfulness-Based Eating Awareness Training. The first author read titles and abstracts of approximately 202 retrieved articles and determined that 116 did not meet the following inclusion criteria: (1) an efficacy trial of an established MBI and (2) participants 18 years of age or older. The first author then downloaded the remaining 86 articles and searched within their full text to determine whether they included (3) main outcomes from an experimental trial (secondary outcome articles were excluded to minimize redundancy) and (4) a description of treatment fidelity. Inclusion criteria for this review were set to maximize similarity of design and participants so that articles could be compared as accurately as possible. While it is essential that all behavioral interventions monitor treatment fidelity, regardless of study design (e.g., experimental vs. quasi-experimental) and study phase (e.g., efficacy vs. effectiveness), we included only MBI experimental efficacy trials (i.e., RCTs). This criterion permitted a critical review and comparison of studies that have the most impact on evidence-based practice, the most robust interpretations of validity, and similar resources and requirements for conducting treatment fidelity. Study authors were not contacted for additional information, as the purpose of this review was to identify how treatment fidelity is commonly reported within the available mindfulness literature. Twenty-five articles met all criteria and are included in this review.
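As an illustration of the search strategy, the short Python sketch below assembles the two PubMed query strings described above; the exact Boolean grouping (parentheses around the trial terms) is our assumption, and actual retrieval through PubMed or the NCBI E-utilities is not shown.

```python
# Assemble the PubMed search strings described above (illustrative sketch only;
# the parenthesized Boolean grouping is an assumption, and query submission is omitted).
general_query = "(clinical trial OR controlled trial) AND mindfulness-based intervention"

program_terms = [
    "Mindfulness-Based Stress Reduction",
    "Mindfulness Based Cognitive Therapy",
    "Mindfulness Based Relapse Prevention",
    "Mindful Awareness Practices",
    "Mindfulness-Oriented Recovery Enhancement",
    "Mindfulness-Based Eating Awareness Training",
]
program_query = " OR ".join(f'"{term}"' for term in program_terms)

print(general_query)
print(program_query)
```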

Data Abstraction

The first author abstracted relevant data from each article under the following domains. “Sample” included participant demographic information and target outcome (e.g., reduction of substance use or substance use disorder symptoms, reduction of high blood pressure). “Study design” included study information on randomization and definition of treatment groups. Under treatment fidelity, “design” included any information about the program’s intervention and control condition(s), development of curriculum, and any mention of program adaptation. “Training” included any information about how the interventionists were initially trained and specifically trained on intervention curriculum content and delivery. “Delivery” included any methods of monitoring, evaluating, and supervising interventionists for competence and adherence during the trial. “Receipt” included any information about participants’ attendance, engagement, and acceptance. “Enactment” included any information about participant application of intervention skills in daily life. “Treatment Fidelity Measures Used in Participant Outcome Analyses” included information on the use of identified treatment fidelity measures in participant outcome analyses, such as number of sessions attended as a predictor of time to relapse. The first author checked abstracted data for errors and considered a treatment fidelity component to be present if at least one subcomponent strategy was described in the article.
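To illustrate the abstraction domains above, the minimal sketch below shows one way a single article's information could be captured in a structured record; all field names and placeholder values are hypothetical and do not reproduce the actual codebook.

```python
# Hypothetical abstraction template mirroring the domains described above.
abstraction_record = {
    "article": "Author et al. (year)",
    "sample": {"n": None, "population": "", "target_outcome": ""},
    "study_design": {"randomization": "", "treatment_groups": []},
    "treatment_fidelity": {
        "design": "",     # intervention/control conditions, curriculum, adaptations
        "training": "",   # interventionist background and curriculum training
        "delivery": "",   # monitoring, adherence/competence ratings, supervision
        "receipt": "",    # attendance, engagement, acceptance
        "enactment": "",  # use of intervention skills in daily life
    },
    "fidelity_in_outcome_analyses": "",  # e.g., sessions attended as a predictor of time to relapse
}

# A component is considered present if at least one subcomponent strategy is described.
components_present = sum(bool(v) for v in abstraction_record["treatment_fidelity"].values())
print(f"{components_present} of 5 fidelity components described")
```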

Data Availability

All data generated or analyzed during this study are included in this published article [Table 2].

Results

Our literature search identified approximately 202 articles, of which 25 (12%) were judged to meet study criteria, each representing a main outcomes article from an MBI RCT with participants ≥ 18 years old that described study treatment fidelity. That is, 116 of 202 (57%) were not efficacy trials of an established MBI for adults, and 61 of the remaining 86 (71%) experimental trials reviewed did not clearly describe treatment fidelity within a main outcomes article. Of the 25 studies included in this review, 9 reported on Mindfulness-Based Stress Reduction (MBSR), 11 on Mindfulness-Based Cognitive Therapy (MBCT), 2 on Mindfulness-Based Relapse Prevention (MBRP), 2 on Mindfulness-Oriented Recovery Enhancement (MORE), and 1 on Mindfulness-Based Eating Awareness Training (MB-EAT). Table 1 shows that 25 articles (100%) reported on design, 24 (96%) on training, 23 (92%) on delivery, 23 (92%) on receipt, and 16 (64%) on enactment. Fourteen (56%) reported on all five components. Eleven (44%) articles analyzed measures from receipt and enactment with participant outcome measures. Overall, we found variation in (1) the methods and measures used and the details provided for each treatment fidelity component, and (2) the reporting and results of analyses linking treatment fidelity components and main outcome measures. Below we detail specific consistencies and inconsistencies in the reporting of treatment fidelity in MBI efficacy trials. Quotations from the original articles are included to demonstrate the various reporting styles used when describing treatment fidelity components. See Table 2 for full details.
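The percentages reported above follow directly from the stated counts; as a quick check, the sketch below recomputes them from those numbers (no additional data are assumed).

```python
# Recompute the screening-flow and component-reporting percentages stated above.
identified = 202
excluded_title_abstract = 116            # did not meet criteria at title/abstract screening
full_text_reviewed = identified - excluded_title_abstract   # 86 articles
included = 25

print(f"Included: {included}/{identified} = {included / identified:.0%}")
print(f"Excluded at screening: {excluded_title_abstract}/{identified} = "
      f"{excluded_title_abstract / identified:.0%}")
print(f"Full-text articles without a fidelity description: "
      f"{full_text_reviewed - included}/{full_text_reviewed} = "
      f"{(full_text_reviewed - included) / full_text_reviewed:.0%}")

components = {"design": 25, "training": 24, "delivery": 23, "receipt": 23, "enactment": 16}
for name, count in components.items():
    print(f"{name}: {count}/{included} = {count / included:.0%}")
```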

Table 1 Reporting of treatment fidelity components in MBI RCTs by program
Table 2 Treatment fidelity reported in mindfulness-based intervention RCTs (N = 25)

Design

Design was reported in 25 (100%) articles. All articles included the name of the formal group-based intervention program by referencing the program developer and year. All articles included program duration in number of weeks, sessions per week, and number of hours per session. Duration, frequency, and time per session ranged from six weekly 1.5-h sessions to ten weekly 2-h sessions. The most common format (46% of studies reviewed) was eight weekly 2-h sessions. All articles included information on program adaptations of group size, duration, and target population, if applicable. For example, Geschwind et al. (2012) reported using MBCT by Segal et al. (2002) for eight weekly 2.5-h sessions with 10–15 participants who had residual symptoms and a history of depression.

The rationale and level of detail provided for MBI program adaptations and comparison groups varied. For example, one article reported adapting the original MBCT program by Segal et al. (2012) for participants with treatment-resistant depression, cited the curriculum publications, and described the adaptations (e.g., “shortened meditations to max 30 minutes, emphasize mindful movement, explore barriers to practice and focus on acceptance of emotional events”) (Eisendrath et al. 2016). In contrast, another article reported adapting the MBCT manual by Segal et al. (2012) with only minor alterations to address suicidality (e.g., “introduction of a crisis plan and cognitive components addressing suicidal cognitions and hopefulness”) (Hargus et al. 2010). Details provided for comparison/control groups also varied. For example, Hargus et al. (2010) reported using a wait-list control or treatment-as-usual comparison group and provided few details about what that entailed, while Gross et al. (2010) reported using a dose-matched attention control group and provided equivalent information and citations for both groups.

Training

Training was reported in 24 (96%) articles. All but one article (van Son et al. 2013) detailed the interventionists’ training, background, and/or experience (e.g., MBSR certified, LSW interventionist with training in MBCT). However, these details varied across articles. For example, Carmody et al. (2011) reported only that “Classes were conducted by Center of Mindfulness instructors,” whereas Cherkin et al. (2016) reported the number of interventionists, their years of experience delivering the intervention, and where each interventionist received certification for their respective intervention. The details on interventionist training and experience also varied between groups within the same study. For example, Hughes et al. (2013) reported that a clinical psychologist with MBSR training delivered program sessions but did not detail the credentials of the interventionist who delivered the control condition, whereas Garland et al. (2014) reported that a nurse trained in MBSR with over 10 years of experience delivered program sessions and a doctoral-level student in clinical psychology with CBT-I training delivered the control condition.

Beyond interventionist qualifications, information about interventionist training on the program-specific curriculum was infrequently reported or was provided with little detail. For example, one article reported holding a 7-day intensive training for interventionists (Segal et al. 2010), while another reported assessing interventionists for competence and adherence, with interventionists progressing to delivering the intervention only once all domains were clearly established (Kuyken et al. 2015). McManus et al. (2012) reported that interventionists received preliminary supervision. Other articles stated only that MBSR interventionists were “in agreement with content and format of the MBSR course manual” (de Vibe et al. 2013) or did not report this element of treatment fidelity at all (Williams et al. 2014).

Delivery

Delivery was reported in 23 (92%) articles. Fifteen studies (65% of those reporting delivery) reported one or more subcomponents of delivery, including recording intervention sessions (audio or video), reviewing and rating recordings for interventionist adherence and competence, and/or providing interventionists with supervisor feedback in real time. Nine articles (39% of those reporting delivery) reported at least one subcomponent such as interventionists receiving weekly supervision (Bowen et al. 2009) or being monitored for adherence to the treatment protocol (Barnhofer et al. 2009). Two articles did not report on any subcomponent of delivery (Carmody et al. 2011; Palta et al. 2012). Among studies that included an active control group, all ten (40%) articles described equivalent methods of monitoring delivery in both groups (Cherkin et al. 2016; Eisendrath et al. 2016; Garland et al. 2016; Garland et al. 2014; Gross et al. 2010; Hoge et al. 2013; Hughes et al. 2013; Shallcross et al. 2015; Williams et al. 2014; Kristeller et al. 2013).

Descriptions of how treatment fidelity delivery was assessed varied greatly in both detail and rigor. For example, Bowen et al. (2014) reported, “Treatment adherence to RP and MBRP established with weekly supervision and review of audio recorded sessions. Competence evaluated by random selection of 50% of sessions from 8 MBRP cohorts, each rated by 2 of 3 independent raters… Raters attended practice and review meetings until acceptable reliability was achieved, with regular recalibration sessions to prevent drift. Using 1-way random-effects models, interrater consistency was adequate for mean ratings of competence (intraclass correlation coefficient, 0.77), with mean (SD) competence rated between adequate and good (4.63 [0.42]),” with independent raters scoring intervention sessions on the 7-point MBRP Adherence and Competence Scale (MBRP-AC; Bowen et al. 2014). In contrast, de Vibe et al. (2013) reported, “Instructors consulted with each other after every class to ensure programme fidelity.”
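To illustrate the kind of inter-rater reliability analysis described in the Bowen et al. (2014) excerpt, the sketch below computes one-way random-effects intraclass correlations (ICC(1) and ICC(1,k), following the Shrout–Fleiss formulas) from hypothetical competence ratings; the data are invented and the code does not reproduce that study's analysis.

```python
import numpy as np

# Hypothetical competence ratings on a 7-point scale: rows = recorded sessions,
# columns = the two independent raters assigned to each session.
ratings = np.array([
    [5.0, 4.5],
    [4.0, 4.5],
    [6.0, 5.5],
    [4.5, 5.0],
    [5.5, 5.0],
    [4.0, 3.5],
])

n, k = ratings.shape                       # n sessions (targets), k raters per session
session_means = ratings.mean(axis=1)
grand_mean = ratings.mean()

# One-way random-effects ANOVA mean squares (Shrout-Fleiss Case 1).
ms_between = k * np.sum((session_means - grand_mean) ** 2) / (n - 1)
ms_within = np.sum((ratings - session_means[:, None]) ** 2) / (n * (k - 1))

icc_single = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)  # ICC(1)
icc_average = (ms_between - ms_within) / ms_between                          # ICC(1,k), for mean ratings

print(f"ICC(1) single rater:   {icc_single:.2f}")
print(f"ICC(1,k) mean ratings: {icc_average:.2f}")
print(f"Mean (SD) competence:  {ratings.mean():.2f} ({ratings.std(ddof=1):.2f})")
```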

Receipt

Receipt of the MBI was identified in 23 (92%) articles. Seventeen (74%) of those articles explicitly reported collecting participant attendance at intervention sessions. All but two of those (Eisendrath et al. 2016; Kristeller et al. 2013) reported the average proportion of sessions attended. Six (26%) articles implicitly reported collecting participant attendance at intervention sessions (e.g., attending at least 4 sessions was considered the minimal dose in Bondolfi et al. (2010)). Furthermore, two articles reported collecting participant ratings of intervention credibility (Shallcross et al. 2015; Williams et al. 2014). One article reported evaluating level of participation beyond attendance (Palta et al. 2012). Three (12%) of all articles reported more than one of the above methods (Palta et al. 2012; Shallcross et al. 2015; Williams et al. 2014).

While nearly all articles reported collecting measures that we considered treatment receipt, the measures and definitions varied. For example, the proportion of sessions participants were required to attend to be considered protocol completers ranged across studies from 12.5% (1 of 8 sessions; e.g., Hoge et al. 2013) to 62.5% (5 of 8 sessions; e.g., Garland et al. 2014). Furthermore, Shallcross et al. (2015) reported using the Treatment Credibility and Expectancy Questionnaire, a validated measure of participants’ perceived treatment credibility and expectancy of benefit. In contrast, other articles reported collecting an observational evaluation of participation completed by a study team member using a checklist (Palta et al. 2012) or provided an unclear description such as “rated credibility of treatment on 0–10 scales” (Williams et al. 2014).

Enactment

Enactment of the MBI was reported in 16 (64%) articles, 12 of which (75%) reported collecting mindfulness practice logs. Five articles reported collecting the Five-Facet Mindfulness Questionnaire (FFMQ), a reliable and valid instrument for assessing five distinct elements of mindfulness across diverse populations (Bowen et al. 2009; de Vibe et al. 2013; Eisendrath et al. 2016; Garland et al. 2016; McManus et al. 2012). Two articles reported collecting the Mindful Attention Awareness Scale (MAAS), a reliable and valid instrument for assessing trait mindfulness across diverse populations (Gross et al. 2010; Moynihan et al. 2013). One article reported a post hoc questionnaire on mindfulness practice across three study periods (Bondolfi et al. 2010). Four articles (25% of those reporting any aspect of enactment) reported collecting both practice logs and the FFMQ or MAAS (Bowen et al. 2009; de Vibe et al. 2013; Gross et al. 2010; Eisendrath et al. 2016).

While we identified three measures used across the 16 studies to assess levels of enactment, there was some variation in how articles defined similar measures, such as practice logs. For example, one article calculated the total number of minutes per day (Geschwind et al. 2012), another calculated the number of days and hours per week (Barnhofer et al. 2009), and another calculated a practice score (0–20) based on four questions pertaining to the past month (de Vibe et al. 2013). Depending on the study design and resources, these measures were collected at one to four time points.
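Because practice-log metrics differ across trials, cross-study comparison requires converting them to a common unit where possible. The sketch below is a hypothetical illustration that normalizes two of the formats described above to estimated minutes of practice per week; the field names and values are invented, and score-based logs (such as the 0–20 practice score) cannot be converted without the original scoring key.

```python
# Hypothetical practice-log entries illustrating two of the formats described above.
logs = [
    {"study": "A", "minutes_per_day": 25},                    # daily minutes format
    {"study": "B", "days_per_week": 5, "hours_per_week": 3},  # days/hours per week format
]

def weekly_minutes(entry):
    """Estimate minutes of home practice per week from a log entry."""
    if "minutes_per_day" in entry:
        return entry["minutes_per_day"] * 7
    if "hours_per_week" in entry:
        return entry["hours_per_week"] * 60
    raise ValueError("log format not recognized")  # e.g., composite practice scores

for entry in logs:
    print(entry["study"], weekly_minutes(entry), "min/week")
```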

Treatment Fidelity Measures Used in Participant Outcome Analyses

Eleven (44%) articles tested for an association between a treatment fidelity measure and a study outcome. All treatment fidelity measures identified in analyses with participant outcomes came from the components of receipt (e.g., session attendance) and enactment (e.g., FFMQ and practice logs). Four reported correlation analyses using receipt or enactment measures (Eisendrath et al. 2016; Garland et al. 2016; Gross et al. 2010; Kristeller et al. 2013). Two reported moderation using an interaction term in regression analyses with both receipt and enactment measures (de Vibe et al. 2013; Shallcross et al. 2015). One reported moderation analyses using a median split of enactment measures (Lengacher et al. 2014; Shallcross et al. 2015). Two reported mediation analyses using enactment measures (Eisendrath et al. 2016; McManus et al. 2012). One reported t test analyses using enactment measures (Geschwind et al. 2012; Moynihan et al. 2013). One reported hazard ratio analyses using receipt measures (Kuyken et al. 2015). One reported Fisher’s exact test using enactment measures (Bondolfi et al. 2010).
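As an illustration of the moderation-by-interaction approach noted above, the sketch below fits an ordinary least squares model with a group × attendance interaction term on simulated data; all variable names and values are hypothetical and do not reproduce any reviewed study's analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200

# Simulated trial data (hypothetical): group (0 = control, 1 = MBI),
# attendance = proportion of sessions attended (a receipt measure),
# outcome = post-intervention symptom score (lower is better).
df = pd.DataFrame({
    "group": rng.integers(0, 2, n),
    "attendance": rng.uniform(0.1, 1.0, n),
})
df["outcome"] = 10 - 2 * df["group"] * df["attendance"] + rng.normal(0, 1, n)

# Moderation is tested via the group x attendance interaction term,
# analogous to the interaction models described above.
model = smf.ols("outcome ~ group * attendance", data=df).fit()
print(model.summary().tables[1])
```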

Higher treatment fidelity was not always associated with greater improvement in participant outcomes. Articles that tested the association between receipt and enactment measures and main outcome measures using correlation and t tests found a significant positive association between treatment fidelity and greater improvement in participant outcomes. However, articles that tested these associations using hazard ratios and Fisher’s exact test found nonsignificant differences between intervention groups. Articles that tested receipt and enactment measures with main outcome measures using mediation and moderation analyses found mixed results. For example, one article reported that greater changes in FFMQ global scores mediated the relation between group condition (MBCT vs. Usual Services) and improvements in health anxiety among a group of adults diagnosed with hypochondriasis (McManus et al. 2012). In contrast, another article reported that greater changes in FFMQ global scores did not mediate the relation between group (MBCT vs. Health Education Program) and observed improvements in depressive symptoms among a sample of adults diagnosed with treatment-resistant depression who had not responded to antidepressant medication (Eisendrath et al. 2016).

Discussion

We reviewed reports of treatment fidelity in the MBI efficacy literature. Only 25 (12%) of the 202 MBI articles identified represented RCTs testing an MBI among adults that described treatment fidelity in a main outcomes paper. Among articles that met all inclusion criteria, there was high variation in (1) the way each component was monitored and reported and (2) the reporting and results of analyses linking treatment fidelity components and trial outcome measures. Therefore, as a possible solution to the field’s general lack of treatment fidelity reporting, and with consideration of limited journal space, we developed a treatment fidelity tool to facilitate consistent collection and reporting of these methods and measures in published studies.

Our findings indicate that better reporting of MBI treatment fidelity is needed. Under the assumption that fewer than a third of the identified MBI efficacy studies (25 of 86) monitored treatment fidelity, we conclude that there is some threat to robust interpretation of MBI trials, given the inherent influence of treatment fidelity on the reliability and validity of findings. However, we only included RCTs that reported on treatment fidelity in the main outcomes article, which is not yet required by journals. Thus, we cannot assume that the absence of treatment fidelity methods and related findings in published articles means that such methods were not used or that such data were not collected. Given that behavioral intervention trial findings have limited interpretive value if they lack treatment fidelity (Forgatch et al. 2005), it is important that researchers report methods and measures of treatment fidelity in main outcomes articles to inform readers whether the intervention was implemented as designed and then accurately tested in the experimental trial (Resnick et al. 2005). Similar to the CONSORT statement and checklist that over 400 journals promote for detailing the design of RCTs (Jull and Aye 2015; Schulz et al. 2010), we recommend improved reporting standards for treatment fidelity when publishing MBI efficacy studies as well as studies of other behavioral interventions.

Using the BCC guidelines, we found consistencies and inconsistencies in how authors described treatment fidelity methods and measures. Articles reviewed were largely consistent in detailing the treatment fidelity component of design. While one to two subcomponents of training, delivery, and receipt were reported by the majority of articles, the details provided regarding methods and measures varied considerably (e.g., interventionists rated by multiple independent persons using validated adherence and competence assessment scales vs. interventionist-to-interventionist consultation for monitoring program delivery), thus limiting interpretation of the effect treatment fidelity may have on intervention efficacy. Enactment was least often reported. This may limit understanding of potential underlying mechanisms of change in producing beneficial outcomes, given that enactment measures practice outside the intervention whereas receipt measures exposure within the intervention. Our findings are comparable to those of a systematic review that also used the BCC guidelines to identify reporting of treatment fidelity in the field of obesity interventions (JaKa et al. 2016). JaKa et al. also found reporting of design elements to be the most common practice, while consistent reporting of elements of training, delivery, receipt, and enactment was lacking; 87% of the studies they reviewed reported fewer than half of the BCC items. While our findings indicate that the proportions of components reported are higher in MBI RCTs, this is likely because we only included studies that described treatment fidelity and considered a component reported if at least one subcomponent was present. However, the paucity of treatment fidelity reporting in the published literature remains a weakness across these behavioral interventions, limiting our ability to conclude that an intervention is efficacious because of its intended program elements (Breitenstein et al. 2010; Gould et al. 2016; Lichstein et al. 1994).

From the BCC guidelines, we identified two measurement gaps in the MBI literature. First, no studies reported assessing participants’ understanding of mindfulness, which the BCC describes as a subcomponent of receipt. To date, researchers tend to gauge participant receipt through utilization of mindfulness by collecting self-report logs on the type and duration of practices between sessions and at post-intervention. However, we believe the quality of one’s practice is limited by one’s comprehension and competence, such that measuring the quantity of home practice does not gauge the degree to which participants properly comprehend and adhere to the principles of mindfulness practice (Lloyd et al. 2017). To our knowledge, there is no tool that directly assesses whether participants understand and/or demonstrate appropriate utilization of mindfulness skills. Example items may include, “True or False: My mind should be completely clear when I meditate” (answer: false, this is a common misconception) and “List the components in the Triangle of Awareness” (answer: thoughts, emotions, physical sensations). The development of such a tool could be useful in interpreting if and how participants’ understanding of mindfulness principles and practices is associated with intervention utilization and outcomes. Second, direct measures of program acceptability and satisfaction were not reported in any of these articles and thus could not be assessed in relation to participant outcomes. The closest approximation of this element, if present, was often found in results under “Feasibility/Acceptability,” where authors reported information on drop-out, attendance, and self-reported practice (e.g., Garland et al. 2014). However, we believe these objective measures do not capture participants’ subjective experience of MBIs or individual differences in mindfulness uptake, which would require validated measures of participant satisfaction and program acceptability. This may be an important oversight if program acceptability functions as a mediating variable or as a necessary but insufficient element linking the MBI to change in treatment outcomes. To the best of our knowledge, no published research has examined whether measures of program understanding and acceptability mediate MBI participant outcomes.

We found that receipt and enactment components of treatment fidelity were integrated in an analysis of participant outcomes in 44% of included articles. However, across articles there were inconsistent findings in the relation between degree of treatment fidelity and improvement on main outcome measures, despite the use of similar measures and statistical methods. This aligns with a review by Gould et al. (2016), who used a less common treatment fidelity guideline (i.e., CORE: conceptualize core components, operationalize and measure, run analyses and report/review findings, enhance and refine) to assess the rigor and reporting of treatment fidelity among school-based mindfulness programs for youth. We found similar patterns regarding what Gould et al. described as participant dosage, such that the majority of articles reported information on attendance (percent of sessions attended) and outside practice (days per week or minutes per day). Furthermore, Gould et al. identified six (13%) of the articles they reviewed that associated treatment fidelity measures of participant dosage with an intervention outcome, and the significance of those results was likewise mixed. For example, five of the six studies found that at least one, but not all, element of attendance or practice was significantly associated with participant outcomes. However, each program used its own dosage cutoff (e.g., 4 or more days per week, 20 or more minutes per day, 70% session attendance), and it is unclear whether those cutoffs were set a priori. One plausible reason for mixed findings regarding treatment fidelity in outcome analyses could be the nonstandard collection and definition of mindfulness practice logs and required session attendance. Mindfulness practice logs are inherently subject to report bias, and inconsistent definitions of practice (e.g., minutes vs. days) and attendance (e.g., 1 vs. 4 required sessions) may restrict our understanding of participant outcomes related to dosage. While comparing the efficacy of MBIs across trials would be useful in evaluating treatment effects (e.g., identifying how much practice is “needed” to significantly improve outcomes for different conditions and populations), such an assessment requires that MBI trials use consistent definitions, measures, and reporting of treatment fidelity components.

To help researchers conduct and then report treatment fidelity in a simple and standardized format, we developed a Treatment Fidelity Tool for MBIs (Table 3). This tool comprises 15 items, each informed by our synthesis of MBI RCTs using the BCC framework. We recommend researchers use this tool to consistently assess and report on all items under each treatment fidelity component in main outcomes articles. This reporting practice will likely enhance MBI interpretability and integrity. Instructions for use are threefold. First, we encourage researchers conducting MBI trials to use this tool in developing a treatment fidelity plan that addresses each of the five BCC fidelity components (i.e., design, training, delivery, receipt, and enactment). Second, we encourage researchers to complete the checklist in the center column indicating the fidelity methods used in their study. Third, we encourage researchers to write additional descriptions of how they approached each point and provide corresponding data, when applicable, in the space in the far-right column. See Table 4 for a completed example based on the protocol from Moment-by-Moment in Women’s Recovery (MMWR), a randomized controlled trial testing the efficacy of a mindfulness-based relapse prevention program for racially/ethnically diverse women in residential treatment for substance use disorders (Amaro and Black 2017). Completing and including this checklist in main outcomes papers will improve the transparency of treatment fidelity methods and measures, providing critical information for the interpretation of MBI trial results.
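For teams that prefer to maintain the checklist electronically, the sketch below shows one possible lightweight representation of the tool's three columns (item, center-column checkbox, far-right description); the item wording is hypothetical and does not reproduce the 15 items in Table 3.

```python
from dataclasses import dataclass

@dataclass
class FidelityItem:
    """One row of the Treatment Fidelity Tool (hypothetical item wording)."""
    component: str         # design, training, delivery, receipt, or enactment
    item: str              # checklist item text
    used: bool = False     # center column: was the method used in the study?
    description: str = ""  # far-right column: how it was approached, with data

checklist = [
    FidelityItem("design", "Intervention and control dose specified", True,
                 "8 weekly 2-h sessions in both conditions"),
    FidelityItem("delivery", "Sessions recorded and rated for adherence/competence", True,
                 "50% of sessions rated by two independent raters"),
    FidelityItem("enactment", "Home practice logged by participants", False),
]

addressed = sum(item.used for item in checklist)
print(f"{addressed} of {len(checklist)} example items addressed")
```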

Table 3 Treatment fidelity tool for mindfulness-based interventions
Table 4 Completed example of the treatment fidelity tool for a MBI

Limitations and Future Research

Our interpretations are limited in that we only included articles that described treatment fidelity methodology in a main outcomes paper. While 61 of 86 studies were excluded for lack of a description of treatment fidelity methodology, some of those may have assessed treatment fidelity but not included it in the published paper(s) because of journal page limitations or author reporting style. Furthermore, we only included RCTs with the intent of focusing on studies with the highest standard for internal validity; consequently, we omitted quasi-experimental studies that may have reported on treatment fidelity. Had we included quasi-experimental designs, we would expect more studies with lower reporting of treatment fidelity compared with efficacy studies, since treatment fidelity implementation requires extra staff time and costs, and experimental studies typically receive more resources to execute it (Borrelli 2011). Finally, only the first author identified studies and abstracted data, which limits the replicability of our results due to human error (e.g., rater bias). However, the first author checked abstracted data multiple times, and our aim was to critically review what has been published on MBI treatment fidelity rather than to conduct a systematic review of the literature. Therefore, we believe the quality and diversity of the programs reviewed allowed for a balanced representation of published articles and for tool development.

Our Treatment Fidelity Tool for MBIs can address some of the evident limitations in the field. Ultimately, what is measured is equally as important as how it is measured (Breitenstein et al. 2010). Thus, our intent in developing this tool is to promote standardized and routine practice and reporting of treatment fidelity alongside future MBI outcomes. We believe our tool can be used by MBI investigators regardless of study design or study phase to develop, monitor, and evaluate treatment fidelity (Onken et al. 2014). In fact, we encourage use of this tool across MBI studies with diverse designs, phases, and samples to enhance understanding of both treatment fidelity and participant outcomes. The use of common language and definitions in standardized reports will allow researchers to more accurately identify (1) the effect of treatment fidelity variation on MBI participant outcomes and (2) the specific mindfulness practices or program components that are efficacious, for whom, and why (Gould et al. 2016). Furthering our understanding of these mechanisms and using a standardized protocol for MBI treatment fidelity may help prevent the science-to-service implementation cliff between efficacy and effectiveness studies (Onken et al. 2014). That is, by examining the level of treatment fidelity required to deliver MBIs effectively in research settings, we increase our ability to properly refine and feasibly disseminate MBIs in real-world settings.