The aim of school mental health programs (SMHP) is to help students achieve their goals by improving and strengthening their emotional well-being, their psychosocial skills, and positive teaching-learning environments. Such programs should be based on empirically tested intervention models to ensure that the actions taken produce the desired outcomes.

This realization, together with the interest in obtaining results, has led to a growth in evidence-based interventions (EBIs), in line with an approach that combines empirical evidence, professional expertise, and student characteristics in connection with the intervention (Weist and Lever 2014). One of the principles of this approach is that replication makes it possible to reproduce the results found in experimental tests; however, this does not always occur, because when EBIs are implemented in complex and heterogeneous contexts such as schools, they seem to lose strength and applicability (Proctor et al. 2009). In fact, many SMHP have been conducted (e.g., www.casel.org, www.blueprintsprograms.com) whose effectiveness has been tested rigorously, but little is known about how to implement them properly in schools (Forman et al. 2013; Sarno et al. 2014).

In this context, the last decade has witnessed the development of implementation science, focused on understanding how to transfer the benefits of EBIs to the real world by studying the processes and components of their implementation in everyday intervention contexts (Bhattacharyya et al. 2009).

Studying implementation involves understanding the social contexts in which actions are executed and examining the technical resources and organizational conditions that support the proper execution of an intervention. In particular, it involves determining how the executed actions adapt to diverse contexts while maintaining fidelity to the intervention model (Dupaul 2009; Perepletchikova 2011; Schulte et al. 2009).

Implementation fidelity (IF)—or treatment integrity—is one of the key aspects of implementation research and refers to the degree to which an intervention is conducted in accordance with its intervention model (Perepletchikova 2011; Schulte et al. 2009). Knowing the threshold at which IF starts generating the desired (or undesired) results makes it possible to estimate the effort required to implement an intervention adequately; this is fundamental for policy and technical decision-making, because investing resources in intervention programs without applying them correctly is as pointless as investing in ineffective programs (Durlak 2015).

IF is a complex variable obtained by measuring several basic components of the operative model (Dane and Schneider 1998; Dupaul 2009; Hagermoser-Sanetti and Kratochwill 2009; Sarno et al. 2014). Its complexity arises from its multilevel and multidimensional nature, as it incorporates different dimensions of the various participants involved in nested relationships within an intervention. Although several taxonomies are available (e.g., Hagermoser-Sanetti and Kratochwill 2009; Schulte et al. 2009), they all consider at least four components of IF in SMHP, associated with the interventionist and with the students taking part in the intervention. At the interventionist’s level, IF assesses the extent to which the practitioner’s work conforms to the planned actions and the prescribed quality level. Two aspects are considered here: adherence, which concerns the degree of fulfillment of, or fidelity to, the practical components specified in the operative model (Schultes et al. 2015), and intervention quality, which refers to the degree of skill, enthusiasm, and commitment in the execution of the actions (Dane and Schneider 1998). At the level of student participants, IF assesses whether the intervention was suited to them (i.e., whether they actually were the target population) and whether they received the prescribed number of sessions in a manner conducive to the expected results. Two further aspects are considered here: intervention exposure, which refers to the dose received relative to what was planned (Codding and Lane 2015), and receptiveness, which refers to the perceived relevance of the intervention and the commitment that participants display towards it (Low et al. 2014b).

Previous Reviews

To date, a number of reviews have shown that IF is insufficiently reported (Sutherland et al. 2013). As shown in Table 1, this trend has remained stable over the last 35 years of research, with the proportion of articles reporting IF never exceeding 50% in specialized journals.

Table 1 Reporting of implementation fidelity in school mental health and related fields

One reason for this is that researchers assume IF to be present when using experimental designs. Such studies focus on controlling the execution of actions; the authors are therefore not always interested in checking whether the intervention was actually implemented with fidelity, because they make great efforts to ensure that it is by training, supervising, and monitoring the individuals in charge of the execution. However, since not all researchers verify the degree of IF that these efforts actually produced, it cannot be assured that the results obtained are due to the actions executed, which means that their conclusions must be interpreted cautiously. For this reason, several professional organizations and funding agencies have made a commitment to IF, encouraging researchers to increase their efforts in this area (DiGennaro Reed and Codding 2014).

Given the recent interest in IF, and despite the relative consensus regarding its measures (Dane and Schneider 1998; Schulte et al. 2009), the researchers who report it often do so incompletely: they mention only the aspects considered to promote IF, such as the use of manuals, training, supervision, monitoring, and the implementation context, and only a few give specific details about how much IF was attained and in which component. As a result, reviews of IF have focused on describing these elements, showing that better intervention outcomes are achieved when IF is fostered. Well-defined interventions have been shown to lead to better outcomes than ambiguous and unclear ones. In a pioneering study, Tobler (1986) reviewed 143 drug prevention programs aimed at adolescent school populations and found that the results obtained were linked to the operational definition of the intervention: programs that included such a definition obtained larger effect sizes than those that did not. More recently, in a study of 55 evaluations of mentorship programs for adolescents, DuBois et al. (2002) found that programs using evidence-based practices (with adequate operational definitions) yielded better results. Along similar lines, Sklad et al. (2012) examined 75 articles on the effectiveness of universal socioemotional learning programs and found that outcomes were linked to the reported use of a manual.

Other reviews show that monitoring of implementation is associated with intervention outcomes. Smith et al. (2004) examined 84 studies on the effectiveness of anti-bullying programs and observed that those involving systematic monitoring tended to be more effective than those without it. The same finding was reported by DuBois et al. (2002), who warn that poorly monitored mentorship programs for adolescents can be detrimental to their young participants, especially those from deprived social backgrounds. Another review along similar lines, conducted by Wilson et al. (2003), examined 221 articles on programs for reducing aggressive behavior and found that implementation quality (understood as the resolution of implementation problems through monitoring) is strongly related to the results obtained.

In the implementation field, the review by Durlak and DuPre (2008) is essential reading. Using several analytical strategies, these researchers conclude that IF matters: well-implemented programs (i.e., those in which IF is promoted) have effect sizes two to three times larger than badly implemented ones and, under ideal conditions, interventions with high IF can be up to 12 times more effective than poorly implemented ones. These results suggest that studies striving to ensure adequate conditions for promoting IF obtain better outcomes than those that fail to do so. However, they provide no practical information about, for instance, how a manual must be used or how far it is possible to deviate from it while still achieving the expected results. Such information is essential for replicating interventions in everyday school contexts, where the possibility of being totally faithful to the intervention model is slim (Sarno et al. 2014).

On the other hand, although these results stress the importance of monitoring and supporting the implementation of operationally well-defined interventions, they do not shed light on the specific relationship between the implementation level of each component and the outcomes observed. Despite the importance of this issue, studies usually fail to connect IF with outcomes (Schoenwald and Garland 2013). Accordingly, only two reviews were found that specifically address this relationship: in the first, Dane and Schneider (1998) show that the higher the dose, the better the results, while in the second, Durlak and DuPre (2008) note that in 79% of the interventions reviewed, dosage and adherence were significantly linked to at least half of the measured outcomes.

Purpose of this Review

Understanding the complexity of IF in SMHP requires exploring how its components are associated with results and weighing the relative importance of their influence. In this regard, although some authors assume that adherence is the heart of IF because it embodies the specific strategies and techniques derived from the change model, it is necessary to delve deeper into the other components. This review sought to shed light on the link between IF components and the expected outcomes of SMHP. To this end, a descriptive analysis of studies addressing this connection was conducted by counting the number of times each component of IF was significantly linked to the outcomes measured.

Method

Literature Search Strategy

Two search strategies were used to ensure an exhaustive search of the existing literature. The first was to identify primary studies in the online databases APA PsycNET (n = 315), PubMed (n = 216), EBSCO (n = 95), ISI Web of Science (n = 89), and Scopus/ScienceDirect (n = 167) using the following keywords: fidelity, integrity, implementation, adherence, dosage, dose, exposure, quality, professional competence, engagement, receptiveness, outcome, effectiveness, efficacy*, school*, preschool*, and class*.

A total of 882 articles were reviewed by the author and her assistant by reading their titles and abstracts. Thirty-six articles were selected for potential inclusion in the review; after a detailed examination, this number went down to 20. The second strategy was to review the reference section of each selected article, which yielded 11 additional articles. Thirty-one articles in total were included in this review (see Fig. 1).

Fig. 1 Flow chart of the different phases of the systematic review

Inclusion Criteria

Eligible studies for this review were those that: a) reported measures of the relationship between an IF component (adherence, intervention quality, dose, and/or receptiveness) and the results of the intervention; b) assessed school-based mental health programs, defined as interventions intended to promote, prevent, or treat problems associated with students’ emotional well-being (externalizing and internalizing), psychosocial skills, and positive teaching-learning environments (such as school climate, school relationships, and bullying); c) were published between 2006 and 2016, a period selected because, according to some authors, mentions of IF in articles began to increase in the mid-2000s (Hagermoser-Sanetti et al. 2011); and d) were published in peer-reviewed academic journals. Although this last criterion introduces a publication bias, it was adopted given the importance of examining rigorous scientific information regarding the link between IF and outcomes.

Exclusion Criteria

Studies excluded from this review were those that: a) reported qualitative or review-based results; b) failed to provide statistical information about the relationship between IF and outcomes; c) reported results of school interventions aimed only at improving certain aspects of learning processes (e.g., literacy) or focused on physical health and risk behaviors (e.g., nutrition programs or sexual risk behaviors).

Special Treatment of Research Reports

Interventions conducted in different school levels (e.g., primary and secondary education) reported in a single study were coded and analyzed separately. The same procedure was followed when one study employed different measures to assess and determine the associations of a single IF component (e.g., adherence according to two informants or different temporal scales of exposure to the intervention).

Coding

Independent Variables: Components of Implementation Fidelity

Assuming that an intervention is operationalized through the measurement of IF components, these components were regarded as independent variables in relation to the outcomes of SMHP. The underlying assumption is that measuring these variables opens a window into the intervention’s black box, which is otherwise usually treated as a dichotomous variable (treatment vs. control). Therefore, the four major components of IF were considered (Dane and Schneider 1998; Dupaul 2009; Schulte et al. 2009), and the presence or absence of their measurement in each study was assessed.

Adherence

This element concerns the degree of fulfillment of or fidelity to the practical components specified in the operative model. It was deemed to be present when the researchers measured the use of specific techniques, the application of general principles, or the fulfillment of key phases of the intervention process.

Intervention Quality

This element concerns the degree of competence with which the interventionist executes the actions. Intervention quality was deemed to be present when the researchers measured the interventionists’ knowledge about the intervention, the interventionists’ skill as demonstrated when performing the actions, and/or their attitudes towards the actions performed (e.g., enthusiasm and commitment).

Exposure to the Intervention

This component concerns the number of intervention sessions in which a student participates. Exposure to the intervention was deemed to be present when the researchers measured the number or proportion of sessions received relative to the total number of sessions planned (e.g., on an annual, biannual, monthly, weekly, or daily basis).

Receptiveness

This element concerns the degree to which students were committed to the intervention. Receptiveness was deemed to be present only when the researchers measured the participants’ attitude towards the intervention (e.g., enthusiasm and commitment) through post-intervention surveys or questionnaires.

Combined IF Indexes

This element concerns the construction or usage of IF indexes based on the combination of the components mentioned above.

Characterization Variables: Type of Intervention and Measurement of IF

The types of intervention were characterized following elements of the taxonomy published by Humphrey et al. (2013) and other categories used in reviews of prevention programs in SMHP (Sklad et al. 2012; Weare and Nind 2011; Wilson et al. 2003).

Scope of the Intervention

This element refers to the population level at which the intervention is delivered. The label “universal” was assigned to the studies that report interventions designed for school-wide application. The label “selective/targeted” was used for those reporting interventions aimed at a subgroup identified as at risk of experiencing (or currently experiencing) social, emotional, and behavioral problems. “Mixed” was used when both approaches were present. In each case, the coding was based on the descriptions included in the sample and participants section.

Structural Components

These are the types of activities performed during the intervention. The “skills teaching” code was used for studies that report interventions structured around lessons and activities aimed at helping children develop and strengthen their social, emotional, and behavioral skills. The “strengthening of the school environment” code was used for studies that report interventions focused on improving school climate, culture, and norms; this category included interventions based on the Positive Behavioral Interventions and Supports (PBIS) approach. The “mixed” code was used for studies reporting interventions based on both components. In each case, only the component addressed in the study was considered, rather than the additional components in the full version of the program.

Prescriptivity

This element concerns how prescriptive the specific actions of a given intervention are. The “top-down” label was assigned to studies reporting interventions based on planned actions and structured guidelines or manuals describing the implementation procedures, with the explicit obligation of performing them exactly as designed. The “bottom-up” code was assigned to studies reporting interventions that emphasize flexibility and local adaptation in the activities to be implemented. The “mixed” code was used when prescriptivity depends on the structural components of the intervention. In each case, the coding was based on the descriptions included in the introduction and procedures sections; in some cases, the websites of the programs assessed were also reviewed.

Interventionist

This is the professional who executes the intervention. The “teacher” code was used when the teacher applied the components of the intervention to students; the “school team” code was selected when an internal school team implemented these actions; and the “another professional” code was assigned when the actions were conducted by social, medical, or other professionals.

School Grade Level of the Participants

This refers to the grade level to which the student participants belong. Articles were coded as relating to preschool, primary education (grades 1–8), or secondary education (grades 9–12). In each case, the grade level in which the students received the intervention was considered, rather than the grade level in which the outcomes were assessed.

To characterize the measurement of IF, the following elements were considered: measurement instruments, measure validity, and measurement frequency.

Instruments for Measuring IF

This component concerns the type of instrument used to assess IF. Studies were coded as: “permanent products” when assessment reports or cards were generated regarding the implementation process; “observations” when systematic observation checklists were used or when video recordings of the execution of the actions were coded; “self-reports” when questionnaires or surveys about the implementation process were used; “interviews” when conversational production techniques were employed; and “multiple instruments” when two or more approaches were used.

Measure Validity

This refers to whether or not validity indicators were reported for the IF measures used. Studies were coded as yes/no depending on the presence or absence of such indicators.

Frequency of IF Measurement

This component refers to the number of times that IF was measured. Studies were classified as: “session-by-session,” “weekly,” “monthly,” “bi-monthly,” “quarterly or four-monthly,” “five-monthly or bi-annually,” “annually,” or “once during the whole intervention”.

Dependent Variables: Outcomes of the Interventions

Five categories of dependent variables were used: internalizing mental health difficulties, externalizing mental health difficulties, socio-emotional skills, school relationships, and academic performance.

Internalizing Mental Health Difficulties

This category included outcome variables associated with internalizing behaviors, such as depressive and anxious symptoms, loneliness, suicide attempts, somatic problems, psychological well-being, or the need for psychological care.

Externalizing Mental Health Difficulties

This category included outcome variables associated with externalizing behaviors such as non-fulfillment of tasks, problems with classmates, aggressive, disruptive, hyperactive, or antisocial conduct, and substance use.

Socio-Emotional Skills

This category included outcome variables associated with the five domains of socio-emotional learning, which include: self-awareness (e.g., self-esteem, recognition of emotions), self-control (e.g., stress management, impulse control), social awareness (e.g., empathy, respect for others), social skills (e.g., active listening, cooperation), and responsible decision-making (e.g., problem-solving, anticipation of consequences).

School Performance

This category included outcome variables associated with academic performance (e.g., mathematics and language test scores) and with dropout-related indicators (e.g., absenteeism, dropping out).

School Relationships

This category included outcome variables associated with the perception of social environments of teaching-learning, such as classroom climate or bullying level at school or in the classroom.

Coding Reliability

Coding was performed by the research team. In two training sessions, the team was instructed by the first author regarding the general topic under study and the search and coding procedures used. Each stage of the process was guided by a reference document that provided step-by-step instructions for each task. The search and coding processes were conducted in full by both team members, whose discrepancies were resolved during weekly work sessions. When no agreement was reached, the third author was consulted as an expert referee.

Proportion Ratio and Data Analysis

Given the lack of consensus regarding the measurement of IF components (Lewis et al. 2015; Schoenwald and Garland 2013) and the diversity of outcome variables in SMHP (Durlak et al. 2011; Sklad et al. 2012; Weare and Nind 2011; Wilson et al. 2003), a counting technique was used to analyze the relationship between IF and results (Cooper 2017). Based on these data, a proportion ratio (PR) was calculated as the number of times that IF components were significantly associated with outcomes (p < .05) divided by the total number of outcomes measured. That is, if a study considered five possible outcomes and only one of them was found to be significantly associated with a component of IF, the PR was .2. It should be noted that the PR was used in this review as a descriptive measure of the ratio of significant relationships to the total number of relationships evaluated. References to “higher” or “lower” PR values merely indicate that a higher or lower number of significant associations between IF and outcomes was observed, and are not meant to indicate statistical significance.
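Written as a formula, for a given IF component (notation introduced here only for illustration):

PR = n_sig / n_total,

where n_sig is the number of outcomes found to be significantly associated with that component (p < .05) and n_total is the total number of outcomes measured. In the example above, PR = 1/5 = .2.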

Results

Characteristics of the Studies Reviewed

The characteristics of the studies reviewed are summarized in Table 2. About half of the studies linking IF to SMHP outcomes were published between 2014 and 2016 (n = 16; 52%), which indicates an upward trend. With respect to geographical location, most studies were conducted in English-speaking countries, especially the United States (n = 17; 55%).

Table 2 Characteristics of the 31 studies reviewed

Seventy-seven percent (n = 24) of the studies assessed only universal interventions; if mixed interventions (i.e., those that include both universal and targeted activities) are added, this figure reaches 93% (n = 29). The same holds for the structural components of the interventions: most involve skills teaching (n = 23; 74%) once programs that combine such actions with school environment strengthening are taken into account. The most common interventionists are teachers, whether as part of a school team (n = 8; 26%) or as the main implementers (n = 21; 68%). With respect to school level, most interventions were conducted in primary education (n = 23; 74%).

The 31 studies reviewed considered 171 outcome variables. Of these, 31% (n = 55) concern mental health difficulties, 32% (n = 57) refer to socio-emotional skills, 20% (n = 36) involve academic performance, and 13% (n = 23) reflect school relationships. With respect to the components of IF measured, the most prevalent were adherence (n = 24; 77%) and dosage (n = 18; 58%). Nearly 40% of the studies (n = 12) measure IF through self-report instruments, followed by those that employ more than one measure (n = 9; 29%), including combinations of interviews, observations, and self-reports. Forty-five percent (n = 14) provide data on the validity of their measures, although these tend to be limited to consistency indicators (Cronbach’s alpha or inter-rater agreement). In addition, most researchers measure IF more than once during the intervention (n = 22; 71%), most commonly on a session-by-session basis (n = 11; 35%).

Associations between Outcomes and IF

Considering the special treatment of some studies included in the review, 41 interventions were analyzed. In total, the relationship between IF components and the outcome variables considered was measured 273 times, with a significant association being found in 40% of cases (n = 97; see Table 3).

Table 3 Proportion ratio of the significant relationships between results and IF

Examining the associations by IF component revealed that participant receptiveness is the element most frequently linked to outcomes (60%; n = 31), in contrast with the combined IF index, which only reaches 10%. With respect to specific IF components and outcome variables, the significant association rate of adherence did not surpass 30%. Intervention quality was associated with internalizing difficulties on all occasions (n = 3), although no association was found with school relationships. Dosage was associated with outcome variables in up to 50% of cases (n = 17), again with the exception of school relationships, with which no association was found. The combined fidelity index displayed no association with either type of mental health difficulty and was linked to the remaining variables in less than 30% of cases. The component displaying the most significant associations with the outcomes assessed was receptiveness, operationalized as student commitment: it was linked to internalizing difficulties in 100% of cases, externalizing difficulties in 60%, socio-emotional skills in 70%, school performance in 40%, and school relationships in 30%.

Comparison of the Associations between Outcomes and IF According to Characterization Variables

The 273 assessments of the relationship between IF and outcomes were analyzed according to the selected characterization variables. In studies of universal interventions, the proportion of significant associations between IF and outcomes was twice as high as in studies reporting selective and/or mixed intervention data. PR values were 6% higher in top-down than in bottom-up interventions, although mixed interventions displayed 23% fewer significant associations. In general, PR values were twice as high in primary education as in preschool and secondary education.

A look at the characteristics of IF measurement reveals that, when permanent products and self-report instruments were used, PR values were approximately twice as large as with other instruments. An unexpected result was that studies reporting the validity of their IF measures displayed lower PR values than those that did not provide this information. Finally, PR values were 5% higher when researchers measured IF only once per year (see Table 4).

Table 4 Proportion ratio of the significant relationships between results and IF, by characteristics of the studies

Discussion

The purpose of this review was to shed light on the link between IF components and the outcomes of SMHP. Thirty-one articles were found that directly addressed this association, approximately half of them published in the final two years of the period reviewed (2014–2016). This reveals a recent interest in empirically testing the hypothesized impact of IF on outcomes, especially in the United States, where most of these studies were conducted.

The interventions assessed are mostly universal, top-down, executed by teachers, aimed at primary school students, and focused on the improvement of socio-emotional skills, which is consistent with the international trend of evidence-based SMHP (Kutash et al. 2006). In this regard, it should be noted that despite extensive evidence in support of programs that target socio-emotional learning (Durlak et al. 2011), these are not free from criticism regarding the universality of their benefits outside their country of origin (Berry et al. 2015). It is therefore fundamental to explore the active ingredients of interventions and the processes involved in their implementation, as has been increasingly emphasized in recent years (Durlak 2016).

With regard to IF measures, the most frequent were adherence and dosage, a result that matches previous reviews (Dane and Schneider 1998; Durlak and DuPre 2008). For some researchers, adherence is the heart of IF (Gresham 2009), because if one wishes to measure the operationalization of the intervention model empirically, the differences between what interventionists actually do and what they should do cannot be ignored. Similarly, other researchers (Codding and Lane 2015; McGinty et al. 2011) have stressed the importance of dosage for the measurement of IF, because it is not enough for something to be done according to the guidelines; it is also important to know how much of it students actually receive.

Adherence and dosage represent the quantitative side of IF. In other words, researchers have tended not to consider qualitative aspects such as intervention quality and interventionist–participant interactions, despite the importance of these IF components in other areas. In psychotherapy, for instance, the therapist’s competence is an essential factor in measuring IF (Perepletchikova et al. 2007). It was also found that, in response to the complexity of IF, some researchers have developed indexes that combine two or more components, usually adherence, dosage, and intervention quality.

With respect to measurement instruments, most researchers use self-reports, either exclusively or alongside interviews, observations, or permanent products. It is noteworthy that nearly half of the studies report the psychometric properties of their measures, although these are mostly limited to inter-rater agreement and internal consistency indexes. In this regard, as other authors have pointed out (Dupaul 2009), robust psychometric instruments need to be developed, which in turn requires further consensus on the definitions of IF and the key components common to all program types. The work of authors such as Abry et al. (2015) and Rimm-Kaufman et al. (2014) provides useful observation and self-report instruments in this direction.

As to the central topic of this review, an especially relevant aspect concerns the use of PRs as indicators of the association between IF and results. Even though the ideal procedure would have been to conduct a meta-analysis and calculate effect sizes, the diversity of dependent and independent variables forced us to seek new ways to tackle this issue. In this context, the PR assumes that if the researchers measured certain given dependent variables, it is because these emerged from the model supporting the intervention; therefore, the multiple components of IF should be linked to them. From the findings of this review, it may be concluded that the relationship between IF and SMHP outcomes is partial, as IF (understood as the sum of its components) was found to be associated with the measured outcomes of SMHP only 40% of the time, i.e., a low rate considering the theoretical importance of IF.

Regarding outcome variables, IF was found to be weakly associated with school relationships, school performance, and externalizing difficulties. This is a worrying finding, because one of the fundamental principles of SMHP is that interventions contribute to educational goals (Suldo et al. 2014), but this does not seem to be the case in the studies reviewed. This urges us to reflect on the ingredients of SMHP that actively contribute to academic improvement, because the fact that no significant associations were found between IF and these outcomes does not mean that interventions do not have a positive impact on academic performance. Actually, the available evidence shows that they do help (Durlak et al. 2011); thus, it is necessary to explore in depth the associations between the actions performed and their results. It could be hypothesized that the lack of a link between IF and outcomes arises from the presence of variables not considered in the intervention models (and thus not measured as part of IF) that exert a significant influence during the execution process; alternatively, it may be due to methodological problems in the measurement of IF components.

When the four components of IF were considered in connection with the multiple outcome variables examined, the relative importance of each could be assessed. One of the main findings of this review is that adherence, despite being the most frequently reported component, is weakly associated with outcome variables. The same is true of intervention quality. This is an unexpected finding, given the importance that authors ascribe to these dimensions as reflections of the interventionist’s practices. In contrast, the components of IF located in the participant displayed stronger associations. A case in point is receptiveness, which is expressed in students’ commitment to the intervention and is linked to 70% of socio-emotional outcomes.

Even though these results must be cautiously weighed, the PRs found may indicate that, in the case of SMHP, the most important aspect is for students to participate as many times as they can in the activities planned and with as much commitment as possible, regardless of the intervention model used. In this regard, it would be interesting to test the hypothesis that change takes place to a large extent due to personal factors such as involvement or attitudes towards the intervention activities, that is, depending on how strongly the participants believe that what they are doing can help them (Low et al. 2014a, b). These results highlight the importance of fostering students’ commitment to and enthusiasm for participating in school-based mental health interventions. This can be achieved through change readiness strategies and school-based mental health literacy (Macklem 2014).

When examining PRs in light of the characterization variables of the interventions, stronger links between IF and outcomes were found in universal, top-down, skills-teaching programs for primary education students. This can be partly explained by the fact that school-wide readiness for the implementation of SMHP can create a suitable environment for commitment among all the parties involved, especially students.

Another relevant result is that more than half of the studies examined measure IF monthly or more frequently. Even though this appears to be a good decision, because it makes it possible to describe the variation in IF throughout the intervention, the data collected in this review show that when researchers measure IF only once a year, more instances of significant association with outcomes are observed. This seems to suggest that an overall assessment of the implementation process works better than a detailed one.

The limitations of this study concern three aspects that should be considered in future research. First, given the recent attention paid to IF in applied research, several different outcomes and interventions were included. Although this approach allows for generalization, it restricts the depth of the analyses. Future reviews should focus on a single type of intervention and its related outcomes to eliminate possible biases introduced by broad categories such as those used here. A second limitation was introduced by the requirement of research rigor, represented by publication in peer-reviewed journals. Although this choice made it possible to collect high-quality scientific information, it produced a major bias by omitting evidence from other sources. Interventions are carried out in many schools, but their results are not always reported in academic outlets and are instead filed away in the gray literature of foundations or government agencies. It is therefore necessary to explore these documents to shed more light on the associations examined in this study, especially in less developed and developing countries.

The third limitation was the use of PRs to estimate the connection between IF and outcomes. This review set out to question indirect estimations of the importance of IF for outcomes; however, given the characteristics and designs of the primary studies analyzed, it was unable to make progress towards estimating the effect size of IF. In this regard, the PR is still an indirect estimate, but it has the advantage of addressing direct associations between IF and outcomes, unlike previous reviews, which had to construct implementation indicators a posteriori. In this context, it is necessary to issue clear guidelines for researchers interested in assessing IF and to reach a consensus regarding the operational definitions of its multiple components.