Introduction

The common approach to developing educational and prevention programs has been to create a program, test it through a randomized trial, and then offer it to community institutions.1 This approach has led to the implicit expectation that districts or schools can and will adopt and implement evidence-based programs with a high degree of fidelity; however, implementation is typically poorer in real-world settings than in efficacy trials.2,3 As a result, there is increasing interest among federal agencies, researchers, and policy makers in the process by which prevention programs are moved into real-world settings, often referred to as “translational” research.4 This includes the process by which efficacious practices, interventions, or treatments become implemented effectively in real-world settings.5 Yet, there has been limited empirical work specifically on the process of translating efficacious practices into various contexts or on how to support implementation and the scaling-up processes.6

There is a considerable need for more research on factors that enhance or relate to the adoption and adequate implementation of programs and lead to effective practice and outcomes,4,7 particularly in school settings, where there is a growing emphasis on the implementation of “evidence-based” prevention programs.8,9 Although this is a positive trend, previous research suggests that, on average, schools are implementing a dozen or more different prevention programs,10 raising concerns about the implementation fidelity of these programs.10,11 In addition to this real-world concern, few rigorous studies of the effectiveness of prevention programs measure or report data on the level of implementation,12,13 and, therefore, even less is known about the implementation of school-based programs when taken to scale.

The current paper applies a Type II translational research approach to examine how the implementation fidelity of an increasingly popular and widely disseminated school-based prevention model, School-Wide Positive Behavioral Interventions and Supports (SW-PBIS),14 relates to student outcomes. School-level factors that previous research suggests are potentially related to both implementation and the targeted outcomes are also considered.15 A unique feature of this study is the use of data from a state-wide scale-up effort of SW-PBIS spanning over 870 Maryland public schools. Maryland is not alone in its efforts to scale up SW-PBIS: at least 44 states across the United States have developed a state- or district-level infrastructure to support its implementation, and this large-scale implementation has important implications for students’ behavioral and academic outcomes. We also consider how the type of implementation data collected relates to the extent to which they predict student outcomes; this issue is of particular importance in scale-up efforts, where the resources to collect fidelity data are often limited. First, we provide a brief review of translational research, followed by an overview of SW-PBIS and the infrastructure developed to scale up the model in Maryland.

Translational research

There has been a recent effort to differentiate between two types of translational research: Type I translational research focuses on discovery through clinical trials, whereas Type II examines the process by which efficacious practices, interventions, or treatments become implemented effectively in real-world settings.5 The current paper focuses on Type II translational research, which “is aimed at enhancing the adoption, implementation, and sustainability of evidence-based or scientifically validated interventions” 4(p 2) and focuses on achieving broad, population-level effects. While there is an established literature supporting the efficacy (Type I) and effectiveness (one element of Type II research) of a variety of intervention approaches or programs, there has been less empirical work specifically on the process of translating efficacious practices into real-world settings or on how to support implementation and the scaling-up processes.6 When it comes to scale-up efforts in schools, there is limited empirical research on the extent to which prevention programs are adequately implemented and the association between implementation quality and student outcomes. This illustrates a clear need for additional research on the process of dissemination or planned diffusion 16 of evidence-based programs and whether the effects seen in randomized trials are replicated when brought to scale.17

School-wide Positive Behavioral Interventions and Supports

The current paper focuses on the scale-up of SW-PBIS,14 a non-curricular, school-based prevention approach that aims to change staff behavior in order to positively impact student discipline, behavior, and academic outcomes. SW-PBIS is based on behavioral, social learning, and organizational behavioral principles. The model is implemented in all school contexts (classroom and non-classroom) with the aim of improving a school’s systems and procedures to prevent disruptive behavior and enhance the school’s organizational climate. The model follows the three-tiered prevention framework, whereby a universal system of support is integrated with selective and indicated preventive interventions for students displaying a higher level of need.18 Two recent randomized controlled effectiveness trials provide evidence of positive outcomes of the universal elements of the SW-PBIS model. Specifically, SW-PBIS has been shown to be effective at reducing student office discipline referrals and suspensions and improving school climate.19–22 Teachers in SW-PBIS schools also rate their students as needing fewer specialized support services and as having fewer behavioral problems (e.g., aggressive behavior, concentration problems, bullying, rejection).23,24 In addition, there are some favorable results from state-wide evaluations of SW-PBIS.25,26 Taken together, these studies provide evidence that states can implement SW-PBIS on a large scale and that schools adopting SW-PBIS experience positive effects. Given the wide dissemination of SW-PBIS and previous research documenting its effectiveness, it is a particularly good candidate for Type II translational research focused on implementation quality in scale-up efforts.

SW-PBIS scale-up in Maryland

Maryland has developed a coordinated system for implementation of SW-PBIS. Over the past 12 years, a collaboration between the Maryland State Department of Education, Sheppard Pratt Health System, and Johns Hopkins University25–27 has trained a total of 877 schools (e.g., elementary, middle, high, alternative, special) in SW-PBIS, of which 740 (84 % of trained schools) are actively implementing and participating in the state initiative. This is made possible through the state-wide infrastructure, which includes a variety of core elements for dissemination,7,28–30 including a consortium of stakeholders (e.g., educators, researchers, policymakers) who jointly coordinate, train, and support schools in the implementation of SW-PBIS. There are multiple levels of coordination (for details, see Barrett et al.25) to promote high-quality implementation. Similar systems of support have been utilized in other translational efforts to disseminate programs and achieve high fidelity (for examples, see Bloomquist et al.,31 Fixsen et al.,32 Spoth and Greenberg33). The Maryland Initiative also maximizes the dissemination of SW-PBIS through the promotion of exchange between school practitioners, who may be more effective in shaping their colleagues’ opinions about SW-PBIS than the consortium,34 and by utilizing coaches and district leaders.16,34 Finally, there is ongoing data collection, evaluation, and technical assistance provided by the partners regarding implementation and outcomes.29,30–33 The data for the current study come from the state’s evaluation efforts.

Linking implementation with outcomes in scale-up efforts

While several efficacy studies of behavioral or mental health prevention programs have documented an association between implementation and outcomes,35–39 relatively few studies have examined the link between implementation and outcomes within the context of state-wide scale-up efforts. For example, research on the Triple P-Positive Parenting Program, which targets changes in child behavior by training parents to alter the home environment,40 has found that the intensity of the program, as well as its format (e.g., self-directed versus group), was significantly associated with parent and child outcomes.41 Similarly, an evaluation of the Promoting Alternative Thinking Strategies (PATHS) social–emotional curriculum reported a significant interaction between implementation and contextual factors, such as administrator support, on student behavioral and emotional outcomes.42 Taken together, the available research suggests a need for more empirical research on the association between implementation quality and outcomes when interventions are brought to scale.43

Role of contextual factors

When examining the association between implementation and outcomes, it is important to adjust for contextual factors that may influence implementation as well as the outcomes. For example, there is literature suggesting that a high rate of disorder or disorganization can impede successful implementation of programs and can negatively impact program outcomes,15,20,44 whereas a climate that encourages adherence to the program model (i.e., fidelity) may improve implementation.16,45 A previous study of SW-PBIS in elementary schools found that implementation fidelity was associated with school-level factors, such as the percent of certified teachers in the school;27 however, the available district-level predictors were not associated with implementation. Also relevant was the number of years since training, such that schools that had implemented the model longer achieved higher levels of fidelity27 (see Rohrbach et al.30 and Rogers34). Together, these findings suggest that it is important to account for school-level contextual factors when examining the association between implementation and outcomes in scale-up efforts.

Overview of the current study

The current paper examined how the level of implementation of SW-PBIS related to student outcomes, while adjusting for school-level contextual factors associated with both implementation quality and student outcomes. The data come from the state-wide evaluation of SW-PBIS, which is led by the PBIS Maryland Consortium and includes a variety of data elements, among them the implementation quality of SW-PBIS. The data reported in this paper focus on program implementation in the spring of 2009 (i.e., the 2008–2009 school year) and student outcomes in spring 2010 (i.e., the 2009–2010 school year), while controlling for predictor variables measured in the year preceding each school’s training. Data from elementary and middle schools were examined, including traditional K-5 or K-6 elementary schools, K-8 schools, and middle schools serving grades 5 or 6 through 8. High schools implementing SW-PBIS were excluded because the assessment of student outcomes differs substantially at that school level (i.e., a different standardized testing approach).

The outcomes of interest were student achievement on the Maryland School Assessment (MSA) in math and reading, truancy rates (i.e., the percent of students absent more than 20 days in the school year), and suspensions (i.e., the total number of suspension events divided by the total number of students, multiplied by 100). Baseline data for each outcome (i.e., achievement, truancy, and suspensions in the year prior to the school’s training in SW-PBIS) were controlled for. The level of implementation of SW-PBIS was assessed by three measures: the Implementation Phases Inventory (IPI),46 the School-wide Evaluation Tool (SET),47 and the Benchmarks of Quality (BoQ).48,49
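
As a concrete illustration of the two rate definitions above, the sketch below computes them from hypothetical school-level counts; the function and variable names are illustrative, not drawn from the state data system.

```python
def truancy_rate(n_absent_gt20: int, enrollment: int) -> float:
    """Percent of students absent more than 20 days in the school year."""
    return 100.0 * n_absent_gt20 / enrollment

def suspension_rate(suspension_events: int, enrollment: int) -> float:
    """Total suspension events per 100 students (events, not unique students)."""
    return 100.0 * suspension_events / enrollment

# Hypothetical school of 500 students with 40 students absent more than
# 20 days and 55 suspension events in the year.
print(truancy_rate(40, 500))      # 8.0 (percent)
print(suspension_rate(55, 500))   # 11.0 (events per 100 students)
```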

It was hypothesized that higher levels of implementation would be associated with higher achievement and lower rates of truancy and suspensions. Based on the literature identifying contextual factors potentially important to implementation quality and student outcomes,15 a set of school-level variables (i.e., student enrollment; students per teacher; rates of mobility and teacher certification; and years since training) was controlled for. It was hypothesized that larger school size, a higher student-to-teacher ratio, lower rates of teacher certification, and higher student mobility would be associated with poorer SW-PBIS implementation and outcomes, based on the use of these variables as proxies for disorder.15 On the other hand, we hypothesized that the longer schools had implemented SW-PBIS, the higher their implementation quality would be.50

Method

Participants

Eligibility

Within the state of Maryland, there are 24 districts, all of which participate in the SW-PBIS Initiative. The focus is on traditional (i.e., non-special education, non-alternative) elementary and middle schools, since the initiative had a stronger support system for these schools relative to high schools or non-traditional schools. There were 474 schools (i.e., traditional elementary Kindergarten to grade 5 [K-5] or K-6 schools, traditional middle schools serving grades 5 or 6 through 8, and K-8 schools) across the 24 districts that were trained in SW-PBIS in 2008 or earlier. Of these schools, 421 (88.1 %) submitted implementation data on at least one measure and were therefore eligible for inclusion in the analyses. The sample included 269 elementary schools, 140 middle schools, and 12 K-8 schools. School-level demographics for the sample are reported in Table 1.

Table 1 School and district demographic characteristics

Measures

Implementation of SW-PBIS using the Implementation Phases Inventory (IPI)

The IPI 46 assesses the presence of 44 key elements of SW-PBIS following a “stages of change” theoretical model, whereby schools move through a series of four stages: preparation (Cronbach’s alpha [α] = .65; e.g., “PBIS team has been established,” “School has a coach”), initiation (α = .80; e.g., “A strategy for collecting discipline data has been developed,” “New personnel have been oriented to PBIS”), implementation (α = .90; e.g., “Discipline data are summarized and reported to staff,” “PBIS team uses data to make suggestions regarding PBIS implementation”), and maintenance (α = .91; e.g., “A set of materials has been developed to sustain PBIS,” “Parents are involved in PBIS related activities”). The schools’ PBIS intervention support coach reviewed each of the 44 items on the scale and indicated the extent to which each core feature was in place at the school on a 3-point scale from 0 (not in place) to 2 (fully in place). Schools received a percentage of implemented elements for each stage, such that a higher score indicated greater implementation. The IPI was developed in conjunction with the PBIS Maryland State Leadership Team to track different phases of implementation; it reflects the core elements of universal SW-PBIS (in the preparation, initiation, and implementation stages), as well as some more advanced features, such as preparing for parental involvement and implementation of selected and indicated preventive interventions (in the maintenance stage). A previous study of the psychometric properties of the IPI found it to have adequate internal consistency (α = .94) and reliability (test–retest correlation of .80).46
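
As an illustrative sketch of this percent-of-points scoring (which also applies to the SET subscales described next), the code below computes stage scores from hypothetical 0–2 item ratings, assuming each stage percentage is earned points divided by total possible points (2 per item); item counts and names are invented for the example.

```python
from typing import Dict, List

def stage_score(item_ratings: List[int]) -> float:
    """Percent of possible points earned for one IPI stage.

    Each item is rated 0 (not in place), 1, or 2 (fully in place),
    so the maximum is 2 points per item.
    """
    assert all(r in (0, 1, 2) for r in item_ratings)
    return 100.0 * sum(item_ratings) / (2 * len(item_ratings))

# Hypothetical ratings for the four stages at a single school.
ipi_ratings = {
    "preparation":    [2, 2, 1, 2],
    "initiation":     [2, 1, 1, 2, 0],
    "implementation": [1, 1, 2, 0, 1, 2],
    "maintenance":    [0, 1, 1, 0],
}
scores: Dict[str, float] = {stage: stage_score(r) for stage, r in ipi_ratings.items()}
print(scores)  # e.g., {'preparation': 87.5, 'initiation': 60.0, ...}
```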

Implementation of SW-PBIS using the School-wide Evaluation Tool (SET)

The SET 47 is conducted by an external evaluator and consists of seven subscales that assess the degree to which schools implement the key features of SW-PBIS.51 The scales assessed include: (a) Expectations Defined; (b) Behavioral Expectations Taught; (c) System for Rewarding Behavioral Expectations; (d) System for Responding to Behavioral Violations; (e) Monitoring and Evaluation; (f) Management; and (g) District-Level Support. Each item of the SET is scored on a 3-point scale from 0 (not implemented) to 2 (fully implemented). A scale score reflecting the percentage of earned points is calculated, such that higher scores reflect greater implementation fidelity. The SET was created by the developers of SW-PBIS; it is the most commonly used measure of the core features of the universal SW-PBIS model. Previous studies have documented the reliability and validity of the SET.52,53

Implementation of SW-PBIS using the Benchmarks of Quality

The BoQ 48,49 is completed by multiple PBIS team members and the coach and consists of 53 individual benchmarks assessing 10 areas of implementation (i.e., PBIS team, faculty commitment, effective disciplinary procedures, data entry and analysis plan, expectations and rules, the recognition system, lesson plans for teaching expectations, implementation plan for PBIS, classroom systems, and evaluation). Team members and the PBIS coach each independently rate each item on a 3-point scale (0 = not in place, 1 = needs improvement, and 2 = in place), and their responses are combined such that the most frequently endorsed rating for each item is the final score. An overall percentage of implementation is calculated by summing all earned points and dividing by the total possible points. The BoQ is thus the only implementation measure in this study that incorporates scores from multiple raters. Only the overall BoQ score is provided to the state, and therefore only this score was available for analysis in the current study. The BoQ has documented adequate internal consistency, test–retest reliability, inter-rater reliability, and concurrent validity with the SET.49
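
A minimal sketch of this multi-rater aggregation follows; the tie-breaking rule is an assumption (the text does not specify one), and the variable names are illustrative.

```python
from collections import Counter
from typing import List

def item_consensus(ratings: List[int]) -> int:
    """Most frequently endorsed rating (0, 1, or 2) across raters.

    Tie-breaking is not specified in the text; this sketch
    conservatively takes the lowest rating among the modes.
    """
    counts = Counter(ratings)
    top = max(counts.values())
    return min(r for r, c in counts.items() if c == top)

def boq_overall(items: List[List[int]]) -> float:
    """Overall percent: consensus points earned over total possible points."""
    earned = sum(item_consensus(r) for r in items)
    return 100.0 * earned / (2 * len(items))

# Three raters (two team members and the coach) scoring four of the
# 53 benchmarks.
ratings = [[2, 2, 1], [1, 1, 2], [0, 1, 1], [2, 2, 2]]
print(boq_overall(ratings))  # 75.0
```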

School-level demographic characteristics

Data on the year in which each school was trained were provided by the PBIS Maryland Consortium. These data were used to calculate the years since training (i.e., the number of years implementing SW-PBIS) and to determine which year’s data should be used for the school-level covariates. This variable ranges from 1 to 10 years, reflecting training in the summers of 2008 back through 1999, respectively. Demographic information about the schools was provided by the Maryland State Department of Education. Data regarding school size (e.g., student enrollment, student/teacher ratio [i.e., number of students per teacher]), the percent of certified teachers (i.e., those certified to teach in the state of Maryland by completing the required coursework, such as a Bachelor’s degree from a pre-approved teacher preparation program, and passing basic skills and content-area tests), and student mobility (i.e., the percent of students who entered the school plus the percent who withdrew, divided by total student enrollment) were obtained to serve as predictors, as were baseline outcome data (i.e., MSA math and reading, truancy rates, and suspensions). The school covariates reflect data from the year preceding a school’s training in SW-PBIS (e.g., if a school was trained in summer 2007, data from the 2006–2007 school year were used; if the school was trained in summer 2005, data from 2004–2005 were used, and so on). The same procedure was used for the baseline data for each outcome (i.e., achievement, truancy, suspensions). The outcome variables were from the 2009–2010 school year in all cases (see Table 1 for a full listing of demographic and SW-PBIS information for this sample of schools). The inter-correlations among these variables are reported in Table 2.
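
To make the covariate construction concrete, the sketch below implements the year alignment and the mobility formula under the stated assumptions (e.g., that years since training are counted relative to the 2008–2009 implementation year); the names are illustrative.

```python
def years_since_training(training_year: int, reference_year: int = 2009) -> int:
    """Years implementing SW-PBIS as of the 2008-2009 school year.

    A school trained in summer 2008 has 1 year; one trained in
    summer 1999 has 10 years.
    """
    return reference_year - training_year

def baseline_school_year(training_year: int) -> str:
    """School year whose data serve as covariates (the year preceding training)."""
    return f"{training_year - 1}-{training_year}"

def mobility_rate(entered: int, withdrew: int, enrollment: int) -> float:
    """Percent mobility: (entrants + withdrawals) / total enrollment x 100."""
    return 100.0 * (entered + withdrew) / enrollment

print(baseline_school_year(2007))   # '2006-2007'
print(years_since_training(2007))   # 2
print(mobility_rate(60, 45, 500))   # 21.0
```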

Table 2 Correlations among school-level baseline variables (N = 421 Schools)

Procedure

As a requirement of the PBIS Maryland Initiative, the IPI is completed twice annually (fall and spring) by a district-appointed technical assistance provider (i.e., a SW-PBIS coach, who in Maryland is often a school psychologist or counselor) and submitted electronically to the PBIS Maryland Consortium through the www.PBISMaryland.org web site. As noted above, the SET is completed by an external district assessor, and the BoQ is completed by the school’s SW-PBIS team; these data elements are completed annually in the spring and are also submitted electronically through the Consortium’s website. The non-identifiable school-level data were approved for analysis in this study by the Johns Hopkins Bloomberg School of Public Health Institutional Review Board.

Analyses

The Mplus 6.1 statistical software54 was used to fit a structural equation model (SEM)55 to test the hypothesized associations between fidelity and student outcomes, while adjusting for covariates. Specifically, an SEM using maximum likelihood estimation with robust standard errors (MLR) was fit. A confirmatory factor analysis (CFA) was conducted on the measurement model of implementation (i.e., the four IPI scales and the seven SET scales). A latent variable SEM approach was taken to reduce the dimensionality of the fidelity data; to allow for a parsimonious, more interpretable model; and to eliminate concerns about multicollinearity due to the high correlations among the subscales within each measure (i.e., the IPI and SET).56 The BoQ was modeled as a third, manifest (i.e., observed) indicator of implementation fidelity, as only the overall score was available to the researchers. The two implementation factors and the observed BoQ scores were then used to predict student outcomes (i.e., math and reading achievement, truancy, and suspensions), while adjusting for the school-level covariates (i.e., years since training, school enrollment, student/teacher ratio, percent of certified teachers, and student mobility) and the baseline outcome measures.
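
The analyses were conducted in Mplus; purely as an illustrative sketch, a structurally similar model could be specified in Python with the open-source semopy package, which uses lavaan-style model syntax. All variable names and the input file are hypothetical, and this sketch does not reproduce Mplus’s MLR estimator or the cluster-robust standard errors described below.

```python
import pandas as pd
from semopy import Model, calc_stats

# Lavaan-style description: "=~" defines the two latent implementation
# factors; "~" regresses each outcome on the latent factors, the observed
# BoQ score, the covariates, and the outcome's own baseline measure.
DESC = """
IPI =~ ipi_preparation + ipi_initiation + ipi_implementation + ipi_maintenance
SET =~ set_defined + set_taught + set_reward + set_respond + set_monitor + set_manage + set_district

math ~ IPI + SET + boq + years_since + enrollment + ratio + pct_certified + mobility + math_base
reading ~ IPI + SET + boq + years_since + enrollment + ratio + pct_certified + mobility + reading_base
truancy ~ IPI + SET + boq + years_since + enrollment + ratio + pct_certified + mobility + truancy_base
suspension ~ IPI + SET + boq + years_since + enrollment + ratio + pct_certified + mobility + suspension_base
"""

df = pd.read_csv("school_level_data.csv")  # hypothetical school-level file
model = Model(DESC)
model.fit(df)                # default ML estimation, not Mplus's MLR
print(calc_stats(model).T)   # fit statistics, including RMSEA, CFI, TLI
```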

As schools were nested within districts, the clustering of schools within districts was accounted for using Huber–White corrections to the standard errors;54 however, district-level covariates were not modeled, due to the relatively small number of districts (i.e., 24) and because prior research using this dataset suggested that district covariates were generally not significantly associated with implementation.27 Given that all of the schools in the study were from a single state, no state-level variables could be modeled. Model fit was determined through inspection of the Root Mean Square Error of Approximation (RMSEA), the Comparative Fit Index (CFI), the Tucker–Lewis Index (TLI), and the Standardized Root Mean Square Residual (SRMR).57 An RMSEA between .05 and .08 is considered acceptable fit, CFI and TLI values greater than .90 are considered acceptable, and the SRMR should be less than or equal to .08.58
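
As a concrete illustration, the small helper below encodes these cutoffs; it is an illustrative sketch of the thresholds cited in the text,58 not part of the study’s analysis code.

```python
def acceptable_fit(rmsea: float, cfi: float, tli: float, srmr: float) -> bool:
    """Check the fit-index cutoffs described in the text.

    RMSEA values between .05 and .08 indicate acceptable fit (values
    below .05 are conventionally treated as close fit and pass here),
    CFI and TLI should exceed .90, and SRMR should not exceed .08.
    """
    return rmsea <= 0.08 and cfi > 0.90 and tli > 0.90 and srmr <= 0.08

# Hypothetical example: RMSEA = .07, CFI = .92, TLI = .91, SRMR = .06
print(acceptable_fit(0.07, 0.92, 0.91, 0.06))  # True
```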

Results

SEM results

As described above, an SEM was fit to test the primary hypotheses regarding the association between the different sources of implementation fidelity data and student outcomes, while adjusting for school-level demographic factors and the baseline student outcomes. First, the measurement model was fit to verify the factor structure of the two latent implementation variables (i.e., the IPI and SET). The CFA indicated that these latent variables had adequate fit with an RMSEA = .045, CFI = .962, TLI = .951, and SRMR = .116 (see the Measurement Model section in Table 3 for factor loadings).

Table 3 Standardized path coefficients for structural equation model of implementation and outcomes

Model fit

The measurement model was incorporated into the hypothesized structural model, which included the BoQ as a third manifest implementation fidelity indicator. Student outcomes were regressed on the implementation variables, adjusting for the five school-level contextual factors and the baseline student outcome variables (see Fig. 1). This model had adequate fit with an RMSEA = .070, CFI = .897, TLI = .859, and SRMR = .186. The modification indices were examined for potential aspects of the model that could be improved, but none were substantively relevant. Therefore, the model reported in Table 3 and Figure 1 was selected as the final model. The substantive findings from that model are reported below.

Figure 1

Path diagram for structural equation model of implementation and outcomes. Standardized coefficients are depicted. The squares depict manifest variables and the circles depict latent variables; the measurement models for the IPI and SET are not reported in this figure. Each outcome was regressed directly onto its baseline measure (e.g., reading achievement in 2009–2010 was regressed on reading achievement in the year prior to the school’s training in SW-PBIS) as well as each covariate at the left; these paths are not depicted for ease of interpretation. SET School-wide Evaluation Tool, IPI Implementation Phases Inventory, BoQ Benchmarks of Quality. *p < .05, **p < .01, ***p < .001

Relationship between school-level contextual factors and implementation

The years since training (standardized coefficient [Std. Coeff.] = .289, p = .002) and the percent of certified teachers (Std. Coeff. = .187, p = .002) were both positively related to the IPI factor: schools with a greater number of years since their training in SW-PBIS and a higher percent of certified teachers had better implementation. Student enrollment (i.e., the number of students in the school), the student-to-teacher ratio, and mobility were not significantly related to the IPI factor. None of the modeled covariates were related to the SET factor. As with the IPI factor, the years since training (Std. Coeff. = .131, p = .015) and percent of certified teachers (Std. Coeff. = .151, p = .011) were positively related to the observed BoQ score. In addition, student mobility (Std. Coeff. = −.245, p = .007) was negatively related to the observed BoQ score, indicating that lower rates of mobility were associated with higher implementation scores on the BoQ. Indicators of school size (i.e., student enrollment and student-to-teacher ratio) were not related to implementation levels as assessed by the BoQ. The model accounted for a significant proportion of variance in the IPI factor (R² = .156) and the observed BoQ score (R² = .116), but not the SET factor.

Relationship between school-level contextual factors and outcomes

As expected, the baseline measures of math and reading achievement and truancy were positively related to their respective outcomes, such that higher earlier achievement in math (Std. Coeff. = .720, p < .001) and reading (Std. Coeff. = .634, p < .001) and higher rates of truancy (Std. Coeff. = .671, p < .001) were associated with higher levels of these outcomes in 2010. Surprisingly, the relationship between baseline suspensions and the suspension outcome only approached significance at the p = .10 level (Std. Coeff. = .400). The number of years since training (Std. Coeff. = .248, p < .001), student enrollment (Std. Coeff. = −.085, p = .001), and mobility (Std. Coeff. = −.193, p = .001) were significantly related to math achievement, such that more years since training in SW-PBIS, smaller school size, and lower mobility were all related to higher math achievement in 2010. Only years since training (Std. Coeff. = .169, p < .001) and mobility (Std. Coeff. = −.268, p < .001) were significantly related to reading achievement in 2010. Higher mobility (Std. Coeff. = .157, p = .002) was related to higher truancy in 2010. Finally, student enrollment (Std. Coeff. = .197, p = .004) was related to suspension rates in 2010. The relationships of student enrollment with math achievement and suspensions implicitly suggest that middle schools, which are on average larger than elementary schools, generally had lower achievement and higher suspensions.

Relationship between implementation and student outcomes

Controlling for the direct effects of the baseline measures and the school-level covariates, the IPI factor was significantly related to the math, reading, and truancy outcomes. Specifically, higher implementation, as indicated on the IPI, was associated with higher math achievement (Std. Coeff. = .146, p = .042), higher reading achievement (Std. Coeff. = .171, p = .006), and lower truancy rates (Std. Coeff. = −.088, p = .056). None of the implementation indicators (i.e., the IPI factor, the SET factor, or the observed BoQ scores) was related to suspensions, and the SET factor and BoQ were not related to any of the other outcomes. The model accounted for a significant proportion of variance (R²) for three of the outcomes: math achievement = .750, reading achievement = .707, and truancy = .651. The R² value for suspensions (R² = .296) only approached significance at the p = .10 level.

Discussion

This paper examined the relationship between implementation, as measured by three different instruments, and student outcomes, using data from a state-wide dissemination of a widely used, school-based prevention program. The availability of three indicators of implementation quality and multiple student outcomes provided a unique opportunity to explore these associations. The findings also shed light on how the choice of an implementation measure can influence the pattern of results. Below, the specific substantive findings based on the SEM results are considered, as are some implications of these findings for future studies of the association between implementation fidelity and outcomes in scale-up efforts.

The findings indicate that the IPI factor was significantly related to reading and math achievement and truancy, such that higher implementation was associated with subsequently higher achievement and lower rates of truancy. Interestingly, suspensions were not related to either of the implementation factors or the observed BoQ scores. In addition, the relationship between the suspension outcome and baseline suspensions was not significant, and the proportion of variance explained in the suspension outcome was lower than for the other outcomes and only approached significance. The findings for suspensions were surprising, given that suspensions are considered a proximal outcome of SW-PBIS.21 It is important to note, however, that suspensions are the one outcome in which subjectivity plays a role, as adult behavior affects the rate of suspensions in a school (e.g., whether adults observe that a negative behavior has occurred, a teacher’s choice to refer a student to the office, and a principal’s choice to suspend). In addition, there have been explicit efforts to decrease suspension rates in the state of Maryland; comparing the baseline and 2010 rates shows a drop in the average suspension rate (i.e., from 11 % to 9 %) and in its variability (i.e., the standard deviation dropped from 17 to 11). There may therefore be overall shifts in suspensions associated with accountability efforts. Finally, there is some evidence from effectiveness studies that the ability to detect intervention effects varies by measurement approach; for example, it is common in school-based studies for some measures (e.g., teacher-reported measures) to generate larger effect sizes.59 A meta-analysis of the Triple P program likewise found that the type of measurement was associated with the detection of effects.41,60 The suspension measure may thus be less sensitive to implementation effects within the context of a state-wide scale-up effort.

The SET factor and observed BoQ scores were not significantly related to any of the outcomes. Of concern in the case of the SET was a potential ceiling effect, such that the average score was approximately 95 % and there was little variability in scores (see Table 1). This restriction of range likely limited the SET’s ability to discriminate among schools’ outcomes.

Scores on the BoQ were lower on average and showed greater variability than the SET, and were similar to the average IPI scores. The SET also correlated with the IPI and BoQ at a lower magnitude than the IPI and BoQ correlated with one another (see Table 2). In addition, the proportion of variance explained in the SET factor was not significant, and one of its scale scores had a non-significant factor loading. That non-significant scale, District-Level Support, is a two-item scale asking whether the district provides coaching support and funding for SW-PBIS.

Perhaps the differences detected in predictive validity result from these three measures assessing slightly different aspects of SW-PBIS implementation. For example, the IPI is inherently different from the other two measures, as it takes a “stages of change” approach, ranging from start-up activities to more advanced implementation and sustainability; this may make it more appropriate for assessing fidelity over multiple years of implementation. In contrast, the SET focuses primarily on start-up activities and the initial phases of implementation and is the only measure completed by an outsider to the school. In fact, recent research on the SET suggests that this measure is most reliable at the elementary (versus middle and high) school level and may be most appropriate for administration in schools that have just begun implementing SW-PBIS. Nevertheless, the SET is still the most widely used measure. The IPI and BoQ were developed more recently, and thus would benefit from further research on their psychometric properties and predictive validity. In particular, replications of this study should be conducted, especially in other states where there may be different levels of infrastructure to support SW-PBIS implementation. These considerations have implications for data collection practices in the state, as all three measures are currently collected by the state for SW-PBIS schools.

Limitations and future directions

There are several limitations to consider when interpreting these findings. Type II translational research is often characterized as “messy,”5 as it is difficult to implement carefully controlled designs when examining the real-world process of program implementation. In addition, the measures of contextual factors used were school-level proxies for disorder,61 rather than specific survey measures of students or staff members. On the other hand, multiple ratings of the implementation quality of SW-PBIS were used, which is a unique strength of this study.

The outcome and implementation data come from a single time point (i.e., the spring of 2010 and the 2008–2009 school year, respectively), which is 11 years after the state initiative began. This highlights two other common obstacles in translational research: the extended amount of time it may take to disseminate an intervention or approach30,34 and the difficulty of assessing a “moving target.”5(p 212) Focusing on two school years (baseline and outcome) simplifies the data analysis and makes it more interpretable; however, future analyses should take into consideration patterns of implementation over time, beginning with the first year of implementation. The fact that different numbers of schools joined the initiative at different points across the 12-year effort complicates such analyses, as the data would need to be aligned by implementation year rather than calendar year. Nevertheless, all school demographics modeled were from the year preceding the training year, and the number of years since training was accounted for. It is possible that the associations between implementation fidelity and outcomes would vary at different points in the scaling-up process (e.g., if measured earlier in the statewide scale-up or in the future); this is an area for further research.

Given that this study occurred in one state, it is unknown whether the findings would generalize to other states, where the level of support provided to schools implementing SW-PBIS, the data collected regarding implementation, the school context (e.g., requirements for teacher certification, school sizes, levels of mobility), and the tests of achievement outcomes may all differ. Finally, non-implementing schools were not examined in this study, as implementation data were not collected from these schools. Therefore, it is unknown whether schools not trained in SW-PBIS implement strategies similar to those of trained schools, or whether the use of SW-PBIS is superior to non-use within the context of a state-wide scale-up (randomized controlled trials have established its effectiveness on a smaller scale). Similarly, we were not able to track the implementation of other programs in combination with SW-PBIS (e.g., bullying prevention); this is an important area for further research, as previous studies suggest that schools are likely implementing multiple prevention programs simultaneously.10

As noted above, the findings for the SET were less informative than those for the other two measures of implementation. This was somewhat surprising, given that the SET is the most established and most widely used measure of SW-PBIS fidelity, whereas the IPI and BoQ are newer measures. On the other hand, these two latter measures were developed in part to address concerns regarding the SET, namely a potential ceiling effect53 and the burden of administration by an outside assessor. The current findings suggest that the IPI has the best predictive validity of the three measures examined. This finding also highlights a practical barrier in conducting scale-up efforts and evaluating their effectiveness: what is practical (e.g., collecting data using only one implementation measure) and desired may not yield a comprehensive understanding of the outcomes. Conversely, collecting multiple measures of implementation from different sources is often seen by schools as burdensome and redundant, yet may be important.

Implications for Behavioral Health

Numerous authors have concluded that findings from efficacy and effectiveness trials rarely translate directly when broadly disseminated.62,63 Instead, programs need to be evaluated under real-world conditions, be practically important, and have adequate supports in place (e.g., manuals, technical assistance) to ensure implementation, and then must also be evaluated in a scale-up effort.1 The current study is one attempt to fill this research gap, as it relates to a school-based prevention framework targeting positive behavioral supports and improving school climate and orderliness. Currently, there are two randomized trials documenting positive effects of SW-PBIS on student office discipline referrals, student discipline problems, and school climate.20–24 Research is also under way to determine the extent to which the trial findings generalize to the broader set of schools within the state.17 The current study represents an important next step in research on the state-wide dissemination of school-based prevention programs and highlights the importance of developing an infrastructure to collect data on implementation quality and program outcomes when prevention efforts are brought to scale.

In addition, these findings highlight the importance of how implementation is measured. This includes consideration of how measures used in randomized controlled trials translate into real-world settings in terms of their reliability and validity, and how the utility of the measures may change over time. The purpose for which a measure is used, and whether it remains an effective measure over time, are also important considerations. Although this study revealed significant associations between one measure of implementation and student outcomes, it also demonstrated non-significant associations between outcomes and two other measures. This highlights the importance of developing implementation measures that are studied across time and in large-scale initiatives and are shown to be reliable and to have predictive validity. These findings have broader implications for behavioral science, as they suggest a need for better implementation measures that are sensitive both to the foundational pieces needed when first beginning implementation of a new program and to the evolving efforts over time, which may be harder to detect.