Introduction

A growing body of research points to the relationship between health and student learning. For example, a longitudinal study by London and Castrechini (2011) found that youths’ cardiovascular fitness predicted better academic performance (e.g., standardized math scores). Similarly, a literature review by Basch (2011) concluded that dietary behaviors (e.g., consuming a nutrient-rich breakfast) are associated with indicators of cognition (e.g., problem solving). Furthermore, a meta-analysis by Dewald, Meijer, Oort, Kerkhof, and Bogels (2010) found evidence of an association between various indicators of sleep and school performance. As such, addressing health behaviors within the school setting may be of significant benefit to youths’ academic- and health-related development. With respect to primary prevention, it therefore may be beneficial to implement evidence-based programs that can affect both health and learning. This may be especially the case in settings known to be affected by disparities in both educational and health outcomes, such as low-income urban communities (Aud, Fox, & KewalRamani, 2010; Orsi, Margellos-Anast, & Whitman, 2010).

Traditional approaches to health promotion have focused on addressing the most proximal potential causes of behavior (e.g., promoting exercise by addressing attitudes towards exercise). However, as proposed by a number of theoretical perspectives (e.g., Flay, Snyder, & Petraitis, 2009), more distal skills and attitudes (e.g., self-control) may affect a wide range of behaviors, including those that are likely to promote or compromise physical health and well-being. As such, school-based programs that are designed to facilitate the development of core social–emotional competencies (e.g., self-awareness, self-control, social awareness, prosocial skills, responsible decision making; Collaborative for Academic, Social, and Emotional Learning, 2005) could also potentially have an influence on physical health behaviors and related outcomes, through improvements in social–emotional skills (Durlak, Weissberg, Dymnicki, Taylor, & Schellinger, 2011). Given that schools are increasingly tasked with addressing overall health and well-being, social–emotional programs implemented within the school context should be evaluated for their impact on health-related outcomes.

Positive Action (PA) is one example of a school-based program with a curriculum designed to foster the social–emotional, intellectual-, and physical-self (Flay & Allred, 2010). The PA program aims to develop these social–emotional skills, and positive actions for the physical self (e.g., health-promoting behaviors) represent a very small percentage (~8 %) of the PA curriculum. Cluster-randomized controlled trials of PA in Hawai’i, Chicago, and a southeastern rural school district showed positive program effects on a measure of social–emotional and character development (SECD) (Washburn et al., 2011). The program has also been shown to improve emotional health (Lewis et al., 2013a) and engagement with learning (Bavarian et al., 2013; Snyder et al., 2010), and to reduce such health-compromising behaviors as substance use (Beets et al., 2009; Lewis et al., 2012; Li et al., 2011), violence (Beets et al., 2009; Lewis et al., 2013b), and sexual activity (Beets et al., 2009). In the Chicago trial of PA, improvements in substance use and emotional health were mediated through changes in SECD (Lewis et al., 2012, 2013a). These findings provide a basis for hypothesizing improved physical health behaviors and status among students exposed to the PA program, as well as mediation through relative improvements in social–emotional competence. To date, however, the program’s impact on physical health outcomes, as mediated by social–emotional improvements, has not been examined.

We used data from a Chicago cluster-randomized controlled trial of PA to examine the impact of the program on several health-related behaviors and on one physical health outcome, body mass index; our secondary goal was to test for mediation through changes in a measure of SECD. Our central hypothesis was that students in schools implementing PA, compared to those in control schools, would exhibit more favorable change over the course of the study on behaviors relating to eating, exercise, personal hygiene, and sleep, and at the study’s end-point would have lower body mass index, and that these effects would be mediated through relative improvements in SECD.

Methods

Design and Sample

For the Chicago trial of Positive Action, we received approval of study procedures by the Institutional Review Boards of the University of Illinois at Chicago and Oregon State University, the Research Review Board at Chicago Public Schools (CPS), and the Public/Private Ventures Institutional Review Board for Mathematica Policy Research (MPR). MPR was contracted to collect data during grades 3–5 as part of the Chicago trial’s involvement in a seven-site Social and Character Development Research Consortium (SACD) that was funded by the Institute of Education Sciences.

We drew the study sample from the 483 K-6 and K-8 schools in the CPS system. We excluded schools from involvement in the trial if they: (1) were non-community schools (e.g., magnet schools), (2) already had PA or a similar intervention, (3) had an enrollment below 50 or above 140 students per grade, (4) had annual student mobility rates over 40 %, (5) had more than 50 % of students who passed the Illinois State Achievement Test, or (6) had fewer than 50 % of students who received free or reduced-price lunch; these criteria ensured selection of youth who could be considered “high-risk” (Ji, DuBois, Flay, & Brechling, 2008). Of the 68 schools meeting eligibility criteria, we successfully contacted 36 that were also able to attend information sessions; of these 36, 18 agreed to participate given the understanding that they would be randomly assigned to PA or a control condition (Ji et al., 2008).
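The six exclusion criteria amount to a single eligibility filter, which can be sketched as follows. This is an illustration only: the `School` record, its field names, and the two example schools are hypothetical and are not drawn from the trial's data files.

```python
from dataclasses import dataclass


@dataclass
class School:
    # All field names are hypothetical; they mirror the six criteria in the text.
    is_community_school: bool
    has_pa_or_similar: bool
    students_per_grade: int
    mobility_pct: float     # annual student mobility rate, in percent
    isat_pass_pct: float    # percent of students passing the state achievement test
    free_lunch_pct: float   # percent receiving free or reduced-price lunch


def is_eligible(s: School) -> bool:
    """Return True when none of the six exclusion criteria applies."""
    excluded = (
        not s.is_community_school          # criterion 1
        or s.has_pa_or_similar             # criterion 2
        or s.students_per_grade < 50       # criterion 3 (lower bound)
        or s.students_per_grade > 140      # criterion 3 (upper bound)
        or s.mobility_pct > 40             # criterion 4
        or s.isat_pass_pct > 50            # criterion 5
        or s.free_lunch_pct < 50           # criterion 6
    )
    return not excluded


# Invented examples: the second school is excluded by criterion 5 only.
example_in = School(True, False, 90, 25.0, 35.0, 85.0)
example_out = School(True, False, 90, 25.0, 62.0, 85.0)
print(is_eligible(example_in), is_eligible(example_out))
```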

From these 18 schools, we created 9 best-matched pairs (matched on demographic and performance variables such as ethnicity, attendance rate, truancy rate, percentage of students on free or reduced-price lunch, and achievement test scores) using a SAS computer program provided by MPR; within each pair, we randomly assigned schools to PA or control using a random-number generation function (Ji et al., 2008). Given the nature of the intervention, blinding of condition status (e.g., by schools, students, teachers) was not possible. Funding allowed for seven of the nine pairs of schools to be included in the study; we gave preference for inclusion to pairs that best represented the ethnic diversity within the CPS system (Ji et al., 2008). The seven pairs of schools we selected did not significantly differ from the remainder of the 68 schools eligible for the study (Ji et al., 2008). Matching and randomization were successful, as evidenced by baseline equivalency between PA and control schools on both the matching variables and aggregate and survey data (Lewis et al., 2012).
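The within-pair randomization step can be illustrated with a minimal sketch. It assumes the matched pairs are already formed (the matching itself was done with the SAS program mentioned above and is not reproduced here); the school labels, the function name, and the seed are hypothetical.

```python
import random


def assign_within_pairs(pairs, seed=None):
    """Randomly assign one school in each matched pair to PA, the other to control."""
    rng = random.Random(seed)
    assignment = {}
    for school_a, school_b in pairs:
        treated = rng.choice((school_a, school_b))
        control = school_b if treated == school_a else school_a
        assignment[treated] = "PA"
        assignment[control] = "control"
    return assignment


# Nine hypothetical matched pairs built from 18 placeholder school labels.
pairs = [(f"school_{2 * i + 1}", f"school_{2 * i + 2}") for i in range(9)]
assignment = assign_within_pairs(pairs, seed=2004)
for a, b in pairs:
    print(a, assignment[a], "|", b, assignment[b])
```

Randomizing within pairs rather than across all 18 schools guarantees that each condition receives exactly one member of every matched pair, which preserves the balance achieved by matching.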

We implemented the program school-wide beginning in the fall of 2004 and continued doing so throughout the 6-year study (the maximum amount of time possible within funding limits); we asked control schools to conduct “business-as-usual.” Periodic monitoring of control schools indicated that although their implementation of activities directed toward character and social–emotional development was commonplace (SACD, 2010), there was no implementation of programs similar in scope or intensity to that of PA. Students in the treatment schools had their first exposure to PA lessons during the 2004/2005 academic year; students in the study cohort received developmentally appropriate PA lessons (i.e., those designed for students’ particular grade level) from grade 3 to grade 8. We gave teachers and other school staff flexibility to adapt lessons and other program activities for purposes such as cultural appropriateness and tailoring to the school environment (e.g., changing names of story characters to be more culturally relevant), as long as the core content of lessons was maintained. Implementation, as measured by different indices (e.g., trainings attended, teacher description of PA activities during classroom time; Bickman et al., 2009), varied across schools and improved across time; by the final year of the study, key program benchmarks were being implemented at moderate (50–59 %), moderate-to-high (60–69 %), and high (above 70 %) levels by one, three, and three schools, respectively (Li et al., 2011).

Students completed surveys in the fall of 2004 (beginning of grade 3), spring 2005, fall 2005, spring 2006, spring 2007, fall 2008, spring 2009, and spring 2010 (end of grade 8). Our longitudinal trial used a cluster-focused, intent-to-treat design with a dynamic grade cohort (Brown et al., 2008; Vuchinich, Flay, Aber, & Bickman, 2012) at the student level. That is, we followed a cohort of students starting in grade 3 through to the end of grade 8; we included students who entered study schools after baseline in the analysis, and we excluded students who left participating schools.

We obtained parental consent for all students upon entry into the study, before they completed surveys, and re-consented for the second phase of funding at Wave 6. At baseline, 79 % of parents provided consent; consent rates ranged from 65 to 78 % for Waves 2 through 5 and 58 to 64 % for Waves 6 through 8. The total number of students enrolled in the study across all eight waves of data collection was 1170, of whom approximately 53 % were female; approximately 48 % were African American, 27 % were Hispanic, and 19 % were classified as Other (e.g., White, Asian, Native American, and “Other”). Demographic indicators (e.g., % male students, % African American students, % students receiving free or reduced price lunch, and school attendance rate) were comparable between control and PA schools at both Wave 1 and Wave 8 (Lewis et al., 2012). There was attrition at the student level due to student mobility (Fig. 1; Lewis et al., 2012), which is common in studies conducted in low-income, urban settings (Tobler & Komro, 2011). Specifically, 131 of the original 624 Wave 1 students were still present at Wave 8; 64 of the 316 PA students present at Wave 1 were present at all 8 waves of the study, and 67 of the 308 Control students present at Wave 1 were present at all 8 waves of the study. Table 1 compares student-level demographics at baseline (Wave 1) and endpoint (Wave 8). Mobility was more likely among African-American students, as compared to White, Hispanic, and Asian students (Lewis et al., 2012). Also, mobility was more likely among older students (Lewis et al., 2012). Still, mobility patterns did not differ between control and treatment schools (Lewis et al., 2012). In addition, we conducted attrition analyses to determine if attrition was associated with the outcome variables, and we found no significant differences (results not shown). Lastly, the average number of waves of data collected per student was 3.1. Nonetheless, all 14 schools stayed in the study and in their assigned condition.

Fig. 1

Flow diagram of students. Reprinted, with permission, from the following publication: Lewis et al. (2013b)

Table 1 Attrition analysis on student demographics

Intervention

Positive Action (Flay & Allred, 2010) is a school-based SECD program that includes PreK-12 curricula, of which the K-8 portion was implemented by intervention schools in this study. This portion of the curriculum consists of 140 15–20 min, age-appropriate, interactive lessons per grade, taught 4 days per week for grades K-6, and 70 20-min interactive lessons per grade, taught 2 days per week for grades 7 and 8. Lessons are delivered by the classroom teacher and involve both group work and homework. The program curriculum includes six units: self-concept is addressed in Unit 1, health behaviors are directly addressed in half of Unit 2, and Units 3 through 6 focus on social and emotional positive actions.

Measures

Outcomes

We collected student self-reports of health behaviors at each wave using a set of 10 items considered appropriate for grade 3 students. These items drew from the Youth Risk Behavior Surveillance System, and assessed how often (1 = “none of the time” to 4 = “all of the time”) students engaged in different behaviors. The correlation analyses we conducted suggested the ten individual items grouped as three multi-item measures focused on healthy eating and exercise, unhealthy eating, and personal hygiene, and one single-item measure pertaining to sleep.

We used Stata’s “sem” command to conduct confirmatory factor analysis on the nine items tapping the three types of health behavior. Using data from Wave 1 (baseline), the comparative fit index (CFI) and root mean square error of approximation (RMSEA) for the model were 0.93 and 0.05, respectively; at Wave 8 (endpoint), the CFI and RMSEA for the model were 0.99 and 0.02, respectively. For both waves, items loaded significantly onto each specified factor. Using these suggested groupings, we computed average scores for each type of behavior, with higher scores reflecting engagement in more of that type of behavior.

We calculated student-level stability reliabilities using 2-week test–retest data available at Wave 5 for approximately 60 students. Test–retest correlations were 0.68 for healthy eating and exercise, 0.74 for unhealthy eating, 0.75 for consistent bedtime, and 0.76 for personal hygiene, reflecting the moderate to strong stability of the measures. As such, we proceeded with using the three multi-item measures and one single-item measure to assess program effects.

Healthy Eating and Exercise

Students indicated how much of the time they “eat fresh fruits and vegetables,” “drink or eat dairy products,” and “exercise hard enough to…sweat and breathe hard.” At Wave 8, standardized factor loadings for the three items ranged from 0.47 to 0.55.

Unhealthy Eating

Students indicated how much of the time they “eat junk food (chips and candy),” “eat fast food,” and “drink soda pop.” At Wave 8, standardized factor loadings for the three items ranged from 0.71 to 0.90.

Personal Hygiene

We examined hygienic habits using the following items: “I wash my hands after using the toilet,” “I brush my teeth at least twice a day,” and “I cover my nose and mouth when I sneeze.” At Wave 8, standardized factor loadings for the three items ranged from 0.46 to 0.57.

Consistent Bedtime

We assessed sleep-related behavior using a single item in which students rated how much of the time they “go to bed by 9:00 p.m. on school nights.” Due to the bimodal distribution of responses, we categorized this measure as dichotomous, with 0 = Not all of the time and 1 = All of the time; we chose this classification based on research suggesting school-age children need 10–11 h of sleep each night (National Sleep Foundation, 2016).

Body Mass Index z-Scores

Height and weight measurements were taken by trained data collectors at Wave 8 (grade 8). Students were asked to remove hats, coats, and shoes and were assessed one at a time in a private room using the Cardinal® Detecto (model number 6129) scale with a vertical ruler to simultaneously record height and weight. We used height, weight, gender, and age data to calculate body mass index (BMI) scores, and we converted these scores into BMI z-scores (zBMI) using United States reference data collected by the Centers for Disease Control and Prevention in 2000. This method has been used in past studies (Economos et al., 2013).
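The conversion from measured height and weight to a BMI z-score can be sketched as below. The sketch assumes the conversion followed the LMS approach with which the CDC 2000 growth charts are published (sex- and age-specific L, M, and S parameters); the source does not state the exact procedure. The LMS values in the example are placeholders, not entries from the CDC tables, and the function names are hypothetical.

```python
import math


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def lms_z(value: float, L: float, M: float, S: float) -> float:
    """z-score under the LMS method: ((value/M)**L - 1) / (L*S), or ln(value/M)/S if L == 0."""
    if L == 0:
        return math.log(value / M) / S
    return ((value / M) ** L - 1) / (L * S)


# PLACEHOLDER parameters for illustration only; real values must be looked up
# in the CDC 2000 reference tables by sex and age in months.
L_REF, M_REF, S_REF = -2.0, 19.0, 0.13

example_bmi = bmi(weight_kg=55.0, height_m=1.60)
example_z = lms_z(example_bmi, L_REF, M_REF, S_REF)
print(round(example_bmi, 2), round(example_z, 2))
```

A student whose BMI equals the reference median M receives a z-score of zero; values above the median yield positive z-scores.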

Mediator

We measured the hypothesized mediator, SECD, using the 28-item SECD Scale (Ji, DuBois, & Flay, 2013; Lewis et al., 2012). Example items, which encompass six subscales (i.e., prosocial interactions, honesty, self-development, self-control, respect for parents, and respect for teachers), were: “I play nicely with others,” “I apologize when I have done something wrong,” “I make myself a better person,” “I keep my temper when I have an argument with other kids,” “I speak politely to my parents,” and “I speak politely to my teacher.” Students indicated how often they engaged in each behavior (1 = None of the time to 4 = All of the time). We created an average of scores on the 6 subscales, with higher scores indicating higher SECD skills. Cronbach’s alpha for this composite scale ranged from 0.92 to 0.95 across the 8 waves.

Data Analyses

In our preliminary analyses, we used Stata v13 to assess the distributions of each outcome, calculate intraclass correlations, and determine correlations between the measures at Waves 1 and 8, respectively.

We investigated program effects on the health behavior measures by estimating multilevel growth-curve models using Stata v13. We specified each model according to the distribution of its outcome; healthy eating and exercise, unhealthy eating, and personal hygiene were negatively skewed, consistent bedtime was bimodal, and zBMI was normally distributed. We handled missing data using maximum likelihood estimation (Rabe-Hesketh & Skrondal, 2008). We used endpoint regression analyses to assess program effects on the zBMI measure. We used two-tailed p values for the condition by time interaction (for longitudinal data) and condition (for endpoint data) parameter estimates to assess statistical significance of program effects, with 0.10 > p > 0.05 considered marginal and p ≤ 0.05 considered significant.

For the longitudinal models, we first estimated a random-intercept model including quadratic terms for time and the interaction of condition by time (c × t). Nonsignificant higher order terms were dropped from the model for parsimony. When applicable (i.e., for the consistent bedtime outcome), we also estimated a random-coefficient model and we performed a likelihood-ratio test to determine which model (random-intercept or random-coefficient) was more appropriate (Rabe-Hesketh & Skrondal, 2008). We calculated effect sizes for most outcomes using the method described by Lipsey and Wilson (2001). We used the Cox transformation to calculate an effect size for consistent bedtime (a binary outcome; Sánchez-Meca, Marín-Martínez, & Chacón-Moscoso, 2003), and we calculated the Hedges’ g effect size for endpoint zBMI.
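Two of the effect sizes named above have simple closed forms, sketched here under stated assumptions. The Cox transformation divides the log odds ratio by 1.65, and Hedges’ g applies a small-sample correction to the standardized mean difference based on the pooled standard deviation. The function names and all input values are invented for illustration; the sketch works from raw proportions and group summaries rather than from the model-based estimates used in the trial.

```python
import math


def cox_d(p_treat: float, p_control: float) -> float:
    """Cox-transformed effect size for a binary outcome: ln(odds ratio) / 1.65."""
    odds_ratio = (p_treat / (1 - p_treat)) / (p_control / (1 - p_control))
    return math.log(odds_ratio) / 1.65


def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c) -> float:
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / df)
    d = (mean_t - mean_c) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)   # shrinks d slightly in small samples
    return d * correction


# Invented inputs, not values from the trial.
print(round(cox_d(0.60, 0.40), 3))
print(round(hedges_g(0.0, 1.0, 1.0, 1.0, 50, 50), 3))
```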

We used two approaches to assess the robustness of results related to program effects (i.e., sensitivity analyses). First, we estimated a model including pairs as the cluster variable and compared results for the c × t interaction to the main analyses with school as the cluster variable. In addition, we compared the test statistic for the c × t interaction in the multilevel model to 2.18, the p ≤ 0.05 critical value for a two-tailed t distribution with 12 degrees of freedom (N = 14 schools; Raudenbush & Bryk, 2002).
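The 2.18 threshold used in the second sensitivity check can be reproduced numerically. The sketch below integrates the Student t density with Simpson's rule and bisects for the two-tailed 0.05 critical value at 12 degrees of freedom (14 schools minus 2); it uses only the Python standard library, and the function names are hypothetical.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """CDF for x >= 0: 0.5 plus Simpson's-rule integral of the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df: int, alpha: float = 0.05) -> float:
    """Two-tailed critical value found by bisection on the CDF."""
    lo, hi = 0.0, 50.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n_schools = 14
crit = t_critical(df=n_schools - 2)
print(round(crit, 2))  # approximately 2.18
```

Comparing the multilevel test statistic to this value, rather than to the large-sample 1.96, is the more conservative choice when inference rests on only 14 clusters.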

We examined the roles of gender and student mobility in moderating program effects in supplemental analyses; the confounding nature between race/ethnicity and school in this trial precluded our testing for moderating effects by race/ethnicity. Given the mobility that occurred within this study, it is plausible that students who received the most exposure to the PA program (i.e., students who remained in a treatment school throughout the study’s duration) would have more favorable change across time. To characterize student mobility patterns, we used latent class analysis to group underlying patterns of mobility. In prior analyses (Lewis et al., conference presentation), a 5-class solution was found to be the most appropriate fit for the data. In these moderation analyses, students with the greatest time in the study (i.e., “stayers”) served as the reference group.

Upon completion of these analyses, we estimated longitudinal mediation models for outcomes with marginal or significant program effects. Prior to running the mediation models, we tested for the presence of a significant quadratic trajectory for measures collected across time (i.e., all measures excepting zBMI). Because our mediator (SECD) included a quadratic, the effect of the linear slope on the change in each outcome is interpretable at the intercept (Gordon, 2012); thus, our time scales were adjusted to run from −4.00 to 1.58 years (as opposed to the real time of 0.00–5.58 years), with the intercept being set at Wave 6, the first data collection time after the important transition to middle-school grades.
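The reason for recentering time can be seen from the derivative of a quadratic trajectory: for y = b0 + b1·t + b2·t², the instantaneous slope is b1 + 2·b2·t, which equals b1 only at t = 0. Shifting the time scale so that Wave 6 falls at zero therefore makes the linear coefficient the slope at Wave 6. The sketch below illustrates this with invented coefficients; only the endpoints (0.00 and 5.58 years) and the Wave 6 position (4.00 years) come from the text.

```python
WAVE6_YEARS = 4.00  # Wave 6 in real time (years since baseline), per the text


def recenter(years_since_baseline: float) -> float:
    """Shift the time scale so that Wave 6 is time zero."""
    return round(years_since_baseline - WAVE6_YEARS, 2)


def quadratic_slope(b1: float, b2: float, t: float) -> float:
    """Instantaneous slope of b0 + b1*t + b2*t**2 at time t."""
    return b1 + 2 * b2 * t


start, end = recenter(0.00), recenter(5.58)
print(start, end)  # -4.0 1.58

# Invented growth coefficients, for illustration only.
b1, b2 = 0.05, -0.01
slope_at_wave6 = quadratic_slope(b1, b2, recenter(4.00))
slope_at_baseline = quadratic_slope(b1, b2, recenter(0.00))
print(slope_at_wave6, round(slope_at_baseline, 2))
```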

To test for mediated effects we used Mplus v6.12. We included the mediator (i.e., change in SECD) in the model to calculate direct (c′) and indirect (a × b) effects (Fig. 2). Given our adjusted time scales, the effect of the mediator on each outcome is based on the point in time when students had started grade 7 (Wave 6). Indirect effects (i.e., the effect of PA on the slope of each outcome as mediated by the slope of SECD) were computed as described by MacKinnon (2008). The small number of clusters (i.e., 14 schools; Hox & Maas, 2001), low intraclass correlations across schools at baseline (discussed below), and the non-normality of the outcome variables, all in combination with the technical complexities of mediation testing in a multilevel modeling framework, precluded a multi-level structural equation modeling analysis (Hox & Maas, 2001; Singer & Willett, 2003; Zhang, Zyphur, & Preacher, 2009). Instead, we used bootstrap estimation with 1000 replications to address issues of non-normality (Williams & MacKinnon, 2008), and used maximum likelihood to handle missing data. Mediation was classified as complete, partial, or non-significant (Baron & Kenny, 1986; MacKinnon, Lockwood, Hoffman, West, & Sheets, 2002).
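The logic of a bootstrapped indirect effect can be illustrated with a deliberately simplified sketch. It is not the latent growth mediation model estimated in Mplus: it treats condition, change in the mediator, and change in the outcome as three observed variables at a single level, estimates the a, b, and c′ paths by ordinary least squares, and forms a percentile interval for a × b from 1000 resamples. The data are synthetic and all names are hypothetical.

```python
import random


def paths(x, m, y):
    """OLS estimates: a (m on x), b (y on m given x), c_prime (y on x given m)."""
    n = len(x)
    mx, mm, my = sum(x) / n, sum(m) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    smm = sum((mi - mm) ** 2 for mi in m)
    sxm = sum((xi - mx) * (mi - mm) for xi, mi in zip(x, m))
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    smy = sum((mi - mm) * (yi - my) for mi, yi in zip(m, y))
    a = sxm / sxx
    det = sxx * smm - sxm ** 2
    b = (sxx * smy - sxm * sxy) / det
    c_prime = (smm * sxy - sxm * smy) / det
    return a, b, c_prime


def bootstrap_indirect(x, m, y, reps=1000, seed=0):
    """95% percentile bootstrap interval for the indirect effect a*b."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        a, b, _ = paths([x[i] for i in idx], [m[i] for i in idx], [y[i] for i in idx])
        estimates.append(a * b)
    estimates.sort()
    return estimates[int(0.025 * reps)], estimates[int(0.975 * reps) - 1]


# Synthetic data: condition raises the mediator, which in turn raises the outcome.
gen = random.Random(1)
x = [i % 2 for i in range(200)]
m = [0.5 * xi + gen.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + gen.gauss(0, 1) for xi, mi in zip(x, m)]

a, b, c_prime = paths(x, m, y)
ci_low, ci_high = bootstrap_indirect(x, m, y)
print(round(a * b, 3), round(ci_low, 3), round(ci_high, 3))
```

The percentile interval is used because the product a × b is generally not normally distributed, which is the concern that motivates bootstrapping here. An interval that excludes zero indicates a significant indirect effect.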

Fig. 2

Longitudinal mediation models. Note In the mediation model, paths were also included from Positive Action (Condition) to the intercept of the mediator and the intercept of the outcome. The variance of the quadratic for the mediator (i.e., social–emotional and character development) was set at zero, and the intercepts of the errors of the mediator and the outcome were correlated. In addition, the time measure was set from −4.00 to 1.58, with the intercept (time 0) set at Wave 6 (i.e., the start of grade 7). Notation Note “Condition” represents the Positive Action program; Ri, Rs, and Rq represent the variances in intercept, slope, and quadratic growth factors, respectively; a, b, and c’ represent the path coefficients where the a path represents the direct effect of the independent variable on the mediator, the b path represents the direct effect of the mediator on the dependent variable, and the c’ path represents the direct effect of the independent variable on the dependent variable with the mediator included

Results

The baseline intraclass correlations (ICCs) across schools for healthy eating and exercise, personal hygiene, and consistent bedtime (prior to categorizing the measure as binary) were low (0.02, 0.01, and 0.01, respectively). In contrast, the ICC for unhealthy eating was moderately high at baseline (0.14). The ICC across schools for endpoint zBMI was also low (0.01).

Table 2 shows the correlations among the health behavior outcome measures and SECD mediator at Wave 1 and Wave 8. At Wave 1, the healthy eating & exercise measure was significantly and positively correlated with personal hygiene and consistent bedtime, and personal hygiene was significantly and positively correlated with consistent bedtime. At Wave 8, the healthy eating & exercise measure was again positively associated with personal hygiene and consistent bedtime. Scores on zBMI were inversely associated with healthy eating & exercise, unhealthy eating, personal hygiene, and consistent bedtime, although only the association with unhealthy eating, which was in an unexpected direction, was significant. With respect to the mediator, at both points in time SECD had a significant (p < 0.01) and positive association with measures of healthy eating, personal hygiene, and consistent bedtime.

Table 2 Correlations between health measures at (A) Wave 1 (n = 593) and (B) Wave 8 (n = 331)

Findings from our analyses testing for program effects and associated effect sizes are presented in Tables 3 and 4, respectively. We observed a significant program effect for personal hygiene (c × t b = 0.05 (0.01, 0.08), p = 0.02, ES = 0.48), indicating that PA students had a more favorable trajectory for this behavior than students in control schools; this finding was maintained in both sensitivity analyses. The estimated program effect for healthy eating & exercise was marginally significant (c × t b = 0.03 (−0.00, 0.06), p = 0.09, ES = 0.21), suggesting students in PA schools had more favorable growth in these health-promoting behaviors. Results for this measure were not significant in sensitivity analyses. For unhealthy eating, the program effect was not significant in the primary analyses (b = −0.03 (−0.06, 0.01), p = 0.12, ES = −0.19), but was marginally so in the sensitivity analysis that included pair as the cluster variable (b = −0.03 (−0.07, 0.00), p = 0.08), suggesting that growth in unhealthy food intake was attenuated by participation in the PA program. The program effect was not significant for consistent bedtime in any analyses, although the effect size (Cox-d = 0.35) was moderate in magnitude. Considering that a consistent 9:00 p.m. bedtime may be appropriate only for younger students, we conducted supplemental analyses using Wave 5 as the endpoint; these analyses also showed a non-significant but moderate effect size (Cox-d = 0.33). Lastly, our endpoint analysis on the zBMI outcome indicated a marginally significant program effect in the hypothesized direction (b = −0.22 (−0.46, 0.02), p = 0.07, ES = −0.21). For all outcomes, we did not observe moderation by gender or mobility group.

Table 3 Multilevel growth-curve model estimates and confidence intervals for student-level measures (N = 1170 students in 14 schools)
Table 4 Estimated means and effect sizes

We estimated longitudinal mediation models (Table 5) for all outcomes except consistent bedtime (as it did not have a marginal or significant program effect). In all models, the “a” path (path from program condition to change in SECD) was significant at the 0.01 level. The path from SECD to each outcome was significant at the 0.01 level for healthy eating and hygiene, and at the 0.05 level for unhealthy eating. The calculated indirect effect (a × b) was significant at the 0.01 level for healthy eating and hygiene, and at the 0.05 level for unhealthy eating. As the c′ path remained significant in the model for healthy eating, the mediation observed was classified as partial. The c′ path was reduced to non-significance in the hygiene and unhealthy eating models. As such, the mediation observed for these two models was classified as complete. Mediation was not observed for the zBMI outcome.

Table 5 Summary of unstandardized mediation results, performed using Mplus v6.12 (N = 1170 students)

Discussion

Our study is the first to use a randomized design to examine the longitudinal impact of a social–emotional program, Positive Action, on youths’ health behaviors. We observed significant program effects across time for personal hygiene, and marginal condition × time interactions for healthy eating and exercise and unhealthy eating. We also observed a marginal program effect on endpoint BMI z-scores. Although the magnitude of effects was modest, they demonstrate the potential of a social–emotional program to impact health behaviors, even though a direct focus on physical health was a relatively small component (approximately 8 % of lessons) of the PA intervention. Specifically, while only half of Unit 2 covered health behaviors, we observed relative improvements in health behaviors among students receiving PA. Results are particularly encouraging because the trial’s setting (i.e., low-income communities) is one known to be impacted by health disparities (Orsi et al., 2010).

We also examined one proposed mechanism of program action. Our cross-sectional correlation analyses showed an association between SECD and health-promoting behaviors. Moreover, our longitudinal mediation analyses showed evidence of complete or partial mediation through changes in SECD for three of our outcomes. This finding supports the literature we reviewed (Flay et al., 2009), which suggests that ultimate- and distal-level predictors impact various types of behaviors.

Our results also provide direction for future research. Unhealthy eating was the only measure that had an intraclass correlation above 0.02, suggesting that there is variation between schools in this behavior; it is possible that the extent of unhealthy snack availability (e.g., vending machines that dispense sugar-sweetened beverages) may vary by school. Future studies may consider accounting for the obesogenic nature of the school environment. Interestingly, we observed a significant inverse cross-sectional association between zBMI and unhealthy eating, whereby greater unhealthy food consumption was correlated with a lower zBMI. This finding contradicts results from a systematic review of 30 studies showing overweight and obesity to be associated with greater consumption of sugar-sweetened beverages (Malik, Schulze, & Hu, 2006). Our post hoc analyses showed the observed inverse correlation at Wave 8 was significant only for girls (r = −0.25 for girls; r = −0.15 for boys). Given the cross-sectional nature of the relationship, further examination of this observed correlation is warranted.

Limitations

Student self-reports are subject to social desirability bias; this bias, however, should affect students equally in both treatment and control schools, yet program effects were observed nonetheless. Additionally, the sole researcher-collected measure (i.e., zBMI) was available only at endpoint; conclusions related to the program’s influence on weight across time, therefore, cannot be drawn. Also, the measure of consistent bedtime did not account for student wake time, limiting the ability to measure sleep duration. Future evaluations of similar programs should, therefore, collect height and weight data across time, as well as include more items related to sleep behaviors.

Because students completed a survey that assessed a broad range of outcomes, the number of items specific to health was limited. Future research should include a more in-depth assessment of health behaviors and related outcomes, using standardized instruments with well-established psychometric properties. Future research could also consider modifying the content of certain questions related to health behaviors across time points in order to be sensitive to developmental differences (e.g., sleep-related items should be adjusted as youth transition into adolescence, as we witnessed marked declines across time points in the percentage of youth who always go to bed by 9:00 p.m.). Even so, we found the items had moderate to strong stability reliabilities.

Another limitation relates to our lack of parent or teacher reports of student health behavior to substantiate our findings. Additionally, data on school-level barriers to healthy decision making (e.g., fast food outlet density) that may confound results were not available. Future research should substantiate student reports with parent- and teacher-reports of health behaviors. Archival data should also be used in future studies to control for the availability of healthy and unhealthy food options. Funding constraints limited our ability to rigorously examine the role of fidelity of implementation on program effects. Future evaluations of PA and PA-like programs should, therefore, aim to prioritize measurement of implementation.

With respect to study design, the small number of schools (N = 14) could limit statistical power. However, the fact that some significant effects emerged in spite of this limitation suggests that our findings are robust. Also, findings are generalizable only to similar schools (i.e., low-income, urban schools) that would self-select to participate in a trial of this nature. Nonetheless, we were able to focus on a high-risk population affected by disparities in health.

Lastly, the multi-faceted nature of the intervention under study also presented analytical limitations. Ideally, we would have liked to conduct a dose–response analysis, but using a measure of years of classroom exposure to assess dosage appeared problematic for a number of reasons. First, the classroom was only one component of the PA intervention. Specifically, the program includes teacher and counselor training in addition to the classroom student curricula. Second, fidelity of implementation within the classroom improved over time (Li et al., 2011) and it is also likely the school context was influenced by the presence of the intervention, thus altering the context for newly arriving students. In other words, fewer years of exposure due to late arrival was confounded with factors (e.g., co-occurring changes in fidelity and school context) in ways that would have prevented us from confidently attributing any differences in outcomes relating to exposure to a dose effect of the intervention as opposed to these other influences. In view of such concerns, we instead examined differences in intervention effects in association with student mobility classes that take into account both timing of entry (or departure) to the school and number of years of exposure. Although it could be useful in future research to attempt to disaggregate these types of influences, attempting to do so reliably was beyond the scope of the present study.

Conclusions

Strengths of this study included its longitudinal nature, experimental design, successful matching, lack of attrition at the unit of randomization (i.e., the school), and use of sophisticated techniques to handle the hierarchical nature of the data, student mobility and mediation. As such, we were able to use a methodologically strong study to examine the impact of a social–emotional intervention on physical health behaviors in a setting where the need to impact both social–emotional skills and physical health is present. We observed small-to-modest effects on health behaviors, which is understandable yet encouraging as only a small percentage of the PA curriculum focused on health behaviors. Results from the mediation analyses suggest that the program’s focus on ultimate causes of behavior representing core skills (e.g., SECD) was beneficial in addressing health behaviors. Moreover, the findings reported here parallel, albeit to a lesser extent, the trend of positive impact observed in the Chicago RCT of PA on emotional (Lewis et al., 2013a) and behavioral (Lewis et al., 2012, 2013b) outcomes. Given the interrelatedness of these outcomes, findings suggest that SECD programs should be considered by schools seeking to prevent a wide range of detrimental adolescent behaviors. Moreover, the results indicate that schools aiming to address a specific health behavior could see a greater impact by implementing a program that incorporates the development of social–emotional skills. Promoting these skills during childhood and adolescence can have longer-term prevention implications, as health-promoting behaviors can be developed during youth and maintained into young adulthood. Although the intensive nature of PA and similar programs may be intimidating, buy-in from key administrative staff, funding for incentives and the ability to readily provide technical support will help schools overcome the challenges of program implementation (Lewis et al., in preparation). 
The findings of this study suggest that SECD programs may be a useful supplement to other physical health instruction and activities already being implemented in the school setting.