Introduction

During adolescence, young people must negotiate complex and inter-related biological, cognitive, emotional and social-cultural changes. Problem behaviors including substance misuse, risky sexual behaviors, school dropout, antisocial attitudes and violence increase during early adolescence (e.g., Dryfoos 1990) and can lead to greater likelihood of negative behaviors into adulthood. Early adolescence is therefore an important time to intervene and influence the trajectory of an individual’s cognitive, social, emotional and cultural development and their risk behavior. Positive youth development interventions encompass these two overarching aims. Such programs need to be sensitively designed to capitalise upon the plasticity that characterizes this developmental period and address the unique challenges inherent in adolescence. As Tolan et al. (1995, p 579) note, “Intervention designs informed by this model emphasise developmentally appropriate components, sensitivity to the impact of timing of intervention, and evaluation of the impact on future development as well as curtailing or preventing the target symptoms”. It is the above features that may contribute to the potential success of positive youth development programs. This paper systematically evaluates the effectiveness of such programs, exploring both their impact upon risk factors (e.g., sexual behaviour, substance use, antisocial behaviour, depression) and positive outcomes (e.g., academic achievement, prosocial behaviour, psychological adjustment) using meta-analyses.

Adolescent Health Risk Behaviors as a Public Health Problem

Adolescence is a critical period during which many health-risk behaviors are initiated, including substance use, sexual risk and antisocial behavior (Degenhardt et al. 2008; World Health Organisation 2014). Despite the recent decline in some risk behaviors (e.g., smoking and unprotected sex), young people are still more likely to engage in risky behaviors than adults over 25 (Eaton et al. 2006). Risky sexual behavior in young people under 25 results in unintended pregnancies and sexually transmitted infections (e.g., Department of Health 2011). In addition, between 6 and 13 % of adolescents smoke regularly, drink alcohol and use illicit drugs (e.g., Connell et al. 2009; Gunning et al. 2010; McVie and Bradshaw 2005). Aggression and antisocial behavior in young people are also problematic, with approximately a quarter of young people found to carry a weapon and 19 % found to have attacked someone with the intent of seriously hurting them (e.g., Beinart et al. 2002). Taken together, these findings highlight the frequency of initiation of health risk behaviors in adolescence.

In addition to their frequent occurrence in adolescence, behaviors such as substance use, risky sexual behavior, smoking and antisocial behavior tend to cluster (Hale and Viner 2012; Jackson et al. 2012a, b; Mistry et al. 2009; Wiefferink et al. 2006). Individuals engaging in one risky behavior are more likely to engage in others (DuRant et al. 1999). Such behaviors are thought to share common biological and environmental determinants (Beyers et al. 2004; O’Connell et al. 2009; Resnick et al. 1997), which likely shape the development of multiple risk-taking. For example, substance use before the age of 16 has been positively associated with early sexual initiation, poor contraceptive use, violence and delinquency (Bellis et al. 2008; Hawkins et al. 1999; Parkes et al. 2007). Adolescent risk-taking often continues into adulthood, with consequent negative outcomes such as poor physical, mental and sexual health; substance abuse and addiction; poor educational and occupational achievement; future morbidity and premature mortality (Biglan 2004; Fergusson et al. 2007; Flory et al. 2004; Mirza and Mirza 2008). Youth risk-taking behavior therefore has substantial personal and economic costs for adolescents, their families, communities and services (Scott et al. 2001; Hoffman and Maynard 2008; Parsonage 2009).

Positive Youth Development

Public health experts believe that reducing the prevalence of modifiable behavior patterns could result in reduced public costs, improved overall well-being and health throughout adolescence and adulthood (e.g., Steinberg 2004). Given the observed clustering of risk behaviors, it has been suggested that interventions should take a broad approach and address multiple problems and their common determinants simultaneously (e.g., Bonell et al. 2007; Hawkins et al. 1999; Kipping et al. 2012). Positive youth development interventions aim to address the common determinants of adolescent multiple health risk behaviors. As the term implies, positive youth development interventions do not focus solely upon a pathology or deficit model. While accepting that a holistic understanding of adolescent development must include adverse aspects, the positive youth development approach takes the perspective that all young people have inherent strengths (e.g., Damon 2004; Roth and Brooks-Gunn 2003a) and that development takes place within relational systems (e.g., Lerner 2006; Overton 2013). The aims of positive youth development interventions are to support adolescents to acquire a sense of competence, self-efficacy, belonging and empowerment (e.g., Bowers et al. 2010), thus promoting positive behavior and reducing the likelihood of risk behavior. Effective positive youth development interventions should optimize the interaction between the unique strengths of the individual and their contextual resources (e.g., healthy relationships with adults, access to community-based activities; Spencer and Spencer 2014). The potential advantages of positive youth development interventions have led to major investment in many countries. For example, in the UK, millions of pounds have been invested in youth development interventions (Scottish Government 2009) as public health officials see these as essential in promoting the health and well-being of young people (e.g., HM Government 2010). 
Therefore, it is important to understand the impact of these interventions and their mechanisms of action.

Although the philosophy and aims of positive youth development have been well articulated (e.g., Benson et al. 2006; Damon 2004; Larson 2000; Lerner 2006), the core components of an effective positive youth development program remain unclear (Brooks-Gunn and Roth 2014). There is considerable diversity in the operational features and activities that currently characterize positive youth development programs. In the first literature review, Catalano et al. (2002) defined positive youth development as developing cognitive, social, emotional, behavioral and moral bonding competences; self-efficacy; prosocial behavior; a belief in the future; a clear and positive identity; self-determination and spirituality. Roth and Brooks-Gunn (2003b) argued that for a program to be considered positive youth development it must (a) foster program goals, such as confidence, competence, character, connections and caring; (b) provide young people with opportunities and experiences at school, at home and in the community so that they can develop their interests and talents and build new skills and competencies; and (c) create a supportive atmosphere in which young people can develop bonds with the adults involved in delivering the program as well as with the other program participants. Positive youth development interventions also need to be stable and long lasting, so that the participants have sufficient time to form and benefit from positive relationships. The mechanisms by which positive youth development interventions are hypothesized to work are equally diverse. The active ingredients are thought to include (1) engaging young people in structured and productive activities, thus diverting them from unhealthy behavior (Roth et al. 1998), (2) providing adolescents with additional resources and time to develop knowledge, skills and social networks (Pettit et al. 1997) and (3) addressing risk factors such as low self-esteem, poor educational attainment and low aspirations for the future by developing protective factors such as social and emotional competencies (Catalano et al. 2002). These examples demonstrate that positive youth development interventions vary considerably in structural and process features.

Prior Reviews of Interventions to Promote Positive Youth Development

Despite extensive investment, the effectiveness of positive youth development interventions in reducing risky behavior and promoting positive behavior is uncertain. Positive youth development programs have been examined in meta-analyses (e.g., Durlak et al. 2010; Shepherd et al. 2010) and narrative reviews (e.g., Catalano et al. 2002; Clarke et al. 2015; Gavin et al. 2010; Roth and Brooks-Gunn 2003b). Some of these reviews have shown that positive youth development interventions are effective, whereas others have yielded mixed or inconclusive findings. Investigations focusing on social outcomes have shown positive effects for academic achievement, cognitive variables and social skills (e.g., Catalano et al. 2004; Clarke et al. 2015; Durlak and Weissberg 2007; Durlak et al. 2010). However, Zief et al. (2006), in a review of after-school programs that combined recreation and/or youth development programming with academic support services, found limited impact on academic and behavioral outcomes. Systematic reviews examining health outcomes have focused primarily on sexual health and have also had mixed findings (e.g., Catalano et al. 2004; Gavin et al. 2010; Shepherd et al. 2010). Some positive youth development reviews have also reported reductions in violence and drug use (e.g., Catalano et al. 2002; Durlak et al. 2010; Roth and Brooks-Gunn 2003b). These findings demonstrate the lack of clarity and consistency in the existing literature on the impact of positive youth development programs.

The observed variance in findings across reviews of positive youth development interventions may be explained by both the variety in program components and differences in review methodology. Existing reviews differ in their inclusion criteria, data pooling methods and the outcomes examined. Reviews have generally focused on either health or social/behavioral outcomes, and some were non-systematic or were limited to a narrative approach (e.g., Catalano et al. 2002; Gavin et al. 2010; Roth and Brooks-Gunn 2003b). Other reviews have limited inclusion to programs that have evidence of effectiveness (e.g., Catalano et al. 2002). All previous reviews have included a mix of randomized controlled trials, quasi-experiments and even non-experimental studies. This raises the possibility of systematic bias affecting the reviews, potentially inflating the apparent effectiveness of positive youth development interventions or reducing the reviews' ability to detect genuine effects.

Contributions of this Review

The mixed findings of previous reviews, the range of positive youth development interventions and the widespread interest and investment in positive youth development have motivated this review. This review aims to address some limitations of previous reviews by adopting an inclusive approach. It differs from previous reviews in several ways. First, this review included all possible randomized controlled trials, which provide stronger evidence of a program’s impact, since randomized controlled trials have the highest possible internal validity. Second, a systematic strategy was used to identify all possible published and unpublished studies that provided evidence of program impact, regardless of their findings (positive, negative or no effects). We included both published and unpublished documents to avoid publication bias, since studies with significant results are more likely to be published than those with non-significant results (i.e., the “file-drawer effect”; Rosenthal 1979). Third, the impact of positive youth development interventions was explored across a range of health, social and behavioral outcomes. Fourth, this review systematically assessed study quality according to established guidelines (i.e., the Cochrane Collaboration Risk of Bias Tool). Finally, the evidence on particular outcomes across the studies was pooled using meta-analytic methods (where appropriate) to maximize power to detect intervention effects.

Purpose of the Current Study

The purpose of this review and meta-analysis was to synthesize evidence on the effectiveness of positive youth development interventions in young people aged 10–19 years. As many outcomes cluster because they share similar risk and protective factors, the effects of positive youth development interventions on multiple health, social and behavioral outcomes were explored. These included substance use, risky sexual behavior, psychological adjustment, prosocial behavior and academic performance. We also examined whether the variation in the effects was moderated by study, intervention and participant characteristics.

Method

The PRISMA guidelines for the conduct of systematic reviews and meta-analyses (Moher et al. 2009) were followed for the planning, conduct and reporting.

Study Inclusion Criteria

The inclusion criteria were formulated in accordance with the PICOS approach and included the following:

Population

The population of interest was young people: studies were eligible if the majority of participants (at least 75 %) were 10–19 years of age at the pre-test. As our interest was in preventive approaches, programs for specific populations such as youth with learning or physical disabilities were excluded. However, studies that targeted young people on the basis of their pre-existing risk behavior or other criteria, such as young people at a high risk of teenage pregnancy, students from families of low socioeconomic status and students with poor grades, were included in this review.

Intervention

Positive youth development programs were defined as those that involved voluntary education to promote positive development (National Youth Agency 2007). Specifically, programs needed to address at least one of the 12 positive youth development goals formulated by Catalano et al. (2002) across social domains, including school, community and family, or more than one goal in a single domain. These goals included bonding, resilience, social, emotional, cognitive, behavioral or moral competence, self-determination, spirituality, self-efficacy, positive identity development, belief in the future, recognition for positive behavior, opportunities for pro-social behavior and prosocial norms.

Outcome

Any health or non-health outcome with at least two measurement points was included. Outcomes were grouped into several categories: social and emotional skills, positive social behavior, mental health issues, sexual risk behavior and academic performance. Programs were included whether they resulted in significant or non-significant changes in the outcomes compared to the control conditions, whereas programs that focused only on knowledge and attitude changes were excluded. Self-reports, official records and third-party (i.e., parent, teacher) measures, whether validated or not, were eligible for inclusion. Table 1 details the outcomes used in this meta-analysis.

Table 1 Outcome categories used in this meta-analysis

Setting

Out-of-school programs were the focus of the intervention. These included all activities targeting young people that were delivered regularly either in a community or a school-based setting outside normal school hours. Interventions that were delivered primarily during school hours were excluded, as these were the focus of a recent review (Durlak et al. 2011). For studies that included more than one intervention, only those interventions that focused on out-of-school programming as the main intervention were included. The following criteria were used to determine the main intervention: (a) if the author identified that out-of-school programming was the main intervention or (b) if the report gave out-of-school interventions a higher importance in relation to other interventions. Our review also excluded interventions that focused on family functions and so were targeted at parents/other family members as well as young people. Programs were only included if they focused primarily on young people and out-of-school programming to minimize the potential moderating effects of other variables (i.e. the effect of the program on parents, effects of in-school components) on intervention impact.

Design

Studies were eligible if they were randomized controlled trials and used a control condition to evaluate positive youth development interventions. Waiting list or no treatment, treatment as usual or alternative treatments were all considered valid control conditions. Other inclusion criteria were sufficient information to calculate effect sizes and publication in English between 1985 and 2015.

Search Strategy

A comprehensive literature search was performed to identify all published and unpublished studies that met the above inclusion criteria (see Fig. 1). Four main procedures were used to identify eligible studies: (a) an electronic search of 10 databases (Applied Social Sciences Index and Abstracts, Medline, PsycINFO, Embase, CINAHL Plus, ERIC, Social Services Abstracts, Cochrane Central Register of Controlled Trials in the Cochrane Library, BiblioMap and the Trials Register of Promoting Health Interventions); (b) a search of relevant registers and youth work-related websites (e.g., National Youth Agency, National Council for Voluntary Youth Services (NCVYS) Publications, 4-H); (c) reference list screening from previous reviews (e.g., Dickson et al. 2013; Harden et al. 2006; Morton and Montgomery 2011) and articles identified through electronic databases; (d) contact with researchers to obtain information on unpublished or ongoing studies or to clarify reports identified through other sources. The electronic searches were initially conducted between September and December 2014 and re-run prior to the analyses in July 2015.

Fig. 1 CONSORT diagram

Search terms were developed based on previous reviews and empirical studies (e.g., Dickson et al. 2013; Harden et al. 2006; Morton and Montgomery 2011) to reflect the agreed population criteria (young people), intervention criteria (positive youth development) and research methods. Specifically, keyword searches included variations of “children and young people”, “positive youth development”, “youth work”, “after-school” and (“intervention” OR “outcome” OR “program” OR “treatment”). The full search terms are available on request from the first author.

Study Selection and Data Extraction

Study selection was performed in two main stages using a screening instrument. First, the titles and abstracts were scrutinized and irrelevant records were excluded. Relevant papers were then retrieved in full and assessed against the inclusion criteria. Documents that were potentially eligible were further reviewed to decide upon the final inclusion. Disagreements were resolved through discussion and, where necessary, studies were reviewed again. A data extraction form with five sections was used to extract information from all articles that met the inclusion criteria. These sections were (a) general study characteristics (author, year of publication, country of origin), (b) population characteristics (number of participants, age, gender, grade level and risk level), (c) intervention characteristics (dosage, setting, format, components), (d) methodological characteristics (sample sizes, characteristics of the control group, design, attrition, follow-up period, intention to treat versus treatment on the treated analysis and outcome measures) and (e) statistical data needed for effect size calculations. For multiple publications from the same cohort, only the publications with the most up-to-date or comprehensive data were included. Where data on study methods or results were missing, authors were contacted with a request to supply the information. In studies where the requested information was unavailable due to data loss or non-response, the data that were available were included in the meta-analysis where possible.

Risk of Bias Assessment

Two authors independently assessed each study’s methodological quality using the Cochrane Collaboration Risk of Bias Tool (Higgins et al. 2011). A third author assessed more than half the papers. All disagreements were discussed until a consensus was reached. Seven domains were rated as high, low or unclear risk of bias: sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective outcome reporting and other issues (i.e., baseline differences among groups). Each domain was scored as −1 for high risk, 0 for unclear risk and 1 for low risk. These scores were then summed to provide an overall quality score, which ranged from −6 to 6. Higher values signified a lower risk of bias.
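The scoring rule described above can be illustrated with a minimal sketch. The domain keys, the function name and the example ratings are hypothetical and are not taken from the review's data. Note that summing seven domains can in principle give values between −7 and 7, so the range of −6 to 6 reported above may describe the scores actually observed.

```python
# Illustrative sketch only: domain keys and the function name are hypothetical.
DOMAINS = (
    "sequence_generation",
    "allocation_concealment",
    "blinding_participants_personnel",
    "blinding_outcome_assessment",
    "incomplete_outcome_data",
    "selective_outcome_reporting",
    "other_issues",
)

# Points per rating, as described in the text.
POINTS = {"high": -1, "unclear": 0, "low": 1}


def quality_score(ratings):
    """Sum the per-domain points for one study; higher means lower risk of bias."""
    missing = [d for d in DOMAINS if d not in ratings]
    if missing:
        raise ValueError(f"missing domain ratings: {missing}")
    return sum(POINTS[ratings[d]] for d in DOMAINS)


# Hypothetical study: mostly unclear, unblinded, but complete and fully reported.
example = dict.fromkeys(DOMAINS, "unclear")
example["blinding_participants_personnel"] = "high"
example["incomplete_outcome_data"] = "low"
example["selective_outcome_reporting"] = "low"
print(quality_score(example))
```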

Statistical Analyses

Comprehensive Meta-Analysis Program (version 3) was used to carry out the meta-analyses. Statistical Package for Social Sciences (SPSS 21.0) was used to analyze the descriptive data.

Effect Size Calculations

For continuous outcomes, we calculated standardized mean differences, that is, the difference between two group means divided by their pooled standard deviation. To correct for small-sample bias in the effect size estimates (Field 2001), we applied the Hedges’ g correction, which is usually recommended for sample sizes lower than 20 (Borenstein et al. 2009). For dichotomous outcomes, we calculated odds ratios and then transformed these (using meta-analysis software) to g statistics to allow for across-study comparisons (Borenstein et al. 2009). When studies failed to report means, standard deviations or proportions, effect sizes were calculated from a t test, F statistic or p value together with the sample size (Borenstein et al. 2009). All effect sizes were coded so that positive values indicated favorable intervention effects, such as lower pregnancy rates or less substance use, with values of 0.20 considered small, 0.50 medium and 0.80 large (Cohen 1988). When a study had multiple measures for the same outcome, an overall effect size was calculated by averaging the individual effect sizes. A single mean effect size per study was therefore calculated for each outcome category, which ensured statistical independence (Lipsey and Wilson 2001). For each outcome, a separate analysis was performed to examine the intervention effect at both post-intervention and follow-up. Post-intervention effect sizes were calculated for the assessment nearest in time to the completion of the program. When a study reported multiple follow-up assessments for a particular outcome, the longest follow-up period was selected to examine the robustness of the intervention.
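A minimal sketch of these calculations is given below, using the standard formulas for the standardized mean difference, the small-sample correction factor and the logit-based conversion of an odds ratio. It is not the Comprehensive Meta-Analysis implementation; the function names and the example numbers are hypothetical.

```python
import math


def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardized mean difference with the Hedges small-sample correction.

    Group 1 is the intervention group. Returns (g, variance of g).
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))
    j = 1 - 3 / (4 * df - 1)  # correction factor, always below 1
    return j * d, j**2 * var_d


def odds_ratio_to_d(a, b, c, d):
    """Convert a 2x2 table to a standardized mean difference.

    a, b: events and non-events in the intervention group.
    c, d: events and non-events in the control group.
    Returns (d, variance of d). If the event is undesirable (e.g., pregnancy),
    flip the sign so that positive values favor the intervention.
    """
    log_or = math.log((a * d) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or * math.sqrt(3) / math.pi, var_log_or * 3 / math.pi**2


# Hypothetical study with 20 participants per arm.
g, var_g = hedges_g(m1=10.0, sd1=2.0, n1=20, m2=9.0, sd2=2.0, n2=20)
print(f"g = {g:.3f}, SE = {math.sqrt(var_g):.3f}")
```

The correction factor shrinks the uncorrected difference slightly, and the shrinkage becomes negligible as the sample size grows, which is why the correction matters mainly for small studies.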

For cluster-randomized trials in which the study had adjusted for the clustering effect, the reported (adjusted) results were used to calculate the effect sizes. Conversely, for studies that did not correct for potential clustering, we corrected for the design effect prior to effect size calculation using the guidelines of Higgins and Green (2009). A random-effects model was used in our statistical analyses due to the heterogeneity between studies in target population, interventions employed and outcomes assessed (Hedges and Vevea 1998). All effect sizes were weighted prior to any analysis by multiplying the values by the inverse of their error variance (Lipsey and Wilson 2001). This method ensured that larger studies were given more weight in the analyses and therefore contributed more to the pooled effect sizes.
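The two steps described above can be sketched as follows. The design effect uses the usual formula 1 + (m − 1) × ICC, where m is the mean cluster size; the intracluster correlation (ICC) values used by the review are not reported here, so the value in the example is an assumption. The pooling function uses the DerSimonian-Laird estimator of between-study variance, which is one common random-effects method and is assumed here for illustration; the text does not state which estimator the software applied. All function names and numbers are hypothetical.

```python
import math


def design_effect(mean_cluster_size, icc):
    """Variance inflation caused by randomizing clusters rather than individuals."""
    return 1 + (mean_cluster_size - 1) * icc


def effective_n(n, mean_cluster_size, icc):
    """Sample size after dividing by the design effect."""
    return n / design_effect(mean_cluster_size, icc)


def random_effects_pool(effects, variances):
    """Inverse-variance random-effects pooling (DerSimonian-Laird, assumed).

    Returns (pooled effect, standard error, tau^2).
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")
    w = [1 / v for v in variances]  # fixed-effect weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


# Hypothetical data: two studies with equal precision.
pooled, se, tau2 = random_effects_pool([0.10, 0.30], [0.01, 0.01])
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"g = {pooled:.2f}, 95% CI = [{low:.2f}, {high:.2f}], tau^2 = {tau2:.3f}")

# Hypothetical cluster trial: 400 pupils, 21 per cluster, assumed ICC of 0.05.
print(effective_n(400, 21, 0.05))
```

Because the between-study variance is added to every study's sampling variance, the random-effects weights are more similar to one another than fixed-effect weights, so large studies still count for more but dominate less.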

Statistical Heterogeneity

Statistical heterogeneity between the studies was assessed using the Q statistic and the I² statistic. A significant Q rejects the null hypothesis of homogeneity and indicates that the effect sizes varied more across the studies than would be expected from sampling error alone (Borenstein et al. 2009). I² expresses the percentage of the total variation across studies that is attributable to heterogeneity rather than chance (0 % = none, 25 % = low, 50 % = moderate, 75 % = high; Higgins and Thompson 2002).
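As a rough illustration of how the two statistics relate, I² can be recovered from Q and its degrees of freedom as (Q − df) / Q, truncated at zero. The sketch below applies this to the Q values reported in the Results, assuming df equals the number of studies minus one (each study contributed a single effect size per outcome category). The function name is hypothetical.

```python
def i_squared(q, df):
    """I^2 as a percentage; negative values are truncated to zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Q values as reported in the Results; df = number of studies - 1 (assumed).
reported = {
    "problem behavior (16 studies)": (20.75, 15),
    "positive social behavior (7 studies)": (11.84, 6),
    "positive social behavior, one study excluded": (7.86, 5),
}
for name, (q, df) in reported.items():
    print(f"{name}: I^2 = {i_squared(q, df):.1f} %")
```

Under these assumptions the results are approximately 27.7 %, 49.3 % and 36.4 %, which match the reported I² values to within the rounding of Q.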

Publication Bias

Finally, the presence of publication bias was assessed using funnel plots (Sterne and Egger 2001) and the Begg and Egger tests (Begg and Mazumdar 1994). Funnel plots display effect size against study size (or precision); when there is no evidence of publication bias, the studies are distributed symmetrically around the pooled effect size. The Begg and Egger tests assess the extent of funnel plot asymmetry (with p < 0.05 indicating the presence of statistically significant publication bias).
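The idea behind the Egger test can be sketched as an ordinary least-squares regression of the standardized effect (effect divided by its standard error) on precision (one divided by the standard error); an intercept that differs from zero suggests asymmetry. This is a simplified sketch and not the routine used by the meta-analysis software. The function name and data are hypothetical, and the p value is omitted because it requires Student's t distribution (available, for example, as `scipy.stats.t.sf`).

```python
import math


def egger_intercept(effects, std_errors):
    """Egger regression of standardized effect on precision.

    Returns (intercept, t statistic, degrees of freedom).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching standard errors")
    x = [1 / se for se in std_errors]  # precision
    y = [g / se for g, se in zip(effects, std_errors)]  # standardized effect
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(residual_ss / (k - 2) * (1 / k + x_bar**2 / sxx))
    if se_intercept == 0:
        t_stat = 0.0 if intercept == 0 else math.copysign(math.inf, intercept)
    else:
        t_stat = intercept / se_intercept
    return intercept, t_stat, k - 2


# Hypothetical data in which smaller studies (larger SE) report larger effects.
effects = [0.15, 0.22, 0.35, 0.48, 0.70]
std_errors = [0.05, 0.10, 0.15, 0.25, 0.40]
intercept, t_stat, df = egger_intercept(effects, std_errors)
print(f"intercept = {intercept:.2f}, t = {t_stat:.2f}, df = {df}")
```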

Results

Figure 1 summarizes the search and selection process. The literature search identified 15,452 citations from the electronic bibliographic database searches and a further 2789 citations through website searches and searches of the reference lists of previous reviews and studies. After the removal of duplicates, 669 abstracts were screened for relevance. In total, the full texts of 352 studies were obtained and screened for eligibility. Of these, 328 did not meet the inclusion criteria and were excluded. Twenty-four studies were included in the final meta-analysis. Nine trials were reported in multiple companion publications (see Table 2 for details).

Table 2 Characteristics of included studies (N = 24)a

Characteristics of Included Studies and Programs

Design

Twenty programs were conducted in the USA and the remaining four were conducted in Croatia, Ireland, UK and New Zealand. Fourteen were published in peer-reviewed journals, eight were technical reports and two were dissertation projects. Publication dates ranged from 1992 to 2014, with most studies being published after 2000 (75 %). All 24 studies employed randomized controlled designs. Fifteen used students as the randomization unit; seven used schools, classes or communities and the remaining two used a combination. Seven studies compared the treatment group with usual care groups such as regular sex education (e.g., O’Donnell et al. 2002) or standard alcohol and drug education programs (e.g., Komro et al. 2008; Perry et al. 1996), seven used an alternative treatment and the remaining nine used no treatment or wait lists as comparisons. Twelve studies reported high attrition rates. The sample sizes ranged from 30 to approximately 5812. In most studies, data were gathered through self-reports. Five studies had data from school records, parents or teachers. A summary of studies included in the meta-analyses can be found in Table 2.

Participants

The total number of participants randomized across the 24 studies was 23,258. The mean age at baseline ranged from 10 to 16. Young people included in the programs attended elementary schools (12 %), middle schools (37.5 %), high schools (25 %) or a mixture of grade levels (25 %). The predominant racial group studied was African American (58.3 %), followed by Caucasian (37.5 %) and Native American (4 %). Most studies included mixed-sex samples, with three studies focusing exclusively on females. Fifteen studies focused on at-risk students, six focused on low-risk students and three included both. The identifiers for at-risk populations included students from low-income backgrounds (n = 12 studies), students of racial or ethnic minority background (n = 5 studies) and students with low academic achievement (n = 6 studies).

Interventions

Programs were conducted in a range of settings. Specifically, five studies were conducted in the community, four were conducted on school grounds and one program was conducted in a combined school/family domain. The majority (n = 15) were delivered in mixed settings, with five being delivered in a combined school, community and family setting. Fourteen interventions were conducted in one geographical locality, with the remaining ten conducted nationally. Interventions varied in duration and number of sessions. The mean intervention duration was 80 weeks, with a range from 3 to 240 weeks. Seventeen interventions involved at least 20 sessions and 12 had two or three follow-ups. The timing of the first follow-up ranged from immediately after the intervention to 5 years after the intervention. The first follow-up was conducted immediately or within 3 months of the intervention in five studies. Eleven studies had the first follow-up after 5–18 months, seven studies involved a second follow-up after 6–20 months and five studies reported a third follow-up after 12–32 months.

The most commonly examined outcomes were behavioral, including problem behaviors (66 %), academic improvement and school adjustment (45 %) and sexual risk behaviors/pregnancy (45 %). A smaller number of studies examined the effectiveness of these interventions on positive social behaviors (29 %) and psychological adjustment (33 %). Five interventions were delivered in a group, two were delivered individually and the remaining seventeen combined individual and group delivery. Twenty-one programs were multi-modal and involved primary and secondary interventions. Three single-modal programs provided mentoring, skills training or academic components. Of the 21 multi-modal programs, primary after-school activities covered academic and homework help (n = 8), mentoring (n = 7), community service projects (n = 9), social or cognitive/emotional skill development (n = 16), recreational activities (n = 6) and job clubs (n = 2). The most common positive youth development goals, each reported by approximately half the programs, were pro-social bonding, social competence, cognitive competence, emotional competence, self-efficacy, self-determination and a belief in the future.

Risk of Bias

The risk of bias in the studies is summarized in Fig. 2. Overall, the studies did not provide sufficient information to judge the quality of the randomization procedure, with 16 studies receiving an unclear rating. Only seven studies described the randomization sequence and only two indicated how the allocation concealment was conducted. As seen in Fig. 2, blinding of participants and personnel was rated as high risk across all studies. However, given the nature of the interventions, blinding of participants or personnel was often not possible. Only three studies reported blinding of the outcome assessors.

Fig. 2 Risk of bias ratings across included studies

Attrition bias was high in 13 of the included studies, uncertain in two and low risk in nine. The findings in these studies may be biased and may not reflect the true effects of the intervention as the results may have been influenced by the characteristics of the participants who dropped out of the studies. Reporting bias was assessed as low risk in 18 studies, as these papers appeared to have provided results on the expected outcomes. Three studies were assessed as high risk, as they had incomplete information on the expected outcomes.

Intervention Effects

Table 3 shows the mean effect sizes, the 95 % confidence intervals and the corresponding statistics for each outcome category. Forest plots were created for each of the five outcome categories. Effect sizes ranged from 0.04 to 0.22 and, although all were positive (i.e., favoring the intervention condition), only three were significantly different from zero. Specifically, the analyses indicated significant effects in two areas: academic/school outcomes and psychological adjustment. The largest positive effect size was found for academic achievement (g = 0.22) and the smallest for positive social behaviors (g = 0.04). Intervention effects were based only on post-intervention data. Follow-up data were combined only for the two outcome categories that included at least two studies with follow-up data (psychological adjustment and academic achievement). For the remaining outcomes, the follow-up effects were reported for each study.

Table 3 Effect sizes by outcome category

Heterogeneity

Heterogeneity was found in studies that reported the following outcomes: self-perception, academic achievement and positive social behaviors. This indicated the likelihood of moderating variables. However, there was no evidence of heterogeneity in studies that reported on problem behavior, academic adjustment or sexual health outcomes.

Behavioral Adjustments

Problem Behavior

Sixteen studies, which included 52 effect sizes, were averaged to show the intervention effects on problem behavior. To increase the statistical power and because moderator analyses showed no significant differences in intervention effects between conduct problems and substance use (t = 0.87; ns), we decided to pool all conduct problems and substance use measures into the same analysis. Eight studies investigated the impact of interventions on substance use, two measured conduct problems and six measured both conduct and substance use problems.

The results of the meta-analysis indicated no statistically significant differences between groups (g = 0.04; 95 % CI = −0.01, 0.10; ns; see Fig. 3). The statistical heterogeneity across the studies was neither important nor significant (Qtotal = 20.75; I2 = 27.73 %; ns). Only one study (Dolan et al. 2011) measured the problem behavior after 21 months but showed no significant effects (g = 0.04; 95 % CI = −0.30, 0.39; ns).
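As a rough check on how Q and I2 relate, the sketch below recomputes I2 from a reported Q statistic. It assumes the conventional definition, I2 = max(0, (Q - df) / Q) x 100 with df = k - 1, where k is the number of studies. This section does not state the formula used, and the function name `i_squared` is illustrative only.

```python
def i_squared(q: float, k: int) -> float:
    """Return I^2 (in percent) from Cochran's Q and the number of studies k.

    Assumes the conventional definition I^2 = max(0, (Q - df) / Q) * 100
    with df = k - 1. This is an assumption, not a formula given in the text.
    """
    df = k - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero, as is customary for I^2.
    return max(0.0, (q - df) / q) * 100.0


# Problem behavior: Q = 20.75 across k = 16 studies (values taken from the text).
print(f"I2 = {i_squared(20.75, 16):.1f} %")
```

Under these assumptions the result is about 27.7 %, close to the reported 27.73 %; a small gap of this kind is what rounding of Q would produce.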

Fig. 3 Effect sizes for problem behaviors

Positive Social Behavior

This category was reported in seven studies and included outcomes such as getting along with others, social competence and prosocial behavior. Measures included teacher, parent and self-reports, the latter being the most frequently used. The results showed no statistically significant differences between groups at post-treatment (g = 0.04; 95 % CI = −0.11, 0.21; ns; see Fig. 4), and the heterogeneity of the effects was moderate but significant (Qtotal = 11.84; I2 = 49.34 %; p < 0.05). One study (Karcher et al. 2002) influenced both the heterogeneity and the overall effect size: after excluding this study, heterogeneity was lower (Qtotal = 7.86; I2 = 36.40 %; ns), but the effect size was also smaller (g = 0.01; ns). Only one study (Dolan et al. 2011) measured positive social behaviors after 21 months and showed a significant positive effect (g = 0.27; 95 % CI = 0.00, 0.55; p < 0.05).

Fig. 4 Effect sizes for positive social behaviors

Psychological Adjustment/Internalizing Behavior

Eight studies, including 17 effect sizes, were averaged to show the intervention effects on psychological adjustment. The meta-analysis indicated a small but significant treatment effect (g = 0.17; 95 % CI = 0.04, 0.31; p < 0.05; see Fig. 5). The homogeneity analysis indicated a moderate degree of statistical heterogeneity (Qtotal = 27.16; I2 = 67.15 %; p < 0.01). The results suggested that exposure to positive youth development interventions improved psychological adjustment relative to the control condition by 0.17 standard deviations. Three studies examined long-term intervention effects on self-perception and found that the effect was maintained over time (g = 0.23; 95 % CI = 0.05, 0.42; p < 0.01). Feinberg et al. (2013) conducted a follow-up at 10 months post-intervention and reported no lasting significant effects on depression levels (g = −0.04; 95 % CI = −0.38, 0.30; ns).
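To see how the reported interval maps onto the significance statement, the following sketch backs out an approximate standard error, z statistic and two-sided p value from g and its 95 % CI. It assumes a symmetric normal-approximation interval (g ± 1.96 x SE), which this section does not confirm. The reported bounds are rounded, so the result is approximate, and the function name is hypothetical.

```python
import math


def z_and_p_from_ci(g: float, lower: float, upper: float, z_crit: float = 1.96):
    """Approximate SE, z and two-sided p from an effect size and its 95 % CI.

    Assumes the interval was built as g +/- z_crit * SE (normal approximation);
    that construction is an assumption, not something stated in the text.
    """
    se = (upper - lower) / (2.0 * z_crit)    # half-width divided by 1.96
    z = g / se                               # test of g against zero
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail area
    return se, z, p


# Psychological adjustment: g = 0.17, 95 % CI 0.04 to 0.31 (values from the text).
se, z, p = z_and_p_from_ci(0.17, 0.04, 0.31)
print(f"SE ~ {se:.3f}, z ~ {z:.2f}, p ~ {p:.3f}")
```

Under these assumptions the p value falls near 0.01, which is consistent with the reported p < 0.05.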

Fig. 5 Effect sizes for psychological adjustment

Academic/School Outcomes

Academic/school outcomes were reported in 11 studies and included measures of academic achievement and academic adjustment. Academic achievement outcomes were measured in 10 studies using school records and teacher, parent and self-report. Academic adjustment was reported in five studies based on self-report. A significant difference in academic achievement outcomes was found between the groups after the intervention (g = 0.22; 95 % CI = 0.07, 0.38; p < 0.05; see Fig. 6). Moderate and significant heterogeneity was found across the studies (Qtotal = 28.30; I2 = 68.20 %; p < 0.01). On average, the positive youth development interventions produced a 0.22 standard deviation improvement in scholastic performance relative to the control groups. However, the analysis found no significant overall intervention effect on academic adjustment (g = 0.09; 95 % CI = −0.02, 0.20; ns), and the effect heterogeneity was low and non-significant (Qtotal = 4.15; I2 = 3.75 %; ns). Three studies measured academic/school outcomes at follow-up and likewise indicated no lasting significant effects (g = 0.07; 95 % CI = −0.05, 0.20; ns).
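One common way to read a standardized mean difference of this size is Cohen's U3, the proportion of the control distribution that falls below the intervention group mean. The sketch below computes it under the assumption of normally distributed outcomes with equal variances in both groups, an assumption this section does not test. `cohens_u3` is an illustrative name.

```python
import math


def cohens_u3(g: float) -> float:
    """Proportion of the control group scoring below the intervention mean.

    Equals the standard normal CDF evaluated at g, assuming normal outcomes
    with equal variances in both groups (an assumption for illustration).
    """
    return 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))


# Pooled post-intervention effect sizes taken from the text.
for label, g in [("academic achievement", 0.22), ("psychological adjustment", 0.17)]:
    print(f"{label}: g = {g:.2f}, U3 ~ {cohens_u3(g):.1%}")
```

Under those assumptions, g = 0.22 corresponds to the average intervention participant scoring above roughly 59 % of the control group, compared with 50 % if there were no effect.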

Fig. 6 Effect sizes for academic achievement

Sexual Health Outcomes

As the moderator analyses showed no significant differences in the intervention effects between sexual risk behaviors and pregnancies (t = 0.32, ns), all measures were pooled. A meta-analysis of the 11 studies on sexual health outcomes showed no statistically significant differences between the positive youth development intervention and the control group (g = 0.05; 95 % CI = −0.00, 0.12; ns), with non-significant heterogeneity found across the studies (Qtotal = 8.41; I2 = 0 %; ns). All but two studies (Bonell et al. 2013; Trenholm et al. 2007) showed a positive effect size though none were statistically significant. Only one study measured effect over a longer follow-up (Bonell et al. 2013) and a small, non-significant effect was reported (g = 0.03; 95 % CI = −0.39, 0.45; ns) (Fig. 7).

Fig. 7 Effect sizes for sexual risk behaviors

Subgroup Analyses

Homogeneity analyses were performed for each set of effect sizes (see Table 4). Significant heterogeneity was present in three of the outcome variables (positive social behaviors, psychological adjustment and academic achievement), so potential moderators of these effects were examined. Effects for problem behavior and sexual health outcomes were not heterogeneous, so moderators were not examined. Twenty-one analyses were performed with seven potential moderators across the three outcome variables. These moderators comprised three intervention characteristics (setting, duration and type), two sample characteristics (youth risk level and age) and one study characteristic (publication). Emotional distress and self-perceptions were pooled into psychological adjustment because the groups were too small for separate analyses.

Table 4 Moderators of effect sizes

Table 4 shows that the only moderation effect found was for youth risk level in relation to positive social behavior. Interventions delivered to low- or mixed-risk youth (k = 4; g = 0.23; p < 0.05) were more likely to produce a significant positive effect than interventions applied to high-risk young people. One study (Karcher et al. 2002) significantly contributed to this result (g = 0.79; p < 0.05). No other significant moderation effects were found. However, several trends emerged. Among intervention characteristics, small significant effects were observed for interventions delivered in community-based settings (k = 3; g = 0.30; p < 0.05; psychological adjustment outcome), delivered in mixed settings (k = 6; g = 0.29; p < 0.05; academic achievement outcome), or lasting more than 1 year (k = 4; g = 0.20; p < 0.05; psychological adjustment outcome; and k = 4; g = 0.25; p < 0.05; academic achievement outcome). Among sample characteristics, small significant effects were observed for interventions designed for middle-school youth (k = 4; g = 0.16; p < 0.05; psychological adjustment outcome) and high-school youth (k = 3; g = 0.36; p < 0.05; academic achievement outcome).

The 24 studies were grouped into one of the five primary intervention types: academic and skills training (n = 6), recreation (n = 2), community service projects (n = 2) and mixed (i.e., life skills/recreation, life skills/community or education/work experience) (n = 4). The effect sizes did not differ significantly between the groups, suggesting that the primary intervention type did not have a strong influence on the overall positive youth development intervention effect on any outcome. However, mentoring interventions showed a significant impact on the psychological adjustment outcome (k = 5; g = 0.21; p < 0.01).

Publication Bias

Publication bias was not detected, as the funnel plot shapes were symmetrical for all analyses (data not shown). The results of the Egger and Begg tests for asymmetry were non-significant on all outcomes (psychological adjustment: Egger p = 0.36, Begg p = 1.00; school outcomes: Egger p = 0.22, Begg p = 0.18; positive social behaviors: Egger p = 0.11, Begg p = 0.13; problem behavior: Egger p = 0.12, Begg p = 0.26; sexual health outcomes: Egger p = 0.19, Begg p = 0.21).
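For readers unfamiliar with the Egger test, the sketch below shows its usual regression form: each study's standardized effect (g / SE) is regressed on its precision (1 / SE), and an intercept far from zero suggests funnel-plot asymmetry. This is a plain least-squares illustration, not the software routine the authors ran. The function name and the input values are hypothetical and are not data from the included studies.

```python
import math


def egger_intercept(effects, ses):
    """Egger-style regression of standardized effect on precision.

    Returns (intercept, se_intercept, t). Ordinary least squares is used here
    as a simple illustration; needs at least three studies.
    """
    y = [g / s for g, s in zip(effects, ses)]   # standardized effects
    x = [1.0 / s for s in ses]                  # precisions
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)
    se_int = math.sqrt(s2 * (1.0 / n + mx * mx / sxx))
    t = intercept / se_int if se_int > 0 else math.inf
    return intercept, se_int, t


# Hypothetical effect sizes and standard errors, for illustration only.
toy_g = [0.10, 0.25, 0.05, 0.30, 0.18, 0.12]
toy_se = [0.05, 0.12, 0.08, 0.20, 0.10, 0.06]
b0, se_b0, t_b0 = egger_intercept(toy_g, toy_se)
print(f"intercept = {b0:.3f}, SE = {se_b0:.3f}, t = {t_b0:.2f} (df = {len(toy_g) - 2})")
```

The t statistic would then be compared against a t distribution with n - 2 degrees of freedom to obtain a p value of the kind reported above.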

Discussion

High-risk behaviors occur frequently in adolescence and tend to cluster together. They are associated with a range of adverse physical, psychological and occupational outcomes, which can persist into adulthood and carry significant personal, societal and economic costs. Adolescence is a developmentally sensitive period that presents a window of opportunity in which to potentially alter the course of high-risk behavior. Positive youth development interventions aim to prevent the escalation of risk behavior and enhance personal growth by drawing upon the strengths of young people and their contextual assets. Such programs have received significant investment in the past decade. The impact of these programs, as assessed by a number of previous reviews, is unclear. This lack of clarity may be accounted for by variance in program components, methodological shortcomings in trial evaluations (e.g., measurement of relevant constructs) and differences in review methodology. Consequently, there is a lack of specificity in the outcomes that positive youth development interventions impact upon and their effective components. The aim of this systematic review and meta-analysis was to examine the effectiveness of positive youth development interventions on a wide range of outcomes and to offer an updated review of the evidence base. To our knowledge, this is the first meta-analysis to examine both published and unpublished randomized controlled trials and to report on the effects of such interventions on multiple outcomes. Twenty-four studies were included in the meta-analysis, nine of which have not been included in prior reviews.

Many of the included studies were affected by systematic bias and attrition rates were high. Positive youth development interventions did not lead to a significant reduction in antisocial/violent behavior, substance abuse or risky sexual behaviors or improve positive social behaviors in comparison to control interventions. However, significant effects were found for two outcome variables: psychological adjustment and academic achievement. Positive youth development interventions were associated with modest but significant improvements in three areas: self-perception, emotional distress and academic achievement. In particular, significant improvements in self-perception were found to reduce emotional distress and improve academic achievement. These effects were small, with many individual studies failing to detect significant effects. This provides partial support for the notion that enhancing the assets of adolescents can support them to thrive academically and manage emotional difficulties. Given that poor academic achievement and low self-perception are associated with lower earnings, poor health (Bonell et al. 2005; Wellings et al. 2001; Emler 2001; Hallfors et al. 2006; Wheeler 2010) and delinquency (Maguin and Loeber 1996; Donnellan et al. 2005), improving youth academic achievement and self-perception could result in improved economic well-being and possibly positive health outcomes (Maynard 1996).

In addition to exploring which behaviors are modifiable through positive youth development intervention, it is important to understand for whom these programs work and how they may be tailored to particular individual and contextual characteristics. This review addresses this question in a limited way through the examination of program and individual moderators. In this review, program characteristics were not associated with the strength of program outcomes, suggesting that positive youth development intervention effects were independent of setting. Program effects were similar regardless of age. However, young people deemed low-risk were more likely to benefit from positive youth development interventions than high-risk youth. It will be important for future research to examine both individual and contextual moderators of program effectiveness. Factors such as socioeconomic status, ethnicity, access to and engagement with ecological assets (e.g., community resources) are likely to be important. Such research would highlight the characteristics of individuals most likely to benefit from existing positive youth development interventions and which modifications will be required to extend the reach of programs to marginalized or high-risk groups. Cohort studies (e.g., the 4-H Study of Positive Youth Development; Lerner 2006) have already begun to measure these constructs and the relationships between them. Incorporation of such measures into randomized controlled trials would provide a robust test of the relational developmental systems theories upon which many positive youth development programs are based.

The findings of this review are consistent with prior quantitative (Durlak et al. 2010) and narrative reviews (Roth and Brooks 2003b; Catalano et al. 2004; Gavin et al. 2010) for some outcomes but not others. For example, the behavioral adjustment evidence contradicts the positive effects found in Durlak et al. (2010) in relation to problem behavior and positive social behaviors. However, it is consistent with Zief et al.’s (2006) and Durlak et al.’s (2010) findings regarding the non-significant effects on drug use. The academic adjustment results were in agreement with the findings from prior reviews, in that these interventions did not report any beneficial effects (Zief et al. 2006; Durlak et al. 2010). With regard to sexual health outcomes, the present results contradict the conclusions offered by prior reviews, which reported significant effects on sexual risk behavior and pregnancy rates (Shepherd et al. 2010; Gavin et al. 2010; Catalano et al. 2002). The conflicting findings between this review and previous reviews may be accounted for by a number of factors. These include differing inclusion criteria, study design and sample size. A conservative approach was adopted through selection of randomized controlled designs, utilization of an inclusive approach to study inclusion (e.g., published and unpublished data) to maximize the statistical power and a robust measure of quality assessment. The focus upon both positive and negative behavioral outcomes reflects an increasing consensus over the last two decades on the benefits for research, policy and practice of integrating promotion and prevention approaches to youth development (e.g., Brooks-Gunn and Roth 2014).

Whilst there has been considerable progress in research on positive youth development, this review highlights the limited extent to which conclusions can be drawn about positive youth development intervention effects on adolescent health and well-being. These limitations include the lack of high-quality studies and of studies delivered outside the US. Nevertheless, this lack of high-quality evidence is not evidence of a lack of effectiveness. Future investment and research are therefore required to replicate USA-based programs on adolescent populations in different countries/communities, and the quality of the evaluation studies needs to be improved through more rigorous designs and minimization of potential biases. Evaluation of positive youth development programs needs to go beyond measuring the effects on behavior alone to encompass measures of individual positive youth development (e.g., the “5Cs”, hope, future expectations; Bowers et al. 2010) and contextual resources (e.g., parents and other adults, community activities and institutions). Measuring these constructs is complex and consensus is yet to be reached (Roth and Brooks-Gunn 2016; Tolan 2016). Future studies should also include adequate sample sizes and repeated follow-ups, on the basis that positive youth development is likely a dynamic process rather than an endpoint (Scales et al. 2016). There is a need for clearer descriptions of the program goals, components that contribute to program outcomes and their implementation features. When more programs have been rigorously evaluated and adequately reported, it may be possible to pool the evidence to determine program effectiveness. Further research is needed to understand the critical intervention components that contribute to the prevention of risk-taking behaviors and to robustly test novel interventions.

The findings of this review and the conclusions drawn about the positive youth development intervention effects need to be interpreted in the context of their limitations and the current state of evaluation research in this field. A paucity of rigorous research evidence exists on the impact of positive youth development interventions in adolescents; hence, this meta-analysis included only a few studies, some of which had small sample sizes and validity problems. Although we identified a sample of 24 studies, the number of studies included in any single analysis was much lower, particularly in the analyses that examined the impact of these interventions on certain outcomes. As a result, our power to detect significant differences was reduced. All the included studies suffered from some internal validity problems due to methodological flaws. Most studies had a high risk of performance bias, selection bias and detection bias, and therefore may have overestimated the positive youth development intervention effect. However, evidence suggests that although adequate procedures to ensure random sequence allocation and allocation concealment are often followed, these are frequently underreported (Hill et al. 2002). In addition, participant blinding is often impossible in these types of interventions. There was significant heterogeneity associated with several outcomes, and moderator analyses were unable to explain this variability. Subgroup analyses included only a small number of studies, which led to low statistical power, so the results may have failed to detect some important subgroup differences. The non-significant p value might have been due to low power and not necessarily related to effect size consistency (Borenstein et al. 2009). As with any systematic review, we cannot be sure that our search identified all possible studies, especially all unpublished studies, which have a tendency to report more null effects.

Other limitations include an overreliance on studies conducted in the US, the use of self-report measures and samples consisting largely of urban, high-risk, predominantly African American youth. Therefore, these results cannot be generalized to all adolescents or to all programs outside the US. Given these limitations, it is not currently possible to give strong recommendations for the use of positive youth development programs. In summary, the findings of this review are encouraging but further research using robust methodology is required.

Conclusion

Positive youth development interventions are the focus of significant investment, especially in the US and UK. This review found that the effects of positive youth development interventions on various outcomes were either non-significant or modest in magnitude. Additionally, the evidence base for their effectiveness was dominated by USA-based studies, many of which were of poor quality. Nevertheless, the results of this review support the effectiveness of positive youth development interventions on academic achievement and psychological adjustment. Low-risk young people appear to benefit particularly from these programs. Substantial progress has been made in the theoretical development of positive youth development. Improvements are now needed in the way studies are designed, evaluated and reported so that we can draw more concrete conclusions as to their real potential in reducing risk behavior and encouraging adolescents to thrive.