Depression affects a substantial proportion of the population and results in significant distress, disability, and economic cost (Kessler et al. 2005). The risk for depression rises substantially during adolescence and young adulthood, with lifetime prevalence estimates for major depressive disorder ranging from 17% to as high as 40% of the population (Moffitt et al. 2010), and growing with each decade cohort. Over a third of all people who have one episode of major depression will have one or more recurrent episodes, leading to severe and chronic burden for many (Hardeveld et al. 2010). As a result, there has been a major effort over the past decade to develop and test programs to prevent depression. These programs have targeted adolescents based on the notion that early intervention may be able to reduce risk for initial onset, as well as short-circuit the cycle of recurrence.

Randomized trials have provided evidence that many of these interventions are effective in reducing risk for subsequent depression. However, some trials also find that these interventions are not equally beneficial for all participants, and that initial benefits may change over time (Merry et al. 2011; Stice et al. 2009). This raises questions about preventive effect heterogeneity: that is, do different adolescents respond differently to preventive interventions? And if we find evidence of differential effects, what might account for it, and how might that help us shape the next generation of prevention programs? In addition, can such findings provide new insights for our theories of depression etiology?

This supplemental issue of Prevention Science brings together a set of papers from leading investigators who have conducted trials testing whether intervention programs prevent adolescent depression. Using data from these trials, these papers explore a series of factors that might account for variation in intervention benefit, employing several novel methods for assessing effect heterogeneity. These studies follow two general paradigms: three papers report findings from single randomized preventive intervention trials, while the remaining papers develop and apply new methods for combining data from multiple studies to evaluate effect heterogeneity more broadly.

Testing Effect Heterogeneity Within Individual Trials

Garber et al. (2015) report on a 32-month follow-up of a randomized controlled trial of a cognitive behavioral program designed to prevent future depression in adolescents deemed at risk because of current elevation in subsyndromal depressive symptoms or prior history of DSM-IV depressive disorder. The program consisted of eight weekly and six monthly group sessions emphasizing cognitive restructuring and problem solving. This paper applies recursive partitioning to identify clusters of baseline risk factors, testing whether intervention effects on time to onset of a major depressive disorder vary by cluster. Preventive effects were found only for the lowest-risk cluster (which was itself at heightened risk, given the study sampling criteria).

Focusing on a different intervention format, Connell et al. (2015) report on trajectories of depressive symptoms across 3 years following a randomized controlled trial of the Family Check-up (FCU), a family-centered program designed to reduce adolescent problem behaviors through improving parenting skills and family interactions. In contrast to the Garber et al. trial, prevention of depression was conjectured to be an indirect effect of changing family processes, reducing problem behavior, altering association with deviant peers, and enhancing academic performance. Families were recruited from three middle schools, with no selective restrictions on baseline functioning. The intervention involved family consultation, a Family Check-up assessment, and referral as necessary for further family services. Connell et al. applied generalized growth mixture modeling (GGMM) to four annual waves of depressive symptom measures, identifying three growth patterns, and finding intervention effects only for the class that had low initial symptoms that increased over time. They then explored predictors of class membership as a means of identifying predictors of effect heterogeneity, finding that this class was distinguished by having more females with lower levels of baseline depression but higher levels of antisocial behavior. Findings parallel those of Garber et al. in that no preventive effects were found for the class with high baseline depression that was more stable over time.

As with Connell et al. (2015), Mauricio et al. (2017) also focus on a family-based intervention, presenting findings from a 24-month follow-up of a randomized trial of Bridges to High School (Bridges), a family intervention designed for Mexican-American families and targeting parenting, teen competence, and the reduction of risk for mental health and substance use problems. Similar to Connell et al., investigators hypothesized that changes in these outcomes would also lead to reductions in risk for internalizing symptoms including depression. Initial analyses found no overall impact on internalizing symptoms, but also indicated substantial variation in program attendance. This paper applied GGMM to data on attendance across the 11 group sessions, finding evidence for four distinct attendance patterns for the intervention group. Two of these groups, those with sustained attendance and those that dropped out early, were further distinguished by having subgroups that had adolescents higher or lower on baseline internalizing symptoms. Correlates of group classification pointed to the importance of acculturation factors in combination with baseline internalizing symptoms.

Testing Effect Heterogeneity by Combining Data from Multiple Trials

Meta-analysis, which combines trial-level data on effect sizes, is currently the predominant method for aggregating findings and studying general impact. Meta-regression is then used to study effect heterogeneity, and to test whether trial-level factors such as gender proportion or intervention type are associated with different effect sizes. However, these methods have very limited statistical power for detecting and studying effect heterogeneity, and they can also lead to fallacious results when we assume that trial-level variables operate in the same way as individual-level moderators (Dagne et al. 2016). Recent advances in statistical methods that can be applied to individual-level data aggregated over two or more studies, referred to as integrative data analysis (Curran and Hussong 2009) or individual patient or participant meta-analysis (Stewart and Parmar 1993), have much greater power to detect effect heterogeneity and characterize its sources (Dagne et al. 2016). Such methods have only recently been applied to trials data, and quantitative methods are still being developed. Five papers in this issue focus on these novel methods for combining data across multiple trials as a means of studying effect heterogeneity.

Brunwasser and Gillham (2015) report analyses combining data from three randomized controlled trials of the Penn Resiliency Program (PRP). Similar to the program evaluated by Garber et al., PRP is a group-based cognitive behavioral program for adolescents designed specifically to target individual risk mechanisms for depression. Results of 20 trials reflect variability in impact on depressive symptoms; some studies have reported significant main effects, others have found effects only for specific subgroups, and still others have failed to find significant effects (Brunwasser et al. 2009). The three trials used here utilized similar designs, including selection of adolescents with elevated but subclinical levels of depressive symptoms, use of a standard PRP curriculum, and repeated assessment of depressive symptoms at post-test and 6-month follow-ups through 24 months. The trials differed on implementation site; one trial was implemented in two primary care settings, while the other two trials were conducted in four suburban middle schools. Brunwasser et al. model variation in symptom trajectories, and then apply both standard moderator analyses and recursive partitioning analyses to these trajectories in order to identify subgroups reflecting effect heterogeneity across the combined sample. Findings reported here are complex. As with Garber et al. and Connell et al., baseline depressive symptoms moderate the impact of PRP, but the pattern of moderation varied across sites. For example, in one primary care clinic effects were stronger for adolescents who began the intervention with more depressive symptoms, but in the second clinic effects were stronger for those with low or moderate symptoms.

Brincks et al. (2016) also combine data from four randomized controlled trials, in this case of the Familias Unidas (FU) intervention. Similar to the trials reported in Connell et al. and Mauricio et al., FU is a family-focused intervention targeting family interactions as a means of preventing adolescent substance use and sexual risk. The program is designed specifically for Hispanic families. Two trials included adolescents from general school populations, and two included adolescents with a history of conduct, aggression, or attention problems or delinquency. As with Connell et al. and Mauricio et al., these investigators hypothesize that changes in family functioning and adolescent problem behavior will lead indirectly to reductions in risk for internalizing symptoms. All trials involved the same intervention program and used the same measure of internalizing symptoms. The trials did vary in timing and length of follow-up. Brincks, Perrino, et al. use a two-step extension of growth mixture modeling to identify distinct trajectory classes, allowing for comparison of intervention and control trajectories within class. Intervention effects are present only for adolescents with high baseline internalizing; here, adolescents in the intervention show no changes in internalizing, while those in the control group show significant worsening over time. This group was more likely to include female adolescents and families with poor parent-adolescent communication at baseline.

The third synthesis project, reported in a trio of papers by Brincks et al. (2017), Brown et al. (2016), and Siddique et al. (2017), is a large-scale effort that combines individual-level data from 19 different prevention trials, including most of the trials described in the other papers in this supplemental issue. These trials vary in several ways. Some involve programs, such as those studied by Garber et al. and Brunwasser et al., that identify adolescents with elevated but subclinical depressive symptoms and target cognitive and behavioral risk factors through adolescent group intervention. Others involve programs, such as those reported by Connell et al., Mauricio et al., or Brincks, Perrino, et al., that use family-focused interventions to improve family functioning and reduce adolescent problem behavior, assuming these will indirectly reduce risk for internalizing symptoms.

Brincks, Montag, et al. provide an extended discussion of several methodological challenges that must be met when combining data from such trials to conduct integrative data analysis. They note that trials can vary on outcome measures, follow-up assessments, sample characteristics, control conditions, and impact trajectories. Novel quantitative methods are presented for attending to these sources of variation in integrative data analysis, and for evaluating the impact of such variation on the internal and external validity of results. These methods integrate missing data methods with multilevel latent variable modeling combined with latent growth analysis.

Brown et al. conduct tests of effect heterogeneity with this combined individual-level dataset, using the methods outlined by Brincks, Montag, et al. Findings suggest that an overall preventive effect is present and remains stable across 2 years, a finding that contrasts with results from individual trials that report effects that fade over time. Findings also suggest that interventions specifically targeting depression, most of which addressed cognitive behavioral or interpersonal risk factors, had stronger preventive impact compared to those targeting problem behaviors or general mental health, most of which were family-based. The authors discuss sample differences that might account for this, noting that longer follow-ups may be necessary for intervention effects to emerge when programs target family processes, an interpretation consistent with the few interventions employing follow-ups of 3 years or more (Sandler et al. 2015). Analyses also fail to find moderation by baseline severity of depressive symptoms, in contrast to findings reported here by Garber et al., Brunwasser et al., and Connell et al. This may reflect more complex and nuanced moderation, consistent with Brunwasser et al.’s finding that such moderation varied by intervention site. These findings suggest that there are different pathways in the development of depression, and that this heterogeneity requires preventive interventions that target different risk factors (e.g., cognitive factors, family dysfunction, externalizing behaviors).

Using data from the same set of trials as Brincks, Montag, et al. and Brown et al., the paper by Siddique et al. (2017) provides a cautionary note concerning the challenges of combining data from different measures across multiple trials. In contrast to Brown et al., who used latent variable models, Siddique et al. employ multiple imputation methods with summary measure scores to test whether this approach can be used to harmonize measures across trials. These methods hold promise for harmonization because they allow for calibration methods that produce a common metric independent of the specific measure employed. However, Siddique et al. find that these methods do not provide accurate imputations for all these data, based on diagnostic evaluation of the imputation model. Siddique et al. discuss the characteristics of this dataset that contribute to this limitation, and suggest future avenues for developing methods that may be less sensitive when some assessment methods are only rarely used. They note that, when their analyses exclude specific trials and measures with the greatest amount of “missing” data, results are consistent with the Brown et al. finding of beneficial preventive impact.

Novel Methods for Studying Effect Heterogeneity

Taken as a whole, this series of papers explores a range of novel methods for studying effect heterogeneity. These include new methods for characterizing change and identifying sources of heterogeneity. They also include new methods for combining datasets for integrative data analysis.

Longitudinal designs are essential for evaluating the preventive effects of intervention programs, and the papers in this issue study effects using follow-up data ranging from 12 to 36 months. Brunwasser et al. and Brown et al. study continuous variation in trajectories of depressive symptoms, while Brincks et al. and Connell et al. employ generalized growth mixture modeling, both finding that different adolescents follow categorically distinct trajectories of depressive symptoms. Garber et al. study time to onset of new major depressive episodes across 32 months.

These more complex characterizations of change are used as a basis for exploring heterogeneity of intervention impact. These studies focus on somewhat different sources of heterogeneity and employ different statistical techniques. Several papers attend to baseline factors, all measured prior to assignment to intervention or control condition, as potential sources of effect heterogeneity. These include demographic factors such as gender or acculturation as well as baseline severity of subclinical depressive symptoms. Recursive partitioning using regression trees, a set of methods that are relatively new to prevention science, is also explored as a means of characterizing sources of effect heterogeneity, either through identifying clusters of baseline risk factors, or through partitioning on those risk factors to identify specific subgroups showing different patterns of intervention impact.

Several papers also advance the use of integrative data analysis with individual participant data from combined sets of prevention trials. These methods hold substantial promise for increasing power to detect moderation, although they introduce new challenges that will require development and testing of new quantitative methods.

Conclusions and Implications for Policy, Research, and Practice

Two commentaries from scientists at the National Institutes of Health and the Substance Abuse and Mental Health Services Administration place these papers in broader perspective. Goldstein and Avenevoli (2017) from the National Institute of Mental Health highlight the growing importance of collaborative data sharing essential for conducting integrative data analysis, and discuss NIMH initiatives to support such activities. They note that these collaborative syntheses of multiple trials support two transformative ideas: the same targetable risk mechanisms can shape different emotional and behavioral disorders, and adolescents can follow several different trajectories leading to the same internalizing disorder. They also emphasize the need to expedite implementation and dissemination of tested interventions through supporting science on the process of service delivery.

Paolo del Vecchio, director of the Center for Mental Health Services within the Substance Abuse and Mental Health Services Administration, discusses implications of this work from public health and prevention service delivery perspectives (del Vecchio 2017). He notes the importance of the growing prevalence of depression in several populations, including Latina girls, LGBT-identifying youth, and American Indian and Alaska Native youth. He highlights findings from the supplemental issue suggesting that preventive interventions are equally effective for minority and majority youth, and points toward SAMHSA initiatives for supporting and disseminating effective prevention programming through school-based mental health programs. He concludes that linking prevention science findings to the behavioral health service field will allow us to design strategies that address the mental health needs of all communities.

As the authors and commentators in this supplemental issue make clear, these studies reflect significant progress in our understanding of how to prevent adolescent depression, but much remains to be done. Synthesis of findings from multiple trials holds great promise for advancing the field, and progress will be accelerated if collaborative data sharing becomes the norm rather than the exception. The evidence is strong that carefully designed prevention programs can reduce risk for adolescent depression, but we need to know more about what programs work best for which groups of adolescents, and under what conditions. The papers in this supplemental issue provide exemplars for how to learn more from individual trials, as well as from data shared and combined across multiple trials.