Introduction

Children and adolescents surviving large-scale traumatic events, such as disasters, accidents, and terrorism, present with a wide range of psychological problems, including posttraumatic stress disorder (PTSD) [1, 2]. Meta-analyses of child interventions focused on PTSD [3•, 4] have revealed promising results, but the lack of specific focus on the efficacy and effectiveness of child disaster mental health interventions remains a major public health concern. Recently, several qualitative reviews have examined child disaster mental health interventions [5, 6, 7••, 8••, 9], describing specific trends and gaps in the field.

To extend this work, this meta-analysis was conducted to examine effect sizes of interventions upon PTSD and explore the moderating effects of factors raised in descriptive reviews [5, 6, 7••, 8••, 9]. Our specific aims were to: (1) examine outcomes in children receiving treatment relative to control and waitlist groups; (2) evaluate treatment outcomes yielded by specific intervention packages, and (3) examine the moderating role of intervention packages, treatment modality (individual vs. group), providers’ level of training, intervention setting, parental involvement, participant age, length of therapy, intervention delivery timing, and the methodological rigor of the study based on Nathan and Gorman [10] criteria. These moderators have been the subject of critical focus and debate in the field. Broadly, we seek to delineate treatment-improving factors, guiding recommendations for psychological practices in post-disaster environments for children and adolescents presenting with trauma-related symptoms.

Method

Study Selection

Research studies assessing outcomes of interventions for children exposed to mass trauma were identified by searching the following bibliographic databases: Ovid, Medline, PsycInfo, EBM Reviews, EMBASE, ERIC, and PubMed. The initial search was conducted in 2011 and updated in the winter of 2012. Articles were identified by the following terms: treatment(s), intervention(s), child(ren), adolescent(s), disaster(s), terrorist event(s), terrorism, teen, youth, hurricane, tsunami, disaster intervention, posttraumatic stress disorder, treatment outcomes, grief, drug therapy, early intervention, trauma, and family therapy. Titles and abstracts identified in the searches were reviewed to select material for potential review.

All manuscripts published in English providing useable statistical outcome data were included. Initially we reviewed all studies addressing ongoing political conflict, war, and single incidents such as accidents. However, we excluded articles on ongoing war and terrorism because continuous exposure likely requires interventions that address both past and current concerns, unlike interventions delivered after war or terrorism has ended. For similar reasons, we also excluded treatments aimed at single accident events and interpersonal violence.

Of the 74 articles identified in the initial search, 18 were classified as war or chronic terrorism studies and were thus excluded. Additionally, four case studies [11–14] and one mixed sample study [15] were dropped. Seven articles focused on accidents or sudden loss of a loved one were excluded [16–22]. A further 19 were excluded because they lacked outcome data on PTSD, lacked the numerical outcome data required to calculate effect sizes (i.e., means, standard deviations), and/or consisted of treatment description only [23–41], thus yielding 25 studies. Finally, one study was excluded as an outlier (Cohen’s d > 3.5) [42]. Seven studies [43–49] offered results for two independent samples (see below). All told, the final sample includes 31 main data points from 24 distinct studies.
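The selection counts above can be tallied as a quick consistency check. The sketch below uses only the counts reported in this section; the variable names and category labels are illustrative and are not part of the original coding scheme.

```python
# Tally of the study-selection counts reported in this section.
identified = 74
exclusions = {
    "war or chronic terrorism": 18,
    "case studies": 4,
    "mixed sample": 1,
    "accidents or sudden loss": 7,
    "no usable PTSD outcome data or description only": 19,
}

# Studies left before the outlier check.
remaining = identified - sum(exclusions.values())
# One study was removed as an outlier (Cohen's d > 3.5).
final_studies = remaining - 1
# Seven studies reported two independent samples, so each adds one extra data point.
two_sample_studies = 7
data_points = final_studies + two_sample_studies

print(remaining, final_studies, data_points)  # prints: 25 24 31
```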

Coding Procedure

Rater pairs coded each useable article on a variety of dimensions, including sample demographics (mean age, % male), trauma exposure, outcome measures, use of control group (placebo, waitlist, non-treatment), total sample size at each time point, days since trauma (i.e., intervention delivery timing), number of treatment sessions, session length, follow up since treatment, and means and standard deviations for each outcome measure of each group and at each time point. Studies were further coded for the presence or absence of blinded assessment, random assignment, use of treatment manuals, clear inclusion and exclusion criteria, treatment adherence, and psychometric properties of the outcome measure (reliability and validity), thereby allowing the classification of each study into Types 1, 2, or 3, according to the Nathan and Gorman [10] levels of methodological rigor. A Type 1 study represents the highest level of rigor, applicable to a treatment study that had comparison groups, random assignment, blinded assessment, clear inclusion and exclusion criteria, state-of-the-art diagnostic measures, adequate power, and clearly described methods [10]. Type 2 studies are missing one or two of the Type 1 criteria, but are otherwise sound. Type 3 studies lack comparison groups, present only pre-post data, are open treatment studies obtaining pilot data, or have obtained outcome information retrospectively. Type 4, 5, and 6 studies were not identified in the literature review. Following the evaluation of individual studies, the research team met to sort articles by intervention package, as follows: Strict Cognitive Behavioral Therapy (CBT), CBT with Grief Interventions, Eclectic with and without CBT components, Eye Movement Desensitization and Reprocessing (EMDR), Exposure, Relaxation, Psychological First Aid, and Psychological Debriefing/Crisis Intervention. All coding disagreements and inconsistencies were resolved by consensus and consultation.

For values where ranges were given, the midpoint was used. Baseline values were used as pre-treatment measures, while the last follow-up data collection values were used as post-treatment (outcome) values. When effect sizes were calculated for intervention groups relative to controls, the last time point when data were collected for both groups was used. In cases where waitlist control groups were used for comparison, post-intervention data of the intervention group were compared to pre-intervention data of the waitlist control group. Finally, studies comparing active treatments side by side or using a modified treatment as a comparison were separated and treated as individual sources. Fifteen different measures assessed PTSD symptoms (see Table 1).

Table 1 Studies included in the meta-analysis

Coding procedures were refined, and coders trained through pilot coding of five articles outside the useable set but from related domains (e.g., adult trauma and child maltreatment). Coding was refined until coders reached an average agreement of r = .80. Median kappa and median percent agreement after coding 18 articles was .80 and 88.9 %, respectively (means, .80 and 88.6). Due to staff changes, the final seven articles added to the study were coded by only one of the raters in consultation with the lead author. All discrepancies were resolved in consultation with the team.

Effect Size Calculation

Within-group pre-post treatment effect sizes were calculated for each study using Cohen’s [67] \( d = (M_1 - M_2)/\sigma_p \), where \( M_1 \) = pre-treatment mean, \( M_2 \) = post-treatment mean, and \( \sigma_p \) = pooled standard deviation. The same formula was used to calculate between-group outcome effect sizes, where \( M_1 \) = intervention group mean, \( M_2 \) = control group mean, and \( \sigma_p \) = pooled standard deviation. Small, medium, and large effect sizes were defined as d = .20, d = .50, and d = .80, respectively [67].
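As an illustration of the formula above, the following minimal sketch computes d from two means and a pooled standard deviation. The text does not state which pooling convention was used, so the degrees-of-freedom weighted pooled SD here is an assumption, as are the function names, the example values, and the choice to treat the .20/.50/.80 benchmarks as lower bounds of each band.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled SD weighted by degrees of freedom (assumed convention)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """d = (M1 - M2) / pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def label(d):
    """Map |d| onto the .20 / .50 / .80 benchmarks (treated as lower bounds)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "negligible"


# Hypothetical pre/post scores: mean 40 before, 30 after, SD 10, n = 30 at each time point.
d = cohens_d(40, 10, 30, 30, 10, 30)
print(d, label(d))
```

With these hypothetical values the pooled SD is 10 and the mean difference is 10, so d is 1.0 and falls in the large band.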

Meta-analysis Procedures

Mean effect sizes and cross-study moderators were assessed using Hunter and Schmidt’s [68] “bare bones” meta-analysis procedure, which averages observed effect sizes across studies without correcting for systematic effects of measurement unreliability and range restriction. The “bare bones” approach is recommended when reliabilities and information needed to determine differential range restriction are too infrequently reported to permit the associated corrections, as was judged the case here. The procedure yields two key outputs per analysis (i.e., overall and by moderator subgroup): (1) the sample N-weighted mean d, which closely approximates the value of d that would be obtained with all participants from all studies combined into a single sample; and (2) SD(d’), the standard deviation of d with variability due to sampling error removed (Footnote 1). To the degree SD(d’) > 0, the mean weighted d may be considered the mean of multiple uncorrected δs, warranting consideration of possible moderators. Of further relevance to moderators, SD(d’) is used to calculate an 80 % credibility interval [mean d ± 1.28 × SD(d’)], identifying the points above which 10 % and below which 10 % of uncorrected δs (represented by the given sample of input ds) are expected to fall. Where SD(d’) = 0, the interval has zero width, and the N-weighted mean d is taken to estimate a single uncorrected δ. Success rates, or relative improvement, were assessed using the binomial effect size display (BESD) [69].
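The quantities described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the sampling-error variance uses the large-sample approximation commonly given for two equal-sized groups, and the text does not say which variant was applied (for example, to the pre-post effect sizes). The BESD step converts d to r with r = d / sqrt(d² + 4), which also assumes equal group sizes, and reading the success rate as 0.5 + r/2 is an assumption about how the reported percentages were derived. All function names are hypothetical.

```python
import math


def bare_bones(ds, ns):
    """Return (N-weighted mean d, SD of d after removing sampling-error variance)."""
    total_n = sum(ns)
    mean_d = sum(n * d for d, n in zip(ds, ns)) / total_n
    # N-weighted observed variance of d across studies.
    var_obs = sum(n * (d - mean_d) ** 2 for d, n in zip(ds, ns)) / total_n
    # Assumed sampling-error variance for d, based on the average study size.
    n_bar = total_n / len(ns)
    var_err = ((n_bar - 1) / (n_bar - 3)) * (4 / n_bar) * (1 + mean_d ** 2 / 8)
    # Residual variance is floored at zero before taking the square root.
    sd_res = math.sqrt(max(var_obs - var_err, 0.0))
    return mean_d, sd_res


def credibility_interval(mean_d, sd_res, z=1.28):
    """80 % credibility interval: mean d +/- 1.28 * SD(d')."""
    return mean_d - z * sd_res, mean_d + z * sd_res


def besd(d):
    """Binomial effect size display: (treatment, comparison) success rates."""
    r = d / math.sqrt(d ** 2 + 4)
    return 0.5 + r / 2, 0.5 - r / 2


# Hypothetical inputs: two studies with d = 0.5 (N = 100) and d = 1.5 (N = 300).
mean_d, sd_res = bare_bones([0.5, 1.5], [100, 300])
print(mean_d, sd_res, credibility_interval(mean_d, sd_res), besd(mean_d))
```

As a usage check against the reported results, `credibility_interval(0.74, 0.59)` gives roughly -0.015 to 1.495, in line with the interval reported for the treatment versus control comparison up to rounding, and `besd(1.13)` gives a treatment success rate of roughly 0.746.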

Due to the relatively low power conferred by the small number of available studies, we relied on procedures offered by Hunter and Schmidt [68, pp. 293–294], based on logic underlying analysis of variance, when testing for moderators. Specifically, moderation is suggested to the degree that (a) mean d varies between subgroups and (b) the k-weighted average Var(d) across subgroups is less than that based on the combined (i.e., overall) sample. The Hunter and Schmidt [68] method has been found to yield the most accurate estimate of the moderating effect magnitude, as well as superior accuracy in Type I error rates [70].
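The two conditions above can be expressed as a small check. The sketch below is illustrative only: the function names and the example subgroup values are hypothetical, and collapsing "to the degree that" into a yes/no flag is a simplification, since the original criterion is a matter of degree.

```python
def k_weighted_mean_var(subgroups):
    """k-weighted average Var(d); subgroups is a list of (k, var_d) pairs."""
    total_k = sum(k for k, _ in subgroups)
    return sum(k * v for k, v in subgroups) / total_k


def moderation_suggested(subgroup_means, subgroups, overall_var):
    """True when subgroup means differ and the average subgroup Var(d) is below the overall Var(d)."""
    means_differ = len(set(subgroup_means)) > 1
    return means_differ and k_weighted_mean_var(subgroups) < overall_var


# Hypothetical moderator with two levels: k = 10 studies (Var(d) = .40) and k = 20 (Var(d) = .50).
levels = [(10, 0.40), (20, 0.50)]
print(k_weighted_mean_var(levels))
print(moderation_suggested([1.38, 1.04], levels, overall_var=0.54))
```

Weighting by k means that a subgroup with many studies contributes more to the average variance than a subgroup with only two, which matters for the small subgroups analyzed here.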

We also used a more traditional method of testing moderator effects, comparing subgroups based on the Q statistic [71]. Specifically, between-group heterogeneity (\( Q_{\text{between}} \)) is tested as \( \chi^2 \) with \( df = N_{\text{groups}} - 1 \). For moderators with three or more levels (i.e., intervention package and methodological rigor), an additional conservative omnibus moderator comparison approach was used [71]. This approach resembles the Bonferroni correction used in ANOVA, where a series of comparisons between all possible two-group combinations is made, and the \( Q_{\text{between}} \) for each of the two groups is examined using a corrected significance level based on the total number of comparisons.
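A minimal sketch of this testing logic follows. It assumes inverse-variance weights for the subgroup means when forming the between-group Q, which is a common fixed-effect convention but is not stated in the text, and it uses a closed-form upper-tail probability for the chi-square distribution with integer degrees of freedom so that no external library is needed. The function names are hypothetical.

```python
import math
from itertools import combinations


def chi2_sf(x, df):
    """Upper-tail p-value of a chi-square statistic with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
        return math.exp(-half) * terms
    tail = math.erfc(math.sqrt(half))
    terms = sum(half ** (k - 0.5) / math.gamma(k + 0.5)
                for k in range(1, (df - 1) // 2 + 1))
    return tail + math.exp(-half) * terms


def q_between(means, variances):
    """Between-group Q from subgroup means and their variances (assumed weights 1/variance)."""
    weights = [1.0 / v for v in variances]
    grand = sum(w * m for w, m in zip(weights, means)) / sum(weights)
    return sum(w * (m - grand) ** 2 for w, m in zip(weights, means))


def pairwise_alpha(levels, alpha=0.05):
    """Bonferroni-style level: alpha divided by the number of two-group comparisons."""
    n_pairs = len(list(combinations(levels, 2)))
    return alpha / n_pairs


print(pairwise_alpha(range(5)))   # five levels give ten pairwise comparisons
print(pairwise_alpha(range(3)))   # three levels give three pairwise comparisons
print(chi2_sf(4.84, 1))           # p-value for a chi-square of 4.84 on 1 df
```

Under this scheme, a moderator with five levels is tested at about .005 and a moderator with three levels at about .017. A pairwise \( \chi^2 \) of 4.84 on 1 df corresponds to p of about .03, which would pass the conventional .05 level but not the corrected .005 level.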

Analyses

Three sets of analyses were conducted: (1) intervention outcomes comparing treatment relative to control groups for PTSD (k = 14); (2) pre- versus post-treatment intervention data without control groups for PTSD (k = 30), including 13 of the 14 from the analysis above; and (3) potential moderating effects of intervention package, treatment modality (individual vs. group), providers’ level of training, intervention setting, parental involvement, participant age, length of therapy, intervention delivery timing, and the methodological rigor of the study. Moderator subgroups were derived from the 30 pre/post studies, permitting direct comparison of mean subgroup Var(d) to overall Var(d). Intervention packages represented by only one study (i.e., Psychological First Aid, Relaxation, CBT with Grief components, Psychological Debriefing, and Spiritual-Hypnosis Assisted Treatment) were excluded altogether from this particular moderation analysis. To allow more robust comparisons for the five remaining packages with k ≥ 2, we re-ran the overall analysis without the five k = 1 intervention packages (k = 25). Similarly, five input samples did not specify the setting in which the treatment was carried out, and were thus excluded from the moderation analysis examining the effect of setting. We re-ran the analysis without these five studies (k = 25), to allow more robust comparisons for the remaining samples with clearly specified setting. Across all moderator analyses, we report meta-analytic results for k = 2 cases, but interpret the outcomes with caution.

Results

The 31 input samples had a combined N of 2630 participants (Table 1). Averaging across samples, mean participant age was 10.9 years (SD = 2.3), and 46 % of participants were male. All samples had been exposed to traumatic events, with the median intervention starting 365 days after exposure. Interventions averaged 8.4 (SD = 4.9) hours of therapy across 8.1 (SD = 4.6) sessions.

Symptom Outcomes

Meta-analytic results, presented in Table 2, suggest that treatment group outcomes are better than control and waitlist group outcomes on PTSD measures, showing a medium effect size (Cohen’s d = .74, SD = .59, 80 % CI = -.02 to 1.49). With respect to binomial success rates, participants receiving some form of psychological intervention averaged 66 % more improvement over that of control and waitlist group participants on PTSD measures. The weighted mean effect size for the reduction in PTSD symptoms as a result of psychological intervention (pre- vs. post-treatment) was large (d = 1.13, SD = .69, 80 % CI = .25 to 2.02), yielding 74 % relative improvement.

Table 2 Meta-analytic results for treatment effects (N = 2630)

Moderator Analyses

Moderator analyses examined whether the effects of psychological intervention on PTSD outcomes varied as a result of intervention package, treatment modality (individual vs. group), providers’ level of training, intervention setting, parental involvement, participant age, length of therapy, intervention delivery timing, and methodological rigor.

Comparisons by intervention package (focusing on the five intervention packages with k ≥ 2) yielded different mean effect sizes, accompanied by evidence of moderation [mean subgroup Var(d) = .42, overall Var(d) = .57; \( \chi^2 \)(4, K = 25) = 149.64, p < .001]. The largest treatment effect size was evident for EMDR (d = 2.15), followed by Exposure (d = 1.56), Strict CBT (d = 1.25), Eclectic with CBT (d = 1.07), and Eclectic with no CBT (d = .56). Using the omnibus moderator comparison approach corrected for multiple comparisons at the p = .005 level, the \( Q_{\text{between}} \) analysis revealed that EMDR significantly differed from Eclectic with CBT [\( \chi^2 \)(1, K = 12) = 63.48, p < .001], Strict CBT [\( \chi^2 \)(1, K = 11) = 42.11, p < .001], Exposure [\( \chi^2 \)(1, K = 4) = 9.55, p < .005], and Eclectic with no CBT interventions [\( \chi^2 \)(1, K = 4) = 115.04, p < .001]. Exposure significantly differed from Eclectic with no CBT [\( \chi^2 \)(1, K = 4) = 43.39, p < .001] and Eclectic with CBT [\( \chi^2 \)(1, K = 12) = 12.35, p < .001], but not Strict CBT [\( \chi^2 \)(1, K = 11) = 4.84, p = .03] interventions. Strict CBT significantly differed from Eclectic with CBT [\( \chi^2 \)(1, K = 19) = 10.57, p < .005] and Eclectic with no CBT interventions [\( \chi^2 \)(1, K = 10) = 71.55, p < .001]. Finally, Eclectic with CBT and Eclectic with no CBT interventions differed significantly [\( \chi^2 \)(1, K = 12) = 46.87, p < .001].

The analysis further yielded higher mean effect sizes for children receiving individual treatment (d = 1.38) relative to those in group therapy (d = 1.04), accompanied by evidence of moderation [mean subgroup Var(d) = .43, overall Var(d) = .54; \( \chi^2 \)(1, K = 30) = 55.38, p < .001].

Interventions carried out by mental health professionals yielded larger effect sizes (d = 1.19) than those carried out by teachers and other school personnel (d = .93) for PTSD outcomes [mean subgroup Var(d) = .47, overall Var(d) = .54; \( \chi^2 \)(1, K = 30) = 26.93, p < .001].

Outcomes varied according to the setting in which interventions were carried out [mean subgroup Var(d) = .53, overall Var(d) = .56; \( \chi^2 \)(2, K = 25) = 31.72, p < .001], such that the largest effect sizes were observed for interventions conducted in refugee camps (d = 1.47), followed by interventions at health or mental health sites (d = 1.31), and finally, schools (d = 1.11). Using the omnibus moderator comparison approach corrected for multiple comparisons at the p = .017 level, the \( Q_{\text{between}} \) analysis revealed that interventions conducted in refugee camps predicted better outcomes than interventions in schools [\( \chi^2 \)(1, K = 20) = 28.25, p < .001], but did not significantly differ from those conducted in mental health settings [\( \chi^2 \)(1, K = 7) = 2.62, p = .11]. Gains were significantly greater for interventions conducted in health or mental health settings than in those conducted in schools [\( \chi^2 \)(1, K = 23) = 6.34, p = .012]. Notably, only two studies in our sample were conducted in refugee camps; the results should be interpreted with this in mind.

Interventions involving parents yielded better outcomes than those not involving parents [d = 1.30 vs. d = 1.04; mean subgroup Var(d) = .52, overall Var(d) = .54; \( \chi^2 \)(1, K = 30) = 37.08, p < .001].

Outcomes varied across different age groups [mean subgroup Var(d) = .50, overall Var(d) = .54; \( \chi^2 \)(2, K = 30) = 71.26, p < .001]. Using the omnibus moderator comparison approach corrected for multiple comparisons at the p = .017 level, the \( Q_{\text{between}} \) analysis revealed that children in the 10–11.6 age group (d = 1.30) exhibited greater treatment gains than children in the 5.3–9.83 age group (d = .91) [\( \chi^2 \)(1, K = 22) = 68.43, p < .001], but not children in the 12–16 age group (d = 1.19) [\( \chi^2 \)(1, K = 22) = 4.29, p = .04]. Additionally, children in the 12–16 age group exhibited greater treatment gains than children in the 5.3–9.83 age group [\( \chi^2 \)(1, K = 16) = 28.11, p < .001].

Outcomes further varied as a result of number of hours spent in therapy [mean subgroup Var(d) = .53, overall Var(d) = .54; \( \chi^2 \)(2, K = 30) = 36.95, p < .001]. Using the omnibus moderator comparison approach corrected for multiple comparisons at the p = .017 level, the \( Q_{\text{between}} \) analysis revealed that spending 0.5–4.5 (d = 1.18) and 6–9 (d = 1.25) hours in therapy did not significantly differ [\( \chi^2 \)(1, K = 20) = 2.12, p = .15]. Both significantly predicted better outcomes than spending 10–18 (d = .95) hours in therapy [\( \chi^2 \)(1, K = 20) = 19.36, p < .001 and \( \chi^2 \)(1, K = 20) = 34.37, p < .001, respectively].

Interventions delivered within four months following trauma exposure yielded the largest effect sizes (d = 1.47), followed by interventions delivered between 39 and 69 months (d = 1.20) and interventions delivered between 4.5 and 12 months following trauma exposure (d = .76), supported by evidence of moderation [mean subgroup Var(d) = .48, overall Var(d) = .54; \( \chi^2 \)(2, K = 30) = 185.08, p < .001]. Using the omnibus moderator comparison approach corrected for multiple comparisons at the p = .017 level, the \( Q_{\text{between}} \) analysis revealed that interventions delivered within the first four months significantly differed from interventions delivered between 4.5 and 12 months [\( \chi^2 \)(1, K = 17) = 178.60, p < .001] and interventions delivered between 39 and 69 months following trauma exposure [\( \chi^2 \)(1, K = 23) = 27.76, p < .001]. Finally, interventions delivered between 4.5 and 12 months differed significantly from those delivered between 39 and 69 months following trauma exposure [\( \chi^2 \)(1, K = 20) = 78.29, p < .001].

With respect to methodological rigor, the largest effect sizes were observed in Type 2 studies (d = 1.22), followed by Type 1 (highest rigor) (d = 1.20), and Type 3 (d = .93) studies. A moderating effect of methodological rigor was supported by \( Q_{\text{between}} \) [\( \chi^2 \)(2, K = 30) = 39.48, p < .001], but not by a Var(d) comparison [mean subgroup Var(d) = .54, overall Var(d) = .54]. Using the omnibus moderator comparison approach corrected for multiple comparisons at the p = .017 level, the \( Q_{\text{between}} \) analysis revealed that Type 1 studies significantly differed from Type 3 [\( \chi^2 \)(1, K = 18) = 22.28, p < .001], but not Type 2 studies [\( \chi^2 \)(1, K = 18) = .13, p = .72]. Finally, Type 2 and Type 3 studies also significantly differed [\( \chi^2 \)(1, K = 24) = 36.23, p < .001].

Discussion

This meta-analysis assessed (1) the effectiveness of psychological interventions in reducing PTSD symptoms in 2630 child and adolescent survivors of mass trauma; and (2) factors that may moderate PTSD treatment outcomes, including intervention package, treatment modality (individual vs. group), providers’ level of training, intervention setting, parental involvement, participant age, length of therapy, intervention delivery timing, and the methodological rigor of the study.

In general, results suggest that psychological interventions are successful in ameliorating posttraumatic stress symptoms. The weighted mean effect size of PTSD reduction as a result of psychosocial treatment was large, yielding a 74 % improvement compared to baseline. Moreover, children and adolescents receiving any psychological intervention fared significantly better than those in control or waitlist groups with respect to PTSD symptoms, averaging 66 % more improvement than those not receiving treatment. Together these results suggest that natural recovery or regression to the mean cannot fully explain the positive outcome from these interventions [7••]. Thus there is clear evidence that child mental health interventions are effective. Future studies should provide evidence about the characteristics of those who do not improve in order to hone treatments to address the needs of all children and adolescents post disaster.

Although our meta-analysis includes several interventions with only two studies examining them (EMDR, Exposure, Eclectic with CBT), compelling evidence suggests that EMDR, Strict CBT, Exposure, and Eclectic with CBT intervention packages are all effective for children and adolescents post disaster. As EMDR, Exposure, and Eclectic with CBT demonstrated strong effectiveness based on only two studies each, more studies yielding positive effects are needed before such interventions can be recommended. Based on the studies included in this review, it may be prudent to use Strict CBT for large-scale disaster responses until more is known. Strict CBT, which had a slightly lower effect size than EMDR and Exposure in this meta-analysis, was based on more studies (nine studies, across six research teams) involving more participants, suggesting that at this time, the evidence in its favor is more robust. Nevertheless, all interventions were effective, and EMDR, Exposure, and Strict CBT had the largest effect sizes and should be considered evidence-based practice. These findings and guidelines differ from those of the International Society for Traumatic Stress Studies (ISTSS) [72], which could not determine conclusive recommendations based on the available evidence at that time. However, the ISTSS guidelines were confined to interventions solely in the early phase, included data from studies involving accidents, and had fewer studies on which to make a determination. Future research might examine what specific intervention components are shared across these treatments, and dismantling efforts might determine which are most critical [7••]. Given the sizable public health demands of a disaster, distilling the necessary curative factors can be expected to enhance the cost-efficacy of interventions.

Several moderators were shown to affect treatment outcomes. Children receiving individual therapy had greater improvement than those in group interventions, although both formats yielded large effect sizes. Given that 43 % of the treatments were delivered more than a year after the event, it may be that more individualized treatment was necessary due to the chronicity of the PTSD symptoms. Similarly, while all providers were effective, those interventions carried out by mental health professionals yielded more improvement in PTSD than those carried out by other types of providers. These results need to be interpreted cautiously as only four studies examined non-mental health professionals. Outcomes varied according to the setting in which interventions were carried out such that the largest effect sizes were observed for interventions conducted in refugee camps, followed by interventions at health or mental health sites and finally, schools. Since only two studies in our sample were conducted in refugee camps, these results indicate that, at a minimum, interventions can be successfully delivered in temporary sheltering. Ultimately we need a larger set of studies to examine and untangle the interactions of format (group, individual), provider training, and setting.

Parental involvement in treatment enhanced outcome, but the effects of treatments not involving parents were still large. Thus, while parental engagement may be optimal it is not a necessity. In general, children ages 10 and above demonstrated greater treatment gains than children below 10, but all age groups benefited from treatment. It is noteworthy that effect sizes for all ages were of medium to large size (d range = 0.91 to 1.30).

Optimal timing and length of post-disaster services continue to be greatly debated topics [8••]. Strikingly, interventions lasting one half hour to nine hours predicted better outcomes than those lasting 10 to 18 hours, although all durations yielded large effects (d range = 0.95 to 1.25). It is unclear whether this is an artifact of children and adolescents with greater needs requiring more time to address symptoms. Interventions delivered within four months of the event yielded the largest effect size, supporting the need for early interventions. Of the ten studies within four months, four initiated delivery within the first month, or the acute phase [73]. Thus we do not have enough information yet to make recommendations about optimal timing within the first four months and whether the acute interval is indeed an optimal time of receptiveness to secondary prevention strategies [73]. Strategies and funding to help researchers quickly assess acute pediatric post-disaster interventions are a high priority public health need.

Studies with more rigorous methodology were more likely to have stronger effect sizes. Strong effect sizes are noteworthy in light of concerns raised that the demands of conducting outcome research might detract from addressing and individualizing post-disaster needs [5]. These meta-analytic results suggest quite the contrary conclusion; despite the challenges of conducting clinical trials in post-disaster recovery environments, such efforts likely enhance outcomes. This conclusion is tempered by the fact that the inclusion criteria for this meta-analysis necessitated that all studies at minimum had some outcome data.

Our results bear consideration in light of several limitations. First and foremost, the number of studies meeting inclusion criteria was modest, especially within moderator subgroups. More robust analysis awaits accumulation of larger numbers of point estimates from independent samples. Moreover, many new recommended disaster-related interventions for children are not yet tested in scientific literature, nor explicitly linked to PTSD outcomes. For example the core principles of disaster interventions [74] are beyond the scope of this review. Second, as an extension of the first point, interactions among intervention packages and other moderators were precluded by the small k, warranting future examination as more studies become available for meta-analysis in this area. Finally, this review was confined to examination of PTSD symptoms only and did not include other important outcomes such as depression, anxiety, academic success, quality of life, and interpersonal functioning.

Conclusions

Meta-analytic results of the existing literature indicate that disaster interventions for children and adolescents clearly are efficacious. More outcomes research on emerging and existing interventions is needed to enhance public health interventions and address issues that cannot yet be determined based on the existing literature. Future research would benefit from including evaluations of cost-effectiveness and ease of dissemination.