Introduction

Conduct problems in children and adolescents (herein referred to as children if further specification is not warranted) refer to violations of social rules and negative actions toward others, including behaviours such as fighting, lying and stealing [24, 26]. Whether antisocial behaviour is sufficiently severe to constitute a diagnosis of oppositional defiant disorder (ODD) or conduct disorder (CD) depends on a number of characteristics of the behaviours. According to the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV), ODD and CD involve behavioural disturbances causing clinical impairment in social, academic and/or occupational functioning [1]. A large number of studies have reported a worryingly high prevalence of antisocial behaviours [54]. Conduct problems in childhood increase the risks of dropping out of school [45], teenage parenthood [44], and marital instability [46]. It has been suggested that antisocial behaviours are stable over time, have a long-term impact, and increase the risk of antisocial personality disorder [77]. Severe antisocial behaviour in children causes considerable economic costs to the community, although the burden still falls most heavily on the family [65]. These findings indicate a need for knowledge of effective treatment interventions. The objective of the present investigation was to perform a meta-analysis of studies addressing the effects of various psychotherapeutic interventions for conduct problems in children.

Because disruptive behaviour disorder (DBD) influences multiple domains, several domains of functioning besides reductions in disruptive behaviours are of interest. We hoped to examine multiple domains of functioning, such as internalizing problems, self-reported delinquency, and in vivo observation of children’s behaviours, but information on these domains was rarely reported in the included studies. Since a sufficient number of studies must report this information to maintain statistical power, we decided to address only three domains of functioning other than reductions in disruptive behaviours: changes in oppositional and aggressive behaviours in day-care and school settings, changes in social functioning, and reductions in parental distress.

Several meta-analyses relevant to the treatment of DBD are available in the literature. Weisz and his colleagues have conducted two major meta-analyses addressing psychosocial interventions for children, also focusing on aggressive behaviours [93, 94]. Their main searches for articles covered studies published between 1970–1985 and 1983–1993, in addition to studies referred to in previously published meta-analyses. In their meta-analysis published in 1987, the mean effect size (ES) was 0.79 in 76 studies concerning under-controlled behaviours (such as delinquency and aggression) [93]. In their meta-analysis published in 1995, the mean ES was 0.52 in the 59 studies of under-controlled behaviours [94]. In a meta-analysis focusing explicitly on behavioural interventions for disruptive behaviours, including 26 studies published from 1969 to 1992, the overall mean ES was 0.82 [72]. Two meta-analyses have focused on cognitive-behavioural interventions for conduct problems [7, 80]. Sukhodolsky et al. [80] reported an overall mean ES of 0.67 based on 40 studies published from 1977 to 1996. Bennett and Gibbons reported an overall mean ES of 0.23 based on 30 studies published in the period 1974 through October 1998 [7]. Recently, a meta-analysis of the differential effectiveness of behavioural parent-training and cognitive-behavioural therapy for antisocial youths was conducted [55]. When combining behavioural and cognitive-behavioural therapy for antisocial youths, 71 studies in all, McCart et al. [55] reported an overall mean ES of 0.40 in studies published prior to 2005.

A review of the previous meta-analyses on treatments for DBD makes evident the need for an updated review including multiple treatments such as behavioural, cognitive-behavioural, family and psychodynamic therapies. This review includes recent psychosocial interventions, that is, interventions aimed at reducing aggressive, oppositional and maladaptive behaviours through counselling, training programs or treatment plans, focusing on studies published from 1987 to January 2008. Studies with less stringent designs, such as those with no control condition (e.g. pre-post designs), were included in order to identify a wider spectrum of recent developments in the treatment of DBDs. The alternative would have been to eliminate these studies, which would imply assuming that they are not useful as potential sources of valuable information regarding the effectiveness of treatment for disruptive children. In addition to exploring the treatment effects of psychosocial interventions in reducing oppositional and disruptive behaviours, our meta-analysis focuses on gains from treatment in other functioning domains, as mentioned earlier. To our knowledge, no meta-analysis covering various treatments for children has focused explicitly on DBD and possible additional treatment effects. Our meta-analysis also addresses the effects of independent replication studies, and as such, our review may represent new and important knowledge regarding replicable treatment effects of particular clinical interest.

Method

Criteria for review

Inclusion and exclusion criteria were used to identify studies included in the meta-analysis. Inclusion criteria were: (1) the children were in clinical range when their disruptive or aggressive behaviours were evaluated (for instance a t-score above 67 on CBCL externalizing or a score above the 90th percentile on Eyberg Child Behavior Inventory, and/or children fulfilling the diagnostic criteria of ODD/CD); (2) reports were published or written in the period covering January 1987 until January 2008; (3) mean age was below 18; (4) the study reported at least one quantitative measure (rating scale or method of observation) of change in disruptive and aggressive behaviours. The exclusion criteria were (1) studies with participants in non-clinical range, (2) studies of psychosocial interventions not identified or described by the authors; (3) single-case studies; (4) studies not maintaining psychopharmacological treatment throughout the study period; (5) studies only reporting follow-up data; (6) inpatient or residential treatments.

Search for literature

A systematic and comprehensive search for studies for the period 1987 until August 2005 was conducted. A detailed overview of the search process and the words and truncations in the searches is presented in Fig. 1. First, PsycINFO was searched for outcome studies on disruptive and aggressive behaviours. Additional searches were then conducted on the authors of the studies that the previous searches had identified for inclusion. These searches resulted in 35 studies fulfilling intake criteria. In order to include relevant literature not previously identified, the reference lists of relevant literature reviews were searched (i.e. 24, 26, 41). This method resulted in an additional 18 articles. A personal request for articles in progress or unpublished material was sent by electronic mail to researchers who had been involved in two or more of the studies already included. These requests resulted in one article in press and two unpublished reports being included. To include more recent studies and decrease the possibility of overlooking studies of interest, a final search for studies published from 1987 to January 2008 was conducted, resulting in nine studies being included, thus yielding a total of 65 studies included in the meta-analysis. The first author conducted the screening for relevant articles. Previous results of meta-analyses have confirmed the significance of including doctoral dissertations as well as published studies, since the latter obtained larger ESs [56]. A total of four dissertations fulfilling the inclusion criteria were included.

Fig. 1 Search for literature

Coding of studies

Psychosocial treatment was defined as any psychological intervention aimed at reducing aggressive, oppositional and maladaptive behaviours, or enhancing prosocial behaviour through counselling, training programs or predetermined treatment plans. This definition is in line with the definition used by Weisz and his colleagues [56, 93, 94] in several meta-analyses. In accordance with Weisz et al. [94], studies were excluded if they only included reading interventions (“bibliotherapy”), although studies applying bibliotherapy accompanied by other interventions (such as counselling or “video-based” interventions) were included. Psychotherapeutic interventions conducted by fully trained professionals, as well as by therapists in training (e.g. clinical psychology and social work students, and child psychiatry workers) and trained paraprofessionals, were included.

A list of potential moderator variables was coded for each study. These included the mean age of the sample; the proportion of boys included in the study; the mode of treatment applied in the study (coded as behavioural therapy [BT], cognitive behavioural therapy [CBT], BT and CBT in combination, family therapy [FT], or psychodynamic therapy [dyn]); how participants were included or recruited to the study (“inclusion”); the informant of the behavioural data (parent, teacher or in vivo observations); the experimental design (i.e., randomization procedures, matching, or no randomization); and whether or not the studies reported diagnostic conditions (such as ODD and/or CD) in accordance with the standards set by ICD 9 or 10, or DSM IIIR or IV [1, 95], scored as a categorical variable. Studies were also coded as to whether or not they were independent replications of a model program. The percentage drop-out (i.e. the percentage of the participants not available at posttreatment) was coded. The number of participants included in the meta-analysis and the total number of boys and girls did not always correspond to the percentage drop-out in the study. The percentage drop-out was calculated using information from the studies on participants after inclusion or randomization, when available. The year of publication was also included as one of the moderator variables. Studies were also coded for features related to the training and experience of the therapists, the ethnicity of the participants, the number of treatment sessions, and various factors regarding research. The coding manual involved a total of 52 coding variables. Some of these, such as socioeconomic factors, were reported in varying formats in some studies and not reported at all in others, making it difficult to score these variables in a meaningful way for the purpose of merging the data. Further, none of the studies reported information on all 52 coding variables. It seems probable that the 52 coding variables overestimated the information actually reported in the studies. For analytic purposes across the range of studies, we were able to include 11 coding variables in the moderator analysis, as they were often reported and appeared more relevant for the purposes of the study. After training in the coding system, three coders independently scored 20% of the studies. The percentage inter-rater agreement ranged between 83 and 90% for the variables. Disagreements were resolved by discussion. In case of irresolvable disagreements, the first author (SF) would have decided the outcome.
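The percentage agreement reported above can be illustrated with a minimal sketch. It assumes a simple pairwise comparison of two coders on one categorical variable; how agreement was aggregated across the three coders is not stated in the text, and the codes below are hypothetical.

```python
def percent_agreement(coder_a, coder_b):
    """Percentage of studies on which two coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must rate the same non-empty set of studies")
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100.0 * matches / len(coder_a)


# Hypothetical "mode of treatment" codes for ten studies (not data from the review).
coder_1 = ["BT", "CBT", "FT", "BT", "BT", "CBT", "FT", "dyn", "BT", "CBT"]
coder_2 = ["BT", "CBT", "FT", "BT", "CBT", "CBT", "FT", "dyn", "BT", "CBT"]

agreement = percent_agreement(coder_1, coder_2)
print(f"Agreement: {agreement:.0f}%")
```

Percentage agreement does not correct for chance agreement; a chance-corrected index such as Cohen's kappa would be an alternative, but the text reports percentage agreement only.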

Procedure and statistical analysis

Sixty-five studies were included, and in order to ensure independent ESs a single ES was calculated for changes in aggressive behaviours for each study. Non-independent ESs are problematic because they give more weight to studies with multiple ESs than to studies with only one ES when the data are merged, and because they violate the assumptions underlying estimation and testing of variance across studies [52]. Several of the studies included multiple interventions or several intervention modifications, and a pooled total effect was computed for these, weighted by the number of participants in each condition. Possible differences in treatment effects between these conditions were eliminated by the procedures adopted. The descriptive characteristics of the 33 studies involving an untreated control (waitlist) condition (design-1) are reported in Table 1 and the descriptive characteristics of the 32 studies with either a treated control or no control (design-2) are presented in Table 2. Ten of these studies involved a “treatment as usual” control condition, marked with * in Table 2, but the pre–post differences in the “treatment as usual” conditions were not included, in order to secure independence of the data when merging the studies. Data from each relevant comparison in each study were used to estimate the ESs. If a study reported several measures of aggressive and disruptive behaviours, a pooled ES of the measures, in which each individual effect was weighted by the number of participants, was calculated and reported in Table 1 or Table 2. Mothers were preferred to fathers as respondents, since mothers in general outnumbered fathers as respondents, and because many studies do not report father reports, which could cause difficulties in making comparisons. In design-1 studies the ESs were calculated using the following formula:

$$\mathrm{ES}_{1} = \frac{m_{\mathrm{I}} - m_{\mathrm{C}}}{SD_{\mathrm{(pooled)}}}$$
Table 1 Study characteristics: Studies involving an untreated control (ES1)
Table 2 Study characteristics: studies involving no untreated control (ES2)

ES1 was calculated as the difference between the mean changes in the treatment intervention condition(s) (m_I) and the untreated control condition (m_C), divided by the pooled standard deviation of the pre-test scores for the two conditions (SD_(pooled)). For design-2 studies, within-group effect sizes were calculated using the following formula suggested by Becker [6]:

$$\mathrm{ES}_{2} = \frac{m_{t1} - m_{t2}}{SD_{t1}}$$

ES2 was calculated by subtracting the mean score at time 2 (m_t2) from the mean score at time 1 (m_t1) and dividing the difference by the standard deviation of the pre-test score (SD_t1).
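The two formulas, together with the participant-weighted pooling of several ESs within one study, can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of the comprehensive meta-analysis program: change is taken as pre-test minus post-test (so that a positive ES means a reduction), the pooled standard deviation uses the common degrees-of-freedom-weighted formula, and all numbers are hypothetical.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled SD of two groups (assumed df-weighted formula)."""
    return math.sqrt(((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2))


def es1(change_i, change_c, sd_pre_i, n_i, sd_pre_c, n_c):
    """Between-group ES: (m_I - m_C) / SD_(pooled), using pre-test SDs."""
    return (change_i - change_c) / pooled_sd(sd_pre_i, n_i, sd_pre_c, n_c)


def es2(mean_t1, mean_t2, sd_t1):
    """Within-group ES: (m_t1 - m_t2) / SD_t1."""
    return (mean_t1 - mean_t2) / sd_t1


def participant_weighted_mean(effects, ns):
    """Pool several ESs from one study, weighting each by its number of participants."""
    return sum(e * n for e, n in zip(effects, ns)) / sum(ns)


# Hypothetical study: the treated group drops from 70 to 62, the waitlist from 70 to 68.
between = es1(change_i=8.0, change_c=2.0, sd_pre_i=10.0, n_i=20, sd_pre_c=10.0, n_c=20)
within = es2(mean_t1=70.0, mean_t2=62.0, sd_t1=10.0)
pooled = participant_weighted_mean([0.5, 0.9], [30, 10])
print(between, within, pooled)
```

In this hypothetical case the within-group ES exceeds the between-group ES for the same treated group, because the change observed in the control condition is not subtracted.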

The pre-test standard deviation was chosen as denominator because it has not been influenced by the experimental manipulations (i.e., differential treatment effects) and is therefore more likely to be consistent across studies, permitting an estimate of treatment effects in studies without control groups [6]. ESs were calculated from means and standard deviations when these measures were available. If not, the most relevant information regarding change in oppositional and aggressive behaviours was applied, such as t tests, F tests, or P values. All ESs were calculated using the comprehensive meta-analysis program [11]. Each ES was weighted by the inverse of its variance (ω), in order to give more weight to studies with larger sample sizes. The statistical significance of each ES was estimated. A positive ES indicated a reduction in aggressive and oppositional behaviour from pre- to posttreatment or a preferable treatment result. According to Cohen’s descriptions, an ES of d = 0.2 denotes a “small” effect, a value of d = 0.5 denotes a “medium” effect, and a value of d = 0.8 denotes a “large” effect [14].
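As a minimal sketch of the weighting and significance testing described above, the following computes the inverse-variance weight, a z test and a 95% confidence interval for a single ES, and attaches Cohen's verbal label. The variance value is hypothetical, and binning values between Cohen's three benchmarks is a convention assumed here, not something stated in the text.

```python
import math


def summarize_effect(es, variance, z_crit=1.96):
    """Inverse-variance weight, z statistic and 95% CI for one effect size."""
    weight = 1.0 / variance          # omega: larger samples -> smaller variance -> more weight
    se = math.sqrt(variance)
    z = es / se
    ci = (es - z_crit * se, es + z_crit * se)
    return {"weight": weight, "z": z, "ci": ci, "significant": abs(z) > z_crit}


def cohen_label(d):
    """Verbal label based on Cohen's benchmarks (0.2, 0.5, 0.8)."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"


# Hypothetical ES of 0.62 with a sampling variance of 0.04.
summary = summarize_effect(0.62, 0.04)
print(summary, cohen_label(0.62))
```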

Within meta-analyses there is a distinction between fixed effects models and random effects models (see e.g. 29, 30). Fixed effects models make the assumption that the population effect size is constant, and unless this assumption is met, the analyses will have inflated the Type 1 error and will report overly narrow confidence intervals [35]. The random effects model was used in this investigation, as it is more likely that there is true variation in the population parameters, and the random effects model is more appropriate under these assumptions. Another argument supporting the use of a random effects model was the relatively small number of studies in this area, which will result in low statistical power for the chi-square test used to test variation between studies [49]. The test may, under these circumstances, fail to reject the homogeneity hypothesis even with substantial differences between studies.
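The difference between the two models can be made concrete with a short sketch. The text does not name the between-study variance estimator that was used, so the DerSimonian-Laird method-of-moments estimator below is an assumption chosen for illustration, and the input values are hypothetical.

```python
import math


def random_effects_mean(effects, variances):
    """Random-effects pooled ES using the DerSimonian-Laird estimate of tau^2 (assumed)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed_mean = sum(wi * e for wi, e in zip(w, effects)) / sum_w

    # Q: the chi-square homogeneity statistic with k - 1 degrees of freedom.
    q = sum(wi * (e - fixed_mean) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add the between-study variance to each study's variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(ws * e for ws, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return {"mean": mean, "se": se, "q": q, "df": df, "tau2": tau2}


# Two hypothetical studies with equal precision but very different effects.
result = random_effects_mean([0.2, 1.0], [0.01, 0.01])
print(result)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effects weights and the two models give the same pooled ES; otherwise the random-effects confidence interval is wider.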

The analyses of potential moderator variables were calculated using SPSS. A weighted, inverse variance, correlation analysis of the continuous variables and a weighted analysis of the discrete variables was conducted in order to assess the relationship between the ESs and the moderator variables in accordance with recommendations by Lipsey and Wilson [49]. The value of ω was used in these analyses as weight.
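For a continuous moderator, an inverse-variance weighted correlation can be sketched as below. The analyses in the text were run in SPSS; this standalone function only illustrates the weighted Pearson correlation itself and is not a reproduction of that procedure. The ages, ESs and weights are hypothetical.

```python
import math


def weighted_correlation(x, y, w):
    """Weighted Pearson correlation between moderator x and effect size y, with weights w."""
    sum_w = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sum_w
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    cov = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    var_x = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    var_y = sum(wi * (yi - mean_y) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(var_x * var_y)


# Hypothetical studies: mean age, ES, and inverse-variance weight (omega).
mean_age = [4.0, 8.0, 12.0]
effect = [1.0, 0.8, 0.6]
omega = [10.0, 20.0, 5.0]

r = weighted_correlation(mean_age, effect, omega)
print(round(r, 3))
```

A negative value here would correspond to smaller ESs in studies with older children.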

Results

Sample characteristics

A total of 33 studies applied design-1. Sample characteristics for the individual studies included in the meta-analysis are shown in Table 1. All the ESs (100%) were positive in direction, indicating an improvement after treatment, and 21 (63.6%) reported significant results (P < 0.05). A total of 2,512 individual participants were included, with mean age ranging from 4 to 13.5 years.

The sample characteristics for the individual studies using design-2 are presented in Table 2. Of the 32 studies being design-2 studies, all the ESs (100%) were positive in direction, indicating an improvement after treatment, and 24 (75%) reported significant results (P < 0.05). A total of 2,459 individual participants were included, with mean age ranging from 4 years to 16.

Reductions in aggressive behaviours

The overall mean weighted ES in design-1 studies was 0.62, indicating a moderate treatment effect, while the overall weighted ES in design-2 studies was 0.95, indicating a large treatment effect. Both these ESs are significantly different from 0. Whether or not this indicates larger treatment effects for design-2 studies as compared to design-1 studies is unresolved, since design-1 studies represent a more stringent calculation of treatment effects than the calculation of ESs in design-2 studies. Table 3 presents the overall weighted mean ES for design-1 studies and design-2 studies, the corresponding confidence intervals and tests of heterogeneity. All the analyses were based on a random-effects model.

Table 3 Meta-analysis results for overall between (design-1, ES1) and within (design-2, ES2) group effect sizes and the corresponding tests of heterogeneity

Moderator variables

The moderator analysis was computed separately for the two designs of calculating ESs. For design-1 studies, one variable was significant at the 0.05 level. Studies with smaller sample size did result in larger effect sizes compared to studies with larger sample sizes. For design-2 studies, three variables were significant at the 0.05 level. Studies with younger children resulted in larger ESs than studies with older children, studies applying a BT intervention resulted in significantly larger ESs as compared to studies applying FT interventions, and studies providing diagnostic information, in all 15 studies, did result in larger ESs than the 17 studies not presenting this information. Table 4 presents the weighted ESs of the variables included in the moderator analysis separately for both design-1 and design-2 studies.

Table 4 Mean effect sizes for variables in the moderator analysis

Studies involving more boys tended to yield larger ESs for both methods of calculating ESs, although this was not significant. Further, there was a tendency for more recent studies to result in smaller ESs as compared to older studies. None of the other variables in the moderator analysis indicated any interpretable trends. Still, it is important to note that some of the variable categories contain fewer studies while other variable categories contain more numerous studies. As a consequence of this imbalance, the differences between the categories need to be large in order to result in significant differences.

Additional treatment gains

A total of 27 studies reported changes in aggressive behaviour in day care or in school, 20 design-1 studies and 7 design-2 studies. From these studies, 32 independent ESs were calculated. The overall mean design-1 ES was 0.41 and the overall mean design-2 ES was 0.63, both these effects were significantly different from 0. Changes in teacher-reported aggression were significantly correlated with parent-reported changes in aggression (r = 0.65, P < 0.01). In the 23 studies reporting a change in social functioning, 13 design-1 studies and 10 design-2 studies, the overall mean design-1 ES was 0.42 and the overall mean design-2 ES was 0.49. Both these ESs were significantly different from 0. Changes in social functioning correlated significantly with changes in aggression (r = 0.43, P < 0.05). A total of 33 studies reported changes in parental distress, 16 studies of design-1 studies and 17 studies of design-2 studies. The overall mean weighted design-1 ES for parental distress was 0.39 and in design-2 studies the mean ES was 0.47, both effects were significantly different from 0. Changes in parental distress correlated significantly with changes in aggression (r = 0.58, P < 0.01). Table 5 presents the overall weighted mean ESs in teacher reported reductions, improved social functioning, and reductions in parental distress and the corresponding confidence intervals.

Table 5 Generalization of treatment effects, changes in social skills, and change in parental distress

Discussion

The overall mean ESs were 0.62 in design-1 studies (untreated control group) and 0.95 in design-2 studies (pre-test–post-test), which indicates moderate and large reductions in oppositional and aggressive behaviours, respectively. Consequently, a number of psychosocial interventions are effective in reducing disruptive behaviours. Nevertheless, the tests of homogeneity were significant for both ways of calculating the ESs, implying that the effectiveness of interventions varies across studies. The control-group design has higher internal validity than a pre-test–post-test design, where factors such as test-retest effects, history and maturation may contribute to the observed difference between pre-test and post-test scores in addition to the treatment effects. These threats to internal validity may explain why the mean estimated effect size is larger for a pre-test–post-test design than for a control-group design [73]. Further, the differences between the two methods of calculating the ESs could at least partially be explained by differences in the two designs’ capacity to detect regression toward the mean. Within-group comparisons of control conditions indicate that control groups experience changes from pre- to posttreatment equalling small ESs [92], which in turn implies smaller ESs in control-group designs than in pre-test–post-test designs. The overall mean ESs in this meta-analysis were congruent with the previous meta-analyses [7, 72, 80, 93, 94].

The reductions in teacher-reported aggression, improved social functioning, and reductions in parental distress were all moderate in size. The reductions in disruptive behaviours as reported by the mothers were correlated with reductions in disruptive behaviours at school, improved social functioning, and reduced parental distress, although these correlations were moderate in size. It appears that there is still a need to further develop interventions that will lead to progress in children’s and parents’ functioning besides reductions in disruptive behaviours. Nevertheless, it should be noted that not all children referred for aggressive behaviours display aggressive behaviours in day-care or in school or have reduced social functioning, and that not all parents experience parental distress. As a consequence, the potential for improvement on these variables is smaller, and as such the obtained ESs are promising. The fact that improvements in school behaviour, social functioning, and parental distress correlate significantly with reductions in disruptive behaviours indicates the relevance and importance of these domains in understanding treatment progress in aggressive children.

The findings in the moderator analysis and implications for research

The variable “sample size” was significant in the moderator analysis of design-1 studies. Studies involving smaller samples obtained larger ESs than studies with larger samples, a finding corresponding to that of Serketich and Dumas in their meta-analysis [72]. One possible explanation for this finding is that offering treatment to fewer participants allows for more intensive treatment efforts. Whether this is the case is unknown. Another possible explanation is that small samples are more vulnerable to outliers in both conditions, that is, participants in active treatment experiencing relatively large improvements and/or participants in the waiting condition experiencing deterioration. A final explanation is publication bias, which could imply that studies with small samples and possibly non-significant effects are less likely to be published than studies with large samples but small treatment effects. If this is true, important information for practitioners and researchers regarding ineffective interventions may be concealed.

The variables “mean age”, “treatment”, and “diagnosis” were significant in the moderator analysis of design-2 studies. The finding concerning “mean age” indicated that studies involving younger children do obtain larger ESs compared to studies involving older children and adolescents. The effect of maturation may be stronger for younger compared to older children possibly explaining the larger effect sizes for pre-test–post-test design. Findings from the earlier meta-analyses indicate that older children and adolescents typically benefit more from CBT interventions [7, 55, 80], while younger children tend to benefit more from BT interventions [55].

The variable “treatment” indicated that BT interventions obtained significantly larger ESs than FT interventions. This effect may well relate to the same effect seen on the variable “mean age”. In general we found that BT interventions, most often various parent training interventions, seem to be the treatment of choice for children, while FT seems to be the treatment of choice for adolescents. The mean age of the participants in the studies applying BT was 6.3 years, the mean age of the participants in studies applying CBT interventions was 11.0 years, and the mean age of the participants in studies applying FT interventions was 14.8 years. Consequently, it seems that these interventions target children at different developmental stages. It is further important to note that the ESs in studies applying FT interventions for adolescents vary across studies.

BT interventions, sometimes in combination with CBT, are empirically evaluated more often than other interventions. We were able to identify ten studies using FT (of which one study actually was a combination of FT and BT; see [68]) as the main therapeutic intervention, which seems a rather small figure, especially given the relevance of these interventions for older adolescents and the promising treatment effects obtained by several of the included studies. As a consequence, more studies of the treatment effects of FT are needed.

The variable “diagnosis” indicated that studies reporting diagnostic information obtained larger ESs than studies not reporting this information. One possible explanation is that studies providing this information are of better methodological quality, including better-developed treatments, thereby resulting in larger ESs. Nor can we rule out the possibility that this effect is due to some sort of “researcher effect”. Some researchers do routinely present diagnostic information (such as Kazdin and Eyberg), and they have been involved in several studies with relatively younger children obtaining fairly large effects. It is also possible that the children who meet diagnostic criteria have higher levels of aggressive behaviours at pre-treatment, resulting in a larger potential for reducing aggressive and oppositional behaviours due to treatment. Nevertheless, it was somewhat surprising to find that less than half of the studies provided information on the diagnostic status of the children according to either the DSM or ICD classification systems. This gives cause for concern regarding assessment practice in research on psychosocial interventions for DBD, particularly in view of the relevance of diagnostic information for communication purposes [83].

Still, there is a need to further develop effective treatment interventions capable of leading to reductions in children’s disruptive behaviours, especially interventions for adolescents. Interventions may also need to improve the effects on additional problems among children and their families, as is suggested by our findings on additional treatment effects. It is noteworthy that an intervention focusing on parent stress enhanced the therapeutic changes in children as well as reducing parental stress [42]. Interventions that focus on social functioning in older children have additional effects on social functioning and behaviour at school [39]. The findings of an intervention that focused on social skills in younger children are somewhat mixed [87, 89, 90], while an intervention focusing on classroom management of younger children did improve the generalizing effects of treatment [90]. Likewise, adding contingency management to multi-systemic therapy (MST) did increase the treatment effectiveness for juvenile offenders [31]. Although these tendencies are only reported in sub-samples of participants in single studies, somewhat lowering the statistical power of the analysis, the findings could indicate the possibility of improving functioning on multiple domains when specific problems are addressed in treatment. In the moderator analysis there was a non-significant tendency toward smaller treatment effects in more recent studies, for both design-1 and design-2 studies. This finding is somewhat surprising, since the development of new interventions would be expected to increase treatment effectiveness rather than reduce it.

Although several of the included replication studies had some elements of effectiveness trials, that is clinically referred groups under representative clinical conditions, this was not the case for all replication studies. Still, it is promising that independent replication studies did not result in significantly smaller ESs, although the weighted means in the replication studies in general were smaller compared to the model programs. We were able to include 13 studies claiming to be replication studies. This is in itself not a low figure, but then again, eight of these studies were partial or full replications of Webster-Stratton’s treatment program “The Incredible Years” [2, 15, 27, 28, 59, 71, 78, 81]. We identified one study being a replication of Eyberg’s “Parent–Child Interaction Therapy” [60], one study was a replication of Henggeler’s treatment program “MST” [61], two studies were replications of Lochmann’s model program “Anger Control Training” [8, 58], and one study was a replication of a modified version of Turecki’s psychoeducational treatment [76]. Besides Webster-Stratton’s model program “The Incredible Years”, the need for more studies that replicate various model programs is apparent.

Twenty-two studies did not include any control condition. Due to the reductions in problem behaviour equalling a small ES in within-group comparison of control conditions [92], as mentioned earlier, there is a need to include a control condition in all primary research studies of treatment effects to increase internal validity.

Limitations

Our study has several limitations. First, the categorizations of the treatment modes are somewhat coarse. Although several researchers, for example, reported that the treatment involved was family therapeutic or cognitive-behaviourally oriented, many of the studies included in this analysis adopted different therapeutic approaches from various therapeutic traditions. Further, several of the studies applied various interventions on subsamples, making our categorization somewhat more biased still. Consequently, caution is important when considering mode of treatment and treatment effects. Secondly, it is unfortunate that we were unable to control for effects of treatment dosages in the studies involved in our meta-analysis. As noted by Jensen and his colleagues [37], many studies do not report the treatment dosages, which was true of many of the studies included in this meta-analysis as well. The application of treatment dosage as a potential moderating factor was therefore omitted. Finally, the statistical power of most of the moderator analyses was low due to a small number of studies, making it difficult to detect real moderator effects. This is especially true for some of the categories concerning “mode of treatment”. Caution is consequently essential when interpreting the results of the moderator analysis. Since few studies were included in this analysis, a weighted regression analysis of the moderator variables was omitted due to low statistical power.

Conclusion and clinical implications

Several of the programs, with different therapeutic approaches described here, resulted in significant progress in reducing aggressive and disruptive behaviours, and these improvements could produce important additional treatment effects. Still, there is a need to further develop effective outpatient interventions for children being disruptive, and especially for adolescents. The results of the moderator analysis suggest that empirically supported treatments can be implemented in regular clinical settings, since several of the replication studies of model programs do lead to important changes in children’s behaviour. Nevertheless, it is by no means a straightforward decision to decide which programs to implement due to the heterogeneity of the ESs obtained, both in reducing aggressive and oppositional behaviours and in additional treatment gains. Consequently, caution and careful consideration regarding treatment effects as well as possible additional effects due to treatment are essential.