Introduction

Child behavior disorders (e.g., conduct disorder, attention deficit/hyperactivity disorder) place a significant socioeconomic burden on public health in developed countries, given that between 15 and 20 % of children suffer from these types of disorders during their development (Belfer 2008; Huss et al. 2008). Various risk factors for the development of psychopathology in children have been empirically identified (Capaldi et al. 2002; DeGarmo et al. 2004; Zahn-Waxler et al. 2008), clearly indicating targets for preventive action. Among the efficacious prevention strategies for child emotional and behavioral problems are behavioral family interventions that explicitly target parenting practices, one well-established risk factor for child externalizing problem behavior. Examples include the Incredible Years Program (Webster-Stratton and Hancock 1998), Parent Management Training (Patterson 2005), and the Positive Parenting Program Triple P (Sanders 2012). These programs draw on social learning models of parent–child interactions and aim to change children's behavior by modifying family environments. Several meta-analyses on parent training have been published (e.g., Nowak and Heinrichs 2008, on Triple P; Lundahl et al. 2006, on various programs; Thomas and Zimmer-Gembeck 2007, on Triple P and Parent Child Interaction Therapy), indicating general efficacy for this approach. Kaminski et al. (2008) reported an average effect size of 0.34 (95 % confidence interval [0.29; 0.39]) collapsed over 77 studies from different theoretical backgrounds.

Triple P emphasizes the importance of comprehensive dissemination efforts and exemplifies a public health approach with a tiered multi-level intervention model (Sanders 2012). The program consists of five intervention levels of varying degrees of intensity, with Level 1 (information about parenting via various channels, such as radio or TV spots) reflecting the lowest and Level 5 (Behavioral Family Therapy) the highest intervention dose. All levels aim to prevent behavioral, developmental, and emotional problems in children. The program promotes (1) the enhancement of skills, knowledge, confidence, and resourcefulness of parents; (2) the development of more nurturing, safe, engaging, and nonviolent environments for children; and (3) the support of children’s social, emotional, linguistic, intellectual, and behavioral competencies. There is a large body of evidence supporting the efficacy and effectiveness of the Triple P Program, including randomized controlled trials of Level 3, 4, and 5 interventions, as well as trials investigating the delivery of all five levels simultaneously in a community (e.g., Prinz et al. 2009).

In a meta-analysis of the Triple P evidence base (including studies available up to publication year 2007), we concluded that Triple P is related to positive changes in parenting skills, child problem behavior, and parental well-being in the small to moderate range with effects varying as a function of the intensity of the intervention (the Triple P levels), the informant (parental report versus observation), the initial level of problem behavior in children, and the setting (individual, group, or self-help delivery format; Nowak and Heinrichs 2008).

In German-speaking countries, Triple P has been researched in a few trials. For example, in a study by Bodenmann et al. (2008), Triple P (Level 4 group format) was compared with a relationship enhancement program and a non-intervention group in 150 couples randomly assigned to one of the three conditions. Families attending Triple P showed a better long-term benefit (i.e., less dysfunctional parenting and higher parental sense of competence in mothers) at the 1-year follow-up assessment compared with the other two groups. Child behavior problems were also significantly lower in the Triple P intervention compared with the no-intervention group. In another study, Heinrichs and Jensen-Doss (2010) examined the influence of incentives on families’ outcomes in Triple P. Triple P was offered in an individual or group format (both Level 4) to 248 families, and 197 were randomized to incentives (being paid or not paid for participation in the intervention). Families showed significant improvements in parenting and child behavior problems across 2 years, with effect sizes varying from small to large depending on the delivery format and the incentive condition. Finally, a variant of the core Triple P program for families of children with a disability, Stepping Stones Triple P, was investigated in Germany (Hampel et al. 2010, b). In an analysis of 118 families aggregated from two differently designed studies (an efficacy study with randomization and an effectiveness study without randomization), results demonstrated significant reductions in dysfunctional parenting, parental distress, and child problem behavior.

In all of these studies, participants were either parents whose children already displayed some behavioral difficulties (evidenced by more than 50 % of the sample displaying clinical behavior problems) or parents who were selectively targeted because of a risk factor (such as a disability or living in a socially disadvantaged neighborhood). In fact, many studies investigating the efficacy of parent training selectively target families with specific risk factors. However, selectively targeting only families with specific risk factors requires identification of these families, which increases the risk of stigmatization. Furthermore, a selective or indicated approach may recruit more families from lower socioeconomic areas (because socioeconomic status is a risk factor at both the family and the neighborhood level) but might neglect to appeal to middle- and high-income families, who represent the majority of the population in developed countries and who may also have children at risk for problem behaviors (Bayer et al. 2007). Finally, there are practical difficulties in effectively choosing target families (e.g., the time span between the identification procedure and intervention delivery is sometimes large and accompanied by changes in child functioning, so that a family may no longer be a target for the intervention, or families not initially identified may have become appropriate targets).

Undertaking a universal preventive effort addressing all families avoids these pitfalls. However, the long-term effects of universal prevention programs targeting dysfunctional parenting in parents with young children have rarely been investigated. There is some longitudinal research on universal parenting and family programs introduced to families with children in middle childhood, for example, with the Iowa Strengthening Families Program and the Preparing for the Drug-Free Years program. These programs have been researched for up to 6 years post-intervention (Spoth et al. 2004). The vast majority of randomized controlled trials investigating parenting programs in parents of young children, however, included a wait list control group, which received the parent training after the waiting time and was therefore not available for long-term follow-up. Investigating the long-term effects of a parenting program introduced on a universal basis to families with young children might be helpful because children of that age change considerably over time. There are a number of developmental transitions ahead for such children (e.g., the transition to school), and it is unclear whether potential effects from parenting courses are maintained over the long term. It is important to establish not only when an intervention produces change but also when it stops being beneficial. This could, for example, help identify useful points in the developmental pathway for parenting booster sessions.

Finally, the only other study investigating the effects of a universally administered parent training for families with preschoolers in Germany has employed a modified version of Parent Management Training (EFFEKT, Lösel et al. 2006; Stemmler et al. 2007). The intervention comprised five sessions of 1.5 to 2 h each, and 163 mothers and 48 fathers participated in the parent component of this project. Results on child behavior problems (Lösel et al. 2006) demonstrated significant reductions only in families who initially reported elevated child behavior problems prior to intervention. In a separate report on the same study, Stemmler et al. (2007) focused on dysfunctional parenting in a subsample of 128 mothers and 16 fathers who had completed the 1-year follow-up after the intervention. They reported significantly stronger reductions in inconsistent discipline and a stronger increase in positive parenting in parents who participated in the parent training compared with matched control families who did not participate in the intervention. Effect sizes were in the small range (ES < 0.30) and decreased somewhat for positive parenting strategies 1 year later (Stemmler et al. 2007). However, in this study, a significant number of children whose parents attended the parent training also received child social skills training.

The aim of the present study was to investigate the effects of the Triple P parent group training on parenting and child behavior problems, monitoring both intervention and control group participants in annual assessments over a follow-up period of 4 years. We previously reported results of the 1-year follow-up (Heinrichs et al. 2006), 2-year follow-up (Hahlweg et al. 2010), and 3-year follow-up (Heinrichs et al. 2009) using repeated-measures analyses of variance on an individual level, although families were recruited via preschools and randomization occurred on the preschool level. Therefore, one disadvantage of previous outcome reports was the lack of analyses at the level of randomization. Furthermore, we previously dealt with two- versus single-parent households as separate conditions, even though we attempted to explore the universal efficacy of Triple P. Thus, the present report has two aims: (1) to report on potential universal effects of participating in a parenting program 4 years after the initial training offer in preschools, reflecting the longest follow-up of a universal prevention program targeting parents only, and (2) to appropriately analyze the data with hierarchical linear models to account for randomizing preschools instead of families.

Method

Sample and Procedure

The current study used stratified randomization to assign preschools to conditions. The census bureau of the city provided the social structure index, which served as the stratifying variable. Thus, preschools were randomized to treatment, stratified on social structure (and hence, treatments were balanced on social structure by design). Preschools in the city primarily educate children between the ages of 2 and 6 years, and the level of staff training is similar across preschools. The staff-to-child ratio is approximately 1:15.
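To make the logic of stratified randomization at the cluster level concrete, the following minimal Python sketch assigns whole preschools to conditions within each stratum of the social structure index. It is an illustration only, not the procedure used in the study: the preschool identifiers, the number of preschools per stratum, and the per-stratum allocation are hypothetical and were chosen solely so that the totals match the 11 intervention and 6 control preschools reported below.

```python
import random
from collections import defaultdict


def stratified_cluster_randomization(clusters, n_intervention, seed=0):
    """Assign whole clusters (e.g., preschools) to arms within each stratum.

    clusters: list of (cluster_id, stratum) pairs
    n_intervention: dict mapping stratum -> number of clusters that
        receive the intervention in that stratum (hypothetical values here)
    """
    rng = random.Random(seed)

    # Group cluster identifiers by their stratum.
    by_stratum = defaultdict(list)
    for cluster_id, stratum in clusters:
        by_stratum[stratum].append(cluster_id)

    # Shuffle within each stratum and allocate the first k to the intervention.
    assignment = {}
    for stratum, ids in by_stratum.items():
        shuffled = ids[:]
        rng.shuffle(shuffled)
        k = n_intervention[stratum]
        for position, cluster_id in enumerate(shuffled):
            assignment[cluster_id] = "intervention" if position < k else "control"
    return assignment


# Hypothetical strata: the true distribution of preschools over strata is not reported.
strata = ["low"] * 6 + ["moderate"] * 6 + ["high"] * 5
preschools = [(f"P{i:02d}", s) for i, s in enumerate(strata, start=1)]
n_intervention = {"low": 4, "moderate": 4, "high": 3}  # assumed 2:1-like allocation

assignment = stratified_cluster_randomization(preschools, n_intervention, seed=2024)
for cluster_id, arm in sorted(assignment.items()):
    print(cluster_id, arm)
```

Because allocation happens separately within each stratum, both arms contain preschools from every level of the social structure index, which is what balances the conditions on this variable by design.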

To be eligible for participation, families had to have a child aged between 2.6 and 6 years attending one of the preschools in Braunschweig, Germany. Braunschweig is a moderately sized city with a slight urban background. Siblings of children already enrolled in the study and families with a migration background who exhibited significant problems in communicating in German were excluded. A total of N = 280 families from 17 preschools/kindergartens enrolled in the study (31 % of 915 targeted families; Heinrichs et al. 2005). The social structure index of the preschool’s neighborhood (low, moderate, or high number of social problems in the respective neighborhood) was inversely related to participation: More families from preschools located in neighborhoods with fewer social problems participated in this project. Therefore, it is not surprising that the comparison of the recruited sample with the target population revealed a smaller proportion of families from lower socioeconomic backgrounds among participants. Randomization to the parenting program occurred on the level of preschools, i.e., when families belonged to a Triple P preschool (11 preschools), they were offered the program. If the preschool belonged to the control group (six preschools), families were offered participation in the study with subsequent repeated assessments to observe the child’s development.

It was estimated in advance that only about 50 % of parents would accept the Triple P offer; therefore, about twice as many preschools were recruited for the intervention condition (n = 11 preschools, 186 parents) as for the control condition (n = 6 preschools, 94 parents). After randomization to the intervention group, only 23 % of parents (compared with the anticipated 50 %) declined the offer of participating in a parent training (n = 42), leading to more participating families in the intervention group than in the control group (see Fig. 1). These decliners are therefore non-exposed participants, and they differed on several variables from those accepting the offer. Most markedly, decliners reported fewer child behavior problems pre-intervention (for more details, see Heinrichs et al. 2005).

Fig. 1 Flow of participants from recruitment to follow-up after 4 years

Apart from a higher proportion of single-mother families in the control group (34.0 % versus 15.6 % in the intervention group; χ²(1) = 12.5; p < 0.001), no significant differences between study groups on sociodemographic variables at baseline were found (such as education or immigration background). There were n = 144 boys (51 %) in the total sample. Child mean age was 4.5 years (SD = 1.0). Two-parent families represented 78 % (n = 219) of the sample; 22 % (n = 61) were single-parent families (in all but one case the child lived with the mother). Maternal age was on average 35 years (SD = 5.0); fathers had a mean age of 38 years (SD = 6.1). The large majority of families were German; 11 % had a migration background (among which families with a Turkish background were the most prominent group).
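As a plausibility check on the reported group difference in single parenthood, the test statistic can be recomputed from a 2 × 2 table. The sketch below is illustrative only: the cell counts are not reported directly and are reconstructed here from the stated percentages and group sizes (34.0 % of 94 control families and 15.6 % of 186 intervention families, i.e., roughly 32 and 29 single-parent families, which sum to the reported n = 61). The function name is hypothetical, and the use of an uncorrected Pearson chi-square is an assumption.

```python
def chi_square_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Counts reconstructed from the reported percentages (an assumption, see above).
control_single, control_total = 32, 94
intervention_single, intervention_total = 29, 186

chi2 = chi_square_2x2(
    control_single,
    control_total - control_single,
    intervention_single,
    intervention_total - intervention_single,
)
print(round(chi2, 1))
```

Under these reconstructed counts, the statistic rounds to the reported value of 12.5 with one degree of freedom.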

Assessments were conducted at six time points: initial evaluation (pre), immediately after the intervention (post), and annual follow-ups from 1 to 4 years after post assessment (FU1 to FU4). Self-report measures were sent to families and collected during a home visit. For children living with both parents, mothers and fathers were requested to independently complete questionnaires. Two hundred nineteen mothers provided data for at least one assessment point. In comparison, 201 fathers (corresponding to a paternal participation rate of 91.4 %) provided data for at least one assessment point. The control group was maintained throughout the study period and did not receive this intervention. However, a number of families in the control group developed problems in the 4-year follow-up period and sought professional help. Families received a reimbursement of € 50 (approx. $72USD) for pre- and 1-year follow-up assessments which included behavioral observation and € 20 (approx. $29USD) for all other evaluations. The Human Subjects Protection Board of the German Association of Psychology approved all procedures.

Measures

Parenting Scale

The German version of the Parenting Scale (PS) (Naumann et al. 2010; based on Arnold et al. 1993) was administered to assess parenting skills. The PS is a 30-item questionnaire that measures dysfunctional discipline styles in parents. It yields a total score based on three factors: Laxness (permissive discipline), Over-reactivity (authoritarian discipline, displays of anger, meanness, and irritability), and Verbosity (overly long reprimands or reliance on talking). The total score has adequate internal consistency (alpha = 0.84), good test–retest reliability (r = 0.84), and reliably discriminates between parents of clinic and non-clinic children. Internal consistencies of the German version in the present sample were α = 0.81 (mothers), and α = 0.75 (fathers). The German version of the PS demonstrated a similar factor structure compared with the original version. Each subscale as well as the total score correlated with positive parenting (negatively) and with child behavior problems (positively), as expected (Naumann et al. 2010).

Positive Parenting Questionnaire

The 13-item Positive Parenting Questionnaire (PPQ) was adapted from several existing questionnaires (e.g., Strayhorn and Weidman 1988) and assesses positive and encouraging parental behaviors (e.g., “I cuddle with my child”). Parents rate the frequency of their behavior during the most recent 2 months. Answer categories range from 0 = never to 3 = very often. Cronbach’s α values were 0.85 for mothers and 0.87 for fathers in the present sample. Scores were significantly negatively correlated with the PS and positively correlated with parental self-efficacy.

Child Behavior Checklist (CBCL 1 1/2–5 and CBCL 4–18)

The German versions of the Child Behavior Checklist (CBCL) for children aged 1 1/2–5 years and 4–18 years (Arbeitsgruppe Deutsche Child Behavior Checklist 1998; 2000) ask parents to rate the presence and frequency of child problem behaviors and emotional disturbances on 100 and 113 items, respectively. Two global dimensions, internalizing (α = 0.86 for mothers and 0.89 for fathers in the present sample) and externalizing problem behavior (α = 0.90 for mothers and 0.92 for fathers in the present sample), are combined into a total score (α = 0.94 for mothers and 0.96 for fathers in the present sample). The raw scores of the two age-dependent versions of the CBCL cannot be directly compared because the versions differ in the number of items and in the assignment of particular behaviors to the internalizing and externalizing scales. Thus, in accordance with the author (T. Achenbach, personal communication, March 2008), both versions’ scores were transformed into standardized Z scores. The prevalence of clinically significant internalizing problems was 9.8 %; for externalizing behavior, the prevalence was 6.2 %. Within the borderline range, the rate was 8.3 % for internalizing as well as externalizing problems. The CBCL was validated by the German Task Force on the CBCL system, and the original factor structure was replicated in a German field study (Döpfner et al. 1995).
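The standardization step can be illustrated with a short Python sketch. It assumes that raw scores were standardized separately within each CBCL version using the mean and sample standard deviation of the respective scores; the reference distribution actually used is not stated here, and the raw scores shown are invented for illustration.

```python
from statistics import mean, stdev


def z_scores(raw_scores):
    """Standardize raw scores to mean 0 and SD 1 within one CBCL version."""
    m = mean(raw_scores)
    sd = stdev(raw_scores)  # sample SD; the choice of SD is an assumption
    return [(x - m) / sd for x in raw_scores]


# Hypothetical raw total scores, one list per age-dependent version.
raw_by_version = {
    "CBCL 1 1/2-5": [12, 25, 31, 18, 40, 22],
    "CBCL 4-18": [15, 28, 9, 34, 20, 26],
}

# Standardizing within each version puts both on a common metric,
# so that scores can be followed as children move from one form to the other.
z_by_version = {version: z_scores(scores) for version, scores in raw_by_version.items()}

for version, zs in z_by_version.items():
    print(version, [round(z, 2) for z in zs])
```

After this transformation, a value of 0 denotes the average of the respective version and a value of 1 denotes one standard deviation above it, regardless of how many items the version contains.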

Intervention

Triple P was implemented in four group sessions that lasted 2 h each (Level 4; Sanders 2012). During the training, parents are taught various parenting strategies, including the basics of positive parenting, the etiological and maintaining factors for child problem behavior (e.g., non-attendance to positive child behavior, inconsistent or inappropriate reinforcement), supportive strategies for the child’s development, and techniques to cope with problem behavior (e.g., family rules, contingent consequences, use of clear and calm messages). After the completion of group sessions, parents were offered four weekly individual telephone contacts (15–20 min) to discuss progress, questions, and difficulties with a Triple P facilitator. The Triple P model is based on a self-regulation framework, which promotes parental self-regulation at all times when teaching these strategies. Parental self-regulation includes self-sufficiency (e.g., parent defines a parenting goal), self-efficacy (e.g., parent learns to monitor own and child behavior), self-management (the parent self-evaluates), personal agency (the parent gives self feedback and is prompted accordingly during the training), and problem-solving (the parent applies skills to solve problems, not the trainer).

Five licensed trainers led a total of 28 groups. These were usually conducted at the participating preschool. Licensing requires a thorough study of preparatory material followed by a 3-day intensive training seminar to provide the skills to implement the Triple P Level 4 parent training. An elaborate training manual was used to secure intervention integrity. The adherence to the manual according to post-session trainer ratings was greater than 90 % for all four sessions. Supervision was provided during regular weekly staff meetings and included the discussion of difficult situations from group sessions, coaching, and conducting role plays with alternative trainer behavior.

At least three out of four sessions were attended by 114 mothers, and at least one session was attended by 144 mothers (with 42 declining participation completely). Telephone advice was sought at least once by 101 parents. Thirty-nine percent of participants used the telephone session four times, 13 % three times, and 12 % twice or once, respectively. Fathers showed a pronouncedly lower participation rate with 69 % attending no session at all and only 6 % participating in at least three sessions. Ninety-one percent of participating families were satisfied with the training; 86 % liked the atmosphere during the group sessions, and 94 % rated the program as helpful.

Statistical Analysis

Our primary research interest was to assess universal application of the intervention, and therefore analyses focus on the main effect for the overall sample. Outcome ratings from each parent were analyzed separately to investigate informant discrepancies (see De Los Reyes and Kazdin 2005). We used hierarchical linear modeling (HLM; Raudenbush and Bryk 2002) to account for the nested data structure, with longitudinal assessment data nested within subjects and subjects nested within preschools. This technique is more appropriate than the commonly used repeated-measures ANOVA; it allows for within-subjects and/or between-subjects heterogeneity (Keselman et al. 2001) and explicitly models the covariance structure of the data (O’Connell and McCoach 2004). HLM is especially valuable when data are incomplete (e.g., when only pre- and post-data are available for a family) because it derives Bayes estimates for missing time points (Keselman et al. 2001). Therefore, the analysis is based on the intent-to-treat sample of n = 93 families (including 31 single parents) assigned to the control group and n = 185 families (including 28 single parents) randomized to the Triple P intervention based on their respective preschool. The model specified each individual’s outcome scores over time at Level 1. Following Osgood and Smith (1995), the outcome was modeled as a function of a linear and a quadratic time trend. The dummy variable pre–post indicates whether the outcome level differs between pre-intervention and the period from post-intervention to follow-up after 4 years. Furthermore, in order to shed light on possible changes during the follow-up period, the dummy variable fu1-fu4 captures differences in score level after the post-intervention assessment.
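
The Level 1 predictors described above can be written out for the six assessment points. The following Python sketch is one possible coding, given for illustration: the dummy variables follow the verbal description (pre–post is 0 at pre and 1 from post onward; fu1-fu4 is 1 only at the follow-ups), whereas coding time as 0 to 5 in equal steps is an assumption, and the variable names are hypothetical. The coding actually used is documented in the Electronic Supplementary Material.

```python
ASSESSMENTS = ["pre", "post", "fu1", "fu2", "fu3", "fu4"]


def level1_predictors(assessment):
    """Return the Level 1 predictor values for one assessment point."""
    t = ASSESSMENTS.index(assessment)  # assumed time coding: 0, 1, ..., 5
    return {
        "time": t,                                          # linear trend
        "time_sq": t ** 2,                                  # quadratic trend
        "pre_post": 0 if assessment == "pre" else 1,        # level shift after pre
        "fu1_fu4": 1 if assessment.startswith("fu") else 0,  # level shift after post
    }


design = [level1_predictors(a) for a in ASSESSMENTS]
for assessment, row in zip(ASSESSMENTS, design):
    print(assessment, row)
```

With this coding, the coefficient of pre_post reflects the change in level from pre to post beyond the smooth time trend, and the coefficient of fu1_fu4 reflects any additional shift in level during the follow-up period.
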
At Level 2, the pre-test level (β0i) as well as post-treatment and follow-up change (β3i and β4i, respectively) were modeled as a function of group assignment (0 = control, 1 = intervention group), child gender (0 = boy, 1 = girl), their interaction with each other, and family status (0 = two-parent household, 1 = single-parent household). The linear and quadratic time effects (β1i, β2i) were treated as non-random, mainly to avoid a reduction in power as a consequence of many random effects in the model (Osgood and Smith 1995). Since these time trends are not of primary interest, they essentially serve as statistical controls. At Level 3, the effects of group assignment (γ01k, γ31k, and γ41k), child gender (γ02k, γ32k, and γ42k), and family status (γ04k, γ34k, and γ44k) were modeled as varying randomly between preschools. These parameters were specified as randomly varying because variance attributable to variables other than those modeled is expected. Previous meta-analyses on parenting programs and earlier assessment points of the present study demonstrated significantly different outcomes depending on the informant (i.e., mother, father, or observational data) and sometimes also on child gender and single parenthood (i.e., two- versus single-parent household) at an individual level of analysis. We therefore included both single parenthood and child gender as predictors at Level 2 of the HLM to control for their potential impact. The full model is available in the Electronic Supplementary Material. To obtain a quantification of change, within-group effect sizes (ES) were calculated from pre-test to follow-up after 4 years by dividing the difference between pre- and 4-year follow-up scores by the standard deviation of change scores (Rustenbach 2003; longitudinal effect gain). Data were analyzed using SPSS 20.0 and HLM 6.0 (Raudenbush et al. 2004).
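
The within-group effect size described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact computation used: it operates on complete pre and 4-year follow-up pairs, uses the sample standard deviation of the change scores, and the function name and scores are hypothetical.

```python
from statistics import mean, stdev


def longitudinal_effect_size(pre, fu4):
    """Mean pre-to-follow-up change divided by the SD of the change scores.

    Positive values indicate a decrease from pre to follow-up, which is an
    improvement for problem-behavior and dysfunctional-parenting scales.
    """
    changes = [p - f for p, f in zip(pre, fu4)]
    return mean(changes) / stdev(changes)


# Invented scores for four families, for illustration only.
pre_scores = [10, 12, 14, 16]
fu4_scores = [9, 10, 13, 13]

es = longitudinal_effect_size(pre_scores, fu4_scores)
print(round(es, 2))
```

A between-group comparison then follows by computing this value separately for the intervention and the control group and taking the difference of the two effect sizes.
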

Results

The results of the HLM analysis (i.e., the estimated parameters and significance levels) summarized below are provided in Table 1. Furthermore, Fig. 2 illustrates the course of means for child behavior and dysfunctional parenting, including their standard error separated by single parenthood. The respective figure for positive parenting behavior may be found in the supplementary material (Electronic Supplementary Material Figure A). Finally, means and standard deviations for all six assessment points may be found in the additional Table available as Electronic Supplementary Material to this manuscript.

Table 1 Parameter estimates and significance levels for child behavior and parenting outcomes derived by hierarchical linear modelling

Fig. 2 Child Problem Behavior (upper series) and Dysfunctional Parenting Behavior (lower series) over the course of the study according to mothers and fathers

Differences at Baseline

Three significant pretest coefficients were obtained involving child sex and group (see Table 1). Child sex yielded small differences in child behavior and positive parenting behavior between boys and girls: Mothers of girls reported generally fewer child behavior problems and more positive parenting behavior at baseline. The third significant predictor was group with mothers from the Triple P preschools reporting slightly more child problem behavior than those from control preschools validating differences at baseline reported previously (Heinrichs et al. 2005, 2009). These baseline differences were controlled for in the subsequent analyses.

Child Problem Behavior

Significant change occurred during treatment as indicated by a significant group effect in maternal ratings of child behavior. Maternal child behavior scores decreased more in the intervention than in the control group. No further main effects of group, child gender, or single parenthood were found either from pre- to post-assessment or during the follow-up period (see Table 1). Fathers did not report significant change in child behavior at any time. The difference in pre- to 4-year follow-up ES revealed a small advantage for the intervention group for the total score of the CBCL completed by mothers (ES = 0.19). Because single mothers differed in their reports of child and parenting behavior at baseline, and in light of our previous analyses of outcome at prior assessment points (which did not control for preschool variability or other predictors, such as child gender), we illustrate the course of means including their standard error in Fig. 2 separately for single- and two-parent households. The figure demonstrates that the slight short-term superiority of the intervention group for reduction in child behavior problems is caused by a decrease in child behavior problems in families from intervention preschools compared with an increase in these behaviors in families from control preschools, specifically in mothers and fathers from two-parent households. In contrast, single mothers seem to show a different pattern. The sample size was, however, very small, and the lack of a main effect of single parenthood indicates that this different course may either be random or may be a result of an interaction with other variables not controlled for in the present model. As can be seen in Fig. 2, there was considerable heterogeneity within each group of single mothers, leading to large confidence intervals. During follow-up, no clear differences were observed between families from intervention and control preschools.

Dysfunctional Parenting

On the Parenting Scale, there was a significant reduction of dysfunctional parenting from pre- to post-assessment in the intervention group. This effect was obtained for both mother and father report (see Table 1) and remained stable over the follow-up period (see also Fig. 2, lower series). During the follow-up period, the change from 1 to 4 years was greater for single mothers than for mothers from two-parent households. Figure 2 illustrates a decline in dysfunctional parenting for both intervention and control group single mothers during the follow-up, whereas for mothers from two-parent households this decline was smaller (combining means from intervention and control group). These figures potentially indicate an interaction effect of group by single parenthood; however, the sample size was too small for testing such an effect. Finally, there were significant positive mean ES from pre- to follow-up after 4 years for both the control and the intervention groups. The positive change was larger in the intervention group than in the control group for both maternal (ESdiff = 0.24) and paternal (ESdiff = 0.18) ratings.

Positive Parenting

With respect to positive, warm parenting (PPQ), mothers in the Triple P intervention reported a smaller decline from pre to post than mothers in the control group (and therefore a significant relative improvement), whereas father ratings did not reveal a significant group effect (Table 1; see also Figure A in Electronic Supplementary Material). In general, both parents reported a decrease in positive parenting across the 4 years of follow-up. Fathers’ data revealed that belonging to the preschools with a Triple P offer led to a significantly smaller decline in paternal positive parenting during this follow-up but not immediately during the intervention (see Table 1 and Figure A in Electronic Supplementary Material). Mothers of girls reported using more positive parenting strategies at post-intervention than mothers of boys (independent of group membership). No further significant changes occurred. Effect sizes mirror the less pronounced decline in the intervention group, resulting in ES favoring the intervention group over the control group with ESdiff = 0.38 for maternal ratings and ESdiff = 0.33 for paternal ratings. Confidence intervals did not overlap for mothers but did slightly overlap for fathers (see Electronic Supplementary Material). Focusing on single mothers, the figure (Figure A, Electronic Supplementary Material) illustrates a similar course of positive parenting scores across the 4 years in both the intervention and the control group, again suggesting the potential for interaction effects involving single parenthood (e.g., a group by single parenthood interaction). Finally, changes in dysfunctional parenting were significantly associated with changes in child problem behavior from mothers’ and fathers’ perspective (with r = 0.18, p = 0.006 and r = 0.20, p = 0.009, respectively).

Discussion

This study presents results from a randomized controlled trial of the Triple P Positive Parenting Program for families of preschoolers, delivered universally, with assessments immediately post-intervention and at 1-, 2-, 3-, and 4-year follow-ups. As with other studies on Triple P conducted outside of Germany (see, for example, Nowak and Heinrichs 2008), the present results document the impact of this prevention program on parenting competencies. The study extends previous research by demonstrating that, even when the program was administered universally to all parents, there was an impressive stability of reduced dysfunctional parenting behavior in intervention groups according to both informants (mothers and fathers) across 4 years. Furthermore, for mothers from Triple P preschools, Triple P significantly buffered a decline in positive parenting behavior, which occurred across time (and therefore with increasing child age) for families from the control preschools. Finally, mothers from Triple P preschools also reported reduced problem behavior in children during the intervention; however, there was no evidence for long-term effects on child behavior problems in this universal sample. With regard to child problem behavior, it is highly relevant to consider the overall (low) level of problem behavior in the sample. Child behavior was, for the large majority of families, in the normal range at all assessment points, as is to be expected for a universal administration of a parenting program. Even if significant changes occurred (in the positive or negative direction), the vast majority still remained within the range of normal child behavior. Furthermore, the lack of long-term results in child behavior may also imply that the intervention was not sufficiently powerful to change child behavior in the long run.

Compared with the results of the only other randomized controlled study on a universally administered parent training for preschoolers in Germany (EFFEKT, Lösel et al. 2006; Stemmler et al. 2007), which was in part based on the Patterson (2005) Parent Management Training program, the present study also produced positive effect sizes for positive parenting according to both mothers and fathers (moderate range in the current study, compared with small effects across the first year after the intervention in Stemmler et al. 2007). In terms of child behavior problems, in both studies, fathers reported no significant changes, supporting the dependency of outcome and effect sizes on the respective informant. However, the participation rate of fathers was also rather low in both trials. This raises the question of whether, and how much, change based on father report may be expected. Indeed, the differential timeline of change in positive parenting may reflect a process in which fathers from Triple P preschools (usually not attending the parenting group) themselves observe changes only after these occur in the family home. Generalizing the parenting strategies to this setting may take more time and may also be more clearly visible to fathers than to mothers, who may have been more focused on these strategies during the intervention itself, when they were actively trying to change their parenting behavior.

Interestingly, child gender and single parenthood were significant predictors, mostly for initial baseline scores, but single-parent status also predicted dysfunctional parenting behavior during follow-up, indicating that single mothers (independent of group membership) showed more reduction in dysfunctional parenting than mothers from two-parent households during the 4-year follow-up. We conclude that family status and child gender are potentially relevant variables in universal prevention efforts with the Triple P group program that may unfold their influence only in interaction with each other or with further variables. Testing these interactions in the group program therefore might be an important target for future research. Several meta-analyses have reported single parenthood as a risk factor for worse outcomes, and lower intervention effects were recently supported again in a prevention trial by Gardner et al. (2009). An alternative explanation may be that single parenthood is a time-varying state that changed during the course of the study. We assessed single parenthood independently at each assessment point, and the longer the follow-up, the more change in family status occurred, specifically in those families where mothers were single parents at baseline. Therefore, the favorable developmental trajectory in single parents in the control group may also be due to the fact that these families were more likely to have changed their family status during the study. Future studies may investigate exactly these circumstances and their impact on program effects.

In the absence of intervention, positive parenting tends to decrease with time. Families may consider praise to be less desirable as a parenting strategy as children grow; they may expect children to comply and do well without being externally motivated to behave in this way. However, the present results encourage the continued use of positive parenting strategies even as the child grows older. Similarly, the reduction of dysfunctional parenting behavior occurring from pre- to post-assessment in mothers from Triple P preschools was maintained up to 4 years after intervention. The stable pattern of change was particularly pronounced in mothers, and it is impressive that such a brief intervention can lead to these long-term changes in dysfunctional parenting, which in turn were linked to positive changes in child behavior problems. This result encourages the implementation of evidence-based parenting programs as a universal prevention strategy.

Limitations of the present study include the sole reliance on self-report measures at the 4-year follow-up, which may be biased. For example, mothers’ positive reports on their parenting and on child outcome might be a justification of the effort they invested. However, this explanation for bias may not hold for the reports of fathers (who did not invest much because they rarely attended the intervention). Furthermore, due to restrictions in power, there is a threat to the validity of the findings for single mothers in this study. We did not stratify by single-parent status in the randomization, and significant pre-intervention differences in this variable occurred. Thus, the results with regard to single-parent households should be considered with caution, and future research may focus specifically on the impact of including single parents in parent groups with parents from two-parent households. Finally, 69 % of the population did not get involved in the project. While such rates are very common for universal prevention efforts with parents (Heinrichs et al. 2005), many parents were not reached with the current project (and therefore also not assessed regarding intervention efficacy). The present refusal rate (23 % of those offered the intervention) is difficult to put into context, as there is little information available on refusal rates in universal prevention. However, as these parents were treated in the current analyses as if they had participated in the intervention, concern about this potential bias in interpreting the outcome findings is reduced.

Strengths of the study are the very long follow-up period with a high retention rate across assessment points, a high participation rate of fathers in the assessment (not in the intervention), the use of psychometrically sound measures which are used across countries allowing comparability of this work internationally, and the universal implementation setting providing free-of-cost evidence-based parenting information to all families independent of risk. Furthermore, the analyses attempted to take into account the level of randomization and followed an intent-to-treat approach. The current method of analysis obtains appropriate estimates and standard errors for both individual-level and preschool-level covariates. It is therefore superior to prior analyses of data in this project.

Implications of Findings for Practice

The present findings support the positive effects of Triple P, which is one example of a parent training based on social learning principles. Similar parenting interventions, such as Parent Management Training, have also been established as effective in Germany (in addition to the numerous efficacy trials conducted in the United States). However, few long-term outcome studies are available. This has fueled the controversy over a lack of benefit from behaviorally oriented parent trainings in Germany, and sometimes even potential damage has been insinuated. For example, the time-out procedure has been discussed as potential child maltreatment if the child is sent to time out in a different room against his or her will, and it has been stated that behaviorally oriented parent trainings damage attachment (e.g., Deegener and Hurrelmann 2002; Deegener and Tschöpe-Scheffler 2004; Tschöpe-Scheffler 2004). With more evidence being published, it has now been agreed that these programs may be effective for children with conduct disorders but not for those without such a disorder (Tschöpe-Scheffler 2005). Furthermore, it has been suggested that, if not immediately after the intervention, parents may still notice potential adverse effects years later. The present study contradicts these assumptions, demonstrating that, even in the long term, there was on average more benefit to participating families than damage. Clinicians may therefore use the Triple P model in consulting families even if these families do not report clinically elevated behavior problems. Furthermore, there are several studies with an assessment period of 2 years post-intervention using a public health approach with Triple P (e.g., Prinz et al. 2009). These demonstrate significant population-level effects on child maltreatment and support the implementation of evidence-based parenting programs in large-scale public health initiatives.
However, as always, the question remains: how much evidence is enough or needed to make a recommendation for a program? Indeed, science depends on replication. Results regarding Triple P have been replicated many times across different countries. In Germany, only two studies have been conducted (for the other one, see Heinrichs and Jensen-Doss 2010). Many of these trials demonstrate clear evidence for maternally reported positive change in parenting behavior. Is this enough? Likely not, as there are many other aspects to clarify and explore, such as the informant dependency of results, the difficulty of finding significant effects when working with non-risk samples (and how to interpret no change in this context), the role of child gender and single parenthood, application to other age ranges, and so on. Given the current policy in developed countries of disseminating all kinds of parenting programs, most of which have never provided or even collected evidence for or against their effects, it is not surprising that reductions in child maltreatment indicators have largely failed to appear at the population level (Gilbert et al. 2012). In light of this, does it seem justifiable to at least hint at the potential of public health initiatives using evidence-based parenting information? The question remains open for discussion. We argue that the use of evidence-based parenting information for public health initiatives would likely not solve all problems, but it is still worth a try.