Problem behavior often develops at a young age. A considerable number of children suffer from mental health problems. Prevalence figures show that between three and eighteen percent of children exhibit symptoms of psychopathology (Carter et al. 2010; Costello et al. 2003). Externalizing problems, such as oppositional defiant behavior, antisocial behavior, and attention difficulties, as well as internalizing problems, including separation anxiety, anxiety, and depressive symptoms, are most common in young children (Egger and Angold 2006; Klein et al. 2005; Lavigne et al. 2009). In addition, co-morbidity is quite common, especially with regard to young children (Lavigne et al. 2009; Scheeringa and Zeanah 2008).

It is important to be able to examine psychopathology at a young age, since high degrees of aggressive and oppositional behavior may become permanent and develop into chronic patterns of externalizing and psychopathological behavior at a later age (Reef et al. 2010). Problem behavior is associated with increased risks of poor academic, social and occupational performance, deteriorated physical and mental health, and substance use (Ansary and Luthar 2009; Bayer et al. 2011; Fergusson et al. 2005; Kim et al. 2008; Morcillo et al. 2011; O’Neill et al. 2011). When problem behavior is assessed early in development, interventions may contribute to the reduction of aggressive, oppositional and other externalizing behaviors before these negative behavioral patterns become integrated into the child’s personality (Hill et al. 2004).

Several factors explain why, in both research and clinical practice, the emphasis is on externalizing rather than internalizing problems. One major reason is probably that externalizing behavior is easier to observe than internalizing behavior. Externalizing behaviors, such as tantrums and resistance against rules, are outwardly directed, generally troublesome for the environment, and often provoke negative feelings (Rubin and Mills 1990). Internalizing problems, on the other hand, are intra-individual in nature, inwardly directed, and more easily shielded from the environment by the child (Luby et al. 2009). These behaviors attract less attention and cause fewer problems for the child’s environment. Of course, a child may still experience such internalizing problems and suffer from them. Indeed, research shows that even young children report internalizing problems (Luby 2010), and that these problems are related to negative developmental outcomes later in life, including recurrent depressive episodes, poor school performance, impaired functioning of peer and family relationships, and an increased risk of suicide (Bhatia and Bhatia 2007; Cicchetti and Toth 1998). The fact that internalizing problems at a young age are predictive of problems at a later age stresses the need for early intervention (Bayer and Sanson 2003).

Yet, while 50 % of children expressing externalizing behaviors receive help, this is true for only 20 % of children suffering from internalizing problems (Merikangas et al. 2011). Some researchers suppose that internalizing problems are generally better recognized by children themselves than by other informants (Achenbach et al. 1987). In one respect, it is possible that an informant’s background distorts his/her perception of a child’s behavior, particularly when the behavior is more ambiguous, as is the case with internalizing problems (Kroes et al. 2003). For example, personality characteristics such as hostility and inadequate interpersonal sensitivity, are associated with reporting on internalizing problems. In another respect, it is likely that children behave differently in several environments (e.g., at home versus at school), which ensures that information derived from different informants is related to the specific context by definition. Hence, the problem with obtaining information from different informants is that these perceptions are context specific and biased by personal backgrounds (Los Reyes and Kazdin 2005a, b). Alongside conventional screening instruments that are used during the problem analysis phase in clinical practice, including the CBCL/TRF and SDQ (Achenbach and Ruffle 2000; Goodman et al. 2000), it seems worthwhile to pay attention to the possibility of adopting instruments that refer to the child as an informant. This is in accordance with the so-called ‘multi-informant approach’, in which it is recommended to take into account context (i.e., at home and elsewhere), and perspective (i.e., self and other), when selecting informants (Kraemer et al. 2003). By using self-report instruments, the risk of under-reporting of internalizing problems may be reduced and a more comprehensive picture of the existing problems will arise (Kraemer et al. 2003).

Screening instruments use self-reports of young children only to a minor extent. Young children are not always considered reliable informants of their own behavior (Mutsaers 2009; Scheeringa and Haslett 2010). Children’s vocabulary and cognitive development may affect their understanding of questions and interfere with the duration of administration (Arseneault et al. 2005). Furthermore, it is doubted whether children are capable of self-perception, as this concept is related to cognitive development. Moreover, young children are very sensitive to suggestion, which makes interviewing children a challenge and requires specific interviewing skills. Still, as early as the 1980s, Harter (1982) showed that children from the age of eight can meaningfully differentiate between various competence scales (cognitive, social, and physical competence, and general self-esteem). Measelle et al. (1998) stated that children’s self-perceptions can indeed be reliably measured by using an age-appropriate instrument. In clinical practice it is also known that children from 6 years of age can be interviewed as a part of the diagnostic cycle (Van Leeuwen 2002), thereby adding unique information to the diagnostic process. In the last few years, children’s self-reports have been increasingly valued (Arseneault et al. 2005; Ialongo et al. 2001; Luby et al. 2007). Specific self-report questionnaires are available for children from 8 years onwards, such as the Child Depression Inventory (CDI; Kovacs 2001), Screen for Child Anxiety Related Emotional Disorders (SCARED; Birmaher et al. 1999), and Perceived Competence Scale for Children (PCSC; Harter 1982). However, in practice, there is no screening instrument available in the Netherlands that uses children younger than 8 years old as informants for the assessment of their psychopathology.

The Berkeley Puppet Interview (BPI; Measelle et al. 1998; Morris et al. 2002) is an interactive interviewing technique, developed in the USA and designed to elicit perceptions of 4.5- to 8-year-olds in an age-appropriate way. During the BPI, children are interviewed by two hand puppets in order to simulate a conversation between three peers. Each time, these two hand puppets make opposing statements. For example, one puppet indicates: ‘I am a sad child’, whereas the other puppet states: ‘I am not a sad child’. Then, they ask the child together: ‘How about you?’. Because both options are offered, the wording of the interviewer’s question is largely prevented from steering the child toward a particular answer (Fig. 1).

Fig. 1 Pictures of the Berkeley Puppet Interview

In previous studies the BPI has proven to be reliable and valid (Ablow et al. 1999; Arseneault et al. 2005; Luby et al. 2007; Measelle et al. 1998; Morris et al. 2002; Ringoot et al. 2013). However, only one of these studies used longitudinal data, and the sample of this study was rather small, with fewer than 100 participants (Measelle et al. 1998). In addition, with one exception (Ringoot et al. 2013), earlier studies investigated specific problem clusters of the BPI, such as conduct problems or depression (Arseneault et al. 2005; Luby et al. 2007). Our aim is to investigate the BPI as a whole. Further, more research into the BPI’s psychometric properties may facilitate its use in clinical practice. As such, the BPI may be suitable for embedding into the diagnostic cycle. Clinicians naturally conduct interviews with children, and the BPI allows them to do so in a standardized manner, without disregarding particular case-dependent questions. In addition, it is an age-appropriate instrument whose administration will take less time than a diagnostic interview. Recently, the BPI was used as a research instrument as part of two large-scale studies in the Netherlands: the Kind in Zicht study (Stone et al. 2013a), and the Generation R study (Jaddoe et al. 2012). Kind in Zicht is a longitudinal research project on incipient emotional and behavioral problems in young children (Stone et al. 2013a). Generation R involves research into early influences on growth and development within a longitudinal multi-ethnic birth cohort (Jaddoe et al. 2012). For the BPI to be used in these studies, a Dutch version was developed in collaboration with the developers of the instrument.

In the present article, we introduce the Dutch version of the BPI as a useful instrument that complements existing diagnostics in the field of psychopathology, and we examine the test–retest reliability and the congruent and concurrent validity of the BPI in the Kind in Zicht study. We expected the Dutch version of the BPI, like the American version, to be a reliable and valid instrument for self-reports of psychopathology in young children.

Method

Sample and Procedure

In this study, 300 children were interviewed during the first measurement (T1). One child was excluded due to missing data and another child because she was over 8 years old. One year later (T2), 288 of these children (96 %) were re-interviewed, of whom one was excluded because of her advanced age. This resulted in a sample of 298 children at T1, and 287 children at T2. Of these participating children, 50 % were male and the mean age was 6.95 years (SD = 1.13; range 5–8 years). The majority of the children were of Dutch origin (97.4 %) and grew up in a two-parent family (92.2 %). Teachers (T1 n = 282, T2 n = 245) and parents (T1 n = 289, T2 n = 269) completed questionnaires about the children at both time points. In addition, teachers (n = 287) and parents (n = 287) completed a questionnaire about the children 1 year before the interviews took place, and this measurement point is referred to as T0. At T0, the teachers’ mean age was 36.57 years (SD = 10.43), and 93.9 % of them were female. The parents who filled out the questionnaires were on average 38.29 years old (SD = 3.88), and 92.9 % of them were female. Over half of the parents were highly educated (54.8 %), 37.3 % had an intermediate education level, and 6.6 % had a lower education level. Slightly over 1 % had received some other type of education.
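The sample sizes reported above follow from simple bookkeeping. The short Python sketch below retraces that arithmetic; the variable names are illustrative, and the retention percentage is assumed to be computed against the 300 children originally interviewed, which is the reading that reproduces the reported 96 %.

```python
# Sample flow as reported in the text; variable names are illustrative.
interviewed_t1 = 300
excluded_t1 = 2        # one child with missing data, one older than 8 years
reinterviewed_t2 = 288
excluded_t2 = 1        # one child excluded at T2 because of her age

analysed_t1 = interviewed_t1 - excluded_t1
analysed_t2 = reinterviewed_t2 - excluded_t2

# Assumption: retention is expressed relative to all children interviewed at T1.
retention_pct = 100 * reinterviewed_t2 / interviewed_t1

print(analysed_t1, analysed_t2, round(retention_pct))  # 298 287 96
```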

For the present study, longitudinal data (2011(T1)–2012(T2)) from the Kind in Zicht project were used (Stone et al. 2013a), which was approved by the committee on ethics. Within this project, information was collected about the individual children, using multiple informants. Informed consent from the children’s parents was obtained. Each year, the BPI was administered to the children by five certified master students or researchers. They all completed a training course in which the interviewing techniques of the BPI were extensively practiced. Subsequently, they each conducted eight practice interviews, and were then evaluated. The interviews were administered at primary schools in January and February of 2011 and 2012. Children were interviewed in an empty classroom to ensure confidentiality. Interviews were videotaped and after completion, the children received a pair of stickers to thank them for their participation.

Measures

BPI

The Berkeley Puppet Interview (BPI; Measelle et al. 1998) is an interactive and age-appropriate interviewing technique, designed to elicit self-perceptions in 4.5- to 8-year-olds. During the interview, children were asked questions by two identical hand puppets: Iggy and Ziggy. Prior to the interview, the puppets introduced themselves and explained in a playful way how the interview would be carried out. Using three practice items, the interviewer assessed whether the procedure was clear to the child, and either continued with the actual interview or repeated the practice items until the procedure was clear. An example of such a practice item is: Puppet 1: ‘I like chocolate’, Puppet 2: ‘I do not like chocolate. How about you?’. Throughout the interview, the puppets exchanged opposing statements and then asked the child: ‘How about you?’. The puppet with which the child agreed repeated the response, thereby confirming the child’s answer.

After administration of the interviews, the children’s answers were coded by trained observers on a 7-point scale (see Fig. 2). Answers that reflected the absence of psychopathology were coded as either 5, 6, or 7, depending on possible amplifications or attenuations in the child’s response. Code 7 comprised the strongest absence of psychopathology (e.g., ‘I am never a sad child’), whereas code 6 meant a neutral absence (e.g., ‘I am not a sad child’), and code 5 represented a hesitant response (e.g., ‘Usually, I am not a sad child’). On the other side of the spectrum, code 1, 2, or 3 reflected the presence of psychopathology. Code 1 stood for a strong presence (e.g., ‘I am always a sad child’), while code 2 represented a neutral response (e.g., ‘I am a sad child’), and code 3 was equivalent to a hesitant response (e.g., ‘Usually, I am a sad child’). When a child was unable to choose between the two statements, this response was coded as 4. In order to test the reliability of the coding, 15 % of the interviews were double-coded.

Fig. 2 Coding scale of the Berkeley Puppet Interview
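The 7-point coding scale described above can be summarized as a small lookup table. The following Python sketch is only an illustration of that scheme: the labels `"presence"`, `"absence"`, `"strong"`, `"neutral"` and `"hesitant"`, as well as the function name `code_response`, are hypothetical names chosen here and are not taken from the BPI manual.

```python
# Lookup for the 7-point BPI coding scale as described in the text.
# Keys are (direction, intensity); all labels are illustrative assumptions.
BPI_CODES = {
    ("presence", "strong"): 1,    # e.g. 'I am always a sad child'
    ("presence", "neutral"): 2,   # e.g. 'I am a sad child'
    ("presence", "hesitant"): 3,  # e.g. 'Usually, I am a sad child'
    ("absence", "hesitant"): 5,   # e.g. 'Usually, I am not a sad child'
    ("absence", "neutral"): 6,    # e.g. 'I am not a sad child'
    ("absence", "strong"): 7,     # e.g. 'I am never a sad child'
}
UNDECIDED_CODE = 4  # the child could not choose between the two statements


def code_response(direction=None, intensity="neutral"):
    """Return the code for one answer; direction=None means 'undecided'."""
    if direction is None:
        return UNDECIDED_CODE
    return BPI_CODES[(direction, intensity)]
```

For instance, `code_response("absence", "hesitant")` returns 5, and `code_response()` returns 4 for a child who could not choose.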

The BPI includes 8 subscales (i.e., the symptom scales), six of which constitute the basis for two overall scales: internalizing problems and externalizing problems. The internalizing problems scale comprises three subscales: depression (7 items; e.g., ‘I am a sad child/I am not a sad child’), anxiety (7 items; e.g., ‘I do have many bad dreams/I do not have many bad dreams’), and separation anxiety (6 items; e.g., ‘When I am at school, I miss my mum or dad/When I am at school, I do not miss my mum or dad’). We used the internalizing problems scale, as well as the separate symptom scales. The externalizing problems scale also comprises three subscales: oppositional defiant behavior (6 items; e.g., ‘Sometimes I curse, or I use bad language/I do not curse, or use bad language’), behavioral problems (9 items; e.g., ‘Sometimes I act cruel towards animals/I do not act cruel towards animals’), and aggression and hostility towards peers [from here referred to as aggression] (6 items; e.g., ‘I often fight with other children/I do not fight with other children’). In addition, two subscales focus on relationships with peers: acceptance and rejection by peers [from here referred to as acceptance/rejection] (5 items; e.g., ‘Other children ask me to play along/Other children do not ask me to play along’), and being bullied (4 items; e.g., ‘Children hit me, or beat me up/Children do not hit me, or beat me up’). The negative and positive statements were presented in a random order. No Cronbach’s alphas are reported for the BPI, since the interview is considered an index scale rather than a Likert scale, which makes it unsuitable for calculating this reliability coefficient (Stone et al. 2013b). The interrater reliability is reported in the Results section.
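The scale structure described above can be written down as a nested mapping, which makes the item counts easy to check. This Python sketch uses only the subscale names and item counts given in the text; the group label `"peer relationships"` and the variable names are assumptions made for illustration.

```python
# Subscales and item counts as listed in the text.
# The grouping key "peer relationships" is an illustrative label.
BPI_SUBSCALES = {
    "internalizing": {
        "depression": 7,
        "anxiety": 7,
        "separation anxiety": 6,
    },
    "externalizing": {
        "oppositional defiant behavior": 6,
        "behavioral problems": 9,
        "aggression": 6,
    },
    "peer relationships": {
        "acceptance/rejection": 5,
        "being bullied": 4,
    },
}

# Number of items per group and overall.
items_per_group = {
    group: sum(subscales.values()) for group, subscales in BPI_SUBSCALES.items()
}
n_subscales = sum(len(subscales) for subscales in BPI_SUBSCALES.values())
n_items = sum(items_per_group.values())
```

Under these counts, the internalizing scale holds 20 items, the externalizing scale 21, and the two peer subscales 9, for 50 items across 8 subscales.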

SDQ

The Dutch parent and teacher version of the Strengths and Difficulties Questionnaire (SDQ) was used to assess internalizing and externalizing problems (van Widenfelt et al. 2003). The subscales measuring emotional problems (e.g., often unhappy, down-hearted or tearful) and behavioral problems (e.g., often lying or cheating) each consist of five items. Parents or teachers judged children on a 3-point scale, from 0 (not true) to 2 (very true). The scoring manual is available online (www.sdqinfo.com). In the Kind in Zicht study, the psychometric properties of the SDQ were adequate, as described elsewhere (Stone et al. 2013b).

CBCL/TRF

The Dutch versions of the Child Behavior Check List (CBCL) and Teacher Report Form (TRF) were also used (at T0) to measure internalizing and externalizing behavior, as reported by parents and teachers (Achenbach and Rescorla 2000; Achenbach and Rescorla 2001; Verhulst et al. 1997). The C-TRF and C-CBCL are intended for children aged 1.5–5 years and contain 100 items; the TRF and CBCL are intended for 5–18 year-olds and contain 118 items. The C-TRF and TRF were filled out by teachers, whereas the C-CBCL and CBCL were filled out by parents. Items were scored on a 3-point Likert scale, where 0 represents ‘not true’, and 2 stands for ‘very true or often true’. Three scales (i.e., somatic symptoms, anxious-depressed, and withdrawn) were combined in order to constitute the internalizing scale. Combining two scales (i.e., violation of rules and aggressive behavior) resulted in the externalizing scale. The psychometric properties of this instrument in the Kind in Zicht study were again adequate (Stone et al. 2013b).

Strategy for Analysis

First, descriptive statistics that provide insight into the level of psychopathology for the whole sample are shown, disaggregated by gender and age group (4–5 and 6–7 years). In addition, independent t tests were conducted to test whether the mean scores of boys and girls, and of younger and older children, differed statistically. Originally, the BPI is scored in such a way that higher scores reflect lower levels of psychopathology. In our opinion, this is somewhat confusing. To make interpretation clearer, the scores were therefore reverse-coded (i.e., 1 becomes 7, and vice versa), such that higher means reflect higher levels of problem behavior. These reversed scores were used for calculating means and standard deviations.
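The reversal of the 7-point codes amounts to subtracting each code from 8. The Python sketch below illustrates this with made-up answers; the function name `reverse_code` and the example values are hypothetical and do not come from the study data.

```python
SCALE_MIN, SCALE_MAX = 1, 7


def reverse_code(code):
    """Mirror a code on the 1-7 scale: 1 becomes 7, 2 becomes 6, and so on."""
    if not SCALE_MIN <= code <= SCALE_MAX:
        raise ValueError(f"code {code} is outside the 1-7 scale")
    return SCALE_MIN + SCALE_MAX - code


# Hypothetical original codes for one child on a four-item subscale.
original = [6, 7, 5, 2]
reversed_codes = [reverse_code(c) for c in original]
mean_reversed = sum(reversed_codes) / len(reversed_codes)

print(reversed_codes, mean_reversed)  # [2, 1, 3, 6] 3.0
```

Note that the undecided code 4 maps onto itself, so the midpoint of the scale is unaffected by the reversal.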

Subsequently, the reliability of the BPI codes was examined using intra-class correlations and test–retest correlations. The intra-class correlation coefficient [ICC] was calculated to determine the reliability between two coders per BPI subscale. The higher the ICC, the more reliable the coding, where a score of 1 represents absolute agreement. ICC values of >.60 are considered good and values >.75 are considered excellent (Cicchetti et al. 2011). Pearson correlations were used for calculating test–retest correlations. These test–retest correlations were calculated for the entire group, and for gender and age separately.
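To make the interrater analysis concrete, the Python sketch below computes an intra-class correlation for two coders and labels it with the thresholds cited above. The text does not state which ICC model was used, so the choice of the two-way random-effects, absolute-agreement, single-rater form (often written ICC(2,1)) is an assumption, as are the function names and the toy ratings.

```python
def icc_absolute_agreement(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` holds one row per interview; each row contains the score
    assigned by every coder. The ICC model is an assumption of this sketch.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two rated interviews")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every interview needs the same (>= 2) coders")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ms_cols = n * sum((m - grand) ** 2 for m in col_means) / (k - 1)
    ss_err = sum(
        (ratings[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_err = ss_err / ((n - 1) * (k - 1))

    denom = ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    if denom == 0:
        raise ValueError("ratings have no variance; ICC is undefined")
    return (ms_rows - ms_err) / denom


def interpret_icc(icc):
    """Label an ICC using the cut-offs cited in the text (>.60, >.75)."""
    if icc > 0.75:
        return "excellent"
    if icc > 0.60:
        return "good"
    return "below the 'good' threshold"


# Toy data: identical codes give 1.0; a constant offset lowers agreement.
perfect = icc_absolute_agreement([[1, 1], [2, 2], [3, 3]])
offset = icc_absolute_agreement([[1, 2], [2, 3], [3, 4]])
```

In the toy data, the second coder always scores one point higher than the first. The scores are perfectly correlated, yet the absolute-agreement ICC drops to 2/3, which shows why this form is stricter than a plain correlation.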

In terms of validity, congruent validity was examined first by intercorrelating the BPI subscales. Additionally, concurrent validity was examined by correlating the BPI outcomes with the outcomes of the other questionnaires, again using Pearson correlations. When comparing the BPI with the SDQ and CBCL, the BPI subscales were grouped under two headings: the internalizing problems scale and the externalizing problems scale. These were compared with the emotional and behavioral problems scales of the parent and teacher versions of the SDQ. The CBCL also has an internalizing and an externalizing problems scale, completed by parents (CBCL) and teachers (TRF). Because some of the children were too young for the standard versions, the versions for younger children were used for this subgroup: the C-CBCL and the C-TRF. In order to clearly show the possible similarities and differences between the BPI and CBCL, the standardized T-scores of the CBCL and C-CBCL, and those of the TRF and the C-TRF, were combined.
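As an illustration of the correlational analyses, the Python sketch below computes a Pearson correlation between two sets of paired scores, together with the degrees of freedom shown in parentheses in the conventional r(df) notation, where df equals the number of complete pairs minus two. The function names and the example scores are hypothetical; the study's own analyses may have been run with different software and missing-data handling.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation for two equally long score lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three complete pairs of scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def degrees_of_freedom(n_pairs):
    """Degrees of freedom reported with r: number of pairs minus two."""
    return n_pairs - 2


# Hypothetical child-reported and parent-reported scores for five children.
child_scores = [2.0, 3.5, 1.5, 4.0, 2.5]
parent_scores = [1.0, 2.0, 1.0, 3.0, 2.0]
r = pearson_r(child_scores, parent_scores)
df = degrees_of_freedom(len(child_scores))
```

The same function applies to the test–retest analysis when the two lists hold a child's T1 and T2 scores on one subscale.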

Results

Descriptive Statistics

The descriptive statistics of the BPI subscales appear in Table 1. The mean scores on the subscales were low. T tests for paired observations showed that the mean scores of depression, separation anxiety, anxiety, behavioral problems, and being bullied, declined from T1 to T2. In addition, it was tested whether mean differences regarding age and gender at T1 and T2 were present. The t test for gender at T1 showed that there were statistically significant mean differences for separation anxiety (t(286) = −2.25, p < 0.05), aggression (t(289) = 3.56, p < 0.01), and acceptance/rejection (t(284) = 2.04, p < 0.05), but not for depression, anxiety, behavioral problems, oppositional defiant behavior, and being bullied. The mean scores of boys on the aggression and acceptance/rejection subscales were higher than those of girls, while girls scored higher on separation anxiety than boys. At T2, the t test for gender was statistically significant for the subscales separation anxiety (t(279) = −3.37, p < 0.05), oppositional defiant behavior (t(280) = 3.02, p < 0.05), behavioral problems (t(280) = 2.07, p < 0.05), aggression (t(279) = 3.96, p < 0.01), acceptance/rejection (t(280) = 2.49, p < 0.05), and being bullied (t(279) = 2.39, p < 0.05), but not for depression and anxiety. Mean scores of boys at T2 were higher than those of girls on the subscales oppositional defiant behavior, behavioral problems, aggression, acceptance/rejection, and being bullied, whereas girls reported higher scores on separation anxiety than boys. In conclusion, boys generally reported more externalizing problems than girls at both time points.

Table 1 Descriptive statistics of the BPI subscales at T1 and T2

As regards the t test for age, mean scores for depression (t(282) = 2.46, p < 0.05) and acceptance/rejection (t(276) = 2.22, p < 0.05) were found to be higher for younger children than for older children at T1. At T2, younger children also reported more symptoms of depression (t(273) = 2.76, p < 0.01), as well as aggression (t(272) = 2.12, p < 0.05), and they reported being bullied more than older children did (t(272) = 3.77, p < 0.01).

Intra-class Correlations

The following ICCs were obtained for the separate subscales, for T1 and T2 respectively: depression (.74, .86), anxiety (.70, .80), separation anxiety (.70, .83), oppositional defiant behavior (.66, .71), behavioral problems (.81, .66), aggression (.78, .77), acceptance/rejection (.82, .82), and being bullied (.74, .88). These correlations indicated that the BPI subscales can be reliably coded by multiple coders.

Test–Retest Reliability

Table 2 presents the results for test–retest reliability over a time interval of 1 year. These showed that, overall, the psychopathology self-reports provided by the children were rather stable. The reports of boys appeared to be somewhat less stable than those of girls in terms of oppositional defiant behavior, behavioral problems, and being bullied. Moreover, the correlations regarding depression, separation anxiety, acceptance/rejection, and being bullied were less pronounced in young children than in older children. The test–retest reliability of these scales thus increased with age.

Table 2 Longitudinal associations of the BPI subscales by gender and age-group

Congruent Validity

As is apparent from Table 3, the BPI subscales correlated significantly at T1 and T2. The correlations were weak to moderate, and the pattern of correlations was as expected; the reports of certain types of problem behaviors were associated with the reports of other types of problem behaviors (e.g., anxiety was correlated with depression). The internalizing subscales depression, separation anxiety, and anxiety, correlated weakly with the externalizing subscales oppositional defiant behavior, behavioral problems, and aggression. The correlations between the internalizing subscales themselves were stronger, especially between anxiety and depression, and anxiety and separation anxiety. Furthermore, oppositional defiant behavior, behavioral problems, and aggression correlated relatively strongly with one another. Acceptance/rejection correlated predominantly with depression and oppositional defiant behavior, and to a lesser extent with behavioral problems, aggression, and anxiety. The subscale being bullied was correlated with all other subscales. In summary, various problem behaviors were meaningfully intercorrelated within this young age group.

Table 3 Correlations among the BPI subscales at T1 and T2

Concurrent Validity

The externalizing subscales of the BPI and the SDQ were correlated at T1 and T2, concerning both parents and teachers (see Table 4). The more externalizing problems the children reported, the more behavioral problems parents and teachers reported likewise. It is noteworthy that the internalizing subscales of the BPI and the SDQ correlated to a lesser extent than the externalizing subscales. In order to explain this difference, the individual internalizing BPI subscales (i.e., anxiety, depression, and separation anxiety), were correlated to the SDQ emotional problems scale score. Depression, separation anxiety, and anxiety were uncorrelated with emotional problems as reported by teachers at T1: r(277) = .09, n.s.; r(277) = .05, n.s.; r(277) = .09, n.s., respectively. Similarly, separation anxiety (r(237) = .11, n.s.) and anxiety (r(237) = .10, n.s.) did not correlate with emotional problems as reported by teachers at T2, but depression did: r(238) = .21, p < .01.

Table 4 Correlations among the BPI subscales and the SDQ T1 and T2 scale scores

As for the parents as informants, the internalizing BPI scale was correlated with emotional problems at T1, but no longer at T2. Next, as with the teacher reports, the separate BPI subscales were correlated with the SDQ emotional problems scale score. At both time points, no correlation was found between separation anxiety and emotional problems (T1: r(280) = .04, n.s.; T2: r(255) = .02, n.s.) and between anxiety and emotional problems (T1: r(276) = .08, n.s.; T2: r(255) = .02, n.s.). Depression was found to be associated with emotional problems at both T1 and T2 (T1: r(280) = .12, p < .05; T2: r(256) = .17, p < .01). From these results, we can conclude that children’s self-reports of depression corresponded to some extent to the reports of emotional problems by teachers and parents; the more emotional problems teachers and parents reported, the more depression children reported. However, children’s self-reports of anxiety and separation anxiety did not correspond to teachers’ and parents’ reports of emotional problems.

The BPI subscales measured at T1 were also compared with the CBCL/TRF scale scores at T0. Children’s self-reported internalizing problems did not correlate with the problems reported by parents and teachers (r(278) = −.00, n.s.; r(286) = −.02, n.s., respectively). However, the correlations between children’s self-reports and the reports of their parents (r(279) = .20, p < .01) and teachers (r(287) = .14, p < .05) on externalizing problems were significant. In short, children’s reports of internalizing problems were not correlated with the reports of parents and teachers about the children’s behaviors in the previous year, whereas children’s reports of externalizing problems were.

Discussion

At present, no standardized instrument is available in the Netherlands for measuring self-perceptions of problem behavior in young children (Mutsaers 2009). This is problematic, since it is known that there may be great differences in reports of parents and teachers about children’s behaviors (Los Reyes and Kazdin 2005a, b). As a consequence, certain problem behaviors may not be recognized. Therefore, it is important that attention is paid to self-reports of problem behavior by young children. In this article, the Dutch version of the Berkeley Puppet Interview (BPI) was presented, which is a standardized and age-appropriate instrument for interviewing young children about their self-perceptions of problem behaviors. In addition, several psychometric properties of the BPI were presented.

We expected that the results regarding reliability and validity would be consistent with earlier research into the BPI. The results suggest that the BPI scales can be sufficiently reliably coded, that the subscales are correlated after 1 year, and that the subscales are meaningfully intercorrelated, which indicates congruent validity. The analyses concerning the intra-class correlation coefficients and test–retest reliability imply that the BPI is a consistent, reliable interviewing method. It should be noted, though, that the intra-class correlations for oppositional defiant behavior were somewhat lower. The results of this subscale should therefore be interpreted with some caution. Still, even after a 1-year interval, during which, of course, not only reliability was assessed, but also development, there appeared to be clear patterns in the behaviors children report. The test–retest coefficients are not as high as typically found in studies that focus on adults, but are similar to those of other studies investigating the BPI’s psychometric properties (Measelle et al. 1998). Furthermore, theoretically speaking, it was to be expected that the BPI subscales were meaningfully interrelated. This indicates that the BPI seems to measure the constructs that are intended to be measured. However, for determining congruent validity, it is also necessary that the BPI be compared to external measures, such as standardized tests that assess school performance. Although children are sometimes still not considered reliable informants of their own problems (Mutsaers 2009; Scheeringa and Haslett 2010), the results of this study seem to indicate the opposite. This is in line with other studies that have been conducted into the BPI (Arseneault et al. 2005; Luby et al. 2007; Measelle et al. 1998), and with recommendations to clinicians that children from the age of 6 years can be interviewed as part of the diagnostic cycle (Van Leeuwen 2002).

The comparison of the BPI with the SDQ and CBCL/TRF shows that differences between reports of multiple informants are indeed great. It is important to note that comparing scores on the BPI on the one hand, and the SDQ and CBCL/TRF on the other hand, is difficult, given the nature of the instruments: an interviewing technique versus a questionnaire. Even when this difference in method is taken into account, the correlations between comparable concepts measured using the BPI and the SDQ or CBCL/TRF remain weak.

This phenomenon, ‘informant disagreement’, is a well-known issue when comparing reports from multiple informants (Los Reyes and Kazdin 2005a, b). As expected, the agreement was greater in terms of externalizing behavior, than with respect to internalizing behavior, although the agreement on externalizing behavior was also very low. These results underscore that reports of problem behavior by parents and teachers cannot simply be regarded as corresponding to children’s perceptions (Achenbach et al. 1987; Los Reyes and Kazdin 2005a, b), particularly when it comes to reporting internalizing problems, where agreement between children and parents and teachers was very limited (Achenbach et al. 1987). These results also imply that child reports provide important information additional to the process of information gathering in the problem analysis phase. In this respect, the BPI could be a useful instrument. Based on the current state of research into the BPI, however, clinicians are recommended to also keep in mind the limitations of the BPI, when using this instrument. It is not recommended to use the BPI as a single instrument, but it seems suitable for gaining more insight into certain symptoms and for confirming or rejecting hypotheses regarding a child’s symptoms. In addition to the BPI, another promising instrument is available for children aged 6–11 years old: the Dominic Interactive (DI; Valla 2000; Kuijpers et al. 2013). The DI is a structured digital questionnaire that assesses the most common internalizing and externalizing problems in children. It takes into account the child’s developmental level, by means of supporting the questions by visual and auditory stimuli. The item is both displayed through an image of the problem situation, and made audible by being read out loud by the program.

Limitations and Future Directions

The present study showed that the BPI has adequate psychometric properties, although we believe that more research into the internal structure of the BPI is necessary. A recent study did confirm the internal structure of the BPI and reported Cronbach’s alphas for the subscales (Ringoot et al. 2013). Yet, a thorough test of the internal structure of the BPI is hampered by the bimodal frequency distribution of its scores, which, in our opinion, makes the instrument unsuitable for conventional reliability analyses, such as calculating Cronbach’s alpha and testing the factor structure. The BPI thus appears to be a sound and useful instrument which could be used in child and youth care. Still, it is important that, in the future, the experiences of using the BPI in clinical practice, and its functioning in a clinical setting, are explored. After all, little is known about using the BPI in clinical practice. Thus far, the results that have emerged from studies into the BPI are promising (Arseneault et al. 2005; Measelle et al. 1998; Ringoot et al. 2013), and suggest that the BPI can constitute a valuable supplement to youth care practices. When research on the use of the BPI in a clinical setting becomes available, the instrument may possibly be embedded in evidence-based practice (Mash and Hunsley 2005). In conclusion, by means of this article we hope to have given the BPI greater publicity, to allow for optimal utilization of this instrument within youth care.